Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
> On Wed, Dec 1, 2010 at 10:35 AM, Peter Körner osm-li...@mazdermind.de wrote:
>> On 30.11.2010 at 23:44, Anthony wrote:
>>> On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
>>>> because XML is a nearly human-readable, easy to explain and inspect format.
>>> Except when you don't include any line feeds :).
>> Which can be solved with a perl one-liner (well, everything can be solved with a perl one-liner ^^)
> What's the perl one-liner? I wound up writing a C program. (Now that I think about it, I guess I'd just have to set $/ to '', right?)

This has been asked on IRC as well; interesting that so many people find the need to insert line feeds into a file. Perhaps it's better to include them?

-- 
/emj
___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Compression types in PBF Format
On Fri, Dec 3, 2010 at 4:57 AM, Erik Johansson e...@kth.se wrote:
> On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
>> What's the perl one-liner? I wound up writing a C program. (Now that I think about it, I guess I'd just have to set $/ to '', right?)
> This has been asked on IRC as well; interesting that so many people find the need to insert line feeds into a file. Perhaps it's better to include them?

Definitely. What's the point of using an incredibly bloated format which is difficult for machines to parse, and then omitting line feeds, giving you a space saving of what, a tenth of a percent of the compressed file?

AFAIK it's only the full history files which exhibit this nasty trait.
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 4:42 PM, Anthony o...@inbox.org wrote:
> What's the perl one-liner?

perl -0076 -pe '$\="\n"' filename

seems to work.
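For readers without Perl at hand: the one-liner above reads the input in records that end at each '>' (octal 076) and prints a line feed after every record. A rough Python sketch of the same idea follows. The function name `add_line_feeds` is made up for illustration, and the sketch assumes the whole file fits in memory, which a full history dump will not.

```python
def add_line_feeds(xml_bytes: bytes) -> bytes:
    """Insert a line feed after every '>' so that each tag ends a line."""
    return xml_bytes.replace(b">", b">\n")


if __name__ == "__main__":
    # Tiny hand-written sample, not real OSM data.
    sample = b'<osm><node id="1"/><node id="2"/></osm>'
    print(add_line_feeds(sample).decode(), end="")
```

Like the Perl version, this also adds a line feed after a '>' that is already followed by one, so it is best run only on files that have no line feeds at all.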
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 11:03 PM, Anthony o...@inbox.org wrote:
> On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:
>> If any of gzip/bzip2/lzma in general gives a better compression ratio (20% smaller), then this compression scheme should become the default format.
> Depends on the performance. If all you want is max compression without regard to performance, you're almost surely better off using raw and then compressing the entire file with LZMA (e.g. 7zip or xz).

LZMA vs. zlib actually makes less of a difference than I thought it would:

-rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
-rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
-rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
-rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
-rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
-rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz

I suspect it would make *much more difference* when it comes to the full history .osm, though. Does PBF support full history files? Does Osmosis?
Re: [OSM-dev] Compression types in PBF Format
On Wed, 1 Dec 2010, Anthony wrote:
> On Tue, Nov 30, 2010 at 11:03 PM, Anthony o...@inbox.org wrote:
>> On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:
>>> If any of gzip/bzip2/lzma in general gives a better compression ratio (20% smaller), then this compression scheme should become the default format.
>> Depends on the performance. If all you want is max compression without regard to performance, you're almost surely better off using raw and then compressing the entire file with LZMA (e.g. 7zip or xz).
> LZMA vs. zlib actually makes less of a difference than I thought it would:
>
> -rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
> -rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
> -rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
> -rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
> -rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
> -rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz
>
> I suspect it would make *much more difference* when it comes to the full history .osm, though. Does PBF support full history files? Does Osmosis?

Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 9:28 AM, Stefan de Konink ste...@konink.de wrote:
> On Wed, 1 Dec 2010, Anthony wrote:
>> LZMA vs. zlib actually makes less of a difference than I thought it would:
>>
>> -rw-r--r-- 1 a a 103M 2010-12-01 08:07 florida.osm.bz2
>> -rw-r--r-- 1 a a 129M 2010-12-01 08:32 florida.osm.gz
>> -rw-r--r-- 1 a a  74M 2010-12-01 08:19 florida.osm.pbf
>> -rw-r--r-- 1 a a 169M 2010-12-01 08:15 florida.osm.rawpbf
>> -rw-r--r-- 1 a a  62M 2010-12-01 08:15 florida.osm.rawpbf.xz
>> -rw-r--r-- 1 a a  86M 2010-11-25 11:29 florida.osm.xz
>>
>> I suspect it would make *much more difference* when it comes to the full history .osm, though. Does PBF support full history files? Does Osmosis?
> Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?

xz uses lzma. I made an uncompressed pbf file (florida.osm.rawpbf) and then compressed it with xz (florida.osm.rawpbf.xz). This isn't the same as making a pbf file which uses lzma, but it should be a good approximation of the compression achievable by embedding lzma in the pbf.
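To put numbers on the listing quoted in this thread, the relative savings can be worked out directly from the file sizes. The sketch below is only arithmetic on the sizes as printed by ls, which are rounded to whole megabytes, so the percentages are approximate. The helper name `saving` is made up for illustration.

```python
# Sizes in MB, copied from the ls listing quoted in this thread.
sizes_mb = {
    "florida.osm.bz2": 103,
    "florida.osm.gz": 129,
    "florida.osm.pbf": 74,
    "florida.osm.rawpbf": 169,
    "florida.osm.rawpbf.xz": 62,
    "florida.osm.xz": 86,
}


def saving(smaller: str, larger: str) -> float:
    """Fraction by which the file `smaller` undercuts the file `larger`."""
    return 1 - sizes_mb[smaller] / sizes_mb[larger]


if __name__ == "__main__":
    pairs = [
        ("florida.osm.rawpbf.xz", "florida.osm.pbf"),  # whole-file xz vs. per-block zlib
        ("florida.osm.pbf", "florida.osm.rawpbf"),     # what zlib gains inside the pbf
        ("florida.osm.xz", "florida.osm.gz"),          # xz vs. gzip on plain XML
    ]
    for smaller, larger in pairs:
        print(f"{smaller} is {saving(smaller, larger):.1%} smaller than {larger}")
```

The first pair is the figure the thread argues about: the xz-compressed raw pbf comes out roughly 16% smaller than the zlib pbf.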
Re: [OSM-dev] Compression types in PBF Format
On Wed, 1 Dec 2010, Anthony wrote:
>> Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?
> xz uses lzma. I made an uncompressed pbf file (florida.osm.rawpbf) and then compressed it with xz (florida.osm.rawpbf.xz). This isn't the same as making a pbf file which uses lzma, but it should be a good approximation of the compression achievable by embedding lzma in the pbf.

Yeah, but your lead basically shows we are talking about more than 10%...
Re: [OSM-dev] Compression types in PBF Format
On 30.11.2010 at 23:44, Anthony wrote:
> On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
>> On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:
>>> And if we can change the API every week, I wonder why we are still at XML then.
>> because XML is a nearly human-readable, easy to explain and inspect format.
> Except when you don't include any line feeds :).

Which can be solved with a perl one-liner (well, everything can be solved with a perl one-liner ^^)

Peter
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 10:24 AM, Stefan de Konink ste...@konink.de wrote:
> On Wed, 1 Dec 2010, Anthony wrote:
>>> Did you benchmark what pbf + lzma did or did you embed lzma in osmosis?
>> xz uses lzma. I made an uncompressed pbf file (florida.osm.rawpbf) and then compressed it with xz (florida.osm.rawpbf.xz). This isn't the same as making a pbf file which uses lzma, but it should be a good approximation of the compression achievable by embedding lzma in the pbf.
> Yeah, but your lead basically shows we are talking about more than 10%...

Yeah, probably, but at the expense of more complicated code, greater memory usage, etc.

I'm interested now in seeing how the full history compression goes, though. If it can achieve 70, 80, 90% on top of zlib, then it might be worth embedding the compression as opposed to just using it for transfer over the Internet.
Re: [OSM-dev] Compression types in PBF Format
On Wed, 1 Dec 2010, Anthony wrote:
>> Yeah, but your lead basically shows we are talking about more than 10%...
> Yeah, probably, but at the expense of more complicated code, greater memory usage, etc.

The whole process is IO-bound... memory is used anyway to overcome the IO issues...

> I'm interested now in seeing how the full history compression goes, though. If it can achieve 70, 80, 90% on top of zlib, then it might be worth embedding the compression as opposed to just using it for transfer over the Internet.

The dictionary is compressed per block, so it greatly depends on whether the trick works.

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 10:47 AM, Stefan de Konink ste...@konink.de wrote:
> On Wed, 1 Dec 2010, Anthony wrote:
>>> Yeah, but your lead basically shows we are talking about more than 10%...
>> Yeah, probably, but at the expense of more complicated code, greater memory usage, etc.
> The whole process is IO-bound... memory is used anyway to overcome the IO issues...

Not in an embedded system, which is where a small difference like 10% is going to matter.

>> I'm interested now in seeing how the full history compression goes, though. If it can achieve 70, 80, 90% on top of zlib, then it might be worth embedding the compression as opposed to just using it for transfer over the Internet.
> The dictionary is compressed per block, so it greatly depends on whether the trick works.

32 megs is a lot better than 900K, though. 900K is how much zlib uses, right?
Re: [OSM-dev] Compression types in PBF Format
On Wed, 1 Dec 2010, Anthony wrote:
> Not in an embedded system, which is where a small difference like 10% is going to matter.

Please elaborate? Either the memory is used for a block cache or for the program.

>>> I'm interested now in seeing how the full history compression goes, though. If it can achieve 70, 80, 90% on top of zlib, then it might be worth embedding the compression as opposed to just using it for transfer over the Internet.
>> The dictionary is compressed per block, so it greatly depends on whether the trick works.
> 32 megs is a lot better than 900K, though. 900K is how much zlib uses, right?

I don't get your point here, what do you mean? Do you mean that the memory requirements for zlib are lower? Because don't forget that the extracted piece is kept in memory, plus the deserialised version, which is basically much bigger, right?

Stefan
Re: [OSM-dev] Compression types in PBF Format
On 01-12-10 17:30, Anthony wrote:
> Anyway, I'm probably completely wrong about this. Sorry.

I guess the fastest way to verify all this is for someone to add the LZMA and BZ2 libraries to Java and check in osmosis. Your numbers give me the impression that it is worth pursuing different compression strategies.

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 9:50 AM, Anthony o...@inbox.org wrote:
> On Wed, Dec 1, 2010 at 10:47 AM, Stefan de Konink ste...@konink.de wrote:
>> On Wed, 1 Dec 2010, Anthony wrote:
>>>> Yeah, but your lead basically shows we are talking about more than 10%...
>>> Yeah, probably, but at the expense of more complicated code, greater memory usage, etc.
>> The whole process is IO-bound... memory is used anyway to overcome the IO issues...
> Not in an embedded system, which is where a small difference like 10% is going to matter.

CPUs are going to get faster, for free. Developer time, especially OSM developer time, is severely limited. The community is better served by having them do new stuff than by coding an overcomplicated format.

>>> I'm interested now in seeing how the full history compression goes, though. If it can achieve 70, 80, 90% on top of zlib, then it might be worth embedding the compression as opposed to just using it for transfer over the Internet.
>> The dictionary is compressed per block, so it greatly depends on whether the trick works.
> 32 megs is a lot better than 900K, though. 900K is how much zlib uses, right?

Each fileblock is independently decodable, which means that I have to reset the dictionary for each fileblock. There are around 100k fileblocks in the planet, and 13 GB uncompressed, which means that the average fileblock has 130 KB of data. gzip has a 32 KB or 64 KB (?) window, smaller than the number of bytes in the fileblock. bzip2 has a window that is 900 KB, and LZMA's is megabytes, but lzma's multi-megabyte window doesn't matter, because the compressor is restarted for each fileblock, every few hundred kilobytes.

The 15% gain you measured between .rawpbf.xz and .pbf really lets lzma cheat too much, because it can exploit a window tens of times larger than it would if integrated.

Could you run your test on a whole planet, or a hack-integration of LZMA into osmosis?

Scott
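The point about restarting the compressor for each fileblock can be tried out without touching osmosis. The sketch below compresses the same buffer once as a single stream and once in independent blocks, using Python's standard zlib and lzma modules. Everything about the data is an assumption: `make_sample` generates a synthetic, repetitive stand-in rather than real PBF content, and the 130,000-byte block size simply mirrors the average quoted above (13 GB / 100k fileblocks, about 130 KB each). The absolute numbers mean nothing for a real planet file; only the direction of the gap is illustrative.

```python
import lzma
import zlib


def whole_size(data: bytes, compress) -> int:
    """Compressed size when the whole buffer is one stream."""
    return len(compress(data))


def per_block_size(data: bytes, compress, block_size: int) -> int:
    """Total compressed size when every block is compressed independently,
    i.e. the compressor state is reset at each block boundary."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(compress(data[start:start + block_size]))
    return total


def make_sample(n_records: int = 20000) -> bytes:
    """Synthetic, repetitive stand-in for map data; NOT real OSM or PBF content."""
    lines = [
        f'<node id="{i}" lat="{27 + (i % 1000) / 1e4:.4f}" lon="{-81 - (i % 777) / 1e4:.4f}"/>\n'
        for i in range(n_records)
    ]
    return "".join(lines).encode()


if __name__ == "__main__":
    data = make_sample()
    block_size = 130_000  # assumed, mirroring the ~130 KB average fileblock
    for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
        whole = whole_size(data, compress)
        blocked = per_block_size(data, compress, block_size)
        print(f"{name}: whole stream {whole} bytes, per-block {blocked} bytes")
```

Running the whole-stream and per-block variants side by side on a real extract would give a fairer estimate of what embedding LZMA per fileblock could achieve than compressing the entire raw pbf with xz.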
Re: [OSM-dev] Compression types in PBF Format
On Wed, Dec 1, 2010 at 12:40 PM, Scott Crosby scro...@cs.rice.edu wrote:
> The 15% gain you measured between .rawpbf.xz and .pbf really lets lzma cheat too much, because it can exploit a window tens of times larger than it would if integrated.

I'm not sure how much that mattered. xz -3, which I believe uses a 1 megabyte window, still compresses to 63M.

> Could you run your test on a whole planet,

Not today. I'm due to receive my new hard drives today, which I bought last Friday, so my filesystem is in complete disarray. I'm not even sure which drive/partition I have my whole planet file on at the moment. I'm sure that whatever partition it is, it isn't currently mounted.

> or a hack-integration of LZMA into osmosis?

That I can definitively say I'm not going to do. I'd sooner reverse engineer the pbf format in C than mess around with the code of osmosis. It took me long enough just to figure out how to install the right jre to get osmosis to run.
Re: [OSM-dev] Compression types in PBF Format
On Tue, 30 Nov 2010, Jochen Topf wrote:
> The PBF format supports three compression types: zlib, lzma, and bzip2. Do we have to support all of them? What is the currently existing software using? IMHO it would make more sense to just define one and stick with it. Easier to implement for everybody, less reliance on external libs.

Why this mentality? It is trivial to implement a decompression algorithm and some work better than others. Sounds like complaining about stuff you don't have to care about.

Stefan
Re: [OSM-dev] Compression types in PBF Format
Hello!

> Why this mentality? It is trivial to implement a decompression algorithm and some work better than others. Sounds like complaining about stuff you don't have to care about.

I would not implement decompression myself, I have better things to do. Thus I would use a library for this. A library, however, is a dependency that must be built, installed and delivered (not all the world runs Linux with smart packaging systems), and licences have to be checked. It makes sense for the developer to try to reduce such dependencies by agreeing on one standard compression format.

-- 
Regards...
Tim
Re: [OSM-dev] Compression types in PBF Format
On Tue, 30 Nov 2010, Tim Teulings wrote:
> Hello!
>> Why this mentality? It is trivial to implement a decompression algorithm and some work better than others. Sounds like complaining about stuff you don't have to care about.
> I would not implement decompression myself, I have better things to do. Thus I would use a library for this. A library, however, is a dependency that must be built, installed and delivered (not all the world runs Linux with smart packaging systems), and licences have to be checked. It makes sense for the developer to try to reduce such dependencies by agreeing on one standard compression format.

Since virtually all program interfaces equal that of gzip, it gives perfect extensibility. The choice to support the 3 most used compression schemes is perfectly sound. Stop whining about code that either you do not write, or didn't care about before. There was this great moment when the bitstream was defined, and absolutely nobody cared until people started to write pbf code.

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 2:21 AM, Jochen Topf joc...@remote.org wrote:
> The PBF format supports three compression types: zlib, lzma, and bzip2. Do we have to support all of them? What is the currently existing software using?

Good question. I think that the bzip2 compression option is useless. Too slow, especially on the decompression side.

I'm not sure what to do about LZMA. It offers higher compression ratios at little loss in decompression speed. The catch is that while everything supports deflate, LZMA decompressor support is a lot less widespread. It might be a valuable future option, but it is also untested.

To my knowledge, Osmosis has the only implementation of a PBF writer, and it only uses uncompressed and zlib? Has anyone else implemented a writer?

If nobody else has their own writer, then would anyone object to me unilaterally removing (not deprecating) bzip2 entirely, and disabling/marking lzma as a proposed future extension? If tests in the future show that LZMA offers significant size decreases, it can be enabled and support can be added.

> IMHO it would make more sense to just define one and stick with it. Easier to implement for everybody, less reliance on external libs.

Agreed. Insofar as a reader has to support every permutation of the format, reducing those permutations is important. I think LZMA has something to offer, but it needs to be tested first.

Scott
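To make the "every reader has to support every permutation" cost concrete, here is a minimal sketch of what the decompression step of a reader might look like with all options enabled, using only Python's standard library. The kind labels ("raw", "zlib", "lzma", "bzip2") and the function name `decode_blob` are made up for illustration and are not the field names of the actual .proto. How LZMA data would be framed inside a blob is not settled in this thread; the sketch simply assumes something `lzma.decompress` can auto-detect.

```python
import bz2
import lzma
import zlib

# One entry per compression option a reader is expected to handle.
# Dropping an option from the format removes a line (and a dependency) here.
DECOMPRESSORS = {
    "raw": lambda payload: payload,  # uncompressed blob, passed through
    "zlib": zlib.decompress,
    "lzma": lzma.decompress,         # assumes xz or legacy .lzma framing
    "bzip2": bz2.decompress,
}


def decode_blob(kind: str, payload: bytes) -> bytes:
    """Return the uncompressed bytes of a blob, or fail loudly for an
    unsupported compression type so the user can be informed."""
    try:
        decompress = DECOMPRESSORS[kind]
    except KeyError:
        raise ValueError(f"unsupported compression type: {kind!r}") from None
    return decompress(payload)


if __name__ == "__main__":
    original = b"example block payload " * 10
    print(decode_blob("zlib", zlib.compress(original)) == original)
```

In Python all three codecs ship with the interpreter, so the table is cheap. In C, C++ or Java each extra entry is a library to build, ship and licence-check, which is the dependency cost raised elsewhere in this thread.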
Re: [OSM-dev] Compression types in PBF Format
Stefan,

Stefan de Konink wrote:
> This is the place for the 'too little, too late'. We are beyond the point of 'what' the bitstream should look like: you ought to handle what is defined now.

This is not how we work in OSM. We don't have standards. We can change stuff at any time, and indeed I would not hesitate for a second to change something in the PBF format if it turns out to repair a design problem or bring great benefit. (If it were my call, which it isn't.)

I really don't like your attitude. It's great that you took the time to write pbf2osm but it seems you expect to be revered for it. You give the impression of someone for whom coding something is only a means to climb onto a platform from where he can heap spite onto others. (I remember your derogatory comments about C++ while you wrote pbf2osm, and putting comments like "osmosis devs failed to read the specs" in one's code is not exactly a sign of maturity either.)

> Then you probably also noticed that it is still a (huge) open question to write a regression testsuite for all parsers and generators. And since the general opinion is now that nobody wants to move until there is a second implementation of osm2pbf (instead of actually switching), everyone is waiting. This greatly annoys me, and probably not only me but also the guy that actually took great effort to define the protocol and review code of others and answer questions.

What exactly is your problem? PBF is alive and kicking. I'm using both Osmosis PBF support and your implementation of pbf2osm on a daily basis, and many downstream users of Geofabrik do the same.

> I find it totally disrespectful that *you* are now doubting his qualities but didn't step forward when feedback was asked.

Excuse me, but discussing potential problems of a design is not a show of lack of respect - unless presented in a form like the aforementioned "osmosis devs failed to read the specs".

Bye
Frederik

-- 
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09" E008°23'33"
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 10:53:16AM -0600, Scott Crosby wrote:
> On Tue, Nov 30, 2010 at 2:21 AM, Jochen Topf joc...@remote.org wrote:
>> The PBF format supports three compression types: zlib, lzma, and bzip2. Do we have to support all of them? What is the currently existing software using?
> Good question. I think that the bzip2 compression option is useless. Too slow, especially on the decompression side.
>
> I'm not sure what to do about LZMA. It offers higher compression ratios at little loss in decompression speed. The catch is that while everything supports deflate, LZMA decompressor support is a lot less widespread. It might be a valuable future option, but it is also untested.
>
> To my knowledge, Osmosis has the only implementation of a PBF writer, and it only uses uncompressed and zlib? Has anyone else implemented a writer?
>
> If nobody else has their own writer, then would anyone object to me unilaterally removing (not deprecating) bzip2 entirely, and disabling/marking lzma as a proposed future extension? If tests in the future show that LZMA offers significant size decreases, it can be enabled and support can be added.

+1

Jochen

-- 
Jochen Topf joc...@remote.org http://www.remote.org/jochen/ +49-721-388298
Re: [OSM-dev] Compression types in PBF Format
Hi,

On 30-11-10 20:49, Frederik Ramm wrote:
> Stefan de Konink wrote:
>> This is the place for the 'too little, too late'. We are beyond the point of 'what' the bitstream should look like: you ought to handle what is defined now.
> This is not how we work in OSM. We don't have standards.

For some reason we do; this is not a free-form fight. And if we can change the API every week, I wonder why we are still at XML then.

> We can change stuff at any time, and indeed I would not hesitate for a second to change something in the PBF format if it turns out to repair a design problem or bring great benefit. (If it were my call, which it isn't.)

The only reason your friend/colleague Jochen started to ask about it is because he found it difficult to implement 4 ways to encode/decode the data, which are in principle the same. So what if your tool doesn't support a specific extension? If that compression is often used, who are you fooling? Are you suddenly caring about linking -lbz2?

> I really don't like your attitude. It's great that you took the time to write pbf2osm but it seems you expect to be revered for it. You give the impression of someone for whom coding something is only a means to climb onto a platform from where he can heap spite onto others. (I remember your derogatory comments about C++ while you wrote pbf2osm, and putting comments like "osmosis devs failed to read the specs" in one's code is not exactly a sign of maturity either.)

What's your point? I also wrote the entire API 0.5 (R/W) and XAPI in a C extension to a webserver. Ab-so-lu-te-ly nobody cares what I (or probably anyone else) writes here. It was interesting that after 2 weeks of publication Lennard came up with some detail that everyone who would have checked the output could have come up with after the first day the code was published here.

My point is pretty clear: you want to treat PBF as something that is in flux, while I observe that feedback was requested and (virtually) nobody cared. Protocol buffers is something that can be extended. If someone actually CARED about removing certain compression techniques he would benchmark the compression algorithms on the data presented and not start with a: "I do care that it seems I am writing code that might never be used."

...so all code of Jochen should be used now? Get real. So, exactly what Scott suggests: why does nobody step in then, write code that nobody uses afterwards and present a proper benchmark to show that bzip/gzip/lzma is useless?

>> Then you probably also noticed that it is still a (huge) open question to write a regression testsuite for all parsers and generators. And since the general opinion is now that nobody wants to move until there is a second implementation of osm2pbf (instead of actually switching), everyone is waiting. This greatly annoys me, and probably not only me but also the guy that actually took great effort to define the protocol and review code of others and answer questions.
> What exactly is your problem? PBF is alive and kicking. I'm using both Osmosis PBF support and your implementation of pbf2osm on a daily basis, and many downstream users of Geofabrik do the same.

This is my problem: http://planet.openstreetmap.org/ And the fact that protocol buffers probably would make the API far more efficient.

>> I find it totally disrespectful that *you* are now doubting his qualities but didn't step forward when feedback was asked.
> Excuse me, but discussing potential problems of a design is not a show of lack of respect - unless presented in a form like the aforementioned "osmosis devs failed to read the specs".

Oh dear, so because I actually gave Scott feedback and asked questions, and verified my code and implemented the specs, I cannot complain that osmosis didn't? Sounds like we cannot bash IE6 anymore because it made an effort to implement HTML rendering...

Why does this subject get me so angry? Because the request shows laziness, and not an effort to show that something is useless because the compression algorithms are not suited.

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:
> Hi,
> On 30-11-10 20:49, Frederik Ramm wrote:
>> Stefan de Konink wrote:
>>> This is the place for the 'too little, too late'. We are beyond the point of 'what' the bitstream should look like: you ought to handle what is defined now.
>> This is not how we work in OSM. We don't have standards.
> For some reason we do; this is not a free-form fight. And if we can change the API every week, I wonder why we are still at XML then.

because XML is a nearly human-readable, easy to explain and inspect format. the same cannot be said of the PBF format, but then the declared design goals of it were reduction in parsing time and file size, not readability - and it achieves those goals superbly. however, i think using it in the API wouldn't provide enough of a speedup, limited by Amdahl's law, to offset the loss of those other benefits.

cheers,

matt
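The Amdahl's law argument can be made concrete with a small calculation. If a fraction p of the time spent serving an API request goes into the part that a faster format speeds up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below evaluates that formula; the 20% share and the 5x factor are invented numbers chosen purely for illustration, not measurements of the OSM API.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Hypothetical: serialisation/parsing is 20% of request time, PBF makes it 5x faster.
    print(f"overall speedup: {amdahl_speedup(0.2, 5):.2f}x")
    # Upper bound if that 20% became free altogether.
    print(f"upper bound:     {amdahl_speedup(0.2, float('inf')):.2f}x")
```

With those made-up inputs the request gets about 1.19x faster, and never more than 1.25x however fast the format becomes. Whether the real share of time is small enough for this to hold is exactly what would need measuring.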
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 5:19 PM, Matt Amos zerebub...@gmail.com wrote:
> On Tue, Nov 30, 2010 at 8:41 PM, Stefan de Konink ste...@konink.de wrote:
>> And if we can change the API every week, I wonder why we are still at XML then.
> because XML is a nearly human-readable, easy to explain and inspect format.

Except when you don't include any line feeds :).
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 2:41 PM, Stefan de Konink ste...@konink.de wrote:
> ...so all code of Jochen should be used now? Get real. So, exactly what Scott suggests: why does nobody step in then, write code that nobody uses afterwards and present a proper benchmark to show that bzip/gzip/lzma is useless?

The real question is whether supporting bzip2/lzma offers advantages that are commensurate with the added implementation complexity, not just in pbf2osm but in every other reader too.

Would you be willing to run an experiment with LZMA? If it shaves a gigabyte off of the planet, then I'd say it's worth further consideration; if it shaves 100MB, then it's not. Make a case for why it should be included.

>> Excuse me, but discussing potential problems of a design is not a show of lack of respect - unless presented in a form like the aforementioned "osmosis devs failed to read the specs".
> Oh dear, so because I actually gave Scott feedback and asked questions, and verified my code and implemented the specs, I cannot complain that osmosis didn't?

You do realize that *I* designed the format AND wrote the spec AND wrote the osmosis reference implementation?

That means that if there are any errors or omissions in that implementation or spec, they are my mistakes. If there is an ambiguity, then I have made the call as to what is right. If there are any differences between the spec, reference implementation, and the conceptual design, I'm the one resolving the conflict and determining the best way to fix the issue.

I do appreciate you finding the bugs and ambiguities in the spec by being the first independent implementation, and I hope you will consider running the LZMA experiment, but you have been rude and insulting.

Scott
Re: [OSM-dev] Compression types in PBF Format
Hi Scott,

On 01-12-10 00:41, Scott Crosby wrote:
> The real question is whether supporting bzip2/lzma offers advantages that are commensurate with the added implementation complexity, not just in pbf2osm but in every other reader too.

If any of gzip/bzip2/lzma in general gives a better compression ratio (20% smaller), then this compression scheme should become the default format. Since (sadly) PBF goes into an 'archival' format as opposed to a wire format.

> Would you be willing to run an experiment with LZMA? If it shaves a gigabyte off of the planet, then I'd say it's worth further consideration; if it shaves 100MB, then it's not. Make a case for why it should be included.

I completely agree. But experimenting with LZMA means first an osm2pbf that supports LZMA. And currently I feel that the only 'true' tool that should do something like this should be named pgsql2pbf. I honestly cannot find a single reason why it would be good to use the XML as intermediate format, except for legacy support.

>>> Excuse me, but discussing potential problems of a design is not a show of lack of respect - unless presented in a form like the aforementioned "osmosis devs failed to read the specs".
>> Oh dear, so because I actually gave Scott feedback and asked questions, and verified my code and implemented the specs, I cannot complain that osmosis didn't?
> You do realize that *I* designed the format AND wrote the spec AND wrote the osmosis reference implementation?

No, I didn't. But my archive also states that James Michael DuPont also published his OSM-Osmosis version.

indepth
And for the reader who was only presented my flame of the osmosis implementation: out of the blue the OSMOSIS implementation started to introduce -1 userids. This is documented nowhere, nor is it a default at present to represent past anonymous edits with a negative userid. Especially since at that time the uids couldn't be negative (by spec) and the format specifies 'has_uid'.
/indepth

> That means that if there are any errors or omissions in that implementation or spec, they are my mistakes. If there is an ambiguity, then I have made the call as to what is right. If there are any differences between the spec, reference implementation, and the conceptual design, I'm the one resolving the conflict and determining the best way to fix the issue.

The current osmformat.proto still has an int32 for a uid, which is in fact always a positive number in the openstreetmap database, and the problem has been reported before. The obvious choice would have been not to define it at all in message Info and to use 0 in DenseInfo.

> I do appreciate you finding the bugs and ambiguities in the spec by being the first independent implementation, and I hope you will consider running the LZMA experiment, but you have been rude and insulting.

Basically you are asking me to run tests that Jochen should have come up with to prove that your specification of multiple compression formats sucked. I find this insulting. I think your choice is sound, and if a tool doesn't implement compression scheme X, then just inform the user.

And if you found my comment in the code rude and/or insulting, I would have expected an email from you in private about two months ago, because honestly something by far more rude was written there. But again, nobody seems to care what happens here or what is written. It is not strange that a flamewar over a format starts seven months after initial publication, or that pointing fingers at code starts about two months after the publication of it. I do find it interesting that someone actually bothered to read the code; sadly I cannot speak of any broad collaboration.

Stefan
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 7:29 PM, Stefan de Konink ste...@konink.de wrote:

Hi Scott, On 01-12-10 00:41, Scott Crosby wrote:

The real question is does supporting bzip2/lzma offer advantages that are commensurate with the added implementation complexity, not just in pbf2osm but in every other reader too.

If any of gzip/bzip2/lzma in general gives a better compression ratio (20% smaller), then this compression scheme should become the default format, since (sadly) PBF is turning into an 'archival' format as opposed to a wire format.

I don't see anything in principle that keeps pbf from being a wire format.

Would you be willing to run an experiment with LZMA? If it shaves a gigabyte off of the planet, then I'd say it's worth further consideration; if it shaves 100MB, then it's not. Make a case for why it should be included.

I completely agree. But experimenting with LZMA means first an osm2pbf that supports LZMA.

Or, hacking it into osmosis, which has the rest of the code already written.

And currently I feel that the only 'true' tool that should do something like this should be named pgsql2pbf.

I expect it will be written eventually. The planet has doubled in size in the last year.

I honestly cannot find a single reason why it would be good to use the XML as intermediate format, except for legacy support.

It is human readable and much more corruption resistant. (Well, XML is corruption resistant, bzipped XML is too. gzipped XML isn't, unless made with the option --rsyncable.) It makes a much more secure archival format.

<indepth>
And for the reader who was only presented with my flame at the osmosis implementation: out of the blue the OSMOSIS implementation started to introduce -1 userids. This is documented nowhere, nor is it currently a default to represent past anonymous edits with a negative userid.

Actually, osmosis used OsmUser.NONE to represent those anonymous edits. The problem is how to represent those within the limitations of the PBF format (see below).

Especially since at that time the uids couldn't be negative (by spec) and the format specifies 'has_uid'.

DenseInfo doesn't have a has_uid method to check, as it delta-encodes uids.
</indepth>

Since the current osmformat.proto still has an int32 for a uid, which is in fact always a positive number in the openstreetmap database,

To be totally pedantic, the domain of UIDs is either {set of all nonnegative integers} + 'NULL' or {set of all positive integers} + 'NULL'. This is an edge case, which you properly identified and for which we came up with a resolution. There's no 'right' fix, unfortunately, within the limitations of the PBF format. The problem is that the PBF format cannot express NULL, meaning no such user. Unless all metadata is stripped, it must encode a UID *number*. Not knowing which of the domains applied, or if UIDs can be negative in legitimate circumstances, I took the easy way out. I didn't need to care what the domain of UIDs was; I just used whatever integer osmosis returned when calling OsmUser.getUID(), which happens to be -1 for OsmUser.NONE. My mistake was assuming that this mapping was universal in the rest of the OSM stack.

the problem has been reported before. The obvious choice would have been not to define it at all in message Info and to use 0 in DenseInfo.

I do appreciate you finding the bugs and ambiguities in the spec by being the first independent implementation, and I hope you will consider running the LZMA experiment, but you have been rude and insulting.

Basically you are asking me to run tests that Jochen should have come up with to prove that your specification of multiple compression formats sucked.

I viewed it differently. He wanted to know if the specification needed to be that complicated, to which I have to admit that I did not know. That is a legitimate question. The essence of a good design is simplicity. Each feature should have a reason for being there, a reason strong enough to warrant being included. Does LZMA meet that burden of proof?

Scott
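For readers following the UID discussion in the message above, here is a minimal Python sketch of why a delta-coded column needs some in-band number for "no such user". It is only an illustration, not the osmosis or pbf2osm code: the function names are hypothetical, and the default sentinel of -1 is taken from the thread (the value osmosis returned for OsmUser.NONE); passing 0 instead gives the alternative suggested for DenseInfo.

```python
from typing import List, Optional

# Assumption for illustration: -1 is the value mentioned in the thread for
# an anonymous user (OsmUser.NONE); it is not mandated by any spec text here.
ANONYMOUS = -1


def delta_encode(uids: List[Optional[int]], sentinel: int = ANONYMOUS) -> List[int]:
    """Delta-code a uid column; None (no user) is replaced by the sentinel first."""
    deltas = []
    prev = 0
    for uid in uids:
        value = sentinel if uid is None else uid
        deltas.append(value - prev)  # store the difference to the previous value
        prev = value
    return deltas


def delta_decode(deltas: List[int], sentinel: int = ANONYMOUS) -> List[Optional[int]]:
    """Invert delta_encode; the sentinel is mapped back to None."""
    uids: List[Optional[int]] = []
    current = 0
    for delta in deltas:
        current += delta  # running sum restores the original value
        uids.append(None if current == sentinel else current)
    return uids


uids = [42, 42, None, 7, None]
encoded = delta_encode(uids)
print(encoded)                # [42, 0, -43, 8, -8]
print(delta_decode(encoded))  # [42, 42, None, 7, None]
```

Because every entry in the delta-coded array is a number, the reader and the writer have to agree on the sentinel out of band, which is exactly the mismatch described above.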
Re: [OSM-dev] Compression types in PBF Format
On Tue, Nov 30, 2010 at 8:29 PM, Stefan de Konink ste...@konink.de wrote:

If any of gzip/bzip2/lzma in general gives a better compression ratio (20% smaller), then this compression scheme should become the default format.

Depends on the performance. If all you want is max compression without regard to performance, you're almost surely better off using raw and then compressing the entire file with LZMA (e.g. 7zip or xz).
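The trade-off in the message above (compressing each block separately versus compressing the whole file in one stream) can be tried out with a rough Python sketch using only the standard `zlib` and `lzma` modules. The helper names, the block size, and the synthetic sample data are assumptions made up for illustration; real planet data will behave differently, so no sizes are claimed here.

```python
import lzma
import zlib


def per_block_zlib_size(data: bytes, block_size: int) -> int:
    """Total size when each fixed-size block is deflated independently."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )


def whole_file_lzma_size(data: bytes) -> int:
    """Size when the entire uncompressed stream is compressed once with LZMA."""
    return len(lzma.compress(data))


# Synthetic, repetitive sample standing in for uncompressed ("raw") blocks.
# This is NOT OSM data; it only makes the sketch runnable.
sample = b"".join(
    f"node {i} lat={i % 90} lon={i % 180}\n".encode() for i in range(20000)
)

print("uncompressed bytes:    ", len(sample))
print("per-block zlib bytes:  ", per_block_zlib_size(sample, 4096))
print("whole-file LZMA bytes: ", whole_file_lzma_size(sample))
```

Per-block compression keeps blocks independently readable, while a single whole-file stream gives the compressor more context to work with but gives up that independence; which matters more is the performance question raised above.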