[OSM-dev] Namefinder replacements (was: Re: implemented API for J2SE and J2ME)
On Wed, Apr 22, 2009 at 07:41:56PM +0200, Frederik Ramm wrote: The status is, basically, that everything is vapourware except the Geocoder in OpenRouteService which is closed-source due to ongoing academic activity. Well, vapourware is a bit hard, but at least my own implementation (see the archive) would indeed require some work (mainly the frontend code - the timing critical backend part is done) to be of actual use. One reason I didn't finish the project is that I never got any answer to exactly under which circumstances the current namefinder is slow - during my own tests, it was reasonably fast. Without knowing that I cannot test whether my own implementation is actually performing better under comparable conditions, so replacing the current one doesn't make much sense. The code is available under the terms of GPL v2 if anyone would like to finish or reuse it. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Export TMC-Event-Table from PDF to CSV
On Fri, Apr 03, 2009 at 10:16:29AM +0200, marcus.wolsc...@googlemail.com wrote: With 1590 lines it would be quite a task to convert that into something computer-readable using regexp only. 1590 is the highest line number in Event List_DE_30.xls (from BASt) as well, perhaps that document is sufficient? Does anyone have the TMC-events in a machine-readable format? What's the license of the TMC events specification? Is it allowed to convert and distribute it at all? CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Errors while svn up
On Sun, Mar 08, 2009 at 07:56:15PM +, Tom Hughes wrote: I think mod_evasive may be the problem, so I've disabled it for now. It has been on for a while now though so I'm not sure why it has started causing problems now. Maybe SVN content got large enough to exceed the hash table? From [1]: Detection is performed by creating an internal dynamic hash table of IP Addresses and URIs, and denying any single IP address from any of the following: * Requesting the same page more than a few times per second Maybe the server got faster or you reduced the number of Apache child processes so a single one would handle more requests from the same originator? * Making more than 50 concurrent requests on the same child per second [1] http://www.zdziarski.com/projects/mod_evasive/ CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] openstreetmap could start at user's approximate location using geo-ip
On Sun, Jan 18, 2009 at 03:34:14PM +, Tom Hughes wrote: It uses the cookie first, then your home location (assuming you're logged in), then GeoIP via hostip.info, and finally defaults to Europe. Take a look at www.hostip.info if you want to see where it thinks you are. hostip.info locates me at Stuttgart, Germany, but OSM shows Europe centered on UK. Cookies, cache and proxy disabled for testing. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [josm-dev] JOSM Fail
On Fri, Jan 16, 2009 at 01:08:35PM +0100, Pieren wrote: +1 for making this harder to activate (separate mode, ctrl+drag, whatever). It's pretty annoying when trying to select a region in a densely mapped area. -1 I'm moving ways. I cannot say every day but I move ways when I: - improve roundabouts (position, circle) - draw dual carriageways (copy, paste, reverse, move) - draw areas like buildings or parks How would e.g. having to press Ctrl in addition to dragging impact your workflow? I don't see why you're opposed to making it harder to _unintentionally_ using it. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ josm-dev mailing list josm-dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/josm-dev
Re: [josm-dev] JOSM Fail
On Fri, Jan 16, 2009 at 08:49:45PM +0100, Rolf Bode-Meyer wrote: This isn't about newcomers vs. experienced users. I _am_ experienced and during my normal work I _often_ move objects when I want to select something instead, just because JOSM binds two separate actions to the same input. Huh? What two separate actions? Region Select and Moving. Just depending on where exactly between two ways the pointer is, JOSM does different things. If it's too close to one of them, it will move it, instead of starting to select a region. If ways are too close together (remember: densely mapped area), it's impossible to do a region select. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ josm-dev mailing list josm-dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/josm-dev
Re: [josm-dev] JOSM Mappaint - major improvements
On Tue, Jan 13, 2009 at 12:58:38PM +0100, Rolf Bode-Meyer wrote: What I don't understand - with your change and before - is that paint speed seems to depend on the amount of data in the layer even if it's outside the view. That's to be expected. Even with a spatial index (AFAIK josm still doesn't use one yet), looking up the objects inside a give bounding box (the view) is dependant on the total number of objects. For example, a 2D-PR-Tree lookup is about O(sqrt(n)) [1], with n being the total number of objects stored in the tree. If some genius invents a spatial index with lookup in O(1), I'm sure we'll hear about it. :) [1] http://www.cse.ust.hk/~yike/prtree/pr.pdf CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ josm-dev mailing list josm-dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/josm-dev
Re: [OSM-dev] Anyone with a speedy gazetteer
On Mon, Jan 12, 2009 at 10:57:24AM +0100, Erik Johansson wrote: I have two questions; why is gazetteer.openstreetmap.org so slow at 20-50 seconds per request, I remember David Earl mentioning a scheduled database rebuild. Perhaps its currently running. Otherwise, I'd really like to know the reasons (so I can test my own implementation accordingly). and if anyone has code for faster variants of name finders? Speed is essential.. I have done an alternative, but suspended its development because a) the current Namefinder was sufficiently fast when I tried to compare them = don't know under which conditions it's slow, so I cannot test whether my implementation really is faster in these cases; and b) because I haven't had enough time to finish it. The core (written in C/C++) is almost finished (only some minor changes for name canonicalization needed). What's really missing is the front-end part, i.e. the web server and search plan generation. A Python module implementing the protocol between Core and Frontend is included, so someone else could start implementing it right away. :) To give some funny, unscientific numbers: General conditions: planet-080813.osm.gz, Athlon 64 BE-2300 dual-core 1.9Ghz, 4GB DDR2-800, Limit 100 ways + 100 nodes per step Data base size: ~8GB import time: 206 minutes startup time: 41 seconds RAM usage after startup (- names and hash tables): 501MB name search (Provenceweg, exact):0.001117s loc search (48.51 9.07 48.52 9.08):0.061777s exact name + loc: 0.001331s loc + exact name: 0.005612s name search (Provenceweg, regex):2.582227s loc + regex: 0.232248s exact name + loc (whole world):2.263841s loc (whole world): 0.069157s Name search (Provenceweg, substring):0.600431s Unfair comparison with current namefinder (my code: local, no frontend; gazetteer: remote; with caching, i.e. first result is discarded): old new Beethovenweg0.681s 0.008970s random string (= 0 results) 0.310s 0.000352s I've released my code under GPLv2 in my arch repository [1] as osmsearch--devel--0.1 (haven't had an idea for a better name yet, sorry). You'll need some data files [2-4] as well. Instructions for fetching the source (assuming the GNU arch client tla is installed): 1. tla register-archive http://sascha.silbe.org/arch/sascha-a...@silbe.org--2008 2. tla get sascha-a...@silbe.org--2008/osmsearch--devel--0.1 osmsearch--devel--0.1 This will put the source into a newly-created directory called osmsearch--devel--0.1. [1] http://sascha.silbe.org/arch/sascha-a...@silbe.org--2008 [2] http://sascha.silbe.org/tmp/osmsearch.conf [3] http://sascha.silbe.org/tmp/nodesDisplay.rules [4] http://sascha.silbe.org/tmp/waysDisplay.rules CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] mmap() (was: Re: Some statistics for file-format -development)
On Thu, Dec 04, 2008 at 03:20:02PM -0800, Dave Hansen wrote: I don't think that's possible. Both 32bit and 64bit userspace and emulation work with a 64bit kernel, but at least 64bit userspace on a 32bit kernel isn't possible. Believe it or not, people actually do this on powerpc. Interesting. Is it possible on AMD64 as well? How does the additional address space get managed? As this is getting off topic, I'd suggest to move to private mails. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
[OSM-dev] mmap() (was: Re: Some statistics for file-format -development)
On Tue, Nov 25, 2008 at 08:47:16AM +0100, Marcus Wolschon wrote: Okay, on a big AMD-multicore-laptop the limit in Java seems to be 1GB of memory-mapped files. That's pretty small given that kernel/user split is usually 2G/2G or even 3G/1G... Ulimit was unlimited and --XX:MaxDirectMemorySize was set to 32GB -Xmx was a few GB. I am using the current Java6 -patchlevel 10 from Sun. Have you tried using a smaller -Xmx size? I have the impression that Java allocates the maximum allowed size directly on startup, relying on the OS to do lazy allocation of pages. If that's the case, reducing -Xmx should increase free virtual address space, allowing bigger memory mappings. I'd like to try this on a 64bit-Linux on that box but don't know how to do that yet. Any help on turning a debian 32bit into a 64-bit or running a 64bit-VM in a 32bit-Linux? I don't think that's possible. Both 32bit and 64bit userspace and emulation work with a 64bit kernel, but at least 64bit userspace on a 32bit kernel isn't possible. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Some statistics for file-format -development
On Mon, Nov 24, 2008 at 10:23:31AM +0100, Marcus Wolschon wrote: It works fine on a 2GB desktop-PC but fails to aquire the mapping on the eeepc with 2GB. Both run Windows XP. So on Windows the maximum mmap() size depends on the RAM size? Ugly. Is it physical RAM size or virtual (i.e. increasing swap file allows larger mmap() sizes)? PS: Thanks for reporting back. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Some statistics for file-format -development
On Mon, Nov 24, 2008 at 12:59:18PM +0100, Marcus Wolschon wrote: Both have 2GB of Ram and both run Windows XP. But one is an Atom N270 -processor and one a desktop Core2. Ah, all right. Unfortunately, none of datasheets [1,2] at Intel mentions the size of the virtual address space. I'd have expected it to be the same for all IA32 systems then... :-/ [1] http://download.intel.com/design/processor/datashts/320032.pdf [2] http://download.intel.com/design/processor/datashts/319977.pdf CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Some statistics for file-format -development
On Fri, Nov 21, 2008 at 10:59:27AM +0100, Marcus Wolschon wrote: I ran into an outOfMemory-Exception memory-mapping the file when importing Baden-Württemberg. [...] (I did not think this would become an issue for file-sizes of about 100MB) Did you try it on Windows or a POSIX compatible OS? 32bit or 64bit architecture/OS/userspace? On 64bit Linux, I can easily mmap() even the whole planet database (~50GB total, with largest file being ~30GB) - on 32bit it's less, but still way more than a few hundred MBs. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] distribution of IDs
On Sun, Nov 16, 2008 at 04:16:19PM +0100, Marcus Wolschon wrote: Does this assumption hold true or are our ID-values more sparsely distributed in the world-file? For planet-080827, I got the following numbers (queried from the Python interface of my own binary db implementation ;) ) : count maxId ratio Nodes 261675640 291467921 0.90 Ways 2121076226580947 0.80 Relations 22149 29694 0.75 Some malformed ways were skipped during import, but I don't expect that to alter above values significantly. Over time, I'd expect the ratios to drop even further since deleted object IDs cannot be recycled (history is still available). CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] The wiki defines the database (was: relations)
On Wed, Nov 05, 2008 at 11:57:07AM +, Andy Allan wrote: [...], which triggered a virulent campaign by the wiki-types to repeatedly delete the information that I had put up, [...] Who exactly are the wiki-types you mention? IMO there shouldn't be the wiki-types and the mappers, those should be one and the same. Without defining what tags (=syntax) mean (=semantic), it's hard to use them properly. From reading the discussions regularly popping up on the mailing lists, I'm getting the impression there's a minority on the wiki disturbing the work of others. That's vandalism to me, nothing more and nothing less. So what about trying to get this minority to stop impeding our work, instead of splitting ourselves into the wiki-types (those defining the semantics) and the mappers (those using the syntax to enter data into the database)? Of course there are other ways of communicating the semantics of the tags you use (e.g. mailing lists), but the wiki is currently the best we have in terms of successful information retrieval. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)
On Sun, Oct 19, 2008 at 01:42:39PM +0200, Freek wrote: Much simpler, as more complex index structures also use more memory (thus risking disk access penalty). Sorted lists, scanned with binary search (= O(log n)). Ok, but what about the O(1)? Indices (or data) with one entry per id (potentially a starting point inside another index). Regarding index structures, I think you can reduce disk access penalty by using slightly more complex index structures (in this case B-trees for example). The O(log_2 n) is the number of disk operations you have to perform. Using a B-tree you can get this down to O(log_B n), where B is the number of elements per node. If you use your implicit pointer idea, you don't even have the extra storage penalty :-) Depending on the application (RAM size, access patterns, etc.), it might be worth it. Configurable index methods would be great. :) So you transparently rewrite internal IDs to external ones and vice versa on in- and output? Not transparently within an application, but for in- and output (e.g. start, goal and path for a router) the application probably will convert them, so it's transparently for the user. What is the advantage of having separate internal IDs? OSM ids are sparse (at most one object per id). Internal ids always point to exactly one object (at least one, at most one). Ah, nice. By the way, is all this still in (early) development, or is there already a description/website/demo online? There's no website or demo, but it works quite fine in router prototype (routing works fine but CLI and postprocessing are not implemented yet). It's available (license is GPL) in my 2008 GNU arch repository [1] as osmbindb--devel--0.1 (the router is osmroute--devel--2.0 in my 2007 repository [2]). To check it out: 1. install tla (apt-get install tla, emerge tla, ...) 2. tla register-archive http:///sascha.silbe.org/arch/[EMAIL PROTECTED] 3. tla get osmbindb--devel--0.1 osmbindb--devel--0.1 This will check it out into a directory called osmbindb--devel--0.1 (the second parameter of the tla get invocation) Ok. Do you plan to make your current format into a kind of standard for binary, indexed OSM data or do you want to go with, say, Marcus' plans (for which I don't clearly see if he wants it to be lossy or lossless)? Well, obviously I like my format better, but if any other proposal (including Markus' one) turns out to be better, that's fine with me. :) [Geocoding with Gosmore] I remembered that it did. The feature list on the wiki says: Incremental search of all tags. Results are ordered from nearest to farthest. Judging from Nic Roets' mails on [EMAIL PROTECTED], there is no spatial index involved. [1] http:///sascha.silbe.org/arch/[EMAIL PROTECTED] [2] http:///sascha.silbe.org/arch/[EMAIL PROTECTED] CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 10:30:41AM +0200, Marcus Wolschon wrote: I am looking for advise on how to create an on-disk index in one dimension (element-id-offset where it is stored) and 2 dimensions (bounding-box-nodeIDs and boundingBox-intersecting bouding-boxes of ways). For the second point (i.e. 2D) take a look at [1] and [2]. My own database implementation (GPL, used in my C(++)-level OSM projects) is based on the methods presented in these papers (currently using Peano, not Hilbert). [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9043 [2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.136 CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 11:30:07AM +0200, Marcus Wolschon wrote: Do you have any Idea how such an index can be constructed, given that * the indexed dataset is mutable * At no time can the complete index be loaded into memory (e.g. for rebuilding the index). Unfortunately not. I suppose once you're dealing with non-constant data, looking at a ready-made database implementation is worth a try. Most of the overhead of usual databases will probably come from having to support updates. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 02:31:26PM +0200, Freek wrote: See for example this follow-up paper on Hilbert R-trees: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9180 Interesting paper. Thanks for the pointer! The whole point of R-trees is that they are efficient when stored in external memory. In the case of Hilbert R-trees, bulk-loading is particularly easy because you can just use an external-memory sorting algorithm to sort your input data along a space-filling curve, store consecutive sets of input-item bounding boxes in the leaves of the tree, and build the rest of the tree bottom-up. That's exactly what I do. And since everything is constant, the R-Tree node addresses are constant, too = no need for pointers, very efficient on-disk storage format. Unfortunately not. I suppose once you're dealing with non-constant data, looking at a ready-made database implementation is worth a try. Most of the overhead of usual databases will probably come from having to support updates. Also, they often use (as far as I can tell) the original R-tree algorithm by Guttman from 1984. In the meantime more efficient algorithms have been developed, the Hilbert R-tree being a particularly practical and quite efficient one. Actually it seems I've been limited in view by my own application. The problem I'd have with updating data isn't how to map data updates to tree updates (the topic covered by the paper mentioned above) but how to efficiently do inserts and deletes on an on-disk tree. I haven't done any research on that topic, though, since I don't need it yet (bulk imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2). CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote: Generally you keep a bit of free space in all nodes to accommodate a number of insertions and deletions, and only split or merge nodes when they overflow or underflow so that you don't need to move around all kinds of elements all the time. But how do you do those splits, i.e. node inserts? Just moving everything behind the insertion point by one block would be way too expensive for large trees, so there's probably a better method. I haven't done any research on that topic, though, since I don't need it yet (bulk imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2). That's not bad, I suppose your index still fits in memory? Yup. I mmap() the whole database. For europe it's less than twice the size of my physical memory (5.3GB vs. 3.5GB), so most of it fits into memory. For whole planet imports it gets much slower (5.6h instead of 2.5h if it were scaling linearly). R-Tree generation scales fine (still fits into memory), 1D index sorting is O(2n) and way processing (including MBR calculation) is O(5n). That's probably due to the nodes being sorted by id in the database so spatial locality is reduced. Quicksort (used for sorting 1D indices) also doesn't use locality for the outer iterations, but isn't a bottleneck yet. Fitting as much as possible into physical RAM to avoid disk accesses was the primary design goal (the last design was severly I/O-constrained). There's still some room for improvement, but the performance gain for the router is already quite impressive (compared to the previous Python+PostgreSQL solution), though not entirely fair yet (only way type and in-builtup-area are used, oneway and maxspeed are ignored; only minimal postprocessing). I think that in most cases the top part of the tree should fit in memory and only the leaves (containing pointers/addresses to the actual items) will stay on disk during operation. With the current node size of 8 entries, the overhead (#non-leafs/#leafs) is about 14%, with all R-Tree structures (leaf+non-leaf MBR, leaf pointers) totalling 128+32=160MB (space is allocated exponentially) for europe, so it fits into memory quite nicely (given modern desktop memory sizes, that is). CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 05:20:52PM +0200, Freek wrote: But how do you do those splits, i.e. node inserts? Just pick a new block somewhere else where you have free space (for example at the end of your mmap()'ed file), then split the pointers to the child nodes evenly over the two blocks (old and new one) so they are about half-full, and finally add a pointer to the new block in the parent node (again splitting if necessary, etc.). OK, so you just use pointers (= index gets much larger) and often give up locality (even if you spread some free blocks across the index). Somehow I'd hoped for a silver bullet. ;) For regular planet imports and hourly diffs, using the current structure + invalidation bitmap for the bulk part and a pointer-based RTree for the updated objects might be an idea. Or just a second database with the current structure (= easy to implement) for the updated objects - updates usually are small (compared to the whole data set), so an index rebuild should be fast. R-Tree generation scales fine (still fits into memory), 1D index sorting is O(2n) O(2n)? I guess you are using big O's for a different purpose then I do normally ;-) It's not strictly canonical use, yes. :) It was intended to say that it's about twice as slow as if it would scale strictly linearly. While O(...) ignores constant factors (by definition), they do matter in practice. :) CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] on-disk indexing of geodata
On Fri, Oct 17, 2008 at 06:10:51PM +0200, Freek wrote: OK, so you just use pointers (= index gets much larger) Ah, you're using the fact that your tree is a complete fan-out-8 tree (except for the last part)? Exactly. and often give up locality (even if you spread some free blocks across the index). If your blocks are so large that seeking for them takes about as long as retrieving them, it should not be that big a problem. I think a fan-out of 8 is quite low. Interesting point, might be worth some experimentation. To be honest, I've chosen that value rather arbitrarily - power of 2, overhead somewhere around 10%. :) [constant bulk database with invalidation + dynamic database with pointer-based indices for updates] That might be quite a good idea in OSM practice, you just rebuild the bulk database every so often. Perhaps you can even just use an internal-memory data structure (KD-tree, quad tree, something like that) for the updated part. Depending on the application, that may also be a useful approach. The current mmap() based implementation has the nice side effect that the database effectively resides in the OS cache = startup overhead is minimal, multiple processes share the data. For a long-running server process this might not matter, though. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)
On Fri, Oct 17, 2008 at 07:47:14PM +0200, Freek wrote: Looks like we have enough ideas for adding indexes and such to an OSM binary format, so if the rest of the structure is fixed I think we can get it off the ground (I wouldn't mind at least helping out here and there). OK, so what do we want in a binary DB? Currently there is (in my implementation): Node/way/relation internal - external id: - int - ext O(1) - ext - int O(log n) Node location (2x 32bit int, 64bit Peano): - id - location O(1) - location - int O(log n) Node/way/relation types (using tag - id mapping with wildcard support during import): - id - types ~O(1) - type - ids ~O(log n) Node/way/relation interesting bitmap (name/ref/is_in set and/or known type): - id - interesting O(1) Node/way/relation tags (k/v-pairs are \n-separated and properly escaped): - id - all tags O(1) Way MBR / RTree: - id - MBR O(1) - MBR - ids ~O(log n) Way members (ordered sequence of node ids): - way id - member ids ~O(1) - member id - way ids ~O(1) Relation members (ordered sequence of id+type+role): - rel id - member ids ~O(1) - member id - rel ids ~O(log n) Node/way/relation name/ref/is_in: - text files sorted by id (intended for namefinder - could be changed into proper database entries similar to the way tags are handled) ~O(...) means it's actually also dependant on the number of entries for the given node/way/relation resp. the size of the MBR for RTree lookup. Some O(log n) steps be can optimized by an apprioriate index to be O(1), but this further increases db size = probability of exceeding RAM and thus actually reducing instead of improving the performance is significant. Actually, there probably isn't a one-fits-all solution for the indices, it depends too much on the application and its environment. So each index should be optional for the high-level API (makes it more complex, though). My implementation isn't documented well and there's no high-level API (though both can obviously be fixed), but the above list should be a good basis for discussion. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Binary OSM db
On Fri, Oct 17, 2008 at 08:58:00PM +0200, Stefan de Konink wrote: Node/way/relation name/ref/is_in: - text files sorted by id (intended for namefinder - could be changed into proper database entries similar to the way tags are handled) What do you actually implement here? Example (excerpt from db.NodeName.txt = content of name=... tags on nodes): 39: Kaiserdamm 46: Spandauer Damm 54: Siemensdamm 59: Dreieck Funkturm 86: Saatwinkler Damm It's meant to be parsed and put into RAM, not accessed randomly inside the file. For non-namefinder applications, this has to be replaced (the text files are rather large). For tags, I do store a \0-terminated block per element, separating key and value by = and k/v pairs by \n, with proper escaping. There's an index from element id - start of tags block, but nothing else. Feature type recognition (implemented in the bulk importer) and extraction of other information relevant to the given application is expected to be a preparation / precalculation step, not a runtime one. Of course there might be applications that need efficient access via tag name or tag value even at runtime, so a generic OSM binary database implementation could also provide those indices. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Faster loading with scabies
On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote: The biggest problem I found wasn't the actual processing of INSERT statements, it was MySQL scaling non-linearly with the number of rows. MyISAM tables are very fast to import regardless of number of rows, but InnoDB seems to slow down as the number of rows increase. I'm surprised loading with LOAD DATA INFILE fixes that. Just a quick stab in the dark as I do know almost nothing about osmosis and MySQL: If you create the indices before loading data, they will be constantly rebuilt as you add each row = would explain the exponential slow-down. The PostgreSQL manual has a section about optimizing database imports giving some useful tips tricks; I'd guess the same is true for the MySQL one. If osmosis already defers index creation (including referential constraints which usually build indices implicitly), just ignore this mail. :) CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
[josm-dev] Disable splash screen
Hi! How do I disable the splash screen for JOSM? CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ josm-dev mailing list josm-dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/josm-dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Thu, Sep 11, 2008 at 01:49:55PM +0200, Andreas Kalsch wrote: Has anybody used GIS successfully in MySQL or PGSQL and can tell me how the performance compares between the two techniques? At least in PostgreSQL/PostGIS, the geometric indices are WAY faster. Just remember to either use the ST_* family of predicates (implicit bounding box scan) or add an term to do an explicit bounding box scan so the index is actually used. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Daily Diff parsing - tile expiring
On Thu, Aug 21, 2008 at 08:49:12AM +0200, bvh wrote: [New namefinder including custom database implementation] Very nice. Do you plan on open sourcing this? Definitely interested! Just added the necessary text blocks. It's available in my GNU arch repository [1] as osmsearch--devel--0.1. Steps for retrieving it: 1. sudo aptitude install tla || sudo emerge tla 2. tla register-archive http://sascha.silbe.org/arch/[EMAIL PROTECTED] 3. tla get [EMAIL PROTECTED]/osmsearch--devel--0.1 osmsearch [1] http://sascha.silbe.org/arch/[EMAIL PROTECTED] CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] FW: [OSM-newbies] Calculating bounding box of a clipped rectangle
On Fri, May 09, 2008 at 03:07:38PM +0100, Andy Robinson (blackadder-lists) wrote: The bounding box seem to be ok but not accurate. Each location within the box is off by quite a tens of meters (20-30). I'm using Vincenty's algorithm. There's a JavaScript implementation on [1] that I translated to Python, pyrex and even C for use on an AVR microcontroller. [1] http://www.movable-type.co.uk/scripts/latlong-vincenty-direct.html CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev