[OSM-dev] Namefinder replacements (was: Re: implemented API for J2SE and J2ME)

2009-04-22 Thread Sascha Silbe

On Wed, Apr 22, 2009 at 07:41:56PM +0200, Frederik Ramm wrote:


The status is, basically, that everything is vapourware except the Geocoder in 
OpenRouteService which is closed-source due to ongoing academic activity.

Well, vapourware is a bit hard, but at least my own implementation (see the 
archive) would indeed require some work (mainly the frontend code - the timing 
critical backend part is done) to be of actual use.
One reason I didn't finish the project is that I never got any answer to 
exactly under which circumstances the current namefinder is slow - during my 
own tests, it was reasonably fast. Without knowing that I cannot test whether 
my own implementation is actually performing better under comparable 
conditions, so replacing the current one doesn't make much sense.
The code is available under the terms of GPL v2 if anyone would like to finish 
or reuse it.

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/

signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Export TMC-Event-Table from PDF to CSV

2009-04-03 Thread Sascha Silbe
On Fri, Apr 03, 2009 at 10:16:29AM +0200, marcus.wolsc...@googlemail.com 
wrote:



With 1590 lines it would be quite a task to convert
that into something computer-readable using regexp
only.
1590 is the highest line number in Event List_DE_30.xls (from BASt) as 
well, perhaps that document is sufficient?



Does anyone have the TMC-events in a machine-readable format?
What's the license of the TMC events specification? Is it allowed to 
convert and distribute it at all?


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Errors while svn up

2009-03-09 Thread Sascha Silbe

On Sun, Mar 08, 2009 at 07:56:15PM +, Tom Hughes wrote:

I think mod_evasive may be the problem, so I've disabled it for now. 
It has been on for a while now though so I'm not sure why it has 
started causing problems now.

Maybe SVN content got large enough to exceed the hash table?

 From [1]:
  Detection is performed by creating an internal dynamic hash table of 
IP Addresses and

  URIs, and denying any single IP address from any of the following:



* Requesting the same page more than a few times per second


Maybe the server got faster or you reduced the number of Apache child 
processes so a single one would handle more requests from the same 
originator?


* Making more than 50 concurrent requests on the same child per 
second



[1] http://www.zdziarski.com/projects/mod_evasive/

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] openstreetmap could start at user's approximate location using geo-ip

2009-01-18 Thread Sascha Silbe

On Sun, Jan 18, 2009 at 03:34:14PM +, Tom Hughes wrote:

It uses the cookie first, then your home location (assuming you're 
logged in), then GeoIP via hostip.info, and finally defaults to 
Europe.
Take a look at www.hostip.info if you want to see where it thinks you 
are.
hostip.info locates me at Stuttgart, Germany, but OSM shows Europe 
centered on UK. Cookies, cache and proxy disabled for testing.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [josm-dev] JOSM Fail

2009-01-16 Thread Sascha Silbe

On Fri, Jan 16, 2009 at 01:08:35PM +0100, Pieren wrote:


+1 for making this harder to activate (separate mode, ctrl+drag,
whatever). It's pretty annoying when trying to select a region in a
densely mapped area.

-1
I'm moving ways. I cannot say every day but I move ways when I:
- improve roundabouts (position, circle)
- draw dual carriageways (copy, paste, reverse, move)
- draw areas like buildings or parks
How would e.g. having to press Ctrl in addition to dragging impact your 
workflow?
I don't see why you're opposed to making it harder to _unintentionally_ 
using it.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
josm-dev mailing list
josm-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/josm-dev


Re: [josm-dev] JOSM Fail

2009-01-16 Thread Sascha Silbe

On Fri, Jan 16, 2009 at 08:49:45PM +0100, Rolf Bode-Meyer wrote:

This isn't about newcomers vs. experienced users. I _am_ experienced 
and
during my normal work I _often_ move objects when I want to select 
something
instead, just because JOSM binds two separate actions to the same 
input.

Huh? What two separate actions?
Region Select and Moving. Just depending on where exactly between two 
ways the pointer is, JOSM does different things. If it's too close to 
one of them, it will move it, instead of starting to select a region.
If ways are too close together (remember: densely mapped area), it's 
impossible to do a region select.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
josm-dev mailing list
josm-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/josm-dev


Re: [josm-dev] JOSM Mappaint - major improvements

2009-01-13 Thread Sascha Silbe

On Tue, Jan 13, 2009 at 12:58:38PM +0100, Rolf Bode-Meyer wrote:


What I don't understand - with your change and before - is that paint
speed seems to depend on the amount of data in the layer even if it's
outside the view.
That's to be expected. Even with a spatial index (AFAIK josm still 
doesn't use one yet), looking up the objects inside a give bounding box 
(the view) is dependant on the total number of objects. For example, a 
2D-PR-Tree lookup is about O(sqrt(n)) [1], with n being the total number 
of objects stored in the tree.
If some genius invents a spatial index with lookup in O(1), I'm sure 
we'll hear about it. :)



[1] http://www.cse.ust.hk/~yike/prtree/pr.pdf

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
josm-dev mailing list
josm-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/josm-dev


Re: [OSM-dev] Anyone with a speedy gazetteer

2009-01-12 Thread Sascha Silbe

On Mon, Jan 12, 2009 at 10:57:24AM +0100, Erik Johansson wrote:


I have two questions; why is gazetteer.openstreetmap.org so slow  at
20-50 seconds per request,
I remember David Earl mentioning a scheduled database rebuild. Perhaps 
its currently running. Otherwise, I'd really like to know the reasons 
(so I can test my own implementation accordingly).


and if anyone has code for faster variants of name finders? Speed is 
essential..

I have done an alternative, but suspended its development because
a) the current Namefinder was sufficiently fast when I tried to compare 
them = don't know under which conditions it's slow, so I cannot test 
whether my implementation really is faster in these cases; and

b) because I haven't had enough time to finish it.

The core (written in C/C++) is almost finished (only some minor changes 
for name canonicalization needed). What's really missing is the 
front-end part, i.e. the web server and search plan generation. A 
Python module implementing the protocol between Core and Frontend is 
included, so someone else could start implementing it right away. :)



To give some funny, unscientific numbers:

General conditions: planet-080813.osm.gz, Athlon 64 BE-2300 dual-core 
1.9Ghz, 4GB DDR2-800, Limit 100 ways + 100 nodes per step


Data base size: ~8GB
import time: 206 minutes
startup time: 41 seconds
RAM usage after startup (- names and hash tables): 501MB

name search (Provenceweg, exact):0.001117s
loc search (48.51 9.07 48.52 9.08):0.061777s
exact name + loc:  0.001331s
loc + exact name:  0.005612s
name search (Provenceweg, regex):2.582227s
loc + regex:   0.232248s
exact name + loc (whole world):2.263841s
loc (whole world): 0.069157s
Name search (Provenceweg, substring):0.600431s


Unfair comparison with current namefinder (my code: local, no frontend; 
gazetteer: remote; with caching, i.e. first result is discarded):


old new
Beethovenweg0.681s  0.008970s
random string (= 0 results) 0.310s  0.000352s



I've released my code under GPLv2 in my arch repository [1] as 
osmsearch--devel--0.1 (haven't had an idea for a better name yet, 
sorry).

You'll need some data files [2-4] as well.

Instructions for fetching the source (assuming the GNU arch client tla 
is installed):


1. tla register-archive 
http://sascha.silbe.org/arch/sascha-a...@silbe.org--2008
2. tla get sascha-a...@silbe.org--2008/osmsearch--devel--0.1 
osmsearch--devel--0.1


This will put the source into a newly-created directory called 
osmsearch--devel--0.1.



[1] http://sascha.silbe.org/arch/sascha-a...@silbe.org--2008
[2] http://sascha.silbe.org/tmp/osmsearch.conf
[3] http://sascha.silbe.org/tmp/nodesDisplay.rules
[4] http://sascha.silbe.org/tmp/waysDisplay.rules

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] mmap() (was: Re: Some statistics for file-format -development)

2008-12-04 Thread Sascha Silbe

On Thu, Dec 04, 2008 at 03:20:02PM -0800, Dave Hansen wrote:

I don't think that's possible. Both 32bit and 64bit userspace and 
emulation work with a 64bit kernel, but at least 64bit userspace on a 
32bit kernel isn't possible.

Believe it or not, people actually do this on powerpc.
Interesting. Is it possible on AMD64 as well? How does the additional 
address space get managed?

As this is getting off topic, I'd suggest to move to private mails.

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] mmap() (was: Re: Some statistics for file-format -development)

2008-11-25 Thread Sascha Silbe

On Tue, Nov 25, 2008 at 08:47:16AM +0100, Marcus Wolschon wrote:


Okay, on a big AMD-multicore-laptop the limit in Java seems to be
1GB of memory-mapped files.
That's pretty small given that kernel/user split is usually 2G/2G or 
even 3G/1G...



Ulimit was unlimited and --XX:MaxDirectMemorySize was set to 32GB
-Xmx was a few GB. I am using the current Java6 -patchlevel 10 from
Sun.
Have you tried using a smaller -Xmx size? I have the impression that 
Java allocates the maximum allowed size directly on startup, relying on 
the OS to do lazy allocation of pages. If that's the case, reducing -Xmx 
should increase free virtual address space, allowing bigger memory 
mappings.


I'd like to try this on a 64bit-Linux on that box but don't know how 
to
do that yet. Any help on turning a debian 32bit into a 64-bit or 
running

a 64bit-VM in a 32bit-Linux?
I don't think that's possible. Both 32bit and 64bit userspace and 
emulation work with a 64bit kernel, but at least 64bit userspace on a 
32bit kernel isn't possible.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Some statistics for file-format -development

2008-11-24 Thread Sascha Silbe

On Mon, Nov 24, 2008 at 10:23:31AM +0100, Marcus Wolschon wrote:

It works fine on a 2GB desktop-PC but fails to aquire the mapping on 
the eeepc with 2GB.

Both run Windows XP.
So on Windows the maximum mmap() size depends on the RAM size? Ugly. Is 
it physical RAM size or virtual (i.e. increasing swap file allows larger 
mmap() sizes)?


PS: Thanks for reporting back.

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Some statistics for file-format -development

2008-11-24 Thread Sascha Silbe

On Mon, Nov 24, 2008 at 12:59:18PM +0100, Marcus Wolschon wrote:


Both have 2GB of Ram and both run Windows XP.
But one is an Atom N270 -processor and one a
desktop Core2.
Ah, all right. Unfortunately, none of datasheets [1,2] at Intel mentions 
the size of the virtual address space. I'd have expected it to be the 
same for all IA32 systems then... :-/



[1] http://download.intel.com/design/processor/datashts/320032.pdf
[2] http://download.intel.com/design/processor/datashts/319977.pdf

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Some statistics for file-format -development

2008-11-21 Thread Sascha Silbe

On Fri, Nov 21, 2008 at 10:59:27AM +0100, Marcus Wolschon wrote:

I ran into an outOfMemory-Exception memory-mapping the file when 
importing

Baden-Württemberg.

[...]
(I did not think this would become an issue for file-sizes of about 
100MB)
Did you try it on Windows or a POSIX compatible OS? 32bit or 64bit 
architecture/OS/userspace?
On 64bit Linux, I can easily mmap() even the whole planet database 
(~50GB total, with largest file being ~30GB) - on 32bit it's less, but 
still way more than a few hundred MBs.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] distribution of IDs

2008-11-16 Thread Sascha Silbe

On Sun, Nov 16, 2008 at 04:16:19PM +0100, Marcus Wolschon wrote:


Does this assumption hold true or are our ID-values more sparsely
distributed in the world-file?
For planet-080827, I got the following numbers (queried from the Python 
interface of my own binary db implementation ;) ) :


count   maxId   ratio
Nodes   261675640   291467921   0.90
Ways 2121076226580947   0.80
Relations   22149   29694   0.75


Some malformed ways were skipped during import, but I don't expect that 
to alter above values significantly.


Over time, I'd expect the ratios to drop even further since deleted 
object IDs cannot be recycled (history is still available).


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] The wiki defines the database (was: relations)

2008-11-05 Thread Sascha Silbe

On Wed, Nov 05, 2008 at 11:57:07AM +, Andy Allan wrote:


[...], which
triggered a virulent campaign by the wiki-types to repeatedly delete
the information that I had put up, [...]

Who exactly are the wiki-types you mention?
IMO there shouldn't be the wiki-types and the mappers, those should 
be one and the same. Without defining what tags (=syntax) mean 
(=semantic), it's hard to use them properly.
From reading the discussions regularly popping up on the mailing lists, 
I'm getting the impression there's a minority on the wiki disturbing the 
work of others. That's vandalism to me, nothing more and nothing less.
So what about trying to get this minority to stop impeding our work, 
instead of splitting ourselves into the wiki-types (those defining the 
semantics) and the mappers (those using the syntax to enter data into 
the database)?


Of course there are other ways of communicating the semantics of the 
tags you use (e.g. mailing lists), but the wiki is currently the best we 
have in terms of successful information retrieval.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

2008-10-22 Thread Sascha Silbe

On Sun, Oct 19, 2008 at 01:42:39PM +0200, Freek wrote:


Much simpler, as more complex index structures also use more memory
(thus risking disk access penalty). Sorted lists, scanned with binary
search (= O(log n)).

Ok, but what about the O(1)?
Indices (or data) with one entry per id (potentially a starting point 
inside another index).


Regarding index structures, I think you can reduce disk access penalty 
by using slightly more complex index structures (in this case B-trees 
for example). The O(log_2 n) is the number of disk operations you have 
to perform. Using a B-tree you can get this down to O(log_B n), where 
B is the number of elements per node. If you use your implicit 
pointer idea, you don't even have the extra storage penalty :-)
Depending on the application (RAM size, access patterns, etc.), it might 
be worth it. Configurable index methods would be great. :)


So you transparently rewrite internal IDs to external ones and vice 
versa on in- and output?
Not transparently within an application, but for in- and output (e.g. 
start, goal and path for a router) the application probably will convert 
them, so it's transparently for the user.



What is the advantage of having separate internal IDs?
OSM ids are sparse (at most one object per id). Internal ids always 
point to exactly one object (at least one, at most one).


Ah, nice. By the way, is all this still in (early) development, or is 
there already a description/website/demo online?
There's no website or demo, but it works quite fine in router prototype 
(routing works fine but CLI and postprocessing are not implemented yet).
It's available (license is GPL) in my 2008 GNU arch repository [1] as 
osmbindb--devel--0.1 (the router is osmroute--devel--2.0 in my 2007 
repository [2]). To check it out:


1. install tla (apt-get install tla, emerge tla, ...)
2. tla register-archive 
http:///sascha.silbe.org/arch/[EMAIL PROTECTED]

3. tla get osmbindb--devel--0.1 osmbindb--devel--0.1

This will check it out into a directory called osmbindb--devel--0.1 (the 
second parameter of the tla get invocation)


Ok. Do you plan to make your current format into a kind of standard 
for binary, indexed OSM data or do you want to go with, say, Marcus' 
plans (for which I don't clearly see if he wants it to be lossy or 
lossless)?
Well, obviously I like my format better, but if any other proposal 
(including Markus' one) turns out to be better, that's fine with me. :)


[Geocoding with Gosmore]

I remembered that it did. The feature list on the wiki says:
Incremental search of all tags. Results are ordered from nearest to 
farthest.
Judging from Nic Roets' mails on [EMAIL PROTECTED], there is no spatial index 
involved.



[1] http:///sascha.silbe.org/arch/[EMAIL PROTECTED]
[2] http:///sascha.silbe.org/arch/[EMAIL PROTECTED]

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 10:30:41AM +0200, Marcus Wolschon wrote:


I am looking for advise on how to create an on-disk index in one
dimension (element-id-offset where it is stored)
and 2 dimensions (bounding-box-nodeIDs and boundingBox-intersecting
bouding-boxes
of ways).
For the second point (i.e. 2D) take a look at [1] and [2]. My own 
database implementation (GPL, used in my C(++)-level OSM projects) is 
based on the methods presented in these papers (currently using Peano, 
not Hilbert).



[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9043
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.136

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 11:30:07AM +0200, Marcus Wolschon wrote:


Do you have any Idea how such an index can be constructed, given that
* the indexed dataset is mutable
* At no time can the complete index be loaded into memory (e.g. for
  rebuilding the index).
Unfortunately not. I suppose once you're dealing with non-constant data, 
looking at a ready-made database implementation is worth a try. Most of 
the overhead of usual databases will probably come from having to 
support updates.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 02:31:26PM +0200, Freek wrote:


See for example this follow-up paper on Hilbert R-trees:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9180

Interesting paper. Thanks for the pointer!

The whole point of R-trees is that they are efficient when stored in 
external memory. In the case of Hilbert R-trees, bulk-loading is 
particularly easy because you can just use an external-memory sorting 
algorithm to sort your input data along a space-filling curve, store 
consecutive sets of input-item bounding boxes in the leaves of the 
tree, and build the rest of the tree bottom-up.
That's exactly what I do. And since everything is constant, the R-Tree 
node addresses are constant, too = no need for pointers, very 
efficient on-disk storage format.


Unfortunately not. I suppose once you're dealing with non-constant 
data,
looking at a ready-made database implementation is worth a try. Most 
of

the overhead of usual databases will probably come from having to
support updates.
Also, they often use (as far as I can tell) the original R-tree 
algorithm by Guttman from 1984. In the meantime more efficient 
algorithms have been developed, the Hilbert R-tree being a 
particularly practical and quite efficient one.
Actually it seems I've been limited in view by my own application. The 
problem I'd have with updating data isn't how to map data updates to 
tree updates (the topic covered by the paper mentioned above) but how to 
efficiently do inserts and deletes on an on-disk tree. I haven't done 
any research on that topic, though, since I don't need it yet (bulk 
imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2).


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote:

Generally you keep a bit of free space in all nodes to accommodate a 
number of insertions and deletions, and only split or merge nodes when 
they overflow or underflow so that you don't need to move around all 
kinds of elements all the time.
But how do you do those splits, i.e. node inserts? Just moving 
everything behind the insertion point by one block would be way too 
expensive for large trees, so there's probably a better method.


I haven't done any research on that topic, though, since I don't need 
it yet (bulk

imports are quite fast - ~25 minutes for the 865MB europe.osm.bz2).

That's not bad, I suppose your index still fits in memory?
Yup. I mmap() the whole database. For europe it's less than twice the 
size of my physical memory (5.3GB vs. 3.5GB), so most of it fits into 
memory. For whole planet imports it gets much slower (5.6h instead of 
2.5h if it were scaling linearly). R-Tree generation scales fine (still 
fits into memory), 1D index sorting is O(2n) and way processing 
(including MBR calculation) is O(5n). That's probably due to the nodes 
being sorted by id in the database so spatial locality is reduced. 
Quicksort (used for sorting 1D indices) also doesn't use locality for 
the outer iterations, but isn't a bottleneck yet.
Fitting as much as possible into physical RAM to avoid disk accesses was 
the primary design goal (the last design was severly I/O-constrained). 
There's still some room for improvement, but the performance gain for 
the router is already quite impressive (compared to the previous 
Python+PostgreSQL solution), though not entirely fair yet (only way type 
and in-builtup-area are used, oneway and maxspeed are ignored; only 
minimal postprocessing).


I think that in most cases the top part of the tree should fit in 
memory and only the leaves (containing pointers/addresses to the 
actual items) will stay on disk during operation.
With the current node size of 8 entries, the overhead 
(#non-leafs/#leafs) is about 14%, with all R-Tree structures 
(leaf+non-leaf MBR, leaf pointers) totalling 128+32=160MB (space is 
allocated exponentially) for europe, so it fits into memory quite nicely 
(given modern desktop memory sizes, that is).


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 05:20:52PM +0200, Freek wrote:


But how do you do those splits, i.e. node inserts?
Just pick a new block somewhere else where you have free space (for 
example at the end of your mmap()'ed file), then split the pointers to 
the child nodes evenly over the two blocks (old and new one) so they 
are about half-full, and finally add a pointer to the new block in the 
parent node (again splitting if necessary, etc.).
OK, so you just use pointers (= index gets much larger) and often give 
up locality (even if you spread some free blocks across the index). 
Somehow I'd hoped for a silver bullet. ;)
For regular planet imports and hourly diffs, using the current structure 
+ invalidation bitmap for the bulk part and a pointer-based RTree for 
the updated objects might be an idea. Or just a second database with the 
current structure (= easy to implement) for the updated objects - 
updates usually are small (compared to the whole data set), so an index 
rebuild should be fast.



R-Tree generation scales fine (still fits into memory),
1D index sorting is O(2n)
O(2n)? I guess you are using big O's for a different purpose then I do 
normally ;-)

It's not strictly canonical use, yes. :)
It was intended to say that it's about twice as slow as if it would 
scale strictly linearly. While O(...) ignores constant factors (by 
definition), they do matter in practice. :)


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 06:10:51PM +0200, Freek wrote:


OK, so you just use pointers (= index gets much larger)
Ah, you're using the fact that your tree is a complete fan-out-8 tree 
(except for the last part)?

Exactly.

and often give up locality (even if you spread some free blocks 
across the index).
If your blocks are so large that seeking for them takes about as long 
as retrieving them, it should not be that big a problem. I think a 
fan-out of 8 is quite low.
Interesting point, might be worth some experimentation. To be honest, 
I've chosen that value rather arbitrarily - power of 2, overhead 
somewhere around 10%. :)


[constant bulk database with invalidation + dynamic database with 
pointer-based indices for updates]
That might be quite a good idea in OSM practice, you just rebuild the 
bulk database every so often. Perhaps you can even just use an 
internal-memory data structure (KD-tree, quad tree, something like 
that) for the updated part.
Depending on the application, that may also be a useful approach. The 
current mmap() based implementation has the nice side effect that the 
database effectively resides in the OS cache = startup overhead is 
minimal, multiple processes share the data. For a long-running server 
process this might not matter, though.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[OSM-dev] Binary OSM db (was: Re: on-disk indexing of geodata)

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 07:47:14PM +0200, Freek wrote:

Looks like we have enough ideas for adding indexes and such to an OSM 
binary format, so if the rest of the structure is fixed I think we can 
get it off the ground (I wouldn't mind at least helping out here and 
there).
OK, so what do we want in a binary DB? Currently there is (in my 
implementation):


Node/way/relation internal - external id:
   - int - ext O(1)
   - ext - int O(log n)

Node location (2x 32bit int, 64bit Peano):
   - id - location O(1)
   - location - int O(log n)

Node/way/relation types (using tag - id mapping with wildcard support 
during import):

   - id - types ~O(1)
   - type - ids ~O(log n)

Node/way/relation interesting bitmap (name/ref/is_in set and/or known 
type):

   - id - interesting O(1)

Node/way/relation tags (k/v-pairs are \n-separated and properly 
escaped):

   - id - all tags O(1)

Way MBR / RTree:
   - id - MBR O(1)
   - MBR - ids ~O(log n)

Way members (ordered sequence of node ids):
   - way id - member ids ~O(1)
   - member id - way ids ~O(1)

Relation members (ordered sequence of id+type+role):
   - rel id - member ids ~O(1)
   - member id - rel ids ~O(log n)

Node/way/relation name/ref/is_in:
   - text files sorted by id (intended for namefinder - could be 
changed into proper database entries similar to the way tags are 
handled)


~O(...) means it's actually also dependant on the number of entries for 
the given node/way/relation resp. the size of the MBR for RTree lookup.
Some O(log n) steps be can optimized by an apprioriate index to be O(1), 
but this further increases db size = probability of exceeding RAM and 
thus actually reducing instead of improving the performance is 
significant.
Actually, there probably isn't a one-fits-all solution for the indices, 
it depends too much on the application and its environment. So each 
index should be optional for the high-level API (makes it more complex, 
though).


My implementation isn't documented well and there's no high-level API 
(though both can obviously be fixed), but the above list should be a 
good basis for discussion.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Binary OSM db

2008-10-17 Thread Sascha Silbe

On Fri, Oct 17, 2008 at 08:58:00PM +0200, Stefan de Konink wrote:


Node/way/relation name/ref/is_in:
- text files sorted by id (intended for namefinder - could be 
changed

into proper database entries similar to the way tags are handled)

What do you actually implement here?
Example (excerpt from db.NodeName.txt = content of name=... tags on 
nodes):


39: Kaiserdamm
46: Spandauer Damm
54: Siemensdamm
59: Dreieck Funkturm
86: Saatwinkler Damm

It's meant to be parsed and put into RAM, not accessed randomly inside 
the file. For non-namefinder applications, this has to be replaced (the 
text files are rather large).
For tags, I do store a \0-terminated block per element, separating key 
and value by = and k/v pairs by \n, with proper escaping. There's an 
index from element id - start of tags block, but nothing else.
Feature type recognition (implemented in the bulk importer) and 
extraction of other information relevant to the given application is 
expected to be a preparation / precalculation step, not a runtime one.
Of course there might be applications that need efficient access via tag 
name or tag value even at runtime, so a generic OSM binary database 
implementation could also provide those indices.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Faster loading with scabies

2008-10-16 Thread Sascha Silbe

On Thu, Oct 16, 2008 at 09:21:54AM +1100, Brett Henderson wrote:


The biggest problem I found wasn't the actual processing of INSERT
statements, it was MySQL scaling non-linearly with the number of rows.
MyISAM tables are very fast to import regardless of number of rows, 
but
InnoDB seems to slow down as the number of rows increase.  I'm 
surprised

loading with LOAD DATA INFILE fixes that.
Just a quick stab in the dark as I do know almost nothing about osmosis 
and MySQL:
If you create the indices before loading data, they will be constantly 
rebuilt as you add each row = would explain the exponential slow-down. 
The PostgreSQL manual has a section about optimizing database imports 
giving some useful tips  tricks; I'd guess the same is true for the 
MySQL one.


If osmosis already defers index creation (including referential 
constraints which usually build indices implicitly), just ignore this 
mail. :)


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


[josm-dev] Disable splash screen

2008-10-16 Thread Sascha Silbe

Hi!

How do I disable the splash screen for JOSM?

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
josm-dev mailing list
josm-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/josm-dev


Re: [OSM-dev] Spatial vs. multi-column indexes for points

2008-09-11 Thread Sascha Silbe

On Thu, Sep 11, 2008 at 01:49:55PM +0200, Andreas Kalsch wrote:

Has anybody used GIS successfully in MySQL or PGSQL and can tell me 
how the performance compares between the two techniques?
At least in PostgreSQL/PostGIS, the geometric indices are WAY faster. 
Just remember to either use the ST_* family of predicates (implicit 
bounding box scan) or add an  term to do an explicit bounding box 
scan so the index is actually used.


CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Daily Diff parsing - tile expiring

2008-08-21 Thread Sascha Silbe

On Thu, Aug 21, 2008 at 08:49:12AM +0200, bvh wrote:

[New namefinder including custom database implementation]

Very nice. Do you plan on open sourcing this? Definitely interested!
Just added the necessary text blocks. It's available in my GNU arch 
repository [1] as osmsearch--devel--0.1. Steps for retrieving it:


1. sudo aptitude install tla || sudo emerge tla
2. tla register-archive 
http://sascha.silbe.org/arch/[EMAIL PROTECTED]

3. tla get [EMAIL PROTECTED]/osmsearch--devel--0.1 osmsearch


[1] http://sascha.silbe.org/arch/[EMAIL PROTECTED]

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] FW: [OSM-newbies] Calculating bounding box of a clipped rectangle

2008-05-09 Thread Sascha Silbe
On Fri, May 09, 2008 at 03:07:38PM +0100, Andy Robinson 
(blackadder-lists) wrote:


The bounding box seem to be ok but not accurate. Each location within 
the

box is off by quite a tens of meters (20-30).
I'm using Vincenty's algorithm. There's a JavaScript implementation on 
[1] that I translated to Python, pyrex and even C for use on an AVR 
microcontroller.



[1] http://www.movable-type.co.uk/scripts/latlong-vincenty-direct.html

CU Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/


signature.asc
Description: Digital signature
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev