bz2 is *very* slow to decompress, so yes if you have the space I'd recommend
decompressing the osm first before running the splitter (since the splitter
has to make a minimum of two passes over the file, thus also decompressing
it at least twice). The (limited and simple) benchmarks I tried with
Before the patches I gained about 5% by extracting with the 7z unpacker on the
command line before running the splitter (however, my build was compiled 10 days
ago, so it excludes the latest changes).
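The decompress-once advice above can be sketched as follows (illustrative Python, not the splitter's actual Java code; the file names and the two-pass loop are assumptions standing in for the splitter's multiple passes):

```python
import bz2
import shutil

def decompress_once(src_bz2, dest_plain):
    """Decompress a .osm.bz2 file a single time, so that later
    passes read plain XML instead of re-paying the bz2 cost."""
    with bz2.open(src_bz2, "rb") as fin, open(dest_plain, "wb") as fout:
        shutil.copyfileobj(fin, fout)

def run_passes(plain_file, passes=2):
    """Stand-in for the splitter's passes: each pass re-reads the
    whole (now uncompressed) file as cheap sequential I/O."""
    bytes_seen = []
    for _ in range(passes):
        with open(plain_file, "rb") as f:
            bytes_seen.append(len(f.read()))
    return bytes_seen
```

With the compressed file the expensive bz2 work would happen once per pass; decompressing up front pays it exactly once.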
2009/8/7 Clinton Gladstone
> On Fri, Aug 7, 2009 at 2:56 PM, Chris Miller
> wrote:
>
> > I've got 4 cores (8 with hyperthre
On Fri, Aug 7, 2009 at 2:56 PM, Chris Miller wrote:
> I've got 4 cores (8 with hyperthreading) so this is something I'm acutely
> aware of. Watching my PC churn away at only 12.5% CPU for a few hours isn't
> my idea of resources well spent! Unfortunately there's no quick win because
> the XML pars
Great, I'll keep an impatient eye on the mailing list ;-)
Chris Miller wrote:
> While you're on it, dare I ask to check if it's easy to do some stuff
> in different threads? I notice that one core is at 100% almost
> constantly while the harddisk is still far from being maxed-out. There
> should be room to further speed things up a bit.
I've got 4 cores (8 with hyperthreadi
Chris Miller wrote:
> Thanks for the info. Sounds like you've found about the same limit I have
> with --max-nodes and -Xmx4000m. This weekend I'm going to add a --max-areas
> parameter. Setting this to a number < 255 should allow for a higher
> --max-nodes
> value with the same heap size. In y
>> Just a thought from reading the thread:
>> Multiple parsing runs with a bz2-zipped file could hurt
>> performance. It would mean decompressing the input files multiple times.
>> And in my experience decompressing bz2 costs a lot of resources.
>>
>> (In my case I'm directly using the
Thanks for the info. Sounds like you've found about the same limit I have
with --max-nodes and -Xmx4000m. This weekend I'm going to add a --max-areas
parameter. Setting this to a number < 255 should allow for a higher --max-nodes
value with the same heap size. In your case with 367 areas, settin
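The trade-off between --max-areas and --max-nodes can be put into rough numbers. This is a back-of-envelope sketch: the linear memory model and the bytes-per-node figure are my assumptions for illustration, not measurements of the splitter.

```python
# ASSUMPTION: heap use during the split pass scales with the number of
# areas held concurrently times the nodes buffered per area.
BYTES_PER_BUFFERED_NODE = 20  # made-up illustrative figure

def max_nodes_budget(heap_bytes, max_areas,
                     bytes_per_node=BYTES_PER_BUFFERED_NODE):
    """Nodes each area can buffer when `max_areas` areas share the heap."""
    return heap_bytes // (max_areas * bytes_per_node)
```

Under this toy model, halving --max-areas from 255 to 128 roughly doubles the per-area node budget for the same -Xmx4000m heap, at the cost of needing more passes to cover all areas.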
Tested this latest version on my American extract with -Xmx4000m:
With 1.2 million nodes the Java VM crashed due to lack of memory. Using
1 million nodes the split succeeded with 367 areas in 3:20 hours. Some
swapping was noticed (bad for speed).
Although I'd rather use the 1.2 million settin
Thanks for the info. Given that your run started printing out the areas being
split then yes you were very very close to getting the areas.list file. After
that's generated, the heap size drops right down and starts growing again
on the splitting pass. The smaller --max-nodes value you used woul
Oops, I just realized that I ran the splitter with --max-nodes=120k instead of
1.2 million. Dunno if this influences the test results...
Lambertus wrote:
> I've run this splitter against my America extract on a Centrino 2 laptop
> with 4 GB RAM; below are the results.
>
> Splitter was working on ~243 m
I've run this splitter against my America extract on a Centrino 2 laptop
with 4 GB RAM; below are the results.
Splitter was working on ~243 million nodes and Java had 3.9 GB heap
space (-Xmx3900m).
Memory usage before I went to bed: 4.3 GB virtual, 3.4 GB resident; Splitter
was calculating the areas t
Thank you. Here are my final results:
old splitter ( 51922 bytes): 196s (uncompressed)
new splitter (175493 bytes): 181s (uncompressed)
new splitter (175493 bytes): 197s (gzip)
new splitter (175493 bytes): 611s (bzip2)
Rudi
___
mkgmap-dev mailing list
Aha, thank you, this was very helpful! I've been doing most of my testing
with compressed osm files but I just did a few runs with an uncompressed
file instead and I see what you mean.
The problem was due to some buffering I'd added when reading the osm file.
It didn't make much difference for
Johann Gail writes:
> Just a thought from reading the thread:
> Multiple parsing runs with a bz2-zipped file could hurt
> performance. It would mean decompressing the input files multiple times.
> And in my experience decompressing bz2 costs a lot of resources.
>
> (In my case I'm di
Exactly, uncompressing is pretty hard on the CPU so ideally we'd want to
make as few passes as possible. There are other factors too though, eg I'm
thinking about doing the decompression on one thread and the parsing/splitting
on another thread if there is more than one core available. I haven't
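The decompress-on-one-thread idea above can be sketched like this (Python for illustration only; the splitter itself is Java, and the chunked producer/consumer shape is my assumption about how such a pipeline might look):

```python
import bz2
import queue
import threading

def pipelined_chunks(path, chunk_size=64 * 1024):
    """Yield decompressed chunks while a background thread does the
    bz2 work; a bounded queue keeps the producer from racing ahead."""
    q = queue.Queue(maxsize=8)

    def producer():
        with bz2.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                q.put(chunk)
        q.put(None)  # sentinel marking end of stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        chunk = q.get()
        if chunk is None:
            return
        yield chunk
```

The parser then iterates over `pipelined_chunks(...)` on its own thread; in CPython the bz2 codec releases the GIL during decompression, so the two threads genuinely overlap on a multi-core machine.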
> Multiple parsing runs could be made over the planet file instead
> however I can't see that offering very good performance.
>
>
Just a thought from reading the thread:
Multiple parsing runs with a bz2-zipped file could hurt
performance. It would mean multiple decompressing of
> Can you please tell me exactly what parameters you used to run both the
> old and the new splitters to get your results? Would it be possible to
> get hold of the osm file you used to run your test against? I'd be
> interested in replicating the poor performance so I can try to fix it.
Hi
I've built a new version that *might* be able to handle the planet OK. I
don't know how many areas North America breaks into, but if you're able
to handle 255 areas (at 1,600,000 nodes each) with an older version of the
splitter, then I think this version should work for the whole planet:
http
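As a quick sanity check on those figures (all numbers taken from this thread):

```python
# The old splitter's ceiling: 255 areas at 1,600,000 nodes each.
MAX_AREAS = 255
MAX_NODES_PER_AREA = 1_600_000

capacity = MAX_AREAS * MAX_NODES_PER_AREA  # 408,000,000 nodes
america_nodes = 243_000_000  # Lambertus' America extract, reported earlier
```

So a configuration that can handle 255 areas comfortably covers the ~243 million nodes of the America extract, which is why the whole planet looks feasible.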
>> old splitter: 370 sec
>> new splitter: 470 sec
>
> Hmm, I'm surprised the new version is taking longer actually. I didn't
> benchmark my most recent changes so perhaps I've done something silly.
> It should be the same speed or faster than the old. Thanks for the
> statistics, I'll take a look a
My latest changes were just quick fixes to get relations and 256+ tiles
working;
there's still more that needs to be done before this is really useful. In
particular, I need to prevent the splitter from hanging on to node and way
information that it doesn't need for a given set of 255 areas. In
> thanks a lot for the new splitter beta release. I tried your version with
> a small region (Bayern in Germany). Max nodes 800,000, 12 tiles:
>
> old splitter: 310 MB RAM
> new splitter: 240 MB RAM
> old splitter: 370 sec
> new splitter: 470 sec
Hmm, I'm surprised the new version is taking longer ac
Thanks for your work on this, much appreciated!
I'm trying to get a grip on how to use this version of splitter and
re-read your mail a few times. I hope I understand it:
So this version should eliminate the need to split the planet with
Osmosis before giving it to splitter. Just provide the pl
> Any feedback, questions or suggestions are welcome.
>
> Chris
Hi Chris,
thanks a lot for the new splitter beta release. I tried your version with a small
region (Bayern in Germany). Max nodes 800,000, 12 tiles:
old splitter: 310 MB RAM
new splitter: 240 MB RAM
old splitter: 370 sec
new splitter
I've now made some changes that remove the 4-area per relation limit and
also the 255 tile limit. The 255 tile fix is just a workaround for now, it
requires a full reprocess for each set of 255 tiles rather than tackling
them all in a single pass. This will still be significantly better than onl
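The cost of that workaround is easy to quantify: the input gets one full reprocess per batch of up to 255 tiles (a sketch; `full_parses_needed` is a made-up helper name, not splitter code):

```python
import math

def full_parses_needed(total_tiles, tiles_per_pass=255):
    """Each batch of up to 255 tiles costs one full reprocess
    of the input file under the workaround described above."""
    return math.ceil(total_tiles / tiles_per_pass)
```

So the 318-tile North/South America split mentioned elsewhere in this thread needs 2 full parses instead of failing outright at the 255-tile limit.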
Great info, thanks! I'll try this solution, it sounds almost perfect...
Chris Miller wrote:
>> Crap, this means I'll have to split North America into two sections
>> using Osmosis. At ground level this means broken routing between the
>> sections and that some cities/villages will be divided into
This works perfectly; I have used it to get all relations into the split
files.
The total split is very slow because each run extracts only 4 tiles, but
compute time is cheap :)
Your improvements will be a big step forward.
On Aug 4, 2009, at 10:06 AM, Chris Miller wrote:
> Crap, this means I'll have to split North America into two sections
> using Osmosis. At ground level this means broken routing between the
> sections and that some cities/villages will be divided into two...
I think you may have misunderstood me here. You can run the splitter against
as big an
Chris Miller wrote:
>> Ah, this might explain the 'POI only' tiles I have without good reason
>> in the NE USA and Canada. Is there a possibility for a quick fix for
>> this behavior, as this would be most welcome...? :-)
>
> Not really... currently the limitation is that each node, way and relati
> What currently happens when not enough memory is available is of course
> that the heap is getting swapped in and out to disk by the os' memory
> manager. This is so slow that it's not workable. Do you think that
> doing this intelligently in Splitter will be much faster? I.e. would
> it actually
Chris Miller wrote:
> Note that I've started working on overhauling the splitter to try to reduce
> the memory requirements as much as possible without compromising too much
> on performance. I'm also hoping to overcome the limitations on the number
> of tiles (255) and number of areas for each
Note that I've started working on overhauling the splitter to try to reduce
the memory requirements as much as possible without compromising too much
on performance. I'm also hoping to overcome the limitations on the number
of tiles (255) and number of areas for each relation (currently 4). My p
Paul Ortyl wrote:
Could you explain why you write that
"There is a maximum of 255 output files. This should anyway be enough
with the current amount of data."
?
That statement is not true with the current splitter version. When
splitting North/South America I get 318 tiles.
Hi
> Could you please tell me which "Map" would have to be "reimplemented"?
> There were a lot of changes (and file deletions) since version 3.
Yes, it is completely different code.
The original code (r3) did nothing except attempt to save the nodes to a
Berkeley DB. I then gave up on it and wro
2009/7/1 Steve Ratcliffe :
> Hi
>
>> In Perl it is the built-in "tie" functionality; I have, however, no idea
>> what data structure is used in mkgmap and the splitter and how to
>> translate the trick into Java.
>> If you think the change is trivial and can point me to the critical
>> section I might
Hi
> In Perl it is the built-in "tie" functionality; I have, however, no idea
> what data structure is used in mkgmap and the splitter and how to
> translate the trick into Java.
> If you think the change is trivial and can point me to the critical
> section I might see if I can get it implemented.
I was
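For readers unfamiliar with Perl's `tie`: it binds an ordinary hash to a disk-backed store, so lookups go to disk (and the OS page cache) instead of the process heap. A Python analogue of the idea (purely illustrative; the helper names and the lat/lon encoding are made up, and a Java port would more likely use a memory-mapped file or an embedded key-value store):

```python
import dbm.dumb  # pure-Python dbm backend, available everywhere
import os
import tempfile

# Disk-backed node store in the spirit of Perl's tie: the dict-like
# store persists entries to files instead of holding them on the heap.
_path = os.path.join(tempfile.mkdtemp(), "nodes")
_store = dbm.dumb.open(_path, "c")

def put_node(node_id, lat, lon):
    """Persist a node's coordinates keyed by its OSM id."""
    _store[str(node_id).encode()] = f"{lat},{lon}".encode()

def get_node(node_id):
    """Fetch coordinates back from the disk-backed store."""
    lat, lon = _store[str(node_id).encode()].decode().split(",")
    return float(lat), float(lon)
```

The trade is the same one discussed in this thread: far less heap, at the price of per-lookup I/O, which is still much faster than letting the OS thrash a too-large heap through swap.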
2009/7/1 WessexMario :
> Instead of converting code to use RAM/hard disk, why don't you just increase
> your computer's virtual memory?
> Whether Linux (swap space) or Windows (paging file), as long as you have
> spare hard drive space you can increase virtual memory very easily, which
> will stop
Instead of converting code to use RAM/hard disk, why don't you just
increase your computer's virtual memory?
Whether Linux (swap space) or Windows (paging file), as long as you have
spare hard drive space you can increase virtual memory very easily,
which will stop program failures due to memory
2009/7/1 Lambertus :
> Just to get some figures to talk about: On an 8-core machine with 8 GB of RAM
> it takes about 4 hours to split North/South America and about 1 hour to split
> the rest of the world. The pre-splitting of the planet into those two large
> sections is done using Osmosis in my case.
Just to get some figures to talk about: On an 8-core machine with 8 GB of
RAM it takes about 4 hours to split North/South America and about 1 hour
to split the rest of the world. The pre-splitting of the planet into
those two large sections is done using Osmosis in my case.
So using a 10x processi
Hi,
I used to process the europe.osm (from geofabrik.de) file using
splitter. It does not work (in finite time) any more because of the
4 GB limit of my hardware RAM.
I have a suggestion, which comes from my experience years ago, where I
had to crunch gigabytes of textual data using Perl. In th