Hello James, I wanted to split the planet into overlapping bboxes like this (click to see actual size): http://dev.openstreetmap.de/gosmore/
On talk I described how I was dissatisfied with osmosis's memory consumption. So I came up with this observation: most entities will end up in one or two extracts. And when it's two, it's in a pattern that is often repeated, say the Africa bbox and the Middle East bbox. Never Africa and Canada. So of the 2^168 possible combinations, only around 3000 are actually used.

So bboxSplit allocates 16 bits for each entity. Those are then indexes into the array of 'younions'. If a new node comes along, I check it against the list of bboxes and it typically matches 1 or 2. To find out quickly if I already have that combination of bboxes, I also have an STL map on the array of younions. A hashtable would have been faster. Ways and relations also trigger the code that merges younions.

bboxSplit is faster than the corresponding bunzip and any program that uses libxml, i.e. very fast.

Regards,
Nic

On Sat, Mar 13, 2010 at 10:03 PM, [email protected] <[email protected]> wrote:
> That is very deep c++ code!
> care to comment on how it works?
> would be very interested to understand its performance ! looks very fast.
> mike
>
> On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <[email protected]> wrote:
>>
>> My understanding is that all Xml compliant* parsers will abort at the
>> file offsets that Frederik mentions.
>> My advice is to use the egrep filter when in doubt, because you will
>> lose no more than a dozen lines in a planet file of billions of
>> lines.
>>
>> *: (My split program is not compliant and will happily ignore these
>> errors:
>>
>> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp)
>>
>> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <[email protected]>
>> wrote:
>> > Will this also be a problem if you try to import via osm2pgsql into
>> > postgres?
>> >
>> > Thanks,
>> >
>> > John
>> >
>> > On 3/13/10, hbogner <[email protected]> wrote:
>> >> Thx for help, I'll try it.
>> >>
>> >> Now I have to follow 'dev' too :D
>> >>
>> >> Nic Roets wrote:
>> >>> There's a bug in the code that generated this week's planet. You
>> >>> should either wait until next week or filter the planet with the
>> >>> following command:
>> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
>> >>>
>> >>> There has been a long discussion on 'dev', mentioning other remedies.
>> >>>
>> >>
>> >> _______________________________________________
>> >> talk mailing list
>> >> [email protected]
>> >> http://lists.openstreetmap.org/listinfo/talk
>> >>
>> >
>> > --
>> > John J. Mitchell
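
The indexing scheme described at the top of this message (a 16-bit id per entity, pointing into an array of bbox combinations, with an STL map for lookup) could be sketched roughly as below. This is only an illustration and not the code from bboxSplit.cpp: the names `BBox`, `BBoxSet`, `UnionTable`, `idFor`, `merge` and `classifyNode` are all made up here, the combination is stored as a `std::set<int>` for simplicity, and it is an assumption that merging two combinations means taking their set union.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical bounding box; the field names are assumptions.
struct BBox {
  double minLat, minLon, maxLat, maxLon;
  bool contains(double lat, double lon) const {
    return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
  }
};

// One combination of bboxes, stored as the sorted set of bbox indexes.
using BBoxSet = std::set<int>;

// Array of distinct combinations plus an ordered map for lookup,
// so every entity only needs to store a 16-bit index.
class UnionTable {
 public:
  // Return the id of this combination, adding it if it is new.
  std::uint16_t idFor(const BBoxSet &s) {
    auto it = index_.find(s);
    if (it != index_.end()) return it->second;
    if (sets_.size() > 0xFFFF)
      throw std::overflow_error("more than 65536 distinct combinations");
    auto id = static_cast<std::uint16_t>(sets_.size());
    sets_.push_back(s);
    index_.emplace(s, id);
    return id;
  }

  // Assumed merge semantics: the union of two existing combinations.
  std::uint16_t merge(std::uint16_t a, std::uint16_t b) {
    assert(a < sets_.size() && b < sets_.size());
    BBoxSet u = sets_[a];  // copy, because idFor() may grow sets_
    u.insert(sets_[b].begin(), sets_[b].end());
    return idFor(u);
  }

  const BBoxSet &members(std::uint16_t id) const {
    assert(id < sets_.size());
    return sets_[id];
  }

  std::size_t size() const { return sets_.size(); }

 private:
  std::vector<BBoxSet> sets_;               // id -> combination
  std::map<BBoxSet, std::uint16_t> index_;  // combination -> id
};

// Test a node against every bbox and return the id of the matching set.
std::uint16_t classifyNode(const std::vector<BBox> &boxes, double lat,
                           double lon, UnionTable &table) {
  BBoxSet hit;
  for (std::size_t i = 0; i < boxes.size(); ++i)
    if (boxes[i].contains(lat, lon)) hit.insert(static_cast<int>(i));
  return table.idFor(hit);
}
```

A caller would keep one `std::uint16_t` per node, filled in by `classifyNode`. For a way, one plausible approach (again an assumption) is to fold `merge` over the ids of its member nodes. Swapping `std::map` for a hash table would need a hash function for `BBoxSet`, which is the trade-off hinted at by "a hashtable would have been faster".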

