Pray tell, how do you process the compressed bzip data? Is there documentation on this? Thanks,
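The block-index idea in mike's earlier message, quoted below, depends on a property of the bzip2 format. Each compressed block starts with the 48-bit magic number 0x314159265359, which is the ASCII bytes "1AY&SY" when it happens to be byte-aligned, and the stream ends with the marker 0x177245385090. Blocks are not byte-aligned, so tools like bzip2recover locate them by sliding a 48-bit window over the input one bit at a time. The following minimal C++ sketch does only that scan and reports bit offsets, which is the kind of information a block index would store. The names (findBlockOffsets, kBlockMagic) are hypothetical, and this is not code from gosmore or bzip2. The magic could in principle also occur by chance inside compressed data, so a real index would need to check each candidate, e.g. by trying to decode the block.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// 48-bit magic number that starts every compressed block in a bzip2 stream.
constexpr std::uint64_t kBlockMagic = 0x314159265359ULL;
constexpr std::uint64_t kMask48 = (std::uint64_t{1} << 48) - 1;

// Returns the bit offsets (from the start of the input) at which a block
// magic begins. bzip2 blocks are not byte-aligned, so every input bit is
// shifted through a 48-bit window, most significant bit of each byte first.
std::vector<std::uint64_t> findBlockOffsets(std::istream& in) {
    std::vector<std::uint64_t> offsets;
    std::uint64_t window = 0;
    std::uint64_t bitsSeen = 0;
    char c;
    while (in.get(c)) {
        const unsigned byte = static_cast<unsigned char>(c);
        for (int i = 7; i >= 0; --i) {
            window = ((window << 1) | ((byte >> i) & 1u)) & kMask48;
            ++bitsSeen;
            if (bitsSeen >= 48 && window == kBlockMagic)
                offsets.push_back(bitsSeen - 48);  // offset of the magic's first bit
        }
    }
    return offsets;
}

// Convenience overload for data already in memory.
std::vector<std::uint64_t> findBlockOffsets(const std::string& bytes) {
    std::istringstream in(bytes);
    return findBlockOffsets(in);
}
```

On a real .bz2 file you would pass an std::ifstream opened with std::ios::binary. The scan is one sequential pass, so it never needs the whole file in memory. A standard file starts with "BZh9" followed by the first block, so the first reported offset is bit 32.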
On Sat, Mar 13, 2010 at 9:51 PM, Nic Roets <[email protected]> wrote:
> No. It runs on the uncompressed planet, like this:
> bzcat /osm/planet-10*.osm.bz2 | /osm/gosmore/bboxSplit \
> -85.05113 73.12500 9.44906 180.00000 gzip 0720048510241024.osm.gz \
> -25.48295 120.58594 72.91964 180.00000 gzip 0855020310240587.osm.gz \
> -85.05113 98.43750 13.23995 172.61719 gzip 0792047410031024.osm.gz \
> ...
>
> I'm not too worried about further optimizations: unlike Wikipedia,
> there isn't the same urgency to keep it up to date, except for disaster
> relief.
>
>
> On Sat, Mar 13, 2010 at 10:42 PM, [email protected]
> <[email protected]> wrote:
> > Are you bunzipping the data? Are you scanning the bzip blocks?
> > You say it is faster than the bunzip, but maybe you mean that it is very fast.
> >
> > I have experimented with bzip2recover to extract blocks on their own.
> > I made a Perl script to extract blocks from a Wikipedia file, so that
> > processing of the huge file can be run by many people in parallel.
> >
> > https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia
> >
> > It is a tool to extract lat/long coords from the Wikipedia articles.
> >
> > Processing the large files like this would allow us to team up and all help.
> > We really just need an index file of all the blocks so that we can
> > find the ones that we need. Imagine being able to process the bzip file
> > directly!
> >
> > mike
> >
> > On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets <[email protected]> wrote:
> >>
> >> Hello James,
> >>
> >> I wanted to split the planet into overlapping bboxes like this (click
> >> to see actual size):
> >> http://dev.openstreetmap.de/gosmore/
> >>
> >> On talk I described how I was dissatisfied with Osmosis's memory
> >> consumption. So I came up with this observation: most entities will
> >> end up in one or two extracts.
> >> And when it's two, it's in a pattern
> >> that is often repeated, say the Africa bbox and the Middle East bbox, but
> >> never Africa and Canada. So of the 2^168 possible combinations, only around
> >> 3000 are actually used.
> >>
> >> So bboxSplit allocates 16 bits for each entity. Those are then indexes
> >> into the array of 'younions'. When a new node comes along, I check it
> >> against the list of bboxes, and it typically matches 1 or 2. To find out
> >> quickly whether I already have that combination of bboxes, I also keep an
> >> STL map on the array of younions. A hashtable would have been faster.
> >>
> >> Ways and relations also trigger the code that merges younions.
> >>
> >> bboxSplit is faster than the corresponding bunzip and any program that
> >> uses libxml, i.e. very fast.
> >>
> >> Regards,
> >> Nic
> >>
> >> On Sat, Mar 13, 2010 at 10:03 PM, [email protected]
> >> <[email protected]> wrote:
> >> > That is very deep C++ code!
> >> > Care to comment on how it works?
> >> > I would be very interested to understand its performance! It looks very
> >> > fast.
> >> > mike
> >> >
> >> > On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <[email protected]> wrote:
> >> >>
> >> >> My understanding is that all XML-compliant* parsers will abort at the
> >> >> file offsets that Frederik mentions.
> >> >> My advice is to use the egrep filter when in doubt, because you will
> >> >> lose no more than a dozen lines in a planet file of billions of
> >> >> lines.
> >> >>
> >> >> *: (My split program is not compliant and will happily ignore these
> >> >> errors:
> >> >> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp )
> >> >>
> >> >> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <[email protected]>
> >> >> wrote:
> >> >> > Will this also be a problem if you try to import via osm2pgsql into
> >> >> > postgres?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > John
> >> >> >
> >> >> > On 3/13/10, hbogner <[email protected]> wrote:
> >> >> >> Thx for help, I'll try it.
> >> >> >>
> >> >> >> Now I have to follow 'dev' too :D
> >> >> >>
> >> >> >> Nic Roets wrote:
> >> >> >>> There's a bug in the code that generated this week's planet. You
> >> >> >>> should either wait until next week or filter the planet with the
> >> >> >>> following command:
> >> >> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
> >> >> >>>
> >> >> >>> There has been a long discussion on 'dev', mentioning other
> >> >> >>> remedies.
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >> _______________________________________________
> >> >> >> talk mailing list
> >> >> >> [email protected]
> >> >> >> http://lists.openstreetmap.org/listinfo/talk
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > John J. Mitchell
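Nic's description of the 'younions' in bboxSplit can be sketched roughly as follows. This is not the code in bboxSplit.cpp; it is a minimal C++ illustration in which every name (BBox, BoxSet, classifyNode, YounionTable) is hypothetical, and the 168-bit width is an assumption taken from the 2^168 figure in the thread. Each distinct combination of bboxes is stored once in a vector, each entity would keep only a 16-bit index into that vector, and an std::map from combination to index answers "do I already have this combination?". Merging, as ways and relations need, is a bitwise OR of two combinations followed by the same lookup.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

// Assumption: one bit per extract, 168 extracts.
constexpr std::size_t kMaxBoxes = 168;
using BoxSet = std::bitset<kMaxBoxes>;

struct BBox { double minLat, minLon, maxLat, maxLon; };

// std::bitset has no operator<, so give std::map a lexicographic ordering.
struct BoxSetLess {
    bool operator()(const BoxSet& a, const BoxSet& b) const {
        for (std::size_t i = 0; i < kMaxBoxes; ++i)
            if (a[i] != b[i]) return b[i];  // first differing bit decides
        return false;
    }
};

// Sets bit i for every bbox that contains the point.
BoxSet classifyNode(double lat, double lon, const std::vector<BBox>& boxes) {
    BoxSet s;
    for (std::size_t i = 0; i < boxes.size() && i < kMaxBoxes; ++i) {
        const BBox& b = boxes[i];
        if (lat >= b.minLat && lat <= b.maxLat && lon >= b.minLon && lon <= b.maxLon)
            s.set(i);
    }
    return s;
}

class YounionTable {
public:
    // Returns the 16-bit index of this combination, appending it if it is new.
    std::uint16_t intern(const BoxSet& set) {
        auto it = index_.find(set);
        if (it != index_.end()) return it->second;
        assert(younions_.size() <= std::numeric_limits<std::uint16_t>::max() &&
               "too many combinations for a 16-bit index");
        const auto id = static_cast<std::uint16_t>(younions_.size());
        younions_.push_back(set);
        index_.emplace(set, id);
        return id;
    }

    // Combination covering both inputs, e.g. a way and one of its nodes.
    std::uint16_t merge(std::uint16_t a, std::uint16_t b) {
        if (a == b) return a;
        return intern(younions_[a] | younions_[b]);
    }

    const BoxSet& boxes(std::uint16_t id) const { return younions_[id]; }
    std::size_t size() const { return younions_.size(); }

private:
    std::vector<BoxSet> younions_;                        // one entry per distinct combination
    std::map<BoxSet, std::uint16_t, BoxSetLess> index_;  // combination -> index
};
```

Because only a few thousand combinations occur in practice (around 3000 by Nic's count), a 16-bit index is enough, and each entity costs 2 bytes instead of a 21-byte bit set. Nic notes that a hashtable would have been faster than the map. In this sketch, that swap would be std::unordered_map<BoxSet, std::uint16_t>, which needs no custom hash because the standard library provides std::hash for std::bitset.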
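Nic's egrep filter drops every line that contains a numeric character reference, i.e. "&#" followed by digits and ";". If you need the same behaviour inside your own tool rather than in the shell pipeline, a minimal C++ sketch with std::regex follows. The function names are hypothetical. Like the egrep pattern, it also matches the degenerate "&#;" and drops lines that contain valid references such as "&#39;", which is the loss Nic estimates at about a dozen lines per planet.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

// Same pattern as the egrep command: "&#", any run of digits, then ";".
bool hasNumericCharRef(const std::string& line) {
    static const std::regex pattern("&#[0-9]*;");
    return std::regex_search(line, pattern);
}

// Copies every line that does NOT match, like `egrep -v`.
void filterLines(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line))
        if (!hasNumericCharRef(line))
            out << line << '\n';
}

// Convenience wrapper for in-memory text.
std::string filterText(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    filterLines(in, out);
    return out.str();
}
```

A program built around filterLines(std::cin, std::cout) could sit in the same position in the pipe as egrep. std::regex tends to be much slower than grep, though, so for a full planet the original egrep command is likely the better choice.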

