FYI, the import wiki page is http://wiki.openstreetmap.org/wiki/Import/Catalogue/Japan_KSJ2_Import and there are separate sub-pages for each type of data. It looks like User:Tatata wrote a script that is being used?
On Mon, Jun 20, 2011 at 4:05 PM, Frederik Ramm <frede...@remote.org> wrote:
> Hi,
>
> is someone on this list involved in OSM in Japan? I'll go to talk-jp with
> the issue if not, but maybe the right people are reading this here also.
>
> I noticed that a lot of data has been imported from a "KSJ2" data set, and
> this data has many tags that I consider unnecessary.
>
> The whole import seems to comprise about 3.5 million nodes, 680k ways, and
> 9000 relations.
>
> 3.3 million nodes are tagged with something like
>
>   <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
>   <tag k="KSJ2:lat" v="32.787857"/>
>   <tag k="KSJ2:long" v="130.687672"/>
>
> which means that the node coordinates are stored three times - once in the
> node itself and twice in the tags.
>
> About 3.3 million objects are tagged with something like
>
>   <tag k="note" v="National-Land Numerical Information (Railway) 2007, MLIT Japan"/>
>   <tag k="note:ja" v="??????(?????)??19??????"/>
>   <tag k="source" v="KSJ2"/>
>   <tag k="source_ref" v="http://nlftp.mlit.go.jp/ksj/jpgis/datalist/KsjTmplt-N02-v1_1.html"/>
>
> which is a lot of text where in my opinion a simple source tag on the
> changeset would have been sufficient. (The overwhelming majority of
> source_ref tags, 2.9 million, point to "KsjTmplt-N03.html", but another 17
> are in use; the distribution for note:ja is similar, with two messages being
> used 1.8 and 1.0 million times respectively, and a handful of others in
> use.)
>
> 3.1 million nodes used by ways are tagged with something like
>
>   <tag k="KSJ2:curve_id" v="c00100298"/>
>   <tag k="KSJ2:filename" v="N03-090320_40_new.xml"/>
>
> which strikes me as a bit unnecessary as well; if really required, then
> that could go on the way using the nodes and not on every single node!
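To make the triple-stored coordinates concrete, here is a rough Python sketch of what a cleanup pass over an OSM XML extract might look like. It is not the import script and not anything that has been run on the real data; the function name and the tolerance are made up for illustration. It only drops a KSJ2:coordinate, KSJ2:lat or KSJ2:long tag when its value really matches the node's own lat/lon, and leaves everything else alone.

```python
import xml.etree.ElementTree as ET

# Tag keys quoted above as repeating the node position.
COORDINATE_KEYS = {"KSJ2:coordinate", "KSJ2:lat", "KSJ2:long"}


def _repeats_position(key, value, lat, lon, tol=1e-6):
    """Return True if the tag value only restates the node's own lat/lon.

    The tolerance is an assumption chosen for this sketch.
    """
    try:
        numbers = [float(part) for part in value.split()]
    except ValueError:
        return False
    if key == "KSJ2:lat":
        expected = [lat]
    elif key == "KSJ2:long":
        expected = [lon]
    else:  # KSJ2:coordinate holds "lat lon"
        expected = [lat, lon]
    if len(numbers) != len(expected):
        return False
    return all(abs(a - b) <= tol for a, b in zip(numbers, expected))


def strip_duplicate_coordinate_tags(osm_xml):
    """Remove coordinate tags that duplicate the node position (hypothetical helper)."""
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        if node.get("lat") is None or node.get("lon") is None:
            continue  # nothing to compare against
        lat = float(node.get("lat"))
        lon = float(node.get("lon"))
        for tag in list(node.findall("tag")):
            key = tag.get("k")
            if key in COORDINATE_KEYS and _repeats_position(
                key, tag.get("v", ""), lat, lon
            ):
                node.remove(tag)
    return ET.tostring(root, encoding="unicode")
```

A tag whose value disagrees with the node position is kept on purpose, since that would be worth a look by a human rather than a silent delete.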
>
> In addition to that, we have 1.1 million objects tagged with
>
>   <tag k="created_by" v="National-Land-Numerical-Information_MLIT_Japan"/>
>
> - also something that we usually put on changesets, and that seems to
> duplicate information already in the source and note tags.
>
> There are also about 360k occurrences, on nodes used by ways, of the tags
> KSJ2:INT, KSJ2:INT_label, KSJ2:LIN, KSJ2:OPC, KSJ2:RAC; I have no idea what
> these are for but do they have to go on the nodes really?
>
> I would like to see this (in my opinion) superfluous information removed.
> We would get rid of about 30 million tags. The size of the Japan dataset (in
> XML form) would shrink by 13% from 13.1 to 11.5 GB, the .osm.pbf would
> shrink by 14% from 585 to 501 MB. About 1 GB of database storage would be
> saved on the central OSM database server.
>
> Needless to say, any software that processes the Japan dataset would also
> run faster and consume fewer resources.
>
> Can anybody comment on this? Are any of the tags that I mentioned above
> actually used by anyone for anything?
>
> In addition, there are 22 multipolygons from the same import, with more
> than 1000 members each (the top three being #1337942 with 10865 members,
> #1060553 with 5637, and #1069424 with 4518). While it is not wrong for a
> multipolygon to have so many members, this makes the affected areas very
> difficult to render and edit, and has the potential to bring unsuspecting
> relation processing software to a halt. Most of these multipolygons cannot
> even be downloaded via the API because it takes so long. I would like these
> multipolygons (all natural=wood I believe) split up into smaller entities.
>
> It would be great if someone involved with the Japan community could deal
> with these issues; but I would also be willing to do it myself if that's ok
> with the community in Japan.
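For anyone who wants to find the oversized multipolygons in their own extract, something along these lines should do as a starting point. Again this is only a sketch in Python: the function name is invented, the default threshold of 1000 is taken from the message above, and it reads plain OSM XML into memory, so it would not be suitable for the full 13 GB Japan file as written.

```python
import xml.etree.ElementTree as ET


def large_multipolygons(osm_xml, min_members=1000):
    """List (relation id, member count) for multipolygons with more than
    min_members members, largest first (hypothetical helper)."""
    root = ET.fromstring(osm_xml)
    found = []
    for relation in root.iter("relation"):
        tags = {t.get("k"): t.get("v") for t in relation.findall("tag")}
        if tags.get("type") != "multipolygon":
            continue  # only multipolygon relations are of interest here
        member_count = len(relation.findall("member"))
        if member_count > min_members:
            found.append((relation.get("id"), member_count))
    # Largest relations first, so the worst offenders are at the top.
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

The result is just a list of candidates; deciding how to split each one into smaller entities is still manual work.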
>
> Finally, I am unsure if the KSJ2 import is even complete; if it is not, and
> still ongoing, then the numbers reported above might not even be the last
> word. In that case I would like to ask whoever is masterminding the import
> to maybe modify their scripts to include fewer superfluous tags. (Objects in
> question seem to be uploaded by a variety of users so I cannot detect from
> the object history alone who runs the import.)
>
> Bye
> Frederik
>
> --
> Frederik Ramm ## eMail frede...@remote.org ## N49°00'09" E008°23'33"
_______________________________________________ talk mailing list talk@openstreetmap.org http://lists.openstreetmap.org/listinfo/talk