FYI, the import wiki page is and
there are separate sub-pages for each type of data.  It looks like User:Tatata
wrote one script that is being used?

On Mon, Jun 20, 2011 at 4:05 PM, Frederik Ramm <> wrote:

> Hi,
>   is someone on this list involved in OSM in Japan? I'll go to talk-jp with
> the issue if not, but maybe the right people are reading this here also.
> I noticed that a lot of data has been imported from a "KSJ2" data set, and
> this data has many tags that I consider unnecessary.
> The whole import seems to comprise about 3.5 million nodes, 680k ways, and
> 9000 relations.
> 3.3 million nodes are tagged with something like
>    <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
>    <tag k="KSJ2:lat" v="32.787857"/>
>    <tag k="KSJ2:long" v="130.687672"/>
> which means that the node coordinates are stored three times - once in the
> node itself and twice in the tags.
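One way to check the redundancy described above on a local .osm XML extract is sketched below. This is only a rough illustration: it assumes the KSJ2:lat and KSJ2:long values are plain decimal strings as in the quoted example, and the function name and the tolerance are invented here, not taken from any import script.

```python
import xml.etree.ElementTree as ET


def redundant_coordinate_nodes(osm_xml, tolerance=1e-6):
    """Return ids of nodes whose KSJ2:lat/KSJ2:long tags repeat the node's own lat/lon."""
    root = ET.fromstring(osm_xml)
    redundant = []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if "KSJ2:lat" not in tags or "KSJ2:long" not in tags:
            continue
        try:
            same_lat = abs(float(tags["KSJ2:lat"]) - float(node.get("lat"))) <= tolerance
            same_lon = abs(float(tags["KSJ2:long"]) - float(node.get("lon"))) <= tolerance
        except (TypeError, ValueError):
            continue  # skip nodes with missing or malformed values
        if same_lat and same_lon:
            redundant.append(node.get("id"))
    return redundant


# Hypothetical two-node extract modelled on the tags quoted in the mail.
sample = """<osm version="0.6">
  <node id="1" lat="32.787857" lon="130.687672">
    <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
    <tag k="KSJ2:lat" v="32.787857"/>
    <tag k="KSJ2:long" v="130.687672"/>
  </node>
  <node id="2" lat="35.0" lon="135.0"/>
</osm>"""

print(redundant_coordinate_nodes(sample))
```

If the tag values ever differ from the node position (for example after someone moved a node), those nodes would not be listed, which might be worth looking at before deleting anything.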
> About 3.3 million objects are tagged with something like
>    <tag k="note" v="National-Land Numerical Information (Railway) 2007,
> MLIT Japan"/>
>    <tag k="note:ja" v="??????(?????)??19??????"/>
>    <tag k="source" v="KSJ2"/>
>    <tag k="source_ref" v="ksj/jpgis/datalist/KsjTmplt-N02-v1_1.html"/>
> which is a lot of text where in my opinion a simple source tag on the
> changeset would have been sufficient. (The overwhelming majority of
> source_ref tags, 2.9 million, point to "KsjTmplt-N03.html", but another 17
> are in use; the distribution for note:ja is similar, with two messages being
> used 1.8 and 1.0 million times respectively, and a handful of others in
> use.)
> 3.1 million nodes used by ways are tagged with something like
>    <tag k="KSJ2:curve_id" v="c00100298"/>
>    <tag k="KSJ2:filename" v="N03-090320_40_new.xml"/>
> which strikes me as a bit unnecessary as well; if really required, then
> that could go on the way using the nodes and not on every single node!
> In addition to that, we have 1.1 million objects tagged with
>    <tag k="created_by" v="National-Land-Numerical-Information_MLIT_Japan"/>
> - also something that we usually put on changesets, and that seems to
> duplicate information already in the source and note tags.
> There are also about 360k occurrences, on nodes used by ways, of the tags
> KSJ2:INT, KSJ2:INT_label, KSJ2:LIN, KSJ2:OPC, KSJ2:RAC; I have no idea what
> these are for but do they have to go on the nodes really?
> I would like to see this (in my opinion) superfluous information removed.
> We would get rid of about 30 million tags. The size of the Japan dataset (in
> XML form) would shrink by 13% from 13.1 to 11.5 GB, the .osm.pbf would
> shrink by 14% from 585 to 501 MB. About 1 GB of database storage would be
> saved on the central OSM database server.
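A rough sketch of what such a cleanup could look like on a local .osm XML file follows. It is not the procedure anyone has agreed on: the key list is just the node-level keys named in this mail, the function name is made up, and a real cleanup would have to go through the API with proper changesets rather than rewriting a file.

```python
import xml.etree.ElementTree as ET

# Node-level keys named in the mail; whether all of them are safe to drop
# is exactly the open question, so treat this set as an assumption.
REDUNDANT_NODE_KEYS = {
    "KSJ2:coordinate",
    "KSJ2:lat",
    "KSJ2:long",
    "KSJ2:curve_id",
    "KSJ2:filename",
    "created_by",
}


def strip_node_tags(osm_xml, keys=REDUNDANT_NODE_KEYS):
    """Remove the given tag keys from all nodes; return (cleaned XML, tags removed)."""
    root = ET.fromstring(osm_xml)
    removed = 0
    for node in root.iter("node"):
        # Copy the list first so removal does not disturb iteration.
        for tag in list(node.findall("tag")):
            if tag.get("k") in keys:
                node.remove(tag)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical one-node extract; the "name" tag stands in for data worth keeping.
sample = """<osm version="0.6">
  <node id="1" lat="32.787857" lon="130.687672">
    <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
    <tag k="KSJ2:lat" v="32.787857"/>
    <tag k="KSJ2:long" v="130.687672"/>
    <tag k="name" v="Example"/>
  </node>
</osm>"""

cleaned, removed = strip_node_tags(sample)
print(removed)
```

Tags on ways and relations are deliberately left alone here, since the suggestion above is that things like curve_id could live on the way if they are needed at all.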
> Needless to say, any software that processes the Japan dataset would also
> run faster and consume fewer resources.
> Can anybody comment on this? Are any of the tags that I mentioned above
> actually used by anyone for anything?
> In addition, there are 22 multipolygons from the same import, with more
> than 1000 members each (the top three being #1337942 with 10865 members,
> #1060553 with 5637, and #1069424 with 4518). While it is not wrong for a
> multipolygon to have so many members, this makes the affected areas very
> difficult to render and edit, and has the potential to bring unsuspecting
> relation processing software to a halt. Most of these multipolygons cannot
> even be downloaded via the API because it takes so long. I would like these
> multipolygons (all natural=wood I believe) split up into smaller entities.
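To find the affected relations in a local .osm XML extract, something along these lines might do. It is a sketch only: the function name is invented, the threshold of 1000 is simply the figure from the mail, and it reads the whole file into memory, which would not be practical for the full Japan dataset.

```python
import xml.etree.ElementTree as ET


def large_multipolygons(osm_xml, threshold=1000):
    """Return (relation id, member count) for multipolygons above the threshold, largest first."""
    root = ET.fromstring(osm_xml)
    result = []
    for rel in root.iter("relation"):
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        if tags.get("type") != "multipolygon":
            continue
        count = len(rel.findall("member"))
        if count > threshold:
            result.append((rel.get("id"), count))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def _relation(rel_id, members):
    """Build a hypothetical multipolygon relation with the given number of way members."""
    refs = "".join('<member type="way" ref="%d" role="outer"/>' % i for i in range(members))
    return '<relation id="%s">%s<tag k="type" v="multipolygon"/></relation>' % (rel_id, refs)


# Tiny made-up extract; a threshold of 2 stands in for the 1000 from the mail.
sample = "<osm>" + _relation("10", 5) + _relation("11", 2) + _relation("12", 3) + "</osm>"

print(large_multipolygons(sample, threshold=2))
```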
> It would be great if someone involved with the Japan community could deal
> with these issues; but I would also be willing to do it myself if that's ok
> with the community in Japan.
> Finally, I am unsure if the KSJ2 import is even complete; if it is not, and
> still ongoing, then the numbers reported above might not even be the last
> word. In that case I would like to ask whoever is masterminding the import
> to maybe modify their scripts to include fewer superfluous tags. (Objects in
> question seem to be uploaded by a variety of users so I cannot detect from
> the object history alone who runs the import.)
> Bye
> Frederik
> --
> Frederik Ramm  ##  eMail  ##  N49°00'09" E008°23'33"
> _______________________________________________
> talk mailing list
> http://lists.openstreetmap.org/listinfo/talk