Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-05 Thread Brett Henderson
On Sun, Oct 5, 2008 at 2:12 AM, Matt Amos [EMAIL PROTECTED] wrote:

 On Sat, Oct 4, 2008 at 9:36 AM, Florian Lohoff [EMAIL PROTECTED] wrote:
  To get the ROMA database in sync again i replaced the notes with
  broken-utf8 - as notes typically don't get rendered, that's not a problem
  for me though. ROMA was down for half a day before i discovered the
  broken files and fixed them ...

 likewise. the easiest way to fix them was hand-editing the change
 files. i don't find it to be particularly onerous - just the price we
 pay for being on the bleeding edge ;-)


The corrupted data in the db has been fixed by TomH and I've re-generated
the changeset files.  Unfortunately it's too late for you guys now ...
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Florian Lohoff
On Sat, Oct 04, 2008 at 03:24:12PM +1000, Brett Henderson wrote:
 Another 2 change files contain utf-8 bugs and osmosis refuses to process
 them:
 
 200810031022-200810031023.osc
 200810031023-200810031024.osc
   
 I've tested both of these files and they seem okay.  The only problem I 
 can find is way 27483626 which has a broken note tag in file 
 2008100310-2008100311.osc.  Are you sure these files are broken?

wget -O - http://planet.openstreetmap.org/minute/200810031022-200810031023.osc.gz \
  | gzip -d | iconv -f utf8 -t utf8
[...]
<way id="14783001" timestamp="2008-10-03T10:22:11Z" user="logictheo">
  <nd ref="145957773"/>
  <nd ref="163161140"/>
  <nd ref="146004252"/>
  <nd ref="301736490"/>
  <tag k="name" v="Οδός Ιουστινιανού"/>
  <tag k="created_by" v="Potlatch 0.6a"/>
  <tag k="highway" v="residential"/>
  <tag k="name:en" v="Ioustinianou Street"/>
  <tag k="note" v="Ρώτησα ένα φίλο που μένει καιρό εδώ εάν αυτός ήταν κάποτε δρόμος. Το κοίταξα και από κοντά. Βλέπω μπάρες και στις 2 άκρες που είναι για να εμπ
iconv: illegal input sequence at position 16342


wget -O - http://planet.openstreetmap.org/minute/200810031023-200810031024.osc.gz \
  | gzip -d | iconv -f utf8 -t utf8
[...]
<way id="27483626" timestamp="2008-10-03T10:23:02Z" user="logictheo">
  <nd ref="301736490"/>
  <nd ref="145958259"/>
  <nd ref="301736491"/>
  <tag k="name" v="Οδός Ιουστινιανού"/>
  <tag k="created_by" v="Potlatch 0.6a"/>
  <tag k="highway" v="pedestrian"/>
  <tag k="name:en" v="Ioustinianou Street"/>
  <tag k="note" v="Ρώτησα ένα φίλο που μένει καιρό εδώ εάν αυτός ήταν κάποτε δρόμος. Το κοίταξα και από κοντά. Βλέπω μπάρες και στις 2 άκρες που είναι για να εμπ
iconv: illegal input sequence at position 58891
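
The same check can be scripted for anyone who wants to scan change files without iconv. The following is only a minimal sketch: the function name `first_invalid_utf8_offset` and the sample bytes are made up for illustration and are not part of osmosis or any OSM tool. It reports the byte offset of the first invalid UTF-8 sequence, which is roughly what the iconv message above prints.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
    return None


# Hypothetical sample: four Greek letters (8 bytes) cut after 7 bytes,
# so the last two-byte character is split, followed by the XML that
# would close the tag.
truncated = "\u03b5\u03bc\u03c0\u03cc".encode("utf-8")[:7]
sample = truncated + b'"/>'

print(first_invalid_utf8_offset(sample))
```

To use it on a real file, read the decompressed .osc as bytes and pass them in; a result of None means the whole file decoded cleanly.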

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Florian Lohoff
On Sat, Oct 04, 2008 at 06:34:12PM +1000, Brett Henderson wrote:
 Subject: Re: [OSM-dev] way 27483626 UTF-8 truncation
 
 Florian Lohoff wrote:
 On Sat, Oct 04, 2008 at 03:24:12PM +1000, Brett Henderson wrote:
   
 Another 2 change files contain utf-8 bugs and osmosis refuses to process
 them:
 
 200810031022-200810031023.osc
 200810031023-200810031024.osc
  
   
 I've tested both of these files and they seem okay.  The only problem I 
 can find is way 27483626 which has a broken note tag in file 
 2008100310-2008100311.osc.  Are you sure these files are broken?
 
 
 wget -O - http://planet.openstreetmap.org/minute/200810031022-200810031023.osc.gz \
   | gzip -d | iconv -f utf8 -t utf8
 [...]
 <way id="14783001" timestamp="2008-10-03T10:22:11Z" user="logictheo">
   <nd ref="145957773"/>
   <nd ref="163161140"/>
   <nd ref="146004252"/>
   <nd ref="301736490"/>
   <tag k="name" v="Οδός Ιουστινιανού"/>
   <tag k="created_by" v="Potlatch 0.6a"/>
   <tag k="highway" v="residential"/>
   <tag k="name:en" v="Ioustinianou Street"/>
   <tag k="note" v="Ρώτησα ένα φίλο που μένει καιρό εδώ εάν αυτός ήταν κάποτε δρόμος. Το κοίταξα και από κοντά. Βλέπω μπάρες και στις 2 άκρες που είναι για να εμπ
 iconv: illegal input sequence at position 16342
 
 
 wget -O - http://planet.openstreetmap.org/minute/200810031023-200810031024.osc.gz \
   | gzip -d | iconv -f utf8 -t utf8
 [...]
 <way id="27483626" timestamp="2008-10-03T10:23:02Z" user="logictheo">
   <nd ref="301736490"/>
   <nd ref="145958259"/>
   <nd ref="301736491"/>
   <tag k="name" v="Οδός Ιουστινιανού"/>
   <tag k="created_by" v="Potlatch 0.6a"/>
   <tag k="highway" v="pedestrian"/>
   <tag k="name:en" v="Ioustinianou Street"/>
   <tag k="note" v="Ρώτησα ένα φίλο που μένει καιρό εδώ εάν αυτός ήταν κάποτε δρόμος. Το κοίταξα και από κοντά. Βλέπω μπάρες και στις 2 άκρες που είναι για να εμπ
 iconv: illegal input sequence at position 58891
 
 Flo
   
 Ah, sorry. I misread your first email. I didn't realise you were 
 referring to minute changesets. I didn't realise there were two errors 
 in that hourly file. I have to leave now, I'll try to take another look 
 tomorrow morning (approx 15 hours from now).

To get the ROMA database in sync again i replaced the notes with
broken-utf8 - as notes typically don't get rendered, that's not a problem
for me though. ROMA was down for half a day before i discovered the
broken files and fixed them ...

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Matt Amos
On Sat, Oct 4, 2008 at 9:36 AM, Florian Lohoff [EMAIL PROTECTED] wrote:
 To get the ROMA database in sync again i replaced the notes with
 broken-utf8 - as notes typically don't get rendered, that's not a problem
 for me though. ROMA was down for half a day before i discovered the
 broken files and fixed them ...

likewise. the easiest way to fix them was hand-editing the change
files. i don't find it to be particularly onerous - just the price we
pay for being on the bleeding edge ;-)

cheers,

matt
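
For anyone who would rather script the hand edit described above, here is a rough sketch of trimming a broken tag value back to the last complete word. The helper name `truncate_to_last_word` and the sample text are assumptions made for illustration; this is not how the change files were actually repaired, only one way it could be done.

```python
def truncate_to_last_word(raw: bytes) -> str:
    """Decode raw as UTF-8; if it is broken, keep only whole words before the damage."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Everything before err.start is valid UTF-8
        valid = raw[:err.start].decode("utf-8")
    # Drop the partial word in front of the damaged bytes, if there is one
    head, sep, _ = valid.rpartition(" ")
    return head if sep else valid


# Hypothetical sample: three Greek words with the final byte cut off,
# so the last character of the last word is incomplete.
raw = "\u03b3\u03b9\u03b1 \u03bd\u03b1 \u03b5\u03bc\u03c0\u03cc".encode("utf-8")[:-1]

print(truncate_to_last_word(raw))
```

The sketch deliberately drops the whole partial word instead of keeping the characters that still decode, which matches truncating "to the nearest word". Input that is already valid is returned unchanged.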



Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Florian Lohoff
On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
 Subject: [OSM-dev] way 27483626 UTF-8 truncation
 
 i just noticed that the hourly change file
 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
 tag for way 27483626 (
 http://www.openstreetmap.org/browse/way/27483626/history ). i have
 truncated it to the nearest word, so this email is just to give
 forewarning that hourly or daily diff imports today might have a bit
 of trouble.
 
 it's the same problem as discussed here
 http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html

Another 2 change files contain utf-8 bugs and osmosis refuses to process
them:

200810031022-200810031023.osc
200810031023-200810031024.osc

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Florian Lohoff wrote:
 On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
   
 Subject: [OSM-dev] way 27483626 UTF-8 truncation

 i just noticed that the hourly change file
 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
 tag for way 27483626 (
 http://www.openstreetmap.org/browse/way/27483626/history ). i have
 truncated it to the nearest word, so this email is just to give
 forewarning that hourly or daily diff imports today might have a bit
 of trouble.

 it's the same problem as discussed here
 http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
 

 Another 2 change files contain utf-8 bugs and osmosis refuses to process
 them:

 200810031022-200810031023.osc
 200810031023-200810031024.osc
   
Any idea which nodes or ways are broken in these?

This isn't an osmosis bug.  The database now has incorrect/corrupted tag 
data in the history tables that needs to be corrected.  Following the URL:

http://www.openstreetmap.org/browse/way/27483626/history

gives random results from the API.

If we can identify the broken records we can ask TomH nicely to fix 
them.  I can then move osmosis backwards in time to re-generate the 
affected time period.  I don't know how this broken data gets created in 
the first place.  There was some discussion about this the last time it 
happened, I'll have to try to dig up the emails.

It's not simple to fix osmosis to prevent this occurring.  Osmosis is 
reading doubly encoded data from the database and removing the double 
encoding as it writes to the xml file.  It's a hack and there is no 
simple way of verifying the data before it gets written to the file.  I 
have a local process running at home verifying the output which has 
detected the problem, but I was asleep at the time it occurred :-)
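
To make the double encoding concrete, here is a small sketch. It assumes the common case where UTF-8 bytes were read as Latin-1 and then encoded as UTF-8 again; the thread does not say which encodings are actually involved, and all function names are hypothetical. It also shows one way a cut made on the doubly encoded string could split a character once the double encoding is removed. That is only a possible mechanism, not a confirmed cause.

```python
def double_encode(text: str) -> str:
    # UTF-8 bytes misread as Latin-1: every original byte becomes one character
    return text.encode("utf-8").decode("latin-1")


def undo_double_encoding(stored: str) -> bytes:
    # Map each character back to the single byte it stood for
    return stored.encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


note = "\u03b5\u03bc\u03c0\u03cc"   # four Greek letters, eight UTF-8 bytes
stored = double_encode(note)        # eight characters in the assumed db form

# Round trip works when nothing is cut
assert undo_double_encoding(stored).decode("utf-8") == note

# Cutting after seven "characters" splits the last Greek letter in half
cut = stored[:7]
print(is_valid_utf8(undo_double_encoding(cut)))
```

Under these assumptions, a length limit counted in stored characters is really counted in original bytes, so it can land in the middle of a multi-byte character.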




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Florian Lohoff wrote:
 On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
   
 Subject: [OSM-dev] way 27483626 UTF-8 truncation

 i just noticed that the hourly change file
 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
 tag for way 27483626 (
 http://www.openstreetmap.org/browse/way/27483626/history ). i have
 truncated it to the nearest word, so this email is just to give
 forewarning that hourly or daily diff imports today might have a bit
 of trouble.

 it's the same problem as discussed here
 http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
 

 Another 2 change files contain utf-8 bugs and osmosis refuses to process
 them:

 200810031022-200810031023.osc
 200810031023-200810031024.osc
   
I've tested both of these files and they seem okay.  The only problem I 
can find is way 27483626 which has a broken note tag in file 
2008100310-2008100311.osc.  Are you sure these files are broken?




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Brett Henderson wrote:
 Florian Lohoff wrote:
 On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
  
 Subject: [OSM-dev] way 27483626 UTF-8 truncation

 i just noticed that the hourly change file
 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
 tag for way 27483626 (
 http://www.openstreetmap.org/browse/way/27483626/history ). i have
 truncated it to the nearest word, so this email is just to give
 forewarning that hourly or daily diff imports today might have a bit
 of trouble.

 it's the same problem as discussed here
 http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
 

 Another 2 change files contain utf-8 bugs and osmosis refuses to process
 them:

 200810031022-200810031023.osc
 200810031023-200810031024.osc
   
 I've tested both of these files and they seem okay.  The only problem 
 I can find is way 27483626 which has a broken note tag in file 
 2008100310-2008100311.osc.  Are you sure these files are broken?


I've sent an email to TomH asking if he can fix the problematic tag.  If 
anybody has any other ideas on how to update the db let me know.

Brett

