Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-05 Thread Brett Henderson
On Sun, Oct 5, 2008 at 2:12 AM, Matt Amos <[EMAIL PROTECTED]> wrote:

> On Sat, Oct 4, 2008 at 9:36 AM, Florian Lohoff <[EMAIL PROTECTED]> wrote:
> > To get the ROMA database in sync again I replaced the notes with
> > "broken-utf8". As notes typically don't get rendered, that's not a problem
> > for me. ROMA was down for half a day before I discovered the
> > broken files and fixed them ...
>
> likewise. the easiest way to fix them was hand-editing the change
> files. i don't find it to be particularly onerous - just the price we
> pay for being on the bleeding edge ;-)


The corrupted data in the db has been fixed by TomH and I've re-generated
the changeset files.  Unfortunately it's too late for you guys now ...
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Matt Amos
On Sat, Oct 4, 2008 at 9:36 AM, Florian Lohoff <[EMAIL PROTECTED]> wrote:
> To get the ROMA database in sync again I replaced the notes with
> "broken-utf8". As notes typically don't get rendered, that's not a problem
> for me. ROMA was down for half a day before I discovered the
> broken files and fixed them ...

likewise. the easiest way to fix them was hand-editing the change
files. i don't find it to be particularly onerous - just the price we
pay for being on the bleeding edge ;-)

cheers,

matt



Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Florian Lohoff
On Sat, Oct 04, 2008 at 06:34:12PM +1000, Brett Henderson wrote:
> Subject: Re: [OSM-dev] way 27483626 UTF-8 truncation
> 
> Florian Lohoff wrote:
> >On Sat, Oct 04, 2008 at 03:24:12PM +1000, Brett Henderson wrote:
> >  
> >>>Another 2 change files contain utf-8 bugs and osmosis refuses to process
> >>>them:
> >>>
> >>>200810031022-200810031023.osc
> >>>200810031023-200810031024.osc
> >>> 
> >>>  
> >>I've tested both of these files and they seem okay.  The only problem I 
> >>can find is way 27483626 which has a broken "note" tag in file 
> >>2008100310-2008100311.osc.  Are you sure these files are broken?
> >>
> >
> >wget -O - http://planet.openstreetmap.org/minute/200810031022-200810031023.osc.gz | gzip -d | iconv -f utf8 -t utf8
> >[...]
> >
> >wget -O - http://planet.openstreetmap.org/minute/200810031023-200810031024.osc.gz | gzip -d | iconv -f utf8 -t utf8
> >[...]



Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Brett Henderson
Florian Lohoff wrote:
> On Sat, Oct 04, 2008 at 03:24:12PM +1000, Brett Henderson wrote:
>   
>>> Another 2 change files contain utf-8 bugs and osmosis refuses to process
>>> them:
>>>
>>> 200810031022-200810031023.osc
>>> 200810031023-200810031024.osc
>>>  
>>>   
>> I've tested both of these files and they seem okay.  The only problem I 
>> can find is way 27483626 which has a broken "note" tag in file 
>> 2008100310-2008100311.osc.  Are you sure these files are broken?
>> 
>
> wget -O - http://planet.openstreetmap.org/minute/200810031022-200810031023.osc.gz | gzip -d | iconv -f utf8 -t utf8
> [...]
>
> wget -O - http://planet.openstreetmap.org/minute/200810031023-200810031024.osc.gz | gzip -d | iconv -f utf8 -t utf8
> [...]


Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-04 Thread Florian Lohoff
On Sat, Oct 04, 2008 at 03:24:12PM +1000, Brett Henderson wrote:
> >Another 2 change files contain utf-8 bugs and osmosis refuses to process
> >them:
> >
> >200810031022-200810031023.osc
> >200810031023-200810031024.osc
> >  
> I've tested both of these files and they seem okay.  The only problem I 
> can find is way 27483626 which has a broken "note" tag in file 
> 2008100310-2008100311.osc.  Are you sure these files are broken?

wget -O - http://planet.openstreetmap.org/minute/200810031022-200810031023.osc.gz | gzip -d | iconv -f utf8 -t utf8
[...]

wget -O - http://planet.openstreetmap.org/minute/200810031023-200810031024.osc.gz | gzip -d | iconv -f utf8 -t utf8
[...]
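
For anyone without iconv at hand, roughly the same check can be sketched in Python. This is only an illustration: the function names are made up, and it assumes the change file has already been downloaded to a local path. It reports where the first invalid byte sequence sits, which helps to find the offending tag.

```python
import gzip


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


def check_change_file(path: str):
    """Decompress a gzipped change file and validate its contents as UTF-8."""
    with gzip.open(path, "rb") as handle:
        return first_invalid_utf8_offset(handle.read())
```

Calling check_change_file() on a local copy of one of the .osc.gz files returns None when the file is clean, and otherwise an offset into the decompressed data.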


Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Brett Henderson wrote:
> Florian Lohoff wrote:
>> On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
>>  
>>> Subject: [OSM-dev] way 27483626 UTF-8 truncation
>>>
>>> i just noticed that the hourly change file
>>> 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
>>> tag for way 27483626 (
>>> http://www.openstreetmap.org/browse/way/27483626/history ). i have
>>> truncated it to the nearest word, so this email is just to give
>>> forewarning that hourly or daily diff imports today might have a bit
>>> of trouble.
>>>
>>> it's the same problem as discussed here
>>> http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
>>> 
>>
>> Another 2 change files contain utf-8 bugs and osmosis refuses to process
>> them:
>>
>> 200810031022-200810031023.osc
>> 200810031023-200810031024.osc
>>   
> I've tested both of these files and they seem okay.  The only problem 
> I can find is way 27483626 which has a broken "note" tag in file 
> 2008100310-2008100311.osc.  Are you sure these files are broken?
>
>
I've sent an email to TomH asking if he can fix the problematic tag.  If 
anybody has any other ideas on how to update the db let me know.

Brett




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Florian Lohoff wrote:
> On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
>   
>> Subject: [OSM-dev] way 27483626 UTF-8 truncation
>>
>> i just noticed that the hourly change file
>> 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
>> tag for way 27483626 (
>> http://www.openstreetmap.org/browse/way/27483626/history ). i have
>> truncated it to the nearest word, so this email is just to give
>> forewarning that hourly or daily diff imports today might have a bit
>> of trouble.
>>
>> it's the same problem as discussed here
>> http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
>> 
>
> Another 2 change files contain utf-8 bugs and osmosis refuses to process
> them:
>
> 200810031022-200810031023.osc
> 200810031023-200810031024.osc
>   
I've tested both of these files and they seem okay.  The only problem I 
can find is way 27483626 which has a broken "note" tag in file 
2008100310-2008100311.osc.  Are you sure these files are broken?




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Brett Henderson
Florian Lohoff wrote:
> On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
>   
>> Subject: [OSM-dev] way 27483626 UTF-8 truncation
>>
>> i just noticed that the hourly change file
>> 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
>> tag for way 27483626 (
>> http://www.openstreetmap.org/browse/way/27483626/history ). i have
>> truncated it to the nearest word, so this email is just to give
>> forewarning that hourly or daily diff imports today might have a bit
>> of trouble.
>>
>> it's the same problem as discussed here
>> http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
>> 
>
> Another 2 change files contain utf-8 bugs and osmosis refuses to process
> them:
>
> 200810031022-200810031023.osc
> 200810031023-200810031024.osc
>   
Any idea which nodes or ways are broken in these?

This isn't an osmosis bug.  The database now has incorrect/corrupted tag 
data in the history tables that needs to be corrected.  Following the URL:

http://www.openstreetmap.org/browse/way/27483626/history

results in random results from the API.

If we can identify the broken records we can ask TomH nicely to fix 
them.  I can then move osmosis backwards in time to re-generate the 
affected time period.  I don't know how this broken data gets created in 
the first place.  There was some discussion about this the last time it 
happened, I'll have to try to dig up the emails.
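
Since the minute files are named after the interval they cover, the files for an affected period can be listed mechanically. A rough Python sketch, assuming the YYYYMMDDHHMM-YYYYMMDDHHMM.osc pattern seen in the file names above holds throughout. The function name is made up for illustration and is not part of osmosis.

```python
from datetime import datetime, timedelta


def minute_change_files(start: datetime, end: datetime) -> list:
    """List minute change file names covering the interval [start, end)."""
    stamp = "%Y%m%d%H%M"
    step = timedelta(minutes=1)
    names = []
    current = start
    while current < end:
        # Each file is named "<start of minute>-<start of next minute>.osc".
        names.append(f"{current.strftime(stamp)}-{(current + step).strftime(stamp)}.osc")
        current += step
    return names


# The two minutes reported as broken in this thread.
affected = minute_change_files(datetime(2008, 10, 3, 10, 22),
                               datetime(2008, 10, 3, 10, 24))
```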

It's not simple to fix osmosis to prevent this occurring.  Osmosis is 
reading doubly encoded data from the database and removing the double 
encoding as it writes to the xml file.  It's a hack and there is no 
simple way of verifying the data before it gets written to the file.  I 
have a local process running at home verifying the output which has 
detected the problem, but I was asleep at the time it occurred :-)
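
To illustrate why the output can end up invalid, here is a small Python sketch. It rests on assumptions not confirmed in this thread: that "doubly encoded" means UTF-8 bytes were misread as Latin-1 and encoded to UTF-8 a second time, and that the stored value was cut at a character limit. The function names and the sample string are made up. The point is that the stored value is still valid UTF-8 after the cut, so the damage only shows up once the double encoding is removed.

```python
def double_encode(text: str) -> bytes:
    """UTF-8 bytes misread as Latin-1, then encoded to UTF-8 again (assumed)."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(stored: bytes) -> bytes:
    """Reverse double_encode; the result is valid UTF-8 only if nothing was cut."""
    return stored.decode("utf-8").encode("latin-1")


def truncate_chars(stored: bytes, limit: int) -> bytes:
    """Cut the stored value after `limit` characters, as a column limit might."""
    return stored.decode("utf-8")[:limit].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


stored = double_encode("Stra\u00dfe")   # "Strasse" written with a sharp s
cut = truncate_chars(stored, 5)          # the cut lands in the middle of the sharp s
intact_ok = is_valid_utf8(undo_double_encoding(stored))
stored_cut_ok = is_valid_utf8(cut)
output_cut_ok = is_valid_utf8(undo_double_encoding(cut))
```

Under these assumptions intact_ok and stored_cut_ok are true while output_cut_ok is false, which would match the symptom of a broken tag that only fails once written to the xml file.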




Re: [OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Florian Lohoff
On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
> Subject: [OSM-dev] way 27483626 UTF-8 truncation
> 
> i just noticed that the hourly change file
> 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
> tag for way 27483626 (
> http://www.openstreetmap.org/browse/way/27483626/history ). i have
> truncated it to the nearest word, so this email is just to give
> forewarning that hourly or daily diff imports today might have a bit
> of trouble.
> 
> it's the same problem as discussed here
> http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html

Another 2 change files contain utf-8 bugs and osmosis refuses to process
them:

200810031022-200810031023.osc
200810031023-200810031024.osc

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin




[OSM-dev] way 27483626 UTF-8 truncation

2008-10-03 Thread Matt Amos
i just noticed that the hourly change file
2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
tag for way 27483626 (
http://www.openstreetmap.org/browse/way/27483626/history ). i have
truncated it to the nearest word, so this email is just to give
forewarning that hourly or daily diff imports today might have a bit
of trouble.

it's the same problem as discussed here
http://lists.openstreetmap.org/pipermail/dev/2008-August/011525.html
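
For the record, the kind of repair described above can be sketched in Python. This is a hypothetical helper, not the tool actually used here: it assumes the only damage is a cut-off multi-byte sequence at the end of the value, drops it, and then cuts back to the last complete word.

```python
def truncate_to_last_word(raw: bytes) -> str:
    """Drop an incomplete trailing UTF-8 sequence and the partial word before it."""
    # errors="ignore" silently discards bytes that do not form valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    damaged = len(text.encode("utf-8")) != len(raw)
    if damaged and " " in text:
        # The last word lost at least one character, so remove it entirely.
        text = text.rsplit(" ", 1)[0]
    return text


# A note whose final character was cut in half (sample text is made up).
broken = "gesperrt wegen f\u00fc".encode("utf-8")[:-1]
repaired = truncate_to_last_word(broken)
```

Values that are already valid UTF-8 pass through unchanged.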

cheers,

matt
