Well, I'm not affiliated with Linked Geo Data, but have already looked at way too many RDF-related encoding problems in my life, so why not look at one more ...

It is indeed a problem in Linked Geo Data.

The Moscow resource
  http://linkedgeodata.org/triplify/node/27503927

has the following value for the :name property, in N-Triples:

"\u00D0\u009C\u00D0\u00BE\u00D1\u0081\u00D0\u00BA\u00D0\u00B2\u00D0\u00B0"

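These are characters escaped with the \u notation of N-Triples. If one decodes the escapes, the result is garbage: twelve characters in the range U+0080 to U+00FF (two of them invisible control characters), displayed roughly as "ÐÐ¾ÑÐºÐ²Ð°" instead of the six Cyrillic letters of Москва.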
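
To make the decoding step concrete, here is a minimal Python sketch. It assumes the literal consists of the twelve escapes written contiguously; the helper name decode_u_escapes is made up for this illustration and only handles the four-digit \u form.

```python
import re

# The literal as served, with the twelve \u escapes written contiguously.
literal = r"\u00D0\u009C\u00D0\u00BE\u00D1\u0081\u00D0\u00BA\u00D0\u00B2\u00D0\u00B0"

def decode_u_escapes(s: str) -> str:
    """Replace each four-digit \\uXXXX escape with the code point it names."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )

decoded = decode_u_escapes(literal)

# Every decoded character is below U+0100, so none of them can be Cyrillic.
code_points = ["U+%04X" % ord(c) for c in decoded]
```

The list code_points starts with U+00D0, U+009C: a Latin capital Eth followed by a C1 control character, not the U+041C one would expect for "М".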

I guess the problem is that the Linked Geo Data code messes up a UTF-8 encoded byte stream that comes from the input dataset. It looks like the original stream contained these bytes (hexadecimal):

  D0 9C D0 BE D1 81 D0 BA D0 B2 D0 B0

If interpreted as a UTF-8 encoded Unicode string, this is: Москва
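
A quick way to check this reading of the bytes, as a sketch in Python (the variable names are only for illustration):

```python
# The twelve bytes quoted above.
raw = bytes.fromhex("D0 9C D0 BE D1 81 D0 BA D0 B2 D0 B0")

# Read as UTF-8, each pair of bytes forms one Cyrillic character.
as_utf8 = raw.decode("utf-8")  # 'Москва'

# Read as one character per byte (Latin-1), the same bytes become
# twelve unrelated characters, which matches the broken literal.
as_latin1 = raw.decode("latin-1")
```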

Now apparently in Linked Geo Data this byte sequence was escaped into \u notation simply by prepending \u00 to every byte. That doesn't work. One actually has to decode the UTF-8 into Unicode characters first, and then escape them one by one, resulting in:
  "\u041C\u043E\u0441\u043A\u0432\u0430"
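
The difference between the two approaches can be sketched in Python as follows. This is not the Linked Geo Data or DBpedia code; both function names are invented for the illustration, and escape_bytewise only mimics the bug I suspect (it is an assumption about what the code does).

```python
def escape_ntriples(text: str) -> str:
    """Escape a decoded Unicode string, one code point at a time."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                 # printable ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # four-digit escape
        else:
            out.append("\\U%08X" % cp)     # eight-digit escape beyond the BMP
    return "".join(out)

def escape_bytewise(raw: bytes) -> str:
    """The suspected bug: treat every byte as if it were a code point."""
    return "".join("\\u00%02X" % b for b in raw)

raw = bytes.fromhex("D09CD0BED181D0BAD0B2D0B0")
broken = escape_bytewise(raw)                   # what the dataset serves
correct = escape_ntriples(raw.decode("utf-8"))  # decode first, then escape
```

The important part is the raw.decode("utf-8") call before escaping: the escaper must see characters, not bytes.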

A well-tested PHP implementation of this string escaping for N-Triples is available in DBpedia as RDFliteral::escape():
  
http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/extraction/core/RDFliteral.php?view=markup

Best,
Richard



On 22 Mar 2010, at 10:33, Hugh Williams wrote:

Hi Mitko/Alexander,

Perhaps someone on the Linked Geo Data group, which I have added to this reply, can comment?

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 22 Mar 2010, at 10:21, Mitko Iliev wrote:

The problem is in the LinkedGeoData dataset. It can be reproduced with:

ttlp (http_get ('http://linkedgeodata.org/triplify/node/27503927'), '', 'http://linkedgeodata.org/triplify/node/27503927');

and the query:

select * where { <http://linkedgeodata.org/triplify/node/27503927#id> ?y ?z . }

Best Regards,
Mitko


On Mar 20, 2010, at 9:33 PM, Alexander Sidorov wrote:

Hm... Look at the results of this query:

SELECT ?s ?p ?o ?name
WHERE
{
?s ?p ?o .
?s a <http://linkedgeodata.org/vocabulary#city> .
?o bif:contains '"moscow"' .
OPTIONAL
{
 ?s <http://linkedgeodata.org/vocabulary#name> ?name
}
}

Do you see "Москва" as the name? I see some strange symbols here, although I see correct Cyrillic characters in the results of your queries. It looks like a LinkedGeoData-specific problem.


2010/3/17 Mitko Iliev <[email protected]>
Hi Alexander,

The SPARQL endpoint returns UTF-8, and our experiments show proper encoding. For example, try executing: SELECT ?o WHERE { <http://dbpedia.org/resource/Moscow> rdfs:label ?o . filter (lang(?o) = 'ru') }
or
SELECT ?o WHERE { ?s ?p ?o . ?o bif:contains '"Москва"' } limit 100 against http://lod.openlinksw.com/sparql . Both return readable content.

If your query, executed on the endpoint above, returns bad UTF-8, please send us the query so we can debug what happens. Otherwise, the likely cause is on the client side: re-coding the response, or reading it with a narrow (single-byte) charset.

Best Regards,
Mitko


On Mar 17, 2010, at 3:54 AM, Alexander Sidorov wrote:

Hi Hugh,

As I remember, the ADO.NET encoding bug was fixed (I haven't checked, because there is no point while the other Entity Framework bug you know about is not fixed).

But this problem is not related to ADO.NET. As I haven't yet deployed my application to Amazon EC2, I execute geo queries against the lod.openlinksw.com/sparql endpoint via the SPARQL protocol (not by accessing the database directly). Here are my screenshots:
1. Manchester: http://img171.imageshack.us/img171/5568/manchesterk.png
2. Moscow: http://img204.imageshack.us/img204/7850/moscow.png

Regards,
Alexander

2010/3/17 Hugh Williams <[email protected]>
Hi Alexander,

Is this the encoding issue with the ADO.NET Provider that you reported previously? That is the only one I am aware of that is still to be resolved.

Note, there is a 40K limit on the size of emails to this mailing list, so your mail with attachments, which exceeded this limit, was initially withheld pending approval. In future, please place such attachments on a remote server and provide links in your mails.

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 17 Mar 2010, at 00:27, Alexander Sidorov wrote:

Hello!

I have already asked about LOD encoding problems before, but no feedback followed. To be more specific, I have attached screenshots of my application showing information about Manchester (English characters, everything is okay) and Moscow (Russian characters are displayed incorrectly).

Regards,
Alexander
<Manchester.png> <Moscow.png>
------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Virtuoso-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


--
Mitko Iliev
Developer Virtuoso Team
OpenLink Software
http://www.openlinksw.com/virtuoso
Cross Platform Web Services Middleware



--
You received this message because you are subscribed to the Google Groups "Linked Geo Data" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected] . For more options, visit this group at http://groups.google.com/group/linked-geo-data?hl=en .