On Tue, Aug 21, 2012 at 2:30 AM, harish suvarna <[email protected]> wrote:
>>
>> I have not yet had time to look at dbpedia 3.8. They might have changed
>> the names of some dump files. Generally "instance_types" are very
>> important (this provides the information about the type of an Entity).
>> "person_data" includes additional information for persons. AFAIK that
>> information is not included in the default configuration of the
>> dbpedia indexing tool.
>>
>>
> Not all language dumps have these files. Japanese, Italian also do not have
> these files. These files are listed in the readme file. Hence I was looking
> for these.
>
Types are the same for all languages. Therefore they are only
available in English.
I am not sure about "person_data", but the same might apply there.

In other words - if you build an index for a specific language you
also need to include the English dumps of the files that are not
language specific.
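
As a rough sketch of what that means when collecting the dump files for one language: the file name pattern and the two lists below are assumptions for illustration (only "labels_zh.nt.bz2" is named in this thread, and "person_data" being English-only is my guess), so please check them against the download listing of the release you use.

```python
# Hypothetical grouping of dump names; verify against the actual release.
LANGUAGE_SPECIFIC = ["labels"]                        # one dump per language
LANGUAGE_NEUTRAL = ["instance_types", "person_data"]  # assumed English-only

def dumps_for(lang):
    """Return the dump files to index for the given language code."""
    # Language-specific data comes from the requested language ...
    files = [f"{name}_{lang}.nt.bz2" for name in LANGUAGE_SPECIFIC]
    # ... while language-neutral data is always taken from the English dumps.
    files += [f"{name}_en.nt.bz2" for name in LANGUAGE_NEUTRAL]
    return files

print(dumps_for("zh"))
```

For "zh" this lists the Chinese labels together with the English type and person files.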

>
>> > I get a java exception.
>>
>> The included exceptions suggest that the RDF file containing the Chinese
>> labels is not well formed. Experience shows that this is most
>> likely related to character encoding issues. This was also the case with
>> some dbpedia 3.7 files (see the special treatment of some files in the
>> shell script of the dbpedia).
>>
>> OK. I will try to debug this.
>
>
>> You will need to have a look at the line that caused the error
>> (labels_zh.nt.bz2; [line: 6972, col: 46] Broken token:
>> http://www.w3.org/2000/01/rdf-sche). If it is indeed an encoding
>> related issue there are some Linux command line utilities to check and
>> correct those issues. If you are unsure feel free to post this line
>> within this thread.
>>
>>
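
If you prefer a script over the command line utilities, here is a minimal sketch for looking at the reported line. It assumes that the parser counts lines starting at 1 and that the dump is UTF-8 encoded; the function names are made up for this example. Note that a "Broken token" can also have causes other than the encoding, so this only rules one possibility in or out.

```python
import bz2

def read_raw_line(path, line_no):
    """Return the raw bytes of the 1-based line line_no, or None if the file is shorter."""
    with bz2.open(path, "rb") as dump:
        for current, raw in enumerate(dump, start=1):
            if current == line_no:
                return raw
    return None

def describe_line(raw):
    """Say whether the bytes decode as UTF-8 and show them in an unambiguous form."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded
        return f"invalid UTF-8 at byte {err.start}: {raw!r}"
    return f"valid UTF-8: {text!r}"
```

Calling describe_line(read_raw_line("labels_zh.nt.bz2", 6972)) in the directory holding the dump would show the offending line, which you could then post to this thread.
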
>> Chinese labels for the English dbpedia
>> ("http://dboedua.org/resource/{name}") should work for that reason.
>> The Chinese version ("http://zh.dboedua.org/resource/{name}") would
>> just provide more Entities (not more information for entities included
>> in the English version).
>>
>> "dboedua"? I don't find any server at http://dboedua.org. Is it
>> some keyboard mistake (yours being a different language
> keyboard)?
>

It's a typo ... it should be http://dbpedia.org and I used "{name}" as
a wildcard (e.g. for "http://dbpedia.org/resource/Paris" the {name} is
Paris).
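
To make the wildcard notation concrete, a small sketch: "{name}" is only a placeholder that gets replaced by the entity name. The constant and function names are made up for this example, and the "zh" template is simply the typo-corrected form of the URL quoted above.

```python
# URI templates as used in this thread; "{name}" is the placeholder.
EN_TEMPLATE = "http://dbpedia.org/resource/{name}"
ZH_TEMPLATE = "http://zh.dbpedia.org/resource/{name}"

def resource_uri(template, name):
    """Fill the {name} placeholder of a resource URI template."""
    return template.format(name=name)

print(resource_uri(EN_TEMPLATE, "Paris"))  # http://dbpedia.org/resource/Paris
```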

> -harish



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
