On Tue, Aug 21, 2012 at 2:30 AM, harish suvarna <[email protected]> wrote:
>>
>> I had not yet time to look at dbpedia 3.8. They might have changed
>> names of some dump files. Generally "instance_types" are very
>> important (this provides the information about the type of an Entity).
>> "person_data" includes additional information for persons, AFAIK those
>> information are not included in the default configuration of the
>> dbpedia indexing tool
>>
>>
> Not all language dumps have these files. Japanese, Italian also do not have
> these files. These files are listed in the readme file. Hence I was looking
> for these.
>

Types are the same for all languages. Therefore they are only
available in English. I am not sure about "person_data", but it
might be the same there.
In other words, if you build an index for a specific language, you
need to include the English dumps of the files that are not
language-specific.

>
>> > I get a java exception.
>>
>> The included exceptions look like the RDF file containing the Chinese
>> labels is not well formatted. The experience says that this is most
>> likely related to char encoding issues. This was also the case with
>> some dbpedia 3.7 files (see the special treatment of some files in the
>> shell script of the dbpedia).
>>
>> OK. I will try to debug this.
>
>
>> You will need to have a look at the line that caused the error
>> (labels_zh.nt.bz2; [line: 6972, col: 46] Broken token:
>> http://www.w3.org/2000/01/rdf-sche). If it is indeed an encoding
>> related issue there are some linux command line utilities to check and
>> correct those issues. If you are unsure feel free to post this line
>> within this thread.
>>
>>
>> Chinese labels for the English dbpedia
>> ("http://dboedua.org/resource/{name}") should work for that reason.
>> The Chinese version ("http://zh.dboedua.org/resource/{name}") would
>> just provide more Entities (not more information for entities included
>> in the English version).
>>
>> "dboedua"? I don't find any server at http://dboedua.org. Is it
>> some keyboard mistake? (yours being a different language
> keyboard).
>

It's a typo ... it should be http://dbpedia.org, and I used "{name}" as a
wildcard (e.g. for "http://dbpedia.org/resource/Paris" the {name} is
Paris).

> -harish

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
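
P.S. As a rough sketch of how the line reported in the exception
(labels_zh.nt.bz2, line 6972) could be inspected: the Python snippet
below reads the compressed dump line by line and checks whether the
requested line decodes as UTF-8. The function name inspect_line is
made up for this illustration, and it is only an assumption that the
parser counts lines the same way and that encoding is the actual
cause. If the line decodes cleanly, the problem is more likely a
malformed token than the char encoding.

```python
import bz2


def inspect_line(path, line_no):
    """Return (raw_bytes, decode_error) for the 1-based line `line_no`.

    `decode_error` is None when the line is valid UTF-8, otherwise the
    UnicodeDecodeError that was raised (its `.start` attribute is the
    byte offset of the first invalid byte within the line).
    """
    # Read in binary mode so that invalid bytes are not hidden or replaced.
    with bz2.open(path, "rb") as fh:
        for current, raw in enumerate(fh, start=1):
            if current == line_no:
                try:
                    raw.decode("utf-8")
                    return raw, None
                except UnicodeDecodeError as err:
                    return raw, err
    raise ValueError("file has fewer than %d lines" % line_no)


# Hypothetical usage with the file and line number from this thread:
#   raw, err = inspect_line("labels_zh.nt.bz2", 6972)
#   print(repr(raw))   # shows escaped bytes, e.g. around column 46
#   print(err)         # None if the line is valid UTF-8
```

Printing repr(raw) rather than the decoded text keeps any invalid
bytes visible as escapes, which makes it easier to post the line in
this thread.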
