On Mon, Aug 20, 2012 at 9:30 PM, Rupert Westenthaler <[email protected]> wrote:
> On Tue, Aug 21, 2012 at 2:30 AM, harish suvarna <[email protected]> wrote:
> >>
> >> I have not yet had time to look at dbpedia 3.8. They might have changed
> >> the names of some dump files. Generally "instance_types" are very
> >> important (this provides the information about the type of an Entity).
> >> "person_data" includes additional information for persons. AFAIK that
> >> information is not included in the default configuration of the
> >> dbpedia indexing tool.
> >>
> >
> > Not all language dumps have these files. Japanese and Italian also do not
> > have these files. These files are listed in the readme file. Hence I was
> > looking for these.
> >
> Types are the same for all languages. Therefore they are only
> available in English.
> I am not sure about "person_data", but it might be the same there.
>
> In other words - if you build an index for a specific language you
> need to include the English dumps of those that are not language
> specific.
>

>>> I will try this. Thanks a lot.

> >> > I get a java exception.
> >>
> >> The included exceptions suggest that the RDF file containing the Chinese
> >> labels is not well formatted. Experience says that this is most
> >> likely related to char encoding issues. This was also the case with
> >> some dbpedia 3.7 files (see the special treatment of some files in the
> >> dbpedia shell script).
> >>
> >> OK. I will try to debug this.
> >

>>>> I converted labels_zh.nt to UTF-8 using MS Word. MS Word adds the BOM
bytes though, so I needed to remove the BOM bytes. Then labels_zh.nt went
through. But the long abstracts file has the same problem, so I am still
working on these other files.

Thanks a lot for all your patience and all stanbol teachings.

--
Thanks
Harish
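
For the remaining files, the BOM removal described above could be scripted rather than done by hand in an editor. The following is only a minimal sketch in Python, not something taken from the dbpedia indexing tool: the function names and the command-line usage are made up for illustration, and it assumes the file is already UTF-8 apart from the leading BOM. The second helper reports the first line that is not valid UTF-8, which may help locate the line that trips the RDF parser.

```python
import codecs
import shutil
import sys


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM.

    The copy is streamed, so large dump files are not loaded into memory.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        dst.write(strip_bom(head))
        shutil.copyfileobj(src, dst)


def first_invalid_utf8_line(lines):
    """Return the 1-based number of the first line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, for example a file opened
    in binary mode. Returns None if every line decodes cleanly.
    """
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            return number
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_bom.py <input.nt> <output.nt>
    strip_bom_file(sys.argv[1], sys.argv[2])
    with open(sys.argv[2], "rb") as cleaned:
        bad_line = first_invalid_utf8_line(cleaned)
    if bad_line is None:
        print("output is valid UTF-8")
    else:
        print("first invalid UTF-8 sequence is on line", bad_line)
```

If the check reports an invalid line, the file is probably in a different encoding altogether and would need a real conversion first; stripping the BOM alone would not be enough in that case.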
