Hi,
Usually, when one does not get the expected results, the cause lies in
the data contained in the dbpedia referenced site. So I will try to
provide some information on how best to debug what is happening.
Could you provide the data for some Entities by sending the results
of an Entityhub
query such as
curl -H "Accept: application/rdf+xml" \
"http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Paris"
You can also use an Entity other than Paris if it is more representative
of your data.
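
If you want to collect the data for several Entities in one go, the query
above can be wrapped in a small shell helper. This is only a minimal sketch:
the function names entity_endpoint and fetch_entity are made up for this
mail (they are not part of Stanbol), and the base URL assumes the default
local launcher on port 8080. Letting curl URL-encode the id is a choice of
this sketch; the plain query above does not encode it.

```shell
#!/bin/sh
# Base URL of the Stanbol instance (assumption: default local launcher).
STANBOL="http://localhost:8080"

# Print the Entityhub entity endpoint of a referenced site.
# entity_endpoint is a hypothetical helper name, not part of Stanbol.
entity_endpoint() {
    printf '%s/entityhub/site/%s/entity' "$STANBOL" "$1"
}

# Fetch one entity as RDF/XML; curl URL-encodes the id parameter.
# $1 = referenced site name, $2 = entity URI
fetch_entity() {
    curl -s -G -H "Accept: application/rdf+xml" \
        --data-urlencode "id=$2" "$(entity_endpoint "$1")"
}

# Example (needs a running Stanbol), one file per Entity:
# for name in Paris Berlin; do
#     fetch_entity dbpedia "http://dbpedia.org/resource/$name" > "$name.rdf"
# done
```

The resulting .rdf files can then be attached to a reply.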
Another interesting thing to do is
1) start Stanbol in DEBUG mode (by adding the "-l DEBUG" option
when starting)
2) send a Document to the Enhancer
3) now you should see the Solr queries used in the log (you might need
to filter the extensive logging for the component
"org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory")
4) check those queries manually by sending them to
http://localhost:8080/solr/default/dbpedia/select?q=
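
Steps 3) and 4) could be sketched in shell roughly as follows. The helper
names filter_solr_queries and run_solr_query are hypothetical, and the log
file location in the usage comment is an assumption that depends on how
your launcher is set up.

```shell
#!/bin/sh
# Solr core of the dbpedia referenced site, as used in step 4).
SOLR_CORE="http://localhost:8080/solr/default/dbpedia"
# Component whose DEBUG output contains the generated Solr queries.
SOLR_LOGGER="org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory"

# Keep only log lines that mention the SolrQueryFactory component.
# Reads the given log files, or stdin when called without arguments.
filter_solr_queries() {
    grep -F "$SOLR_LOGGER" "$@"
}

# Send one query string to the select handler; curl URL-encodes it.
run_solr_query() {
    curl -s -G --data-urlencode "q=$1" "$SOLR_CORE/select"
}

# Example (log path is an assumption, adjust it to your installation):
# filter_solr_queries stanbol/logs/error.log
# run_solr_query '<query copied from the log>'
```

Using -G together with --data-urlencode means you can paste a query from the
log as it is, without encoding spaces or quotes by hand.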
BTW: You can also look at the data stored in the Solr index by requesting
a Document via its URI, e.g.
http://localhost:8080/solr/default/dbpedia/select?q=uri:http\://dbpedia.org/resource/Paris
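
From the command line, the same lookup by URI could look like the sketch
below. The helper names solr_escape_uri and lookup_by_uri are made up for
this mail. Note that the sketch escapes only the colons, exactly as in the
URL above; URIs containing other characters that are special to the Solr
query parser (e.g. parentheses) would need further escaping.

```shell
#!/bin/sh
# Escape the colons of a URI with a backslash so that the Solr query
# parser does not treat them as field separators.
solr_escape_uri() {
    printf '%s' "$1" | sed 's/:/\\:/g'
}

# Request the Solr document of an entity via its URI.
# curl URL-encodes the whole q parameter, including the backslash.
lookup_by_uri() {
    curl -s -G \
        --data-urlencode "q=uri:$(solr_escape_uri "$1")" \
        "http://localhost:8080/solr/default/dbpedia/select"
}

# Example (needs a running Stanbol):
# lookup_by_uri "http://dbpedia.org/resource/Paris"
```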
This should help in looking into your issue.
best
Rupert
On Thu, Aug 23, 2012 at 12:57 AM, harish suvarna <[email protected]> wrote:
> I am finally successful after converting some Chinese dbpedia dump files
> to utf8. But I can't hit any dbpedia links in stanbol using this solr dump.
> I am just wondering whether I should pre-process the Chinese dbpedia dump
> files. I uploaded the new jar file successfully as a new bundle. Then I
> defined a new engine using this reference site 'dbpedia'. I do not have any
> other dbpedia solr dump. The chain says it is active and all 3 engines are
> available.
> If I put the dbpedia solr index from Ogrisel (1.19GB), it works fine. I get
> some dbpedia links.
> Am I missing anything else? <http://localhost:8080/system/console/bundles/179>
> I did add the instance_types and person_data from english dump.
>
> -harish
>
>
>
> On Tue, Aug 21, 2012 at 6:22 PM, harish suvarna <[email protected]> wrote:
>
>>
>>
>> On Mon, Aug 20, 2012 at 9:30 PM, Rupert Westenthaler <
>> [email protected]> wrote:
>>
>>> On Tue, Aug 21, 2012 at 2:30 AM, harish suvarna <[email protected]>
>>> wrote:
>>> >>
>>> >> I had not yet time to look at dbpedia 3.8. They might have changed
>>> >> names of some dump files. Generally "instance_types" are very
>>> >> important (this provides the information about the type of an Entity).
>>> >> "person_data" includes additional information for persons. AFAIK this
>>> >> information is not included in the default configuration of the
>>> >> dbpedia indexing tool.
>>> >>
>>> >>
>>> > Not all language dumps have these files. Japanese, Italian also do not
>>> > have these files. These files are listed in the readme file. Hence I was
>>> > looking for these.
>>> >
>>> Types are the same for all languages. Therefore they are only
>>> available in English.
>>> I am not sure about "person_data", but it might be the same there.
>>>
>>> In other words - if you build an index for a specific language you
>>> need to include the English dumps of those that are not language
>>> specific.
>>>
>>> >>> I will try this. Thanks a lot.
>>
>>> >
>>> >> > I get a java exception.
>>> >>
>>> >> The included exceptions suggest that the RDF file containing the Chinese
>>> >> labels is not well formatted. Experience says that this is most
>>> >> likely related to char encoding issues. This was also the case with
>>> >> some dbpedia 3.7 files (see the special treatment of some files in the
>>> >> shell script of the dbpedia).
>>> >>
>>> >> OK. I will try to debug this.
>>> >
>>>
>> >>>>
>>
>> I converted the labels_zh.nt to utf-8 using ms word. MS word adds the bom
>> bytes though. I needed to remove the bom bytes.
>> Then labels_zh.nt went through. But long abstracts has the same problem. So I
>> am still working on these other files.
>> Thanks a lot for all your patience and all stanbol teachings.
>>
>>
>>
>> --
>> Thanks
>> Harish
>>
>>
>
>
> --
> Thanks
> Harish
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen