Re: Poisonous models (was the bad word)

Renaud Delbru Tue, 20 Jul 2010 02:58:03 -0700

Hi Hugh,

comments below.


On 19/07/10 08:22, Hugh Glaser wrote:

To answer your question, Sindice will accept the document, perform
reasoning and index it as it is. However, Sindice is somewhat robust to
this kind of "poisonous" data. Sindice performs a particular kind
of reasoning that we call "context-dependent" reasoning [1], in which
inference is performed in the "context of the document". The inferences
hold only in the context of this document and have no global impact,
i.e., they do not alter the inference on other documents.
Therefore, Sindice avoids undesirable assertions. In fact, we do not
restrict the freedom of expression of data publishers, unlike other
approaches such as SAOR [2], where certain statements are considered
invalid and ignored. Data publishers are allowed to reuse and extend
ontologies or existing entities in any manner, but the consequences of
their modifications will be confined to their own context, and will not
alter the intended semantics of the other RDF models published on the Web.
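
A minimal sketch of the idea of confining inference to a document's own
context. It is only an illustration, not Sindice's implementation: the
only rule applied is the symmetry of owl:sameAs, and the document URLs,
the ex:someone URI and the function names are assumptions made up for
the example.

```python
SAME_AS = "owl:sameAs"


def infer_in_context(triples):
    """Close one document's triples under owl:sameAs symmetry only."""
    closed = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            closed.add((o, p, s))
    return closed


def build_index(documents):
    """Infer per document; no triple from one context reaches another."""
    return {url: infer_in_context(triples) for url, triples in documents.items()}


# Hypothetical documents and URIs, for illustration only.
documents = {
    "http://example.org/poison.rdf": {
        ("ex:someone", SAME_AS, "dbpedia:Darby_Riordan"),
    },
    "http://example.org/clean.rdf": {
        ("dbpedia:Darby_Riordan", "rdfs:label", "Darby Riordan"),
    },
}
index = build_index(documents)
```

The inferred statement exists only in the entry for the poisonous
document; the entry for the clean document is left as published.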

Cool.
Sounds really good that the inference part of Sindice is robust to this.
Although I guess if I use Sindice to find relevant documents for
dbpedia:Darby_Riordan and load them into my store, I am likely to end up
with a pretty poisoned store.

As you are saying, you are looking for relevant documents about
dbpedia:Darby_Riordan. In this case, with an appropriate ranking, it is
unlikely that poisonous/spamming documents will appear in the top-k
results.

However, if somebody requests all documents stating <?s, owl:sameAs,
dbpedia:Darby_Riordan>, Sindice will return the document
http://data.totl.net/dave.rdf. But such a problem can be tackled with
appropriate ranking methodologies (based on link analysis methods such
as [3]).
Poisonous documents published on the web are likely to not have
any incoming links (or only from other poisonous documents, but this can
be detected), and therefore will be ranked very low and will never
appear in the top-k search results.
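
As a rough illustration of why a document without incoming links ends
up at the bottom, here is a plain PageRank-style iteration. This is a
generic link-analysis sketch, not necessarily the method of [3]; the
document names, the damping factor and the iteration count are
assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank-style scores for a dict {source: [targets]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its score evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: nothing links to "poison".
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "poison": ["a"],
}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here "poison" only ever receives the baseline share, so it is last in
the ranking, whatever it links to itself.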

Not sure of this.
Poisonous documents may well have many links to them (saying they are
poisonous?).

Good point, but in this case, it means that people agree on a certain
vocabulary to point out poisonous documents. In this case, this
information (the meaning of the link) can be integrated into the ranking
function. If a document has many incoming links, e.g., of type
isPoisonous, then we can rank it lower. Finding the right ranking
function is then another (and interesting) problem, but it is possible.

This seems to me to be comparable to the citation problem, where a paper
gets very high citations because everyone cites it as being wrong.
Of course, sentiment analysis etc may help (and may be easier in the
semantic web), but pure reference count is dangerous.

The ranking should not be purely based on references, and it should also
take into consideration the meaning of the links. Also, only taking the
meaning of the links is dangerous. For example, if I create a link to a
dbpedia:document saying it is poisonous, why should people trust me?
However, if there is a multitude of links saying that the document is
poisonous, then we can have more confidence in the fact that the
document is really poisonous.
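
One possible way to sketch this, under stated assumptions: a base score
(for instance from link analysis) is reduced only when enough distinct
sources flag a document with an isPoisonous-style link. The threshold,
the weight, the formula and the names below are all hypothetical choices
for illustration, not an existing ranking function.

```python
def poison_penalty(flags, doc, min_sources=3):
    """Count distinct sources flagging doc; ignore it below min_sources."""
    sources = {source for source, target in flags if target == doc}
    return len(sources) if len(sources) >= min_sources else 0


def adjusted_score(base_score, flags, doc, min_sources=3, weight=1.0):
    """Lower base_score in proportion to the number of credible flags."""
    penalty = poison_penalty(flags, doc, min_sources)
    return base_score / (1.0 + weight * penalty)


# Hypothetical (source, target) pairs of isPoisonous links.
flags = [
    ("a", "x"), ("a", "x"),              # one source repeating itself
    ("b", "y"), ("c", "y"), ("d", "y"),  # three independent sources
]
score_x = adjusted_score(1.0, flags, "x")
score_y = adjusted_score(1.0, flags, "y")
```

A single accuser, however often repeated, does not change the score of
"x", while three independent accusations lower the score of "y".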


Regards,
--
Renaud Delbru
