Re: Poisonous models (was the bad word)

Renaud Delbru Sun, 18 Jul 2010 09:50:36 -0700

Hi Hugh,

To answer your question, Sindice will accept the document, perform reasoning and index it as it is. However, Sindice is somewhat robust to this kind of "poisonous" data. Sindice performs a particular kind of reasoning that we call "context-dependent" reasoning [1], in which inference is performed in the "context of the document". The inferences will only be true in the context of this document and will not have a global impact, i.e., they will not alter the inference on other documents. Therefore, Sindice avoids undesirable assertions. In fact, we do not restrict the freedom of expression of data publishers, unlike other approaches such as SAOR [2], where certain statements are considered invalid and ignored. Data publishers are allowed to reuse and extend ontologies or existing entities in any manner, but the consequences of their modifications will be confined to their own context and will not alter the intended semantics of the other RDF models published on the Web.
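To illustrate the idea of confining inference to a document's context, here is a minimal sketch in Python. It is not Sindice's implementation: the function names, the document URLs and the `ex:` identifiers are all hypothetical, and the only rule applied is the symmetric and transitive closure of owl:sameAs.

```python
from itertools import product

SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    """Return the triples plus symmetric/transitive owl:sameAs inferences."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        same = [(s, o) for s, p, o in inferred if p == SAME_AS]
        new = set()
        # Symmetry: a sameAs b  =>  b sameAs a
        for s, o in same:
            new.add((o, SAME_AS, s))
        # Transitivity: a sameAs b, b sameAs d  =>  a sameAs d
        for (a, b), (c, d) in product(same, same):
            if b == c and a != d:
                new.add((a, SAME_AS, d))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


def reason_per_context(documents):
    """Run inference separately for each document (its own context)."""
    return {url: same_as_closure(triples) for url, triples in documents.items()}


# Hypothetical documents: one "poisonous", one well-behaved.
documents = {
    "http://example.org/poison.rdf": {
        ("ex:a", SAME_AS, "ex:b"),
        ("ex:b", SAME_AS, "ex:c"),
    },
    "http://example.org/clean.rdf": {
        ("ex:b", "rdfs:label", "B"),
    },
}

contexts = reason_per_context(documents)
```

Here `ex:a owl:sameAs ex:c` is inferred only inside the context of the first document; the triples held for the second document are left unchanged, because nothing is ever merged into a global model.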

However, if somebody requests all documents stating <?s, owl:sameAs, dbpedia:Darby_Riordan>, Sindice will return the document http://data.totl.net/dave.rdf. But this problem can be tackled with appropriate ranking methodologies (based on link analysis methods such as [3]). Poisonous documents published on the web are unlikely to have any incoming links (or only from other poisonous documents, but this can be detected), and will therefore be ranked very low and will never appear in the top-k search results.
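As a rough illustration of why a document without incoming links ends up at the bottom of the ranking, here is a small sketch using plain PageRank by power iteration. This is a generic link analysis method, not the specific method described in [3], and the link graph and document names below are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank over a dict mapping each node to its outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: nobody links to "doc:poison".
links = {
    "doc:a": ["doc:b", "doc:c"],
    "doc:b": ["doc:c"],
    "doc:c": ["doc:a"],
    "doc:poison": ["doc:c"],
}

ranks = pagerank(links)
lowest = min(ranks, key=ranks.get)
```

With this graph, `lowest` is "doc:poison": it only ever receives the baseline share, so it would sort last in a rank-ordered result list.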


[1] http://renaud.delbru.fr/doc/pub/SSWS2008-context.pdf
[2] http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-21.pdf
[3] http://renaud.delbru.fr/doc/pub/eswc2010-ding.pdf

Regards,
--
Renaud Delbru

On 18/07/10 16:58, Hugh Glaser wrote:

Sure, Nathan may be.
But Richard and Toby moved into the poisoning world.
You can only use the techniques you describe if you have concepts of where 
things can/can't come from.
And as Toby says, if Google (or Sindice) took this...
What does happen if Sindice accepts this document?

Hugh

On 18 Jul 2010, at 05:54, "Daniël Bos" wrote:


I think Nathan isn't talking about poisoning models (which could be prevented 
using reification, or using quads, which include the source of the statement, 
and then only trusting selected statements), but about the problem of giving 
spammers a tool to collect email and postal addresses from the web much more 
easily, by simply parsing pages instead of scraping and somehow detecting the 
information.

Though I can see the danger in that, I personally don't think it is that much 
of an issue, since email addresses have always been easy to scrape, and postal 
addresses are in most cases easy to collect from e.g. business directories. 
Semantic markup makes it easier, but those wanting to collect this kind of data 
could and would do that anyway.

--
With kind regards,
Daniël Bos

On Jul 18, 2010 12:55 AM, "Hugh Glaser" wrote:

You better hope your system can cope with this.
http://data.totl.net/dave.rdf

Hugh

On 17 Jul 2010, at 11:35, "Nathan" wrote:

So, after seeing this question on s...
