Re: 200 OK with Content-Location might work: But maybe it can be simpler?

2010-11-05 Thread Robert Fuller



On 05/11/10 17:26, Nathan wrote:

Giovanni Tummarello wrote:

How about something that's totally independent from HEADER issues?

think normal people here. absolutely 0 interest to mess with headers
and http responses.. absolutely no business incentive to do it.

as a baseline think someone wanting to annotate with RDFa a hand
crafted, Apache-served html file.
really.. as simple as serving these people.

as simple as anyone who's using opengraph just copy pastes into their
HTML template.. as simple as this
really, please, it's the only thing that can work?


+1 from me - all this  uri and 303 nonsense, now other codes and
any form of HTTP awareness is best completely removed. uri#frag gives us
that semantic indirection we need, without anybody even noticing (and
allows 200 OK).


What about 404 ;-) ?

What about

http://iandavis.com/2010/303/toucan#FredFlintstone
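
To make the 404 point concrete: a client strips the fragment before it sends the request, so the server only ever sees the document URI and cannot answer 404 for a fragment that names nothing. A minimal sketch using Python's standard library (the helper name request_target is made up for illustration):

```python
from typing import Tuple
from urllib.parse import urldefrag


def request_target(uri: str) -> Tuple[str, str]:
    """Split a URI into the part sent to the server and the client-side fragment."""
    url, fragment = urldefrag(uri)
    return url, fragment


url, fragment = request_target("http://iandavis.com/2010/303/toucan#FredFlintstone")
print(url)       # http://iandavis.com/2010/303/toucan
print(fragment)  # FredFlintstone
```

Whatever status the server returns applies to the toucan document, not to #FredFlintstone.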





Best,

Nathan



--
Robert Fuller
Research Associate
Sindice Team
DERI, Galway
http://sindice.com/



Re: Is 303 really necessary - demo

2010-11-05 Thread Robert Fuller

On 05/11/10 16:50, Ian Davis wrote:

On Fri, Nov 5, 2010 at 4:42 PM, Robert Fuller wrote:

I submitted both urls to sindice earlier. Both were indexed and have the
same content. In the search results[1] one displays with title "A Toucan",
the other with title, "A Description of a Toucan".

http://sindice.com/search?q=toucan+domain%3Aiandavis.com&qt=term



So Sindice sees them as distinct resources and doesn't concern itself
with the lack of a 303 redirect?


Both pages returned an HTTP status code of 200 and some content. Sindice
extracted metadata from the content (using any23), and associated that
content with the requested URLs.


Sindice doesn't "expect" 303's, but it follows them.

This isn't always a good thing...
http://inspector.sindice.com/inspect?url=http://xmlns.com/foaf/0.1/IanDavis




Ian





Re: Is 303 really necessary - demo

2010-11-05 Thread Robert Fuller

Hi,

I submitted both urls to sindice earlier. Both were indexed and have the 
same content. In the search results[1] one displays with title "A 
Toucan", the other with title, "A Description of a Toucan".


http://sindice.com/search?q=toucan+domain%3Aiandavis.com&qt=term

Robert.

On 05/11/10 09:43, Ian Davis wrote:

Hi all,

To aid discussion I created a small demo of the idea put forth in my
blog post http://iand.posterous.com/is-303-really-necessary

Here is the URI of a toucan:

http://iandavis.com/2010/303/toucan

Here is the URI of a description of that toucan:

http://iandavis.com/2010/303/toucan.rdf

As you can see both these resources have distinct URIs.

I created a new property http://vocab.org/desc/schema/description to
link the toucan to its description. The schema for that property is
here:

http://vocab.org/desc/schema

(BTW I looked at the powder describedBy property and it's clearly
designed to point to one particular type of description, not a general
RDF one. I also looked at
http://ontologydesignpatterns.org/ont/web/irw.owl and didn't see
anything suitable)

Here is the URI Burner view of the toucan resource and of its
description document:

http://linkeddata.uriburner.com/about/html/http://iandavis.com/2010/303/toucan

http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan.rdf

I'd like to use this demo to focus on the main thrust of my question:
does this break the web and if so, how?

Cheers,

Ian

P.S. I am not fully caught up on the other thread, so maybe someone
has already produced this demo






Re: What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Robert Fuller



On 05/11/10 15:06, Ian Davis wrote:

On Fri, Nov 5, 2010 at 12:12 PM, Nathan wrote:

However, if you use 303's then the first GET redirects there, then you store
the ontology against the redirected-to URI, you still have to do 40+ GETs
but each one is fast with no response-body (ontology sent down the wire)
then the next request for the 303'd to URI comes right out of the cache.
It's still 40+ requests unless you code around it in some way, but it's
better than 40+ requests and 40+ copies of the single ontology.


But in practice, don't you look in your cache first? If you already
have a label for foaf:knows because you looked up foaf:mbox a few
seconds ago why would you issue another request?


Sindice would, because Fred could also define a label for foaf:knows in 
the flintstone schema. The Sindice contextualised reasoning is performed 
in a sandbox to ensure that Fred's malicious schema isn't going to 
pollute any inferencing from your document, unless your document also 
references Fred's schema. Without checking we can't be sure that 
foaf:knows and foaf:mbox are defined in the same ontology.
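
The caching behaviour discussed above can be sketched roughly as follows. This is an illustration only, not Sindice's code: OntologyCache, fake_fetch and the FAKE_SERVER table are hypothetical, and the "network" is a dictionary so the example runs offline. Every property URI still costs one request, because the client cannot know in advance which ontology defines it, but the ontology body is downloaded once and stored against the redirected-to URI.

```python
from typing import Callable, Dict, List, Tuple

# (status, payload): payload is the Location for a redirect, the body for a 200.
Response = Tuple[int, str]
REDIRECTS = (301, 302, 303, 307)


class OntologyCache:
    """Caches ontology bodies under the final (redirected-to) URI."""

    def __init__(self, fetch: Callable[[str], Response]) -> None:
        self._fetch = fetch
        self._bodies: Dict[str, str] = {}
        self.body_downloads = 0

    def get(self, uri: str, max_hops: int = 5) -> str:
        for _ in range(max_hops):
            if uri in self._bodies:          # body already held: no request needed
                return self._bodies[uri]
            status, payload = self._fetch(uri)
            if status in REDIRECTS:          # cheap response, no body on the wire
                uri = payload
                continue
            if status == 200:
                self.body_downloads += 1
                self._bodies[uri] = payload
                return payload
            raise LookupError(f"{uri} returned {status}")
        raise LookupError("too many redirects")


# Hypothetical publisher: both properties 303 to one schema document.
FAKE_SERVER: Dict[str, Response] = {
    "http://example.com/schema/latitude": (303, "http://example.com/schema"),
    "http://example.com/schema/longitude": (303, "http://example.com/schema"),
    "http://example.com/schema": (200, "<rdf:RDF/>"),
}
requests_made: List[str] = []


def fake_fetch(uri: str) -> Response:
    requests_made.append(uri)
    return FAKE_SERVER.get(uri, (404, ""))


cache = OntologyCache(fake_fetch)
cache.get("http://example.com/schema/latitude")
cache.get("http://example.com/schema/longitude")
print(len(requests_made), cache.body_downloads)  # 3 1
```

Three requests are made for two properties, but only one carries a response body.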






Re: What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Robert Fuller



So here's a couple of questions for those of you on the list who have
implemented Linked Data tools, applications, services, etc:

* Do you rely on or require HTTP 303 redirects in your application? Or
does your app just follow the redirect?


For sindice - no we do not rely on or require them, merely follow.


* Would your application tool/service/etc break or generate inaccurate
data if Ian's pattern was used to publish Linked Data?


It wouldn't break sindice.

However, with regard to publishing ontologies, we could expect
additional overhead if the same content is delivered when retrieving
different resources, for example http://example.com/schema/latitude and
http://example.com/schema/longitude. In such a case an ETag could be used
to suggest the contents are identical, but I am not sure that is a
practical solution. I expect that without 303 it will be more difficult,
in particular, to publish and process ontologies.
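
One way the ETag idea might look in a client, purely as a sketch: EtagIndex is a hypothetical helper, not anything Sindice implements. HTTP defines an ETag per resource, so an equal ETag on two URIs is only a hint that the bodies match. The sketch therefore limits the comparison to one host and ignores weak validators. The client also still has to make a request to each URI to learn its ETag, which may be part of why the approach is not obviously practical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


class EtagIndex:
    """Maps (host, strong ETag) to the first URI seen with that ETag."""

    def __init__(self) -> None:
        self._first_seen: Dict[Tuple[str, str], str] = {}

    def canonical_uri(self, uri: str, etag: Optional[str]) -> str:
        """Return the URI whose stored body may be reused for `uri`."""
        # No ETag, or a weak one (W/"..."), gives no basis for assuming equal bytes.
        if not etag or etag.startswith("W/"):
            return uri
        key = (urlsplit(uri).netloc.lower(), etag)
        # setdefault records the first URI for this key and returns it thereafter.
        return self._first_seen.setdefault(key, uri)


index = EtagIndex()
first = index.canonical_uri("http://example.com/schema/latitude", '"abc123"')
second = index.canonical_uri("http://example.com/schema/longitude", '"abc123"')
print(first == second)  # True
```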


Rob.



Re: Is 303 really necessary?

2010-11-04 Thread Robert Fuller
It has been pointed out to me that the many resources we are 
encountering for

http://opengraphprotocol.org/schema/latitude
are actually wrong, so deserving of a 404; the resource should correctly
be written:


http://ogp.me/ns#latitude

But never mind, that doesn't resolve either...

On 04/11/10 18:38, Robert Fuller wrote:

Hi,

Feel free anyone to suggest opengraph use 301, 302, 303, 307 (we support
them all), since at the moment with a 404 they are missing out on all
the benefit of the sindice reasoner ;-)

http://opengraphprotocol.org/schema/latitude

It is common when publishing an ontology to have the url for each
property redirect to the rdf schema. It works great.

I would expect that a request for the aforementioned url (with accept
header set correctly) would redirect me to (probably)
http://opengraphprotocol.org/schema

Which would download nicely with a 200 status code (it doesn't, you need
to get the ontology from here
http://opengraphprotocol.org/schema/?format=rdf )

Later, when we encounter another opengraph property
http://opengraphprotocol.org/schema/longitude
We would also hope to get a 303, which would again redirect us to
http://opengraphprotocol.org/schema

Of course, we don't want to bring down opengraph server, so we have
already cached the schema the first time we downloaded (if it worked)
and know not to fetch it again now.

In my experience processing millions of rdf documents daily, the 303 has
proven quite useful and very efficient, and I would definitely recommend
its use to opengraph and other publishers of ontologies.

Robert.



On 04/11/10 13:22, Ian Davis wrote:

Hi all,

The subject of this email is the title of a blog post I wrote last
night questioning whether we actually need to continue with the 303
redirect approach for Linked Data. My suggestion is that replacing it
with a 200 is in practice harmless and that nothing actually breaks on
the web. Please take a moment to read it if you are interested.

http://iand.posterous.com/is-303-really-necessary

Cheers,

Ian








Re: Is 303 really necessary?

2010-11-04 Thread Robert Fuller

Hi,

Feel free anyone to suggest opengraph use 301, 302, 303, 307 (we support 
them all), since at the moment with a 404 they are missing out on all 
the benefit of the sindice reasoner ;-)


http://opengraphprotocol.org/schema/latitude

It is common when publishing an ontology to have the url for each 
property redirect to the rdf schema. It works great.


I would expect that a request for the aforementioned url (with accept 
header set correctly) would redirect me to (probably)

http://opengraphprotocol.org/schema

Which would download nicely with a 200 status code (it doesn't, you need 
to get the ontology from here

http://opengraphprotocol.org/schema/?format=rdf )

Later, when we encounter another opengraph property
http://opengraphprotocol.org/schema/longitude
We would also hope to get a 303, which would again redirect us to
http://opengraphprotocol.org/schema

Of course, we don't want to bring down opengraph server, so we have 
already cached the schema the first time we downloaded (if it worked) 
and know not to fetch it again now.


In my experience processing millions of rdf documents daily, the 303 has 
proven quite useful and very efficient, and I would definitely recommend 
its use to opengraph and other publishers of ontologies.
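
On the publisher side, the pattern described above (each property URL redirects to the schema document) could be sketched as a small routing function. Everything here is an assumption made for illustration: the /schema path, the KNOWN_PROPERTIES set and the respond function are not taken from opengraph or any real server. The function is framework-neutral and returns a status, headers and body, so it can be wired into whatever web server is in use.

```python
from typing import Dict, Tuple

SCHEMA_PATH = "/schema"  # hypothetical location of the RDF schema document
SCHEMA_BODY = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
KNOWN_PROPERTIES = {"latitude", "longitude"}  # hypothetical property names


def respond(path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on `path`."""
    if path == SCHEMA_PATH:
        # The schema document itself is served directly.
        return 200, {"Content-Type": "application/rdf+xml"}, SCHEMA_BODY
    prefix = SCHEMA_PATH + "/"
    if path.startswith(prefix) and path[len(prefix):] in KNOWN_PROPERTIES:
        # Every property points at the same document, so clients fetch it once.
        return 303, {"Location": SCHEMA_PATH}, b""
    return 404, {}, b""


print(respond("/schema/latitude")[0])   # 303
print(respond("/schema/longitude")[1])  # {'Location': '/schema'}
```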


Robert.



On 04/11/10 13:22, Ian Davis wrote:

Hi all,

The subject of this email is the title of a blog post I wrote last
night questioning whether we actually need to continue with the 303
redirect approach for Linked Data. My suggestion is that replacing it
with a 200 is in practice harmless and that nothing actually breaks on
the web. Please take a moment to read it if you are interested.

http://iand.posterous.com/is-303-really-necessary

Cheers,

Ian






Re: Subjects as Literals

2010-07-06 Thread Robert Fuller

+1

On 06/07/10 09:23, Danny Ayers wrote:

I've been studiously avoiding this rat king of a thread, but just on
this suggestion:

On 2 July 2010 11:16, Reto Bachmann-Gmuer wrote:
...

Serialization formats could support

"Jo" :nameOf :Jo

as a shortcut for

[ owl:sameAs "Jo"; :nameOf :Jo]

and a store could (internally) store the latter as

"Jo" :nameOf :Jo

for compactness and efficiency.


what about keeping the internal storage idea, but instead of owl:sameAs, using:

:Jo rdf:value "Jo"

together with

:Jo rdf:type rdfs:Literal

?

Cheers,
Danny.






Re: Show me the money - (was Subjects as Literals)

2010-07-01 Thread Robert Fuller

Saw them, smiled, threw them in the bin.

I can't present a use case for "Literals as Subject", but I did have a
relevant experience recently: having written a reasoner for Sindice,
I was briefly intrigued to discover that executing some OWL rules leads
to the production of statements where literals appear in the subject
position.


As the reasoner was written primarily with performance and memory 
constraints in mind, it never occurred to me to investigate whether the 
principles of rdf inferencing prohibit generating such statements.


But since triples with a literal in the subject position are currently not
of any interest to us, we simply discard them during a filtering phase.
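
A filtering phase of this kind might look like the sketch below. It is not the Sindice reasoner's code: the term classes and the function name drop_literal_subjects are invented for the example. The inferred triple is an assumed illustration of how a literal can reach the subject position, for instance when a symmetry rule is applied to a statement whose object is a literal.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[URI, BNode, Literal]
Triple = Tuple[Term, Term, Term]


def drop_literal_subjects(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Yield only the triples whose subject is not a literal."""
    for triple in triples:
        if not isinstance(triple[0], Literal):
            yield triple


same_as = URI("http://www.w3.org/2002/07/owl#sameAs")
jo = URI("http://example.com/Jo")
inferred = [
    (jo, same_as, Literal("Jo")),   # ordinary triple, kept
    (Literal("Jo"), same_as, jo),   # literal subject, discarded
]
kept = list(drop_literal_subjects(inferred))
print(len(kept))  # 1
```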


Kind regards,
Robert

On 01/07/10 17:05, John Erickson wrote:

RE getting "a full list of the benefits," surely if it's being
discussed here, "Literals as Subjects" must be *somebody's* Real(tm)
Problem and the benefits are inherent in its solution?

And if it isn't, um, why is it being discussed here? ;)

On Thu, Jul 1, 2010 at 11:46 AM, Henry Story wrote:

Jeremy, the point is to start the process, but put it on a low burner,
so that in 4-5 years time, you will be able to sell a whole new RDF+ suite to 
your customers with this new benefit.  ;-)

On 1 Jul 2010, at 17:38, Jeremy Carroll wrote:



I am still not hearing any argument to justify the costs of literals as subjects

I have loads and loads of code, both open source and commercial that assumes 
throughout that a node in a subject position is not a literal, and a node in a 
predicate position is a URI node.


but is that really correct? Because bnodes can be names for literals, and so
you really do have literals in subject positions, no?



Of course, the "correct" thing to do is to allow all three node types in all 
three positions. (Well four if we take the graph name as well!)

But if we make a change, all of my code base will need to be checked for this
issue.
This costs my company maybe $100K (very roughly)
No one has even shown me $1K of advantage for this change.


I agree, it would be good to get a full list of the benefits.



It is a no brainer not to do the fix even if it is technically correct

Jeremy















Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

2010-06-08 Thread Robert Fuller

Kingsley Idehen wrote:

The LOD Cloud Cache at DERI is a live Virtuoso instance with 15 Billion+
Triples loaded. It covers as much of the LOD Cloud as we've been able to
get our hands on plus 6.4 Billion Triples from the Data.Gov effort.


I'll drop a more detailed note about this instance (via blog post) once 
we are done with data loading (there's a massive collection of eCommerce 
oriented Products & Services data to be loaded amongst others).


I wonder is this data load the culprit responsible for the "massive 
crawling"?





Re: Please stop massive crawling against http://openean.kaufkauf.net/id/

2010-06-08 Thread Robert Fuller

Hi,

Sindice clearly identifies itself in the user agent http header. 
Currently we use these user agents:


1. "Mozilla/5.0 (compatible; sindice-fetcher/0.1.0 +http://sindice.com/developers/bot)"

2. "SindiceFetcher/Ping Manager (http://sindice.com/developers/bot)"

3. "sindice.net ontology fetcher"
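
A site operator who wants to check whether requests came from these agents could use something like the following sketch. The function names are made up, and the case-insensitive substring test is a heuristic chosen because all three strings above contain "sindice". A User-Agent header can be spoofed, so a match is a hint and not proof.

```python
from collections import Counter
from typing import Iterable


def looks_like_sindice(user_agent: str) -> bool:
    """True if the User-Agent string mentions Sindice (case-insensitive)."""
    return "sindice" in user_agent.lower()


def count_sindice_hits(user_agents: Iterable[str]) -> Counter:
    """Count requests per Sindice-looking User-Agent string."""
    return Counter(ua for ua in user_agents if looks_like_sindice(ua))


seen = [
    "Mozilla/5.0 (compatible; sindice-fetcher/0.1.0 +http://sindice.com/developers/bot)",
    "sindice.net ontology fetcher",
    "sindice.net ontology fetcher",
    "Mozilla/5.0 (X11; Linux x86_64) SomeBrowser/1.0",  # unrelated, made-up agent
]
hits = count_sindice_hits(seen)
print(sum(hits.values()))  # 3
```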

Niceness is implemented in our main fetcher. In some cases there may be 
bursts on sites providing distributed ontologies. Speaking with the 
group here it seems unlikely that we have not been hitting kaufkauf.net;
however, if you can provide an IP address I can do some further
verification.


I understand that http://lod.openlinksw.com/sparql is now hosted at 
DERI, and I wonder could some of the traffic be related to that? Again, 
if you can provide an IP address I will do some further verification.



Kind regards,
Rob.

--
Robert Fuller
Research Associate
DERI, Galway