Hohoho - Kibi 0.2 available

2015-12-25 Thread Giovanni Tummarello
If you're a Real Geek you know this is a great time of the year to play
around with new data toys! :)

So give Kibi 0.2 a spin, for it is finally here...

*What is Kibi?*

Kibi 0.2 is a Kibana fork for Data Intelligence.

For those that don't know Kibana: it's quite an amazing data search/analytics
tool by Elasticsearch, with tens of thousands of industrial users and an
amazing open source community.

Kibi builds on Kibana and adds, bit by bit, some of the best ideas we've
talked about here for a long time.

To make it real, Kibi leverages our high tech (distributed/ultra optimized)
Elasticsearch plugins for data joins and (soon) graph-like interfacing with
Elasticsearch data.

This version, for example, introduces "Relational Configuration" and
"Timeline Widget" which operate across Elasticsearch indexes and allow "set
to set" navigation.

Announcement below:
Kibi 0.2 is here - (hohoho)

*What's new?*
Relational Panel Filter, in action

Kibi 0.2 introduces a new way to filter relational data: just click on the
checkboxes of the "relational panel" to show only the related records in the
dashboards, live updated. It also works great in conjunction with the
standard Kibi relational filters.


Configure the "relational schema" of your data in Elasticsearch

Do this to enable the relational panel above and many more smart behaviours
(today: the relational panel; next, so much more stuff).
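
(To give the idea - this is not Kibi's actual settings format, just a sketch of
what a relational configuration amounts to: declaring which field of which
index joins to which field of another index; all names below are made up.)

    # Conceptual sketch only - not Kibi's configuration syntax.
    # A "relation" simply pairs up two index/field endpoints.
    relations = [
        {"from": {"index": "companies",   "field": "id"},
         "to":   {"index": "investments", "field": "company_id"}},
        {"from": {"index": "companies",   "field": "id"},
         "to":   {"index": "articles",    "field": "mentioned_company"}},
    ]

    for r in relations:
        print("{}.{} <-> {}.{}".format(
            r["from"]["index"], r["from"]["field"],
            r["to"]["index"], r["to"]["field"]))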




Say hi to our first Cross Index Widget: Timeline

Data Intelligence means understanding your "target" using data from many
different indexes.

Once Kibi is configured to know how indexes and datasources are related,
things get... way more powerful. Like our new Cross Index Timeline widget.

Here is everything about "Songbird", coming from 4 different saved searches:
a segment showing company creation until company deadpooling, then
investments, competing companies, articles mentioning it...




User configurable data sources, full REST API support

Users can now configure datasources (SQL, general REST APIs, SPARQL over
HTTP) directly from the UI.

Use it in conjunction with our "query templates" and "query based
aggregators" (to power analytics, filters etc.). Very very powerful; watch
for blog posts soon :)
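
As a rough illustration of what such a datasource ultimately issues (this is
not Kibi's own configuration syntax - just a sketch of a plain SPARQL-over-HTTP
request in Python, against a made-up endpoint):

    import requests

    # Hypothetical endpoint; any server speaking the standard SPARQL
    # protocol (query via HTTP GET, JSON results) would do.
    ENDPOINT = "http://example.org/sparql"
    QUERY = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?company ?label
    WHERE { ?company rdfs:label ?label }
    LIMIT 10
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    # Standard SPARQL JSON results: results -> bindings -> variable -> value
    for row in resp.json()["results"]["bindings"]:
        print(row["company"]["value"], row["label"]["value"])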



Merry Xmas and... what's cooking for 2016?

We promised that Kibi would be an "in sync" fork of the main Kibana project,
and we'll deliver.

So to begin 2016, Kibi 0.3 will come out based on Kibana 5.x (as it is
called in the trunk) and will therefore bring Kibana plugin compatibility,
Timelion, black themes and all the other good stuff :)

... in the meanwhile, may you have a great Xmas break - and we look forward
to working with you in 2016!

http://siren.solutions/kibi

Subscribe to our YouTube channel!
https://www.youtube.com/channel/UCKGsC-vD28r7hW6T9QspKPA

Best Wishes to all!


Re: Are there any datasets about companies? ( DBpedia Open Data Initiative)

2015-11-03 Thread Giovanni Tummarello
Hi Sebastian, just for context

(I am collaborating with a leading market data provider.) There are 17M+
organizations in Italy alone (either alive or dead... but maybe still worth
being in a database).

Maaaybe, just maaaybe, it's worth talking to some of these organizations and
campaigning for the opening up of a super minimal dataset, e.g. just name,
registration city, status (dead or alive).

The rationale is that they could receive more hits from paying customers
wanting all the "rest of the data".

But it will be quite difficult: one has to come up with a good pitch, and
have a lot of patience. Consider that PermID seems to have one such super
open dataset, so maybe that's a starting point.

Self-catered "add your company" approaches are not going to work, in my
opinion.

Gio

On Tue, Nov 3, 2015 at 2:05 PM, Nandana Mihindukulasooriya <
nmihi...@fi.upm.es> wrote:

> Hi Sebastian,
>
> Open PermID and Open Calais [1,2] initiatives from Thomson Reuters with
> Linked Data + bulk download (CC-BY 4.0) might be of interest to your
> work. Brian Ulicny presented it in ISWC 2015 [3] and it has identifiers
> curated and maintained by Thomson Reuters for more than 3.5 million
> organizations.
>
> It also has a lot of useful information about those organizations.
> http://tinyurl.com/permid-org-properties
> http://tinyurl.com/permid-triple-patterns
>
> Best Regards,
> Nandana
>
> [1] https://permid.org/faq
> [2] http://www.opencalais.com/about/
> [3] https://twitter.com/nandanamihindu/status/653232796874506240
>
> On Tue, Nov 3, 2015 at 4:17 PM, Sebastian Hellmann <
> hellm...@informatik.uni-leipzig.de> wrote:
>
>> [Apologies for cross-posting]
>>
>> Dear all,
>> this message is part announcement of an open data initiative and part
>> call for feedback and support.
>>
>> We are considering working on creating a free, open and interoperable
>> dataset on companies and organisations, which we are planning to integrate
>> into DBpedia+ and offer as a dump download. As we are in a very early phase
>> of the endeavour, we would like to know whether there is existing work in
>> this area.
>>
>> We are looking for any available datasets which have information about
>> companies and other organizations in any language and any country. Ideally,
>> the datasets are:
>> 1. downloadable as dump
>> 2. openly licensed , e.g. CC-BY following the
>> http://opendefinition.org/
>> 3. in an easily parseable format, e.g. RDF or CSV and not PDF
>>
>> But hey! Send around anything you know, and we will look at it and see
>> whether we can make use of it. You can reach us either by replying to this
>> email or by sending feedback directly to me and Kay Müller.
>> If you have any private/closed data, please contact us as well. We might
>> make use of it to cross-reference and validate public/open data with it. Or
>> just learn from it to build a good scheme.
>>
>> We started a link collection here (and attached the current status at the
>> end of this email)
>>
>> https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
>> Also we started to collect potential identifiers for linking here:
>>
>> https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
>>
>> Regards and thank you for any support on this,
>> Sebastian and Kay
>>
>> ##
>>
>>
>> https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
>>
>> * Open Company Data
>> Identifiers for companies/organisation
>> URIs (Linked Data/Semantic Web)
>> Downloadable Datasets with Company info (confirmed)
>> Portals with no bulk downloads
>> Portals, we will still need to investigate
>>
>> Identifiers for companies/organisation - Table with identifiers:
>> https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
>>
>> URIs (Linked Data/Semantic Web) - DBpedia/Wikipedia/Wikidata URIs -

Re: Announcing KIBI a Data Intelligence platform (with some "semantic web")

2015-09-21 Thread Giovanni Tummarello
Thanks a lot Alfredo.
We put a good amount of care into making Kibi as easy as possible to try and
deploy. A full demo comes included in the distribution; it should be a
couple of clicks (on Windows) or just two simple commands on Mac/Linux :).
Please refer to the user guide for a guided tour:
http://siren.solutions/kibi/docs/current/

Gio


On Mon, Sep 21, 2015 at 3:30 PM, Alfredo Serafini <ser...@gmail.com> wrote:

> wow, it seems a great work! :-)
>
> I'll try it as soon as I can, thank you for sharing
>
> 2015-09-21 15:56 GMT+02:00 Giovanni Tummarello <g.tummare...@gmail.com>:
>
>> Dear all
>>
>> we at Siren Solutions are very happy to announce today Kibi, an Open
>> Source platform for "Data Intelligence".
>>
>> Kibi is a "friendly fork" of Kibana - an amazing platform for browsing
>> data and getting analytics backed by Elasticsearch.
>>
>> Kibi extends Kibana with the ability to handle relational data, either
>> via cross Elasticsearch index joins, or via querying external SQL or SPARQL
>> (!) endpoints.
>>
>> With these extensions, Kibi can deliver ultrafast, realtime,
>> scalable search and analytics (quasi BI-grade) on mixed/semi-structured
>> datasets - in an entirely user customizable data environment.
>>
>> Possible applications - among which several from actual early adopters:
>>
>>
>>- Security and IP intelligence: display which servers are being
>>attacked by a set of malicious IP addresses, stored in a separate index.
>>- News/Financial Intelligence: perform analytics on companies
>>mentioned in social media streams, news feeds and analysts reports; show
>>related financial information over a custom time period.
>>- Business Intelligence: * what are the most purchased products by
>>customers that during any email or support interactions have mentioned the
>>name of a competitor in the past quarter?*
>>- Life Science: browse targets, references, formulae and molecular
>>structures related to the papers from a specific author.
>>- Law Enforcement: display information about suspects and
>>offenders; *extend your search with filters created by querying an
>>external high performance graph store.*
>>- Internet of things, sensors data: display the location of the
>>sensors on a map, restrict the visualization to a specific area, then
>>display all the communication logs generated by the sensors in the area in
>>real time.
>>- Legal Practice Management: display all the information about the
>>cases related to a specific topic, outcome or time, then drill down
>>on related cases.
>>- Mobility planning: perform analytics on vehicle behaviour by
>>joining traffic data, vehicle registration numbers, driver licenses and
>>violations, e.g. *see the top five violations from drivers under 30
>>years driving a car with a power above a certain threshold in the past 
>> year*
>>.
>>- Local authority planning: relate building permission
>>documents with information about architects, owners and nearby buildings.
>>
>>
>> But most of all Kibi is a lot of fun to us, give it a shot :) - the
>> distribution comes preloaded with a nice large relational data demo. A
>> couple of clicks and you'll be playing with it.
>>
>> *Blog post with details:*
>>
>> http://siren.solutions/kibi-a-kibana-fork-for-data-intelligence/
>>
>> *Homepage:*
>>
>> http://siren.solutions/kibi
>>
>> Screencast (6m) - much better for those who know Kibana, but will still
>> give some ideas.
>>
>> https://www.youtube.com/watch?v=Zkig4iXl_HM
>>
>> *Open Source*
>>
>> Kibi is open source, released as Apache (the frontend - the Kibana fork)
>> and AGPL (the backend - the SIREn relational join plugins).
>>
>> *The relationship with "Semantic Web"*
>>
>> Within Kibi and in its roadmap are numerous features which are inspired
>> by or inherit some of the good ideas of the SW. For example, Kibi uses "URIs"
>> internally also to point at records, be these in Elasticsearch or SQL
>> databases. Also, support for taxonomies and graph queries will be coming
>> relatively soon, as well as simple inference.
>>
>> *Acknowledgements:*
>>
>> Acknowledgements go to the SmartOpenData FP7 project - where Kibi is
>> providing analytics on geographical data - and the MixedEmotions H2020
>> project, where Kibi is used for emotion analytics on partners'
>> media. Acknowledgements also go to the Data and Knowledge Management
>> Research Unit at the Fondazione Bruno Kessler (FBK) institute for the project
>> idea support and feedback.
>>
>> Giovanni Tummarello
>> SIREn Solutions (Formerly Sindicetech)
>>
>
>


Announcing KIBI a Data Intelligence platform (with some "semantic web")

2015-09-21 Thread Giovanni Tummarello
Dear all

we at Siren Solutions are very happy to announce today Kibi, an Open Source
platform for "Data Intelligence".

Kibi is a "friendly fork" of Kibana - an amazing platform for browsing data
and getting analytics backed by Elasticsearch.

Kibi extends Kibana with the ability to handle relational data, either via
cross Elasticsearch index joins, or via querying external SQL or SPARQL (!)
endpoints.

With these extensions, Kibi can deliver ultrafast, realtime, scalable
search and analytics (quasi BI-grade) on mixed/semi-structured datasets - in
an entirely user customizable data environment.
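
To give a concrete feel for what a cross-index join means here, below is a
naive client-side sketch in Python (this is emphatically NOT the SIREn plugin
API - the plugin performs the join inside Elasticsearch, at scale; the index
and field names are made up, and a local Elasticsearch on port 9200 is assumed):

    import requests

    ES = "http://localhost:9200"  # assumed local Elasticsearch

    # Step 1: collect the malicious IPs from one index.
    hits = requests.post(ES + "/malicious-ips/_search",
                         json={"size": 1000, "query": {"match_all": {}}}).json()
    ips = [h["_source"]["ip"] for h in hits["hits"]["hits"]]

    # Step 2: filter the logs index by those IPs, i.e. "join" on the IP field.
    attacked = requests.post(ES + "/server-logs/_search",
                             json={"query": {"terms": {"source_ip": ips}}}).json()
    for h in attacked["hits"]["hits"]:
        print(h["_source"])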

Possible applications - among which several from actual early adopters:


   - Security and IP intelligence: display which servers are being attacked
   by a set of malicious IP addresses, stored in a separate index.
   - News/Financial Intelligence: perform analytics on companies mentioned
   in social media streams, news feeds and analysts reports; show related
   financial information over a custom time period.
   - Business Intelligence: * what are the most purchased products by
   customers that during any email or support interactions have mentioned the
   name of a competitor in the past quarter?*
   - Life Science: browse targets, references, formulae and molecular
   structures related to the papers from a specific author.
   - Law Enforcement: display information about suspects and offenders;
   *extend your search with filters created by querying an external
   high-performance graph store.*
   - Internet of things, sensors data: display the location of the sensors
   on a map, restrict the visualization to a specific area, then display all
   the communication logs generated by the sensors in the area in real time.
   - Legal Practice Management: display all the information about the cases
   related to a specific topic, outcome or time, then drill down on related
   cases.
   - Mobility planning: perform analytics on vehicle behaviour by joining
   traffic data, vehicle registration numbers, driver licenses and violations,
   e.g. *see the top five violations from drivers under 30 years driving a
   car with a power above a certain threshold in the past year*.
   - Local authority planning: relate building permission documents with
   information about architects, owners and nearby buildings.


But most of all, Kibi is a lot of fun to us - give it a shot :). The
distribution comes preloaded with a nice large relational data demo. A
couple of clicks and you'll be playing with it.

*Blog post with details:*

http://siren.solutions/kibi-a-kibana-fork-for-data-intelligence/

*Homepage:*

http://siren.solutions/kibi

Screencast (6m) - much better for those who know Kibana, but will still
give some ideas.

https://www.youtube.com/watch?v=Zkig4iXl_HM

*Open Source*

Kibi is open source, released as Apache (the frontend - the Kibana fork) and
AGPL (the backend - the SIREn relational join plugins).

*The relationship with "Semantic Web"*

Within Kibi and in its roadmap are numerous features which are inspired by
or inherit some of the good ideas of the SW. For example, Kibi uses "URIs"
internally also to point at records, be these in Elasticsearch or SQL
databases. Also, support for taxonomies and graph queries will be coming
relatively soon, as well as simple inference.

*Acknowledgements:*

Acknowledgements go to the SmartOpenData FP7 project - where Kibi is providing
analytics on geographical data - and the MixedEmotions H2020 project, where
Kibi is used for emotion analytics on partners' media. Acknowledgements also
go to the Data and Knowledge Management Research Unit at the Fondazione Bruno
Kessler (FBK) institute for the project idea support and feedback.

Giovanni Tummarello
SIREn Solutions (Formerly Sindicetech)


Re: DBpedia-based RDF dumps for Wikidata

2015-05-15 Thread Giovanni Tummarello
Hi Dimitris, everyone in the team: congratulations, great job... it will
certainly be useful.

Gio

On Fri, May 15, 2015 at 11:28 AM, Dimitris Kontokostas 
kontokos...@informatik.uni-leipzig.de wrote:

 Dear all,

 Following up on the early prototype we announced earlier [1] we are happy
 to announce a consolidated Wikidata RDF dump based on DBpedia.
 (Disclaimer: this work is not related or affiliated with the official
 Wikidata RDF dumps)

 We provide:
  * sample data for preview http://wikidata.dbpedia.org/downloads/sample/
  * a complete dump with over 1 Billion triples:
 http://wikidata.dbpedia.org/downloads/20150330/
  * a  SPARQL endpoint: http://wikidata.dbpedia.org/sparql
  * a Linked Data interface: http://wikidata.dbpedia.org/resource/Q586

 Using the wikidata dump from March we were able to retrieve more than 1B
 triples, 8.5M typed things according to the DBpedia ontology along with 48M
 transitive types, 6.4M coordinates and 1.5M depictions. A complete report
 for this effort can be found here:
 http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf

 The extraction code is now fully integrated in the DBpedia Information
 Extraction Framework.

 We are eagerly waiting for your feedback and your help in improving the
 DBpedia to Wikidata mapping coverage
 http://mappings.dbpedia.org/server/ontology/wikidata/missing/

 Best,

 Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian
 Hellmann

 [1]
 http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg06936.html

 --
 Dimitris Kontokostas
 Department of Computer Science, University of Leipzig  DBpedia
 Association
 Projects: http://dbpedia.org, http://aligned-project.eu
 Homepage: http://aksw.org/DimitrisKontokostas
 Research Group: http://aksw.org




Re: Survey on Faceted Browsers for RDF data ?

2015-04-27 Thread Giovanni Tummarello
Hi Hugh, Christian

Not sure. This [1] takes RDF as an input; the user couldn't care less about
the internals, and just gets a fast, powerful relational browser. Nothing here
is SPARQL; there isn't even a triplestore. RDF is processed via Hadoop, Solr
indexes are created, no URI is looked up.

Still, it might be described as a faceted browser for RDF...?

Gio

[1] https://www.youtube.com/watch?v=TW4Po6re6LY


On Mon, Apr 27, 2015 at 3:24 PM, Hugh Glaser h...@glasers.org wrote:

 Hi.
  On 27 Apr 2015, at 14:02, Christian Morbidoni 
 christian.morbid...@gmail.com wrote:
 
  Dear Bernadette, all
 
  I can surely share my list, and I'll do as soon as I find some time to
 give it some structure and write in proper english...
 
  Honestly I got a bit stuck asking myself "What exactly am I looking
 for?" In other words: what is exactly a faceted browser for RDF data?
 I am not surprised - my head started to hurt when I thought about it!

  Does it mean that it has to query a SPARQL endpoint with no
 intermediaries, in real-time? In fact, this approach is not the best in my
 opinion... one probably needs to materialize data in some other more
 facets-friendly system (e.g. Solr, Elasticsearch) to get good performance
 (I might be wrong but this is what my - limited - experience told me).
 Woah there!
 I would say that is exactly where a faceted browser stops.
 If there is free text search going on, then it isn’t really doing Linked
 Data, or it is doing much more in addition.
 (This is the Linked Data list.)
 So I would say that some constraints are that a faceted browser is
 something that lets you look at things identified by Linked Data URIs. It
 shows facets of those URIs, primarily by the Linked Data look up (through
 SPARQL or URI resolution), and in particular, doesn’t do (a lot of?) facets
 from text search engines especially if they don’t represent their data as
 RDF.

 Best
  Then I started asking myself... if you put a SPARQL connector, then every
 existing faceted browser can be an RDF data faceted browser...
  So...what is the kind of tools that you think should go in this list?
 All existing systems that provide faceted browsing functionality? And what
 kind of features should a comparison take into account? may be how is it
 easy to connect a tool to a SPARQL endpoint? not sure...
 
  best,
 
  Christian
 
  On Wed, Apr 22, 2015 at 7:17 PM, Bernadette Hyland 
 bhyl...@3roundstones.com wrote:
  Hi Christian,
  If you produce a list of platforms/browsers for RDF, you'd create a
 valuable resource for the open data / Web of data community.  Thanks in
 advance.  Please share with the list when completed.
 
  Please add the Callimachus Project to your list.[1]  Callimachus is an
 Open Source web application server. It's an actively supported Open Source
 project that commercial companies, including 3 Round Stones support.
 Callimachus is on GitHub.[2]
 
  Used by government agencies, healthcare organizations and scientific
 researchers, Callimachus is used to rapidly build and visualize data
 from the public Web or behind the firewall. It uses a Linked Data approach
 and is based on W3C data standards, including RDF.
 
  Developers use a range of JavaScript libraries for visualizations,
 including D3 and Google Charts. Here is a sampling of apps that use
 Callimachus -- I share these to show it goes well beyond faceted browsing.
 
  Open Data Directory - Simple app - A crowdsourced community run
 directory of organizations using Linked Data for projects, see the W3C Open
 Data Directory [3]
 
  GeoHealth US - In beta. GeoHealth.us generates hundreds of millions of
 data-driven pages, including visualizations (heat maps, pollution reports,
 etc) related to environmental exposure and related diseases. It will be
 launched at the upcoming National Health Datapalooza in Washington DC in
 early June.[4]
 
  ORGpedia - A research project funded by the Alfred P. Sloan Foundation
 and led by New York University Professor Beth Noveck's team at the Wagner
 School of Public Policy.[5]
 
  WeatherHealth - A pilot for Sentara Healthcare. It combines data from
 multiple government open data sites to demonstrate  the power of patient
 education for better health.[6]
 
  Linked Data Books website.[7] This community run site publishes
 resources for developers, executives and academics. It's open to anyone who
 wishes to add a publication to the list.  If during your research you
 identify some good books to add, please send us an email. The Linked Data
 Books website was created using the Callimachus Project.
 
  Lastly, Callimachus served as a reference implementation for the Linked
 Data Platform.[8]
 
  Cheers,
 
  Bernadette Hyland
  CEO, 3 Round Stones, Inc.
 
  http://3roundstones.com  || http://about.me/bernadettehyland
 
  
  [1] http://callimachusproject.org
 
  [2] https://github.com/3-Round-Stones/callimachus/
 
  [3] The Open Data Directory - see http://dir.w3.org
 
 [4] Environmental exposures & diseases mapper - see 

Re: Enterprise information system

2015-02-25 Thread Giovanni Tummarello
Hugh,

I think if you send them down a route where they have to write bespoke
software (which uses RDF concepts, and is hard to find developers to write and
maintain) for purposes for which mature, widely tested and widely spread
software exists, you'd be doing them a disservice.

Eventually they'll find someone showing them how these things are normally
done and they'll say "hey, but this is what we really need - give it to us now".
At that point this will possibly spoil both your reputation with them and
their perception of LD technologies, which could on the other hand be
useful if used in moderation - or in domains where data variability is
indeed extreme.

I would recommend looking for a good open source personnel or project management
system (groupware etc.) and seeing if it makes sense to introduce concepts such
as unique identifiers used across the organization (which could be
resolvable URIs, thus giving you a homepage for every core concept of the
company). But be flexible even in this case if you are to add any LD at
all... people often prefer type+number (e.g. personnel ID, project code) over
URIs, so if you do a global lookup interface for all, don't insist they must
use URIs to find something. However, if anything does in fact show up at a
stable and nice URI in their browser, they'll naturally refer to it when
passing each other references in emails etc. - but this is the same as
what they would be doing with any reputable content management system.

my2c
Gio

On Wed, Feb 25, 2015 at 10:06 PM, Hugh Glaser h...@glasers.org wrote:

 So, here’s a thing.

 Usually you talk to a company about introducing Linked Data technologies
 to their existing IT infrastructure, emphasising that you can add stuff to
 work with existing systems (low risk, low cost etc.) to improve all sorts
 of stuff (silo breakdown, comprehensive dashboards, etc..)

 But what if you start from scratch?

 So, the company wants to base all its stuff around Linked Data
 technologies, starting with information about employees, what they did and
 are doing, projects, etc., and moving on to embrace the whole gamut.
 (Sort of like a typical personnel management core, plus a load of other
 related DBs.)

 Let’s say for an organisation of a few thousand, roughly none of whom are
 technical, of course.

 It’s a pretty standard thing to need, and gives great value.
 Is there a solution out of the box for all the data capture from
 individuals, and reports, queries, etc.?
 Or would they end up with a team of developers having to build bespoke
 things?
 Or, heaven forfend!, would they end up using conventional methods for all
 the interface management, and then have the usual LD extra system?

 Any thoughts?

 --
 Hugh Glaser
20 Portchester Rise
Eastleigh
SO50 4QS
 Mobile: +44 75 9533 4155, Home: +44 23 8061 5652






opengraph like search on sparql?

2015-02-03 Thread Giovanni Tummarello
Hi all,

Would anyone be able to recommend/point to an available system/library to get
Open Graph search capabilities like... like in Facebook... things like:
pictures that my friends like
people depicted in pictures posted by my friends
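
(To make the kind of pattern concrete - a rough SPARQL sketch of the first
example; foaf:knows is the standard FOAF term, while ex:likes and the URIs are
made up, since there is no standard "likes" property:)

    # Rough sketch only; ex:likes and the example URIs are hypothetical.
    PATTERN = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex:   <http://example.org/ns#>
    SELECT ?picture WHERE {
      <http://example.org/people/me> foaf:knows ?friend .
      ?friend ex:likes ?picture .
    }
    """
    print(PATTERN)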

thanks in advance.
Gio


Re: [Ann] WebVOWL 0.3 - Visualize your ontology on the web

2014-12-20 Thread Giovanni Tummarello


 Great job! Clearly when it rains, it pours!

 Lots of great Linked Data visualizations are now popping up everywhere,
 just what we all needed.


Hi Kingsley,
which others are you referring to? (I have likely missed them)
Thanks
Gio


Re: Updated LOD Cloud Diagram - what is the message?

2014-08-18 Thread Giovanni Tummarello
Hi Chris, this is interesting, and it's great you're looking also at the
world of marked-up data.

My 2c, shortly:



 If you build an application that requires
 DBpedia/YAGO/Freebase/UMBEL/Cyc-style general knowledge about entities, or
 you build an applications that requires geographic, live science, or
 linguistic data, the datasets can be quite useful for you and the fact that
 they are partly interlinked can save you quite some work as you need to
 invest less effort into integrating them yourself.


Sure, under the assumption that the interlinks (which are provided as best
effort by the producers) are of good enough quality for your
application. As we know, quality might be strongly related to the application,
e.g. in certain applications you might need more precision, in certain
others recall, etc.
Certainly what is there provides a starting point however, courtesy again
of the best efforts of those few.

It is to be asked how linked data (dereferenceable URIs etc.) really helps
in fostering the quality of such interlinkage, e.g. do people really have
mechanisms in place that resolve such URIs to check the entity on the other
end, or do they just download the dataset, convert it to something way
flatter, use disambiguation/interlinking processes and then publish back?

... but in fairness, the fact that you can look at a single record and
somehow see as a human that it has a link to another dataset is per se
likely to have some positive effect on the willingness of people to indeed
go and do such interlinking.


Personally, I think it is quite interesting to compare the deployment of
 Microdata/RDFa/Microformats and Linked Data on the Web. We also
 investigated the deployment of Microdata/RDFa/Microformats  [1][2] and the
 comparison currently looks like this:



1.   The overall number of websites publishing
 Microdata/RDFa/Microformats is three orders of magnitude larger than the
 number of websites publishing Linked Data.

 2.   Topic wise, Microdata/RDFa/Microformats markup covers products,
 reviews, businesses, addresses, events, people, job postings and recipes.
 While Linked Data covers much more specific data from domains such as
 e-government, libraries, life science, linguistics or geography. So there
 is not too much overlap between the data that is published using the two
 technologies.

 3.   In the context of Microdata/RDFa/Microformats, data providers do
 not set links pointing at data items in other datasets. In the Linked Data
  context, data providers do set such links to a certain extent. Not setting
 links of course reduces the effort required for data publishers (you just
 need to add some semantic markup to the PHP template that renders your
 website and you are done). On the other hand without such links, using the
 data within applications is much more painful. For an example on how much
 effort it took to integrate some Microdata describing products from
 different websites, see [3] (we needed sophisticated information extraction
 techniques to generate features from the product names and descriptions and
 then sophisticated identity resolution techniques to guess which
 descriptions refer to the same product).

 4.   The Microdata/RDFa/Microformats are very shallow with usually
 only 3 or 4 attributes used to describe an entity and most interesting
 semantics only provided as free text (long product or job descriptions as
 text). In contrast, the data that is published as Linked Data is often much
 more structured (e-government, life science data, general-purpose KBs) and
 entities are described with more attributes (having kind of well-defined
 semantics) and is thus likely to enable more sophisticated applications.



 Looking at this comparison, I think the empirical results nicely reflect
 the strengths of both technologies. Microdata/RDFa/Microformats aim at
 being



This is quite interesting, but isn't this conclusion neglecting a huge fact?

How many people that professionally work with e-government, libraries,
life science, linguistics or geography use the linked data technology/format
vs other formats that are of relevance in that world? Could it be again
around 3 orders of magnitude?

Let's take the simplest format, CSV:
how would your 1, 2, 3, 4 answers be with CSV included?

Wouldn't we say that 2 orders of magnitude more datasets are published in
CSV, that they are also much more complete than those of microdata/microformats,
definitely not less complete than those published in RDF, and might or might
not include identifiers that can link them to others?



 a simple technology for annotating webpages that puts very little effort
 on webmasters in order to find wide-spread deployment (Guha made this point
 rather clear in his LDOW2014 keynote [4]).




  Linked Data on the other hand is a technology for sharing the data
 integration effort between data publishers and data consumers (the more
 effort publishers put into setting RDF links, the easier it becomes for
 

Updated LOD Cloud Diagram - what is the message?

2014-08-17 Thread Giovanni Tummarello
Chris hi,

I would be interested in discussing: what is the message that will
accompany this new version?

If I am not wrong, there appear to be more bubbles than last time here, so
I wonder: is the message that's going out with this diagram that adoption
has increased (e.g. as there were 200 and now there are 500)?

If so, I do wonder if that is not misleading, based on this diagram alone.

For example, how many of these are published by independent individuals or
organizations (some IP technique might be handy here also)?

That StatusNet, gov.uk, Bio2RDF etc. have gone a bit more industrial and
published plenty of datasets is good, but is that significant in
evaluating a general data publishing technology?

More interesting would be: how many of these are private companies, not
in the context of publicly funded research projects? Are there many that
are just created by hackers or students just making a point?

So many of the old datasets seem to have disappeared; what happened to them?

Are those that stayed alive really used? (I see http://revyu.com, whose
biggest tag is "good beers" from 2007, the year when it was used by people
at the Banff conference.)
Is the usage really significant? (I see Apache O'Reilly - really?)

So, bottom line.

Sure, one can say "hey, we gave a definition and we're following it to create
this diagram, everything else is out of the question".

... and sure, it doesn't have to be YOU answering all those questions above. (I
guess your list of sites is public for others to investigate?)

I would however think it important that the message sent with this new
diagram did its best to avoid being possibly misleading :)

What are your thoughts?
Gio



On Fri, Aug 15, 2014 at 9:07 AM, Christian Bizer ch...@bizer.de wrote:

 Hi all,

 on July 24th, we published a Linked Open Data (LOD) Cloud diagram
 containing
 crawlable linked datasets and asked the community to point us at further
 datasets that our crawler has missed [1].

 Lots of thanks to everybody that did respond to our call and did enter
 missing datasets into the DataHub catalog [2].

 Based on your feedback, we have now drawn a draft version of the LOD cloud
 containing:
 1.  the datasets that our crawler discovered
 2.  the datasets that did not allow crawling
 3.  the datasets you pointed us at.

 The new version of the cloud altogether contains 558 linked datasets which
 are connected by altogether 2883 link sets. As we were pointed at quite a
 number of linguistic datasets [3], we added linguistic data as a new
 category to the diagram.

 The current draft version of the LOD Cloud diagram is found at:


 http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/extendedLODCloud/extendedCloud.png

 Please note that we only included datasets that are accessible via
 dereferencable URIs and are interlinked with other datasets.

 It would be great if you could check if we correctly included your datasets
 into the diagram and whether we missed some link sets pointing from your
 datasets to other datasets.

 If we did miss something, it would be great if you could point us at what
 we
 have missed and update your entry in the DataHub catalog [2] accordingly.

 Please send us feedback until August 20th. Afterwards, we will finalize the
 diagram and publish the final August 2014 version.

 Cheers,

 Chris, Max and Heiko

 --
 Prof. Dr. Christian Bizer
 Data and Web Science Research Group
 Universität Mannheim, Germany
 ch...@informatik.uni-mannheim.de
 www.bizer.de






Ph.D position on big financial/enterprise data

2014-06-04 Thread Giovanni Tummarello
Dear all,

A Ph.D. position in the Web of Data Unit, University of Trento / FBK
Institute, is now accepting submissions, with a deadline of 16/6.

This position is co-sponsored by www.spaziodati.eu, which will also make
available to the candidate highly valuable, not otherwise available, data
provided by partners and investors in sectors such as
finance/banking/tourism and more.

The position would be ideal for a strongly motivated individual with a focus on
research that can be turned into products in the next 3 to 5 years.

FBK.eu is a vibrant institute with a truly international culture, set in
what is one of the, if not the, highest standard of living locations in
Italy.

Please see the full call here: http://spaziodati.eu/jobs#phd-grant-fbk

Gio


Re: Sindice.com end of support and history

2014-05-13 Thread Giovanni Tummarello
Dear all,
we're glad to say that the article, in an improved version, is now available
on SemanticWeb.com:

http://semanticweb.com/end-support-sindice-com-search-engine-history-lessons-learned-legacy-guest-post_b42797

cheers
Gio


On Wed, Apr 30, 2014 at 3:05 AM, Giovanni Tummarello g.tummare...@gmail.com
 wrote:

 Thanks for the mails and comments.

 Apologies for the broken link at the moment, for now please refer to
 google cache. [1]

 While we work to resolve the issue, our management asks us to clarify that
 the team's departure from Sindice.com does not necessarily mean the end of
 the project.

 Best
 Giovanni


 [1]
 http://webcache.googleusercontent.com/search?q=cache:09H7ZzKW8AcJ:blog.sindice.com/2014/04/28/end-of-support-for-sindice-com-history-and-legacy/+cd=1hl=enct=clnk


 On Tue, Apr 29, 2014 at 6:28 PM, Giovanni Tummarello 
 g.tummare...@gmail.com wrote:

 Dear all,

 the Sindice team announces today the end of the support of the
 Sindice.com service. Effective late March we have put the service in “read
 only” mode. Maintenance on our side will continue until August 30th.

 With the launch in 2012 of Schema.org, Google and others have effectively
 embraced the vision of the “Semantic Web”. With the RDFa standard, but now
 even more with JSON-LD, richer and richer markup is becoming more and more
 popular on websites. While there might not be public web data “search
 apis”, large collections of crawled data (pages http://commoncrawl.org/
  and RDF http://webdatacommons.org/) exist today which are made
 available on cloud computing platforms for easy analysis with your favorite
 big data paradigm.

 Even more interestingly, the technology of Sindice.com has been made
 available in several projects maintained either as open source (see the
 blog post) or commercially supported by the team, now transitioned to the
 Sindice LTD company, AKA SindiceTech http://sindicetech.com/.

 For example, the Sindice.com main search engine, Siren, is now
 available at http://sirendb.com .

 We recommend the community looks at it for what we believe to be
 unparalleled search capabilities on rich semistructured data (e.g. Json/XML
 and or text enhanced with entity descriptions or relational data).

 It has been quite a journey for us, and given there is no single summary
 anywhere we thought we’d take this occasion to write and share it. For
 “historical” reasons and as a way to glimpse at future directions of this
 field and technologies.

 The Sindice.com Founders

 Dr. Giovanni Tummarello & Dr. Renaud Delbru

 http://blog.sindice.com/2014/04/28/end-of-support-for-sindice-com-history-and-legacy/





Sindice.com end of support and history

2014-04-29 Thread Giovanni Tummarello
Dear all,

the Sindice team announces today the end of support for the Sindice.com
service. Effective late March we have put the service in “read only” mode.
Maintenance on our side will continue until August 30th.

With the launch in 2012 of Schema.org, Google and others have effectively
embraced the vision of the “Semantic Web”. With the RDFa standard, but now
even more with JSON-LD, richer and richer markup is becoming more and more
popular on websites. While there might not be public web data “search
apis”, large collections of crawled data (pages http://commoncrawl.org/
 and RDF http://webdatacommons.org/) exist today which are made available
on cloud computing platforms for easy analysis with your favorite big data
paradigm.

Even more interestingly, the technology of Sindice.com has been made
available in several projects maintained either as open source (see the
blog post) or commercially supported by the team, now transitioned to the
Sindice LTD company, AKA SindiceTech http://sindicetech.com/.

For example, the Sindice.com main search engine, Siren, is now
available at http://sirendb.com.

We recommend the community looks at it for what we believe to be
unparalleled search capabilities on rich semistructured data (e.g. JSON/XML
and/or text enhanced with entity descriptions or relational data).

It has been quite a journey for us, and given there is no single summary
anywhere we thought we’d take this occasion to write and share it. For
“historical” reasons and as a way to glimpse at future directions of this
field and technologies.

The Sindice.com Founders

Dr. Giovanni Tummarello & Dr. Renaud Delbru

http://blog.sindice.com/2014/04/28/end-of-support-for-sindice-com-history-and-legacy/


Re: Sindice.com end of support and history

2014-04-29 Thread Giovanni Tummarello
Thanks for the mails and comments.

Apologies for the broken link at the moment, for now please refer to google
cache. [1]

While we work to resolve the issue, our management asks us to clarify that
the team's departure from Sindice.com does not necessarily mean the end of
the project.

Best
Giovanni


[1]
http://webcache.googleusercontent.com/search?q=cache:09H7ZzKW8AcJ:blog.sindice.com/2014/04/28/end-of-support-for-sindice-com-history-and-legacy/+cd=1hl=enct=clnk


On Tue, Apr 29, 2014 at 6:28 PM, Giovanni Tummarello g.tummare...@gmail.com
 wrote:

 Dear all,

 the Sindice team announces today the end of the support of the Sindice.com
 service. Effective late March we have put the service in “read only” mode.
 Maintenance on our side will continue until August 30th.

 With the launch in 2012 of Schema.org, Google and others have effectively
 embraced the vision of the “Semantic Web”. With the RDFa standard, but now
 even more with JSON-LD, richer and richer markup is becoming more and more
 popular on websites. While there might not be public web data “search
 apis”, large collections of crawled data (pages http://commoncrawl.org/
  and RDF http://webdatacommons.org/) exist today which are made
 available on cloud computing platforms for easy analysis with your favorite
 big data paradigm.

 Even more interestingly, the technology of Sindice.com has been made
 available in several projects maintained either as open source (see the
 blog post) or commercially supported by the team, now transitioned to the
 Sindice LTD company, AKA SindiceTech http://sindicetech.com/.

 For example, the Sindice.com main search engine, Siren, is now
 available at http://sirendb.com .

 We recommend the community looks at it for what we believe to be
 unparalleled search capabilities on rich semistructured data (e.g. Json/XML
 and or text enhanced with entity descriptions or relational data).

 It has been quite a journey for us, and given there is no single summary
 anywhere we thought we’d take this occasion to write and share it. For
 “historical” reasons and as a way to glimpse at future directions of this
 field and technologies.

 The Sindice.com Founders

 Dr. Giovanni Tummarello & Dr. Renaud Delbru

 http://blog.sindice.com/2014/04/28/end-of-support-for-sindice-com-history-and-legacy/



Freebase RDF distro released.

2014-02-13 Thread Giovanni Tummarello
Hi all

Also with the very kind support of the RDF guys at Google Developer Relations
(thanks Dan! :),
SindiceTech is happy to announce the availability of our Freebase
Distribution in the clouds.

*A Freebase data distribution that's easy to use.*

Freebase is an amazing data resource at the core of Google's Knowledge
Graph. The entire data on Freebase is available as a simple data dump.

However, using it as a whole is not exactly simple!

The SindiceTech Freebase distribution addresses this by providing all the
Freebase data pre-loaded in a database (more specifically, a triplestore:
http://en.wikipedia.org/wiki/Triplestore) for you to easily query and
explore. Furthermore, you also get a set of tools that make it much easier
to understand the data as a whole and aid you in querying the data set.

*Why have all the data locally?*

You basically get your own private Freebase instance. This means that
you can privately query for whatever you want without any limitations on
the complexity of the query. Furthermore, you can easily combine Freebase
data with your own data sets and query them in a unified manner. It's a
great start to creating your own Knowledge Graph
(http://www.sindicetech.com/overview.html).
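
As a quick sketch of what that looks like in practice (the endpoint URL below
is hypothetical - it depends on how the triplestore is exposed in your own
instance):

    import requests

    # Hypothetical address of your private instance's SPARQL service.
    ENDPOINT = "http://your-instance:8890/sparql"

    # Arbitrarily complex queries are fine: no public-API rate limits apply,
    # since the whole Freebase dump is loaded locally.
    QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

    rows = requests.post(
        ENDPOINT,
        data={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    ).json()["results"]["bindings"]
    for r in rows:
        print(r["s"]["value"], r["p"]["value"], r["o"]["value"])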

*On Google Cloud*

The distribution is packaged as a virtual machine snapshot that you can
easily spin up in Google Cloud. Join our Google group
(https://groups.google.com/forum/#!forum/sindicetech-freebase) and
follow the instructions to get started and have it up and running in
minutes.

Official page, with tool description:
http://www.sindicetech.com/freebase-distribution.html


*Watch us on the Google Developers YouTube channel*
An interview with us and the nice folks at Google Developer Relations
is now published on the Google Dev YouTube channel. It is about 20m long
and among other things it also has a screencast of the distribution in
action.

Making Sense of the Graph
http://www.youtube.com/watch?v=m6EdVYt9rgs

Questions? Want to discuss ideas? We're here to help.
Giovanni


Re: LOD publishing question

2014-01-31 Thread Giovanni Tummarello
Thanks Hugh,

Crawling the web accurately is a billion dollar thing nowadays (not my
words) and all the big guys accurately crawl all the metadata now (though
they don't give any public API).

I still think a more focused version of Sindice, e.g. just on demand etc.,
might be useful and have impact, but resources are necessarily limited.

Announcements with respect to the rest are coming next week :)
Have a good weekend and thanks for the thanks, appreciated.

Gio


On Thu, Jan 30, 2014 at 8:13 PM, Hugh Glaser h...@glasers.org wrote:

 Hi Giovanni,
 Thank you for the update.
 I am sorry to hear that Sindice is going into a frozen state, and that
 circumstances are making that happen, but of course pleased that you are
 able to keep it going at all.
 I send you and your team my personal thanks for the service you have
 provided over the last 5 or so years, and wish you all well.
 Very best
 Hugh.


 On 28 Jan 2014, at 14:19, Giovanni Tummarello g.tummare...@gmail.com
 wrote:

  With respect to Sindice
 
  for a number of reasons, the people who originally created it, the
 former Data Intensive Infrastructure group, are either not working in the
 original institution hosting it, National University of Ireland Galway,
 institute formerly known as DERI or have been assigned to other tasks.
 
  Sindice has been operating for 5+ years, updating its index, (though we
 were never perfect) and we believe supported a lot of works on the field,
  but its now time to move on.  In the meanwhile the project will continue
 answer queries but without updating its index.
 
  Apologies for the inconvenience of course, we'll be posting on this soon
 and update the homepage to reflect the change.
 
  Giovanni
 
 
 
  On Tue, Jan 28, 2014 at 11:27 AM, Hugh Glaser h...@glasers.org wrote:
  Good question.
  I'll report what I found, rather than advising.
 
  So I went there when you published that email, looking for stuff to put
 in my sameas.org site.
  I tried exploring, and when I went to Browse I only found a few things,
 so wasn't encouraged :-)
  (And, as an aside, Advanced Search didn't seem to do anything, and the
 search links at the bottom were not links.)
  So I decided that it wasn't really mature enough to make it worth the
 effort (yet?), even though there should be massive scope for linkage
 eventually.
 
  But the real problem was that I couldn't find any Linked Data, or even
 an RDF store.
  The URIs you use are not very Cool URIs, and I tried to see if there was
 RDF at the end of them by doing Content Negotiation, but there wasn't.
  I am thinking of things like
 http://tundra.csd.sc.edu/rol/view-person.php?id=291
 
  So I went away :-)
 
  For people like me, you could put something about how to see the RDF in
 an About page (or if it is there, make it easier to find). You only get one
 chance to snare people on the web, after all.
  Of course as Alfredo says, for spidering search engines, and it would
 have helped me too, you need robots.txt (which I couldn't find either),
 sitemap, sitemap.xml, voiD description.
 
  Good luck!
  Hugh
 
  On 28 Jan 2014, at 04:12, WILDER, COLIN wilde...@mailbox.sc.edu wrote:
 
   Another question to you very helpful people-
  
   and apologies again for semi cross-posting
  
   Our LOD working group is having trouble publishing our data (see email
 below) in RDF form. Our programmer, a master's student, who is working
 under the supervision of myself and a computer science professor, has
 mapped sample data into RDF, has the triplestore on a D2RQ server
 (software) on our server and has set up a SPARQL end-point on the latter.
 But he has been unsuccessful so far getting 3 candidate semantic web search
 engines (Falcons, Swoogle and Sindice) to  be able to find our data when he
 puts a test query in to them. He has tried communicating with the people
 who run these, but to little avail. Any suggestions about sources of
 information, pointers, best practices for this actual process of publishing
 LOD? Or, if you know of problems with any of those three search engines and
 would suggest a different candidate, that would be great too.
  
   Thanks again,
  
   Colin Wilder
  
  
   From: WILDER, COLIN [mailto:wilde...@mailbox.sc.edu]
   Sent: Thursday, January 16, 2014 11:51 AM
   To: 'public-lod@w3.org'
   Subject: LOD for historical humanities information about people and
 texts
  
   To the many people who have kindly responded to my recent email:
  
   Thanks for your suggestions and clarifying questions. To explain a bit
 better, we have a data curation platform called RL, which is a large,
 complex web-based MySQL database designed for users to be able to simply
 input, store and share data about social and textual networks with each
 other, or to share it globally in RL's data commons. The data involved are
 individual data items, such as info about one person's name, age, a book
 title, a specific social relationship, etc. The entity types (in the
 ordinary-language sense

Re: LOD publishing question

2014-01-28 Thread Giovanni Tummarello
With respect to Sindice

for a number of reasons, the people who originally created it, the former
Data Intensive Infrastructure group, are either no longer working in the original
institution hosting it, the National University of Ireland Galway (the institute
formerly known as DERI), or have been assigned to other tasks.

Sindice has been operating for 5+ years, updating its index (though we
were never perfect), and we believe it supported a lot of work in the field,
but it's now time to move on. In the meanwhile the project will continue to
answer queries, but without updating its index.

Apologies for the inconvenience, of course; we'll be posting on this soon
and updating the homepage to reflect the change.

Giovanni



On Tue, Jan 28, 2014 at 11:27 AM, Hugh Glaser h...@glasers.org wrote:

 Good question.
 I'll report what I found, rather than advising.

 So I went there when you published that email, looking for stuff to put in
 my sameas.org site.
 I tried exploring, and when I went to Browse I only found a few things, so
 wasn't encouraged :-)
 (And, as an aside, Advanced Search didn't seem to do anything, and the
 search links at the bottom were not links.)
 So I decided that it wasn't really mature enough to make it worth the
 effort (yet?), even though there should be massive scope for linkage
 eventually.

 But the real problem was that I couldn't find any Linked Data, or even an
 RDF store.
 The URIs you use are not very Cool URIs, and I tried to see if there was
 RDF at the end of them by doing Content Negotiation, but there wasn't.
 I am thinking of things like
 http://tundra.csd.sc.edu/rol/view-person.php?id=291

 So I went away :-)

 For people like me, you could put something about how to see the RDF in an
 About page (or if it is there, make it easier to find). You only get one
 chance to snare people on the web, after all.
 Of course as Alfredo says, for spidering search engines, and it would have
 helped me too, you need robots.txt (which I couldn't find either), sitemap,
 sitemap.xml, voiD description.

 Good luck!
 Hugh

 On 28 Jan 2014, at 04:12, WILDER, COLIN wilde...@mailbox.sc.edu wrote:

  Another question to you very helpful people-
 
  and apologies again for semi cross-posting
 
  Our LOD working group is having trouble publishing our data (see email
 below) in RDF form. Our programmer, a master's student, who is working
 under the supervision of myself and a computer science professor, has
 mapped sample data into RDF, has the triplestore on a D2RQ server
 (software) on our server and has set up a SPARQL end-point on the latter.
 But he has been unsuccessful so far getting 3 candidate semantic web search
 engines (Falcons, Swoogle and Sindice) to  be able to find our data when he
 puts a test query in to them. He has tried communicating with the people
 who run these, but to little avail. Any suggestions about sources of
 information, pointers, best practices for this actual process of publishing
 LOD? Or, if you know of problems with any of those three search engines and
 would suggest a different candidate, that would be great too.
 
  Thanks again,
 
  Colin Wilder
 
 
  From: WILDER, COLIN [mailto:wilde...@mailbox.sc.edu]
  Sent: Thursday, January 16, 2014 11:51 AM
  To: 'public-lod@w3.org'
  Subject: LOD for historical humanities information about people and texts
 
  To the many people who have kindly responded to my recent email:
 
  Thanks for your suggestions and clarifying questions. To explain a bit
 better, we have a data curation platform called RL, which is a large,
 complex web-based MySQL database designed for users to be able to simply
 input, store and share data about social and textual networks with each
 other, or to share it globally in RL's data commons. The data involved are
 individual data items, such as info about one person's name, age, a book
 title, a specific social relationship, etc. The entity types (in the
 ordinary-language sense of actors and objects, not in the database tabular
 sense) can be seen at http://tundra.csd.sc.edu/rol/browse.php. The data
 commons in RL is basically a subset of user data that users have elected
 (irrevocably) to share with all other users of the system. NB there is a
 lot of dummy data in the data commons right now because of testing.
 
  We are designing an expansion of RL's functionality so as to publish
 data from the data commons as LOD, so I am doing some preliminary work to
 assess feasibility and fit by matching up our entity types with
 RDFvocabularies. Here is what I have so far. First are the entity(ies) and
 relationships, followed by the appropriate vocabularies:
 
  1.   Persons, social relations: FOAF, BIO. The Catalogus
 Professorum Lipsiensis or CPL(
 http://svn.aksw.org/papers/2010/ISWC_CP/public.pdf) looks enormously
 useful for connecting academics (people), their relations and their books.
  But, I cannot seem to get any info page or specification page to load,
 making me worry that it's dead.
  2.   

Re: WebID Frustration

2013-08-06 Thread Giovanni Tummarello
Science fiction (ooh ooh ooh) double feature
Doctor X (ooh ooh ooh) will build a creature
See androids fighting (ooh ooh ooh) Brad and Janet
Anne Francis stars in (ooh ooh ooh) Forbidden Planet
Wo oh oh oh oh oh
At the late night, double feature, picture show

http://www.deezer.com/album/238003

If you want to hear the full version, you're 1 click away with your existing
Google or Facebook account.

Gio


On Tue, Aug 6, 2013 at 12:37 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:

 Well, RWW.IO looked exciting, so I decided to start with it.
 And it seemed a good idea to have an account, so I decided I would finally
 create a WebID login - I know that lots of people think that this is the
 Way Ahead.
 I have a foaf file (actually more than one), and trawling the web, it
 seems that I if I have a foaf file I can use it for WebID.
 I certainly don't want to create it on some other site - I need another
 account like I need a hole in the head - in fact, that is what is meant to
 be good about WebID!
 Surely it isn't Just one last new account.

 Anyway, you can guess that a while later I still don't seem to have
 managed it.
 I have read any number of pages that give me simple guides to doing
 stuff, with links to things that should help, etc. (often dead).
 I confess that I was definitely looking for the easiest way - for example,
 downloading a program to run just doesn't seem the sort of thing I want to
 do for something that is meant to be simple.
 Sorry if that all sounds provocative, but I am a bit frustrated!

 So have I missed something here?
 Is there really not a page that will really work for me?
 I'm using Safari on a Mac, by the way.
 And I'm trying to login in to https://hugh.rww.io

 Best
 Hugh





Fwd: [ANNOUNCEMENT] Apache Any23 0.8.0 Release

2013-07-11 Thread Giovanni Tummarello
-- Forwarded message --
From: Lewis John Mcgibbney lewis.mcgibb...@gmail.com
Date: Wed, Jul 10, 2013 at 11:06 PM
Subject: [ANNOUNCEMENT] Apache Any23 0.8.0 Release
To: u...@any23.apache.org, d...@any23.apache.org d...@any23.apache.org


Hi All,

The Any23 PMC are pleased to announce the immediate release and
availability of Apache Any23 0.8.0.

Anything To Triples (any23) is a library, a web service and a command line
tool that extracts structured data in RDF format from a variety of Web
documents. Currently it supports the following input formats:

RDF/XML, Turtle, Notation 3
RDFa with RDFa1.1 prefix mechanism
Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview,
License, XFN and Species
HTML5 Microdata: (such as Schema.org)
CSV: Comma Separated Values with separator autodetection.

This release includes a major re-factoring of the codebase providing
improved modularity and enabling much better use of Any23 within your
applications. Additionally library upgrades have been made to Apache Tika
1.2. A full list of the features in this release can be seen in our
RELEASE-NOTES.txt [0] or in the release report [1].

Please head over to the Apache Any23 downloads [2] page for advice on how
to download this release and include it within your projects.

Thank you on behalf of the Any23 PMC
Lewis


[0] http://www.apache.org/dist/any23/0.8.0/RELEASE-NOTES.txt
[1] http://s.apache.org/iO
[2] http://any23.apache.org/download.html

-- 
*Lewis*


Re: Linked Data discussions require better communication

2013-06-20 Thread Giovanni Tummarello
My 2c is... I agree with Kingsley's diagram: linked data should be possible
without RDF (no matter the serialization) :)
However this is different from previous definitions.

I think it's a step forward, but it is different from before. Do we
want to call it "Linked Data 2.0"? Under this definition, schema.org
marked-up pages would also be linked data, and I agree plenty
with this.

Gio


On Thu, Jun 20, 2013 at 6:27 PM, Kingsley Idehen kide...@openlinksw.comwrote:

  On 6/20/13 11:45 AM, Luca Matteis wrote:

 On Thu, Jun 20, 2013 at 5:02 PM, Melvin Carvalho melvincarva...@gmail.com
  wrote:

 Restate/reflect ideas that in other posts that are troubling/puzzling and
 ask for confirmation or clarification.


  I am simply confused with the idea brought forward by Kingsley that RDF
 is *not* part of the definition of Linked Data. The evidence shows the
 contrary: the top sites that define Linked Data, such as Wikipedia,
 Linkeddata.org and Tim-BL's meme specifically mention RDF, for example:

  It builds upon standard Web technologies such as HTTP, RDF and URIs -
 http://en.wikipedia.org/wiki/Linked_data
  connecting pieces of data, information, and knowledge on the Semantic
 Web using URIs and RDF. - http://linkeddata.org/

  This is *the only thing* that I'm discussing here. Nothing else. The
 current *definition* of Linked Data.


 Here's what I am saying, again:

 1. You can create and publish web-like structured data without any
 knowledge of RDF .

 2. You can create and publish web-like data that's enhanced with human-
 and machine-comprehensible entity relationship semantics when you add RDF
 to the mix.

 Venn diagram based Illustration of my point: http://bit.ly/16EVFVG .

 If you want your Linked Data to be interpretable by machine, then you can
 achieve that goal via RDF based Linked Data and applications equipped with
 RDF processing capability.

 RDF entity relationship semantics are *explicit* whereas run-of-the-mill
 entity relationship model based entity relationship semantics are
 *implicit*.

 RDF is the W3C's recommended framework for increasing the semantic
 fidelity of relations that constitute the World Wide Web.

 It isn't really that complicated.

 RDF can be talked about usefully without inadvertently creating an
 eternally distracting Reality Distortion Field, laden with indefensible
 ambiguity.

 --

 Regards,

 Kingsley Idehen   
 Founder & CEO
 OpenLink Software
 Company Web: http://www.openlinksw.com
 Personal Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca handle: @kidehen
 Google+ Profile: https://plus.google.com/112399767740508618350/about
 LinkedIn Profile: http://www.linkedin.com/in/kidehen






Re: Linked Data discussions require better communication

2013-06-20 Thread Giovanni Tummarello
Not implying that; I'd hope RDF can represent everything, really. But RDF would not
be needed for linked data, while an RDF description, even sitting alone on the
web, could be called Linked Data (if it uses URIs).
Gio



On Thu, Jun 20, 2013 at 7:05 PM, Pat Hayes pha...@ihmc.us wrote:


 On Jun 20, 2013, at 11:54 AM, Giovanni Tummarello wrote:

  My 2c is .. i agree with kingsley diagram , linked data should be
 possible without RDF (no matter serialization) :)
  however this is different from previous definitions
 
  i think its a step forward.. but it is different from previously. Do we
 want to call it  Linked Data 2.0? under this definition also schema.org marked
 up pages would be linked data .. and i agree plenty with this .

 So, I imagine, does everyone else. But are you implying that Schema markup
 is somehow incompatible with RDF? If so, try reading
 http://blog.schema.org/2012/06/semtech-rdfa-microdata-and-more.html

 Pat

 
  Gio
 
 
  On Thu, Jun 20, 2013 at 6:27 PM, Kingsley Idehen kide...@openlinksw.com
 wrote:
  On 6/20/13 11:45 AM, Luca Matteis wrote:
  On Thu, Jun 20, 2013 at 5:02 PM, Melvin Carvalho 
 melvincarva...@gmail.com wrote:
  • Restate/reflect ideas that in other posts that are troubling/puzzling
 and ask for confirmation or clarification.
 
  I am simply confused with the idea brought forward by Kingsley that RDF
 is *not* part of the definition of Linked Data. The evidence shows the
 contrary: the top sites that define Linked Data, such as Wikipedia,
 Linkeddata.org and Tim-BL's meme specifically mention RDF, for example:
 
  It builds upon standard Web technologies such as HTTP, RDF and URIs -
 http://en.wikipedia.org/wiki/Linked_data
  connecting pieces of data, information, and knowledge on the Semantic
 Web using URIs and RDF. - http://linkeddata.org/
 
  This is *the only thing* that I'm discussing here. Nothing else. The
 current *definition* of Linked Data.
 
  Here's what I am saying, again:
 
  1. You can create and publish web-like structured data without any
 knowledge of RDF .
 
  2. You can create and publish web-like data that's enhanced with human-
 and machine-comprehensible entity relationship semantics when you add RDF
 to the mix.
 
  Venn diagram based Illustration of my point: http://bit.ly/16EVFVG .
 
  If you want your Linked Data to be interpretable by machine, then you
 can achieve that goal via RDF based Linked Data and applications equipped
 with RDF processing capability.
 
  RDF entity relationship semantics are *explicit* whereas run-of-the-mill
 entity relationship model based entity relationship semantics are
 *implicit*.
 
  RDF is the W3C's recommended framework for increasing the semantic
 fidelity of relations that constitute the World Wide Web.
 
  It isn't really that complicated.
 
  RDF can be talked about usefully without inadvertently creating an
 eternally distracting Reality Distortion Field, laden with indefensible
 ambiguity.
 
  --
 
  Regards,
 
  Kingsley Idehen
  Founder & CEO
  OpenLink Software
  Company Web:
  http://www.openlinksw.com
 
  Personal Weblog:
  http://www.openlinksw.com/blog/~kidehen
 
  Twitter/Identi.ca handle: @kidehen
  Google+ Profile:
  https://plus.google.com/112399767740508618350/about
 
  LinkedIn Profile:
  http://www.linkedin.com/in/kidehen
 
 
 
 
 
 
 

 
 IHMC (850)434 8903 or (650)494 3973
 40 South Alcaniz St.   (850)202 4416   office
 Pensacola(850)202 4440   fax
 FL 32502  (850)291 0667   mobile
 phayesAT-SIGNihmc.us   http://www.ihmc.us/users/phayes








Re: Linked Data Visualization: HTML5 based PivotViewer

2013-06-11 Thread Giovanni Tummarello
Interesting, Kingsley. Not sure what the implication of GPL2 is, e.g.
would one have to redistribute the whole source code of anything attached
to it?

anyway great,
Gio


On Tue, Jun 11, 2013 at 6:13 PM, Kingsley Idehen kide...@openlinksw.comwrote:

 All,

 Here is a link [1][2] demonstrating what's now possible following a port
 of the Microsoft Silverlight variant of PivotViewer to HTML5.

 Background:

 A while back, Microsoft introduced a powerful data visualization tool
 called Silverlight that ended up being under utilized and eventually killed
 off due to its delivery as a plugin. A group of us decided to get the data
 visualization tool ported to HTML5 [3] as part of an open source project.

 Deliverables:

 We now have an HTML5 rendition of the original PivotViewer combined with
 some innovations at the platform independence (works on the iPad and iPhone
 [4]), layout (includes tabular presentation), and data access (supports
 SPARQL) levels.

 With regards to SPARQL, an endpoint simply needs to include support for
 CXML as one of its output formats.

 Links:

 1. http://bit.ly/13UKvK8 -- example of a SPARQL query against BBC Wild
 Life & Nature Data presented via the HTML5 PivotViewer
 2. http://bit.ly/1a17nfx -- SPARQL query definition URL that places you
 inside the PivotViewer hosted query editor
 3. http://bit.ly/QWYP1T -- Github project page
 4. http://bit.ly/RiAzU1 -- screencast showing its use on the iPhone and
 iPad.

 --

 Regards,

 Kingsley Idehen
 Founder & CEO
 OpenLink Software
 Company Web: http://www.openlinksw.com
 Personal Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca handle: @kidehen
 Google+ Profile: https://plus.google.com/112399767740508618350/about
 LinkedIn Profile: http://www.linkedin.com/in/kidehen








Re: List Etiquette - It isn't really fair

2013-04-18 Thread Giovanni Tummarello
Completely agree, Hugh; let's make sure we stick to the thread.
Gio

On Thu, Apr 18, 2013 at 11:41 AM, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 Someone starts a thread (in this case Luca and his Restpark), about something 
 they would like to get some feedback on.
 In the very first reply, an issue arises that is at best tangential to the 
 thread subject, but (in my opinion) has no direct bearing on it:
 issues around SPARQL scales? and perhaps in comparison with REST, etc.

 40+ messages follow on scaling, with the few on Restpark interspersed.
 Only the hardiest souls interested in Restpark would have combed through 
 these messages to see the topic that interests them
 (or people who are retired with nothing better to do because they don't like 
 gardening :-) )

 This is no way to run a mailing list to get the widest engagement.
 It was clear very early (third message?) that the scaling topic had arisen - 
 at that stage the discussion should have moved to a new thread on scaling;
 or simply changed the subject line to have SPARQL Scaling - was Restpark - 
 Minimal….
 Then the people who might want to discuss Restpark can do so in their own 
 thread, and the scaling people can have their thread, without being bothered 
 by the Restpark discussion if they don't want to be.
 Simples!

 I wouldn't bother, but this seems to be the normal way this lists works - 
 check out the archive if you want!
 It makes it quite dysfunctional.

 Note that I did not simply add this message to the Restpark thread, which is 
 what usually happens in this list!

 Best
 Hugh



cool project break

2013-01-31 Thread Giovanni Tummarello
Anyone seriously into SPARQL/Java/JavaScript interested in a 4-month
project to bring a fundamentally useful semweb tool to its full open
source potential?
Well rewarded; visit Ireland, work with the Sindice team, now until June.
Write me if interested.
Cheers
Gio



Re: Linked Data Dogfood circa. 2013

2013-01-05 Thread Giovanni Tummarello
Hi David

you're describing the plain concept of Open Data. SURE there are
great datasets out there. Open Data is already a success; it's good and
great and will save lives if it hasn't done so already (e.g. all this
data now being made available).

However this really has nothing to do with what I was referring to, which
is the failure at OUR specific task: coming up with specifications,
models, clients, whatever, to technically make Open Data on the Web as
revolutionary as the WWW was - the true killer app of hypertext, the "it
all makes sense now".

So with respect to this we have failed, so far, at OUR task.

Give up? No, in my opinion. Just let's recognize that asking people to
publish data alone makes no sense, and is actually counterproductive
and damaging, given that the specifications we're asking people to follow
are silly as they don't serve any real purpose.

Something else is needed.

E.g. a starting point: what makes the web of data fundamentally
different from the Web? On the web it is indeed often sufficient to
create a web site, even a crappy one, and you get immediate benefits.
Given there are benefits, people do it, period.

IMO a client is missing, or a set of clients, that will do useful
things for a non-fictional, important enough share of people.

With said clients (giving benefits already when accessing a few marked
up websites, e.g. something that allows you to use rottentomatoes.com
much better because of the markup that's ALREADY on it), then it will be
PEOPLE writing to webmasters saying "mark it up please, otherwise I
can't do X or Y", or webmasters themselves wanting to mark up because then
people with clients will be able to do X or Y.

Gio

On Fri, Jan 4, 2013 at 11:44 PM, David Booth da...@dbooth.org wrote:
 I don't agree that the idea of publish some stuff and they will come,
 both new publishers and consumers has failed.  But I do think some
 expectations have been too high.

 Perhaps it is like a scale-free distribution.  Sure, there is lots of
 data that is published and ignored, just as there are millions of
 personal blog sites on the web that are ignored.  But there is also some
 data that is published and is very valuable to Real Applications, just
 as sites like http://www.nytimes.com/ are valuable to many readers.
 Biological / life sciences data comes to mind.  It is not always 5-star
 -- often 4-star or only 3-star:
 http://www.w3.org/DesignIssues/LinkedData.html

 One would be foolish to think that one's personal blog would be useful
 to many others just because it is published on the web.  Similarly one
 would be foolish to think that one's data would be useful to others
 merely because it is published as Linked Data.  But blogs and datasets
 need to be published before consumers can decide which of them are
 valuable, so I think it's good to keep encouraging data publication.

 David


 On Fri, 2013-01-04 at 21:18 +, Hugh Glaser wrote:
 Wow Giovanni.
 I wrote the following this afternoon, and have been sitting trying to
 work out whether I should send it.
 I think it means you are not alone in your views!:

 I'm going to sound like a broken record here.

 All well and good, yes it would be great to have the Dogfood server
 working properly.
 But (to push the analogy further), is there any point in making
 DogFood if there are no dogs eating it?
 Is this really what all these clever people should be spending their
 time on?

 I knew Dogfood wasn't in a very good state because I get error reports
 when my system accesses it.
 But did anyone else notice?

 I'm so sad (yes really!) that after all these years people still run
 around getting excited about publishing data, and fiddling with little
 things, and yet it seems there is hardly a system that does any
 significant consumption (of any third party data).

 Best
 Hugh

 On 4 Jan 2013, at 21:02, Giovanni Tummarello giovanni.tummare...@deri.org
  wrote:

  One might just simply stay silent and move along, but i take a few
  seconds to restate the obvious.
 
  It is a fact that Linked data as  publish some stuff and they will
  come, both new publishers and consumers has failed.
 
  The idea of putting some extra energy would simply be useless per se
  BUT it becomes  wrong when one tries to involve others e.g. gullible
  newcomers,  fresh ph.d students who trust that hey if my ph.d advisor
  made a career out of it, and EU gave him so much money it must be real
  right?
 
  IAs community of people who claim to have something to do with
  research (and not a cult) every once in a while is learn from the
  above lesson and devise NEW methods and strategies. In other words,
  move ahead in a smart way.
 
  I am by no mean trowing all away.
 
  * publishing structured data on the web is already a *huge thing* with
  schema.org and the rest. Why? because of the clear incentive SEO.
  * RDF is a great model for heterogeneous data integration and i think
  it will explode in (certain) enterprises (knowledge intensive)
 
  What we're seeking here is more advanced, flexible

Re: Linked Data Dogfood circa. 2013

2013-01-04 Thread Giovanni Tummarello
One might simply stay silent and move along, but I'll take a few
seconds to restate the obvious.

It is a fact that Linked Data as "publish some stuff and they will
come, both new publishers and consumers" has failed.

Putting some extra energy into it would simply be useless per se,
BUT it becomes wrong when one tries to involve others, e.g. gullible
newcomers, fresh Ph.D. students who trust that "hey, if my Ph.D. advisor
made a career out of it, and the EU gave him so much money, it must be real,
right?"

As a community of people who claim to have something to do with
research (and not a cult), what we must do every once in a while is learn
from the above lesson and devise NEW methods and strategies. In other words,
move ahead in a smart way.

I am by no means throwing it all away.

* publishing structured data on the web is already a *huge thing* with
schema.org and the rest. Why? Because of the clear incentive: SEO.
* RDF is a great model for heterogeneous data integration and I think
it will explode in (certain) knowledge-intensive enterprises.

What we're seeking here is more advanced, flexible uses of the structured
data published, e.g. by smart clients that do useful things for
people.
The key is to show these clients, these useful things. What other
(realistic) incentive can we create that makes people publish data? How
would a real linked data client work and provide benefit to a real-world,
non-academic class of users (if not all)?

My wish for 2013 about linked data is that the discussion focuses on
this, with people concentrated on the full-circle, round-trip
experience, with incentives for all (and on how to start the virtuous
circle).

Gio


On Fri, Jan 4, 2013 at 2:03 PM, William Waites w...@styx.org wrote:
 hmmm not so tasty:

 warning: array_keys() [function.array-keys]: The first argument should
 be an array in
 /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on
 line 1807.

 digging deeper:

 The proxy server received an invalid response from an upstream server.
 The proxy server could not handle the request POST /sparql.

 Reason: DNS lookup failure for: data.semanticweb.org

 Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3
 OpenSSL/0.9.8c Server at data.semanticweb.org Port 80

 (appears to be a reverse proxy at data.semanticweb.org)

 I think I prefer people food...

 Cheers,
 -w





Re: Proposal: register /.well-known/sparql with IANA

2012-12-25 Thread Giovanni Tummarello
 A good argument ... for using sitemaps.


 Yes, those too.

 Fundamentally, we need to give discoverability and associated patterns a lot
 more focus than has been given in the past. This is such a critical component
 for making Linked Data easier to discover and appreciate.


Good point re discoverability, but you need clients too.

We rolled out something very simple to understand and deploy in
sitemaps back in 2007 even:

http://sw.deri.org/2007/07/sitemapextension/

It has a concept of dataset (each can have a dump, a SPARQL endpoint
and an extension used to serve resolvable URIs).

A few data producers did actually implement it, but the problem was on
the consumer side.

We consumed it OK-ish at sindice.com, but nobody else did, because
there never really was a semantic web / linked data client. The
focus was on "publish your data and something will happen".

Can we think of a client that does something useful:

* for real, and not for a made-up corner case easily solved with a
Google search + 2 clicks;
* connected to the reality of everyday browsing and web usage, e.g.
Facebook, Chrome browsing or mobile. So forget "Alice wants
to publish her own FOAF file";
* generic enough and giving repeated value, not a one-off thing
only usable in super narrow contexts;
* for real sustainability and growth: the value must be for both data
publisher and consumer, and should be directly measurable in ways people
understand (ROI etc.).

The client, the use case == the value; everything follows from there.

Google, schema.org etc. clearly hit all the above, except the client is
THEM and everyone goes through them.

Saying this in general, not specifically to you, Kingsley :)

Gio
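As a rough illustration of the discovery problem being discussed, here is a minimal client-side sketch in Python (standard library only). It assumes the /.well-known/sparql path from this thread's subject, which is only a proposal and is treated here as hypothetical, and it falls back to merely checking that a /sitemap.xml exists that a smarter client could then parse for dataset descriptions (dump location, SPARQL endpoint, resolvable-URI prefix):

# Sketch only: probe the proposed /.well-known/sparql path (hypothetical, not a
# registered well-known URI) and fall back to checking for a plain sitemap.xml.
# A 200 answer is a weak signal; a real client would still parse and validate.
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def probe(url, accept="*/*"):
    """Return the HTTP status for a GET on url, or None if unreachable."""
    try:
        req = Request(url, headers={"Accept": accept, "User-Agent": "ld-discovery-sketch"})
        with urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as e:
        return e.code
    except URLError:
        return None

def discover_sparql_endpoint(site):
    """Very naive discovery: try the proposed well-known path, then report
    whether a sitemap exists that could carry dataset descriptions."""
    well_known = urljoin(site, "/.well-known/sparql")
    if probe(well_known) == 200:
        return well_known
    sitemap = urljoin(site, "/sitemap.xml")
    if probe(sitemap, accept="application/xml") == 200:
        print(f"no well-known endpoint, but {sitemap} exists; parse it for dataset info")
    return None

if __name__ == "__main__":
    print(discover_sparql_endpoint("http://dbpedia.org/"))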



Re: Querying different SPARQL endpoints

2012-12-19 Thread Giovanni Tummarello
Generally speaking, it's yet another big, gaping, unsolved problem.

Our stab at it is to do big-data Hadoop-based summarization and
then use the summary to understand how to query the data.

E.g. this unpublished link below exposes the Sindice.com data graph
summary, containing a summary of the 20-30 billion triples in Sindice:

http://demo.sindice.net/dataset/

Take a look at the BBC example: you can see classes, and for each class
you can see which properties you can use, etc.

In this way it becomes possible to write queries that make sense, and
not query in the blind.

An application that uses this graph (this sort of graph) is then the
assisted SPARQL query editor (you point it at the summary graph and
get these recommendations):

http://sindicetech.com/sindice-suite/sparqled/

(contains a simple description of the data graph summary)

Gio
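For readers without a precomputed summary graph, a crude approximation of the same idea is to ask the endpoint directly which classes and properties it actually uses. A minimal sketch with SPARQLWrapper (pip install sparqlwrapper); the endpoint is the one from the question quoted underneath, and the LIMITs are arbitrary:

# Crude way to see what an unknown endpoint contains: list the classes and
# properties actually in use. A data graph summary gives you this precomputed
# (plus counts and co-occurrence); here we just ask the endpoint directly.
# Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://cultura.linkeddata.es/sparql"  # endpoint from the question below

def select(query):
    sw = SPARQLWrapper(ENDPOINT)
    sw.setQuery(query)
    sw.setReturnFormat(JSON)
    return sw.query().convert()["results"]["bindings"]

# Note: DISTINCT over the whole store can be slow or time out on big datasets,
# which is exactly the case a precomputed summary is meant to cover.
classes = select("SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 50")
properties = select("SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 50")

print("classes:", [b["class"]["value"] for b in classes])
print("properties:", [b["p"]["value"] for b in properties])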

On Wed, Dec 19, 2012 at 12:28 PM, Vishal Sinha
vishal.sinha...@yahoo.com wrote:
 There are many public SPARQL endpoints available.
 For example the cultural linked data:
 http://cultura.linkeddata.es/sparql

 How can I know what type of information is available in this dataset.
 Based on what assumption I can query it?
 Do I need to know any structure beforehand?

 Viashal




Re: DBpedia Data Quality Evaluation Campaign

2012-11-15 Thread Giovanni Tummarello
Am I really supposed to know if any of the facts below is wrong?
Really?

Gio


dbp-owl:PopulatedPlace/area
10.63 (@type = http://dbpedia.org/datatype/squareKilometre)
dbp-owl:abstract
La Chapelle-Saint-Laud is a commune in the Maine-et-Loire department
of western France. (@lang = en)
dbp-owl:area
1.063e+07 (@type = http://www.w3.org/2001/XMLSchema#double)
dbp-owl:canton
dbpedia:Canton_of_Seiches-sur-le-Loir
dbp-owl:country
dbpedia:France
dbp-owl:department
dbpedia:Maine-et-Loire
dbp-owl:elevation
85.0 (@type = http://www.w3.org/2001/XMLSchema#double)
dbp-owl:intercommunality
dbpedia:Pays_Loire-Angers
dbp-owl:intercommunality
dbpedia:Communauté_de_communes_du_Loir
dbp-owl:maximumElevation
98.0 (@type = http://www.w3.org/2001/XMLSchema#double)
dbp-owl:minimumElevation
28.0 (@type = http://www.w3.org/2001/XMLSchema#double)
dbp-owl:populationTotal
583 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-owl:postalCode
49140 (@lang = en)
dbp-owl:region
dbpedia:Pays_de_la_Loire
dbp-prop:areaKm
11 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-prop:arrondissement
Angers (@lang = en)
dbp-prop:canton
dbpedia:Canton_of_Seiches-sur-le-Loir
dbp-prop:demonym
Capellaudain, Capellaudaine (@lang = en)
dbp-prop:department
dbpedia:Maine-et-Loire
dbp-prop:elevationM
85 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-prop:elevationMaxM
98 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-prop:elevationMinM
28 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-prop:insee
49076 (@type = http://www.w3.org/2001/XMLSchema#integer)
dbp-prop:intercommunality
dbpedia:Pays_Loire-Angers
dbp-prop:intercommunality
dbpedia:Communauté_de_communes_du_Loir

On Thu, Nov 15, 2012 at 4:58 PM,  zav...@informatik.uni-leipzig.de wrote:
 Dear all,

 As we all know, DBpedia is an important dataset in Linked Data as it is not
 only connected to and from numerous other datasets, but it also is relied
 upon for useful information. However, quality problems are inherent in
 DBpedia be it in terms of incorrectly extracted values or datatype problems
 since it contains information extracted from crowd-sourced content.

 However, not all the data quality problems are automatically detectable.
 Thus, we aim at crowd-sourcing the quality assessment of the dataset. In
 order to perform this assessment, we have developed a tool whereby a user
 can evaluate a random resource by analyzing each triple individually and
 store the results. Therefore, we would like to request you to help us by
 using the tool and evaluating a minimum of 3 resources. Here is the link to
 the tool: http://nl.dbpedia.org:8080/TripleCheckMate/, which also includes
 details on how to use it.

 In order to thank you for your contributions, a lucky winner will win either
 a Samsung Galaxy Tab 2 or an Amazon voucher worth 300 Euro. So, go ahead,
 start evaluating now !! Deadline for submitting your evaluations is 9th
 December, 2012.

 If you have any questions or comments, please do not hesitate to contact us
 at dbpedia-data-qual...@googlegroups.com.

 Thank you very much for your time.

 Regards,
 DBpedia Data Quality Evaluation Team.
 https://groups.google.com/d/forum/dbpedia-data-quality

 
 This message was sent using IMP, the Internet Messaging Program.







Re: DBpedia Data Quality Evaluation Campaign

2012-11-15 Thread Giovanni Tummarello
Hi Sören,

I understand. Anyway, also wrt wrong extractions it might be useful
to consider supporting the users, e.g. proposing only suspicious cases
and not just any resource.

Freebase, as a very last resort, has also been (is?) using
crowdsourcing (e.g. Amazon Mechanical Turk) to solve certain conflicts
that only humans can spot. But this usually enter(ed?) the play after
other tricks had prepared the field.

E.g. statistical analysis that highlights suspicious cases first: dates
should statistically fall in a certain range, names should also
statistically look like names, addresses like addresses, etc. If they
don't, send them to the Turks.

Proposing to the user just the cases that seem suspicious (and
highlighting which of the many fields) might turn out to help plenty.
cheers
Gio
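A toy sketch, in plain Python, of the "send only the suspicious cases to humans" idea: flag numeric values for a property that sit far outside the bulk of the distribution, using the robust median absolute deviation. The data and threshold are invented for illustration:

# Toy version of "only show humans the suspicious triples": for a numeric
# property, flag values far outside the bulk of the distribution.
# Data and threshold below are made up for illustration.
from statistics import median

def suspicious(values, threshold=5.0):
    """Return (value, score) pairs whose MAD-based score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [(v, abs(v - med) / mad) for v in values if abs(v - med) / mad > threshold]

# e.g. hypothetical populationTotal values for small communes,
# one of them clearly a broken extraction
populations = [583, 640, 950, 1204, 1780, 2210, 58300000]
for value, score in suspicious(populations):
    print(f"send to reviewers: {value} (robust score {score:.0f})")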

On Thu, Nov 15, 2012 at 6:19 PM, Sören Auer
a...@informatik.uni-leipzig.de wrote:
 Am 15.11.2012 19:12, schrieb Giovanni Tummarello:
 Am i really supposed to know if any of the fact below is wrong?
 really?

 It's not about factual correctness, but about correct extraction and
 representation. If Wikipedia contains false information, DBpedia will
 too, so we cannot change this (at that point). What we want to improve,
 however, is the quality of the extraction.

 Best,

 Sören

 dbp-owl:PopulatedPlace/area
 10.63 (@type = http://dbpedia.org/datatype/squareKilometre)
 dbp-owl:abstract
 La Chapelle-Saint-Laud is a commune in the Maine-et-Loire department
 of western France. (@lang = en)
 dbp-owl:area
 1.063e+07 (@type = http://www.w3.org/2001/XMLSchema#double)
 dbp-owl:canton
 dbpedia:Canton_of_Seiches-sur-le-Loir
 dbp-owl:country
 dbpedia:France
 dbp-owl:department
 dbpedia:Maine-et-Loire
 dbp-owl:elevation
 85.0 (@type = http://www.w3.org/2001/XMLSchema#double)
 dbp-owl:intercommunality
 dbpedia:Pays_Loire-Angers
 dbp-owl:intercommunality
 dbpedia:Communauté_de_communes_du_Loir
 dbp-owl:maximumElevation
 98.0 (@type = http://www.w3.org/2001/XMLSchema#double)
 dbp-owl:minimumElevation
 28.0 (@type = http://www.w3.org/2001/XMLSchema#double)
 dbp-owl:populationTotal
 583 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-owl:postalCode
 49140 (@lang = en)
 dbp-owl:region
 dbpedia:Pays_de_la_Loire
 dbp-prop:areaKm
 11 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-prop:arrondissement
 Angers (@lang = en)
 dbp-prop:canton
 dbpedia:Canton_of_Seiches-sur-le-Loir
 dbp-prop:demonym
 Capellaudain, Capellaudaine (@lang = en)
 dbp-prop:department
 dbpedia:Maine-et-Loire
 dbp-prop:elevationM
 85 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-prop:elevationMaxM
 98 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-prop:elevationMinM
 28 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-prop:insee
 49076 (@type = http://www.w3.org/2001/XMLSchema#integer)
 dbp-prop:intercommunality
 dbpedia:Pays_Loire-Angers
 dbp-prop:intercommunality
 dbpedia:Communauté_de_communes_du_Loir

 On Thu, Nov 15, 2012 at 4:58 PM,  zav...@informatik.uni-leipzig.de wrote:
 Dear all,

 As we all know, DBpedia is an important dataset in Linked Data as it is not
 only connected to and from numerous other datasets, but it also is relied
 upon for useful information. However, quality problems are inherent in
 DBpedia be it in terms of incorrectly extracted values or datatype problems
 since it contains information extracted from crowd-sourced content.

 However, not all the data quality problems are automatically detectable.
 Thus, we aim at crowd-sourcing the quality assessment of the dataset. In
 order to perform this assessment, we have developed a tool whereby a user
 can evaluate a random resource by analyzing each triple individually and
 store the results. Therefore, we would like to request you to help us by
 using the tool and evaluating a minimum of 3 resources. Here is the link to
 the tool: http://nl.dbpedia.org:8080/TripleCheckMate/, which also includes
 details on how to use it.

 In order to thank you for your contributions, a lucky winner will win either
 a Samsung Galaxy Tab 2 or an Amazon voucher worth 300 Euro. So, go ahead,
 start evaluating now !! Deadline for submitting your evaluations is 9th
 December, 2012.

 If you have any questions or comments, please do not hesitate to contact us
 at dbpedia-data-qual...@googlegroups.com.

 Thank you very much for your time.

 Regards,
 DBpedia Data Quality Evaluation Team.
 https://groups.google.com/d/forum/dbpedia-data-quality

 
 This message was sent using IMP, the Internet Messaging Program.










Re: Current agreement upon named graphs

2012-11-12 Thread Giovanni Tummarello
Sorry all, I might be missing a lot of subtleties.

Are we saying that in the current specs and implementations one can
alter the content of graph B by messing with some triples in a graph A
(one with a blank node)?

Pat, I don't get the 'case where subsets of a single large graph are
being isolated and processed'. Could you give an example?

thanks
Gio

On Mon, Nov 12, 2012 at 10:18 AM, Pat Hayes pha...@ihmc.us wrote:
 formally allow two distinct graphs to be able to share bnodes, because of the 
 case where subsets of a single large graph are being isolated and processed. 
 We also wanted to ensure tha



Tenured position on Intelligent Big Data (FBK-Italy)

2012-10-16 Thread Giovanni Tummarello
Maybe of interest to some; it can be related to data streams and low-level
semantics, but also to linked data / higher-level knowledge
representation.

http://risorseumane.fbk.eu/sites/risorseumane.fbk.eu/files/Call%20TenureTrackICT-BigData.pdf

Gio




Re: Expensive links in Linked Data

2012-09-29 Thread Giovanni Tummarello
Actually some interesting stuff was already in

http://www.w3.org/Submission/2012/SUBM-ldbp-20120326/

(e.g. paging, ordering); not sure what happened to it though.
I guess it will be considered in the WG you mention.
It ends in 2014; looking forward to seeing the outcome.

Gio

On Fri, Sep 28, 2012 at 7:02 PM, Barry Norton barry.nor...@ontotext.com wrote:
 It's worth pointing out that there IS finally a W3C working group looking at
 these issues:

 http://www.w3.org/2012/ldp/charter.html


 Barry

 - Reply message -
 From: SERVANT Francois-Paul francois-paul.serv...@renault.com
 Date: Fri, Sep 28, 2012 17:54
 Subject: Expensive links in Linked Data
 To: Giovanni Tummarello giovanni.tummare...@deri.org
 Cc: Heiko Paulheim paulh...@ke.tu-darmstadt.de, public-lod@w3.org
 public-lod@w3.org


 Hi,

 may I say that the situation you describe is a bit disappointing? The
 unaddressed issues that you mention had already been raised shortly after
 the publishing of the linked data principles, years ago. I find it is a
 pity if they remain unanswered, because this can jeopardize one of the major
 benefits of RDF and Linked Data: the ability to publish data that can then
 easily be read, aggregated and used in generic ways.

 Best,

 fps



 -Message d'origine-
 De : g.tummare...@gmail.com [mailto:g.tummare...@gmail.com]
 De la part de Giovanni Tummarello
 Envoyé : vendredi 28 septembre 2012 17:13
 À : SERVANT Francois-Paul
 Cc : Heiko Paulheim; public-lod@w3.org
 Objet : Re: Expensive links in Linked Data

 Short answer is no,

 linked data standards have never addressed this and many
 other even basic problems(e.g. what if there are too many
 properties of one kind,  what kind of level of description
 you're supposed to get (e.g.
 recourse on blank nodes?), what is a standard way to find the
 entry URI for an object exposed given a description?  etc etc.

 Just create a normal web API (rest?)  and throttle/meter/bill
 as desired using one of the services to do that quickly my2c Gio

 On Fri, Sep 28, 2012 at 4:54 PM, SERVANT Francois-Paul
 francois-paul.serv...@renault.com wrote:
  Thanks,
 
  no, this doesn't solve the problem. A user gets ex:e0 (the cheap
  resource). Though she can see that there is the link to the
 expensive
  resource, she doesn't know the meaning of the link (it is just an
  owl:sameAs): she doesn't know what this is about. (Note also that
  there could be several expensive properties)
 
  Best,
 
  fps
 
 
 
  -Message d'origine-
  De : Heiko Paulheim [mailto:paulh...@ke.tu-darmstadt.de]
  Envoyé : vendredi 28 septembre 2012 16:42 À : public-lod@w3.org;
  SERVANT Francois-Paul Objet : Re: Expensive links in Linked Data
 
  Hi Francois-Paul,
 
  how about that solution:
 
  You publish the cheap data about your entity under
  http://example.org/e0, which is the official URI of that entity:
  ex:e0 owl:sameAs ex:e0expensive
  ex:e0 :cheapProp ...
 
  And under http://example.org/ex:e0expensive, you publish
  ex:e0expensive owl:sameAs ex:e0
  ex:e0expensive :expensiveProp ...
 
  So people following links in LOD will always land at a
 page without
  the expensive properties, and those who really want to know can
  follow the sameAs link.
 
  Does that solve your problem?
 
  Best,
  Heiko
 
 
 
  Am 28.09.2012 16:32, schrieb SERVANT Francois-Paul:
   Hi,
  
   How do you include links to results of computations in
 Linked Data?
  
   For instance, you publish data about entities of a given
  class. A property, let's call it :expensiveProp, has this class as
  domain, and you know that computing or publishing the
 corresponding
  triples is expensive. In such a case, you don't want to
 produce these
  triples each time one of your entities is accessed. You want to
  include in the representation of your entity only a link to that
  information.
  
   A no-brainer, at first sight.
  
   Are there any recommended ways to proceed?
  
   TIA
  
   fps
  
   *** This e-mail and any attachments is a confidential
  correspondence intended only for use of the individual or entity
  named above. If you are not the intended recipient or the agent
  responsible for delivering the message to the intended
 recipient, you
  are hereby notified that any disclosure, distribution or
 copying of
  this communication is strictly

Re: Expensive links in Linked Data

2012-09-28 Thread Giovanni Tummarello
Short answer is no,

linked data standards have never addressed this and many other even
basic problems (e.g. what if there are too many properties of one kind,
what level of description you're supposed to get (e.g. do you recurse
on blank nodes?), what is a standard way to find the entry
URI for an object exposed given a description? etc. etc.).

Just create a normal web API (REST?) and throttle/meter/bill as
desired, using one of the services that let you do that quickly.
my 2c
Gio
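A bare-bones sketch of the "just make it a normal web API" advice, using only Python's standard library: the cheap description is served at one path, and the expensive property sits behind its own path, which is where throttling, metering or billing could be added. The paths, the Turtle and the port are all invented for illustration:

# Sketch only: cheap description at /e0, expensive property at /e0/expensive.
# Everything here (paths, data, port) is invented for illustration.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

CHEAP_TTL = """@prefix ex: <http://example.org/> .
ex:e0 ex:cheapProp "cheap value" ;
      ex:expensiveProp ex:e0-expensive .   # just a link, not the values
"""

def compute_expensive():
    time.sleep(2)  # stand-in for the costly computation
    return """@prefix ex: <http://example.org/> .
ex:e0 ex:expensivePropValue "expensive value" .
"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/e0":
            body = CHEAP_TTL
        elif self.path == "/e0/expensive":
            body = compute_expensive()   # throttle/meter/bill here
        else:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/turtle")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()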

On Fri, Sep 28, 2012 at 4:54 PM, SERVANT Francois-Paul
francois-paul.serv...@renault.com wrote:
 Thanks,

 no, this doesn't solve the problem. A user gets ex:e0 (the cheap resource). 
 Though she can see that there is the link to the expensive resource, she 
 doesn't know the meaning of the link (it is just an owl:sameAs): she doesn't 
 know what this is about. (Note also that there could be several expensive 
 properties)

 Best,

 fps



 -Message d'origine-
 De : Heiko Paulheim [mailto:paulh...@ke.tu-darmstadt.de]
 Envoyé : vendredi 28 septembre 2012 16:42
 À : public-lod@w3.org; SERVANT Francois-Paul
 Objet : Re: Expensive links in Linked Data

 Hi Francois-Paul,

 how about that solution:

 You publish the cheap data about your entity under
 http://example.org/e0, which is the official URI of that entity:
 ex:e0 owl:sameAs ex:e0expensive
 ex:e0 :cheapProp ...

 And under http://example.org/ex:e0expensive, you publish
 ex:e0expensive owl:sameAs ex:e0
 ex:e0expensive :expensiveProp ...

 So people following links in LOD will always land at a page
 without the expensive properties, and those who really want
 to know can follow the sameAs link.

 Does that solve your problem?

 Best,
 Heiko



 Am 28.09.2012 16:32, schrieb SERVANT Francois-Paul:
  Hi,
 
  How do you include links to results of computations in Linked Data?
 
  For instance, you publish data about entities of a given
 class. A property, let's call it :expensiveProp, has this
 class as domain, and you know that computing or publishing
 the corresponding triples is expensive. In such a case, you
 don't want to produce these triples each time one of your
 entities is accessed. You want to include in the
 representation of your entity only a link to that information.
 
  A no-brainer, at first sight.
 
  Are there any recommended ways to proceed?
 
  TIA
 
  fps
 
  *** This e-mail and any attachments is a confidential
 correspondence intended only for use of the individual or
 entity named above. If you are not the intended recipient or
 the agent responsible for delivering the message to the
 intended recipient, you are hereby notified that any
 disclosure, distribution or copying of this communication is
 strictly prohibited. If you have received this communication
 in error, please notify the sender by phone or by replying
 this message, and then delete this message from your system.
 
 

 --
 Dr. Heiko Paulheim
 Knowledge Engineering Group
 Technische Universität Darmstadt
 Phone: +49 6151 16 6634
 Fax:   +49 6151 16 5482
 http://www.ke.tu-darmstadt.de/staff/heiko-paulheim



 *** This e-mail and any attachments is a confidential correspondence intended 
 only for use of the individual or entity named above. If you are not the 
 intended recipient or the agent responsible for delivering the message to the 
 intended recipient, you are hereby notified that any disclosure, distribution 
 or copying of this communication is strictly prohibited. If you have received 
 this communication in error, please notify the sender by phone or by replying 
 this message, and then delete this message from your system.





Re: LD browser rot

2012-09-22 Thread Giovanni Tummarello
Sebastian, you might want to use the classic inspector we have at Sindice

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gagacontent=contentType=auto#SIGMA

inspector.sindice.com is used often, so you can count on it to be
pretty much in good working order.

Also notice that the SIGMA view in the inspector will resolve the URLs to
retrieve, for example, labels of objects that are mentioned in that
graph (notice how the labels of the songs get resolved in Ajax-y ways).

It deals with linked data but also any other markup handled by
Apache Any23 (microdata, microformats, RDFa etc.).

I also particularly appreciate this view

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gagacontent=contentType=auto#HTML

courtesy of Roberto Garcia

Gio

On Sat, Sep 22, 2012 at 12:34 PM, Sebastian Hellmann
hellm...@informatik.uni-leipzig.de wrote:
 Hi all,
 I was looking for simple linked data browsers and started at:
 http://browse.semanticweb.org/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Here is the sad story:

 First, I had to switch to Chrome as browse.semanticweb.org didn't work in my
 Firefox (could be the fault of my Firefox customization)

 URIs in order as  given by http://browse.semanticweb.org

 Fail -
 http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Good:
 http://iwb.fluidops.com/resource/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Fail -
 http://visinav.deri.org/detail?focus=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Fail -
 http://www5.wiwiss.fu-berlin.de/marbles?lang=enuri=http://dbpedia.org/resource/Lady_Gaga
 Fail -
 http://dataviewer.zitgist.com/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Good -
 http://demo.openlinksw.com/rdfbrowser2/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Fail -
 http://www4.wiwiss.fu-berlin.de/rdf_browser/?browse_uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 Good -
 http://graphite.ecs.soton.ac.uk/browser/?uri=http://dbpedia.org/resource/Lady_Gaga
 ?? -
 http://139.82.71.26:3001/explorator/index?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FLady_Gaga
 ?? - Triplr - putting non-information resources in Triplr doesn't make sense:
 http://triplr.org/turtle/dbpedia.org/resource/Lady_Gaga

 DBpedia seems to be working fine:
 curl -IL -H Accept: application/rdf+xml
 http://dbpedia.org/resource/Lady_Gaga

 I would have added this issue to individual trackers, but I think it is something
 a community should solve together.

 All the best,
 Sebastian


 --
 Dipl. Inf. Sebastian Hellmann
 Department of Computer Science, University of Leipzig
 Events:
 * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
 * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
 Projects: http://nlp2rdf.org , http://dbpedia.org
 Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
 Research Group: http://aksw.org
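The curl check quoted above, rewritten as a small standard-library Python script that can be pointed at each candidate URL (a browser front-end or the underlying resource) to spot dead services quickly; the candidate list is just a sample from the message above:

# Dereference a URI asking for RDF and report status + content type, so dead
# services in a list like the one above can be checked quickly. Stdlib only.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check(url, accept="application/rdf+xml"):
    try:
        req = Request(url, headers={"Accept": accept, "User-Agent": "ld-rot-check"})
        with urlopen(req, timeout=15) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except HTTPError as e:
        return e.code, ""
    except URLError as e:
        return None, str(e.reason)

candidates = [
    "http://dbpedia.org/resource/Lady_Gaga",
    "http://graphite.ecs.soton.ac.uk/browser/?uri=http://dbpedia.org/resource/Lady_Gaga",
]
for url in candidates:
    status, info = check(url)
    print("OK  " if status == 200 else "FAIL", url, status, info)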



Re: Linked Data Demand Discussion Culture on this List, WAS: Introducing Semgel, a semantic database app for gathering analyzing data from websites

2012-07-21 Thread Giovanni Tummarello
In the past months I have worked a lot on the commercialization of RDF
based knowledge technologies, so I feel like giving a contribution.

We tried to understand what could be of interest to enterprises and
came up with - or let's say adopted - the slogan "enterprise linked
data clouds", with an internally matured understanding of what this
means and how it delivers value.

In our experience, Linked Data that can be of interest to enterprises
cannot be further away from so many of the things that have been
preached and pushed with prominence (I'll mention a few things like
303s, "follow your nose", even resolvable data URIs, sameAs, 5
star data publishing, vocabulary X or Y that was never used outside
demos... insert here so much more).

Similarly, it is very far away from saying "replace your existing running
system with anything RDF based". Won't even speak about preaching the
value of publishing data as LOD.

To find value that can be sold, I'd go back to the basics a bit.

RDF is very nice at Knowledge Representation. As a matter of fact, it might
be the most solid industrial tool there is for this. A great way to
serialize knowledge with properties attached to the data, a great way to
merge, a great way to ship it to others (and hope they'll understand it)
thanks to shared URIs of properties. A mature query language.

OK, so where does this come into use SPECIFICALLY? (That is, where can you
demonstrate superiority vs other existing technologies?)

I'd say only in environments / use cases / business sectors where:

* knowledge can come from many sources, AND
* new sources pop up all the time, AND
* sources are complex and might have a lot of rich descriptions, AND
* time to explore and understand them is limited,
* AND of course there is sufficient SCALE of the operation/business to support
the development, have time to learn and understand this, etc.

The first sectors that come to mind with these needs (at least to me) are
pharmaceutical, defense/military, and scientific and technical publishing.
(They're the first that come to mind given that in my own little personal
experience these are the sectors that 'came to us' and really didn't need
pitching, or just minimal.)

One can say that, looking closely, a lot of others might potentially have
similar needs in the future.

True... but they might only when you put other elements into this: data
scale (big data) and robustness, AND (given the last point of the
previous list) enterprise-strength credibility.

Here we as a community, IMO, have not been shining:

* big data - just not there. Sorry, but publishing a big dataset as
LOD doesn't count as a difficult data operation. Semantic
technologies have notoriously been proposed by academics with very
often not even the slightest notion of what traditional data
processing systems do, even a basic RDBMS. Get the names of the
people who have published and have been incensed on the semantic web and
intersect that with those of the conferences that matter to industry (and
the world).

* robustness - all systems have been shaky at best, again due to being
too often just throwaway prototypes (when coming from academia). In
other cases companies venturing into this field have been way too much
distracted/pressured (and finally got self-convinced) into
implementing and caring about features (see all those mentioned above
and more) that were unrequested to begin with, and whose value was
just based on a conjecture.

* missing obvious features. Other features were neglected because they did
not fit with the pure original vision: why restrict ourselves to
triples? Quads or quintuples, for example, make so much sense, but oh my
god, what would the community have said? And now systems that have
these features, e.g. certain graph stores, are the obvious choices in
certain cases.

Somebody mentioned Garlik as a success story earlier. They got this
right, but by concentrating on things that made sense for industry
(their industry) with the minimal features that were needed (their 5store,
the production large-scale data processing triplestore, really
implements just a bare subset of SPARQL, they reason only with some
simple rules, etc.) but done with proper engineering.

So, my conclusion in short.

There are, in our opinion and analysis, reasons why semantic data
technologies / large-scale knowledge representation have a lot to give
to society. However, to have credibility and some results, the
community must get humble and look at what's happening in the real
world of data integration and big data.
The community must honestly assess where semantic technologies don't
fit and, on the other hand, which features of the semantic web stack
make sense and bring value to the scenarios that have (bring)
economic value.

Gio




On Sat, Jul 21, 2012 at 1:05 AM, Sebastian Schaffert
sebastian.schaff...@salzburgresearch.at wrote:
 Hi Dave,

 comments inline. :)

 Am 20.07.2012 um 23:25 schrieb Dave Reynolds:

 Hi Sebastian,

 I completely agree with what you say about:
  o Harish's 

Re: SparQLed: Data assisted SPARQL editor available OpenSource

2012-07-19 Thread Giovanni Tummarello
Thanks.

The MQL editor from Freebase has always been an inspiration for us (like
everything else that came somehow from the MIT Simile group, David
Huynh, Stefano Mazzocchi etc.).

The goal here is to hopefully ignite activity on this long-missing
piece of semweb tooling. Whether it's going to be SparQLed or something
else that takes inspiration from it doesn't matter, as long as we can
finally get SPARQL to be usable.

Gio


On Wed, Jul 18, 2012 at 8:40 PM, Yury Katkov katkov.ju...@gmail.com wrote:
 Hi!

 Looks very cool and reminds me on equally awesome MQL Editor on
 Freebase. [1] Thanks!

 [1] http://www.freebase.com/queryeditor
 -
 Yury Katkov



 On Tue, Jul 17, 2012 at 2:30 PM, Giovanni Tummarello
 giovanni.tummare...@deri.org wrote:
 Thanks for the comments we received.

 To answer some of the requests and the will it scale on complex
 datasets  we have now a sparqled which assists writing queries on the
 latest DBPedia dump

 http://demo.sindice.net/dbpedia-sparqled/

 We look forward to making Sparql a collaborative, collectively owned
 project. Pls sign up  to the google group to express your support.

 cheers
 Gio


 On Sat, Jun 30, 2012 at 6:52 PM, Giovanni Tummarello
 giovanni.tummare...@deri.org wrote:
 Dear all,

 we're happy to release open source today (actually yesterday :) )  a
 first version of our data assisted SPARQL query editor

 here is a short blog post which then leads to the homepage and other 
 material

 http://www.sindicetech.com/blog/?p=14preview=true

 ---

 Our desire is to make this a community driven project.

  In a few weeks we plan to licence the whole things as Apache and,
 with your support, make this a significant improvement into usability
 of semantic web tools.

 we look forward to your feedback.

 Gio




regularly refreshed partial LOD + Web sparql endpoint

2012-07-18 Thread Giovanni Tummarello
It might be of interest to some that at Sindice.com we switched from
trying to index everything in SPARQL to a mixed approach where everything
appears on the frontpage in real time, but only selected websites (RDF,
RDFa, microformats, microdata etc.) + selected LOD datasets appear in a
regularly updated (though not real-time) SPARQL endpoint.

This solution allows us to have a reasonable quality of service
while fitting within our limited research resources (as Sindice.com is a
research project).

By providing this service we intend to foster experimentation by the
community, which can now be sure that their favorite dataset is loaded
(just send us a request) and can be queried, e.g. in SPARQL, next to
their favorite web-of-data website (just make sure it's in the list of
those indexed, or send us a request).

Some details of this mechanism (and of the fact that it made us process
100M RDF docs in a day) are in this blog post.

A UI making all this clearer is coming in August.

http://blog.sindice.com/2012/07/18/how-we-ingested-100m-semantic-documents-in-a-day-and-were-do-they-come-from/

Thanks must go to Openlink for the support provided in setting this
mechanism up and to the others mentioned in the blog post.
Gio



Re: SparQLed: Data assisted SPARQL editor available OpenSource

2012-07-17 Thread Giovanni Tummarello
Thanks for the comments we received.

To answer some of the requests and the "will it scale on complex
datasets?" question, we now have a SparQLed instance which assists writing
queries on the latest DBpedia dump:

http://demo.sindice.net/dbpedia-sparqled/

We look forward to making SparQLed a collaborative, collectively owned
project. Please sign up to the Google group to express your support.

cheers
Gio


On Sat, Jun 30, 2012 at 6:52 PM, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 Dear all,

 we're happy to release open source today (actually yesterday :) )  a
 first version of our data assisted SPARQL query editor

 here is a short blog post which then leads to the homepage and other material

 http://www.sindicetech.com/blog/?p=14preview=true

 ---

 Our desire is to make this a community driven project.

  In a few weeks we plan to licence the whole things as Apache and,
 with your support, make this a significant improvement into usability
 of semantic web tools.

 we look forward to your feedback.

 Gio



SparQLed: Data assisted SPARQL editor available OpenSource

2012-06-30 Thread Giovanni Tummarello
Dear all,

We're happy to release as open source today (actually yesterday :)) a
first version of our data assisted SPARQL query editor.

Here is a short blog post which then leads to the homepage and other material:

http://www.sindicetech.com/blog/?p=14&preview=true

---

Our desire is to make this a community-driven project.

In a few weeks we plan to license the whole thing as Apache and,
with your support, make this a significant improvement in the usability
of semantic web tools.

We look forward to your feedback.

Gio



Re: Is there a web service for RDF validation?

2012-05-13 Thread Giovanni Tummarello
Might want to try inspector.sindice.com.

It might be of SOME use, e.g. it will report errors that the W3C validator won't
report; it will also support RDFa of course and other forms of data
embedding.
It might not support TriG :)
cheers
Gio
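If no hosted service fits, a few lines of rdflib (pip install rdflib) already give a crude, scriptable check: try to parse the document and report the first syntax error. Wrapping something like this in your own REST endpoint is then straightforward; this is only a sketch, not a full validator:

# Crude, scriptable RDF "validation": try to parse and report the first syntax
# error. rdflib understands turtle, xml, n3, nt, trig, ... (a Dataset or
# ConjunctiveGraph would be needed to actually keep named graphs apart).
# Requires: pip install rdflib
import sys
from rdflib import Graph

def validate(path, fmt="turtle"):
    g = Graph()
    try:
        g.parse(path, format=fmt)          # raises a parser exception on bad syntax
    except Exception as e:
        return False, str(e)
    return True, f"parsed {len(g)} triples"

if __name__ == "__main__":
    ok, msg = validate(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "turtle")
    print("VALID" if ok else "INVALID", "-", msg)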

On Fri, May 11, 2012 at 11:08 AM, Mark Thompson mark9...@gmail.com wrote:
 Hi all,

 A pretty straight-forward question from a newbie to this field: are
 there any web-services available for doing RDF validation?

 By web-service I mean something with a proper web-service API: e.g.
 REST or SOAP, which you could easily call from within your own
 application.

 Of course I've seen http://www.w3.org/RDF/Validator/ and
 http://www.rdfabout.com/demo/validator/ , but they don't have
 REST-like interfaces and also seem quite out-of-date with respect to
 input formats (e.g. no trig). Also, named graph support would be great
 to have.

 Given the advanced state of development of RDF tools like Jena /
 Sesame it seems a web-service should be available somewhere... or is
 it time to build something like that myself? Any pointers greatly
 appreciated!

 Cheers,
 Mark





Yet more metadata statistics out - from Sindice

2012-04-17 Thread Giovanni Tummarello
Hi Peter, all,

To add (probably a small element of discussion) to this:

I am happy to say that last week we released on the frontpage some
analytics stats which are freshly updated every week.

At the moment they come from 500 million+ web URLs. Maybe not much, but
please notice we ONLY retain web URLs which return RDF, RDFa, Microdata,
Microformats etc. (and throw away trivial markup).

Next week we hope to release the detailed per-domain stats.

General analytics (classes) :

http://sindice.com/stats/direct/basic-class-stats?settings=%7B%22iCreate%22%3A1334676375502%2C%22iStart%22%3A0%2C%22iEnd%22%3A50%2C%22iLength%22%3A50%2C%22sFilter%22%3A%22%22%2C%22sFilterEsc%22%3Atrue%2C%22aaSorting%22%3A%5B%5B4%2C%22desc%22%5D%5D%2C%22aaSearchCols%22%3A%5B%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%5D%2C%22abVisCols%22%3A%5Btrue%2Ctrue%2Ctrue%2Ctrue%2Ctrue%5D%2C%22ssDelta%22%3A%22%22%7D

Schema specific analytics:

http://sindice.com/stats/direct/basic-class-stats?settings=%7B%22iCreate%22%3A1334676375502%2C%22iStart%22%3A0%2C%22iEnd%22%3A50%2C%22iLength%22%3A50%2C%22sFilter%22%3A%22%22%2C%22sFilterEsc%22%3Atrue%2C%22aaSorting%22%3A%5B%5B4%2C%22desc%22%5D%5D%2C%22aaSearchCols%22%3A%5B%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%2C%5B%22%22%2Ctrue%5D%5D%2C%22abVisCols%22%3A%5Btrue%2Ctrue%2Ctrue%2Ctrue%2Ctrue%5D%2C%22ssDelta%22%3A%22%22%7D


It's all on the homepage at http://sindice.com (see the analytics tab)

Note: Sindice is NOT at this point wildly crawling the web but rather
is accepting (and acting immediately on) submissions of sitemaps, pings
and RDF datasets. Please submit yours to see them indexed (and
refreshed) at a reasonable rate nowadays.


cheers
Gio

On Tue, Apr 17, 2012 at 4:06 PM, Peter Mika pm...@yahoo-inc.com wrote:
 Hi All,

 To add one more data point to the previous discussion about
 webdatacommons.org, we have recently presented a short position paper at the
 LDOW 2012 workshop at WWW 2012. Online at

 http://events.linkeddata.org/ldow2012/papers/ldow2012-inv-paper-1.pdf

 Please compare this carefully with the results of Bizer et al.:

 http://events.linkeddata.org/ldow2012/papers/ldow2012-inv-paper-2.pdf

 As is always the case with statistics, it matters what you count on and how
 you count ;) For example, Chris and his co-authors did not consider most of
 OGP data on the Web, which results in large discrepancies in the counts for
 RDFa, as well as overall counts.

 Nevertheless, both studies confirm that the Semantic Web, and in particular
 metadata in HTML, is taking off in major ways thanks to the efforts of
 Facebook, the sponsors of schema.org and many other individuals and
 organizations. Comparing to our previous numbers, for example we see a
 five-fold increase in RDFa usage with 25% of webpages containing RDFa data
 (including OGP), and over 7% of web pages containing microdata. These are
 incredibly impressive numbers, which illustrate that this part of the
 Semantic Web has gone mainstream.

 Cheers,
 Peter




Re: ANN: LODIB - Linked Open Data Integration Benchmark

2012-04-11 Thread Giovanni Tummarello
Hi, lots of work here! Wondering: did you consider evaluating whether it's worth
learning the quite involved RDF-based syntaxes of many of these vs a simple
Python, Jython or whatever script? Obviously you need Turing completeness, so
I wonder (and a benchmark should bother to show this, IMO) whether the ad hoc
systems are worth the effort. (Let alone - and you can't really leave this
aside in 2012 - the lack of IDE or tool support for non-standard
programming paradigms.)
Gio
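To make the comparison concrete, this is roughly the kind of plain-script
alternative being alluded to: a sketch (vocabulary URIs invented purely for
illustration) that implements one simple translation pattern - renaming a
property - in ordinary Python with rdflib instead of a dedicated mapping
language.

# Plain-Python data translation sketch: copy a graph, renaming one property.
# The source/target vocabularies are invented for illustration only.
from rdflib import Graph, Namespace

SRC = Namespace("http://example.org/source/")
TGT = Namespace("http://example.org/target/")

def translate(src_graph):
    out = Graph()
    for s, p, o in src_graph:
        if p == SRC.name:                 # pattern: rename a property
            out.add((s, TGT.label, o))
        else:                             # everything else passes through
            out.add((s, p, o))
    return out

if __name__ == "__main__":
    g = Graph()
    g.parse(data='@prefix src: <http://example.org/source/> . '
                 '<http://example.org/x> src:name "Alice" .',
            format="turtle")
    print(translate(g).serialize(format="turtle"))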
On Apr 11, 2012 1:23 PM, Andreas Schultz a.schu...@fu-berlin.de wrote:

 (Apologies for cross-posting)

 Hi all,

 we are happy to announce the release of the Linked Open Data
 Integration Benchmark (LODIB), which has been devised for comparing the
 expressivity as well as the runtime performance of Linked Data translation
 systems. It was developed by a collaboration between the University of
 Sevilla and the Web Based Systems Group.

 The benchmark was devised by looking at a sample of entities from the LOD
 Cloud. From this sample we extracted a catalog of fifteen data translation
 patterns and survey how often these patterns occur in the example set.
 Based on these statistics, we designed our benchmark that aims to reflect
 the real-world heterogeneities that exist on the Web of Data. The
 benchmark also comes with a data generator in order to test data
 translation systems at different scales.

 We applied the benchmark to test the performance of two data translation
 systems, Mosto and LDIF, and compare the performance of the systems with
 the SPARQL 1.1 CONSTRUCT query performance of the Jena TDB RDF store.

 LODIB is publicly available at:

 http://lodib.wbsg.de


 Best regards,
 Carlos R. Rivero (University of Sevilla), Andreas Schultz (Freie
 Universität
 Berlin), Christian Bizer (Freie Universität Berlin) and David Ruiz
 (University of Sevilla)






Re: Change Proposal for HttpRange-14

2012-03-27 Thread Giovanni Tummarello
Tom, if you were to do a serious assessment, then measuring milliseconds
and redirect hits means looking at a misleading 10% of the problem.

Cognitive load, economics and perception of benefits are over
90% of the question here.

An assessment that could begin describing the issue:

* get a normal webmaster, calculate how much it takes to explain the
thing to him, follow him on, and
* see how quickly he forgets,
* assess how much it takes to VALIDATE that the whole thing works (e.g. a
newly implemented spec),
* assess what tools would check if something breaks,
* assess the same things for implementers, e.g. of applications or
consuming APIs, to get all the above,
* then, once you calculate the huge cost above, compare it with the
perceived benefits.

THEN REDO IT ALL AT MANAGEMENT LEVEL once you're finished with the technical
level, because for sites that matter IT'S MANAGERS THAT DECIDE; geek-run
websites don't count, sorry.

Same thing when looking at 'real world applications': counting just
geeky hacked-together demonstrators or semweb aficionados' libs has the
same skew.. these people and apps were paid by EU money or research
money, so they shouldn't count toward real-world, economics-driven
apps; so if one was thinking of counting 50 apps that would break,
that'd be just as partial and misleading.

.. and we could go on. Now do you really need to do the above? (Let
alone how difficult it is to do in proper terms.) Me and a whole crowd
already know the results: the same exercise has been done over
and over and we've been witnessing it.
I sincerely hope this is the time we get this fixed so we can indeed
go back and talk about the new linked data (linked data 2.0) to actual
web developers, IT managers etc.

Removing the 303 thing doesn't solve the whole problem, it is just the
beginning. Looking forward to discussing next steps

Gio




On Mon, Mar 26, 2012 at 6:13 PM, Tom Heath tom.he...@talis.com wrote:
 Hi Jeni,

 On 26 March 2012 16:47, Jeni Tennison j...@jenitennison.com wrote:
 Tom,

 On 26 Mar 2012, at 16:05, Tom Heath wrote:
 On 23 March 2012 15:35, Steve Harris steve.har...@garlik.com wrote:
 I'm sure many people are just deeply bored of this discussion.

 No offense intended to Jeni and others who are working hard on this,
 but *amen*, with bells on!

 One of the things that bothers me most about the many years worth of
 httpRange-14 discussions (and the implications that HR14 is
 partly/heavily/solely to blame for slowing adoption of Linked Data) is
 the almost complete lack of hard data being used to inform the
 discussions. For a community populated heavily with scientists I find
 that pretty tragic.


 What hard data do you think would resolve (or if not resolve, at least move 
 forward) the argument? Some people  are contributing their own experience 
 from building systems, but perhaps that's too anecdotal? Would a
 structured survey be helpful? Or do you think we might be able to pick up 
 trends from the webdatacommons.org  (or similar) data?

 A few things come to mind:

 1) a rigorous assessment of how difficult people *really* find it to
 understand distinctions such as things vs documents about things.
 I've heard many people claim that they've failed to explain this (or
 similar) successfully to developers/adopters; my personal experience
 is that everyone gets it, it's no big deal (and IRs/NIRs would
 probably never enter into the discussion).

 2) hard data about the 303 redirect penalty, from a consumer and
 publisher side. Lots of claims get made about this but I've never seen
 hard evidence of the cost of this; it may be trivial, we don't know in
 any reliable way. I've been considering writing a paper on this for
 the ISWC2012 Experiments and Evaluation track, but am short on spare
 time. If anyone wants to join me please shout. (A minimal measurement
 sketch follows this list.)

 3) hard data about occurrences of different patterns/anti-patterns; we
 need something more concrete/comprehensive than the list in the change
 proposal document.

 4) examples of cases where the use of anti-patterns has actually
 caused real problems for people, and I don't mean problems in
 principle; have planes fallen out of the sky, has anyone died? Does it
 really matter from a consumption perspective? The answer to this is
 probably not, which may indicate a larger problem of non-adoption.
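Below is the minimal measurement sketch referred to in point 2 of the list
above: time a request against a (placeholder) 303-style URI with and without
following the redirect. A real study would of course repeat this many times,
from several locations and against several publishers.

# Rough consumer-side measurement of the 303 redirect penalty.
# The URI is a placeholder for any non-information-resource identifier.
import time
import requests

URI = "http://example.org/id/thing"

def timed_get(follow_redirects):
    start = time.monotonic()
    requests.get(URI, allow_redirects=follow_redirects,
                 headers={"Accept": "application/rdf+xml"}, timeout=30)
    return time.monotonic() - start

if __name__ == "__main__":
    only_303 = timed_get(False)   # just the 303 response
    full = timed_get(True)        # 303 plus the follow-up document request
    print("303 response only: %.3fs" % only_303)
    print("with redirect hop: %.3fs (penalty ~%.3fs)" % (full, full - only_303))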

 The larger question is how do we get to a state where we *don't* have this 
 permathread running, year in year
 out. Jonathan and the TAG's aim with the call for change proposals is to get 
 us to that state. The idea is that by
 getting people who think that the specs should say something different to 
 put their money where their mouth is  and express what that should be, we 
 have something more solid to work from than reams and reams of
 opinionated emails.

 This is a really worthy goal, and thank you to you, Jonathan and the
 TAG for taking it on. I long for the situation you describe where the
 permathread is 'permadead' :)

 But we do all need 

Re: The Battle for Linked Data

2012-03-26 Thread Giovanni Tummarello
Hugh

 here i share my recent experience with a big time (smart) tech
manager of a big time (smart) enterprise we're working with.

* He kept on telling us we're doing linked data, linked data is hot
* I tried to convince him that no.. really linked data is this insane
thing where people should use 303s, link to others etc..
* He kept on saying na.. we've got structured data, we're linking it
inside in the way we think is reasonable, we're importing other datasets
and linking them.. we're doing linked data.
* I stopped arguing.. :) and now I use the term when talking to
enterprises and they get it. The whole web protocol part is neglected.

just sharing the way it is now. The term is just too good to pass up. I
would love a looser, more comprehensive, more reusable, more useful
definition so we could finally all use it in a way that's supported by
some doc out there.. but until that doc exists..

Gio

On Mon, Mar 26, 2012 at 5:49 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 So What is Linked Data?
 And relatedly, Who Owns the Term Linked Data?
 (If we used a URI for Linked Data, it might or might not be clearer.)

 Of course most people think that What *I* think is Linked Data is Linked 
 Data.
 And by construction, if it is different it is not Linked Data.
 Kingsley views the stuff people are talking about that does not, for example, 
 conform to a policy that includes Range-14 as Structured Data - naming 
 things is important, as we well know, and can serve to separate communities..

 There are clearly quite a few people who would like to relax things, and even 
 go so far as to drop the IR thing completely, but still want to have the 
 Linked Data badge on the resultant Project.
 There are others for whom that is anathema.

 I actually think that what we are watching is the attempt of the Linked Data 
 child to fly the nest from the Semantic Web.
 Can it develop on its own, and possibly have different views to the Semantic 
 Web, or must it always be obedient to the objectives of its parent?

 Often the objectives of Linked Data engineers are very different to the 
 objectives of Semantic Web engineers.
 (A Data Integration technology or a global AI system.)
 So it is not surprising that the technologies they want might be different, 
 and even incompatible.

 If I push the parent/child analogy beyond its limit, I can see the 
 forthcoming TAG meeting as the moment at which the child proposes to reason 
 with the parent to try to reach a compromise.
 The TAG seems to be part of the ownership of the term Linked Data, because 
 the Linked Data people (whoever they are) so agree at the moment - but this 
 is not a God-given right - I don't think there is any trade- or copy-right on 
 the term.
 A failure to arrive at something that the child finds acceptable can often 
 lead to a complete rift, where the child leaves home entirely and even 
 changes its name.

 And of course, after such a separation, exactly who would be using the term 
 Linked Data to badge their activities?

 Like others in this discussion I am typing one-handed, after earlier biting 
 my arm off in preference to entering the Range-14 discussion again.
 But I do think this is an important moment for the Linked Data world.

 Best
 Hugh
 --
 Hugh Glaser,
             Web and Internet Science
             Electronics and Computer Science,
             University of Southampton,
             Southampton SO17 1BJ
 Work: +44 23 8059 3670, Fax: +44 23 8059 3045
 Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
 http://www.ecs.soton.ac.uk/~hg/





Re: Change Proposal for HttpRange-14

2012-03-23 Thread Giovanni Tummarello
2012/3/23 Sergio Fernández sergio.fernan...@fundacionctic.org:
 Do you really think that base your proposal on the usage on a Powder
 annotation is a good idea?

 Sorry, but IMHO HttpRange-14 is a good enough agreement.

yup performed brilliantly so far, nothing to say. Industry is flocking
to adoption, and what a consensus.



Re: Address Bar URI

2011-10-15 Thread Giovanni Tummarello
me2c

if you can rewrite http://yourserver/page so that it shows as
http://yourpage/resource when the page was the result of a redirect, that
would indeed finally resolve the completely unacceptable situation
where users are forced to understand (and see in their browser bars)
the distinction.

honestly the web has a single set of URIs now: those of the schema.org (or
similarly) annotated pages.
so live with the idea (and useful software) or die chatting here :)

wrt whether we can then hide back the confusion and claim that eheheh we knew
all along this rewrite thing was coming, i guess nice.

Gio



On Fri, Oct 14, 2011 at 1:08 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:
 Hi.
 My colleague, Don Cruickshank asked me if it was good practice to rewrite the 
 URI in the Address Bar to be the NIR, rather than the IR.
 I was surprised, but he tells me that it is permitted in HTML5.
 My response was  Er, yes, sounds great!

 Finally we can get away from having to explain to users that the URL of the 
 document cannot be cut and pasted as the URI!
 Yipp!
 Don is about to make the MyExperiment site move to this, so that URIs such as 
 http://www.myexperiment.org/workflows/158.html will not show the .html
 And if sites such as dbpedia were to adopt this, it would mean I no longer 
 make the mistake of doing things like fbase:Italy owl:sameAs 
 http://dbpedia.org/page/Italy; when I cut and paste or whatever, and would 
 find them in the wild a lot less.
 Not to mention me making the same mistake when I use my own RKBExplorer IDs.

 This sort of seems non-controversial - and I don't think I have seen no 
 discussion of it here, either because it hasn't hit the radar, or it is a 
 http://dbpedia.org/page/Slam_dunk (sic).

 So is it?

 Cheers
 --
 Hugh Glaser,
              Web and Internet Science
              Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
 Work: +44 23 8059 3670, Fax: +44 23 8059 3045
 Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
 http://www.ecs.soton.ac.uk/~hg/







Ann: Sig.ma EE available: on the fly entity data consolidation and browsing from your own chosen sources

2011-08-24 Thread Giovanni Tummarello
Dear all, Sig.ma Enterprise Edition (EE) is now available open source
from http://sig.ma . It can be deployed easily and merges the sources
you want, both from Sindice and from your chosen LOD sources.

blog post excerpt follows



The original Sig.ma

The http://sig.ma service was created as a demonstration of live, on
the fly Web of Data mashup. Provide a query and Sig.ma will
demonstrate how the Web of Data is likely to contain surprising
structured information about it (pages that embed RDF, RDFa,
Microdata, Microformats)

By using the Sindice search engine Sig.ma allows a person to get a
(live) view of what’s on the “Web of Data” about a given topic. For
more information see our blog post. For academic use please cite
Sig.ma as in [1].

Introducing  Sig.ma Enterprise Edition (EE)

We’re happy today to introduce Sig.ma EE , a standalone, deployable,
customisable version of Sig.ma.

Sig.ma EE is deployed as a web application and will perform on the fly
data integration from both local data sources and remote services
(including Sindice.com); just like Sig.ma, but mixing your chosen
public and private sources.

Sig.ma EE currently supports the following data providers:

* SPARQL endpoints (tested on Virtuoso, 4store )
* YBoss + the Web (Uses YBoss to search then Any23 to get the data)
* Sindice (with optionally the ability to use Sindice Cache API for
fast parallel data collection)

It is very easy to customise visually and to implement new custom data
providers (e.g. for your relational or CMS data) by following
documentation provided with Sig.ma EE.
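As a rough idea of what such a data provider typically has to do - this is not
Sig.ma EE's actual interface, just a sketch using the standard SPARQL protocol
with a placeholder endpoint and entity URI:

# Sketch of a SPARQL-endpoint lookup: ask for everything about one entity.
import requests

ENDPOINT = "http://localhost:8890/sparql"          # placeholder, e.g. a Virtuoso endpoint
ENTITY = "http://example.org/resource/Something"   # placeholder entity URI

def describe(entity_uri):
    resp = requests.get(ENDPOINT,
                        params={"query": "DESCRIBE <%s>" % entity_uri},
                        headers={"Accept": "text/turtle"},
                        timeout=30)
    resp.raise_for_status()
    return resp.text          # Turtle about the entity, ready to merge/display

if __name__ == "__main__":
    print(describe(ENTITY))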

Want to give it a quick try? Sig.ma EE now also powers the service on
the http://sig.ma homepage so you can add a custom datasource (e.g.
your publicly available SPARQL endpoint) directly from the “Options”
menu you get after you search for something. It is very easy to
implement new custom data source (e.g. for your relational or web CMS
data) by implementing interfaces following the examples.

Sig.ma EE is open source and available for download. The standard
license under which Sig.ma EE is distributed is the GNU Affero General
Public License, version 3. Sig.ma EE is also available under a
commercial licence. Please contact us to discuss further.

Acknowledgements

Sig.MA EE is built with as part of the LOD2 project. http://lod2.eu
Call: FP7-ICT-2009-5


[1] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon
Danielczyk, Renaud Delbru, Stefan Decker “Sig.ma: Live views on the
Web of Data”, Journal of Web Semantics: Science, Services and Agents
on the World Wide Web – Volume 8, Issue 4, November 2010, Pages
355-364



Re: Ann: Sig.ma EE available: on the fly entity data consolidation and browsing from your own chosen sources

2011-08-24 Thread Giovanni Tummarello
should be back now.
 My bad for announcing it basically the second our system
administrator/engineer walked outside the door.

Gio

On Wed, Aug 24, 2011 at 5:44 PM, Juan Sequeda juanfeder...@gmail.com wrote:
 Awesome! Congrats!
 It seems like everybody is checking it out because http://sig.ma/ is down..
 at least for me.
 Juan Sequeda
 +1-575-SEQ-UEDA
 www.juansequeda.com


 On Wed, Aug 24, 2011 at 11:17 AM, Giovanni Tummarello
 giovanni.tummare...@deri.org wrote:

 Dear all, Sig.ma Enterprise Edition (EE) is now available open source
 from http://sig.ma . It can be deployed easily and merge the sources
 you want both from Sindice and from your chosed LOD sources.

 blog post excerpt follows

 

 The original Sig.ma

 The http://sig.ma service was created as a demonstration of live, on
 the fly Web of Data mashup. Provide a query and Sig.ma will
 demonstrate how the Web of Data is likely to contain surprising
 structured information about it (pages that embed RDF, RDFa,
 Microdata, Microformats)

 By using the Sindice search engine Sig.ma allows a person to get a
 (live) view of what’s on the “Web of Data” about a given topic. For
 more information see our blog post. For academic use please cite
 Sig.ma as in [1].

 Introducing  Sig.ma Enterprise Edition (EE)

 We’re happy today to introduce Sig.ma EE , a standalone, deployable,
 customisable version of Sig.ma.

 Sig.ma EE is deployed as a web application and will perform on the fly
 data integration from both local data source and remote services
 (including Sindice.com) ; just like Sig.ma but mixing your chosen
 public and private sources.

 Sig.ma EE currently supports the following data providers:

 * SPARQL endpoints (tested on Virtuoso, 4store )
 * YBoss + the Web (Uses YBoss to search then Any23 to get the data)
 * Sindice (with optionally the ability to use Sindice Cache API for
 fast parallel data collection)

 It is very easy to customise visually and to implement new custom data
 providers (e.g. for your relational or CMS data) by following
 documentation provided with Sig.ma EE.

 Want to give it a quick try? Sig.ma EE now also powers the service on
 the http://sig.ma homepage so you can add a custom datasource (e.g.
 your publicly available SPARQL endpoint) directly from the “Options”
 menu you get after you search for something. It is very easy to
 implement new custom data source (e.g. for your relational or web CMS
 data) by implementing interfaces following the examples.

 Sig.ma EE is open source and available for download. The standard
 license under which Sig.ma EE is distributed is the GNU Affero General
 Public License, version 3. Sig.ma EE is also available under a
 commercial licence. Please contact us to discuss further.

 Acknowledgements

 Sig.MA EE is built with as part of the LOD2 project. http://lod2.eu
 Call: FP7-ICT-2009-5


 [1] Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon
 Danielczyk, Renaud Delbru, Stefan Decker “Sig.ma: Live views on the
 Web of Data”, Journal of Web Semantics: Science, Services and Agents
 on the World Wide Web – Volume 8, Issue 4, November 2010, Pages
 355-364






Re: New draft of Linked Data Patterns book

2011-08-20 Thread Giovanni Tummarello
Seems pretty interesting, clearly out of practical experience !
thanks!
Gio


On Fri, Aug 19, 2011 at 3:56 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi,

 There's a new draft of the Linked Data patterns book available, with
 12 new patterns, mainly in the application patterns section.

 The latest version is available from here:

 http://patterns.dataincubator.org/book/

 There are PDF and EPUB versions linked from the homepage. The source
 is also available in github at:

 https://github.com/ldodds/ld-patterns

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Mobile: 07850 928381
 http://kasabi.com
 http://talis.com

 Talis Systems Ltd
 43 Temple Row
 Birmingham
 B2 5LS





Re: ANN: Sparallax! - Browse sets of things together (now those on your SPARQL endpoint)

2011-08-17 Thread Giovanni Tummarello
Hi Danny,

i liked Sparallax a lot, the problem is it's hard to maintain. David no longer
upgrades Parallax, and the intern who did the SPARQL to MQL
conversion that allows Sparallax to operate on SPARQL is not
working with us anymore. Hard to say how difficult it would be to
progress on that project.

On the other hand, in terms of browsers, i am going to be sponsoring
development of TFacets, which is pretty good IMO, so we might be
releasing a new version in a few months, together with the original
developer of course. What do you think of that?

Gio

On Thu, Aug 18, 2011 at 12:03 AM, Danny Ayers danny.ay...@gmail.com wrote:
 Nice work!

 Due to requirements of query functionalities and aggregates, Sparallax
 currently only works on Virtuoso SPARQL endpoints. (do other
 triplestores have aggregates? if so please let us know and we'll try
 to support other syntaxes as well)

 Which aggregate functions are needed?
 ARQ has some support (very possibly more than) listed here:
 http://jena.sourceforge.net/ARQ/group-by.html

 [Andy, are there any more query examples around? I can't seem to get
 count(*) working here]
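For what it's worth, a quick self-contained way to see the SPARQL 1.1 style
aggregate syntax in action is rdflib's in-memory engine (the data below is
invented); whether a given remote endpoint accepts the same syntax is exactly
the open question here.

# SPARQL 1.1 aggregate (COUNT + GROUP BY) run against rdflib's in-memory store.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:a a ex:Book . ex:b a ex:Book . ex:c a ex:Person .
""", format="turtle")

q = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
"""

for row in g.query(q):
    print(row[0], row[1])    # class URI and its instance count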

 I see there are Graph and Dataset fields on the form - how are they
 used? What happens with an endpoint that only has a default, unnamed
 graph?

 Try sparallax on our test datasets or put just the URL of your SPARQL
 endpoint (but make sure you have materialized your RDFs triples, it's
 needed)

 materialized?

 (I assume that's not the httpRange-14 extension for dogs :)

 What else can be done to the data to make it more Sparallax friendly?

 I noticed clues in the form boxes for abstractprop and imageprop, also
 the config file -

 http://code.google.com/p/freebase-parallax/source/browse/app-parallax/sparallax/scripts/config.xml

 Do I take it from the rdfs:Class entry there that it uses RDFS to
 decide on collections?

 Sorry for the raft of questions, but I recently put LD browser on my
 todo list, completely forgetting about Parallax and all the boxes that
 already ticks, and now Sparallax... so I want to get an idea of how it
 works to avoid reinventing the wheel (in a square shape).

 Cheers,
 Danny.





 --
 http://dannyayers.com




Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
Hi, out of curiosity:
Will you be taking off the diagram those that are NOT online regularly?
Gio

On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1]. We would like to invite those of you who publish
 data sets as Linked Data to join the other ~2000 data sets already in CKAN (
 http://ckan.net ) to help us extend the list of ~300 candidates to the LOD
 cloud diagram. For those of you that already have entries on CKAN, we ask
 you to please review and update your entries accordingly. Please finalize
 your dataset descriptions until the end of this week to ensure that your
 entry will be considered for this round of the diagram.

 We will be analyzing all data sets tagged with lod in CKAN from the
 perspective of a data consumer, looking for best practices that make it
 easier to access, understand and use your data. The compliance with the best
 practices will be checked manually and with scripts that download and
 analyze data from the data sources. Therefore it is important that you
 provide as much information as possible in your CKAN entry.

 You can use the CKAN entry for DBpedia as one example:
 http://ckan.net/package/dbpedia

 In order to aid you in this quest, we have provided a validation page for
 your CKAN entry with step-by-step guidance for the information that we will
 be looking for:
 http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

 After you have completed the description of your data sets, we invite you to
 fill up this 5 minutes survey about your experience. This will help us to
 make the process easier, more complete and exciting for the next time
 around.
 http://www.surveymonkey.com/s/TDS3TML

 Thank you and happy dataset description!

 Cheers,
 Pablo, Anja, Richard and Chris
 [1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
i meant a much simpler and more significant thing. Go in CKAN, click on the
LOD tag, then start clicking around datasets.
Many don't work, are offline etc. They have been for weeks or months.
Are you checking these and removing them from the new LOD diagram, or
will the LOD diagram just grow regardless of reality?
thanks
Gio

 Second, I assume you refer to the relatively short time span between my
 message to the list and the desired date for finishing the entries for the
 new release.



Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
Chris,

i am not interested in specific content of the diagram, but rather i
am interested in understanding what its value of it which depends on
the method you're going to follow in the update. You're answeing this
saying basically there wont be a check for old dead datasets.

I admit never having looked at this closely but i think i cant be the
only one thinking its a bit of a joke if we're telling people to
publish data in a way.. that doesnt even have a way to know if data is
thre or not?

please notice that i am trying to be constructive by suggesting the
diagram is made to mean something that one can rely on e.g. let me go
see the latest diagram so that i can.. . a suggestion in this sense
could be to require that linked data in ckan publishes  URIs with
sample data are given, that sites are exposing either dumps or a
sitemap (so that they can be collected) etc.
cheers
Gio
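For concreteness, the kind of automated liveness check suggested here could be
as small as the sketch below (the sample URI is just an example of what a CKAN
entry might list): dereference one example resource and see whether anything
RDF-ish actually comes back.

# Minimal "is this dataset still alive?" check: dereference a sample URI.
import requests

RDF_TYPES = ("application/rdf+xml", "text/turtle", "text/n3",
             "application/n-triples")

def check(sample_uri):
    try:
        r = requests.get(sample_uri, timeout=20, allow_redirects=True,
                         headers={"Accept": "application/rdf+xml, text/turtle;q=0.9"})
    except requests.RequestException as e:
        return "OFFLINE", str(e)
    ctype = r.headers.get("Content-Type", "").split(";")[0].strip()
    if r.ok and ctype in RDF_TYPES:
        return "OK", ctype
    return "SUSPECT", "status=%s content-type=%s" % (r.status_code, ctype)

if __name__ == "__main__":
    print(check("http://dbpedia.org/resource/Dublin"))   # example resource URI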














On Wed, Jul 13, 2011 at 12:05 AM,  bi...@zedat.fu-berlin.de wrote:
 Hi Giovanni,

 Will you be taking off the diagram those that are NOT online regularly?

 could you please be a bit more precise and clearly say which datasets you
 are talking about.

 Which datasets do not provide dereferencable URIs anymore?

 (Linked Data and the LOD diagram is not about SPARQL endpoints)

 A constructive approach, which I guess would be highly appreciated by the
 community, would be that you directly mark these datasets on CKAN using
 the tags that are proposed at the end of this page

 http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 Cheers,

 Chris


 Hi out of curiousity
 Will you be taking off the diagram those that are NOT online regularly?
 Gio

 On Tue, Jul 12, 2011 at 7:45 PM, Pablo Mendes pablomen...@gmail.com
 wrote:
 Dear fellow Linked Open Data publishers and consumers,
 We are in the process of regenerating the next LOD cloud diagram and
 associated statistics [1]. We would like to invite those of you who
 publish
 data sets as Linked Data to join the other ~2000 data sets already in
 CKAN (
 http://ckan.net ) to help us extend the list of ~300 candidates to the
 LOD
 cloud diagram. For those of you that already have entries on CKAN, we
 ask
 you to please review and update your entries accordingly. Please
 finalize
 your dataset descriptions until the end of this week to ensure that your
 entry will be considered for this round of the diagram.

 We will be analyzing all data sets tagged with lod in CKAN from the
 perspective of a data consumer, looking for best practices that make it
 easier to access, understand and use your data. The compliance with the
 best
 practices will be checked manually and with scripts that download and
 analyze data from the data sources. Therefore it is important that you
 provide as much information as possible in your CKAN entry.

 You can use the CKAN entry for DBpedia as one example:
 http://ckan.net/package/dbpedia

 In order to aid you in this quest, we have provided a validation page
 for
 your CKAN entry with step-by-step guidance for the information that we
 will
 be looking for:
 http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

 After you have completed the description of your data sets, we invite
 you to
 fill up this 5 minutes survey about your experience. This will help us
 to
 make the process easier, more complete and exciting for the next time
 around.
 http://www.surveymonkey.com/s/TDS3TML

 Thank you and happy dataset description!

 Cheers,
 Pablo, Anja, Richard and Chris
 [1] http://www4.wiwiss.fu-berlin.de/lodcloud/state/








Re: Get your dataset on the next LOD cloud diagram

2011-07-12 Thread Giovanni Tummarello
 If you are seeking stats re. what I mean re. inertia, just keep track of
 what's happening on the schema.org front re. adoption curve.


 here are 100+ datasets

http://sindice.com/search?q=schemanq=fq=class%3Ahttp%3A%2F%2Fschema.org%2F*sortbydate=1facet.field=domaininterface=guru

We started collecting 2 weeks ago and did NOT reanalyze/recrawl
previously known sites ATM. Whether it is fair to call them datasets
rather than marked-up pages is up for discussion - possibly a
reasonably interesting one.

Gio



Re: A proposal for handling bulk data requests

2011-07-11 Thread Giovanni Tummarello
 An idea that arose out of a recent discussion with Juergen (in CC): how
 about providing a sort of 'bulk data request' facility for your SPARQL
 endpoints [1] [2] (as they are, I gather, the more popular ones on the WoD
 ;)?



Hi, with regard to Sindice, it is important that the data we index publicly
reflects a public web site and, in general, is the same that a user
would normally find on the hosting website. People trust and understand
the content of websites much better than datasets.

the goal however is indeed to expose a dataset and to have people
query it (e.g. on our SPARQL endpoint).

For smaller websites, content management systems with RDFa/schema.org
markup etc, honestly sitemap.xml does its job. (Also, the overhead -
conceptual and otherwise - of creating a dump for these sites is too
much.)
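For such a site the sitemap route really is only a few lines; a sketch (URLs
and dates invented) that writes a minimal sitemap.xml with lastmod, so that
crawlers can pick up changes incrementally:

# Write a minimal sitemap.xml with lastmod entries; URLs/dates are invented.
from xml.sax.saxutils import escape

pages = [
    ("http://example.org/resource/1", "2011-07-01"),
    ("http://example.org/resource/2", "2011-07-10"),
]

def sitemap(entries):
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for loc, lastmod in entries:
        lines.append("  <url><loc>%s</loc><lastmod>%s</lastmod></url>"
                     % (escape(loc), lastmod))
    lines.append("</urlset>")
    return "\n".join(lines)

if __name__ == "__main__":
    with open("sitemap.xml", "w") as f:
        f.write(sitemap(pages))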

for big websites, lod datasets etc, we are indeed working specifically
on supporting dumps.

We used to support semantic sitemaps, which provide pointers to dumps,
but the reality is that dumps come in many different forms: some split
the same dataset into multiple files, some provide different versions
(e.g. by date), some provide different formats.

Then, how to index these in a way that mimics the way they are exposed
on the web (with descriptions handed out when an authoritative URI
is resolved)? We used to have some constructs in the proposed Semantic
Sitemap protocol, but very few got them right, and incentives for getting
them right or fixing issues are seriously lacking.

Bottom line: we'll provide a form (very soon) to submit a link and a
description of your dataset, as well as an email. We'll then process
submissions manually until we're solid enough to propose a viable,
more mechanized solution.

Gio



Re: ANN: Sudoc bibliographic ans authority data

2011-07-10 Thread Giovanni Tummarello
hi Antoine, Yann all

my advice is to keep it simple and complete.

Very simple indeed. Please forget about content negotiation. It was a
horrible idea all along; it doesn't work because it WILL break, since
no humans are looking at it. Really: anything that redirects and
changes the URL when you put it in a browser is just so wrong.

Have 1 single version of the page with RDFa + schema.org. I know they say
don't do that on schema.org, but they're just being silly: they will read the
microdata anyway (the schema.org part), and the RDFa part is 1 line of code to
extract if they want to do so; if they don't, who cares - they only care
about the schema.org part anyway, let others use the RDFa.

In terms of full crawling, if you allow 1 URL per second sustained,
the data would be in within 3 months or so, which still seems
ridiculous, but that's what search engines do. If you have the
proper last-updated date set, that's great: the updates will just be
incremental.
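(The 1-URL-per-second figure translates into a trivially simple polite fetch
loop; a sketch below, assuming the plain-text sitemap format - one URL per
line - and capped to a handful of URLs for illustration.)

# Polite crawl at roughly 1 URL per second over a plain-text sitemap.
import time
import requests

SITEMAP = "http://www.sudoc.fr/noticesbiblio/sitemap.txt"   # from this thread

def crawl(limit=10):
    urls = requests.get(SITEMAP, timeout=60).text.splitlines()
    for url in urls[:limit]:
        r = requests.get(url, headers={"Accept": "application/rdf+xml"}, timeout=30)
        print(r.status_code, len(r.content), url)
        time.sleep(1.0)    # ~1 URL/second, be polite

if __name__ == "__main__":
    crawl()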

Otherwise, yes, a dump would allow us to ingest it all in full, but it is a
manual operation between us and you.

This is my advice; that said, i know that one might have several
ideas/motives etc which might be different from what this advice
suggests. Worry not: whoever consumes data had better get ready to be
pretty flexible, so we take all you offer really :)
cheers

Giovanni

On Sun, Jul 10, 2011 at 12:22 PM, Antoine Isaac ais...@few.vu.nl wrote:
 Yann, Giovanni,


 Which side effects are probable ?


 Giovanni has made the same comment on data.europeana.eu a couple of weeks
 ago. The data we serve there is different from the RDFa mark-up on our web
 portal.
 We had some reasons to do this, including, well, that the RDFa data is
 mixing the info and non-info resources for making easier data consumption
 (not mandatorily by search engines, btw), and working with URIs that
 pre-date our linked data service.

 The RDFa and the RDF obtained with LD-style conneg is also not about the
 same URIs, which should avoid any confusion.
 But I can understand that if Sindice tries to fetch both data sources, it
 may assume the data to be the same. And this assumption could bring a number
 of undesirable side effects if Sindice merges all what it gets...

 That being said, perhaps the solution lies in Sindice being less greedy ;-)
 and just work with the first data source it finds, for a given URI.
 I do like the idea of having several (simple) channels for data publication
 over the web, which serve different goals.
 Maybe we need to better articulate the practices and expectations, though...

 Cheers,

 Antoine


  Hi Giovanni,

 Le 09/07/2011 23:10, Giovanni Tummarello a écrit :

 Hi Nicolas,

 Its getting in Sindice indeed -

 Yes, I have noticed :)

 quite politely e.g. 1 every 5 secs-
 we'll monitor speed and completeness. iff you think its ok for us to
 crawl faster please say so via robot.txt directive or just say so

 May I suggest that you crawl twice faster ?



 http://sindice.com/search?q=booknq=fq=domain%3Awww.sudoc.frsortbydate=1interface=advanced

 at the same time i notice something funny in the markup e.g. if you go
 with a browser you get redirected to something that has almost no data

 for example the sitemap contains

 http://www.sudoc.fr/00043

 if you go there you get redirected to

 http://www.sudoc.abes.fr/DB=2.1/SRCH?IKT=12TRM=00043

 which if you put in the inspector


 http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.abes.fr%2FDB%3D2.1%2FSRCH%3FIKT%3D12%26TRM%3D00043#TRIPLES

 you get very little data

 however of course if i use the inspector on
 http://www.sudoc.fr/00043  i get data


 http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.fr%2F00043content=contentType=auto#TRIPLES

 which however is mostly schema.org data!

 but in sindice i have lots of RDF data with all sort of other ontologies

 http://sindice.com/search/page?url=http%3A%2F%2Fwww.sudoc.fr%2F000385123

 is there any way you could try to normalize all into a single markup
 type? i think it would be easier to debug and ultimately better for
 all..

 I will try to explain our intention, our constraints and the mechanism
 we've implemented.

 - Intention -

 We want to meet several needs :
 . providing RDF/XML to semantic-oriented clients like Sindice
 . providing HTML + schema.org microdata to traditional search engines like
 Google
 . providing an HTML UI to users


 - Constraints -

 . For some reasons, we can't add microdata to our traditional Sudoc UI.
 Hence the necessity of special HTML+microdata pages for search engines. :(
 . HTML+microdata pages and RDF pages can't support the same vocabularies,
 schema.org /oblige/.


 - Mechanisms -

 Let's start from : http://www.sudoc.fr/132133520

 . If RDF/XML is called by the request, we provide RDF/XML content (as if
 you had requested http://www.sudoc.fr/132133520.rdf)
 It is what Sindice Crawler is doing and getting : the 55,764 documents
 that are found in your index are composed of triples extracted

Re: ANN: Sudoc bibliographic ans authority data

2011-07-09 Thread Giovanni Tummarello
Hi Nicolas,

It's getting into Sindice indeed - quite politely, e.g. 1 every 5 secs -
we'll monitor speed and completeness. If you think it's ok for us to
crawl faster, please say so via a robots.txt directive or just say so.

http://sindice.com/search?q=booknq=fq=domain%3Awww.sudoc.frsortbydate=1interface=advanced

at the same time i notice something funny in the markup e.g. if you go
with a browser you get redirected to something that has almost no data

for example the sitemap contains

http://www.sudoc.fr/00043

if you go there you get redirected to

http://www.sudoc.abes.fr/DB=2.1/SRCH?IKT=12TRM=00043

which if you put in the inspector

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.abes.fr%2FDB%3D2.1%2FSRCH%3FIKT%3D12%26TRM%3D00043#TRIPLES

you get very little data

however of course if i use the inspector on
http://www.sudoc.fr/00043 i get data

http://inspector.sindice.com/inspect?url=http%3A%2F%2Fwww.sudoc.fr%2F00043content=contentType=auto#TRIPLES

which however is mostly schema.org data!

but in sindice i have lots of RDF data with all sort of other ontologies

http://sindice.com/search/page?url=http%3A%2F%2Fwww.sudoc.fr%2F000385123

is there any way you could try to normalize all into a single markup
type? i think it would be easier to debug and ultimately better for
all..

looking forward to support
Giovanni
Gio
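To make that kind of check repeatable, a small probe sketch (the record URI is
one of those mentioned in this thread): fetch the same URI once asking for
RDF/XML and once roughly as a browser would, then report the final URL, status
and content type.

# Probe what a URI serves under different Accept headers, after redirects.
import requests

URI = "http://www.sudoc.fr/000385123"   # a record URI from this thread

def probe(accept):
    r = requests.get(URI, headers={"Accept": accept},
                     allow_redirects=True, timeout=30)
    ctype = r.headers.get("Content-Type", "")
    print("Accept: %s\n  -> %s [%s, %s]" % (accept, r.url, r.status_code, ctype))

if __name__ == "__main__":
    probe("application/rdf+xml")   # roughly what an RDF-aware crawler asks for
    probe("text/html")             # roughly what a browser asks for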


On Fri, Jul 8, 2011 at 1:27 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 On 7/8/11 8:31 AM, Yann NICOLAS wrote:

 Le 08/07/2011 01:42, Kingsley Idehen a écrit :

 On 7/7/11 10:17 PM, Yann NICOLAS wrote:

 Bonjour,

 Sudoc [1], the French academic union catalogue maintained by ABES [2], has
 just been released as linked open data.

 10 million bibliographic records are now available as RDF/XML.

 Examples for the Sudoc record whose internal id is 132133520 :
 . Resource URI : http://www.sudoc.fr/132133520/id
 . Generic document : http://www.sudoc.fr/132133520 (content negotiation is
 supported)


 Great job!

 Is there an RDF dump anywhere?


 Sorry, we don't provide any dump, as the 10 000 000 files are generated on
 the fly from Oracle (stored as XML type + some more tables).
 We provide a complete sitemap at
 http://www.sudoc.fr/noticesbiblio/sitemap.txt , and we hope that Sindice
 will crawl the whole stuff.
 Would it help ?

 Any advice welcome,

 Yann

 --
 --
 Yann NICOLAS
 Etudes  Projets
 ABES

 Okay, no problem with sitemaps as dump alternatives re. getting data
 imported into Linked Data hubs such our LOD cloud cache and Sindice etc..


 --

 Regards,

 Kingsley Idehen   
 President  CEO
 OpenLink Software
 Web: http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen








Re: Squaring the HTTP-range-14 circle [was Re: Schema.org in RDF ...]

2011-06-19 Thread Giovanni Tummarello

 particular confusion is so destructive. Unlike the dogs-vs-bitches case,
 the difference between the document and its topic, the thing, is that one is
 ABOUT the other. This is not simply a matter of ignoring some


Could it be exactly the other way around? That documents and the things
described in them are easy to distinguish EXACTLY because one is about the
other; no one can possibly mess them up, except for idiotic computer
algorithms from the 70s that limit themselves to symbolic AI techniques.

Otherwise you seem to say that it's more difficult to distinguish between a
dog and a bitch than it is to distinguish between a dog and a stream of
bytes returned from an HTTP request, and that seems a bit funny?

Look, if someone points me at a Facebook URL i know it's about a person and
not about the damn page (which has 2000 ways to change every time that URL
is resolved anyway).


 certainly breaks **semantic** architecture. It completely destroys any
 semantic coherence we might, in some perhaps impossibly optimistic vision of
 the future, manage to create within the semantic web. So yes indeed, the Web
 will go on happily confusing things with documents, partly because the Web
 really has no actual contact with things at all: it is entirely constructed
 from documents (in a wide sense). But the SEMANTIC Web will wither and die,
 or perhaps be still-born, if it cannot find some way to keep use and mention
 separate and coherent.



i mean we can go on and tell ourselves we can't possibly write applications
that know or understand what a Facebook URL is about.

But don't be surprised as fewer and fewer people will be willing to listen, as
more and more applications (e.g. all the stuff based on schema.org) pop up
never knowing there was this problem... (not in general - of course there is one
in general - but for their specific use cases)

Gio


Re: Semantic Web Challenge 2011 CfP and Billion Triple Challenge 2011 Data Set published.

2011-06-17 Thread Giovanni Tummarello
 This year, the Billion Triple Challenge data set consists of 2 billion
 triples. The dataset was crawled during May/June 2011 using a random sample
 of URIs from the BTC 2010 dataset as seed URIs. Lots of thanks to Andreas
 Harth for all his effort put into crawling the web to compile this dataset,
 and to the Karlsruher Institut für Technologie which provided the necessary
 hardware for this labour-intensive task.

 **



On a related note,

 while nothing can beat a custom job obviously,

i feel like reminding those that don't have said mighty
time/money/resources that any amount of data one wants is available from the
repositories in Sindice, which we do make freely available for things like
this (0 to 20++ billion triples, LOD or non-LOD, microformats, RDFa, custom
filtered etc.).

See the TREC 2011 competition
http://data.sindice.com/trec2011/download.html (1TB+ of data!) or the
recent W3C data analysis which is leading to a new recommendation (
http://www.w3.org/2010/02/rdfa/profile/data/) etc.

trying to help.
Congrats of course on the great job, guys, for the Semantic Web challenge,
which is a long-standing great initiative!
Gio


Re: Squaring the HTTP-range-14 circle

2011-06-16 Thread Giovanni Tummarello
Hi Tim ,

documents per se (a la HTTP 200 response) on the web are less and
less relevant as opposed to the conceptual entities that are represented
by these documents and held e.g. as DB records inside CMSs, social networks
etc.

e.g. a social network is about people; those are the important entities.
Then there might be 1000 different HTTP documents that you can get, e.g. if
you're logged in, if you're not logged in, if you have a cookie or another
cookie, if you add format=print. Specific URLs are pretty irrelevant as
they contain all sorts of extra information.

Layouts of CMS or web apps change all the time (and so do the HTML docs) but
not the entities.

That's why HTTP-200-response-level annotations are of such little
ambiguity really. You say you have so many annotations about documents; i
honestly don't understand what you're referring to. Are these HTTP
retrievable documents? Where are the annotations? Are we talking about the
HTTP headers? About the meta tags in the head? These are about the
subject of the page too most of the time, not the page itself.

And this is the idea behind schema.org (opengraph, whatever) which, sorry Tim,
you have to live with and we have to make the most of.

When someone refers to a URL which embeds an opengraph or
schema.org annotation then it is 99.9+% (with the number of 9s growing
as the web
evolves into a rich app platform) certain that they refer to the entity
described in it and not to the web document itself (which can and does
change all the time and is of overall no conceptual relevance).

With respect to schema.org, we (as the semantic web community) have not been
ignored: our work and proposals have been very well considered and then
disregarded altogether - and for several reasons: 12 years of work, no
agreement on an ontology, no easy way for people to publish data (the 303
thing is a complete, total, utter insanity, as i had said in vain so many
times), etc.

So, think of how browsers work: they fix all the broken HTML markup, doing
what it takes to understand more or less the intention behind the broken
markup.

Exactly the same will happen with applications that work on semantic markup
at web scale: they will do the specific cleanups and adaptations as they
need.

*the UPSIDE* of this is that RDF is a totally cool technology which can most
of the time rule them all.

Sindice is entirely RDF based, but then reads and processes microformats,
RDF, RDFa, and next week schema.org microdata too. So long life to all,
really.

Fights worth fighting: having RDFa play well alongside schema.org so that
schema.org tags can be written in RDFa and search engines will still read
it. This will allow people to still use rich representations and
vocabularies while not losing compatibility with the mainstream apps
which will be developed for schema.org-compatible pages.

Gio








On Thu, Jun 16, 2011 at 7:04 PM, Tim Berners-Lee ti...@w3.org wrote:

 I disagree with this post very strongly, and it is hard to know where to
 start,
 and I am surprised to see it.

 On 2011-06 -13, at 07:41, Richard Cyganiak wrote:

  On 13 Jun 2011, at 09:59, Christopher Gutteridge wrote:
  The real problem seems to me that making resolvable, HTTP URIs for real
 world things was a clever but dirty hack and does not make any semantic
 sense.
 
  Well, you worry about *real-world things*, but even people who just worry
 about *documents* have said for two decades that the web is broken because
 it conflates names and addresses.

 No, some people didn't get the architecture, in that they had learned
 systems where
 there was a big distinction between names and addresses, and they had
 different properties,
 and then they came across URIs which had properties of both.


  And they keep proposing things like URNs and info: URIs and tag: URIs and
 XRIs and DOIs to fix that and to separate the naming concern from the
 address concern. And invariably, these things fizzle around in their little
 niche for a while and then mostly die, because this aspect that you call a
 “clever but dirty hack” is just SO INCREDIBLY USEFUL. And being useful
 trumps making semantic sense.

 I agree ... except that the URI architecture being like names and like
 addresses isn't a clever but dirty hack.

 You then connect this with the idea of using HTTP URIs for real-world
 things, which is a separate queston.
 This again is a question of architecture. Of design of a system.
 We can make it work either way.
 We have to work out which is best.

 I don't think 303 is a quick and dirty hack.
 It does mean a large extension of HTTP to be used with non-documents.
 It does have efficiency problems.
 It is an architectural extension to the web architecture.

 
  HTTP has been successfully conflating names and addresses since 1989.

 That is COMPLETELY irrelevant.
 It is not a question of the web being fuzzy or ambiguous and getting away
 with it.
 It is a clean architecture where the concepts of name and address don't

Re: Schema.org in RDF ...

2011-06-11 Thread Giovanni Tummarello
My sincere congratulations, i had somehow overlooked the level of
detail needed here.

The choices are pragmatic and - in my personal opinion, having talked
directly at SemTech with a lot of people involved in this - should
serve the community as well as possible.

Will you be posting this as a FAQ? i think it's definitely worth it.

Gio

On Sat, Jun 11, 2011 at 6:55 PM, Richard Cyganiak rich...@cyganiak.de wrote:
 All,

 Thanks for the thoughtful feedback regarding schema.rdfs.org, both here and 
 off-list.

 This is a collective response to various arguments brought up. I'll 
 paraphrase the arguments.

 Limiting ranges of properties to strings is bad because we LD people might 
 want to use URIs or blank nodes there.

 Schema.org says the range is a string, and the RDFS translation reflects 
 this. We tried to formally describe schema.org in RDFS. We did not try to 
 make a fork that improves upon their modelling. That might be a worthwhile 
 project too, but a different project.

 Schema.org documentation explicitly says that you can use a text instead of a 
 Thing/Person/other type.

 This is the opposite case from the one above: They say that in place of a 
 resource, you can always use a text. That's ok—we didn't say that 
 schema:Thing is disjoint from literals. (I'm tempted to add “xsd:string 
 rdfs:subClassOf schema:Thing.” to capture this bit of the schema.org 
 documentation.)

 The range should use rdfs:Literal instead of xsd:string to allow language 
 tags.

 That's a good point. The problem is that xsd:string is too narrow and 
 rdfs:Literal is too broad. RDF 1.1 is likely to define a class of all string 
 literals (tagged and untagged), we'll use that when its name has been 
 settled, and perhaps just leave the inaccurate xsd:string in place for now.

 You should use owl:allValuesFrom instead of the union domains/ranges.

 Probably correct in terms of good OWL modelling. But the current modelling is 
 not wrong AFAICT, and it's nicer to use the same construct for single- and 
 multi-type domains and ranges.

 Nothing is gained from the range assertions. They should be dropped.

 They capture a part of the schema.org documentation: the “expected type” of 
 each property. That part of the documentation would be lost. Conversely, 
 nothing is gained by dropping them.

 You should jiggle where rdfs:isDefinedBy points to, or use wdrs:describedby.


 This could probably be done better, but the way we currently do it is simple, 
 and not wrong, so we're a bit reluctant to change it.

 You're missing an owl:Class type on the anonymous union classes.

 Good catch, fixed. Thanks Holger!

 You should add owl:FunctionalProperty for all single-valued properties.

 The schema.org documentation unfortunately doesn't talk about the cardinality 
 of properties. Using heuristics to determine which properties could be 
 functional seems a bit risky, given that it's easy to shoot oneself in the 
 foot with owl:FunctionalProperty.

 There are UTF-8 encoding problems in comments.

 Fixed. Thanks Aidan!

 You should mint new URIs and use http://schema.rdfs.org/Thing instead of 
 http://schema.org/Thing.


 Schema.org defines URIs for a set of useful vocabulary terms. The nice thing 
 about it is that the URIs have Google backing. The Google backing would be 
 lost by forking with a different set of URIs.

 You should mint new URIs because the schema.org URIs don't resolve to RDF.


 Dereferenceability is only a means to an end: establishing identifiers that 
 are widely understood as denoting a particular thing. Let's acknowledge 
 reality: Google-backed URIs with HTML-only documentation achieve this better 
 than researcher-backed URIs which follow best practices to a tee with a 
 cherry on top.

 You are violating httpRange-14 because you say that http://schema.org/Thing 
 is a class, while it clearly is an information resource.

 Schema.org documentation uses these URIs as classes and properties in RDFa. 
 They also return 200 from those URIs. So it's them who are violating 
 httpRange-14, not us. Draw your own conclusion about the viability of 
 httpRange-14.

 You should use http://schema.org/Thing#this.


 Schema.org is using http://schema.org/Thing as a class in their RDFa 
 documentation. I don't think we should mint different URIs in their namespace.

 http://schema.org/Person is not the same as foaf:Person; one is a class of 
 documents, the other the class of people.

 I don't think that's correct at all. http://schema.org/Person is the class of 
 people and is equivalent to foaf:Person. It's just that the schema.org 
 designers don't seem to care much about the distinction between information 
 resources and angels and pinheads. This is the prevalent attitude outside of 
 this mailing list and we should come to terms with this.

 Best,
 Richard
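If one wanted to state that last equivalence as data rather than prose, a
minimal sketch with rdflib (illustrative only, not an official mapping; the
FOAF namespace is spelled out explicitly):

# State the schema.org/FOAF mapping discussed above as RDF, with rdflib.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

SCHEMA = Namespace("http://schema.org/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.bind("schema", SCHEMA)
g.bind("foaf", FOAF)

g.add((SCHEMA.Person, RDF.type, OWL.Class))
g.add((SCHEMA.Person, RDFS.subClassOf, SCHEMA.Thing))
g.add((SCHEMA.Person, OWL.equivalentClass, FOAF.Person))

print(g.serialize(format="turtle"))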




Re: Schema.org in RDF ...

2011-06-09 Thread Giovanni Tummarello
my2c

i would seriously advise against using triples with http://schema.rdfs.org .

That would be totally and entirely validating their claim that either
you impose things or fragmentation will destroy everything, and that
talking to the community is a waste of time.

For how little this matters really - i'd really advise anyone wanting
to produce RDFa with schema.org to live with it and use the direct
http://schema.org URIs as per their example in RDFa.
Gio

On Tue, Jun 7, 2011 at 9:49 AM, Patrick Logan patrickdlo...@gmail.com wrote:
 Would it be reasonable to use  http://schema.rdfs.org rather than
 http://schema.org in the URIs? Essentially mirror what one might hope
 for schema.org to become. Then if it does become that, link the two
 together?


 On Tue, Jun 7, 2011 at 1:22 AM, Michael Hausenblas
 michael.hausenb...@deri.org wrote:
 Something I don't understand. If I read well all savvy discussions so far,
 publishers behind http://schema.org URIs are unlikely to ever provide any
 RDF description,

 What makes you so sure about that not one day in the (near?) future the
 Schema.org URIs will serve RDF or JSON, FWIW, additionally to HTML? ;)

 Cheers,
        Michael
 --
 Dr. Michael Hausenblas, Research Fellow
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html

 On 7 Jun 2011, at 08:44, Bernard Vatant wrote:

 Hi all

 Something I don't understand. If I read well all savvy discussions so far,
 publishers behind http://schema.org URIs are unlikely to ever provide any
 RDF description, so why are those URIs declared as identifiers of RDFS
 classes in the http://schema.rdfs.org/all.rdf. For all I can see,
 http://schema.org/Person is the URI of an information resource, not of a
 class.
 So I would rather have expected mirroring of the schema.org URIs by
 schema.rdfs.org URIs, the latter being fully dereferenceable proper RDFS classes
 making explicit the semantics of the former, while keeping the reference to the
 source in some dcterms:source element.

 Example, instead of ...

 <rdf:Description rdf:about="http://schema.org/Person">
   <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
   <rdfs:label xml:lang="en">Person</rdfs:label>
   <rdfs:comment xml:lang="en">A person (alive, dead, undead, or
 fictional).</rdfs:comment>
   <rdfs:subClassOf rdf:resource="http://schema.org/Thing"/>
   <rdfs:isDefinedBy rdf:resource="http://schema.org/Person"/>
 </rdf:Description>

 where I see a clear abuse of rdfs:isDefinedBy, since if you dereference
 the said URI, you don't find any explicit RDF definition ...

 I would rather have the following

 <rdf:Description rdf:about="http://schema.rdfs.org/Person">
   <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
   <rdfs:label xml:lang="en">Person</rdfs:label>
   <rdfs:comment xml:lang="en">A person (alive, dead, undead, or
 fictional).</rdfs:comment>
   <rdfs:subClassOf rdf:resource="http://schema.rdfs.org/Thing"/>
   <dcterms:source rdf:resource="http://schema.org/Person"/>
 </rdf:Description>

 To the latter declaration, one could safely add statements like

 schema.rdfs:Person rdfs:subClassOf  foaf:Person

 etc

 Or do I miss the point?

 Bernard

 2011/6/3 Michael Hausenblas michael.hausenb...@deri.org

 http://schema.rdfs.org

 ... is now available - we're sorry for the delay ;)

 Cheers,
       Michael
 --
 Dr. Michael Hausenblas, Research Fellow
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html





 --
 Bernard Vatant
 Senior Consultant
 Vocabulary  Data Integration
 Tel:       +33 (0) 971 488 459
 Mail:     bernard.vat...@mondeca.com
 
 Mondeca
 3, cité Nollez 75018 Paris France
 Web:    http://www.mondeca.com
 Blog:    http://mondeca.wordpress.com
 








Re: Schema.org in RDF ...

2011-06-09 Thread Giovanni Tummarello
http://schema.org/docs/datamodel.html

On Thu, Jun 9, 2011 at 3:05 AM, Kingsley Idehen kide...@openlinksw.com wrote:
 On 6/9/11 9:58 AM, Michael Hausenblas wrote:

 For how little this matters really - i'd really advice anyone wanting
 to produce RDFa of schema to live with it and use direct
 http://schema.org uris as per their example in RDFa.

 URL of the example in question?


 Kingsley

 +1

 Cheers,
    Michael
 --
 Dr. Michael Hausenblas, Research Fellow
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html

 On 9 Jun 2011, at 09:54, Giovanni Tummarello wrote:

 my2c

 i would seriously advice against using  triples with
 http://schema.rdfs.org  .

 That  would be totally and entirely validating their claim that either
 you impose things or fragmentation will distroy everything and that
 talking to the community is a waste of time.

 For how little this matters really - i'd really advice anyone wanting
 to produce RDFa of schema to live with it and use direct
 http://schema.org uris as per their example in RDFa.

 Gio

 On Tue, Jun 7, 2011 at 9:49 AM, Patrick Logan patrickdlo...@gmail.com
 wrote:

 Would it be reasonable to use  http://schema.rdfs.org rather than
 http://schema.org in the URIs? Essentially mirror what one might hope
 for schema.org to become. Then if it does become that, link the two
 together?


 On Tue, Jun 7, 2011 at 1:22 AM, Michael Hausenblas
 michael.hausenb...@deri.org wrote:

 Something I don't understand. If I read well all savvy discussions so
 far,
 publishers behind http://schema.org URIs are unlikely to ever provide
 any
 RDF description,

 What makes you so sure about that not one day in the (near?) future the
 Schema.org URIs will serve RDF or JSON, FWIW, additionally to HTML? ;)

 Cheers,
       Michael
 --
 Dr. Michael Hausenblas, Research Fellow
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html

 On 7 Jun 2011, at 08:44, Bernard Vatant wrote:

 Hi all

 Something I don't understand. If I read well all savvy discussions so
 far,
 publishers behind http://schema.org URIs are unlikely to ever provide
 any
 RDF description, so why are those URIs declared as identifiers of RDFS
 classes in the http://schema.rdfs.org/all.rdf. For all I can see,
 http://schema.org/Person is the URI of an information resource, not of
 a
 class.
 So I would rather have expected mirroring of the schema.org URIs by
 schema.rdfs.org URIs, the later fully dereferencable proper RDFS
 classes
 expliciting the semantics of the former, while keeping the reference
 to the
 source in some dcterms:source element.

 Example, instead of ...

  <rdf:Description rdf:about="http://schema.org/Person">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:label xml:lang="en">Person</rdfs:label>
    <rdfs:comment xml:lang="en">A person (alive, dead, undead, or
    fictional).</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://schema.org/Thing"/>
    <rdfs:isDefinedBy rdf:resource="http://schema.org/Person"/>
  </rdf:Description>

 where I see a clear abuse of rdfs:isDefinedBy, since if you
 dereference
 the said URI, you don't find any explicit RDF definition ...

 I would rather have the following

  <rdf:Description rdf:about="http://schema.rdfs.org/Person">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:label xml:lang="en">Person</rdfs:label>
    <rdfs:comment xml:lang="en">A person (alive, dead, undead, or
    fictional).</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://schema.rdfs.org/Thing"/>
    <dcterms:source rdf:resource="http://schema.org/Person"/>
  </rdf:Description>

 To the latter declaration, one could safely add statements like

 schema.rdfs:Person rdfs:subClassOf  foaf:Person

 etc

 Or do I miss the point?

 Bernard

 2011/6/3 Michael Hausenblas michael.hausenb...@deri.org

 http://schema.rdfs.org

 ... is now available - we're sorry for the delay ;)

 Cheers,
      Michael
 --
 Dr. Michael Hausenblas, Research Fellow
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html





 --
 Bernard Vatant
 Senior Consultant
 Vocabulary  Data Integration
 Tel:       +33 (0) 971 488 459
 Mail:     bernard.vat...@mondeca.com
 
 Mondeca
 3, cité Nollez 75018 Paris France
 Web:    http://www.mondeca.com
 Blog:    http://mondeca.wordpress.com
 










 --

 Regards,

 Kingsley Idehen
 President  CEO
 OpenLink Software
 Web: http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter

Re: Schema.org in RDF ... expected Types in RDFS

2011-06-06 Thread Giovanni Tummarello
 So, can someone clarify, if possible, whether if I publish a page using RDFa 
 and schema.rdf.org syntax, it will be properly parsed and indexed in any of 
 those search engines?


That's all they'd have to say not to piss people off, but they decided
not to do it.

It wouldn't have cost anything. Pretty sad. Let's work on having that
clarified. Having schema.rdfs.org helps, so we can present our case
neatly, i.e. spell out for them what to parse.

Gio



Re: Minting URIs: how to deal with unknown data structures

2011-04-16 Thread Giovanni Tummarello
Hi Frank, my 2c from the Sindice.com point of view.. (as we struggle to
actually make use of all this, and to make it easy for others to use it)

I wouldn't really worry too much,

just give to the machines what you'd give to humans, that technically
means simply make sure all the pages you display (and that talk about
your content) have RDFa on them.

so if you have pages for your employees, just add triples on them with
proper markup.

Use http://inspector.sindice.net to see/inspect how you're doing.

My advice is: make sure the description is rich, as rich as possible,
to enable disambiguation.

-- Ideally you should reuse other people's URIs, or put sameAs links in.
In practice I think this sort of advice is just utopia - I mean, I find
it's asking people with perfectly good data to make a huge effort when
the benefits of it are really unclear and intangible at this point.
-- In practice I would aim at simply describing your data very well,
that is, making sure your descriptions are rich and expressive enough
that one could (if needed) easily link your descriptions to other
datasets. We wrote a small position paper some time ago which I feel
like recommending [1].

once this is done, make sure you have a sitemap.xml file to tell the
world what your exposed data is (e.g. your employees, your products,
whatever) and you're set.
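
A minimal sketch of that sitemap.xml step, generated here with plain
Python (the URLs and date are made up; list whichever pages actually
carry your RDFa):

    urls = [
        "http://lod.mycompany.com/resource/employee-1",  # hypothetical RDFa pages
        "http://lod.mycompany.com/resource/employee-2",
    ]
    entries = "\n".join(
        "  <url><loc>%s</loc><lastmod>2011-04-16</lastmod></url>" % u for u in urls
    )
    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries + "\n</urlset>\n"
    )
    with open("sitemap.xml", "w") as f:
        f.write(sitemap)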

If you change something, search engines (or agents) will simply index
your new structures.. and eventually make sense of them. Your data
won't be any stranger than that of other people, so either we're smart
enough at adapting, or the Web of Data beyond the current Google rich
snippets or Facebook Open Graph will never be.

Gio


[1] Publishing Data that Links Itself: A Conjecture
G Tummarello, R Delbru - 2010 AAAI Spring Symposium Series, 2010 - aaai.org
http://www.aaai.org/ocs/index.php/SSS/SSS10/paper/download/1189/1467

On Fri, Apr 15, 2011 at 2:48 PM, Frans Knibbe frans.kni...@geodan.nl wrote:
 Hello,

 Some newbie questions here...

 I have recently come in contact with the concept of Linked Data and I have
 become enthusiastic. I would like to promote the idea within my company (we
 specialize is geographical data) and within my country. I have read the
 excellent Linked Data book (“Linked Data: Evolving the Web into a Global
 Data Space”) and I think I am almost ready to start publishing Linked Data.
 I understand that it is important to get the URIs right, and not have to
 change them later. That is what my questions are about.

 I have acquired the first part (authority) of my URIs, let's say it is
 lod.mycompany.com. Now I am faced with the question: How do I come up with a
 URI scheme that will stand the test of time? I think I will start with
 publishing some FOAF data of myself and co-workers. And then hopefully more
 and more data will follow. At this moment I can not possible imagine which
 types of data we will publish. They are likely to have some kind of
 geographical component, but that is true for a lot of data. I believe it is
 not possible to come up with any hierarchical structure that will
 accommodate all types of data that might ever be published.

 So I think it is best to leave out any indication of data organization in
 the path element of the URI (i.e. http://lod.mycompany.com/people is a bad
 idea). In my understanding, I could use base URIs like
 http://lod.mycompany.com/resource, http://lod.mycompany.com/page and
 http://lod.mycompany.com/data, and then use unique identifiers for all the
 things I want to publish something about. If I understand correctly, I don't
 need the URI to describe the hierarchy of my data because all Linked Data
 are self-describing. Nice.

 But then I am faced with the problem: What method do I use to mint my
 identifiers? Those identifiers need to be unique. Should I use a number
 sequence, or a hash function? In those cases the URIs would be uniform and
 give no indication of the type of data. But a number sequence seems unsafe,
 and in the case of a hash function I would still need to make some kind of
 structured choice of input values.

 I would welcome any advice on this topic from people who have had some more
 experience with publishing Linked Data.

 Regards,
 Frans Knibbe









Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-13 Thread Giovanni Tummarello
sindice.com's main index has 37,312,159 documents with occurrences of foaf:person.

http://sindice.com/search?q=foaf%3Aperson
(a lot of these come from microformats via the any23 library but anyway)

which means there are many more actual persons inside.

Gio
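
(For anyone wanting the same kind of count over their own data rather
than the Sindice index, a small sketch with rdflib; the input file is
hypothetical.)

    from rdflib import Graph

    g = Graph()
    g.parse("mydata.rdf")  # hypothetical local dump
    q = """
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?p a foaf:Person }
    """
    for row in g.query(q):
        print(row.n)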


On Wed, Apr 13, 2011 at 10:15 AM, Bernard Vatant
bernard.vat...@mondeca.com wrote:
 Hello all

 Just trying to figure what is the size of personal information available as
 LOD vs billions of person profiles stored by Google, Amazon, Facebook,
 LinkedIn, unameit ... in proprietary formats.

 Any hint of the proportion of living people vs historical characters is
 also welcome.

 Any idea?

 Bernard


 --
 Bernard Vatant
 Senior Consultant
 Vocabulary  Data Integration
 Tel:       +33 (0) 971 488 459
 Mail:     bernard.vat...@mondeca.com
 
 Mondeca
 3, cité Nollez 75018 Paris France
 Web:    http://www.mondeca.com
 Blog:    http://mondeca.wordpress.com
 




Re: How many instances of foaf:Person are there in the LOD Cloud?

2011-04-13 Thread Giovanni Tummarello
To add to this, internal sources report for xmlns.com/foaf/0.1/Person:

totalReferences (number of triples involving a foaf:Person): 964,563,435
  (almost a billion; obviously not unique individuals)
graphReferences (number of pages / resolvable URLs / graphs): 34,915,501
domainReferences (number of distinct domains): 3,439,696
sldReferences (number of distinct second-level domains, aggregating
  foo.example.com, foo2.example.com, etc.): 69,004

I think fakefriends.me creates a lot of false occurrences (we have
banned it now, but some of its data is still there), but other than that
.. enjoy :)

cheers

On Wed, Apr 13, 2011 at 4:48 PM, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
 sindice.com main index has 37,312,159 documents occurrences of  foaf:person.

 http://sindice.com/search?q=foaf%3Aperson
 (a lot of these come from microformats via the any23 library but anyway)

 which means there are many more actual persons inside.

 Gio


 On Wed, Apr 13, 2011 at 10:15 AM, Bernard Vatant
 bernard.vat...@mondeca.com wrote:
 Hello all

 Just trying to figure what is the size of personal information available as
 LOD vs billions of person profiles stored by Google, Amazon, Facebook,
 LinkedIn, unameit ... in proprietary formats.

 Any hint of the proportion of living people vs historical characters is
 also welcome.

 Any idea?

 Bernard


 --
 Bernard Vatant
 Senior Consultant
 Vocabulary  Data Integration
 Tel:       +33 (0) 971 488 459
 Mail:     bernard.vat...@mondeca.com
 
 Mondeca
 3, cité Nollez 75018 Paris France
 Web:    http://www.mondeca.com
 Blog:    http://mondeca.wordpress.com
 





Re: data schema / vocabulary / ontology / repositories

2011-03-13 Thread Giovanni Tummarello
To the best of my knowledge there isn't anything out there that one
could call modern and updated.

Something modern and credible would be actual data + social backing
(votes, comments, etc.). As said in the past, we in Sindice would be
delighted to provide the data part if anyone wanted to coordinate the
rest. Something based on pure data analysis will be made available
shortly anyway.

Gio

On Sun, Mar 13, 2011 at 5:15 PM, Dieter Fensel dieter.fen...@sti2.at wrote:
 Dear all,

 for a number of projects I was searching for vocabularies/Ontologies
 to describe linked data. Could you please recommend me places
 where to look for them? I failed to find a convenient entrance point for
 such
 kind of information. I only found some scattered information here and
 there?

 Thanks,

 Dieter
 --
 Dieter Fensel
 Director STI Innsbruck, University of Innsbruck, Austria
 http://www.sti-innsbruck.at/
 phone: +43-512-507-6488/5, fax: +43-512-507-9872






Re: ANN: geometry2rdf software library

2011-01-17 Thread Giovanni Tummarello
Boris, would you be able to provide a bit of explanation of why one
would want to do that, e.g. what evidence there is (nice use cases)
where an RDF export of low-level features in the map is of use?
Thanks!
Gio

On Mon, Jan 17, 2011 at 2:34 AM, Boris Villazón Terrazas
bvilla...@fi.upm.es wrote:

 Victor, Miguel Angel and Boris





Re: Is 303 really necessary?

2010-11-28 Thread Giovanni Tummarello
 - the rest of the web continue to use 200

 Tim

Yes, but the rest of the web will also use 200 to show what we would
consider a 208, e.g.

http://www.rottentomatoes.com/celebrity/antonio_banderas/

see the triples
http://inspector.sindice.com/inspect?url=http://www.rottentomatoes.com/celebrity/antonio_banderas/#TRIPLES

http://www.rottentomatoes.com/celebrity/antonio_banderas/

is clearly a web page, but it's also an actor: it is pointed to as such
by their graph in other pages, and the same page contains the Open Graph
triple "type actor".

We should not get ourselves in the position of having to evangelize
everyone to change something for reasons that are really not apparent to
the normal web world. I think the solution we should be seeking considers
RDFa publishing via a normal 200 code, as in the example above,
absolutely ok.

An agent would then be able to distinguish which properties apply to
the page and which to the actor by looking at the.. properties
themselves, I guess? Sad but possibly unavoidable?

Giovanni



Re: survey: who uses the triple foaf:name rdfs:subPropertyOf rdfs:label?

2010-11-12 Thread Giovanni Tummarello
Yes, Sig.ma heavily checks for properties that are subproperties of
rdfs:label and uses them.
I think SParallax does as well.
Gio

On Fri, Nov 12, 2010 at 12:08 PM, Dan Brickley dan...@danbri.org wrote:
 Dear all,

 The FOAF RDFS/OWL document currently includes the triple

  foaf:name rdfs:subPropertyOf rdfs:label .

 This is one of several things that OWL DL oriented tools (eg.
 http://www.mygrid.org.uk/OWL/Validator) don't seem to like, since it
 mixes application schemas with the W3C builtins.

 So for now, pure fact-finding. I would like to know if anyone is
 actively using this triple, eg. for Linked Data browsers. If we can
 avoid this degenerating into a thread about the merits or otherwise of
 description logic, I would be hugely grateful.

 So -

 1. do you have code / applications that checks to see if a property is
 rdfs:subPropertyOf rdfs:label ?
 2. do you have any scope to change this behaviour (eg. it's a web
 service under your control, rather than shipping desktop software )
 3. would you consider checking for ?x rdf:type foaf:LabelProperty or
 other idioms instead (or rather, as well).
 4. would you object if the triple foaf:name rdfs:subPropertyOf
 rdfs:label  is removed from future version of the main FOAF RDFS/OWL
 schema? (it could be linked elsewhere, mind)

 Thanks in advance,

 Dan





Re: A(nother) Guide to Publishing Linked Data Without Redirects

2010-11-10 Thread Giovanni Tummarello
Bravo Harry :-)

Let me also add: without adding anything to the header.. *keeping HTTP
completely outside the picture*.
HTTP headers are for pure optimization issues, almost networking level.
Caching, fetching, crawling - nothing to do with semantics.

A conjecture: the right howto document is about 2 pages long and says
something like "simply put RDFa on your pages and.. (a) there is a
default interpretation which works 99.99% of the time, e.g. if it has
RDFa it talks about something that's an entity and not a page, or b)
you add a triple, but no triple means that by default.. or c) ...)"

We're almost there, I feel it.

Gio

On Thu, Nov 11, 2010 at 1:50 AM, Harry Halpin hhal...@ibiblio.org wrote:
 On Wed, Nov 10, 2010 at 11:15 PM, David Wood da...@3roundstones.com wrote:
 Hi all,

 I've collected my thoughts on The Great 303 Debate of 2010 (as it will be 
 remembered) at:
  http://prototypo.blogspot.com/2010/11/another-guide-to-publishing-linked-data.html

 Briefly, I propose a new HTTP status code (210 Description Found) to 
 disambiguate between generic information resources and the special class of 
 information resources that provide metadata descriptions about URIs 
 addressed.

 My proposal is basically the same as posted earlier to this list, but 
 significantly updated to include a mechanism to allow for the publication of 
 Linked Data using a new HTTP status code on Web hosting services.  Several 
 poorly thought out corner cases were also dealt with.

 I don't this solution cuts it or solves the problem to the extent that
 Ian Davis was proposing. To recap my opinion, the *entire* problem
 from many publisher's perpsectives is the use of status codes at all -
 whether it's 303 or 210 doesn't really matter. Most people, they will
 just want to publish their linked data in a directory without having
 to worry about status codes. So, de facto, the only status code that
 will matter is 200.

 The question is how to build Linked Data on top of *only* HTTP 200 -
 the case where the data publisher either cannot alter their server
 set-up (.htaccess) files or does not care to.


 I look forward to feedback from the community.  However, if you are about to 
 say something like, the Web is just fine as it is, then I will have little 
 patience.  We invent the Web as we go and need not be artificially 
 constrained.  The Semantic Web is still young enough to be done right (or 
 more right, or maybe somewhat right).

 Regards,
 Dave











Re: Is 303 really necessary - demo

2010-11-05 Thread Giovanni Tummarello
I might be wrong, but I don't like it much. Sindice would index it as 2
documents:

http://iandavis.com/2010/303/toucan
http://iandavis.com/2010/303/toucan.rdf

I *really* would NOT want two different URLs resolving to the same thing.

thanks
Giovanni
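
(A quick sketch of how one can check what the two URIs above actually do
on the wire; whether they 303, 200, or serve the same representation is
exactly the point under debate.)

    import requests

    for u in ("http://iandavis.com/2010/303/toucan",
              "http://iandavis.com/2010/303/toucan.rdf"):
        r = requests.get(u, headers={"Accept": "application/rdf+xml"},
                         allow_redirects=False)
        print(u, r.status_code, r.headers.get("Content-Type"),
              r.headers.get("Location"))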


On Fri, Nov 5, 2010 at 10:43 AM, Ian Davis m...@iandavis.com wrote:
 Hi all,

 To aid discussion I create a small demo of the idea put forth in my
 blog post http://iand.posterous.com/is-303-really-necessary

 Here is the URI of a toucan:

 http://iandavis.com/2010/303/toucan

 Here is the URI of a description of that toucan:

 http://iandavis.com/2010/303/toucan.rdf

 As you can see both these resources have distinct URIs.

 I created a new property http://vocab.org/desc/schema/description to
 link the toucan to its description. The schema for that property is
 here:

 http://vocab.org/desc/schema

 (BTW I looked at the powder describedBy property and it's clearly
 designed to point to one particular type of description, not a general
 RDF one. I also looked at
 http://ontologydesignpatterns.org/ont/web/irw.owl and didn't see
 anything suitable)

 Here is the URI Burner view of the toucan resource and of its
 description document:

 http://linkeddata.uriburner.com/about/html/http://iandavis.com/2010/303/toucan

 http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan.rdf

 I'd like to use this demo to focus on the main thrust of my question:
 does this break the web  and if so, how?

 Cheers,

 Ian

 P.S. I am not fully caught up on the other thread, so maybe someone
 has already produced this demo





200 OK with Content-Location might work: But maybe it can be simpler?

2010-11-05 Thread Giovanni Tummarello
How about something that's totally independent from HEADER issues?

Think normal people here: absolutely 0 interest in messing with headers
and HTTP responses.. absolutely no business incentive to do it.

As a baseline, think of someone wanting to annotate with RDFa a
hand-crafted, Apache-served HTML file.
Really.. it has to be as simple as serving these people.

As simple as what anyone who's using Open Graph does: just copy-paste
into their HTML template.. as simple as this,
really, please - it's the only thing that can work.

Giovanni

On Fri, Nov 5, 2010 at 5:55 PM, Nathan nat...@webr3.org wrote:
 Mike Kelly wrote:

 http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-12#page-14

 snipped and fuller version inserted:

   4.  If the response has a Content-Location header field, and that URI
       is not the same as the effective request URI, then the response
       asserts that its payload is a representation of the resource
       identified by the Content-Location URI.  However, such an
       assertion cannot be trusted unless it can be verified by other
       means (not defined by HTTP).

 If a client wants to make a statement  about the specific document
 then a response that includes a content-location is giving you the
 information necessary to do that correctly. It's complemented and
 further clarified in the entity body itself through something like
 isDescribedBy.

 I stand corrected, think there's something in this, and it could maybe
 possibly provide the semantic indirection needed when Content-Location is
 there, and different to the effective request uri, and complimented by some
 statements (perhaps RDF in the body, or Link header, or html link element)
 to assert the same.

 Covers a few use-cases, might have legs (once HTTP-bis is a standard?).

 Nicely caught Mike!

 Best,

 Nathan





Re: Is 303 really necessary?

2010-11-04 Thread Giovanni Tummarello
Hi Ian

No, it's not needed; see this discussion
http://lists.w3.org/Archives/Public/semantic-web/2007Jul/0086.html
pointing to 203, 406 or others..

..but a number of social community mechanisms will activate if you
bring this up, ranging from the Russian-style "you're being antipatriotic
criticizing the existing status quo" to "..but it's so deployed now"
and "..you're distracting the community from other more important
issues"; none of this will make sense if analyzed by proper logical
means, of course (e.g. by a proper IT manager in a proper company, paid
based on actual results).

But the core of the matter really is: who cares. My educated guess,
looking at the data flowing into Sindice, is that every day, out of 100
new sites on the web of data, 99.9 simply use RDFa, which doesn't have
this issue.

Choose how to publish yourself, but here is another point: if you choose
NOT to use RDFa you will miss out on anything which enhances the user
experience based on annotations. As an example see our entry in the
Semantic Web Challenge [1].

Giovanni

[1] http://www.cs.vu.nl/~pmika/swc/submissions/swc2010_submission_19.pdf



On Thu, Nov 4, 2010 at 2:22 PM, Ian Davis m...@iandavis.com wrote:
 Hi all,

 The subject of this email is the title of a blog post I wrote last
 night questioning whether we actually need to continue with the 303
 redirect approach for Linked Data. My suggestion is that replacing it
 with a 200 is in practice harmless and that nothing actually breaks on
 the web. Please take a moment to read it if you are interested.

 http://iand.posterous.com/is-303-really-necessary

 Cheers,

 Ian





Re: Is 303 really necessary?

2010-11-04 Thread Giovanni Tummarello
 I think it's an orthogonal issue to the one RDFa solves. How should I
 use RDFa to respond to requests to http://iandavis.com/id/me which is
 a URI that denotes me?


hashless?

Mm, one option could be to return HTML + RDFa describing yourself, and add a
triple saying http://iandavis.com/id/me
containstriplesonlyabouttheresourceandnoneaboutitselfasinformationresource.

It's up to clients to really care about the distinction; I personally
know of no useful clients for the web of data that will visibly
misbehave if a person is mistaken for a page.. so you can certify
to your customer that your solution works well with any client.

If one comes up which operates usefully on both people and pages
and would benefit from making your distinction, then those coding that
client will definitely learn about your
containstriplesonlyabouttheresourceandnoneaboutitselfasinformationresource
and support it.

How about this? :-)

As an alternative, the post I pointed you to earlier (the one about 203
and 406) did actually contain an answer, I believe. 406 is perfect IMO ..
I'd say a client which cares to make the distinction would learn
to support it, as in my previous example.

cheers



Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices

2010-10-21 Thread Giovanni Tummarello
 But again: I agree that crawling the Web of Data and then deriving a dataset
 catalog as well as meta-data about the datasets directly from the crawled
 data would be clearly preferable and would also scale way better.

 Thus: Could please somebody start a crawler and build such a catalog?

 As long as nobody does this, I will keep on using CKAN.


Hi Chris, all

I can only restate that within Sindice we're very open to anyone who
wants to develop data analysis apps that create catalogs automatically.
A map-reduce job a couple of weeks ago gave in excess of
100k independent datasets. How many are interlinked, etc.? To be analyzed.

Our interest (and the interest of the Semantic Web vision I want to
sponsor) is to make sure RDFa sites are fully included, and so are those
who provide markup which can be translated in an automatic/agreeable
way (so no scraping or sponging) into RDF (that is, anything that
any23.org can turn into triples).

If you were indeed interested in running or developing your
algorithms on our running dataset, no problem; the code can be made
open source so it would run on other similarly structured datasets.

This said yes i think too that in this phase a CKAN like repository
can be an interesting aggregation point, why not.

But I do think the diagram, which made great sense as an example when
Richard started it, is now at risk of doing a disservice,
which is in line with what Martin is pointing out.

The diagram as it is now kinda implicitly conveys the sense that if
something is so large then everything that matters must be there, and
that's absolutely not the case:

a) there are plenty of extremely useful datasets in RDF/RDFa etc. which
are not there;
b) the usefulness of being linked is anything but a proven fact, so on
the one hand people might want to be there, but on the other you'd have
to push serious commercial entities (for example) to link to dbpedia for
reasons that aren't clear, and that hurts your credibility.

So Danny Ayers has fun linking to dbpedia, so he is in there with his
joke dataset, but you can't credibly bring that argument to large
retailers, so they're left out?

This would be ok if the diagram were just "hey, it's my own thing, I set
my rules" - fine, but the fanfare around it gives it a different
meaning, and thus the controversy above.

.. just tried to put into words what might be a general unspoken feeling..

Short message recap:
a) CKAN - nice, why not, might be useful, but..
b) generated diagram: we have the data or can collect it, so whoever
is interested in analytics please let us know and we can work it out
(as a matter of fact it turns out most of us in here are paid by the EU
for doing this in collaborative projects :-) )

cheers
Giovanni



Re: Deltas of RDF files from Sindice or other site?

2010-10-03 Thread Giovanni Tummarello
Hi Matthias,

Sorry for the delay. It is indeed a possible API, which we call
"longstanding query" or "notification API". Not yet available, but
we have many requests for it so it will come.

My advice at the moment would be to do it yourself client side, using
say a DB for state and fetching the data from our cache
http://sindice.com/developers/cacheapi
There is a timestamp which reflects the last updated date.
You might want to send Sindice-specific messages to
http://groups.google.com/group/sindice-dev

Cheers Gio
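
(A rough sketch of that client-side approach, with a hypothetical stored
file and document URL: keep the previously fetched copy, re-fetch the
document, and diff the two graphs with rdflib.)

    from rdflib import Graph
    from rdflib.compare import graph_diff, to_isomorphic

    old, new = Graph(), Graph()
    old.parse("yesterday.rdf")                  # copy stored on the last run
    new.parse("http://example.org/foaf.rdf")    # freshly fetched version
    in_both, only_old, only_new = graph_diff(to_isomorphic(old), to_isomorphic(new))
    print(len(only_new), "triples added,", len(only_old), "triples removed")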

On Fri, Oct 1, 2010 at 7:21 PM, Matthias Quasthoff
matthias--...@quasthoffs.de wrote:
 Dear all,

 is there a way of obtain the changes to RDF graphs over time, e.g., from
 Sindice? I.e., that I not only see

 _:p1 foaf:knows _:p2, _:p3 .

 but rather

 [ rdf:subject _:p1; rdf:predicate foaf:knows; rdf:object _:p2; dc:date
 "sometime long ago" ]
 [ rdf:subject _:p1; rdf:predicate foaf:knows; rdf:object _:p3; dc:date
 "more recently" ]

 It'd be so useful to have such information (maybe using more appropriate
 vocabulary) to somehow build something social like the Facebook Wall or
 something based on FOAF. I'm currently collecting such data myself with the
 input of PSTW, but maybe it has already been done?

 Thanks + cheers
 Matthias






Re: Linked Data and IRI dereferencing (scale limits?)

2010-08-06 Thread Giovanni Tummarello
Only solution for you now is to use SPARQL instead of resolving the URI.

 Much less traffic and it would actually work


 SPARQL doesn't make the problem go away, it just pushes the limits further
 out. SPARQL endpoints that see significant traffic have similar restrictions
 built in, either on query complexity or query runtime or number of results.
 So you might hit the limit at 16000 statements rather than 2000 or whatever.



Jorn could have asked for the properties he knows how to handle and would
have received them (his problem was that they were cut off by the 2000
triple limit)..

Gio
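
(A sketch of "ask only for the properties you know how to handle"
against the public DBpedia endpoint; the resource and property list here
are just an example.)

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label ?type WHERE {
          <http://dbpedia.org/resource/Berlin> rdfs:label ?label ; a ?type .
          FILTER (lang(?label) = "en")
        } LIMIT 100
    """)
    sparql.setReturnFormat(JSON)
    for b in sparql.query().convert()["results"]["bindings"]:
        print(b["label"]["value"], b["type"]["value"])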


Re: Linked Data and IRI dereferencing (scale limits?)

2010-08-06 Thread Giovanni Tummarello
Thanks Paul, this sort of feedback is indeed tremendously useful.

I somehow just wish you had had 1/10th of the replies of the "subjects as
literals" thread. :-)
Gio

(Obviously we're talking about the business of LOD at large and the true
state of it, despite the growing number of lines in the LOD cloud diagram.
We're not talking about specific technicalities of dbpedia, which is
obviously run as well as the guys economically can.)


On Thu, Aug 5, 2010 at 4:07 PM, Paul Houle ontolo...@gmail.com wrote:

 If you want to get something done with dbpedia,  you should (i) work from
 the data dumps,  or (ii) give up and use Freebase instead.

 I used to spend weeks figuring how to to clean up the mess in dbpedia until
 the day I wised up and realized I could do in 15 minutes w/ Freebase what
 takes 2 weeks to do w/ dbpedia,  because w/ dbpedia you need to do a huge
 amount of data cleaning to get anything that makes sense.

 The issue here isn't primarily RDF vs Freebase but it's really a matter
 of the business model (or lack thereof) behind dbpedia;  frankly,  nobody
 gets excited when dbpedia doesn't work,  and that's the problem.  For
 instance,  nobody at dbpedia seems to give a damn that dbpedia contains 3000
 countries,  wheras there's more like 200 actual active countries in the
 world...  Sure,  it's great to have a category for things like
 Austria-Hungary and The Teutonic Knights,  but an awful lot of people
 give up on dbpedia when they see they can't easily get a list of very basic
 things,  like a list of countries.

 Now,  I was able to,  more-or-less,  define active country as a
 restriction type:  anything that has an ISO country code in freebase is an
 active country,  or is pretty close.  The ISO codes aren't in dbpedia
 (because they're not in wikipedia infoboxes) so this can't be done with
 dbpedia:  i'd probably need to code some complex rules that try to guess at
 this based on category memberships and what facts are available in the
 infobox.

 I complained on both dbpedia and freebase discussion lists,  and found
 that:  (i) nobody at dbpedia wants to do anything about this,  and (ii) the
 people at freebase have investigated this and they are going to do something
 about it.

 

 In my mind,  anyway,  the semantic web is a set of structured boxes. It's
 not like there's one T Box and one A Box but there are nested boxes of
 increasing specificity.  In the systems I'm building,  a Freebase-dbpedia
 merge is used as a sort of T' Box that helps to structure and interpret
 information that comes from other sources.  With a little thinking about
 data structures,  it's efficient to have a local copy of this data and use
 it as a skeleton that gets fleshed out with other stuff.  Closed-world
 reasoning about this taxonomic core is useful in a number of ways,
 particularly in the detection of key integrity problems,  data holes,
 inconsistencies,  junk data,  etc.  I think the dereference and merge
 paradigm is useful once you've got the taxocore and you're merging little
 bits of high-qualtiy data,  but w/o control of the taxocore you're just
 doomed.



Re: Linked Data and IRI dereferencing (scale limits?)

2010-08-05 Thread Giovanni Tummarello
Jorn you're right.

linked data with plain dereferenciable URIs it plain doesnt work once you
move from the simplest examples.  This is for some of the reasons you
mention as well as other others  (e.g. how do you really ask what are the
1000 URis most visited (assuming this was in the DB) or the 100 biggest
cities or what is the URI which is sameas geonames:united_states . You
cant.

anyway see below

1. DBpedia still uses skos:subject quite often, even though it's deprecated.

If you look the URI http://www.w3.org/2004/02/skos/core#subject I'm silently
 redirected to the current skos definition http://www.w3.org/TR/skos-
 reference/skos.html#subjecthttp://www.w3.org/TR/skos-%0Areference/skos.html#subject,
 but there is no #subject in it anymore. This
 means: no rdfs:label for a property which is ubiquitous in DBpedia.
 Am I missing out some Header option for the content negotiation or is this
 a
 problem of the w3.org end?


In http://sig.ma, to get something we hope looks like a label we often have
to split the URI in the end..
Make yourself a local cache with those ontologies; as long as the URI
semantics doesn't change (it shouldn't) you can then give a local label.
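
(A sketch of that fallback: derive a rough label from the URI itself when
no rdfs:label can be found.)

    def fallback_label(uri):
        # take the fragment if there is one, otherwise the last path segment
        tail = uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
        return tail.replace("_", " ")

    print(fallback_label("http://www.w3.org/2004/02/skos/core#subject"))  # -> "subject"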



 2. When dereferencing DBpedia URIs I repeatedly found a suspiciously equal
 number of triples per fetched IRI in the local cache: 2001 triples,
 sometimes
 2002. I remembered: ah, yes...



 There is no rdfs:label, no rdf:type, etc. in it, while all these useful
 things
 are in the HTML version.
 I'm not pointing this out to say that there is a problem in DBpedia. I
 think
 this is a serious problem of scale. How do you decide what is useful for
 someone dereferencing your URIs? How do you keep unnecessary traffic low at
 the
 same time?
 I think maybe a few standard triples should be included in any case (e.g.,
 rdfs:label, rdf:type, ...),




The only solution for you now is to use SPARQL instead of resolving the URI.
Much less traffic, and it would actually work (and less parsing on your
side!)

Or ask the HTML side: if there is RDFa, bingo; there are very good reasons why
this should indeed be the only way one should tell people to serve data [1].
With RDFa out there I personally hope redirections become a thing of the
past (not negotiation; negotiation is transparently good. Negotiating an RDF
version and getting it should always be possible).

Maybe something could be done in the future by adding special values to a
dereferencing response, like "igotmoreoftheseaskifyouneedthem". This has
been proposed but not standardized/implemented AFAIK.

cheers and show us the final result :-)
Giovanni

[1] http://tantek.com/log/2005/06.html#d03t2359


Efficient Data discovery and Sync Support - proposed method and Sindice implementation

2010-07-08 Thread Giovanni Tummarello
Apologies for cross posting
-

Dear all

So far semantic web search engines and semantic aggregation services have
been inserting datasets by hand or have been based on random walk like
crawls with no data completeness or freshness guarantees.

After quite some work, we are happy to announce that Sindice is now
supporting effective large scale data acquisition with *efficient syncing*
capabilities based on already existing standards (a specific use of  the
sitemap protocol).

For example, if you publish 30 products using RDFa or whatever you want
to use (microformats, 303s etc.), by making sure you comply with the proposed
method, Sindice will now guarantee you

a) to crawl your dataset completely (might take some time since we do this
politely)
b) ..but then only crawl you once, and afterwards get just the updated URLs
on a daily basis! (so a timely data update guarantee)

So this is not Crawling anymore, but rather a live, DB-like connection
between remote, diverse datasets, all based on HTTP. In our opinion this is a
*very* important step forward for semantic web data aggregation
infrastructures.

The specification we support (and how to make sure you're being properly
indexed) is published here (pretty simple stuff actually!)

http://sindice.com/developers/publishing

and results can be seen from websites which are already implementing these
(you might be already doing that indeed without knowing..)

http://sindice.com/search?q=domain:www.scribd.com+date:last_weekqt=term

Why not make sure that your site can be effectively kept in sync today?

As always  we look forward for comments, suggestions and ideas on how to
serve better your data needs (e.g. yes, we'll also support Openlink dataset
sync proposal once the specs are finalized). Feel free to ask specific
questions about this or any other Sindice related issue on our dev forum
http://sindice.com/main/forum

Giovanni,
on behalf of the Sindice team http://sindice.com/main/about. Special credits
for this to Tamas Benko and Robert Fuller.

p.s. we're hiring


Re: DBpedia-Live and Delta Exposure

2010-06-19 Thread Giovanni Tummarello
Hi there :-) looks very cool.

Could you please point us to the specifics of the protocol, so we can start
considering integrating it in Sindice?

Note: we're about to announce (Monday?) delta support in Sindice based on
Sitemaps lastmod, which seems to be the easiest possible option for the
HTML + RDFa world.

For the RDF world however it would be cool to support your proposal.

Cheers

On Sat, Jun 19, 2010 at 7:24 PM, Kingsley Idehen kide...@openlinksw.comwrote:

 All,

 Note:

 http://dbpedia-live.openlinksw.com/live/

 Note the delta icon.

 Get closer to being done :-)
 Example:
 http://dbpedia-live.openlinksw.com/live/delta.vsp?uri=http://dbpedia.org/resource/Emerson_Pereira_da_Silva

 --

 Regards,

 Kingsley Idehen   President  CEO OpenLink Software Web:
 http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen








Hiring opportunities in Sindice

2010-06-12 Thread Giovanni Tummarello
For those interested,

within several new EU projects there are now hiring opportunities
available to work on Sindice's current and future services: cloud
computing postdoc/researcher, cloud/semantic/integration developers.
Internships are also available, with possible Ph.D. continuation.

Good community interaction capabilities are a definite plus.
Location: Galway (DERI) or Trento (FBK)

Thanks for passing it along! Interested individuals may inquire
directly by writing to me.
cheers
Giovanni



Sindice real time widget/api, and news feed

2010-04-26 Thread Giovanni Tummarello
Hi all,

A new version of the Sindice frontend is out, with some interesting
improvements, e.g. a realtime data widget on the homepage, and a new API
option to restrict results to documents new that day (or week), etc.

http://sindice.com

Also Facebook support for RDFa is making the web now bubble with new triples.

See how these are supported right away:

http://sindice.com/developers/inspector/?url=http%3A%2F%2Fwww.rottentomatoes.com%2Fm%2F10011268-oceans%2F

New important features and capabilities are in the pipeline for the
next weeks and months, those interested may now follow us on the
sindice_news twitter feed.

 http://twitter.com/Sindice_news


On behalf of the Sindice Team
Giovanni



Re: Semantic black holes at sameas.org Re: [GeoNames] LOD mappings

2010-04-23 Thread Giovanni Tummarello
 sws.geonames URIs, SPARQL endpoint etc. Bearing in mind that Geonames.org
 has no dedicated resources for it, who will care of that in a scalable way?
 What is the business model? Good questions. Volunteers, step forward :)

 Bernard

Hi Bernard, the need to automatically interlink at large scale, and to
give clean, high-performance, queryable datasets to users, is well
recognized and supported, e.g. also by the new EU-funded projects which
still can't be named (I guess) yet but are now about to be confirmed.

So hang on tight a bit.. we're working on this; just continue
publishing high-quality data with good entity descriptions (as much as
you know about YOUR stuff), and the links will come to you just like
that at some point. I promise :)

Giovanni



Re: Semantic black holes at sameas.org Re: [GeoNames] LOD mappings

2010-04-23 Thread Giovanni Tummarello
 so hang on tight a bit.. we're working on this, just continue
 publishing high quality data with good entity descriptions (as much as
 you know about YOUR stuff), and the links will come to you just like
 that at some point. I promise :)

 WOW ... rings a bell  ...and all these things will be given to you as well.
 Let me find out. Here it is : http://bible.cc/matthew/6-33.htm
 BTW, amazing coreference resource :))



:-) I think the link is somehow not so outrageous ..
If moral human behaviour is that which thinks not of one's own
interest only but considers the good of all, similarly a good web of
data citizen would describe his/her entities in a way that's not only
meant to serve his/her immediate use case but would indeed give enough
context and surrounding data for others to understand and reuse.

In which case, once you have the proper machinery in place, good things
should happen almost inevitably, or at least we like to hope.
Enjoy the weekend everyone :-) I am off myself, as they finally did find
me a flight.

Gio



Re: RDF Dataset Notifications

2010-04-17 Thread Giovanni Tummarello
Hi Leigh

I'll tell you what we're going to be supporting in Sindice very soon, and
it would be great if you could add it to the table:

simple existing sitemaps :-). Sitemaps provide the list of URLs to
crawl and, for each one, either a last updated field or an update
frequency.

If the website cares to update the last updated field properly, then even
huge datasets can be kept in sync on a daily (or shorter) basis.
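
(On the consumer side this is just a sitemap parse, something like the
sketch below with a hypothetical sitemap URL; re-fetch only the URLs
whose lastmod moved since the last run.)

    import requests
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get("http://example.org/sitemap.xml").content)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        print(loc, lastmod)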

By publishing RDF in entity-based slices (HTML + RDFa), the mechanism
simply works fine, and it is the same one large web publishers have been
using for years to expose the deep web, so it is not difficult to
explain etc.

For large datasets which are large RDF files, the Semantic Sitemap
extension does the job for us (dbpedia and many others are in Sindice
because of that).

What do you think?
cheers

On Fri, Apr 16, 2010 at 9:19 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi,

 There's been a fair bit of discussion, and more than a few papers around
 dataset notifications recently. I've written up a blog post and a quick
 survey of technologies to start to classify the available approaches:

 http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/

 Plenty more detail to be added, but thought I'd post it here for discussion.

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com




Re: What would you build with a web of data?

2010-04-11 Thread Giovanni Tummarello
+1, thanks Nathan for pointing this out, very very relevant.
Luckily, so far it seems a bit too rooted in the MS stack of things (just
looking at it very, very superficially) :-)?

Gio

 ps: realistically there's the whole microsoft thing to keep in the back
 of our minds; they have pretty much a semi-proprietary full end to end
 of most of the above, from M through OData through Pivot via silverlight
 and seadragon - and realistically by 2011 this will be starting to take
 off in a big way; there is a chance linked data could miss the boat
 and become nothing more than legacy data which people transform in to
 odata then use on the (by then) well supported and rolled out tech stack.

 pps: google are pushing in this direction too, it won't be long before
 we get a big surprise from their end (gdata + openid + oauth +
 gmail/buzz-additions + chromium-os + chrome + android + comparatively
 unlimited resources and thousands of amazing developers + a huge
 developer community)

 regardless of what anybody says, these two companies will push there own
 versions of what we're doing out within the next 12-18 months, with full
 developer support.

 please do remember I'm a huge linked data fan  have my interests firmly
 planted in linked data + read/write web - just aware of the realities at
 hand.

 regards!





Re: [Patterns] Materialize Inferences (was Re: Triple materialization at publisher level)

2010-04-07 Thread Giovanni Tummarello
The problem is not the standard, it's the process.

A webmaster would have to dump the data into an OWL (or RDFS, whatever)
reasoner, then pick up the resulting triples and basically hand-write
those extra classes into the nice HTML template.. mmm :-)

Many extra triples (subject to change if ontologies change, better
reasoning happens, new data arrives, etc.) + serialization not fully
performed automatically would seem unrealistic.



On Wed, Apr 7, 2010 at 12:38 PM, Vasiliy Faronov vfaro...@gmail.com wrote:
 Giovanni Tummarello wrote:
 In this casematerialization is likely not going to happen much (you
 wouldnt want to materialize inside something visible for the end user
 etc).

 Why not? RDFa has easy support for the most common case of
 materialization, which is subclass and subproperty inference: you just
 write all the classes and properties separated by spaces. Like this:

        <div about="#me" typeof="foaf:Person dcterms:Agent">
                <h1 property="foaf:name rdfs:label">John Doe</h1>
        </div>

 Drupal 7 does this for its SIOC export (at least that used to be shown
 on their demo site[1] which is now empty for whatever reason).

 [1] http://drupalrdf.openspring.net/

 --
 Vasiliy Faronov





Re: Triple materialization at publisher level

2010-04-06 Thread Giovanni Tummarello
Wrt this,
I feel like sharing how we address this issue in Sindice and the tools
we provide.
We do materialization at a central level, following recursively the links
to ontologies, e.g. by resolving property names.

This allows data producers to be considerably more concise in the
markup (e.g. think of RDFa pages) and indeed to skip all of the
materialization itself.
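
(As a sketch of what that materialization amounts to, done here locally
with rdflib and the owlrl package on a hypothetical input file; Sindice's
own pipeline is of course a different implementation.)

    import owlrl
    from rdflib import Graph

    g = Graph()
    g.parse("foaf-example.rdf")             # hypothetical concise input data
    g.parse("http://xmlns.com/foaf/0.1/")   # the ontology that licenses the extra triples
    explicit = len(g)
    owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
    print(explicit, "explicit triples,", len(g), "after RDFS materialization")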

A tool to test how many inferred triples you get out of your explicit
ones once full materialization is performed is the Sindice web data
inspector

http://sindice.com/developers/inspector

e.g. here it shows the ontologies used in a simple foaf files

http://sindice.com/developers/inspector/?url=http%3A%2F%2Fg1o.net%2Ffoaf.rdfdoReasoning=true#ontologies

and the inferred triples are shown in the other tab

http://sindice.com/developers/inspector/?url=http%3A%2F%2Fg1o.net%2Ffoaf.rdfdoReasoning=true#triples

at any time, clients can exploit the full machinery either live (API doc)

http://sindice.com/developers/inspector/?url=http%3A%2F%2Fg1o.net%2Ffoaf.rdfdoReasoning=true#api

or via the cache (at break neck speed is ok here, served directly by hbase)

http://sindice.com/search/page?q=g1o.net+foaf+giovanni+tummarelloqt=termurl=http%3A%2F%2Fg1o.net%2Ffoaf.rdf%23me#api


hopes this helps
Giovanni


On Tue, Apr 6, 2010 at 8:58 PM, Vasiliy Faronov vfaro...@gmail.com wrote:
 Hi,

 The announcement of the Linked Data Patterns book[1] prompted me to
 raise this question, which I haven't yet seen discussed on its own. If
 I'm missing something, please point me to the relevant archives.

 The question is: should publishers of RDF data explicitly include
 (materialize) triples that are implied by the ontologies or rules used;
 and if yes, to what extent?

 For example, should it be
        exspecies:14119 skos:prefLabel "Jellyfish" .
        ex:bob a foaf:Person .
 or
        exspecies:14119 skos:prefLabel "Jellyfish" ;
                rdfs:label "Jellyfish" .
        ex:bob a foaf:Person , foaf:Agent .
 ?

 The reason I find this worthy of attention is because there seems to be
 a gap between simple RDF processing and reasoning. It's easy to find an
 RDF library for your favourite language, fill a graph with some data and
 do useful things with it, but it's somewhat harder to set up proper
 RDFS/OWL reasoning over it, not to mention the added requirements for
 computational power.

 I think this is one area where a general best practice or design
 pattern can be developed.

 [1] http://patterns.dataincubator.org/book/

 --
 Vasiliy Faronov






Re: Triple materialization at publisher level

2010-04-06 Thread Giovanni Tummarello
Hi Vasiliy, yes, you can use Sindice for that purpose,

either by asking for data from the fully reasoned cache (ask away, we
can serve plenty) or from the reasoning API (with a bit of moderation;
it is an intense process, although we do have many layers of caching).

a blog post about the details
http://blog.sindice.com/2009/10/12/new-inspector-full-cache-api-all-with-online-data-reasoning/

in the future we might release the whole machinery as open source
(however it is a bit involved to run! e.g. requires hbase, hadoop and
plenty of services)

Giovanni

 This allows data producers to be consideraly more concise in the
 markup (e.g. think of RDFa pages) and indeed skip all the of the
 materialization itself

 Do I understand correctly that Sindice can serve as a kind of a middle
 reasoning layer between the original data publisher and the consumers?
 I.e. that a client can request data indirectly from Sindice and have all
 the implied triples included in the response?

 --
 Vasiliy Faronov





Re: ISWC2009 Metadata Available

2009-10-23 Thread Giovanni Tummarello
 - general chair Enrico Motta:
 http://data.semanticweb.org/person/enrico-motta (see that is general chair 
 2009)
 - a paper from the research track:
 http://data.semanticweb.org/conference/iswc/2009/paper/research/311
 - a workshop at ISWC2009:
 http://data.semanticweb.org/workshop/terra_cognita/2009

cool, it works.  :-)

http://sig.ma/search?q=Enrico+Motta
http://sig.ma/search?q=Terra+Cognita+2009

thanks guys.
Think about it: what if EasyChair were to publish linked data? It would
be pretty powerful and would possibly work for lots of conferences
automatically.

Giovanni



Re: The Power of Virtuoso Sponger Technology

2009-10-18 Thread Giovanni Tummarello
I'd say, if I understand well,

that this works only for queries where you need the extra dereferenced
data just additionally, e.g. to add a label to your result set;
if you need the remote, on-the-fly reference data to e.g. sort by
price, you'd have to fetch everything from the remote site ..

Gio



On Sun, Oct 18, 2009 at 2:57 PM, Olaf Hartig
har...@informatik.hu-berlin.de wrote:
 Hey,

 On Sunday 18 October 2009 09:37:14 Martin Hepp (UniBW) wrote:
 [...]
 So it will boil down to technology that combines (1) crawling and
 caching rather stable data sets with (2) distributing queries and parts
 of queries among the right SPARQL endpoints (whatever actual DB
 technology they expose).

 You can keep a text index of the whole Web, if crawling cycles in the
 order of magnitude of weeks are fine. For structured, linked data that
 exposes dynamic database content, dumb crawling and caching will not
 scale.

 Interesting discussion!

 An alternative approach to query federation is the link traversal based query
 execution as implemented in the SemWeb Client Lib. The main idea of this
 approach is to look-up URIs during the query execution itself. With this
 approach you don't rely on the existence of SPARQL endpoints and -even more
 important- you don't have to know all the sources that contribute to the query
 result in advance. Plus, the results are based on the most up-to-date data you
 can get.

 Greetings,
 Olaf






Re: The Power of Virtuoso Sponger Technology

2009-10-18 Thread Giovanni Tummarello
I agree with this: a combination of the 2, without getting into unrealistic
service descriptions, is exactly the question.

It's great to be talking about this.

I'd gladly have a chat about all this at ISWC, for those who are there?

Cheers
Giovanni


On Sun, Oct 18, 2009 at 8:37 AM, Martin Hepp (UniBW)
martin.h...@ebusiness-unibw.org wrote:
 Guys,
 the Web of Data cannot rely on mass data crawling of the whole Web but must
 combine cached data with federated on-demand queries. Structured data
 requires much faster update cycles than typical text-based Web indices. For
 example, Google and Yahoo can rely on the fact that http://www.cnn.com; is
 relevant for news. Such will not change within minutes. And both Google
 and Yahoo need up to several weeks to visit your page again.

 When it comes to structured price and availability information, your data
 may become outdated within hours, if not seconds. Think of eBay auctions,
 hotel or flight availability, etc.

 So it will boil down to technology that combines (1) crawling and caching
 rather stable data sets with (2) distributing queries and parts of queries
 among the right SPARQL endpoints (whatever actual DB technology they
 expose).

 You can keep a text index of the whole Web, if crawling cycles in the order
 of magnitude of weeks are fine. For structured, linked data that exposes
 dynamic database content, dumb crawling and caching will not scale.

 If the DB technology is able to involve the right set of endpoints for parts
 of the query, why would you need a complete replication of all databases in
 the world inside one huge repository?

 That repository will be a million-node cluster anyway. Why not directly use
 the millions of nodes that provide the data and cache just the endpoint
 meta-data?

 Martin



 Giovanni Tummarello wrote:

 With respect to crawling and scraping or sponging or .. trying to
 guess based on partial fragments of structured information, I can say
 3 things:

 a) No, we're not doing it at the moment; we are only covering those
 who chose to put structured semantics in. Some book stuff shows up in
 Sig.ma .. e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
 bookfinder, our Jerome digital library installation, but the triples
 they provide are scarce and don't contribute much. It would take so
 little for this to improve on their side, I believe.

 b) No, we are not religious about this. We have talked about it
 several times; it might make sense to try to understand as much of the
 web as possible and index it. Maybe we'll do it in the future for
 selected fractions of the web to show how it looks.

 c) Crawling should be just one means of acquiring the semantic web. In
 the case of BestBuy or other large retailers where prices change possibly
 every day, crawling as a means to emulate a simple.. call to a web
 service seems really not the smart thing to do. Will data providers
 really support this with data dumps?

 cheers
 Giovanni


 On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda juanfeder...@gmail.com
 wrote:


 But Sindice could at least crawl Amazon.
 It would be great to use sig.ma to create a mashup with the Amazon data.


 Juan Sequeda, Ph.D Student
 Dept. of Computer Sciences
 The University of Texas at Austin
 www.juansequeda.com
 www.semanticwebaustin.org


 On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
 h...@ebusiness-unibw.org wrote:


 I don't think so, because this would require that Sindice crawled the
 whole regular web and checked the Spongers for each URL (sic!).

 Juan Sequeda wrote:

 Does Sindice crawl this (or any other semantic web search engines)?
 Juan Sequeda, Ph.D Student
 Dept. of Computer Sciences
 The University of Texas at Austin
 www.juansequeda.com
 www.semanticwebaustin.org


 On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) 
 h...@ebusiness-unibw.org wrote:



 Dear all:

 I just found out that the Virtuoso Sponger technology is even more
 powerful than I thought.

 Briefly: Spongers create rich GoodRelations (and other RDF) meta-data
 for existing Web pages on the fly. Unlike traditional
 screen-scraping approaches, Spongers reuse public APIs and other
 techniques, so the data has an unprecedented degree of structure.

 Now, this can be directly used in arbitrary queries... by simply using
 the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
 query.

 Example:




 http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309

 is a Web page in plain HTML offering a book. Amazon does not yet produce
 GoodRelations meta-data on their pages.

 If you go to

http://uriburner.com/sparql

 and paste the URI in the "Default Graph URI" field and select "Retrieve
 remote RDF for all missing source graphs", then a query like

   SELECT * WHERE {?s ?p ?o} LIMIT 50

 returns a fully-fledged GoodRelations description for that page - as if
 Amazon was already supporting GoodRelations for each of its  4 million
 items!

 There are spongers for BestBuy, eBay

Re: The Power of Virtuoso Sponger Technology

2009-10-18 Thread Giovanni Tummarello
 A) The wrapper's Semantic Sitemap points you at the original Sitemap, and
 says how it is doing the wrapping. And because you know how the wrapper is
 behaving, you can process the standard Sitemap to get the information you
 want about what the wrapping site provides.
 Actually, the slicing in the current spec is something similar to this -
 my Linked Data site is a wrapper around my SPARQL endpoint, and I provide a
 description of this along with dumps of the contents of the RDF store.


I get it. The problem here is the automation. This would effectively
mean Sindice takes orders from a site (site A) to go and
fetch some third-party site (site B) and index it the way site A says.
Seems scary :/ but yes, no work for site A to do, really.


 B) Another way is for the wrapper to actually process the Sitemap and data
 dumps to produce a Semantic Sitemap and RDF dumps. Really wrapping the whole
 site, not just the data. This would require no extra facilities at the
 Sindice end.

This is better from a security/trust/provenance point of view... site A fetches
the content of site B (let's use the term "fetch" instead of "crawl" to
indicate a bunch of sitemap URLs to be fetched, but they can easily be
hundreds of thousands, so a several-day job), then wraps it, creates a
nice dump, and I am happy.

... this is good but seems to a) require a lot of work for site A, while the
reward is not that clear, and b) put site A in some form of responsibility
for republishing data of site B (without being a large automatic
service like a search engine).


This is still about fetching everything, and not about integrating some form
of service description (as Martin suggests). (Note that I am not sure we
necessarily have to integrate services, but it would seem logical
after all - yet somehow very different from what we have so far been
considering: data explicitly published.)



Is it possible to come up with a super-light service description that
would allow me to simply understand when the service needs to be
invoked to possibly answer a query?

Maybe something in the middle? Like product descriptions in RDF, and
then a special node for the price that says "see service here", or
"see updated price list here"?

If so, I could index such descriptions, and when somebody asks me I could say:

a) these are the answers I know already;
b) these services claim to be able to give you some additional answers -
or (probably better) I do the calling for you in parallel, cached mode,
sort the results and return them with a provenance indication, etc.?
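
A rough sketch of that "answer from the local index, then call the matching
services in parallel and tag every result with its provenance" idea, in
Python. The service registry, the routing by target class, and the endpoint
URLs are all assumptions made up for illustration; SPARQLWrapper is just one
possible client library.

import concurrent.futures
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed super-light "service descriptions": which class of thing each
# live service can answer about (URLs are hypothetical).
SERVICE_INDEX = {
    "http://purl.org/goodrelations/v1#Offering": [
        "http://example.org/prices-live/sparql",
    ],
}

def ask_service(endpoint, query):
    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    rows = client.query().convert()["results"]["bindings"]
    # Tag every live answer with its provenance.
    return [{"source": endpoint, "binding": r} for r in rows]

def answer(target_class, query, local_answers):
    # a) answers we already know from the local index ...
    results = [{"source": "local-index", "binding": r} for r in local_answers]
    # b) ... plus whatever the matching services return, called in parallel.
    endpoints = SERVICE_INDEX.get(target_class, [])
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(ask_service, e, query) for e in endpoints]
        for f in concurrent.futures.as_completed(futures):
            try:
                results.extend(f.result())
            except Exception:
                pass          # a failing service should not break the answer
    return results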

Giovanni



Re: The Power of Virtuoso Sponger Technology

2009-10-17 Thread Giovanni Tummarello
With respect to crawling and scraping or sponging, or trying to
guess based on partial fragments of structured information, I can say
three things:

a) No, we're not doing it at the moment; we are only covering those
who chose to put up structured semantics. Some book stuff shows up in
Sig.ma, e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100 -
bookfinder, our Jerome digital library installation - but the triples
they provide are scarce and don't contribute much. It would take so
little for this to improve on their side, I believe.

b) No, we are not religious about this. We have talked about it
several times; it might make sense to try to understand as much of the
web as possible and index it. Maybe we'll do it in the future for
selected fractions of the web, to show how it looks.

c) Crawling should be just one means of acquiring the Semantic Web. In
the case of BestBuy or other large retailers, where prices possibly change
every day, crawling as a means to emulate a simple call to a web
service really seems not the smart thing to do. Will data providers
really support this with data dumps?

cheers
Giovanni


On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda juanfeder...@gmail.com wrote:
 But Sindice could at least crawl Amazon.
 It would be great to use sig.ma to create a mashup with the Amazon data.


 Juan Sequeda, Ph.D Student
 Dept. of Computer Sciences
 The University of Texas at Austin
 www.juansequeda.com
 www.semanticwebaustin.org


 On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
 h...@ebusiness-unibw.org wrote:

 I don't think so, because this would require that Sindice crawled the
 whole regular web and checked the Spongers for each URL (sic!).

 Juan Sequeda wrote:

 Does Sindice crawl this (or any other semantic web search engines)?
 Juan Sequeda, Ph.D Student
 Dept. of Computer Sciences
 The University of Texas at Austin
 www.juansequeda.com
 www.semanticwebaustin.org


 On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) 
 h...@ebusiness-unibw.org wrote:



 Dear all:

 I just found out that the Virtuoso Sponger technology is even more
 powerful than I thought.

 Briefly: Spongers create rich GoodRelations (and other RDF) meta-data
 for existing Web pages on the fly. Unlike traditional
 screen-scraping approaches, Spongers reuse public APIs and other
 techniques, so the data has an unprecedented degree of structure.

 Now, this can be directly used in arbitrary queries... by simply using
 the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
 query.

 Example:




 http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309

 is a Web page in plain HTML offering a book. Amazon does not yet produce
 GoodRelations meta-data on their pages.

 If you go to

http://uriburner.com/sparql

 and paste the URI in the "Default Graph URI" field and select "Retrieve
 remote RDF for all missing source graphs", then a query like

   SELECT * WHERE {?s ?p ?o} LIMIT 50

 returns a fully-fledged GoodRelations description for that page - as if
 Amazon was already supporting GoodRelations for each of its  4 million
 items!
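
 To make that concrete, here is a hedged sketch of issuing the same query
 programmatically in Python. SPARQLWrapper is just one possible client
 library, and whether the endpoint sponges the page on the fly is as
 described above; the query itself simply puts the page URI in the FROM
 clause, as the post explains.

from SPARQLWrapper import SPARQLWrapper, JSON

PAGE = ("http://www.amazon.com/"
        "Semantic-Web-Real-World-Applications-Industry/dp/0387485309")

sparql = SPARQLWrapper("http://uriburner.com/sparql")
# The page URI goes in the FROM clause, so the Sponger can fetch and
# convert it on the fly (per the description above).
sparql.setQuery(f"""
    SELECT * FROM <{PAGE}>
    WHERE {{ ?s ?p ?o }}
    LIMIT 50
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])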

 There are spongers for BestBuy, eBay, Zillow, and many other types of
 resources.

 Wow!

 Congrats to Kingsley and his team!

 Best wishes

 Martin Hepp

 --
 --
 martin hepp
 e-business  web science research group
 universitaet der bundeswehr muenchen

 e-mail:  h...@ebusiness-unibw.org
 phone:   +49-(0)89-6004-4217
 fax: +49-(0)89-6004-4620
 www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
 skype:   mfhepp
 twitter: mfhepp

 Check out GoodRelations for E-Commerce on the Web of Linked Data!
 =

 Webcast:
 http://www.heppnetz.de/projects/goodrelations/webcast/

 Recipe for Yahoo SearchMonkey:
 http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey

 Talk at the Semantic Technology Conference 2009:
 Semantic Web-based E-Commerce: The GoodRelations Ontology


 http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287

 Overview article on Semantic Universe:


 http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html

 Project page:
 http://purl.org/goodrelations/

 Resources for developers:
 http://www.ebusiness-unibw.org/wiki/GoodRelations

 Tutorial materials:
 CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
 Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey


 http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709








 --
 --
 martin hepp
 e-business  web science research group
 universitaet der bundeswehr muenchen

 e-mail:  h...@ebusiness-unibw.org
 phone:   +49-(0)89-6004-4217
 fax: +49-(0)89-6004-4620
 www: http://www.unibw.de/ebusiness/ (group)

Re: Mirror for PIPS food ontology?

2009-10-15 Thread Giovanni Tummarello
Kind of makes me think... we could put it virtually back in the same
place as originally, on our Sindice cache [1].

I wonder if the operation makes sense: on the one hand a cache is
usually intended to reflect reality; on the other, I'd see obvious
practical advantages.

Maybe we could offer an archive service, parallel to the cache?

Just trying to figure out what can help developers - ideas?

Giovanni

[1] http://sindice.com/developers/cacheapi

On Thu, Oct 15, 2009 at 12:15 PM, Alexander Garcia Castro
alexgarc...@gmail.com wrote:
 You can find them at:
 http://babel.informatik.uni-bremen.de/ontologies
 This is our ontology repository, built upon BioPortal technology. Cheers.

 On Wed, Oct 14, 2009 at 3:25 PM, Michael Haas l...@laga.ath.cx wrote:

 Hello everyone,


 I hope it is not inappropriate to ask this question here.

 Does anyone have a copy of the PIPS food ontology?

 The original link at http://www.csc.liv.ac.uk/~jcantais/PIPSFood.owl is
 dead.


 Regards,


 Michael





 --
 Alexander Garcia
 http://www.alexandergarcia.name/
 http://www.usefilm.com/photographer/75943.html
 http://www.linkedin.com/in/alexgarciac
 Postal address:
 Alexander Garcia, Tel.: +49 421 218 64211
 Universität Bremen
 Enrique-Schmidt-Str. 5
 D-28359 Bremen



