Re: Linked Data must die. (was: Linked Data and a new Browser API event)
I'd definitely like to keep the implementation of whatever formats we use in Gaia, given that this is still an experimental feature and the use cases are likely to evolve as we get user feedback.

It seems to me that our use case here, beyond OG, is only our internal content, i.e. Gaia. So effectively we can choose whatever format we want here, as it has no effect on web content.

Given that, I'd definitely optimize for simplicity and simply extend OG. If we want those extensions to not leak to the rest of the web, we can simply make the system app only honor those tags in Gaia content.

/ Jonas

On Mon, Jun 29, 2015 at 2:47 PM, Benjamin Francis bfran...@mozilla.com wrote:

> Thanks for the responses,
>
> Let me reiterate the Product requirements:
>
> 1. Support for a syntax and vocabulary already in wide use on the web to allow the creation of cards for the largest possible volume of existing pinnable content
> 2. Support for a syntax with a large enough and/or extensible vocabulary to allow cards to be created for all the types of pinnable content and associated actions we need in Gaia
>
> We need to deliver this by B2G 2.5 FL in September.
>
> *Existing Web Content*
>
> I think we're agreed that Open Graph gives us enough of a minimum viable product for the first requirement. However, it's not OK to just hard code particular og types into Gecko; we need to be able to experiment with cards for lots of different Open Graph types without having to modify Gecko every time (imagine system app addons with experimental card packs).
>
> Open Graph is just meta tags and we already have a mechanism for detecting specific meta tags in Gaia - the metachange event on the Browser API. As a minimum all we need to do to access Open Graph meta tags is to extend this event to include all meta tags with a property attribute, which is only used by Open Graph.
> We could go a step further and extend the event to all meta tags, which would also give us access to Twitter card markup for example, but that isn't essential. We do not need an RDFa parser for this; we can filter/clean up the data in the system app in Gaia where necessary (the system app is widely regarded to be part of the platform itself).
>
> *Gaia Content*
>
> Open Graph does not have a large enough vocabulary, or (as Kelly says) the ability to associate actions with content, needed for the second requirement.
>
> Schema.org has a large existing vocabulary which basically fulfils these use cases, though some parts are more tested than others, with examples given in Microdata, RDFa and JSON-LD syntaxes, e.g.:
>
> - Contact - http://schema.org/Person
> - Event - http://schema.org/Event
> - Photo - http://schema.org/Photograph
> - Song - http://schema.org/MusicRecording
> - Video - http://schema.org/VideoObject
> - Radio station - http://schema.org/RadioChannel
> - Email - http://schema.org/EmailMessage
> - Message - http://schema.org/Comment
>
> Schema.org also provides existing schemas for actions associated with items (https://schema.org/docs/actions.html), although examples are only given in JSON-LD syntax.
>
> Schema.org is just a vocabulary and Tantek tells me it's theoretically possible to express this vocabulary in Microformats syntax too - it's possible to create new vendor prefixed types, or suggest new standard types to be added to the Microformats wiki. This would be required because Microformats does not have a big enough existing vocabulary for Gaia's needs.
>
> Microdata, RDFa and JSON-LD use URL namespaces so are extensible by design with a non-centralised vocabulary (this is seen as a strength by some, as a weakness by others).
>
> The data we have [1][2][3][4] shows that Microdata, then RDFa (sometimes considered to include Open Graph), is used by the most pinnable content on the web, but the data does not include all modern Microformats. We also don't have any data for JSON-LD usage.
> However, existing usage is not the most important criterion for the second requirement; what matters is how well a format fits the more complex use cases in Gaia (and how much work it is to implement).
>
> There is resistance to implementing a full Microdata or RDFa parser in Gecko due to its complexity. JSON-LD is more self-contained by design (for better or worse) and could be handed over to the Gaia system app directly via the Browser API without any parsing in Gecko. Microformats is possibly less Gecko work to implement than Microdata or RDFa, but more than JSON-LD.
>
> *Conclusions*
>
> My conclusion is that the least required work in Gecko for the highest return would be:
>
> 1. *Open Graph* (bug 1178484) - Extending the existing metachange Browser API event to include all meta tags with a property attribute. This would allow Gaia to add support for all of the Open Graph types, fulfilling requirement 1.
> 2. *JSON-LD* (bug 1178491) - Adding a linkeddatachange event to the Browser API which is dispatched by Gecko whenever it
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Thu, Jul 2, 2015 at 4:37 AM, Tantek Çelik tan...@cs.stanford.edu wrote:

> Schema.org also provides existing schemas for actions associated with items (https://schema.org/docs/actions.html), ...
>
> Currently the IndieWeb community is pursuing Web Actions (and has them working across sites) http://indiewebcamp.com/webactions

TL;DR: WebActions, as presented in [1], are not sufficiently well developed for us to base an implementation upon. With lots of additional work, they could one day form the basis of an implementation, but, as a target for FirefoxOS 2.5, they are simply not there yet.

!(TL;DR)

I am uneasy about going into too much detail about each point as I feel that doing so will be a waste of my time and yours. So, I'll try to keep it short.

The WebActions referred to in [1] have many problems which need to be addressed before they enter general usage:

- They are not well defined
- They are not well defined enough to compute over
- There is no well defined means of extension
- There is no active community
- There is no means to specify action parameters
- The vocabulary of current actions is not sufficient to do anything now
- ...

They are not well defined - The main indie-action tag is used to wrap any third party/silo action buttons/links. What exactly is a third party action? Are schema.org actions supported? If so, how does the schema.org target attribute interact with the indie-action with attribute? If schema.org actions are not supported, what third party actions are? Explicitly, how do the URL templates, say, of such unspecified third party actions interact with the indie-action with attribute? Basically, this document[1] needs much work before it can be said to define an indie-action.

They are not well defined enough to compute over - Assume a dialer web app presents an indie-action tag corresponding to a dial action. (This is a simple use case that we have to be able to handle. This is not an edge case.)
Assume further that there was a well defined means of adding actions so such an action could even exist. Does the indie-action tag for the dial action contain a URL for every possible number it can dial? (The description of [1] never uses anything like URL templates. So, this seems to imply that only non-template URLs are allowed.) Assuming that this was a mistake in the description of [1] and URL templates are allowed, how does one specify the type of a URL template argument? In other words, could I pass fldska as a telephone number? This is never touched upon. Again, it's obvious here that much work is needed before the document[1] can be said to define an indie-action that is able to be computed over.

There is no well defined means of extension - How do I add a new action? This is not mentioned. (There is mention made of one possible new verb, tip[2], but no detail is given on its meaning. It's only stated: tip - for Flattr, Gittip buttons and maybe other payment providers.) Mention is made that we can create a common verb registry like the rel registry, but no registry is ever presented, nor is the means to register in such a registry if it even existed. Again, this needs lots of work which hasn't been done.

There is no active community - The last entry in the History section of [1] is from 2012. In contrast, the last entry in the schema.org github repository[3] is from yesterday.

There is no means to specify action parameters - There is no mention of how an action is parameterized. Again, a simple example: a dialer web app that exposes the dialing web action to dial an arbitrary phone number. How is this dial action exposed with WebActions? There is no specification or discussion of how this would occur. We can't possibly have a fixed URL for all possible numbers; there has to be some type of URL template that one can specify. There is no mention made of this. Again, this needs work.
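To make the parameter question concrete, here is a minimal sketch of what a typed URL template for a dial action might look like. Everything in it is an assumption for illustration: the `tel:{number}` template, the `PARAM_TYPES` table, and the `expand` helper are hypothetical and are not defined by the WebActions page [1] or by schema.org.

```python
import re

# Hypothetical registry of parameter types (an assumption for this sketch).
PARAM_TYPES = {
    "tel": re.compile(r"^\+?[0-9][0-9\- ]{2,}$"),
}

def expand(template, param_types, **args):
    """Fill each {name} slot in the template after checking the argument's declared type."""
    def fill(match):
        name = match.group(1)
        value = args[name]
        type_name = param_types[name]
        if not PARAM_TYPES[type_name].match(value):
            raise ValueError(f"{value!r} is not a valid {type_name}")
        return value
    return re.sub(r"\{(\w+)\}", fill, template)

# A hypothetical dial action: one template plus a type for each slot.
dial = {"template": "tel:{number}", "params": {"number": "tel"}}
print(expand(dial["template"], dial["params"], number="+15550100"))
```

With a declared type per slot, passing fldska as the number raises an error instead of producing a URL, which is the kind of check an untyped, fixed-URL action cannot express.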
The vocabulary of current actions is not sufficient to do anything now - The only actions that exist now are post, reply, repost, and like (and maybe tip), and there is no official means to add new actions. So, what if we wanted a dial action? Currently this is impossible. With the current limited vocabulary and no means to add new actions, WebActions is a non-starter for my use cases and those of Taipei.

TL;DR: WebActions, as presented in [1], are not sufficiently well developed for us to base an implementation upon. With lots of additional work, they could one day form the basis of an implementation, but, as a target for FirefoxOS 2.5, they are simply not there yet.

[1] http://indiewebcamp.com/webactions
[2] http://indiewebcamp.com/webactions-verbs-brainstorming
[3] https://github.com/schemaorg/schemaorg/commits/sdo-ganymede

--
Kelly Davis
Bringing a voice to Firefox OS
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On 2 July 2015 at 03:37, Tantek Çelik tan...@cs.stanford.edu wrote:

> tl;dr: It's time. Let's land microformats parsing support in Gecko as a Q3 Platform deliverable that Gaia can use.

Happy to hear this!

> I think there's rough consensus that a subset of OG, as described by Ted, satisfies this. Minimizing our exposure to OG (including Twitter Cards) is ideal for a number of reasons (backcompat/proprietary maintenance etc.).

That's certainly a good start. It seems a shame to intentionally filter out all the extra meta tags used by other Open Graph types like:

- music.song
- music.album
- music.playlist
- music.radio_station
- video.movie
- video.episode
- video.tv_show
- article
- book
- profile
- business
- fitness.course
- game.achievement
- place
- product
- restaurant.menu

I envisage allowing the community to contribute addons to add extra experimental card packs for types we don't support out of the box from day one. Filtering out this data would make it very difficult for them to do that, for no good reason.

I absolutely understand the argument about having to maintain backwards compatibility with a format if we don't want to promote it going forward though, which is why I agree we should be conservative when adding built-in Open Graph types.

> There appear to be multiple options for this, with the best (most open, aligned with our mission, already open source interoperably implemented, etc.) being microformats.

That is your opinion. There may be things you don't like about JSON-LD for example, but it is a W3C Recommendation created through a standards body and has open source implementations in just as many languages as Microformats. There may be other more subjective measures of open you're talking about, but I think it would be better for us all to stick to arguments about technical merit and adoption statistics when making comparisons in this case, at the risk of falling into the Not Invented Here trap.

> fulfils mostly in theory.
> Schema is 99% overdesigned and aspirational, most objects and properties not showing up anywhere even in search results (except generic testing tools perhaps). A small handful of Schema objects and subset of properties are actually implemented by anyone in anything user-facing.

As I mentioned, level of current usage is not the most important criterion for Gaia's own requirements, but if we're talking about how proven these schemas are, according to schema.org these are the number of domains which use the schemas we're talking about:

- Person - over 1,000,000 domains
- Event - 100,000 - 250,000 domains
- ImageObject - over 1,000,000 domains
- AudioObject - 10,000 - 50,000 domains
- VideoObject - 100,000 - 200,000 domains
- RadioChannel - fewer than 10 domains
- EmailMessage - 100 - 1000 domains
- Comment - 10,000 - 50,000 domains

The only equivalent data I have for Microformats is for hCard (equivalent to the Person schema) from a crawl at the end of last year [1], and it has about the same usage:

- hCard - 1,095,517 domains

The data also shows that Microdata and RDFa are used on more pages per domain than Microformats. I'd say that Microformats looks at best equally unproven on that basis, though I'm open to new data.

> Everything else is untested, and claiming fulfils these use cases puts far too much faith in a company known for abandoning their overdesigned efforts (APIs, vocabularies, syntaxes!) every few years. Google Base / gData / etc. likely fulfilled these use cases too.

Our Gecko and Gaia code is not going to stop working if Google decides to use something else. Content authors on the wider web might migrate to newer vocabularies (or even syntaxes) over time, but that's something we're going to have to monitor on an ongoing basis anyway.
> Existing interoperably implemented microformats support most of these:
>
> - Contact - http://microformats.org/wiki/h-card
> - Event - http://microformats.org/wiki/h-event
> - Photo - http://microformats.org/wiki/h-entry with u-photo property
> - Song - no current vocabulary - classic hAudio vocabulary could be simplified for this
> - Video - http://microformats.org/wiki/h-entry with u-video property
> - Radio station - no current vocabulary - worth researching with schema RadioChannel as input
> - Email - http://microformats.org/wiki/h-entry with u-in-reply-to property
> - Message - http://microformats.org/wiki/h-entry

OK, so there are actually three Microformats that are useful to us here. For photos, videos, emails and messages we have to re-use the same hEntry Microformat and try to figure out from its properties which type of thing it is. For song and radio station we'd need to invent something new.

This is not very attractive for Firefox OS where we'd like to have clearly defined types of cards with different card templates. It also makes it harder for the community to create new types of cards (e.g. via addons)
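As a rough illustration of the "figure out from its properties" step described above, the sketch below guesses a card type from an already-parsed microformats2 item. It assumes the usual parsed shape (a `type` list and a `properties` dict of lists); the card type names and the order of the checks are assumptions made for this example, not part of any proposal in this thread.

```python
def card_type(item):
    """Guess a card type for one parsed microformats2 item (hypothetical mapping)."""
    types = item.get("type", [])
    props = item.get("properties", {})
    if "h-card" in types:
        return "contact"
    if "h-event" in types:
        return "event"
    if "h-entry" in types:
        # One h-entry root covers four card types, so the properties decide.
        # The order of these checks is arbitrary; an entry with both a photo
        # and an in-reply-to property would be classified as a photo here.
        if "photo" in props:
            return "photo"
        if "video" in props:
            return "video"
        if "in-reply-to" in props:
            return "email"
        return "message"
    return None  # e.g. song or radio station: no current vocabulary

print(card_type({"type": ["h-entry"], "properties": {"photo": ["a.jpg"]}}))
```

The arbitrary ordering inside the h-entry branch is exactly the ambiguity at issue: the type of card is inferred rather than declared.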
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
This thread has been fun to follow. There are only 2 hard problems in Comp Sci and naming things is one of them ;).

Just wanted to quickly chip in: during our lively discussion about naming, let’s not forget Postel’s Law. It’s smart to debate which format we should encourage for _publishing_. It’s wise to be liberal in what formats we _accept_.

So we can encourage developers to use the solution we think is best, while simultaneously falling back to anything reasonable that’s there. og:x, twitter:y, Microformats... if it’s being actively used on the web we would be silly to turn up our nose at good data!

---
Gordon Brander
Sr Design Strategist
Mozilla

On July 2, 2015 at 10:59:15, Benjamin Francis (bfran...@mozilla.com) wrote:

> On 2 July 2015 at 03:37, Tantek Çelik wrote:
>
> > tl;dr: It's time. Let's land microformats parsing support in Gecko as a Q3 Platform deliverable that Gaia can use.
>
> Happy to hear this!
>
> > I think there's rough consensus that a subset of OG, as described by Ted, satisfies this. Minimizing our exposure to OG (including Twitter Cards) is ideal for a number of reasons (backcompat/proprietary maintenance etc.).
>
> That's certainly a good start. It seems a shame to intentionally filter out all the extra meta tags used by other Open Graph types like:
>
> - music.song
> - music.album
> - music.playlist
> - music.radio_station
> - video.movie
> - video.episode
> - video.tv_show
> - article
> - book
> - profile
> - business
> - fitness.course
> - game.achievement
> - place
> - product
> - restaurant.menu
>
> I envisage allowing the community to contribute addons to add extra experimental card packs for types we don't support out of the box from day one. Filtering out this data would make it very difficult for them to do that, for no good reason.
>
> I absolutely understand the argument about having to maintain backwards compatibility with a format if we don't want to promote it going forward though, which is why I agree we should be conservative when adding built-in Open Graph types.
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Thu, Jul 2, 2015 at 11:47 AM, Gordon Brander gbran...@mozilla.com wrote:

> This thread has been fun to follow. There are only 2 hard problems in Comp Sci and naming things is one of them ;).
>
> Just wanted to quickly chip in: during our lively discussion about naming, let’s not forget Postel’s Law. It’s smart to debate which format we should encourage for _publishing_. It’s wise to be liberal in what formats we _accept_

Hmm... I'm not sure Postel was really referring to this kind of case so much as to specification compliance. In any case, I think there's an argument to be made that supporting a lot of formats is not a good thing.

See also: http://datatracker.ietf.org/doc/draft-thomson-postel-was-wrong/

-Ekr

> So we can encourage developers to use the solution we think is best, while simultaneously falling back to anything reasonable that’s there. og:x, twitter:y, Microformats... if it’s being actively used on the web we would be silly to turn up our nose at good data!
>
> ---
> Gordon Brander
> Sr Design Strategist
> Mozilla
>
> On July 2, 2015 at 10:59:15, Benjamin Francis (bfran...@mozilla.com) wrote:
>
> > On 2 July 2015 at 03:37, Tantek Çelik wrote:
> >
> > > tl;dr: It's time. Let's land microformats parsing support in Gecko as a Q3 Platform deliverable that Gaia can use.
> >
> > Happy to hear this!
> >
> > > I think there's rough consensus that a subset of OG, as described by Ted, satisfies this. Minimizing our exposure to OG (including Twitter Cards) is ideal for a number of reasons (backcompat/proprietary maintenance etc.).
> >
> > That's certainly a good start.
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
This. I don't want to lose Jonas' point in this long thread, but I also haven't read anything here that warrants new native parser(s) yet. Let's iterate in Gaia for now. I don't see how a C++ metadata parser is advantageous at this point, and the RDF history lessons certainly don't encourage that path.

--Jet

On Wed, Jul 1, 2015 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote:

> I'd definitely like to keep the implementation of whatever formats we use in Gaia given that this is still an experimental feature and the use cases are likely to evolve as we get user feedback.
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Wed, Jul 1, 2015 at 7:37 PM, Tantek Çelik tan...@cs.stanford.edu wrote:

> There *is* a pretty strong engineering consensus, in both this thread, and other threads *against* any use of JSON-LD, or anything Linked Data or otherwise rebranded RDF / Semantic Web, and for good reason.

Indeed, just a few days ago bsmedberg -- the sole RDF module peer -- said (in https://bugzilla.mozilla.org/show_bug.cgi?id=1176160#c5):

  I'm hoping to just rm -rf rdf/ one of these days anyway.

See also https://bugzilla.mozilla.org/show_bug.cgi?id=833098 (Kick RDF out of Firefox) and https://bugzilla.mozilla.org/show_bug.cgi?id=420506 (Remove RDF use from Thunderbird).

Nick
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
Great discussion and feedback in this thread - plenty to act on.

Thanks Ted Clancy for kicking this off with an impassioned reality check. And thanks in particular to Benjamin Francis for summarizing product requirements and use-cases, and especially to both Ted and Ben for taking the time last week in Whistler to discuss all of this in person - I definitely came away with a better understanding of the data, problem space, and perspectives for Gaia's use-cases. I've also followed up with Gregor and jst to broaden and double-check my understanding and possible paths forward.

tl;dr: It's time. Let's land microformats parsing support in Gecko as a Q3 Platform deliverable that Gaia can use.

Specifically:

On Mon, Jun 29, 2015 at 2:47 PM, Benjamin Francis bfran...@mozilla.com wrote:

> Thanks for the responses,
>
> Let me reiterate the Product requirements:
>
> 1. Support for a syntax and vocabulary already in wide use on the web to allow the creation of cards for the largest possible volume of existing pinnable content

I think there's rough consensus that a subset of OG, as described by Ted, satisfies this. Minimizing our exposure to OG (including Twitter Cards) is ideal for a number of reasons (backcompat/proprietary maintenance etc.).

> 2. Support for a syntax with a large enough and/or extensible vocabulary to allow cards to be created for all the types of pinnable content and associated actions we need in Gaia

There appear to be multiple options for this, with the best (most open, aligned with our mission, already open source interoperably implemented, etc.) being microformats.

On that in particular:

> *Gaia Content*
>
> Open Graph does not have a large enough vocabulary, or (as Kelly says) the ability to associate actions with content, needed for the second requirement

The associate actions with content use-case is an interesting one that's worthy of more specific follow-up on Kelly's response. More on that separately.
> Schema.org has a large existing vocabulary which basically fulfils these use cases, though some parts are more tested than others,

fulfils mostly in theory. Schema is 99% overdesigned and aspirational, most objects and properties not showing up anywhere even in search results (except generic testing tools perhaps). A small handful of Schema objects and subset of properties are actually implemented by anyone in anything user-facing.

Everything else is untested, and claiming fulfils these use cases puts far too much faith in a company known for abandoning their overdesigned efforts (APIs, vocabularies, syntaxes!) every few years. Google Base / gData / etc. likely fulfilled these use cases too.

> with examples given in Microdata, RDFa and JSON-LD syntaxes, eg:
>
> - Contact - http://schema.org/Person
> - Event - http://schema.org/Event
> - Photo - http://schema.org/Photograph
> - Song - http://schema.org/MusicRecording
> - Video - http://schema.org/VideoObject
> - Radio station - http://schema.org/RadioChannel
> - Email - http://schema.org/EmailMessage
> - Message - http://schema.org/Comment

This explicit list of use-cases is very helpful.

Existing interoperably implemented microformats support most of these:

- Contact - http://microformats.org/wiki/h-card
- Event - http://microformats.org/wiki/h-event
- Photo - http://microformats.org/wiki/h-entry with u-photo property
- Song - no current vocabulary - classic hAudio vocabulary could be simplified for this
- Video - http://microformats.org/wiki/h-entry with u-video property
- Radio station - no current vocabulary - worth researching with schema RadioChannel as input
- Email - http://microformats.org/wiki/h-entry with u-in-reply-to property
- Message - http://microformats.org/wiki/h-entry

For Song and Radio Station in particular - I will take the action of bringing these use-cases to the microformats community and see what the community can come up with, and how quickly.
Discussion will be on #microformats on Freenode (archived, see microformats.org/wiki/irc) if anyone wants to contribute or just lurk.

> Schema.org also provides existing schemas for actions associated with items (https://schema.org/docs/actions.html),

The actions space has been a difficult and challenging one. Google's (abandoned) web intents was one such effort.

Currently the IndieWeb community is pursuing Web Actions (and has them working across sites) http://indiewebcamp.com/webactions

There's likely potential there to connect webactions to be part of the format of the post/page to be parsed, consumed, re-used. Again, this is something I'll take to the #microformats community and we can see what people there come up with.

> although examples are only given in JSON-LD syntax. Schema.org is just a vocabulary and Tantek tells me it's theoretically possible to express this vocabulary in Microformats syntax too - it's possible to create new vendor prefixed types, or suggest new standard types to be added to the
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
Let me start by saying I don't care which format we use. (Formats come, and formats go.) I do care, however, that my use case is supported.

My use case, speech enabling web apps and web pages for Firefox OS's voice assistant Vaani, requires that the chosen format support something akin to schema.org's actions[1] as well as the ability for anyone to add custom actions. This use case is also required by the Taipei team working on the Firefox OS TV.

Open Graph[2] does not support such actions. Thus, it is not sufficient for our use case. (Facebook extended Open Graph with actions[3]. However, the set of valid actions is completely under Facebook's control which makes their Open Graph extension a non-starter.)

Microdata[4], RDFa[5], and JSON-LD[6] do support actions. Hence, support for at least one of these is sufficient for our use case.

Microformats[7] currently does not support actions. Hence, it is not sufficient for our use case.

The Vaani team and the Taipei team working on the Firefox OS TV would love to base our work on that being done for pinning the web. (One of the 3 virtues of a programmer *is* laziness.) However, if neither Microdata, RDFa, nor JSON-LD is supported, we will, unfortunately, be forced to go our own way.

[1] http://schema.org/Action
[2] http://ogp.me/
[3] https://developers.facebook.com/docs/sharing/opengraph/using-actions
[4] http://www.w3.org/TR/microdata/
[5] http://www.w3.org/TR/xhtml-rdfa-primer/
[6] http://www.w3.org/TR/json-ld/
[7] http://microformats.org/wiki/Main_Page
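For readers unfamiliar with what an action looks like in one of the syntaxes named above, here is a small sketch that reads a schema.org action out of a JSON-LD block. The example markup and the `actions` helper are illustrative assumptions; the code treats the block as plain JSON and does no real JSON-LD processing (no context expansion), and it is not how Vaani or the system app handles actions.

```python
import json

# Illustrative markup only; example.com and this action are made up for the sketch.
markup = """
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "url": "https://example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
"""

def actions(doc):
    """Return (action type, target) pairs found under potentialAction."""
    found = doc.get("potentialAction", [])
    if isinstance(found, dict):  # a single action may appear without a list wrapper
        found = [found]
    return [(a.get("@type"), a.get("target")) for a in found]

print(actions(json.loads(markup)))
```

The point of the sketch is only that the action, its type, and its parameterized target travel with the content in a form a consumer can enumerate.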
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On June 27, 2015 at 10:02:47 AM, Anne van Kesteren (ann...@annevk.nl) wrote:

The data I have does not back this up: Microdata is shown to be growing fast whereas Microformats usage has remained relatively stable. Also, we didn't find Microformats usage on any of the example high-profile sites we used during prototyping; it seems to be more commonly used on WordPress blogs and Indie Web style web sites.

Could we see some examples of the cards you are generating already with existing data from the Web (from your prototype)? The value is really in seeing that users will get some real benefit, without expecting developers to add additional metadata to their sites.
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On June 29, 2015 at 7:07:33 AM, Michael Henretty (mhenre...@mozilla.com) wrote:

> We will definitely start with the simple Open Graph stuff that Ted mentioned (og:title, og:type, og:url, og:image, og:description) since they are so widely used. And yes, even these simple ones are problematic. For instance, when navigating between Dailymotion videos they keep the current meta tags and just update the HTML body content. In fact, single-page apps in general are hard here. Also, the mobile version of YouTube leaves out og tags entirely, probably as a performance optimization. Turns out, many sites do this. So in 2.5 we will have to account for all of this and the solution might not be pretty.

OK, it's good to see you've already started to encounter the issues.

> I think Microformats addresses the aforementioned problems.

They might, though they can also change from under you in fun ways, or be invalid/incorrect.

> But if YouTube, Wikipedia, Pinterest, Twitter, Facebook, Tumblr, etc. don't use them widely, what is the point of supporting them in a moz-internal API? Let's be pragmatic and start with og. What's the next biggest win for us? Is the data clear? Ben seems to think JSON-LD [1], does anyone have data to the contrary?

I don't have data, just some graying hair and warnings from the distant past [1]. You've all seen already how controversial these formats are, and hopefully you understand why now (expecting validity/sanity from the web is a non-starter - it's the fallacy of the semantic web, and why we mockingly call it the pedantic web and recoil in horror and lash out with rage at the mere mention of it).

So flip the problem a bit: what you actually want is just simple data that can be transformed into a card, right? Basically, we scrape some text values from an HTML page and put them into a different HTML document: the card.
As long as you don't expect validity of that data (i.e., you don't expect a standards conforming JSON-LD, RDFa, microdata, microformat, whatever parser*) then that frees us to build some kind of HTML Scraper that is actually built for purpose (one that is fault tolerant, and basically doesn't give a crap what the RDFa or JSON-LD spec says, but is designed to aggressively find the data you need to build nice cards).

This is also why I suggest you start with og: data, because it basically takes the same approach: it doesn't give a crap what the RDFa spec says (and neither do developers that add it to their pages, as I'm sure you've already seen), it just defines some things by using some HTML elements that kinda-sorta look like RDFa. However, it comes with a ton of problems which you will have a great time trying to deal with as you build the pinned-sites feature. The same with Twitter's card format.

At the end of the day, what Gecko should be passing back is a simple JS object that contains:

    {
      og: {... name/value pairs...},
      twitter: {... name/value pairs...},
      other_because_we_can_add_new_things_as_needed_yay: {... name/value pairs...}
    }

If we are not going to be doing any semantic inferencing on that data or actually doing the linked data part, then we don't need a JSON-LD representation of it. We just need a fairly simple structure from which FxOS can build different cards. That avoids talk of supporting controversial formats like JSON-LD and RDFa, while actually supporting web content: in the sense that we are just pulling this 'og' meta stuff from the page, we don't care what it is.

My 2c,

[1] Warning from 2003, that the same things happened with RSS. They had to abandon XML: http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html

> I know, I know, this is how HTML got to be tag soup: browsers that never complained. Now the same thing is happening in the RSS world because the same social dynamics apply. End users who can't even spell XML certainly don't care about silly little formatting rules; they just want to follow their favorite sites in their news aggregator. When 10% of the world's RSS feeds are not well-formed -- including some high-profile feeds that thousands of people want to read -- the ability to parse ill-formed feeds becomes a competitive advantage. (And if you think the same thing won't happen when RDF and the Semantic Web go mainstream, you're deluding yourself. The same social dynamics apply. Boy, is that going to be messy.)
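The "simple JS object" idea in the message above can be sketched roughly as follows. This is only an illustration under assumptions: `groupMetaTags` is a made-up name, the input is assumed to be a flat list of already-scraped meta tags with `property`/`name` and `content` fields, and nothing here is an existing Gecko or Gaia API.

```javascript
// Hypothetical helper (not a Gecko or Gaia API): turn a flat list of scraped
// <meta> tags into a grouped object such as { og: {...}, twitter: {...} }.
function groupMetaTags(tags) {
  const groups = Object.create(null); // no prototype, so odd prefixes cannot collide
  for (const tag of Array.isArray(tags) ? tags : []) {
    if (!tag) continue;
    // Open Graph uses the "property" attribute; Twitter cards typically use "name".
    const key = typeof tag.property === 'string' ? tag.property : tag.name;
    const value = tag.content;
    if (typeof key !== 'string' || typeof value !== 'string') continue;
    const sep = key.indexOf(':');
    if (sep <= 0) continue; // only keep "prefix:field" style keys
    const prefix = key.slice(0, sep).trim().toLowerCase();
    const field = key.slice(sep + 1).trim().toLowerCase();
    const text = value.trim();
    if (!prefix || !field || !text) continue;
    if (!groups[prefix]) groups[prefix] = Object.create(null);
    // First value wins; real pages often repeat tags.
    if (!(field in groups[prefix])) groups[prefix][field] = text;
  }
  return groups;
}

const card = groupMetaTags([
  { property: 'og:title', content: ' Example video ' },
  { property: 'OG:Type', content: 'video.other' },
  { name: 'twitter:card', content: 'summary' },
  { name: 'viewport', content: 'width=device-width' }, // skipped: no prefix
  { property: 'og:title', content: 'Duplicate title' }, // skipped: already set
  { property: 'og:image' }, // skipped: no content
]);
console.log(JSON.stringify(card));
```

The helper deliberately skips anything it does not understand instead of reporting errors, in keeping with the fault-tolerant scraper approach argued for above. New prefixes need no code change, since groups are created on demand.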
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Saturday, June 27, 2015, Benjamin Francis bfran...@mozilla.com wrote:

> On 26 June 2015 at 19:25, Marcos Caceres mar...@marcosc.com wrote:
>
>> Could we see some examples of the cards you are generating already with existing data from the Web (from your prototype)? The value is really in seeing that users will get some real benefit, without expecting developers to add additional metadata to their sites.
>
> The prototype only supports Open Graph, you can see some example cards in this video: Pinning the Web - Prototoype https://www.youtube.com/watch?v=FiLnRoRjD5k

These look fantastic! So why not start with just those? Or are all those card types done and thoroughly tested on a good chunk of Web content?

As I mentioned before, I'd be worried about the amount of error recovery code that will be needed just for those types of cards. (Sorry, I don't know any of the background and if you've already dealt with this).
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
Thanks for the responses, Let me reiterate the Product requirements: 1. Support for a syntax and vocabulary already in wide use on the web to allow the creation of cards for the largest possible volume of existing pinnable content 2. Support for a syntax with a large enough and/or extensible vocabulary to allow cards to be created for all the types of pinnable content and associated actions we need in Gaia We need to deliver this by B2G 2.5 FL in September. *Existing Web Content* I think we're agreed that Open Graph gives us enough of a minimum viable product for the first requirement. However, it's not OK to just hard code particular og types into Gecko, we need to be able to experiment with cards for lots of different Open Graph types without having to modify Gecko every time (imagine system app addons with experimental card packs). Open Graph is just meta tags and we already have a mechanism for detecting specific meta tags in Gaia - the metachange event on the Browser API. As a minimum all we need to do to access Open Graph meta tags is to extend this event to include all meta tags with a property attribute, which is only used by Open Graph. We could go a step further and extend the event to all meta tags, which would also give us access to Twitter card markup for example, but that isn't essential. We do not need an RDFa parser for this, we can filter/clean up the data in the system app in Gaia where necessary (the system app is widely regarded to be part of the platform itself). *Gaia Content* Open Graph does not have a large enough vocabulary, or (as Kelly says) the ability to associate actions with content, needed for the second requirement. 
Schema.org has a large existing vocabulary which basically fulfils these use cases, though some parts are more tested than others, with examples given in Microdata, RDFa and JSON-LD syntaxes, eg: - Contact - http://schema.org/Person - Event - http://schema.org/Event - Photo - http://schema.org/Photograph - Song - http://schema.org/MusicRecording - Video - http://schema.org/VideoObject - Radio station - http://schema.org/RadioChannel - Email - http://schema.org/EmailMessage - Message - http://schema.org/Comment Schema.org also provides existing schemas for actions associated with items (https://schema.org/docs/actions.html), although examples are only given in JSON-LD syntax. Schema.org is just a vocabulary and Tantek tells me it's theoretically possible to express this vocabulary in Microformats syntax too - it's possible to create new vendor prefixed types, or suggest new standard types to be added to the Microformats wiki. This would be required because Microformats does not have a big enough existing vocabulary for Gaia's needs. Microdata, RDFa and JSON-LD use URL namespaces so are extensible by design with a non-centralised vocabulary (this is seen as a strength by some, as a weakness by others). The data we have [1][2][3][4] shows that Microdata, then RDFa (sometimes considered to include Open Graph), is used by the most pinnable content on the web, but the data does not include all modern Microformats. We also don't have any data for JSON-LD usage. However, existing usage is not the most important criteria for the second requirement, it's how well it fits the more complex use cases in Gaia (and how much work it is to implement). There is resistance to implementing a full Microdata or RDFa parser in Gecko due to its complexity. JSON-LD is more self-contained by design (for better or worse) and could be handed over to the Gaia system app directly via the Browser API without any parsing in Gecko. 
Microformats is possibly less Gecko work to implement than Microdata or RDFa, but more than JSON-LD. *Conclusions* My conclusion is that the least required work in Gecko for the highest return would be: 1. *Open Graph* (bug 1178484) - Extending the existing metachange Browser API event to include all meta tags with a property attribute. This would allow Gaia to add support for all of the Open Graph types, fulfilling requirement 1. 2. *JSON-LD* (bug 1178491) - Adding a linkeddatachange event to the Browser API which is dispatched by Gecko whenever it encounters a script tag with a type of application/ld+json (as per the W3C recommendation [5]), including the JSON content in the payload of the event. This would allow the Gaia system app to support existing schema.org schemas (including actions), with the least amount of work in Gecko, and already in a JSON format it can store directly in the Places database (DataStore/IndexedDB). Kan-Ru is the owner of the Browser API module in Gecko and has said he's happy with this approach and is happy to review the code. Let's go ahead with that now, unblocking the work on the Gaia side. (Note that I have no intention of building a full RDF style parser in Gaia, we'll just extract the data we need from the JSON, for the good reasons that
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Sat, Jun 27, 2015 at 5:51 AM, Marcos Caceres mar...@marcosc.com wrote:

> These look fantastic! so why not start with just those? Or are all those card types done and thoroughly tested on a good chunk of Web content? As I mentioned before, I'd be worried about the amount of error recovery code that will be needed just for those types of cards. (Sorry, I don't know any of the background and if you've already dealt with this).

We will definitely start with the simple Open Graph stuff that Ted mentioned (og:title, og:type, og:url, og:image, og:description) since they are so widely used. And yes, even these simple ones are problematic. For instance, when navigating between Dailymotion videos they keep the current meta tags and just update the HTML body content. In fact, single-page apps in general are hard here. Also, the mobile version of YouTube leaves out og tags entirely, probably as a performance optimization. Turns out, many sites do this. So in 2.5 we will have to account for all of this and the solution might not be pretty.

I think Microformats addresses the aforementioned problems. But if YouTube, Wikipedia, Pinterest, Twitter, Facebook, Tumblr, etc. don't use them widely, what is the point of supporting them in a moz-internal API? Let's be pragmatic and start with og. What's the next biggest win for us? Is the data clear? Ben seems to think JSON-LD [1], does anyone have data to the contrary?

[1] https://groups.google.com/d/msg/mozilla.dev.platform/5sUoRTPDnSE/24ckuPSydjQJ
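One possible way to cope with the two failure modes described above (stale tags in single-page apps, and pages with no og tags at all) is sketched below. This is a guess at a heuristic, not what the system app does: `sameDocument` and `buildCard` are hypothetical names, and treating a mismatch between og:url and the current URL as "stale" is an assumption that will misfire on sites whose canonical og:url legitimately differs from the URL being viewed.

```javascript
// Hypothetical heuristic, not part of Gecko or Gaia: decide whether scraped
// Open Graph data can be trusted for the page currently being displayed.
function sameDocument(a, b) {
  try {
    const ua = new URL(a);
    const ub = new URL(b);
    // Ignore the fragment and any trailing slash; compare everything else.
    const norm = (u) => u.origin + u.pathname.replace(/\/+$/, '') + u.search;
    return norm(ua) === norm(ub);
  } catch (e) {
    return false; // unparseable or relative URL: treat as a mismatch
  }
}

function buildCard(pageUrl, pageTitle, og) {
  const fallback = { title: pageTitle || pageUrl, url: pageUrl, source: 'fallback' };
  if (!og || typeof og.url !== 'string' || typeof og.title !== 'string') {
    return fallback; // e.g. a mobile page that omits og tags entirely
  }
  if (!sameDocument(og.url, pageUrl)) {
    return fallback; // e.g. a single-page app that kept the previous page's tags
  }
  return {
    title: og.title,
    url: pageUrl,
    image: typeof og.image === 'string' ? og.image : null,
    type: typeof og.type === 'string' ? og.type : null,
    source: 'og',
  };
}

// The tags still describe video 1 although the user has navigated to video 2.
console.log(buildCard('https://example.com/video/2', 'Video 2', {
  url: 'https://example.com/video/1',
  title: 'Video 1',
}));
```

Falling back to the document title and URL keeps every page pinnable, at the cost of a plainer card whenever the metadata looks suspect.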
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On 26 June 2015 at 19:25, Marcos Caceres mar...@marcosc.com wrote:

> Could we see some examples of the cards you are generating already with existing data from the Web (from your prototype)? The value is really in seeing that users will get some real benefit, without expecting developers to add additional metadata to their sites.

The prototype only supports Open Graph, you can see some example cards in this video: https://www.youtube.com/watch?v=FiLnRoRjD5k
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On 26 June 2015 at 12:58, Ted Clancy tcla...@mozilla.com wrote: My apologies for the fact that this is such an essay, but I think this has become necessary. Firefox OS 2.5 will be unveiling a new feature called Pinning The Web, and there's been some discussion about whether we should leverage technologies like RDFa, Microdata, JSON-LD, Open Graph, and Microformats for this purpose. First, I'd like to give some background on these technologies. In 2001, Tim Berners-Lee said that the Semantic Web was the future of the web and was going to revolutionize our world. ( http://www.scientificamerican.com/article/the-semantic-web/) The Semantic Web was a doomed idea, for reasons best articulated in essay by Cory Doctorow entitled Metacrap, also written in 2001. ( http://www.well.com/~doctorow/metacrap.htm) After 14 years of the Semantic Web not revolutionizing our world, I think history suggests that Cory Doctorow was right. But because the Semantic Web was the next big thing, millions of dollars were poured into it (mostly in the form of research grants and crappy specs, from what I can gather). In 2004, RDFa became the first big standard to emerge from this work. RDFa is a W3C Recommendation, and work is still proceeding on it. JSON-LD was started in 2008 as a JSON-based alternative to RDFa. As the author of JSON-LD, Manu Sporny, states: RDF is a shitty data model. It doesn’t have native support for lists. LISTS for fuck’s sake! [...] to work with RDF you typically needed a quad store, a SPARQL engine, and some hefty libraries. Your standard web developer has no interest in that toolchain because it adds more complexity to the solution than is necessary. ( http://manu.sporny.org/2014/json-ld-origins-2/) However, though it originally wanted to distance itself from RDFa, JSON-LD ended up being chosen as a serialization for RDFa: Around mid-2012, the JSON-LD stuff was going pretty well and the newly chartered RDF Working Group was going to start work on RDF 1.1. 
One of the work items was a serialization of RDF for JSON. [...] The biggest problem being that many of the participants in the RDF Working Group at the time didn’t understand JSON. (ibid) (I just want everyone to note that in 2012, *THE AUTHORS OF RDFa DID NOT KNOW JSON*. This is in a spec that casually throws around propositional logic terms like entails, and subject-predicate-object triples.) JSON-LD is now a W3C recommendation, and has undergone added complexity to align it with RDFa. As Manu Sporny states, Nobody was happy with the result (ibid). Microdata is similar to RDFa, but without the benefit of being a W3C recommendation. Open Graph is a technology developed by Facebook. It's putatively a subset of RDFa. There is a small subset of Open Graph tags (og:title, og:type, og:url, and og:image) which are widely used for sharing content on social media like Facebook and Twitter. RDFa, Microdata, and JSON-LD can collectively be described as Linked Data technologies, so called because their intention is that semantic objects across different web pages would link to each other to create a Semantic Web. Microformats was developed circa 2005 as a lightweight way of putting semantic information into web pages, but does not aim to be a Linked Data or Semantic Web technology. It does not have an official standards body behind it, instead being maintained by a community of volunteers. One of our Mozilla employees, Tantek Çelik, was instrumental in its development. Thanks for the history lesson :) When I started to research this area I learnt very quickly that there are a lot of strong feelings on all sides about which format is the best, and many formats claim to supersede each other. The reality is that there's still no clear winner on the web. 
So what I've tried to do is to take a data driven approach to look at which syntaxes and vocabularies are getting the most traction according to research papers based on the Common Crawl corpus, the Bing corpus and the Yahoo corpus (all the data I've found so far). There are two high level requirements for the Pin the Web features: 1) Getting the most possible user value out of the data that already exists on the web today 2) Finding the best solution for the use cases we have in Gaia apps which can be implemented in the time frame we have for the 2.5 release (Feature Landing on 21st September) Based on the data available and the level of effort of implementation my most recent conclusions for those requirements were: 1) Open Graph 2) JSON-LD However, there's also a case for bonus points for a solution that we as Mozilla actually want to see used in the future! Okay, now I'd like to discuss whether or not we should use these technologies for Pinning The Web. Open Graph: I think we need to use the four tags og:title, og:type, og:url and og:image, since they are widely used. Apart from that, I don't think we need to support the rest of
Linked Data must die. (was: Linked Data and a new Browser API event)
My apologies for the fact that this is such an essay, but I think this has become necessary. Firefox OS 2.5 will be unveiling a new feature called Pinning The Web, and there's been some discussion about whether we should leverage technologies like RDFa, Microdata, JSON-LD, Open Graph, and Microformats for this purpose. First, I'd like to give some background on these technologies. In 2001, Tim Berners-Lee said that the Semantic Web was the future of the web and was going to revolutionize our world. ( http://www.scientificamerican.com/article/the-semantic-web/) The Semantic Web was a doomed idea, for reasons best articulated in essay by Cory Doctorow entitled Metacrap, also written in 2001. ( http://www.well.com/~doctorow/metacrap.htm) After 14 years of the Semantic Web not revolutionizing our world, I think history suggests that Cory Doctorow was right. But because the Semantic Web was the next big thing, millions of dollars were poured into it (mostly in the form of research grants and crappy specs, from what I can gather). In 2004, RDFa became the first big standard to emerge from this work. RDFa is a W3C Recommendation, and work is still proceeding on it. JSON-LD was started in 2008 as a JSON-based alternative to RDFa. As the author of JSON-LD, Manu Sporny, states: RDF is a shitty data model. It doesn’t have native support for lists. LISTS for fuck’s sake! [...] to work with RDF you typically needed a quad store, a SPARQL engine, and some hefty libraries. Your standard web developer has no interest in that toolchain because it adds more complexity to the solution than is necessary. ( http://manu.sporny.org/2014/json-ld-origins-2/) However, though it originally wanted to distance itself from RDFa, JSON-LD ended up being chosen as a serialization for RDFa: Around mid-2012, the JSON-LD stuff was going pretty well and the newly chartered RDF Working Group was going to start work on RDF 1.1. One of the work items was a serialization of RDF for JSON. [...] 
The biggest problem being that many of the participants in the RDF Working Group at the time didn’t understand JSON. (ibid) (I just want everyone to note that in 2012, *THE AUTHORS OF RDFa DID NOT KNOW JSON*. This is in a spec that casually throws around propositional logic terms like entails, and subject-predicate-object triples.) JSON-LD is now a W3C recommendation, and has undergone added complexity to align it with RDFa. As Manu Sporny states, Nobody was happy with the result (ibid). Microdata is similar to RDFa, but without the benefit of being a W3C recommendation. Open Graph is a technology developed by Facebook. It's putatively a subset of RDFa. There is a small subset of Open Graph tags (og:title, og:type, og:url, and og:image) which are widely used for sharing content on social media like Facebook and Twitter. RDFa, Microdata, and JSON-LD can collectively be described as Linked Data technologies, so called because their intention is that semantic objects across different web pages would link to each other to create a Semantic Web. Microformats was developed circa 2005 as a lightweight way of putting semantic information into web pages, but does not aim to be a Linked Data or Semantic Web technology. It does not have an official standards body behind it, instead being maintained by a community of volunteers. One of our Mozilla employees, Tantek Çelik, was instrumental in its development. Okay, now I'd like to discuss whether or not we should use these technologies for Pinning The Web. Open Graph: I think we need to use the four tags og:title, og:type, og:url and og:image, since they are widely used. Apart from that, I don't think we need to support the rest of Open Graph. RDFa, Microdata, and JSON-LD: I'd be afraid of using these. They were designed for something much bigger and more complicated than just pinning websites/contacts/events. 
I'd be afraid of people getting the idea that Mozilla supports RDFa, because that would give the wrong idea and just lead to disappointment and/or headache. Also, they are complex, and our developer effort is limited.

JSON-LD has the additional problem that it exists separately from the content of the webpage, meaning that the JSON-LD data can get out-of-sync with the webpage, leading to confusion for users. (We've all seen the way code comments quickly get out-of-sync with the code they purport to describe.)

The argument has been made on this discussion list that RDFa and Microdata data is abundant, and so we should take advantage of it. But it's questionable how much of that data is actually good. The main use of RDFa and Microdata right now is for search engine optimization, which means the data isn't necessarily in a form presentable to the user. (Also, it might be all lies.)

Microformats: Yes, we should use these. We've had support for Microformats in Firefox since Firefox 3 ( https://developer.mozilla.org/en-US/docs/Using_microformats), so it's just a matter of updating and expanding
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On Fri, Jun 26, 2015 at 2:18 PM, Benjamin Francis bfran...@mozilla.com wrote:

> When I look at RDFa, Microdata and JSON-LD I see formal W3C recommendations, extensive vocabularies which (at least on the surface) are agreed on by all the big search engines, and I see a clean engineering solution (albeit fairly complex).

Based on this kind of reasoning we almost ended up with XForms. I would encourage you to go a little deeper.

Let's make it clear for all of dev.platform, a W3C Recommendation means nothing. Pretty much anyone can get one. We need to judge standards on their merits and not jump on the next XForms/XML/WS-*/SVG bandwagon.

--
https://annevankesteren.nl/
Re: Linked Data must die. (was: Linked Data and a new Browser API event)
On 26 June 2015 at 17:02, Anne van Kesteren ann...@annevk.nl wrote:

> I would encourage you to go a little deeper... We need to judge standards on their merits

I did look deeper. I read most of the specifications and several papers on their adoption. My personal conclusion was that not only does Microformats appear to be used less widely than other competing formats, but from a technical point of view just adding h- prefixes to class names seems like a massive hack. Many of the arguments I've heard in favour of Microformats are that it's the grassroots or non-evil solution. It's equally true that not being a W3C recommendation doesn't automatically make something better either.

But I'm not the person that will have to implement this, and the people who are think we should use Microformats.

Ben