Gerard,

    I should probably keep my mouth shut about this,  but I am so offended
by what you say that I am not going to.

I am not an academic.  The people behind Wikidata are.

I am a professional programmer who has spent a lot of time being the guy
who finishes what other people started;  I typically come in when a project
has been two years late for two years,  and I do whatever it takes to get
the product in front of the customer.

I know that building a system in PHP to do what Wikidata does is the road
to hell,  not because I hate PHP but because I have done it myself and
learned it through experience.

I first heard about Wikidata at SemTech in San Francisco,  and I was told
very directly that they were not interested in working with anybody who was
experienced with putting data from a generic database in front of users,
because they had worked so hard to get academic positions and a grant
from the Allen Institute,  and it is more cost-effective and more compatible
with academic advancement to hire a bunch of young people who don't know
anything but will follow orders.

If you hire 3x the people you need and have good management you can make
that work;  just the fact that the project has a heavyweight project
manager is a very good sign.  That is how the CMM 5 shops in India do it,
and perhaps that is what has happened here,  because Wikidata actually has
succeeded quite well from a software engineering perspective.

Now,  so far as RDF and SPARQL go,  if you'd seen my history you'd see I am
an American in the Winston Churchill sense:  I tried everything else before
finally settling on the right thing.  I really had my conversion when I
discovered I could take data from Freebase,  put it through something more
like a reconstruction than a transformation to convert it to RDF,  and then
write SPARQL queries that "just worked".
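
For instance,  a query like the one below is the sort of thing that "just
worked";  this is only a sketch,  and the fb: namespace and property names
here are hypothetical stand-ins for whatever a particular Freebase
conversion actually produces:

    # Hypothetical namespace for Freebase-derived RDF
    PREFIX fb: <http://rdf.example.org/freebase/>

    # Find people and their dates of birth in the converted data
    SELECT ?name ?born WHERE {
      ?person a fb:people.person ;
              fb:name ?name ;
              fb:date_of_birth ?born .
    }
    LIMIT 10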

RDF* and SPARQL* do not come from an academic background but from a
commercial organization that expects to make money by satisfying people's
needs,  and they are being supported by a number of other commercial
organizations.  See

http://wiki.bigdata.com/wiki/index.php/Reification_Done_Right

This is something that builds on everything successful about RDF and SPARQL
and adds the "missing links" it takes to implement data wikis.  If somebody
were starting Wikidata today,  or if the kind of billionaire who buys
sports teams the way I might buy a game console wanted to fund an effort to
keep Freebase going,  RDF*/SPARQL* is the way to do it.
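
To make that concrete,  here is a rough sketch using Wikidata-style
identifiers;  the prefixes are real but the reference URI is made up for
illustration.  The same << >> syntax lets you attach provenance to a
statement in Turtle* and ask for it back in SPARQL*:

    @prefix wd:   <http://www.wikidata.org/entity/> .
    @prefix wdt:  <http://www.wikidata.org/prop/direct/> .
    @prefix prov: <http://www.w3.org/ns/prov#> .

    # Turtle*:  an ordinary triple,  with metadata attached to the
    # triple itself rather than to a reified blob of four statements
    << wd:Q42 wdt:P69 wd:Q691283 >>
        prov:wasDerivedFrom <http://example.org/some-reference> .

    # SPARQL*:  ask where that statement came from
    SELECT ?source WHERE {
      << wd:Q42 wdt:P69 wd:Q691283 >> prov:wasDerivedFrom ?source .
    }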

What happens when you build a half-baked system from scratch and don't know
what algebra you are implementing is that you run into problems that get
progressively worse,  and you wind up like the old woman who swallowed the
cat because she swallowed the rat,  and so forth.  If I had a dime for
every time I had to fix up some application where people could not figure
out how to make primary keys that are unique,  or for every boss who didn't
want me to take the time to understand a race condition and wished I would
be like the guy who made the race conditions,  just trying random things
until it sorta works,  I would be a billionaire and I would take Freebase
over and fix all the things that are wrong with it.  (Which are really not
that bad,  but that never happened because Google didn't have any incentive
to,  say,  improve the book database.)

Or would I?

A structural problem with open data is that people are NOT paying for it.
If you were paying for it,  the publishers of the data would have an
incentive to serve the PEOPLE who are using it.  Wikidata is playing to the
whims of a few rich people,  and it could disappear at any time when those
people get tired of it or decide they have what they want and don't want to
make it any easier for competitors to follow them.

Most conventional forms of academic funding that come from governments have
the same problems.  I mean,  you get your grant,  you publish a paper,  and
it doesn't particularly matter whether what you did worked or not.  There
is also the perpetual "project" orientation,  which is not suitable for
things like arXiv.org or DBpedia,  which are really programs or operations.
You can have a very hard time finding $400k a year for something that is
high impact (e.g. 50,000 scientists use it every day),  while next door
there is somebody who got $5 million to make something that goes nowhere
(e.g. the postdoc wants to use Hadoop to process the server logs but the
number of hits is small enough that you could do it by hand).

In terms of producing a usable product,  Wikidata has made some very good
progress in terms of having data that is clean (if not copious),  but in
terms of having a usable query facility or dump files that are easy to work
with,  it is still behind DBpedia.  I mean,  you can do a lot with the
DBpedia files with grep and awk and tools like that,  it is not that hard
to load them into a triple store,  and then you have SPARQL 1.1,  which is
eminently practical because you can use the whole bag of tricks you use
with relational databases.
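
For example,  once the DBpedia files are loaded you can run the kind of
relational-style aggregate shown below;  a sketch only,  though dbo:City
and dbo:country are real DBpedia ontology terms:

    PREFIX dbo: <http://dbpedia.org/ontology/>

    # Count cities per country,  just as you would GROUP BY in SQL
    SELECT ?country (COUNT(?city) AS ?cities) WHERE {
      ?city a dbo:City ;
            dbo:country ?country .
    }
    GROUP BY ?country
    ORDER BY DESC(?cities)
    LIMIT 10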

In the big picture though,  quality is in the eye of the end user,  so it
only happens if you have a closed feedback loop where the end user has a
major influence on the behavior of the producer.  The possibility of making
more money if you do a better job,  and (even more so) of going out of
business if you fail to,  is a powerful way to close that loop.

The trouble is that most people interested in open data seem to think their
time is worth nothing and other people's time is worth nothing,  and they
aren't interested in paying even a small amount for services,  so the
producers throw stuff that almost works over the wall.  I don't think it
would be all that difficult for me to do for Wikidata what I did for
Freebase,  but I am not doing it because you aren't going to pay for it.

[... GOES BACK TO WORK ON A "SKUNK WORKS" PROJECT THAT JUST MIGHT PAY OFF]


On Fri, Feb 20, 2015 at 8:09 AM, Gerard Meijssen <gerard.meijs...@gmail.com>
wrote:

> Hoi,
> I have waited for some time to reply. First of all, Wikidata is not your
> average data repository. It would not be as relevant as it is if it were
> not for the fact that it links Wikipedia articles of any language to
> statements on items.
>
> This is the essence of Wikidata. After that we can all complain about the
> fallacies of Wikidata.. I have my pet peeves, and it is not your RDF,
> SPARQL and stuff. That is mostly stuff for academics, and its use is
> largely academic and not useful on the level where I want progress.
> Exposing this information to PEOPLE is what I am after, and by and large
> they do not live in the ivory towers where RDF and SPARQL live.
>
> I am delighted to learn that a production grade replacement of WDQ is
> being worked on. I am delighted that a front-end (JavaScript?) developer
> is being sought. That is what it takes to bring the sum of all knowledge
> to all people. It is in enriching the data in Wikidata, not in yet another
> pet project, that we can make a difference, because that is what the
> people will see. When SPARQL is available with Wikidata data.. I do wonder
> how you would serve all the readers of Wikipedia.. Does SPARQL sparkle
> enough when it is challenged in this way?
> Thanks,
>      GerardM
>
> On 18 February 2015 at 21:25, Paul Houle <ontolo...@gmail.com> wrote:
>
>> What bugs me about it is that Wikidata has gone down the same road as
>> Freebase and Neo4J in the sense of developing something ad-hoc that is not
>> well understood.
>>
>> I understand the motivations that lead there,  because there are
>> requirements to meet that standards don't necessarily satisfy,  plus
>> Wikidata really is doing ambitious things in the sense of capturing
>> provenance information.
>>
>> Perhaps it has come a little too late to help with Wikidata but it seems
>> to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that
>> you can view data as plain ordinary RDF and query with SPARQL but you can
>> also attach provenance and other metadata in a sane way with sweet syntax
>> for writing it in Turtle or querying it in other ways.
>>
>> Another way of thinking about it is that RDF* is formalizing the property
>> graph model, which has always been ad hoc in products like Neo4J.  I can
>> say that knowing what algebra you are implementing helps a lot in getting
>> the tools to work right.  So you not only have SPARQL queries as a
>> possibility but also languages like Gremlin and Cypher, and this is all
>> pretty exciting.  It is also exciting that vendors are getting on board
>> with this and we are going to see some stuff that is crazy scalable (way
>> past 10^12 facts on commodity hardware) very soon.
>>
>>
>>
>>
>> On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroended...@gmail.com>
>> wrote:
>>
>>> Hey,
>>>
>>> As Lydia mentioned, we obviously do not actively discourage outside
>>> contributions, and will gladly listen to suggestions on how we can do
>>> better. That being said, we are actively taking steps to make it easier for
>>> developers not already part of the community to start contributing.
>>>
>>> For instance, we created a website about our software itself [0], which
>>> lists the MediaWiki extensions and the different libraries [1] we created.
>>> For most of our libraries, you can just clone the code and run composer
>>> install. And then you're all set. You can make changes, run the tests and
>>> submit them back. A different workflow than what you as a MediaWiki
>>> developer are used to, perhaps, though quite a bit simpler. Furthermore, we've been
>>> quite progressive in adopting practices and tools from the wider PHP
>>> community.
>>>
>>> I definitely do not disagree with you that some things could, and
>>> should, be improved. Like you I'd like to see the Wikibase git repository
>>> and naming of the extensions be aligned more, since it indeed is confusing.
>>> Increased API stability, especially the JavaScript one, is something else
>>> on my wish-list, amongst a lot of other things. There are always reasons
>>> why things are the way they are now and why they have not improved yet. So I
>>> suggest looking at specific pain points and seeing how things can be improved
>>> there. This will get us much further than looking at the general state,
>>> concluding people do not want third party contributions, and then
>>> protesting against that.
>>>
>>> [0] http://wikiba.se/
>>> [1] http://wikiba.se/components/
>>>
>>> Cheers
>>>
>>> --
>>> Jeroen De Dauw - http://www.bn2vs.com
>>> Software craftsmanship advocate
>>> Evil software architect at Wikimedia Germany
>>> ~=[,,_,,]:3
>>>
>>>
>>
>>
>> --
>> Paul Houle
>> Expert on Freebase, DBpedia, Hadoop and RDF
>> (607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
>> http://legalentityidentifier.info/lei/lookup
>>
>>
>
>


-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
http://legalentityidentifier.info/lei/lookup
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
