Hoi,
I have seen the statistics. The quality of Freebase cannot be understood by
simply looking at the problems. People have been looking for problems and
identifying them. As a consequence, more data ended up in the error
bucket than in the good bucket. I have, for instance, marked a lot of
statements as "wrong" because they were exactly the same as the value
already present. Consequently the error rate is not representative.

Denny, I have a suggestion. It is backed by math and it is backed by how
people think. All the arguments are on my side. I have not heard your
arguments. The "primary sources tool" was announced as a good thing, but
the community never agreed to having it. So leave the community out of it
and focus on arguments.

   - why would someone work on data in the primary sources tool when it is
   more effective to add data directly?
   - why is data that is over 90% good (i.e. as good as Wikidata itself)
   denied access to Wikidata?
   - how do you justify the primary sources tool when so little of its data
   was included in Wikidata?
   - why not have Kian learn from the data sets of Freebase and Wikidata and
   make smart suggestions?
   - why waste people's time adding one item/statement at a time when you
   can focus on the statements that are in doubt (either in Freebase or in
   Wikidata)?

The notion of having all new data go through the primary sources tool will
see me leave the project when this is realised. I will feel that my time
and intelligence are wasted.

Thanks,

      GerardM

On 28 September 2015 at 22:54, Denny Vrandečić <vrande...@google.com> wrote:

> Hi Gerard,
>
> given the statistics you cite from
>
> https://tools.wmflabs.org/wikidata-primary-sources/status.html
>
> I see that 19.6k statements have been approved through the tool, and 5.1k
> statements have been rejected - which means that about 1 in 5 statements is
> deemed unsuitable by the users of primary sources.
>
> Given that there are 12.4M statements in the tool, this means that about
> 2.5M statements will turn out to be unsuitable for inclusion in Wikidata
> (if the current ratio holds). Are you suggesting that all of these
> statements be uploaded to Wikidata?
>
> Tpt already did upload pieces of the data which have sufficient quality
> outside the primary sources tool, and more is planned. But for the data
> where the suitability for Wikidata seems questionable, I would not know
> what other approach to use. Do you have a suggestion?
>
> Once you have a suggestion and there is community consensus on doing it,
> no one will stand in the way of implementing that suggestion.
>
> Cheers,
> Denny
>
>
> On Mon, Sep 28, 2015 at 1:19 PM John Erling Blad <jeb...@gmail.com> wrote:
>
>> Another idea: make a kind of worklist on Wikidata that reflects the watchlist
>> on the clients (Wikipedias). But then, we often have items on our watchlist
>> that we don't know much about. (Digression: Somehow we should be able to
>> sort out those things we know (the place we live, the persons we have met)
>> from those things we have done (edited, copy-pasted).)
>>
>> I have been trying to get some interest in the past for worklists on
>> Wikipedia, but there isn't much interest in making them. They would speed
>> up the tedious task of finding the next page to edit after a given edit is
>> completed. It is the same problem with imports from Freebase on Wikidata:
>> locate the next item on Wikidata with the same queued statement from
>> Freebase, but within some worklist that the user has some knowledge about.
>>
>> Imagine "municipalities within a county" or "municipalities that are also
>> on the user's watchlist", and combine that with available unhandled
>> Freebase statements.
>>
>> On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad <jeb...@gmail.com>
>> wrote:
>>
>>> Could it be possible to create some kind of info (notification?) in a
>>> wikipedia article that additional data is available in a queue ("freebase")
>>> somewhere?
>>>
>>> If you have the article on your watch-list, then you will get a warning
>>> that says "You lazy boy, get your ass over here and help us out!" Or
>>> perhaps slightly rephrased.
>>>
>>> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
>>> mar...@semantic-mediawiki.org> wrote:
>>>
>>>> Hi Gerard, hi all,
>>>>
>>>> The key misunderstanding here is that the main issue with the Freebase
>>>> import would be data quality. It is actually community support. The goal of
>>>> the current slow import process is for the Wikidata community to "adopt"
>>>> the Freebase data. It's not about "storing" the data somewhere, but about
>>>> finding a way to maintain it in the future.
>>>>
>>>> The import statistics show that Wikidata does not currently have enough
>>>> community power for a quick import. This is regrettable, but not something
>>>> that we can fix by dumping in more data that will then be orphaned.
>>>>
>>>> Freebase people: this is not a small amount of data for our young
>>>> community. We really need your help to digest this huge amount of data! I
>>>> am absolutely convinced from the emails I saw here that none of the former
>>>> Freebase editors on this list would support low quality standards. They
>>>> have fought hard to fix errors and avoid issues coming into their data for
>>>> a long time.
>>>>
>>>> Nobody believes that either Freebase or Wikidata can ever be free of
>>>> errors, and this is really not the point of this discussion at all [1]. The
>>>> experienced community managers among us know that it is not about the
>>>> amount of data you have. Data is cheap and easy to get, even free data with
>>>> very high quality. But the value proposition of Wikidata is not that it can
>>>> provide storage space for a lot of data -- it is that we have a functioning
>>>> community that can maintain it. For the Freebase data donation, we do not
>>>> seem to have this community yet. We need to find a way to engage people to
>>>> do this. Ideas are welcome.
>>>>
>>>> What I can see from the statistics, however, is that some users (and I
>>>> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
>>>> a lot of effort into integrating the data already. This is great, and we
>>>> should thank these people because they are the ones who are now working on
>>>> what we are just talking about here. In addition, we should think about
>>>> ways of engaging more community in this. Some ideas:
>>>>
>>>> (1) Find a way to clean and import some statements using bots. Maybe
>>>> there are cases where Freebase already had a working import infrastructure
>>>> that could be migrated to Wikidata? This would also solve the community
>>>> support problem in one way. We just need to import the maintenance
>>>> infrastructure together with the data.
>>>>
>>>> (2) Find a way to expose specific suggestions to more people. The
>>>> Wikidata Games have attracted so many contributions. Could some of the
>>>> Freebase data be solved in this way, with a dedicated UI?
>>>>
>>>> (3) Organise Freebase edit-a-thons where people come together to work
>>>> through a bunch of suggested statements.
>>>>
>>>> (4) Form wiki projects that discuss a particular topic domain in
>>>> Freebase and how it could be imported faster using (1)-(3) or any other
>>>> idea.
>>>>
>>>> (5) Connect to existing Wiki projects to make them aware of valuable
>>>> data they might take from Freebase.
>>>>
>>>> Freebase is a much better resource than many other data resources we
>>>> are already using with similar approaches as (1)-(5) above, and yet it
>>>> seems many people are waiting for Google alone to come up with a solution.
>>>>
>>>> Cheers,
>>>>
>>>> Markus
>>>>
>>>> [1] Gerard, if you think otherwise, please let us know which error
>>>> rates you think are typical or acceptable for Freebase and Wikidata,
>>>> respectively. Without giving actual numbers you just produce empty strawman
>>>> arguments (for example: claiming that anyone would think that Wikidata is
>>>> better quality than Freebase and then refuting this point, which nobody is
>>>> trying to make). See https://en.wikipedia.org/wiki/Straw_man
>>>>
>>>>
>>>> On 26.09.2015 18:31, Gerard Meijssen wrote:
>>>>
>>>>> Hoi,
>>>>> When you analyse the statistics, they show how bad the current state of
>>>>> affairs is. Slightly over one in a thousand of the statements in the
>>>>> primary sources tool has been included.
>>>>>
>>>>> Markus, Lydia and I agree that the content of Freebase may be
>>>>> improved. Where we differ is that the same can be said for Wikidata. It
>>>>> is not much better and by including the data from Freebase we have a
>>>>> much improved coverage of facts. The same can be said for the content
>>>>> of DBpedia and probably other sources as well.
>>>>>
>>>>> I seriously hate this procrastination and the denial of the efforts of
>>>>> others. It is one type of discrimination that is utterly deplorable.
>>>>>
>>>>> We should concentrate on comparing Wikidata with other sources that are
>>>>> maintained. We should do this repeatedly and concentrate on workflows
>>>>> that seek the differences and provide workflows that help our community
>>>>> to improve what we have. What we have is the sum of all available
>>>>> knowledge and by splitting it up, we are weakened as a result.
>>>>> Thanks,
>>>>>        GerardM
>>>>>
>>>>> On 26 September 2015 at 03:32, Thad Guidry <thadgui...@gmail.com
>>>>> <mailto:thadgui...@gmail.com>> wrote:
>>>>>
>>>>>     Also, Freebase users themselves who did daily, weekly work.... some
>>>>>     were passing users, some tried harder, but made lots of erroneous
>>>>>     entries (battling against our Experts at times).  We could probably
>>>>>     provide a list of those sorta community blacklisted users whose
>>>>>     data submissions should probably not be trusted.
>>>>>
>>>>>     +1 for looking at better maintained specific properties.
>>>>>     +1 for being cautious for some Freebase usernames and their entries.
>>>>>     +1 for trusting wholesale all of the Freebase Experts submissions.
>>>>>     We policed each other quite well.
>>>>>
>>>>>
>>>>>
>>>>>     Thad
>>>>>     +ThadGuidry <https://www.google.com/+ThadGuidry>
>>>>>
>>>>>     On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas
>>>>>     <jasondoug...@google.com <mailto:jasondoug...@google.com>> wrote:
>>>>>
>>>>>         > It would indeed be interesting to see which percentage of
>>>>>         > proposals are being approved (and stay in Wikidata after a
>>>>>         > while), and whether there is a pattern (100% approval on some
>>>>>         > type of fact that could then be merged more quickly; or very
>>>>>         > low approval on something else that would maybe better
>>>>>         > revisited for mapping errors or other systematic problems).
>>>>>
>>>>>         +1, I think that's your best bet. Specific properties were much
>>>>>         better maintained than others -- identify those that meet the
>>>>>         bar for wholesale import and leave the rest to the primary
>>>>>         sources tool.
>>>>>
>>>>>         On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch
>>>>>         <mar...@semantic-mediawiki.org
>>>>>         <mailto:mar...@semantic-mediawiki.org>> wrote:
>>>>>
>>>>>             On 24.09.2015 23:48, James Heald wrote:
>>>>>              > Has anybody actually done an assessment on Freebase and
>>>>>              > its reliability?
>>>>>              >
>>>>>              > Is it *really* too unreliable to import wholesale?
>>>>>
>>>>>             From experience with the Primary Sources tool proposals,
>>>>>             the quality is mixed. Some things it proposes are really
>>>>>             very valuable, but other things are also just wrong. I
>>>>>             added a few very useful facts and fitting references based
>>>>>             on the suggestions, but I also rejected others. Not sure
>>>>>             what the success rate is for the cases I looked at, but my
>>>>>             feeling is that some kind of "supervised import" approach
>>>>>             is really needed when considering the total amount of facts.
>>>>>
>>>>>             An issue is that it is often fairly hard to tell if a
>>>>>             suggestion is true or not (mainly in cases where no
>>>>>             references are suggested to check). In other cases, I am
>>>>>             just not sure if a fact is correct for the property used.
>>>>>             For example, I recently ended up accepting "architect:
>>>>>             Charles Husband" for Lovell Telescope (Q555130), but to be
>>>>>             honest I am not sure that this is correct: he was the
>>>>>             leading engineer contracted to design the telescope, which
>>>>>             seems different from an architect; no official web site
>>>>>             uses the word "architect" it seems; I could not find a
>>>>>             better property though, and it seemed "good enough" to
>>>>>             accept it (as opposed to the post code of the location of
>>>>>             this structure, which apparently was just wrong).
>>>>>
>>>>>              >
>>>>>              > Are there any stats/progress graphs as to how the actual
>>>>>              > import is in fact going?
>>>>>
>>>>>             It would indeed be interesting to see which percentage of
>>>>>             proposals are being approved (and stay in Wikidata after a
>>>>>             while), and whether there is a pattern (100% approval on
>>>>>             some type of fact that could then be merged more quickly;
>>>>>             or very low approval on something else that would maybe
>>>>>             better revisited for mapping errors or other systematic
>>>>>             problems).
>>>>>
>>>>>             Markus
>>>>>
>>>>>
>>>>>              >
>>>>>              >    -- James.
>>>>>              >
>>>>>              >
>>>>>              > On 24/09/2015 19:35, Lydia Pintscher wrote:
>>>>>              >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris
>>>>>              >> <tfmor...@gmail.com> wrote:
>>>>>              >>>> This is to add MusicBrainz to the primary source
>>>>>              >>>> tool, not anything else?
>>>>>              >>>
>>>>>              >>>
>>>>>              >>> It's apparently worse than that (which I hadn't
>>>>>              >>> realized until I re-read the transcript).  It sounds
>>>>>              >>> like it's just going to generate little warning icons
>>>>>              >>> for "bad" facts and not lead to the recording of any
>>>>>              >>> new facts at all.
>>>>>              >>>
>>>>>              >>> 17:22:33 <Lydia_WMDE> we'll also work on getting the
>>>>>              >>> extension deployed that will help with checking
>>>>>              >>> against 3rd party databases
>>>>>              >>> 17:23:33 <Lydia_WMDE> the result of constraint checks
>>>>>              >>> and checks against 3rd party databases will then be
>>>>>              >>> used to display little indicators next to a statement
>>>>>              >>> in case it is problematic
>>>>>              >>> 17:23:47 <Lydia_WMDE> i hope this way more people
>>>>>              >>> become aware of issues and can help fix them
>>>>>              >>> 17:24:35 <sjoerddebruin> Do you have any names of
>>>>>              >>> databases that are supported? :)
>>>>>              >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first
>>>>>              >>> version the german national library. it can be
>>>>>              >>> extended later
>>>>>              >>>
>>>>>              >>>
>>>>>              >>> I know Freebase is deemed to be nasty and unreliable,
>>>>>              >>> but is MusicBrainz considered trustworthy enough to
>>>>>              >>> import directly or will its facts need to be dripped
>>>>>              >>> through the primary source soda straw one at a time
>>>>>              >>> too?
>>>>>              >>
>>>>>              >> The primary sources tool and the extension that helps
>>>>>              >> us check against other databases are two independent
>>>>>              >> things. Imports from MusicBrainz have been happening
>>>>>              >> for a very long time already.
>>>>>              >>
>>>>>              >>
>>>>>              >> Cheers
>>>>>              >> Lydia
>>>>>              >>
>>>>>              >
>>>>>              >
>>>>>              > _______________________________________________
>>>>>              > Wikidata mailing list
>>>>>              > Wikidata@lists.wikimedia.org
>>>>>             <mailto:Wikidata@lists.wikimedia.org>
>>>>>              > https://lists.wikimedia.org/mailman/listinfo/wikidata