[Wikidata] Re: Announce: New OpenLink Virtuoso hosted Wikidata Knowledge Graph Release

2023-01-13 Thread Sandra Fauconnier
Hi all! I’m curious, as this service also includes the data I’ve worked on over the years, often with partners: who are the typical users of this service, and are there linkable/visible examples of this use? And more generally, which typical use cases does your service serve?
Thanks!
Sandra

Sent from my iPhone

On 13 Jan 2023, at 10:02, Jerven Tjalling Bolleman wrote:

  
  
Hi All,
  
  Regarding these FAIR use settings: they are tunable and may be
  turned off, so the specific values that OpenLink uses may or may not
  apply if Wikidata were to host a Virtuoso instance itself.

  E.g. on sparql.uniprot.org you are unlikely to run into these limits
  (as the values are set very high indeed), and are more likely to
  suffer from settings around the HTTP layer that limit query run time
  due to connection issues.
  
  Regards,
  Jerven

On 1/12/23 11:45 PM, Kingsley Idehen via Wikidata wrote:

 
  On 1/12/23 3:39 AM, Larry Gonzalez wrote: 
  Dear Kingsley, 

Let me start by saying that I appreciate the effort of loading the
complete Wikidata into a graph database and making a SPARQL endpoint
available. I know it is not an easy task to do.

I just tried out the new Virtuoso-hosted SPARQL endpoint with
some queries. My experiments are not exhaustive at all, but I
just wanted to raise two concerns that I detected.

Consider a (very simple) query that counts all humans: 

''' 
SELECT (count(?human) as ?c) 
WHERE 
{ 
  ?human wdt:P31 wd:Q5 . 
} 
''' 

I get a result of 10,396,057, which is OK considering the dataset
that you are using.

But if we try to export all instances of human (to a TSV file)
with the following query: 

''' 
SELECT ?human 
WHERE 
{ 
  ?human wdt:P31 wd:Q5 . 
} 
''' 

Then I only get 10 results. Is there a limit on the number
of results that a query can have? 
  
  
  
  Yes, because these services are primarily for ad-hoc querying
  rather than wholesale data exports. If you want to export massive
  amounts of data, you can do so using OFFSET and LIMIT. 
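
  For example (just a sketch, with an arbitrary page size), the export
  can be walked page by page by re-running the same query with an
  increasing OFFSET until no more rows come back:

'''
SELECT ?human
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
LIMIT 100000
OFFSET 200000
'''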
  
  Alternatively, you can instantiate your own instance in the Azure
  or AWS cloud and use it as you see fit. 
  
  As with what we provide for DBpedia, there's a server-side
  configuration in place enforcing a "fair use" policy :) 
  
  
   

Furthermore, if we want to get all humans ordered by ID, then
the endpoint times out. The following is the query: 

''' 
SELECT ?human 
WHERE 
{ 
  ?human wdt:P31 wd:Q5 . 
} 
ORDER BY DESC(?human) 
''' 
  
  
  
  If you set the query timeout to a value over 1000 msecs, the
  Virtuoso Anytime Query feature will provide you with a partial
  solution, which you can use in conjunction with OFFSET and LIMIT to
  create an interactive (or scrollable) cursor. Beyond that, it's back
  to the "fair use" policy and the option to instantiate your own
  service-specific instance using our cloud offerings. 
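
  A sketch of such a scrollable cursor: keep the deterministic ORDER BY
  and walk the solution in fixed-size pages, for example:

'''
SELECT ?human
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
ORDER BY DESC(?human)
LIMIT 50000
OFFSET 150000
'''

  With the Anytime timeout in effect, each page may still be a partial
  solution, so the output is best treated as best-effort rather than
  complete.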
  
  
  Regards, 
  
  Kingsley 
  
  
   
Thank you again for all your efforts. I am looking forward to
seeing how this new endpoint works :) 

Are you planning to update the dataset regularly? 

All the best! 
Larry 

https://iccl.inf.tu-dresden.de/web/Larry_Gonzalez




On 11.01.23 21:51, Kingsley Idehen via Wikidata wrote: 
All, 
  
  We are pleased to announce the immediate availability of a new
  Virtuoso-hosted Wikidata instance based on the most recent
  datasets. This instance comprises 17 billion+ RDF triples. 
  
  Host Machine Info: 
  
  Item     Value 
  CPU      2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz 
  Cores    24 
  Memory   378 GB 
  SSD      4x Crucial M4 SSD 500 GB 
  
  
  Cloud-related costs for a self-hosted variant, assuming: 
  
    * dedicated machine for 1 year without upfront costs 
    * 128 GiB memory 
    * 16 cores or more 
    * 512 GB SSD for the database 

[Wikidata] monthly online OpenRefine / Wikimedia office hours

2022-03-10 Thread Sandra Fauconnier
Hello everyone,

As some of you may know, several developers are currently working on a
project (funded by a Wikimedia grant) to add Structured Data on Wikimedia
Commons functionalities to OpenRefine. See
https://commons.wikimedia.org/wiki/Commons:OpenRefine for more info.

We are making good progress on this project. As we will probably add new
Wikibase/Commons/Wikidata-related functionalities to OpenRefine regularly,
the OpenRefine team is starting to host monthly online office hours for
OpenRefine users from the Wikimedia community (including Wikidatans!). You
can meet and ask questions of other OpenRefine / Wikimedia users there, and
talk to members of the development team. These office hours are informal,
have no set agenda, and are held via Zoom but not recorded. Registration is
not needed.

For now, we have scheduled office hours until the end of June 2022. The time
of day alternates to accommodate participants from diverse time zones. If
these office hours prove to be popular, we will plan more of these later!

   - Tuesday, March 22, 2022 at 9AM UTC
   - Tuesday, April 19, 2022 at 4PM UTC
   - Tuesday, May 24, 2022 at 8AM UTC
   - Tuesday, June 21, 2022 at 4PM UTC

See
https://commons.wikimedia.org/wiki/Commons:OpenRefine#Join_OpenRefine_meetups_and_office_hours
where we will post the Zoom links. Feel free to drop by :-)

All the best,
Sandra (User:Spinster / User:SFauconnier)
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] invitation: join the WikidataCon 2021 iNaturalist BioBlitz!

2021-10-29 Thread Sandra Fauconnier
Hello everyone,

This weekend (from today until Sunday, end of the day), a few of us Wikidata
/ biodiversity enthusiasts are hosting a so-called iNaturalist BioBlitz. We
invite all of you to participate!

If you are attending WikidataCon, you are probably spending a lot of time
indoors behind your computer screen. Why not take a break, get some fresh
air, take a walk in your surroundings, and do some very easy citizen
science? Take some photos of the plants, insects, birds, mushrooms...
around you, and contribute data and images to a worldwide biodiversity
observation dataset which is also heavily re-used on Wikidata and other
Wikimedia projects, by many enthusiastic volunteers! :-)

Contributing is easy!

* Sign up for iNaturalist (you can check its privacy policy) if you haven't
done so already. Make sure to adjust the default license of your images to
a Wikimedia-compatible license (CC0 or CC-BY are preferred) via Account
settings > Content & Display.
* Go to https://www.inaturalist.org/projects/wikidatacon-2021-bioblitz and
join the project (top right of the page).
* Take a walk around your neighborhood and take photos of flora and fauna.
You can upload them to your iNaturalist account via one of its mobile apps
or through its web interface.
* Any observations that you make between today and Sunday (end of the day)
will automatically be added to the BioBlitz.

See the map of current observations here (scroll down). Please dethrone
me! :p Let's make this map a worldwide and very (bio)diverse one!

Have fun and see you at WikidataCon!
Sandra, Andra and Tiago
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Two open developer positions for OpenRefine (paid contractors, part time, remote)

2021-07-08 Thread Sandra Fauconnier
Hi everyone,

I'm very happy to announce:

OpenRefine [1] has two Junior Developer job openings (paid contractor
positions; part-time, fully remote) for building Structured Data on
Wikimedia Commons [2] functionalities.

Needless to say, we would love to receive applications from Wikimedians :-)

* Junior Developer - Wikimedia Development [3] (6 months, from September
2021 till February 2022)
* Junior Developer - OpenRefine Development [4] (8 months, from November
2021 till June 2022)

All the best!
Sandra (User:Spinster / User:SFauconnier)

[1] https://openrefine.org
[2] https://w.wiki/UR
[3]
https://openrefine.org/blog/2021/07/07/Wikimedia-Commons-reconciliation-batch-developer.html
[4] https://openrefine.org/blog/2021/07/07/OpenRefine-SDC-developer.html
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


Re: [Wikidata] Project Grant application for SDC support in OpenRefine: feedback and endorsements welcome

2021-03-16 Thread Sandra Fauconnier
Hello Thad,

I think this comment is merely an enthusiastic shout-out that means 'in my
opinion, OpenRefine rules (as in: it's a great solution) for data'. I know
the person who responded here and will check with them whether I'm correct.

Anyone is free to sign up to the project as a volunteer or advisor. There
are absolutely no obligations or 'rules' (as in 'regulations'). At this
moment, I see signing up as a sign that you are more interested than
average and are OK with, for instance, being approached directly on your
talk page when new features can be tested, or when we have questions or
assumptions that we would like to check with the community.
I will add this bit of info to the application text.

Sandra


On Tue, Mar 16, 2021 at 2:26 PM Thad Guidry  wrote:

> Hi Sandra and team!
>
> Great to see the application in its final form and kudos to you and
> Antonin for bringing this forward for the whole community.
>
> I do have 1 question on the application concerning the participants?
>
> * *Volunteer* OR rules data Ecritures
> <https://meta.wikimedia.org/wiki/User:Ecritures> (talk
> <https://meta.wikimedia.org/wiki/User_talk:Ecritures>) 12:19, 16 March
> 2021 (UTC)
>
> I did not see a mention of "rules" in any of the phases of
> development. Can you explain the volunteer effort needed for this (or
> maybe add some of this missing detail in the phase where needed) and how
> it pertains to any of the phases of development?
>
> Thad
> https://www.linkedin.com/in/thadguidry/
> https://calendly.com/thadguidry/
>
>
> On Tue, Mar 16, 2021 at 4:07 AM Sandra Fauconnier <
> sandra.fauconn...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> Since 2019, it is possible to add structured data to files on Wikimedia
>> Commons [1] (SDC = Structured Data on Commons). But there are no very
>> advanced and user-friendly tools yet to edit the structured data of very
>> large and very diverse batches of files on Commons. And there is no batch
>> upload tool yet that supports SDC.
>>
>> The OpenRefine [2] community wants to fill this gap: in the upcoming
>> year, we would like to build brand new features in the open source
>> OpenRefine tool, allowing batch editing and batch uploading SDC :-) As
>> these are major new functionalities in OpenRefine, we have applied for a
>> Project Grant [3]. Your feedback [4] and (if you support this plan)
>> endorsements are very welcome.
>>
>> Thanks in advance, and many greetings,
>>
>> Sandra (User:Spinster / User:SFauconnier) as member of the OpenRefine
>> steering committee
>>
>> Antonin (User:Pintoch) as OpenRefine developer
>>
>> [1] https://commons.wikimedia.org/wiki/Commons:Structured_data
>>
>> [2] https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine
>>
>> [3]
>> https://meta.wikimedia.org/wiki/Grants:Project/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine
>>
>>
>> [4]
>> https://meta.wikimedia.org/wiki/Grants_talk:Project/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Project Grant application for SDC support in OpenRefine: feedback and endorsements welcome

2021-03-16 Thread Sandra Fauconnier
Hello everyone,

Since 2019, it is possible to add structured data to files on Wikimedia
Commons [1] (SDC = Structured Data on Commons). But there are no very
advanced and user-friendly tools yet to edit the structured data of very
large and very diverse batches of files on Commons. And there is no batch
upload tool yet that supports SDC.

The OpenRefine [2] community wants to fill this gap: in the upcoming year,
we would like to build brand new features in the open source OpenRefine
tool, allowing batch editing and batch uploading SDC :-) As these are major
new functionalities in OpenRefine, we have applied for a Project Grant [3].
Your feedback [4] and (if you support this plan) endorsements are very
welcome.

Thanks in advance, and many greetings,

Sandra (User:Spinster / User:SFauconnier) as member of the OpenRefine
steering committee

Antonin (User:Pintoch) as OpenRefine developer

[1] https://commons.wikimedia.org/wiki/Commons:Structured_data

[2] https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine

[3]
https://meta.wikimedia.org/wiki/Grants:Project/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine


[4]
https://meta.wikimedia.org/wiki/Grants_talk:Project/Structured_Data_on_Wikimedia_Commons_functionalities_in_OpenRefine
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Structured Data on Commons at the Wikimedia Hackathon in Barcelona

2018-05-09 Thread Sandra Fauconnier

Hi all!

Several members of the Structured Data on Commons team will be at the 
Wikimedia Hackathon in Barcelona next week. Join us there!


We will do an introduction and Q&A session, a session focused on GLAM 
projects, and you are warmly welcome to join us at the dedicated 
Structured Commons corner/table(s) we'll have on location. Come and have 
a chat!


More info here: 
https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2018/Structured_Commons 
- please add your username to the list of participants if you're interested.


All the best! Sandra - on behalf of the team :-)

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Fwd: [Commons-l] IRC office hours - Structured Commons - Tuesday 13 February

2018-02-06 Thread Sandra Fauconnier

(This time as inline message! Apologies. -Sandra)

-

Greetings,

As the subject line says, there will be a Wikimedia Foundation-hosted 
IRC office hours [0] this coming Tuesday, 13 February 2018, from 
18:00-19:00 UTC [1]. The topic is Structured Data on Commons, and the 
subjects are mainly whatever those who attend would like to discuss. The 
Structured Data hub has information about what the development team has 
been up this past year as well as upcoming plans [2] for those who might 
like to prepare or find interesting things to talk about.


The Structured Data team also issues a newsletter every few months. You 
can subscribe to have it delivered to a talk page, receive a 
notification instead delivery, and read past issues. Find out more on 
Meta [3].


I'll be sending out a reminder email a few hours before this occurs in 
one week's time.


Thank you for your time, I hope to see you there.

0. https://meta.wikimedia.org/wiki/IRC_office_hours
1. To check your local date and time for the office hours < 
https://www.timeanddate.com/worldclock/fixedtime.html?hour=18=00=0=13=02=2018 
>

2. https://commons.wikimedia.org/wiki/Commons:Structured_data/Development
3. 
https://meta.wikimedia.org/wiki/Global_message_delivery/Targets/Structured_Data_on_Commons


--
Keegan Peterzell
Technical Collaboration Specialist
Wikimedia Foundation

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Fwd: [Commons-l] IRC office hours - Structured Commons - Tuesday 13 February

2018-02-06 Thread Sandra Fauconnier

Relevant for Wikidatans too! All welcome.

 Forwarded Message 
Subject: 	[Commons-l] IRC office hours - Structured Commons - Tuesday 13 
February

Date:   Tue, 6 Feb 2018 14:30:48 -0600
From:   Keegan Peterzell 
Reply-To: 	Wikimedia Commons Discussion List 






Greetings,

As the subject line says, there will be a Wikimedia Foundation-hosted 
IRC office hours [0] this coming Tuesday, 13 February 2018, from 
18:00-19:00 UTC [1]. The topic is Structured Data on Commons, and the 
subjects are mainly whatever those who attend would like to discuss. The 
Structured Data hub has information about what the development team has 
been up this past year as well as upcoming plans [2] for those who might 
like to prepare or find interesting things to talk about.


The Structured Data team also issues a newsletter every few months. You 
can subscribe to have it delivered to a talk page, receive a 
notification instead of delivery, and read past issues. Find out more on 
Meta [3].


I'll be sending out a reminder email a few hours before this occurs in 
one week's time.


Thank you for your time, I hope to see you there.

0. https://meta.wikimedia.org/wiki/IRC_office_hours
1. To check your local date and time for the office hours < 
https://www.timeanddate.com/worldclock/fixedtime.html?hour=18=00=0=13=02=2018 
>

2. https://commons.wikimedia.org/wiki/Commons:Structured_data/Development
3. 
https://meta.wikimedia.org/wiki/Global_message_delivery/Targets/Structured_Data_on_Commons


--
Keegan Peterzell
Technical Collaboration Specialist
Wikimedia Foundation
___
Commons-l mailing list
common...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Fwd: Call for Papers: EuropeanaTech 2018 Conference

2018-01-18 Thread Sandra Fauconnier
Hi all!

Here's a call for proposals for the EuropeanaTech conference, which will
take place in Rotterdam, May 15-16, 2018.
https://pro.europeana.eu/event/europeanatech-conference-2018

Some of the suggested topics are very Wikidata- and Wikimedia-related.

Best! Sandra (User:Spinster)

-- Forwarded message --
From: Gregory Markus 
Date: Thu, Jan 18, 2018 at 9:17 AM
Subject: Call for Papers: EuropeanaTech 2018 Conference
To: europeana-t...@list.ecompass.nl


Dear EuropeanaTech community

EuropeanaTech is about the practical application of research concepts and
the latest technologies to digital libraries. For this edition of
EuropeanaTech, we concentrate on *the three D’s: Data, Discovery and
Delivery*. Intertwined are the concepts of participation, linked and big
data; language and tools. Across all the subjects we are looking for the
inclusion of rigorous evaluations of the outcomes.

The conference will be a mix of invited speakers and successful
presentations from this call. We are not expecting an academic paper but a
lively presentation of work that you have been doing under the subjects
below. We are as interested in the glorious failures as we are in the
gorgeous successes.
Submission Guidelines

Please submit your proposal *by February 7*. It should contain a title, an
abstract of 250 words, some keywords and a two-sentence evaluation of its
practical benefits or learning. The Programme Committee will evaluate all
the submitted proposals and will notify you before the end of February if
your proposal has been selected for presentation. *We have room for up to
15 presentations* to be given at the conference as a result of this call.
The conference fee and your travelling costs will be covered if your
presentation is chosen.

Submissions are to be made via EasyChair:
https://easychair.org/conferences/?conf=eurtech18
List of Topics

*DATA*

   1. *User generated content and metadata:* from crowdsourcing of
   descriptive data and transcription projects, to Wikidata and structured
   data on the Commons, to how to combine institutional and user-generated
   metadata. We are looking for what has worked, or what hasn’t and can be
   done better.

   2. *Enhancing the results of digitisation:* various applications connect
   the act of digitisation with required data processes for the use of the
   data. What are the latest techniques, have they been applied at scale, do
   they work in the more challenging audio-visual areas? We are interested
   in everything from 3D capture, OCR, sound/video labelling, named entity
   recognition and feature detection, to machine or deep learning to help
   classify and categorise the digitised data.

   3. *Decentralisation vs Centralisation:* We know that aggregation works
   as a process to bring together disparate data, standardise and normalise
   it, and make it available to other parties, but we also know that this is
   labour-intensive, very hierarchical, and does not distribute knowledge
   and expertise. On the other hand, more decentralised ways of working have
   yet to be really proven in practice. Presentations that give the latest
   thinking on how we can best enable access to cultural heritage data and
   reduce friction costs are welcome, particularly with evaluation of the
   relative strengths and weaknesses.

   4. *Multilingualism:* Google has more or less cracked full-text
   translation of mainstream languages, but we are still struggling with
   niche languages and metadata. Presentations that evaluate the current
   thinking or give insights into the latest work in the area would fit well
   in this section on the creation and use of multilingual data in Cultural
   Heritage.

*DISCOVERY*

   1. *User Interaction:* Search is still the dominant means of gaining
   access to the wealth of cultural heritage data now online, but does it
   represent that wealth? Search is ungenerous: it withholds information and
   demands a query; what are the alternatives? Papers on generous interfaces
   and frictionless design are sought to shed new light on how Cultural
   Heritage can show itself more deeply, evaluating the benefits and
   weaknesses to the user in the process.

   2. *Artificial Intelligence:* For this subject, topics ranging from
   machine learning to neural network-based approaches to Cultural Heritage
   are welcome. This includes applications of AI from image feature
   recognition to natural language processing, and from building search
   interfaces on feature/colour similarity between images and discovery to
   the use of human metadata and computer vision. We would also be
   interested in the audio and moving image equivalents. Anything dealing
   with the combination of metadata tags, image similarity and machine
   learning based on user input would be very relevant, as would Artificial
   Intelligence technology for content curation.

*DELIVERY*

   1.

   

Re: [Wikidata] [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

2017-11-25 Thread Sandra Fauconnier
It would be great to have a federated Wikibase on meta.wikimedia.org 
<http://meta.wikimedia.org/> specifically to manage these kinds of projects, 
institutions we work with, etc.
I saw that idea appear on wishlists somewhere but, I guess, given the current 
workload for other projects, this is not going to be implemented soon, so an 
‘in between’ solution that is easily transferable to external federated 
Wikibase solutions later would be good.

-- 
Sandra Fauconnier
sandra.fauconn...@gmail.com
http://www.spinster.be



> On 25 Nov 2017, at 16:01, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
> 
> Hoi,
> I have been "abusing" the property catalog for some time now just to get this 
> effect. I use it to identify items that are part of a project like the "Black 
> Lunch Table". It works really well; it is used, for instance, to identify items 
> in queries; by associating it with a location, we know the subjects of / for an 
> editathon. 
> 
> The principle is the same
> Thanks,
>  GerardM
> 
> On 25 November 2017 at 05:42, John Erling Blad <jeb...@gmail.com 
> <mailto:jeb...@gmail.com>> wrote:
> Implicit heterogeneous unordered containers where members sees a homogeneous 
> parent. The member properties should be transitive to avoid the maintenance 
> burden, like a "tracking property", and also to make the parent item 
> manageable.
> 
> I can't see anything that needs any kind of special structure at the entity 
> level. Not even sure whether we need a new container for this, claims are 
> already unordered containers.
> 
> On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett <a...@pigsonthewing.org.uk 
> <mailto:a...@pigsonthewing.org.uk>> wrote:
> On 24 November 2017 at 23:30, Dario Taraborelli
> <dtarabore...@wikimedia.org <mailto:dtarabore...@wikimedia.org>> wrote:
> 
> > I'd like to propose a fairly simple solution and hear your feedback on
> > whether it makes sense to implement it as is or with some modifications.
> >
> > create a Wikidata class called "Wikidata item collection" [Q-X]
> 
> This sounds like Wikimedia categories, as used on Wikipedia and
> Wikimedia Commons.
> 
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk <http://pigsonthewing.org.uk/>
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/wikidata 
> <https://lists.wikimedia.org/mailman/listinfo/wikidata>
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/wikidata 
> <https://lists.wikimedia.org/mailman/listinfo/wikidata>
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Reminder: Structured Commons IRC office hour starts in a bit more than an hour

2017-11-21 Thread Sandra Fauconnier

Hi all!

In a bit more than an hour - at 18:00 UTC - the IRC office hour about 
Structured Data on Wikimedia Commons will start in the IRC channel 
#wikimedia-office. We plan to give many updates about the project, and 
of course there is also room for questions. The log will be published 
afterwards.


Hope to see you there! Sandra

--
Sandra Fauconnier
Community Liaison for Structured Data on Wikimedia Commons, Wikimedia 
Foundation

sfauconn...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Structured Commons newsletter, October 25, 2017

2017-10-25 Thread Sandra Fauconnier
Hello everyone! Here's the most recent news about Structured Data on
Wikimedia Commons
<https://commons.wikimedia.org/wiki/Commons:Structured_data>.


Community updates

   - Rama <https://commons.wikimedia.org/wiki/User:Rama> published an article
   about Structured Commons in Arbido, a Swiss online magazine for
   archivists, librarians and documentalists: original in French, illustrated
   <http://arbido.ch/fr/edition-article/2017/metadonn%C3%A9es-donn%C3%A9es-de-qualit%C3%A9/donn%C3%A9es-structur%C3%A9es-la-puissance-de-wikidata-au-service-de-wikimedia-commons>
   and the article translated in English
   <https://commons.wikimedia.org/wiki/User:Rama/Structured_Data>.

   - We now have a dedicated IRC channel: wikimedia-commons-sd webchat
   <https://webchat.freenode.net/?channels=#wikimedia-commons-sd>

Things to do / input and feedback requests
<https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved>

   - Join the community focus group
   <https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Community_focus_group>!

   - Translation. Do you want to help out translating messages about
   Structured Data on Commons from English to your own language? Sign up on
   the translators' page
   <https://meta.wikimedia.org/wiki/Structured_Data_on_Commons/Newsletter/Translators>.

   - The documentation and info pages about Structured Data on Commons
   <https://commons.wikimedia.org/wiki/Commons:Structured_data> have
   received a thorough update, in order to get them ready for all the upcoming
   work. Obsolete pages were archived. There are undoubtedly still a lot of
   omissions and bits that are unclear. You can help by editing boldly, and by
   leaving feedback and tips on the talk pages.

   - We have started to list tools, gadgets and bots
   <https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Tools>
   that might be affected by Structured Commons in order to prepare for a
   smooth transition to the new situation. You can help by adding alerts
   about/to specific tools and developers on the dedicated tools page
   <https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Tools>.
   You can also create Phabricator tasks
   <https://phabricator.wikimedia.org/T173971> to help keep track of this.
   Volunteers and developers interested in helping out with this process are
   extremely welcome - please sign up
   <https://commons.wikimedia.org/wiki/Commons:Structured_data/Get_involved/Tools>!

   - Help write the next Structured Commons newsletter
   <https://meta.wikimedia.org/wiki/Structured_Data_on_Commons/Newsletter/Next>.

Presentations / Press / Events
<https://commons.wikimedia.org/wiki/Commons:Structured_data/About/Press_and_presentations>

   - Structured Data on Commons was presented at Wikimania 2017 in Montréal
   for a packed room. First design sketches for search functionality were
   discussed during a breakout session. Read the Etherpad reports of the
   presentation
   <https://etherpad.wikimedia.org/p/Wikimedia2017-StructuredDataPresentation>
   and the breakout session
   <https://etherpad.wikimedia.org/p/Wikimedia2017-StructuredDataDesignDiscussion>.

   - Katherine Maher, Executive Director of the Wikimedia Foundation, answered
   questions on Quora <https://www.quora.com/session/Katherine-Maher/1>. One
   of her answers
   <https://www.quora.com/Whats-your-take-on-VideoWiki-building-a-multi-media-encyclopedia>,
   mentioning Structured Data on Commons, was republished on Huffington Post
   <https://www.huffingtonpost.com/entry/wikipedia-is-working-on-making-its-treasure-troves_us_59d2e800e4b03905538d17ea>.

   - Sandra Fauconnier, Amanda Bittaker and Ramsey Isler from the Structured
   Commons team will be at WikidataCon. Sandra presents Structured Commons
   there
   <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/Structured_Data_on_Wikimedia_Commons:_what%27s_coming,_and_how_to_be_involved_as_Wikidatans>
   (with a focus on fruitful collaboration between the Wikidata and Commons
   communities). If you attend the conference, don't hesitate to say hi and
   have a chat with us! (phabricator task T176858
   <https://phabricator.wikimedia.org/T176858>)

Team updates

Two new people have been hired for the Structured Data on Commons team. We
are now complete! :-)

   - Ramsey Isler is the new Product Manager of the Multimedia team.

   - Pamela Drouin was hired as User Interface Designer. She works on the
   Multimedia team as well, and her work will focus on the Structured
   Commons project.

Partners and allies

   - We are still welcoming (more) staff from GLAMs (Galleries, Libraries,
   Archives and Museums) to become part of our long-term focus group
   (phabricator task T17

[Wikidata] Fwd: Call for Participation: Lexical Data Masterclass

2017-10-12 Thread Sandra Fauconnier
Hi all! Maybe of interest for those involved in 
Wikidata/Wiktionary/lexicographical data… Best! Sandra

> -- Forwarded message --
> From: DARIAH-EU >
> Date: 2017-10-12 13:19 GMT+02:00
> Subject: Call for Participation: Lexical Data Masterclass
> To: rient...@wikimedia.nl 
> 
> 
> DARIAH Update
> The DARIAH Update contains important news from the DARIAH network, which we 
> share to inform you. Please feel free to disseminate this information through 
> your networks.
> 
> Call for Participation: Lexical Data Masterclass
> 
> The Lexical Data Master Class aims at bringing together 20 trainees together 
> with experts to share experiences, methods and techniques for the creation, 
> management and use of digital lexical data.
> 
> Background and Motivation
> Co-organized by DARIAH, the Berlin Brandenburg Academy of Sciences (BBAW), 
> Inria (Paris, France) and the Belgrade Center for Digital Humanities, with 
> the support of the German Ministry of Education and Research (BMBF), Clarin 
> and DARIAH-DE, the Lexical Data Master Class will take place in Berlin at the 
> BBAW from 4 to 8 December 2017.
> 
> This Master Class is part of a joint French-German program supported by the 
> BMBF and MESRI (French Ministry for Higher Education, Research and 
> Innovation). The masterclass will cover a wide range of topics ranging from 
> general models for lexical content and TEI-based representation of lexical 
> data to managing digital lexica as online resources and working efficiently 
> with XML editors. The participants will have a chance to attend two keynote 
> talks as well as different sessions on a variety of lexicon-related topics, 
> as well as to consult with experts on their own dictionary projects.
> 
> Keynote speakers will be James Pustejowksy (Brandeis University) and Frieda 
> Steurs (KU Leuven).
> 
> Training material produced for and during the master-class will be converted, 
> revised and submitted for peer-reviewed publication  in #dariahTeach 
> (https://teach.dariah.eu/ ), an online training 
> platform, partially as supplements to the existing #dariahTeach course on 
> Digitizing Legacy Dictionaries, and partially in the form of dariahTeach 
> Workshops (on topics such as “XPath for Lexicographers” or “Representing 
> Morphological Data in Digital Lexica”). Dictionary samples used and annotated 
> during the Master Class will be published in the ENEL WG2 / PARTHENOS GitHub 
> repository.
> 
> Prerequisites
> Potential applicants should submit a short proposal presenting their 
> background and interest in the field together with a description of a 
> particular project involving lexical data that they would like to pursue 
> during the master class.
> 
> Application Process
> Potential applicants should submit a short proposal presenting their 
> background and interest in the field together with a description of a 
> particular project involving lexical data that they would like to pursue 
> during the master class.
> 
> Participation is free of charge. Travel costs and accommodation will be 
> covered for all participants up to a maximum of 600€.
> 
> Applications should be made via the Lexical Master Class website: 
> https://lexmc.sciencesconf.org 
> 
> Deadline for application: 20th October 2017
> Notification of acceptance: 27th October 2017
> 
> Further inquiries can be made to: le...@sciencesconf.org 
> 
> Looking for more information on the DARIAH network? 
> Subscribe to DARIAH-EU's monthly Newsletter:
> http://dariah.eu/library/newsletter.html 

Re: [Wikidata] Wikidata reconciliation service and Open Refine

2017-01-27 Thread Sandra Fauconnier
+1 from someone who would be so extremely happy (and much more productive) if 
such a service were implemented in OpenRefine.

I also added it as a task to Phabricator, feel free to comment, add 
suggestions… https://phabricator.wikimedia.org/T146740 


Best, Sandra/User:Spinster

> On 26 Jan 2017, at 19:00, Thad Guidry  wrote:
> 
> Everyone,
> 
> Yes, our OpenRefine API can use Multiple Query Mode (reconciling an Entity by 
> using multiple columns/ WD properties)
> 
> https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode
>  
> 
> I do not think that Magnus has implemented our Multiple Query Mode yet, 
> however.
> The bounty issue https://github.com/OpenRefine/OpenRefine/issues/805 
>   that I created and 
> funded on BountySource.com is to fully implement the Multiple Query Mode API 
> and ensure that it works correctly in OpenRefine 2.6 RC2 latest.
> 
> Happy Hacking anyone :)
> Let us know if we can answer any questions regarding OpenRefine or the 
> Reconcile API , on our own mailing list.
> http://groups.google.com/group/openrefine/ 
> 
> 
> -Thad
> 
> 
> On Thu, Jan 26, 2017 at 11:18 AM AMIT KUMAR JAISWAL  > wrote:
> Hey Alina,
> 
> Thanks for letting us know about this.
> 
> I'll start testing it after configuring OpenRefine(as it's API is
> implemented in WMF).
> 
> Can you share me the open task related to this?
> 
> Cheers,
> Amit Kumar Jaiswal
> 
> On 1/26/17, Antonin Delpeuch (lists)  > wrote:
> > Hi Magnus,
> >
> > Mix'n'match looks great and I do have a few questions about it. I'd like
> > to use it to import a dataset, which looks like this (these are the 100
> > first lines):
> > http://pintoch.ulminfo.fr/34f8c4cf8a/aligned_institutions.txt 
> > 
> >
> > I see how to import it in Mix'n'match, but given all the columns I have
> > in this dataset, I think that it is a bit sad to resort to matching on
> > the name only.
> >
> > Do you see any way to do some fuzzy-matching on, say, the URLs provided
> > in the dataset against the "official website" property? I think that it
> > would be possible with the (proposed) Wikidata interface for OpenRefine
> > (if I understand the UI correctly).
> >
> > In this context, I think it might even be possible to confirm matches
> > automatically (when the matches are excellent on multiple columns). As
> > the dataset is rather large (400,000 lines) I would not really want to
> > validate them one after the other with the web interface. So I would
> > need a sort of batch edit. How would you do that?
> >
> > Finally, once matches are found, it would be great if statements
> > corresponding to the various columns could be created in the items (if
> > these statements don't already exist). With the appropriate reference to
> > the dataset, ideally.
> >
> > I realise this is a lot to ask - maybe I should just write a bot.
> >
> > Alina, sorry to hijack your thread. I hope my questions were general
> > enough to be interesting for other readers.
> >
> > Cheers,
> > Antonin
> >
> >
> > On 26/01/2017 16:01, Magnus Manske wrote:
> >> If you want to match your list to Wikidata, to find which entries
> >> already exist, have you considered Mix'n'match?
> >> https://tools.wmflabs.org/mix-n-match/ 
> >> 
> >>
> >> You can upload your names and identifiers at
> >> https://tools.wmflabs.org/mix-n-match/import.php 
> >> 
> >>
> >> There are several mechanisms in place to help with the matching. Please
> >> contact me if you need help!
> >>
> >> On Thu, Jan 26, 2017 at 3:58 PM Magnus Manske
> >>  
> >> >> 
> >> wrote:
> >>
> >> Alina, I just found your bug report, which you filed under the wrong
> >> issue tracker. The git repo (source code, issue tracker etc.) are
> >> here:
> >> https://bitbucket.org/magnusmanske/reconcile 
> >> 
> >>
> >> The report says it "keeps hanging", which is so vague that it's
> >> impossible to debug, especially since the example linked on
> >> https://tools.wmflabs.org/wikidata-reconcile/ 
> >> 
> >> works perfectly fine for me.
> >>
> >> Does it not work at all for you? Does it work for a time, but then
> >> stops? Does it "break" reproducibly on specific queries, or at
> >> random? Maybe it breaks for specific 

Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-14 Thread Sandra Fauconnier
What you are encountering here, is a major bottleneck and timesuck for any data 
import into Wikidata. Matching external lists of concepts (names of people, 
places, buildings, whatever) from external datasets correctly with the right 
Wikidata items is a thing that always takes me hours and hours and hours of 
work.

In order to solve it, we need a working and user-friendly reconciliation tool 
that is integrated into a common data management platform (i.e. OpenRefine, and 
would also be fantastic to have it for Google Spreadsheets).

Magnus has developed a basic API for it, but a working and user-friendly 
interface in one of those tools mentioned above is the missing link.

I want to emphasize again that there is a bounty (money!) to be earned 
for those who develop this for OpenRefine.

I have outlined the task in Phabricator too. 
https://phabricator.wikimedia.org/T146740 


Just putting this out here to give it attention again. It is such an important 
missing link in the workflow of anyone who wants to import data into Wikidata.
I’m so desperate for it that I’m considering to collect funding and then hire 
an external developer to make it, but of course it would be best if it would be 
developed and maintained from within our community ;-)

Greetings, Sandra

> On 13 Oct 2016, at 11:16, Markus Bärlocher  
> wrote:
> 
> Hi Tom,
> 
>> This is a lighthouse case for my Google Sheets add-on 
> 
> Great tool - thanks!
> And more great tools included there :-)
> 
>> just add new terms to the "Terms" column, everything else fills 
>> automagically.
> 
> I checked the first results by hand:
> 30% of the found WP articles are specifically helpful
> 70% of the URLs lead to content that does not match
> 
> My idea:
> A "reliability index" may be could help?
> 
> (1. handy approved accordance of Term and WP-article)
> 2. Term and Lemma identical
> 3. Term and section title identical
> 4. all words in Term found in Lemma
> 5. all words in Term found in section title
> 6. Term found as string in article text
> 
> But I have no idea how to do this myself:
> https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/issues/11
> 
> Best regards,
> Markus
> 
> 
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Possibility of data lock

2016-06-10 Thread Sandra Fauconnier

> On 10 Jun 2016, at 12:39, Yellowcard  wrote:

> However, there are single statements (!) that
> are proven to be correct (possibly in connection with qualifiers) and
> are not subject to being changed in future. Locking these statements
> would make it much less risky to obtain them and use them directly in
> Wikipedia. What would be the disadvantage of this, given that slightly
> experienced users can still edit them and the lock is only a protection
> against anonymous vandalism?

I agree 100%, and would like to add (again) that this would also make our data 
more reliable for re-use outside Wikimedia projects.

There’s a huge range of possibilities between locking harshly (no-one can edit 
it anymore) and leaving stuff entirely open. I disagree that just one tiny step 
away from ‘entirely open’ betrays the wiki principle.

Of course I’m in favour of all improvements to watchlist systematics. However, 
with 100,000+ items I’ll probably be watching at some point, even great tools 
might not be enough to catch everything.
And I’d indeed like to focus all my time on constructive new edits, and 
advocating the great work we do ;-)

Sandra
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Possibility of data lock

2016-06-10 Thread Sandra Fauconnier

> On 09 Jun 2016, at 15:25, Julie McMurry  wrote:
> 
> How big a problem is fact vandalism? It may be less likely to be 
> detected/fixed in languages for which there are fewer editors. Only if a big 
> problem, I'd suggest that specific text (not whole articles) be protected, 
> but not locked. Eg implementing a requirement for confirmation by multiple 
> editors before it is published. A lock would be too likely to thwart 
> legitimate edits and could be abused by moderators.
> 
> Some ostensibly hard facts do in fact change over time. Even the measurement 
> of the mass of the electron took years to perfect.

Correct. But then, also, for the history of science it is valuable to know how 
that measurement has evolved over the years.
So you could have something like:
- mass of the electron has [value X] / statement with reliable reference and 
‘deprecated’ status / valid from a certain date until a later date / 
<- statement protected
- mass of the electron has [value Y] / statement with reliable reference and 
‘deprecated’ status / valid from that later date until an even later date / 
<- statement protected
- etc.
- (current situation) mass of the electron has [the current value] / statement 
with reliable reference and ‘preferred’ status / valid from a certain date / 
<- statement protected as soon as it has its reliable reference
- New research? Add a new statement, give it ‘preferred’ status, give the 
previous one ‘deprecated’ status.
Awesome stuff for science historians.
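
(As an aside, this kind of history is already queryable: in Wikidata’s RDF 
mapping every statement carries a rank plus its qualifiers, so one query can 
return all values instead of only the current one. A rough sketch, assuming 
P2067 ‘mass’, Q2225 ‘electron’ and the usual start/end time qualifiers P580 
and P582:)

'''
SELECT ?value ?rank ?from ?until
WHERE
{
  wd:Q2225 p:P2067 ?stmt .             # every mass statement, not only the current one
  ?stmt ps:P2067 ?value ;
        wikibase:rank ?rank .          # PreferredRank / NormalRank / DeprecatedRank
  OPTIONAL { ?stmt pq:P580 ?from . }   # start of validity
  OPTIONAL { ?stmt pq:P582 ?until . }  # end of validity
}
'''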

I have seen many frustrating cases of merges and changes to ‘good’ statements 
too; not all are due to vandalism, some can also be attributed to lack of 
experience or to agendas, for instance. And having a hard time to keep track of 
it via my watchlist. I’m very much in favour of a system where we have 
semi-protection of statements with reliable references, approved by a certain 
number of trustworthy editors, and editable only by trustworthy editors. (I 
know this is a very tricky thing to organise…..)

I have a hunch that this would also make Wikidata much more attractive for 
external parties. In informal discussions with GLAMs, for instance, this issue 
comes up all the time: how can we really trust that the data on Wikidata is 
good? Why should we link our own databases to Wikidata and re-use its data if 
anyone can add nonsense there? Is there a way to indicate that certain stuff on 
Wikidata is reliable?

Best, Sandra






___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] From the game to the competition

2016-02-07 Thread Sandra Fauconnier
Wow, these are great links, Lydia, thanks.

I, for one, would warmly welcome more well-designed games, especially in the 
distributed game framework that Magnus has built.
Not so much for the playfulness, but because it’s such an easy way to do many 
useful edits without needing deep concentration, and because I find it really 
interesting to see all the kind of content that we cover (the games allow me to 
get out of the ‘filter bubble’ in which I usually edit, which is the field of 
the visual arts).

Game-like tools that I would like to see, would include
- a sourcing game to add reference URLs from RKDartists to statements related 
to artists (birth dates, death dates, places of birth and death, professions)
- a nice and pleasant interface that allows me to state what is depicted in an 
artwork
- a better game to add Thesaurus of Geographic Names IDs to geographic entities. 
The TGN is now in the distributed and ‘normal’ Mix’n’Match version but is hard 
to match in these. It really needs a good interface with a large map for ‘our’ 
items next to the more detailed geographical info contained in the TGN (tree 
view; is it a city/river/mountain…)

A competition element, on the other hand, would really put me off. I don’t care 
at all, I’m not in it for that and it would chase me away very quickly.

Sandra


> On 07 Feb 2016, at 10:00, Lydia Pintscher  
> wrote:
> 
> On Sat, Feb 6, 2016 at 11:08 PM David Abián  > wrote:
> Hi folks,
> 
> It's fantastic to see that we have such interesting tools to contribute
> to Wikidata like Magnus' games.
> 
> With Wikidata Game and The Distributed Game as a base, I think we could
> go further and get a tool that serves, not only as a game, but as a real
> competition. In particular, with the following additions and a few
> suggestions, I believe we could celebrate great /in situ/ Wikidata
> competitions over the world:
> 
> * A chronometer with a start and a scheduled end while contributions are
> registered for the contest.
> * Some quorum (e.g., three) so that edits in the contest are only
> applied to Wikidata if that quorum of people agrees on an answer.
> * A scoring system that only provide points (or much more points) to
> those who get a quorum. This avoids people answering randomly while they
> destroy Wikidata and earn more and more points.
> * A way to show the same questions to the quorum number of participants
> during the competition.
> * A real-time ranking in the competition scope.
> * A way to manage the list of participants and to register an
> administrator, or multiple ones, for every contest.
> 
> Would this be a good idea? Would anyone like to develop some of these
> features?
> 
> 
> Please be aware that a large amount of gamification systems actually hurt the 
> community they are trying to build. They (usually) only work in the 
> short-term and make the situation worse in the longer-term. The way the 
> current Wikidata Game is doing it is very good.
> Here are some articles to read up on this topic:
> * https://www.feverbee.com/participation-for-intrinsic-reasons/ 
>  
> * https://www.feverbee.com/blunt-instruments/ 
> 
> * https://www.feverbee.com/reputationsystems/ 
> 
> * https://www.feverbee.com/bigparticipationimpact/ 
> 
> * https://www.feverbee.com/dont-use-recognition-shortcuts/ 
> 
> * https://www.feverbee.com/superfans/ 
> 
> I can recommend the blog above for all kinds of interesting research-backed 
> information around communities.
> 
> Cheers
> Lydia 
> -- 
> Lydia Pintscher - http://about.me/lydia.pintscher 
> 
> Product Manager for Wikidata
> 
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de 
> 
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> 
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter 
> der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für 
> Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] staying mutually up-to-date with external authority sources was: Re: Duplicates in Wikidata

2015-12-31 Thread Sandra Fauconnier
This exchange with VIAF/OCLC is only one case. We are increasingly becoming a 
hub for authority sources/files/records, and staying mutually up-to-date with 
them is (IMHO) really important.
And indeed, we don’t have reliable tools/workflows in place yet to take good 
care of this…

I haven’t given it much thought yet either, but what occurs from the top of my 
mind (among other things):
1. We have changes and updates on our own side (sometimes in error, sometimes 
correctly)
1.1 We add new entries that correspond with not-yet-linked entries in external 
authority files
1.2 We remove duplicates in Wikidata
1.3 We add errors that may or may not be tracked by the external party (e.g. 
the VIAF case here) -> this feedback is indeed gold to us

2. There are changes and updates on the authority file’s side
2.1 Entries are added on their side, and some or all may correspond to items on 
Wikidata
2.2. Entries are removed on their side (RKDartists has deleted many entries 
recently, for instance)
2.3 We discover errors, duplicates… in the external authority file -> this 
information is probably gold to the external party too

I have the feeling that this is one large issue that would ideally be covered 
in one overall solution/system. Mix’n’Match is the first step in that 
direction, but is only very rudimentary and doesn’t catch all issues described 
above.

Any thoughts or input on this? (I’m not a developer, just an active 
user/contributor to this…)

Greetings, Sandra/User:Spinster

> On 28 Dec 2015, at 20:29, Tom Morris  wrote:
> 
> I think there are at least two uses for information like this.  Fixing the 
> actual errors is good, but perhaps more important is looking at why they 
> happened in the first place.  Are there infrastructure/process issues which 
> need to be improved? Are there systemic problems with particular tool chains, 
> users, domains, etc? What patterns does the data show?
> 
> I've attached a munged version of the list in a format which I find a little 
> easier to work with and added Wikidata links
> 
> Looking at the 30 oldest entities, 22 (!) of the duplicates were added by a 
> single user (bot?) who was adding botanists, apparently based on data from 
> the International Plant Names Index 
> , without first checking to see 
> if they already existed.  The user's page indicates that they've had 2.5 
> *million* items deleted or merged (~12% of everything they've added).  I'd 
> hope to see high volume users/bots/tools in the 99%+ range for quality, not 
> <90%.
> 
> One pair is not a duplicate, but rather a father 
>  & son 
>  with the same name, apparently 
> flagged because they were both born in the 2nd century and died in the 3rd 
> century, making them a "match."
> 
> A few of remaining the duplicates were created by a variety of bots importing 
> Wikipedia entries with incompletely fused sitelinks (not terribly surprising 
> when the only structured information is a name and a sitelink).
> 
> The last few pairs of duplicates don't really have enough provenance to 
> figure out the source of the data.  One was created just a couple of weeks 
> ago by a bot 
>  using 
> "data from the Rijksmuseum" (no link or other provenance given), apparently 
> without checking for existing entries first.  A few 
>  others 
>  was 
> created by Widar 
> , but I 
> can't tell what game, what data source, etc.
> 
> Looking at three pairs of entries which were created at nearly the same time 
> (min QNumberDelta), each pair was created by a single game/bot, indicating 
> inadequate internal duplicate checks on the input data.
> 
> It seems like post hoc analysis of merged entries to mine for patterns would 
> be a very useful tool to identify systemic issues. Is that something that is 
> done currently?
> 
> Tom
> 
> 
> 
> On Wed, Dec 23, 2015 at 5:05 PM, Proffitt,Merrilee  > wrote:
> Hello colleagues,
> 
>  
> 
> During the most recent VIAF harvest we encountered a number of duplicate 
> records in Wikidata. Forwarding on in case this is of interest (there is an 
> attached file – not sure if that will go through on this list or not).
> 
>  
> 
> Some discussion from OCLC colleagues is included below.
> 
>  
> 
> Merrilee Proffitt, Senior Program Officer
> OCLC Research
> 
>  
> 
> From: Toves,Jenny 
> Sent: Tuesday, December 22, 2015 6:02 AM
> To: Proffitt,Merrilee
> Subject: FW: 201551 vs 201552
> 
>  
> 
> Good morning Merrilee,
> 
>  
> 
> You probably know that we harvest wikidata monthly for ingest into VIAF. This 
> month we found 315 pairs of records 

[Wikidata] whitepaper on the Belgian museums' Wikidata project

2015-11-13 Thread Sandra Fauconnier
Hi everyone,

Romaine just mentioned the project of Flemish museums donating metadata to 
Wikidata.

A first outcome of this project is a whitepaper. In the first place, it is 
intended for the project partners, to introduce 
Wikidata, to explain what the project is about, what the costs and benefits are 
for them, and how the upload will work (i.e. generally - later in the project 
we will also produce a more detailed handbook).
However, we think the whitepaper might be interesting to a much broader 
audience as well - especially other (cultural) organisations who also consider 
donating data to Wikidata.

I just finished translating it from Dutch to English and transferring it to 
Wikidata. If anyone wants to read it, comment, add corrections… I’d be very 
grateful! All feedback is welcome.
Also, feel free to share it if relevant.
https://www.wikidata.org/wiki/Wikidata:Flemish_art_collections,_Wikidata_and_Linked_Open_Data/Whitepaper
 


Greetings, Sandra (User:Spinster)
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] getting RDF out of a specific WDQ query of Wikidata?

2015-10-26 Thread Sandra Fauconnier
Hi all,

For this Flemish museums on Wikidata project ( … we hope to import some 
30,000 Flemish artworks in the upcoming months :-) … ) I and the rest of the 
project team are trying to find out if and how we’ll be able to retrieve RDF 
from Wikidata - one RDF export/file for all concerned items at once.

So this is not RDF for a single item, and also not an RDF dump of all of 
Wikidata. It would be an RDF file corresponding to the results of this WDQ 
query (which should produce more than 30,000 items in a few months!).
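
To illustrate the kind of export we have in mind: something like the following 
CONSTRUCT sketch, run against a SPARQL endpoint over Wikidata, would hand back 
RDF for all matching items in one file (the collection QID below is only a 
hypothetical placeholder for the selection we will actually use):

'''
CONSTRUCT {
  ?work ?p ?o .
}
WHERE
{
  ?work wdt:P195 wd:Q_COLLECTION .   # P195 = collection; Q_COLLECTION = hypothetical museum QID
  ?work ?p ?o .
}
'''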

Any tips on how to achieve this? Wikidata Toolkit? But how/what to do? We are 
not programmers/developers but we do have some budget to hire someone to build 
us something, so pointers to a (Belgian??) developer who could help would also 
be very welcome.

The project raises quite a few questions, by the way, so I might come back with 
more :-)

Many thanks in advance! Sandra (User:Spinster)
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] The MoMa, CSV, GitHub, and CC0

2015-08-02 Thread Sandra Fauconnier
As I hinted on project chat: if we create a property for MoMA IDs, we really 
have to be consistent and create properties for all art collections that use 
persistent IDs and of which we have items on Wikidata.
I think we (will) have hundreds of these.
Are we OK with that - having hundreds of museum and art collection-specific 
properties?

Best, Sandra

 On 30 Jul 2015, at 17:44, Gerard Meijssen gerard.meijs...@gmail.com wrote:
 
 Hoi,
 Good job ... went there to approve ... but as there is nothing to approve I 
 do.
 Thanks,
   GerardM
 
 On 30 July 2015 at 16:27, Andy Mabbett a...@pigsonthewing.org.uk 
 mailto:a...@pigsonthewing.org.uk wrote:
 On 30 July 2015 at 14:52, Magnus Manske magnusman...@googlemail.com 
 mailto:magnusman...@googlemail.com wrote:
 
  Everyone vote for MoMA property creation please ;-)
 
  https://www.wikidata.org/wiki/Wikidata:Property_proposal/Authority_control#MoMA_artwork
   
  https://www.wikidata.org/wiki/Wikidata:Property_proposal/Authority_control#MoMA_artwork
 
 I took the liberty of speedily creating that - P2014
 
 --
 Andy Mabbett
 @pigsonthewing
 http://pigsonthewing.org.uk http://pigsonthewing.org.uk/
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org mailto:Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata 
 https://lists.wikimedia.org/mailman/listinfo/wikidata
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] tool to do multiple searches in Wikidata at once?

2015-07-23 Thread Sandra Fauconnier
Hi everyone,

Thanks for the tips so far!

I usually receive and look up ‘external’ lists of concepts/names, for which I 
have no corresponding Wikipedia article, and there may be spelling variations 
or even errors.
I tried Linked Items! Thank you! It does give me partial results, which makes a 
nice difference! But if someone knows of an even more generic search tool (e.g. 
across all Wikipedias and searching in all Wikidata aliases), that would make 
me even more happy :-D

In any case, Linked Items will make my life easier in the near future for sure.

BTW, here’s a typical example of a (small part of a) list of terms I’d like to 
look up on Wikidata:

Names of Dutch companies, well-known products, and designers, for an 
edit-a-thon, provided by a GLAM:

Wim Rietveld
Martin Visser
Spectrum Design
Kembo
TX 400
Rein Veersema
Gerard Kiljan
Heemaf
Wim Gilles
DRU
Vladimir Flem
Fridor
G.M.E. Bellefroid
Koninklijke Mosa
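
(For what it’s worth, this kind of batch lookup can in principle also be 
written as one SPARQL query with a VALUES block - though that only catches 
exact, language-tagged label or alias matches, so spelling variations would 
still slip through. A minimal sketch, assuming Dutch labels:)

'''
SELECT ?term ?item
WHERE
{
  VALUES ?term { "Wim Rietveld"@nl "Martin Visser"@nl "Rein Veersema"@nl }
  ?item rdfs:label|skos:altLabel ?term .
}
'''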

Thanks! Sandra





 On 23 Jul 2015, at 09:29, Gerard Meijssen gerard.meijs...@gmail.com wrote:
 
 Hoi,
 When the list is a list of articles in a Wikipedia, try Linked Items, one 
 magnificent tool by Magnus that can be used for this.
 Thanks,
  GerardM
 
 
 
 
 https://tools.wmflabs.org/wikidata-todo/linked_items.php 
 https://tools.wmflabs.org/wikidata-todo/linked_items.php
 
 On 23 July 2015 at 08:44, Sandra Fauconnier sandra.fauconn...@gmail.com 
 mailto:sandra.fauconn...@gmail.com wrote:
 Hi everyone,
 
 I’ve been in the situation quite often (edit-a-thons; various to do lists) 
 where I had a list of terms (most usually names of a few hundreds of people, 
 or titles of Wikipedia articles), where I wanted to do a quick search on 
 Wikidata to retrieve each of these concept’s Q number.
 Does anyone know of a tool that helps me make this easier? Enter a list of, 
 say, 100 of these search terms, and receive Q number suggestions for each of 
 them? I’ve looked around on wmflabs but have not found anything in that 
 direction (also not with the help of Hay’s awesome tool directory 
 http://tools.wmflabs.org/hay/directory/#/).
 
 Till now, I’ve done all these searches manually - use an excel sheet, look 
 for each term individually, enter Q number for each term - quite accurate but 
 very time-consuming!
 
 Would appreciate all help/tips !
 Thanks! Sandra (User:Spinster)
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org mailto:Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata 
 https://lists.wikimedia.org/mailman/listinfo/wikidata
 
 
 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata