Re: [Wikimedia-l] How diverse are your readers?

2019-03-18 Thread Gerard Meijssen
Hoi,
I read the blog post and it utterly misses the point. The point is that
this is NOT about English Wikipedia; for English Wikipedia another
approach will work better. At the same time, when you read my blog post,
you will find that the elephant in the room is that we consider articles
to be synonymous with subjects. They are not. We do not have an
aggregated count of the most popular subjects across all Wikipedias. If
we did, we would know what the world reads, not just what is served by a
single Wikipedia, the English Wikipedia.

The biggest benefit is that it would provide us with a list with less
of an Anglo-American bias. One subset of this list would be what the
world reads that is not available on English Wikipedia. Subjects that
rank high worldwide indicate a particular kind of notability. It will
be really interesting to see how these subjects are appreciated by the
public and the "wiki gnomes". Finding authors can be done in a similar
way to the "gender bias" approach.

Another way the blog post misses the point is that it concentrates on
English Wikipedia. The only line left for the small Wikipedias is that
the gem-to-dung ratio may differ. As English has never been the
objective of this approach, that disqualifies the results. By posting
this blog post, you make it plain that you have not read or understood
what I propose in my blog post [1].

First, I want the search extension by Magnus active on every Wikipedia.
This would expose all subjects known to us as results, not just the
articles on that Wikipedia. It is safe to log such an interest: all we
want is a timestamp, the language, and the Qid. This is exactly what we
do for articles, so there is no privacy issue here. I also want to
invite people to add labels and false friends in their language.
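
The kind of logging described above could be sketched as follows. This
is a minimal illustration, not the extension's actual code; the record
fields (timestamp, language, Qid) come from the text, and all names are
hypothetical:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SearchLogEntry:
    """One logged search hit: no user data, only when/where/what."""
    timestamp: datetime  # when the subject was shown
    language: str        # language code of the wiki searched
    qid: str             # Wikidata item the result points to

def top_subjects(entries, n=10):
    """Aggregate logged hits into the most popular subjects overall."""
    counts = Counter(e.qid for e in entries)
    return counts.most_common(n)

log = [
    SearchLogEntry(datetime.now(timezone.utc), "nl", "Q42"),
    SearchLogEntry(datetime.now(timezone.utc), "id", "Q42"),
    SearchLogEntry(datetime.now(timezone.utc), "nl", "Q1"),
]
print(top_subjects(log))  # Q42 seen twice, Q1 once
```

Because the record carries only a timestamp, a language, and a Qid, the
aggregation works across every wiki at once without touching anything
personally identifiable.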

For any Wikipedia, asking "what is the most-read article that you do
not have?" is a valid way to propose the writing of a new article. Some
will use this list, most will not, and again, the English wiki gnomes
do not know the languages used elsewhere.

We already know what traffic each article receives. It is just a data
question to learn what traffic new articles received in a full month.
Exposing this, and highlighting their success, is a powerful way to
provide recognition.
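
Once per-wiki pageviews are keyed by Qid, "the most-read subject you do
not have" becomes a simple aggregation. A hypothetical sketch, with
invented numbers and names:

```python
from collections import defaultdict

# Hypothetical per-wiki pageview counts, keyed by Wikidata Qid.
views = {
    "en": {"Q1": 900, "Q2": 400},
    "hi": {"Q2": 700, "Q3": 650},
    "id": {"Q3": 500, "Q4": 300},
}

def global_subject_ranking(views):
    """Sum views per subject across all wikis, most-read first."""
    totals = defaultdict(int)
    for wiki_counts in views.values():
        for qid, n in wiki_counts.items():
            totals[qid] += n
    return sorted(totals.items(), key=lambda kv: -kv[1])

def most_read_missing(views, wiki):
    """Globally popular subjects with no article on the given wiki."""
    have = set(views.get(wiki, {}))
    return [(q, n) for q, n in global_subject_ranking(views)
            if q not in have]

print(most_read_missing(views, "en"))  # Q3 and Q4 are missing on "en"
```

The same ranking, filtered per wiki, yields a different "missing
articles" list for every language edition, which is exactly the
less-Anglo-American list argued for above.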

In conclusion, there is a very strong bias towards English Wikipedia in
the attention it is given, to the exclusion of others. English accounts
for less than fifty percent of our traffic, yet it gets more than
eighty percent of the attention. As you can read in the comments on
your blog post, I am happy to collaborate, but so far that has not fit
your agenda.
Thanks,
  GerardM



[1]
https://ultimategerardm.blogspot.com/2019/03/sharing-in-sum-of-all-knowledge-from.html

On Mon, 18 Mar 2019 at 18:28, Ed Erhart  wrote:

> Hey folks,
>
> Trey authored a Wikimedia blog post on this as well:
> https://blog.wikimedia.org/2017/12/12/failed-queries-fear-of-missing-out/
>
> --Ed
>
> On Mon, Mar 18, 2019 at 11:34 AM Dan Garry (Deskana) 
> wrote:
>
> > The topic of zero-result search queries comes up from time to time. The
> > logic is generally this: if we can see the top queries that got no
> results,
> > then we can figure out what users are looking for but not finding, and
> add
> > it to the encyclopedia. Wonderful user-centred thinking, and it sounds
> > great! The problem is, sadly, the data doesn't help us achieve this at
> all.
> >
> > The sheer volume of requests means that a lot of the top zero-results
> > queries are junk. Trey Jones, an engineer on the Search Platform Team,
> > wrote a comprehensive analysis
> > <
> >
> https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries
> > >
> > a
> > few years ago of the top zero-result queries, based on a
> > multilingual sample of 500,000 queries. It was quite enlightening in some senses—we
> > found out a lot about the things that people are doing with the search
> > system, found some bugs in other products, and so on—but it didn't
> actually
> > help us understand what people were looking for and not finding.
> >
> > Dan
> >
> > On Tue, 12 Mar 2019 at 23:12, Leila Zia  wrote:
> >
> > > Hi Gerard,
> > >
> > > On Sun, Mar 10, 2019 at 2:26 PM Gerard Meijssen
> > >  wrote:
> > > > but really
> > > > why can we not have the data that allows us to seek out what people
> are
> > > > actually looking for and do not find..
> > >
> > > Please open a Phabricator task for this request at
> > > https://phabricator.wikimedia.org . Please add Research as a tag and
> > > add me as one of the subscribers. I'd like to work with you on a
> > > concrete proposal. A few items to consider as you're expanding the
> > > description of the task:
> > >
> > > * We won't be able to release raw search queries as they come to
> > > Wikimedia servers. That is for privacy reasons.
> > >
> > > * You also likely don't need raw search queries. If you can be
> > > specific about what you want to have access to, as much as possible,
> > > that can help us get starte

[Wikimedia-l] [Wikimedia Research Showcase] March 20 at 11:30 AM PST, 18:30 UTC

2019-03-18 Thread Leila Zia
Hi all,

The next Research Showcase, “Learning How to Correct a Knowledge Base
from the Edit History” and “TableNet: An Approach for Determining
Fine-grained Relations for Wikipedia Tables” will be live-streamed
this Wednesday, March 20, 2019, at 11:30 AM PDT/18:30 UTC (please note
the change in the UTC time due to daylight saving changes in the U.S.).
The first presentation is about using edit history to automatically
correct constraint violations in Wikidata, and the second is about
interlinking Wikipedia tables.

YouTube stream: https://www.youtube.com/watch?v=6p62PMhkVNM

As usual, you can join the conversation on IRC at #wikimedia-research.
You can also watch our past research showcases at
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase .

This month's presentations:

Learning How to Correct a Knowledge Base from the Edit History

By Thomas Pellissier Tanon (Télécom ParisTech), Camille Bourgaux (DI
ENS, CNRS, ENS, PSL Univ. & Inria), Fabian Suchanek (Télécom
ParisTech), WWW'19.

The curation of Wikidata (and other knowledge bases) is crucial to
keep the data consistent, to fight vandalism and to correct good faith
mistakes. However, manual curation of the data is costly. In this
work, we propose to take advantage of the edit history of the
knowledge base in order to learn how to correct constraint violations
automatically. Our method is based on rule mining, and uses the edits
that solved violations in the past to infer how to solve similar
violations in the present. For example, our system is able to learn
that the value of the [[d:Property:P21|sex or gender]] property
[[d:Q467|woman]] should be replaced by [[d:Q6581072|female]]. We
provide [https://tools.wmflabs.org/wikidata-game/distributed/#game=43
a Wikidata game] that suggests our corrections to the users in order
to improve Wikidata. Both the evaluation of our method on past
corrections, and the Wikidata game statistics show significant
improvements over baselines.
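
The correction in the abstract, replacing the value woman (Q467) with
female (Q6581072) for the sex or gender property (P21), can be pictured
as a simple mined rule. This sketch is not the paper's rule-mining
method, only an illustration of how such a learned rule might be
represented and applied; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorrectionRule:
    """A mined rule: for a given property, replace one value with
    another, as learned from edits that fixed past violations."""
    prop: str   # e.g. "P21" (sex or gender)
    wrong: str  # e.g. "Q467" (woman)
    right: str  # e.g. "Q6581072" (female)

def apply_rules(statements, rules):
    """Apply each matching rule to (property, value) statements."""
    fixes = {(r.prop, r.wrong): r.right for r in rules}
    return [(p, fixes.get((p, v), v)) for p, v in statements]

rules = [CorrectionRule("P21", "Q467", "Q6581072")]
item = [("P21", "Q467"), ("P31", "Q5")]
print(apply_rules(item, rules))
# [('P21', 'Q6581072'), ('P31', 'Q5')]
```

In the paper, rules like this are inferred from the edit history rather
than written by hand, and candidate corrections are offered to editors
through the linked Wikidata game rather than applied automatically.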


TableNet: An Approach for Determining Fine-grained Relations for
Wikipedia Tables

By Besnik Fetahu

Wikipedia tables represent an important resource, where information is
organized w.r.t. table schemas consisting of columns. In turn, each
column may contain instance values that point to other Wikipedia
articles or primitive values (e.g. numbers, strings, etc.). In this
work, we focus on the problem of interlinking Wikipedia tables for two
types of table relations: equivalent and subPartOf. Through such
relations, we can further harness semantically related information by
accessing related tables or facts therein. Determining the relation
type of a table pair is not trivial, as it is dependent on the
schemas, the values therein, and the semantic overlap of the cell
values in the corresponding tables. We propose TableNet, an approach
that constructs a knowledge graph of interlinked tables with subPartOf
and equivalent relations. TableNet consists of two main steps: (i) for
any source table we provide an efficient algorithm to find all
candidate related tables with high coverage, and (ii) a neural-based
approach that takes into account the table schemas and the
corresponding table data to determine, with high accuracy, the relation
type for a table pair. We perform an extensive experimental
evaluation on the entire Wikipedia with more than 3.2 million tables.
We show that we retain more than 88% of relevant candidate table pairs
for alignment. Consequently, with an accuracy of 90% we are able to
align tables with subPartOf or equivalent relations.
Comparisons with existing competitors show that TableNet has superior
performance in terms of coverage and alignment accuracy.
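
The two steps could be caricatured as follows, with a cheap
schema-overlap filter standing in for the paper's candidate-retrieval
algorithm and a toy heuristic standing in for the neural classifier.
This is only an illustrative sketch of the pipeline shape, not the
authors' method:

```python
def schema_overlap(schema_a, schema_b):
    """Jaccard overlap of column names: a cheap candidate filter."""
    a, b = set(schema_a), set(schema_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_tables(source, tables, threshold=0.3):
    """Step (i): keep tables whose schemas overlap the source enough."""
    return [t for t in tables
            if schema_overlap(source["schema"], t["schema"]) >= threshold]

def classify_relation(source, candidate):
    """Step (ii): stand-in for the neural classifier. 'equivalent' if
    schemas match exactly, 'subPartOf' if one strictly contains the
    other, else no relation."""
    a, b = set(source["schema"]), set(candidate["schema"])
    if a == b:
        return "equivalent"
    if a < b or b < a:
        return "subPartOf"
    return None

src = {"schema": ["Country", "Population"]}
tables = [
    {"schema": ["Country", "Population"]},
    {"schema": ["Country", "Population", "Year"]},
    {"schema": ["Album", "Artist"]},
]
for t in candidate_tables(src, tables):
    print(t["schema"], "->", classify_relation(src, t))
```

The real system replaces both heuristics with an efficient retrieval
algorithm and a trained neural model that also looks at cell values,
which is where the reported coverage and accuracy come from.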

Best,
Leila

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] [Wikimedia Education] Video tutorial regarding creating Wikipedia references with VisualEditor

2019-03-18 Thread Pine W
Hi Avery,

You might be interested in my final report from the previous grant, which
is at
https://meta.wikimedia.org/wiki/Grants:IEG/Motivational_and_educational_video_to_introduce_Wikimedia/Final#Learning
.

I was not planning to publish the referencing videos to YouTube, but I
think that's a good idea. Do you have any suggestions about how to make the
videos be easy for people to find if they search for Wikipedia help with
referencing on Youtube? If so, I would be appreciative if you'd add those
comments to the project talk page at
https://meta.wikimedia.org/wiki/Grants_talk:Project/Rapid/Pine/Continuation_of_educational_video_and_website_series
.

Accessibility and translation are very much in my mind as I'm working on
the current script. An issue to keep in mind is that the various language
editions of Wikipedia have variations in policies and workflows, so
translation alone may be insufficient to adapt a video or tutorial from one
language edition of Wikipedia to another language edition of Wikipedia.

I like the interactive nature of the Wiki Ed tutorial. WMF's Growth Team is
developing in-context help, which I think is also a good method for
teaching. My guess is that the optimal ways to teach how to edit Wikipedia
will be a combination of methods including video, interactive tutorials
possibly with quizzes and certifications, in-context help, and
individualized help. As an example of how the methods could be blended, I
think that in-context help could offer video tutorials of varying lengths
to cover certain subjects.

Thanks for the comments.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Sat, Mar 16, 2019 at 10:24 PM Avery Jensen 
wrote:

> @Bekriah, that is very exciting.  From time to time we have native Arabic
> speakers attend our events and have wondered how we could offer them
> instruction in Arabic. We would appreciate any further information you have
> on pending projects (or by email if it's not public or of general interest
> to this list).
>
> @Pine, it has been suggested to me that your original project might have
> been slowed because of the complexity of not only the instruction itself,
> but also knowledge about how to make an instructional video.  There is
> supposed to be an online course on making instructional videos, but it has
> not been released yet.  In the context of Wikipedia, the process is even
> more of a challenge because the YouTube format cannot be easily imported to
> Commons. It would also be an advantage to have transcripts available for
> those with hearing disabilities or who do not speak the language. There
> have been captioning and translation projects but they depend on
> transcripts.
>
> If anyone has not seen the self paced tutorials in the WikiEd training
> library, here is the first one, a basic overview.
> https://dashboard.wikiedu.org/training/students/wikipedia-essentials This
> type of tutorial has the advantage of being self paced and geared to visual
> learners (which is maybe 80% of an average class).  Anyone who doesn't
> speak the language can easily use a translation tool to get a basic idea of
> the content.
>
>


Re: [Wikimedia-l] How diverse are your readers?

2019-03-18 Thread Ed Erhart
Hey folks,

Trey authored a Wikimedia blog post on this as well:
https://blog.wikimedia.org/2017/12/12/failed-queries-fear-of-missing-out/

--Ed

On Mon, Mar 18, 2019 at 11:34 AM Dan Garry (Deskana) 
wrote:

> The topic of zero-result search queries comes up from time to time. The
> logic is generally this: if we can see the top queries that got no results,
> then we can figure out what users are looking for but not finding, and add
> it to the encyclopedia. Wonderful user-centred thinking, and it sounds
> great! The problem is, sadly, the data doesn't help us achieve this at all.
>
> The sheer volume of requests means that a lot of the top zero-results
> queries are junk. Trey Jones, an engineer on the Search Platform Team,
> wrote a comprehensive analysis
> <
> https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries
> >
> a
> few years ago of the top zero-result queries, based on a multilingual
> sample of 500,000 queries. It was quite enlightening in some senses—we
> found out a lot about the things that people are doing with the search
> system, found some bugs in other products, and so on—but it didn't actually
> help us understand what people were looking for and not finding.
>
> Dan
>
> On Tue, 12 Mar 2019 at 23:12, Leila Zia  wrote:
>
> > Hi Gerard,
> >
> > On Sun, Mar 10, 2019 at 2:26 PM Gerard Meijssen
> >  wrote:
> > > but really
> > > why can we not have the data that allows us to seek out what people are
> > > actually looking for and do not find..
> >
> > Please open a Phabricator task for this request at
> > https://phabricator.wikimedia.org . Please add Research as a tag and
> > add me as one of the subscribers. I'd like to work with you on a
> > concrete proposal. A few items to consider as you're expanding the
> > description of the task:
> >
> > * We won't be able to release raw search queries as they come to
> > Wikimedia servers. That is for privacy reasons.
> >
> > * You also likely don't need raw search queries. If you can be
> > specific about what you want to have access to, as much as possible,
> > that can help us get started with scoping the problem. I'm looking for
> > something along these lines: "I want to be able to see a monthly list
> > of top n search terms in language x that result in 0 search results or
> > results where the user does not click on any of the search results
> > offered." The more specific, the better. If you are in doubt, put some
> > description and we can iterate on it.
> >
> > Best,
> > Leila
> > p.s. The goal of this exercise is to have an open question ready (with
> > all the details one needs to know) for the next time we will have a
> > volunteer researcher to work with us.
> >
> > ___
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > https://meta.wikimedia.org/wiki/Wikimedia-l
> > New messages to: Wikimedia-l@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > 
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 



-- 
[image: Wikimedia-logo black.svg] *Ed Erhart* (he/him)

Senior Editorial Associate

Wikimedia Foundation 


Re: [Wikimedia-l] How diverse are your readers?

2019-03-18 Thread Dan Garry (Deskana)
The topic of zero-result search queries comes up from time to time. The
logic is generally this: if we can see the top queries that got no results,
then we can figure out what users are looking for but not finding, and add
it to the encyclopedia. Wonderful user-centred thinking, and it sounds
great! The problem is, sadly, the data doesn't help us achieve this at all.

The sheer volume of requests means that a lot of the top zero-results
queries are junk. Trey Jones, an engineer on the Search Platform Team,
wrote a comprehensive analysis

a
few years ago of the top zero-result queries, based on a multilingual
sample of 500,000 queries. It was quite enlightening in some senses—we
found out a lot about the things that people are doing with the search
system, found some bugs in other products, and so on—but it didn't actually
help us understand what people were looking for and not finding.

Dan

On Tue, 12 Mar 2019 at 23:12, Leila Zia  wrote:

> Hi Gerard,
>
> On Sun, Mar 10, 2019 at 2:26 PM Gerard Meijssen
>  wrote:
> > but really
> > why can we not have the data that allows us to seek out what people are
> > actually looking for and do not find..
>
> Please open a Phabricator task for this request at
> https://phabricator.wikimedia.org . Please add Research as a tag and
> add me as one of the subscribers. I'd like to work with you on a
> concrete proposal. A few items to consider as you're expanding the
> description of the task:
>
> * We won't be able to release raw search queries as they come to
> Wikimedia servers. That is for privacy reasons.
>
> * You also likely don't need raw search queries. If you can be
> specific about what you want to have access to, as much as possible,
> that can help us get started with scoping the problem. I'm looking for
> something along these lines: "I want to be able to see a monthly list
> of top n search terms in language x that result in 0 search results or
> results where the user does not click on any of the search results
> offered." The more specific, the better. If you are in doubt, put some
> description and we can iterate on it.
>
> Best,
> Leila
> p.s. The goal of this exercise is to have an open question ready (with
> all the details one needs to know) for the next time we will have a
> volunteer researcher to work with us.
>
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 