Re: [Wikimedia-l] Curating YOUR Wikipedia

2018-07-16 Thread James Salsman
Gerard and Alessandro,

The taxonomy question is very important. I touched on it in the
ethnicity categorization discussion:

https://lists.wikimedia.org/pipermail/wikimedia-l/2018-May/090366.html

I suggest that both the Enwiki Categories and Wikidata are most
deficient from a utilitarian perspective because of their poor support
for the bijection between subject matter experts and their subjects,
which is one of the main reasons for the existence of encyclopedias and
"Who's Who in ..." references to begin with. This issue has come up
more and more in my mentoring, and these two patent applications
caught my eye:

IBM: 
https://patentimages.storage.googleapis.com/ec/a6/fe/b47153da8a0a0d/US20100262610A1.pdf

Siemens: 
https://patentimages.storage.googleapis.com/b0/7b/b1/5bdcddc6370ceb/US20160140186A1.pdf

These are different approaches to the same overarching problem. Those
companies pursued them as patent applications, even under the current
pro-free (perhaps overly pro-free) software patent reexamination
regime, because they recognize how central the problem is.

Do you think Wikidata can serve as a unified subject matter expert database?

Best regards,
Jim

On Mon, Jul 16, 2018 at 4:56 AM, Alessandro Marchetti via Wikimedia-l
 wrote:
> On Monday, 16 July 2018 8:13, Gerard Me

2018-07-16 Thread Gerard Meijssen
Hi,
Thanks for your reply. There is one big issue that you do not address, and
it is best explained using a Wikipedia "best practice": a town, a village,
or whatever settlement is recorded as being located in the next-level
"administrative territorial entity". This is done properly for the first
world. Where Wikidata does not hold the data, as is often the case, it
cannot help in infoboxes. What I find when I add the information is that
the Wikipedia data is wrong in more than 6% of cases.

It does not matter that the information is fragmented, coming from many
sources. The data for Egyptian subdivisions is largely in Arabic. This is
not something I can curate, but it is something that can be presented.

What does matter is that differences between Wikipedias and Wikidata are
not noticed. Of particular importance is where the data is biased or wrong.
Particularly where the data is wrong and concerns "administrative
territorial entities", I have had pushback when English Wikipedia was
said to be wrong [1]. My interpretation of the facts is that the German
article was better written but out of date.

In this mail thread, I raise the issue of differences between Wikipedias,
and of differences between the projects and Wikidata. Particularly where the
data or articles are biased or wrong, our quality suffers. When the error
rate for a subject is more than 6%, it is higher than can be expected of
humans adding good-faith information to a project. The data I am adding
at this time supports Wikipedia best practices. It is particularly intended
for the "minority languages" [2], but the quality of all our data will be
improved when we are aware of the differences and curate them everywhere.

This is distinctly different from the issue with Commons: its data is good
enough for its current use case, but it is also what holds it back from
becoming the resource you go to because you can "find" what you are seeking.

In a nutshell, our problem is that we work in an insular fashion. We do not
have ways to find the differences, the errors, and the bias between our
projects. We could; suggestions for a basic mechanism have been made.
Our quality suffers, and it does not need to [3].
Thanks,
   GerardM

[1] https://ultimategerardm.blogspot.com/2018/07/africagap-where-wikipedias-collide.html
[2] https://ultimategerardm.blogspot.com/2018/07/africagap-support-for-minority-languages.html
[3] https://ultimategerardm.blogspot.com/2016/01/wikipedia-lowest-hanging-fruit-from.html
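The message above notes that suggestions for "a basic mechanism" to find differences between projects have been made, without spelling one out. What follows is only a minimal sketch of what such a comparison could look like, not the mechanism that was proposed. It assumes the "located in" value for each settlement has already been extracted from Wikidata (for example, property P131, "located in the administrative territorial entity") and from one Wikipedia's infoboxes into plain dictionaries. The item keys, values, and function names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """A settlement whose parent entity differs between the two sources."""

    item: str
    wikidata_parent: str
    wikipedia_parent: str


def find_discrepancies(
    wikidata: dict[str, str], wikipedia: dict[str, str]
) -> list[Discrepancy]:
    """List items whose recorded parent entity differs between the sources.

    Only items present in both mappings are compared: an item missing
    from one side is a gap in coverage, not a disagreement.
    """
    shared = sorted(wikidata.keys() & wikipedia.keys())
    return [
        Discrepancy(item, wikidata[item], wikipedia[item])
        for item in shared
        if wikidata[item] != wikipedia[item]
    ]


def discrepancy_rate(wikidata: dict[str, str], wikipedia: dict[str, str]) -> float:
    """Fraction of shared items that disagree (0.0 when nothing is shared)."""
    shared = wikidata.keys() & wikipedia.keys()
    if not shared:
        return 0.0
    return len(find_discrepancies(wikidata, wikipedia)) / len(shared)


if __name__ == "__main__":
    # Hypothetical placeholder data: settlement -> parent administrative entity.
    wikidata_p131 = {
        "village_a": "district_1",
        "village_b": "district_2",
        "town_c": "district_3",
        "town_d": "district_3",  # only in Wikidata: a gap, not compared
    }
    wikipedia_infobox = {
        "village_a": "district_1",
        "village_b": "district_5",  # disagrees with Wikidata
        "town_c": "district_3",
        "hamlet_e": "district_4",  # only in Wikipedia: a gap, not compared
    }
    for d in find_discrepancies(wikidata_p131, wikipedia_infobox):
        print(f"{d.item}: Wikidata={d.wikidata_parent}, Wikipedia={d.wikipedia_parent}")
    rate = discrepancy_rate(wikidata_p131, wikipedia_infobox)
    print(f"{rate:.0%} of shared items disagree")
```

With the placeholder data, one of the three shared items disagrees, so the rate printed is 33%. A flagged difference only says that two projects disagree, not which one is wrong. As noted elsewhere in this thread, some discrepancies reflect genuine gray areas in the sources, so each one still needs a person to check the sources.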

On 16 July 2018 at 05:41, Alessandro Marchetti via Wikimedia-l <
wikimedia-l@lists.wikimedia.org> wrote:

> Yes, it is an old issue. What you say is right, but I would be more
> optimistic.
> To summarize my view (I could send you more information privately):
>
> 1. Wikidata largely reflected what Wikipedia indicated, and that was not
> the right way to make it grow, but that was also the past. At the moment,
> the referencing of the content is increasing, and so is the clean-up. In
> some areas, Wikidata items are nowadays created even before the Wikipedia
> articles.
>
> 2. New tools are great and will do a lot, but it's users who do the real
> work. You have to start bringing local users to Wikidata and show them how
> it can be used (automatic infoboxes, fast creation of stubs, automatic
> lists, detecting missing images). They will start to fix the issues,
> curating their Wikipedia and Wikidata, and also indirectly influencing the
> other ones.
>
> 3. IMHO, the Wikidata ecosystem is not so bad. It could have more expert
> users with real knowledge of topics, but Commons, with millions of
> automatically imported files and tons of poorly described and
> uncategorized images, faces a much worse outlook. You need more tools
> there than on Wikidata at the moment if you want to keep a balanced
> workflow. What is really missing on Wikidata is mostly active projects to
> coordinate and catalyze the ongoing efforts. This one
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Ancient_Greece worked
> miracles, for example. But I couldn't find one about peer-reviewed
> researchers or photographers, to name a few, at least in the past months.
> Investing in this aspect would not change the final situation on Wikidata
> (which I expect to be positive), but it would speed up the process. It
> would also have a much larger influence on the content of local wikis,
> because it would bring content-related users closer together and increase
> their Wikidata literacy with less effort.
>
> 4. In the end, even with a good, high-quality Wikidata platform, there will
> always be communities that will not integrate into Wikidata massively... but
> that's also a good thing for pluralism. You can't assume that a discrepancy
> is always a sign of a mistake (I am sure the examples from your experience
> are, of course); in the long term, some of them are simply effects of gray
> areas that need to wait to be resolved even at the level of the sources.
> In some fields, such as taxonomy, there is some confusion and asymmetr

2018-07-16 Thread Alessandro Marchetti via Wikimedia-l
Hi.
You said that you found an area where there is a problem. I found another one
too, taxonomy, and in this case I am quite sure it won't be solved for a while,
even with better diagnostic tools. Yet I am optimistic in the long term. I
have also found areas where the problems were similar to yours, and they were
solved, like the examples of the ancient Greece items. In that case you ideally
need enough people who know ancient Greek, and those can be hard to find as
well.
For every problem you notice, there are others that other people have noticed.
But they also see them improving; we have examples.

As far as I can tell from my experience, when the discrepancies were not
structural (that is, in the sources), the main issue was not the lack of a
super tool. In the end, it was about understanding the sources. Tools help,
they are cool, and it is nice to show them, but you need human resources. For
all the possible gaps I notice, my strategy is to look for people.
Sometimes I ask for tools to be improved based specifically on what these
people, the Wikidata newcomers, want, not what the "expert users" want. I
don't say these people know what is best, but they have a feel for what is
necessary, especially what is necessary to integrate more users with specific
knowledge into the workflow.

So my core advice remains the same: create a dedicated project, ask users
interested in the topic, and teach them Wikidata. You can teach them without a
project too, but I guess the project could help.

I gave you one example in my private mail: the situation of the Italian
hamlets imported from some archive into some minor Wikipedias (to pick one
theme among possibly dozens). Some of them are correct, some of them are
weird. They are still there but, as I said, if you want to get rid of the
trash, I can find you 30 users right now who are willing to clean up in a
short amount of time and leave only what has a real meaning. So it's not so
bad. Had I just written general emails, the structural starting point would
not have changed like this.

What I am trying to say is that you probably have the human resources around
you to tackle most of this cluster of work; you just have to find them. I see
the energy inside the communities. Your mail is more centered on the issue,
the guideline, the possible tool... it's not "warm". You don't seem to
consider the people who should do the continuous, constant work. You describe
a situation where you are alone, and I might add that when I ask for this kind
of help inside the Wikidata community, I sometimes have the same feeling. That
is true, because there are many small tasks that are much simpler, very
generic tasks that make for an interesting nerdy blog post, or virgin areas
ready to be conquered by massively importing data from archives... and many
established Wikidata users prefer to focus on these things. But when I looked
for users at the level of local communities, I had far fewer problems and got
good feedback. That's it. And that is why I am basically optimistic.
 
When I see a situation that is not evolving inside Wikidata, my instinct is
still to ask around among the people who create real content, wherever they
are. About this specific problem, did you contact the users who created this
content on local Wikipedias? In my experience, 50% of them should have decent
working proficiency in English. Did you scroll through the histories of the
pages here and there, find the usernames most often involved in their creation
and maintenance, and leave a message on their user talk pages? That's what I
am trying to understand.
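The step suggested above, finding the usernames most involved in creating and maintaining a set of pages, can be sketched as a simple count. This is a minimal illustration under stated assumptions: the revision histories are taken to have already been copied from each page's history into lists of usernames, no wiki API is called, and the page titles, usernames, and the `top_contributors` helper are all hypothetical.

```python
from collections import Counter


def top_contributors(
    histories: dict[str, list[str]],
    n: int = 3,
    exclude: frozenset[str] = frozenset(),
) -> list[tuple[str, int]]:
    """Rank users by the number of pages they edited, most pages first.

    `histories` maps a page title to the usernames of its revisions.
    A user is counted once per page, so one heavily edited page does
    not outweigh steady maintenance across many pages. Names in
    `exclude` (e.g. bots) are skipped. Ties are broken alphabetically
    so the result is deterministic.
    """
    pages_per_user = Counter()
    for users in histories.values():
        # set() removes repeat edits by the same user on this page.
        pages_per_user.update(set(users) - exclude)
    ranked = sorted(pages_per_user.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical page histories: page title -> usernames of its revisions.
    histories = {
        "Hamlet A": ["UserX", "UserX", "UserY", "ExampleBot"],
        "Hamlet B": ["UserX", "UserZ"],
        "Hamlet C": ["UserY", "UserX", "ExampleBot"],
    }
    for user, pages in top_contributors(histories, exclude=frozenset({"ExampleBot"})):
        print(f"{user}: edited {pages} of {len(histories)} pages")
```

With the placeholder data it lists UserX (3 pages), UserY (2) and UserZ (1). Counting distinct pages rather than raw edits is a design choice for this sketch: it favours people who maintain a whole topic area, who are the ones worth a message on their user talk page.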

On Monday, 16 July 2018 8:13, Gerard Meijssen wrote:
 
