Re: [Wikidata] Machine translation efforts for underserved languages

2018-09-05 Thread mathieu lovato stumpf guntz

Hi Olya,

Sorry for the late reply, but I just wondered if you were aware of 
Wikitrans[1], which "provides machine-translated versions of Wikipedia 
articles, completely linked and searchable in the target language, as 
well as cross language simultaneous Wikipedia searches".


It doesn't use Wikidata, but on the other hand it uses a formalized 
grammar of the target languages. As far as I know the translation software 
is not open source, but it might be interesting to have a Wikimedia-hosted 
backup of the translated versions and links to them in Wikidata, 
which might then be usable in Wikipedia.


Let me know whether this kind of late feedback is welcome or undesired.

Cheers,
Mathieu

[1] https://wikitrans.net/


On 18/06/2018 at 01:12, Olya Irzak wrote:

Dear Wikidata community,

We're working on a project called Wikibabel to machine-translate parts 
of Wikipedia into underserved languages, starting with Swahili.


In hopes that some of our ideas can be helpful to machine translation 
projects, we wrote a blogpost about how we prioritized which pages to 
translate, and what categories need a human in the loop:

https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-budget-4038f750e90e

Rumor has it that the Wikidata community has thought deeply about 
information access. We'd love your feedback on our work. Please let us 
know about past / ongoing machine translation related projects so we 
can learn from & collaborate with them.


Best regards,
Olya & the Wikibabel crew



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




[Wikidata] On traceability and reliability of data we publish [was Re: [Wikimedia-l] Solve legal uncertainty of Wikidata]

2018-07-07 Thread mathieu lovato stumpf guntz

Hi Andra,

I agree it is a misconception that a copyright license makes any direct 
change to data reliability. But an attribution requirement does 
indirectly have an impact on it, as it legally enforces traceability. 
That is why I strongly disagree with the following assertion: "a license 
that requires BY sucks so hard for data [because] attribution 
requirements grow very quickly". To my mind it is equivalent to saying that 
we will throw away traceability because it is subjectively judged too 
large a burden, without providing the slightest evidence that it indeed 
can't be managed, at least with Wikimedia's current resources.


Now, I don't say traceability is the sole factor one should take into 
account in data reliability, but it is certainly one of them. Maybe we 
should first come up with clear criteria to put into an equation that 
allows us to calculate the reliability of information. Since it's among the 
core goals of the Wikimedia strategy, it would certainly be worth the effort 
to establish clear metrics about the reliability of the information the 
movement is spreading.


Cheers


On 04/07/2018 at 13:00, Andra Waagmeester wrote:
I agree with Maarten, and to add to that: it is a huge misconception 
that CC0 makes data unreliable. It is only a legal statement about 
copyright, nothing more, nothing less. Statements without proper 
references and qualifiers make data unreliable, but Wikidata has a 
decent mechanism to capture that needed provenance.


On Wed, Jul 4, 2018 at 12:50 PM, Maarten Dammers wrote:


Hi Mathieu,

On 04-07-18 11:07, mathieu stumpf guntz wrote:

Hi,

On 19/05/2018 at 03:35, Denny Vrandečić wrote:


Regarding attribution, commonly it is assumed that you
have to respect it transitively. That is one of the
reasons a license that requires BY sucks so hard for data:
unlike with text, the attribution requirements grow very
quickly. It is the same as with modified images and
collages: it is not sufficient to attribute the last
author, but all contributors have to be attributed.

If we want our data to be trustworthy, then we need
traceability. That means reporting this chain of sources as
extensively as possible, whether or not the license requires it
as attribution. CC-0 allows this traceability to be broken, which
makes it an awful license for whoever is concerned with obtaining
reliable data.

A license is not the way to achieve this. We have references for that.


This is why I think that whoever wants to be part of a
large federation of data on the web, should publish under CC0.

As long as one aims at making a federation of untrustworthy data
banks, that's perfect. ;)

So I see you started forum shopping (trying to get the Wikimedia-l
people in) and making contentious, trying-to-be-funny remarks.
That's usually a good indication a thread is going nowhere.

No, Wikidata is not going to change the CC0. You seem to be the
only person wanting that and trying to discredit Wikidata will not
help you in your crusade. I suggest the people who are still
interested in this to go to
https://phabricator.wikimedia.org/T193728 and make useful
comments over there.

Maarten




Re: [Wikidata] [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-07-07 Thread mathieu lovato stumpf guntz

Hi Andra,


On 04/07/2018 at 13:00, Andra Waagmeester wrote:



No, Wikidata is not going to change the CC0. You seem to be the
only person wanting that and trying to discredit Wikidata will not
help you in your crusade. I suggest the people who are still
interested in this to go to
https://phabricator.wikimedia.org/T193728 and make useful
comments over there.

It seems all these assertions rest on some erroneous assumptions. 
This ticket is not about changing the Wikidata license. It aims at making 
sure what can and what cannot be legally imported into a database using 
CC0, and in which jurisdictions it can or cannot be legally and safely 
used in downstream projects.


It would certainly be interesting if the Wikimedia infrastructure 
allowed hosting projects using Wikibase with other topic/license scopes 
that can be queried from within other Wikimedia projects. Surely it would 
be a good match with the "become the essential infrastructure of the 
ecosystem of free knowledge" goal. But that's another story, and I 
haven't found time to work on that topic so far.


It would also be great if we could avoid imputing the title of "crusader 
dedicated to discrediting Wikidata" to someone who, no later than this 
afternoon, helped a new contributor make their first edit on this project.


Cheers.



Maarten




Re: [Wikidata] On traceability and reliability of data we publish [was Re: [Wikimedia-l] Solve legal uncertainty of Wikidata]

2018-07-07 Thread mathieu lovato stumpf guntz

On 07/07/2018 at 19:55, Stas Malyshev wrote:


I think this assertion (that attribution requirements grow) is factually
true. Each data piece from a CC-BY data set needs to carry attribution. If
your data needs require combining several data sets, each of them needs
to carry attribution. This attribution should be carried through all
data processing pipelines. You may be OK with this growth, but as I just
explained above, these requirements, while being onerous for people that
don't need to trace each piece of data, are still unsatisfactory in many
cases for those that do. So having CC-BY would be both onerous and useless.

Hi Stas,

The attribution needs to be carried only through the processing pipelines 
whose results need to be published.
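
To make the point concrete, here is a minimal sketch of what "carrying 
attribution through a pipeline" could look like. It is only an 
illustration: the Dataset class, combine() and publish() are hypothetical 
names invented for this example, not part of Wikidata, Wikibase or any 
existing tool, and the sketch makes no claim about what any particular 
license legally requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical container: some rows plus the set of sources to credit."""
    rows: tuple
    attribution: frozenset = frozenset()


def combine(*datasets):
    """Merge datasets; the credits are the union of the inputs' credits."""
    rows = tuple(row for d in datasets for row in d.rows)
    credits = frozenset().union(*(d.attribution for d in datasets))
    return Dataset(rows, credits)


def publish(dataset):
    """Emit the credit list; in this sketch only published results need it."""
    return {"rows": list(dataset.rows), "credits": sorted(dataset.attribution)}


# Three inputs, two distinct sources (all names are made up).
a = Dataset(("x",), frozenset({"Source A"}))
b = Dataset(("y",), frozenset({"Source B"}))
c = Dataset(("z",), frozenset({"Source A"}))

merged = combine(a, b, c)
print(publish(merged)["credits"])
```

In this sketch the credit list grows with the number of distinct sources, 
not with the number of rows or processing steps, and it only has to be 
written out when publish() is called.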


Can we talk about real, concrete examples where attribution would 
seriously prevent any real use case? If all this stands on solid facts, 
surely it shouldn't be too hard to come up with at least one example. 
Otherwise, it is certainly useless to continue this discussion.


Cheers
