Re: [Wikimedia-l] Idea of a new project: Wikifacts ?

2021-02-10 Thread Douglas Clark
I am coming to the realization that machine learning runs somewhat counter
to wiki culture, but I'm going to keep trying to help in the way I know. My
new project proposal, WikiPragmatica (naming things is something I love
but am poor at, so sorry), is a machine-learning-based curation of
paraphrases. Its special sauce is that the curation retains
the original context, hence the nod to pragmatics. Once any corpus is fully
indexed, you could still read the source material in the graph itself, because
the connectors (edges) point to which concepts come before and after each one.
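
To make the structure above concrete, here is a minimal Python sketch of
such a graph. It is only an illustration under stated assumptions, not the
proposal's actual design: the class and field names are hypothetical, and
the assignment of each sentence to a node id (the machine learning part) is
assumed to have happened already and is passed in by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParaphraseNode:
    """One concept: a set of sentences that paraphrase each other."""
    node_id: str
    paraphrases: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


class ParaphraseGraph:
    """Hypothetical container; all names here are assumptions."""

    def __init__(self) -> None:
        self.nodes: dict[str, ParaphraseNode] = {}
        # (doc_id, position) -> (node_id, original sentence)
        self.occurrences: dict[tuple[str, int], tuple[str, str]] = {}
        # Directed edges: (prior node, next node) -> times seen in sources
        self.edge_counts: dict[tuple[str, str], int] = {}

    def add_document(self, doc_id, labelled_sentences):
        """labelled_sentences: list of (node_id, sentence) in reading order."""
        prev = None
        for pos, (node_id, sentence) in enumerate(labelled_sentences):
            node = self.nodes.setdefault(node_id, ParaphraseNode(node_id))
            node.paraphrases.add(sentence)
            self.occurrences[(doc_id, pos)] = (node_id, sentence)
            if prev is not None:
                key = (prev, node_id)
                self.edge_counts[key] = self.edge_counts.get(key, 0) + 1
            prev = node_id

    def read_document(self, doc_id):
        """Recover the source text of one document from the graph."""
        positions = sorted(p for d, p in self.occurrences if d == doc_id)
        return [self.occurrences[(doc_id, p)][1] for p in positions]
```

Because every occurrence keeps its document and position, read_document
returns the original sentences in order, which is the sense in which the
source material stays readable in the graph.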

This is important to fact checking, and to misinformation detection more
generally, because each node (a collection of sentences that paraphrase
each other) will carry Wikidata and other associated metadata. Thus, for any
speech, article, or propaganda piece, we can detect deviations from
pragmatics as well as statements that contradict the associated
metadata. A deviation from pragmatics may stem from a piece of
misinformation using a sequence of concepts that is rarely or never used,
say giving the location of a pizza restaurant, followed by a discussion of
child trafficking at the restaurant. A statement that "Vaccines cause autism"
will immediately be flagged, because the metadata for that concept
will have a "not true" entry in its sentiment section,
no matter how they choose to phrase "Vaccines cause autism."
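
The two checks described above could be sketched roughly as follows, in
Python. This is a hedged illustration rather than a real detector: the
function name, the "sentiment" key, the node ids, and the min_count
threshold are all assumptions, and mapping each sentence of a passage to
its concept node is assumed to be done elsewhere.

```python
def flag_passage(node_ids, edge_counts, metadata, min_count=1):
    """Return flags for a passage given as an ordered list of node ids.

    edge_counts: (prior node, next node) -> times seen in the indexed corpus
    metadata:    node id -> dict of metadata for that concept
    """
    flags = []
    # Check 1: concepts whose metadata marks them as not true.
    for node_id in node_ids:
        if metadata.get(node_id, {}).get("sentiment") == "not true":
            flags.append(("metadata", node_id))
    # Check 2: consecutive concepts rarely or never seen together.
    for prior, nxt in zip(node_ids, node_ids[1:]):
        if edge_counts.get((prior, nxt), 0) < min_count:
            flags.append(("rare_transition", (prior, nxt)))
    return flags


# Toy data, invented purely for illustration.
edge_counts = {("restaurant_location", "menu_review"): 12}
metadata = {"vaccines_cause_autism": {"sentiment": "not true"}}

print(flag_passage(["restaurant_location", "child_trafficking_claim"],
                   edge_counts, metadata))
print(flag_passage(["vaccines_cause_autism"], edge_counts, metadata))
```

The first call flags the unusual transition and the second flags the
concept through its metadata, regardless of how the sentence was worded,
since the lookup is by node rather than by surface text.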

The biggest problem with word-based approaches is that they all lack
higher-order contextual clues. Autism the lexeme is not as useful for fact
checking as the full context of the concept, "Vaccines cause autism."
However, the concept, "Vaccines cause autism," will inherit all of the
context-appropriate metadata from the lexemes that are used to build the
concept! We get all of the power of Wikidata for free, just by reindexing
all Wikipedias to paraphrases. This will also massively help with general
search. Google and all other search indices suffer from a lack of
contextual resolution. Even the near-AI of GPT-3 suffers from repetition,
coherence loss over sufficiently long passages, and contradiction, meaning
it occasionally loses context (according to the GPT-3 paper).
The combination of contextual forensics, metadata evaluation, and
graph-based analytics will provide highly accurate misinformation detection.

In my project proposal, I recommended that we ultimately re-index the web
to paraphrases. While this may sound a bit Dr Evilish, the paraphrase graph
is actually the same thing as a Wikipedia. The paraphrase graph also
documents human knowledge, but is based on how we humans communicate that
knowledge. Some of this knowledge is intended to harm others via
misinformation or outright fraud. The wiki community can certainly build a
reference work of all paraphrases, but sorting through about 500 billion
sentences seems daunting to me without some electrical help.

I do think that a reference work of human communication is in
Wikimedia's wheelhouse, but this reference work also has very practical
applications. Thank you for taking the time to read my thoughts on this
vital topic.

On Tue, Feb 9, 2021 at 5:35 PM Netha Hussain wrote:

> Hi all,
>
> I am generally interested in any project that helps counter misinformation
> on the internet, and I think that our existing projects have limitations in
> calling out fake news. Wikipedia, for example, has dedicated pages
> on misinformation related to various topic areas (such as the
> article on Misinformation related to COVID-19) where fact
> checking can be incorporated. However, such articles do not only contain
> fact-checked statements, but they deal with misinformation in a
> comprehensive way, covering the origin, extent and effect of
> misinformation, in addition to commonly circulated bits of
> mis(dis)information. Another possibility on Wikipedia is to create a list
> of commonly circulated misinformation on notable themes (such as the
> article List of unproven methods against COVID-19).
> It turns out that such lists contain several primary sources as
> citations, because there are too few available secondary sources that call
> out misinformation.
>
> In the realm of misinformation, the existing primary/secondary sources
> only cover the tip of the iceberg, and there is so much more misinformation
> circulating on the internet than is being documented by fact checking
> websites and news media. Another limitation is that it is not possible to
> add a piece of misinformation that you found on social media to a Wikipedia
> page, because that amounts to original research. Searchability is also an
> issue on Wikipedia, and our search interface on Wikipedia is not exactly
> suitable for someone who wants to che

Re: [Wikimedia-l] Idea of a new project: Wikifacts ?

2021-02-04 Thread Douglas Clark
I proposed a project, WikiPragmatica, that can support fake
news detection. The retained context of the paraphrase graph can identify
fake news patterns similar to what MIT does with their detector.

On Thu, Feb 4, 2021 at 12:42 PM Galder Gonzalez Larrañaga <
galder...@hotmail.com> wrote:

> Does Wikinews cover this aspect?
> --
> *From:* Wikimedia-l on behalf
> of Chris Gates via Wikimedia-l
> *Sent:* Thursday, February 4, 2021 8:20 PM
> *To:* Wikimedia Mailing List
> *Cc:* Chris Gates
> *Subject:* Re: [Wikimedia-l] Idea of a new project: Wikifacts ?
>
> Hello,
>
> Independent of my opinions on the validity of such a new Wikimedia
> project, there is currently an experiment of similar goals (and potentially
> structure) over at Twitter:
>
>
> https://blog.twitter.com/en_us/topics/product/2021/introducing-birdwatch-a-community-based-approach-to-misinformation.html
>
>
>
> Best,
> Verm
>
>
> On Thu, Feb 4, 2021 at 2:17 PM Leinonen Teemu wrote:
>
> Hi all,
>
> Has there been any discussion to start a new Wikimedia project focusing on
> fact checking?
>
> Fact checking is of course at the core of editing Wikipedia, but I was
> thinking about a wiki site dedicated to fact checking of
> current events and news. Why would this be important?
>
> (1) There are many fact checking sites in the English-speaking world but
> far fewer elsewhere. I am afraid that there is still greater need for fact
> checking in the rest of the world. {{Citation needed}}
>
> (2) Our community is very well educated to do fact checking the wiki-way.
> Again internationally, many of our community members are real fact
> champions in their home countries and language groups. The practice of
> Wikipedia could be applied to fact checking of fast moving current events
> and news, too.
>
> (3) This could help us bring new young people into the movement, as
> getting started with editing Wikipedias is no longer so easy (because they
> are so good already).
>
> (4) In many parts of the world, fact checking can also be dangerous. With
> our anonymous and community driven practices and services we could protect
> the fact checkers in many parts of the world.
>
> I am not sure what the state of Wikinews is, but my impression is that
> it is not really working. It was a good idea, but maybe wiki or the wiki-way
> is not the way to produce news. Also, the beautiful idea of citizen
> journalism has not really become reality. Maybe we could see whether wiki
> and the wiki-way work better in fact checking.
>
> Peace,
>
>  - Teemu
>
>
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 
>