[Wikimedia-l] Re: Join the second Eduwiki Sub-saharan regional meeting_Tuesday 27th Sept 2022

2022-09-10 Thread Mamadou Oury Sow
Ok, thank you

On Fri, Sep 9, 2022 at 19:50, ruby damenshie-brown
wrote:

> Hello Everyone,
>
> It’s that time of the year again. We are excited to invite you to join the
> second Eduwiki Sub-Saharan regional meeting. Together with the Wikimedia
> Education team, we will offer you the opportunity to learn about some of
> the wonderful projects happening across the continent, discover new
> projects, and explore exciting opportunities.
>
> We want you to be involved in the broader storytelling and to contribute
> to the growth of the Eduwiki community.
>
> Date: Tuesday 27th September 2022
> Time: 15:00 UTC/GMT
> Join via Zoom here-
> https://wikimedia.zoom.us/j/89225378677?pwd=L2dFbG1rRDA2bnBiS1RlYzA4OTJPUT09
>
> Register/submit a session using this link -
> https://forms.gle/4d3Hh18qeS3TktAV6
>
>
> Listen to the latest podcast and learn more about the
> Eduwiki-Collaborators network -
> https://spotifyanchor-web.app.link/e/onjdWYNf8sb
> Please share and invite members from your communities.
>
> See you soon.
>
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/T5JD4XQVCIPHK3QL522WSJCNRM7YYECB/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: [Foundation-l] Improving search in Wikipedia through quality and concept discovery

2022-09-10 Thread Brian Mingus
You're the search guy?

Why did you marginalize my work?

On Sun, Nov 1, 2009 at 9:15 AM Robert Stojnic  wrote:

>
> Hi Brian,
>
> I'm not sure this is a foundation-l type of discussion, but let me give a
> couple of comments.
> I took the liberty of re-running your sample query "hippie" using google
> and the built-in search on simple.wp; here are the results I got for the
> top 10 hits:
>
> Google:  [1]
> Hippie, Human Be-In, Woodstock Festival, South Park, Summer of Love,
> Lysergic acid diethylamide, Across the Universe (movie), Glam rock,
> Wikipedia:Simple talk/Archive 27, Morris Gleitzman
>
> simple.wikipedia.org: [2]
> Hippie, Flower Power, Counterculture, Human Be-In, Summer of Love,
> Woodstock Festival, San Francisco California, Glam Rock, Psychedelic
> pop, Neal Cassady
>
> LDA (your method results from your e-mail):
> Acid rock, Aldeburgh Festival, Anne Murray, Carl Radle, Harry Nilsson,
> Jack Kerouac, Phil Spector, Plastic Ono Band, Rock and Roll, Salvador
> Allende, Smothers brothers, Stanley Kubrick
>
> Personally, I think the results provided by the internal search engine
> are the best, maybe even slightly better than google's, and I'm not sure
> what kind of relatedness LDA captures.
>
> If we were to systematically benchmark these methods on en.wp I think
> google would be better than internal search, mainly because it can
> extract information from pages that link to wikipedia (which apparently
> doesn't work as well for simple.wp). But that is beside the point here.
>
> I think it is interesting that you found that certain classes of pages
> (e.g. featured articles) could be predicted from some statistical
> properties, although I'm not sure how big your false discovery rate is.
>
> In any case, if you do want to work on improving the search engine and
> classification of articles, here are some ideas I think are worth
> pursuing and problems worth solving:
>
> * integrating trends into search results - if one searches for "plane
> crash" a day after a plane crashes, the first hit should be that plane
> crash and not some random plane crash from 10 years ago - we can conclude
> this is the one the user wants because this page is likely to get a lot
> of page hits. So this boils down to: integrate page-hit data into search
> results in a way that is robust and hard to manipulate (e.g. by running a
> bot or refreshing a page a million times) (see the first sketch after
> this list)
>
> * morphological and context-dependent analysis: if a user enters a query
> like "douglas adams book", what are the concepts in this query? Should we
> group the query as [(douglas adams) (book)] or [(douglas) (adams book)]?
> Can we devise a general rule that quickly and reliably separates the
> query into parts that are related to each other, and then use those parts
> to search the article space for the most relevant articles? (See the
> second sketch after this list.)
>
> * technical challenges: can we efficiently index articles expanded with
> templates, and can we make category intersection efficient (w/o
> subcategories)?
>
> * extracting information: what kinds of information are in wikipedia, and
> how do we properly extract and index them? What about chemical formulas,
> geographical locations, computer code, stuff in templates, tables, image
> captions, mathematical formulas?
>
> * how can we improve on the language model? Can we have smarter stemming
> and word disambiguation (compare "shares" in "shares and bonds" vs "John
> shares a cookie")? What about synonyms and acronyms? Can we improve on
> the language model that "did you mean..." uses to correlate related words?
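>
> To make the first idea concrete, here is a rough sketch in Python - the
> log damping, the cap and the 0.1 mixing weight are all made up, it is
> only meant to show the shape of a boost that is hard to game:
>
>   import math
>
>   # Blend the normal text-relevance score with recent page-view data,
>   # damped so that a bot refreshing a page a million times buys only a
>   # bounded boost.
>   def trend_boosted_score(text_score, views_today, daily_avg, cap=20.0):
>       # Compare today's traffic to the page's own long-run daily average,
>       # so pages that are always popular are not over-favoured.
>       spike = (views_today + 1.0) / (daily_avg + 1.0)
>       # Log-damp and cap the spike, so manipulation has a bounded effect.
>       boost = min(math.log(1.0 + spike), cap)
>       return text_score * (1.0 + 0.1 * boost)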
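>
> And a rough sketch of the query-grouping idea, assuming we already have
> unigram and bigram counts over the article text (the PMI threshold is
> arbitrary):
>
>   import math
>
>   def pmi(w1, w2, unigrams, bigrams, total):
>       # Pointwise mutual information of an adjacent word pair, computed
>       # from precomputed corpus counts.
>       p1 = unigrams.get(w1, 1) / total
>       p2 = unigrams.get(w2, 1) / total
>       p12 = bigrams.get((w1, w2), 1) / total
>       return math.log(p12 / (p1 * p2))
>
>   def segment(query, unigrams, bigrams, total, threshold=2.0):
>       # Greedily join adjacent words whose association is strong enough.
>       words = query.lower().split()
>       groups, current = [], [words[0]]
>       for prev, word in zip(words, words[1:]):
>           if pmi(prev, word, unigrams, bigrams, total) >= threshold:
>               current.append(word)        # e.g. ("douglas", "adams") join
>           else:
>               groups.append(" ".join(current))
>               current = [word]            # e.g. ("adams", "book") split
>       groups.append(" ".join(current))
>       return groups  # "douglas adams book" -> ["douglas adams", "book"]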
>
> Hope this helps,
>
> Cheers, robert (a.k.a "the search guy")
>
> [1] http://www.google.co.uk/search?q=hippie+site%3Asimple.wikipedia.org
> [2]
>
> http://simple.wikipedia.org/w/index.php?title=Special%3ASearch=Hippie=Search
>
>
> Brian J Mingus wrote:
> > This paper (first reference) is the result of a class project I was part
> > of almost two years ago for CSCI 5417 Information Retrieval Systems. It
> > builds on a class project I did in CSCI 5832 Natural Language Processing,
> > which I presented at Wikimania '07. The project was very late, as we
> > didn't send the final paper in until the day before New Year's. This
> > technical report was never really announced, as far as I recall, so I
> > thought it would be interesting to look briefly at the results. The goal
> > of this paper was to break articles down into surface features and
> > latent features and then use those to study the rating system being
> > used, predict article quality, and rank results in a search engine. We
> > used the [[random forests]] classifier, which allowed us to analyze the
> > contribution of each feature to performance by looking directly at the
> > weights that were assigned. While the surface analysis was performed on
> > the whole English Wikipedia, the latent analysis was performed on the
> > Simple English Wikipedia (it is more expensive to compute).
> >
> > = Surface features =
> > * Readability measures are the
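
For concreteness, the classifier-plus-feature-weights setup described above
boils down to something of this shape (an illustrative sketch in
scikit-learn, not the code from the paper; the feature names are
placeholders):

  from sklearn.ensemble import RandomForestClassifier

  def rank_surface_features(feature_matrix, labels, feature_names):
      # feature_matrix: one row per article, one column per surface feature
      # (e.g. length, number of references, a readability score);
      # labels: each article's assessment class (e.g. featured or not).
      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      clf.fit(feature_matrix, labels)
      # The per-feature importances play the role of the "weights" above:
      # they show how much each feature contributes to the prediction.
      ranked = sorted(zip(feature_names, clf.feature_importances_),
                      key=lambda pair: pair[1], reverse=True)
      return clf, ranked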