Just also wanted to note that these paws-public URLs will break in the
near-to-mid future :)
On Mon, Mar 7, 2016 at 4:22 PM, Aaron Halfaker wrote:
Got some work done here. I'm using this as an opportunity to test out
PAWS.
See
http://paws-public.wmflabs.org/paws-public/EpochFail/projects/headings/extract_headings.ipynb
It's still running right now, but I should have an output file that we can
download and/or load into MySQL soon.
-Aaron
Bumping this thread - has anyone made progress on this, for example to
determine the percentage of enwiki articles that contain one of these
standard sections?
(I'm also curious how Danny B - BCCed - generates the lists at
https://cs.wiktionary.org/wiki/User:Danny_B./Datamining/Nadpisy that
Petr
Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
- Jonathan
-- Forwarded message --
From: Amir E.
Yes, that's the idea more or less, but I'm not sure that our search engine
is able to search for headings, though I might be wrong. I suspect,
however, that it will be required to process dumps article by article (or
at least a random sample), and in big projects this could be extremely time
Would it be possible to run a search on the full text of Wikipedias for
lines that start and end with ==, ===, , and lines that start
with ;, then make a list of those strings, and count the number of times
that each title appears in the list?
Pine
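
For illustration, here is a minimal sketch of the counting approach described above. It assumes the wikitext of each article is already available as a plain string (for example, read from a dump or a random sample of one). The names `extract_titles` and `count_titles` are hypothetical and not from any existing tool, and the regular expressions only approximate how MediaWiki parses headings.

```python
import re
from collections import Counter

# Heading lines: the same run of 2-6 "=" on both sides, e.g. "== History ==".
# This is an approximation of MediaWiki heading syntax, not the real parser.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Lines starting with ";" (definition-list terms, sometimes used as
# pseudo-headings). Anything after the ";" is kept as the title.
SEMICOLON_RE = re.compile(r"^;\s*(.+?)\s*$")


def extract_titles(wikitext):
    """Return the heading-like strings found in one article's wikitext."""
    titles = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line) or SEMICOLON_RE.match(line)
        if match:
            # The title is the last capture group in either pattern.
            titles.append(match.group(match.lastindex))
    return titles


def count_titles(articles):
    """Count how often each title appears across an iterable of articles."""
    counts = Counter()
    for text in articles:
        counts.update(extract_titles(text))
    return counts
```

Since `count_titles` accepts any iterable of strings, it could be fed one article at a time rather than loading everything into memory, and `counts.most_common(50)` would then list the most frequent titles.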
On Jul 13, 2015 10:29 AM, Jonathan Morgan