Re: [Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2016-03-07 Thread Yuvi Panda
Just also wanted to note that these paws-public URLs will break in the near-to-mid future :)

On Mon, Mar 7, 2016 at 4:22 PM, Aaron Halfaker wrote:
> Got some work done here. I'm using this as an opportunity to test out PAWS.
>
> See
>

Re: [Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2016-03-07 Thread Aaron Halfaker
Got some work done here. I'm using this as an opportunity to test out PAWS.

See http://paws-public.wmflabs.org/paws-public/EpochFail/projects/headings/extract_headings.ipynb

It's still running right now, but I should have an output file that we can download and/or load into MySQL soon.

-Aaron

Re: [Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2016-03-02 Thread Tilman Bayer
Bumping this thread - has anyone made progress on this, for example to determine the percentage of enwiki articles that contain one of these standard sections? (I'm also curious how Danny B - BCCed - generates the lists at https://cs.wiktionary.org/wiki/User:Danny_B./Datamining/Nadpisy that Petr
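For the percentage question above, a minimal sketch of one way to compute it, assuming the article wikitext is already available as plain strings. The function names and the example titles ("References", "See also") are placeholders for illustration only; they are not the list discussed in the thread, and this is not the method used in the PAWS notebook.

```python
import re

# Wikitext heading line such as "== References ==" (levels 2 to 6).
# The backreference \1 requires the same number of "=" on both sides.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def has_standard_section(wikitext, standard_titles):
    """Return True if any heading in the page matches one of the given titles."""
    wanted = {title.casefold() for title in standard_titles}
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match and match.group(2).casefold() in wanted:
            return True
    return False


def percent_with_standard_section(pages, standard_titles):
    """Percentage of pages that contain at least one of the given section titles."""
    total = 0
    hits = 0
    for wikitext in pages:
        total += 1
        if has_standard_section(wikitext, standard_titles):
            hits += 1
    return 100.0 * hits / total if total else 0.0


# Hypothetical sample pages, not real article text.
sample_pages = [
    "== History ==\nSome text.",
    "== References ==\n* A source",
    "A page with no headings at all.",
    "=== see also ===\n* Another page",
]
sample_percent = percent_with_standard_section(sample_pages, ["References", "See also"])
```

Matching is case-insensitive here, which is a design choice rather than anything stated in the thread; whether "See also" and "see also" should count as the same title depends on the question being asked.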

[Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2015-07-13 Thread Jonathan Morgan
Cross-posting this request to wiki-research-l.

Anyone have data on frequently used section titles in articles (any language), or know of datasets/publications that examined this? I'm not aware of any off the top of my head, Amir.

- Jonathan

-- Forwarded message --
From: Amir E.

Re: [Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2015-07-13 Thread Amir E. Aharoni
Yes, that's the idea more or less, but I'm not sure that our search engine is able to search for headings, though I might be wrong. I suspect, however, that it will be required to process dumps article by article (or at least a random sample), and in big projects this could be extremely time

Re: [Wiki-research-l] Fwd: [Wikitech-l] statistics about frequent section titles

2015-07-13 Thread Pine W
Would it be possible to run a search on the full text of Wikipedias for lines that start and end with ==, ===, , and lines that start with ;, then make a list of those strings, and count the number of times that each title appears in the list?

Pine

On Jul 13, 2015 10:29 AM, Jonathan Morgan
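The line-matching idea in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: the page text is assumed to be available as plain wikitext strings (for example, read from a dump), the function names are made up for this sketch, and lines starting with ";" are treated as pseudo-headings as the message suggests.

```python
import re
from collections import Counter

# Wikitext heading line such as "== History ==" (levels 2 to 6).
# The backreference \1 requires the same number of "=" on both sides.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")
# Definition-list term line such as ";Notes", sometimes used as a pseudo-heading.
PSEUDO_RE = re.compile(r"^;[ \t]*(.+?)[ \t]*$")


def extract_titles(wikitext):
    """Return the heading and pseudo-heading titles found in one page."""
    titles = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line) or PSEUDO_RE.match(line)
        if match:
            # The title is the last capture group in both patterns.
            titles.append(match.group(match.lastindex))
    return titles


def count_titles(pages):
    """Count how often each title appears across an iterable of page texts."""
    counts = Counter()
    for wikitext in pages:
        counts.update(extract_titles(wikitext))
    return counts


# Hypothetical sample pages, not real article text.
sample_pages = [
    "== History ==\nText\n=== Early years ===\n== References ==",
    "==History==\n;Notes\n== References ==",
]
title_counts = count_titles(sample_pages)
```

Because `count_titles` accepts any iterable, pages can be streamed one at a time, which matters for the large projects mentioned elsewhere in the thread. `title_counts.most_common(20)` would then list the most frequent titles.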