Robert Rohde wrote:
>Which, after substituting "display:none;" I think translates directly to
>the regex search:
>
>insource:/style[ ]*=[ ]*\"display:[ ]*none;[ ]*\"/i
>
>That gives me 487 articles.

Almost, but not quite. You actually want this:

insource:/style[ ]*=[ ]*\"display:[ ]*none;?[ ]*\"/i

With the semicolon made optional, the search results increase from 487 to
2,487 currently on the English Wikipedia. The normalization script
(<https://phabricator.wikimedia.org/P2229>) made the trailing semicolon
consistent, in addition to lowercasing and trying to account for strange
spacing. For whatever reason, "display: none;" is often written without the
trailing semicolon in main namespace pages on the English Wikipedia.

I was worried that I may have made a major coding mistake, so I re-ran my
script using this pattern:

pattern = r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"'

The results are available here: <https://phabricator.wikimedia.org/P2255>.
Sixteen articles have over 1,000 instances of "display: none;" each! The
total is 142,176 instances of "display: none;" (normalized) in 2,507 main
namespace pages on the English Wikipedia, as of about 2015-10-02.

>I am happy to agree that searching the XML should be better than the local
>search tool, but I still find these numbers hard to reconcile.

After re-reviewing the code and re-running the script to focus on
"display: none;" specifically, there's strong evidence to suggest that the
numbers are accurate, if a bit surprising in some cases. :-)

MZMcBride

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
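
For anyone who wants to see the effect of the optional semicolon locally, here is a minimal Python sketch. It is not the script from P2229 or P2255. The sample wikitext, the names STRICT, LOOSE and count_matches, and the use of re.IGNORECASE (meant to mirror the /i flag of the insource search) are all assumptions made for illustration.

```python
import re

# Quoted search pattern: trailing semicolon required.
# re.IGNORECASE is an assumption, mirroring the /i flag of insource:.
STRICT = re.compile(r'style[ ]*=[ ]*"display:[ ]*none;[ ]*"', re.IGNORECASE)

# Pattern from the re-run: semicolon optional, looser spacing.
LOOSE = re.compile(
    r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"',
    re.IGNORECASE,
)

# Hypothetical wikitext, made up for this example.
SAMPLE = (
    '<span style="display:none;">a</span>\n'
    '<span style="display: none">b</span>\n'
    '<div STYLE = "display:none" >c</div>\n'
    '<span style="display:inline">d</span>\n'
)


def count_matches(pattern: re.Pattern, wikitext: str) -> int:
    """Count non-overlapping matches of pattern in wikitext."""
    return len(pattern.findall(wikitext))


if __name__ == "__main__":
    print("strict:", count_matches(STRICT, SAMPLE))
    print("loose:", count_matches(LOOSE, SAMPLE))
```

On this made-up sample the strict pattern misses the two instances written without a trailing semicolon, which is the same kind of gap as 487 versus 2,487 in the search results.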
