[For others: I'm trying to list all page titles that contain any character 
*other than* alphanumerics, spaces, underscores and dashes.]
I'm seeing the same. I went to a regex chat and asked for a regex, which 
worked on the regex tester http://regexpal.com/ but not on the bot :-(.
python pagegenerators.py -titleregex:.*[^\w\s-].*    ... [1]
I also tried these: 
(?=.*[^\w\s-])(.*[\w\s-].*)
(?=.*[^\w\s-]).*

#1 hits all pages, including ones that contain only that set of characters; for 
example, it's also showing "Apple".

I want the bot to show me pages with the following titles:
My apple is sweet (song)
He said "hello" to me

But it should not show:
My apple is sweet - song
He said hello to me
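
One way to check the pattern independently of the bot is to run it in plain Python. The sketch below is not part of pywikipedia; the helper name has_special_char is made up for illustration. It uses re.search, which scans the whole title, so no leading or trailing .* is needed. Note that in Python 3, \w also matches non-ASCII letters such as "ø".

```python
import re

# Any single character that is not a word character (letter, digit,
# underscore), whitespace, or a dash.
SPECIAL = re.compile(r"[^\w\s-]")


def has_special_char(title):
    """Return True if the title contains at least one disallowed character."""
    return SPECIAL.search(title) is not None


titles = [
    'My apple is sweet (song)',
    'He said "hello" to me',
    'My apple is sweet - song',
    'He said hello to me',
    'Apple',
]
for title in titles:
    print(has_special_char(title), title)
```

In this sketch the first two titles are reported as True and the last three as False, which is the behaviour wanted from the bot.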








________________________________
 From: Jon Harald Søby <[email protected]>
To: Eric K <[email protected]> 
Sent: Sunday, January 15, 2012 1:50 PM
Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update 
links
 

I'm trying to figure this one out too... I've been using mostly the same regex: 
[^A-Za-z0-9-\s]*, but mine hits _every_ page no matter what. Something's fishy. 
Will look more into it.
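
A possible explanation, checked here in plain Python rather than in the bot itself: * means "zero or more", so the pattern can succeed by matching nothing at all. A small sketch:

```python
import re

pattern = re.compile(r"[^A-Za-z0-9-\s]*")

# "*" allows zero repetitions, so the pattern matches the empty string
# at the start of any title, even one made only of allowed characters.
m = pattern.match("Apple")
print(m is not None)    # a match object is returned
print(repr(m.group()))  # the matched text is the empty string
```

Using + instead of * requires at least one disallowed character before the pattern can match.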


On 15 January 2012 at 19:49, Eric K <[email protected]> wrote:

>I'm trying this, for example:
>python pagegenerators.py -titleregex:[^A-Za-z0-9-\s]+
>(trying to make it say: don't match any letters, numbers, spaces or dashes)
>It only brings up titles which begin with a " (quotation mark), but it misses 
>titles that have the " somewhere in the middle.
>
>
>Then I removed the ^ from the regex to see what that does, and it gets all 
>titles, including those which have " and so on.
>python pagegenerators.py -titleregex:[A-Za-z0-9-\s]+
>
>
>It's probably my regex that is flawed :).
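
The behaviour described above (only titles that start with a quotation mark are listed) is what Python's re.match would give, since it tries the pattern only at the beginning of the string, whereas re.search tries every position. Whether pagegenerators.py applies -titleregex with re.match is an assumption here, not something confirmed in this thread. A small sketch in plain Python, with made-up sample titles:

```python
import re

pattern = re.compile(r"[^A-Za-z0-9-\s]+")

starts_with_quote = '"Hello" he said'
quote_in_middle = 'He said "hello" to me'

# re.match only tries the pattern at position 0.
print(bool(pattern.match(starts_with_quote)))
print(bool(pattern.match(quote_in_middle)))

# re.search tries every position in the string.
print(bool(pattern.search(quote_in_middle)))

# With match semantics, a leading ".*" lets the class be found anywhere.
anywhere = re.compile(r".*[^A-Za-z0-9-\s]")
print(bool(anywhere.match(quote_in_middle)))
```

If the assumption holds, prefixing the pattern with .* would be one way to catch a quotation mark in the middle of a title.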
>
>
>
>
>
>________________________________
> From: Eric K <[email protected]>
>To: Jon Harald Søby <[email protected]> 
>Sent: Sunday, January 15, 2012 1:40 AM
>
>Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update 
>links
> 
>
>
>Thanks, I'm learning! I tried step one for generating the title list. It was 
>saying "incomplete XML data". It was doing a text replace and I was able to 
>make it work by doing "-debug" but it was slow.
>Then I found out this command:
>python pagegenerators.py -titleregex:apple
>http://www.mediawiki.org/wiki/Manual:Pywikipediabot/pagegenerators.py
>(there is no -save option for this script but if I add "> filelist.txt" at the 
>end, it outputs the screen output to that text file, which works)
>
>So this one only works on the page titles (so it's quicker) and it does 
>work, e.g. for the above, it will list any pages beginning with the word 
>"apple". I actually found there are other characters in the database, so the 
>best way would be to do this kind of search:
>- If a page contains any character which is not:
>--- alphanumeric 
>
>--- underscore 
>
>--- dash 
>
>--- space
>
>Then include that page in the list.
>So "Hello 123" would be excluded but "Hello 123$" would be included.
>pagegenerators.py does not have an "exclude" option like replace.py has.
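
Since the output can be redirected to a text file, one workaround for the missing "exclude" option is to filter that file afterwards. This is a minimal sketch, assuming the file holds one title per line; the function name titles_with_special_chars is hypothetical, and the exact format the script writes may include numbering or other text that would need to be stripped first.

```python
import re

# Allowed characters: ASCII letters, digits, underscore, dash, space.
DISALLOWED = re.compile(r"[^A-Za-z0-9_ -]")


def titles_with_special_chars(lines):
    """Keep only the titles containing at least one disallowed character."""
    result = []
    for line in lines:
        title = line.strip()
        if title and DISALLOWED.search(title):
            result.append(title)
    return result


sample = ["Hello 123\n", "Hello 123$\n", "My_page-2\n", "\n"]
print(titles_with_special_chars(sample))
```

To run it on a real list, the lines could be read with open("filelist.txt", encoding="utf-8") and passed to the function in the same way.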
>
>
>
>Do you know of a regex that will work? 
>
>
>
>
>
>
>
>
>________________________________
> From: Jon Harald Søby <[email protected]>
>To: Eric K <[email protected]> 
>Sent: Saturday, January 14, 2012 10:09 PM
>Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / update 
>links
> 
>
>Good luck! :-)
>My regex skills are quite rudimentary, so be cautious when doing the 
>replacement step 5 -- the regexes may catch something they shouldn't. Please 
>let me know how it goes! :-)
>
>
>On 15 January 2012 at 04:43, Eric K <[email protected]> wrote:
>
>Hi Jon, wow, it really is awesome that you were able to do this with the 
>provided scripts. I could never have come up with that. I'll try this right 
>away and let you know how it goes.
>>
>>
>>
>>
>>
>>
>>
>>________________________________
>> From: Jon Harald Søby <[email protected]>
>>To: Eric K <[email protected]>; Pywikipedia discussion list 
>><[email protected]> 
>>Sent: Saturday, January 14, 2012 8:47 PM
>>Subject: Re: [Pywikipedia-l] Need help for: Page rename / insert text / 
>>update links
>> 
>>
>>
>>
>>2012/1/15 Eric K <[email protected]>
>>
>>Hi guys,
>>>I just installed the pywikipedia bot on my wiki yesterday. I'm new to Python 
>>>but I can try to learn it since I'm familiar with PHP. It would take me a 
>>>while, though, to make this first bot since I'm new to the language. The tasks 
>>>are pretty straightforward. I would like the bot to run without any user input 
>>>and do all of this by itself:
>>>
>>>
>>>1. For every page on the wiki, check whether its title contains any of these 
>>>three characters: ( , ) , : . Any page containing any of these characters 
>>>(round brackets and colon) will be moved to a new title. The original title 
>>>is var_1.
>>>
>>>
>>
>>>
>>>2. For the new title, brackets are simply deleted, and the : (colon) is 
>>>replaced with a " - " (a dash with a space on each side). The new title 
>>>generated is var_2.
>>>
>>
>>>
>>>3. Insert this text at the top of this page: {{page_rename|var_1}}, and save 
>>>page.
>>> 
>>
>>>
>>>4. Find any existing links on the site to this page which would be in the 
>>>format of [[var_1]], and change them to [[var_2|var_1]].
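
The four steps above can be sketched as plain string operations in Python. This does not talk to the wiki or use any pywikipedia API; the function names are made up for illustration, and the sample title is invented.

```python
RENAME_CHARS = "():"


def needs_rename(title):
    """Step 1: does the title contain '(', ')' or ':'?"""
    return any(ch in title for ch in RENAME_CHARS)


def new_title(old_title):
    """Step 2: drop the brackets and turn ':' into ' - '."""
    return old_title.replace("(", "").replace(")", "").replace(":", " - ")


def banner(old_title):
    """Step 3: the template call to put at the top of the page."""
    return "{{page_rename|%s}}" % old_title


def fix_links(text, old_title):
    """Step 4: rewrite [[var_1]] as [[var_2|var_1]]."""
    return text.replace("[[%s]]" % old_title,
                        "[[%s|%s]]" % (new_title(old_title), old_title))


old = "Apples:Sweet (song)"
print(needs_rename(old))
print(new_title(old))
print(banner(old))
print(fix_links("See [[Apples:Sweet (song)]].", old))
```

Here var_1 corresponds to old and var_2 to new_title(old).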
>>>
>>>
>>>I don't need any menus or other functionality. Is this something pretty 
>>>straightforward to make? I would appreciate any tips/help, and if it's 
>>>something that can be made pretty easily, I would be really thankful if 
>>>someone could do this for me or give me a good start.
>>>I've looked at some of the existing pywikipedia bot scripts (basic.py, 
>>>movepages.py) but none of them would work for me, and being new to Python, it 
>>>would take me a long time to do what I need, but in any case I will learn a 
>>>lot in this first attempt.
>>>
>>>
>>>
>>>Thanks,
>>>Eric
>>>_______________________________________________
>>>Pywikipedia-l mailing list
>>>[email protected]
>>>https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>>
>>>
>> 
>>This is how I would do it. It is probably a hacky solution, and there may be 
>>better/more efficient ways of doing it, but it should work.
>>
>>Step 1: Getting list of pages to change
>>
>>Run this line:
>>
>>
>>python replace.py -regex -requiretitle:"\(|\)|:" "[A-Za-z0-9]" "test" 
>>-save:Pagestoberenamed.txt -start:!
>>
>>Press "a" when it prompts.
>>
>>This will not change anything, only save a list of all pages that 
>>need to be renamed. The script assumes there is either a letter or 
>>number in all the pages that need to be changed.
>>
>>Step 2: Put that template on top of the pages
>>
>>Run this line:
>>
>>python add_text.py -up -text:"{{page_rename|{{subst:PAGENAME}}}}" 
>>-file:Pagestoberenamed.txt
>>
>>Step 3: Creating list for renaming files
>>
>>Open the file "Pagestoberenamed.txt" in a regex-supporting text editor and 
>>use the following regex replacements:
>>
>>Replace:
>>
>>#\[\[([^:]*):([^\]]*)\]\]
>>with
>>
>>[[\1:\2]] [[\1 - \2]]
>>
>>
>>and replace
>>
>>#\[\[([^\(]*)\(([^\)]*)\)([^\]]*)\]\]
>>
>>with
>>
>>[[\1(\2)\3]] [[\1\2\3]]
>>
>>
>>
>>I don't actually have a text editor that supports regex, so instead I 
>>copy-pasted the contents of that file into a sandbox page, and ran the 
>>following line:
>>
>>python replace.py -page:SANDBOX -regex 
>>"#\[\[([^:]*):([^\]]*)\]\]" "[[\1:\2]] [[\1 - \2]]" 
>>"#\[\[([^\(]*)\(([^\)]*)\)([^\]]*)\]\]" "[[\1(\2)\3]] [[\1\2\3]]"
>>
>>Save the text as Pagerenaming.txt
>>
>>
>>
>>Hacky solution, but it should work.
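
If no regex-capable editor is at hand, the same two replacements can also be tried in plain Python. This is only a sketch: it assumes each saved line looks like #[[Title]], as the regexes above imply, and the helper name to_pair is made up. The patterns are applied line by line so that a match cannot run across two entries.

```python
import re

# The two patterns from Step 3, unchanged.
COLON = re.compile(r"#\[\[([^:]*):([^\]]*)\]\]")
PARENS = re.compile(r"#\[\[([^\(]*)\(([^\)]*)\)([^\]]*)\]\]")


def to_pair(line):
    """Turn one '#[[Old title]]' line into '[[Old title]] [[New title]]'."""
    line = COLON.sub(r"[[\1:\2]] [[\1 - \2]]", line)
    line = PARENS.sub(r"[[\1(\2)\3]] [[\1\2\3]]", line)
    return line


saved = ["#[[Foo:Bar]]", "#[[My apple is sweet (song)]]"]
for line in saved:
    print(to_pair(line))
```

In this sketch a title containing both a colon and brackets only gets the colon rule applied, because the leading # is gone after the first substitution; such lines would need a second look.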
>>
>>Step 4: Moving the pages
>>
>>Run this line:
>>
>>python movepages.py -pairs:Pagerenaming.txt
>>
>>It will not prompt you, it will move the pages as specified in 
>>Pagerenaming.txt
>>If you do not want to have redirects from the old page names, use 
>>-noredirect as an additional argument. This may depend on how your wiki 
>>is set up; I know Wikipedias didn't have this option until relatively 
>>recently (and maybe it is only for administrators now).
>>
>>Step 5: Fixing links
>>Links can be fixed using this line:
>>
>>python replace.py -regex 
>>"\[\[([^:]*):([^\]]*)\]\]" "[[\1 - \2|\1: \2]]" 
>>"\[\[([^\(|^\[]*)\(([^\)]*)\)([^\]]*)\]\]" "[[\1\2\3|\1(\2)\3]]" 
>>-start:!
>>
>>If you think it is too slow, you can append -pt:1 to that.
>>
>>With this last one you should be careful, and approve quite a few changes 
>>manually first (pressing "y" and not "a"), in case something is fishy with 
>>the regex.
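
Before running this on the wiki, the two link regexes can be previewed on sample text in plain Python. This is a sketch only; the helper name preview and the sample strings are made up, and nothing here calls replace.py.

```python
import re

# The two patterns from Step 5, unchanged.
COLON_LINK = re.compile(r"\[\[([^:]*):([^\]]*)\]\]")
PAREN_LINK = re.compile(r"\[\[([^\(|^\[]*)\(([^\)]*)\)([^\]]*)\]\]")


def preview(text):
    """Apply both link replacements to a string and return the result."""
    text = COLON_LINK.sub(r"[[\1 - \2|\1: \2]]", text)
    text = PAREN_LINK.sub(r"[[\1\2\3|\1(\2)\3]]", text)
    return text


print(preview("See [[Foo:Bar]]."))
print(preview("Listen to [[My apple is sweet (song)]]."))
# A namespace link also contains a colon, so it is rewritten too.
print(preview("[[Category:Fruit]]"))
```

The last line shows the kind of unwanted hit that makes it worth approving the first changes one by one: a link such as [[Category:Fruit]] matches the colon pattern as well.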
>>
>>Hope this helps.
>>
>>-- 
>>
>>Kind regards,
>>Jon Harald Søby
>>
>>
>>
>
>
>-- 
>
>Kind regards,
>Jon Harald Søby
>
>
>
>
>


-- 

Kind regards,
Jon Harald Søby
