[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-12-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

xqt i...@gno.de changed:

   What  |Removed |Added

   Status|ASSIGNED|NEW
   CC    |        |i...@gno.de

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #6 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
Nicdumz, do you have the time to work on this? It's been stale for a while.

Sigmaoctantis, sorry for the very slow uptake. It's a general problem for most
patches that are larger than the 'glance over it, looks ok, commit' language
updates. I'll see if I can find the time to review it.

In any case, the patch does not apply cleanly currently, so it needs some more
fiddling.



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
patch for replace.py solve_disambiguation.py pagegenerators.py



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com changed:

   What    |Removed |Added

   See Also|        |https://sourceforge.net/p/pywikipediabot/patches/326



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
Assigning to nicdumz for processing.



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #7 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
- **assigned_to**: nobody --> nicdumz



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
Thanks for the quick review. I will try to address the 
various points and have included a new version of the patch.

a.  I added a bit more text to the source and reformatted
part of the code, but I didn't want to change existing 
code more than needed.

b.  generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory

Previously, it first ran the next generator and then deleted 
from its result the pages that were in the filter file.

c.  replace.py command line options

I added several command line options to define which
pages should be skipped the next time. One could edit 
replace.py directly, but it seemed cleaner to provide 
all options at command line level.

toobaz excluded pages where a replacement was manually 
rejected (N). The option -exclude will keep this 
functionality.

Personally, I find it more useful to filter pages that
were edited in a previous run. This keeps the bot from 
repeating the same edit later, after someone has reverted 
a previous edit. Option -editonce provides this.

-treatonce combines the two.

-scanonce keeps the bot from re-fetching the same page
in a second run, even if the regex didn't match it in 
the first run. (I fixed an omission for skipped in
the second patch.)

Without the different options, the additions to replace.py
would be much shorter.

d.  I had to insert several break statements in replace.py
so that nothing but N gets to the stage confusingly labeled
"choice must be 'N'" in the code.

e.  FilterFileAppend is based on the function from
solve_disambiguation. The advantage of writing each
page to the file is that it won't miss one if it's
interrupted or crashes. This mode from 
solve_disambiguation remains unchanged.

f.  The same goes for the file format. Up to now, I haven't
had any problems with it, and it worked OK with a
title 臺灣Taiwanāàäà I just tested. urlname was also
used by PrimaryIgnoreManager. For backward compatibility, 
maybe it should be kept.
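
The flow described in points b. and e. can be sketched roughly as
follows. This is only an illustration, not the code from the patch:
the function names are made up, plain title strings stand in for Page
objects, and io.open is used where the patch uses codecs.open.

```python
import io
import os


def load_filter(filename):
    """Return the set of titles in the filter file (empty if no file)."""
    if not os.path.exists(filename):
        return set()
    with io.open(filename, 'r', encoding='utf-8') as f:
        return set(line.strip() for line in f if line.strip())


def filtered_generator(generator, filename):
    """Read the filter file first, then skip remembered titles on the fly."""
    seen = load_filter(filename)
    for title in generator:
        if title not in seen:
            yield title


def filter_file_append(filename, title):
    """Append one title right away, so a crash loses no treated page."""
    with io.open(filename, 'a', encoding='utf-8') as f:
        f.write(title + u'\n')
```

Each call to filter_file_append opens and closes the file, which is
what makes it robust against interruption at the cost of extra I/O.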



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
patch for replace.py solve_disambiguation.py pagegenerators.py (revised)



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
Wow, that's a big patch =)

* codecs is fine with me
* can you avoid lines > 80 characters? I know that this is not something we
do everywhere, but that's bad looking code. Same goes for if foo: bar. Please
skip a line.
* can you document thoroughly what's being done? parameters in the generators?
In replace.py? I find it really hard to understand the choice table in the
docstring explaining -scanonce & others.
* What's this:
+f = codecs.open(filename, 'r', 'utf-8')
+f.close()
??

I am also not convinced by the fact that after each page, FilterFileAppend is
called, and #1 path is computed, #2 a file is opened, written in, and closed.
I'm thinking that a possible cleaner way to do this would be to have a Filter
object: put everything you need in it (an opened file descriptor, a list of
titles to ignore if you need to use this, etc...) and keep a reference to it
from the replace & disambig bots. How does that sound to you?
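
One possible shape for such a Filter object, as a minimal sketch only.
The class name, method names and the one-title-per-line format are
assumptions for illustration and are not taken from the patch:

```python
import io
import os


class PageFilter(object):
    """Hypothetical filter: known titles plus one file kept open."""

    def __init__(self, filename):
        self.titles = set()
        # Load titles remembered by earlier runs, if any.
        if os.path.exists(filename):
            with io.open(filename, 'r', encoding='utf-8') as f:
                self.titles = set(l.strip() for l in f if l.strip())
        # Open once for appending; reused for every page.
        self._file = io.open(filename, 'a', encoding='utf-8')

    def __contains__(self, title):
        return title in self.titles

    def add(self, title):
        """Remember a title; flush so an interrupted run keeps it."""
        if title not in self.titles:
            self.titles.add(title)
            self._file.write(title + u'\n')
            self._file.flush()

    def close(self):
        self._file.close()
```

A bot would create one PageFilter at startup, test "title in filter"
before treating a page, call add() afterwards, and close() on exit.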

I also know that Daniel wanted first to keep the same file format, but... a
couple of things are wrong here:
* if you output titles with page.urlname() it will not be possible to read
the file with TextfilePageGenerator afaik. Think of special characters, being
url encoded, and not decoded.
* if you want to use a Page title for a filename, you want
Page.titleforFilename, not Page.urlname
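
To illustrate the encoding concern, here is a small Python 3 sketch.
It uses the standard library's quote/unquote as a stand-in for URL
encoding in general; it does not call page.urlname() and makes no
claim about how that method is implemented.

```python
from urllib.parse import quote, unquote

title = '臺灣Taiwanāàäà'

# What a URL-encoded entry in the filter file would look like.
stored = quote(title)

# A reader that does not decode compares the raw entry to the title.
matches_without_decoding = (stored == title)   # False

# Only after decoding does the entry match the title again.
matches_after_decoding = (unquote(stored) == title)   # True

print(matches_without_decoding, matches_after_decoding)
```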

Thank you!



[Bug 54574] Re 1843798: Add capabiliy to remember pages to replace.py

2013-09-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54574

--- Comment #8 from Kunal Mehta (Legoktm) legoktm.wikipe...@gmail.com ---
- **priority**: 5 --> 7
