Re: modifying standard library functionality (difflib)

2010-06-24 Thread Bruno Desthuilliers

Vlastimil Brom wrote:

> Hi all,
> I'd like to ask about the most reasonable/recommended/... way to
> modify the functionality of the standard library module (if it is
> recommended at all).

(snip)

> However, I'd like to ask, how to best maintain this modified
> functionality in the sourcecode.
> I tried some possibilities, which seem to work, but I'd appreciate
> suggestions on the preferred way in such cases.
> - It is simply possible to have a modified sourcefile difflib.py in
> the script directory.


You'd better do a real fork then, and rename the damn thing to avoid
confusion and name shadowing.
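
To see the shadowing problem concretely, one can ask Python which file it would load for a given module name. This is only a minimal sketch using the standard importlib.util.find_spec; the helper name module_origin is made up for illustration.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# With a modified difflib.py sitting next to the script, this would typically
# print the path of the local copy rather than the standard library's file.
print(module_origin("difflib"))
```

Every other module that does "import difflib" in the same process would get the local copy as well, which is the confusion a renamed fork avoids.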




> - Furthermore one can subclass difflib.SequenceMatcher and override its
> __chain_b function (however the name doesn't look like a public
> function ...


It's indeed a very private one. Beware of name mangling here, it can
lead to surprising results !-)
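
The surprise can be reproduced without difflib. The sketch below uses toy classes (Base, Naive and Explicit are invented names, not difflib code): a double-underscore method defined in a subclass is mangled with the subclass's own name, so it does not replace the base class's private method unless it is spelled with the base class's mangled name.

```python
class Base:
    def run(self):
        # Inside Base, "self.__step" is compiled as "self._Base__step".
        return self.__step()

    def __step(self):
        return "base"


class Naive(Base):
    def __step(self):
        # Stored as _Naive__step, so Base.run() never sees it.
        return "naive"


class Explicit(Base):
    def _Base__step(self):
        # Spelled with the mangled name, so it really replaces Base's method.
        return "explicit"


print(Naive().run())     # base
print(Explicit().run())  # explicit
```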


Also, if you override an implementation method, your code might break
with each new release, so it kind of ties you to a specific version (or
set of...). The odds depend on how stable difflib's source code is.



> - I guess, it wouldn't be recommended to directly replace
> difflib.SequenceMatcher._SequenceMatcher__chain_b ...


For which definition of "directly replace"? If you mean patching the
standard library's source code in place, then it's definitely not
something I'd do. Monkeypatching, OTOH, is sometimes the simplest
solution, especially for temporary fixes or evolutions.


Anyway - which solution (forking, subclassing or monkeypatching) is the
most appropriate really depends on the context, so only you can decide.
If it's for personal use only and not mission-critical, go for the
simplest working solution. If it's going to be publicly released, you
may want to consider contacting the difflib maintainer and submitting a
patch, and rely on a monkeypatch in the meantime. If you think you'll
need more modifications / specialisations / evolutions of difflib,
then just fork.


My 2 cents.
--
http://mail.python.org/mailman/listinfo/python-list


Re: modifying standard library functionality (difflib)

2010-06-24 Thread Vlastimil Brom
2010/6/24 Bruno Desthuilliers bruno.42.desthuilli...@websiteburo.invalid:
> Vlastimil Brom wrote:
>
>> Hi all,
>> I'd like to ask about the most reasonable/recommended/... way to
>> modify the functionality of the standard library module (if it is
>> recommended at all).
>>
>> ...
>> - I guess, it wouldn't be recommended to directly replace
>> difflib.SequenceMatcher._SequenceMatcher__chain_b ...
>
> For which definition of "directly replace"? If you mean patching the
> standard library's source code in place, then it's definitely not
> something I'd do. Monkeypatching, OTOH, is sometimes the simplest
> solution, especially for temporary fixes or evolutions.
>
> Anyway - which solution (forking, subclassing or monkeypatching) is the most
> appropriate really depends on the context, so only you can decide. If it's
> for personal use only and not mission-critical, go for the simplest working
> solution. If it's going to be publicly released, you may want to consider
> contacting the difflib maintainer and submitting a patch, and rely on a
> monkeypatch in the meantime. If you think you'll need more
> modifications / specialisations / evolutions of difflib, then just fork.
>
> My 2 cents.
> --


Many thanks for your insights!
Right now, I am almost the only user of this script, hence the
consequences of version mismatches etc. shouldn't (directly) affect
anyone else, fortunately.
However, I'd like to ask for some clarification about monkeypatching -
By "directly replace" I meant something like the following scenario:

import difflib

def tweaked__chain_b(self):
    # modified code of the function __chain_b, copied from Lib\difflib.py
    ...

difflib.SequenceMatcher._SequenceMatcher__chain_b = tweaked__chain_b

This way I can only change the functionality unconditionally, as the
signature of SequenceMatcher (which is then used in my script) remains
unchanged.

I thought this would qualify as monkeypatching, but I am apparently
missing some distinction between "patching the ... code inplace" and
monkeypatching.
Does it make a difference if one backs up the original objects and
restores them after using the patched code?

By subclassing (which I am using just now in the code) the behaviour
can be parametrised:

class my_difflib_SequenceMatcher(difflib.SequenceMatcher):
    def __init__(self, isjunk=None, a='', b='', checkpopular=True):
        # checkpopular: parameter added to the signature
        self.checkpopular = checkpopular
        ...

    def __chain_b(self):
        # modified copy from Lib\difflib.py - reacting to the value of
        # self.checkpopular
        ...
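
A parametrised subclass along these lines could be sketched as follows, on a toy base class rather than on difflib itself. Matcher, TunableMatcher, keep_vowels and the method bodies are all invented for illustration, and the syntax is Python 3. Two details matter: the flag has to be stored before the base __init__ runs, because the base class calls its private hook from there, and the override has to use the base class's mangled name to be picked up at all.

```python
class Matcher:
    """Toy stand-in for a library class; NOT difflib's real code."""

    def __init__(self, b=""):
        self.b = b
        self.__index_b()  # private hook, called during __init__

    def __index_b(self):
        self.index = set(self.b)


class TunableMatcher(Matcher):
    def __init__(self, b="", keep_vowels=True):
        # Store the flag first: the base __init__ already calls the hook.
        self.keep_vowels = keep_vowels
        super().__init__(b)

    def _Matcher__index_b(self):
        # Mangled base-class name, so Matcher.__init__ calls this version.
        items = set(self.b)
        if not self.keep_vowels:
            items -= set("aeiou")
        self.index = items


print(sorted(TunableMatcher("banana").index))
print(sorted(TunableMatcher("banana", keep_vowels=False).index))
```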

An official update of the source in the standard library is probably
not viable (at least not in a way that would currently help me, as my
code only supports Python 2.x due to the relevant dependencies, namely
wxPython).
Otherwise, it would depend on other users' needs (e.g. a finer diff at
the cost of much slower code in some cases).

Thanks again for your thoughts.
   vbr


Re: modifying standard library functionality (difflib)

2010-06-24 Thread Bruno Desthuilliers

Vlastimil Brom wrote:

> Many thanks for your insights!
> Right now, I am almost the only user of this script, hence the
> consequences of version mismatches etc. shouldn't (directly) affect
> anyone else, fortunately.


So far so good.


> However, I'd like to ask for some clarification about monkeypatching -
> By "directly replace" I meant something like the following scenario:
>
> import difflib
>
> def tweaked__chain_b(self):
>     # modified code of the function __chain_b, copied from Lib\difflib.py
>     ...
>
> difflib.SequenceMatcher._SequenceMatcher__chain_b = tweaked__chain_b
>
> I thought this would qualify as monkeypatching,


It does, indeed


> but I am apparently
> missing some distinction between "patching the ... code inplace" and
> monkeypatching.


"Patching source code" canonically means physically modifying the
original source file. Monkeypatching - which can only be done in some
dynamic languages - is what you're doing above, i.e. dynamically replacing
a given feature at runtime.
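
As for backing up the original object and restoring it afterwards: that is still monkeypatching, just scoped. Below is a minimal sketch of such a scoped patch, assuming the attribute is a plain function defined directly on the class. The helper patched_attr and the toy Greeter class are invented for illustration. The same helper could in principle be pointed at difflib.SequenceMatcher and "_SequenceMatcher__chain_b", but whether that private name exists depends on the difflib version.

```python
from contextlib import contextmanager


@contextmanager
def patched_attr(owner, name, replacement):
    """Temporarily replace owner.<name>, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        setattr(owner, name, original)


class Greeter:  # toy class standing in for a library class
    def greet(self):
        return "hello"


def loud_greet(self):
    return "HELLO"


with patched_attr(Greeter, "greet", loud_greet):
    inside = Greeter().greet()   # patched behaviour
outside = Greeter().greet()      # original behaviour is back
print(inside, outside)           # HELLO hello
```

On Python 3.3 and later, unittest.mock.patch.object offers the same kind of temporary replacement out of the box.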



> By subclassing (which I am using just now in the code)


If it already works and you don't have to care too much about possible 
compat issues with different difflib versions, then look no further.




Re: modifying standard library functionality (difflib)

2010-06-24 Thread Vlastimil Brom
2010/6/24 Bruno Desthuilliers bruno.42.desthuilli...@websiteburo.invalid:
> Vlastimil Brom wrote:
>
> "Patching source code" canonically means physically modifying the original
> source file. Monkeypatching - which can only be done in some dynamic
> languages - is what you're doing above, i.e. dynamically replacing a given
> feature at runtime.


Thank you very much for the clarification (I indeed didn't consider
this ultima ratio approach :-)
Thanks for the positive suggestion as well.

Regards,
   vbr


Re: modifying standard library functionality (difflib)

2010-06-24 Thread Paul Rubin
Bruno Desthuilliers bruno.42.desthuilli...@websiteburo.invalid writes:
> "Patching source code" canonically means physically modifying the
> original source file. Monkeypatching - which can only be done in some
> dynamic languages - is what you're doing above, i.e. dynamically
> replacing a given feature at runtime.

I came across a less polite term for this, analogous with duck typing:

 http://justatheory.com/computers/programming/methodology/fuck-typing.html

Example application in perl:

 http://www.justatheory.com/computers/programming/perl/fuck-typing-lwp.html


modifying standard library functionality (difflib)

2010-06-23 Thread Vlastimil Brom
Hi all,
I'd like to ask about the most reasonable/recommended/... way to
modify the functionality of the standard library module (if it is
recommended at all).
I'm using difflib.SequenceMatcher for character-wise comparisons of
texts; although this might not be a usual use case, the results
are fine for the given task. However, there were some corner cases
where the shown differences were clearly larger than needed. As it
turned out, this is due to a kind of special-casing of relatively
frequent items; cf.
http://bugs.python.org/issue1528074#msg29269
http://bugs.python.org/issue2986
The solution (or workaround) for me was to modify the SequenceMatcher
class by adding another parameter, checkpopular=True, which influences
the behaviour of the __chain_b function accordingly. The possible
speed issues with this optimisation turned off (checkpopular=False)
don't really matter now, and the comparison results are much better for
my use cases.
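
For readers on newer interpreters: Python 3.2 (and 2.7.1) added an autojunk argument to difflib.SequenceMatcher that switches this popularity heuristic off, which appears to cover the same need as the checkpopular flag described above. A small sketch, with made-up sample strings; bpopular is the documented attribute holding the items the heuristic set aside.

```python
import difflib

# 300 characters each, so the heuristic (active for sequences of 200+ items)
# applies to the second sequence.
text_a = "x" * 250 + "abcdefghij" * 5
text_b = "x" * 240 + "abcdefghij" * 6

default = difflib.SequenceMatcher(None, text_a, text_b)
exact = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)

print(sorted(default.bpopular))  # items treated as "popular" in text_b
print(sorted(exact.bpopular))    # empty: the heuristic is disabled
print(default.ratio(), exact.ratio())
```

As with checkpopular=False, turning the heuristic off can make comparisons of long, repetitive sequences noticeably slower.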

However, I'd like to ask, how to best maintain this modified
functionality in the sourcecode.
I tried some possibilities, which seem to work, but I'd appreciate
suggestions on the preferred way in such cases.
- It is simply possible to have a modified sourcefile difflib.py in
the script directory.
- Furthermore one can subclass difflib.SequenceMatcher and override its
__chain_b function (however the name doesn't look like a public
function ...)
- I guess, it wouldn't be recommended to directly replace
difflib.SequenceMatcher._SequenceMatcher__chain_b ...
In all cases I have either a copy of the whole file or the respective
function as a part of my source.

I'd appreciate comments or suggestions on this, or maybe other, better
approaches to this problem.

Thanks in advance,
   vbr