Re: [Python-Dev] Fast Implementation for ZIP decryption
> Does it sound worthy enough to create a patch for and integrate into python itself?

Probably not, given that people think that the algorithm itself is fairly useless.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fast Implementation for ZIP decryption
On 12:59 pm, st...@pearwood.info wrote:
> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
>>> Does it sound worthy enough to create a patch for and integrate into python itself?
>>
>> Probably not, given that people think that the algorithm itself is fairly useless.
>
> I would think that for most people, the threat model isn't "the CIA is reading my files" but "my little brother or nosey co-worker is reading my files", and for that, zip encryption with a good password is probably perfectly adequate. E.g. OpenOffice uses it for password-protected documents.
>
> Given that Python already supports ZIP decryption (as it should), are there any reasons to prefer the current pure-Python implementation over a faster version?

Given that the use case is "protect my biology homework from my little brother", how fast does the implementation really need to be? Is speeding it up from 0.1 seconds to 0.001 seconds worth the potential new problems that come with more C code (more code to maintain, less portability to other runtimes, potential for interpreter crashes or even arbitrary code execution vulnerabilities from specially crafted files)?

Jean-Paul
Re: [Python-Dev] Fast Implementation for ZIP decryption
On 30 aug 2009, at 16:34, Shashank Singh wrote:
> just to give you an idea of the speed up:
>
> a 3.3 mb zip file extracted using the current all-python implementation on my machine (win xp 1.67Ghz 1.5GB) takes approximately 38 seconds.
>
> the same file when extracted using c implementation takes 0.4 seconds.

If this matters to the users of the API, then likely they'd search for alternatives -- no need for it to go into the standard library just because it replaces functionality, or am I misunderstanding?

- Ludvig Ericson lud...@lericson.se
Re: [Python-Dev] Mercurial migration: help needed
> I suggested a new extension for two reasons:
>
> * I'm using Linux, and I mentally skip over all extensions that mention win32... I guess others do the same, and in this case it's really a shame since converting EOL markers is a cross-platform problem: if someone creates a repository on Windows, I might find it nice to translate the EOL markers into LF on my machine.
>
>   As far as I know, all my tools work correctly with CRLF EOL markers, but I can see the usefulness of such an extension when adding new files (which would default to LF unless I take care).
>
> * A new extension will not have to deal with backwards compatibility issues. That would let us clean up the strange names: I think cleverencode: and cleverdecode: are quite poor names that convey little meaning (and what's with the colon?). We could instead use the same names as Subversion: native, CRLF and LF.
>
> The new extension could be named 'convert-eol' or something like that.

Thanks for the confirmation - this is also why I think a new extension would be best. FWIW, in Python, most files would be declared native, some CRLF, none LF.

>> 2) These same recent discussions about an entirely new extension and no clear indication of our expectations regarding what the tool actually enforces means I'm not sure how to make a start on the more general issue.
>
> It would be a folly to require all files in all changesets to use the right EOL markers -- people will be making mistakes offline. The important thing is that they fix them before pushing to a public server.
>
> So the extension should do that: either abort commits with the wrong EOL markers or do as Subversion and automatically convert the file in the working copy.

Maybe I misunderstand: when people use the extension, they cannot possibly make mistakes, right? Because the commit that gets aborted is already the local commit, right? Of course, it may still be that not all people use the extension.

I think this is of concern to Mark (and he would like hg to refuse operation at all if the extension isn't used), but not to me: I would like this to be a feature of hg eventually, in which case I don't need to worry whether hg enforces presence of certain extensions.

If people make commits that break the eol style, we could well refuse to accept them on the server, telling people that they should have used the extension (or that they should have been more careful if they don't use the extension).

I think subversion's behavior wrt. incorrect eol-style is more subtle. In some cases, it will complain about inconsistencies, rather than fixing them automatically.

Regards,
Martin
Re: [Python-Dev] Mercurial migration: help needed
Martin v. Löwis <mar...@v.loewis.de> writes:

>> So the extension should do that: either abort commits with the wrong EOL markers or do as Subversion and automatically convert the file in the working copy.
>
> Maybe I misunderstand: when people use the extension, they cannot possibly make mistakes, right? Because the commit that gets aborted is already the local commit, right? Of course, it may still be that not all people use the extension.

Exactly, when people use the extension, they won't be able to make bad commits.

> I think this is of concern to Mark (and he would like hg to refuse operation at all if the extension isn't used), but not to me: I would like this to be a feature of hg eventually, in which case I don't need to worry whether hg enforces presence of certain extensions.

Yes, that would be nice for the future. I don't know if the other Mercurial developers will see this as a big controversy -- Mercurial has so far made very sure to never mutate your files behind your back. Expansion of keywords (like $Id$) is also implemented as an extension.

> If people make commits that break the eol style, we could well refuse to accept them on the server, telling people that they should have used the extension (or that they should have been more careful if they don't use the extension).

Indeed. Their work will not be lost -- one can always take the final file, convert the line-endings, copy it into a fresh clone and commit that. With more work one could even salvage the intermediate commits, but that is probably not necessary.

> I think subversion's behavior wrt. incorrect eol-style is more subtle. In some cases, it will complain about inconsistencies, rather than fixing them automatically.

Okay --- I don't have much experience with the svn:eol-style, except that I've read about it in the manual.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
Re: [Python-Dev] Fast Implementation for ZIP decryption
exar...@twistedmatrix.com wrote:
> Given that the use case is "protect my biology homework from my little brother", how fast does the implementation really need to be? Is speeding it up from 0.1 seconds to 0.001 seconds worth the potential new problems that come with more C code (more code to maintain, less portability to other runtimes, potential for interpreter crashes or even arbitrary code execution vulnerabilities from specially crafted files)?

Also, if the use case is just protecting stuff from a sibling or your children, use an archiving program to zip/extract it :)

So -1 here as well. Any added C code has a real cost for the reasons Jean-Paul listed, so it should only be used in cases where there's a major practical benefit to the speed-up. Faster execution of a problematic algorithm that is already well implemented by plenty of other applications doesn't qualify in my book (even if the speedup is by a couple of orders of magnitude).

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (typically raising a different exception; I'm not even sure they are worth backporting, as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace, in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute, and thus I can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or can I let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

-Brett
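To make the problem concrete: co_filename cannot be assigned to, and a compiled module contains more than one code object (one per function, class body, and so on), so every nested object needs the new name too. The sketch below rebuilds the whole tree with types.CodeType.replace(). That method only exists in Python 3.8 and later, long after this thread, so this is an illustration of the idea and not what import.c or importlib did at the time. The helper name with_filename and both paths are made up.

```python
import types


def with_filename(code, filename):
    """Return a copy of `code`, and of every code object nested in its
    constants, with co_filename set to `filename` (hypothetical helper)."""
    new_consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    # CodeType.replace() builds a new code object; the original is untouched.
    return code.replace(co_filename=filename, co_consts=new_consts)


# Compile under one (made-up) path, then pretend the file was moved.
source = "def f():\n    return 1\n"
original = compile(source, "/build/tree/mod.py", "exec")
relocated = with_filename(original, "/install/tree/mod.py")
```

Fixing only the top-level code object would leave functions defined in the module reporting the old path in tracebacks, which is why the helper recurses through co_consts.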
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:
> My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or can I let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

Just to be clear, this would show up if I:
 - had a python tree built and ran stuff from it
 - symlinked to that tree from somewhere else
 - ran stuff from that somewhere else

because the pyc is already on disk?

That's been an invaluable 'wtf' debugging tool at various times, because the odd provenance of the path in the pyc makes it extremely clear that what is being loaded isn't what one had thought was being loaded.

OTOH, always showing the path that the pyc was *actually found at* would fix the weirdness that occurs when you mv a python tree from one place to another.

-Rob
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon <br...@python.org> wrote:
> My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or can I let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the canonical name on a replicated read-only filesystem, and the longer writable name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.

In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon <br...@python.org> wrote:
> Right; the code object would think it was loaded from the original location it was created at instead of where it actually is. Now why someone would want to move their .pyc files around instead of recompiling I don't know short of not wanting to send someone source.

I already mentioned replication; it could also just be a matter of downloading a tarball with .py and .pyc files.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum <gu...@python.org> wrote:
> ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the canonical name on a replicated read-only filesystem, and the longer writable name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case.
>
> In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's file location. Blah.

I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.

> (I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit gets set and that no write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff.

-Brett
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon <br...@python.org> wrote:
> Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's file location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...

> I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.

Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root. (In import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load().) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing.

>> (I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)
>
> So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit get set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff.

It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
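A rough sketch of the os.open() approach suggested above, assuming a POSIX-style permission model: take the source file's mode, mask out every execute bit, and create the bytecode file with exactly those bits so it never exists with looser permissions. The function name write_bytecode is made up, and this is not the code from import.c or importlib. Note also that the process umask can still clear further bits.

```python
import os
import stat


def write_bytecode(source_path, bytecode_path, data):
    """Write `data` to `bytecode_path` with the source file's permission
    bits minus all execute bits (hypothetical helper)."""
    mode = stat.S_IMODE(os.stat(source_path).st_mode) & ~0o111

    # Remove any stale file first: the mode given to os.open() only
    # applies when the file is actually created.
    try:
        os.unlink(bytecode_path)
    except FileNotFoundError:
        pass

    # O_EXCL makes creation fail if someone else created the file in between.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(bytecode_path, flags, mode)

    # os.fdopen(fd, "wb") returns an io.BufferedWriter wrapped around fd.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```

Creating the file with the final mode up front avoids the window that a plain open(path, 'wb') followed by chmod would leave, which is presumably the security concern raised here.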
Re: [Python-Dev] Fast Implementation for ZIP decryption
On Sun, Aug 30, 2009 at 7:34 AM, Shashank Singh <shashank.sunny.si...@gmail.com> wrote:
> just to give you an idea of the speed up:
>
> a 3.3 mb zip file extracted using the current all-python implementation on my machine (win xp 1.67Ghz 1.5GB) takes approximately 38 seconds.
>
> the same file when extracted using c implementation takes 0.4 seconds.

Are there any applications/frameworks which have zip files on their critical path, where this kind of (admittedly impressive) speedup would be beneficial? What was the motivation for writing the C version?

Collin Winter
Re: [Python-Dev] Fast Implementation for ZIP decryption
-On [20090831 06:29], Collin Winter (coll...@gmail.com) wrote:
> Are there any applications/frameworks which have zip files on their critical path, where this kind of (admittedly impressive) speedup would be beneficial? What was the motivation for writing the C version?

Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range.

-- 
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
All for one, one for all...
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson <benja...@python.org> wrote:
> 2009/8/30 Brett Cannon <br...@python.org>:
>> On Sun, Aug 30, 2009 at 19:34, Guido van Rossum <gu...@python.org> wrote:
>>> It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it.
>>
>> I will have to see what of that is implemented in C or in Python. I have always tried to keep all pure Python code out of importlib for bootstrapping reasons in order to keep the possibility of using importlib as the implementation of import. But maybe I should not be worrying about that right at the moment and instead do what keeps the code simple.
>
> You can use the C implementation of io, _io, which has a full buffering implementation. Of course, that also makes it a bit harder for other implementations which may wish to use importlib because the io library would have to be completely implemented...

True. I guess it's a question of whether making importlib easier to maintain and as minimally reliant on C-specific modules is more/less important than trying to bootstrap it in for CPython for __import__ at some point.

-Brett