writing large files quickly

2006-01-27 Thread rbt
I've been doing some file system benchmarking. In the process, I need to 
create a large file to copy around to various drives. I'm creating the 
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, Tim Chase [EMAIL PROTECTED] wrote:
 
fd.write('0')

[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')

While a mindblowingly simple/elegant/fast solution (kudos!), the 
OP's file ends up full of the character zero (ASCII 0x30), 
while your solution ends up full of the NUL character (ASCII 0x00):
 
 
 Oops.  I missed the fact that he was writing 0x30 and not 0x00.
 
 Yes, the hole in the file will read as 0x00 bytes.  If the OP
 actually requires that the file contain something other than
 0x00 bytes, then my solution won't work.
 

Won't work!? It's absolutely fabulous! I just need something big, quick 
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes 
a second or two while every other solution takes at least 2 - 5 minutes. 
Awesome... thanks for the tip!!!

Thanks to all for the advice... one can really learn things here :)
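
For anyone reading this later, here is a minimal sketch of the seek trick in Python 3 syntax (the thread itself uses Python 2). The function name make_big_file is made up for illustration. Whether the result is actually sparse depends on the filesystem, so treat the speed as something to measure rather than assume:

```python
import os


def make_big_file(path, size):
    """Create a file of `size` bytes by seeking past the end and writing one byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # move to the last byte position without writing anything
        f.write(b"\x00")   # one real write; this fixes the file length at `size`


# Example use (hypothetical path): make_big_file("large_file.bin", 409600000)
# os.path.getsize("large_file.bin") then reports 409600000.
```

Reading the file back returns NUL bytes for everything that was skipped, which is why it is fine for copy benchmarks but not if the contents matter.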


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, rbt [EMAIL PROTECTED] wrote:
 
 
I've been doing some file system benchmarking. In the process, I need to 
create a large file to copy around to various drives. I'm creating the 
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
    fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?
 
 
 Don't write so much data.
 
 f = file('large_file.bin','wb')
 f.seek(409600000-1)
 f.write('\x00')
 f.close()

OK, I'm still trying to pick my jaw up off of the floor. One question... 
  how big of a file could this method create? 20GB, 30GB, limit depends 
on filesystem, etc?

 That should be almost instantaneous in that the time required
 for those 4 lines of code is negligible compared to interpreter
 startup and shutdown.




Re: writing large files quickly

2006-01-27 Thread rbt
Donn Cave wrote:
 In article [EMAIL PROTECTED],
  rbt [EMAIL PROTECTED] wrote:
 
Won't work!? It's absolutely fabulous! I just need something big, quick 
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes 
a second or two while every other solution takes at least 2 - 5 minutes. 
Awesome... thanks for the tip!!!
 
 
 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks. 

Hmmm... when I copy the file to a different drive, it takes up 
409,600,000 bytes. Also, an md5 checksum on the generated file and on 
copies placed on other drives are the same. It looks like a regular, big 
file... I don't get it.


 The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.
 
Donn Cave, [EMAIL PROTECTED]


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, rbt [EMAIL PROTECTED] wrote:
 
 
Hmmm... when I copy the file to a different drive, it takes up 
409,600,000 bytes. Also, an md5 checksum on the generated file and on 
copies placed on other drives are the same. It looks like a regular, big 
file... I don't get it.
 
 
 Because the filesystem code keeps track of where you are in
 that 400MB stream, and returns 0x00 anytime you're reading from
 a hole.  The cp program and the md5sum just open the file
 and start read()ing.  The filesystem code returns 0x00 bytes
 for all of the read positions that are in the hole, just like
 Donn said:

OK I finally get it. It's too good to be true :)

I'm going back to using _real_ files... files that don't look as if they 
are there but aren't. BTW, the file 'size' and 'size on disk' were 
identical on win 2003. That's a bit deceptive. According to the NTFS 
docs, they should be drastically different... 'size on disk' should be 
like 64K or something.

 
 
The blocks that were never written are virtual blocks,
inasmuch as read() at that location will cause the filesystem
to return a block of NULs.
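
A rough way to see the difference between 'size' and 'size on disk' from Python 3, assuming a POSIX-style system where os.stat() exposes st_blocks (it does not on Windows, so the sketch returns None there). The helper name is made up, and the 512-byte unit is the usual Linux convention rather than something this thread establishes:

```python
import os


def apparent_and_allocated(path):
    """Return (apparent size in bytes, allocated bytes or None if unknown)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # missing on Windows
    # st_blocks is normally counted in 512-byte units
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated
```

On a filesystem that supports holes, a file made with the seek trick should show an allocated figure far below its apparent size. If the two match, the filesystem wrote real zeros.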
 
 


Re: Apache with Python 2.4 Need Help !!!

2006-01-25 Thread rbt
Sybren Stuvel wrote:
 pycraze enlightened us with:
 
I am currently using Fedora Core - 3 with apache 2.0 Web Server and
Python 2.4 .

[...] i would like to know have apache released any version that can
be successfully use Python 2.4 ( with mod-python module ) using
Fedora Core -3 .
 
 
 I don't know about Fedora (crap distro IMO), but Apache
 2.0.54-5ubuntu4, mod_python 3.1.3-3ubuntu1 and Python 2.4.2-0ubuntu2
 work fine on Ubuntu Breezy.
 
 Sybren

I second that...

Apache/2.0.54 (Ubuntu) mod_python/3.1.3 Python/2.4.2 PHP/4.4.0-3ubuntu1 
Server at 127.0.0.1 Port 80

This was my first real go at using mod_python. It was a bit different, 
but once I got the hang of it, I like it very much. Existing py scripts 
need little modification to work, and if you've done any PHP or Perl web 
projects in the past, you'll understand 50% of it right away.

Best of luck!


Re: os.unlink() AND win32api.DeleteFile()

2006-01-24 Thread rbt
Tim Golden wrote:
 [rbt]
 
 | Can someone detail the differences between these two? On
 | Windows which is preferred?
 
 Looks like that's been answered elsewhere.
 
 | Also, is it true that win32api.DeleteFile() can remove the 'special'
 | files located in the 'special' folders only accessible by the shell
 | object such as Temporary Internet Files, etc.
 
 Generally, you want to look at the functions
 in the shell module from pywin32 for these.
 Specifically, look at
 
 [using: from win32com.shell import shell, shellcon
  because I always forget *which* is the shell module
  I need to import]
 
 shell.SHGetSpecialFolderLocation
 shell.SHFileOperation
 
 The former will find the real location of various
 special-looking folders. The latter will move/copy etc.
 through the shell which means, among other things, that
 you'll see the flying folders animated icon.
 
 TJG
 

Thanks for the explanation guys!


os.unlink() AND win32api.DeleteFile()

2006-01-23 Thread rbt
Can someone detail the differences between these two? On Windows which 
is preferred?

Also, is it true that win32api.DeleteFile() can remove the 'special' 
files located in the 'special' folders only accessible by the shell 
object such as Temporary Internet Files, etc.

Thanks!


Re: a more precise re for email addys

2006-01-19 Thread rbt
[EMAIL PROTECTED] wrote:
 Does it really need to be a regular expression? Why not just write a
 short function that breaks apart the input and validates each part?
 
 def IsEmail(addr):
   'Returns True if addr appears to be a valid email address'
 
   # we don't allow stuff like [EMAIL PROTECTED]@biff.com
   if addr.count('@') != 1:
       return False
   name, host = addr.split('@')
 
   # verify the hostname (is an IP or has a valid TLD, etc.)
   hostParts = host.split('.')
   ...
 
 That way you'd have a nice, readable chunk of code that you could tweak
 as needed (for example, maybe you'll find that the RFC is too liberal
 so you'll end up needing to add additional rules to exclude bad
 addresses).
 

Just to follow up on this: I found that doing something like this 
along with a more generic RE gives much better results. Thanks 
for the idea!
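
Roughly what that combination could look like, as a Python 3 sketch. The pattern and the individual checks are illustrative assumptions, not the RFC grammar and not the exact code I ended up with:

```python
import re

# Deliberately loose: one "@", no whitespace, and a dotted host ending in letters.
GENERIC_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_email(addr):
    """Return True if addr looks like a plausible email address."""
    if not GENERIC_EMAIL.fullmatch(addr):
        return False
    name, host = addr.split("@")
    # Each dot-separated host label must be non-empty and not start or end with "-".
    for label in host.split("."):
        if not label or label.startswith("-") or label.endswith("-"):
            return False
    # The local part must not start or end with a dot or contain "..".
    if name.startswith(".") or name.endswith(".") or ".." in name:
        return False
    return True
```

The RE throws out the obvious junk cheaply, and the function is where extra rules can be added as false positives turn up.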


a more precise re for email addys

2006-01-18 Thread rbt
Is it possible to write an re that _only_ matches email addresses? I've 
been googling around and have found several examples on the Web, but all 
of them produce too many false positives... here are examples from 
Google that I've experimented with:

re.compile('([EMAIL PROTECTED])')
re.compile(r'[EMAIL PROTECTED],4}')
re.compile('(\S+)@(\S+)')

All of these will find email addys, but they also find other things. 
Could someone demonstrate how to write a more accurate re for emails?

BTW, this is not for spam, but like any tool could be used in a bad way.

Thanks!


Re: a more precise re for email addys

2006-01-18 Thread rbt
Jim wrote:
 There is a precise one in a Perl module, I believe.
   http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
 Can you swipe that?
 
 Jim
 

I can swipe it... but it causes my head to explode. I get unbalanced 
parentheses errors when trying to make it work as a python re... it makes 
more sense when broken up like this:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\.\[\] \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
...
...


Re: a more precise re for email addys

2006-01-18 Thread rbt
[EMAIL PROTECTED] wrote:
 Does it really need to be a regular expression? Why not just write a
 short function that breaks apart the input and validates each part?
 
 def IsEmail(addr):
   'Returns True if addr appears to be a valid email address'
 
   # we don't allow stuff like [EMAIL PROTECTED]@biff.com
   if addr.count('@') != 1:
       return False
   name, host = addr.split('@')
 
   # verify the hostname (is an IP or has a valid TLD, etc.)
   hostParts = host.split('.')
   ...
 
 That way you'd have a nice, readable chunk of code that you could tweak
 as needed (for example, maybe you'll find that the RFC is too liberal
 so you'll end up needing to add additional rules to exclude bad
 addresses).
 

Good idea. I'll see what I can do with this. Thanks!


recursively removing files and directories

2006-01-16 Thread rbt
What is the most efficient way to recursively remove files and directories?

Currently, I'm using os.walk() to unlink any files present, then I call 
os.walk() again with the topdown=False option and get rid of directories 
with rmdir. This works well, but it seems that there should be a more 
efficient way. Here are my function definitions:

import os
import win32api
import win32con

def remove_files(target_dir):
    # This attempts to remove _all_ files from a directory.
    # Use with caution on directories that store temporary files.

    for root, dirs, files in os.walk(target_dir):
        for f in files:

            try:
                # Make attributes normal so file can be deleted.
                win32api.SetFileAttributes(os.path.join(root, f),
                                           win32con.FILE_ATTRIBUTE_NORMAL)
            except:
                pass

            try:
                # Try to delete the file.
                os.unlink(os.path.join(root, f))
            except:
                pass

def remove_dirs(target_dir):
    # This attempts to remove _all_ sub directories from a directory.
    # Use with caution on directories that store temporary information.

    for root, dirs, files in os.walk(target_dir, topdown=False):
        for d in dirs:

            try:
                # Make attributes normal so dir can be deleted.
                win32api.SetFileAttributes(os.path.join(root, d),
                                           win32con.FILE_ATTRIBUTE_NORMAL)
            except:
                pass

            try:
                # Try to delete the directory.
                os.rmdir(os.path.join(root, d))
            except:
                pass


Re: recursively removing files and directories

2006-01-16 Thread rbt
Tim N. van der Leeuw wrote:
 Wasn't this the example given in the Python manuals? Recursively
 deleting files and directories?

I don't know... I wrote it without consulting anything. Hope I'm not 
infringing on a patent :)


Re: recursively removing files and directories

2006-01-16 Thread rbt
Fuzzyman wrote:
 shutil.rmtree

Many thanks. I'll give that a go!

 
 You might need an ``onerror`` handler to sort out permissions.
 
 There is one for just this in pathutils :
 
 http://www.voidspace.org.uk/python/pathutils.html
 
 All the best,
 
 Fuzzyman
 http://www.voidspace.org.uk/python/index.shtml
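
A minimal Python 3 sketch of shutil.rmtree with an error handler that clears the read-only bit and retries. This follows the general recipe in the shutil documentation, not the pathutils code, and the names remove_tree and _make_writable_and_retry are made up here. Newer Pythons (3.12 and later) call the hook onexc instead of onerror, so the sketch picks whichever applies:

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    """Error handler: clear the read-only bit, then retry the failed call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def remove_tree(target_dir):
    """Remove target_dir and everything under it."""
    if sys.version_info >= (3, 12):
        # onerror is deprecated from 3.12 on; onexc takes its place
        shutil.rmtree(target_dir, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(target_dir, onerror=_make_writable_and_retry)
```

This replaces both walk passes with one call. If the retry fails as well, the exception propagates instead of being silently swallowed.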
 


Re: Space left on device

2006-01-16 Thread rbt
sir_alex wrote:
 Is there any function to see how much space is left on a device (such
 as a usb key)? I'm trying to fill in an mp3 reader in a little script,
 and this information could be very useful! Thanks!
 

On windows with the win32 extensions, you might try this:

# Get hard drive info from Windows.
import win32api
drive = win32api.GetDiskFreeSpace('c:')
print drive[0], 'sectors per cluster.'
print drive[1], 'bytes per sector.'
print drive[2], 'free clusters.'
print drive[3], 'total clusters.'

You could use wmi to discover where the device has been mounted:

import wmi
c = wmi.WMI()
for item in c.Win32_DiskDrive():
    print item

instance of Win32_DiskDrive
{
 BytesPerSector = 512;
 Capabilities = {3, 4, 7};
 Caption = M-SysT5 Dell Memory Key USB Device;
 ConfigManagerErrorCode = 0;
 ConfigManagerUserConfig = FALSE;
 CreationClassName = Win32_DiskDrive;
 Description = Disk drive;
 DeviceID = .\\PHYSICALDRIVE1;
 Index = 1;
 InterfaceType = USB;
 Manufacturer = (Standard disk drives);
 MediaLoaded = TRUE;
 MediaType = Removable media other than\tfloppy;
 Model = M-SysT5 Dell Memory Key USB Device;
 Name = .\\PHYSICALDRIVE1;
 Partitions = 1;
 PNPDeviceID =
USBSTOR\\DISKVEN_M-SYST5PROD_DELL_MEMORY_KEYREV_5.00\\09809350C300C9D70;
 SectorsPerTrack = 63;
 Signature = 2865277640;
 Size = 254983680;
 Status = OK;
...
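
For readers on a current Python 3, the standard library can report free space without the win32 extensions. A small sketch, where free_bytes is a made-up helper name and the path is whatever drive or mount point the device shows up as:

```python
import shutil


def free_bytes(path):
    """Return the number of free bytes on the volume that contains path."""
    return shutil.disk_usage(path).free
```

shutil.disk_usage() also returns total and used, so the script can check that an mp3 fits before copying it.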


return values of os.system() on win32

2006-01-13 Thread rbt
Is it safe to say that any value returned by os.system() other than 0 is 
an error?

if os.system('winver') != 0:
    print "Winver failed!"
else:
    print "Winver Worked."

Thanks!


Re: return values of os.system() on win32

2006-01-13 Thread rbt
Peter Hansen wrote:
 rbt wrote:
 
 Is it safe to say that any value returned by os.system() other than 0 
 is an error?

 if os.system('winver') != 0:
     print "Winver failed!"
 else:
     print "Winver Worked."
 
 
 According to the docs, assuming that *in general* would be an error, but 
 it's likely that for the sorts of cases you are talking about, it's true.
 
 Ultimately, since the return code is generally under the control of the 
 application you're calling, it's absolutely possible (likely) that there 
 are many programs which do not work as you assume above, and probably a 
 large number which don't ever explicitly set the return value at all...
 
 -Peter
 

OK, thanks guys. That's helpful... this is more of an MS issue than a 
Python issue.
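
For the record, a Python 3 sketch of reading an exit status with subprocess instead of os.system(). The helper name exit_code is made up. What a nonzero value means is still up to the program being run, as Peter says:

```python
import subprocess
import sys


def exit_code(args):
    """Run a command given as a list of arguments and return its exit status."""
    return subprocess.run(args).returncode


# Example: a child Python process that exits with status 3.
# exit_code([sys.executable, "-c", "import sys; sys.exit(3)"])
```

If the program cannot be found at all, subprocess.run() raises an exception instead of returning a status, which separates "not there" from "ran and failed".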


Re: return values of os.system() on win32

2006-01-13 Thread rbt
Paul Watson wrote:
 rbt wrote:
 
 Is it safe to say that any value returned by os.system() other than 0 
 is an error?

 if os.system('winver') != 0:
     print "Winver failed!"
 else:
     print "Winver Worked."

 Thanks!
 
 
 What are you really seeking to do?

This is a corner case. I'm trying to detect if the py script is running 
on a 'special' version of windows. I can't go into the details about 
what makes it unique. Python installs and runs, but the windows API 
isn't as complete as a normal Windows install... among other things, it 
doesn't have a winver.exe file, or if it does, it's crippled... this 
causes os.system('winver') to return a 1... while it returns 0 on 
Windows XP, etc.

 Are you wanting to detect if your 
 code is running on a Windows machine?  Are you wanting to know the 
 version number of Windows?  Why not use popen2() and see the output?




loops breaks and returns

2006-01-01 Thread rbt
Is it more appropriate to do this:

while 1:
    if x:
        return x

Or this:

while 1:
    if x:
        break
return x

Or, does it matter?


idle with -n switch

2006-01-01 Thread rbt
What impact does the -n option have on idle.py on Windows?


compare dictionary values

2005-12-30 Thread rbt
What's a good way to compare values in dictionaries? I want to find 
values that have changed. I look for new keys by doing this:

new = [k for k in file_info_cur.iterkeys() if k not in
       file_info_old.iterkeys()]
if new == []:
    print new, "No new files."
else:
    print new, "New file(s)!!!"

My key-values pairs are filepaths and their modify times. I want to 
identify files that have been updated or added since the script last ran.

Thanks,
rbt


Re: compare dictionary values

2005-12-30 Thread rbt
Marc 'BlackJack' Rintsch wrote:
 In [EMAIL PROTECTED], rbt wrote:
 
 What's a good way to compare values in dictionaries?
 
 Look them up and then compare!?  ;-)
 
 I want to find 
 values that have changed. I look for new keys by doing this:

 new = [k for k in file_info_cur.iterkeys() if k not in
        file_info_old.iterkeys()]
 if new == []:
     print new, "No new files."
 else:
     print new, "New file(s)!!!"

 My key-values pairs are filepaths and their modify times. I want to 
 identify files that have been updated or added since the script last ran.
 
 This looks up each `key` from the `new` dictionary and compares the value
 with the `old` one.  If it's not equal or the key is not present in `old`
 the key is appended to the `result`::
 
  def new_and_changed_keys(old, new):
      result = list()
      for (key, value) in new:
          try:
              if old[key] != value:
                  result.append(key)
          except KeyError:
              result.append(key)
      return result
 
 Ciao,
   Marc 'BlackJack' Rintsch

Thanks Marc! I changed this line:

for (key, value) in new:

To this:

for (key, value) in new.iteritems():

And, it works great. Thanks again.
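
The same idea as a Python 3 sketch, where iteritems() is spelled items(). It assumes, as above, that both dicts map file paths to modify times:

```python
def new_and_changed_keys(old, new):
    """Return keys of `new` that are missing from `old` or map to a different value."""
    return [key for key, value in new.items()
            if key not in old or old[key] != value]
```

For example, with old = {'a': 1, 'b': 2} and new = {'a': 1, 'b': 3, 'c': 4} this picks out 'b' (changed) and 'c' (added).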


reading files into dicts

2005-12-29 Thread rbt
What's a good way to write a dictionary out to a file so that it can be 
easily read back into a dict later? I've used readlines() to read text 
files into lists... how can I do the same thing with dicts? Here's some 
sample output that I'd like to write to file and then read back into a dict:

{'.\\sync_pics.py': 1135900993, '.\\file_history.txt': 1135900994, 
'.\\New Text Document.txt': 1135900552}


Re: reading files into dicts

2005-12-29 Thread rbt
Gary Herron wrote:
 rbt wrote:
 
 What's a good way to write a dictionary out to a file so that it can 
 be easily read back into a dict later? I've used readlines() to read 
 text files into lists... how can I do the same thing with dicts? 
 Here's some sample output that I'd like to write to file and then read 
 back into a dict:

 {'.\\sync_pics.py': 1135900993, '.\\file_history.txt': 1135900994, 
 '.\\New Text Document.txt': 1135900552}
  

 
 
 A better way, than rolling your own marshaling (as this is called), 
 would be to use the cPickle module. It can write almost any Python 
 object to a file, and then read it back in later. It's more efficient, 
 and way more general than any code you're likely to write yourself.
 
 The contents of the file are quite opaque to anything except the cPickle 
 and pickle modules. If you *do* want to roll your own input and output to 
 the file, the standard lib functions repr and eval can be used. Repr 
 is meant to write out objects so they can be read back in and recovered 
 with eval. If the contents of your dictionary are well behaved enough 
 (simple Python objects are, and classes you create may be made so), then 
 you may be able to get away with as little as this:
 
 f = file('file.name', 'wb')
 f.write(repr(myDictionary))
 f.close()
 
 and
 
 f = file('file.name', 'rb')
 myDictionary = eval(f.read())
 f.close()
 
 Simple as that is, I'd still recommend the cPickle module.
 
 As always, this security warning applies: Evaluating arbitrary text 
 allows anyone who can change that text to take complete control 
 of your program. So be careful.
 
 Gary Herron
 
 

Thanks to everyone for the tips on eval and repr. I went with the 
cPickle suggestion... this is awesome! It was the easiest and quickest 
solution performance-wise. Just makes me think, Wow... how the heck 
does pickle do that?!

Thanks again,
rbt
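
A short sketch of the pickle round trip in Python 3, where cPickle has been folded into the plain pickle module. The helper names save_dict and load_dict are made up for illustration:

```python
import pickle


def save_dict(d, path):
    """Write dictionary d to path in pickle format."""
    with open(path, "wb") as f:
        pickle.dump(d, f)


def load_dict(path):
    """Read a dictionary back from a file written by save_dict."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Gary's warning about eval carries over: unpickling a file that someone else could have modified is not safe, so this is only for files the script wrote itself.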


Re: html resize pics

2005-12-28 Thread rbt
Peter Hansen wrote:
 rbt wrote:
 I use Python to generate html pages. I link to several large images at 
 times. I'd like to display a thumbnail image that when clicked will go 
 to the original, large jpg for a more detailed view.
 
 I use PIL with the thumbnail() function for that...  depending on what 
 sort of web server/framework you are using, you could generate the 
 thumbnails dynamically, or you could pre-generate them and store both 
 files (this is probably the most common way to do it).  The HTML, of 
 course, has to be generated to refer to the appropriate file: the IMG 
 source is the thumbnail, the anchor's href points to the larger image.
 
 -Peter
 

Thanks, PIL is a very handy tool!


html resize pics

2005-12-27 Thread rbt
What's a good way to resize pictures so that they work well on html 
pages? I have large jpg files. I want the original images to remain as 
they are, just resize the displayed image in the browser.


Re: html resize pics

2005-12-27 Thread rbt
Peter Hansen wrote:
 rbt wrote:
 What's a good way to resize pictures so that they work well on html 
 pages? I have large jpg files. I want the original images to remain as 
 they are, just resize the displayed image in the browser.
 
 These two things are mutually exclusive by most people's definition of 
 work well.  You can't push the resizing down to the browser *and* 
 work well when working well includes avoiding downloading massive JPGs 
 when only small images are to be shown.
 
 Can you clarify what you really want? 

Sure, sorry. I use Python to generate html pages. I link to several 
large images at times. I'd like to display a thumbnail image that when 
clicked will go to the original, large jpg for a more detailed view.

  resize ... image in the browser
 implies merely using width and height attributes... 

How does one do that and keep the original ratio intact?

 size of the image *after* the whole thing has been downloaded, in which 
 case this would be solved with mere attributes on the IMG element.
 
 -Peter
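
On keeping the ratio intact: whichever way the resizing is done, the arithmetic is to scale both sides by the same factor. A small Python 3 sketch with a made-up helper name, usable either for the width and height attributes or for choosing a thumbnail size:

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the box, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never enlarge
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, fit_within(1600, 1200, 200, 200) gives (200, 150): the width hits the limit first and the height shrinks by the same factor.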
 


Re: Windows and python execution

2005-12-26 Thread rbt
Mark Carter wrote:
 rzed wrote:
 Mark Carter [EMAIL PROTECTED] wrote in
 news:[EMAIL PROTECTED]:

 What I would like to do it type something like

 myscript.py

 instead of

 python myscript.py
 
 As another poster points out, be sure that your Python is on your path.
 And there is a PATHEXT environment variable, 
 
 Aha. You've provided a significant clue.
 
 What you need to do is include the following line in autoexec.bat:
 set .py=c:\python24\python.exe
 
 This will achieve the desired result. I'm surprised more people don't use 
 it.

I'm surprised the installer doesn't do it :)


Re: Beautiful Python

2005-12-26 Thread rbt
[EMAIL PROTECTED] wrote:
 Gekitsuu wrote:
 I've been reading a lot of python modules lately to see how they work
 and I've stumbled across something that's sort of annoying and wanted
 to find out if there was a good reason behind it. In a Perl program
 when you're calling other modules you'll add use statements at the
 beginning of your script like:

 use strict;
 use WWW::Mechanize;
 use CGI;

 This seems to be the de facto standard in the Perl community but in
 python it seems most of the code I look at has import statements
 everywhere in the code. Is there a sound reason for putting the imports
 there or are developers just loading modules in as they need them. I
 own Damian Conway's book of Perl Best Practices and it seems from a
 maintainability standpoint  that having all the modules declared at the
 beginning would make it easier for someone coming behind you to see
 what other modules they need to use yours. Being new I didn't know if
 there was a performance reason for doing this or it is simply a common
 habit of developers.
 
 
 Without taking anything away from other posts responding to your
 question, the first response perhaps should have been:
 
 Imports are always put at the top of the file, just after
  any module comments and docstrings, and before module
  globals and constants.
 
 Which is the official Python style guide, PEP 0008, at
 http://www.python.org/peps/pep-0008.html
 and basically a reflection of Guido's own recommendations.

I've always done it this way (import at the top in alphabetical order 
with standard modules before add-in modules). The one exception is that 
at times I use imports to make sure that certain modules are indeed 
installed. For example, when I must have win32 modules I import like this:

import os
import sys
import time

try:
    import win32api
except ImportError:
    print "The win32 extensions must be installed!"
    sys.exit()

 
 There are reasons to break almost any rule sometimes(*), but I think
 you were asking whether there IS a rule -- which is an insightful,
 worthy question if you've been looking at code which begs it.
 
 -Bill
 
 
 (*) PEP 0008 itself says, even before it lays out any rules,
 know when to be inconsistent.
 


Re: python coding contest

2005-12-25 Thread rbt
Steven D'Aprano wrote:
 On Sun, 25 Dec 2005 18:05:37 +0100, Simon Hengel wrote:
 

 I'm envisioning lots of convoluted one-liners which
 are more suitable to a different P-language... :-)
 I feel that python is more beautiful and readable, even if you write
 short programs.

 How about best compromise between shortness and readability
 plus elegance of design?
 I would love to choose those criteria for future events. But I'm not
 aware of any algorithm that is capable of creating a ranking upon them.
 
 
 What is your algorithm for determining shortest program? Are you
 counting tokens, lines or characters? Does whitespace count?
 
 
If whitespace and var names count, these things are going to be ugly :)


Re: python coding contest

2005-12-25 Thread rbt
Simon Hengel wrote:
 Hello,
 we are hosting a python coding contest and we even managed to provide a
 prize for the winner...
 
 http://pycontest.net/
 
 The contest is coincidentally held during the 22c3 and we will be
 present there.
 
 https://events.ccc.de/congress/2005/wiki/Python_coding_contest
 
 Please send me comments, suggestions and ideas.
 
 Have fun,
 

Does positioning matter? For example, say I give it '123' is it ok to 
output this:

1
2
3

Or does it have to be 123



Re: python coding contest

2005-12-25 Thread rbt
Tim Hochberg wrote:
 
 Is it necessary to keep the input parameter as 'input'? Reducing that to 
 a single character drops the length of a program by at least 8 
 characters. Technically it changes the interface of the function, so 
 it's a little bogus, but test.py doesn't check. (Personally I prefer 
 that it be illegal, but if it's legal I'll have to do it).
 
 -tim
 

isn't the word 'input' a special word anyway???


Re: Guido at Google

2005-12-23 Thread rbt
Anand wrote:
 It's like having James Bond as your very own personal body guard ;)
 
 That is such a nice quote that I am going to put it in my email
 signature ! :)
 
 -Anand
 

Go right ahead. Perhaps we should do one for Perl too:

It's like having King Kong as your very own personal body guard ;)


Re: Guido at Google

2005-12-23 Thread rbt
Luis M. González wrote:
 rbt wrote:
 Go right ahead. Perhaps we should do one for Perl too:

 It's like having King Kong as your very own personal body guard ;)
 
 Good analogy:
 You know, they call Perl the eight-hundred-pound gorilla of scripting
 languages.

Absolutely. It's big, hairy, smelly, a bit dense at times and always
difficult to communicate with, but by god it gets the job done albeit in
a messy sort of way ;)



Re: Indentation/whitespace

2005-12-23 Thread rbt
BartlebyScrivener wrote:
 What's needed is STRICTER whitespace enforcement, especially on April
 Fools Day. Some call it whitespace fascism.
 
 http://www.artima.com/weblogs/viewpost.jsp?thread=101968
 

I've only been coding Python for about 3 years now. C is the only other 
language I'm moderately good with. Most of the people on 
comp.lang.python have _much_ more coding experience than I do. I don't 
know who came up with this PEP, but I like it a lot. To me, it makes 
sense, just like Python makes sense. Any time-line on implementing this?


Re: Indentation/whitespace

2005-12-23 Thread rbt
Gary Herron wrote:
 rbt wrote:
 
 BartlebyScrivener wrote:
  

 What's needed is STRICTER whitespace enforcement, especially on April
 Fools Day. Some call it whitespace fascism.

 http://www.artima.com/weblogs/viewpost.jsp?thread=101968

   

 I've only been coding Python for about 3 years now. C is the only 
 other language I'm moderately good with. Most of the people on 
 comp.lang.python have _much_ more coding experience than I do. I don't 
 know who came up with this PEP, but I like it a lot. To me, it makes 
 sense, just like Python makes sense. Any time-line on implementing this?
  

 Look at the date. That was an April Fools joke.
 
 Gary Herron
 

I still think it makes sense :) ... perhaps I'm a fool.


deal or no deal

2005-12-22 Thread rbt
The house almost always wins or are my assumptions wrong...

import random

amounts = [.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
           1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
           300000, 400000, 500000, 750000, 1000000]

results = []

count = 0
while count < 10:
    count = count + 1
    pick = random.choice(amounts)
    if pick < 100000:
        results.append("NBC won... Your briefcase contains $%s" %pick)
    else:
        results.append("You won... Your briefcase contains $%s" %pick)

results.sort()
print "Here are 10 random picks: "
for result in results:
    print result


Re: deal or no deal

2005-12-22 Thread rbt
Bengt Richter wrote:
 On Thu, 22 Dec 2005 09:29:49 -0500, rbt [EMAIL PROTECTED] wrote:
 
 The house almost always wins or are my assumptions wrong...

 import random

 amounts = [.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
            1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
            300000, 400000, 500000, 750000, 1000000]

 results = []

 count = 0
 while count < 10:
     count = count + 1
     pick = random.choice(amounts)
     if pick < 100000:
         results.append("NBC won... Your briefcase contains $%s" %pick)
     else:
         results.append("You won... Your briefcase contains $%s" %pick)

 results.sort()
 print "Here are 10 random picks: "
 for result in results:
     print result
 
 I don't know what you are doing, but 10 is a small sample to draw confident
 conclusions from. E.g., try counting how many times pick < 100000 out of
 a larger number, e.g.,

It's the TV show on NBC in the USA running this week during primetime (Deal 
or No Deal). I figure there are roughly 10, maybe 15 contestants. They 
pick a briefcase that has between 1 penny and 1 million bucks and then 
play this silly game where NBC tries to buy the briefcase from them 
while amounts of money are taken away from the list of possibilities. 
The contestant's hope is that they've picked a briefcase with a lot of 
money and that when an amount is removed from the list, it is a small 
amount of money, not a large one (I categorize a large amount to be 
more than 100,000).

NBC tries to give the least amount of money possible away. The 
contestants try to get the most.

 What should the sum be?

100,000 or above unless you're already rich ;)
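
One way to check the "house almost always wins" hunch is to count directly instead of eyeballing ten draws. The sketch below assumes the 26-amount board listed earlier in the thread and the 100,000 cut-off mentioned above; the function name is made up for illustration, and it is written for current Python 3 rather than the Python 2 used in the thread.

```python
import random

# Assumed board: the 26 briefcase amounts listed earlier in the thread.
AMOUNTS = [.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
           1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
           300000, 400000, 500000, 750000, 1000000]

BIG = 100000  # the "large amount" threshold used in the thread


def small_case_fraction(trials, rng=random):
    """Estimate how often a randomly picked briefcase holds less than BIG."""
    small = sum(1 for _ in range(trials) if rng.choice(AMOUNTS) < BIG)
    return small / trials


# Exact value for comparison: 19 of the 26 amounts are below 100,000.
exact = sum(1 for a in AMOUNTS if a < BIG) / len(AMOUNTS)

print("exact:     %.3f" % exact)
print("simulated: %.3f" % small_case_fraction(100000))
```

Roughly 73% of the briefcases fall below the cut-off, so a blind first pick is "small" about three times out of four. This says nothing about how the buy-out offers are priced, only about the initial draw.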


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido at Google

2005-12-22 Thread rbt
Alex Martelli wrote:

 Rhetorical
 questions are a perfectly legitimate style of writing (although, like
 all stylistic embellishments, they can be overused, and can be made much
 less effective if murkily or fuzzily phrased), of course.

Also, email doesn't convey rhetorical questions that well. Facial 
expressions and body movement aid the audience in picking up on things 
such as this... maybe Google can fix that too ;)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido at Google

2005-12-22 Thread rbt
Luis M. González wrote:
 Java = Sun
 .Net = Microsoft
 C# = Microsoft
 Linux = too many big name IT companies to mention
 Python =  ?
 
 I know at least one company responsible for a linux distro (Canonical
 - Ubuntu), which encourages and even pays programmers for developing
 applications in Python.
 Its founder, Mark Shuttleworth, is a python fan.

Aren't most all intelligent people Python fans?

Python is so unbarbaric or one might say 'refined', yet it can be 
applied in a practical manner to all sorts of things. It's like having 
James Bond as your very own personal body guard ;)
-- 
http://mail.python.org/mailman/listinfo/python-list


os.path.splitext() and case sensitivity

2005-12-21 Thread rbt
Hi,

Is there a way to make os.path.splitext() case agnostic?

def remove_file_type(target_dir, file_type):
    for root, dirs, files in os.walk(target_dir):
        for f in files:
            if os.path.splitext(os.path.join(root, f))[1] in file_type:
                pass

remove_file_type(sysroot, ['.tmp', '.TMP'])

As you can see, the way I do it now, I place file extensions in a list. 
However, I'd like to be able just to say '.tmp' and for that to work on any 
type of file that has tmp (no matter the case) in the extension.

Many thanks!!!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.path.splitext() and case sensitivity

2005-12-21 Thread rbt
Richie Hindle wrote:
 [rbt]
 Is there a way to make os.path.splitext() case agnostic?

 def remove_file_type(target_dir, file_type):
     for root, dirs, files in os.walk(target_dir):
         for f in files:
             if os.path.splitext(os.path.join(root, f))[1] in file_type:
                 pass

 remove_file_type(sysroot, ['.tmp', '.TMP'])
 
 def remove_file_type(target_dir, file_type):
     [...]
             if os.path.splitext(f)[1].lower() == file_type.lower():
                 pass
 
 remove_file_type(sysroot, '.tmp')
 

Thanks guys!!!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.path.splitext() and case sensitivity

2005-12-21 Thread rbt
Juho Schultz wrote:
 rbt wrote:
 Hi,

 Is there a way to make os.path.splitext() case agnostic?

 def remove_file_type(target_dir, file_type):
     for root, dirs, files in os.walk(target_dir):
         for f in files:
             if os.path.splitext(os.path.join(root, f))[1] in file_type:
                 pass

 remove_file_type(sysroot, ['.tmp', '.TMP'])

 As you can see, the way I do it now, I place file extensions in a 
 list. However, I'd like to be able just to say '.tmp' and for that to 
 work on any type of file that has tmp (no matter the case) in the 
 extension.

 Many thanks!!!
 
 
 One solution would be to convert the extensions to lowercase
 (or uppercase, if you prefer that)
 
 if fileExtension.lower() == ".tmp":

Many thanks... I did it this way as I sometimes delete files with 
different extensions:

def remove_file_type(target_dir, file_type):
    for root, dirs, files in os.walk(target_dir):
        for f in files:
            if os.path.splitext(os.path.join(root, f))[1].lower() in file_type:
                pass


remove_file_type(user_docs, ['.tmp', '.mp3'])
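
The approach above only works if every extension in the list is already lowercase. A minimal sketch that normalizes both sides, so the caller can pass '.TMP' or '.tmp' interchangeably, might look like this. The name find_by_extension is hypothetical, it only reports matches rather than deleting them, and it is written for current Python 3.

```python
import os


def find_by_extension(target_dir, extensions):
    """Yield paths under target_dir whose extension matches, ignoring case."""
    # Lowercase the wanted extensions once, up front.
    wanted = {ext.lower() for ext in extensions}
    for root, dirs, files in os.walk(target_dir):
        for name in files:
            # splitext() works on the bare filename; no need to join first.
            if os.path.splitext(name)[1].lower() in wanted:
                yield os.path.join(root, name)
```

Something like `for path in find_by_extension(user_docs, ['.TMP', '.mp3']): os.remove(path)` would then do the deletion, where user_docs is whatever directory is being cleaned.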
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido at Google

2005-12-21 Thread rbt
Alex Martelli wrote:
 I don't think there was any official announcement, but it's true -- he
 sits about 15 meters away from me;-).

For Americans: 15 meters is roughly 50 feet.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to execute an EXE via os.system() with spaces in the directory name?

2005-12-06 Thread rbt
[EMAIL PROTECTED] wrote:
 This comes up from time to time.  The brain damage is all Windows', not
 Python's.  Here's one thread which seems to suggest a bizarre doubling
 of the initial quote of the commandline.
 
 http://groups.google.com/group/comp.lang.python/browse_frm/thread/89d94656ea393d5b/ef40a65017848671

I do this:

  import os

  # remove spaces from ends of filenames.
  for root, dirs, files in os.walk('programs'):
      for fname in files:
          new_fname = fname.strip()
          if new_fname != fname:
              new_path = os.path.join(root, new_fname)
              old_path = os.path.join(root, fname)
              os.renames(old_path, new_path)

  # remove spaces from middle of filenames.
  for root, dirs, files in os.walk('programs'):
      for f in files:
          new_f = f.replace(' ', '-')
          if new_f != f:
              new_path = os.path.join(root, new_f)
              old_path = os.path.join(root, f)
              os.renames(old_path, new_path)

  # install files.
  for root, dirs, files in os.walk('programs'):
      installable = ['.exe', '.msi', '.EXE', '.MSI']
      for f in files:
          ext = os.path.splitext(f)
          if ext[1] in installable:
              print f
              install = os.system(os.path.join(root, f))
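
Renaming sidesteps the quoting problem for filenames, but not for directories that contain spaces. Another option, assuming the subprocess module is available (Python 2.4 and later), is to pass the command as a list so that no shell ever parses it. The helper name run_program is hypothetical, and the sketch is written for current Python 3.

```python
import subprocess


def run_program(path, *args):
    """Run an executable by path, passing any arguments as a list.

    Because no shell splits the command line, spaces in `path`
    or in the arguments need no quoting. Returns the exit code.
    """
    return subprocess.call([path] + list(args))
```

A call such as `run_program(os.path.join(root, f))` would replace the os.system() line. An .msi file is not itself an executable, so it would presumably have to be handed to msiexec rather than run directly.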


-- 
http://mail.python.org/mailman/listinfo/python-list


extract python install info from registry

2005-12-06 Thread rbt
On windows xp, is there an easy way to extract the information that 
Python added to the registry as it was installed?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: extract python install info from registry

2005-12-06 Thread rbt
Laszlo Zsolt Nagy wrote:
 rbt wrote:
 
 On windows xp, is there an easy way to extract the information that 
 Python added to the registry as it was installed?
  

 Using regedit.exe, look at the registry keys and values under
 
 HKEY_LOCAL_MACHINE\Software\Python
 
 If you need to know how to read the registry from Python: please install 
 the python win32 extensions (or use ActivePython).
 
   Les
 

There's more to it than that... isn't there? I've used _winreg and the 
win32 extensions in the past when working with the registry. I thought 
perhaps someone had already scripted something to extract this info.

I'm creating a Python plugin for Bartpe (Windows Pre-Install 
Environment) and it works OK, but to make it work _exactly_ like it does 
on XP (.py and .pyw associate with python and pythonw), I need to 
extract the reg entries so I can recreate them in the WinPE environment.

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: extract python install info from registry

2005-12-06 Thread rbt
gene tani wrote:
 There's more to it than that... isn't there? I've used _winreg and the
 win32 extensions in the past when working with the registry. I thought
 perhaps someone had already scripted something to extract this info.

 
 Yes, a small firm named Microsoft has done this (but not tested w/2.4):
 
 http://www.microsoft.com/technet/scriptcenter/scripts/python/os/registry/osrgpy01.mspx
 

That tells me this:

Caption:  Registry
Current Size:  2
Description:  Registry
Install Date:  20051125152108.00-300
Maximum Size:  54
Name:  Microsoft Windows XP 
Professional|C:\WINDOWS|\Device\Harddisk0\Partition1
Proposed Size:  54
Status:  OK
-- 
http://mail.python.org/mailman/listinfo/python-list


speeding up Python when using wmi

2005-11-28 Thread rbt
Here's a quick and dirty version of winver.exe written in Python:

http://filebox.vt.edu/users/rtilley/public/winver/winver.html

It uses wmi to get OS information from Windows... it works well, but 
it's slow... too slow. Is there any way to speed up wmi?

In the past, I used the platform and sys modules to do some of what 
winver.exe does and they were rather fast. However, since switching to 
wmi (for a more accurate representation) things have gotten slow... 
real slow.

I suppose this could be a wmi only issue not related at all to Python.

Any tips or ideas?

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: speeding up Python when using wmi

2005-11-28 Thread rbt
Tim Golden wrote:
 [rbt]
 
 Here's a quick and dirty version of winver.exe written in Python:
 
 [.. snip ..]
 
 It uses wmi to get OS information from Windows... it works well, but 
 it's slow... too slow. Is there any way to speed up wmi?
 
 In the past, I used the platform and sys modules to do some of what 
 winver.exe does and they were rather fast. However, since switching to
 wmi (for a more accurate representation) things have gotten slow... 
 real slow.
 
 I suppose this could be a wmi only issue not related at all to Python.
 
 In short, I recommend replacing the wmi module by the underlying
 calls which it hides, and replacing Tkinter by a win32gui MessageBox.
 The wmi module does some magicish things which are useful for interactive
 browsing, but will only slow you down if you know exactly what you need.
 As you don't need anything more than a native message box, don't
 bother with GUI loops etc. Windows will do that for you in a Modal
 Dialog (here message box).
 
 This was going to be a longer post comparing versions, but in short
 running this code:
 
 import win32gui
 import win32com.client
 
 for os in win32com.client.GetObject ("winmgmts:").InstancesOf ("Win32_OperatingSystem"):
   win32gui.MessageBox (
     0,
     os.Properties_ ("Caption").Value + "\n" + \
       os.Properties_ ("TotalVisibleMemorySize").Value + "\n" + \
       os.Properties_ ("Version").Value + "\n" + \
       os.Properties_ ("CSDVersion").Value,
     "Platform Info", 
     0
   )

Wow... thanks. I didn't expect someone to completely rewrite it like 
that. I'll use your example and name it PyWinver and attribute it to 
you. Hope you don't mind. Great learning experience.

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socketServer questions

2005-10-10 Thread rbt
On Sat, 2005-10-08 at 14:09 -0700, Paul Rubin wrote:
 rbt [EMAIL PROTECTED] writes:
  Off-topic here, but you've caused me to have a thought... Can hmac be
  used on untrusted clients? Clients that may fall into the wrong hands?
  How would one handle message verification when one cannot trust the
  client? What is there besides hmac? Thanks, rbt
 
 I don't understand the question.  HMAC requires that both ends share a
 secret key; does that help?  

That's what I don't get. If both sides have the key... how can it be
'secret'? All one would have to do is look at the code on any of the
clients and they'd then know everything, right?

 What do you mean by verification?

I'm trying to keep script kiddies from tampering with a socket server. I
want the server to only load a valid or verified string into its log
database and to discard everything else. 

Strings could come to the socket server from anywhere on the Net from
any machine. This is outside my control. What is there to prevent a
knowledgeable person from finding the py code on a client computer,
understanding it and then being able to forge a string that the server
will accept?

Does that make sense?

-- 
http://mail.python.org/mailman/listinfo/python-list


One last thing about SocketServer

2005-10-10 Thread rbt
I've read more about sockets and now, I have a better understanding of
them. However, I still have a few SocketServer module questions:

When used with SocketServer how exactly does socket.setdefaulttimeout()
work? Does it timeout the initial connect request to the socket server
or does it timeout the session between the connecting client socket and
the client socket the server generated to handle the incoming request? 

Also, since the *only* thing a 'socket server' does is to create 'client
sockets' to handle requests, how do I use socket object features on
these generated clients to manage and/or monitor them?

The SocketServer module is great, but it seems to hide too many details
of what it's up to!
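
As far as I understand it, socket.setdefaulttimeout() just sets a process-wide default for socket objects created afterwards, so it is a blunt tool here. Inside a handler, self.request is the ordinary socket object for that one client, and StreamRequestHandler also has a timeout class attribute that it applies to each accepted connection. A minimal sketch, written for current Python 3 (where the module is named socketserver); the handler name and the 10-second value are assumptions for illustration.

```python
import socket
import socketserver


class LineHandler(socketserver.StreamRequestHandler):
    # StreamRequestHandler.setup() calls settimeout() on the
    # per-client socket with this value.
    timeout = 10  # seconds; assumed value

    def handle(self):
        # self.request is the socket for this one client, so the usual
        # socket methods (getpeername, settimeout, ...) work on it.
        peer_ip = self.request.getpeername()[0]
        try:
            line = self.rfile.readline(300).strip()
        except socket.timeout:
            return  # client went quiet; let the connection close
        reply = "got %d bytes from %s\n" % (len(line), peer_ip)
        self.wfile.write(reply.encode("ascii"))
```

With this, a stalled client is dropped after the timeout without affecting the listening socket or any other connection.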

Thanks,
rbt





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socketServer questions

2005-10-10 Thread rbt
On Mon, 2005-10-10 at 05:54 -0700, Paul Rubin wrote:
 rbt [EMAIL PROTECTED] writes:
   I don't understand the question.  HMAC requires that both ends share a
   secret key; does that help?  
  
  That's what I don't get. If both sides have the key... how can it be
  'secret'? All one would have to do is look at the code on any of the
  clients and they'd then know everything, right?
 
 Yes, clients have to keep the key secure.
 
   What do you mean by verification?
  
  I'm trying to keep script kiddies from tampering with a socket server. I
  want the server to only load a valid or verified string into its log
  database and to discard everything else. 
 
 If the clients can keep a secret key secure, then use hmac.  Note that
 if there's lots of clients, they shouldn't all use the same secret key.
 Instead, for client #i, let that client's key be something like
   hmac(your_big_secret, str(i)).digest()
 and the client would send #i as part of the string.

How is this different from sending a pre-defined string from the client
that the server knows the md5 hash of? The clients know the string, the
server knows the hash of that string.

Also, could this not be done both ways? So that, if an attacker figures
out the string he's supposed to send from a client to the server (which
he could easily do). He could not easily figure out the string the
server should send back as all he would have is the hash of that string.

So, before the actual data is sent from the client to the server, the
client would send its secret string that the server would verify and
then if that worked, the server would send its own secret string that
the client must verify. We'd have two secret strings instead of one.


   You'd use
 #i to recompute the client's key and then use that derived key to
 verify the string.  This is called key derivation or key
 diversification.  If an attacker gets hold of that client's key and
 starts hosing you, you can disable that key without affecting the
 other ones.  (The client is issued only the derived key and never sees
 the big secret).

This is interesting. I didn't know that was possible.
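
For anyone else who had not seen key derivation before, the idea described above can be sketched with the standard hmac module. This is written for current Python 3; the choice of SHA-256, the function names, and the placeholder secret are all assumptions for illustration and are not taken from the thread.

```python
import hashlib
import hmac

# Hypothetical master secret; it lives on the server only.
BIG_SECRET = b"server-only master secret"


def client_key(client_id):
    """Derive client #i's key from the master secret and the client id."""
    return hmac.new(BIG_SECRET, str(client_id).encode(), hashlib.sha256).digest()


def sign(key, message):
    """Client side: compute the tag that is sent along with the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(client_id, message, tag):
    """Server side: recompute the client's key and check the tag."""
    expected = sign(client_key(client_id), message)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, tag)
```

Client 7 would be issued only `client_key(7)` and would send its id, the message, and `sign(key, message)`. If that one key leaks, the server can refuse id 7 without touching any other client.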

 
  Strings could come to the socket server from anywhere on the Net from
  any machine. This is outside my control. What is there to prevent a
  knowledgeable person from finding the py code on a client computer,
  understanding it and then being able to forge a string that the server
  will accept?
 
 Yes, if you're concerned about insecure clients, you have a much more
 complex problem.  But your x..z..y scheme is far worse than hmac.
 Once the attacker figures that out, there's no security at all.

I dropped the x,y,z scheme after your first response ;)

 
 What is the actual application, if you can say?  Depending on the
 environment and constraints, various approaches are possible.

Nothing important. It just logs network data. It's an anti-theft program
for laptops that phones home data like this: public and private IP(s),
MAC addy, date, time, etc. Maybe I'm putting too much thought into it.
Python encourages good design and I try to follow that encouragement
when coding... even for trivial things such as this. 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socketServer questions

2005-10-10 Thread rbt
On Mon, 2005-10-10 at 07:46 -0700, Paul Rubin wrote:
 rbt [EMAIL PROTECTED] writes:
   Instead, for client #i, let that client's key be something like
 hmac(your_big_secret, str(i)).digest()
   and the client would send #i as part of the string.
  
  How is this different from sending a pre-defined string from the client
  that the server knows the md5 hash of? The clients know the string, the
  server knows the hash of that string.
 
 I'm confused, I don't understand what that md5 whatever would do for you.
 I'm assuming the server is secure and the clients are less secure.
 
  Also, could this not be done both ways? So that, if an attacker figures
  out the string he's supposed to send from a client to the server (which
  he could easily do). He could not easily figure out the string the
  server should send back as all he would have is the hash of that string.
 
 I'm still confused

OK, we'll leave it at that and just accept that we're from different
planets ;) Thanks for the help.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socketServer questions

2005-10-08 Thread rbt
On Fri, 2005-10-07 at 15:07 -0700, Paul Rubin wrote:
 rbt [EMAIL PROTECTED] writes:
  The server just logs data, nothing else. It's not private or important
  data... just sys admin type stuff (ip, mac addy, etc.). I just don't
  want some script kiddie discovering it and trying to 'hack' it. By doing
  so, they'd fill the log up with crap. So, If the data doesn't contain x,
  y, and z and if the data is too big or too small, I record it to a
  'tamper' log and tell the leet hacker to 'go away'. 
 
 Well, rather than this x,y,z stuff, it's best to do it properly and
 authenticate the records with the hmac module.


Off-topic here, but you've caused me to have a thought... Can hmac be
used on untrusted clients? Clients that may fall into the wrong hands?
How would one handle message verification when one cannot trust the
client? What is there besides hmac? Thanks, rbt

-- 
http://mail.python.org/mailman/listinfo/python-list


socketServer questions

2005-10-07 Thread rbt
I have written a python socketServer program and I have a few questions
that I hope the group can answer... here is a simple version of the
server:

import SocketServer
import time

class tr_handler(SocketServer.StreamRequestHandler):

    def handle(self):

        data = self.rfile.readline(300)
        data = data.strip()
        bytes = str(len(data))

        public_ip = self.client_address[0]

        serv_date = time.strftime('%Y-%m-%d', time.localtime())
        serv_time = time.strftime('%H:%M:%S', time.localtime())

        # Note that 'data' comes from the client.
        fp = file('/home/rbt/Desktop/tr_report.txt', 'a')
        fp.write(data+"\t"+serv_date+"\t"+serv_time+"\t"+public_ip+"\t"+bytes+"\n")
        fp.close()

if __name__=='__main__':
    server = SocketServer.TCPServer( ('', 55503), tr_handler)
    server.serve_forever()

---

1. Do I need to use threads to handle requests, if so, how would I
incorporate them? The clients are light and fast never sending more than
270 bytes of data and never connecting for more than 10 seconds at a time.
There are currently 500 clients and potentially there could be a few
thousand... how high does the current version scale?

2. What's the proper way to handle server exceptions (server stops, fails
to run at boot, etc.)?

3. How do I keep people from tampering with the server? The clients send
strings of data to the server. All the strings start with x and end with y
and have z in the middle. Is requiring x at the front and y at the back and
z someplace in the middle enough to keep people out? I'm open to
suggestions.
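
On question 1, one possible shape for a threaded version is to mix ThreadingMixIn into the server class, so that each connection gets its own thread, and to guard the log file with a lock. This is only a sketch under assumptions: it is written for current Python 3 (module name socketserver), the class names, the log_path attribute, the "OK" acknowledgement, and the 10-second timeout are made up for illustration, and it has not been measured against hundreds of clients.

```python
import socketserver
import threading
import time

LOG_LOCK = threading.Lock()


class ReportHandler(socketserver.StreamRequestHandler):
    timeout = 10                 # drop clients that stall; assumed value
    log_path = "tr_report.txt"   # hypothetical log location

    def handle(self):
        data = self.rfile.readline(300).strip().decode("ascii", "replace")
        stamp = time.strftime("%Y-%m-%d\t%H:%M:%S")
        fields = [data, stamp, self.client_address[0], str(len(data))]
        # One writer at a time, so lines from different threads
        # cannot interleave in the log.
        with LOG_LOCK:
            with open(self.log_path, "a") as fp:
                fp.write("\t".join(fields) + "\n")
        self.wfile.write(b"OK\n")


class ThreadedReportServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # don't hold up interpreter exit
    allow_reuse_address = True
```

Starting it would look like `ThreadedReportServer(('', 55503), ReportHandler).serve_forever()`. Only the server class changes compared with the single-threaded version; the handler logic stays the same apart from the lock.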

Thanks!
rbt







-- 
http://mail.python.org/mailman/listinfo/python-list


Re: socketServer questions

2005-10-07 Thread rbt
On Fri, 2005-10-07 at 09:17 -0700, Paul Rubin wrote:
  3. How do I keep people from tampering with the server? The clients
  send strings of data to the server. All the strings start with x and
  end with y and have z in the middle. Is requiring x at the front and
  y at the back and z someplace in the middle enough to keep people
  out? I'm open to suggestions.
 
 It only keeps them out if they don't know to use that x..y..z pattern
 and maybe not even then.  Get a copy of Security Engineering by
 Ross Anderson to have an idea of what you're dealing with, especially
 if your server controls something valuable.

The server just logs data, nothing else. It's not private or important
data... just sys admin type stuff (ip, mac addy, etc.). I just don't
want some script kiddie discovering it and trying to 'hack' it. By doing
so, they'd fill the log up with crap. So, If the data doesn't contain x,
y, and z and if the data is too big or too small, I record it to a
'tamper' log and tell the leet hacker to 'go away'. 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Finding where to store application data portably

2005-09-22 Thread rbt
On Tue, 2005-09-20 at 23:03 +0100, Tony Houghton wrote:
 I'm using pygame to write a game called Bombz which needs to save some
 data in a directory associated with it. In Unix/Linux I'd probably use
 ~/.bombz, in Windows something like
 C:\Documents And Settings\user\Application Data\Bombz.
 
 There are plenty of messages in the archives for this group about how to
 find the correct location in Windows, but what about Mac OS?

~/.bombz works equally well on OSX.
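
Putting the three cases from this thread into one function could look roughly like the sketch below. It is written for current Python 3; app_data_dir is a hypothetical name, the Windows branch assumes the APPDATA environment variable is set, and the macOS branch uses the conventional ~/Library/Application Support location even though a dot-directory works there too.

```python
import os
import sys


def app_data_dir(app_name, platform=sys.platform, environ=os.environ):
    """Return a per-user data directory for app_name (it is not created)."""
    home = os.path.expanduser("~")
    if platform.startswith("win"):
        # %APPDATA% normally points at the user's Application Data folder.
        base = environ.get("APPDATA", home)
        return os.path.join(base, app_name)
    if platform == "darwin":
        # Conventional macOS location; ~/.bombz would also work.
        return os.path.join(home, "Library", "Application Support", app_name)
    # Other Unix-likes: a hidden directory in the home directory.
    return os.path.join(home, "." + app_name.lower())
```

The platform and environ parameters exist only so the function can be exercised on one machine for all three cases; normal callers would just use `app_data_dir('Bombz')`.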

-- 
http://mail.python.org/mailman/listinfo/python-list


win32 service and time.sleep()

2005-09-20 Thread rbt
I have a win32 service written in Python. It works well. It sends a
report of the status of the machine via email periodically. The one
problem I have is this... while trying to send an email, the script
loops until a send happens and then it breaks. Should it be unable to
send, it sleeps for 10 minutes with time.sleep(600) and then wakes and
tries again. This is when the problem occurs. I can't stop the service
while the program is sleeping. When I try, it just hangs until a reboot.
Can someone suggest how to fix this?
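
One common way around this kind of hang is to wait on a stop event with a timeout instead of calling time.sleep(), so that a stop request wakes the loop at once. The sketch below uses threading.Event to show the idea in a platform-neutral way and is written for current Python 3. In a pywin32 service the equivalent would presumably be a timed wait on the service's stop event handle; the names send_with_retry and stop_requested are hypothetical.

```python
import threading

# Set by whatever handles the "stop service" request.
stop_requested = threading.Event()


def send_with_retry(send, retry_delay=600):
    """Call send() until it succeeds or a stop is requested.

    Returns True if the report was sent, False if we were told to stop.
    """
    while not stop_requested.is_set():
        if send():
            return True
        # Unlike time.sleep(), wait() returns as soon as the event is set,
        # so a stop request does not have to sit out the full delay.
        stop_requested.wait(retry_delay)
    return False
```

The stop handler then only needs to call `stop_requested.set()`; the loop notices within moments and returns instead of sleeping through the remaining ten minutes.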

Thanks,
rbt
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: appended crontab entries with py script

2005-09-14 Thread rbt
On Tue, 2005-09-13 at 23:18 -0400, Mike Meyer wrote:
 rbt [EMAIL PROTECTED] writes:
 
  How can I safely append a crontab entry to a crontab file
  programmatically with Python?
 
 Well, one way would be to invoke the system crontab utility and use an
 editor that passes the file to your program, and reads the results
 back.
 
  I need to handle crontabs that currently have entries and crontabs that
  are empty. Also, I'd like this to work across Linux and BSD systems.
 
  Any pointers?
 
 I think most Free Unix systems use the Vixie cron, and the non-free
 ones have a crontab command (do some of them call it cron?) with the
 same API. So you're pretty safe using that.
 
 If you want to assume that you're going to have the vixie cron, you
 could dig into its guts to see what it does for locking, and do that
 by hand.
 
mike

Here's what I did... can you write uglier code than this ;)

Works on Mac and Linux... for the most part.

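import os

def add_cron_entry():
    home = os.path.expanduser('~')
    # Dump the current crontab (if any) to a scratch file.
    cur_cron = os.popen('crontab -l > current_crontab.txt')
    cur_cron.read()
    cur_cron.close()
    # Append the new hourly entry.
    fp = file('current_crontab.txt', 'a')
    print >> fp, "0 * * * * %s/.theft_recovery.py" % home
    fp.close()
    # Load the edited file back in as the user's crontab.
    load = os.popen('crontab current_crontab.txt')
    load.read()
    load.close()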
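
A somewhat tidier variant of the same idea avoids the scratch file and does not add the entry twice. It is a sketch only, written for current Python 3, and it assumes the local crontab command accepts "-" to read the new table from standard input, which Vixie-style crons do as far as I know. The function names are hypothetical.

```python
import subprocess


def append_entry(current, entry):
    """Return crontab text with entry appended, unless it is already there."""
    lines = current.splitlines()
    if entry in lines:
        return current
    lines.append(entry)
    # Always end with a newline so cron sees a complete last line.
    return "\n".join(lines) + "\n"


def install_entry(entry):
    """Read the user's crontab, append entry, and load the result."""
    listing = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    # A non-zero exit usually just means "no crontab yet"; start from empty.
    current = listing.stdout if listing.returncode == 0 else ""
    updated = append_entry(current, entry)
    # "crontab -" reads the replacement table from standard input.
    subprocess.run(["crontab", "-"], input=updated, text=True, check=True)
```

Keeping the text handling in append_entry() separate from the two crontab calls means the empty-crontab and existing-entries cases can be checked without touching a real crontab.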

-- 
http://mail.python.org/mailman/listinfo/python-list


Get Mac OSX Version

2005-09-13 Thread rbt
Is there a similar function to sys.getwindowsversion() for Macs?
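
For reference, the closest thing I know of in the standard library is platform.mac_ver(). A minimal sketch of how it behaves, written for current Python 3:

```python
import platform

# mac_ver() returns (release, versioninfo, machine); release is an
# empty string when the script is not running on a Mac.
release, versioninfo, machine = platform.mac_ver()

if release:
    print("Mac OS X version:", release)
else:
    print("Not running on a Mac")
```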

Many thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


appended crontab entries with py script

2005-09-13 Thread rbt
How can I safely append a crontab entry to a crontab file
programmatically with Python?

I need to handle crontabs that currently have entries and crontabs that
are empty. Also, I'd like this to work across Linux and BSD systems.

Any pointers?
-- 
http://mail.python.org/mailman/listinfo/python-list


pretty windows installer for py scripts

2005-09-08 Thread rbt
Any recommendations on a windows packager/installer that's free? I need
it to allow non-tech users to install some python scripts... you know,
Click Next... Click Next... Click Finish... You're Done! and
everything just magically works ;)
-- 
http://mail.python.org/mailman/listinfo/python-list


broken links

2005-07-22 Thread rbt
How can I find broken links (links that point to files that do not
exist) in a directory and remove them using Python? I'm working on RHEL4

Thanks,
rbt
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: broken links

2005-07-22 Thread rbt
I found it:

os.path.exists(path)
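
Spelled out a little: os.path.exists() follows the link and returns False when the target is missing, while os.path.islink() looks at the link itself, so a broken link is one where islink() is true and exists() is false. A minimal sketch, written for current Python 3; the function name and the dry_run flag are hypothetical additions.

```python
import os


def remove_broken_links(directory, dry_run=True):
    """Return broken symlinks directly inside directory.

    The links are only deleted when dry_run is False.
    """
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(path)
            if not dry_run:
                os.remove(path)
    return broken
```

Running it once with the default dry_run=True shows what would be removed before anything is actually deleted.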


On Fri, 2005-07-22 at 09:22 -0400, rbt wrote:
 How can I find broken links (links that point to files that do not
 exist) in a directory and remove them using Python? I'm working on RHEL4
 
 Thanks,
 rbt
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: goto

2005-07-19 Thread rbt
On Tue, 2005-07-19 at 10:02 -0400, George Sakkis wrote:
 rbt [EMAIL PROTECTED] wrote:
 
  On Mon, 2005-07-18 at 12:27 -0600, Steven Bethard wrote:
   Hayri ERDENER wrote:
what is the equivalent of C languages' goto  statement in python?
  
   Download the goto module:
http://www.entrian.com/goto/
   And you can use goto to your heart's content. And to the horror of all
   your friends/coworkers. ;)
  
   STeVe
 
  Shouldn't that be to the horror of all your goto-snob friends.
 
  IMO, most of the people who deride goto do so because they heard or read
  where someone else did.
 
  Many of the world's most profitable software companies (MS for example)
  have thousands of goto statements in their code... oh the horror of it
  all. Why aren't these enlightened-by-the-gods know-it-alls as profitable
  as these obviously ignorant companies?
 
 
 It should not really come as a shock that the same fellow who came up with a
 brilliant efficient way to generate all permutations (http://tinyurl.com/dnazs)
 is also in favor of goto.
 
 Coming next from rbt: Pointer arithmetic in python ?.
 
 George
 
 

I have moments of brilliance and moments of ignorance. You must admit
though, that was a unique way of generating permutations... how many
other people would have thought of that approach? It did solve my
problem ;)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: goto

2005-07-19 Thread rbt
On Wed, 2005-07-20 at 03:43 +1000, Steven D'Aprano wrote:
 On Tue, 19 Jul 2005 11:29:58 -0400, rbt wrote:
 
  It should not really come as a shock that the same fellow who came up with
  a brilliant efficient way to generate all permutations
  (http://tinyurl.com/dnazs) is also in favor of goto.
  
  Coming next from rbt: Pointer arithmetic in python ?.
  
  George
  
  
  
  I have moments of brilliance and moments of ignorance. You must admit
  though, that was a unique way of generating permutations... how many
  other people would have thought of that approach? It did solve my
  problem ;)
 
 Sorry rbt, but your algorithm isn't unique, nor was it clever, and in fact
 your implementation wasn't very good even by the undemanding requirements
 of the algorithm. It is just a minor modification of bogosort (also known
 as bozo-sort) algorithm:
 
 http://en.wikipedia.org/wiki/Bogosort
 
 I quote:
 
 ...bogosort is 'the archetypal perversely awful algorithm', one example
 of which is attempting to sort a deck of cards by repeatedly throwing the
 deck in the air, picking the cards up at random, and then testing whether
 the cards are in sorted order.
 
 Bogosort is nothing to be proud of, except as a joke.

It *was* a joke.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python scripts wont run - HELP

2005-07-18 Thread rbt
On Mon, 2005-07-18 at 17:22 +0100, John Abel wrote:
 windozbloz wrote:
 
 Bye Bye Billy Bob...
 
 Hello All,
 I'm a fairly literate windoz amateur programmer mostly in visual basic. I
 have switched to SuSE 9.2 Pro and am trying to quickly come up to speed
 with Python 2.3.4.  I can run three or four line scripts from the command
 line but have not been able to execute a script from a file.  
 
 I have used EMACS and JEDIT to create small test routines.  I would right
 click the file and set properties to executable.  I would then click the
 icon, the bouncy ball would do its thing then a dialog box would flash on
 the screen for a fraction of a second.  I could tell it had a progress bar
 on it but could not catch anything else on it.  Then nothing else would
 happen.
 
 If I could execute a script the world would once again be my playground...
 PLEASE HELP.
 
 
 
   
 
 You will need to include
 
 #!/usr/bin/python
 
 At the top of your script.
 
 HTH
 
 J

Or, better yet:

#!/usr/bin/env python

;)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: goto

2005-07-18 Thread rbt
On Mon, 2005-07-18 at 12:27 -0600, Steven Bethard wrote:
 Hayri ERDENER wrote:
  what is the equivalent of C languages' goto  statement in python?
 
 Download the goto module:
  http://www.entrian.com/goto/
 And you can use goto to your heart's content. And to the horror of all 
 your friends/coworkers. ;)
 
 STeVe

Shouldn't that be to the horror of all your goto-snob friends.

IMO, most of the people who deride goto do so because they heard or read
where someone else did. 

Many of the world's most profitable software companies (MS for example)
have thousands of goto statements in their code... oh the horror of it
all. Why aren't these enlightened-by-the-gods know-it-alls as profitable
as these obviously ignorant companies?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: goto

2005-07-18 Thread rbt
10 PRINT "YOU'RE NOT RIGHT IN THE HEAD."
20 GOTO 10


On Tue, 2005-07-19 at 02:33 +, Leif K-Brooks wrote:
 rbt wrote:
  IMO, most of the people who deride goto do so because they heard or read
  where someone else did. 
 
 1                  GOTO 17
 2  mean,           GOTO 5
 3  could           GOTO 6
 4  with            GOTO 7
 5  what            GOTO 3
 6  possibly        GOTO 24
 7  you!            GOTO 21
 8  that            GOTO 18
 9  really,         GOTO 23
 10 understandable?
 11 neat.           GOTO 16
 12 and             GOTO 25
 13 are             GOTO 9
 14 I               GOTO 26
 15 wrong           GOTO 20
 16 I               GOTO 2
 17 Yes,            GOTO 14
 18 simple          GOTO 12
 19 agree           GOTO 4
 20 with            GOTO 22
 21 Gotos           GOTO 13
 22 something       GOTO 8
 23 really          GOTO 11
 24 be              GOTO 15
 25 easily          GOTO 10
 26 totally         GOTO 19

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: all possible combinations

2005-07-15 Thread rbt
Wow. That's neat. I'm going to use it. Thanks!

On Thu, 2005-07-14 at 19:52 -0400, Peter Hansen wrote:
 Bengt Richter wrote:
  On Thu, 14 Jul 2005 17:10:37 -0400, William Park [EMAIL PROTECTED] wrote:
  It's a one liner in Python too ;-)
  
print ' '.join([x+y+z+q for s in ['abc'] for x in s for y in s for z 
  in s for q in s])
 
 Or for the cost of an import and a lambda, you can keep it looking real 
 obscure and generalize it to any size of sequence ('abcdef' or whatever) 
 and a result length of up to 52 elements:
 
 >>> from string import letters as L
 >>> cartesian = lambda seq, num: eval("list(%s for __ in [seq]
 %s)" % ('+'.join(L[:num]), 'for %s in __ ' * num % tuple(L[:num])))
 # (there are spaces at any line breaks above)
 
 >>> cartesian('abcde', 6)
 ['aaaaaa', 'aaaaab', 'aaaaac', 'aaaaad', 'aaaaae', 'aaaaba',
 ...
 'eeeeec', 'eeeeed', 'eeeeee']
 >>> len(_)
 15625
 
 grin
 
 -Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: all possible combinations

2005-07-14 Thread rbt
Thanks to all who were helpful... some of you guys are too harsh and
cynical. Here's what I came up with. I believe it's a proper
combination, but I'm sure someone will point out that I'm wrong ;)

import random

groups = [list('abc'),list('abc'),list('abc'),list('abc')]

already = []

while 1:

    LIST = []

    # pick one letter at random for each of the four positions
    for g in groups:
        sample = random.sample(g, 1)
        LIST.append(sample[0])

    STRING = ''.join(LIST)
    if STRING not in already:
        print STRING
        already.append(STRING)
    # 3 letters in 4 positions: 3**4 == 81 distinct strings
    if len(already) == 81:
        break
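
For comparison, a deterministic sketch in modern Python 3. itertools.product (added in Python 2.6, so not available when this thread was written) enumerates every length-4 string over 'abc' directly, with no random sampling and no duplicate check. The name all_strings is only illustrative.

```python
from itertools import product

def all_strings(alphabet, length):
    """Return every string of the given length drawn from alphabet, repeats allowed."""
    return [''.join(chars) for chars in product(alphabet, repeat=length)]

results = all_strings('abc', 4)
print(len(results))   # 81, i.e. 3**4
print(results[:3])    # ['aaaa', 'aaab', 'aaac']
```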

On Thu, 2005-07-14 at 23:18 +1000, John Machin wrote:
 Steven D'Aprano wrote:
  On Thu, 14 Jul 2005 08:49:05 +1000, John Machin wrote:
  
  
 You keep using that word. I do not think it means what you think it means.
 
 Both of you please google("define: combination")
  
  
  Combination: a coordinated sequence of chess moves.
  
  An option position that is effected by either a purchase of two long
  positions or two short positions. The investor purchases a call and a put
  (or sells a call and a put) with different expiration dates and/or
  different strike prices.
  
  Or perhaps in Scheme, a function call, consisting of a function name and
  arguments written within parentheses.
  
  Yes, mathematically the definition of combination includes that order does
  not matter. But that certainly isn't the case in common English. Now,
  John, given the tone of the posts you are complaining about,
 
 Wrong -- no complaint. Another quote: "It's a joke, Joyce!"
 
  do you think
  I was using combination in the precise mathematical sense, or the common
  English sense?
 
 As in "Please don't get your combinations in a twist"??
 
  
  (Hint: the very first definition Google finds is "a collection of things
  that have been combined; an assemblage of separate parts or qualities".
  Not a word there about order mattering or not.)
 



all possible combinations

2005-07-13 Thread rbt
Say I have a list that has 3 letters in it:

['a', 'b', 'c']

I want to print all the possible 4 digit combinations of those 3
letters:

4^3 = 64


abaa
aaba
aaab
acaa
aaca
aaac
...

What is the most efficient way to do this? 


Re: all possible combinations

2005-07-13 Thread rbt
On Thu, 2005-07-14 at 00:47 +1000, Steven D'Aprano wrote:
 On Wed, 13 Jul 2005 10:21:19 -0400, rbt wrote:
 
  Say I have a list that has 3 letters in it:
  
  ['a', 'b', 'c']
  
  I want to print all the possible 4 digit combinations of those 3
  letters:
  
  4^3 = 64
  
  
  abaa
  aaba
  aaab
  acaa
  aaca
  aaac
  ...
  
  What is the most efficient way to do this?
 
 Efficient for who? The user? The programmer? The computer? Efficient use
 of speed or memory or development time?

The CPU

 
 If you want the fastest runtime efficiency, a lookup table of
 pre-calculated values. That is an O(1) operation, and you don't get any
 faster than that.
 
 If you expect to extend the program to arbitrary lists, pre-calculation
 isn't practical, so you need an algorithm to calculate permutations (order
 matters) or combinations (order doesn't matter).

My list is not arbitrary. I'm looking for all 'combinations' as I
originally posted. Order does not matter to me... just all
possibilities.



Re: all possible combinations

2005-07-13 Thread rbt
On Wed, 2005-07-13 at 10:21 -0400, rbt wrote:
 Say I have a list that has 3 letters in it:
 
 ['a', 'b', 'c']
 
 I want to print all the possible 4 digit combinations of those 3
 letters:
 
 4^3 = 64
 
 
 abaa
 aaba
 aaab
 acaa
 aaca
 aaac
 ...
 
 What is the most efficient way to do this? 

Expanding this to 4^4 (256) to test the random.sample function produces
interesting results. It never finds more than 24 combinations out of the
possible 256. This leads to the question... how 'random' is sample ;)

Try it for yourselves:

import random

test = list('1234')

combinations = []
while 1:
    combo = random.sample(test, 4)
    possibility = ''.join(combo)
    if possibility not in combinations:
        print possibility
        combinations.append(possibility)
        continue
    else:
        continue
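
The 24 is expected rather than a flaw in random.sample: sample() draws without replacement, so every result is a rearrangement of '1234', and there are only 4! = 24 of those. A small check, written for Python 3 as a sketch:

```python
import math
from itertools import permutations

test = list('1234')

# random.sample(test, 4) can only ever return orderings of the four distinct items.
orderings = set(''.join(p) for p in permutations(test))

print(len(orderings))      # 24
print(math.factorial(4))   # 24
print(len(test) ** 4)      # 256, which would require repeated characters
```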



Re: all possible combinations

2005-07-13 Thread rbt
On Wed, 2005-07-13 at 11:09 -0400, rbt wrote:
 On Wed, 2005-07-13 at 10:21 -0400, rbt wrote:
  Say I have a list that has 3 letters in it:
  
  ['a', 'b', 'c']
  
  I want to print all the possible 4 digit combinations of those 3
  letters:
  
  4^3 = 64
  
  
  abaa
  aaba
  aaab
  acaa
  aaca
  aaac
  ...
  
  What is the most efficient way to do this? 
 
 Expanding this to 4^4 (256) to test the random.sample function produces
 interesting results. It never finds more than 24 combinations out of the
 possible 256. This leads to the question... how 'random' is sample ;)
 
 Try it for yourselves:
 
 test = list('1234')
 
 combinations = []
 while 1:
     combo = random.sample(test, 4)
     possibility = ''.join(combo)
     if possibility not in combinations:
         print possibility
         combinations.append(possibility)
         continue
     else:
         continue
 

Someone pointed out off-list that this is doing permutation, not
combination. Is there a way to make random.sample to do combinations?
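
random.sample never repeats an element, so on its own it cannot produce strings such as 'aaab'. One way to draw with replacement is random.choice once per position. The sketch below (Python 3, names illustrative) does that and stops once all len(alphabet) ** length strings have been seen.

```python
import random

def random_strings(alphabet, length, rng=random):
    """Yield each distinct string once, in the random order it is first drawn."""
    seen = set()
    total = len(alphabet) ** length
    while len(seen) < total:
        # choice() picks independently each time, so characters may repeat
        candidate = ''.join(rng.choice(alphabet) for _ in range(length))
        if candidate not in seen:
            seen.add(candidate)
            yield candidate

found = list(random_strings('abc', 4))
print(len(found))   # 81
```

Using a set for the "already seen" test keeps each membership check constant time, which matters more as the number of possibilities grows.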



breaking out of nested loop

2005-07-12 Thread rbt
What is the appropriate way to break out of this while loop if the for
loop finds a match?

while 1:
    for x in xrange(len(group)):
        try:
            mix = random.sample(group, x)
            make_string = ''.join(mix)
            n = md5.new(make_string)
            match = n.hexdigest()
            if match == target:
                print "Collision!!!"
                print make_string
                Stop = time.strftime("%H:%M:%S-%m-%d-%y", time.localtime())
                print "Stop", Stop
                break
            else:
                continue
        except Exception, e:
            print e


Re: breaking out of nested loop

2005-07-12 Thread rbt
Thanks guys... that works great. Now I understand why sometimes logic
such as 'while not true' is used ;)

On Tue, 2005-07-12 at 10:51 -0400, Peter Hansen wrote:
 rbt wrote:
  What is the appropriate way to break out of this while loop if the for
  loop finds a match?
 
 Define a flag first:
 
 keepGoing = True
 
  while 1:
 while keepGoing:
 
      for x in xrange(len(group)):
          try:
 ...
              if match == target:
                  print "Collision!!!"
                  print make_string
 
 Set the flag here, then do the break:
                  keepGoing = False
 
                  break
 
 Tada...
 
 -Peter
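
Another common pattern, sketched here in Python 3, is to move the nested loops into a function and use return, which leaves both loops at once without a flag. hashlib.md5 stands in for the old md5 module; find_match and max_tries are illustrative names, and the attempt cap is an addition so that the sketch always terminates.

```python
import hashlib
import random

def find_match(group, target, max_tries=100000, rng=random):
    """Return the first random sample whose MD5 hex digest equals target, or None."""
    for _ in range(max_tries):
        for size in range(1, len(group) + 1):
            candidate = ''.join(rng.sample(group, size))
            digest = hashlib.md5(candidate.encode('utf-8')).hexdigest()
            if digest == target:
                return candidate      # exits both loops at once
    return None

target = hashlib.md5(b'ba').hexdigest()
print(find_match(list('abc'), target))   # prints the matching string once it is drawn
```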



Re: Is Python Suitable for Large Find Replace Operations?

2005-06-17 Thread rbt
On Fri, 2005-06-17 at 12:33 +1000, John Machin wrote:

 OK then, let's ignore the fact that the data is in a collection of Word
 & Excel files, and let's ignore the scale for the moment. Let's assume 
 there are only 100 very plain text files to process, and only 1000 SSNs 
 in your map, so it doesn't have to be very efficient.
 
 Can you please write a few lines of Python that would define your task 
 -- assume you have a line of text from an input file, show how you would 
   determine that it needed to be changed, and how you would change it.

The script is too long to post in its entirety. In short, I open each
file and read it in binary mode in 1 MB chunks (to keep memory usage
down), collect the chunks in a list, and then apply the following re to
each chunk:

ss = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

like this:

for chunk in whole_file:
    search = ss.findall(chunk)
    if search:
        validate(search)

The validate function makes sure the string found is indeed in the range
of a legitimate SSN. You may read about this range here:

http://www.ssa.gov/history/ssn/geocard.html
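
A rough sketch of what such a validate step might look like, in Python 3. The rules below are the commonly published structural ones (area never 000, 666, or 900-999; group never 00; serial never 0000) and are an assumption here; the actual validate function is not shown in this thread and may check narrower ranges. looks_like_ssn is an illustrative name.

```python
import re

SSN_RE = re.compile(r'\b(\d{3})-(\d{2})-(\d{4})\b')

def looks_like_ssn(text):
    """Return True if text is NNN-NN-NNNN and passes basic structural checks."""
    m = SSN_RE.fullmatch(text)
    if m is None:
        return False
    area, group, serial = (int(part) for part in m.groups())
    if area == 0 or area == 666 or area >= 900:
        return False
    return group != 0 and serial != 0

print(looks_like_ssn('123-45-6789'))   # True
print(looks_like_ssn('000-12-3456'))   # False
print(looks_like_ssn('123-00-6789'))   # False
```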

That is as far as I have gotten. And I hope you can tell that I have
placed some small amount of thought into the matter. I've tested the
find a lot and it is rather accurate in finding SSNs in files. I have
not yet started replacing anything. I've only posted here for advice
before beginning.
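
For the replacement step, one approach is re.sub with a callback that looks each match up in a dict and leaves unmapped numbers untouched. A minimal Python 3 sketch on plain text follows; ssn_map and replace_ssns are hypothetical names, the numbers are made up, and real Word or Excel files would need to go through something that understands their formats rather than a raw substitution.

```python
import re

SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

def replace_ssns(text, ssn_map):
    """Swap each SSN found in text for its new ID; unknown SSNs are left as-is."""
    def swap(match):
        found = match.group(0)
        return ssn_map.get(found, found)
    return SSN_RE.sub(swap, text)

ssn_map = {'123-45-6789': 'ID-000001'}   # made-up example pair
line = 'Client 123-45-6789 called; spouse is 987-65-4321.'
print(replace_ssns(line, ssn_map))
# Client ID-000001 called; spouse is 987-65-4321.
```

When reading in fixed-size chunks, a number can straddle a chunk boundary, so processing line by line or keeping a small overlap between chunks avoids missing it.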

 
  
  
 (4) Under what circumstances will it not be possible to replace *ALL* 
 the SSNs?
  
  
  I do not understand this question.
 
 Can you determine from the data, without reference to the map, that a 
 particular string of characters is an SSN?

See above.

 
 If so, and it is not in the map, why can it not be *added* to the map 
 with a new generated ID?

It is not my responsibility to do this. I do not have that authority
within the organization. Have you never worked for a real-world business
and dealt with office politics and territory? ;)

 And what is the source of the SSNs in this file??? Have they been 
 extracted from the data? How?
  
  
  That is irrelevant.
 
 Quite the contrary. If they had been extracted from the data,

They have not. They are generated by a higher authority and then used by
lower authorities such as me. 

Again though, I think this is irrelevant to the task at hand... I have a
map, I have access to the data and that is all I need to have, no? I do
appreciate your input though. If you would like to have a further
exchange of ideas, perhaps we should do so off list?



Re: Is Python Suitable for Large Find Replace Operations?

2005-06-16 Thread rbt
On Tue, 2005-06-14 at 19:51 +0200, Gilles Lenfant wrote:
 rbt wrote:
  Here's the scenario:
  
  You have many hundred gigabytes of data... possibly even a terabyte or 
  two. Within this data, you have private, sensitive information (US 
  social security numbers) about your company's clients. Your company has 
  generated its own unique ID numbers to replace the social security numbers.
  
  Now, management would like the IT guys to go thru the old data and 
  replace as many SSNs with the new ID numbers as possible. You have a tab 
  delimited txt file that maps the SSNs to the new ID numbers. There are 
  500,000 of these number pairs. What is the most efficient way  to 
  approach this? I have done small-scale find and replace programs before, 
  but the scale of this is larger than what I'm accustomed to.
  
  Any suggestions on how to approach this are much appreciated.
 
 Is this huge amount of data to search/replace stored in an RDBMS or in 
 flat file(s) with markup (XML, CSV, ...)?
 
 --
 Gilles

I apologize that it has taken me so long to respond. I had a hard drive
crash which I am in the process of recovering from. If emails to
[EMAIL PROTECTED] bounced, that is the reason why.


The data is in files, mostly Word documents and Excel spreadsheets. The
SSN map I have is a plain text file that has a format like this:

ssn-xx- new-id-
ssn-xx- new-id-
etc.

There are a bit more than 500K of these pairs.
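
Loading a map of that size into a dict is cheap, and each lookup is then roughly constant time. A sketch in Python 3, assuming one tab-separated pair per line; load_ssn_map and the sample values are illustrative, not taken from the real file.

```python
import csv
import io

def load_ssn_map(lines):
    """Build {ssn: new_id} from an iterable of tab-delimited lines."""
    mapping = {}
    for row in csv.reader(lines, delimiter='\t'):
        if len(row) >= 2:              # skip blank or malformed lines
            mapping[row[0].strip()] = row[1].strip()
    return mapping

# In real use this would be an open file object; StringIO keeps the sketch self-contained.
sample = io.StringIO('123-45-6789\tID-000001\n987-65-4321\tID-000002\n')
ssn_map = load_ssn_map(sample)
print(len(ssn_map))             # 2
print(ssn_map['987-65-4321'])   # ID-000002
```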

Thank you,
rbt


Re: Is Python Suitable for Large Find Replace Operations?

2005-06-16 Thread rbt
On Tue, 2005-06-14 at 11:34 +1000, John Machin wrote:
 rbt wrote:
  Here's the scenario:
  
  You have many hundred gigabytes of data... possible even a terabyte or 
  two. Within this data, you have private, sensitive information (US 
  social security numbers) about your company's clients. Your company has 
  generated its own unique ID numbers to replace the social security numbers.
  
  Now, management would like the IT guys to go thru the old data and 
  replace as many SSNs with the new ID numbers as possible.
 
 This question is grossly OT; it's nothing at all to do with Python. 
 However 
 
 (0) Is this homework?

No, it is not.

 
 (1) What to do with an SSN that's not in the map?

Leave it be.

 
 (2) How will a user of the system tell the difference between new ID 
 numbers and SSNs?

We have documentation. The two sets of numbers (SSNs and new_ids) are
exclusive of each other.

 
 (3) Has the company really been using the SSN as a customer ID instead 
 of an account number, or have they been merely recording the SSN as a 
 data item? Will the new ID numbers be used in communication with the 
 clients? Will they be advised of the new numbers? How will you handle 
 the inevitable cases where the advice doesn't get through?

My task is purely technical.

 
 (4) Under what circumstances will it not be possible to replace *ALL* 
 the SSNs?

I do not understand this question.

 
 (5) For how long can the data be off-line while it's being transformed?

The data is on file servers that are unused on weekends and nights.

 
 
  You have a tab 
  delimited txt file that maps the SSNs to the new ID numbers. There are 
  500,000 of these number pairs.
 
 And what is the source of the SSNs in this file??? Have they been 
 extracted from the data? How?

That is irrelevant.

 
  What is the most efficient way  to 
  approach this? I have done small-scale find and replace programs before, 
  but the scale of this is larger than what I'm accustomed to.
  
  Any suggestions on how to approach this are much appreciated.
 
 A sensible answer will depend on how the data is structured:
 
 1. If it's in a database with tables some of which have a column for 
 SSN, then there's a solution involving SQL.
 
 2. If it's in semi-free-text files where the SSNs are marked somehow:
 
 ---client header---
 surname: Doe first: John initial: Q SSN:123456789 blah blah
 or
 <ssn>123456789</ssn>
 
 then there's another solution which involves finding the markers ...
 
 3. If it's really free text, like
 
 File note: Today John Q. Doe telephoned to advise that his Social 
 Security # is 123456789 & not 987654321 (which is his wife's) and the soc 
 sec numbers of his kids Bob & Carol are 
 
 then you might be in some difficulty ... google("TREC")
 
 
 AND however you do it, you need to be very aware of the possibility 
 (especially with really free text) of changing some string of digits 
 that's NOT an SSN.

That's possible, but I think not probable.




Is Python Suitable for Large Find Replace Operations?

2005-06-13 Thread rbt
Here's the scenario:

You have many hundred gigabytes of data... possibly even a terabyte or 
two. Within this data, you have private, sensitive information (US 
social security numbers) about your company's clients. Your company has 
generated its own unique ID numbers to replace the social security numbers.

Now, management would like the IT guys to go thru the old data and 
replace as many SSNs with the new ID numbers as possible. You have a tab 
delimited txt file that maps the SSNs to the new ID numbers. There are 
500,000 of these number pairs. What is the most efficient way  to 
approach this? I have done small-scale find and replace programs before, 
but the scale of this is larger than what I'm accustomed to.

Any suggestions on how to approach this are much appreciated.


Re: Destructive Windows Script

2005-06-06 Thread rbt
Terry Reedy wrote:
 Dennis Lee Bieber [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]
 
My previous facility didn't even accept mil-spec wipes -- all
disk drives leaving the facility had to go through a demagnetizer,
 
 
 OT but I am curious: does a metallic case act as a metallic shield, so that 
 the case needs to be opened to do this?  (Conversely, is a magnet near a 
 disk drive a danger to it?)

Absolutely. Small HDDs (like those in laptops) are especially vulnerable to 
magnetic force.


Re: Destructive Windows Script

2005-06-06 Thread rbt
Mike Meyer wrote:
 Terry Reedy [EMAIL PROTECTED] writes:
 
On *nix, one could open '/dev/rawdisk' (actual name depends on the *nix 
build) and write a track's worth of garbage for as many tracks as there are. 
I don't know how to programmatically get the track size and number (if there is a 
standard way at all).
 
 
 Modern Unix systems assume drives don't care much about geometry, what
 with sector forwarding and variable track lengths and the like.
 
 Just open the raw disk device (assuming your Unix has such), and start
 writing data to it. Keep going until the write fails at the end of the
 media.
 
 mike

Wouldn't /dev/urandom or /dev/random on Linux systems work better? It's 
the kernel's built in random number generator. It'd fill the drive with 
random bits of data. You could loop it too... in fact, I think many of 
the pre-packaged *wipe* programs are mini Linux distros that do just this.

dd if=/dev/urandom of=/dev/your_hard_drive


How many threads are too many?

2005-06-05 Thread rbt
This may be a stupid question, but here goes:

When designing a threaded application, is there a practical limit on the 
number of threads that one should use or is there a way to set it up so 
that the OS handles the number of threads automatically? I am developing 
on 32-bit x86 Intel systems with python 2.4.1. The OS will be Linux and 
Windows.

I have an older app that used to work fine (254 threads) on early 2.3 
Pythons, but now, I get this error with 2.4.1 and 2.3.5:

Traceback (most recent call last):
  File "net_queue_and_threads.py", line 124, in ?
    thread.start()
  File "/usr/lib/python2.3/threading.py", line 416, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
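
That error generally means the OS refused to create another thread, often because of memory for thread stacks or a per-process limit. One way to stay well under any such limit is a fixed pool of worker threads that pull jobs from a queue, rather than one thread per job. A Python 3 sketch; the pool size of 20 and the check_host stand-in are assumptions, not values from the original app.

```python
from concurrent.futures import ThreadPoolExecutor

def check_host(n):
    """Stand-in for the real per-host work (e.g. a network probe)."""
    return n * 2

hosts = range(1, 255)   # 254 jobs, matching the thread count of the older app

# Only 20 threads exist at any time; the executor queues the remaining jobs.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(check_host, hosts))

print(len(results))   # 254
```

For I/O-bound work such as network probes, a pool a fraction of the size of the job list usually gives most of the speedup, and the size can be tuned by measurement.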



mix up a string

2005-06-05 Thread rbt
What's the best way to take a string such as 'dog' and mix it up? You 
know, like the word jumble in the papers? ODG. I thought something like 
mix = random.shuffle('dog') would do it, but it won't. Any tips?

Thanks,
rbt


Re: mix up a string

2005-06-05 Thread rbt
Reinhold Birkenfeld wrote:
 rbt wrote:
 
What's the best way to take a string such as 'dog' and mix it up? You 
know, like the word jumble in the papers? ODG. I thought something like 
mix = random.shuffle('dog') would do it, but it won't. Any tips?
 
 
 py> def shuffled(s):
 ...     l = list(s)
 ...     random.shuffle(l)
 ...     return ''.join(l)
 
 
 Reinhold

Thanks guys, this works great. I forgot that shuffle needs a sequence... 
duh ;)
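
An equivalent one-liner uses random.sample, which accepts the string directly (a string is a sequence) and returns a new shuffled list, leaving the input alone. A Python 3 sketch:

```python
import random

def shuffled(s):
    """Return the characters of s in random order as a new string."""
    return ''.join(random.sample(s, len(s)))

word = shuffled('dog')
print(word)                            # some ordering of d, o, g
print(sorted(word) == sorted('dog'))   # True
```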


Destructive Windows Script

2005-06-05 Thread rbt
How easy or difficult would it be for a computer forensics expert to 
recover data that is overwritten in this manner? This is a bit off-topic 
for comp.lang.python, but I thought some here would have some insight 
into this.

Warning: **This code is destructive**. Do not run it unless you fully 
understand what you're doing!!!

import os
import random
import time

os.chdir('/temp')
for root, dirs, files in os.walk('.'):
    for f in files:
        try:
            print f

            data = ['0', 'a', '1', 'b', '2', 'c',\
                    '3', 'd', '4', 'e', '5', 'f',\
                    '6', 'g', '7', 'h', '8', 'i',\
                    '9', 'j', '~', '!', '@', '#',\
                    '$', '%', '^', '&', '*', ';']

            # 'w' truncates the file, then 30 shuffled characters are written
            fp = file(os.path.join(root,f), 'w')
            random.shuffle(data)
            garble = ''.join(data)
            fp.write(garble)
            fp.close()

            fs = os.popen("del /f /q /s *")
            fs.read()
            fs.close()

        except Exception, e:
            print e
            time.sleep(1)
            continue


Re: Destructive Windows Script

2005-06-05 Thread rbt
Roose wrote:
 My guess would be: extremely, extremely easy.  Since you're only writing 30 
 bytes for each file, the vast majority of the data will still be present on 
 disk, just temporarily inaccessible because of the del command.  And more 
 than likely it will be possible to recover 100% if they are using a 
 journaling file system like NTFS, which Windows XP does.
 
 If you are honestly trying to destroy your own data, go out and download a 
 free program that will do it right.  If you're trying to write some kind of 
 trojan, well you've got a lot of learning to do.  :)

Thanks for the opinion... I don't do malware. Just interested in 
speeding up file wiping (if possible) for old computers that will be 
auctioned. The boot programs that you allude to (killdisk, autoclave) 
work well, but are slow and tedious. If this can be done *properly* in 
Python, I'd like to have a go at it.
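
A sketch of overwriting a file's full length in place before deleting it, in Python 3. overwrite_file is an illustrative name. It only addresses the point about the 30-byte write; it gives no guarantee against recovery, since the file system may keep old copies in journals, slack space, or relocated blocks.

```python
import os
import tempfile

def overwrite_file(path, chunk_size=1024 * 1024):
    """Overwrite every byte of path with random data, keeping its length."""
    remaining = os.path.getsize(path)
    with open(path, 'r+b') as fp:          # r+b: update in place, no truncation
        while remaining > 0:
            n = min(chunk_size, remaining)
            fp.write(os.urandom(n))
            remaining -= n
        fp.flush()
        os.fsync(fp.fileno())              # ask the OS to push the data to disk

# Demonstration on a throwaway temp file only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'secret' * 1000)
overwrite_file(tmp.name)
print(os.path.getsize(tmp.name))   # 6000
os.remove(tmp.name)
```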


Re: Destructive Windows Script

2005-06-05 Thread rbt
Chris Lambacher wrote:
 The reason they are slow and tedious is that they need to write to
 every byte on the disk.  Depending on the size of the disk, there may
 be a lot of data that needs to be written, and if they are older
 computers, write speed may not be particularly fast.

OK, I accept that, but if you have a HDD that's 8GB total and it has 1GB 
of files, why must every byte be written to? Why not just overwrite the 
used portion?


Re: Two questions

2005-06-02 Thread rbt
Peter Hansen wrote:
 Philosophy not entirely aside, you should note that object code in any 
 language can easily be reverse-engineered in the same way, with the 
 only difference being the degree of ease involved.  If the code is worth 
 enough to someone that they are willing to risk violating your license 
 terms, they *will* be able to recover enough source code (whether it was 
 Python, C, or assembly) to do what they need.  

Don't intend to hijack this thread, but this bit interests me. I know 
several accomplished C/assembly programmers who have told me that 
reverse engineering object code from either of these two languages is 
anything but trivial. Yet, I *hear* and *read* the opposite all of the 
time. Can anyone actually demonstrate a decompile that mirrors the 
original source?

Also, I'd venture to say that the number of people in the world who can 
consistently reverse engineer object code is almost statistically 
insignificant... sure, they are out there, but you'll win the lottery 
before you meet one of them and most of them work for big, bad 
government agencies ;)



Re: Information about Python Codyng Projects Ideas

2005-06-01 Thread rbt
Rob Cowie wrote:
 Ha,
 
 I've just headed over here to ask the same thing!
 
 Any good ideas not listed on the wiki?
 
 I too am taking a Masters in Computer Science, however my first degree
 was not purely CS - mostly microbiology, so I'm not yet what one would
 call an expert
 
 Cheers
 

So long as you've had enough math. CS is really just applied math.


Calculating Inflation, retirement and cost of living adjustments over 30 years

2005-06-01 Thread rbt
Is this mathematically correct?


def inflation():
    start = int(str.strip(raw_input("How much money do you need each "
                                    "month at the start of retirement: ")))
    inflation = float(str.strip(raw_input("What will inflation average "
                                          "over the next 30 years(.03, .04, etc): ")))

    # compound once per year: new amount = old amount * (1 + inflation)
    for x in xrange(30):
        start = start*inflation+start
        print start

inflation()
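
The loop compounds correctly: each pass multiplies the amount by (1 + inflation), so after n years the monthly need is start * (1 + inflation) ** n. A Python 3 sketch comparing the loop with that closed form; the 3000 and 0.03 inputs are made-up examples, and need_after and need_by_loop are illustrative names.

```python
def need_after(start, rate, years):
    """Closed form: amount needed after compounding `rate` for `years` years."""
    return start * (1 + rate) ** years

def need_by_loop(start, rate, years):
    """Same calculation done year by year, as in the posted loop."""
    for _ in range(years):
        start = start * rate + start
    return start

closed = need_after(3000, 0.03, 30)
looped = need_by_loop(3000, 0.03, 30)
print(round(closed, 2))              # about 7281.79
print(abs(closed - looped) < 1e-6)   # True
```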


Re: cpu usage limit

2005-05-27 Thread rbt

mf wrote:
 Hi.
 
 My problem:
 How can I make sure that a Python process does not use more than 30% of
 the CPU at any time. I only want that the process never uses more, but
 I don't want the process being killed when it reaches the limit (like
 it can be done with resource module).
 
 Can you help me?
 
 Thanks in advance.
 
 Best regards,
 Markus
 

Are you looping during a cpu intensive task? If so, make it sleep a bit 
like this:

for x in cpu_task:
    time.sleep(0.5)
    do(x)
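
A fixed sleep does not tie the pause to how long the work took. A variation, sketched in Python 3, times each unit of work and sleeps long enough that the busy share of wall-clock time stays near a target fraction. pause_for and run_throttled are illustrative names, and this only approximates a limit; nothing here is enforced by the OS.

```python
import time

def pause_for(busy_seconds, max_fraction):
    """Sleep time needed so busy / (busy + sleep) equals max_fraction."""
    return busy_seconds * (1.0 - max_fraction) / max_fraction

def run_throttled(tasks, do, max_fraction=0.30):
    """Run do(x) for each task, sleeping after each to cap average CPU use."""
    for x in tasks:
        before = time.perf_counter()
        do(x)
        busy = time.perf_counter() - before
        time.sleep(pause_for(busy, max_fraction))

run_throttled(range(3), lambda x: sum(range(10000)))
print(round(pause_for(0.3, 0.30), 2))   # 0.7
```

For a 30% target, each second of work is followed by about 2.33 seconds of sleep (0.7 / 0.3).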


Re: cpu usage limit

2005-05-27 Thread rbt
[EMAIL PROTECTED] wrote:
 rbt [EMAIL PROTECTED] wrote:
 
mf wrote:

Hi.

My problem:
How can I make sure that a Python process does not use more than 30% of
the CPU at any time. I only want that the process never uses more, but
I don't want the process being killed when it reaches the limit (like
it can be done with resource module).

Can you help me?

Thanks in advance.

Best regards,
Markus


Are you looping during a cpu intensive task? If so, make it sleep a bit 
like this:

for x in cpu_task:
    time.sleep(0.5)
    do(x)
 
 
 or like this (untested!)
 
 finished = False
 while not finished:

Why don't you just write 'while True'??? 'while not false' is like 
saying 'I am not unemployed by Microsoft' instead of saying 'I am 
employed by Microsoft'. It's confusing, complex and unnecessary. Lawyers 
call it circumlocution (talking around the truth).

   before = time.time()
   do(x) # sets finished if all was computed
   after = time.time()
   delta = after-before
   time.sleep(delta*10/3.)
 
 now the trick: do(x) can be a single piece of code, with strategically
 placed yields all over
 
 
 