how to remove 'FFFD' character

2009-01-09 Thread webcomm
Does anyone know a way to remove the 'FFFD' character with python? You can see the browser output I'm dealing with here: http://webcomm.webfactional.com/htdocs/fffd.JPG I deleted a big chunk out of the middle of that JPG to protect sensitive data. I don't know what the character encoding of this
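A minimal sketch of the literal request, in Python 3 (the thread itself used Python 2.5). 'FFFD' is U+FFFD, the Unicode REPLACEMENT CHARACTER, which typically appears when bytes are decoded with the wrong codec. The function name and the sample bytes below are hypothetical. Removing the character only hides the symptom, so it is usually worth checking the real encoding of the data first.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def strip_replacement_chars(text: str) -> str:
    """Return text with every U+FFFD character removed."""
    return text.replace(REPLACEMENT_CHAR, "")


# Hypothetical sample: Latin-1 bytes wrongly decoded as UTF-8.
# errors="replace" substitutes U+FFFD for the byte that is not valid UTF-8.
raw = b"caf\xe9 menu"
decoded = raw.decode("utf-8", errors="replace")
cleaned = strip_replacement_chars(decoded)
```

Here `decoded` contains one U+FFFD where the `\xe9` byte was, and `cleaned` has it stripped out. The original letter is lost either way, which is why decoding with the correct codec is preferable when it can be identified.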

Re: distinction between unzipping bytes and unzipping a file

2009-01-10 Thread webcomm
On Jan 9, 6:07 pm, John Machin wrote:
> Yup, it looks like it's encoded in utf_16_le, i.e. no BOM as
> God^H^H^HGates intended:
>
> >>> buff = open('data', 'rb').read()
> >>> buff[:100]
>
> '<\x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00a\x00t\x00i\x00o\x00n\x00>
> \x00<\x00B\x0
> 0a\x00l\x00a\x00n\x00c
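A short Python 3 sketch of what the quoted diagnosis implies: text whose bytes alternate with `\x00` is likely UTF-16 little-endian, and decoding it with that codec recovers readable text. The function name and the sample string are hypothetical; only the codec name comes from the quoted message.

```python
def decode_utf16_le(buff: bytes) -> str:
    """Decode UTF-16-LE bytes, dropping a leading BOM if one is present."""
    text = buff.decode("utf-16-le")
    # The utf-16-le codec does not consume a BOM, so strip it by hand.
    return text.lstrip("\ufeff")


# Hypothetical sample shaped like the start of the quoted buffer.
sample = "<Registration>".encode("utf-16-le")
first_bytes = sample[:4]
text = decode_utf16_le(sample)
```

`first_bytes` shows the same `<\x00R\x00` pattern as the quoted `buff[:100]` output, and `text` is the plain string again.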

Re: BadZipfile "file is not a zip file"

2009-01-10 Thread webcomm
On Jan 9, 7:33 pm, John Machin wrote: > It is not impossible for a file with dummy data to have been > handcrafted or otherwise produced by a process different to that used > for a real-data file. I knew it was produced by the same process, or I wouldn't have shared it. : ) But you couldn't have

Re: BadZipfile "file is not a zip file"

2009-01-12 Thread webcomm
If anyone's interested, here are my django views...

from django.shortcuts import render_to_response
from django.http import HttpResponse
from xml.etree.ElementTree import ElementTree
import urllib, base64, subprocess

def get_data(request):
    service_url = 'http://www.something.com/webservices/

Re: BadZipfile "file is not a zip file"

2009-01-12 Thread webcomm
On Jan 12, 11:53 am, "Chris Mellon" wrote: > On Sat, Jan 10, 2009 at 1:32 PM,webcomm wrote: > > On Jan 9, 7:33 pm, John Machin wrote: > >> It is not impossible for a file with dummy data to have been > >> handcrafted or otherwise produced by a process differ

practical limits of urlopen()

2009-01-24 Thread webcomm
Hi, Am I going to have problems if I use urlopen() in a loop to get data from 3000+ URLs? There will be about 2KB of data on average at each URL. I will probably run the script about twice per day. Data from each URL will be saved to my database. I'm asking because I've never opened that many
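One way to structure such a loop, sketched in Python 3 with `urllib.request.urlopen` (the thread predates it and would have used `urllib`/`urllib2`). The function name, the `pause` parameter, and the injectable `opener` argument are assumptions made for illustration. The points it tries to show are a per-request timeout, closing each response, and recording failures instead of aborting the whole run.

```python
import time
import urllib.request


def fetch_all(urls, timeout=10.0, pause=0.0, opener=urllib.request.urlopen):
    """Fetch each URL in turn; return (results, failures) dictionaries."""
    results = {}
    failures = {}
    for url in urls:
        try:
            # The with-block closes the connection even if read() fails.
            with opener(url, timeout=timeout) as response:
                results[url] = response.read()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            failures[url] = exc
        if pause:
            time.sleep(pause)  # optional politeness delay between requests
    return results, failures
```

With about 2KB per URL, the total is only a few megabytes, so a sequential loop like this is mostly limited by network latency rather than by memory.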

BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
The error...

>>> file = zipfile.ZipFile('data.zip', "r")
Traceback (most recent call last):
  File "", line 1, in
    file = zipfile.ZipFile('data.zip', "r")
  File "C:\Python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\Python25\lib\zipfile.py", line 366, in _GetCo

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:02 pm, MRAB wrote: > You're just creating a file called "data.zip". That doesn't make it a > zip file. A zip file has a specific format. If the file doesn't have > that format then the zipfile module will complain. Hmm. When I open it in Windows or with 7-Zip, it contains a text file
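To illustrate the point about format versus file name, here is a small Python 3 sketch using `zipfile.is_zipfile`, which accepts a file-like object. The helper name and the sample data are hypothetical. Note that this only reports what the `zipfile` module thinks: other tools such as 7-Zip may be more tolerant of a slightly malformed archive.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the zipfile module recognises data as a zip archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a real (tiny) archive in memory for comparison.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
real_zip = buffer.getvalue()

# Arbitrary bytes do not become a zip file by being named data.zip.
not_a_zip = b"just some text saved under the name data.zip"
```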

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:39 pm, "James Mills" wrote: > Send us a sample of this file in question... It contains data that I can't share publicly. I could ask the providers of the service if they have a dummy file I could use that doesn't contain any real data, but I don't know how responsive they'll be. It'

Re: BadZipfile "file is not a zip file"

2009-01-08 Thread webcomm
On Jan 8, 8:54 pm, MRAB wrote: > Have you tried gzip instead? There's no option to download the data in a gzipped format. The files are .zip archives.

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 3:16 am, Steven D'Aprano wrote:
> The full signature of ZipFile is:
>
> ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
>
> Try passing compression=zipfile.ZIP_DEFLATED and/or allowZip64=False and
> see if that makes any difference.

Those arguments didn't make a differe

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 3:46 am, Carl Banks wrote: > The zipfile format is kind of brain dead, you can't tell where the end > of the file is supposed to be by looking at the header.  If the end of > file hasn't yet been reached there could be more data.  To make > matters worse, somehow zip files came to have t

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:42 am, John Machin wrote: > And here's a little gadget that might help the diagnostic effort; it > shows the archive size and the position of all the "magic" PKnn > markers. In a "normal" uncommented archive, EndArchive_pos + 22 == > archive_size. I ran the diagnostic gadget... archi
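John Machin's gadget itself is not reproduced in this snippet, so the following is only a rough Python 3 sketch of the same idea, not his code. It reports the archive size and the offsets of the standard "PK" signatures, and measures how many bytes follow the 22-byte end-of-archive record. All function and dictionary names are hypothetical.

```python
# Standard zip record signatures ("magic" PKnn markers).
SIGNATURES = {
    "LocalFileHeader": b"PK\x03\x04",
    "CentralDirEntry": b"PK\x01\x02",
    "EndArchive": b"PK\x05\x06",
}
END_RECORD_SIZE = 22  # size of the end-of-archive record with no comment


def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in data."""
    positions = []
    start = data.find(needle)
    while start != -1:
        positions.append(start)
        start = data.find(needle, start + 1)
    return positions


def zip_marker_report(data: bytes) -> dict:
    """Map each marker name to its offsets, plus the total archive size."""
    report = {name: find_all(data, sig) for name, sig in SIGNATURES.items()}
    report["archive_size"] = len(data)
    return report


def trailing_bytes(data: bytes):
    """Bytes after the last end-of-archive record, or None if it is absent."""
    end_pos = data.rfind(SIGNATURES["EndArchive"])
    if end_pos == -1:
        return None
    return len(data) - (end_pos + END_RECORD_SIZE)
```

For a normal uncommented archive `trailing_bytes` should be 0, which is the `EndArchive_pos + 22 == archive_size` condition from the quoted message. A positive value suggests extra data (or a comment) after the end record.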

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 10:14 am, "Chris Mellon" wrote: > This is a ticket about another issue or 2 with invalid zipfiles that > the zipfile module won't load, but that other tools will compensate > for: > > http://bugs.python.org/issue1757072 Hmm. That's interesting. Are there other tools I can use in a pyt

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 10:14 am, "Chris Mellon" wrote: > This is a ticket about another issue or 2 with invalid zipfiles that > the zipfile module won't load, but that other tools will compensate > for: > > http://bugs.python.org/issue1757072 Looks like I just need to do this to unzip with unix... from os im
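A hedged Python 3 sketch of shelling out to the Unix `unzip` program with `subprocess` instead of `os.popen`. The function names and the destination-directory argument are assumptions; `-o` (overwrite without prompting) and `-d` (extraction directory) are standard Info-ZIP `unzip` options. The exit status is returned rather than checked, because `unzip` can report a warning for an archive it still manages to extract.

```python
import shutil
import subprocess


def build_unzip_command(archive_path: str, dest_dir: str) -> list:
    """Build the argument list; passing a list avoids shell quoting issues."""
    return ["unzip", "-o", archive_path, "-d", dest_dir]


def unzip_with_cli(archive_path: str, dest_dir: str):
    """Run unzip and return the CompletedProcess for inspection."""
    if shutil.which("unzip") is None:
        raise FileNotFoundError("the 'unzip' program is not on PATH")
    return subprocess.run(
        build_unzip_command(archive_path, dest_dir),
        capture_output=True,
        check=False,  # let the caller decide how to treat warnings
    )
```

A caller would inspect `returncode` and `stderr` on the result to see whether `unzip` complained about the archive.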

distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
Hi,
In python, is there a distinction between unzipping bytes and unzipping a binary file to which those bytes have been written?

The following code is, I think, an example of writing bytes to a file and then unzipping...

decoded = base64.b64decode(datum)
#datum is a base64 encoded string of data
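For comparison, here is a Python 3 sketch of the in-memory route: `zipfile.ZipFile` accepts any seekable file-like object, so the decoded bytes can be wrapped in `io.BytesIO` rather than written to `data.zip` first. The function name is hypothetical, and `datum` stands in for the base64 string from the web service as in the question.

```python
import base64
import io
import zipfile


def unzip_base64(datum: str) -> dict:
    """Decode a base64 string holding a zip archive; return {name: bytes}."""
    decoded = base64.b64decode(datum)
    # BytesIO gives ZipFile the seekable file interface it needs.
    with zipfile.ZipFile(io.BytesIO(decoded)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

As far as the `zipfile` module is concerned the two routes should behave the same: if the bytes are rejected as a file on disk, they would be expected to be rejected from memory as well.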

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 2:49 pm, webcomm wrote:
> decoded = base64.b64decode(datum)
> #datum is a base64 encoded string of data downloaded from a web
> service
> f = open('data.zip', 'wb')
> f.write(decoded)
> f.close()
> x = zipfile.ZipFile('data.zip', '

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 1:32 pm, Scott David Daniels wrote: > I'd certainly try to figure out if the archive was mis-handled > somewhere along the way.   Quite possible that I'm mishandling something, or the service provider is mishandling something. Probably the former. Please see this more recent thread...

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 3:15 pm, Steve Holden wrote: > webcomm wrote: > > Hi, > > In python, is there a distinction between unzipping bytes and > > unzipping a binary file to which those bytes have been written? > > > The following code is, I think, an example of writing bytes to

Re: distinction between unzipping bytes and unzipping a file

2009-01-09 Thread webcomm
On Jan 9, 4:12 pm, "Chris Mellon" wrote: > It would really help if you could post a sample file somewhere. Here's a sample with some dummy data from the web service: http://webcomm.webfactional.com/htdocs/data.zip That's the zip created in this line of my code... f = open('data.zip', 'wb') If I

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 8, 8:39 pm, "James Mills" wrote: > Send us a sample of this file in question... Here's a sample with some dummy data from the web service: http://webcomm.webfactional.com/htdocs/data.zip That's the zip created in this line of my code... f = open('data.zip', 'wb') If I open the file it co

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:00 pm, webcomm wrote:
> If I unzip it like this...
> popen("unzip data.zip")
> ...then the bad characters are 'FFFD' characters as described and
> pictured
> here...http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>

Re: BadZipfile "file is not a zip file"

2009-01-09 Thread webcomm
On Jan 9, 5:21 pm, John Machin wrote: > Thanks. Would you mind spending a few minutes more on this so that we > can see if it's a problem that can be fixed easily, like the one that > Chris Mellon reported? > Don't mind at all. I'm now working with a zip file with some dummy data I downloaded fr