Bug#642020: python-debian: ArFile (via DebFile) errors when given only a fileobj

2016-06-20 Thread Simon Ruggier
On Tue, 26 Aug 2014 16:43:43 -0700 John Wright  wrote:
> Sorry for the very delayed response.  I'm trying to clean up our
> outstanding bug list...
>
> I'm not convinced that sharing the file is the correct thing to do - it
> introduces thread-unsafe-ness between DebFile/ArFile and its dependent
> objects that isn't obvious up front.
>
> In addition, the existing behavior or re-opening the file by name is bad
> for another reason: it leaks open files.
>
> I think a better solution is probably something like:
>   - If the input file is a "real" file, use mmap to get a view of the
> object for members.
>   - Otherwise, we should probably iterate over the entire input object
> to generate our own file-like view of it, which itself supports an
> interface similar to mmap, and use that for member objects.

Hi, thanks for tending to the bug tracker.

I just want to raise this question: does it make sense for this API to be
thread-safe? Threads are essentially a performance optimization, and
creating a DebFile object and passing it to multiple threads (so that they
can read from it in parallel) is surely not the most performant way to
retrieve information from package binaries. So it has neither the
convenience of single-threading, nor good performance characteristics, and
it's therefore somewhat unreasonable for callers to expect that use case
would be supported.

On the other hand, it's very reasonable to expect an API to support file
objects, and right now, the lack of support for this has motivated two
separate people to fix the problem in forks. I think it would make sense to
incorporate changes from one of the downstream forks to fix this bug, and
not worry about thread safety at all. Any callers that need to access the
information from multiple threads can use a single thread to retrieve the
information, then pass it between threads internally without calling the
API in a concurrent way.


Bug#642020: python-debian: ArFile (via DebFile) errors when given only a fileobj

2014-08-26 Thread John Wright
Sorry for the very delayed response.  I'm trying to clean up our
outstanding bug list...

I'm not convinced that sharing the file is the correct thing to do - it
introduces thread-unsafe-ness between DebFile/ArFile and its dependent
objects that isn't obvious up front.

In addition, the existing behavior or re-opening the file by name is bad
for another reason: it leaks open files.

I think a better solution is probably something like:
  - If the input file is a real file, use mmap to get a view of the
object for members.
  - Otherwise, we should probably iterate over the entire input object
to generate our own file-like view of it, which itself supports an
interface similar to mmap, and use that for member objects.

-- 
John Wright j...@debian.org


signature.asc
Description: Digital signature


Bug#642020: python-debian: ArFile (via DebFile) errors when given only a fileobj

2011-09-18 Thread Mel Collins
Package: python-debian
Version: 0.1.14ubuntu2
Severity: normal
Tags: patch

Currently (tested using trunk), ArFile assumes that it will always receive a 
filename, and that filename relates to the local filesystem.
This means that, when instantiating a DebFile using only a fileobj parameter, 
you get something like:

 from debian import debfile
 package = debfile.DebFile(fileobj=open('some-package.deb', 'r'))
Traceback (most recent call last):
  File stdin, line 1, in module
  File 
/usr/local/lib/python2.6/dist-packages/python_debian-_CHANGELOG_VERSION_-py2.6.egg/debian/debfile.py,
 line 236, in __init__
self.__version = f.read().strip()
  File 
/usr/local/lib/python2.6/dist-packages/python_debian-_CHANGELOG_VERSION_-py2.6.egg/debian/arfile.py,
 line 211, in read
self.__fp = open(self.__fname, r)
TypeError: coercing to Unicode: need string or buffer, NoneType found

The solution I've come up with will involve DebFile sharing the fileobj among 
several ArFile instances (is there a way to clone file-like objects?), which 
means ArFiles keep an internal seek-location, and don't actually close() the 
file pointer (leaky?).

A patch should be attached, or there is my fork on GitHub:
https://github.com/Raumkraut/python-debian



-- System Information:
Debian Release: squeeze/sid
  APT prefers lucid-updates
  APT policy: (500, 'lucid-updates'), (500, 'lucid-security'), (500, 'lucid')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-33-generic-pae (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-debian depends on:
ii  python 2.6.5-0ubuntu1An interactive high-level object-o
ii  python-apt 0.7.94.2ubuntu6.4 Python interface to libapt-pkg
ii  python-support 1.0.4ubuntu1  automated rebuilding support for P

python-debian recommends no packages.

Versions of packages python-debian suggests:
ii  gpgv 1.4.10-2ubuntu1 GNU privacy guard - signature veri
diff --git a/lib/debian/arfile.py b/lib/debian/arfile.py
index 9ad757e..d881e5f 100644
--- a/lib/debian/arfile.py
+++ b/lib/debian/arfile.py
@@ -159,6 +159,7 @@ class ArMember(object):
 self.__fp = None# file pointer 
 self.__offset = None# start-of-data offset
 self.__end = None   # end-of-data offset
+self.__cur = None   # current seek-location (for shared fp)
 
 def from_file(fp, fname):
 fp is an open File object positioned on a valid file header inside
@@ -198,51 +199,68 @@ class ArMember(object):
 f.__fname = fname
 f.__offset = fp.tell() # start-of-data
 f.__end = f.__offset + f.__size
-
+
+if fname is None:
+f.__fp = fp
+f.__cur = f.__offset
+
 return f
 
 from_file = staticmethod(from_file)
 
-# file interface
-
-# XXX this is not a sequence like file objects
-def read(self, size=0):
+def _init_fp(self):
+Readies the file pointer. 
 if self.__fp is None:
+# We have just a filename, so open it (one time)
 self.__fp = open(self.__fname, r)
 self.__fp.seek(self.__offset)
+
+elif self.__fname is None:
+# Nameless (possibly shared) file-like object
+self.__fp.seek(self.__cur)
+
+def _update_cur(self):
+Update our current position. 
+self.__cur = self.__fp.tell()
+
+# file interface
 
-cur = self.__fp.tell()
-
-if size  0 and size = self.__end - cur: # there's room
-return self.__fp.read(size)
+# XXX this is not a sequence like file objects
+def read(self, size=0):
+self._init_fp()
 
-if cur = self.__end or cur  self.__offset:
-return ''
+if size  0 and size = self.__end - self.__cur: # there's room
+ret = self.__fp.read(size)
 
-return self.__fp.read(self.__end - cur)
+elif self.__cur = self.__end or self.__cur  self.__offset:
+ret = ''
+
+else:
+ret = self.__fp.read(self.__end - self.__cur)
+
+self._update_cur()
+return ret
 
 def readline(self, size=None):
-if self.__fp is None:
-self.__fp = open(self.__fname, r)
-self.__fp.seek(self.__offset)
-
+self._init_fp()
+
 if size is not None: 
 buf = self.__fp.readline(size)
-if self.__fp.tell()  self.__end:
+self._update_cur()
+if self.__cur  self.__end:
 return ''
 
 return buf
 
 buf = self.__fp.readline()
-if self.__fp.tell()  self.__end:
+self._update_cur()
+if self.__cur  self.__end:
 return ''
 else:
 return buf
 
 def readlines(self, sizehint=0):
-if self.__fp is None:
-