All, 

The current mod_python 3.1 implementation of mod_python.util.FieldStorage.read_to_boundary reads as follows:

    def read_to_boundary(self, req, boundary, file):
        delim = ""
        line = req.readline()
        sline = line.strip()
        last_bound = boundary + "--"
        while line and sline != boundary and sline != last_bound:
            odelim = delim
            if line[-2:] == "\r\n":
                delim = "\r\n"
                line = line[:-2]
            elif line[-1:] == "\n":
                delim = "\n"
                line = line[:-1]
            file.write(odelim + line)
            line = req.readline()
            sline = line.strip()

As we have discussed previously: 

This triggered a couple of changes in mod_python 3.2 Beta, which now reads as follows:
# Fixes memory error when upload large files such as 700+MB ISOs.
readBlockSize = 65368

...
    def read_to_boundary(self, req, boundary, file):
...
        delim = ''
        lastCharCarried = False
        last_bound = boundary + '--'
        roughBoundaryLength = len(last_bound) + 128
        line = req.readline(readBlockSize)
        lineLength = len(line)
        if lineLength < roughBoundaryLength:
            sline = line.strip()
        else:
            sline = ''
        while lineLength > 0 and sline != boundary and sline != last_bound:
            if not lastCharCarried:
                file.write(delim)
                delim = ''
            else:
                lastCharCarried = False
            cutLength = 0
            if lineLength == readBlockSize:
                if line[-1:] == '\r':
                    delim = '\r'
                    cutLength = -1
                    lastCharCarried = True
            if line[-2:] == '\r\n':
                delim += '\r\n'
                cutLength = -2
            elif line[-1:] == '\n':
                delim += '\n'
                cutLength = -1
            if cutLength != 0:
                file.write(line[:cutLength])
            else:
                file.write(line)
            line = req.readline(readBlockSize)
            lineLength = len(line)
            if lineLength < roughBoundaryLength:
                sline = line.strip()
            else:
                sline = ''

This function has a mysterious bug in it. For some files, which I can share if needed (one of them being the PDF of Apple's Pages User Manual in Italian), the uploaded file on the server ends up with the same length but a different sha512 (the only digest that I'm using).  The problem is a '\r' in the middle of a chunk of data that is much larger than readBlockSize.
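
A rough way to reproduce this kind of mismatch without a server is sketched below. It is a byte-string port of the 3.2 Beta loop above, with io.BytesIO standing in for the request object; the payload and the '--XYZ' boundary are made up for illustration. It places a '\r' as the last byte of a full read block, which may or may not be the exact case that hits the real files.

```python
import io

READ_BLOCK_SIZE = 65368  # same value as readBlockSize in the 3.2 Beta listing


def read_to_boundary_beta(req, boundary, out):
    """Bytes port of the 3.2 Beta loop quoted above (logic unchanged)."""
    delim = b''
    last_char_carried = False
    last_bound = boundary + b'--'
    rough_boundary_length = len(last_bound) + 128
    line = req.readline(READ_BLOCK_SIZE)
    sline = line.strip() if len(line) < rough_boundary_length else b''
    while len(line) > 0 and sline != boundary and sline != last_bound:
        if not last_char_carried:
            out.write(delim)
            delim = b''
        else:
            # The carried '\r' stays in delim and is NOT written here.
            last_char_carried = False
        cut_length = 0
        if len(line) == READ_BLOCK_SIZE and line[-1:] == b'\r':
            delim = b'\r'
            cut_length = -1
            last_char_carried = True
        if line[-2:] == b'\r\n':
            delim += b'\r\n'
            cut_length = -2
        elif line[-1:] == b'\n':
            delim += b'\n'
            cut_length = -1
        out.write(line[:cut_length] if cut_length else line)
        line = req.readline(READ_BLOCK_SIZE)
        sline = line.strip() if len(line) < rough_boundary_length else b''


# Hypothetical payload: the '\r' is the last byte of the first full block.
payload = b'A' * (READ_BLOCK_SIZE - 1) + b'\rabc\r\ndef'
body = payload + b'\r\n--XYZ--\r\n'

out = io.BytesIO()
read_to_boundary_beta(io.BytesIO(body), b'--XYZ', out)
stored = out.getvalue()
print(len(stored) == len(payload), stored == payload)  # True False
```

In this run the carried '\r' is written after "abc" instead of before it, so the stored bytes have the right length but the wrong content.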

Anyhow, I wrote a new function, which I believe is much simpler, and tested it with thousands and thousands of different files; so far it seems to work fine.  It reads as follows:

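def read_to_boundary(self, req, boundary, file):
    ''' Read from the request object line by line, with a maximum size,
        until a line starts with boundary.
    '''
    previous_delimiter = ''
    while True:
        line = req.readline(1 << 16)
        # Stop at the boundary, or on an empty read (end of input).
        if not line or line.startswith(boundary):
            break

        # A '\r\n' pair split across two reads counts as one delimiter.
        if previous_delimiter == '\r' and line == '\n':
            previous_delimiter = '\r\n'
            continue

        # Each line ending is held back until the next read, so the one
        # right before the boundary is never written to the file.
        if line.endswith('\r\n'):
            file.write(previous_delimiter + line[:-2])
            previous_delimiter = '\r\n'

        elif line.endswith('\r') or line.endswith('\n'):
            file.write(previous_delimiter + line[:-1])
            previous_delimiter = line[-1:]

        else:
            file.write(previous_delimiter + line)
            previous_delimiter = ''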
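
The function above can be exercised without Apache. The sketch below is a byte-string port, not the mod_python code itself: io.BytesIO stands in for the request object, and the block_size parameter, the roundtrip helper and the '--XYZ' boundary are assumptions added only to keep the test data small. It stops on an empty read and treats a '\r\n' pair split across two reads as a single delimiter.

```python
import io


def read_to_boundary(req, boundary, out, block_size=1 << 16):
    """Copy req to out until a line starts with boundary (bytes port)."""
    previous_delimiter = b''
    while True:
        line = req.readline(block_size)
        # Stop at the boundary, or on an empty read (truncated body).
        if not line or line.startswith(boundary):
            break
        # A '\r\n' pair that was split across two reads.
        if previous_delimiter == b'\r' and line == b'\n':
            previous_delimiter = b'\r\n'
            continue
        if line.endswith(b'\r\n'):
            out.write(previous_delimiter + line[:-2])
            previous_delimiter = b'\r\n'
        elif line.endswith(b'\r') or line.endswith(b'\n'):
            out.write(previous_delimiter + line[:-1])
            previous_delimiter = line[-1:]
        else:
            out.write(previous_delimiter + line)
            previous_delimiter = b''


def roundtrip(payload, block_size):
    """Hypothetical helper: wrap payload in a body, then extract it again."""
    body = payload + b'\r\n--XYZ--\r\n'
    out = io.BytesIO()
    read_to_boundary(io.BytesIO(body), b'--XYZ', out, block_size)
    return out.getvalue()


# With block_size=8 the first read ends exactly on the '\r' of the final '\r\n'.
print(roundtrip(b'1234567', 8))  # b'1234567'
```

Note that block_size has to be larger than the boundary itself, otherwise the startswith check can never match a boundary line that was cut short by the read.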

Let me know if you have any comments on it, and if you test it and it fails, please let me know as well. I don't have a Subversion account, nor do I know how to use it, hence this email.

/amn
