Hi Juliusz,

If you are thinking of chunk-encoded objects, then yes, the cache file of
such an object does not know the Content-Length at the beginning. However,
once the object is fully downloaded, the Content-Length is known, and it
will be written into the disk cache. As you said in the other thread, polipo
doesn't keep the chunked encoding in the cache, so the Content-Length is the
only thing that tells polipo the real size of the object on disk. The cache
file size is not a reliable indication of the object size: the file can be
trimmed, and a power failure or program crash can also leave a partially
written cache file.

If a freshly restarted polipo encounters a cache file without a
Content-Length, and the Cache-Control header indicates that the object is
public and not yet expired, there is no way polipo can be sure whether the
cache file is full or partial. Cache trimming may have cut the file short,
or a crash, power failure, etc. may have left the file partial. The proper
way, I believe, is to invalidate the disk cache entry.
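
To make the argument concrete, here is a minimal sketch of the check, not
polipo's actual code. It assumes a simplified cache file layout (header
lines, a blank line, then the body), and the function name
classify_cache_entry is made up for illustration.

```python
def classify_cache_entry(raw: bytes) -> str:
    """Classify a cache file as 'complete', 'partial', or 'unknown'.

    Assumed layout (a simplification, not polipo's exact on-disk format):
    a status line, header lines, a blank line, then the body.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return "unknown"  # the header block itself is cut short

    declared = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            try:
                declared = int(value.strip())
            except ValueError:
                return "unknown"

    if declared is None:
        # Nothing to compare the body size against: the file may be
        # full, trimmed, or left over from a crash.
        return "unknown"
    return "complete" if len(body) >= declared else "partial"
```

Only the 'unknown' case is the one I propose to invalidate on a fresh read;
with a Content-Length present, a short file can at least be recognized as
partial.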

Here is a little test I did.
1. Access a chunked public object through polipo.
2. Send the USR2 signal to polipo.
3. Find the disk cache file. The file will have the Content-Length.
4. Use vi to change the Content-Length header into some other header, but
keep the size of that header line. Trim some data off the end of the cache
file. This emulates a partially written cache file.
5. Access the object again. You will get the partial file.
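
Step 4 can also be scripted instead of done by hand in vi. The sketch below
works on the file contents as bytes and assumes the same simplified layout
of headers, blank line, body. The replacement header name X-Hidden-Field is
hypothetical; it is chosen only because it has the same length as
Content-Length, so the header block keeps its size.

```python
def emulate_partial_cache(raw: bytes, trim: int) -> bytes:
    """Hide Content-Length and cut `trim` bytes off the end of the body."""
    old = b"Content-Length:"
    new = b"X-Hidden-Field:"  # hypothetical name, same length as `old`
    assert len(old) == len(new)

    head, sep, body = raw.partition(b"\r\n\r\n")
    head = head.replace(old, new, 1)  # header block size is unchanged
    if trim > 0:
        body = body[:-trim]  # guard needed: body[:-0] would be empty
    return head + sep + body
```

With a 10-byte body and trim=4, the result has a 6-byte body and no
Content-Length, which is the state a crash or cache trimming could leave
behind.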

At the end of this test, the client has no idea that the file it got is
partial. Polipo sends out a Content-Length based on the file size minus the
headers etc.

My fix does not invalidate dynamic content (chunked encoding). It merely
makes a fresh read of the disk cache invalidate a cache file without a
Content-Length. This can only happen when there is no in-memory object for
the corresponding disk cache file. If there is an in-memory cache object,
the Content-Length is already accessible from the in-memory object, and the
disk read won't happen. If the Content-Length of the object is not yet known
in memory, then the object is in progress (still downloading).
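
The order of the checks can be sketched as follows. This is an illustration
of the logic only, not the patch itself: the two dicts are hypothetical
stand-ins for polipo's in-memory objects and for the Content-Length parsed
from each disk cache file (None when the header is absent).

```python
from typing import Dict, Optional


def lookup_length(url: str,
                  memory: Dict[str, Optional[int]],
                  disk: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the trusted object length, or None if it is not known."""
    if url in memory:
        # In-memory object wins; None here means still downloading.
        # The disk cache is not read at all in this case.
        return memory[url]
    if url not in disk:
        return None
    length = disk[url]
    if length is None:
        # Fresh disk read and no Content-Length: invalidate the entry.
        del disk[url]
        return None
    return length
```

An object that is still downloading always has an in-memory entry, so its
disk file is never touched by the invalidation branch.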

Have a nice summer.
Ming

On Wed, Aug 19, 2009 at 5:52 PM, Juliusz Chroboczek <
[email protected]> wrote:

> > I am occasionally saw cache file without the Content-Length header.
> > Wondering if this header should always be there?
>
> No.  The Content-Length header specifies the length of the instance.  If
> it is unknown (e.g. because it's a dynamically-generated file that is
> incomplete), then this header will be missing.
>
> > Should I change the polipo to ignore or destroy a cache file that
> > don't have the Content-Length header?
>
> No.  Partial caching of dynamically generated content is a useful feature.
>
>                                        Juliusz
>