On Sun, Aug 30, 2009 at 4:14 PM, Juliusz Chroboczek <[email protected]> wrote:

> > If you are thinking of chunk-encoded objects: yes, the cache file of such
> > an object does not know the Content-Length at the beginning. However, once
> > the object is fully downloaded.
>
> At that point, Polipo will set object->length to the now-known content
> length, and call objectMetadataChanged, which calls dirtyDiskEntry, which
> will cause a Content-Length entry to be written out.
>
You are right.

>
> > If a freshly restarted Polipo encounters a cache file without
> > Content-Length, and the Cache-Control header indicates that the object is
> > public and not yet expired, there is no way Polipo can be sure whether the
> > cache file is full or partial.
>
> Polipo should consider the file as partial in that case.
>
Treating it as partial might be better than treating it as invalid, but how
can Polipo make a range request from it when it does not know the real size?
If you try my steps below, you will see that Polipo takes the file size as
the Content-Length.
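For reference, a general HTTP/1.1 point (not something established in this thread): a byte-range request does not by itself need the total size. An open-ended range such as "Range: bytes=N-" asks for everything from offset N onward, and a 206 reply carries "Content-Range: bytes first-last/complete", where "complete" is the full length or "*" if the server does not know it. The Python sketch below only builds and parses those two header values; range_header_for and total_length_from_content_range are hypothetical names. In practice such a request should also carry a validator (for example If-Range with the cached ETag) so the remainder is known to belong to the same version of the object.

```python
import re
from typing import Optional

# Content-Range value of a 206 reply, e.g. "bytes 4096-8191/8192";
# the complete length may be "*" when the server does not know it.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def range_header_for(cached_size: int) -> str:
    """Open-ended Range value asking for everything after the cached bytes."""
    if cached_size < 0:
        raise ValueError("cached_size must be non-negative")
    return f"bytes={cached_size}-"


def total_length_from_content_range(value: str) -> Optional[int]:
    """Return the complete length from a Content-Range value, or None if '*'."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognised Content-Range: {value!r}")
    total = m.group(3)
    return None if total == "*" else int(total)


print(range_header_for(4096))                                   # bytes=4096-
print(total_length_from_content_range("bytes 4096-8191/8192"))  # 8192
```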

>
> > 4. Use vi to change the Content-Length header into some other header, but
> > keep the length of that header line the same. Trim some data off the end
> > of the cache file. This emulates a partially written cache file.
> > 5. Access the file again. You will get the partial file.
>
> Looks like a bug.  Could you provide a trace of the HTTP exchange?
>

Here are the steps to reproduce the problem:
1. Find a publicly cacheable page, or create a public page on an Apache
server you control; I use an Apache server.
2. Use curl or IE/Firefox to download the page through Polipo.
3. Wait until the page is fully downloaded, then send SIGUSR2 to Polipo so
that it writes the object out to the disk cache.
4. Find the cache file and edit it. Change "Content-Length: ..." to
"Kontent-Length: ...", so Polipo will not know the content length when it
reads the cache file.
5. Trim the cache file a bit: remove a few lines or characters from the end.
6. Access the page again. As long as the cache entry has not yet expired, you
will get the partial file, and in that partial response Polipo sends the file
size as the Content-Length.
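Steps 4 and 5 can also be scripted instead of done by hand in vi. This Python sketch assumes, as the manual steps imply, that the cache file starts with an HTTP-style header block ended by a blank line; corrupt_cache_file is a hypothetical helper, not part of Polipo. It swaps "Content-Length:" for the same-length "Kontent-Length:" so the header block keeps its size, then cuts trim_bytes off the end of the file.

```python
import re

# "Kontent-Length:" is the same length as "Content-Length:", so the header
# block keeps its size and the body stays at the same offset.
_CL_LINE = re.compile(rb"^Content-Length:", re.IGNORECASE | re.MULTILINE)
# The first empty line (CRLF or bare LF) ends the header block.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def corrupt_cache_file(path: str, trim_bytes: int) -> None:
    """Emulate steps 4 and 5: hide Content-Length, then cut bytes off the end."""
    with open(path, "r+b") as f:
        data = f.read()
        m = _BLANK_LINE.search(data)
        if m is None:
            raise ValueError("no end of header block found")
        headers = data[:m.start()]
        new_headers, n = _CL_LINE.subn(b"Kontent-Length:", headers, count=1)
        if n == 0:
            raise ValueError("no Content-Length header found")
        # Step 4: rewrite the header block in place (same length).
        f.seek(0)
        f.write(new_headers)
        # Step 5: drop trim_bytes from the end, but never cut into the headers.
        f.truncate(max(m.end(), len(data) - trim_bytes))
```

Usage: call corrupt_cache_file(path, 200) with the path of the cache file found in step 4, then continue with step 6.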

A partial cache file without Content-Length is not easy to reproduce
naturally, so I cannot give a trace of one. But once it is cached, it can be
very problematic until it expires. Here is how I found it. One site
consistently reported a JavaScript error in both IE and Firefox, and I traced
it to Polipo's cache file. Since the file is JavaScript, it is obvious that it
does not end correctly: the last few blocks of code are missing their closing
"}". The cache file also has no Content-Length. I was lucky that the file was
not compressed, a JPEG, etc.; otherwise it would not have been so obvious that
the file was partial. How this happened, I don't know. I have a few gigabytes
of cache, and the disk holding the cache is 100+ GB, so running out of disk
space is not the issue. After signalling USR2 to Polipo, I started to look at
other cache files and found a number of others that also lack
Content-Length. Not many, but a few.
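The search for other affected entries can be automated. This Python sketch walks a cache directory and lists files whose header block has no Content-Length line. It assumes the cache root is /var/cache/polipo (the usual diskCacheRoot default; adjust to your configuration) and that each cache file begins with a header block ended by a blank line; only the first 64 KiB of each file is read. The function names are hypothetical. Run it after signalling USR2 so in-memory objects have been written out.

```python
import os
import re
from typing import Iterator

_CL_LINE = re.compile(rb"^Content-Length:", re.IGNORECASE | re.MULTILINE)
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def lacks_content_length(path: str, max_header: int = 64 * 1024) -> bool:
    """True if the header block of a cache file has no Content-Length line."""
    with open(path, "rb") as f:
        head = f.read(max_header)
    m = _BLANK_LINE.search(head)
    # Only look at the headers, so body text cannot cause a false match.
    headers = head if m is None else head[:m.start()]
    return _CL_LINE.search(headers) is None


def find_suspect_entries(cache_root: str) -> Iterator[str]:
    """Yield every file under cache_root whose headers lack Content-Length."""
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if lacks_content_length(path):
                yield path


if __name__ == "__main__":
    # Assumed cache location; change to match diskCacheRoot.
    for p in find_suspect_entries("/var/cache/polipo"):
        print(p)
```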

Since it was hard to pin down when the partial cache file was created, I
started trying to create this situation manually; that became the steps
listed above.
Partial files without Content-Length do not happen often; as I said, only a
few out of a few gigabytes of cache. So simply invalidating them will not
hurt performance at all.
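To make the proposal concrete, here is a sketch of the decision a cache could take when it reads an entry back after a restart. It is an illustration in Python only, not Polipo's code (Polipo is written in C), and classify_entry and EntryState are hypothetical names. Juliusz's suggestion corresponds to returning PARTIAL instead of INVALID in the branch with no Content-Length.

```python
from enum import Enum
from typing import Mapping


class EntryState(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"   # known length, body too short: can be finished later
    INVALID = "invalid"   # discard and refetch


def classify_entry(headers: Mapping[str, str], body_size: int) -> EntryState:
    """Hypothetical policy for a disk-cache entry read back after a restart."""
    # HTTP header names are case-insensitive.
    lengths = [v for k, v in headers.items() if k.lower() == "content-length"]
    if not lengths:
        # Proposed rule: without Content-Length a complete entry cannot be
        # told apart from a truncated one, so do not serve it.
        return EntryState.INVALID
    try:
        length = int(lengths[0].strip())
    except ValueError:
        return EntryState.INVALID
    if body_size == length:
        return EntryState.COMPLETE
    if body_size < length:
        return EntryState.PARTIAL
    # More data than advertised (or a negative length): inconsistent entry.
    return EntryState.INVALID
```

Since such entries are rare, the cost of the INVALID branch is one extra refetch per affected object.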

Ming


>                                        Juliusz
>
_______________________________________________
Polipo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/polipo-users
