On Sun, Aug 30, 2009 at 4:14 PM, Juliusz Chroboczek <[email protected]> wrote:
> > If you are think of chunk encoded object, yes the cache file of such object
> > do not know the content-length at the beginning. However, once the object is
> > fully downloaded.
>
> At that point, Polipo will set object->length to the now-known content
> length, and call objectMetadataChanged, which calls dirtyDiskEntry, which
> will cause a Content-Length entry to be written out.

You are right.

> > If a fresh restart of polipo encountered a cache file without
> > Content-Length, and the Cache Control indicate the cache is public and not
> > yet expired. There is no way the polipo can be sure if the cache file is
> > full or partial.
>
> Polipo should consider the file as partial in that case.

Partial might be more optimal than invalid, but how can Polipo make a range
request for the rest when it does not know the real size? If you try my steps
below, they show that Polipo takes the file size as the Content-Length.

> > 4. Use vi to change the Content-Length header into some other header but
> > keep the size of that header line. trim some data off the end of the cache
> > file. This emulates a partial written cache.
> > 5 access the file again. You will get the partial file.
>
> Looks like a bug. Could you provide a trace of the HTTP exchange?

I will explain the steps to reproduce the problem:

1. Find a publicly cacheable page, or create a public page on an Apache
   server you control. I use the Apache server.
2. Use curl or IE/Firefox to download the page.
3. Wait until the page is fully downloaded, then send polipo the USR2 signal
   so that it dumps the object to the cache.
4. Find the cache file and edit it. Change "Content-Length: ...." to
   "Kontent-Length: ....". Polipo will then not know the content length
   when it reads the cache file.
5. Trim the cache file a bit: remove a few lines or characters from the end.
6. Access the page again. You will get the partial file as long as the cache
   entry has not yet expired. In the partial download, Polipo sends the file
   size as the Content-Length.

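The editing in steps 4 and 5 can also be scripted instead of done by hand in
vi. What follows is only a rough sketch in Python, not anything taken from
Polipo itself. It assumes that the cache file starts with an HTTP-style header
block terminated by an empty line (CRLF CRLF) and that the data follows it.
The function names are made up for illustration.

```python
def split_entry(raw: bytes) -> tuple:
    """Split a cache entry into (header block, rest) at the first empty line.

    Assumption: headers use CRLF line endings and end with CRLF CRLF.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    return head + sep, rest


def has_content_length(raw: bytes) -> bool:
    """Return True if the header block contains a Content-Length line."""
    head, _ = split_entry(raw)
    return any(
        line.lower().startswith(b"content-length:")
        for line in head.split(b"\r\n")
    )


def corrupt_entry(raw: bytes, trim: int) -> bytes:
    """Emulate steps 4 and 5: hide Content-Length, then cut `trim` bytes
    off the end of the file."""
    head, rest = split_entry(raw)
    # Rename to a header of the same length so nothing else in the file moves.
    head = head.replace(b"Content-Length:", b"Kontent-Length:", 1)
    if trim > len(rest):
        raise ValueError("trim would cut into the header block")
    return head + rest[: len(rest) - trim]
```

To use it, read the cache file as bytes, pass it through corrupt_entry() with
the number of bytes to drop, and write the result back while polipo is not
holding the object in memory. has_content_length() is a quick check that the
header really is gone before going on to step 6.
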
A partial cache file without Content-Length is not easy to reproduce, so I
cannot give a trace. But once such a file is cached, it can be very
problematic until it expires.

Here is how it was found. One site consistently produced a JavaScript error in
both IE and Firefox. I traced it to Polipo's cache file. As the file is
JavaScript, it was obvious that it did not end correctly: the last few blocks
of code were missing their closing "}". The cache file also had no
Content-Length. I am lucky that the file was not zipped, a JPEG, etc.;
otherwise the fact that it was partial would not have been so obvious.

How this happened, I don't know. I have a few gigabytes of cache, and the
cache disk is 100+ GB, so running out of disk space is not the issue. I
started to look at other cache files (after sending polipo USR2) and found a
number of others that do not have Content-Length either. Not many, but a few.

As it was very elusive when the partial cache file was created, I started to
experiment with creating the situation manually. That became the steps listed
above.

Such partial files without Content-Length do not happen often. As I said,
only a few out of a few gigabytes of cache. So simply invalidating them is not
going to hurt performance at all.

Ming

> Juliusz

_______________________________________________
Polipo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/polipo-users
