On 2014-09-18 9:24, Alexander Todorov wrote:
I'm running an experiment which needs to collect http objects (html pages, images, CSS, JavaScript, etc) and store them in some easy to access/analyze structure. Something like:

.../device-mac-addr/timestamp/url-or-domain-would-be-nice/content/

under content/ goes
 * the actual content
 * the headers
 * any referenced content in a subdir if this is an HTML page

Hi, Alex,

The purpose of a cache is to serve collected HTTP resources to large numbers of clients as quickly as possible, while minimizing duplication and keeping the content as fresh as possible; this is then complicated by the HTTP Vary mechanism.

What you're asking for is something very different from this, so mod_cache_disk is not a good solution. For example, a MAC address is irrelevant, and a timestamp in the path is actually harmful. mod_cache_disk does, however, use the host header, port, URL path and query string to create a hash that it uses for its filenames and directory names. This lets mod_cache_disk find cached resources quickly while avoiding problems with URL length or special characters in filenames.
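To illustrate the idea of hashing the request identity into a filename, here is a minimal Python sketch. It is NOT mod_cache_disk's actual key format or encoding. It assumes an MD5 digest and URL-safe base64, and the helper name `cache_path` and its `dir_levels`/`dir_length` parameters are hypothetical (only loosely modeled on the CacheDirLevels and CacheDirLength directives). However long or odd the URL is, the result is a fixed-length, filesystem-safe path.

```python
import base64
import hashlib


def cache_path(host: str, port: int, path: str, query: str = "",
               dir_levels: int = 2, dir_length: int = 1) -> str:
    """Hypothetical helper: map a request to a fixed-length relative path."""
    # Combine host, port, URL path and query string into one key.
    # (Illustrative format only; mod_cache builds its own key differently.)
    key = f"{host}:{port}{path}"
    if query:
        key += "?" + query
    # Hashing gives a fixed-size name regardless of URL length.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # URL-safe base64 avoids '/' and other awkward filename characters;
    # 16 digest bytes become 22 characters once '=' padding is stripped.
    name = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # Use the leading characters as nested directory names so no single
    # directory ends up holding every cached file.
    parts = [name[i * dir_length:(i + 1) * dir_length] for i in range(dir_levels)]
    parts.append(name[dir_levels * dir_length:])
    return "/".join(parts)


if __name__ == "__main__":
    print(cache_path("www.example.com", 80, "/index.html", "lang=en"))
```

Note that nothing in the resulting path (no MAC address, no timestamp) is needed or useful for looking the resource up again. The request itself is enough to recompute it.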

In case it is helpful, you can see what is in the cache by running the command "htcacheclean -a -D -p/path/to/your/disk/cache". You can also get more detailed information by using the "-A" option instead of "-a". You could then use the output from this as an index to what is in the cache at a particular point in time. See https://httpd.apache.org/docs/2.4/programs/htcacheclean.html
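As a rough sketch of using that output as a point-in-time index, the Python below parses `htcacheclean -A` listings. It assumes the field order given in the htcacheclean documentation (url, header size, body size, status, entity version, date, expiry, request time, response time, body present, head request) and whitespace-separated fields. Only the two sizes are converted to integers; everything else is kept as the raw string. The function names `parse_listing` and `snapshot` are hypothetical.

```python
import subprocess
import time
from typing import Dict, List, Tuple, Union

# Assumed field order for "htcacheclean -A", per the httpd 2.4 docs.
FIELDS = ("url", "header_size", "body_size", "status", "entity_version",
          "date", "expiry", "request_time", "response_time",
          "body_present", "head_request")

Entry = Dict[str, Union[str, int]]


def parse_listing(text: str) -> List[Entry]:
    """Turn htcacheclean -A output into a list of dicts, skipping odd lines."""
    entries: List[Entry] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # blank or unexpected line: ignore rather than guess
        entry: Entry = dict(zip(FIELDS, parts))
        entry["header_size"] = int(parts[1])
        entry["body_size"] = int(parts[2])
        entries.append(entry)
    return entries


def snapshot(cache_root: str) -> Tuple[float, List[Entry]]:
    """Hypothetical helper: record when the listing was taken, plus its entries."""
    out = subprocess.run(["htcacheclean", "-A", "-p" + cache_root],
                         capture_output=True, text=True, check=True).stdout
    return time.time(), parse_listing(out)
```

Calling `snapshot("/path/to/your/disk/cache")` periodically and saving each result would give a series of timestamped indexes of what the cache held.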

If this doesn't meet your need, you might want to look into writing your own module to do exactly what you need for your experiment.

--
  Mark Montague
  m...@catseye.org


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org
