On Mon, Mar 17, 2008 at 4:41 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Is that true? I thought wget actually read the input file in a streaming
> fashion.
If that is the case, then I think it's possible to add links to the list
while wget is already running.

> I don't expect that a single session's database would get frequent
> reuse, though. However, it probably _would_ be used repeatedly while
> you're working on a specific session; in that case, it's useful to have
> the binary format.

A session database! :D So I had misunderstood this database thing. I
thought it was something like a central repository in the user's home
directory (like .wget-history) that records every link that has been
downloaded, along with all its meta-information. Maybe a better name is a
project file or a session file; calling it a database is a bit much... :D
For session information, an ini file is sufficient IMO.

> However, it's important to be able to parse the file, even if there is
> some corruption or malformed information in some places--and especially,
> if it is truncated (Wget abruptly killed).

YAML is safe for this, I think. libyaml implements a YAML scanner. If the
scanner fails at some point in the session file, we can treat everything
from that point onward as invalid. And since YAML stores its information
line by line, the worst case is losing a single line of information
instead of losing everything in the file.

To avoid losing data, wget has to write the session information
frequently, but frequent writes will burden the hard disk. I wonder if a
memory-mapped file can help with this. According to
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html,
memory-mapped files offer "Automatic file data synchronization and cache
from the OS". If the wget process is suddenly killed, the OS takes care of
synchronizing the memory and disk contents (CMIIW), so we won't lose any
data.

> Still, I imagine the problem is easily fixed by placing some line at the
> end of the file to indicate completion.

A Wget completion timestamp would fit that role.

Regarding libyaml's stability:
Even though it is alpha-quality software at version 0.0.1, it is already
distributed with its stable counterpart PyYAML (which is implemented in
Python [1]), so I think it is usable. By the time this session database
feature gets implemented in wget, libyaml may well have reached a
production release, so the two should work together well, I guess.

[1] The binary distribution of PyYAML includes libyaml, which can be used
as a faster alternative parser.
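To make the "stop at the first bad line" idea and the completion marker concrete, here is a minimal sketch in Python. It does NOT use libyaml or PyYAML: it assumes a simplified format with one `key: value` pair per line, and the key names `done` and `completed` as well as all function names are hypothetical, not an existing wget format.

```python
import os
import time

# Hypothetical key names for this sketch; not an existing wget file format.
DONE_KEY = "done"
COMPLETED_KEY = "completed"


def record_download(path, url):
    """Append one downloaded URL as a single line and push it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{DONE_KEY}: {url}\n")
        f.flush()
        os.fsync(f.fileno())  # durable, but costly if done for every record


def mark_complete(path):
    """Append the end-of-session marker (a UTC timestamp) as the last line."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{COMPLETED_KEY}: {stamp}\n")


def load_session(path):
    """Return (downloaded_urls, finished) from a possibly damaged file.

    Parsing stops at the first malformed or truncated line; everything
    before that point is kept, everything after it is treated as invalid.
    """
    urls, finished = [], False
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # last line was cut off mid-write (process killed)
            key, sep, value = line.rstrip("\n").partition(": ")
            if not sep or not value:
                break  # malformed line
            if key == DONE_KEY:
                urls.append(value)
            elif key == COMPLETED_KEY:
                finished = True  # session ended cleanly
                break
            else:
                break  # unknown key: treat the rest as invalid
    return urls, finished
```

A file without the `completed` line tells the next run that the previous session was interrupted. Calling `os.fsync` after every record is exactly the disk cost worried about above; batching records between syncs would be one way to trade a little safety for fewer writes.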

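To try the memory-mapped idea, here is a rough sketch using Python's `mmap` module (C code would call `mmap(2)` with `MAP_SHARED` in the same way). Records are copied into a shared mapping instead of being written with one system call each. On typical Unix-like systems the dirty pages of a `MAP_SHARED` mapping sit in the kernel's page cache, so they survive the process being killed; they do not survive a kernel crash or power failure unless `msync` (`mmap.flush()` in Python) has been called. The fixed region size and the function names are assumptions for illustration only.

```python
import mmap
import os

# Hypothetical fixed size for the session region; a real implementation
# would grow the file (ftruncate + remap) when it fills up.
REGION_SIZE = 64 * 1024


def open_session_map(path, size=REGION_SIZE):
    """Make sure the file is at least `size` bytes and map it read-write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # new bytes read back as zeros
        # On Unix, mmap.mmap defaults to MAP_SHARED with read/write access,
        # so stores land in the kernel's page cache for this file.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mmap object keeps its own duplicate descriptor


def append_line(m, offset, text):
    """Copy one newline-terminated record into the mapping at `offset`.

    Returns the offset just past the record. No write() call is made; the
    kernel writes dirty pages back to disk on its own schedule.
    """
    data = text.encode("utf-8") + b"\n"
    end = offset + len(data)
    if end > len(m):
        raise ValueError("session region full")  # sketch: no remapping
    m[offset:end] = data
    return end


def checkpoint(m):
    """Force dirty pages to disk (msync); only needed to survive OS crashes."""
    m.flush()
```

The unused tail of the region consists of zero bytes, so a reader has to stop at the first NUL byte, and the next run has to find that point again to know where to continue appending.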