On Sat, July 14, 2007 3:07 pm, Mildred wrote:
> I would like to see extended attributes used more widely,
> especially for the MIME type of documents. libmagic is really good,
> but sometimes it can't detect the difference between file types
> that are very close. Extended attributes can also be used to store
> other information, such as the encoding (utf-8, iso-8859-1, utf-16, ...)
> of text files.
Yes, I agree. There are many possible uses of xattrs that nobody is
taking advantage of yet.

> The problem is that "many applications and filesystems do not support
> extended attributes".
>
> This problem also occurs when you want to send a file to someone else,
> or when you want to store a document with extended attributes on a
> filesystem where extended attributes are not available. So what about
> creating a special file that serializes the extended attributes? That
> way the data found in these attributes would not be lost.

This is very possible. It automatically solves a lot of problems, for
instance that of losing metadata when you download a file over HTTP.
(Just download both the file and its .metadata and nothing is lost; heck,
Apache could be modified to serve these files virtually even if the file
on the server has its metadata stored in real filesystem xattrs.)

In fact, now that I think of it, a lot of people create .md5 or .sha1
checksum files and store them on their servers alongside the real files.
This is a kind of metadata similar to serialized xattrs, though a rather
specialized use.

It would be nice to have a format specification for serialized xattrs.
However, this is not enough. There must also be a C library that
implements the specification; otherwise, the file will never be used
consistently. I think such a library should provide transparent access
to both native filesystem xattrs and the serialized metadata file, with
as little user interaction as possible (i.e. simple from the user's and
the developer's point of view).

> I had the idea that the extended attributes could be like mail or HTTP
> headers: the name of the attribute, followed by a colon ':' and the
> data. After all the attributes there would be a blank line, followed by
> the encapsulated file.
> The extended attributes could also be serialized in a separate file
> that must accompany the file it refers to.

Yes. I don't think it's a good idea to store the data file and the
xattrs together in a single file.
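Mildred's header idea, applied to a separate file, might look roughly
like the following Python sketch. Everything in it is an assumption, not
a spec: one `name: value` line per attribute, values kept as readable
text when they are valid UTF-8, and `%XX` escapes for `%`, non-printable
characters, and any value that is not valid UTF-8. The names
`serialize_attrs` and `parse_attrs` are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def _escape(value: bytes) -> str:
    """Keep valid UTF-8 text readable; percent-escape everything else."""
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        # Not text: escape every byte so binary data survives intact.
        return "".join(f"%{b:02X}" for b in value)
    out = []
    for ch in text:
        if ch == "%" or not ch.isprintable():
            # Escape the UTF-8 bytes of '%', newlines, tabs, control chars.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def serialize_attrs(attrs: dict[str, bytes]) -> str:
    """One 'name: value' line per attribute, sorted by name."""
    lines = []
    for name in sorted(attrs):
        if ":" in name or not name.isprintable():
            raise ValueError(f"unsupported attribute name: {name!r}")
        lines.append(f"{name}: {_escape(attrs[name])}\n")
    return "".join(lines)


def parse_attrs(text: str) -> dict[str, bytes]:
    """Inverse of serialize_attrs; unquote_to_bytes re-encodes text as UTF-8."""
    attrs = {}
    for line in text.split("\n"):
        if not line:
            continue
        name, sep, escaped = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        attrs[name] = unquote_to_bytes(escaped)
    return attrs
```

Because such a sidecar holds only the headers, the blank-line separator
from Mildred's proposal is not needed; it only matters for the
single-file encapsulation.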
Storing both in one file would break all known applications on earth.
It's simply not viable.

Also, consider that xattrs may contain binary data, so the format
should handle binary data, for instance by escaping it (if the format is
meant to be human-readable). I suppose it should also default to UTF-8
for text strings, so that non-ASCII characters are not escaped as binary
data. For a binary format, it is enough to prefix each value with its
length and dump the raw bytes into the file. This is not as nice for
users who want to inspect the file by hand, though.

What would the separate file be called? Should there be one metadata
file for each regular file (that has metadata)? How about simply
appending a ".metadata" extension? It is also possible to prefix the
name with a dot to hide it from most applications, but I don't know if
this is desirable. The library could check for both when reading, but
write the file with or without the dot prefix based on user preference.

> So that specification would be implemented by Unix commands like cp,
> or by file managers. Then it would permit us to use extended
> attributes knowing that they would be preserved and reliable.

This is the hardest part. You simply cannot hope to implement xattrs in
all programs, so they might easily get lost silently anywhere along the
way (imagine downloading a file that HAS xattrs to your hard disk, then
copying it to a FAT-formatted pen drive, opening it on an older Linux
distribution, etc.).

For this reason, though, no program should ever RELY on xattr metadata
being present, and no valuable data should be stored as xattrs. Xattrs
should be considered volatile.

I think that, to have the best chance of preserving (or simply using)
xattrs in as many programs as possible, there has to be a repository of
patches for programs.
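The length-prefixed binary variant and the ".metadata" naming rule
could be combined roughly as follows. This is a hedged Python sketch,
not an existing library: the `XATR1` magic tag, the 4-byte big-endian
length fields, and the names `pack_attrs`, `sidecar_paths`,
`write_attrs` and `read_attrs` are all hypothetical. Native xattrs go
through `os.setxattr`/`os.getxattr`/`os.listxattr`, which CPython
provides on Linux only; the sketch falls back to a sidecar file when
those are missing or the filesystem reports `ENOTSUP`/`EOPNOTSUPP`.

```python
import errno
import os
import struct

MAGIC = b"XATR1"  # hypothetical format tag, not an existing standard
_UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP}
_HAVE_XATTR = hasattr(os, "setxattr")  # CPython exposes these on Linux only


def pack_attrs(attrs: dict[str, bytes]) -> bytes:
    """Per attribute: 4-byte name length, UTF-8 name, 4-byte value length, raw value."""
    out = [MAGIC]
    for name in sorted(attrs):
        raw_name, value = name.encode("utf-8"), attrs[name]
        out.append(struct.pack(">I", len(raw_name)) + raw_name)
        out.append(struct.pack(">I", len(value)) + value)
    return b"".join(out)


def unpack_attrs(data: bytes) -> dict[str, bytes]:
    if not data.startswith(MAGIC):
        raise ValueError("not a serialized-xattr file")
    attrs, pos = {}, len(MAGIC)

    def take() -> bytes:
        # Read one length-prefixed field and advance pos past it.
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(">I", data, pos)
        start, pos = pos + 4, pos + 4 + n
        if pos > len(data):
            raise ValueError("truncated field")
        return data[start:pos]

    while pos < len(data):
        name = take().decode("utf-8")
        attrs[name] = take()
    return attrs


def sidecar_paths(path: str) -> tuple[str, str]:
    """Visible (name.metadata) and hidden (.name.metadata) candidates."""
    directory, name = os.path.split(path)
    return (os.path.join(directory, name + ".metadata"),
            os.path.join(directory, "." + name + ".metadata"))


def write_attrs(path, attrs, *, hidden=False, prefer_native=True) -> str:
    if prefer_native and _HAVE_XATTR:
        try:
            for name, value in attrs.items():
                os.setxattr(path, name, value)  # names should be user.*
            return "native"
        except OSError as exc:
            if exc.errno not in _UNSUPPORTED:
                raise
    visible, dotted = sidecar_paths(path)
    with open(dotted if hidden else visible, "wb") as f:
        f.write(pack_attrs(attrs))
    return "sidecar"


def read_attrs(path, *, prefer_native=True) -> dict[str, bytes]:
    if prefer_native and _HAVE_XATTR:
        try:
            names = [n for n in os.listxattr(path) if n.startswith("user.")]
            if names:
                return {n: os.getxattr(path, n) for n in names}
        except OSError as exc:
            if exc.errno not in _UNSUPPORTED:
                raise
    # No native user.* attributes: look for either sidecar name.
    for candidate in sidecar_paths(path):
        if os.path.exists(candidate):
            with open(candidate, "rb") as f:
                return unpack_attrs(f.read())
    return {}
```

Reading checks both the visible and the dotted name, while writing uses
only the one selected by `hidden`, mirroring the suggestion above.
Native `user.*` attributes win when present, so a program calling
`read_attrs` does not need to know where the metadata actually lives.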
A lot of program authors most likely won't add support for xattrs or
serialized xattrs to their "master" program just like that, especially
if it relies on external libraries or even just Linux-specific system
calls (ah, another reason to make a library wrapper).

> Also, what about using extended attributes to cache the guessed file
> type (guessed using libmagic and the extension)? Then when a file
> manager or any other application wants to know the file type, it will
> use this value instead of guessing again.

That is quite possible. Also, in many cases you KNOW the type/encoding
of a file (the HTTP server always sends this information, although it
*might* be a guess, too), yet there is no way to save it to disk along
with the file.

> I also thought that extended attributes could store the size of a
> directory. For example, when I scan the filesystem with utilities like
> du or Baobab, they would store in extended attributes the directory
> size along with the time when it was measured. Afterwards, when they
> want to know the directory size, they could just compare the date in
> the extended attribute with the modification date of the folder, and
> if they differ, measure the directory size again. If not, they could
> just use the stored value directly.

I think this would be hard to make work reliably in reality. Do all
filesystems update the modification date of all parent directories when
a file changes? Although, yes, it would be great to have this. It
normally takes ages to scan a whole disk like this.

> I think the extended attributes should be better integrated in the
> desktop. These are just a few ideas. What do you think about it?

It's great. We need ideas like these. Keep them coming! :-D

> Mildred

Vegard

_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg
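The doubt about directory dates is well founded on POSIX filesystems: a
directory's mtime changes when entries are created, removed or renamed
in it, but not when the contents of an existing file change, and not
when anything happens deeper in the tree. The Python sketch below
implements the proposed check literally (storing the directory mtime
seen at measurement time, a slight variant of storing the measurement
time) so the staleness can be observed. A plain dict stands in for the
hypothetical xattr on the directory; `measure_dir_size` and
`cached_dir_size` are illustrative names.

```python
import os


def measure_dir_size(path: str) -> int:
    """Sum st_size of every non-directory entry below path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def cached_dir_size(path: str, cache: dict[str, tuple[int, int]]) -> int:
    """Reuse the stored size unless the directory's own mtime has changed.

    `cache` stands in for a hypothetical user.* xattr on the directory.
    """
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry is not None and entry[1] == mtime:
        return entry[0]  # trusted without rescanning
    size = measure_dir_size(path)
    cache[path] = (size, mtime)
    return size
```

Appending 5 bytes to a 10-byte file inside the directory leaves the
directory's mtime untouched, so `cached_dir_size` keeps returning 10
while `measure_dir_size` reports 15. A reliable cache would have to look
at every file's own metadata anyway, which is much of what a full scan
does.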
