[ntfs-3g-devel] [PATCH] attrib.c: fix expanding $STANDARD_INFORMATION with almost-full MFT record
When setting a security descriptor on an NTFS v1.2 format file in an NTFS
v3.0+ volume, NTFS-3G would migrate $STANDARD_INFORMATION to the new format,
which requires extending its size from 48 to 72 bytes.  If this happened
while the file's MFT record was almost full, and none of the file's
attributes could be made non-resident, and the file did not have an
attribute list attribute, then the operation would unexpectedly fail with
ENOENT.

Fix this by adding an attribute list to the file in this situation.

Note that this bug would have been very difficult to hit under normal usage
because it required the MFT record to be filled to just the right amount
with attributes that cannot be made nonresident, such as $FILE_NAME
attributes.  The $SECURITY_DESCRIPTOR attribute must also have already been
made nonresident, since otherwise space could be freed by making it
nonresident.

Nevertheless, here's a script which reproduces the bug:

	fallocate -l 100M ntfs.img
	mkntfs --fast --force ntfs.img
	mkdir -p mnt
	ntfs-3g ntfs.img mnt
	touch mnt/file
	ln mnt/file mnt/0001
	ln mnt/file mnt/0002
	ln mnt/file mnt/0003
	ln mnt/file mnt/004
	setfattr mnt/file -n system.ntfs_object_id -v 0x
	setfattr mnt/file -n system.ntfs_acl -v 0x010004801400240034000102000520002002010200052000200202001c0001031400ff011f0001010001

The hard links make the MFT record of "mnt/file" nearly full.  Then,
assigning an object ID forces $SECURITY_DESCRIPTOR to be nonresident in
favor of $OBJECT_ID, while still keeping the MFT record nearly full.
Finally, the last command, which sets the file's security descriptor,
should succeed; but in fact it failed with "No such file or directory".

This bug was found using the wlfuzz program from wimlib.

Signed-off-by: Eric Biggers <ebigge...@gmail.com>
---
 libntfs-3g/attrib.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libntfs-3g/attrib.c b/libntfs-3g/attrib.c
index a5a6549a..1cc3ef64 100644
--- a/libntfs-3g/attrib.c
+++ b/libntfs-3g/attrib.c
@@ -5142,6 +5142,10 @@ static int ntfs_resident_attr_resize_i(ntfs_attr *na, const s64 newsize,
 	 */
 	if (na->type==AT_STANDARD_INFORMATION || na->type==AT_ATTRIBUTE_LIST) {
 		ntfs_attr_put_search_ctx(ctx);
+		if (!NInoAttrList(na->ni) && ntfs_inode_add_attrlist(na->ni)) {
+			ntfs_log_perror("Could not add attribute list");
+			return -1;
+		}
 		if (ntfs_inode_free_space(na->ni, offsetof(ATTR_RECORD,
 				non_resident_end) + 8)) {
 			ntfs_log_perror("Could not free space in MFT record");
-- 
2.11.0

___
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel

[ntfs-3g-devel] [PATCH] Allow setting DOS name when long name has trailing dot or space
Windows places filenames with a trailing dot or space in the Win32
namespace and allows setting DOS names on such files.  This is true even
though on Windows such filenames can only be created and accessed using
WinNT-style paths and will confuse most Windows software.  Regardless,
because libntfs-3g did not allow setting DOS names on such files, in some
cases it was impossible to correctly restore, using libntfs-3g, a directory
structure that was created under Windows.

Update ntfs_set_ntfs_dos_name() to permit operating on a file that has a
long name with a trailing dot or space.  But continue to forbid creating
such names on a filesystem FUSE-mounted with the windows_name option.
Additionally, continue to forbid a trailing dot or space in DOS names; this
matches the Windows behavior.

Signed-off-by: Eric Biggers <ebigge...@gmail.com>
---
 include/ntfs-3g/layout.h | 11 ---
 include/ntfs-3g/unistr.h |  4 ++--
 libntfs-3g/dir.c         | 10 +++---
 libntfs-3g/unistr.c      | 21 +++--
 src/lowntfs-3g.c         |  6 +++---
 src/ntfs-3g.c            |  8 
 6 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/include/ntfs-3g/layout.h b/include/ntfs-3g/layout.h
index 564167c..0aba464 100644
--- a/include/ntfs-3g/layout.h
+++ b/include/ntfs-3g/layout.h
@@ -1068,12 +1068,17 @@ typedef enum {
 	FILE_NAME_WIN32 = 0x01,
 		/* The standard WinNT/2k NTFS long filenames. Case insensitive.
 		   All Unicode chars except: '\0', '"', '*', '/', ':', '<',
-		   '>', '?', '\' and '|'. Further, names cannot end with a '.'
-		   or a space. */
+		   '>', '?', '\' and '|'. Trailing dots and spaces are allowed,
+		   even though on Windows a filename with such a suffix can only
+		   be created and accessed using a WinNT-style path, i.e.
+		   \\?\-prefixed. (If a regular path is used, Windows will
+		   strip the trailing dots and spaces, which makes such
+		   filenames incompatible with most Windows software.) */
 	FILE_NAME_DOS = 0x02,
 		/* The standard DOS filenames (8.3 format). Uppercase only.
 		   All 8-bit characters greater space, except: '"', '*', '+',
-		   ',', '/', ':', ';', '<', '=', '>', '?' and '\'. */
+		   ',', '/', ':', ';', '<', '=', '>', '?' and '\'. Trailing
+		   dots and spaces are forbidden. */
 	FILE_NAME_WIN32_AND_DOS = 0x03,
 		/* 3 means that both the Win32 and the DOS filenames are
 		   identical and hence have been saved in this single filename
diff --git a/include/ntfs-3g/unistr.h b/include/ntfs-3g/unistr.h
index 7ea0038..76e4ced 100644
--- a/include/ntfs-3g/unistr.h
+++ b/include/ntfs-3g/unistr.h
@@ -65,9 +65,9 @@ extern ntfschar *ntfs_str2ucs(const char *s, int *len);
 
 extern void ntfs_ucsfree(ntfschar *ucs);
 
-extern BOOL ntfs_forbidden_chars(const ntfschar *name, int len);
+extern BOOL ntfs_forbidden_chars(const ntfschar *name, int len, BOOL strict);
 extern BOOL ntfs_forbidden_names(ntfs_volume *vol,
-				const ntfschar *name, int len);
+				const ntfschar *name, int len, BOOL strict);
 extern BOOL ntfs_collapsible_chars(ntfs_volume *vol,
 				const ntfschar *shortname, int shortlen,
 				const ntfschar *longname, int longlen);
diff --git a/libntfs-3g/dir.c b/libntfs-3g/dir.c
index fdc87fa..a66f807 100644
--- a/libntfs-3g/dir.c
+++ b/libntfs-3g/dir.c
@@ -2654,9 +2654,12 @@ int ntfs_set_ntfs_dos_name(ntfs_inode *ni, ntfs_inode *dir_ni,
 		shortlen = ntfs_mbstoucs(newname, &shortname);
 		if (shortlen > MAX_DOS_NAME_LENGTH)
 			shortlen = MAX_DOS_NAME_LENGTH;
-		/* make sure the short name has valid chars */
+
+		/* Make sure the short name has valid chars.
+		 * Note: the short name cannot end with dot or space, but the
+		 * corresponding long name can. */
 		if ((shortlen < 0)
-		    || ntfs_forbidden_names(ni->vol,shortname,shortlen)) {
+		    || ntfs_forbidden_names(ni->vol,shortname,shortlen,TRUE)) {
 			ntfs_inode_close_in_dir(ni,dir_ni);
 			ntfs_inode_close(dir_ni);
 			res = -errno;
@@ -2667,7 +2670,8 @@ int ntfs_set_ntfs_dos_name(ntfs_inode *ni, ntfs_inode *dir_ni,
 		if (longlen > 0) {
 			oldlen = get_dos_name(ni, dnum, oldname);
 			if ((oldlen >= 0)
-			    && !ntfs_forbidden_names(ni->vol, longname, longlen)) {
+			    && !ntfs_forbidden_names(ni->vol, longname, longlen,
+						     FALSE)) {
 				if (oldlen > 0) {
 					if (flag
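
The rule this patch describes (a trailing dot or space is tolerated in the long name but rejected in the DOS name) can be illustrated with a minimal sketch. This is not the libntfs-3g implementation: `sketch_name_forbidden()` is a hypothetical helper, names are taken as UTF-16 code units in host byte order, rejecting empty names is an assumption, and the extra characters that are forbidden only in DOS names ('+', ',', ';', '=') are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the check (NOT the libntfs-3g code).
 * "strict" plays the role of the new BOOL argument: TRUE for a DOS name,
 * FALSE for the corresponding long name.
 */
static bool sketch_name_forbidden(const uint16_t *name, size_t len, bool strict)
{
	size_t i;

	if (len == 0)
		return true;	/* assumption: an empty name is never valid */

	for (i = 0; i < len; i++) {
		switch (name[i]) {
		case '\0': case '"': case '*': case '/': case ':':
		case '<': case '>': case '?': case '\\': case '|':
			return true;	/* forbidden in Win32 names */
		default:
			break;
		}
	}

	/* Only the strict (DOS name) check rejects a trailing '.' or ' '. */
	if (strict && (name[len - 1] == '.' || name[len - 1] == ' '))
		return true;

	return false;
}
```

With this model, the name "foo." passes the non-strict check used for the long name and fails the strict check used for the short name.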
Re: [ntfs-3g-devel] [PATCH 2/2] ACE validation fixes
On Sat, Oct 29, 2016 at 11:21:37AM +0200, Jean-Pierre André wrote:
> Hi again,
>
> Eric Biggers wrote:
> > Hi Jean-Pierre,
> >
> > Sorry for the late response.
>
> No problem. I also did not do much about it.
>
> The intent of ntfs_valid_descr() was to guard against
> processing security descriptors with invalid or unknown
> features, but your need is to check whether a descriptor
> is valid for Windows. The purpose of ntfs-3g is to map
> Unix concepts to an ntfs file system, which is somehow
> different from emulating the Windows behavior (a moving
> target, Windows 8 and Windows 10 brought significant
> changes).
>
> The translations of Windows ACLs to Posix ones rely on
> heuristics which will be defeated if the ACEs are not
> as expected.
>
> Maybe having two variants of ntfs_valid_descr() would
> be the way to go, as you do not need translations,
> inheritance, etc.
>
> Jean-Pierre

Maybe.  I suppose that would mean callers would be updated to specify
whether they want the stricter validation or not, and ntfs_get_ntfs_acl()
and ntfs_set_ntfs_acl() wouldn't require the stricter validation?

Would the stricter validation apply to the SACL as well as the DACL?  It
seems that it shouldn't, i.e. having entries in the SACL, such as system
audit entries or integrity labels or whatever, shouldn't prevent NTFS-3G
from attempting to map the DACL to UNIX permissions.

Eric

[ntfs-3g-devel] [PATCH] ntfs-3g, lowntfs-3g: remove failed_secure variable
This is no longer used.

Signed-off-by: Eric Biggers <ebigge...@gmail.com>
---
 src/lowntfs-3g.c | 3 ---
 src/ntfs-3g.c    | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/src/lowntfs-3g.c b/src/lowntfs-3g.c
index 9d933d2..59fbcd0 100644
--- a/src/lowntfs-3g.c
+++ b/src/lowntfs-3g.c
@@ -4227,7 +4227,6 @@ int main(int argc, char *argv[])
 	fuse_fstype fstype = FSTYPE_UNKNOWN;
 #endif
 	const char *permissions_mode = (const char*)NULL;
-	const char *failed_secure = (const char*)NULL;
 #if defined(HAVE_SETXATTR) && defined(XATTR_MAPPINGS)
 	struct XATTRMAPPING *xattr_mapping = (struct XATTRMAPPING*)NULL;
 #endif /* defined(HAVE_SETXATTR) && defined(XATTR_MAPPINGS) */
@@ -4454,8 +4453,6 @@ int main(int argc, char *argv[])
 	ntfs_log_info("%s", fuse26_kmod_msg);
 #endif
 	setup_logging(parsed_options);
-	if (failed_secure)
-		ntfs_log_info("%s\n",failed_secure);
 	if (permissions_mode)
 		ntfs_log_info("%s, configuration type %d\n",permissions_mode,
 			5 + POSIXACLS*6 - KERNELPERMS*3 + CACHEING);
diff --git a/src/ntfs-3g.c b/src/ntfs-3g.c
index 702d676..c6ee014 100644
--- a/src/ntfs-3g.c
+++ b/src/ntfs-3g.c
@@ -4034,7 +4034,6 @@ int main(int argc, char *argv[])
 	fuse_fstype fstype = FSTYPE_UNKNOWN;
 #endif
 	const char *permissions_mode = (const char*)NULL;
-	const char *failed_secure = (const char*)NULL;
 #if defined(HAVE_SETXATTR) && defined(XATTR_MAPPINGS)
 	struct XATTRMAPPING *xattr_mapping = (struct XATTRMAPPING*)NULL;
 #endif /* defined(HAVE_SETXATTR) && defined(XATTR_MAPPINGS) */
@@ -4260,8 +4259,6 @@ int main(int argc, char *argv[])
 	ntfs_log_info("%s", fuse26_kmod_msg);
 #endif
 	setup_logging(parsed_options);
-	if (failed_secure)
-		ntfs_log_info("%s\n",failed_secure);
 	if (permissions_mode)
 		ntfs_log_info("%s, configuration type %d\n",permissions_mode,
 			4 + POSIXACLS*6 - KERNELPERMS*3 + CACHEING);
-- 
2.10.1

Re: [ntfs-3g-devel] [PATCH] unistr.c: make utf16_to_utf8_size() always honor @outs_len
On Sat, Oct 29, 2016 at 12:07:17PM +0200, Jean-Pierre André wrote:
> Eric Biggers wrote:
> > On Sat, Oct 29, 2016 at 09:45:57AM +0200, Jean-Pierre André wrote:
> > >
> > > I am waiting for a green light from Tuxera for merging them
> > > into the git.
> >
> > Is there any particular reason you need their permission to do so?
>
> Well, they are the owner of the project...
>

Why?  What exactly do they contribute?  I don't see patches from them, or
them reviewing patches on the mailing list.  From what I can see the real
work is done by you and other independent contributors such as myself.

It's been claimed Tuxera helps with testing.  How exactly?  Where can I
find information about the tests they run on NTFS-3G?  Is it helpful and do
they actually find bugs?  Wouldn't it be more useful to develop more open
source tests and add NTFS support to xfstests?

You've mentioned several times that not all changes are going into the next
version.  Normally this means that some time before the release, a release
branch would be cut, so there would be two branches in the repository: one
for the development version, and one for the next release.  Right now I
only see an "edge" branch.  Which version is that supposed to be?  Is there
going to be another branch created?  Where can I find the version destined
for the next release so I can help test it?

Thanks,

Eric

Re: [ntfs-3g-devel] [PATCH] lowntfs-3g: correctly pass file info to reparse plugins
uintptr_t can always hold a pointer but long may not.  In practice it
doesn't matter on non-Windows, though.  Anyway, the important part is not
the cast at all but rather the fact that 'fi->fh' needs to be used and not
'fi'...

On Sat, Oct 29, 2016 at 10:38:45AM +0200, Jean-Pierre André wrote:
> Hi again,
>
> The "#ifdef PLUGIN_ENABLED" part is a major bug eligible
> for the next version.
>
> The "uintptr_t" casts are more debatable, because this is
> defined as an optional type in C11 (section 7.20.1.4).
> This is probably why fh is not declared as uintptr_t in
> "fuse_common.h". Nevertheless, you are right, the cast to
> long is wrong in 64-bit environments which define long as
> a 32-bit integer (and the equivalent type for that on
> Windows is ULONG_PTR). The real question is "Is there an
> environment which defines uintptr_t different from unsigned
> long ?"
>
> As doing something about it in "configure" would be
> overkilling, I will probably take the proposal for next
> version.
>
> Jean-Pierre
>
> Eric Biggers wrote:
> > Signed-off-by: Eric Biggers <ebigge...@gmail.com>
> > ---
> >  src/lowntfs-3g.c | 14 +++---
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/lowntfs-3g.c b/src/lowntfs-3g.c
> > index a91d123..9d933d2 100644
> > --- a/src/lowntfs-3g.c
> > +++ b/src/lowntfs-3g.c
> > @@ -1493,15 +1493,15 @@ close:
> >  			of->parent = 0;
> >  			of->ino = ino;
> >  			of->state = state;
> > -#ifdef PLUGIN_ENABLED
> > +#ifndef PLUGINS_DISABLED
> >  			memcpy(&of->fi, fi, sizeof(struct fuse_file_info));
> > -#endif /* PLUGIN_ENABLED */
> > +#endif /* PLUGINS_DISABLED */
> >  			of->next = ctx->open_files;
> >  			of->previous = (struct open_file*)NULL;
> >  			if (ctx->open_files)
> >  				ctx->open_files->previous = of;
> >  			ctx->open_files = of;
> > -			fi->fh = (long)of;
> > +			fi->fh = (uintptr_t)of;
> >  		}
> >  	}
> >  	if (res)
> > @@ -1542,7 +1542,7 @@ static void ntfs_fuse_read(fuse_req_t req, fuse_ino_t ino, size_t size,
> >  			REPARSE_POINT *reparse;
> >  			struct open_file *of;
> >
> > -			of = (struct open_file*)fi;
> > +			of = (struct open_file*)(uintptr_t)fi->fh;
> >  			res = CALL_REPARSE_PLUGIN(ni, read, buf, size, offset, &of->fi);
> >  			if (res >= 0) {
> >  				goto stamps;
> > @@ -1623,7 +1623,7 @@ static void ntfs_fuse_write(fuse_req_t req, fuse_ino_t ino, const char *buf,
> >  			REPARSE_POINT *reparse;
> >  			struct open_file *of;
> >
> > -			of = (struct open_file*)fi;
> > +			of = (struct open_file*)(uintptr_t)fi->fh;
> >  			res = CALL_REPARSE_PLUGIN(ni, write, buf, size, offset,
> >  					&of->fi);
> >  			if (res >= 0) {
> > @@ -2283,7 +2283,7 @@ exit:
> >  			if (ctx->open_files)
> >  				ctx->open_files->previous = of;
> >  			ctx->open_files = of;
> > -			fi->fh = (long)of;
> > +			fi->fh = (uintptr_t)of;
> >  		}
> >  	}
> >  	return res;
> > @@ -2774,7 +2774,7 @@ static void ntfs_fuse_release(fuse_req_t req, fuse_ino_t ino,
> >  	char ghostname[GHOSTLTH];
> >  	int res;
> >
> > -	of = (struct open_file*)(long)fi->fh;
> > +	of = (struct open_file*)(uintptr_t)fi->fh;
> >  	/* Only for marked descriptors there is something to do */
> >  	if (!of
> >  	    || !(of->state & (CLOSE_COMPRESSED | CLOSE_ENCRYPTED
> >

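
The cast discussed in this message can be shown in isolation. The following is a minimal sketch, assuming a 64-bit integer handle field like `fh` in FUSE's `struct fuse_file_info`; `struct open_file_sketch` and both helper names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-open-file bookkeeping. */
struct open_file_sketch {
	int state;
};

/*
 * Store a pointer in a 64-bit handle.  Going through uintptr_t keeps the
 * conversion well-defined even where long is narrower than a pointer
 * (for example 64-bit Windows, where long is 32 bits).
 */
static uint64_t handle_from_ptr(struct open_file_sketch *of)
{
	return (uint64_t)(uintptr_t)of;
}

/* Recover the pointer from the handle value, never from the struct itself. */
static struct open_file_sketch *ptr_from_handle(uint64_t fh)
{
	return (struct open_file_sketch *)(uintptr_t)fh;
}
```

The second helper mirrors the actual bug fix: the pointer has to come from the stored handle value (`fi->fh`), not from a cast of the `fi` structure pointer.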
Re: [ntfs-3g-devel] [PATCH] unistr.c: make utf16_to_utf8_size() always honor @outs_len
Hi Jean-Pierre,

Are you going to be reviewing/applying any of these other patches?
(Excluding the "ACE validation fixes" one which will need to be reworked
once the desired behavior is agreed on.)  I've also found a bug in
lowntfs-3g regarding the reparse plugin support, so I'll send a patch for
that too.

Thanks,

Eric

On Wed, Sep 14, 2016 at 11:39:07PM -0700, Eric Biggers wrote:
> utf16_to_utf8_size() was not guaranteed to fail with ENAMETOOLONG if the
> computed length was greater than @outs_len.  This could cause a buffer
> overrun in ntfs_utf16_to_utf8().  This was a bug introduced by the
> patches to allow broken Unicode.  Fix it.
>
> Signed-off-by: Eric Biggers <ebigge...@gmail.com>
> ---
>  libntfs-3g/unistr.c | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/libntfs-3g/unistr.c b/libntfs-3g/unistr.c
> index 4d33bb4..190dbd8 100644
> --- a/libntfs-3g/unistr.c
> +++ b/libntfs-3g/unistr.c
> @@ -458,10 +458,15 @@ void ntfs_file_value_upcase(FILE_NAME_ATTR *file_name_attr,
>   */
>
>  /*
> - * Return the amount of 8-bit elements in UTF-8 needed (without the terminating
> - * null) to store a given UTF-16LE string.
> + * Return the number of bytes in UTF-8 needed (without the terminating null) to
> + * store the given UTF-16LE string.
>   *
> - * Return -1 with errno set if string has invalid byte sequence or too long.
> + * On error, -1 is returned, and errno is set to the error code. The following
> + * error codes can be expected:
> + *	EILSEQ		The input string is not valid UTF-16LE (only possible
> + *			if compiled without ALLOW_BROKEN_UNICODE).
> + *	ENAMETOOLONG	The length of the UTF-8 string in bytes (without the
> + *			terminating null) would exceed @outs_len.
>   */
>  static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_len)
>  {
> @@ -470,7 +475,7 @@ static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_l
>  	BOOL surrog;
>
>  	surrog = FALSE;
> -	for (i = 0; i < ins_len && ins[i]; i++) {
> +	for (i = 0; i < ins_len && ins[i] && count <= outs_len; i++) {
>  		unsigned short c = le16_to_cpu(ins[i]);
>  		if (surrog) {
>  			if ((c >= 0xdc00) && (c < 0xe000)) {
> @@ -511,17 +516,20 @@ static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_l
>  			count += 3;
>  		else
>  			goto fail;
> -		if (count > outs_len) {
> -			errno = ENAMETOOLONG;
> -			goto out;
> -		}
>  	}
> -	if (surrog)
> +
> +	if (surrog && count <= outs_len) {
>  #if ALLOW_BROKEN_UNICODE
>  		count += 3; /* ending with a single surrogate */
>  #else
>  		goto fail;
>  #endif /* ALLOW_BROKEN_UNICODE */
> +	}
> +
> +	if (count > outs_len) {
> +		errno = ENAMETOOLONG;
> +		goto out;
> +	}
>
>  	ret = count;
>  out:
> --
> 2.9.3
>

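
The shape of the fix (stop counting once the limit is exceeded, then perform a single limit check after every place that can add to the count) can be sketched in a standalone function. This is a simplified model, not the libntfs-3g code: `utf8_size_sketch()` is a hypothetical name, input is taken in host byte order rather than UTF-16LE, and unpaired surrogates are always rejected, as if ALLOW_BROKEN_UNICODE were disabled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the number of UTF-8 bytes needed for a UTF-16 string, or -1 with
 * errno set to EILSEQ (invalid surrogate use) or ENAMETOOLONG (result
 * would exceed outs_len).
 */
static int utf8_size_sketch(const uint16_t *ins, size_t ins_len, int outs_len)
{
	int count = 0;
	size_t i;

	/* Stop early once the limit is exceeded, as the patched loop does. */
	for (i = 0; i < ins_len && ins[i] && count <= outs_len; i++) {
		uint16_t c = ins[i];

		if (c < 0x80) {
			count += 1;
		} else if (c < 0x800) {
			count += 2;
		} else if (c >= 0xd800 && c < 0xdc00) {
			/* A high surrogate must be followed by a low one. */
			if (i + 1 >= ins_len ||
			    ins[i + 1] < 0xdc00 || ins[i + 1] >= 0xe000) {
				errno = EILSEQ;
				return -1;
			}
			i++;
			count += 4;
		} else if (c >= 0xdc00 && c < 0xe000) {
			errno = EILSEQ;	/* low surrogate without a high one */
			return -1;
		} else {
			count += 3;
		}
	}

	/* Single check after the loop, so no code path can bypass it. */
	if (count > outs_len) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return count;
}
```

Because the limit is tested once at the end, a caller that sizes its output buffer from `outs_len` cannot receive a count larger than that buffer.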
Re: [ntfs-3g-devel] [PATCH 2/2] ACE validation fixes
Hi Jean-Pierre,

Sorry for the late response.

On Mon, Sep 26, 2016 at 01:52:36PM +0200, Jean-Pierre André wrote:
> > 1. "Object" ACEs are mentioned as only being used for Active Directory objects
> >    [source: Windows Internals 6th edition].  On Windows, trying to use
> >    SetFileSecurity() to set an object ACE in the DACL of an NTFS directory fails
> >    with ERROR_INVALID_ACL.  This is different from how Windows treats truly
> >    unknown ACE types (see below).  But I think it would be fine for NTFS-3G to
> >    simplify things by treating object ACEs like any other unknown ACE type.
>
> Just for me to understand : Unix stores special objects
> (such as pipes, sockets, etc.) as file system objects with
> access control attached. Does Windows also records special
> objects as file system objects ? If so, how are these
> objects distinguished from files and directories ?

Active Directory objects are stored in a database, not directly on the
filesystem.  They just happen to both use the security descriptor format.

> The purpose of ntfs_valid_descr() is to reject descriptors
> which ntfs-3g cannot process properly. I am not keen on
> letting special ACEs leaking into translations to Linux.
>
> Inheritance is of particular concern, because this is rather
> complex and there are undocumented special cases which have
> to be found by trials and errors. Moreover the result is
> generally not satisfactory for Linux users who have
> expectations different from Windows ones (typical example :
> inheritance of execute permissions).
>
> Also : if objects can inherit special ACEs from their parent
> directory, what prevents them to be inherited to plain files
> created in the same directory ?
>
> Cannot these specific needs be implemented within wimlib ?

I am not sure what you mean regarding the purpose of ntfs_valid_descr(),
but the existing behavior is that it allows unknown ACEs in both the DACL
and SACL except in certain cases.  And in those certain cases, such as a
callback ACE that is not the last ACE in the list, it is broken for
backup/restore applications like wimlib which expect to be able to set
security descriptors that were created on Windows.

Note that as a backup/restore application, wimlib does not want ACE
inheritance at all, nor does it want translation of Windows ACLs to POSIX
or Linux permissions.  It simply wants to stamp a security descriptor on
each file or directory.  It can be assumed that the security descriptors
are valid in the sense that Windows would accept them.  As part of this, it
can be assumed that the ACEs in each ACL have the common ACE_HEADER.  But
it can *not* be assumed that every ACE is of a known type.  The expected
behavior is that unknown ACEs are settable but are then skipped during
access evaluation.

This would match the Windows behavior.  I think Windows users might
actually find it quite annoying for Windows to do otherwise because then
people could not, for example, restore security descriptors intended for a
later version of Windows from a live system running an older version of
Windows.  Essentially the same argument applies to NTFS-3G.

With regards to inheritance, as I pointed out Windows simply performs the
standard inheritance algorithm on unknown ACEs per the standard ACE_HEADER
flags.  It's true that there could be special rules that it is missing, but
it seems like the most logical behavior and probably the easiest to
implement too.  Of course, this isn't currently relevant to wimlib which as
I mentioned does not care about inheritance.  I just thought I'd suggest it
as an improvement (though it's still not clear to me what the current
behavior is "supposed" to be, and that's part of the problem).

Eric

Re: [ntfs-3g-devel] [PATCH 2/2] ACE validation fixes
On Wed, Sep 21, 2016 at 12:01:23PM +0200, Jean-Pierre André wrote:
>
> I have never met an object ACE and they might be irrelevant
> for a file system which only deals with files and directories.
>
> Is there a point in ntfs-3g accepting ACE types controlling
> entities which are not emulated on Linux (callbacks, labels,
> policies, etc.) ?

Yes, because it should be --- and already is, except in certain cases ---
possible to use NTFS-3G to restore security descriptors that were created
under Windows.  This can be done by using wimlib to extract a WIM image to
a NTFS volume (for example).  I think the emulation of ACEs under Linux is
a separate concern which for some of the new ACE types isn't really
possible or meaningful.

I also did some research, and some experiments on Windows:

1. "Object" ACEs are mentioned as only being used for Active Directory
   objects [source: Windows Internals 6th edition].  On Windows, trying to
   use SetFileSecurity() to set an object ACE in the DACL of an NTFS
   directory fails with ERROR_INVALID_ACL.  This is different from how
   Windows treats truly unknown ACE types (see below).  But I think it
   would be fine for NTFS-3G to simplify things by treating object ACEs
   like any other unknown ACE type.

2. "Callback" ACEs, also known as "conditional" ACEs, are mentioned as only
   existing for use by the AuthZ API, which is a userspace API for access
   control.  The Windows kernel does *not* evaluate such ACEs when
   performing access checks [source: Windows Internals 6th edition].
   However, I *was* able to set such an ACE in the DACL of an NTFS
   directory using SetFileSecurity().  In addition, on Windows such an ACE
   is inherited per the standard ACE header flags, and the generic rights
   and SID mapping is performed.  Still, I don't yet know exactly *why*
   recent Windows 10 builds have been observed to use such ACEs.

3. Truly unknown ACE types are accepted by SetFileSecurity().  They also
   are inherited per the standard ACE header flags.  However, they are not
   evaluated during access checks.  In addition, SetFileSecurity() does no
   validation of the ACCESS_MASK or SID fields of unknown ACEs --- which
   makes sense because the format of such ACEs is actually unknown beyond
   the ACE_HEADER.  Instead, the ACE size field is simply required to be at
   least sizeof(ACE_HEADER) and a multiple of 4.  No generic rights or SID
   mapping is performed during inheritance of unknown ACEs.

So, given the requirements and these observations, I'd like to propose that
NTFS-3G handle unknown ACE types as follows:

* ntfs_valid_descr() accepts them and checks the size only (like Windows)
* ntfs_inherit_acl() performs inheritance on unknown ACE types per the ACE
  header flags but without the generic mapping (like Windows).  Optionally,
  generic rights and SID mapping can be done for callback ACEs.
* NTFS-3G otherwise ignores unknown ACEs (like Windows)

Any thoughts on this?

Eric

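
The size-only rule proposed above for unknown ACE types can be sketched as follows. This is an illustration under stated assumptions, not the ntfs_valid_descr() code: `struct ace_header_sketch` and `unknown_ace_size_ok()` are hypothetical names, the size is taken in host byte order, and the bound against the bytes remaining in the ACL is an extra safety check that is not part of the observations in the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Common 4-byte ACE header; field names here are illustrative. */
struct ace_header_sketch {
	uint8_t type;
	uint8_t flags;
	uint16_t size;	/* little-endian on disk; host order in this sketch */
};

/*
 * Size-only validation for an ACE of unknown type: at least as large as
 * the header and a multiple of 4.  The ACCESS_MASK and SID are not
 * inspected, because their presence cannot be assumed.
 */
static bool unknown_ace_size_ok(uint16_t ace_size, size_t bytes_left_in_acl)
{
	return ace_size >= sizeof(struct ace_header_sketch)
		&& (ace_size % 4) == 0
		&& ace_size <= bytes_left_in_acl;	/* assumption: added bound */
}
```

A validator built this way would walk the ACL by advancing `ace_size` bytes per entry and would apply deeper checks only to ACE types it recognizes.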
Re: [ntfs-3g-devel] [PATCH 2/2] Remove unused argument from ntfs_make_symlink()
On Wed, Sep 21, 2016 at 10:47:57AM +0200, Jean-Pierre André wrote:
> Hi Eric,
>
> There has been a recent request for ntfs-3g to return
> the st_size for symlinks as the size of the target
> path (as described in the stat manual), so the target
> is now useful
>
> Regards
>

Well, the previous patch does that, which is why 'attr_size' became unused.

Eric

[ntfs-3g-devel] [PATCH 2/2] Remove unused argument from ntfs_make_symlink()
Now that the size of the reparse point attribute is no longer used by the
FUSE drivers to populate st_size for symlinks and junctions, it no longer
needs to be returned by ntfs_make_symlink().

Signed-off-by: Eric Biggers <ebigge...@gmail.com>
---
 include/ntfs-3g/reparse.h |  4 ++--
 libntfs-3g/reparse.c      |  4 +---
 src/lowntfs-3g.c          | 14 --
 src/ntfs-3g.c             | 12 
 4 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/include/ntfs-3g/reparse.h b/include/ntfs-3g/reparse.h
index 27e9050..76af915 100644
--- a/include/ntfs-3g/reparse.h
+++ b/include/ntfs-3g/reparse.h
@@ -24,8 +24,8 @@
 #ifndef REPARSE_H
 #define REPARSE_H
 
-char *ntfs_make_symlink(ntfs_inode *ni, const char *mnt_point,
-			int *pattr_size);
+char *ntfs_make_symlink(ntfs_inode *ni, const char *mnt_point);
+
 BOOL ntfs_possible_symlink(ntfs_inode *ni);
 
 int ntfs_get_ntfs_reparse_data(ntfs_inode *ni, char *value, size_t size);
diff --git a/libntfs-3g/reparse.c b/libntfs-3g/reparse.c
index b0f96ae..2e92fbb 100644
--- a/libntfs-3g/reparse.c
+++ b/libntfs-3g/reparse.c
@@ -724,8 +724,7 @@ static char *ntfs_get_rellink(ntfs_inode *ni, ntfschar *junction, int count)
  *			symbolic link or directory junction
  */
 
-char *ntfs_make_symlink(ntfs_inode *ni, const char *mnt_point,
-			int *pattr_size)
+char *ntfs_make_symlink(ntfs_inode *ni, const char *mnt_point)
 {
 	s64 attr_size = 0;
 	char *target;
@@ -820,7 +819,6 @@ char *ntfs_make_symlink(ntfs_inode *ni, const char *mnt_point,
 		}
 		free(reparse_attr);
 	}
-	*pattr_size = attr_size;
 	if (bad)
 		errno = EOPNOTSUPP;
 	return (target);
diff --git a/src/lowntfs-3g.c b/src/lowntfs-3g.c
index bc9770a..b05492e 100644
--- a/src/lowntfs-3g.c
+++ b/src/lowntfs-3g.c
@@ -634,11 +634,10 @@ static int junction_getstat(ntfs_inode *ni,
 		struct stat *stbuf)
 {
 	char *target;
-	int attr_size;
 	int res;
 
 	errno = 0;
-	target = ntfs_make_symlink(ni, ctx->abs_mnt_point, &attr_size);
+	target = ntfs_make_symlink(ni, ctx->abs_mnt_point);
 		/*
 		 * If the reparse point is not a valid
 		 * directory junction, and there is no error
@@ -713,11 +712,9 @@ static int ntfs_fuse_getstat(struct SECURITY_CONTEXT *scx,
 				goto ok;
 #else /* PLUGINS_DISABLED */
 			char *target;
-			int attr_size;
 
 			errno = 0;
-			target = ntfs_make_symlink(ni, ctx->abs_mnt_point,
-					&attr_size);
+			target = ntfs_make_symlink(ni, ctx->abs_mnt_point);
 				/*
 				 * If the reparse point is not a valid
 				 * directory junction, and there is no error
@@ -1020,12 +1017,11 @@ static int junction_readlink(ntfs_inode *ni,
 			const REPARSE_POINT *reparse __attribute__((unused)),
 			char **pbuf)
 {
-	int attr_size;
 	int res;
 
 	errno = 0;
 	res = 0;
-	*pbuf = ntfs_make_symlink(ni, ctx->abs_mnt_point, &attr_size);
+	*pbuf = ntfs_make_symlink(ni, ctx->abs_mnt_point);
 	if (!*pbuf) {
 		if (errno == EOPNOTSUPP) {
 			*pbuf = strdup(ntfs_bad_reparse);
@@ -1068,11 +1064,9 @@ static void ntfs_fuse_readlink(fuse_req_t req, fuse_ino_t ino)
 			res = -errno;
 		}
 #else /* PLUGINS_DISABLED */
-		int attr_size;
-
 		errno = 0;
 		res = 0;
-		buf = ntfs_make_symlink(ni, ctx->abs_mnt_point, &attr_size);
+		buf = ntfs_make_symlink(ni, ctx->abs_mnt_point);
 		if (!buf) {
 			if (errno == EOPNOTSUPP)
 				buf = strdup(ntfs_bad_reparse);
diff --git a/src/ntfs-3g.c b/src/ntfs-3g.c
index f4af89b..3633ac3 100644
--- a/src/ntfs-3g.c
+++ b/src/ntfs-3g.c
@@ -698,11 +698,10 @@ static int junction_getattr(ntfs_inode *ni,
 		struct stat *stbuf)
 {
 	char *target;
-	int attr_size;
 	int res;
 
 	errno = 0;
-	target = ntfs_make_symlink(ni, ctx->abs_mnt_point, &attr_size);
+	target = ntfs_make_symlink(ni, ctx->abs_mnt_point);
 		/*
 		 * If the reparse point is not a valid
 		 * directory junction, and there is no error
@@ -805,10 +804,9 @@ static int ntfs_fuse_getattr(const char *org_path, struct stat *stbuf)
 				goto exit;
 #else /* PLUGINS_DISABLED */
 			char *target;
-			int attr_size;
 
 			errno = 0;
-			target = ntfs_make_symlink(ni, ctx->abs_mnt_point,
[ntfs-3g-devel] [PATCH 1/2] ntfs-3g, lowntfs-3g: set correct st_size for symlinks
NTFS-3G used several different conventions for setting st_size of symlinks. Make it use the standard POSIX convention of setting st_size to the length of the link target without a terminating null. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- src/lowntfs-3g.c | 35 +++ src/ntfs-3g.c| 35 +++ 2 files changed, 54 insertions(+), 16 deletions(-) diff --git a/src/lowntfs-3g.c b/src/lowntfs-3g.c index a91d123..bc9770a 100644 --- a/src/lowntfs-3g.c +++ b/src/lowntfs-3g.c @@ -645,11 +645,10 @@ static int junction_getstat(ntfs_inode *ni, * we still display as a symlink */ if (target || (errno == EOPNOTSUPP)) { - /* returning attribute size */ if (target) - stbuf->st_size = attr_size; + stbuf->st_size = strlen(target); else - stbuf->st_size = sizeof(ntfs_bad_reparse); + stbuf->st_size = sizeof(ntfs_bad_reparse) - 1; stbuf->st_blocks = (ni->allocated_size + 511) >> 9; stbuf->st_mode = S_IFLNK; free(target); @@ -705,7 +704,7 @@ static int ntfs_fuse_getstat(struct SECURITY_CONTEXT *scx, apply_umask(stbuf); } else { stbuf->st_size = - sizeof(ntfs_bad_reparse); + sizeof(ntfs_bad_reparse) - 1; stbuf->st_blocks = (ni->allocated_size + 511) >> 9; stbuf->st_mode = S_IFLNK; @@ -725,12 +724,11 @@ static int ntfs_fuse_getstat(struct SECURITY_CONTEXT *scx, * we still display as a symlink */ if (target || (errno == EOPNOTSUPP)) { - /* returning attribute size */ if (target) - stbuf->st_size = attr_size; + stbuf->st_size = strlen(target); else stbuf->st_size = - sizeof(ntfs_bad_reparse); + sizeof(ntfs_bad_reparse) - 1; stbuf->st_blocks = (ni->allocated_size + 511) >> 9; stbuf->st_nlink = @@ -837,8 +835,29 @@ static int ntfs_fuse_getstat(struct SECURITY_CONTEXT *scx, le64_to_cpu( intx_file->minor)); } - if (intx_file->magic == INTX_SYMBOLIC_LINK) + if (intx_file->magic == INTX_SYMBOLIC_LINK) { + char *target = NULL; + int len; + + /* st_size should be set to length of +* symlink target as multibyte string */ + len = ntfs_ucstombs( + intx_file->target, + (na->data_size - + 
offsetof(INTX_FILE, + target)) / + sizeof(ntfschar), + &target, 0); + if (len < 0) { + res = -errno; + free(intx_file); + ntfs_attr_close(na); + goto exit; + } + free(target); stbuf->st_mode = S_IFLNK; + stbuf->st_size = len; + } free(intx_file); } ntfs_attr_close(na); diff --git a/src/ntfs-3g.c b/src/ntfs-3g.c index 702d676..f4af89b 100644 --- a/src/ntfs-3g.c +++ b/src/ntfs-3g.c @@ -709,11 +709,10 @@ static int junction_getattr(ntfs_inode *ni, * we still display as a symlink */ if (target || (errno == EOPNOTSUPP)) { - /* returni
[ntfs-3g-devel] [PATCH 2/2] ACE validation fixes
- Allow extra data after the SID. Recent Windows 10 images have been reported to contain DACLs with an ACCESS_ALLOWED_CALLBACK_ACE with extra data after the SID. The ACE is not necessarily at the end of the DACL, so the recent fix to allow extra data for the last ACE only was insufficient. I also suspect extra data is sometimes used by other ACE types. In fact, SetFileSecurity() on Windows permits extra data for all ACE types. So make NTFS-3G do the same. - Validate the SID at the correct offset for "object" ACEs. (Most likely this bug wasn't noticed because object ACEs are rarely used.) - Only validate the SID for recognized ACE types. The placement or presence of the SID should not be assumed for future ACE types. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/acls.c | 114 +++--- 1 file changed, 82 insertions(+), 32 deletions(-) diff --git a/libntfs-3g/acls.c b/libntfs-3g/acls.c index b91e041..06c44d9 100644 --- a/libntfs-3g/acls.c +++ b/libntfs-3g/acls.c @@ -536,45 +536,95 @@ gid_t ntfs_find_group(const struct MAPPING* groupmapping, const SID * gsid) } /* + * Does the ACE have the same format as ACCESS_ALLOWED_ACE? + */ + +static BOOL is_regular_ace(const ACE_HEADER *pace) +{ + switch (pace->type) { + case ACCESS_ALLOWED_ACE_TYPE: + case ACCESS_DENIED_ACE_TYPE: + case SYSTEM_AUDIT_ACE_TYPE: + case ACCESS_ALLOWED_CALLBACK_ACE_TYPE: + case ACCESS_DENIED_CALLBACK_ACE_TYPE: + case SYSTEM_AUDIT_CALLBACK_ACE_TYPE: + case SYSTEM_MANDATORY_LABEL_ACE_TYPE: + case SYSTEM_RESOURCE_ATTRIBUTE_ACE_TYPE: + case SYSTEM_SCOPED_POLICY_ID_ACE_TYPE: + case SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE: + return TRUE; + default: + return FALSE; + } +} + +/* + * Does the ACE have the same format as ACCESS_ALLOWED_OBJECT_ACE? 
+ */ + +static BOOL is_object_ace(const ACE_HEADER *pace) +{ + switch (pace->type) { + case ACCESS_ALLOWED_OBJECT_ACE_TYPE: + case ACCESS_DENIED_OBJECT_ACE_TYPE: + case SYSTEM_AUDIT_OBJECT_ACE_TYPE: + case ACCESS_ALLOWED_CALLBACK_OBJECT_ACE_TYPE: + case ACCESS_DENIED_CALLBACK_OBJECT_ACE_TYPE: + case SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE: + return TRUE; + default: + return FALSE; + } +} + +/* * Check the validity of the ACEs in a DACL or SACL + * + * If an ACE is recognized, we validate its SID. + * Otherwise, we validate its size only. */ static BOOL valid_acl(const ACL *pacl, unsigned int end) { - const ACCESS_ALLOWED_ACE *pace; - unsigned int offace; - unsigned int acecnt; - unsigned int acesz; - unsigned int nace; - unsigned int wantsz; - BOOL ok; + unsigned int ace_count; + unsigned int ace_offset; + unsigned int ace_size; + unsigned int sid_offset; + const ACE_HEADER *pace; + const SID *psid; - ok = TRUE; - acecnt = le16_to_cpu(pacl->ace_count); - offace = sizeof(ACL); - for (nace = 0; (nace < acecnt) && ok; nace++) { - /* be sure the beginning is within range */ - if ((offace + sizeof(ACCESS_ALLOWED_ACE)) > end) - ok = FALSE; - else { - pace = (const ACCESS_ALLOWED_ACE*) - &((const char*)pacl)[offace]; - acesz = le16_to_cpu(pace->size); - if (((offace + acesz) > end) - || !ntfs_valid_sid(&pace->sid)) - ok = FALSE; - else { - /* Win10 may insert garbage in the last ACE */ - wantsz = ntfs_sid_size(&pace->sid) + 8; - if (((nace < (acecnt - 1)) - && (wantsz != acesz)) - || (wantsz > acesz)) - ok = FALSE; - } - offace += acesz; - } + for (ace_count = le16_to_cpu(pacl->ace_count), ace_offset = sizeof(ACL); + ace_count != 0; + ace_count--, ace_offset += ace_size) + { + if (sizeof(ACE_HEADER) > end - ace_offset) + return FALSE; + + pace = (const ACE_HEADER *)((char *)pacl + ace_offset); + ace_size = le16_to_cpu(pace->size); + if (ace_size < sizeof(ACE_HEADER) || + ace_size > end - ace_offset) + return FALSE; + + if (is_regular_ace(pace)) + sid_offset = offsetof(ACCESS_ALLOWED_ACE,
sid); + else if (is_object_ace
[ntfs-3g-devel] [PATCH 1/2] Add definitions of ACE types up to Windows 10
The ACE types defined in layout.h were significantly out of date, as Microsoft has defined a number of new ACE types over the years. None of the new ACEs uses a new base structure, though it seems that some can have (or usually have) additional data after the SID. More information about the new ACEs can be found in the public documentation on MSDN. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- include/ntfs-3g/layout.h | 90 +--- 1 file changed, 62 insertions(+), 28 deletions(-) diff --git a/include/ntfs-3g/layout.h b/include/ntfs-3g/layout.h index 98380de..564167c 100644 --- a/include/ntfs-3g/layout.h +++ b/include/ntfs-3g/layout.h @@ -1406,28 +1406,52 @@ typedef enum { * enum ACE_TYPES - The predefined ACE types (8-bit, see below). */ typedef enum { - ACCESS_MIN_MS_ACE_TYPE = 0, - ACCESS_ALLOWED_ACE_TYPE = 0, - ACCESS_DENIED_ACE_TYPE = 1, - SYSTEM_AUDIT_ACE_TYPE = 2, - SYSTEM_ALARM_ACE_TYPE = 3, /* Not implemented as of Win2k. */ - ACCESS_MAX_MS_V2_ACE_TYPE = 3, - - ACCESS_ALLOWED_COMPOUND_ACE_TYPE= 4, - ACCESS_MAX_MS_V3_ACE_TYPE = 4, - - /* The following are Win2k only. */ - ACCESS_MIN_MS_OBJECT_ACE_TYPE = 5, - ACCESS_ALLOWED_OBJECT_ACE_TYPE = 5, - ACCESS_DENIED_OBJECT_ACE_TYPE = 6, - SYSTEM_AUDIT_OBJECT_ACE_TYPE= 7, - SYSTEM_ALARM_OBJECT_ACE_TYPE= 8, - ACCESS_MAX_MS_OBJECT_ACE_TYPE = 8, - - ACCESS_MAX_MS_V4_ACE_TYPE = 8, - - /* This one is for WinNT&2k. 
*/ - ACCESS_MAX_MS_ACE_TYPE = 8, + ACCESS_MIN_MS_ACE_TYPE = 0, + ACCESS_ALLOWED_ACE_TYPE = 0, + ACCESS_DENIED_ACE_TYPE = 1, + SYSTEM_AUDIT_ACE_TYPE = 2, + SYSTEM_ALARM_ACE_TYPE = 3, /* reserved */ + ACCESS_MAX_MS_V2_ACE_TYPE = 3, + + ACCESS_ALLOWED_COMPOUND_ACE_TYPE= 4, /* reserved */ + ACCESS_MAX_MS_V3_ACE_TYPE = 4, + + /* Win2k and later */ + ACCESS_MIN_MS_OBJECT_ACE_TYPE = 5, + ACCESS_ALLOWED_OBJECT_ACE_TYPE = 5, + ACCESS_DENIED_OBJECT_ACE_TYPE = 6, + SYSTEM_AUDIT_OBJECT_ACE_TYPE= 7, + SYSTEM_ALARM_OBJECT_ACE_TYPE= 8, /* reserved */ + ACCESS_MAX_MS_OBJECT_ACE_TYPE = 8, + + ACCESS_MAX_MS_V4_ACE_TYPE = 8, + + /* Apparently, this was the max type in Win2k, but for some reason MS +* chose not to update this constant in later Windows versions */ + ACCESS_MAX_MS_ACE_TYPE = 8, + + /* Windows XP and later */ + ACCESS_ALLOWED_CALLBACK_ACE_TYPE= 9, + ACCESS_DENIED_CALLBACK_ACE_TYPE = 10, + ACCESS_ALLOWED_CALLBACK_OBJECT_ACE_TYPE = 11, + ACCESS_DENIED_CALLBACK_OBJECT_ACE_TYPE = 12, + SYSTEM_AUDIT_CALLBACK_ACE_TYPE = 13, + SYSTEM_ALARM_CALLBACK_ACE_TYPE = 14, /* reserved */ + SYSTEM_AUDIT_CALLBACK_OBJECT_ACE_TYPE = 15, + SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE = 16, /* reserved */ + + /* Windows Vista and later */ + SYSTEM_MANDATORY_LABEL_ACE_TYPE = 17, + + /* Windows 8 and later */ + SYSTEM_RESOURCE_ATTRIBUTE_ACE_TYPE = 18, + SYSTEM_SCOPED_POLICY_ID_ACE_TYPE= 19, + + /* Windows 10 and later */ + SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE = 20, + + ACCESS_MAX_MS_V5_ACE_TYPE = 20, + } __attribute__((__packed__)) ACE_TYPES; /** @@ -1628,9 +1652,7 @@ typedef struct { */ /** - * struct ACCESS_DENIED_ACE - - * - * ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE, SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE + * struct ACCESS_ALLOWED_ACE, etc. - Base structure for all regular ACEs */ typedef struct { /* 0 ACE_HEADER; -- Unfolded here as gcc doesn't like unnamed structs. */ @@ -1641,7 +1663,15 @@ typedef struct { /* 4*/ACCESS_MASK mask; /* Access mask associated with the ACE. 
*/ /* 8*/SID sid;/* The SID associated with the ACE. */ } __attribute__((__packed__)) ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE, - SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE; + SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE, + ACCESS_ALLOWED_CALLBACK_ACE, + ACCESS_DENIED_CALLBACK_ACE, + SYSTEM_AUDIT_CALLBACK_ACE, + SYSTEM_ALARM_CALLBACK_ACE, + SYSTEM_MANDATORY_LABEL_ACE, + SYSTEM_RESOURCE_ATTRIBUTE_ACE, + SYSTEM_SCOPED_POLICY_ID_ACE, + SYSTEM_PROCESS_TRUST_LABEL_ACE; /** * enum OBJECT_ACE_FLAGS - The object ACE flags (32-bit). @@ -1652
[ntfs-3g-devel] [PATCH] Eliminate NTFS_BUG()
NTFS_BUG() was broken because it relied on dereferencing a NULL pointer. This is undefined behavior, and gcc was compiling out the statement. Crashing in library code is also unfriendly in general. There were only two users. Make them just use regular error handling. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- include/ntfs-3g/debug.h | 8 libntfs-3g/mft.c| 12 2 files changed, 8 insertions(+), 12 deletions(-) diff --git a/include/ntfs-3g/debug.h b/include/ntfs-3g/debug.h index f7f3c6f..dba400b 100644 --- a/include/ntfs-3g/debug.h +++ b/include/ntfs-3g/debug.h @@ -36,12 +36,4 @@ extern void ntfs_debug_runlist_dump(const struct _runlist_element *rl); static __inline__ void ntfs_debug_runlist_dump(const struct _runlist_element *rl __attribute__((unused))) {} #endif -#define NTFS_BUG(msg) \ -{ \ - int ___i = 1; \ - ntfs_log_critical("Bug in %s(): %s\n", __FUNCTION__, msg); \ - ntfs_log_debug("Forcing segmentation fault!"); \ - ___i = ((int*)NULL)[___i]; \ -} - #endif /* defined _NTFS_DEBUG_H */ diff --git a/libntfs-3g/mft.c b/libntfs-3g/mft.c index 29f1f4b..85cd120 100644 --- a/libntfs-3g/mft.c +++ b/libntfs-3g/mft.c @@ -1276,8 +1276,10 @@ static int ntfs_mft_record_init(ntfs_volume *vol, s64 size) /* Sanity checks. */ if (mft_na->data_size > mft_na->allocated_size || - mft_na->initialized_size > mft_na->data_size) - NTFS_BUG("mft_na sanity checks failed"); + mft_na->initialized_size > mft_na->data_size) { + ntfs_log_critical("mft_na sanity checks failed"); + goto undo_data_init; + } /* Sync MFT to minimize data loss if there won't be clean unmount. */ if (ntfs_inode_sync(mft_na->ni)) @@ -1343,8 +1345,10 @@ static int ntfs_mft_rec_init(ntfs_volume *vol, s64 size) /* Sanity checks. 
*/ if (mft_na->data_size > mft_na->allocated_size || - mft_na->initialized_size > mft_na->data_size) - NTFS_BUG("mft_na sanity checks failed"); + mft_na->initialized_size > mft_na->data_size) { + ntfs_log_critical("mft_na sanity checks failed"); + goto undo_data_init; + } out: ntfs_log_leave("\n"); return ret; -- 2.9.3
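For illustration, here is a minimal sketch of the pattern the patch above moves to: a failed sanity check is reported through the return value so the caller can unwind, rather than forcing a crash. The names `struct sizes` and `check_sizes` are hypothetical and are not part of libntfs-3g, and setting errno to EIO is an assumption of this sketch (the patch itself only logs and jumps to its cleanup label).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the three size fields checked in mft.c. */
struct sizes {
	int64_t allocated_size;
	int64_t data_size;
	int64_t initialized_size;
};

/*
 * Return 0 if initialized_size <= data_size <= allocated_size.
 * Otherwise return -1 with errno set to EIO (an assumption of this
 * sketch), so the caller can run its normal error path.
 */
static int check_sizes(const struct sizes *s)
{
	if (s->data_size > s->allocated_size ||
	    s->initialized_size > s->data_size) {
		errno = EIO;
		return -1;
	}
	return 0;
}
```

A caller would test the return value and jump to its usual cleanup label, much as the patched code does with `goto undo_data_init`.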
[ntfs-3g-devel] [PATCH] unistr.c: fix another buffer overrun in ntfs_utf16_to_utf8()
If an output buffer was provided, ntfs_utf16_to_utf8() limited the output string length without the terminating null to 'outs_len'. This was incorrect because a terminating null was always added to the string, causing a buffer overrun if the output string happened to have exactly the maximum length. This was a longstanding bug. Fix it by leaving space for a terminating null. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/unistr.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/libntfs-3g/unistr.c b/libntfs-3g/unistr.c index 190dbd8..e70e316 100644 --- a/libntfs-3g/unistr.c +++ b/libntfs-3g/unistr.c @@ -544,7 +544,7 @@ fail: * @ins: input utf16 string buffer * @ins_len: length of input string in utf16 characters * @outs: on return contains the (allocated) output multibyte string - * @outs_len: length of output buffer in bytes + * @outs_len: length of output buffer in bytes (ignored if *@outs is NULL) * * Return -1 with errno set if string has invalid byte sequence or too long. */ @@ -563,10 +563,16 @@ static int ntfs_utf16_to_utf8(const ntfschar *ins, const int ins_len, int halfpair; halfpair = 0; - if (!*outs) + if (!*outs) { + /* If no output buffer was provided, we will allocate one and +* limit its length to PATH_MAX. Note: we follow the standard +* convention of PATH_MAX including the terminating null. */ outs_len = PATH_MAX; + } - size = utf16_to_utf8_size(ins, ins_len, outs_len); + /* The size *with* the terminating null is limited to @outs_len, +* so the size *without* the terminating null is limited to one less. 
*/ + size = utf16_to_utf8_size(ins, ins_len, outs_len - 1); if (size < 0) goto out; @@ -877,7 +883,7 @@ fail: * @ins: input Unicode string buffer * @ins_len: length of input string in Unicode characters * @outs: on return contains the (allocated) output multibyte string - * @outs_len: length of output buffer in bytes + * @outs_len: length of output buffer in bytes (ignored if *@outs is NULL) * * Convert the input little endian, 2-byte Unicode string @ins, of length * @ins_len into the multibyte string format dictated by the current locale. -- 2.9.3
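The convention this patch adopts can be sketched in isolation: when a buffer length counts the terminating null, the payload is limited to one byte less. The helper `copy_bounded` below is hypothetical and only illustrates the off-by-one; it is not code from unistr.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, where dst_len is the total buffer size in bytes
 * *including* the terminating null. Returns the string length, or -1
 * with errno set to ENAMETOOLONG if src does not fit.
 */
static int copy_bounded(char *dst, size_t dst_len, const char *src)
{
	size_t len = strlen(src);

	/* At most dst_len - 1 bytes of payload, plus one for the null. */
	if (dst_len == 0 || len > dst_len - 1) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(dst, src, len);
	dst[len] = '\0';
	return (int)len;
}
```

With a 4-byte buffer, "abc" fits exactly and "abcd" is rejected. That boundary case is the one the original code handled incorrectly.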
[ntfs-3g-devel] [PATCH] unistr.c: remove unused function ntfs_file_value_upcase()
ntfs_file_value_upcase() is not called from anywhere in NTFS-3G, seems unlikely to be used by third-party programs, and can be replaced by calling ntfs_name_upcase() directly. So remove it. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- include/ntfs-3g/unistr.h | 3 --- libntfs-3g/unistr.c | 17 - 2 files changed, 20 deletions(-) diff --git a/include/ntfs-3g/unistr.h b/include/ntfs-3g/unistr.h index b6d428e..7ea0038 100644 --- a/include/ntfs-3g/unistr.h +++ b/include/ntfs-3g/unistr.h @@ -50,9 +50,6 @@ extern void ntfs_name_upcase(ntfschar *name, u32 name_len, extern void ntfs_name_locase(ntfschar *name, u32 name_len, const ntfschar *locase, const u32 locase_len); -extern void ntfs_file_value_upcase(FILE_NAME_ATTR *file_name_attr, - const ntfschar *upcase, const u32 upcase_len); - extern int ntfs_ucstombs(const ntfschar *ins, const int ins_len, char **outs, int outs_len); extern int ntfs_mbstoucs(const char *ins, ntfschar **outs); diff --git a/libntfs-3g/unistr.c b/libntfs-3g/unistr.c index e70e316..199aeba 100644 --- a/libntfs-3g/unistr.c +++ b/libntfs-3g/unistr.c @@ -416,23 +416,6 @@ void ntfs_name_locase(ntfschar *name, u32 name_len, const ntfschar *locase, name[i] = locase[u]; } -/** - * ntfs_file_value_upcase - Convert a filename to upper case - * @file_name_attr: - * @upcase: - * @upcase_len: - * - * Description... - * - * Returns: - */ -void ntfs_file_value_upcase(FILE_NAME_ATTR *file_name_attr, - const ntfschar *upcase, const u32 upcase_len) -{ - ntfs_name_upcase((ntfschar*)&file_name_attr->file_name, - file_name_attr->file_name_length, upcase, upcase_len); -} - /* NTFS uses Unicode (UTF-16LE [NTFS-3G uses UCS-2LE, which is enough for now]) for path names, but the Unicode code points need to be -- 2.9.3
[ntfs-3g-devel] [PATCH] unistr.c: make utf16_to_utf8_size() always honor @outs_len
utf16_to_utf8_size() was not guaranteed to fail with ENAMETOOLONG if the computed length was greater than @outs_len. This could cause a buffer overrun in ntfs_utf16_to_utf8(). This was a bug introduced by the patches to allow broken Unicode. Fix it. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/unistr.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/libntfs-3g/unistr.c b/libntfs-3g/unistr.c index 4d33bb4..190dbd8 100644 --- a/libntfs-3g/unistr.c +++ b/libntfs-3g/unistr.c @@ -458,10 +458,15 @@ void ntfs_file_value_upcase(FILE_NAME_ATTR *file_name_attr, */ /* - * Return the amount of 8-bit elements in UTF-8 needed (without the terminating - * null) to store a given UTF-16LE string. + * Return the number of bytes in UTF-8 needed (without the terminating null) to + * store the given UTF-16LE string. * - * Return -1 with errno set if string has invalid byte sequence or too long. + * On error, -1 is returned, and errno is set to the error code. The following + * error codes can be expected: + * EILSEQ The input string is not valid UTF-16LE (only possible + * if compiled without ALLOW_BROKEN_UNICODE). + * ENAMETOOLONGThe length of the UTF-8 string in bytes (without the + * terminating null) would exceed @outs_len. 
*/ static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_len) { @@ -470,7 +475,7 @@ static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_l BOOL surrog; surrog = FALSE; - for (i = 0; i < ins_len && ins[i]; i++) { + for (i = 0; i < ins_len && ins[i] && count <= outs_len; i++) { unsigned short c = le16_to_cpu(ins[i]); if (surrog) { if ((c >= 0xdc00) && (c < 0xe000)) { @@ -511,17 +516,20 @@ static int utf16_to_utf8_size(const ntfschar *ins, const int ins_len, int outs_l count += 3; else goto fail; - if (count > outs_len) { - errno = ENAMETOOLONG; - goto out; - } } - if (surrog) + + if (surrog && count <= outs_len) { #if ALLOW_BROKEN_UNICODE count += 3; /* ending with a single surrogate */ #else goto fail; #endif /* ALLOW_BROKEN_UNICODE */ + } + + if (count > outs_len) { + errno = ENAMETOOLONG; + goto out; + } ret = count; out: -- 2.9.3
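As a rough illustration of the structure described above (stop counting once the limit is exceeded, then perform a single length check that no path can bypass), here is a simplified standalone sketch. It is not the libntfs-3g function: `utf16_utf8_len` is a hypothetical name, the input is assumed to be in host byte order, and only the strict behavior (as if compiled without ALLOW_BROKEN_UNICODE) is modeled.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return the number of UTF-8 bytes (without the terminating null) needed
 * for up to ins_len UTF-16 code units of ins, stopping at a null unit.
 * Returns -1 with errno set to EILSEQ (unpaired surrogate) or
 * ENAMETOOLONG (the result would exceed max_len).
 */
static int utf16_utf8_len(const uint16_t *ins, int ins_len, int max_len)
{
	int count = 0;
	int i;

	for (i = 0; i < ins_len && ins[i] && count <= max_len; i++) {
		uint16_t c = ins[i];

		if (c < 0x80) {
			count += 1;
		} else if (c < 0x800) {
			count += 2;
		} else if (c >= 0xd800 && c < 0xdc00) {
			/* High surrogate: must be followed by a low one. */
			if (i + 1 >= ins_len ||
			    ins[i + 1] < 0xdc00 || ins[i + 1] >= 0xe000) {
				errno = EILSEQ;
				return -1;
			}
			i++;		/* consume the low surrogate */
			count += 4;
		} else if (c >= 0xdc00 && c < 0xe000) {
			errno = EILSEQ;	/* low surrogate without a high one */
			return -1;
		} else {
			count += 3;
		}
	}

	/* Single check after the loop, so no code path can skip it. */
	if (count > max_len) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return count;
}
```

Placing the limit check after the loop, instead of inside one branch of it, is the design point the patch makes: every way of leaving the loop passes through the same check.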
[ntfs-3g-devel] [PATCH] reparse.c: validate minimum size of mountpoint/symlink reparse points
valid_reparse_data() would read past the end of the reparse point buffer if it was passed a malformed reparse point that had the tag for a mountpoint or a symlink but had a data buffer smaller than expected. Fix this by validating the buffer size. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/reparse.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/libntfs-3g/reparse.c b/libntfs-3g/reparse.c index 354f7bb..b0f96ae 100644 --- a/libntfs-3g/reparse.c +++ b/libntfs-3g/reparse.c @@ -446,6 +446,11 @@ static BOOL valid_reparse_data(ntfs_inode *ni, if (ok) { switch (reparse_attr->reparse_tag) { case IO_REPARSE_TAG_MOUNT_POINT : + if (size < sizeof(REPARSE_POINT) + + sizeof(struct MOUNT_POINT_REPARSE_DATA)) { + ok = FALSE; + break; + } mount_point_data = (const struct MOUNT_POINT_REPARSE_DATA*) reparse_attr->reparse_data; offs = le16_to_cpu(mount_point_data->subst_name_offset); @@ -458,6 +463,11 @@ static BOOL valid_reparse_data(ntfs_inode *ni, ok = FALSE; break; case IO_REPARSE_TAG_SYMLINK : + if (size < sizeof(REPARSE_POINT) + + sizeof(struct SYMLINK_REPARSE_DATA)) { + ok = FALSE; + break; + } symlink_data = (const struct SYMLINK_REPARSE_DATA*) reparse_attr->reparse_data; offs = le16_to_cpu(symlink_data->subst_name_offset); -- 2.9.3
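The check added here follows a general rule: confirm that the buffer holds the fixed header plus the fixed-size payload before reading any payload field. A minimal sketch of that rule follows, using hypothetical, simplified structures (`struct rp_header`, `struct mount_point_data`) rather than the real definitions from layout.h, and reading fields in host byte order for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts; the real ones live in layout.h. */
struct rp_header {
	uint32_t tag;
	uint16_t data_length;
	uint16_t reserved;
};

struct mount_point_data {
	uint16_t subst_name_offset;
	uint16_t subst_name_length;
	uint16_t print_name_offset;
	uint16_t print_name_length;
};

/*
 * Read subst_name_length from a reparse buffer of `size` bytes.
 * Returns 0 on success, or -1 if the buffer is too small to contain
 * the header plus the fixed part of the mount point data.
 */
static int read_subst_name_length(const unsigned char *buf, size_t size,
				  uint16_t *out)
{
	struct mount_point_data d;

	/* Validate the minimum size before touching any payload field. */
	if (size < sizeof(struct rp_header) + sizeof(d))
		return -1;

	memcpy(&d, buf + sizeof(struct rp_header), sizeof(d));
	*out = d.subst_name_length;
	return 0;
}
```

Copying the payload with memcpy() also sidesteps alignment concerns, although that is a separate matter from the size check the patch adds.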
Re: [ntfs-3g-devel] [PATCH] Correct validation of multi sector transfer protected records
On Wed, Jul 27, 2016 at 12:20:24PM +0200, Jean-Pierre André wrote:
>
> Can you disambiguate the word "sector" here ? This is not
> a physical sector, but an ntfs logical sector whose size
> is NTFS_BLOCK_SIZE (512 bytes). This might not have been
> known to the original developer, and it would be useful
> to have it documented somewhere.
>
Thanks, I've submitted a revised patch.
[ntfs-3g-devel] [PATCH] Correct validation of multi sector transfer protected records
I found that the validation contained an off-by-one error. The expression '(u32)(usa_ofs + (usa_count * 2)) > size' used 'usa_count' after it had been decremented to skip the update sequence number entry. Consequently, the code could read out of bounds, up to two bytes past the end of the MST-protected record. Furthermore, as documented in the comment in layout.h for "NTFS_RECORD" and also on MSDN for "MULTI_SECTOR_HEADER", the update sequence array must end before the last le16 in the first sector --- not merely before the end of the record. Fix the validation and move it into a helper function, as it was done identically in the read and write paths. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/mst.c | 45 +++-- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/libntfs-3g/mst.c b/libntfs-3g/mst.c index 9dff773..88b9cdb 100644 --- a/libntfs-3g/mst.c +++ b/libntfs-3g/mst.c @@ -31,6 +31,21 @@ #include "mst.h" #include "logging.h" +/* + * Basic validation of a NTFS multi-sector record. The record size must be a + * multiple of the sector size; and the update sequence array must be properly + * aligned, of the expected length, and must end before the last le16 in the + * first sector. + */ +static BOOL +is_valid_record(u32 size, u16 usa_ofs, u16 usa_count) +{ + return size % NTFS_BLOCK_SIZE == 0 && + usa_ofs % 2 == 0 && + usa_count == 1 + (size / NTFS_BLOCK_SIZE) && + usa_ofs + ((u32)usa_count * 2) <= NTFS_BLOCK_SIZE - 2; +} + /** * ntfs_mst_post_read_fixup - deprotect multi sector transfer protected data * @b: pointer to the data to deprotect @@ -57,12 +72,9 @@ int ntfs_mst_post_read_fixup_warn(NTFS_RECORD *b, const u32 size, /* Setup the variables. */ usa_ofs = le16_to_cpu(b->usa_ofs); - /* Decrement usa_count to get number of fixups. */ - usa_count = le16_to_cpu(b->usa_count) - 1; - /* Size and alignment checks. 
*/ - if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 || - (u32)(usa_ofs + (usa_count * 2)) > size || - (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) { + usa_count = le16_to_cpu(b->usa_count); + + if (!is_valid_record(size, usa_ofs, usa_count)) { errno = EINVAL; if (warn) { ntfs_log_perror("%s: magic: 0x%08lx size: %ld " @@ -91,7 +103,7 @@ int ntfs_mst_post_read_fixup_warn(NTFS_RECORD *b, const u32 size, /* * Check for incomplete multi sector transfer(s). */ - while (usa_count--) { + while (--usa_count) { if (*data_pos != usn) { /* * Incomplete multi sector transfer detected! )-: @@ -109,10 +121,10 @@ int ntfs_mst_post_read_fixup_warn(NTFS_RECORD *b, const u32 size, data_pos += NTFS_BLOCK_SIZE/sizeof(u16); } /* Re-setup the variables. */ - usa_count = le16_to_cpu(b->usa_count) - 1; + usa_count = le16_to_cpu(b->usa_count); data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1; /* Fixup all sectors. */ - while (usa_count--) { + while (--usa_count) { /* * Increment position in usa and restore original data from * the usa into the data buffer. @@ -171,12 +183,9 @@ int ntfs_mst_pre_write_fixup(NTFS_RECORD *b, const u32 size) } /* Setup the variables. */ usa_ofs = le16_to_cpu(b->usa_ofs); - /* Decrement usa_count to get number of fixups. */ - usa_count = le16_to_cpu(b->usa_count) - 1; - /* Size and alignment checks. */ - if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 || - (u32)(usa_ofs + (usa_count * 2)) > size || - (size >> NTFS_BLOCK_SIZE_BITS) != usa_count) { + usa_count = le16_to_cpu(b->usa_count); + + if (!is_valid_record(size, usa_ofs, usa_count)) { errno = EINVAL; ntfs_log_perror("%s", __FUNCTION__); return -1; @@ -195,7 +204,7 @@ int ntfs_mst_pre_write_fixup(NTFS_RECORD *b, const u32 size) /* Position in data of first le16 that needs fixing up. */ data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1; /* Fixup all sectors. 
*/ - while (usa_count--) { + while (--usa_count) { /* * Increment the position in the usa and save the * original data from the data buffer into the usa. @@ -223,7 +232,7 @@ void ntfs_mst_post_write_fixup(NTFS_RECORD *b) u16 *usa_pos, *data_pos; u16 usa_ofs = le16_to_cpu(b->usa_ofs); - u16 usa_count = le16_to_cpu(b->usa_count) - 1
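To make the corrected bounds check concrete, here is a standalone restatement of the validation helper from the patch above, written with fixed-width types so it compiles outside libntfs-3g. The constant `BLOCK_SIZE` and the name `mst_record_is_valid` are assumptions of this sketch. The value 512 is the NTFS_BLOCK_SIZE quoted in the review thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 512u	/* stands in for NTFS_BLOCK_SIZE */

/*
 * A record is accepted only if:
 *  - its size is a multiple of the block size;
 *  - the update sequence array (USA) offset is 2-byte aligned;
 *  - usa_count is one update sequence number plus one fixup per block;
 *  - the USA ends before the last 16-bit word of the first block.
 */
static bool mst_record_is_valid(uint32_t size, uint16_t usa_ofs,
				uint16_t usa_count)
{
	return size % BLOCK_SIZE == 0 &&
	       usa_ofs % 2 == 0 &&
	       usa_count == 1 + (size / BLOCK_SIZE) &&
	       usa_ofs + ((uint32_t)usa_count * 2) <= BLOCK_SIZE - 2;
}
```

For a 1024-byte record with usa_count 3, an offset of 504 is the largest accepted, since 504 + 6 = 510. An offset of 506 is rejected, whereas the old expression, which used the decremented count and compared against the whole record size, would have accepted it.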
[ntfs-3g-devel] [RFC PATCH] Always open $Secure when mounting NTFS volume
Currently, applications that wish to access security descriptors have to explicitly open the volume's security descriptor index ("$Secure") using ntfs_open_secure(). Applications are also responsible for closing the index when done with it. However, the cleanup function for doing so, ntfs_close_secure(), cannot be called easily by all applications because it requires a SECURITY_CONTEXT argument, not simply the ntfs_volume. Some applications therefore have to close the inode and index contexts manually in order to clean up properly. This proposal updates libntfs-3g to open $Secure unconditionally as part of ntfs_mount(), so that applications do not have to worry about it. ntfs_close_secure() is updated to take in a ntfs_volume for internal use, and ntfs_destroy_security_context() is now the function to call to free memory associated with a SECURITY_CONTEXT rather than a ntfs_volume. Some memory leaks in error paths of ntfs_open_secure() are also fixed. Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- include/ntfs-3g/security.h | 4 ++- libntfs-3g/security.c | 87 -- libntfs-3g/volume.c| 7 src/lowntfs-3g.c | 6 +--- src/ntfs-3g.c | 6 +--- 5 files changed, 65 insertions(+), 45 deletions(-) diff --git a/include/ntfs-3g/security.h b/include/ntfs-3g/security.h index b5c6375..d27599e 100644 --- a/include/ntfs-3g/security.h +++ b/include/ntfs-3g/security.h @@ -256,7 +256,9 @@ int ntfs_set_owner_mode(struct SECURITY_CONTEXT *scx, le32 ntfs_inherited_id(struct SECURITY_CONTEXT *scx, ntfs_inode *dir_ni, BOOL fordir); int ntfs_open_secure(ntfs_volume *vol); -void ntfs_close_secure(struct SECURITY_CONTEXT *scx); +void ntfs_close_secure(ntfs_volume *vol); + +void ntfs_destroy_security_context(struct SECURITY_CONTEXT *scx); #if POSIXACLS diff --git a/libntfs-3g/security.c b/libntfs-3g/security.c index ef036af..f7085cc 100644 --- a/libntfs-3g/security.c +++ b/libntfs-3g/security.c @@ -4466,55 +4466,75 @@ int ntfs_set_ntfs_attrib(ntfs_inode *ni, /* - * Open $Secure once for all + *
Open the volume's security descriptor index ($Secure) + * * returns zero if it succeeds - * non-zero if it fails. This is not an error (on NTFS v1.x) + * non-zero if it fails and the NTFS version is at least v3.x */ - - int ntfs_open_secure(ntfs_volume *vol) { ntfs_inode *ni; - int res; + ntfs_index_context *sii; + ntfs_index_context *sdh; - res = -1; - vol->secure_ni = (ntfs_inode*)NULL; - vol->secure_xsii = (ntfs_index_context*)NULL; - vol->secure_xsdh = (ntfs_index_context*)NULL; - if (vol->major_ver >= 3) { - /* make sure this is a genuine $Secure inode 9 */ - ni = ntfs_pathname_to_inode(vol, NULL, "$Secure"); - if (ni && (ni->mft_no == 9)) { - vol->secure_reentry = 0; - vol->secure_xsii = ntfs_index_ctx_get(ni, - sii_stream, 4); - vol->secure_xsdh = ntfs_index_ctx_get(ni, - sdh_stream, 4); - if (ni && vol->secure_xsii && vol->secure_xsdh) { - vol->secure_ni = ni; - res = 0; - } - } + if (vol->secure_ni) /* Already open? */ + return 0; + + ni = ntfs_pathname_to_inode(vol, NULL, "$Secure"); + if (!ni) + goto err; + + /* Verify that $Secure has the expected inode number. */ + if (ni->mft_no != FILE_Secure) { + errno = EINVAL; + goto err_close_ni; } - return (res); + + /* Allocate the needed index contexts. 
*/ + sii = ntfs_index_ctx_get(ni, sii_stream, 4); + if (!sii) + goto err_close_ni; + + sdh = ntfs_index_ctx_get(ni, sdh_stream, 4); + if (!sdh) + goto err_close_sii; + + vol->secure_ni = ni; + vol->secure_xsii = sii; + vol->secure_xsdh = sdh; + return 0; + +err_close_sii: + ntfs_index_ctx_put(sii); +err_close_ni: + ntfs_inode_close(ni); +err: + /* Failing on NTFS versions before 3.x is expected */ + if (vol->major_ver < 3) + return 0; + ntfs_log_perror("error opening $Secure"); + return -1; } /* - * Final cleaning - * Allocated memory is freed to facilitate the detection of memory leaks + * Close the volume's security descriptor index ($Secure) */ - -void ntfs_close_secure(struct SECURITY_CONTEXT *scx) +void ntfs_close_secure(ntfs_volume *vol) { - ntfs_volume *vol; - - vol = scx->vol;
[ntfs-3g-devel] [PATCH] ntfscmp: fix tautological comparison
Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- ntfsprogs/ntfscmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ntfsprogs/ntfscmp.c b/ntfsprogs/ntfscmp.c index 555e401..cabc9c0 100644 --- a/ntfsprogs/ntfscmp.c +++ b/ntfsprogs/ntfscmp.c @@ -547,7 +547,7 @@ static void cmp_index_allocation(ntfs_attr *na1, ntfs_attr *na2) /* * FIXME: ia can be the same even if the bitmap sizes are different. */ - if (cia1.bm_size != cia1.bm_size) + if (cia1.bm_size != cia2.bm_size) goto out; if (cmp_buffer(cia1.bitmap, cia2.bitmap, cia1.bm_size, na1)) -- 2.9.0
[ntfs-3g-devel] [PATCH] xattrs.c: remove unused variables
Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- libntfs-3g/xattrs.c | 4 1 file changed, 4 deletions(-) diff --git a/libntfs-3g/xattrs.c b/libntfs-3g/xattrs.c index f17e4ca..2b7e709 100644 --- a/libntfs-3g/xattrs.c +++ b/libntfs-3g/xattrs.c @@ -81,10 +81,6 @@ struct LE_POSIX_ACL { #endif #endif -static const char xattr_ntfs_3g[] = "ntfs-3g."; -static const char nf_ns_user_prefix[] = "user."; -static const int nf_ns_user_prefix_len = sizeof(nf_ns_user_prefix) - 1; - static const char nf_ns_xattr_ntfs_acl[] = "system.ntfs_acl"; static const char nf_ns_xattr_attrib[] = "system.ntfs_attrib"; static const char nf_ns_xattr_attrib_be[] = "system.ntfs_attrib_be"; -- 2.9.0
[ntfs-3g-devel] [PATCH] Filename collation cleanups
- Update documentation for COLLATION_RULES - Document how ntfs_names_full_collate() compares names - Update comments and DEBUG code to reflect that ntfs_names_full_collate() always access 'upcase', even in CASE_SENSITIVE mode - Remove unneeded assignments to 'c1' and 'c2' in IGNORE_CASE mode Signed-off-by: Eric Biggers <ebigge...@gmail.com> --- include/ntfs-3g/layout.h | 33 - libntfs-3g/unistr.c | 28 +++- 2 files changed, 31 insertions(+), 30 deletions(-) diff --git a/include/ntfs-3g/layout.h b/include/ntfs-3g/layout.h index e23fa10..ddd29c1 100644 --- a/include/ntfs-3g/layout.h +++ b/include/ntfs-3g/layout.h @@ -515,16 +515,15 @@ typedef enum { * enum COLLATION_RULES - The collation rules for sorting views/indexes/etc * (32-bit). * - * COLLATION_UNICODE_STRING - Collate Unicode strings by comparing their binary - * Unicode values, except that when a character can be uppercased, the - * upper case value collates before the lower case one. - * COLLATION_FILE_NAME - Collate file names as Unicode strings. The collation - * is done very much like COLLATION_UNICODE_STRING. In fact I have no idea - * what the difference is. Perhaps the difference is that file names - * would treat some special characters in an odd way (see - * unistr.c::ntfs_collate_names() and unistr.c::legal_ansi_char_array[] - * for what I mean but COLLATION_UNICODE_STRING would not give any special - * treatment to any characters at all, but this is speculation. + * COLLATION_BINARY - Collate by binary compare where the first byte is most + * significant. + * COLLATION_FILE_NAME - Collate Unicode strings by comparing their 16-bit + * coding units, primarily ignoring case using the volume's $UpCase table, + * but falling back to a case-sensitive comparison if the names are equal + * ignoring case. + * COLLATION_UNICODE_STRING - TODO: this is not yet implemented and still needs + * to be properly documented --- is it really the same as + * COLLATION_FILE_NAME? 
* COLLATION_NTOFS_ULONG - Sorting is done according to ascending le32 key * values. E.g. used for $SII index in FILE_Secure, which sorts by * security_id (le32). @@ -549,17 +548,9 @@ typedef enum { * equal then the second le32 values would be compared, etc. */ typedef enum { - COLLATION_BINARY = const_cpu_to_le32(0), /* Collate by binary - compare where the first byte is most - significant. */ - COLLATION_FILE_NAME = const_cpu_to_le32(1), /* Collate file names - as Unicode strings. */ - COLLATION_UNICODE_STRING = const_cpu_to_le32(2), /* Collate Unicode - strings by comparing their binary - Unicode values, except that when a - character can be uppercased, the upper - case value collates before the lower - case one. */ + COLLATION_BINARY= const_cpu_to_le32(0), + COLLATION_FILE_NAME = const_cpu_to_le32(1), + COLLATION_UNICODE_STRING= const_cpu_to_le32(2), COLLATION_NTOFS_ULONG = const_cpu_to_le32(16), COLLATION_NTOFS_SID = const_cpu_to_le32(17), COLLATION_NTOFS_SECURITY_HASH = const_cpu_to_le32(18), diff --git a/libntfs-3g/unistr.c b/libntfs-3g/unistr.c index 54cfd46..4d33bb4 100644 --- a/libntfs-3g/unistr.c +++ b/libntfs-3g/unistr.c @@ -143,14 +143,24 @@ BOOL ntfs_names_are_equal(const ntfschar *s1, size_t s1_len, * @name1_len: length of first Unicode name to compare * @name2: second Unicode name to compare * @name2_len: length of second Unicode name to compare - * @ic:either CASE_SENSITIVE or IGNORE_CASE - * @upcase:upcase table (ignored if @ic is CASE_SENSITIVE) - * @upcase_len:upcase table size (ignored if @ic is CASE_SENSITIVE) + * @ic:either CASE_SENSITIVE or IGNORE_CASE (see below) + * @upcase:upcase table + * @upcase_len:upcase table size * - * -1 if the first name collates before the second one, - * 0 if the names match, - * 1 if the second name collates before the first one, or + * If @ic is CASE_SENSITIVE, then the names are compared primarily ignoring + * case, but if the names are equal ignoring case, then they are compared + * case-sensitively. 
As an example, "abc" would collate before "BCD" (since + * "abc" and "BCD" differ ignoring case and 'A' < 'B') but after "ABC" (since + * "ABC" and "abc" are equal ignoring case and 'A' < 'a'). This matches the + * collation order of filenames as indexed in NTFS directories. + * + * If @ic is IGNOR
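The ordering described in the updated comment can be sketched as a two-pass comparison. This is only an illustration, not the code of ntfs_names_full_collate(): the names `toy_upcase` and `toy_full_collate` are hypothetical, and an ASCII-only uppercase function stands in for the volume's $UpCase table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the volume's $UpCase table (ASCII only). */
uint16_t toy_upcase(uint16_t c)
{
	return (c >= 'a' && c <= 'z') ? (uint16_t)(c - 'a' + 'A') : c;
}

/*
 * Compare two names of 16-bit units.  Returns -1, 0 or 1.
 * Pass 1 ignores case; pass 2 breaks ties case-sensitively.
 */
int toy_full_collate(const uint16_t *n1, size_t len1,
		     const uint16_t *n2, size_t len2)
{
	size_t min_len = len1 < len2 ? len1 : len2;
	size_t i;

	/* Pass 1: compare the upcased units. */
	for (i = 0; i < min_len; i++) {
		uint16_t u1 = toy_upcase(n1[i]);
		uint16_t u2 = toy_upcase(n2[i]);

		if (u1 != u2)
			return u1 < u2 ? -1 : 1;
	}
	/* A name that is a prefix of the other collates first. */
	if (len1 != len2)
		return len1 < len2 ? -1 : 1;

	/* Pass 2: names are equal ignoring case, so compare raw units. */
	for (i = 0; i < len1; i++) {
		if (n1[i] != n2[i])
			return n1[i] < n2[i] ? -1 : 1;
	}
	return 0;
}
```

With this sketch, "abc" collates before "BCD" (pass 1 sees 'A' < 'B') and after "ABC" (pass 1 ties, pass 2 sees 'a' > 'A'), matching the example in the comment.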
[ntfs-3g-devel] New repository for system compression plugin
Hello,

I have made the NTFS-3G system compression plugin available in a new repository at https://github.com/ebiggers/ntfs-3g-system-compression. I also made a few small updates and updated the build system to use autotools. With libntfs-3g installed including headers, the plugin can be built and installed with the standard './configure && make && sudo make install'. Or at least that's the intent --- I still need to do more testing on more platforms.

Eric
[ntfs-3g-devel] [PATCH 2/2] Conditionally compile debugging code in ntfs_delete()
Although ntfs_log_trace() is defined to a no-op in non-DEBUG builds, ntfs_attr_name_get() is not. This function performs a string conversion and a memory allocation, so it is nice to have the call to it compiled out when not needed.

Signed-off-by: Eric Biggers <ebigge...@gmail.com>
---
 libntfs-3g/dir.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libntfs-3g/dir.c b/libntfs-3g/dir.c
index bd049d2..6e97ee7 100644
--- a/libntfs-3g/dir.c
+++ b/libntfs-3g/dir.c
@@ -1906,17 +1906,21 @@ int ntfs_delete(ntfs_volume *vol, const char *pathname,
 search:
 	while (!(err = ntfs_attr_lookup(AT_FILE_NAME, AT_UNNAMED, 0,
 					CASE_SENSITIVE, 0, NULL, 0, actx))) {
+#ifdef DEBUG
 		char *s;
+#endif
 		IGNORE_CASE_BOOL case_sensitive = IGNORE_CASE;
 
 		fn = (FILE_NAME_ATTR*)((u8*)actx->attr +
 				le16_to_cpu(actx->attr->value_offset));
+#ifdef DEBUG
 		s = ntfs_attr_name_get(fn->file_name, fn->file_name_length);
 		ntfs_log_trace("name: '%s' type: %d dos: %d win32: %d "
 			       "case: %d\n", s, fn->file_name_type,
 			       looking_for_dos_name, looking_for_win32_name,
 			       case_sensitive_match);
 		ntfs_attr_name_free(&s);
+#endif
 		if (looking_for_dos_name) {
 			if (fn->file_name_type == FILE_NAME_DOS)
 				break;
-- 
2.9.0
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
On Fri, Dec 04, 2015 at 10:03:41AM +0100, Jean-Pierre André wrote: > For creating a new compressed file, the procedure would be : > - create a new void file > - "truncate" it to the desired size (hence a void sparse file) > - set reparse data for the desired compression mode > - feed the data sequentially. > > Only the last step would go through the plugin, and the plugin > knows how much space has to be reserved for the pointers to > compression blocks. Interesting idea. I hadn't considered how, if at all, the plugin should support *creating* system compressed files. I think it is actually already possible for a 3rd party application to create a system compressed file on a volume mounted by a released version of NTFS-3g --- either with FUSE driver or directly with the library. The process would be as follows: 1. create empty file and truncate it to the desired size (same as you described) 2. open the "WOFCompressedData" named data stream of the file and write the compressed data to it 3. create the reparse point The hard part is, of course, creating the compressed data stream. I think what you're suggesting is that *uncompressed* data would be written to the file and the plugin would automatically compress it, which would mean users wouldn't have to deal with that part. I think it will be possible, provided that the uncompressed size is known ahead of time and the writes are made sequentially. The plugin could detect out-of-order writes and fail them. I would still have to do my own thing in wimlib if I wanted to extract files as system-compressed, since it uses libntfs-3g directly. So this ability would be for FUSE driver users. Of course, the same will also apply to reading system compressed files. 
I think the audience for *reading* system compressed files is much larger than the audience for *creating* system compressed files, since you always have the option of just creating standard uncompressed files, whereas users might have no choice in reading compressed files created by Windows.

Something else to keep in mind is that allowing a system-compressed file to be opened for writing would conflict with any attempt by the plugin to emulate Windows' behavior where it automatically turns the file into a standard uncompressed file when it is opened for writing (perhaps it could otherwise be done in the ->open() hook).

Eric
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
Hi, On Fri, Dec 04, 2015 at 10:03:41AM +0100, Jean-Pierre André wrote: > Hi Eric, > > Please see http://jp-andre.pagesperso-orange.fr/systcomp.tar.gz > with (most of) your comments taken into account. It generally looks good. I did a basic test of the system compression plugin (reading files only) and it worked correctly. Here are some comments on the code: - I think there should be a comment in plugin.h that describes the reparse plugin architecture and when the header is needed. It should clarify that the header is there for plugin development for the FUSE drivers and is not part of the libntfs-3g API. - The 'size' argument to truncate() should be off_t. - The documentation for init() should mention the errno to set (currently EINVAL in your proposal) if the reparse tag is not supported. - The documentation for readlink() says the link target must be encoded in UTF-8, but I don't believe that's necessarily true because NTFS-3g supports alternate locales. The link target will need to be returned as a "multibyte string" as provided by ntfs_ucstombs(), which is what ntfs_make_symlink() does. - ntfs_get_reparse_data() needs a comment to document it. It should note that the return value, if not NULL, has already been validated as a proper reparse point. Also, there is the case where the file is not a reparse point. It looks like the function would fail with ENOENT in that case; is that the proper error code or should it be something else like ENODATA? Another issue is that it is prone to be confused with ntfs_get_ntfs_reparse_data(). Maybe it would be better to call it ntfs_get_reparse_point()? - There is a lot of boilerplate code around calling the reparse point operations. I have two ideas for improvement. 
First, there could be a function that combines ntfs_get_reparse_data() and get_reparse_plugin():

	const struct plugin_operations *select_reparse_plugin(ntfs_inode *ni,
			ntfs_fuse_context_t *ctx, REPARSE_POINT **reparse_ret)
	{
		REPARSE_POINT *reparse;
		const struct plugin_operations *ops;

		reparse = ntfs_get_reparse_data(ni);
		if (!reparse)
			return NULL;
		ops = get_reparse_plugin(ctx, reparse->reparse_tag);
		if (ops)
			*reparse_ret = reparse;
		else
			free(reparse);
		return ops;
	}

  Second, there could be a macro which calls a plugin operation:

	#define CALL_REPARSE_PLUGIN(ni, op_name, ...)			\
	({								\
		const struct plugin_operations *ops;			\
		REPARSE_POINT *reparse;					\
		int res;						\
									\
		ops = select_reparse_plugin(ni, ctx, &reparse);		\
		if (ops) {						\
			if (ops->op_name)				\
				res = ops->op_name(ni, reparse, ##__VA_ARGS__); \
			else						\
				res = -EOPNOTSUPP;			\
			free(reparse);					\
		} else {						\
			res = -errno;					\
		}							\
		res;							\
	})

  Maybe there would be too much magic going on behind the scenes with the macro, but it does get rid of the boilerplate code. Example for truncate():

	if (ni->flags & FILE_ATTR_REPARSE_POINT) {
		if (stream_name_len) {
			res = -EINVAL;
			goto exit;
		}
		res = CALL_REPARSE_PLUGIN(ni, truncate, size);
		if (res)
			goto exit;
		set_archive(ni);
		goto stamps;
	}

- Perhaps it should be possible to disable external plugins at build time as a ./configure option?

- In the final version, I think there should be a dedicated directory created for plugins. It's not really appropriate to drop plugins in the top-level system library directory. Probably the plugin directory should be settable by ./configure and should default to something like ${libdir}/ntfs-3g/.

- The function names "set_reparse_plugin()" and "set_internal_reparse_plugins()" seem a little nonstandard. I think they should use the verb "register" instead of "set". You're "registering" a plugin, not "setting" a plugin.

- Interesting idea with the fi->fh value. I'll have to see if I can do
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
On Wed, Nov 25, 2015 at 10:32:06AM +0100, Jean-Pierre André wrote:
> Agreed. It should be plugin.h, but where should this be
> located ("src/plugin.h" ?)

There are a few options I can think of:

1.) Make it include/ntfs-3g/plugin.h and install it with the library headers
2.) Make it src/plugin.h, so that compiling an ntfs-3g plugin requires access to the ntfs-3g source tree
3.) Make it src/plugin.h and install it as a separate "ntfs-3g-plugin-devel" package for plugin developers to use

(3) sounds like "the right way", but for now I think it would be overkill to make a separate package just for that header. What do you think about (1) or (2)?

> Microsoft advertises IO_REPARSE_TAG_WIM (0x80000008)
> Is that not used for WIMBoot files ?

No. IO_REPARSE_TAG_WIM is used to indicate a WIM image that has been "mounted" with ImageX or DISM, whereas IO_REPARSE_TAG_WOF with the "WIM provider" is used to indicate a single file whose data (specifically, the unnamed data stream) is stored in a resource in a WIM file. The former has been around since Windows Vista, whereas the latter was added in Windows 8.1 to support the "WIMBoot" feature.
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
Hi Jean-Pierre,

I've made a few updates to the "system compression" branch.

I finally got around to testing files with uncompressed size >= 4 GiB. It turns out that Windows *does* permit system compression on such files. The file format changes slightly to accommodate 64-bit offsets rather than 32-bit offsets (exactly the same as in WIM archives), so I updated the code accordingly.

I added a check in ntfs_fuse_open() to forbid writing to the unnamed data stream of system compressed files, since it is not supported. Such files are effectively read-only; the write bit is being cleared in the mode as well. I suppose it would be possible to implement Windows' behavior where it automatically decompresses the file if you try to write to it, but I'm passing on that for now.

I simplified chunk caching in the decompression context. Now it just holds the most recently decompressed chunk, which should be good enough for library users who are unaware of the precise compression chunk size. However, the FUSE driver still just opens the inode and allocates a new decompression context for every read. Since the FUSE driver --- the high-level one, at least --- doesn't currently maintain file descriptor structures, there wasn't much that could be done. But it does do big reads, as you mentioned.

(Side note: in the FUSE filesystem I have in wimlib for mounting WIM images, I set the 'fh' member of the 'struct fuse_file_info' to a file descriptor structure in the ->open() operation, and I have 'flag_nullpath_ok' set in the 'struct fuse_operations'. Then, I just get the file descriptor structure, with no path, passed to operations such as ->read(). If something like that could be done with NTFS-3g and objects like inodes could be left open for many reads or writes, I expect it would make things a bit faster for all users. Maybe it's not possible because you could end up with the same inode opened multiple times at once, in different file descriptors...)

Finally, I made a few other code cleanups and added a short subsection to the ntfs-3g man page. Eric On Tue, Sep 22, 2015 at 10:54:10PM -0500, Eric Biggers wrote: > I've pushed changes to my repository that address a few things you brought > up: > > - compiler warnings addressed > - decompression memory allocated on heap rather than stack > - a couple optimizations for decompression speed > > I'll take a closer look at the interaction with the NTFS-3g driver when I > have time. > > > > On Tue, Sep 22, 2015 at 10:49 PM, Eric Biggers <ebigge...@gmail.com> wrote: > > > Hi, > > > > "WOF compression" is as good as the other names. It still seems slightly > > wrong > > because WOF (the "Windows Overlay Filesystem Filter") is a more general > > feature, > > and this is actually the *second* compression technology that Microsoft has > > built on top of it (the first was "WIMBoot"). For now, I'll keep the code > > the > > way it is, using the "system compression" name. It could be that > > Microsoft will > > release more documentation for this. > > > > Yes, your reparse data indicates XPRESS4K compression (the fourth 32-bit > > little > > endian word is 0). FYI, here are the compressed sizes I get with the > > Silesia > > corpus (uncompressed size: 211,938,580 bytes total): > > > > LZNT1 (NTFS compression): 121,049,088 bytes > > XPRESS4K: 104,124,416 bytes > > XPRESS8K: 95,465,472 bytes > > XPRESS16K: 90,460,160 bytes > > LZX: 69,144,576 bytes > > > > Even though FUSE makes big reads, it would be nice to not have to allocate > > a > > decompression context for every read. 
That would avoid doing all of the > > following on a per-read basis: > > - open WofCompressedData attribute > > - allocate heap memory for ntfs_system_decompression_ctx > > - allocate heap memory for XPRESS or LZX > > - read chunk offsets from the compressed file's chunk table > > > > Having an external tool to create "system compressed" files, if people > > want that > > support, is probably the way to go. Probably that would be possible even > > with > > no changes in libntfs-3g. > > > > Eric > > > > -- ___ ntfs-3g-devel mailing list ntfs-3g-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
[ntfs-3g-devel] [RESEND] Incorrect handling of attribute starting with single-cluster extent
[Resending with only the script attached, since the original apparently didn't go through] Hi, I finally have more information, and a potential solution, for the apparent NTFS corruption bug I've been encountering during randomized tests. The bug, as I've been experiencing it, results in an unreadable directory where readdir fails with EIO. I've attached a script that creates a small volume exhibiting this bug. Based on my analysis, the bug is actually read-side. In the example volume, the unreadable directory has an ATTRIBUTE_LIST attribute and an INDEX_ALLOCATION attribute occupying two clusters, each in a different extent. Therefore, the first INDEX_ALLOCATION extent has lowest_vcn=0 and highest_vcn=0, and the second has lowest_vcn=1 and highest_vcn=1. This unusual case, which is apparently created by the combination of a small volume and near-full MFT records, triggers some special-case behavior in ntfs_mapping_pairs_decompress_i() near line 950 in runlist.c: > /* >* A highest_vcn of zero means this is a single extent >* attribute so simply terminate the runlist with LCN_ENOENT). >*/ That behavior is incorrect if the attribute's first extent only contains a single cluster, since in that case highest_vcn=0 as well. For what it's worth, I tested the volume on Windows and it *is* able to successfully read the directory. This supports the hypothesis that the volume is valid and NTFS-3g has a bug on the read side. I think that this bug could, in theory, occur with any non-resident attribute, not just INDEX_ALLOCATION attributes. Finally, here is a proposed patch to fix the bug; please read it carefully since a lot of the code has been new to me: diff --git a/libntfs-3g/runlist.c b/libntfs-3g/runlist.c index 7e158d4..7bb9da9 100644 --- a/libntfs-3g/runlist.c +++ b/libntfs-3g/runlist.c @@ -939,40 +939,39 @@ mpa_err: "attribute.\n"); goto err_out; } - /* Setup not mapped runlist element if this is the base extent. 
*/ + + /* If this is the base extent (if 'lowest_vcn' is 0), then +* 'allocated_size' is valid, and we can use it to compute the total +* number of clusters across all extents. If the runlist covers all +* clusters, then there was just a single extent and we can terminate +* the runlist with LCN_NOENT. Otherwise, we must terminate the runlist +* with LCN_RL_NOT_MAPPED and let the caller look for more extents. */ if (!attr->lowest_vcn) { - VCN max_cluster; + VCN num_clusters; - max_cluster = ((sle64_to_cpu(attr->allocated_size) + + num_clusters = ((sle64_to_cpu(attr->allocated_size) + vol->cluster_size - 1) >> - vol->cluster_size_bits) - 1; - /* -* A highest_vcn of zero means this is a single extent -* attribute so simply terminate the runlist with LCN_ENOENT). -*/ - if (deltaxcn) { - /* -* If there is a difference between the highest_vcn and -* the highest cluster, the runlist is either corrupt -* or, more likely, there are more extents following -* this one. -*/ - if (deltaxcn < max_cluster) { - ntfs_log_debug("More extents to follow; deltaxcn = " - "0x%llx, max_cluster = 0x%llx\n", - (long long)deltaxcn, - (long long)max_cluster); - rl[rlpos].vcn = vcn; - vcn += rl[rlpos].length = max_cluster - deltaxcn; - rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED; - rlpos++; - } else if (deltaxcn > max_cluster) { - ntfs_log_debug("Corrupt attribute. deltaxcn = " - "0x%llx, max_cluster = 0x%llx\n", - (long long)deltaxcn, - (long long)max_cluster); - goto mpa_err; - } + vol->cluster_size_bits); + + if (num_clusters > vcn) { + /* The runlist doesn't cover all the clusters, so there +* must be more extents. */ + ntfs_log_debug("More extents to follow; vcn = 0x%llx, " + "num_clusters = 0x%llx\n", + (long long)vcn, + (long
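The decision the proposed patch makes for the base extent can be sketched in isolation. This is a simplified illustration with hypothetical names (`toy_classify_base_extent`, `toy_rl_end`), not the runlist.c code; the patch text above is cut off, so the "corrupt" branch here is an assumption modeled on the check the old code performed.

```c
#include <stdint.h>

/* How the runlist decoded from the base extent should be terminated. */
enum toy_rl_end {
	TOY_END_ENOENT,		/* single extent: terminate with LCN_ENOENT */
	TOY_END_NOT_MAPPED,	/* more extents follow: LCN_RL_NOT_MAPPED */
	TOY_END_CORRUPT		/* assumed: runs cover more than allocated */
};

/*
 * allocated_size:    bytes allocated to the whole attribute (only valid
 *                    in the base extent, i.e. when lowest_vcn == 0)
 * cluster_size_bits: log2 of the cluster size
 * vcn:               number of clusters covered by the runs decoded
 *                    from this extent
 */
enum toy_rl_end toy_classify_base_extent(int64_t allocated_size,
					 uint32_t cluster_size_bits,
					 int64_t vcn)
{
	int64_t cluster_size = (int64_t)1 << cluster_size_bits;
	int64_t num_clusters = (allocated_size + cluster_size - 1)
				>> cluster_size_bits;

	if (num_clusters > vcn)
		return TOY_END_NOT_MAPPED;
	if (num_clusters < vcn)
		return TOY_END_CORRUPT;
	return TOY_END_ENOENT;
}
```

The case from the report is a two-cluster attribute whose first extent holds one cluster: with 4096-byte clusters, allocated_size is 8192 and vcn is 1, so the sketch reports that more extents follow, without consulting highest_vcn at all.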
Re: [ntfs-3g-devel] ENOSPC when adding file to directory with near-full MFT record
On Wed, Nov 04, 2015 at 08:10:56AM +0100, Jean-Pierre André wrote:
> Hi Eric,
>
> Attached is the patch (simpler than I first thought).
>
> Jean-Pierre

Thanks. I tested the patch and it made the ENOSPC problem go away.

I'm currently trying to track down a corruption problem that seems to trigger under a similar set of very specific circumstances. It occurs both before and after this patch, and I expect it is a different problem. More information to come...
Re: [ntfs-3g-devel] [BUG] ntfs_attr_pwrite() may return short count when writing to compressed attribute
Hi, This is indeed a real problem for me; I discovered it via some new automated tests of wimlib. For extraction to an NTFS volume, I had code that assumed any return value from ntfs_attr_pwrite() other than the count that was passed in indicated an error. In practice I was usually passing a count of 32768 which happened to avoid the problem, but when dealing with highly compressed WIM archives the count could, in fact, be much larger. The solution for me is, of course, to keep calling ntfs_attr_pwrite() until all bytes have been written or a real error has occurred. However, I'm suggesting that something should also be done on the libntfs-3g side to make it harder for people to run into this problem, whether that is updating the documentation or updating ntfs_attr_pwrite() itself to always try to write the full count. I am also concerned about whether all internal callers of ntfs_attr_pwrite() in libntfs-3g itself handle short writes correctly. There are quite a few callers and it looks like most don't use retry loops; however, many callers probably either write small amounts of data only or rarely operate on compressed attributes, thereby avoiding short writes in practice. What if the existing ntfs_attr_pwrite() was simply moved to an internal function, and ntfs_attr_pwrite() was written as a retry loop around the internal function? Eric On Sat, Oct 31, 2015 at 07:06:01PM +0100, Jean-Pierre André wrote: > Hi Eric, > > Eric Biggers wrote: > >Hi, > > > >The return value of ntfs_attr_pwrite() is documented as follows: > > > >>On success, return the number of successfully written bytes. If this number > >>is lower than @count this means that an error was encountered during the > >>write so that the write is partial. 0 means nothing was written (also return > >>0 when @count is 0). > > > >Hence, a short count implies that an error occurred. 
However, I discovered > >that > >a short count may, in fact, be returned when successfully writing to a > >compressed attribute, since ntfs_attr_pwrite() truncates the count to a > >single > >compression block only: > > > >> if (compressed) { > >> fullcount = (pos | (na->compression_block_size - 1)) + 1 - > >> pos; > >> if (count > fullcount) > >> count = fullcount; > >> } > > > >There are two possible ways to fix this: > > > > 1) Update ntfs_attr_pwrite() to always try to write the full count > > 2) Update ntfs_attr_pwrite() documentation to clarify that short returns > >are allowed and applications should, generally, continue calling > >ntfs_attr_pwrite() until all bytes have been written > > > >It looks like the callers of ntfs_attr_pwrite() in the FUSE drivers do retry > >short writes, but this doesn't appear to be the case for all internal > >callers in > >libntfs-3g itself. So I think that option (1) is preferred, if it is at all > >possible. > > Actually, the current state is intentional, and motivated > by the complexity of allocating clusters for compressed > attributes. The promise should indeed have been mitigated > for compressed attributes (a subset of data attributes). > > If this is a problem (did you find a specific case ?), a > reasonable solution is to insert a new function for writing > to data attributes, which will repeat the call until done. > > Jean-Pierre > > > > >Eric > -- ___ ntfs-3g-devel mailing list ntfs-3g-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
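The short-write behavior and the retry loop discussed in this thread can be sketched without libntfs-3g. Everything prefixed `toy_` is hypothetical: `toy_fake_pwrite` merely imitates a writer that truncates each call at the next compression-block boundary (using the same expression as the quoted snippet), and `toy_pwrite_all` is the kind of wrapper a caller would need around such a writer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writer callback: returns bytes written, or <= 0 on error. */
typedef int64_t (*toy_pwrite_fn)(void *ctx, int64_t pos, int64_t count,
				 const void *buf);

/* Clamp count so the write ends at the next block boundary.
 * block_size must be a power of two. */
int64_t toy_block_limited_count(int64_t pos, int64_t count,
				int64_t block_size)
{
	int64_t fullcount = (pos | (block_size - 1)) + 1 - pos;

	return count > fullcount ? fullcount : count;
}

/* In-memory stand-in for a compressed attribute. */
struct toy_file {
	uint8_t data[256];
	int64_t block_size;
	int calls;
};

/* Imitates a writer that never crosses a compression block. */
int64_t toy_fake_pwrite(void *ctx, int64_t pos, int64_t count,
			const void *buf)
{
	struct toy_file *f = ctx;

	if (pos < 0 || count < 0 || pos + count > (int64_t)sizeof(f->data))
		return -1;
	count = toy_block_limited_count(pos, count, f->block_size);
	memcpy(f->data + pos, buf, (size_t)count);
	f->calls++;
	return count;
}

/* Keep calling the writer until everything is written or it fails.
 * Returns the number of bytes written; less than count means an error. */
int64_t toy_pwrite_all(toy_pwrite_fn do_pwrite, void *ctx, int64_t pos,
		       int64_t count, const void *buf)
{
	const uint8_t *p = buf;
	int64_t total = 0;

	while (total < count) {
		int64_t n = do_pwrite(ctx, pos + total, count - total,
				      p + total);

		if (n <= 0)
			break;	/* real error, or no progress */
		total += n;
	}
	return total;
}
```

With a 64-byte block, a single 200-byte write at offset 10 returns 54, while the loop completes all 200 bytes in four calls. In the wrapper, a short return always means a real error, which is the contract the original documentation promised.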
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
Hi,

"WOF compression" is as good as the other names. It still seems slightly wrong because WOF (the "Windows Overlay Filesystem Filter") is a more general feature, and this is actually the *second* compression technology that Microsoft has built on top of it (the first was "WIMBoot"). For now, I'll keep the code the way it is, using the "system compression" name. It could be that Microsoft will release more documentation for this.

Yes, your reparse data indicates XPRESS4K compression (the fourth 32-bit little endian word is 0). FYI, here are the compressed sizes I get with the Silesia corpus (uncompressed size: 211,938,580 bytes total):

    LZNT1 (NTFS compression): 121,049,088 bytes
    XPRESS4K:                 104,124,416 bytes
    XPRESS8K:                  95,465,472 bytes
    XPRESS16K:                 90,460,160 bytes
    LZX:                       69,144,576 bytes

Even though FUSE makes big reads, it would be nice to not have to allocate a decompression context for every read. That would avoid doing all of the following on a per-read basis:

    - open WofCompressedData attribute
    - allocate heap memory for ntfs_system_decompression_ctx
    - allocate heap memory for XPRESS or LZX
    - read chunk offsets from the compressed file's chunk table

Having an external tool to create "system compressed" files, if people want that support, is probably the way to go. Probably that would be possible even with no changes in libntfs-3g.

Eric
[ntfs-3g-devel] Multiple unnamed data streams?
Hi,

I had a report of a file's data disappearing when an NTFS volume was archived using wimlib, which uses libntfs-3g to read from NTFS volumes. What seems to have happened is that libntfs-3g reported two unnamed data streams for a file: one nonempty and one empty, and wimlib happened to store the empty one (arbitrarily).

Is it an expected or valid case for a file to have more than one unnamed data stream like this? When and how might this happen?

If it's relevant: in wimlib I am basically doing something like this:

	ntfs_attr_search_ctx *actx = ntfs_attr_get_search_ctx(ni, NULL);

	while (!ntfs_attr_lookup(AT_DATA, NULL, 0, CASE_SENSITIVE, 0, NULL, 0, actx)) {
		ATTR_RECORD *record = actx->attr;
		u32 name_nchars = record->name_length;
		ntfschar *name = (ntfschar *)
			((u8 *)record + le16_to_cpu(record->name_offset));

		if (name_nchars == 0) {
			/* unnamed stream... */
		} else {
			/* named stream... */
		}
	}

I know that for the problem to have been apparent, ntfs_attr_lookup() must have provided the nonempty version of the stream first and the empty version second. This may explain why this problem isn't encountered more frequently, since perhaps the first matching stream is used ordinarily and the empty duplicate streams always appear second.

Eric
Re: [ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
[I'm re-sending this since it didn't reach the mailing list due to the SourceForge outage.]

There is not too much information specifically about this feature available yet. You can try googling "Windows 10 System compression" to find some articles. If you are looking for information about the data format, it is not yet documented in the context of the system compression feature, but it seems that Microsoft lifted the format of the compressed data directly from the Windows Imaging (WIM) file format.

One way to create such files for testing is to use the Windows 10 version of the compact program. It has a new option for compressing files using one of the new formats:

    /exe:xpress4k
    /exe:xpress8k
    /exe:xpress16k
    /exe:lzx

The format is designed for write-once, read-many files, such as executable files. If you try to write to such a file on Windows, Windows immediately decompresses it and turns it into a standard uncompressed file. There is no need for manual cluster allocation as the feature is not implemented directly in NTFS.

However, for reading, the compressed files can be accessed randomly with chunk granularity. Each chunk can be decompressed independently. If, say, you want to read starting from byte offset 1000000 and the chunks are 8192 bytes, then you know you need to read starting from chunk (1000000/8192) = 122. Then you can load the offset of chunk 122, and of any later chunks that may be needed, from the chunk table at the beginning of the file. Those will tell you where in the file the chunks are and what their compressed sizes are.

Eric

On Thu, Jul 16, 2015 at 09:59:46AM +0200, Jean-Pierre André wrote:
> Hi Eric,
>
> Interesting. Where can I find more information about this feature, and how can I create such files on Windows 10 ?
>
> Glancing at your code, I do not see anything related to (sparse) cluster allocation. Does that mean these files are not seekable and must be read/written sequentially ?
>
> Regards
>
> Jean-Pierre
>
> Eric Biggers wrote:
> > Hello,
> >
> > I've made an experimental fork of ntfs-3g that supports reading the "System Compressed" files that are / will be supported by Windows 10. This feature allows rarely-modified files to be stored using XPRESS or LZX compression, with stronger compression than the LZNT1 compression built into NTFS. Windows 10 will supposedly enable it on selected files automatically.
> >
> > Microsoft designed this feature to use a reparse point which redirects access to a named data stream, which avoided changing NTFS itself. The format of the compressed stream is identical to that of a compressed resource stored in a Windows Imaging (WIM) archive.
> >
> > I suspect it will be a while before NTFS-3g support would be useful to more people and it ultimately may not be worthwhile adding it at all (especially since this is a reparse-point based feature and therefore is not part of NTFS itself, and it takes quite a bit of code to support), but I thought I'd post this in case anyone else is interested. The source code is available as the "system_compression" branch of https://github.com/ebiggers/ntfs-3g.git.
> >
> > Eric
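The random-access arithmetic described in this reply is small enough to sketch. The names `toy_locate_chunk` and `toy_num_chunks` are hypothetical; the sketch only covers locating a chunk by index and makes no claim about the on-disk layout of the chunk table.

```c
#include <stdint.h>

/* Position of a byte offset within a chunked, independently
 * decompressible stream. */
struct toy_chunk_pos {
	uint64_t index;			/* which chunk holds the byte */
	uint32_t offset_in_chunk;	/* byte position inside that chunk */
};

/* chunk_size must be nonzero (e.g. 4096, 8192, 16384). */
struct toy_chunk_pos toy_locate_chunk(uint64_t byte_offset,
				      uint32_t chunk_size)
{
	struct toy_chunk_pos pos;

	pos.index = byte_offset / chunk_size;
	pos.offset_in_chunk = (uint32_t)(byte_offset % chunk_size);
	return pos;
}

/* Number of chunks needed to hold the uncompressed data. */
uint64_t toy_num_chunks(uint64_t uncompressed_size, uint32_t chunk_size)
{
	return (uncompressed_size + chunk_size - 1) / chunk_size;
}
```

For example, byte offset 1,000,000 with 8192-byte chunks lands in chunk 122, at byte 576 of that chunk's decompressed data. A reader would decompress chunk 122, skip the first 576 bytes, then continue with chunk 123 if more data is wanted.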
[ntfs-3g-devel] Experimental support for Windows 10 "System Compressed" files
Hello,

I've made an experimental fork of ntfs-3g that supports reading the "System Compressed" files that are / will be supported by Windows 10. This feature allows rarely-modified files to be stored using XPRESS or LZX compression, with stronger compression than the LZNT1 compression built into NTFS. Windows 10 will supposedly enable it on selected files automatically.

Microsoft designed this feature to use a reparse point which redirects access to a named data stream, which avoided changing NTFS itself. The format of the compressed stream is identical to that of a compressed resource stored in a Windows Imaging (WIM) archive.

I suspect it will be a while before NTFS-3g support would be useful to more people and it ultimately may not be worthwhile adding it at all (especially since this is a reparse-point based feature and therefore is not part of NTFS itself, and it takes quite a bit of code to support), but I thought I'd post this in case anyone else is interested. The source code is available as the "system_compression" branch of https://github.com/ebiggers/ntfs-3g.git.

Eric
[ntfs-3g-devel] [PATCH] acls.c: fix validation of SID subauthority count
ntfs_valid_sid() required that the subauthority count be between 1 and 8 inclusive. However, Windows permits more than 8 subauthorities as well as 0 subauthorities:

- The install.wim file for the latest Windows 10 build contains a file whose DACL contains a SID with 10 subauthorities. ntfs_set_ntfs_acl() was failing on this file.

- The IsValidSid() function on Windows returns true for any subauthority count less than or equal to 15, including 0.

There was actually already another SID validation function that had the Windows-compatible behavior, so I merged the two together.
---
 include/ntfs-3g/security.h | 16 ----------------
 libntfs-3g/acls.c          | 16 +++++++++-------
 libntfs-3g/security.c      |  4 ++--
 3 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/include/ntfs-3g/security.h b/include/ntfs-3g/security.h
index 8875c9c..9167155 100644
--- a/include/ntfs-3g/security.h
+++ b/include/ntfs-3g/security.h
@@ -222,22 +222,6 @@ enum {
 extern BOOL ntfs_guid_is_zero(const GUID *guid);
 extern char *ntfs_guid_to_mbs(const GUID *guid, char *guid_str);
 
-/**
- * ntfs_sid_is_valid - determine if a SID is valid
- * @sid:	SID for which to determine if it is valid
- *
- * Determine if the SID pointed to by @sid is valid.
- *
- * Return TRUE if it is valid and FALSE otherwise.
- */
-static __inline__ BOOL ntfs_sid_is_valid(const SID *sid)
-{
-	if (!sid || sid->revision != SID_REVISION ||
-			sid->sub_authority_count > SID_MAX_SUB_AUTHORITIES)
-		return FALSE;
-	return TRUE;
-}
-
 extern int ntfs_sid_to_mbs_size(const SID *sid);
 extern char *ntfs_sid_to_mbs(const SID *sid, char *sid_str,
 		size_t sid_str_size);
diff --git a/libntfs-3g/acls.c b/libntfs-3g/acls.c
index 925bb96..500d60f 100644
--- a/libntfs-3g/acls.c
+++ b/libntfs-3g/acls.c
@@ -362,16 +362,18 @@ unsigned int ntfs_attr_size(const char *attr)
 	return (attrsz);
 }
 
-/*
- * Do sanity checks on a SID read from storage
- * (just check revision and number of authorities)
+/**
+ * ntfs_valid_sid - determine if a SID is valid
+ * @sid:	SID for which to determine if it is valid
+ *
+ * Determine if the SID pointed to by @sid is valid.
+ *
+ * Return TRUE if it is valid and FALSE otherwise.
  */
-
 BOOL ntfs_valid_sid(const SID *sid)
 {
-	return ((sid->revision == SID_REVISION)
-		&& (sid->sub_authority_count >= 1)
-		&& (sid->sub_authority_count <= 8));
+	return sid && sid->revision == SID_REVISION
+		&& sid->sub_authority_count <= SID_MAX_SUB_AUTHORITIES;
 }
 
 /*
diff --git a/libntfs-3g/security.c b/libntfs-3g/security.c
index 3ac4790..e00bcf9 100644
--- a/libntfs-3g/security.c
+++ b/libntfs-3g/security.c
@@ -224,7 +224,7 @@ int ntfs_sid_to_mbs_size(const SID *sid)
 {
 	int size, i;
 
-	if (!ntfs_sid_is_valid(sid)) {
+	if (!ntfs_valid_sid(sid)) {
 		errno = EINVAL;
 		return -1;
 	}
@@ -298,7 +298,7 @@ char *ntfs_sid_to_mbs(const SID *sid, char *sid_str, size_t sid_str_size)
 	 * No need to check @sid if !@sid_str since ntfs_sid_to_mbs_size() will
 	 * check @sid, too.  8 is the minimum SID string size.
 	 */
-	if (sid_str && (sid_str_size < 8 || !ntfs_sid_is_valid(sid))) {
+	if (sid_str && (sid_str_size < 8 || !ntfs_valid_sid(sid))) {
 		errno = EINVAL;
 		return NULL;
 	}
--
2.4.5
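For readers who want to see the behavioural difference in isolation, here is a minimal standalone sketch of the old and new rules. The struct and the DEMO_* names are hypothetical stand-ins, not the NTFS-3G definitions; the limit of 15 is taken from the IsValidSid() behaviour described in the message, and the revision value of 1 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real NTFS-3G constants (assumed values). */
#define DEMO_SID_REVISION		1
#define DEMO_SID_MAX_SUB_AUTHORITIES	15

/* Simplified SID layout, for illustration only. */
struct demo_sid {
	uint8_t revision;
	uint8_t sub_authority_count;
	uint8_t identifier_authority[6];
	uint32_t sub_authority[DEMO_SID_MAX_SUB_AUTHORITIES];
};

/* Old rule: the subauthority count had to be between 1 and 8. */
static bool demo_valid_sid_old(const struct demo_sid *sid)
{
	return sid->revision == DEMO_SID_REVISION
		&& sid->sub_authority_count >= 1
		&& sid->sub_authority_count <= 8;
}

/* New rule: NULL is rejected, and any count from 0 to 15 is accepted. */
static bool demo_valid_sid(const struct demo_sid *sid)
{
	return sid != NULL && sid->revision == DEMO_SID_REVISION
		&& sid->sub_authority_count <= DEMO_SID_MAX_SUB_AUTHORITIES;
}
```

With these definitions, a SID carrying 10 subauthorities (the install.wim case above) is rejected by the old rule and accepted by the new one.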
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 10:45:16AM +0200, Jean-Pierre André wrote:
> You have defined the hash table on static data, and I do not
> want to enter into the meanings of static data in shared
> objects in various operating systems (allowed or not, shared
> by threads or not...). I prefer to have it dynamically allocated
> (hence never shared by mounts), and pointed to in the volume
> structure. Unfortunately this means adding an extra argument
> to ntfs_compress_block() and freeing the table when unmounting.
> (I will later post the code for that).

Yes, I wondered if that would cause issues. Since the algorithm does not depend on the specific hash function used, an alternative to jumping through hoops to use the crc32_table is to swap ntfs_hash() with another 3-byte hash function, one that does not rely on static data. I will try some other ones (zlib-like, which I've already tested a little bit, and maybe multiplicative hashing) and see how they affect the results.

> Also a minor issue: please use
> http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c
> as a reference for your patch for easier integration.

Will do next time. I somehow missed the fact that this repository even exists! Looks like the only conflict is in the change to the copyright block...

Eric
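As an illustration of a 3-byte hash that needs no static table, here is a minimal sketch of multiplicative hashing. The DEMO_* names, the 14-bit table size, and the multiplier (a commonly used odd constant close to 2^32 divided by the golden ratio) are assumptions chosen for the example, not values taken from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters for the sketch. */
#define DEMO_HASH_BITS	14
#define DEMO_MULTIPLIER	UINT32_C(2654435761)

/*
 * Hash the 3 bytes at p into DEMO_HASH_BITS bits.  The bytes are packed
 * into one integer, multiplied by an odd constant modulo 2^32, and the
 * high bits of the product are kept.  No lookup table is involved.
 */
static unsigned int demo_hash3(const uint8_t *p)
{
	uint32_t str;

	assert(p != NULL);
	str = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
	return (uint32_t)(str * DEMO_MULTIPLIER) >> (32 - DEMO_HASH_BITS);
}
```

Because only three bytes are read and no shared state is touched, a function of this shape can be called from several mounts or threads without any initialization step.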
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 11:18:49AM +0200, Jean-Pierre André wrote:
> Hi,
>
> Did you compare with the Microsoft implementation?
> I have only checked the biggest file in IE7 update for WinXP
> (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll)
> with cluster size 4096:
>
> Original size              6066688
> Microsoft implementation   3883008 (64.0%)
> current implementation     3682304 (60.7%)
> proposed implementation    3710976 (61.2%)

I have not done any comparisons with the Microsoft implementation yet. Is there a more precise way to test it than actually copying a file to an NTFS volume from Windows?

I'm not surprised that it apparently produces a worse compression ratio than NTFS-3G. Although it's impossible to know for sure what their algorithm does, my expectation is that they use hash chains (similar to my proposal, perhaps with a slightly less exhaustive search) but use greedy parsing rather than lazy parsing.

If there's a desire for even greater performance improvement, then greedy parsing is the way to go. But it will degrade the compression ratio, maybe placing it closer to the Microsoft implementation.

Eric
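To make the greedy versus lazy distinction concrete, here is a toy sketch. It uses a brute-force match search and simply counts emitted tokens (literals and matches); it is not the NTFS-3G compressor and does not produce LZNT1 output. All demo_* names are hypothetical, and the lazy rule shown (defer by one byte if the next position has a strictly longer match) is one common simple variant.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_MATCH 3

/* Brute-force longest match at position i against all earlier positions.
 * Returns 0 if no match of at least DEMO_MIN_MATCH bytes exists. */
static size_t demo_longest_match(const unsigned char *buf, size_t n, size_t i)
{
	size_t best = 0;

	for (size_t j = 0; j < i; j++) {
		size_t len = 0;

		while (i + len < n && buf[j + len] == buf[i + len])
			len++;
		if (len > best)
			best = len;
	}
	return best >= DEMO_MIN_MATCH ? best : 0;
}

/* Parse the buffer and return the number of tokens emitted.
 * lazy == 0: greedy, always take the match found at the current position.
 * lazy != 0: if the next position has a strictly longer match, emit a
 *            literal now and let the next iteration take that match. */
static size_t demo_parse(const unsigned char *buf, size_t n, int lazy)
{
	size_t tokens = 0;
	size_t i = 0;

	assert(buf != NULL);
	while (i < n) {
		size_t len = demo_longest_match(buf, n, i);

		if (len && lazy && i + 1 < n &&
		    demo_longest_match(buf, n, i + 1) > len)
			len = 0;	/* defer: emit a literal instead */
		i += len ? len : 1;
		tokens++;
	}
	return tokens;
}
```

On an input such as "abc-bcde+abcde", the greedy parse takes the 3-byte match "abc" and is then left with two unmatched literals, while the lazy parse emits "a" as a literal and takes the 4-byte match "bcde", so it ends with one token fewer. The lazy variant pays for this with an extra match search at many positions, which is the speed cost discussed above.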
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 05:29:53PM +0200, Jean-Pierre André wrote:
> For a better way you would have to identify which is the
> dll which compresses, and submit compression tasks with
> some control over the durations.

RtlCompressBuffer() in ntdll.dll can do LZNT1 compression, which I think is the same as NTFS compression. But I wouldn't be surprised if there is actually another implementation in Microsoft's NTFS driver, which would be inaccessible. Either way, simply copying files to an NTFS volume is probably good enough for approximate benchmarking; that's what I was doing with NTFS-3G, after all.

> I had analyzed the difference of results, and I was surprised
> to find that the full length of the matching string was not
> always used (such as found a matching string at some position
> with a matching length of 20, but only used a length of 12
> and the next match not being better than the expected 8 bytes),
> and there does not appear to be a fixed maximum length (when
> all bytes are the same, the matching length is 4095 as would
> be expected). They probably bargained the duration against
> the compression rate.

This is strange. If the algorithm does the work to find a match at some position, it should at least extend it to its full length. Although a non-greedy parser will not necessarily choose that full length, it's unexpected that the algorithm would actually choose a sequence of matches that is *worse* than that which a greedy parser would choose.

Is it possible that the length 12 match was less distant than the length 20 match? If it was, then this would be an expected result of an incomplete search using hash chains.

Regardless, it's probably possible to implement something that beats the Microsoft implementation (and I have done this for the XPRESS-Huffman and LZX algorithms), so I personally wouldn't read too much into how they might be doing things.

Eric
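The "incomplete search using hash chains" effect can be reproduced with a small sketch. Everything here is hypothetical and simplified: the demo_* names, the 10-bit table, and the shift-xor hash (chosen for readability, not taken from any patch). Candidates are visited from nearest to farthest, so a low depth limit can return a short nearby match even when a longer, more distant one exists.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_BITS	10
#define DEMO_BUF_MAX	4096

struct demo_chains {
	int16_t head[1 << DEMO_HASH_BITS];	/* newest position per hash, -1 if none */
	int16_t prev[DEMO_BUF_MAX];		/* older position with the same hash */
};

static void demo_chains_init(struct demo_chains *c)
{
	memset(c->head, 0xFF, sizeof(c->head));	/* sets every entry to -1 */
}

/* Simple shift-xor hash of 3 bytes (illustrative only). */
static unsigned int demo_hash3_xor(const uint8_t *p)
{
	return (((unsigned int)p[0] << 6) ^ ((unsigned int)p[1] << 3) ^ p[2])
		& ((1u << DEMO_HASH_BITS) - 1);
}

/* Put position i at the front of its chain; requires i + 3 <= buffer length. */
static void demo_insert(struct demo_chains *c, const uint8_t *buf, int i)
{
	unsigned int h = demo_hash3_xor(buf + i);

	c->prev[i] = c->head[h];
	c->head[h] = (int16_t)i;
}

/* Longest match at position i among at most max_depth chain entries,
 * nearest first.  Stores the distance in *offset when a match is found. */
static int demo_find(const struct demo_chains *c, const uint8_t *buf, int n,
		     int i, int max_depth, int *offset)
{
	int best = 0;

	assert(offset != NULL);
	if (i + 3 > n)
		return 0;
	for (int cand = c->head[demo_hash3_xor(buf + i)];
	     cand >= 0 && max_depth-- > 0; cand = c->prev[cand]) {
		int len = 0;

		while (i + len < n && buf[cand + len] == buf[i + len])
			len++;
		if (len > best) {
			best = len;
			*offset = i - cand;
		}
	}
	return best;
}
```

For the buffer "abcdefgh--abcX--abcdefgh", searching at the second "abcdefgh" with a depth limit of 1 only examines the nearby "abcX" and reports a 3-byte match, whereas a larger limit also reaches the first "abcdefgh" and reports the full 8 bytes.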
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Hi,

I did some more testing with zlib-like and multiplicative hashing. The differences for the files I was testing were quite small. However, the proposed hash function, as well as a slightly modified version of it, did come out slightly ahead. So it might as well be used.

I also did a few quick tests with greedy parsing. In general, it seemed to improve performance by 10-20% and increase the size of the compressed files by 1-2%. If the performance improvement is considered more desirable, then I can change the patch to use greedy parsing. For now it's using lazy parsing, like it was before but more optimized.

Here's the updated patch.

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..1fefc3e 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,256 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding.
+ *
+ * This can be changed, but ntfs_hash() would need to be updated to use an
+ * appropriate shift.  Also, if this is made more than 1 << 16 (not recommended
+ * for the 4096-byte buffers used in NTFS compression!), then 'crc_table' would
+ * need to be updated to use 32-bit entries.  */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[HASH_LEN];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
- * Search for the longest sequence matching current position
+ * CRC table for hashing bytes for Lempel-Ziv match-finding.
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * We use a CRC32 for this purpose.  But since log2(HASH_LEN) <= 16, we only
+ * need 16 bit entries, each of which contains the low 16 bits of the entry in a
+ * real CRC32 table.  (CRC16 would also work, but it caused more collisions when
+ * I tried it.)
  *
- * This function is heavily used, it has to be optimized carefully
+ * Hard-coding the table avoids dealing with thread-safe initialization.
+ */
+static const u16 crc_table[256] = {
+	0x0000, 0x3096, 0x612C, 0x51BA, 0xC419, 0xF48F, 0xA535, 0x95A3,
+	0x8832, 0xB8A4, 0xE91E, 0xD988, 0x4C2B, 0x7CBD, 0x2D07, 0x1D91,
+	0x1064, 0x20F2, 0x7148, 0x41DE, 0xD47D, 0xE4EB, 0xB551, 0x85C7,
+	0x9856, 0xA8C0, 0xF97A, 0xC9EC, 0x5C4F, 0x6CD9, 0x3D63, 0x0DF5,
+	0x20C8, 0x105E, 0x41E4, 0x7172, 0xE4D1, 0xD447, 0x85FD, 0xB56B,
+	0xA8FA, 0x986C, 0xC9D6, 0xF940, 0x6CE3, 0x5C75, 0x0DCF, 0x3D59,
+	0x30AC, 0x003A, 0x5180, 0x6116, 0xF4B5, 0xC423, 0x9599, 0xA50F,
+	0xB89E, 0x8808, 0xD9B2, 0xE924, 0x7C87, 0x4C11, 0x1DAB, 0x2D3D,
+	0x4190, 0x7106, 0x20BC, 0x102A, 0x8589, 0xB51F, 0xE4A5, 0xD433,
+	0xC9A2, 0xF934, 0xA88E, 0x9818, 0x0DBB, 0x3D2D, 0x6C97, 0x5C01,
+	0x51F4, 0x6162, 0x30D8, 0x004E, 0x95ED, 0xA57B, 0xF4C1, 0xC457,
+	0xD9C6, 0xE950, 0xB8EA, 0x887C, 0x1DDF, 0x2D49, 0x7CF3, 0x4C65,
+	0x6158, 0x51CE, 0x0074, 0x30E2, 0xA541, 0x95D7, 0xC46D, 0xF4FB,
+	0xE96A, 0xD9FC, 0x8846, 0xB8D0, 0x2D73, 0x1DE5, 0x4C5F, 0x7CC9,
+	0x713C, 0x41AA, 0x1010, 0x2086, 0xB525, 0x85B3, 0xD409, 0xE49F,
+	0xF90E, 0xC998, 0x9822, 0xA8B4, 0x3D17, 0x0D81, 0x5C3B, 0x6CAD,
+	0x8320, 0xB3B6, 0xE20C, 0xD29A, 0x4739, 0x77AF, 0x2615, 0x1683,
+	0x0B12, 0x3B84, 0x6A3E, 0x5AA8, 0xCF0B
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Good news: I tried some different constants for multiplicative hashing, and the results for two constants were as good as the CRC-based hash function, and faster to compute (at least on x86). So here's the revised patch that does away with static data completely.

---

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..b356e35 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,183 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* log base 2 of the number of entries in the hash table for match-finding.  */
+#define HASH_SHIFT 14
+
+/* Constant for the multiplicative hash function.  */
+#define HASH_MULTIPLIER 0x1E35A7BD
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[1 << HASH_SHIFT];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
+ * Hash the next 3-byte sequence in the input buffer
+ */
+static inline unsigned int ntfs_hash(const u8 *p)
+{
+	u32 str;
+
+#if defined(__i386__) || defined(__x86_64__)
+	/* Unaligned access okay */
+	str = *(u32 *)p & 0xFFFFFF;
+#else
+	str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
+#endif
+
+	return (str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT);
+}
+
+/*
  * Search for the longest sequence matching current position
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * A hash table, each entry of which points to a chain of sequence
+ * positions sharing the corresponding hash code, is maintained to speed up
+ * searching for matches.  To maintain the hash table, either
+ * ntfs_best_match() or ntfs_skip_position() has to be called for each
+ * consecutive position.
+ *
+ * This function is heavily used; it has to be optimized carefully.
+ *
+ * This function sets pctx->size and pctx->rel to the length and offset,
+ * respectively, of the longest match found.
+ *
+ * The minimum match length is assumed to be 3, and the maximum match
+ * length is assumed to be pctx->mxsz.  If this function produces
+ * pctx->size < 3, then no match was found.
+ *
+ * Note: for the following reasons, this function is not guaranteed to find
+ * *the* longest match up to pctx->mxsz:
  *
- * This function is heavily used, it has to be optimized carefully
+ *	(1) If this function finds a match of NICE_MATCH_LEN bytes or greater,
+ *	    it ends early because a match this long is good enough and it's not
+ *	    worth spending more time searching.
  *
- * Returns the size of the longest match,
- * zero if no match is found.
+ *	(2) If this function considers MAX_SEARCH_DEPTH matches with a single
+ *	    position, it ends early and returns the longest match found so far.
+ *	    This saves a lot of time on degenerate inputs.
  */
-
-static int ntfs_best_match(struct COMPRESS_CONTEXT *pctx, int i)
+static void ntfs_best_match(struct COMPRESS_CONTEXT *pctx, const int i,
+			    int best_len)
 {
-	s16 *prev;
-	int node;
-	register long j;
-	long maxpos;
-	long startj;
-	long bestj;
-	int bufsize;
-	int bestnode;
-	register const unsigned char *p1,*p2;
-
-	p1 = pctx->inbuf;
-	node = pctx->head[p1[i] & 255
[ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
The current compression algorithm does lazy parsing of matches, backed by a binary tree match-finder with one-byte hashing. Performance-wise, this approach is not very good for several reasons:

(1) One-byte hashing results in a lot of hash collisions, which slows down searches.

(2) With lazy parsing, many positions never actually need to be matched. But when using binary trees, all the work needs to be done anyway, because the sequence at each position needs to be inserted into the appropriate binary tree. This makes binary trees better suited for optimal parsing, but that isn't being done and is probably too slow to be practical for NTFS.

(3) There was also no hard cut-off on the amount of work done per position. This did not matter too much because the buffer size is never greater than 4096 bytes, but in degenerate cases the binary trees could degenerate into linked lists and there could be hundreds of matches considered at each position.

This patch changes the algorithm to use hash chains instead of binary trees, with much stronger hashing. It also introduces useful (for performance) parameters, such as the nice match length and maximum search depth, that are similar to those used in other commonly used compression algorithms such as zlib's DEFLATE implementation.

The performance improvement is very significant, but data-dependent. Compressing text files is faster by about 3x; x86 executable files by about 3x; random data by about 1.7x; all zeroes by about 1.2x; some degenerate cases by over 10x. (I did my tests on an x86_64 CPU.)

The compression ratio is the same or slightly worse. It is less than 1% worse on all files I tested except an ASCII representation of a genome.

No changes were made to the decompressor.
---
 libntfs-3g/compress.c | 484 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 324 insertions(+), 160 deletions(-)

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 69b39ed..e62c7dd 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2013 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,210 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 16
+
+/* Maximum length at which a lazy match will be attempted.  */
+#define MAX_LAZY_MATCH_LEN 20
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding.  This can be changed,
+ * but it should be a power of 2 so that computing the hash bucket is fast.  */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[HASH_LEN];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
+#define CRC32_POLYNOMIAL 0xEDB88320
+
+static u32 crc32_table[256];
+
+static void do_crc32_init(void)
+{
+	int i, j;
+	u32 r;
+
+	for (i = 0; i < 256; i++) {
+		r = i;
+		for (j = 0; j < 8; j++)
+			r = (r >> 1) ^ (CRC32_POLYNOMIAL & ~((r & 1) - 1));
+		crc32_table[i] = r;
+	}
+}
+
+/*
+ * Initialize the CRC32 table for ntfs_hash() if not done already
+ */
+static void crc32_init(void)
+{
+	static int done = 0;
+
+	if (!done) {
+		do_crc32_init();
+		done = 1;
+	}
+}
+
+/*
+ * Hash the next 3-byte sequence in the input buffer
+ *
+ * Currently, we use a hash function similar to that used in LZMA.  It
+ * takes slightly longer to compute than zlib's hash
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Also, here are results from tests in which I copied a file to a compressed directory on an NTFS-3G mount, with the elapsed time and compressed size shown:

silesia_corpus.tar (211,957,760 bytes)
	Current     43.318s    111,230,976 bytes
	Proposed    12.903s    111,751,168 bytes

canterbury_corpus.tar (2,826,240 bytes)
	Current      1.778s      1,232,896 bytes
	Proposed     0.142s      1,241,088 bytes

Firefox-11-windows-bin.tar (38,225,920 bytes)
	Current      5.685s     27,418,624 bytes
	Proposed     1.992s     27,492,352 bytes

boot.wim (361,315,088 bytes, no internal compression)
	Current     64.682s    189,124,608 bytes
	Proposed    20.990s    190,386,176 bytes

mp3-files.tar (201,646,080 bytes)
	Current     14.547s    200,916,992 bytes
	Proposed     8.585s    200,937,472 bytes

linux-2.4.31-src.tar (174,417,920 bytes)
	Current     36.115s     75,751,424 bytes
	Proposed    10.262s     76,251,136 bytes

gcc-4.7.3.tar.bz2 (82,904,224 bytes)
	Current      5.637s     82,907,136 bytes
	Proposed     3.276s     82,907,136 bytes

ntoskrnl.exe (3,911,040 bytes)
	Current      0.492s      2,789,376 bytes
	Proposed     0.186s      2,789,376 bytes

E_coli_genome.fasta (4,706,040 bytes)
	Current      1.101s      2,060,288 bytes
	Proposed     0.458s      2,351,104 bytes

shakespeare.txt (5,328,042 bytes)
	Current      0.909s      3,321,856 bytes
	Proposed     0.303s      3,321,856 bytes

zeroes.bin (134,217,728 bytes)
	Current      3.053s              0 bytes
	Proposed     2.601s              0 bytes

ascending_digrams.bin (16,777,216 bytes)
	Current      4.309s     16,777,216 bytes
	Proposed     0.673s     16,777,216 bytes

degenerate_trigrams.bin (16,777,216 bytes)
	Current      9.292s      5,242,880 bytes
	Proposed     0.684s      5,242,880 bytes
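For anyone who wants to turn the rows above into speedup factors and size overheads, here is a small helper sketch. The function names are hypothetical; the only inputs are the times and sizes as listed.

```c
#include <assert.h>

/* How many times faster the new run is than the old run. */
static double demo_speedup(double old_seconds, double new_seconds)
{
	assert(new_seconds > 0.0);
	return old_seconds / new_seconds;
}

/* Growth of the compressed size, as a percentage of the old size. */
static double demo_size_growth_percent(double old_bytes, double new_bytes)
{
	assert(old_bytes > 0.0);
	return (new_bytes - old_bytes) / old_bytes * 100.0;
}
```

For the silesia_corpus.tar row, demo_speedup(43.318, 12.903) comes to roughly 3.4, and demo_size_growth_percent(111230976, 111751168) comes to a little under 0.5, in line with the "about 3x" and "less than 1% worse" figures given in the patch description.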