Re: APFS: mmap page fault can take up to minutes after ftruncate/F_PREALLOCATE

2019-12-19 Thread Thomas Tempelmann via Filesystem-dev
Out of curiosity, I just ran the test on my 10.13.6 system:

On HFS+, the tool finishes in about 5 seconds, ending up with a 4 GB file.

However, on a freshly created APFS volume (no encryption, on the same SSD as
the HFS+ volume), it runs for about an hour! There is no extra wait time
around the 2 GB mark, though.

I guess APFS's mmap was so slow on 10.13 that the Apple engineers optimized
it in 10.14, though still with a hiccup at 2 GB, and then fixed that as well
in 10.15.

Thomas



Rare bug in HFS path conversion function involving slashes in folder names

2019-09-18 Thread Thomas Tempelmann via Filesystem-dev
Just a small heads-up for anyone who still has to deal with HFS paths:

Getting the HFS path from an NSURL using CFURLCopyFileSystemPath(url,
kCFURLHFSPathStyle) does not work if the path contains an item whose name
includes a "/" (stored as ":" at the POSIX level, respectively).

Filed as FB7291855, see also
http://www.openradar.me/radar?id=4994410022436864
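
A minimal sketch of the failing conversion, in case you want to reproduce it
(the path with the slash-containing folder name is made up; the exact failure
mode described in the comment is my reading of "does not work"):

    #include <CoreFoundation/CoreFoundation.h>
    #include <stdio.h>

    int main(void) {
        // Hypothetical item whose Finder-visible name contains a "/"
        // (stored as ":" at the POSIX level).
        CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                           CFSTR("/Users/me/A:B Folder/file.txt"),
                           kCFURLPOSIXPathStyle, false);
        CFStringRef hfsPath = CFURLCopyFileSystemPath(url, kCFURLHFSPathStyle);
        if (hfsPath != NULL) {
            CFShow(hfsPath);     // expected: an HFS-style path with ":" separators
            CFRelease(hfsPath);
        } else {
            // Per FB7291855, the conversion does not work for such names
            // (shown here as a NULL result; the exact behavior may differ).
            printf("CFURLCopyFileSystemPath returned NULL\n");
        }
        CFRelease(url);
        return 0;
    }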

Thomas


Re: searchfs on Catalina - trouble with split system vs user volume

2019-06-07 Thread Thomas Tempelmann via Filesystem-dev
Quick update: I've been informed that the new data volume appears as a mount 
point at /System/Volumes/Data, as one can also tell by using the "mount" 
command.

My problem was that my app did not list this new volume because it's 
categorized as hidden.

All I have to do is perform a separate searchfs() on that extra volume.
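
For anyone in the same situation, here is a small sketch (not from my app) of
how to discover all mount points, including hidden ones like
/System/Volumes/Data, so each can get its own searchfs() pass:

    #include <sys/param.h>
    #include <sys/ucred.h>
    #include <sys/mount.h>
    #include <stdio.h>

    int main(void) {
        // getmntinfo() lists every mounted file system, regardless of
        // whether the volume is flagged as hidden in the UI.
        struct statfs *mounts = NULL;
        int count = getmntinfo(&mounts, MNT_NOWAIT);
        for (int i = 0; i < count; i++) {
            // Run searchfs() (or any other per-volume work) on each f_mntonname.
            printf("%-10s %s\n", mounts[i].f_fstypename, mounts[i].f_mntonname);
        }
        return 0;
    }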

Thanks for the quick help (which I got in a private email)!

Thomas

> On 4. Jun 2019, at 21:20, Thomas Tempelmann  wrote:
> 
> The new system splits up the boot volume into a user's and the system's 
> volume, somehow. The problem is now that when I try to search the boot volume 
> for files, I get different results:
> 
> When I search "/" recursively, using [NSFileManager 
> contentsOfDirectoryAtURL:...], I get to scan the entire "Volume" as the user 
> sees it, including both the System and the User's folders.
> 
> However, when I search "/" using searchfs(), I see only items in the system's 
> hidden volume, but no user files.
> 
> I guess this is a feature and not a bug. Could I be advised how to deal with 
> this, i.e. how I can search both the system and the user volumes with 
> searchfs? How do I determine and address them?
> 
> -- 
> Thomas Tempelmann, http://apps.tempel.org/
> 



searchfs on Catalina - trouble with split system vs user volume

2019-06-04 Thread Thomas Tempelmann via Filesystem-dev
The new system splits up the boot volume into a user's and the system's
volume, somehow. The problem is now that when I try to search the boot
volume for files, I get different results:

When I search "/" recursively, using [NSFileManager
contentsOfDirectoryAtURL:...], I get to scan the entire "Volume" as the
user sees it, including both the System and the User's folders.

However, when I search "/" using searchfs(), I see only items in the
system's hidden volume, but no user files.

I guess this is a feature and not a bug. Could I be advised how to deal
with this, i.e. how I can search both the system and the user volumes with
searchfs? How do I determine and address them?

-- 
Thomas Tempelmann, http://apps.tempel.org/


Re: FSEvents API and sandboxing

2019-05-16 Thread Thomas Tempelmann
On Thu, May 16, 2019 at 7:43 PM Dominic Giampaolo 
wrote:

>
> > Without knowing the particulars about sandboxing, I wonder why you
> originally wanted to use multiple streams anyway. I had thought it's
> obvious that using multiple streams for watching multiple locations would
> be rather inefficient compared to having only one watcher for "/", and then
> filter the items yourself (may in a separate thread so that you do not hold
> up the FSEvents handler for too long - not sure if that's necessary,
> though).
> >
> I don't know how/why you would come to this conclusion.
>
> It is always more efficient to filter things as far up the stream as
> possible.  Filtering in fseventsd avoids it sending events that wake up
> your daemon/app for paths it isn't interested in.  This is going to be
> better from all perspectives: cpu usage, power consumption, and memory
> usage.  Watching "/" gets a *lot* of events which means you daemon or app
> gets woken up very frequently.
>

Okay. Maybe my belief that watching "/" is best in my case comes from the
fact that the app doing this is a file browser that usually shows files from
all over the disk, after performing a file search on the disk.

If I show a few hundred files from all over the disk, I find it easier to ask
to watch "/" than hundreds of different dirs. I assumed that in this case
FSEvents might be overburdened with sorting out all those dirs, because it's
designed to watch only a small number of locations. Maybe I got this
impression from old docs suggesting something like that, but I'm not sure.

Also, my app is in the foreground while this happens, so doing it badly has
less of an impact than it would for a background app that does this all the
time.

Also, I only request an update a few times per second, i.e. I then get the
entire batch of changes in one go and sort them out myself. That should put
less stress on the FSEvents machinery, because it does not have to sort them
out for me, and it has all the data assembled anyway. I think. Maybe I have
this all wrong. Without much info on how FSEvents works internally, it's
difficult to get this right. I make assumptions based on the way I'd have
implemented it.
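
To illustrate what I mean, a rough sketch of that setup - one stream on "/",
a sub-second latency so events arrive batched, and the filtering left to the
app (callback name and latency value are just placeholders):

    #include <CoreServices/CoreServices.h>
    #include <dispatch/dispatch.h>
    #include <stdio.h>

    static void changesCallback(ConstFSEventStreamRef stream, void *info,
                                size_t numEvents, void *eventPaths,
                                const FSEventStreamEventFlags flags[],
                                const FSEventStreamEventId ids[]) {
        char **paths = (char **)eventPaths;
        for (size_t i = 0; i < numEvents; i++) {
            // Filter against the paths the app currently displays,
            // ideally on a separate queue so the handler returns quickly.
            printf("changed: %s\n", paths[i]);
        }
    }

    static FSEventStreamRef watchRoot(void) {
        CFStringRef root = CFSTR("/");
        CFArrayRef paths = CFArrayCreate(NULL, (const void **)&root, 1,
                                         &kCFTypeArrayCallBacks);
        FSEventStreamRef stream = FSEventStreamCreate(NULL, changesCallback, NULL,
                                      paths, kFSEventStreamEventIdSinceNow,
                                      0.3 /* seconds of latency -> batched events */,
                                      kFSEventStreamCreateFlagNone);
        CFRelease(paths);
        FSEventStreamSetDispatchQueue(stream, dispatch_get_main_queue());
        FSEventStreamStart(stream);
        return stream;
    }

    int main(void) {
        FSEventStreamRef stream = watchRoot();
        (void)stream;
        dispatch_main();     // keep the main queue alive to receive events
        return 0;
    }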

I see your points, though, and I should have been more aware of the
possibility that Dragan's requirements are different from mine and my
advice therefore not being good.

Thomas


Re: FSEvents API and sandboxing

2019-05-15 Thread Thomas Tempelmann
Without knowing the particulars about sandboxing, I wonder why you originally
wanted to use multiple streams anyway. I had thought it's obvious that using
multiple streams for watching multiple locations would be rather inefficient
compared to having only one watcher for "/" and then filtering the items
yourself (maybe in a separate thread so that you do not hold up the FSEvents
handler for too long - not sure if that's necessary, though).

I think doing that is faster than letting the FSEvents system sort them out
for you and then repeatedly calling your app regardless. So, now that you've
figured out that you need to watch "/" anyway, just make sure you use only
one stream to improve overall system performance.

I see you've come to the same conclusion in the end, for different reasons
(mine is about performance, yours about avoiding redundancy) :)

Also, if you need to track changes to the paths of your files, you could use
NSURLs based on file IDs instead of paths, by calling
[NSURL fileReferenceURL]. You could then also store the path alongside, and
every time you learn of possible changes, resolve the reference again to see
whether the item has moved, if that's what you need to know.
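
A tiny sketch of that idea (the path is just an example): the file-reference
URL keeps identifying the item by ID, and filePathURL resolves it back to
wherever it currently lives:

    #import <Foundation/Foundation.h>

    int main(void) {
        @autoreleasepool {
            NSURL *pathURL = [NSURL fileURLWithPath:@"/Users/me/Documents/report.txt"];
            NSURL *refURL  = [pathURL fileReferenceURL];   // ID-based, survives moves/renames
            NSString *lastKnownPath = pathURL.path;

            // ... later, after a change notification arrives ...
            NSString *currentPath = [refURL filePathURL].path; // resolve the ID to a path
            if (currentPath && ![currentPath isEqualToString:lastKnownPath]) {
                NSLog(@"item moved from %@ to %@", lastKnownPath, currentPath);
                lastKnownPath = currentPath;
            } else if (!currentPath) {
                NSLog(@"item appears to have been deleted");
            }
        }
        return 0;
    }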

Thomas


Re: APFS cloning not working across volumes inside same container?

2019-05-07 Thread Thomas Tempelmann
That makes sense to me.
Thank you for taking the time to explain.

Thomas


On Tue, May 7, 2019 at 4:33 PM Dominic Giampaolo 
wrote:

>
> > I accept that it's currently not possible - but just to help me
> understand this, could you please elaborate:
> >
> > In theory, would it be possible to share a cloned file between two
> volumes? Since they're in the same container, and the space management is
> container-wide, could this be made possible? Or are there technical reason
> why this couldn't work at all (such as that IDs needed to manage this are
> not sharable between volumes)?
> >
> Space management (i.e. is this block allocated or not) is shared.  But
> cloning introduces another layer, which is a reference count on the blocks, and
> that is not shared between volumes within a container.  Further it's
> complicated by snapshots and the management of the reference counts across
> snapshots and the live view of the file system.
>
> Consider what would happen if you cloned a file between two volumes, took
> a snapshot on both volumes, modified part of the file on one of the volumes
> then deleted it on the other.  Managing the reference counting would be a
> nightmare, especially since you'd have to manage locking some new shared
> data structure between two different volumes.
>
> We'll definitely think about it but as always, a compelling use case is
> needed to justify the work.  In other words, we can't just do something
> because it would be kinda cool - there has to be a good reason to expend
> the effort required to do it.
>
>
> --dominic
>
>


Re: APFS cloning not working across volumes inside same container?

2019-05-07 Thread Thomas Tempelmann
Hi Dominic,
Thank you for jumping in.

I accept that it's currently not possible - but just to help me understand
this, could you please elaborate:

In theory, would it be possible to share a cloned file between two volumes?
Since they're in the same container, and the space management is
container-wide, could this be made possible? Or are there technical reasons
why this couldn't work at all (such as that IDs needed to manage this are
not sharable between volumes)?

Thomas


Re: APFS cloning not working across volumes inside same container?

2019-05-06 Thread Thomas Tempelmann
>
> The thing that most closely resembles the HFS catalog isn't shared amongst
> volumes.


But it is. There is one btree for all volumes in a container (well,
technically, there are two btrees, one for the IDs and one for the actual
file catalog, but both are shared over all vols).


> clonefile(2) has the same restriction as link(2) and rename(2), it has to
> be done within the same volume/mounted filesystem.
>

So that's an API restriction, but not a technically necessary one on APFS,
IMO. I guess APFS was created with iOS in mind, where we do not have multiple
volumes, so no one gave this a second thought. It's a shame, as it could
bring significant savings when sharing files between volumes in situations
like the one I described. Still, it's a niche case, and so I'm not hopeful
this will get improved. Sigh.
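
For illustration, a minimal sketch of the restriction (paths are made up; I'd
expect EXDEV here, the same error link(2) and rename(2) give for cross-device
operations):

    #include <sys/clonefile.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *src = "/Volumes/VolumeA/file.bin";       // volume A
        const char *dst = "/Volumes/VolumeB/file-clone.bin"; // volume B, same container
        if (clonefile(src, dst, 0) != 0) {
            // Expect EXDEV (or a similar error) when src and dst live on
            // different volumes, even within one APFS container.
            printf("clonefile failed: %s\n", strerror(errno));
            return 1;
        }
        printf("cloned without copying data\n");
        return 0;
    }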

Thomas


APFS cloning not working across volumes inside same container?

2019-05-06 Thread Thomas Tempelmann
I was just looking into making a Deduplication tool for macOS / APFS.

One common scenario would be to deduplicate macOS systems on separate
volumes. Like, for us developers who have several macOS versions on their
computers for software testing purposes.

I thought I could save a lot of space by having files inside /System and
/Library share the same space by relying on APFS's cloning feature.

However, when I tested copying a file from one volume to another, both in the
same APFS container (partition), the Finder would still copy the data instead
of cloning it.

Now I wonder if that's just a shortcoming of the Finder or a problem with
the macOS API.

After all, since APFS shares a single catalog between all volumes of its
container, cloning should be possible across volumes, shouldn't it?

-- 
Thomas Tempelmann, http://apps.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


Re: readdir vs. getdirentriesattr

2019-04-29 Thread Thomas Tempelmann
> The volume ID is at a higher layer, but the enumeration code attempts to
> retrieve the value less than once per URL returned. That said, if the
> directory hierarchy has few items per directory, the number of times it is
> retrieved will be higher. You can write a bug report and I'll look to see
> if there are ways to improve the performance.
>

As I just wrote, going with your proposed enumeratorAtURL: method takes
care of that already. I may still write a report, and will let you know if
I do.

Though I still haven't gotten around to seeing whether I can speed up the
recursive search by using multiple threads, one per directory read. If that
helps, then I cannot use enumeratorAtURL for it but would have to revert to
classic recursion, at which point the volumeID checking comes into play again
(but since I'd only check it when entering a directory, the impact will be
smaller).


> In the meantime, there's something you could do to improve the performance
> (even if our code changes). You can get the volumeIdentifier for the
> directory you start enumerating from. It will be the same for the entire
> enumeration except when directories are seen on other file systems (today,
> that's volume mount points and mount triggers). Like this:
>

I already do that in my actual working code. I was just showing this more
inefficient way of ALWAYS getting the value in order to demonstrate its
performance impact.
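
Jim's inline code sample didn't survive in this archive, but the idea he
describes might look roughly like this (a hedged reconstruction, not his
original code):

    #import <Foundation/Foundation.h>

    static void scan(NSURL *rootURL) {
        // Fetch the volume identifier once, for the enumeration root ...
        id rootVolumeID = nil;
        [rootURL getResourceValue:&rootVolumeID forKey:NSURLVolumeIdentifierKey error:NULL];

        NSDirectoryEnumerator<NSURL *> *en =
            [[NSFileManager defaultManager] enumeratorAtURL:rootURL
                                 includingPropertiesForKeys:@[NSURLIsDirectoryKey]
                                                    options:0
                                               errorHandler:nil];
        for (NSURL *url in en) {
            // ... and re-check it only for directories, which is where another
            // file system (mount point / trigger) could begin.
            NSNumber *isDir = nil;
            [url getResourceValue:&isDir forKey:NSURLIsDirectoryKey error:NULL];
            if (isDir.boolValue) {
                id volID = nil;
                [url getResourceValue:&volID forKey:NSURLVolumeIdentifierKey error:NULL];
                if (volID != nil && ![volID isEqual:rootVolumeID]) {
                    [en skipDescendants];   // different volume: don't descend
                    continue;
                }
            }
            // ... process url ...
        }
    }

    int main(void) {
        @autoreleasepool {
            scan([NSURL fileURLWithPath:@"/System"]);
        }
        return 0;
    }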

> It used to be based on heavily modified fts(3). I rewrote it for Mojave to
> improve the memory footprint. It uses getattrlistbulk() for everything
> except when it sees a mount point, and then it calls getattrlist on the
> mount point path to get the attributes from the other file system's root
> directory.
>

Glad to see you're still on top of it.

Thomas


Re: readdir vs. getdirentriesattr

2019-04-29 Thread Thomas Tempelmann
Quick update:


> -[enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:] also
>> supports recursive enumeration (which stops at device boundaries -- you'll
>> see mount points but not their contents) so you don't have to do that
>> yourself.
>>
>
This is indeed faster than most of the other options - for file names only it
is actually the fastest, though when also fetching file sizes it is not quite
as fast as fts_read. Here are run times for the "best case" on an APFS volume
(the "/System" folder). These times come out quite similarly on repeated
runs.

Target: /System, format: apfs
(each run: scanned: 336991, found: 520; total size when fetched: 9184548546 bytes)

Method                      names only   names + file size
contentsOfDirectoryAtURL      3.35 s         4.31 s
getattrlistbulk()             3.45 s         3.50 s
readdir()                     3.05 s         8.04 s
fts                           2.32 s         2.40 s
enumeratorAtURL               1.97 s         2.52 s

The first run of each test looks for names only (extracting them from the URL
rather than fetching them as a resource value, as the sample code at
https://developer.apple.com/documentation/foundation/nsfilemanager/1409571-enumeratoraturl
suggests). The second run also fetches the file size.
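
For reference, the enumeratorAtURL variant measured above boils down to
something like this (a simplified sketch of my test, not the actual benchmark
code; the extension filter merely stands in for whatever the search matched):

    #import <Foundation/Foundation.h>

    int main(void) {
        @autoreleasepool {
            NSURL *root = [NSURL fileURLWithPath:@"/System"];
            NSDirectoryEnumerator<NSURL *> *en =
                [[NSFileManager defaultManager] enumeratorAtURL:root
                                     includingPropertiesForKeys:@[NSURLFileSizeKey]
                                                        options:0
                                                   errorHandler:nil];
            unsigned long long scanned = 0, found = 0, totalSize = 0;
            for (NSURL *url in en) {
                scanned++;
                NSString *name = url.lastPathComponent;   // name straight from the URL
                if ([name.pathExtension isEqualToString:@"kext"]) // stand-in filter
                    found++;
                NSNumber *size = nil;                     // pre-fetched above
                [url getResourceValue:&size forKey:NSURLFileSizeKey error:NULL];
                totalSize += size.unsignedLongLongValue;  // nil (e.g. for dirs) adds 0
            }
            NSLog(@"scanned: %llu, found: %llu, size: %llu", scanned, found, totalSize);
        }
        return 0;
    }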

Note that on network volumes readdir may be faster than the others; this also
depends on the server (Linux-based NAS vs. macOS).

Thomas


Re: readdir vs. getdirentriesattr

2019-04-29 Thread Thomas Tempelmann
Jim,

In contentsOfDirectoryAtURL, instead of "includingPropertiesForKeys:nil",
> use "includingPropertiesForKeys:@[NSURLVolumeIdentifierKey]" (and add
> whatever other property keys you know you'll need). The whole purpose of
> the includingPropertiesForKeys argument is so the enumerator code can
> pre-fetch the properties you need as efficiently as possible. The
> enumeration will be a bit slower, but the entire operation of enumerating
> and getting the properties from the URLs returned will be faster.
>

I know. That's the theory, but my benchmarking says it makes no difference in
this case. And that's quite logical, because the pre-caching is meant for
data that has to come from the lowest level, i.e. where the catalog data is
fetched - there it makes sense to combine multiple property requests into
one, just as getdirentriesattr is meant to be used. However, as I explained,
the volume ID is not stored in the catalog but at a higher level, and
therefore pre-fetching it at the lowest level makes no difference, as it
requires no catalog access, right?

My performance tests always run twice in quick succession, so that in the
second run, due to caching, all data is ready and does not incur random
delays that would give imprecise measurements. Sure, this does not give me
the worst case, but it gives me the best case at least. And these best-case
results say: scanning "/System" on my Mac without getting the volume ID takes
less than 3 s, but getting it (with or without pre-fetching) takes over 6 s.
That's TWICE as much time. With a smaller directory tree the difference is
smaller, possibly because other caches help there.

I assume that when I re-run the scan, after having released all NSURLs from
the previous scan (or even after restarting the test app), the framework
creates fresh NSURL objects, right? It's not that there is only one NSURL
instance per volume item for the entire system, shared between all processes,
or is there? The only caching left once I release an NSURL is at the volume
block cache level, isn't it?

Also, use
> -[enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:] instead
> of -[contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error:]
> unless you really need an NSArray of NSURLs. If your code is just
> processing all of the URLs and has no need to keep them after processing,
> there's no reason to add them to an array (which takes time and adds to
> peak memory pressure).
>

Thanks, that makes sense.

-[enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:] also
> supports recursive enumeration (which stops at device boundaries -- you'll
> see mount points but not their contents) so you don't have to do that
> yourself.
>

Is that based on fts_read? Because I found that this is much faster on
local volumes (not on network vols, though) than all other ways I've tried.
And it brings along the st_dev value without time penalty, unlike
contentsOfDirectoryAtURL.

Regardless, I'll give that a try.

-- 
Thomas Tempelmann, http://apps.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


Re: readdir vs. getdirentriesattr

2019-04-29 Thread Thomas Tempelmann
Doing more performance tests for directory traversal, I ran into a
performance issue with [NSFileManager contentsOfDirectoryAtURL:...]:

See this typical code for scanning a directory:

  NSArray<NSURL *> *contentURLs = [fileMgr contentsOfDirectoryAtURL:parentURL
                               includingPropertiesForKeys:nil
                                                  options:0
                                                    error:nil];
  for (NSURL *url in contentURLs) {
      id value;
      [url getResourceValue:&value forKey:NSURLVolumeIdentifierKey error:nil];
      // ... recurse into subdirectories, etc. ...
  }


I would have expected the call fetching NSURLVolumeIdentifierKey to be rather
fast, because the upper file system layer should already know which volume
the item belongs to - it has to know which FS driver to pass the calls to.
I.e., asking for the volume ID should be much faster than fetching actual
directory data such as the file size, for instance.

However, it turns out that this is just as slow as getting actual data from
the lower levels.

Could it be that the call is not optimized to return this information as
early as possible, but instead passes the request down to the lowest level
regardless of need?

I mention this because it degrades the performance of a recursive directory
scan significantly in my tests (on both APFS and HFS) - by more than 30%!
The only thing even slower would be to call stat() instead (for getting the
st_dev value).

Is this worth looking into? If so, should I report it via bugreporter (though
if I'm then asked to provide a system profiler report, it's not going
anywhere)?

Thomas


Re: readdir vs. getdirentriesattr

2019-04-22 Thread Thomas Tempelmann
Jim,
thanks for your comments.

If all you need is filenames and no other attributes, readdir is usually
> faster than getattrlistbulk because it doesn't have to do as much
> work. However, if you need additional attributes, getattrlistbulk is
> usually much faster. Some of that extra work done
> by getattrlistbulk involves checking to see what attributes were requested
> and packing the results into the result buffer.
>

What's interesting is that on HFS+, readdir is not faster in my tests, but on
a recent and fast Mac (i.e. not on my MacPro 2010), it can be twice as fast
as the others when scanning an APFS volume. I wonder why. Is the
implementation of getattrlistbulk in the APFS driver inefficient compared to
the one in HFS+? The source code for the APFS driver has still not been
published, or has it?

You'll find that lstat is slightly faster than getattrlist (when
> getattrlist is returning the same set of attributes) for the same reason.
> There's no extra code needed in lstat to see what attributes were requested
> and packing the results into the result buffer.
>

It's also significantly faster than using NSURL's getResourceValue, even when
the NSURL has already been created. That's probably due to the Objective-C
overhead.

By the way, I haven't tested this but I would expect
> enumeratorAtURL:includingPropertiesForKeys:options:errorHandler: (followed
> by a "for (NSURL *fileURL in directoryEnumerator)" loop) to be slightly
> faster than
> contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error: because
> the URLs aren't retained in a NSArray. Using CFURLEnumerator may also be
> slightly faster than NSFileManager's directory enumeration.
>

Now, that's something I had not considered, yet. Will try.


> Using POSIX/BSD APIs will be the fastest, but that means you have to deal
> with the different capabilities between file systems yourself (although
> getattrlistbulk helps with that a lot).
>

Most interesting, though:

Today someone pointed out fts_read. So far it has always beaten all the other
methods, especially if I also need extra attributes (e.g. the file size).

Can you give some more information about the fts implementation? Is it
user-library-level or kernel code doing this? I had expected it to be only a
convenience userland function that uses readdir or similar BSD functions, but
it appears to beat them all, suggesting it is optimized at a lower level.
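
For comparison, the fts variant is essentially this (a stripped-down sketch
of what my test project does, not the exact code):

    #include <fts.h>
    #include <sys/stat.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        char *root = (argc > 1) ? argv[1] : "/System";
        char *paths[] = { root, NULL };

        // FTS_PHYSICAL: don't follow symlinks; FTS_XDEV: stay on one device.
        FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_XDEV, NULL);
        if (!fts) { perror("fts_open"); return 1; }

        unsigned long long count = 0, totalSize = 0;
        FTSENT *ent;
        while ((ent = fts_read(fts)) != NULL) {
            if (ent->fts_info == FTS_F) {            // regular file
                count++;
                totalSize += ent->fts_statp->st_size; // size (and st_dev) come for free
            }
        }
        fts_close(fts);
        printf("%llu files, %llu bytes\n", count, totalSize);
        return 0;
    }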


I have updated my test project accordingly (with the fts code) in case
anyone likes to run their own tests:

  http://files.tempel.org/Various/DirScanner.zip

Also, I am wondering if using concurrent threads will speed up scanning a
dir tree on an SSD as well, by distributing each directory read to one
thread (or dispatch queue). Will eventually try, but probably not soon.
Gotta get my program out of the door soon, first.

Thomas


Re: readdir vs. getdirentriesattr

2019-04-21 Thread Thomas Tempelmann
I'd like to add some info to a thread from 2015:

I recently worked on my file search tool (FAF) and wanted to make sure that I 
use the best method to deep-scan directory contents.

I had expected that getattrlistbulk() would always be the best choice, but it 
turns out that opendir/readdir perform much better in some cases, oddly (this 
is about reading just the file names, no other attributes).

See my blog post: https://blog.tempel.org/2019/04/dir-read-performance.html

There's also a test project trying out the various methods.

Any comments, insights, clarifications and bug reports are most welcome.

Enjoy,
 Thomas Tempelmann


> On 12. Jan 2015, at 17:33, Jim Luther  wrote:
> 
> getattrlistbulk() works on all file systems. If the file system supports bulk 
> enumeration natively, great! If it does not, then the kernel code takes care 
> of it. In addition, getattrlistbulk() supports all non-volume attributes 
> (getdirentriesattr only supported a large subset).
> 
> The API calling convention for getattrlistbulk() is slightly different than 
> getdirentriesattr() — read the man page carefully. In particular:
> 
> • ATTR_CMN_NAME and ATTR_CMN_RETURNED_ATTRS are required (requiring 
> ATTR_CMN_NAME allowed us to get rid of the newState argument).
> • A new attribute, ATTR_CMN_ERROR, can be requested to detect error 
> conditions for a specific directory entry.
> • The method for determining when enumeration is complete is different. You 
> just keep calling getattrlistbulk() until 0 entries are returned.
> 
> - Jim
> 
>> On Jan 11, 2015, at 9:31 PM, James Bucanek  wrote:
>> 
>> Eric,
>> 
>> I would just like to clarify: the new getattrlistbulk() function works on 
>> all filesystem. We don't have to check the volume's VOL_CAP_INT_READDIRATTR 
>> capability before calling it, correct?
>> 
>> James Bucanek
>> 
>>> Eric Tamura December 10, 2014 at 5:57 PM
>>> It should be much faster.
>>> 
>>> Also note that as of Yosemite, we have added a new API: getattrlistbulk(2), 
>>> which is like getdirentriesattr(), but supported in VFS for all 
>>> filesystems. getdirentriesattr() is now deprecated. 
>>> 
>>> The main advantage of the bulk call is that we can return results in most 
>>> cases without having to create a vnode in-kernel, which saves on I/O: HFS+ 
>>> on-disk layout is such that all of the directory entries in a given 
>>> directory are clustered together and we can get multiple directory entries 
>>> from the same cached on-disk blocks.
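
For reference, a minimal sketch of the getattrlistbulk() calling convention
Jim describes above - request ATTR_CMN_RETURNED_ATTRS and ATTR_CMN_NAME and
keep calling until 0 entries come back (error handling trimmed):

    #include <sys/attr.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(int argc, char *argv[]) {
        const char *dir = (argc > 1) ? argv[1] : ".";
        int dirfd = open(dir, O_RDONLY);
        if (dirfd < 0) { perror("open"); return 1; }

        struct attrlist al;
        memset(&al, 0, sizeof(al));
        al.bitmapcount = ATTR_BIT_MAP_COUNT;
        al.commonattr  = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME;  // both required

        char buf[64 * 1024];
        for (;;) {
            int count = getattrlistbulk(dirfd, &al, buf, sizeof(buf), 0);
            if (count <= 0)          // 0: enumeration complete, -1: error
                break;
            char *entry = buf;
            for (int i = 0; i < count; i++) {
                char *field = entry;
                uint32_t entryLength = *(uint32_t *)field;   // total size of this entry
                field += sizeof(uint32_t);
                attribute_set_t returned = *(attribute_set_t *)field;
                field += sizeof(attribute_set_t);
                if (returned.commonattr & ATTR_CMN_NAME) {
                    attrreference_t nameInfo = *(attrreference_t *)field;
                    printf("%s\n", field + nameInfo.attr_dataoffset);
                }
                entry += entryLength;                        // next packed entry
            }
        }
        close(dirfd);
        return 0;
    }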


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Thomas Tempelmann
Eric,

What information are you trying to get out of each scan?  You will always
> have a time-of-use vs. time-of-check race condition here .. the filesystem
> is in a perennial state of flux.
>

That's one thing that surprised me when using searchfs() on APFS vs. HFS+: on
HFS+, I'd frequently get the specific return code telling me that the btree
had changed, meaning I should restart the search to make sure I don't miss
nodes. However, I never see this with APFS. I had assumed - well, hoped -
that searchfs on APFS holds a temporary snapshot, thereby preventing changes
to the searched tree. Are you saying that this is not the case, and moreover
that searchfs on APFS doesn't tell me about a changed node order when it
should?

Thomas


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Thomas Tempelmann
>
> So far I hadn't had much luck in scanning in this order; the sparse
> filesystem makes the /.vol// option inefficient.
> As for the searchfs option, I haven't seen in the man page any way to
> control the order of the files.
>

The order is arbitrary, as it walks over the btree nodes in the most
efficient manner, and that btree's order can change.

Still, if you want to keep the scan persistent between boots, a persistent
hashmap of the IDs is still the best option, IMO, as this lets you
determine which items you have already seen in a previous scan. Of course,
this means that you'll have to save the entire map regularly (and with
backups in place in case you crash during the save process) to protect
yourself against system crashes during your scan.

Be advised that searchfs is fairly fast if you only want to learn all the
used fileIDs first, without resolving the paths yet. You can specify that you
want a large number of fileIDs returned in one go, so you can, for instance,
request chunks of a few thousand IDs per call, then add them to your hashmap,
and save that map every few seconds if you're concerned about system crashes
(which I find rather unlikely unless your program runs in environments where
they are more likely).

Once you have gathered all IDs (which may be in the range of a million,
easily), you can start requesting each ID's path and file attributes.

Hit me up privately if you need help with using searchfs.

Thomas


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-21 Thread Thomas Tempelmann
>
> 1. Is there any way to extract the current file-id range (minimum to
> maximum fileid).
>

Well, both HFS+ and APFS know the last used FileID and whenever a new node
(file, dir) is created, the last ID + 1 will be used for it. But you cannot
query that value directly (only indirectly, by creating a new file and then
getting its ID).


> 2. I've noticed that there are some gaps in file-id list. meaning that
> some ids aren't connected to files. How can this happen (I assume it's due
> to deleted files), and when creating new file, does it get file-id from the
> lowest available value or the next file-id after the current maximum value.
>

When dirs or files are deleted, those IDs will not be re-used. This is on
purpose, i.e. by design. It allows programs to later find previously
identified files by their ID. Finder Aliases (and NSURL Bookmark files) use
this.


3. I'd like to use an array that each index represent a file-id. Can I
> assume that the file-ids aren't sparse (meaning that the gaps of unused id
> values are small) so that I won't waste too much memory ?
>

Not smart at all. You should rather use a dictionary (hash-map, hashed set)
for this, with the key being the ID.

4. Do you recommend other, more efficient way to iterate through the files
> in order to ascending file-id, other than through the /.vol/ drive ?
>

If you want to traverse all files on a volume, use searchfs (see "man
searchfs") - it's ideal for this purpose. It tells you every ID, and then you
can look up the file information from that ID.
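
One way to do that lookup is the /.vol namespace mentioned earlier in this
thread, combined with fcntl(F_GETPATH) - a small sketch, with placeholder
IDs (in practice they come from stat() / searchfs results):

    #include <fcntl.h>
    #include <sys/param.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long devID  = 16777220;   // st_dev of the volume (placeholder)
        long fileID = 1234567;    // inode / file ID (placeholder)

        char volfsPath[64];
        snprintf(volfsPath, sizeof(volfsPath), "/.vol/%ld/%ld", devID, fileID);

        int fd = open(volfsPath, O_RDONLY);   // open by ID, no path needed
        if (fd < 0) { perror("open by file ID"); return 1; }

        char realPath[MAXPATHLEN];
        if (fcntl(fd, F_GETPATH, realPath) != -1)
            printf("file ID %ld lives at: %s\n", fileID, realPath);
        close(fd);
        return 0;
    }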

Thomas


Re: gmtime_r bug in High Sierra?

2017-09-10 Thread Thomas Tempelmann
Disregard my previous post. I did some testing on my own and believe it was
just a user (programming) error.

My apologies for the unnecessary alarm.

Thomas


Re: gmtime_r bug in High Sierra?

2017-09-10 Thread Thomas Tempelmann
With this comment, I'll cross-post this to the filesystem-dev list, as this
could be a bug in the APFS code, making it a bit urgent with the imminent
release of 10.13.

But it could also be that there's a new(?) flag somewhere that you need to
test in order to tell whether the values are in seconds or microseconds.

Maybe someone on the filesystem list could clarify this?

Thomas


On Sun, Sep 10, 2017 at 8:13 AM, Stephane Sudre 
wrote:

> Very interesting. Thanks.
>
> As a matter of fact, the code (and the value sent to gmtime_r) are
> extracted from the xar Apple source code (and when it's run on High
> Sierra):
>
> https://opensource.apple.com/source/xar/xar-357/xar/lib/darwinattr.c.auto.html
>
> stragglers_archive
>
> So now, it looks like the problem is earlier in the code and it comes
> from the value returned by getattrlist(2).
>
> If I run the code (*) for a file on a HFS+ volume, the value looks correct.
> If I run the code for a file on a APFS volume, the value looks incorrect.
>
>
> * In conditions specific to the code using the xar code, which is a
> launchd daemon and playing with a file that was very recently created.
>
>
>
>
> On Sun, Sep 10, 2017 at 3:23 AM, Stephen J. Butler
>  wrote:
> > It's not a bug IMHO. The problem is that you don't check the return
> value of
> > gmtime_r(). If you did, you'd see that the result is NULL (which means
> the
> > function failed) and so you shouldn't use any of the results (they're
> > undefined).
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <time.h>
> >
> > int main(int argc, const char * argv[]) {
> >
> > char tmpc[128];
> > struct tm tm;
> >
> > memset(tmpc, 0, sizeof(tmpc));
> >
> > __darwin_time_t tValue=2914986787602432000;
> >
> > if (gmtime_r(&tValue, &tm)) {
> > strftime(tmpc, sizeof(tmpc), "%FT%T", &tm);
> >
> > printf("time: %s\n",tmpc);
> >
> > printf("-\n");
> > }
> >
> > struct tm tm2;
> > tm2.tm_mon=10240;
> >
> > memset(tmpc, 0, sizeof(tmpc));
> >
> > tValue=2914986787602432000;
> >
> > if (gmtime_r(&tValue, &tm2)) {
> > strftime(tmpc, sizeof(tmpc), "%FT%T", &tm2);
> >
> > printf("time: %s\n",tmpc);
> > }
> >
> > return 0;
> > }
> >
> >
> > But also here, the number you're providing to tValue looks a lot like
> > nanosecond time stamp, and gmtime_r() expects seconds. If you correct
> that,
> > then you get:
> >
> > time: 2062-05-16T06:33:07
> > -
> > time: 2062-05-16T06:33:07
> >
> >
> > On Sat, Sep 9, 2017 at 5:51 PM, Stephane Sudre 
> > wrote:
> >>
> >> Maybe I'm missing something (regarding the tm var) but shouldn't all
> >> the struct tm fields be correctly set by gmtime_r?
> >>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <string.h>
> >> #include <time.h>
> >>
> >> int main(int argc, const char * argv[]) {
> >>
> >> char tmpc[128];
> >> struct tm tm;
> >>
> >> memset(tmpc, 0, sizeof(tmpc));
> >>
> >> __darwin_time_t tValue=2914986787602432000;
> >>
> >> gmtime_r(&tValue, &tm);
> >> strftime(tmpc, sizeof(tmpc), "%FT%T", &tm);
> >>
> >> printf("time: %s\n",tmpc);
> >>
> >> printf("-\n");
> >>
> >> struct tm tm2;
> >> tm2.tm_mon=10240;
> >>
> >> memset(tmpc, 0, sizeof(tmpc));
> >>
> >> tValue=2914986787602432000;
> >>
> >> gmtime_r(&tValue, &tm2);
> >> strftime(tmpc, sizeof(tmpc), "%FT%T", &tm2);
> >>
> >> printf("time: %s\n",tmpc);
> >>
> >> return 0;
> >> }
> >>
> >> =>
> >>
> >> time: 1900-01-00T18:40:00
> >> -
> >> time: 1900-10241-00T18:40:00
> >>
>


Re: searchfs support on APFS

2017-07-25 Thread Thomas Tempelmann
After more testing and communication with Apple engineers and other devs
relying on this function, I'd like to post a summary in case anyone else is
interested in using the CatSearch operation with APFS:

1. searchfs() is implemented for APFS on 10.12.6 and 10.13b3 (and earlier
versions probably, too).

2. searchfs() has one major issue that makes it quite useless when
searching for file names: It currently *compares case-sensitively*, but
should be comparing case-insensitively. Whether this will get fixed before
the 10.13 release is not clear. I've filed a bug report:
https://openradar.appspot.com/33455597 - *if you care for searchfs support,
please file one as well, adding your vote to have this fixed.*

3. *FSCatalogSearch cannot be used on APFS* as it cannot handle the 64 bit
node IDs APFS uses. Currently, the volparms claim CatalogSearch support,
and calling FSCatalogSearch does not return an error either, but there's an
open bug to fix that (https://openradar.appspot.com/33428180).
Unfortunately, some non-beta versions of macOS, e.g. 10.12.6, have this bug
incorrectly claiming CatSearch support, meaning we'll have to
special-handle this case.

4. searchfs() does not return paths but only names and node IDs (which are
64 bit), e.g. through ATTR_CMN_FILEID. However, *there is currently no
documented way to resolve node IDs to actual file paths or URLs*. (
https://openradar.appspot.com/33507188)

5. searchfs() does not handle the ATTR_CMNEXT_LINKID attribute, meaning *one
cannot learn about the separate dir entries of multiple hard links* to the
same file. (https://openradar.appspot.com/33473247)

I've also got a blog post about this here:
http://blog.tempel.org/2017/07/apfs-and-fast-catalog-search.html

-- 
Thomas Tempelmann, http://www.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my blog: http://blog.tempel.org/


Re: searchfs support on APFS

2017-07-06 Thread Thomas Tempelmann
I was kindly informed that 10.12 already supports searchfs (and
FSCatalogSearch along with it) on APFS, and after doing some testing myself
it turned out that it's all working as intended, and that I just did the
testing wrong, on my misguided assumptions, along with using an outdated
version of my app for testing.

So, nothing to see here, all is well - and thanks to the File System team
for making the effort to support searchfs on APFS!

Thomas


searchfs support on APFS

2017-07-06 Thread Thomas Tempelmann
Two things:

1.
A long while ago the APFS team at Apple asked for input on what's needed. I
then asked for fast catalog search support, along the lines of the searchfs
BSD function. Has anything happened in that direction?

2.
And then there's a possible issue around reporting support for this
operation (as of 10.13 beta 2):

My search tool unsuccessfully attempts to use FSCatalogSearch on APFS volumes
because the API tells it that the APFS volume supports that operation - which
it apparently doesn't, as my app then gets no results from the search. I
haven't been able to look deeper into this yet; I'm getting this from
customer reports that all describe the same problem after installing High
Sierra and upgrading to APFS.

So, could it be that the volume flags are incorrectly set, indicating that
the file system supports CatalogSearch even though it doesn't? (I'm just
throwing this in as early as possible, but I'll make sure to file a bug
report should my suspicion be confirmed).

-- 
Thomas Tempelmann, http://www.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


Re: Updated APFS guide

2017-03-31 Thread Thomas Tempelmann
On Sat, Apr 1, 2017 at 1:32 AM, Brendan Shanks  wrote:

> The Apple File System Guide was updated yesterday with additional info
> about filenames, iOS 10.3, and macOS 10.12.4. Quick summary: no
> normalization, iOS 10.3 is case-sensitive, 10.12.4 now has (beta)
> case-insensitive APFS.
>
> https://developer.apple.com/library/content/documentation/Fi
> leManagement/Conceptual/APFS_Guide/FAQ/FAQ.html
>
>
Thanks for the pointer.

I have trouble understanding this line:

The case-insensitive variant of APFS is normalization-preserving, but not
> normalization-sensitive."


I assume this is the mode that we now have in iOS 10.3.

I ask myself: What is meant by "sensitive" here? I know about
case-sensitivity. There, on a *non-case-sensitive* HFS system, I can pass
names in any case and they'll all be matched up with the same file name
stored on-disk. Right?

So, analogously, to my understanding, if I use APFS on iOS, which is
non-normalization-sensitive, wouldn't that mean I can pass differently
normalized (e.g. composed and decomposed) names to the FS API, and they would
all match the same file name on disk?

However, that contradicts this part from the same article, doesn't it:

For example, attempting to create a file using one normalization behavior
> and opening that file using another normalization behavior may result in
> ENOENT


Could someone tell me where my error in this logic is?
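
A small experiment that would answer this for a given volume: create a file
with a precomposed "é" (NFC) in its name, then try to open it with the
decomposed (NFD) spelling and see whether that succeeds or fails with ENOENT
(run it with the working directory on the volume in question):

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>

    int main(void) {
        const char *nfcName = "caf\xC3\xA9.txt";    // "café" with U+00E9 (NFC)
        const char *nfdName = "cafe\xCC\x81.txt";   // "cafe" + U+0301 (NFD)

        int fd = open(nfcName, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("create"); return 1; }
        close(fd);

        fd = open(nfdName, O_RDONLY);
        if (fd >= 0) {
            printf("NFD name matched the NFC file: not normalization-sensitive\n");
            close(fd);
        } else if (errno == ENOENT) {
            printf("NFD name not found: normalization-sensitive\n");
        } else {
            perror("open");
        }
        unlink(nfcName);
        return 0;
    }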

Thomas


New Disk Utility apparently not using rdisk when saving / restoring partitions

2017-03-31 Thread Thomas Tempelmann
Not really a file system thing, but since I'm tired of filing bugs with
Apple's bugreporter and mostly just getting red tape thrown at me, I thought
I'd bring it up here in case someone listens and wants to take
responsibility:

I found that using the new DU's Restore operation to copy a partition from
one drive to another is unusually slow. Comparing similar operations in
Terminal using the "dd" command makes me believe that DU reads and writes via
/dev/diskN, whereas it should be using /dev/rdiskN.

The speed difference is substantial: I wanted to copy a 750 GB SSD from one
MacBook Pro, connected via Thunderbolt Target Disk Mode, to another MBP with
a 1 TB SSD. With Sierra's DU (loaded from the Recovery partition), as well as
with "dd" on "disk", this went along at less than 40 MB/s. Performing the
same with "dd" on "rdisk", I reached the maximum, which was about 200 MB/s.
That's about five times slower than it should be.

I'm going to write a blog article about this over the weekend, too.

-- 
Thomas Tempelmann, http://www.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


How do I lock a volume the way First Aid does?

2017-03-26 Thread Thomas Tempelmann
I am working on a tool that makes an image snapshot of a bootable HFS
volume. If the volume can be unmounted, that's an easy task. But when it's
the boot volume, this doesn't work, of course.

Now, I've seen that DFA (Disk First Aid) in Sierra seems to be able to lock a
volume so that it can verify and possibly repair it.

How does DFA do that? Is there some OS function I could use for that
purpose, too?

-- 
Thomas Tempelmann, http://www.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


Strategy for detecting changes to remote network files

2016-08-09 Thread Thomas Tempelmann
My app, a find tool with a file browser, attempts to keep its browser
window updated even if the user renames or moves files via another app,
e.g. in the Finder.

With local volumes, that's fairly easy.

But how shall I do this on remote volumes?

Is it best I perform my own checking by re-fetching the file properties to
detect modifications, or do the advanced tracking methods of OS X also work
well on remote volumes?

E.g., does any of the network file systems, maybe SMB2, support file change
notifications, meaning polling is not necessary, and does OSX make use of
that?

Or would at least OSX notify me when the user on this Mac makes changes to
files on the network volume? (Meaning I could detect changes made by the
user without polling, while I'd still need to poll to detect changes made
from other computers?)

Any advice and insights are appreciated.

-- 
Thomas Tempelmann, http://www.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/


Re: Determine whether a NFS mount is actually reachable

2015-02-22 Thread Thomas Tempelmann
I did some testing and can confirm that what Jim wrote below only tells me
whether a path has an auto-mount trigger, but not whether it's actually
reachable at the moment.
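
For completeness, the low-level check Jim describes below looks roughly like
this (a sketch; it only answers "is this an automount trigger again?", not
"is the server reachable?"):

    #include <sys/attr.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    // Returns 1 if the directory is an (unmounted) automount trigger,
    // 0 if not, -1 on error.
    int isAutomountTrigger(const char *path) {
        struct attrlist al;
        memset(&al, 0, sizeof(al));
        al.bitmapcount = ATTR_BIT_MAP_COUNT;
        al.dirattr = ATTR_DIR_MOUNTSTATUS;

        struct {
            uint32_t length;       // size of the returned data
            uint32_t mountStatus;  // ATTR_DIR_MOUNTSTATUS bits
        } buf;

        if (getattrlist(path, &al, &buf, sizeof(buf), 0) != 0)
            return -1;
        return (buf.mountStatus & DIR_MNTSTATUS_TRIGGER) ? 1 : 0;
    }

    int main(int argc, char *argv[]) {
        const char *path = (argc > 1) ? argv[1] : "/Network/Servers"; // example
        printf("%s: trigger=%d\n", path, isAutomountTrigger(path));
        return 0;
    }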

So far, to tell whether the volume is actually available, I must try to read
its contents. Often I then get an error returned immediately for off-line
volumes, and sometimes the call takes a long time before I get the error,
which means it tried to contact the server, waited, and eventually timed out.

What bothers me is that the automountd daemon maintains a state of whether
the volume is currently accessible or not, and it silently keeps checking
whether the server is reachable. I had hoped that this state would be
communicated through the BSD APIs, but that doesn't seem to be the case.
Bummer.

Thomas


On Sat, Feb 21, 2015 at 11:56 PM, Jim Luther luthe...@apple.com wrote:

 You can tell if the automounted volume has been auto-unmounted by checking
 to see if it is a trigger again — at a low-level with getattrlist() by
 checking the DIR_MNTSTATUS_TRIGGER bit returned for the
 ATTR_DIR_MOUNTSTATUS attribute, or at a high level by checking the value of
 the NS/kCFURLIsMountTriggerKey resource property value. Accessing a path
 beyond a trigger or opening the trigger directory will cause the automount
 to be mounted.

 However, that doesn’t really answer your question because a mounted
 network volume’s server might not be accessible, and mounting/remounting an
 automount might not find the file server. Other than pinging the server’s
 host, there’s not much you can do to tell the server is reachable (and even
 if it pings OK, the NFS server might not be responsive). Someone else might
 have more/better suggestions.
