Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Thomas Tempelmann
Eric,

> What information are you trying to get out of each scan?  You will always
> have a time-of-use vs. time-of-check race condition here .. the filesystem
> is in a perennial state of flux.
>

That's one thing I was surprised about when using searchfs() on APFS vs.
HFS+: on HFS+, I'd frequently get the specific return code telling me that
the btree had changed, meaning I should restart the search to make sure I
don't miss nodes.
However, I never see this with APFS. I then assumed, err, hoped, that
searchfs on APFS holds a temporary snapshot, thereby preventing changes to
the searched tree. Are you saying that this is not the case, and, moreover,
that searchfs on APFS doesn't tell me about changed node order when it
should?

Thomas
 ___
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list  (Filesystem-dev@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/filesystem-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Thomas Tempelmann
>
> So far I haven't had much luck scanning in this order; a sparse filesystem
> makes the /.vol// option inefficient.
> As for the searchfs option, I haven't seen any way in the man page to
> control the order of the files.
>

The order is arbitrary, as it walks over the btree nodes in the most
efficient manner, and that btree's order can change.

Still, if you want the scan to persist across reboots, a persistent hash map
of the IDs is, IMO, the best option, as it lets you determine which items
you have already seen in a previous scan. Of course, this means that you'll
have to save the entire map regularly (keeping a backup in case you crash
during the save itself) to protect yourself against system crashes during
your scan.

Be advised that searchfs is fairly fast if you only want to learn all used
fileIDs first, without learning their paths yet. You can ask for a large
number of fileIDs to be returned in one go, so you can, for instance,
request chunks of a few thousand IDs per call, add them to your hash map,
and save that map every few seconds if you're concerned about system
crashes (which I find rather unlikely unless your program runs in an
environment where they are more common).
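The chunk-and-checkpoint approach described above can be sketched in bash. This is a minimal illustration under stated assumptions, not a searchfs client: it assumes the IDs arrive one per line on stdin from some lister (not shown), and it uses a sorted, de-duplicated text file as a stand-in for the persistent hash map. The helper names `save_set` and `checkpoint_ids` are hypothetical. Each save writes a temporary file in the same directory and renames it over the old set, keeping the previous copy as `.bak`, so an interruption mid-save should leave a complete set on disk; it does not fsync, so it makes no promise about surviving a power loss.

```bash
# Hypothetical helpers: IDs arrive one per line on stdin; SETFILE is a
# sorted, de-duplicated text file standing in for the persistent ID map.

# save_set SETFILE PENDINGFILE: merge pending IDs into the set crash-safely.
save_set() {
  local set=$1 pending=$2 tmp
  tmp=$(mktemp "$set.tmp.XXXXXX") || return 1   # same dir as the set file
  if ! LC_ALL=C sort -u "$set" "$pending" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  [ -s "$set" ] && cp -p "$set" "$set.bak"      # keep the previous good copy
  mv -f "$tmp" "$set" || return 1               # same dir, so this is rename(2)
  : > "$pending"                                # pending IDs are now saved
}

# checkpoint_ids SETFILE [CHUNK] [SECONDS]: save after every CHUNK IDs, or
# when an ID arrives at least SECONDS after the last save; flush at the end.
checkpoint_ids() {
  local set=$1 chunk=${2:-1000} interval=${3:-5}
  local pending="$1.pending" n=0 last=$SECONDS id
  touch "$set" "$pending"        # leftovers from a crashed run get merged
  while IFS= read -r id; do
    printf '%s\n' "$id" >> "$pending"
    n=$((n + 1))
    if (( n % chunk == 0 || SECONDS - last >= interval )); then
      save_set "$set" "$pending" || return 1
      last=$SECONDS
    fi
  done
  save_set "$set" "$pending"     # flush the final partial chunk
}
```

Hypothetical usage: `id_lister | checkpoint_ids ~/scan/ids.txt 5000 5`, where `id_lister` stands for whatever emits the IDs. Running it again after a reboot also merges any IDs left in `ids.txt.pending` by the interrupted run.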

Once you have gathered all IDs (which may easily number a million), you can
start requesting each ID's path and file attributes.

Hit me up privately if you need help with using searchfs.

Thomas


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Eric Tamura
What information are you trying to get out of each scan?  You will always have 
a time-of-use vs. time-of-check race condition here .. the filesystem is in a 
perennial state of flux. 

Eric


> On 22 Mar 2018, at 12:53 PM, Irad K  wrote:
> 
> Hi and thanks for the invaluable responses. 
> 
> I understood that since I cannot limit the file id range, and the allocated
> fileid keeps ascending, it would be better to use a hash map.
> 
> However, my requirement is to be able to scan the volume reentrantly. This
> means that if the machine reboots during the scan, I should be able to
> resume from that point, by keeping some sort of pivot that lets me continue
> scanning from where it stopped.
> 
> If I scan the files in ascending fileid order, I can simply keep the last
> scanned fileid, and upon restart, I can continue from that point (since no
> files are added with a fileid below that value)... This is one option; it
> could also be creation date, or any other order that allows me to
> differentiate between the scanned and unscanned files.
> 
> Scanning according to directory hierarchy is one example that doesn't
> satisfy this requirement, since new files might be created during the scan
> in folders that I have already covered.
> 
> So far I haven't had much luck scanning in this order; a sparse filesystem
> makes the /.vol// option inefficient.
> As for the searchfs option, I haven't seen any way in the man page to
> control the order of the files.
> 
> Perhaps you can suggest a proper use of searchfs, or another efficient way
> to do this?
>
> Thanks a lot,
> Irad K
> 
> 
> On Thu, Mar 22, 2018 at 7:08 PM, Kevin Elliott  wrote:

Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-22 Thread Kevin Elliott


> On Mar 21, 2018, at 1:38 PM, Irad K  wrote:
> 
> Eric, 
> 
> Thanks for the info; that explains the offset, since I previously upgraded
> the OS from Sierra, which uses HFS+ for its root filesystem.
> 
> What led me to look into the fileid values is a file scanner I'm currently
> designing: instead of iterating over the files according to their directory
> structure (i.e. BFS), it iterates in ascending file-id order, and I had
> always assumed that file-ids start from zero.
> 
> Using this scanning order, I can halt my scan and resume it from the last
> scanned file-id (assuming I can ignore newly created files that got a
> file-id lower than the last scanned value).
> 
> I'd be happy if you could tell me about the file-id allocation policy of
> APFS or HFS+ in the following aspects:
> 
> 1. Is there any way to extract the current file-id range (minimum to maximum
> fileid)?

No, and it’s unlikely that there will ever be one.  Functionally, the most 
useful way to think about file IDs is as UUIDs.  Their value gives you NO 
useful information about the file, aside from the fact that it’s a unique ID 
for the file.

> 2. I've noticed that there are some gaps in the file-id list, meaning that
> some IDs aren't connected to files. How can this happen (I assume it's due
> to deleted files)? And when a new file is created, does it get the lowest
> available file-id or the next file-id after the current maximum?

File deletion, arbitrary action, dumb luck…  Again, the file ID is effectively 
a unique identifier for the file.  The fact that APFS HAPPENS to hand them out 
in a particular order is an artifact of its implementation, not a requirement.


> 3. I’d like to use an array in which each index represents a file-id.

Why?  I can’t think of any real use for doing this.  Thomas’s idea of using a 
dictionary isn’t a bad one if this is a problem you really need to solve, but 
this seems like a very strange problem to decide you need to solve.  


> Can I assume that the file-ids aren’t sparse (meaning that the gaps of unused 
> id values are small) so that I won't waste too much memory?

No, not at all.  Over time it would be normal and expected for the list to 
become quite sparse: some files are written out early in the drive’s life and 
basically never change.  Just looking in my /Applications directory, the 
Info.plist of the oldest app on my system was last modified on “July 17, 2006”. 
A lot of files have come and gone since 2006…
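To put a number on that sparseness, a small bash/awk sketch can compare how many distinct IDs exist with the span an ID-indexed array would have to cover. It assumes a text file with one decimal file ID per line, obtained by whatever means; the helper name `id_density` is hypothetical. awk stores numbers as doubles, which represent integers exactly up to 2^53, well above the IDs in this thread.

```bash
# id_density FILE (hypothetical helper): count distinct IDs, the min..max
# span an ID-indexed array would need, and the fraction of it actually used.
id_density() {
  awk 'NF && !($1 in seen) {
         seen[$1] = 1
         n++
         if (n == 1 || $1 + 0 < min) min = $1 + 0
         if (n == 1 || $1 + 0 > max) max = $1 + 0
       }
       END {
         if (n == 0) exit 1
         span = max - min + 1
         printf "ids=%d span=%.0f used=%.6f\n", n, span, n / span
       }' "$1"
}
```

A `used` value far below 1 means most slots of an ID-indexed array would be empty.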

> 4. Do you recommend another, more efficient way to iterate through the files 
> in ascending file-id order, other than through the /.vol/ drive?

Well, I’m not sure I’d recommend using .vol either…

What are you trying to do, and why does iterating by file ID seem like a good 
way to do it?  

-Kevin


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-21 Thread Thomas Tempelmann
>
> 1. Is there any way to extract the current file-id range (minimum to
> maximum fileid)?
>

Well, both HFS+ and APFS know the last used FileID and whenever a new node
(file, dir) is created, the last ID + 1 will be used for it. But you cannot
query that value directly (only indirectly, by creating a new file and then
getting its ID).
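That indirect query can be sketched in bash: create a throwaway file on the volume, read its inode number with `ls -i` (POSIX, unlike `stat`'s format options, which differ between macOS and GNU), then delete it. The helper name is hypothetical. The probe consumes an ID itself, and since ascending allocation is an implementation detail rather than a guarantee (filesystems such as ext4 reuse inode numbers), treat the result as an estimate of the current high-water mark at best.

```bash
# probe_next_file_id [DIR] (hypothetical helper): print the file ID (inode
# number) a new file in DIR receives right now. DIR must be on the volume
# you are interested in.
probe_next_file_id() {
  local dir=${1:-.} tmp id
  tmp=$(mktemp "$dir/.fileid-probe.XXXXXX") || return 1
  # `ls -i` prints "<inode> <name>"; awk also strips any leading padding.
  id=$(ls -i "$tmp" | awk '{ print $1 }')
  rm -f "$tmp"                   # leave no probe file behind
  [ -n "$id" ] && printf '%s\n' "$id"
}
```

On a volume converted from HFS+, the printed value would be expected to sit at or above 0x200000000 (8589934592).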


> 2. I've noticed that there are some gaps in the file-id list, meaning that
> some IDs aren't connected to files. How can this happen (I assume it's due
> to deleted files)? And when a new file is created, does it get the lowest
> available file-id or the next file-id after the current maximum?
>

When dirs or files are deleted, those IDs will not be re-used. This is on
purpose, i.e. by design. It allows programs to later find previously
identified files by their ID. Finder Aliases (and NSURL Bookmark files) use
this.


> 3. I'd like to use an array in which each index represents a file-id. Can I
> assume that the file-ids aren't sparse (meaning that the gaps of unused id
> values are small) so that I won't waste too much memory?
>

Not smart at all. You should rather use a dictionary (hash-map, hashed set)
for this, with the key being the ID.
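A minimal bash/awk sketch of the dictionary approach, with hypothetical file and helper names: it loads the IDs from a previous scan into an awk associative array (a hash table keyed by the ID string) and prints only the IDs from a new listing that are not in it yet, so memory grows with the number of IDs rather than with their numeric range. It tests `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom so that an empty "seen" file still behaves correctly.

```bash
# unseen_ids SEEN_FILE CURRENT_FILE (hypothetical helper): print each ID
# that occurs in CURRENT_FILE but not in SEEN_FILE, once, in input order.
unseen_ids() {
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }   # load the previous scan
       NF && !($1 in seen) { print; seen[$1] = 1 }  # new ID: print it once
      ' "$1" "$2"
}
```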

> 4. Do you recommend another, more efficient way to iterate through the
> files in ascending file-id order, other than through the /.vol/ drive?
>

If you want to traverse all files on a volume, use searchfs (see "man
searchfs") - that's ideal for this purpose. It tells you every ID, and then
you can look up the file information from that ID.

Thomas


Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-21 Thread Irad K
Eric,

Thanks for the info; that explains the offset, since I previously upgraded
the OS from Sierra, which uses HFS+ for its root filesystem.

What led me to look into the fileid values is a file scanner I'm currently
designing: instead of iterating over the files according to their directory
structure (i.e. BFS), it iterates in ascending file-id order, and I had
always assumed that file-ids start from zero.

Using this scanning order, I can halt my scan and resume it from the last
scanned file-id (assuming I can ignore newly created files that got a
file-id lower than the last scanned value).

I'd be happy if you could tell me about the file-id allocation policy of
APFS or HFS+ in the following aspects:

1. Is there any way to extract the current file-id range (minimum to
maximum fileid)?

2. I've noticed that there are some gaps in the file-id list, meaning that
some IDs aren't connected to files. How can this happen (I assume it's due to
deleted files)? And when a new file is created, does it get the lowest
available file-id or the next file-id after the current maximum?

3. I'd like to use an array in which each index represents a file-id. Can I
assume that the file-ids aren't sparse (meaning that the gaps of unused id
values are small) so that I won't waste too much memory?

4. Do you recommend another, more efficient way to iterate through the files
in ascending file-id order, other than through the /.vol/ drive?

Thanks a lot for your help.
Irad,



On Wed, Mar 21, 2018 at 6:53 PM, Eric Tamura  wrote:



Re: APFS root filesystem. All files' inode id have offset of 0x200000000

2018-03-21 Thread Eric Tamura
Irad,

This is because your volume went through the HFS -> APFS converter. As a side
effect of some on-disk APFS format differences from HFS, we need to make sure
we can differentiate large EAs and resource forks from data forks. So the
large EAs/resource forks in APFS retain the original HFS inode number, plus
0x100000000, since that number can’t possibly be in use in HFS (maximum of 32
bits for fileIDs).  This means that all new APFS objects (files, directories,
EAs, etc) start at 0x200000000.   This does mean that we burned the range of
inode numbers from 0x100000000 -> 0x1FFFFFFFF … however with 64 bits of
inodes, this was determined to be an acceptable trade-off.

Newly formatted APFS volumes are not subject to this behavior. 
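The ranges described above can be turned into a small bash classifier. This is a sketch under stated assumptions: the boundaries are taken from Eric's explanation (32-bit HFS+ IDs, large EAs and resource forks at the HFS ID plus 0x100000000, new objects from 0x200000000), IDs below 2^32 are assumed to be objects carried over from HFS+, and the function name and labels are hypothetical. It relies on bash's 64-bit arithmetic and does not apply to freshly formatted APFS volumes.

```bash
# Assumed boundaries for an HFS+ -> APFS converted volume.
HFS_ID_MAX=0xFFFFFFFF        # HFS+ file IDs are 32-bit
FORK_BASE=0x100000000        # HFS ID + 2^32: large EAs / resource forks
NEW_APFS_BASE=0x200000000    # first ID for objects created after conversion

# classify_file_id ID (hypothetical helper): print which range ID falls in.
classify_file_id() {
  local id=$1
  if (( id <= HFS_ID_MAX )); then
    echo "carried over from HFS+"
  elif (( id < NEW_APFS_BASE )); then
    printf 'large EA or resource fork of HFS ID %d\n' $(( id - FORK_BASE ))
  else
    printf 'new APFS object, base + 0x%x\n' $(( id - NEW_APFS_BASE ))
  fi
}
```

For the ID from the original post, `classify_file_id 8595046795` prints `new APFS object, base + 0x4e018b`.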

Eric



> On Mar 21, 2018, at 8:35 AM, Irad K  wrote:
> 
> Hi, 
> 
> I'm trying to figure out why my files are all allocated with huge fileids.
> 
> Setup: 
> - root filesystem is formatted to APFS 
> - macOS version High Sierra.
> 
> 
> For some reason, all my files are allocated with fileids that have an
> offset of `0x200000000` added to them.
> 
> For example, creating a new file `/tmp/1` and getting its attributes
> produces
> 
> -> stat /tmp/1
> 
> 16777220 8595046795 -rw-r--r-- 2 demouser wheel 0 7237 "Mar 21 16:42:57 
> 2018" "Mar 21 16:42:36 2018" "Mar 21 16:42:36 2018" "Mar 21 14:29:12 2018" 
> 4194304 16 0 /tmp/1
> 
> (8595046795 is 0x2004E018B in hex)
> 
> 
> Furthermore, if I search for valid inodes ABOVE this offset in the /.vol/
> mount drive, I find plenty, but below it there are none.
> 
> for x in {1..1}; do stat /.vol/${filesystemid}/$((0x200000000-x)) >> 
> /dev/null 2>&1  && echo "fileid $x + 0x200000000 exit"; done
> fileid 115 + 0x200000000 exit
> fileid 116 + 0x200000000 exit
> fileid 139 + 0x200000000 exit
> fileid 140 + 0x200000000 exit
> fileid 141 + 0x200000000 exit
> fileid 142 + 0x200000000 exit
> fileid 144 + 0x200000000 exit
> fileid 145 + 0x200000000 exit
> fileid 146 + 0x200000000 exit
> fileid 147 + 0x200000000 exit
> fileid 149 + 0x200000000 exit
> 
> 
> Is there any rigorous API to retrieve this offset? Where can I find it? I
> looked for it in the APFS header but didn't find an appropriate field.
> 
> thanks
