Re: disk I/O, VFS hirunningspace

2010-07-16 Thread Alan Cox

Peter Jeremy wrote:

Regarding vfs.lorunningspace and vfs.hirunningspace...

On 2010-Jul-15 13:52:43 -0500, Alan Cox  wrote:
   

Keep in mind that we still run on some fairly small systems with limited I/O
capabilities, e.g., a typical arm platform.  More generally, with the range
of systems that FreeBSD runs on today, any particular choice of constants is
going to perform poorly for someone.  If nothing else, making these sysctls
a function of the buffer cache size is probably better than any particular
constants.
 


That sounds reasonable but brings up a related issue - the buffer
cache.  Given the unified VM system no longer needs a traditional Unix
buffer cache, what is the buffer cache still used for?


Today, it is essentially a mapping cache.  So, what does that mean?

After you've set aside a modest amount of physical memory for the kernel 
to hold its own internal data structures, all of the remaining physical 
memory can potentially be used to cache file data.  However, on many 
architectures this is far more memory than the kernel can 
instantaneously access.  Consider i386.  You might have 4+ GB of 
physical memory, but the kernel address space is (by default) only 1 
GB.  So, at any instant in time, only a fraction of the physical memory 
is instantaneously accessible to the kernel.  In general, to access an 
arbitrary physical page, the kernel is going to have to replace an 
existing virtual-to-physical mapping in its address space with one for 
the desired page.  (Generally speaking, on most architectures, even the 
kernel can't directly access physical memory that isn't mapped by a 
virtual address.)


The buffer cache is essentially a region of the kernel address space 
that is dedicated to mappings to physical pages containing cached file 
data.  As applications access files, the kernel dynamically maps (and 
unmaps) physical pages containing cached file data into this region.  
Once the desired pages are mapped, read(2) and write(2) can 
essentially "bcopy" from the buffer cache mapping to the application's 
buffer.  (Understand that this buffer cache mapping is a prerequisite 
for the copy out to occur.)
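
To make the "bcopy" step concrete, here is a minimal sketch loosely modeled 
on the UFS read path; locking, block-size handling, and the error paths are 
stripped down, and sketch_read() itself is a made-up helper, but bread(), 
uiomove(), and bqrelse() are the real interfaces involved:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/uio.h>
#include <sys/vnode.h>

/*
 * Illustrative kernel-side read: bread() ensures the cached pages are
 * mapped at bp->b_data, then uiomove() copies out to the application's
 * buffer.  Assumes MAXBSIZE-sized blocks for simplicity.
 */
static int
sketch_read(struct vnode *vp, struct uio *uio, struct ucred *cred)
{
	struct buf *bp;
	daddr_t lbn = uio->uio_offset / MAXBSIZE;	/* logical block */
	int on = uio->uio_offset % MAXBSIZE;		/* offset within block */
	int n = MIN(MAXBSIZE - on, uio->uio_resid);
	int error;

	error = bread(vp, lbn, MAXBSIZE, cred, &bp);
	if (error) {
		brelse(bp);
		return (error);
	}
	/* The mapping at bp->b_data is the prerequisite for this copy. */
	error = uiomove((char *)bp->b_data + on, n, uio);
	bqrelse(bp);
	return (error);
}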


So, why did I call it a mapping cache?  There is generally locality in 
the access to file data.  So, rather than map and unmap the desired 
physical pages on every read and write, the mappings to file data are 
allowed to persist and are managed much like many other kinds of 
caches.  When the kernel needs to map a new set of file pages, it finds 
an older, not-so-recently used mapping and destroys it, allowing those 
kernel virtual addresses to be remapped to the new pages.
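
To make "replace an existing mapping" concrete, here is a hedged sketch of 
the remapping step.  remap_buffer_kva() is a hypothetical helper (the real 
work happens in getnewbuf()/allocbuf() and the pmap layer), but, if memory 
serves, pmap_qremove() and pmap_qenter() are the underlying primitives:

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <vm/pmap.h>

/* Illustrative only: reuse a buffer's kernel virtual addresses for a
 * new set of cached file pages. */
static void
remap_buffer_kva(vm_offset_t kva, vm_page_t *newpages, int npages)
{
	pmap_qremove(kva, npages);		/* tear down the stale mapping */
	pmap_qenter(kva, newpages, npages);	/* map the new file pages */
}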


So far, I've used i386 as a motivating example.  What of other 
architectures?  Most 64-bit machines take advantage of their large 
address space by implementing some form of "direct map" that provides 
instantaneous access to all of physical memory.  (Again, I use 
"instantaneous" to mean that the kernel doesn't have to dynamically 
create a virtual-to-physical mapping before being able to access the 
data.)  On these machines, you could, in principle, use the direct map 
to implement the "bcopy" to the application's buffer.  So, what is the 
point of the buffer cache on these machines?
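
On amd64, for example, "instantaneous" is literal: turning a vm_page_t into 
a usable kernel pointer is just address arithmetic via the direct map.  A 
small, amd64-specific sketch (the helper name is made up):

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <machine/vmparam.h>	/* PHYS_TO_DMAP() on amd64 */

/* No pmap_qenter(), no TLB shootdown, just arithmetic on the page's
 * physical address. */
static void *
page_kva_via_direct_map(vm_page_t m)
{
	return ((void *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)));
}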


A trivial benefit is that the file pages are mapped contiguously in the 
buffer cache.  Even though the underlying physical pages may be 
scattered throughout the physical address space, they are mapped 
contiguously.  So, the "bcopy" doesn't need to worry about every page 
boundary, only buffer boundaries.
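
By contrast, a copy that went through the direct map page by page has to 
stop at every page boundary.  A hedged, amd64-flavoured illustration (the 
helper name is made up):

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <machine/vmparam.h>

static void
copy_from_scattered_pages(vm_page_t *pages, int off, char *dst, size_t n)
{
	while (n > 0) {
		size_t chunk = MIN(n, (size_t)PAGE_SIZE - off);
		char *src = (char *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(*pages));

		bcopy(src + off, dst, chunk);	/* one copy per page */
		dst += chunk;
		n -= chunk;
		off = 0;
		pages++;
	}
	/* With the pages mapped contiguously in the buffer cache this
	 * collapses to a single bcopy()/uiomove(). */
}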


The buffer cache also plays a role in the page replacement mechanism.  
Once mapped into the buffer cache, a page is "wired", that is, it is 
removed from the paging lists, where the page daemon could otherwise 
reclaim it.  However, a page in the buffer cache should really be thought 
of as being "active".  In fact, when a page is unmapped from the buffer 
cache, it is placed at the tail of the virtual memory system's "inactive" 
list, the same place where the virtual memory system would place a physical 
page that it is transitioning from "active" to "inactive".  If an application 
later performs a read(2) from or write(2) to the same page, that page 
will be removed from the "inactive" list and mapped back into the buffer 
cache.  So, the mapping and unmapping process contributes to creating an 
LRU-ordered "inactive" queue.


Finally, the buffer cache limits the amount of dirty file system data 
that is cached in memory.



...  Is the current
tuning formula still reasonable (for virtually all current systems
it's basically 10MB + 10% RAM)?


It's probably still good enough.  However, this is not a statement for 
which I have supporting data.  So, I reserve the right to change my 
opinion.  :-)
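
Taking the quoted "10MB + 10% of RAM" approximation at face value, a quick 
back-of-envelope (userland, purely illustrative; the figures are Peter's 
approximation, not the kernel's exact autotuning):

#include <stdio.h>

int
main(void)
{
	long ram_mb[] = { 128, 512, 1024, 4096, 16384 };

	for (size_t i = 0; i < sizeof(ram_mb) / sizeof(ram_mb[0]); i++)
		printf("%6ld MB RAM -> ~%ld MB of buffer cache KVA\n",
		    ram_mb[i], 10 + ram_mb[i] / 10);
	return (0);
}

(Remember that on i386 the result is further limited by the 1 GB of kernel 
address space discussed above.)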


Consider what the buffer cache now does.  It's just a mapping cache.  
Increasing the buffer cache size doesn't affect (much) the amount of 
physical memory available for caching file data.  So, unlike ancient 
times, increa

Re: strange problem with int64_t variables

2010-07-16 Thread Gabor Kovesdan

On 2010.07.16. 12:17, pluknet wrote:

Almost the same (#__jid_t#jid_t#).

   
Did you have to include any header for that? IIRC, I used __jid_t 
because it didn't compile with jid_t.

The difference (and probably the trigger of the bug elsewhere) might be
that this lives on an amd64 arch (while yours is on i386, AFAIR).  Just
food for thought.
   
Yes, and now I also added the entries to 
sys/compat/freebsd32/syscalls.master, although I think that is only needed 
for 32-bit binaries on 64-bit systems, but let's see if that helps.


Thanks for your help.

Gabor



Re: strange problem with int64_t variables

2010-07-16 Thread pluknet
On 16 July 2010 01:42, Gabor Kovesdan  wrote:
> On 2010.07.13. 16:05, pluknet wrote:
>>
>> #ifndef _SYS_SYSPROTO_H_
>> struct setjlimit_args {
>>         jid_t   jid;
>>         int     resource;
>>         struct rlimit *rlp;
>> };
>> #endif
>> int
>> setjlimit(td, uap)
>>         struct thread *td;
>>         struct setjlimit_args /* {
>>                 jid_t   jid;
>>                 int     resource;
>>                 struct rlimit *rlp;
>>         } */ *uap;
>> {
>>
>>         printf("%s called\n", __FUNCTION__);
>>
>>         printf("resource: %d\n", uap->resource);
>>         if (uap->resource >= JLIM_NLIMITS) {
>>                 td->td_retval[0] = -1;
>>                 return (EINVAL);
>>         }
>>         return (0);
>> }
>>
>
> Thanks for trying this out. I still couldn't find the problem. Is this
> generated code? I mean the prototype of the function. I'm using C99 syntax
> and I added the implementation manually; the only generated code I'm using
> is what make sysent produced. Besides, the generated code in sysproto.h is
> different from the struct you have here; there are padding members, as well:
>
> +struct setjlimit_args {
> +       char jid_l_[PADL_(__jid_t)]; __jid_t jid; char jid_r_[PADR_(__jid_t)];
> +       char resource_l_[PADL_(int)]; int resource; char resource_r_[PADR_(int)];
> +       char rlp_l_[PADL_(struct rlimit *)]; struct rlimit * rlp; char rlp_r_[PADR_(struct rlimit *)];
> +};
>

>
> And what do you have in syscalls.master? Is it the same as I have?
>
> +527	AUE_NULL	STD	{ int setjlimit(__jid_t jid, int resource, \
> +	    struct rlimit *rlp); }

Almost the same (#__jid_t#jid_t#).

struct setjlimit_args {
	char jid_l_[PADL_(jid_t)]; jid_t jid; char jid_r_[PADR_(jid_t)];
	char resource_l_[PADL_(int)]; int resource; char resource_r_[PADR_(int)];
	char rlp_l_[PADL_(struct rlimit *)]; struct rlimit * rlp; char rlp_r_[PADR_(struct rlimit *)];
};

526	AUE_NULL	STD	{ int setjlimit(jid_t jid, int resource, \
	    struct rlimit *rlp); }
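
For context, the PADL_()/PADR_() macros in the generated sysproto.h are, as 
far as I remember (double-check against sys/sysproto.h), defined roughly 
like this, so every argument is widened to a full register_t slot with the 
padding on the side dictated by endianness:

#define PAD_(t)		(sizeof(register_t) <= sizeof(t) ? \
			0 : sizeof(register_t) - sizeof(t))
#if BYTE_ORDER == LITTLE_ENDIAN
#define PADL_(t)	0
#define PADR_(t)	PAD_(t)
#else
#define PADL_(t)	PAD_(t)
#define PADR_(t)	0
#endif

On amd64 register_t is 64 bits, so the int argument gets padding while the 
64-bit jid_t gets none; on i386 a 64-bit jid instead spans two argument 
slots, which might be part of why the two arches behave differently here.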

The difference (and probably the trigger of the bug elsewhere) might be
that this lives on an amd64 arch (while yours is on i386, AFAIR).  Just
food for thought.

-- 
wbr,
pluknet


Re: disk I/O, VFS hirunningspace

2010-07-16 Thread Peter Jeremy
Regarding vfs.lorunningspace and vfs.hirunningspace...

On 2010-Jul-15 13:52:43 -0500, Alan Cox  wrote:
>Keep in mind that we still run on some fairly small systems with limited I/O
>capabilities, e.g., a typical arm platform.  More generally, with the range
>of systems that FreeBSD runs on today, any particular choice of constants is
>going to perform poorly for someone.  If nothing else, making these sysctls
>a function of the buffer cache size is probably better than any particular
>constants.

That sounds reasonable but brings up a related issue - the buffer
cache.  Given the unified VM system no longer needs a traditional Unix
buffer cache, what is the buffer cache still used for?  Is the current
tuning formula still reasonable (for virtually all current systems
it's basically 10MB + 10% RAM)?  How can I measure the effectiveness
of the buffer cache?
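
One crude way to at least watch its behaviour (counter names from memory; 
check "sysctl vfs" for what a given kernel actually exports) is to poll the 
vfs buffer counters, e.g.:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

/* Sketch: dump a few vfs buffer counters via sysctlbyname(3). */
int
main(void)
{
	const char *names[] = {
		"vfs.bufspace", "vfs.hibufspace", "vfs.lobufspace",
		"vfs.runningbufspace", "vfs.hirunningspace",
		"vfs.lorunningspace", "vfs.numdirtybuffers",
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		long val = 0;
		size_t len = sizeof(val);

		/* Counter widths vary (int vs. long); val is zeroed so
		 * shorter results still print sensibly on little-endian. */
		if (sysctlbyname(names[i], &val, &len, NULL, 0) == 0)
			printf("%-22s %ld\n", names[i], val);
	}
	return (0);
}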

The buffer cache size is also very tightly constrained (vfs.hibufspace
and vfs.lobufspace differ by 64KB) and at least one of the underlying
tuning parameters has comments at variance with current reality.
In <sys/param.h>:

 * MAXBSIZE -   Filesystems are made out of blocks of at most MAXBSIZE bytes
 *  per block.  MAXBSIZE may be made larger without effecting
...
 *
 * BKVASIZE -   Nominal buffer space per buffer, in bytes.  BKVASIZE is the
...
 *  The default is 16384, roughly 2x the block size used by a
 *  normal UFS filesystem.
 */
#define MAXBSIZE	65536	/* must be power of 2 */
#define BKVASIZE	16384	/* must be power of 2 */

There's no mention of the 64KiB limit in newfs(8) and I recall seeing
occasional comments from people who have either tried or suggested
trying larger blocksizes.  Likewise, the default UFS blocksize has
been 16KiB for quite a while.  Are the comments still valid and, if so,
should BKVASIZE be doubled to 32768 and a suitable note added to newfs(8)
regarding the maximum block size?
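
Rough arithmetic, just to make the constraint concrete (the 200MB of buffer 
map KVA is an assumed figure, not a measurement): nbuf is sized assuming 
BKVASIZE bytes of KVA per buffer, so a filesystem whose blocks are larger 
than BKVASIZE exhausts the buffer map well before nbuf buffers are in use:

#include <stdio.h>

int
main(void)
{
	long bufkva = 200L * 1024 * 1024;	/* assumed buffer map size */
	int bkvasize[] = { 16384, 32768 };
	int fsblock[] = { 16384, 32768, 65536 };

	for (int i = 0; i < 2; i++) {
		printf("BKVASIZE %5d -> nbuf sized for %ld buffers\n",
		    bkvasize[i], bufkva / bkvasize[i]);
		for (int j = 0; j < 3; j++)
			printf("  %5d-byte blocks mappable at once: %ld\n",
			    fsblock[j], bufkva / fsblock[j]);
	}
	return (0);
}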

-- 
Peter Jeremy

