Re: Is fdisk broken?

2013-03-22 Thread Bruce Evans

On Fri, 22 Mar 2013 mla_str...@att.net wrote:


I recently bought a 4 TB usb disk drive and discovered that it reported
a sector size of 4096 bytes instead of the traditional 512 bytes.  This
is apparently necessary because there may be a 32 bit sector number field
somewhere in the usb mass storage protocols.  It turns out that disk
drive manufacturers have been producing disks with large sector sizes
for some years now.  The feature goes by the name Advanced Format and
other things.  Look it up in Wikipedia.

FreeBSD seems to use the sector size information when interpreting MBR
partition offsets and sizes.  Unfortunately, when I try to use fdisk to
print out the partition table on my new disk drive, fdisk just says
fdisk: could not detect sector size.


It has the following gratuitous breakage at 2K for its probe of the
sector size:

#define MAX_SEC_SIZE 2048   /* maximum section size that is supported */
#define MIN_SEC_SIZE 512    /* the sector size to start sensing at */

I used 64K for the probe maximum limit when I fixed fsck_msdosfs
(fsck_msdosfs doesn't have a probe and only supports sector sizes of
512 in -current).
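
A probe of this kind can be sketched roughly as follows.  This is not
fdisk's actual code: probe_sector_size(), PROBE_MIN and PROBE_MAX are
hypothetical names, and the 64K ceiling is simply the value mentioned
for the fsck_msdosfs fix.  The idea is to try reads of doubling size at
offset 0 until one succeeds, since a raw disk device normally accepts
only whole-sector transfers.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_MIN 512      /* smallest sector size to try */
#define PROBE_MAX 65536    /* hypothetical ceiling (64K) */

/*
 * Try reads of 512, 1024, ... PROBE_MAX bytes at offset 0 and return
 * the first size for which a full read succeeds, or -1 if none does.
 * buf must hold at least PROBE_MAX bytes.
 */
static long
probe_sector_size(int fd, void *buf)
{
    size_t size;

    for (size = PROBE_MIN; size <= PROBE_MAX; size *= 2) {
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
            return -1;
        if (read(fd, buf, size) == (ssize_t)size)
            return (long)size;
    }
    return -1;
}
```

On a regular file the first attempt already succeeds, so the loop only
does real work on a device that rejects short transfers.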

Most file systems in FreeBSD have gratuitous limits on the size in their
probe for their superblock, but the limit is mostly larger than 4K.
Most of them don't need to know the sector size and don't have a probe,
but they read a fixed size that is larger than their superblock size,
so they fail if this size is smaller than the sector size.


Otherwise the MBR partition
table seems to work correctly and newfs seems to have done the right
thing.  (It made the file system fragment size a multiple of the sector
size and I am not getting any weird error messages out of the disk
driver.)  It would be nice if fdisk also worked.  I do have to share
the disk with other operating systems that might not understand other
partition table schemes.

Is my analysis of what is going on essentially correct?
Can fdisk be made happy again?  (At least for a few more years?)


Changing the above should fix fdisk for FreeBSD.  A sector size of
4K gives a limit of 16TB for the partition table data structure,
which is enough for a few more years with single disks.  After that,
double the sector size to 8K to work for another year or two.
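
The 16TB figure follows from the MBR partition entry using 32-bit
sector counts.  A small sketch of the arithmetic (mbr_limit_bytes() is
a hypothetical helper, not anything in fdisk):

```c
#include <stdint.h>

/*
 * Largest size in bytes that a 32-bit sector count can describe,
 * for a given sector size in bytes.
 */
static uint64_t
mbr_limit_bytes(uint32_t sector_size)
{
    return ((uint64_t)1 << 32) * sector_size;
}
```

This gives 2TB for 512-byte sectors, 16TB for 4K sectors and 32TB for
8K sectors (binary units).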

However, to share the disk you need all the other operating systems and
BIOS to agree that _this_ partition table scheme (with units of 4K
sectors) is what the partition table records.

Bruce
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: kernel profiling: spinlock_exit consumes 36% CPU time.

2008-10-08 Thread Bruce Evans

On Tue, 7 Oct 2008, John Baldwin wrote:


On Tuesday 07 October 2008 07:44:00 am  wrote:

Hi, folks,

I did kernel profiling when a single thread client sends UDP packets to a
single thread server on the same machine.

In the output kernel profile, the first few kernel functions that consume
the most CPU time are listed below:

granularity: each sample hit covers 16 byte(s) for 0.01% of 25.68 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.4     10.88    10.88        0  100.00%           __mcount [1]
 36.1     20.14     9.26 17937541     0.00     0.00  spinlock_exit [4]
  4.2     21.22     1.08  3145728     0.00     0.00  in_cksum_skip [40]
  1.8     21.68     0.45  7351987     0.00     0.00  generic_copyin [43]
  1.1     21.96     0.29  3146028     0.00     0.00  generic_copyout [48]
  1.0     22.21     0.24  2108904     0.00     0.00  Xint0x80_syscall [3]
  0.8     22.42     0.21  6292131     0.00     0.00  uma_zalloc_arg [46]
  0.8     22.62     0.20  1048576     0.00     0.00  soreceive_generic [9]

It is very strange that spinlock_exit consumes over 36% CPU time while it
seems a very simple function.


It's because the intr_restore() re-enables interrupts and the resulting time
spent executing the handlers for any pending interrupts is attributed to
spinlock_exit().


This is one of many defects that are not present in high resolution
kernel profiling (kgmon -B instead of kgmon -b; available on amd64
and i386).  However, high resolution kernel profiling doesn't work
right with SMP, and was completely broken by gcc-4.  Ordinary profiling
was less completely broken by gcc-4, and you can recover the old
behaviour by turning off new optimizations (mainly -funit-at-a-time
and/or -finline-functions-called-once and/or all of -O2).

Bruce

Re: Realtek 8111B LAN Chipset

2008-01-20 Thread Bruce Evans

On Sat, 19 Jan 2008, Greg Mars wrote:

I'm buying parts for a computer and want to make sure that the core 
components are as freebsd friendly as possible. So far, I've decided on a 
core 2 quad q6600 and I'm choosing the motherboard now.


Me2 (unless I wait for a newer generation of CPUs).

However it seems many 
of the popular motherboards have Realtek ALC888 as built-in audio and Realtek 
8111B as built-in LAN.

I read at:

http://www.freebsd.org/relnotes/CURRENT/hardware/i386/article.html

that the sound should work but I couldn't find any info on the LAN.
Does anyone on the list have any experience with it?

By the way, I'm going to run FreeBSD 7.


I also want a cheap PCI/e NIC that works well with drivers back to
FreeBSD-4 like my plain PCI bge and em NICs do.  I doubt that any
popular motherboard will have anything better than a cheap PCI/e NIC.

Bruce


Re: (S)ATA performance in FBSD 6.2/7.0

2007-03-02 Thread Bruce Evans

On Fri, 2 Mar 2007, Brooks Davis wrote:


Also, you should time the actual copy and do the math to verify that
vmstat is actually producing valid results.  It's possible there's a bug
in vmstat or the underlying statistics it uses.


There is certainly a bug in the underlying statistics.  For ATA disks,
at least with the ata driver, the maximum transfer size in DMA mode
is 64K, so any reports of a block size of 128K for SATA disks are
wrong.  The block size of 128K reported by vmstat is actually a virtual
size.  For most or all types of disks, the GEOM layer virtualizes the
physical maximum size MAXPHYS = 128K so that layers above GEOM including
statistics gathering and file systems cannot see the physical size.
For writing large files, this normally confuses ffs and vfs clustering
into producing contiguous writes of 128K.  This is good for efficiency,
but it is not what the hardware sees or what you want for statistics.
The contiguous writes of 128K get split up into 2 sequential writes
of 64K.
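
The splitting can be pictured with a short sketch.  This is only an
illustration of the arithmetic, not the GEOM or ata driver code;
split_request() and DMA_MAX are hypothetical names, with 64K taken
from the DMA limit mentioned above.

```c
#include <stddef.h>

#define DMA_MAX (64 * 1024)   /* ATA DMA transfer limit cited above */

/*
 * Fill `sizes` with the length of each piece of a `len`-byte request
 * split into pieces of at most `max` bytes.  Returns the number of
 * pieces written (at most `cap`).
 */
static size_t
split_request(size_t len, size_t max, size_t *sizes, size_t cap)
{
    size_t n = 0;

    while (len > 0 && n < cap) {
        size_t piece = len < max ? len : max;

        sizes[n++] = piece;
        len -= piece;
    }
    return n;
}
```

A 128K request comes out as two 64K pieces, which is what the hardware
sees even though the statistics record one 128K block.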

However, 64K is more than large enough for efficiency, so the bug in
the underlying statistics doesn't matter, at least if vmstat reports
only 128K blocks.  If it reported 64K-blocks then you would have to
worry about the contiguous block sizes being a mixture of 128K and
much smaller blocks, with the much smaller blocks (actually, more
the seeks across gaps to get to the smaller blocks) being very
inefficient.

Bruce


Re: very big files on cd9660 file system

2005-08-19 Thread Bruce Evans

On Fri, 19 Aug 2005, Mikhail Teterin wrote:


I have a cd9660 image with several files on it. One of the files is very large
(above 4Gb). When I mount the image, the size of this file is shown as
realsize % 4Gb -- 758876749 bytes instead of 5053844045.

What should I blame:

1) The software, that created the image (modified mkisofs)
2) cd9660 part of the FreeBSD kernel
3) ISO-9660 standard


Mostly (2).  Sizes are 64 bits in the standard, but FreeBSD has always
silently discarded the highest 32 bits and corrupted the next highest
bit to a sign bit, so the file size limit is at most 2GB or 4GB
(depending on whether the sign bit gets corrupted back to a value bit).
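
The reported numbers are consistent with the upper bits simply being
dropped.  A quick check of the arithmetic (low32() is a hypothetical
helper used only for this illustration):

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit size, as a 32-bit field would. */
static uint32_t
low32(uint64_t size)
{
    return (uint32_t)(size & 0xffffffffu);
}
```

Applied to the real size of 5053844045 bytes this gives 758876749,
which is the size shown after mounting the image.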

From cd9660_vfsops.c:


%   ip->i_size = isonum_733(isodir->size);

This reads the size from the directory entry.


From iso.h:


%   u_char size [ISODCL (11, 18)]; /* 733 */

This says that the size is in bytes 11-18 (option base 1) in the directory
entry.  All 733 entries are 8 bytes.  The others are for other sizes and
the extent (the starting block number for a file).

% static __inline int
% isonum_733(p)
%   u_char *p;
% {
%   return *p|(p[1] << 8)|(p[2] << 16)|(p[3] << 24);
% }

This says that the highest 32 bits are discarded for all 733 entries
and the sign bit in p[3] is corrupted, first by shifting it and then by
assigning the result to an int.
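
For comparison, a decode that keeps bit 31 as a value bit might look
like the sketch below.  This is not a change that was committed to
FreeBSD; isonum_733_u32() is a hypothetical name, and it still reads
only the same four bytes that the function above reads.

```c
#include <stdint.h>

/*
 * Decode the four bytes read by isonum_733() without going through
 * int, so that bit 31 stays a value bit instead of a sign bit.
 */
static uint32_t
isonum_733_u32(const unsigned char *p)
{
    return (uint32_t)p[0] |
        ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) |
        ((uint32_t)p[3] << 24);
}
```

With this, a field whose fourth byte is 0x80 decodes to 2^31 rather
than to a negative number.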

i_size has type long, unlike in most file systems in FreeBSD where it is
uint64_t or uint32_t, so I think the sign bit stays corrupted but doesn't
cause further problems by being converted to 33 top unsigned bits, giving
a limit of 2GB.

The file size limit is hit before the others.  31-bit block numbers with
2K-blocks work up to 4TB.  There are likely to be overflow bugs at 1TB
before the 4TB limit is hit.

We still have the even closer limit of 4GB on media sizes.  From
cd9660_node.c:

% ino_t
% isodirino(isodir, imp)
%   struct iso_directory_record *isodir;
%   struct iso_mnt *imp;
% {
%   ino_t ino;
% 
%   ino = (isonum_733(isodir->extent) + isonum_711(isodir->ext_attr_length))
%         << imp->im_bshift;
%   return (ino);
% }

This fakes the inode number as the byte offset of the directory entry.
ino_t is uint32_t, so this fails if the byte offset exceeds 4GB.  The
eventual 32nd bit overflows to become a sign bit in the shift but then
overflows back to a correct bit in the assignment, so offsets
between 2GB and 4GB work accidentally.
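
The wraparound at 4GB can be shown with a small sketch.  The helper
names are hypothetical and unsigned arithmetic is used throughout, so
this models only the 32-bit truncation, not the signed intermediate
step in the real code.  A shift of 11 corresponds to 2K blocks.

```c
#include <stdint.h>

/* Byte offset of a block, truncated to 32 bits as a 32-bit ino_t is. */
static uint32_t
fake_ino32(uint32_t extent_block, unsigned bshift)
{
    return (uint32_t)((uint64_t)extent_block << bshift);
}

/* The same byte offset computed without truncation. */
static uint64_t
byte_offset64(uint32_t extent_block, unsigned bshift)
{
    return (uint64_t)extent_block << bshift;
}
```

Block 2^21 sits at byte offset 4GB, which truncates to 0, and block
2^21 + 1 gets the same fake inode number as block 1.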

Since the limit is on the offsets of directory entries, media larger
than 4GB can be used for cd9660 under FreeBSD iff all directory entries
are below the limit, which happens automatically for the non-multi-session
case only.

See revs.1.77 and 1.99 for other bugs caused by isodirino().

Bruce


Re: Anyway to extract a large file from EXT2FS filesystem?

2004-02-17 Thread Bruce Evans
On Tue, 17 Feb 2004, Kris Kennaway wrote:

 On Tue, Feb 17, 2004 at 11:16:50AM +0100, Stefan Krantz wrote:
 
  On Tue, 17 Feb 2004, Kris Kennaway wrote:
 
   On Tue, Feb 17, 2004 at 10:49:47AM +0100, Stefan Krantz wrote:
   
Hi!
   
I would like to extract a large (11GB) tar file on an ext3 filesystem. But
it shows only to be about 3gb large:
   
yabba# ls -la pictures.tar
-rw-r--r--  1 root  wheel  3317055488 Feb 15 19:03 pictures.tar
   
Is there any possible way to extract the file?
  
   It shouldn't be appearing truncated.  Are you certain that this size
   is incorrect, and the file has a different size when viewed from
   another OS?
 
  Yes. Yesterday I tested the archive with tar tvf (11gb) in
  Linux and it tested OK. In FBSD it says unexpected EOF.
 
  If I could I would just boot linux and split the file. But I can no longer
  boot linux =/ (migrated to fbsd 5.2 ;).

 I'm CC'ing tjr and bde, who might have some idea about the problem.

ext2fs under FreeBSD is missing support for files larger than Linux's
old limit of 4GB.  Fixing this should be relatively easy (start by
using i_size_high when converting the Linux disk inode to a FreeBSDish
in-core inode).  I don't have any patches for this.
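
The conversion amounts to combining two 32-bit on-disk fields into one
64-bit size.  A minimal sketch, with ext2_full_size() as a hypothetical
helper rather than anything in the ext2fs sources:

```c
#include <stdint.h>

/* Combine the two 32-bit halves of an on-disk ext2 size into 64 bits. */
static uint64_t
ext2_full_size(uint32_t i_size, uint32_t i_size_high)
{
    return ((uint64_t)i_size_high << 32) | i_size;
}
```

The listing above shows 3317055488 bytes, which is what results when
the high word is ignored.  If the high word on that disk were 2 (an
assumption, since its value was not posted), the full size would be
11906990080 bytes, in line with the roughly 11GB reported.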

Bruce


Re: Anyway to extract a large file from EXT2FS filesystem?

2004-02-17 Thread Bruce Evans
On Tue, 17 Feb 2004, Tim Robbins wrote:

  On Tue, Feb 17, 2004 at 11:16:50AM +0100, Stefan Krantz wrote:
 I would like to extract a large (11GB) tar file on an ext3 filesystem. But
 it shows only to be about 3gb large:

 yabba# ls -la pictures.tar
 -rw-r--r--  1 root  wheel  3317055488 Feb 15 19:03 pictures.tar

 Is there any possible way to extract the file?

 Try this patch and let me know how it goes. You'll have to specify
 the file name of /sys/gnu/ext2fs/ext2_inode_cnv.c to patch(1) manually,
 then either buildkernel or rebuild only ext2fs.ko. If the file shows
 up with the correct size in a directory listing, make sure you can actually
 read data past 4 GB.

  //depot/user/tjr/freebsd-tjr/src/sys/gnu/ext2fs/ext2_inode_cnv.c#1 - 
 /p4/tjr/src/sys/gnu/ext2fs/ext2_inode_cnv.c 
 @@ -77,6 +77,8 @@
   */
   ip->i_mode = ei->i_links_count ? ei->i_mode : 0;
   ip->i_size = ei->i_size;
 + if (S_ISREG(ip->i_mode))
 +     ip->i_size |= ((u_int64_t)ei->i_size_high) << 32;
   ip->i_atime = ei->i_atime;
   ip->i_mtime = ei->i_mtime;
   ip->i_ctime = ei->i_ctime;
 @@ -112,6 +114,8 @@
   */
   ei->i_dtime = ei->i_links_count ? 0 : ip->i_mtime;
   ei->i_size = ip->i_size;
 + if (S_ISREG(ip->i_mode))
 +     ei->i_size_high = ip->i_size >> 32;
   ei->i_atime = ip->i_atime;
   ei->i_mtime = ip->i_mtime;
   ei->i_ctime = ip->i_ctime;


The feature stuff needs to be handled for writing.

The feature stuff is slightly broken for reading.  Large file support is
a read-only compatibility feature (it is indicated by the
EXT2_FEATURE_RO_COMPAT_LARGE_FILE flag in the s_feature_ro_compat field
in the superblock), but we didn't support it without the first hunk in
the above patch so we should have rejected even r/o mounts of file systems
that have this flag set.  We only reject r/w mounts of such file systems.
I suppose this isn't a problem in Linux implementations of ext2fs because
implementations that don't support large files in ext2fs don't support
large files anywhere, so files larger than the old limit of 4GB are handled
as correctly as possible at read time so their presence need not prevent
mounting.
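
The stricter policy described here could be sketched as below.  This is
a hypothetical illustration, not the FreeBSD mount code: mount_allowed()
and its mask parameters are invented names, and RO_COMPAT_LARGE_FILE
stands for EXT2_FEATURE_RO_COMPAT_LARGE_FILE with the value 0x0002 used
by Linux.

```c
#include <stdint.h>

#define RO_COMPAT_LARGE_FILE 0x0002u  /* value used by Linux ext2 */

/*
 * `ro_compat` is the superblock's s_feature_ro_compat field,
 * `supported` is the mask of features the implementation fully handles
 * and `can_read` is the mask it can at least read correctly.
 * Returns 1 if the mount may proceed, 0 if it should be rejected.
 */
static int
mount_allowed(uint32_t ro_compat, uint32_t supported, uint32_t can_read,
    int read_only)
{
    if (read_only)
        return (ro_compat & ~can_read) == 0;
    return (ro_compat & ~supported) == 0;
}
```

Without the first hunk of the patch, LARGE_FILE would not be in the
`can_read` mask, so even a read-only mount would be refused.  With it,
read-only mounts pass while read/write mounts are still refused until
writing handles the feature.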

Bruce


Re: Anyway to extract a large file from EXT2FS filesystem?

2004-02-17 Thread Bruce Evans
On Wed, 18 Feb 2004, Tim Robbins wrote:

 On Wed, Feb 18, 2004 at 11:37:26AM +1100, Bruce Evans wrote:
  The feature stuff needs to be handled for writing.

 I discovered that a few minutes after posting the patch :-) I decided to
 take the lazy way out for now and to return EFBIG if we would need to
 upgrade the filesystem to EXT2_DYNAMIC_REV or set ..._RO_COMPAT_LARGE_FILE.
 I think what's most important here is being able to read large files
 from Linux ext2 filesystems, and I don't like the current ext2 code
 enough to implement superblock updating etc.

The ext2 code seems to do a little more than necessary.  Anyway, we
shouldn't copy it, to keep the superblock update parts of FreeBSD's
ext2fs free of the copyleft :-).

Bruce


Re: problems with filesystems > 1TB

2004-01-22 Thread Bruce Evans
On Wed, 21 Jan 2004, Eric wrote:

 i have been trying (for many moons now) to create a filesystem larger than
 1TB. I've had a variety of RAID controllers in my boxes, and I have 250GB
 drives, so it adds up quick. I've also tried doing this with vinum, but
 that fails too.

 i've searched for help on this topic, and i've found lots of info, but
 nothing substantial. I've read everything from it being a sysinstall
 issue, to needing new versions of the CLI tools (newfs, dd, disklabel), to
 newfs using the wrong variable type to store fssize, to having to update
 to fbsd 5.x to use UFS2.

This requires FreeBSD-5.x and either UFS2 or fixing an overflow bug
in UFS1 (and possibly other bugs).  Only file system sizes much larger
than 1TB require UFS2 (UFS1 starts losing at 4TB but can handle
128TB (poorly)).

FreeBSD 4.x has a limit of 2^31 blocks of size 512 for i/o.  This gives
a limit of 1TB.

UFS1 has a limit of 2^31 blocks, each of the fs block size.  I forget
whether the relevant block size is what is called the block size or the
fragment size in newfs.  Probably the latter; I will assume this in
the following examples.  This gives the same limit of 1TB if the
fragment size is 512.  However, with the default fragment size of 2K
the limit is 4TB, and UFS1 can reasonably support a few more doublings
of the file system size using a few more doublings of the fragment
size.  However2, UFS1 has an overflow bug converting fs block numbers
to i/o block numbers.  Overflow occurs at i/o block number 2^31 so
there is the same 1TB limit as in systems that have a limit of 2^31
on the i/o block number.

 Other reports say it's a softlimit imposed
 somewhere, some say to make the frag size in newfs to 1024B for a 2TB max
 volume, it has to be dedicated, it has to be non-dedicated... the list of
 suggestions goes on and on.

For UFS1, this only works in FreeBSD-5.x with an overflow bug (and
possibly other bugs) fixed.

Fix for an overflow bug:

%%%
Index: fs.h
===
RCS file: /home/ncvs/src/sys/ufs/ffs/fs.h,v
retrieving revision 1.40
diff -u -2 -r1.40 fs.h
--- fs.h	16 Nov 2003 07:08:27 -0000	1.40
+++ fs.h	16 Nov 2003 11:30:26 -0000
@@ -491,5 +491,5 @@
  * This maps filesystem blocks to device size blocks.
  */
-#define fsbtodb(fs, b) ((b) << (fs)->fs_fsbtodb)
+#define fsbtodb(fs, b) ((daddr_t)(b) << (fs)->fs_fsbtodb)
 #define dbtofsb(fs, b) ((b) >> (fs)->fs_fsbtodb)

%%%
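
The effect of widening before the shift can be seen in a small
standalone sketch.  The function names are hypothetical, and the narrow
version uses unsigned 32-bit arithmetic so that the wraparound is well
defined in C; it therefore wraps at 2^32, whereas the signed block
number in the real code already goes wrong at 2^31.  A shift of 2
corresponds to 2K fragments on 512-byte i/o blocks.

```c
#include <stdint.h>

/* 32-bit shift: the result wraps once it reaches 2^32. */
static uint32_t
fsbtodb32(uint32_t b, unsigned shift)
{
    return b << shift;
}

/* Widen to 64 bits first, then shift, so no bits are lost. */
static int64_t
fsbtodb64(uint32_t b, unsigned shift)
{
    return (int64_t)b << shift;
}
```

Fragment number 2^29 maps to i/o block 2^31, the point where a signed
32-bit result would overflow; the widened version returns it intact.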

Bruce


Re: Followup to fallback to PIO mode on dual processor AMD systems

2003-01-02 Thread Bruce Evans
On Thu, 2 Jan 2003, Bruce Campbell wrote:

 At present, I don't suspect bad media because the error message is
 WRITE command timeout tag=0 serv=0 which doesn't suggest a specific
 sector/track etc, and running with UDMA33 instead of UDMA100 makes the problem
 appear to vanish.

The fallback is clearly wrong because it turns isolated media errors
into pessimized i/o for the whole disk at best, system hangs during
resets next best, and system crashes at worst.  I keep a disk with bad
media on line for testing some of this, and zap the fallback using the
following patch (hope this is complete; it was edited from a larger
patch).

%%%
Index: ata-disk.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.139
diff -u -2 -r1.139 ata-disk.c
--- ata-disk.c	17 Dec 2002 16:26:22 -0000	1.139
+++ ata-disk.c	18 Dec 2002 01:03:37 -0000
@@ -597,5 +606,5 @@
         else {
             ata_dmainit(adp->device, ata_pmode(adp->device->param), -1, -1);
-            printf(" falling back to PIO mode\n");
+            printf(" NOT falling back to PIO mode\n");
         }
         TAILQ_INSERT_HEAD(&adp->device->channel->ata_queue, request, chain);
@@ -603,4 +612,5 @@
     }
 
+#if 0
     /* if using DMA, try once again in PIO mode */
     if (request->flags & ADR_F_DMA_USED) {
@@ -613,4 +623,5 @@
         return ATA_OP_FINISHED;
     }
+#endif
 
     request->flags |= ADR_F_ERROR;
%%%

Bruce

