Re: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]
On Wed, Aug 18, 2010 at 12:48:53PM +0200, Ed Schouten wrote:
> Hi Daichi,
>
> I think Keith Packard of Xorg once wrote a commit message along the lines of "5000 lines of code removed, feature added". This seems to be similar, albeit on a smaller scale. ;-)
>
> Apart from this issue with unionfs, I am also experiencing another issue, where for some reason I cannot perform a second mount of the CD right after booting the system. Basically, my WIP FreeBSD boot CD does the following (but written in C):
>
>   mount -t cd9660 /dev/iso9660/freebsd /mnt
>   mount -t tmpfs none /tmp
>   mount -t unionfs /tmp /mnt
>   mount -t devfs none /mnt/dev
>   chroot /mnt /sbin/init
>
> The first step fails with EBUSY. I use the following hack to get it working, but I don't think it's the proper way to solve it:

What you are trying to do here is to mount /dev/iso9660/freebsd for the second time? This is not supported. The check is there to prevent doing this, as it will panic on you when you try to unmount the first mount (not really a problem in your case, as the first mount is /, so you probably don't want to unmount it, but it is a problem in general).

You should be able to reproduce the panic with your patch applied by doing the following:

  # mount -t cd9660 /dev/iso9660/freebsd /mnt0
  # mount -t cd9660 /dev/iso9660/freebsd /mnt1
  # umount /mnt0

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: Interpreted language(s) in the base
Gabor PALI p...@freebsd.org writes:
> Dag-Erling Smørgrav d...@des.no writes:
> > Gabor PALI p...@freebsd.org writes:
> > > Sorry for chiming in, just a quick idea. If you find the "get a high-level language that compiled to C" idea good,
> > I don't think it's a good idea
> Could you be more specific on your concerns? I am just curious.

If we want compiled code, we already have C and C++. What we need is an interpreted language with good libraries so people can write scripts and one-liners without having to jump through too many hoops and worry about quoting. The result may not be as fast as a compiled program, but it will be much faster than a shell script and take less time to write than either C or shell.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: CD/DVD ejecting after sysinstall
Randi Harper sek...@gmail.com writes:
> Rui Paulo rpa...@freebsd.org writes:
> > You are correct. We should not be ejecting the CD without a prompt. If the commit is reverted, it should be explicitly noted in the code so that we don't make this mistake again.
> That's a judgement call, not an absolute. I think what we are doing isn't a problem for 99.999% of use cases.

On the rare occasions where I use sysinstall, I usually find that prompt annoying... but I almost broke a CD drive once by ejecting the tray with the enclosure's dust cover half closed.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
Re: memory file system
gahn ipfr...@yahoo.com writes:
> I am running 8.1. Under /dev, I don't see /dev/md0,

/dev/md0 won't show up until you actually run mdconfig.

> so I am trying to add the following lines to the kernel config file and got error messages:
>
> options MFS #Memory Filesystem

The correct line is "device md", but mdconfig(8) will automatically load the module, so you don't need it.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
Re: why GNU grep is fast
Mike Haertel m...@ducky.net writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you can break it down into a small number of cases with known length and final character.

> GNU grep uses raw Unix input system calls and avoids copying data after reading it.

Yes, that was the first thing we looked at (and fixed) in BSD grep.

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

Amen. The current bottleneck in BSD grep is the memchr() that looks for '\n' in the input buffer.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
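For readers who have not seen the skip-table idea quoted above, here is a minimal sketch in Python. It implements the Horspool simplification of Boyer-Moore (a single bad-character table), not GNU grep's actual code, and the function name horspool_find is made up for this illustration.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: for each byte value, how far the window may shift when
    # that byte is aligned with the last position of the needle.
    skip = [m] * 256
    for i in range(m - 1):
        skip[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        last = haystack[pos + m - 1]
        # Look at the final byte of the window first; most windows are
        # rejected here without touching the other bytes.
        if last == needle[-1] and haystack[pos:pos + m] == needle:
            return pos
        pos += skip[last]
    return -1
```

The longer the needle, the larger the typical shift, which is why a fixed-string search of this kind can skip most input bytes entirely.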
Re: [head tinderbox] failure on powerpc64/powerpc
Nathan Whitehorn nwhiteh...@freebsd.org writes:
> Dag-Erling Smørgrav d...@des.no writes:
> > I'm not sure I understand what you mean (or rather, how it would help the tinderbox). What *would* help would be an easy way to determine, *before* trying to build it, whether a specific kernel config is appropriate for a specific target. Can you think of an easier way to do this than to scan the config for the machine line?
> That's exactly what I proposed. You use config, before trying the build, to look up the machine specification for the config file. I sent you a 5 line patch to tinderbox.pl that does this by private email.

Here's a solution that works regardless of config(8) version, though I'm not sure it qualifies as either easy or clean:

Index: tinderbox.pl
===
RCS file: /home/projcvs/projects/tinderbox/tinderbox.pl,v
retrieving revision 1.68
diff -u -r1.68 tinderbox.pl
--- tinderbox.pl 25 Aug 2009 17:28:14 - 1.68
+++ tinderbox.pl 22 Aug 2010 12:08:46 -
@@ -722,10 +722,29 @@
     }
 
     # Build additional kernels
+  kernel:
     foreach my $kernel (sort(keys(%kernels))) {
         if (! -f "$srcdir/sys/$machine/conf/$kernel") {
             warning("no kernel config for $kernel");
-            next;
+            next kernel;
+        }
+        # Hack: check that the config is appropriate for this target.
+        # If no machine declaration is present, assume that it is.
+        local *KERNCONF;
+        if (open(KERNCONF, "<", "$srcdir/sys/$machine/conf/$kernel")) {
+            while (<KERNCONF>) {
+                next unless m/^machine\s+(\w+(?:\s+\w+)?)\s*(?:\#.*)?$/;
+                if ($1 !~ m/^\Q$machine\E(\s+\Q$arch\E)?$/) {
+                    warning("skipping $kernel");
+                    close(KERNCONF);
+                    next kernel;
+                }
+                last;
+            }
+            close(KERNCONF);
+        } else {
+            warning("$kernel: $!");
+            next kernel;
         }
         logstage("building $kernel kernel");
         logenv();

It will break if the machine declaration ever moves into an included file, since it does not follow include statements, but it will do for now.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
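The check in the patch is easier to experiment with outside the tinderbox. The following is a rough Python sketch of the same logic, reusing the two regular expressions from the Perl patch; the function name config_matches_target and the idea of passing the config as a string are assumptions made for this example, not part of tinderbox.pl.

```python
import re

# Same pattern as the patch: a "machine" line with one or two words,
# optionally followed by a comment.
MACHINE_RE = re.compile(r"^machine\s+(\w+(?:\s+\w+)?)\s*(?:#.*)?$")


def config_matches_target(config_text: str, machine: str, arch: str) -> bool:
    """Return True if the kernel config looks appropriate for machine/arch."""
    for line in config_text.splitlines():
        found = MACHINE_RE.match(line)
        if not found:
            continue
        # Only the first machine declaration is considered, as in the patch.
        target_re = r"^%s(\s+%s)?$" % (re.escape(machine), re.escape(arch))
        return re.match(target_re, found.group(1)) is not None
    # No machine declaration at all: assume the config is appropriate.
    return True
```

Like the patch, this sketch does not follow include statements, so it shares the limitation mentioned above.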
Re: Interpreted language(s) in the base
On 22.08.2010 13:21, Dag-Erling Smørgrav wrote:
> Gabor PALI p...@freebsd.org writes:
> > Dag-Erling Smørgrav d...@des.no writes:
> > > Gabor PALI p...@freebsd.org writes:
> > > > Sorry for chiming in, just a quick idea. If you find the "get a high-level language that compiled to C" idea good,
> > > I don't think it's a good idea
> > Could you be more specific on your concerns? I am just curious.
> If we want compiled code, we already have C and C++. What we need is an interpreted language with good libraries so people can write scripts and one-liners without having to jump through too many hoops and worry about quoting. The result may not be as fast as a compiled program, but it will be much faster than a shell script and take less time to write than either C or shell.

Looks a bit like a swing. First we remove Perl from the base system (years ago) and move to sed/awk, now we discuss using a scripting language in the base system...
[head tinderbox] failure on powerpc64/powerpc
TB --- 2010-08-22 11:13:56 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-08-22 11:13:56 - starting HEAD tinderbox run for powerpc64/powerpc
TB --- 2010-08-22 11:13:56 - cleaning the object tree
TB --- 2010-08-22 11:14:44 - cvsupping the source tree
TB --- 2010-08-22 11:14:44 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/HEAD/powerpc64/powerpc/supfile
TB --- 2010-08-22 11:15:15 - building world
TB --- 2010-08-22 11:15:15 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-22 11:15:15 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-22 11:15:15 - TARGET=powerpc
TB --- 2010-08-22 11:15:15 - TARGET_ARCH=powerpc64
TB --- 2010-08-22 11:15:15 - TZ=UTC
TB --- 2010-08-22 11:15:15 - __MAKE_CONF=/dev/null
TB --- 2010-08-22 11:15:15 - cd /src
TB --- 2010-08-22 11:15:15 - /usr/bin/make -B buildworld
World build started on Sun Aug 22 11:15:15 UTC 2010
Rebuilding the temporary build tree
stage 1.1: legacy release compatibility shims
stage 1.2: bootstrap tools
stage 2.1: cleaning up the object tree
stage 2.2: rebuilding the object tree
stage 2.3: build tools
stage 3: cross tools
stage 4.1: building includes
stage 4.2: building libraries
stage 4.3: make dependencies
stage 4.4: building everything
stage 5.1: building 32 bit shim libraries
World build completed on Sun Aug 22 12:58:14 UTC 2010
TB --- 2010-08-22 12:58:14 - generating LINT kernel config
TB --- 2010-08-22 12:58:14 - cd /src/sys/powerpc/conf
TB --- 2010-08-22 12:58:14 - /usr/bin/make -B LINT
TB --- 2010-08-22 12:58:14 - building LINT kernel
TB --- 2010-08-22 12:58:14 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-22 12:58:14 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-22 12:58:14 - TARGET=powerpc
TB --- 2010-08-22 12:58:14 - TARGET_ARCH=powerpc64
TB --- 2010-08-22 12:58:14 - TZ=UTC
TB --- 2010-08-22 12:58:14 - __MAKE_CONF=/dev/null
TB --- 2010-08-22 12:58:14 - cd /src
TB --- 2010-08-22 12:58:14 - /usr/bin/make -B buildkernel KERNCONF=LINT
Kernel build for LINT started on Sun Aug 22 12:58:14 UTC 2010
stage 1: configuring the kernel
stage 2.1: cleaning up the object tree
stage 2.2: rebuilding the object tree
stage 2.3: build tools
stage 3.1: making dependencies
stage 3.2: building everything
Kernel build for LINT completed on Sun Aug 22 13:24:10 UTC 2010
TB --- 2010-08-22 13:24:10 - building GENERIC kernel
TB --- 2010-08-22 13:24:10 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-08-22 13:24:10 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-08-22 13:24:10 - TARGET=powerpc
TB --- 2010-08-22 13:24:10 - TARGET_ARCH=powerpc64
TB --- 2010-08-22 13:24:10 - TZ=UTC
TB --- 2010-08-22 13:24:10 - __MAKE_CONF=/dev/null
TB --- 2010-08-22 13:24:10 - cd /src
TB --- 2010-08-22 13:24:10 - /usr/bin/make -B buildkernel KERNCONF=GENERIC
Kernel build for GENERIC started on Sun Aug 22 13:24:10 UTC 2010
stage 1: configuring the kernel
stage 2.1: cleaning up the object tree
stage 2.2: rebuilding the object tree
stage 2.3: build tools
stage 3.1: making dependencies
stage 3.2: building everything
[...]
/src/sys/dev/ofw/ofw_standard.c:705: warning: cast to pointer from integer of different size
/src/sys/dev/ofw/ofw_standard.c: In function 'ofw_std_release':
/src/sys/dev/ofw/ofw_standard.c:719: warning: cast from pointer to integer of different size
/src/sys/dev/ofw/ofw_standard.c:724: warning: cast from pointer to integer of different size
/src/sys/dev/ofw/ofw_standard.c: In function 'ofw_std_enter':
/src/sys/dev/ofw/ofw_standard.c:742: warning: cast from pointer to integer of different size
/src/sys/dev/ofw/ofw_standard.c: In function 'ofw_std_exit':
/src/sys/dev/ofw/ofw_standard.c:760: warning: cast from pointer to integer of different size
*** Error code 1

Stop in /obj/powerpc.powerpc64/src/sys/GENERIC.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-08-22 13:29:24 - WARNING: /usr/bin/make returned exit code 1
TB --- 2010-08-22 13:29:24 - ERROR: failed to build GENERIC kernel
TB --- 2010-08-22 13:29:24 - 5654.18 user 1511.02 system 8127.36 real

http://tinderbox.freebsd.org/tinderbox-head-HEAD-powerpc64-powerpc.full
Re: softupdate with journal panic
On Sat, Aug 21, 2010 at 01:49:45PM -0400, Michael Butler wrote:
> While updating sysutils/coreutils port on -current as of this morning (SVN r211550), I noted a panic during the directory rename config test.

Your problem seems identical to this report:
http://docs.freebsd.org/cgi/mid.cgi?AANLkTinPjiOV21kDLZYV5WScrhLMN7DY8E8jVHWPU5mC

- Peter
[CFT] Improved ZFS metaslab code (faster write speed)
Dear FreeBSD community,

many of our [2] (and Solaris [3]) users today are complaining about slow ZFS writes. One of the causes of these slow writes is the selection of the allocation method for new blocks [3] [4]. Another issue is a write slowdown during TXG sync times.

Solaris 10 (and OpenSolaris up to November 2009) have the following scenario:
- pool has more than 30% free space: use first fit method [1]
- pool has less than 30% free space: use best fit method [1]

This causes a major slowdown of the writes if we go below 30% of free space. On large pools, 30% may be terabytes of free space.

OpenSolaris changed this in November 2009, and the Oracle Storage Appliances also included the new code in Q1/2010 [1].

The source [1] states that with this change they achieved a speedup of:
50% Improved OLTP Performance, 70% Reduced Variability, 200% Improvement on MS Exchange

I would like to issue a Call For Testing for the following 9-CURRENT patch:
http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab.patch

To apply the patch against 8-STABLE, you need to apply the v15 update first:
http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch

The patch includes the following OpenSolaris onnv revisions:
10921 (partial), 11146, 11728, 12047

And covers the following Bug IDs:
6826241 Sync write IOPS drops dramatically during TXG sync
6869229 zfs should switch to shiny new metaslabs more frequently
6917066 zfs block picking can be improved
6918420 zdb -m has issues printing metaslab statistics

References:
[1] http://blogs.sun.com/roch/entry/doubling_exchange_performance
[2] http://forums.freebsd.org/showthread.php?t=8270
[3] http://blogs.everycity.co.uk/alasdair/2010/07/zfs-runs-really-slowly-when-free-disk-usage-goes-above-80/
[4] http://blogs.sun.com/bonwick/entry/zfs_block_allocation
[5] http://blogs.sun.com/bonwick/entry/space_maps
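To make the first fit / best fit distinction above concrete, here is a toy sketch in Python. It is not the ZFS metaslab code: the free space is modelled as a plain list of (offset, length) tuples, and the function names are invented for this illustration.

```python
def first_fit(free_segments, size):
    """Return the first (offset, length) segment that can hold size, or None."""
    for offset, length in free_segments:
        if length >= size:
            return (offset, length)
    return None


def best_fit(free_segments, size):
    """Return the smallest (offset, length) segment that can hold size, or None."""
    candidates = [seg for seg in free_segments if seg[1] >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda seg: seg[1])


# Hypothetical free list: three holes of 8, 4 and 64 blocks.
free = [(0, 8), (16, 4), (32, 64)]
print(first_fit(free, 4))  # takes the first hole that is large enough
print(best_fit(free, 4))   # takes the tightest hole
```

In this naive version first fit can stop at the first usable hole, while best fit has to consider every candidate. A real allocator uses different data structures, so this only illustrates the policy difference, not the actual cost.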
Re: Interpreted language(s) in the base
Matthias Andree mand...@freebsd.org writes:
> Looks a bit like a swing. First we remove Perl from the base system (years ago) and move to sed/awk, now we discuss using a scripting language in the base system...

Read the discussion from the beginning. We are discussing introducing a domain-specific scripting language, not a general-purpose one.

BTW, most of the Perl scripts we had were rewritten in C, not sed / awk.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
Re: why GNU grep is fast
In article 86k4nikglg@ds4.des.no you write:
> Mike Haertel m...@ducky.net writes:
> > GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
> Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character.

The common case of regexps used in the grep utility (and, for obvious reasons, universal in the fgrep utility) is fixed-length search strings. Even non-fixed-length regexps typically consist of one or two variable-length parts. Matching a completely variable-length regexp is just hard, computationally, so it's OK for it to be slower.

There are other tricks you can do, such as turning the anchors ^ and $ into explicit newlines in your search -- ^foo is a very common regexp to search for, and it's really a fixed-string search for \nfoo which is entirely amenable to the B-M treatment. You just have to remember that a matched newline isn't part of the result.

The GNU regexp library also uses the Boyer-Moore (or is it Boyer-Moore-Gosper?) strategy.

-GAWollman
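The anchor trick described above can be shown in a few lines. This is only a sketch in Python, with bytes.find standing in for whatever fixed-string search (Boyer-Moore or otherwise) a real grep would use; the function name is an assumption made for the example.

```python
def find_line_start_match(data: bytes, word: bytes) -> int:
    """Return the offset of the first line that starts with word, or -1.

    Treats the regexp ^word as a fixed-string search for b"\\n" + word.
    """
    # The very first line has no preceding newline, so check it separately.
    if data.startswith(word):
        return 0
    hit = data.find(b"\n" + word)
    # The matched newline belongs to the previous line, so step past it.
    return -1 if hit < 0 else hit + 1
```

For example, searching b"bar\nfoo baz\n" for ^foo reports offset 4, the position of the "f", not the position of the newline that was part of the search string.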
Re: [CFT] Improved ZFS metaslab code (faster write speed)
2010/8/22 Martin Matuska m...@freebsd.org:
> Dear FreeBSD community,
>
> many of our [2] (and Solaris [3]) users today are complaining about slow ZFS writes. One of the causes of these slow writes is the selection of the allocation method for new blocks [3] [4]. Another issue is a write slowdown during TXG sync times.
>
> Solaris 10 (and OpenSolaris up to November 2009) have the following scenario:
> - pool has more than 30% free space: use first fit method [1]
> - pool has less than 30% free space: use best fit method [1]
>
> This causes a major slowdown of the writes if we go below 30% of free space. On large pools, 30% may be terabytes of free space.
>
> OpenSolaris changed this in November 2009, and the Oracle Storage Appliances also included the new code in Q1/2010 [1].
>
> The source [1] states that with this change they achieved a speedup of:
> 50% Improved OLTP Performance, 70% Reduced Variability, 200% Improvement on MS Exchange
>
> I would like to issue a Call For Testing for the following 9-CURRENT patch:
> http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab.patch
>
> To apply the patch against 8-STABLE, you need to apply the v15 update first:
> http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch

This one has not applied cleanly for the last few minutes:

# svn log -l 1 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
r211599 | avg | 2010-08-22 10:18:32 +0200 (Sun 22 Aug 2010) | 7 lines

Fix a mismerge in r211581, MFC of r210427

This is a direct commit.

Reported by: many
Pointyhat to: avg

But it does not seem hard to correct. Do you want me to submit an updated patch for 8-STABLE?

> The patch includes the following OpenSolaris onnv revisions:
> 10921 (partial), 11146, 11728, 12047
>
> And covers the following Bug IDs:
> 6826241 Sync write IOPS drops dramatically during TXG sync
> 6869229 zfs should switch to shiny new metaslabs more frequently
> 6917066 zfs block picking can be improved
> 6918420 zdb -m has issues printing metaslab statistics
>
> References:
> [1] http://blogs.sun.com/roch/entry/doubling_exchange_performance
> [2] http://forums.freebsd.org/showthread.php?t=8270
> [3] http://blogs.everycity.co.uk/alasdair/2010/07/zfs-runs-really-slowly-when-free-disk-usage-goes-above-80/
> [4] http://blogs.sun.com/bonwick/entry/zfs_block_allocation
> [5] http://blogs.sun.com/bonwick/entry/space_maps

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org         - against HTML email vCards   X
www: http://www.gid0.org    - against proprietary attachments / \

There are only 10 kinds of people in the world: those who understand binary, and those who don't.
Re: why GNU grep is fast
2010/8/22 Dag-Erling Smørgrav d...@des.no:
> Amen. The current bottleneck in BSD grep is the memchr() that looks for '\n' in the input buffer.

FYI I actually have a rewritten memchr() which is faster than the current one here:

http://people.freebsd.org/~delphij/for_review/memchr.c

Review/comments welcome. I've done some preliminary validation and benchmarking on this, but I still need to compare it with some hand-optimized assembler implementations that I have seen to decide whether it's worthwhile.

Cheers,
-- 
Xin LI delp...@delphij.net http://www.delphij.net
Re: [CFT] Improved ZFS metaslab code (faster write speed)
Thank you, I have updated the v15 patch for 8-STABLE.

On 22. 8. 2010 17:44, Olivier Smedts wrote:
> 2010/8/22 Martin Matuska m...@freebsd.org:
> > Dear FreeBSD community,
> >
> > many of our [2] (and Solaris [3]) users today are complaining about slow ZFS writes. One of the causes of these slow writes is the selection of the allocation method for new blocks [3] [4]. Another issue is a write slowdown during TXG sync times.
> >
> > Solaris 10 (and OpenSolaris up to November 2009) have the following scenario:
> > - pool has more than 30% free space: use first fit method [1]
> > - pool has less than 30% free space: use best fit method [1]
> >
> > This causes a major slowdown of the writes if we go below 30% of free space. On large pools, 30% may be terabytes of free space.
> >
> > OpenSolaris changed this in November 2009, and the Oracle Storage Appliances also included the new code in Q1/2010 [1].
> >
> > The source [1] states that with this change they achieved a speedup of:
> > 50% Improved OLTP Performance, 70% Reduced Variability, 200% Improvement on MS Exchange
> >
> > I would like to issue a Call For Testing for the following 9-CURRENT patch:
> > http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab.patch
> >
> > To apply the patch against 8-STABLE, you need to apply the v15 update first:
> > http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch
>
> This one has not applied cleanly for the last few minutes:
>
> # svn log -l 1 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> r211599 | avg | 2010-08-22 10:18:32 +0200 (Sun 22 Aug 2010) | 7 lines
>
> Fix a mismerge in r211581, MFC of r210427
>
> This is a direct commit.
>
> Reported by: many
> Pointyhat to: avg
>
> But it does not seem hard to correct. Do you want me to submit an updated patch for 8-STABLE?
>
> > The patch includes the following OpenSolaris onnv revisions:
> > 10921 (partial), 11146, 11728, 12047
> >
> > And covers the following Bug IDs:
> > 6826241 Sync write IOPS drops dramatically during TXG sync
> > 6869229 zfs should switch to shiny new metaslabs more frequently
> > 6917066 zfs block picking can be improved
> > 6918420 zdb -m has issues printing metaslab statistics
> >
> > References:
> > [1] http://blogs.sun.com/roch/entry/doubling_exchange_performance
> > [2] http://forums.freebsd.org/showthread.php?t=8270
> > [3] http://blogs.everycity.co.uk/alasdair/2010/07/zfs-runs-really-slowly-when-free-disk-usage-goes-above-80/
> > [4] http://blogs.sun.com/bonwick/entry/zfs_block_allocation
> > [5] http://blogs.sun.com/bonwick/entry/space_maps
Re: why GNU grep is fast
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
> Mike Haertel m...@ducky.net writes:
> > GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
> Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character.

When I was working on making FreeGrep faster (years ago), I wrote down a few notes about possible algorithms, especially those that could be useful for fgrep functionality. I am just passing these on to the list.

Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep: Commentz-Walter
4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/

Sean
-- 
s...@freebsd.org
Re: why GNU grep is fast
* Mike Haertel m...@ducky.net wrote:
> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

I think that implementing a simple fgrep boils down to mmap()ing a file and calling memmem() on the mapping to search for the input string. Of course this relies on having an efficient memmem() implementation, for example using one of the algorithms mentioned in this thread.

-- 
 Ed Schouten e...@80386.nl
 WWW: http://80386.nl/
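The mmap()-plus-memmem() idea above translates almost directly to Python, where the mmap object's find() method plays the role of memmem(). This is a minimal sketch under that assumption; the function name file_contains is made up, and it only answers whether the string occurs, without printing the matching lines.

```python
import mmap
import os


def file_contains(path: str, needle: bytes) -> bool:
    """Return True if needle occurs anywhere in the file at path."""
    with open(path, "rb") as f:
        # An empty file cannot be mapped, so handle it up front.
        if os.fstat(f.fileno()).st_size == 0:
            return needle == b""
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            # find() searches the mapped bytes directly, without
            # splitting the input into lines first.
            return mapping.find(needle) != -1
```

How fast this runs depends entirely on the search routine behind find(), which is the same caveat the message makes about memmem().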
Re: why GNU grep is fast
On Aug 22, 2010, at 8:02 AM, Garrett Wollman wrote:
> In article 86k4nikglg@ds4.des.no you write:
> > Mike Haertel m...@ducky.net writes:
> > > GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
> > Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character.
> The common case of regexps used in the grep utility (and, for obvious reasons, universal in the fgrep utility) is fixed-length search strings. Even non-fixed-length regexps typically consist of one or two variable-length parts.

This is an important point: A good grep implementation will use different strategies depending on the input regexp. Fixed-string matching is a very important special case.

> Matching a completely variable-length regexp is just hard, computationally,

See Russ Cox' articles for why this is not true. It does require considerable sophistication to build an efficient DFA but the actual matcher, once built, can run very fast indeed:

http://swtch.com/~rsc/regexp/

Tim
Re: why GNU grep is fast
In article 20100822163644.gu2...@hoeg.nl you write:
> I think that implementing a simple fgrep boils down to mmap()ing a file and calling memmem() on the mapping to search for the input string. Of course this relies on having an efficient memmem() implementation, for example using one of the algorithms mentioned in this thread.

It's actually more complicated than that, because you have to ensure that you are not matching the middle of a multibyte character, when the current locale specifies a character set with a multibyte encoding. Likewise when searching for the newlines that delimit the matched line. (I'm not sure whether FreeBSD supports any character encodings that would be ambiguous in that way.) I don't think this was considered an issue when Mike Haertel was developing GNU grep.

It seems reasonable to implement BMG or some other fast search in memmem(). Note that if you can't (or don't want to) mmap the whole file at once, you'll need special handling for the boundary conditions -- both at the string search level and at the search for line delimiters. This is much easier in the fgrep case, obviously, since the length of the query puts a finite upper bound on the amount of the old buffer you need to keep -- with regexps you really need your regexp engine to be able to report its matching state, or else limit your input to strictly conforming POSIX text files (i.e., line lengths limited to {LINE_MAX}).

-GAWollman
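The boundary handling for the fixed-string case can be sketched as follows. This is an illustration in Python, not code from any grep: the function name stream_find and the chunk_size parameter are assumptions, and bytes.find stands in for the fast search. The point is the bound mentioned above: only the last len(needle) - 1 bytes of the old buffer need to be kept.

```python
def stream_find(stream, needle: bytes, chunk_size: int = 65536) -> int:
    """Return the absolute offset of the first match in stream, or -1.

    stream is any binary file-like object with a read(n) method.
    """
    if not needle:
        return 0
    keep = len(needle) - 1  # longest prefix of a match that can end a buffer
    tail = b""              # carried-over bytes from the previous buffer
    base = 0                # absolute offset of tail[0] in the stream
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        buf = tail + chunk
        hit = buf.find(needle)
        if hit >= 0:
            return base + hit
        # A match may straddle the boundary, so keep the last
        # len(needle) - 1 bytes and discard everything before them.
        new_tail = buf[-keep:] if keep else b""
        base += len(buf) - len(new_tail)
        tail = new_tail
```

With regexps no such fixed bound exists, which is why the message says the engine would need to expose its matching state instead.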
Re: [head tinderbox] failure on powerpc64/powerpc
On 08/22/10 07:10, Dag-Erling Smørgrav wrote:
> Nathan Whitehorn nwhiteh...@freebsd.org writes:
> > Dag-Erling Smørgrav d...@des.no writes:
> > > I'm not sure I understand what you mean (or rather, how it would help the tinderbox). What *would* help would be an easy way to determine, *before* trying to build it, whether a specific kernel config is appropriate for a specific target. Can you think of an easier way to do this than to scan the config for the machine line?
> > That's exactly what I proposed. You use config, before trying the build, to look up the machine specification for the config file. I sent you a 5 line patch to tinderbox.pl that does this by private email.
>
> Here's a solution that works regardless of config(8) version, though I'm not sure it qualifies as either easy or clean:
>
> Index: tinderbox.pl
> ===
> RCS file: /home/projcvs/projects/tinderbox/tinderbox.pl,v
> retrieving revision 1.68
> diff -u -r1.68 tinderbox.pl
> --- tinderbox.pl 25 Aug 2009 17:28:14 - 1.68
> +++ tinderbox.pl 22 Aug 2010 12:08:46 -
> @@ -722,10 +722,29 @@
>      }
>  
>      # Build additional kernels
> +  kernel:
>      foreach my $kernel (sort(keys(%kernels))) {
>          if (! -f "$srcdir/sys/$machine/conf/$kernel") {
>              warning("no kernel config for $kernel");
> -            next;
> +            next kernel;
> +        }
> +        # Hack: check that the config is appropriate for this target.
> +        # If no machine declaration is present, assume that it is.
> +        local *KERNCONF;
> +        if (open(KERNCONF, "<", "$srcdir/sys/$machine/conf/$kernel")) {
> +            while (<KERNCONF>) {
> +                next unless m/^machine\s+(\w+(?:\s+\w+)?)\s*(?:\#.*)?$/;
> +                if ($1 !~ m/^\Q$machine\E(\s+\Q$arch\E)?$/) {
> +                    warning("skipping $kernel");
> +                    close(KERNCONF);
> +                    next kernel;
> +                }
> +                last;
> +            }
> +            close(KERNCONF);
> +        } else {
> +            warning("$kernel: $!");
> +            next kernel;
>          }
>          logstage("building $kernel kernel");
>          logenv();
>
> It will break if the machine declaration ever moves into an included file, since it does not follow include statements, but it will do for now.

Thanks! I think we are pretty likely to stay in the situation where this hack works for the foreseeable future.

-Nathan
Re: why GNU grep is fast
On Aug 22, 2010, at 9:30 AM, Sean C. Farley wrote: On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote: Mike Haertel m...@ducky.net writes: GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character. Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character. When I was working on making FreeGrep faster (years ago), I wrote down a few notes about possible algorithms, especially those that could be useful for fgrep functionality. I am just passing these onto the list. Some algorithms: 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm 2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm 3. GNU fgrep: Commentz-Walter 4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant) Also, this may be a useful book: http://www.dcc.uchile.cl/~gnavarro/FPMbook/ And of course, Russ Cox' excellent series of articles starting at: http://swtch.com/~rsc/regexp/regexp1.html Later on, he summarizes some of the existing implementations, including comments about the Plan 9 implementation and his own RE2, both of which efficiently handle international text (which seems to be a major concern of Gabor's). The key comment in Mike's GNU grep notes is the one about not breaking into lines. That's simply double-scanning the input; instead, run the matcher over blocks of text and, when it finds a match, work backwards from the match to find the appropriate line beginning. This is efficient because most lines don't match. Boyer-Moore is great for fixed strings (a very common use case for grep) and for more complex patterns that contain long fixed strings (helps to discard most lines early). 
Sophisticated regex matchers implement a number of strategies and choose different ones depending on the pattern. In the case of bsdgrep, it might make sense to use the regex library for the general case but implement a hand-tuned search for fixed strings that can be heavily optimized for that case. Of course, international text support complicates the picture; you have to consider the input character set (if you want to auto-detect Unicode encodings by looking for leading BOMs, for example, you either need to translate the fixed-string pattern to match the input encoding or vice-versa).

Tim
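The "don't break the input into lines" idea from the message above can be sketched as follows. This is only an illustration under stated assumptions, not GNU grep's or bsdgrep's code: the function name `matching_lines` is made up, the needle is assumed to be non-empty and to contain no newline, and Python's `bytes.find` stands in for whatever fixed-string matcher a real grep would use.

```python
def matching_lines(buf: bytes, needle: bytes):
    """Yield every line of buf that contains needle.

    The buffer is never split into lines up front. Line boundaries are
    located only around actual hits, which is cheap when most lines
    do not match.
    """
    pos = 0
    while True:
        hit = buf.find(needle, pos)      # scan the whole block at once
        if hit == -1:
            return
        # Work backwards from the hit to the start of its line.
        # rfind returns -1 when there is no earlier newline, giving start 0.
        start = buf.rfind(b"\n", 0, hit) + 1
        # Work forwards to the end of the line (or end of buffer).
        end = buf.find(b"\n", hit)
        if end == -1:
            end = len(buf)
        yield buf[start:end]
        # Resume after this line so it is reported only once.
        pos = end + 1
```

A real implementation would also have to handle lines that straddle two input blocks; this sketch assumes the whole input is in one buffer.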
Re: why GNU grep is fast
Dag-Erling Smørgrav d...@des.no writes:
> Mike Haertel m...@ducky.net writes:
> > GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
>
> Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character.

GNU grep uses heuristics to look for a fixed string that any string matching the regex *must* contain, and uses that fixed string as the basis for its initial Boyer-Moore search.

For example if your regex is /foo.*bar/, the initial Boyer-Moore search is (probably) searching for foo.

If the initial search succeeds, GNU grep isolates the containing line, and then runs the full regex matcher on that line to make sure.

This is the sort of thing that a good regex library could do internally.

Unfortunately, you can't do this with a library that conforms to the !...@#%$!@#% POSIX regex API.

The problem is that regexec()'s interface is based on NUL-terminated strings, rather than byte-counted buffers. So POSIX regexec() is necessarily character-at-a-time, because it has to look for that input-terminating NUL byte, and also you can't use it to search binary data that might contain NULs. (GNU grep works fine with arbitrary input files, as long as it can allocate enough memory to hold the longest line.)

For these reasons a good grep implementation is pretty much doomed to bundle its own regex matcher.
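A rough sketch of the two-stage approach described above: a fast search for a fixed string that every match must contain, followed by the full regex on the isolated line. The assumptions are loud ones: the caller supplies the required literal by hand (GNU grep derives it with its own heuristics, which are not reproduced here), `bytes.find` stands in for the Boyer-Moore search, and the function name is hypothetical. Python's `re` works on byte-counted buffers, so the embedded NUL in the usage example is not a problem, unlike with a NUL-terminated `regexec()` interface.

```python
import re


def grep_with_prefilter(buf: bytes, pattern: bytes, required: bytes):
    """Return the lines of buf matching pattern.

    `required` is a literal that any match of `pattern` must contain.
    Lines without it are never handed to the regex matcher.
    """
    regex = re.compile(pattern)
    matches = []
    pos = 0
    while True:
        hit = buf.find(required, pos)            # stage 1: fixed-string search
        if hit == -1:
            break
        start = buf.rfind(b"\n", 0, hit) + 1     # isolate the containing line
        end = buf.find(b"\n", hit)
        if end == -1:
            end = len(buf)
        line = buf[start:end]
        if regex.search(line):                   # stage 2: full regex, to make sure
            matches.append(line)
        pos = end + 1                            # continue with the next line
    return matches
```

For example, `grep_with_prefilter(b"foo only\nxx foo\x00 and bar\n", rb"foo.*bar", b"foo")` runs the regex on both lines, because both contain the literal, but reports only the second.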
Re: why GNU grep is fast
Sean C. Farley s...@freebsd.org writes:
> Some algorithms:
> 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm

Aho-Corasick is not really a search algorithm, but an algorithm for constructing a table-driven finite state machine that will match any of the search strings you fed it. I believe it is less efficient than Boyer-Moore for small numbers of search terms, since it scans the entire input. I don't see the point in using it in grep, because grep already has an algorithm for constructing finite state machines: regcomp(3).

> 2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm

It doesn't seem to compare favorably to the far older Aho-Corasick. It uses slightly less memory, but memory is usually not an issue with grep.

> 4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Glimpse is a POS... and not really comparable, because grep is designed to search for a single search string in multiple texts, while glimpse is designed to search a large amount of text over and over with different search strings. I believe it uses suffix tables to construct its index, and Boyer-Moore only to locate specific matches, since the index lists only files, not exact positions. For anything other than fixed strings, it reverts to agrep, but I assume (I haven't looked at the code) that if the regexp has one or more fixed components, it uses those to narrow the search space before running agrep.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
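To make the "table-driven finite state machine" description concrete, here is a compact textbook-style Aho-Corasick sketch. It is not taken from any grep implementation; the function names and the dict-based transition tables are choices made for this illustration. Note that the scan loop visits every input character exactly once, which is the "scans the entire input" cost mentioned above.

```python
from collections import deque


def build_automaton(words):
    """Build goto/fail/output tables for a set of search strings."""
    goto = [{}]        # goto[state][char] -> next state (a trie)
    fail = [0]         # fail[state] -> state to fall back to on a mismatch
    out = [set()]      # out[state] -> words that end when this state is reached
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit words ending at the fallback state
    return goto, fail, out


def find_all(text, words):
    """Return sorted (start_index, word) pairs for every occurrence in text."""
    goto, fail, out = build_automaton(words)
    state = 0
    hits = []
    for i, ch in enumerate(text):        # one pass, no skipping
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)
```

The appeal for fgrep-style use is that all patterns are matched in a single pass, instead of one pass per pattern.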
Re: why GNU grep is fast
Mike Haertel m...@ducky.net writes:
> For example if your regex is /foo.*bar/, the initial Boyer-Moore search is (probably) searching for foo.
>
> If the initial search succeeds, GNU grep isolates the containing line, and then runs the full regex matcher on that line to make sure.

You don't really need to isolate the containing line unless you have an actual match, do you? There are two cases:

1) The regexp does not use any character classes, including /./, so the FSA will stop if it hits EOL before it reaches an accepting state.

2) The regexp uses character classes, and you rewrite them to exclude \n: /[^bar]/ becomes /[^bar\n]/, /./ becomes /[^\n]/, etc., and the FSA will stop if it hits EOL before it reaches an accepting state.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
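The second case can be illustrated with Python's `re` module, used here only as a convenient stand-in for a matcher run over a whole buffer rather than a single line. The sample buffer and patterns are made up for the illustration. A negated class such as `[^bar]` happily matches a newline, so the match can straddle two lines; adding `\n` to the class prevents that. (In Python, `.` already excludes `\n` unless `re.DOTALL` is given, so the /./ rewrite is implicit there.)

```python
import re

# Two lines in one buffer; no single line contains "foo" followed by "qux".
buf = b"foo\nqux bar\n"

# [^bar] also matches the newline, so this match crosses the line boundary.
loose = re.compile(rb"foo[^bar]+qux")

# Rewritten class: the newline is excluded, so a match must stay on one line.
strict = re.compile(rb"foo[^bar\n]+qux")

crosses_line = loose.search(buf) is not None       # True: matched b"foo\nqux"
stays_on_line = strict.search(buf) is None         # True: no match in this buffer
same_line_ok = strict.search(b"foo--qux\n") is not None   # still matches within a line
```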
Re: why GNU grep is fast
Dag-Erling Smørgrav d...@des.no writes:
> Mike Haertel m...@ducky.net writes:
> > For example if your regex is /foo.*bar/, the initial Boyer-Moore search is (probably) searching for foo.
> >
> > If the initial search succeeds, GNU grep isolates the containing line, and then runs the full regex matcher on that line to make sure.
>
> You don't really need to isolate the containing line unless you have an actual match, do you? There are two cases:

Theoretically no. However, suppose the pattern was /foo.*blah/. The Boyer-Moore search will be for blah, since that's the longest fixed substring. But verifying a match for the full regexp either requires a regexp matcher with the feature "start here, at THIS point in the middle of the RE and THAT point in the middle of the buffer, and match backwards and forwards", or else running a more standard RE matcher starting from the beginning of the line.

So, in practice you pretty much have to at least search backwards for the preceding newline.

As you mentioned, you can avoid searching forwards for the next newline if your RE matcher supports using newline as an exit marker.

But if the workload characteristics are that matching lines are scarce compared to the input, this is an optimization that just won't matter much either way.
Re: why GNU grep is fast
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
> Sean C. Farley s...@freebsd.org writes:
> > Some algorithms:
> > 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
>
> Aho-Corasick is not really a search algorithm, but an algorithm for constructing a table-driven finite state machine that will match either of the search strings you fed it. I believe it is less efficient than Boyer-Moore for small numbers of search terms, since it scans the entire input. I don't see the point in using it in grep, because grep already has an algorithm for constructing finite state machines: regcomp(3).

> > especially those that could be useful for fgrep functionality

I was mainly talking about algorithms useful for the fgrep portion within FreeGrep. fgrep would run (still runs?) over the same text for each pattern. Therefore, Aho–Corasick had to be mentioned for the reason referenced within the link: "The Aho–Corasick string matching algorithm formed the basis of the original Unix command fgrep."

> > 2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
>
> It doesn't seem to compare favorably to the far older Aho-Corasick. It uses slightly less memory, but memory is usually not an issue with grep.

I agree, yet I like to keep alternative algorithms in mind in case a variant would be useful.

> > 4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)
>
> Glimpse is a POS... and not really comparable, because grep is designed to search for a single search string in multiple texts, while glimpse is designed to search a large amount of text over and over with different search strings. I believe it uses suffix tables to construct its index, and Boyer-Moore only to locate specific matches, since the index lists only files, not exact positions. For anything other than fixed strings, it reverts to agrep, but I assume (I haven't looked at the code) that if the regexp has one or more fixed components, it uses those to narrow the search space before running agrep.

Glimpse may be a POS; I have not used it personally. I only noted its algorithm for possible use within fgrep. Of course, there may be much better algorithms out there to boost fgrep's speed, but these are what I had found at one time.

Sean
-- 
s...@freebsd.org
Re: why GNU grep is fast
On Sun, 22 Aug 2010, Tim Kientzle wrote:
> On Aug 22, 2010, at 9:30 AM, Sean C. Farley wrote:
> > On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
> > > Mike Haertel m...@ducky.net writes:
> > > > GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
> > >
> > > Boyer-Moore is for fixed search strings. I don't see how that optimization can work with a regexp search unless the regexp is so simple that you break it down into a small number of cases with known length and final character.
> >
> > When I was working on making FreeGrep faster (years ago), I wrote down a few notes about possible algorithms, especially those that could be useful for fgrep functionality. I am just passing these onto the list.
> >
> > Some algorithms:
> > 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
> > 2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
> > 3. GNU fgrep: Commentz-Walter
> > 4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)
> >
> > Also, this may be a useful book: http://www.dcc.uchile.cl/~gnavarro/FPMbook/
>
> And of course, Russ Cox' excellent series of articles starting at: http://swtch.com/~rsc/regexp/regexp1.html

I saved that link from an E-mail earlier because it looked very interesting.

> Later on, he summarizes some of the existing implementations, including comments about the Plan 9 implementation and his own RE2, both of which efficiently handle international text (which seems to be a major concern of Gabor's).

I believe Gabor is considering TRE for a good replacement regex library.

> The key comment in Mike's GNU grep notes is the one about not breaking into lines. That's simply double-scanning the input; instead, run the matcher over blocks of text and, when it finds a match, work backwards from the match to find the appropriate line beginning. This is efficient because most lines don't match.

I do like the idea.

> Boyer-Moore is great for fixed strings (a very common use case for grep) and for more complex patterns that contain long fixed strings (helps to discard most lines early). Sophisticated regex matchers implement a number of strategies and choose different ones depending on the pattern.

That is what fastgrep (in bsdgrep) attempts to accomplish with very simple regex lines (beginning of line, end of line and dot).

> In the case of bsdgrep, it might make sense to use the regex library for the general case but implement a hand-tuned search for fixed strings that can be heavily optimized for that case. Of course, international text support complicates the picture; you have to consider the input character set (if you want to auto-detect Unicode encodings by looking for leading BOMs, for example, you either need to translate the fixed-string pattern to match the input encoding or vice-versa).

BTW, the fastgrep portion of bsdgrep is my fault/contribution to do a faster search bypassing the regex library. :) It certainly was not written with any encodings in mind; it was purely ASCII. As I have not kept up with it, I do not know if anyone improved it or not.

Sean
-- 
s...@freebsd.org
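For readers who have not seen the skip-table idea that Mike's quoted description refers to, here is a minimal sketch of the Horspool simplification of Boyer-Moore for fixed byte strings. It is neither GNU grep's nor fastgrep's actual code; the function name is made up, and like fastgrep as described above it is purely byte-oriented, with no notion of character encodings. It compares the input byte aligned with the pattern's final position first and uses a 256-entry table to decide how far to jump.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # skip[b] = how far the window may shift when byte b is aligned with
    # the needle's last position. Bytes absent from the needle allow a
    # full shift of m; the needle's last byte is deliberately left out.
    skip = [m] * 256
    for i in range(m - 1):
        skip[needle[i]] = m - 1 - i
    pos = 0
    while pos + m <= n:
        last = haystack[pos + m - 1]       # look at the final position first
        if last == needle[-1] and haystack[pos:pos + m] == needle:
            return pos
        pos += skip[last]                  # jump ahead, often by the full needle length
    return -1
```

The longer the needle, the larger the typical jump, which is why long fixed strings are searched faster than short ones.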
runaway intr problems: powerd and/or hw.acpi.cpu.cx_lowest related
Thanks to help from Andriy I've been working on narrowing down the cause of my runaway intr problems and we've found some interesting things.

First, if I neither use powerd nor set hw.acpi.cpu.cx_lowest below C1, things seem to work fine. Using one or the other sort of works, but of the two, powerd seems to cause the most problems.

However, the more interesting thing is that generally the problem seems to be caused by contention on IRQ 20 between the following:

    20 (ehci0)
    20 (uhci0)
    20 (hpet0)

If I set the following in loader.conf:

    kern.eventtimer.timer1=i8254
    kern.eventtimer.timer2=RTC

then everything works (where "everything" is 40 minutes or so of watching a video that previously caused the runaway problem consistently in about 10-20 minutes, although in the past it sometimes took hours to manifest).

Or, if I build a kernel with no USB (so IRQ 20 is no longer shared) then once again, everything works (as above) using:

    kern.eventtimer.timer1: LAPIC
    kern.eventtimer.timer2: HPET

(i.e., the default).

I also got another interesting set of data today from a runaway intr situation that did not involve swi:4. The symptoms were the same as previously, but the devices involved were totally different. This may have to do with the fact that I switched back to ULE for the testing today, and/or I hadn't set cx_lowest=C3.

http://people.freebsd.org/~dougb/intr-out-3.txt

This was with ULE + USB in the kernel, LAPIC/HPET, cx_lowest=C1, but running powerd with the following:

    powerd_flags="-a adaptive -b adaptive -n adaptive"

... and so it goes,

Doug
-- 
Improve the effectiveness of your Internet presence with a domain name makeover! http://SupersetSolutions.com/
Computers are useless. They can only give you answers. -- Pablo Picasso