Re: memmem issues
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Corinna Vinschen on 12/20/2007 3:11 AM: | + /* FIXME - this algorithm is worst-case O(l_len*s_len)... | | or what about Boyer-Moore instead: | | http://de.wikipedia.org/wiki/Boyer-Moore-Algorithmus (in German) | | Using one of them is certainly not a licensing violation since all code | examples are more or less the published examples from well-known | textbooks (Knuth, Sedgewick, et al.). Given that, I don't think you're | actually tainted. An actual implementation would be much better than | a forlorn comment in an unimpressive file in some subdirectory. I took you up on that, and submitted an even better implementation to the newlib list, shared among memmem, strstr, and strcasestr (Knuth-Morris-Pratt and Boyer-Moore both require memory allocation, but not Two-Way). If Jeff gives the go-ahead for newlib, then we'll need to delete cygwin's copy of memmem.cc. - -- Don't work too hard, make some time for fun as well! Eric Blake [EMAIL PROTECTED] -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHhu6l84KuGfSFAYARAg4WAJ9+8FkRcJlFaFYG/ouvK+4x/VQIlQCeJ03y e9u22aTS92xNaLELTW+otK4= =9rXA -END PGP SIGNATURE-
Re: memmem issues
On Dec 19 21:01, Eric Blake wrote: Index: libc/memmem.cc === RCS file: /cvs/src/src/winsup/cygwin/libc/memmem.cc,v retrieving revision 1.1 diff -u -p -r1.1 memmem.cc --- libc/memmem.cc8 Nov 2005 22:08:39 - 1.1 +++ libc/memmem.cc20 Dec 2007 03:56:35 - @@ -45,8 +45,8 @@ memmem (const void *l, size_t l_len, const char *cs = (const char *)s; /* we need something to compare */ - if (l_len == 0 || s_len == 0) -return NULL; + if (s_len == 0) +return l; Looks like this is actually more correct. Glibc and NetBSD both agree with you, while only the FreeBSD function seems to return NULL in this case. However, it's not quite ok. l is const void * while the function returns void *. I applied this part of your patch with the obvious cast. + /* FIXME - this algorithm is worst-case O(l_len*s_len), but using + Knuth-Morris-Pratt would be O(l_len+s_len) at the expense of a + memory allocation of s_len bytes. Consider rewriting this to + swap over the KMP if the first few iterations fail, and back to + this if KMP can't allocate enough memory. */ for (cur = (char *) cl; cur = last; cur++) if (cur[0] == cs[0] memcmp (cur, cs, s_len) == 0) return cur; What about just using documented example code, for instance from here: http://www-igm.univ-mlv.fr/~lecroq/string/node8.html or here: http://de.wikipedia.org/wiki/Knuth-Morris-Pratt-Algorithmus (in German) or what about Boyer-Moore instead: http://de.wikipedia.org/wiki/Boyer-Moore-Algorithmus (in German) Using one of them is certainly not a licensing violation since all code examples are more or less the published examples from well-known textbooks (Knuth, Sedgewick, et al.). Given that, I don't think you're actually tainted. An actual implementation would be much better than a forlorn comment in an unimpressive file in some subdirectory. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat
Re: memmem issues
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Eric Blake on 12/19/2007 2:22 PM: memmem isn't standardized, but even so, it has some bugs based on comparison with strstr. memmem(haystack,len,,0) should return haystack, not NULL. This should be an easy one-line fix. memmem currently has O(m*n) worst-case complexity (quadratic, when haystack and needle are both long). But with the Knuth-Morris-Pratt algorithm and a memory allocation of size n, this could be trimmed to O(m+n) (linear). Here's a patch: 2007-12-20 Eric Blake [EMAIL PROTECTED] * libc/memmem.cc (memmem): Fix bug when searching for empty string. Document O(n^2) bug. - -- Don't work too hard, make some time for fun as well! Eric Blake [EMAIL PROTECTED] -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHaekM84KuGfSFAYARArgXAJ9lz/Rk3GnD4i5q4Jy557TzVQoGoQCfb61v 4TUb18LiAQIdCasBx9nwFdw= =5Vbo -END PGP SIGNATURE- Index: libc/memmem.cc === RCS file: /cvs/src/src/winsup/cygwin/libc/memmem.cc,v retrieving revision 1.1 diff -u -p -r1.1 memmem.cc --- libc/memmem.cc 8 Nov 2005 22:08:39 - 1.1 +++ libc/memmem.cc 20 Dec 2007 03:56:35 - @@ -45,8 +45,8 @@ memmem (const void *l, size_t l_len, const char *cs = (const char *)s; /* we need something to compare */ - if (l_len == 0 || s_len == 0) -return NULL; + if (s_len == 0) +return l; /* s must be smaller or equal to l */ if (l_len s_len) @@ -59,6 +59,11 @@ memmem (const void *l, size_t l_len, /* the last position where its possible to find s in l */ last = (char *) cl + l_len - s_len; + /* FIXME - this algorithm is worst-case O(l_len*s_len), but using + Knuth-Morris-Pratt would be O(l_len+s_len) at the expense of a + memory allocation of s_len bytes. Consider rewriting this to + swap over the KMP if the first few iterations fail, and back to + this if KMP can't allocate enough memory. */ for (cur = (char *) cl; cur = last; cur++) if (cur[0] == cs[0] memcmp (cur, cs, s_len) == 0) return cur;