Re: memmem issues

2008-01-10 Thread Eric Blake

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

According to Corinna Vinschen on 12/20/2007 3:11 AM:
| +  /* FIXME - this algorithm is worst-case O(l_len*s_len)...
|
| or what about Boyer-Moore instead:
|
|   http://de.wikipedia.org/wiki/Boyer-Moore-Algorithmus (in German)
|
| Using one of them is certainly not a licensing violation since all code
| examples are more or less the published examples from well-known
| textbooks (Knuth, Sedgewick, et al.).  Given that, I don't think you're
| actually tainted.  An actual implementation would be much better than
| a forlorn comment in an unimpressive file in some subdirectory.

I took you up on that, and submitted an even better implementation to the
newlib list, shared among memmem, strstr, and strcasestr
(Knuth-Morris-Pratt and Boyer-Moore both require memory allocation, but
not Two-Way).  If Jeff gives the go-ahead for newlib, then we'll need to
delete cygwin's copy of memmem.cc.

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhu6l84KuGfSFAYARAg4WAJ9+8FkRcJlFaFYG/ouvK+4x/VQIlQCeJ03y
e9u22aTS92xNaLELTW+otK4=
=9rXA
-END PGP SIGNATURE-


Re: memmem issues

2007-12-20 Thread Corinna Vinschen
On Dec 19 21:01, Eric Blake wrote:
 Index: libc/memmem.cc
 ===
 RCS file: /cvs/src/src/winsup/cygwin/libc/memmem.cc,v
 retrieving revision 1.1
 diff -u -p -r1.1 memmem.cc
 --- libc/memmem.cc8 Nov 2005 22:08:39 -   1.1
 +++ libc/memmem.cc20 Dec 2007 03:56:35 -
 @@ -45,8 +45,8 @@ memmem (const void *l, size_t l_len,
const char *cs = (const char *)s;
  
/* we need something to compare */
 -  if (l_len == 0 || s_len == 0)
 -return NULL;
 +  if (s_len == 0)
 +return l;

Looks like this is actually more correct.  Glibc and NetBSD both
agree with you, while only the FreeBSD function seems to return NULL
in this case.  However, it's not quite ok.  l is const void * while the
function returns void *.  I applied this part of your patch with the
obvious cast.

 +  /* FIXME - this algorithm is worst-case O(l_len*s_len), but using
 + Knuth-Morris-Pratt would be O(l_len+s_len) at the expense of a
 + memory allocation of s_len bytes.  Consider rewriting this to
 + swap over the KMP if the first few iterations fail, and back to
 + this if KMP can't allocate enough memory.  */
for (cur = (char *) cl; cur = last; cur++)
  if (cur[0] == cs[0]  memcmp (cur, cs, s_len) == 0)
return cur;

What about just using documented example code, for instance from here:

  http://www-igm.univ-mlv.fr/~lecroq/string/node8.html

or here:

  http://de.wikipedia.org/wiki/Knuth-Morris-Pratt-Algorithmus (in German)

or what about Boyer-Moore instead:

  http://de.wikipedia.org/wiki/Boyer-Moore-Algorithmus (in German)

Using one of them is certainly not a licensing violation since all code
examples are more or less the published examples from well-known
textbooks (Knuth, Sedgewick, et al.).  Given that, I don't think you're
actually tainted.  An actual implementation would be much better than
a forlorn comment in an unimpressive file in some subdirectory.


Thanks,
Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat


Re: memmem issues

2007-12-19 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

According to Eric Blake on 12/19/2007 2:22 PM:
 memmem isn't standardized, but even so, it has some bugs based on comparison 
 with strstr.
 
 memmem(haystack,len,,0) should return haystack, not NULL.  This should be 
 an 
 easy one-line fix.
 
 memmem currently has O(m*n) worst-case complexity (quadratic, when haystack 
 and 
 needle are both long).  But with the Knuth-Morris-Pratt algorithm and a 
 memory 
 allocation of size n, this could be trimmed to O(m+n) (linear).

Here's a patch:

2007-12-20  Eric Blake  [EMAIL PROTECTED]

* libc/memmem.cc (memmem): Fix bug when searching for empty
string.  Document O(n^2) bug.

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHaekM84KuGfSFAYARArgXAJ9lz/Rk3GnD4i5q4Jy557TzVQoGoQCfb61v
4TUb18LiAQIdCasBx9nwFdw=
=5Vbo
-END PGP SIGNATURE-
Index: libc/memmem.cc
===
RCS file: /cvs/src/src/winsup/cygwin/libc/memmem.cc,v
retrieving revision 1.1
diff -u -p -r1.1 memmem.cc
--- libc/memmem.cc  8 Nov 2005 22:08:39 -   1.1
+++ libc/memmem.cc  20 Dec 2007 03:56:35 -
@@ -45,8 +45,8 @@ memmem (const void *l, size_t l_len,
   const char *cs = (const char *)s;
 
   /* we need something to compare */
-  if (l_len == 0 || s_len == 0)
-return NULL;
+  if (s_len == 0)
+return l;
 
   /* s must be smaller or equal to l */
   if (l_len  s_len)
@@ -59,6 +59,11 @@ memmem (const void *l, size_t l_len,
   /* the last position where its possible to find s in l */
   last = (char *) cl + l_len - s_len;
 
+  /* FIXME - this algorithm is worst-case O(l_len*s_len), but using
+ Knuth-Morris-Pratt would be O(l_len+s_len) at the expense of a
+ memory allocation of s_len bytes.  Consider rewriting this to
+ swap over the KMP if the first few iterations fail, and back to
+ this if KMP can't allocate enough memory.  */
   for (cur = (char *) cl; cur = last; cur++)
 if (cur[0] == cs[0]  memcmp (cur, cs, s_len) == 0)
   return cur;