[Rd] R 2.7.0, match() and strings containing \0 - bug?

2008-04-28 Thread Jon Clayden
Hi,

A piece of my code that uses readBin() to read a certain file type is
behaving strangely with R 2.7.0. This seems to be because of a failure
to match() strings after using rawToChar() when the original was
terminated with a \0 character. Direct equality testing with ==
still works as expected. I can reproduce this as follows:

> x <- "foo"
> y <- c(charToRaw("foo"),as.raw(0))
> z <- rawToChar(y)
> z==x
[1] TRUE
> z=="foo"
[1] TRUE
> z %in% c("foo","bar")
[1] FALSE
> z %in% c("foo","bar","foo\0")
[1] FALSE

But without the nul character it works fine:

> zz <- rawToChar(charToRaw("foo"))
> zz %in% c("foo","bar")
[1] TRUE

I don't see anything about this in the latest NEWS, but is this
expected behaviour? Or is it, as I suspect, a bug? This seems to be
new to R 2.7.0, as I said.

Regards,
Jon

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?

2008-04-28 Thread Jon Clayden
Apologies for missing out the sessionInfo():

R version 2.7.0 (2008-04-22)
i386-apple-darwin8.10.1

locale:
en_GB.UTF-8/en_US.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base




Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?

2008-04-28 Thread Prof Brian Ripley

On Mon, 28 Apr 2008, Jon Clayden wrote:


> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.


And so is the comment in ?match:

 Character inputs with embedded nul bytes will be truncated at the
 first nul.

The bug is in the documentation here -- this was intentional.

As support for embedded nuls in character strings is being removed in R 
2.8.0, you should not rely on this.
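
One way to avoid depending on how character strings treat nuls is to cut the raw vector at its first nul before converting it. The following is only a minimal sketch, not something from this thread: the helper name raw_to_string is hypothetical, and it assumes the bytes were read as raw (for example with readBin(con, "raw", n)) and that everything after the first nul is padding.

```r
# Hypothetical helper (not part of base R): convert a raw vector to a
# string using C semantics, i.e. keep only the bytes before the first nul.
raw_to_string <- function(bytes) {
  stopifnot(is.raw(bytes))
  nul <- which(bytes == as.raw(0))
  if (length(nul) > 0) {
    # seq_len(0) is empty, so a leading nul yields the empty string
    bytes <- bytes[seq_len(nul[1] - 1)]
  }
  rawToChar(bytes)
}

y <- c(charToRaw("foo"), as.raw(0))
z <- raw_to_string(y)
z %in% c("foo", "bar")   # TRUE: z is now the plain string "foo"
```

Because the nul is removed while the data are still raw, the resulting string never carries a nul into match() or %in%.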


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?

2008-04-28 Thread Seth Falcon
Hi Jon,

* On 2008-04-28 at 11:00 +0100 Jon Clayden wrote:
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

The short answer is that your example works in R-2.6 and in the
current R-devel.  Whether the behavior in R-2.7 is a bug is perhaps in
the eye of the beholder.

Historically, R's internal string representation allowed for embedded
nul characters.  This was particularly useful before the raw vector
type, RAWSXP, was introduced.  Since the vast majority of R's internal
string processing functions use standard C semantics and truncate at
the first nul, there has always been some room for interesting
behavior.  The change in R-2.7 was an attempt to start resolving these
inconsistencies.  Since then the core team has agreed to remove the
partial support for embedded nuls in character strings -- raw vectors
can be used when this is desired, and having nul-terminated strings
will make the code more consistent and easier to maintain going
forward.
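
As a rough illustration of the "use raw instead" suggestion, the sketch below keeps nul-containing data as raw vectors and matches them with identical() rather than converting to character and using match(). The variable names (a, b, candidates, hits) are only illustrative.

```r
# Keep byte sequences as raw vectors so the nul is an ordinary byte.
a <- c(charToRaw("foo"), as.raw(0))   # bytes 66 6f 6f 00
b <- charToRaw("foo")                 # bytes 66 6f 6f

identical(a, b)   # FALSE: the trailing nul is part of the data

# Look up 'a' in a list of raw vectors, comparing byte for byte.
candidates <- list(charToRaw("foo"), charToRaw("bar"), a)
hits <- which(vapply(candidates, identical, logical(1), a))
hits              # 3
```

The comparison is exact, so "foo" with and without the trailing nul are treated as different values, which is the distinction character strings cannot represent reliably.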

Best Wishes,

+ seth

-- 
Seth Falcon | http://userprimary.net/user/



Re: [Rd] R 2.7.0, match() and strings containing \0 - bug?

2008-04-28 Thread Herve Pages

Hi Jon,

Jon Clayden wrote:


> > x <- "foo"
> > y <- c(charToRaw("foo"),as.raw(0))
> > z <- rawToChar(y)
> > z==x
> [1] TRUE
> > z=="foo"
> [1] TRUE
> > z %in% c("foo","bar")
> [1] FALSE
> > z %in% c("foo","bar","foo\0")
> [1] FALSE


But this gives TRUE:

  > z %in% c("foo","bar", z)
  [1] TRUE

An additional problem you have here is that when the "foo\0" string literal
is converted into a character string, everything after the first embedded
nul is dropped:

  > identical("foo\0a\0b", "foo")
  [1] TRUE

And to add to the endless source of surprises that come with embedded nuls:

  > dump("z", file="")
  z <-
  "foo\0"

but of course sourcing the above dump into an R session will not restore 'z'.

Dropping support for embedded nuls in R 2.8.0 sounds like good news to me.

Cheers,
H.



