Thanks for the report, fixed in R-devel (74848).

Best
Tomas

On 06/04/2018 02:41 PM, NELSON, Michael wrote:

On R 3.5.0 (Mac)

The issue appears when using the default (libcurl) method and specifying the
encoding.

Note that using method = 'internal' causes a segfault when combined with
encoding (it works when encoding is not set).

urlR <- "http://home.versanet.de/~s-berman/source2.R"
# works
url_default <- url(urlR)
scan(url_default, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"
# [7] "}"

url_default_en <- url(urlR, encoding = "UTF-8")
scan(url_default_en, "")
# Read 0 items
# character(0)
url_internal <- url(urlR, method = 'internal')
scan(url_internal, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"
# [7] "}"

url_internal_en <- url(urlR, encoding = "UTF-8", method = 'internal')
#scan(url_internal_en, "")
#*** caught segfault ***
#  address 0x0, cause 'memory not mapped'

url_libcurl <- url(urlR, method = 'libcurl')
scan(url_libcurl, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"
# [7] "}"
url_libcurl_en <- url(urlR, encoding = "UTF-8", method = 'libcurl')
scan(url_libcurl_en, "")
# Read 0 items
# character(0)
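To check whether a particular R build shows the empty-result behavior reported above, one can compare the number of lines read with and without `encoding = "UTF-8"` on the connection. This is a minimal sketch: `encoding_drops_lines()` is a hypothetical helper name, and it deliberately sticks to the default method, because `method = 'internal'` combined with `encoding` crashed R 3.5.0 in the report above.

```r
# Hypothetical helper (not part of base R): TRUE if declaring
# encoding = "UTF-8" on the connection makes readLines() return fewer
# lines than reading the same source with no declared encoding.
# Uses the default method only; method = "internal" plus encoding
# segfaulted R 3.5.0, so it is not exercised here.
encoding_drops_lines <- function(location, open_fun = url) {
  read_all <- function(con) {
    # readLines() opens an unopened connection for the duration of the
    # call; close() afterwards destroys the connection object.
    on.exit(close(con))
    readLines(con, warn = FALSE)
  }
  plain   <- read_all(open_fun(location))
  encoded <- read_all(open_fun(location, encoding = "UTF-8"))
  length(encoded) < length(plain)
}
```

On an affected build, `encoding_drops_lines(urlR)` would be expected to return `TRUE`. Passing `open_fun = file` with a local path should return `FALSE`, consistent with the observation further down that local files are not affected.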


Michael

________________________________________
From: R-devel [r-devel-boun...@r-project.org] on behalf of Stephen Berman [stephen.ber...@gmx.net]
Sent: Monday, 4 June 2018 7:26 PM
To: Martin Maechler
Cc: R-devel
Subject: Re: [Rd] encoding argument of source() in 3.5.0

On Mon, 4 Jun 2018 10:44:11 +0200 Martin Maechler <maech...@stat.math.ethz.ch> wrote:

peter dalgaard
     on Sun, 3 Jun 2018 23:51:24 +0200 writes:
     > Looks like this actually comes from readLines(), nothing
     > to do with source() as such: In current R-devel (still):

     >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
     >> readLines(f)
     > character(0)
     >> close(f)
     >> f <- file("http://home.versanet.de/~s-berman/source2.R")
     >> readLines(f)
     > [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
     > [3] "}"

     > -pd

and that's not even readLines(), but rather how exactly the
connection is defined [even in your example above]

   > urlR <- "http://home.versanet.de/~s-berman/source2.R"
   > readLines(urlR, encoding="UTF-8")
   [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
   [3] "}"
   > f <- file(urlR, encoding = "UTF-8")
   > readLines(f)
   character(0)

and the same behavior with scan() instead of readLines():

   > scan(urlR, "") # works
   Read 7 items
   [1] "source.test2"       "<-"                 "function()"         "{"
   [5] "print(\"Non-ascii:" "äöüß\")"            "}"
   > scan(f, "") # fails
   Read 0 items
   character(0)

So it seems as if the bug is in the file() [or url()] C code ..
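The distinction between the connection and readLines() can be seen without a network. `encoding=` on a connection asks R to re-encode the bytes as they are read, whereas `encoding=` on `readLines()` only marks the resulting strings, so `readLines(urlR, encoding = "UTF-8")` never goes through the connection's re-encoding path. A minimal sketch using a temporary local file holding made-up Latin-1 bytes:

```r
# Latin-1 bytes for "a", "ä" (0xE4), "b", then a newline, in a temp file.
tmp <- tempfile()
writeBin(as.raw(c(0x61, 0xe4, 0x62, 0x0a)), tmp)

# encoding= on the connection: bytes are re-encoded as they are read.
con <- file(tmp, encoding = "latin1")
reencoded <- readLines(con)
close(con)

# encoding= on readLines(): bytes are kept as-is and only marked Latin-1.
marked <- readLines(tmp, encoding = "latin1")

Encoding(marked)   # "latin1"
print(reencoded)   # "aäb", provided the session locale can represent "ä"
```

Only the first form depends on the connection's re-encoding code, which is the code path the url() examples above exercise.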

Yes, the problem seems to be restricted to loading files from a
(non-local) URL; i.e., this works fine on my computer:

   > source("file:///home/steve/prog/R/source2.R", encoding="UTF-8")

Also, I noticed this works too:

   > read.table("http://home.versanet.de/~s-berman/table2", encoding="UTF-8", skip=1)

where (if I read the source correctly) using `skip=1' makes read.table()
call readLines().  (The read.table() invocation also works without
`skip'.)

But then we also have to consider Windows .. where I think most changes have
happened during the R-3.4.4 --> R-3.5.0 transition.

Yes, please. I need (or at least it would be convenient) to be able to
load R code containing non-ASCII characters from the web under
MS-Windows.
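Until a fixed release is available, one possible workaround for loading non-ASCII R code from the web is to avoid declaring the encoding on the connection at all, following the observation above that `readLines(urlR, encoding = "UTF-8")` works. This sketch assumes the remote file is UTF-8; `source_utf8()` is a hypothetical name, not a base R function, and it has not been tested on MS-Windows.

```r
# Hypothetical helper (not part of base R): read UTF-8 R code without
# passing encoding= to the connection, then parse and evaluate it.
source_utf8 <- function(location, envir = globalenv()) {
  # readLines() given a character string opens its own connection with no
  # declared encoding; encoding = "UTF-8" here only marks the strings.
  lines <- readLines(location, encoding = "UTF-8", warn = FALSE)
  # Parse the marked text, then evaluate each expression in 'envir'.
  exprs <- parse(text = lines, keep.source = FALSE, encoding = "UTF-8")
  for (e in exprs) eval(e, envir = envir)
  invisible(NULL)
}
```

With the example file from this thread, `source_utf8(urlR)` followed by `source.test2()` would be the intended usage. Unlike `source()`, this sketch does not echo code, keep source references, or handle `chdir`.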

Steve Berman

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
__________________________________________________________________________________________________________
This email has been scanned for the NSW Ministry of Health by the Websense 
Hosted Email Security System.
Emails and attachments are monitored to ensure compliance with the NSW Ministry 
of health's Electronic Messaging Policy.
__________________________________________________________________________________________________________

_______________________________________________________________________________________________________
Disclaimer: This message is intended for the addressee named and may contain 
confidential information.
If you are not the intended recipient, please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and are not 
necessarily the views of the NSW Ministry of Health.
_______________________________________________________________________________________________________