[R] download.file

2003-02-04 Thread Adelchi Azzalini

Dear List-members,

to download a file from the net, the function download.file(..)
does the job.  However, before embarking on the download, I would
like to find out how large the file is.  Is there a way to know it?

Most likely, this question has been asked before, but I am new to
the list.

Regards, with thanks in advance,

Adelchi Azzalini

Adelchi Azzalini  [EMAIL PROTECTED]
Dipart.Scienze Statistiche, Università di Padova, Italia

[EMAIL PROTECTED] mailing list

Re: [R] download.file

2003-02-04 Thread ripley
Essentially no.  Most servers will give you the length if you start the
download, and then R prints it out, but it may be unknown.  As in:

    trying URL `http://cran.r-project.org/src/contrib/PACKAGES'
    Content type `text/plain; charset=iso-8859-1' length 95407 bytes
    opened URL
    .. .. .. .. ..
    .. .. .. .. ...
    downloaded 93Kb

and you can (probably) interrupt during those dots.

On Tue, 4 Feb 2003, Adelchi Azzalini wrote:

> to download a file from the net, the function download.file(..)
> does the job.  However, before embarking on the download, I would
> like to find out how large the file is.  Is there a way to know it?
> Most likely, this question has been asked before, but I am new to
> the list.

Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] download.file

2003-02-04 Thread Barry Rowlingson

> to download a file from the net, the function download.file(..)
> does the job.  However, before embarking on the download, I would
> like to find out how large the file is.  Is there a way to know it?

You can send web servers a 'HEAD' request, which returns some basic
information about the download. I can't see a way to get this from
the current R functions, so here's a little routine that leverages the
'lynx' web browser:

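head.download <- function(url)
{
  # Check that lynx is available before trying to use it
  if (system("lynx -help > /dev/null") == 0) {
    method <- "lynx"
  }
  else {
    stop("No lynx found")
  }

  if (method == "lynx") {
    # Ask for the headers only and capture the output lines
    heads <- system(paste("lynx -head -dump '", url, "'", sep = ""),
                    intern = TRUE)
  }

  # turn name: value lines into named list. prob vectorisable
  ret <- list(status = heads[1])
  for (l in 2:length(heads)) {
    col <- regexpr(":", heads[l])
    if (col != -1) {
      name <- substr(heads[l], 1, (col - 1))
      value <- substr(heads[l], (col + 1), nchar(heads[l]))
      ret[[name]] <- value
    }
    else {
      # no colon: keep the line verbatim, unnamed
      ret <- c(ret, heads[l])
    }
  }
  ret
}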

This borrows bits from download.file(), but it does depend on you
having lynx installed. The return value is a list whose names are the
header titles and whose values are the header values. It looks for a
":" as the title: value separator, and any line that doesn't have a ":"
is just added verbatim, unnamed.
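The "prob vectorisable" remark in the routine can be illustrated with a minimal sketch. It assumes the header lines are already available as a character vector; the name parse_headers and the sample lines are made up for illustration and are not part of the original routine. Unlike the loop above, it also trims the space that follows the colon.

```r
# Hypothetical helper (not from the original post): turn raw header lines
# into a named list without an explicit loop.
parse_headers <- function(lines) {
  status <- lines[1]
  rest <- lines[-1]
  rest <- rest[nzchar(rest)]                  # drop empty lines
  pos <- regexpr(":", rest, fixed = TRUE)     # position of the first colon, -1 if none
  has_colon <- pos > 0
  values <- trimws(substring(rest[has_colon], pos[has_colon] + 1))
  names(values) <- substring(rest[has_colon], 1, pos[has_colon] - 1)
  # named entries first, then lines without a colon kept verbatim and unnamed
  c(list(status = status), as.list(values), as.list(rest[!has_colon]))
}

# Made-up sample response, for illustration only
sample_lines <- c("HTTP/1.1 200 OK",
                  "Content-Type: image/jpeg",
                  "Content-Length: 8793",
                  "no colon here")
hdr <- parse_headers(sample_lines)
hdr[["Content-Length"]]
```

The value is still a character string; wrap it in as.numeric() if a number is wanted.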

 For example, how big is the R logo on the home page?

[1] " 8793"

That's bytes. Yes, I know it's character! I don't think web servers are
under any obligation to provide accurate Content-length values. Many
dynamic web servers have pages that change length every time. This will
also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).

Perhaps HEAD-getting functionality can be put in the next release of
R? It would probably have a better name: value -> named list routine
than the one I just hacked up in two minutes above. Oops. Shame.



Re: [R] download.file

2003-02-04 Thread Thomas Lumley
On Tue, 4 Feb 2003, Barry Rowlingson wrote:

> That's bytes. Yes, I know it's character! I don't think web servers are
> under any obligation to provide accurate Content-length values. Many
> dynamic web servers have pages that change length every time. This will
> also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).

The HTTP protocol says that a content length SHOULD be provided and MUST
be accurate if it is provided.
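Since the length is only a SHOULD, code that reads it has to cope with the header being absent. A minimal sketch, assuming the header lines are held in a character vector (recent versions of R can return lines of this kind from curlGetHeaders() in base; that call is left commented out here because it needs network access). The name content_length is hypothetical.

```r
# Hypothetical helper: return Content-Length in bytes, or NA if the
# server did not send one. Header names are matched case-insensitively.
content_length <- function(lines) {
  hit <- grep("^content-length:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  # If several header blocks are present, use the last one seen
  as.numeric(trimws(sub("^[^:]*:", "", hit[length(hit)])))
}

# Made-up header lines, for illustration only
content_length(c("HTTP/1.1 200 OK", "Content-Length: 8793\r\n"))
content_length(c("HTTP/1.1 200 OK", "Content-Type: text/html"))

# With network access, something along these lines should work in recent R:
# content_length(curlGetHeaders("http://cran.r-project.org/src/contrib/PACKAGES"))
```

An NA result means the length is unknown, not that the file is empty.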



Re: [R] download.file

2003-02-04 Thread ripley
On Tue, 4 Feb 2003, Thomas Lumley wrote:

> On Tue, 4 Feb 2003, Barry Rowlingson wrote:
>> That's bytes. Yes, I know it's character! I don't think web servers are
>> under any obligation to provide accurate Content-length values. Many
>> dynamic web servers have pages that change length every time. This will
>> also not work for ftp:// URLs or local file:// URLs (or gopher:// URLs?).
> The HTTP protocol says that a content length SHOULD be provided and MUST
> be accurate if it is provided.

Most proxies of my acquaintance will report the length as unknown unless
they are asked to actually get the file or have it already cached.
Further, the IE internals used under Windows with --internet2 usually
seem to get the wrong length (far too short) when talking to a proxy.

Why is this of interest?  There are lots of internet download tools
available apart from R.

