Re: [R] Ubuntu vs. Windows

2008-04-26 Thread George N. White III
On Tue, 22 Apr 2008, Doran, Harold wrote:

> Dear List:
> [...]
> Because I have so little experience with Ubuntu, I am quite pleased and
> would like to understand this a bit better. Does this occur because R is
> a bit friendlier with unix somehow? Or, is this occurring because unix
> somehow has more efficient methods for memory allocation?

On the same hardware, the differences between Windows and Linux
performance are generally minor, but there are many things that
can cause very poor performance on either platform.

> I wish I knew enough to even ask the right questions. So, I welcome any
> enlightenment members may add.

I have seen very big differences in performance on computational
benchmarks for hardware with similar basic specifications (CPU type and
clock, RAM, etc.).  Often the difference is a symptom of broken hardware
or some misconfiguration.  Do you see the same performance difference in
other applications?  Here are some things to consider; a small R
benchmark sketch follows the list:

1. anti-virus scanning and other background tasks -- I've seen
systems configured to scan gigabyte network drives.  Windows
Task Manager and Linux top, etc., can give an idea of what
is using a lot of CPU, but they are not so helpful if the issue
involves I/O bottlenecks.

2. incorrect hardware configuration in the system BIOS.  This happens
far too often, even with big-name vendors.  I like to run some
benchmarks on every new system to make sure there aren't any
basic configuration errors, and to have a reference if I
suspect problems after the system has been in use.

3. network problems.  Where I work, some PCs (both Linux and Windows)
get the ethernet duplex setting wrong when booted.  This can
result in poor performance when using networked disks, without
other symptoms.  On Windows, the "repair network connection"
button often clears the problem.  On Linux, ethtool can display
and change ethernet settings.

4. all sorts of hardware issues -- sometimes useful data appear in
the system logs.  Use Event Viewer on Windows; look at
/var/log/messages and /var/log/dmesg on Linux.

5. does the slow system exhibit a lot more disk activity?  Sometimes
this is hard to detect, but most systems do provide some statistics.
Try running an I/O-intensive benchmark while your R job is
running.
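
To put numbers on items 2 and 5, a small R benchmark run on both
machines can help.  The sketch below is my own quick check, not a
standard test suite; it separates a CPU-bound timing from a disk-bound
one:

   ## CPU-bound: dense linear algebra.  A big gap between machines here
   ## suggests BIOS/hardware misconfiguration rather than I/O.
   set.seed(42)
   cpu <- system.time({ m <- matrix(rnorm(1e6), 1000)
                        solve(crossprod(m)) })
   ## Disk-bound: write and re-read ~15 MB.  A gap only here points at
   ## disks, networked drives, or background scanners.
   f <- tempfile()
   io <- system.time({ x <- rnorm(2e6); save(x, file = f); load(f) })
   unlink(f)
   print(rbind(cpu = cpu, io = io))

The absolute numbers matter less than the ratio between the two
machines on each row.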

-- 
George N. White III  [EMAIL PROTECTED]



Re: [R] Ubuntu vs. Windows

2008-04-23 Thread Douglas Bates
On 4/22/08, Prof Brian Ripley [EMAIL PROTECTED] wrote:
> On Tue, 22 Apr 2008, Peter Dalgaard wrote:
>
> > Doran, Harold wrote:
> > > [...]
> >
> > Probably partly the latter and not the former (we try to make the most
> > of what the OS offers in either case), but a more important difference
> > is that we can run in 64-bit address space on non-Windows platforms
> > (assuming that you run a 64-bit Ubuntu).
> > [...]

> There is another possibility.  lmer is heavy on matrix algebra, and so
> usually benefits considerably from an optimized BLAS.  Under Windows you
> need to download one of those from CRAN (or build your own).  I believe
> that under Ubuntu R will make use of one if it is already installed.

An optimized BLAS is a possible explanation, but it would depend on the
Ubuntu package for the correct version of ATLAS having been installed.
I don't think those packages are installed by default.  Even if they
were installed, an optimized BLAS is not always beneficial for lmer.
Depending on the structure of the model, an optimized BLAS, especially
a multithreaded one, can actually slow lmer down.
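
One way to see whether a multithreaded BLAS helps or hurts on a given
machine (a sketch of mine, not anything from the lme4 sources) is to
time a BLAS-heavy kernel under different thread counts.  Thread counts
are usually set through environment variables read when the BLAS is
loaded, so they must be set before R starts; the variable name depends
on the BLAS build, and OMP_NUM_THREADS below is an assumption:

   ## Save as blas-timing.R and run once per setting from a shell:
   ##   OMP_NUM_THREADS=1 R --vanilla -f blas-timing.R
   ##   OMP_NUM_THREADS=4 R --vanilla -f blas-timing.R
   set.seed(1)
   n <- 2000
   m <- matrix(rnorm(n * n), n, n)
   print(system.time(crossprod(m)))  # t(m) %*% m, a BLAS-bound step

If the single-threaded run wins, the multithreaded BLAS may indeed be
slowing the fit down.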

I think the difference is more likely due to swapping.  A typical lmer
call does considerable memory allocation at the beginning of the
computation then keeps a stable memory footprint during the
optimization of the deviance with respect to the model parameters.  It
does access essentially all the big chunks of memory in that footprint
during the optimization.  If the required memory is a bit larger than
the available memory you get a lot of swapping, as I found out
yesterday.  I started an lmer run on a 64-bit Ubuntu machine
forgetting that I had recently removed a defective memory module from
that machine.  It had only 2 GB of memory and about 8 GB of swap
space.  It spent a lot of time swapping.  I definitely should have
done that run on one of our servers that has much more real memory.

Harold: Typing either

cat /proc/meminfo

or

free

in a shell window on your Ubuntu machine will tell you the amount of
memory and swap space on the machine.  If you start the lmer fit and
switch to a terminal window where you run the program top, you can
watch the evolution of the memory usage by the R program.  It will
probably increase at the beginning of the run and then stabilize.
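
From inside R itself, gc() gives a similar picture of R's own
footprint.  A sketch of how one might bracket the fit (my addition;
the model and data names are placeholders):

   gc(reset = TRUE)   # reset the "max used" columns
   ## fit <- lmer(y ~ x + (1 | grp), data = bigdata)  # hypothetical fit
   gc()               # "max used" now reports the peak allocation

If the peak reported by gc() approaches the physical RAM shown by
free, the run is likely to swap.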






[R] Ubuntu vs. Windows

2008-04-22 Thread Doran, Harold
Dear List:

I am very much a unix neophyte, but recently had a Ubuntu box installed
in my office. I commonly use Windows XP with 3 GB RAM on my machine and
the Ubuntu machine is exactly the same as my windows box (e.g.,
processor and RAM) as far as I can tell.

Now, I recently had to run a very large lmer analysis using my windows
machine, but was unable to due to memory limitations, even after
increasing all the memory limits in R (which I think is a 2gig max
according to the FAQ for windows). So, to make this computationally
feasible, I had to sample from my very big data set and then run the
analysis. Even still, it would take something on the order of 45 mins to
1 hr to get parameter estimates. (BTW, SAS Proc nlmixed was even worse
and kept giving execution errors until the data set was very small and
then it ran for a long time)

However, I just ran the same analysis on the Ubuntu machine with the
full, complete data set, which is very big and lmer gave me back
parameter estimates in less than 5 minutes. 

Because I have so little experience with Ubuntu, I am quite pleased and
would like to understand this a bit better. Does this occur because R is
a bit friendlier with unix somehow? Or, is this occurring because unix
somehow has more efficient methods for memory allocation?

I wish I knew enough to even ask the right questions. So, I welcome any
enlightenment members may add.



Re: [R] Ubuntu vs. Windows

2008-04-22 Thread Abhijit Dasgupta
My naive understanding of this (I switched to Ubuntu a year ago from 
WinXP for similar reasons) is that Ubuntu as an OS uses less memory than 
WinXP, thus leaving more memory for computation, swap space, etc. In 
other words, Ubuntu is lighter than XP on system resources.

Abhijit

Doran, Harold wrote:
> Dear List:
> [...]
> Because I have so little experience with Ubuntu, I am quite pleased and
> would like to understand this a bit better. Does this occur because R is
> a bit friendlier with unix somehow? Or, is this occurring because unix
> somehow has more efficient methods for memory allocation?
> [...]




Re: [R] Ubuntu vs. Windows

2008-04-22 Thread Peter Dalgaard
Doran, Harold wrote:
> Dear List:
> [...]
> Because I have so little experience with Ubuntu, I am quite pleased and
> would like to understand this a bit better. Does this occur because R is
> a bit friendlier with unix somehow? Or, is this occurring because unix
> somehow has more efficient methods for memory allocation?

Probably partly the latter and not the former (we try to make the most
of what the OS offers in either case), but a more important difference
is that we can run in 64-bit address space on non-Windows platforms
(assuming that you run a 64-bit Ubuntu).

Even with 64-bit Windows we do not have the 64-bit toolchain in place to
build R except as a 32-bit program. Creating such a toolchain is beyond
our reach, and although progress is being made, it is painfully slow
(http://sourceforge.net/projects/mingw-w64/). Every now and then, the
prospect of using commercial tools comes up, but they are not
plug-compatible, and using them would leave end users without the
possibility of building packages with C code, unless they go out and buy
the same toolchain.
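
(A quick check, added here for reference: the pointer size of the
running R tells you whether it is a 64-bit build.)

   .Machine$sizeof.pointer * 8   # 64 for a 64-bit R, 32 for a 32-bit R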

> I wish I knew enough to even ask the right questions. So, I welcome any
> enlightenment members may add.


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Ubuntu vs. Windows

2008-04-22 Thread Prof Brian Ripley
On Tue, 22 Apr 2008, Peter Dalgaard wrote:

> Doran, Harold wrote:
> > [...]
>
> Probably partly the latter and not the former (we try to make the most
> of what the OS offers in either case), but a more important difference
> is that we can run in 64-bit address space on non-Windows platforms
> (assuming that you run a 64-bit Ubuntu).
> [...]

There is another possibility.  lmer is heavy on matrix algebra, and so
usually benefits considerably from an optimized BLAS.  Under Windows you
need to download one of those from CRAN (or build your own).  I believe
that under Ubuntu R will make use of one if it is already installed.
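
A rough way to tell whether an optimized BLAS is actually in use (a
sketch added here, not part of the original message) is to time a
large matrix product; a reference BLAS is typically several times
slower than ATLAS, OpenBLAS, or MKL on the same CPU:

   n <- 1500
   a <- matrix(rnorm(n * n), n, n)
   b <- matrix(rnorm(n * n), n, n)
   print(system.time(a %*% b))  # compare elapsed time across BLAS setups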

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.