Re: R performance

2020-05-13 Thread Nicholas Geovanis
On Wed, May 13, 2020, 3:02 AM Mark Fletcher  wrote:

> So the question has become academic but I would like to get some
> sort of explanation so I can adjust for the future.
>

It used to be the case that AMD caches behaved very differently from
Intel's. That will matter especially as you stride across that big array.
What you want is to adjust the stride to match the cache behavior, so that
the L1/2/3 caches aren't thrashing as you step through it. I've seen
cases where it makes an order-of-magnitude difference even when RAM usage
is calm either way.

Row-major versus column-major matrix striding is one of the sticking points.
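
For instance, in R -- which stores matrices in column-major order -- a
sketch along these lines will usually show the effect (the matrix size
here is arbitrary, and sum() just stands in for whatever you do per
column or row):

  m <- matrix(rnorm(4000 * 2000), nrow = 4000)

  by_column <- function(m) {
    # walks down each column: consecutive elements are adjacent in memory
    out <- numeric(ncol(m))
    for (j in seq_len(ncol(m))) out[j] <- sum(m[, j])
    out
  }

  by_row <- function(m) {
    # walks along each row: consecutive elements are nrow(m) doubles apart
    out <- numeric(nrow(m))
    for (i in seq_len(nrow(m))) out[i] <- sum(m[i, ])
    out
  }

  system.time(by_column(m))   # typically the cache-friendly, faster case
  system.time(by_row(m))      # same arithmetic, worse access pattern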

> Mark


Re: R performance

2020-05-13 Thread Reco
Hi.

On Wed, May 13, 2020 at 09:02:20AM +0100, Mark Fletcher wrote:
> EC2 used to offer Debian but they don't any more.

Debian AMIs do exist on EC2; you just have to search for them:

https://wiki.debian.org/Cloud/AmazonEC2Image/Buster

Reco



Re: R performance

2020-05-13 Thread Mark Fletcher
On Tue, May 12, 2020 at 12:06:52PM -0500, Nicholas Geovanis wrote:
> 
> You don't mention which distro you are running on the EC2 instance, nor
> whether R or the C libraries differ in release levels. Moreover, that EC2
> instance type is AMD-based not Intel. So if not an apples-to-oranges
> comparison, it might be fujis-to-mcintoshs.

The distro on EC2 was Amazon Linux -- the current Amazon Linux build 
they offer that isn't Amazon Linux 2. Sorry, I don't remember the exact 
build number. I did also try Amazon Linux 2 and got similar results, 
though those are tainted by the fact that I had some trouble building 
the tidyverse libraries in R on that occasion and may have ended up 
with a not-entirely-clean install. So best to ignore that and 
concentrate on the Amazon Linux (not Amazon Linux 2) attempts, of which 
I had a couple, and which were consistent with each other.

Locally I am running Buster.

On EC2 I commissioned a fresh machine and installed R from the Amazon Linux 
repositories -- version 3.4.1 "Single Candle", by the way -- not quite 
what is in the Debian repositories, but only a minor version apart.

EC2 used to offer Debian but they don't any more. The closest I could 
get would be Ubuntu.

But we are talking about the same R code and same data running with a 
13-fold performance difference -- I don't believe that is down to the 
Linux distro per se, or AMD vs Intel. Something else is going on here. 
The EC2 box has 128GB of RAM and I could see the R instance using about 
1.3GB, which is what it does on my local box too (24GB RAM here).

I do get that we are talking about virtualised CPUs on the EC2 box and 
virtualisation introduces a penalty of some sort -- but 13-fold? 
Compared to 10-year-old physical hardware? Sounds wrong to me. As I say, 
something else is going on here. Especially when one considers, as I 
mentioned before, that past experiments with Java have not seen a 
significant performance difference between EC2 and my environment.

> 
> Long ago I built R from source a couple times a year. It has an
> unfathomable number of libraries and switches, any one of which could have a
> decisive effect on performance. Two different builds could be quite
> different in behavior.
> 

Right -- that's what prompted my original question. I was/am hoping 
someone might be in a position to say "well, it could be the fact that we 
set the XYZ flags in the build of R in Debian..." -- that would give me a 
rabbit hole to chase down.
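
(For what it's worth, a few things can be compared directly from within R 
on the two machines -- these just report how each R was built and which 
BLAS/LAPACK it is linked against, nothing Debian-specific:)

  sessionInfo()      # recent R versions list the BLAS/LAPACK libraries in use
  La_version()       # LAPACK version R is linked against
  extSoftVersion()   # versions of external libraries R was built with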

Take D. R. Evans' helpful point about his experience of multi-CPU usage in 
recent Debian builds of R, for example -- even though that's not what 
I'm seeing in my runs, it does imply that thought has gone into optimal 
CPU usage in the Debian R build...

Overnight I have now run the job on my own machine, by splitting it 
into 10 pieces and running 2 parts each in 5 parallel batches -- I 
was loath to do that at first as the machine is old and self-built and I 
worried about overheating it, but, me of little faith, it handled it 
fine. So the question has become academic, but I would still like some 
sort of explanation so I can adjust for the future.
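
(If anyone wants to reproduce that sort of batching from a single R session, 
something like the sketch below would do it -- heavy_calc() is only a 
placeholder for the real per-column calculation, and the worker count is 
arbitrary:)

  library(parallel)

  heavy_calc <- function(col) sum(sqrt(abs(col)))   # placeholder workload

  run_batches <- function(m, workers = 5) {
    # forks 'workers' processes; each takes a share of the columns
    res <- mclapply(seq_len(ncol(m)),
                    function(j) heavy_calc(m[, j]),
                    mc.cores = workers)
    unlist(res)
  }

  # answers <- run_batches(my_matrix)   # my_matrix: whatever the script loads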

Mark



Re: R performance

2020-05-12 Thread Nicholas Geovanis
On Tue, May 12, 2020, 10:55 AM Mark Fletcher  wrote:

> On Tue, May 12, 2020 at 08:16:52AM -0600, D. R. Evans wrote:
> > Mark Fletcher wrote on 5/12/20 7:34 AM:
> > > Hello
> > >
> >
> > I have noticed that recent versions of R supplied by debian are using all the
> > available cores instead of just one. I don't know whether that's a debian
> > change or an R change, but it certainly makes things much faster (one of my
> > major complaints about R was that it seemed to be single threaded, so I'm very
> > glad that, for whatever reason, that's no longer the case).
> >
> Thanks, but definitely not the case here. When running on my own
> machine, top shows the process at 100% CPU, the load on the machine
> heading for 1.0, and the Gnome system monitor shows one CPU vCore
> (hyperthread, whatever) at 100% and the other 7 idle.
>
> R is certainly _capable_ of using more of the CPU than that, but you
> have to load libraries eg snow and use their function calls to do it -- in
> short, like in many languages, you have to code for parallelism. I tried
> to keep parallelism out of this experiment on both machines being
> compared.
>

You don't mention which distro you are running on the EC2 instance, nor
whether R or the C libraries differ in release levels. Moreover, that EC2
instance type is AMD-based not Intel. So if not an apples-to-oranges
comparison, it might be fujis-to-mcintoshs.

Long ago I built R from source a couple times a year. It has an
unfathomable number of libraries and switches, any one of which could have a
decisive effect on performance. Two different builds could be quite
different in behavior.

> Mark


Re: R performance

2020-05-12 Thread D. R. Evans
Mark Fletcher wrote on 5/12/20 9:55 AM:
> On Tue, May 12, 2020 at 08:16:52AM -0600, D. R. Evans wrote:
>> Mark Fletcher wrote on 5/12/20 7:34 AM:
>>> Hello
>>>
>>
>> I have noticed that recent versions of R supplied by debian are using all the
>> available cores instead of just one. I don't know whether that's a debian
>> change or an R change, but it certainly makes things much faster (one of my
>> major complaints about R was that it seemed to be single threaded, so I'm very
>> glad that, for whatever reason, that's no longer the case).
>>
> Thanks, but definitely not the case here. When running on my own 
> machine, top shows the process at 100% CPU, the load on the machine 
> heading for 1.0, and the Gnome system monitor shows one CPU vCore 
> (hyperthread, whatever) at 100% and the other 7 idle.
> 
> R is certainly _capable_ of using more of the CPU than that, but you 
> have to load libraries eg snow and use their function calls to do it -- in 
> short, like in many languages, you have to code for parallelism. I tried 
> to keep parallelism out of this experiment on both machines being 
> compared.
> 

OK. Well I don't understand what's going on and don't really have anything
further to contribute :-( All I know is that the same code that used to run on
just one core pre-buster now uses all the cores available, with no changes or
fancy libraries. I was (of course) very pleasantly surprised the first time I
ran one of my R scripts under buster and saw this happen. I haven't
experimented further.

  Doc

-- 
Web:  http://enginehousebooks.com/drevans





Re: R performance

2020-05-12 Thread Mark Fletcher
On Tue, May 12, 2020 at 08:16:52AM -0600, D. R. Evans wrote:
> Mark Fletcher wrote on 5/12/20 7:34 AM:
> > Hello
> > 
> 
> I have noticed that recent versions of R supplied by debian are using all the
> available cores instead of just one. I don't know whether that's a debian
> change or an R change, but it certainly makes things much faster (one of my
> major complaints about R was that it seemed to be single threaded, so I'm very
> glad that, for whatever reason, that's no longer the case).
> 
Thanks, but definitely not the case here. When running on my own 
machine, top shows the process at 100% CPU, the load on the machine 
heading for 1.0, and the Gnome system monitor shows one CPU vCore 
(hyperthread, whatever) at 100% and the other 7 idle.

R is certainly _capable_ of using more of the CPU than that, but you 
have to load libraries, e.g. snow, and use their function calls to do it -- 
in short, as in many languages, you have to code for parallelism. I tried 
to keep parallelism out of this experiment on both machines being 
compared.
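
(A minimal sketch of what I mean by coding for it, using the snow-style 
interface that now ships in the base parallel package -- the worker count 
and the toy workload are arbitrary:)

  library(parallel)

  cl <- makeCluster(4)                            # start 4 worker processes
  res <- parSapply(cl, 1:100, function(x) x^2)    # work is spread only because we ask
  stopCluster(cl)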

Mark



Re: R performance

2020-05-12 Thread D. R. Evans
Mark Fletcher wrote on 5/12/20 7:34 AM:
> Hello
> 
> I have recently had cause to compare performance of running the R 
> language on my 10+-year-old PC running Buster (Intel Core i7-920 CPU) 
> and in the cloud on AWS. I got a surprising result, and I am wondering 
> if the R packages on Debian have been built with any flags that account 
> for the difference.
> 
> My PC was a mean machine when it was built, but that was in 2009. I'd 
> expect it would be outperformed by up to date hardware.
> 

I have noticed that recent versions of R supplied by debian are using all the
available cores instead of just one. I don't know whether that's a debian
change or an R change, but it certainly makes things much faster (one of my
major complaints about R was that it seemed to be single threaded, so I'm very
glad that, for whatever reason, that's no longer the case).

  Doc

-- 
Web:  http://enginehousebooks.com/drevans





R performance

2020-05-12 Thread Mark Fletcher
Hello

I have recently had cause to compare performance of running the R 
language on my 10+-year-old PC running Buster (Intel Core i7-920 CPU) 
and in the cloud on AWS. I got a surprising result, and I am wondering 
if the R packages on Debian have been built with any flags that account 
for the difference.

My PC was a mean machine when it was built, but that was in 2009. I'd 
expect it to be outperformed by up-to-date hardware.

I have an R script I wrote which performs a moderately involved 
calculation column-by-column on a 4000-row, 1-column matrix. On my 
Buster PC, performing the calculation on a single column takes 9.5 
seconds. The code does not use any multi-CPU capabilities, so it uses 
just one of the 8 available virtual CPUs in my PC while doing so (4 
cores with hyperthreading = 8 virtual CPUs).
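
(For concreteness, a per-column timing of this sort can be taken with 
system.time(); one_column() below is only a placeholder for the real 
calculation, and the matrix is illustrative, not the real data:)

  one_column <- function(col) {
    # stand-in workload -- the real script does something far more involved
    for (i in seq_len(5000)) col <- sqrt(abs(col) + 1)
    col
  }

  m <- matrix(rnorm(4000 * 100), nrow = 4000)
  system.time(one_column(m[, 1]))   # elapsed seconds for a single column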

Running the same code on the same data on a fairly high-spec AWS EC2 
server in the cloud (the r5a-4xlarge variety, for those who know about 
AWS), the same calculation takes 2 minutes and 6 seconds.

Obviously there is virtualisation involved here, but at low load, with 
just one instance running and the machine not being asked to do anything 
else, I would have expected the AWS machine to be much closer to local 
performance, if not better, given the age of my PC.

In the past I have run highly parallel Java programs in the two 
environments and have seen much better results from using AWS in 
Java-land. That led me to wonder if it is something about how R is 
configured. I am not getting anywhere in the AWS forums (unless you pay 
a lot of money you basically don't get much attention), so I was 
wondering whether anyone familiar with how the R packages are configured 
in Debian might know if anything has been done to optimise performance 
that could explain why it is so much faster on Debian. Or is it purely 
local hardware versus virtualised? I am struggling to believe that, 
because I don't see the same phenomenon in Java programs.

Thanks for any ideas

Mark