Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled

2018-08-27 Thread Tomas Kalibera

Dear Arun,

thank you for checking the workaround scripts.

I've modified detectCores() to use GetLogicalProcessorInformationEx. It 
is in revision 75198 of R-devel, could you please test it on your 
machines? For a binary, you can wait until the R-devel snapshot build 
gets to at least this svn revision.
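
Once you have a build at or past r75198, a quick cross-check against the
wmic workaround from my earlier mail would be something like this sketch:

out <- system("wmic cpu get numberoflogicalprocessors", intern = TRUE)
wmic <- sum(as.numeric(gsub("([0-9]+).*", "\\1",
                            grep("[0-9]+[ \t]*", out, value = TRUE))))
stopifnot(parallel::detectCores() == wmic)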


Thanks for the link to the processor groups documentation. I don't have 
a machine to test this on, but I would hope that snow clusters (e.g. 
PSOCK) should work fine on systems with >64 logical processors as they 
spawn new processes (not just threads). Note that FORK clusters are not 
supported on Windows.
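
For example (a minimal sketch), a PSOCK cluster starts one new R process
per worker, so each worker has its own PID and should in principle be
eligible for scheduling in any processor group:

library(parallel)
n <- detectCores()
cl <- makeCluster(n)                 # spawns n separate R processes
pids <- parSapply(cl, seq_len(n), function(i) Sys.getpid())
length(unique(pids))                 # typically n distinct processes
stopCluster(cl)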


Thanks
Tomas

On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:

Dear Tomas, thank you for looking into this. Here's the output:

# number of logical processors - what detectCores() should return
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
[1] "NumberOfLogicalProcessors  \r" "22 \r" "22  
   \r"
[4] "20 \r" "22 \r" "\r"
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE
# [1] 86

[I've asked the IT team to understand why one of the values is 20 instead of 
22].

# number of cores - what detectCores(FALSE) should return
out <- system("wmic cpu get numberofcores", intern=TRUE)
[1] "NumberOfCores  \r" "22 \r" "22 \r" "20 \r" 
"22 \r"
[6] "\r"
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE
# [1] 86

[Currently hyperthreading is disabled. So this output being identical to the 
previous output makes sense].

system("wmic computersystem get numberofprocessors")
NumberOfProcessors
4

In addition, I'd also bring to your attention this documentation on processor
groups: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups.
It explains how one has to go about getting a process to run on multiple groups
(which seems to be different from NUMA). All this seems overly complicated just
to allow a process to use all cores by default, TBH.

Here's a project on GitHub, 'fio', where the issue of running a process on more
than one processor group has come up (https://github.com/axboe/fio/issues/527)
and was addressed
(https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h).
I am not sure, though, whether this is entirely relevant, since we would be
spawning new processes in R instead of allowing a single process to use all
cores. Apologies if this is utterly irrelevant.

Thank you,
Arun.

From: Tomas Kalibera 
Sent: 21 August 2018 11:50
To: Srinivasan, Arunkumar ; 
r-devel@r-project.org
Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled 
or disabled

Dear Arun,

thank you for the report. I agree with the analysis, detectCores() will only 
report logical processors in the NUMA group in which R is running. I don't have 
a system to test on, could you please check these workarounds for me on your 
systems?

# number of logical processors - what detectCores() should return
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))

# number of cores - what detectCores(FALSE) should return
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, value=TRUE))))

# number of physical processors - as a sanity check

system("wmic computersystem get numberofprocessors")

Thanks,
Tomas

On 08/17/2018 05:11 PM, Srinivasan, Arunkumar wrote:
Dear R-devel list,

R's detectCores() function internally calls the "ncpus" function to get the total
number of logical processors. However, this does not seem to take NUMA into
account on Windows machines.

On a machine having 48 processors (24 cores) in total and Windows Server 2012
installed, if NUMA is enabled and has 2 nodes (node 0 and node 1, each having 24
CPUs), then R's detectCores() only detects 24 instead of the total 48. If NUMA
is disabled, detectCores() returns 48.

Similarly, on a machine with 88 cores (176 processors) and Windows Server 2012,
detectCores() with NUMA disabled only returns the maximum value of 64. If NUMA
is enabled with 4 nodes (44 processors each), then detectCores() will only
return 44. This is particularly limiting, since in this case we cannot use all
processors either by enabling or by disabling NUMA.

We think this is because R's ncpus.c file uses "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION" 
(https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx) instead of 
"PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX" 
(https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx). Specifically, 
quoting from the first link:

"On systems with more than 64 logical processors, the GetLogicalProcessorInformation 
function retrieves logical processor information about pr

Re: [Rd] Where does L come from?

2018-08-27 Thread William Dunlap via R-devel
Rich Calaway pointed out that S4 came out c. 1996-97, not 1991.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Aug 26, 2018 at 8:30 PM, William Dunlap  wrote:

> >  the lack of a decimal place had historically not been significant
>
> Version 4 of S (c. 1991) and versions of S+ based on it treated a sequence
> of digits without a decimal point as integer.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Aug 25, 2018 at 4:33 PM, Duncan Murdoch 
> wrote:
>
>> On 25/08/2018 4:49 PM, Hervé Pagès wrote:
>>
>>> The choice of the L suffix in R to mean "R integer type", which
>>> is mapped to the "int" type at the C level, and NOT to the "long int"
>>> type, is really unfortunate as it seems to be misleading and confusing
>>> a lot of people.
>>>
>>
>> Can you provide any evidence of that (e.g. a link to a message from one
>> of these people)?  I think a lot of people don't really know about the L
>> suffix, but that's different from being confused or misled by it.
>>
>> And if you make a criticism like that, it would really be fair to suggest
>> what R should have done instead.  I can't think of anything better, given
>> that "i" was already taken, and that the lack of a decimal place had
>> historically not been significant.  Using "I" *would* have been confusing
>> (3i versus 3I being very different).  Deciding that 3 suddenly became an
>> integer value different from 3. would have led to lots of inefficient
>> conversions (since stats mainly deals with floating point values).
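>>
>> For instance, a quick illustration at the R prompt:
>>
>>   typeof(3)      # "double": plain numeric literals stayed floating point
>>   typeof(3L)     # "integer": the L suffix marks an integer constant
>>   3i             # 0+3i, complex: "i" was already taken
>>   typeof(3L + 1) # "double": mixed arithmetic promotes to double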
>>
>> Duncan Murdoch
>>
>>
>>
>>> The fact that nowadays "int" and "long int" have the same size on most
>>> platforms is only anecdotal here.
>>>
>>> Just my 2 cents.
>>>
>>> H.
>>>
>>> On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:
>>>

 On 25 August 2018 at 09:28, Carl Boettiger wrote:
 | I always thought it meant "Long" (I'm assuming R's integers are long
 | integers in the C sense (iirc one can declare 'long x'), and it being
 | common to refer to integers as "longs" in the same way we use "doubles"
 | to mean double precision floating point).  But pure speculation on my
 | part, so I'm curious!

 It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan &
 Ritchie.  It explicitly mentions (sec 2.2) that 'int' may be 16 or 32
 bits, and 'long' is 32 bit; and (in sec 2.3) introduces the I, U, and L
 labels for constants.  So "back then when" 32 bit was indeed long.  And
 as R uses 32 bit integers ...

 (It is all murky because the size is an implementation detail and later
 "essentially everybody" moved to 32 bit integers and 64 bit longs as the
 64 bit architectures became prevalent.  Which is why when it matters one
 should really use more explicit types like int32_t or int64_t.)
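
 In R the 32-bit choice is easy to see at the prompt; a quick
 illustration:

   .Machine$integer.max         # 2147483647 = 2^31 - 1, a 32-bit signed int
   typeof(.Machine$integer.max) # "integer"
   .Machine$integer.max + 1L    # NA, with an integer-overflow warning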

 Dirk


>>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Package compiler - efficiency problem

2018-08-27 Thread Karol Podemski
Dear Tomas, Inaki and the rest of R-devel team,

thank you for your explanations and suggestions. I talked with the gEcon
development team and we decided to change our implementation along the
lines you suggested.

Best regards,
Karol Podemski


Fri, 17 Aug 2018 at 13:38, Tomas Kalibera 
wrote:

> Dear Karol,
>
> I don't understand the models behind these functions, but I can tell that
> the code generated is very inefficient. The AST interpreter will be very
> inefficient performing each scalar computation with all boxing,
> allocations, function calls. The byte-code compiler removes some of the
> boxing and allocation. While it could certainly compile faster, it will
> always take a long time to compile functions this large, with so many
> commands: so many expressions to track, so many source references to map,
> for so little
> computation. The same code could be much more efficient if it used
> vectorized operations and loops. The compiler cannot infer the loops and
> vector operations from the code - it is not that smart and I doubt it could
> easily be for R, but such optimizations could certainly be done easily at a
> higher level, when optimizing computation within the model, not within R
> with all its complicated semantics. I think you could hope for ~100x
> speedups compared to the current generated code running under the R AST
> interpreter.
>
> So I think it might be worth thinking about writing an interpreter for the
> model (the generator would compute function values on the fly, without
> actually generating code). If that was too slow, it might pay off to
> generate some intermediate representation for the model that would be
> faster to interpret. If that was too hard, then perhaps generating the code
> from the model in a smarter way (use vector operations, loops). It is ok to
> do that opportunistically - only when possible. With compilers, this is
> normal, optimizations often take advantage of certain patterns in the code
> if they are present. If you had more specific questions how to optimize the
> code feel free to ask (also offline).
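>
> As a toy illustration (not actual gEcon output), generated code of the form
>
>   r[1] <- p[1] * x[1] - c[1]
>   r[2] <- p[2] * x[2] - c[2]
>   # ... thousands of similar scalar statements ...
>
> collapses into the single vectorized statement
>
>   r <- p * x - c
>
> which both runs much faster in the AST interpreter and is nearly free to
> byte-compile, because it is one expression instead of thousands.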
>
> Certainly I don't want the existence of the byte-code compiler to require
> you to switch from R to C/C++, that would be exactly the opposite of what
> the compiler is aiming for. If it turns out you really need a way to
> disable compilation of these generated functions (so they run much slower,
> but you don't have to wait for them to compile), we will provide it; using
> a hack/workaround it is already possible in existing versions of R,
> with all the drawbacks I mentioned previously.
>
> Best
> Tomas
>
>
> On 08/17/2018 12:43 AM, Karol Podemski wrote:
>
> Dear Tomas,
>
> thank you for the prompt response and for taking an interest in this issue.
> I really appreciate your compiler project and the efficiency gains in the
> usual case. I am aware of the limitations of interpreted languages too, and
> because of that, even when writing my first mail I had a hunch that this
> problem is not easy to address.  As you mentioned, optimising the compiler
> to handle non-standard code may be tricky and harmful for usual code. The
> question is whether gEcon is the only package that may face this issue
> because of compilation.
>
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf if you
> want to learn a bit about how we obtain them). The rows you suggested
> vectorising are indeed vectorisable, because they define equilibria for
> similar markets (e.g. production and sale of beverages and food), but they
> do not have to be vectorisable in the general case. So as not to delve
> into too much detail, I will stop the description of how the equations
> originate here. However, I would like to point out that similar large
> systems of linear equations may arise in other fields
> (https://en.wikipedia.org/wiki/Steady_state), and there may be other
> packages that generate similar large systems (e.g. network problems like
> hydraulic networks). In that case, reports such as mine may help you to
> assess the scale of the problem.
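>
> For instance (a generic sketch, nothing gEcon-specific), the steady state
> of a linear system x = A x + b is a couple of vectorized calls rather
> than thousands of generated scalar equations:
>
>   A <- matrix(c(0.2, 0.1, 0.0, 0.3), 2, 2)
>   b <- c(1, 2)
>   x <- solve(diag(2) - A, b)  # solves (I - A) x = b, the steady state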
>
> Thank you for the suggestions for improving our approach; I am going to
> discuss them with the other package developers.
>
> Regards,
> Karol Podemski
>
> Mon, 13 Aug 2018 at 18:02, Tomas Kalibera 
> wrote:
>
>> Dear Karol,
>>
>> thank you for the report. I can reproduce that the function from your
>> example takes very long to compile and I can see where most time is spent.
>> The compiler is itself written in R and requires a lot of resources for
>> large functions (foo() has over 16,000 lines of code, nearly 1 million
>> instructions/operands, and 45,000 constants). In particular a lot of time is
>> spent in garbage collection and in finding a unique set of constants. Some
>> optimizations of the compiler may be possible, but it is unlikely that
>> functions this large will compile fast any time soon. For non-generated code, we
>> now have the byte-

Re: [Rd] Where does L come from?

2018-08-27 Thread Adam M. Dobrin
most likely L comes from Michel or Obelisk.

http://img.izing.ml/MARSHALL.html = why you are making Mars colonization
(and space) "just a game"
http://img.izing.ml/IT.html = why i could care less.

On Sun, Aug 26, 2018 at 11:30 PM, William Dunlap via R-devel <
r-devel@r-project.org> wrote:

> >  the lack of a decimal place had historically not been significant
>
> Version 4 of S (c. 1991) and versions of S+ based on it treated a sequence
> of digits without a decimal point as integer.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Aug 25, 2018 at 4:33 PM, Duncan Murdoch 
> wrote:
>
> > On 25/08/2018 4:49 PM, Hervé Pagès wrote:
> >
> >> The choice of the L suffix in R to mean "R integer type", which
> >> is mapped to the "int" type at the C level, and NOT to the "long int"
> >> type, is really unfortunate as it seems to be misleading and confusing
> >> a lot of people.
> >>
> >
> > Can you provide any evidence of that (e.g. a link to a message from one
> > of these people)?  I think a lot of people don't really know about the L
> > suffix, but that's different from being confused or misled by it.
> >
> > And if you make a criticism like that, it would really be fair to suggest
> > what R should have done instead.  I can't think of anything better, given
> > that "i" was already taken, and that the lack of a decimal place had
> > historically not been significant.  Using "I" *would* have been confusing
> > (3i versus 3I being very different).  Deciding that 3 suddenly became an
> > integer value different from 3. would have led to lots of inefficient
> > conversions (since stats mainly deals with floating point values).
> >
> > Duncan Murdoch
> >
> >
> >
> >> The fact that nowadays "int" and "long int" have the same size on most
> >> platforms is only anecdotal here.
> >>
> >> Just my 2 cents.
> >>
> >> H.
> >>
> >> On 08/25/2018 10:01 AM, Dirk Eddelbuettel wrote:
> >>
> >>>
> >>> On 25 August 2018 at 09:28, Carl Boettiger wrote:
> >>> | I always thought it meant "Long" (I'm assuming R's integers are long
> >>> | integers in the C sense (iirc one can declare 'long x'), and it being
> >>> | common to refer to integers as "longs" in the same way we use
> >>> | "doubles" to mean double precision floating point).  But pure
> >>> | speculation on my part, so I'm curious!
> >>>
> >>> It does per my copy (dated 1990 !!) of the 2nd ed of Kernighan &
> >>> Ritchie.  It explicitly mentions (sec 2.2) that 'int' may be 16 or 32
> >>> bits, and 'long' is 32 bit; and (in sec 2.3) introduces the I, U, and
> >>> L labels for constants.  So "back then when" 32 bit was indeed long.
> >>> And as R uses 32 bit integers ...
> >>>
> >>> (It is all murky because the size is an implementation detail and
> >>> later "essentially everybody" moved to 32 bit integers and 64 bit
> >>> longs as the 64 bit architectures became prevalent.  Which is why when
> >>> it matters one should really use more explicit types like int32_t or
> >>> int64_t.)
> >>>
> >>> Dirk
> >>>
> >>>
> >>
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel