Re: [R] R Running slow on Ubuntu

2014-03-16 Thread Russell Bainer
Thanks guys. I'll look into this and tell you if I come up with anything.

-R



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Running slow on Ubuntu

2014-03-16 Thread Mitchell Maltenfort
Is http://www.cybaea.net/Blogs/Faster-R-through-better-BLAS.html any help?


[R] R Running slow on Ubuntu

2014-03-15 Thread Russell Bainer
Hi All,

I've run across an odd phenomenon and I am wondering if someone might be
able to provide insight as to what is going on. I'm running some R code
that was provided by a collaborator, who is not a very experienced R
programmer (e.g., the code is functional but not very efficient). When I
run it from the terminal or command line everything executes, albeit very
slowly: the logfile suggests that the program is about 5% done after
running over last weekend. Top indicates that it is maxing out one of my
CPUs and chewing up a lot of memory, which I expect.

The strange thing is that my collaborator insists that the code executes
on the order of minutes on his 2012 MacBook Pro with 8 GB of memory. I am
running it with Ubuntu 12.04 on a dual-core i7 with 32 GB, and it's slow as
molasses. That suggests a configuration issue of some kind with R that I
might not be aware of (I am more experienced in R and usually don't write
code that requires resources like that). I have played with my swappiness
and the effect seems to be minimal. Can anyone suggest something else that
could be going on? I have considered trying to run it directly on a unix
server, but the code has a lot of third-party dependencies that would be a
bit of work to set up for simple troubleshooting. And naturally I'd prefer
that R be configured correctly in the event that I need to locally run
something more intense in the future.

Thanks in advance for any advice you can give. This message has been
cross-posted on the Ubuntu forums.

-R



Re: [R] R Running slow on Ubuntu

2014-03-15 Thread Augusto Cesar
My guess is that maybe the default Ubuntu binaries aren't compiled
with MKL (Math Kernel Library) support and thus with no
multithreading.

I would suggest doing some quick research on how to recompile R with MKL
support, and maybe you'll be good to go.
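
Before recompiling anything, it may be worth checking which BLAS/LAPACK the
running R is actually using and how long a BLAS-bound operation takes. The
lines below are only a sketch: the matrix size is an arbitrary assumption, and
the BLAS/LAPACK library paths are printed by sessionInfo() only in newer R
releases.

```r
# Report the LAPACK version in use; in recent R versions sessionInfo()
# also prints the paths of the BLAS and LAPACK libraries.
lapack_version <- La_version()
print(lapack_version)
print(sessionInfo())

# Time one BLAS-bound operation. The size n is an arbitrary assumption;
# run the same lines on both machines and compare the elapsed times.
set.seed(1)
n <- 500
m <- matrix(rnorm(n * n), nrow = n)
blas_time <- system.time(cp <- crossprod(m))
print(blas_time)
```

If the two machines report similar times here, the BLAS library is probably
not what separates them.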



-- 
Augusto



Re: [R] R Running slow on Ubuntu

2014-03-15 Thread Jeff Newmiller
Comparing with an unspecified benchmark makes answering this too hard. 
Following instructions in the Posting Guide will lead to more accurate Q and A.

Note that you may not need to compile if you have not as yet followed the 
recommendations: http://cran.r-project.org/bin/linux/ubuntu/README. There are 
apparently compile-time options that can obtain noticeable improvements for 
certain classes of problems, but if you and your friend are both using standard 
installs that seems unlikely to explain the difference. I have not needed a 
custom compile (yet?).
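
One way to turn "slow" into a benchmark that others can compare is to post a
small self-contained script together with its timings. The sketch below is an
assumption about what such a script might look like: loop_workload is a
hypothetical stand-in, not the collaborator's code, and the loop length is
arbitrary.

```r
# Hypothetical stand-in workload: an explicit R loop, so the timing
# reflects the interpreter rather than the BLAS library.
loop_workload <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

timing <- system.time(result <- loop_workload(1e6))

# Collect the details a reader needs to compare two machines.
report <- list(
  r_version = R.version.string,
  platform  = R.version$platform,
  elapsed   = timing[["elapsed"]]
)
str(report)
```

Running the same script on both machines and posting both reports gives the
list something concrete to work with.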
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.



Re: [R] R Running slow on Ubuntu

2014-03-15 Thread Shige Song
Installing the OpenBLAS library may help.

Shige




Re: [R] Running *slow*

2011-10-07 Thread thomas.chesney
Thank you Michael and Patrick for your responses. Michael - your code ran in
under 5 minutes, which I find stunning, and Patrick I have sent the Inferno
doc to the copier for printing and reading this weekend.

I now have 8 million values in my lookup table and want to replace each
value in Dat with the index of that value in the lookup table. In line with
Chapter 2 in the Inferno doc, I created a list of appropriate size first,
rather than growing it, but still couldn't figure out how to do it without
looping in R, so it still runs extremely slowly, even just to process the
first 1000 values in Dat. My original code (before I tried specifying the
size of Dat2) was:

Dat2 <- c()

for (i in 1:nrow(Dat))
{
  for (j in 1:2)
  {
    # Append the index of Dat[i,j] in the lookup table
    Dat2 <- c(Dat2, match(Dat[i,j], ltable))
  }
}

write(t(edgelist), "EL.txt", ncolumns=2)

Can anyone suggest a way of doing this without looping in R? Or is the
bottleneck the c function? I am looking at apply this morning, but Gentleman
(2009) suggests apply isn't very efficient. 
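
On the question of whether c() is the bottleneck: growing a vector with c()
copies everything built so far on each pass, so it is at least part of the
cost. The following sketch compares growing, preallocating, and a vectorised
call on made-up data; the function names and the size n are assumptions chosen
for illustration, and the timings will vary by machine.

```r
n <- 20000  # arbitrary size for illustration

# Growing: each c() call copies the whole vector built so far.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i * 2L)
  out
}

# Preallocating: the vector is created once and filled in place.
prealloc <- function(n) {
  out <- integer(n)
  for (i in seq_len(n)) out[i] <- i * 2L
  out
}

# Vectorised: no explicit R loop at all.
vectorised <- function(n) seq_len(n) * 2L

t_grow <- system.time(a <- grow(n))
t_pre  <- system.time(b <- prealloc(n))
t_vec  <- system.time(v <- vectorised(n))

# All three produce the same values; only the cost differs.
print(rbind(grow = t_grow, prealloc = t_pre, vectorised = t_vec)[, "elapsed"])
```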

--
View this message in context: 
http://r.789695.n4.nabble.com/Running-slow-tp3878093p3881365.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Running *slow*

2011-10-07 Thread Gerrit Eichner

Hi, Thomas,

if I'm not completely mistaken

Dat2 <- match(t(Dat), ltable)

should do what you want.
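
A small check of why the transpose is needed, using toy stand-ins for Dat and
ltable (the real data are not shown in the thread): R stores matrices column
by column, so t(Dat) read as a vector runs through Dat row by row, which is
the order the double loop visits.

```r
# Toy stand-ins; the names match the thread, the values are invented.
Dat <- matrix(c("a", "b", "c", "b", "c", "d"), ncol = 2)
ltable <- unique(as.vector(t(Dat)))

# Original double loop: row by row, then column by column.
Dat2_loop <- integer(0)
for (i in 1:nrow(Dat)) {
  for (j in 1:2) {
    Dat2_loop <- c(Dat2_loop, match(Dat[i, j], ltable))
  }
}

# Vectorised version: one call to match() on the transposed matrix.
Dat2_vec <- match(t(Dat), ltable)

print(Dat2_vec)                       # 1 2 2 3 3 4
print(identical(Dat2_loop, Dat2_vec)) # TRUE
```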

 Hth -- Gerrit



Re: [R] Running *slow*

2011-10-07 Thread thomas.chesney
Gerrit,

Looks like it does and in less than--an incredible--one minute!

Thank you!

--
View this message in context: 
http://r.789695.n4.nabble.com/Running-slow-tp3878093p3881588.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Running *slow*

2011-10-07 Thread thomas.chesney
Making a bit more sense now: "If you are translating code into R that has a
double for loop, think." (The R Inferno, page 18.)

--
View this message in context: 
http://r.789695.n4.nabble.com/Running-slow-tp3878093p3881951.html
Sent from the R help mailing list archive at Nabble.com.



[R] Running *slow*

2011-10-06 Thread Thomas
Anyone got any hints on how to make this code more efficient? An early
version (which to be fair did more than this one does) ran for 330 hours
and produced no output.


I have a two column table, Dat, with 12,000,000 rows and I want to  
produce a lookup table, ltable, in a 1 dimensional matrix with one  
copy of each of the values in Dat:


ltable <- NULL  # assumed starting value; the original post does not show it

for (i in 1:nrow(Dat))
{
  for (j in 1:2)
  {
    # If next value is already in ltable, do nothing
    if (is.na(match(Dat[i,j], ltable))) {ltable <- rbind(ltable, Dat[i,j])}
  }
}

but it takes forever to produce anything.

Any advice gratefully received.

Thomas



Re: [R] Running *slow*

2011-10-06 Thread R. Michael Weylandt
?unique

x <- matrix(c(1:6, 6:1), ncol=2)

x.temp <- x
dim(x.temp) <- NULL
unique(x.temp)

Michael




Re: [R] Running *slow*

2011-10-06 Thread Patrick Burns

Probably most of the time you're waiting
for this you are in Circle 2 of 'The R
Inferno'.  If the values are numbers,
you might also be in Circle 1.
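
For readers without the Inferno to hand, Circle 1 concerns floating-point
arithmetic. A brief sketch of how it could affect a lookup table of numbers;
the values and the number of rounding digits are assumptions chosen for
illustration.

```r
x <- 0.1 * 3
y <- 0.3

print(x == y)                    # FALSE: the two doubles differ in the last bits
print(match(y, x))               # NA: match() uses exact comparison
print(length(unique(c(x, y))))   # 2: both values would enter the lookup table
print(isTRUE(all.equal(x, y)))   # TRUE: equal within a tolerance

# One possible remedy (the digits are an assumption): round before
# building or searching the table.
rounded <- round(c(x, y), 10)
print(length(unique(rounded)))
```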



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



Re: [R] Running *slow*

2011-10-06 Thread R. Michael Weylandt
Patrick is right, most of the time is probably taken up for the
reasons documented in the (masterful) R Inferno, namely the rbind()
calls.

There is another problem though and it gets at the very core of R, and
for that matter, all interpreted languages that I'm familiar with.
I'll give a fairly elementary explanation and gloss over many of the
subtleties that R core worries about so we mere mortals don't have to.

At the end of the day, everything is looped, there's no way to get
around it. However, from a code perspective we have a choice of
looping in C or R. Whenever possible it is better to loop in C than R
and most of the key built-in functions, like unique(), are designed to
do just that. The reason for it is pretty straightforward: consider
what has to happen to run a loop in R:

Iterator is defined: a sequence of C calls starts this
First line of loop is hit -> interpreted by R -> sent to C code ->
executed -> changed back into an R result -> passed to the next line
of the loop
Iterator is increased: C again
Second line of loop is hit -> interpreted by R -> sent to C code ->
executed -> changed back into an R result -> passed to the next line
of the loop
etc.

Complicated and/or multiple lines of code only compound the problem
because you have to go up and down multiple times at each iteration.

Looping on the C level gets rid of all those translations between
C/R, save 2, and thereby mightily increases efficiency. Hence, even if
you are using the same (or heaven forbid a faster!) algorithm on the R
level, it can look super slow because of all the moving up and down
the ladder; I don't know how unique.C is implemented, but my guess is
it's more or less like what you have now, with more efficient memory
usage/preallocation, it just looks *much* faster because of the C
architecture.
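
A rough illustration of the same point, assuming nothing beyond base R: the
two calls below compute the same sum, one with the loop at the R level and one
with the loop inside the built-in sum(). The function name sum_loop and the
vector length are made up for this sketch, and newer R versions byte-compile
loops, so the gap may be smaller than it was in 2011.

```r
set.seed(42)
v <- runif(1e6)

# Loop at the R level: every iteration passes through the interpreter.
sum_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi
  }
  total
}

t_loop <- system.time(s1 <- sum_loop(v))
t_vec  <- system.time(s2 <- sum(v))   # the loop runs in compiled code

print(c(loop = t_loop[["elapsed"]], builtin = t_vec[["elapsed"]]))
print(isTRUE(all.equal(s1, s2)))      # same answer up to rounding
```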

DISCLAIMER: there are quite a few inaccuracies, most small, maybe a
few large, in here, and I probably only am aware of a small fraction
thereof, but this wasn't intended to be a super accurate explanation.

On another note, I should explain my solution a little more clearly.

A straight call to unique() would check for unique ROWS not values of
x. I take x, make a copy so as not to harm the original object, strip
it of its dimensionality (thereby converting it to a vector
efficiently), and then apply unique() which will now find unique
values. It's not a huge thing, but not immediately apparent from what
I did.
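
To make the difference concrete, here is the same toy matrix run both ways.
The as.vector() line is an alternative added for this sketch, not part of the
original solution.

```r
x <- matrix(c(1:6, 6:1), ncol = 2)

# On a matrix, unique() compares whole rows. Every row of x is distinct,
# so nothing is dropped.
rows <- unique(x)
print(dim(rows))            # 6 2

# Dropping the dim attribute turns the copy into a plain vector, so
# unique() now compares individual values.
x.temp <- x
dim(x.temp) <- NULL
vals <- unique(x.temp)
print(vals)                 # 1 2 3 4 5 6

# as.vector() gives the same result without the explicit copy.
print(identical(unique(as.vector(x)), vals))  # TRUE
```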

Hope this helps,

Michael

