[R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

I have a loop which is set to take about 26 hours to run at the rate it's going - this is ridiculous, and I really need your help to find a more efficient way of loading up my array gpcc.array:

    # My data is stored in a table format, with all the data in one long column
    # running through every longitude, for every latitude, for every year. The
    # original data is stored as gpcc.data2, where dim(gpcc.data2) = [476928,5]
    # and the 5th column is the data.

    # Make the array in the format I need: [longitude, latitude, years]
    gpcc.array <- array(NA, c(144, 72, 46))
    n <- 0
    for (k in 1:46) {
      for (j in 1:72) {
        for (i in 1:144) {
          n <- n + 1
          gpcc.array[i, j, k] <- gpcc.data2[n, 5]
          print(j)
        }
      }
    }

So it runs through all the longitudes for every latitude for every year - which is the order the data runs down the column in gpcc.data2 - so n increases by 1 each time and each data point is pulled off. It needs to be a lot quicker; I'd appreciate any ideas!

Many thanks for taking time to read this,

Jenny Barnes

~~
Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking, Surrey RH5 6NT
01483 204149 / 07916 139187
Web: http://climate.mssl.ucl.ac.uk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] loop is going to take 26 hours - needs to be quicker!
Jenny Barnes wrote:
> I have a loop, which is set to take about 26 hours to run at the rate it's going [...]
> gpcc.array <- array(NA, c(144,72,46))
> n <- 0
> for(k in 1:46){ for(j in 1:72){ for(i in 1:144){
>   n <- n + 1
>   gpcc.array[i,j,k] <- gpcc.data2[n,5]
>   print(j)
> }}}

I don't know if it is faster - but adding three columns to gpcc.data2, one for longitude, one for latitude and one for year (using rep(), as they are in sequence), and then using reshape() might be faster?

--
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
Department of Conservation Ecology and Entomology
University of Stellenbosch
Matieland 7602, South Africa
Tel: +27 (0)72 808 2975 (w)
Fax: +27 (0)86 516 2782
Fax: +27 (0)21 808 3304 (w)
Cell: +27 (0)83 9479 042
email: [EMAIL PROTECTED]
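Rainer's rep() idea can be sketched as follows (a hedged illustration on toy dimensions, not code from the thread; reshape() itself targets long-to-wide data frames, so filling the array by the generated index columns may be more natural than reshape()):

```r
# Toy dimensions: 4 longitudes, 3 latitudes, 2 years; data in long format,
# longitude varying fastest, then latitude, then year - as in gpcc.data2
nlon <- 4; nlat <- 3; nyr <- 2
dat <- data.frame(value = rnorm(nlon * nlat * nyr))

# Generate the index columns with rep(), matching the storage order
dat$lon  <- rep(1:nlon, times = nlat * nyr)
dat$lat  <- rep(rep(1:nlat, each = nlon), times = nyr)
dat$year <- rep(1:nyr, each = nlon * nlat)

# Fill the array by matrix indexing - vectorized, no loop
arr <- array(NA_real_, c(nlon, nlat, nyr))
arr[cbind(dat$lon, dat$lat, dat$year)] <- dat$value
```

The matrix-indexing form works for any row order, not just the one that a plain array() call assumes.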
Re: [R] loop is going to take 26 hours - needs to be quicker!
On 12/14/2006 7:56 AM, Jenny Barnes wrote:
> I have a loop, which is set to take about 26 hours to run at the rate it's going [...] It needs to be a lot quicker, I'd appreciate any ideas!

I think the loop above is equivalent to

    gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))

which would certainly be a lot quicker. You should check that the values are loaded in the right order (probably on a smaller example!). If not, you should change the order of indices when you create the array, and use the aperm() function to get them the way you want afterwards.

Duncan Murdoch
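Duncan's aperm() point can be illustrated with a small sketch (a toy assumption, not code from the thread): if the long column had instead varied year-fastest, you would fill the array with the fast dimension first and then permute:

```r
# Suppose the column ran year-fastest: fill in storage order [year, lon, lat]
x <- 1:(2 * 4 * 3)               # toy data: 2 years, 4 longitudes, 3 latitudes
a <- array(x, dim = c(2, 4, 3))  # dims in the order the data actually varies
b <- aperm(a, c(2, 3, 1))        # reorder to the desired [lon, lat, year]
dim(b)                           # 4 3 2
b[1, 1, 2] == a[2, 1, 1]         # TRUE: same element, permuted indices
```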
Re: [R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

I forgot to mention that I need the array in that format because I am going to do the same thing for another dataset of precipitation (ncep.data2), so that they are both arrays of dimensions [144,72,46] and I can correlate them globally and plot a visual image of the global correlations between the 2 datasets. One of the datasets has a land mask applied to it already, so it should be clear to see the land and pick out the locations (e.g. over Europe) where there is strongest and weakest correlation - that is the ultimate goal.

Following Rainer's response, I should also point out that the columns in gpcc.data2 (with dim(gpcc.data2) = [476928,5]) are: [,1] = year, [,2] = month (which is just January, so always 1), [,3] = latitude, [,4] = longitude and [,5] = data. All I want in gpcc.array is the data, not the longitude and latitude values... hope that helps clear it up a bit!

I look forward to hearing any more ideas; thanks again for your time in reading this,

Jenny Barnes

Rainer M Krug wrote:
> I don't know if it is faster - but adding three columns to gpcc.data2 [...] and then using reshape() might be faster?
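The ultimate goal Jenny describes - correlating the two [144,72,46] arrays gridpoint by gridpoint over the year dimension - might be sketched like this (a hedged illustration on toy dimensions; ncep.array is assumed to be built the same way as gpcc.array):

```r
# Toy stand-ins for the two [lon, lat, year] precipitation arrays
nlon <- 4; nlat <- 3; nyr <- 10
gpcc.array <- array(rnorm(nlon * nlat * nyr), c(nlon, nlat, nyr))
ncep.array <- array(rnorm(nlon * nlat * nyr), c(nlon, nlat, nyr))

# Correlate the two year-series at each (lon, lat) gridpoint
cor.map <- matrix(NA_real_, nlon, nlat)
for (i in 1:nlon)
  for (j in 1:nlat)
    cor.map[i, j] <- cor(gpcc.array[i, j, ], ncep.array[i, j, ])

# image(cor.map) would then give the visual map of the global correlations
```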
Re: [R] loop is going to take 26 hours - needs to be quicker!
What about

    gpcc.array <- array(gpcc.data2[,5], dim = c(144, 72, 46))

On 14/12/06, Rainer M Krug wrote:
> I don't know if it is faster - but adding three columns to gpcc.data2, one for longitude, one for latitude and one for year (using rep(), as they are in sequence), and then using reshape() might be faster?

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
Re: [R] loop is going to take 26 hours - needs to be quicker!
On Thu, 2006-12-14 at 12:56, Jenny Barnes wrote:
> I have a loop, which is set to take about 26 hours to run at the rate it's going [...] It needs to be a lot quicker, I'd appreciate any ideas!

Take a whole-object approach to this problem. You are also wasting a lot of time by printing the values of 'j' in the loop.

    > gpcc.data2 <- matrix(rnorm(476928 * 5), ncol = 5)
    > dim(gpcc.data2)
    [1] 476928      5
    > str(gpcc.data2)
     num [1:476928, 1:5] 2.7385 -0.0438 -0.1084 0.8768 -1.0024 ...
    > system.time(gpcc.array <- array(gpcc.data2[, 5], dim = c(144, 72, 46)))
    [1] 0.024 0.026 0.078 0.000 0.000

You should verify the order of the values and adjust the indices accordingly, if the above results in an out-of-order array.

HTH, Marc Schwartz
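The check-the-order-on-a-smaller-example advice from several replies can be made concrete (a sketch, not code from the thread): build a tiny long-format column whose values encode their own indices, and confirm array() puts each value where the loop would have:

```r
# 3 lons x 2 lats x 2 years; each value encodes (lon, lat, year) as digits
nlon <- 3; nlat <- 2; nyr <- 2
long.col <- as.vector(outer(outer(1:nlon, 10 * (1:nlat), "+"),
                            100 * (1:nyr), "+"))
small <- array(long.col, c(nlon, nlat, nyr))
small[2, 1, 2]   # 212: longitude 2, latitude 1, year 2 - order is correct
```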
Re: [R] loop is going to take 26 hours - needs to be quicker!
David Barron wrote:
> What about
>     gpcc.array <- array(gpcc.data2[,5], dim=c(144,72,46))

I guess this will be slightly faster than my suggestion :-) ?

--
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
Re: [R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

Thank you for the responses from everyone - you'll be pleased to hear, Duncan, that using

    gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))

was spot-on; it worked like a dream. The data is in the correct places, as I checked against the text file. It took literally 2 seconds - quite an improvement on the predicted 26 hours :-)

I really, really appreciate your help; you're all very kind people. Merry Christmas,

Jenny Barnes

On 12/14/2006 7:56 AM, Duncan Murdoch wrote:
> I think the loop above is equivalent to
>     gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))
> which would certainly be a lot quicker. You should check that the values are
> loaded in the right order (probably on a smaller example!). [...]
> Duncan Murdoch
Re: [R] loop is going to take 26 hours - needs to be quicker!
Jenny Barnes wrote:
> ...was spot-on, worked like a dream. [...] It took literally 2 seconds - quite an improvement on the predicted 26 hours :-)

However, now you can't tell your supervisor that your data manipulation will take 26 hours - giving you a day to get your Xmas shopping done...

Barry
Re: [R] loop is going to take 26 hours - needs to be quicker!
Dear Patrick,

Thank you for the link - I'd advise anyone who has started using R to have a look at these as well; any help is always appreciated. I've downloaded S Poetry and will hit the books tomorrow and get reading it!

Jenny

Patrick Burns wrote:
> S Poetry may be of use to you -- especially the chapter on arrays, which
> discusses 3-dimensional arrays in particular.
>
> Patrick Burns
> [EMAIL PROTECTED]
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and A Guide for the Unwilling S User)