Re: [R] Why isn't R recognising integers as numbers?
Ted Byers wrote:
> Thanks Jim,
>
> Alas, it wasn't this. Here is the output from both of your suggestions:
>
>     refdata18 = read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", header = TRUE, na.strings="")
>     str(refdata18)
>     'data.frame': 341 obs. of 1 variable:
>      $ X0: int 0 0 0 0 0 0 0 0 0 0 ...

Ummm, is there a header line or not? If there isn't, read.csv is going to eat the first observation, thinking it is a column name (and, since "0" is not a syntactic name, put an X in front of it).

The scan command looks fine; you just should have assigned its result somewhere, x <- scan(..), and then called fitdistr(x, ...).

>     scan("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", what=0L)
>     Read 342 items
>     [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>     [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>     [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>     [76] 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>     [101] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>     [126] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>     [151] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>     [176] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
>     [201] 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4
>     [226] 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
>     [251] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7
>     [276] 7 7 7 8 8 8 8 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10
>     [301] 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
>     [326] 12 12 12 18 18 18 18 18 18 18 18 18 18 18 18 18 18

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918 FAX: (+45) 35327907 ([EMAIL PROTECTED])

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
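The header effect described above can be reproduced on a small stand-in file. This is only a sketch: the five values and the temporary file are assumptions for illustration, not the contents of Capture.Week.18.csv.

```r
# Write a headerless one-column file (illustrative values, not the real data).
path <- tempfile(fileext = ".csv")
writeLines(c("0", "0", "1", "2", "18"), path)

# header = TRUE consumes the first value as a column name ("0" becomes "X0").
with_header <- read.csv(path, header = TRUE)

# header = FALSE keeps every value; the column gets the default name "V1".
without_header <- read.csv(path, header = FALSE)

str(with_header)
str(without_header)

# scan() returns a plain integer vector; assign it so it can be passed on.
x <- scan(path, what = 0L, quiet = TRUE)
x
```

With header = FALSE the row count matches the number of items scan() reports, which is the one-row discrepancy (341 versus 342) visible in the output quoted above.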
Re: [R] adding layers in ggplot2 (data and code included)
The way you've attempted to get this result seems to align with the way R should work, but it fails in this case. The fix is to break things up a little bit:

    p <- ggplot(mydata, aes(x=Est, y=Tri))
    p <- p + geom_point(aes(colour=factor(Group), shape=factor(Group)))
    p <- p + geom_smooth(aes(group=factor(Group), color=factor(Group)), method="lm", se=F)
    p

Eric

Juliet Hannah wrote:
> Here is some sample data:
>
>     mydata <- read.table(textConnection("Est Group Tri
>     0 0 4.639644
>     1 0 4.579189
>     2 0 4.590714
>     0 1 4.443696
>     1 1 4.588243
>     2 1 4.650505
>     0 2 4.296608
>     1 2 4.826036
>     2 2 4.765386"), header=TRUE);
>     closeAllConnections();
>
> I can form two plots, scatter and lines, as follows:
>
>     p <- ggplot(mydata, aes(x=Est, y=Tri))
>     p + geom_point(aes(colour=factor(Group), shape=factor(Group)))
>
> and
>
>     p + geom_smooth(aes(group=factor(Group), color=factor(Group)), method="lm", se=F)
>
> However, I am unable to have the plots together. I obtain the following error:
>
>     p + geom_point(aes(colour=factor(Group), shape=factor(Group))) + geom_smooth(aes(group=factor(Group), color=factor(Group)), method="lm", se=F)
>     Error in `[.data.frame`(df, , var) : undefined columns selected
>
> Thanks,
> Juliet
[R] Time series (ts) questions.
I have been working with the base time series object (ts) and I have a couple of questions that hopefully this group can help me with:

1) What is the best way to append an observation to an existing time series? Suppose I have a time series:

    t <- ts(1:12, frequency=5)

This would generate two complete cycles and one remainder. Now I would like to append an observation to this time series. I could use 'c', but then I would need to rebuild the whole time series and I would need to know the frequency etc. I would like some operation like '+' that would simply append the value to the end of the time series (incrementing the last time value so that things like cycle() still output the correct values). Alas, t + 10 is already taken by an equally useful operation: adding 10 to each element of the time series (rather than, in this case, appending ts(10,frequency) with a time value of 13 to the time series).

2) What is the best way to get the last time value in a time series? I can do something like:

    (start(t)[2] - 1) + (end(t)[1]-1) * frequency(t) + end(t)[2]

But there has to be an easier way.

Thank you.

Kevin
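One possible approach to both questions, sketched with base R only. The helper name append_obs is hypothetical (it is not part of R), and the series is called x here to avoid masking the function t().

```r
x <- ts(1:12, frequency = 5)

# Hypothetical helper: rebuild the series from its own attributes, so the
# start and frequency never have to be typed in again.
append_obs <- function(series, value) {
  ts(c(series, value), start = start(series), frequency = frequency(series))
}

y <- append_obs(x, 10)

end(x)               # last (cycle, position) pair of the original series
end(y)               # one position later after appending
tsp(x)[2]            # last time value as a single number
cycle(y)[length(y)]  # position within the cycle of the appended value
```

end() and tsp() read the time attributes directly, which avoids the hand-written index arithmetic in question 2; length(x) gives the observation count when that is what is really wanted.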
[R] Matrix balancing on margins
Hello,

Is there any package in R for balancing a matrix? I want to estimate a matrix from:

* an initial matrix (1 everywhere, for example)
* row margins
* column margins
* a distance class vector (each cell of the matrix belongs to a distance class)

and I want the distribution of totals across distance classes to be preserved. How can I do this? Is there an existing function, or should I write an iterative script myself?

Thanks

--
Patrick PALMIER
Centre d'Études Techniques de l'Équipement Nord - Picardie
Département Infrastructures
Trafic -- Socio-économie
2, rue de Bruxelles, BP 275
59019 Lille cedex FRANCE
Tél: +33 (0) 3 20 49 60 70
Fax: +33 (0) 3 20 49 63 69
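If an iterative script turns out to be necessary, iterative proportional fitting extended with a third set of targets is one candidate. The sketch below is an assumption about what is being asked, not an existing package function: balance_matrix is a hypothetical name, distance classes are assumed to be labelled 1..K, and the targets are taken from a made-up matrix so that they are guaranteed to be mutually consistent.

```r
# Hypothetical helper: rescale rows, columns and distance classes in turn
# until the row and column totals match their targets.
balance_matrix <- function(seed, row_target, col_target, dist_class, class_target,
                           tol = 1e-10, max_iter = 5000) {
  m <- seed
  classes <- as.vector(dist_class)   # class labels, assumed to be 1..K
  for (iter in seq_len(max_iter)) {
    m <- m * (row_target / rowSums(m))               # rescale each row
    m <- sweep(m, 2, col_target / colSums(m), `*`)   # rescale each column
    class_sum <- as.vector(tapply(as.vector(m), classes, sum))
    m <- m * (class_target / class_sum)[classes]     # rescale each distance class
    gap <- max(abs(rowSums(m) - row_target), abs(colSums(m) - col_target))
    if (gap < tol) break
  }
  m
}

# Made-up example: targets derived from a known matrix, seed of ones.
truth <- matrix(c(4, 2, 1,
                  3, 5, 2,
                  1, 3, 6), nrow = 3, byrow = TRUE)
dist_class   <- abs(row(truth) - col(truth)) + 1
row_target   <- rowSums(truth)
col_target   <- colSums(truth)
class_target <- as.vector(tapply(as.vector(truth), as.vector(dist_class), sum))

fit <- balance_matrix(matrix(1, 3, 3), row_target, col_target,
                      dist_class, class_target)
round(fit, 3)
```

The fitted matrix reproduces the three sets of totals but need not equal the matrix the targets came from; the seed determines which of the admissible solutions is reached.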
Re: [R] Variable Selection for data reduction and discriminant anlaysis
Hi Gareth,

> My data is transformed to the clr or alr under Aitchison geometry, so I am essentially working in Euclidean space.

Great: glad to hear it.

> Has anyone had experience doing stepwise LDA?? I can't for the life of me find any help online about where to start.

A better option might be this: Trevor Hastie and a student of his have recently put out a paper that does a step-up from penalized discriminant analysis based, I think, on Trevor's sparse principal component analysis method (in his elasticnet package).

http://www-stat.stanford.edu/~hastie/Papers/sda_line.pdf

You can get R code to do the analysis on the first author's website; there's a link in the paper.

Bye, Mark.

gcam032 wrote:
> Thanks Mark,
>
> I failed to mention that I'm working within a compositional framework. I didn't want to confuse things. My data is transformed to the clr or alr under Aitchison geometry, so I am essentially working in Euclidean space.
>
> Has anyone had experience doing stepwise LDA?? I can't for the life of me find any help online about where to start.
>
> Thanks
> Gareth
>
> Mark Difford wrote:
>> Hi Gareth,
>>
>>> If I use the full composition (31 elements or variables), I can get reasonable separation of my 6 sources.
>>
>> A word of advice: You need to be exceptionally careful when analyzing compositional data. Taking compositions puts your data values into a constrained/bounded space (generally called a simplex), so that most standard statistical procedures (i.e. anything that uses a Euclidean metric, and most do) deliver erroneous results. Pearson wrote a paper on this long ago, but it's generally been ignored (except by Aitchison and the Spanish School of mathematical statisticians). The problem is comparatively well known to geologists, who work with compositional data much of the time.
>>
>> R has a very good package for analysing this data type: see the compositions package (a new release seems imminent). You will be able to get most of the main references from it. (The authors of the package also have a newly released article in one of the Elsevier journals [unfortunately my bibliography is elsewhere, so I cannot give details].) You could start by Wiki'ing your way to compositional data.
>>
>> HTH, Mark.
>>
>> Gareth Campbell wrote:
>>> Hello all,
>>>
>>> I'm dealing with geochemical analyses of some rocks. If I use the full composition (31 elements or variables), I can get reasonable separation of my 6 sources. Then when I go on to do LDA with the 6 groups, I get excellent separation. I feel like I should be reducing the variables to those that provide the most discrimination between the groups, as this is important information for me. I struggle to interpret the PCA plot in a way that helps me (due to the large number of elements). So I'm trying to do some sort of stepwise variable selection. I would love to hear from someone (possibly a geochemist or similar) who does this regularly, to determine the best course of action in R.
>>>
>>> Thanks very much
>>>
>>> --
>>> Gareth Campbell
>>> PhD Candidate
>>> The University of Auckland
>>> P +649 815 3670
>>> M +6421 256 3511
>>> E [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: [R] Manage huge database
Hi,

You can treat it as a database and use ODBC to fetch data from the CSV file using SQL. See the package RODBC for details about database connections. (I have dealt with similar problems before with RODBC.)

Regards,
Yihui

--
Yihui Xie [EMAIL PROTECTED]
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China

On Mon, Sep 22, 2008 at 2:50 PM, José E. Lozano [EMAIL PROTECTED] wrote:
> Hello,
>
> Recently I have been trying to open a huge database with no success. It's a 4GB csv plain text file with around 2000 rows and over 500,000 columns/variables. I have tried with The SAS System, but it reads only around 5000 columns, no more. R hangs when opening it.
>
> Is there any way to work with parts (a set of columns) of this database, since it's impossible to manage it all at once? Is there any way to establish a link to the csv file and to state the columns you want to fetch every time you make an analysis? I've been searching the net, but found little about this topic.
>
> Best regards,
> Jose Lozano
Re: [R] Manage huge database
Hello, Yihui

> You can treat it as a database and use ODBC to fetch data from the CSV file using SQL. See the package RODBC for details about database connections. (I have dealt with similar problems before with RODBC)

Thanks for your tip. I have used RODBC before to read data from MSAccess and MSExcel files, but I never imagined it could work for non-database files such as csv. I will check the RODBC documentation.

Best Regards,
Jose Lozano

--
Jose E. Lozano Alonso
Observatorio de Salud Pública. Direccion General de Salud Pública e I+D+I. Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.
Re: [R] Manage huge database
> I wouldn't call a 4GB csv text file a 'database'.

Obviously, a csv is not a database itself. I meant (though it seems I was not understood) that I had a huge database, exported to a csv file by the people who created it (and I don't have any idea of the original format of the database).

> Yes, use a database. A real database.

I've used MSAccess and there is a limit of 255 columns, as far as I know, so there is no way of importing it. Obviously, I won't buy an Oracle license to read this file, so: what database system allows a table with 500,000 variables? MySQL? Do I have to split the file into smaller parts, import them into separate tables, and relate them all using an index field?

> No, but you can establish a link to a database. You want a database. A real relational database. Try: http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

It didn't help, sorry. I know perfectly well what a relational database is (and I humbly consider myself an advanced user of MSAccess+VBA; it is only that I've never faced this problem with the number of variables). You should not suppose everyone's stupid, though...

Thanks for your help,
Best regards
Jose Lozano
Re: [R] how to keep up with R?
Adaikalavan Ramasamy wrote:
> I agree! The best way to learn (and remember for longer) is to teach someone else about it. And there is no reason not to repeat some of the analysis done in SAS with R. That way you can verify your outputs or compare the presentations. If you consistently find differences in the outputs, then trying to figure out the reason may lead you to better understand the methods (e.g. different optimization or estimation procedures).

My take on this: I have repeatedly found that it is surprisingly easy to improve on existing (non-R) implementations of statistical and non-statistical computation when working in R. Something about the structure of the language, something about the package mechanism, something about R-help, something about R-core, something about open-source, something about JSS or R-news, whatever it is, there is SOMETHING ABOUT R which lends itself to straightforward production of quality software. And that something is missing from other programming languages, IMO.

rksh

> Regards, Adai
>
> Barry Rowlingson wrote:
>> 2008/9/19 Wensui Liu [EMAIL PROTECTED]:
>>> Dear Listers,
>>>
>>> I've been a big fan of R since graduate school. After working in the industry for years, I haven't had many opportunities to use R and am mainly using SAS. However, I am still forcing myself really hard to stay close to R by reading R-help and books and writing R code by myself for fun. But little by little, I have started to realize that I have a hard time keeping up with R, and I am afraid that I will totally forget how to program in R. I really like it and am very unwilling to give it up. Does anyone have an idea how I might keep in touch with R without using it at work on a daily basis? I would really appreciate it.

--
Robin K. S. Hankin
Senior Research Associate
Cambridge Centre for Climate Change Mitigation Research (4CMR)
Faculty of Economics
The University of Cambridge
[EMAIL PROTECTED]
01223-764877
Re: [R] Why isn't R recognising integers as numbers?
Hi Ted (from Ted),

Just to clarify Marc's comments about dataframes in more basic terms.

If you read in data with read.csv(), the result returned by the function is a dataframe. This is a specialised kind of list, which you can think of as a list of columns all of the same length. You can think of each column as a vector of elements, all of which must be of the same type within the column, though the type can vary (e.g. numeric, factor, character) between columns.

When you display a dataframe, it looks like a matrix, though in R terms it is not really a matrix; it is a list, where each component of the list is a column.

Of course a dataframe, like any list, might have only one component. But it is still a list -- and the actual contents are only available one layer down, after you have extracted that component by some means (e.g. by using the $ extractor).

Simple example:

    L <- c(1,2,3,4)  ## vector
    L
    # [1] 1 2 3 4
    L.df <- data.frame(L=L)  ## Dataframe with 1 component named L
    L.df
    #   L
    # 1 1
    # 2 2
    # 3 3
    # 4 4
    L.df$L  ## Extract the component named L
    # [1] 1 2 3 4
    ## Compare with the result of 'L' above

    # Try a regression on L (this works):
    lm(L ~ 1)
    # Call:
    # lm(formula = L ~ 1)
    # Coefficients:
    # (Intercept)
    #         2.5

    # Try a regression on L.df (this doesn't work):
    lm(L.df ~ 1)
    # Error in model.frame.default(formula = L.df ~ 1,
    #   drop.unused.levels = TRUE) :
    #   invalid type (list) for variable 'L.df'

    # But it does after you refer to the component L by name:
    lm(L.df$L ~ 1)
    # Call:
    # lm(formula = L.df$L ~ 1)
    # Coefficients:
    # (Intercept)
    #         2.5

    # or:
    lm(L ~ 1, data=L.df)
    # Call:
    # lm(formula = L ~ 1, data = L.df)
    # Coefficients:
    # (Intercept)
    #         2.5

    # But you can (for a dataframe, not a general list) use an index
    # method of extraction *as if* it were a matrix (even though it isn't):
    L.df[,1]
    # [1] 1 2 3 4
    L.df[3,1]
    # [1] 3

    # But compare with:
    L.df[1]
    #   L
    # 1 1
    # 2 2
    # 3 3
    # 4 4

which is essentially the same as L.df itself (e.g. lm(L.df[1] ~ 1) will fail in exactly the same way as lm(L.df ~ 1) did).

The dataframe structure exists in R because so much data is typically in the row by column (case by variables) layout such as you get in spreadsheets and associated CSV files, and it is very useful to be able to get into this layout directly (and refer to the variables by name, as above).

The full generality of a 'list' can also be useful for encapsulating data of a less strictly structured kind, but that is another (longer) story!

Hoping this helps.
Ted.

On 22-Sep-08 02:09:29, Ted Byers wrote:
> Thanks Marc,
>
> That was it. For the last 30 years, I'd write my own code, in FORTRAN, C++, or even Java, to do whatever statistical analysis I needed. When at the office, sometimes I could use SAS, but that hasn't been an option for me in years.
>
> This is the first time I have had to load real data into R (instead of generating random data to use while playing with some of the stats functions, or manually typing dummy data).
>
> I take it, then, that the result of loading data is a data frame, and not just a matrix or array. Using something like refdata18[, 1] feels rather alien, but I'm sure I'll quickly get used to it. I'd seen it before in the R docs, but it didn't register that I had to use it to get the functions of most interest to me to recognise my data as a vector of numbers, given I'd provided only a vector of integers as input.
>
> Thanks
> Ted
>
> Marc Schwartz wrote:
>> on 09/21/2008 08:01 PM Ted Byers wrote:
>>> I have a number of files containing anywhere from a few dozen to a few thousand integers, one per record.
>>>
>>> The statement
>>>
>>>     refdata18 = read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", header = TRUE, na.strings="")
>>>
>>> works fine, and if I type refdata18, I get the integers displayed, one value per record (along with a record number).
>>>
>>> However, when I try fitdistr(refdata18, "negative binomial"), or hist.scott(refdata18, prob = TRUE), I get an error:
>>>
>>>     Error in fitdistr(refdata18, "negative binomial") :
>>>       'x' must be a non-empty numeric vector
>>>
>>> Or
>>>
>>>     Error in hist.default(x, nclass.scott(x), prob = prob, xlab = xlab, ...) :
>>>       'x' must be numeric
>>>
>>> How can it not recognise integers as numbers?
>>>
>>> Thanks
>>> Ted
>>
>> 'refdata18' is a data frame and the two functions are expecting a numeric vector.
>>
>> If you use:
>>
>>     fitdistr(refdata18[, 1], "negative binomial")
>>
>> or
>>
>>     hist(refdata18[, 1])
>>
>> you should get a suitable result, presuming that the first column in the data frame is a numeric vector.
>>
>> Use:
>>
>>     str(refdata18)
>>
>> to get a sense for the structure of the data frame, including the column names, which you could then use, instead of the above index based syntax.
>>
>> HTH,
>> Marc Schwartz
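The distinction between the data frame and its column can be checked directly with is.numeric(), which is roughly the test that the error messages above are reporting on. This is a small sketch with stand-in values, not the real refdata18.

```r
# Stand-in for the object returned by read.csv() (illustrative values only).
refdata <- data.frame(X0 = c(0L, 1L, 1L, 2L, 18L))

is.numeric(refdata)        # the data frame itself is a list, not a numeric vector
is.numeric(refdata[1])     # single brackets with one index keep the data frame
is.numeric(refdata[, 1])   # matrix-style indexing extracts the column
is.numeric(refdata$X0)     # extraction by name
is.numeric(refdata[[1]])   # list-style extraction by position

# Any function expecting a numeric vector accepts the extracted column.
column_mean <- mean(refdata[[1]])
```

The three extraction forms return the same integer vector, so the choice between them is mostly a matter of whether the column is known by name or by position.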
Re: [R] Manage huge database
2008/9/22 José E. Lozano [EMAIL PROTECTED]:
>> I wouldn't call a 4GB csv text file a 'database'.
>
> It didn't help, sorry. I perfectly knew what a relational database is (and I humbly consider myself an advanced user on working with MSAccess+VBA, only that I've never face this problem with variables), you should not suppose everyone's stupid, though...

Maybe you've not lurked on R-help for long enough :) Apologies!

A bit more googling tells me both MySQL and PostgreSQL have limits of a few thousand on the number of columns in a table, not a few hundred thousand. An insightful comment on one mailing list is:

"Of course, the real bottom line is that if you think you need more than order-of-a-hundred columns, your database design probably needs revision anyway ;-)"

So, how much design is in this data? If none, and what you've basically got is a 2000x500000 grid of numbers, then maybe a more raw binary-type format will help - HDF or netCDF? Although I'm not sure how much R support for reading slices of these formats exists, you may be able to use an external utility to write slices out on demand. Random access to parts of these files is pretty fast.

http://cran.r-project.org/web/packages/RNetCDF/index.html
http://cran.r-project.org/web/packages/hdf5/index.html

Thinking back to your 4GB file with 1,000,000,000 entries, that's only 3 bytes per entry (+1 for the comma). What is this data? There may be more efficient ways to handle it.

Hope *that* helps...

Barry
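The size estimate can be worked through explicitly. This is a back-of-the-envelope sketch assuming 2000 rows, 500,000 columns, and "4GB" read as 4 GiB; none of these are exact figures for the real file.

```r
n_rows  <- 2000
n_cols  <- 500000
entries <- n_rows * n_cols              # number of cells in the grid

file_bytes      <- 4 * 1024^3           # assumption: "4GB" means 4 GiB
bytes_per_entry <- file_bytes / entries # includes the separator after each value

# Memory needed to hold every cell as an 8-byte double, in GiB.
gib_as_double <- entries * 8 / 1024^3

entries
round(bytes_per_entry, 2)
round(gib_as_double, 2)
```

At roughly four bytes per cell including the comma, each value can only be a few characters long, which is what prompts the question about what the data actually is.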
Re: [R] Manage huge database
What are you going to do with the data once you have read it in? Are all the data items numeric? If they are numeric, you would need at least 8GB to hold one copy, and probably a machine with 32GB if you wanted to do any manipulation on the data.

You can use a 'connection' and 'scan' to read the data in chunks and then store it in a more accessible format. A lot would depend on your answer to my first question.

On Mon, Sep 22, 2008 at 6:26 AM, José E. Lozano [EMAIL PROTECTED] wrote:
>> Maybe you've not lurked on R-help for long enough :) Apologies!
>
> Probably.
>
>> So, how much design is in this data? If none, and what you've basically got is a 2000x500000 grid of numbers, then maybe a more raw
>
> Exactly, raw data, but a little more complex, since all the 500,000 variables are in text format, so the width is around 2,500,000.
>
>> http://cran.r-project.org/web/packages/RNetCDF/index.html
>> http://cran.r-project.org/web/packages/hdf5/index.html
>
> Thanks, I will check. Right now I am reading the file line by line. It's time consuming, but since I will do it only once, just to rearrange the data into smaller tables to query, it's ok.
>
>> Thinking back to your 4GB file with 1,000,000,000 entries, that's only 3 bytes per entry (+1 for the comma). What is this data? There may be more efficient ways to handle it.
>
> It is genetic DNA data (individuals genotyped), hence the large number of columns to analyze.
>
> Best Regards,
> Jose Lozano
>
> --
> Jose E. Lozano Alonso
> Observatorio de Salud Pública. Direccion General de Salud Pública e I+D+I. Junta de Castilla y León.
> Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
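The connection-plus-scan suggestion might look roughly like the following. It is a sketch on a tiny stand-in file: the file contents, the chunk size, and the choice of columns to keep are all assumptions for illustration.

```r
# Tiny comma-separated stand-in for the real file (illustrative values only).
path <- tempfile(fileext = ".csv")
writeLines(c("A,C,G", "T,T,A", "C,G,G", "A,A,T", "G,C,C"), path)

n_cols     <- 3        # fields per line
keep       <- c(1, 3)  # columns to retain from each chunk
chunk_rows <- 2        # lines read per pass

con <- file(path, open = "r")
pieces <- list()
repeat {
  # An open connection keeps its position, so each call reads the next chunk.
  fields <- scan(con, what = "", sep = ",", nlines = chunk_rows, quiet = TRUE)
  if (length(fields) == 0) break
  chunk <- matrix(fields, ncol = n_cols, byrow = TRUE)
  pieces[[length(pieces) + 1]] <- chunk[, keep, drop = FALSE]
}
close(con)

subset_cols <- do.call(rbind, pieces)
subset_cols
```

Only one chunk is held in full at any time; for a real file the retained columns would more likely be written out per chunk than accumulated in a list.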
Re: [R] Manage huge database
2008/9/22 José E. Lozano [EMAIL PROTECTED]:
> Exactly, raw data, but a little more complex since all the 500,000 variables are in text format, so the width is around 2,500,000.
>
> Thanks, I will check. Right now I am reading line by line the file. It's time consuming, but since I will do it only once, just to rearrange the data into smaller tables to query, it's ok.

A language like python, perl, or even awk might be able to help you slice your data up.

> Is genetic DNA data (individuals genotyped), hence the large amount of columns to analyze.

So is each line just ACCGTATAT etc etc?

If you have fixed width fields in a file, so that every line is the same length, then you can use random access methods to get to a particular value - just multiply the line length by the row number you want and add the column number. In R you can do this with seek() on a connection. This should be fast because it seeks by bytes, instead of having to scan all the comma-separated stuff.

The only problem comes when your data doesn't quite conform, and you can end up reading junk. When doing this, it's a good idea to test your dataset first to make sure the lines and fields are right.

Example with dummy.dna:

    aaaccctttgggaaa
    gattacagattacaa
    aaacggg
    gtgtggg
    aac

Each line has 15 bases, and on my OS there's one additional invisible character to mark the line end. Windows uses 2, but your data might not be in Windows format... So anyway, my multiplier is 16. Hence, to get a slice of four columns starting at byte offset 7 within the line, for some rows:

    dna=file("dummy.dna")
    open(dna,open="rb")
    for(r in 2:4){seek(dna,7+(r-1)*16);print(readChar(dna,4))}
    [1] gatt
    [1]
    [1]

The speed of this should be independent of the size of your data file.

Barry
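The seek() approach can be tried end to end on a small fixed-width file. This is a sketch: the first two lines are taken from the dummy.dna example, the other two are made-up stand-ins, and read_slice is a hypothetical helper name.

```r
# Fixed-width stand-in file: every line is exactly 15 bases long.
bases <- c("aaaccctttgggaaa",
           "gattacagattacaa",
           "cccccgggggttttt",   # made-up line
           "acgtacgtacgtacg")   # made-up line
path <- tempfile()
out <- file(path, open = "wb")
writeLines(bases, out, sep = "\n")   # binary mode: one newline byte per line
close(out)

width <- nchar(bases[1]) + 1         # 15 bases plus the line-end byte

# Hypothetical helper: read n characters starting at a 1-based row and column.
read_slice <- function(path, row, col, n, width) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (row - 1) * width + (col - 1))  # byte offset from file start
  readChar(con, n)
}

read_slice(path, row = 2, col = 8, n = 4, width = width)
read_slice(path, row = 3, col = 6, n = 5, width = width)
```

Row 2, column 8 in 1-based terms is the same position as the byte offset 7 + (2-1)*16 used in the loop above. Checking that the file size is an exact multiple of the line width is a cheap way to catch lines that do not conform.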
Re: [R] rgl: How to position a window during open3d call
Duncan Murdoch wrote:
> This is fixed now on R-forge; eventually it will make it into the next rgl release on CRAN. You should be able to download a binary of the development version from R-forge sooner. Make sure you get version 0.81.706 or newer.

The R-forge version 0.81.706 works as advertised, both on Linux and Windows. Thanks Duncan!

Koen Stegen
Royal Meteorological Institute of Belgium
[R] auto.arima help.
Hello,

I am calling the auto.arima function in the forecast package and it returns what seems to be valid Arima output. But when I feed this output to 'predict' I get:

    Error in predict.Arima(catall.fit[[.index]], n.ahead = 12) :
      'xreg' and 'newxreg' have different numbers of columns

Is there a way to tell what is being supplied to xreg from the Arima output? Any ideas?

Thank you.

Kevin
Re: [R] Help for R
Please read the posting guide and tell us:

- Which version of R?
- Which OS?
- Which version of the matlab package (I guess you are using that one?)
- If Windows and a binary version of the matlab package: does the binary fit your version of R?

Uwe Ligges

Mac wrote:
> Dear R users,
>
> I've just started learning R and I'm having a problem with it. I got the following message when I tried to run R:
>
>     Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) :
>       in 'matlab' methods specified for export, but none defined: sum, size, padarray, flipud, fliplr
>     Error: package/namespace load failed for 'matlab'
>
> Then I tried loading the matlab package from the Packages menu; however, I got the same message as above. I would appreciate any help or suggestions.
>
> Thanks.
> Kai
[R] Joint maximum likelihood estimation for ordinal data
Dear R users,

From what I understand, the joint maximum likelihood procedure for Rasch models (available in the package MiscPsycho) in R can only be used on binary data. I was wondering whether the code is currently being adapted for application to ordinal data? I'm trying to replicate in R results obtained from Winsteps.

Best wishes
denn
Re: [R] Hmisc and Ubuntu (aptitude install)
Matthew,

As per the CRAN Ubuntu README

http://cran.r-project.org/bin/linux/ubuntu/

install the Ubuntu r-base-dev package to compile R packages from sources.

Vincent

On Mon, 22 Sept. at 00:08, Matthew Pettis wrote:
> Hi,
>
> I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install. I tried getting Hmisc from within R by issuing the standard 'install.packages' command, but it said I needed 'gfortran' to compile. I thought I could circumvent this by using 'aptitude' to get the package 'r-cran-hmisc', but when I got it, the package had critical missing parts (got 404s).
>
> So, I'll be trying to go back and download 'gfortran', but can anybody tell me if this aptitude ubuntu package should be kept up to date and is just currently overlooked?
>
> Thanks,
> Matt
>
> --
> It is from the wellspring of our despair and the places that we are broken that we come to repair the world. -- Murray Waas
Re: [R] Hmisc and Ubuntu (aptitude install)
On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote: Matthew, As per the CRAN Ubuntu README http://cran.r-project.org/bin/linux/ubuntu/ install the Ubuntu r-base-dev package to compile R packages from sources. Well there should be a working r-cran-hmisc package. You simply got a '404' error indicating that your network access (using http) to the external Ubuntu mirror was broken. Fix that, or download the package by hand. It may be easier to just install the missing package. That said, Vincent is of course entirely correct on the need for r-base-dev. Dirk Vincent Le lun. 22 sept. à 00:08, Matthew Pettis a écrit : Hi, I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install. I tried getting Hmisc from within R by issuing the standard 'install.packages' command, but it said I needed 'gfortran' to compile. I thought I could circumvent this by using 'aptitude' to get the package 'r-cran-hmisc', but when I got it, the package had critical missing parts (got 404s). So, I'll be trying to go back and download 'gfortran', but can anybody tell me if this aptitude ubuntu package should be kept up to date and is just currently overlooked? Thanks, Matt -- It is from the wellspring of our despair and the places that we are broken that we come to repair the world. -- Murray Waas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Three out of two people have difficulties with fractions. 
Re: [R] Manage huge database
José E. Lozano writes:

>> Maybe you've not lurked on R-help for long enough :) Apologies!
> Probably.

>> So, how much design is in this data? If none, and what you've basically got is a 2000x50 grid of numbers, then maybe a more raw
> Exactly, raw data, but a little more complex since all the 50 variables are in text format, so the width is around 2,500,000.

>> http://cran.r-project.org/web/packages/RNetCDF/index.html
>> http://cran.r-project.org/web/packages/hdf5/index.html
> Thanks, I will check. Right now I am reading the file line by line. It's time consuming, but since I will do it only once, just to rearrange the data into smaller tables to query, it's OK.

>> Thinking back to your 4GB file with 1,000,000,000 entries, that's only 3 bytes per entry (+1 for the comma). What is this data? There may be more efficient ways to handle it.
> It is genetic DNA data (individuals genotyped), hence the large number of columns to analyze.

The Bioconductor package snpMatrix is designed for this type of data. See http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html and, if that looks promising, run source('http://bioconductor.org/biocLite.R') followed by biocLite('snpMatrix'). Likely you'll quickly want a 64-bit (Linux or Mac) machine.

Martin

> Best Regards, Jose Lozano
> --
> Jose E. Lozano Alonso, Observatorio de Salud Pública. Direccion General de Salud Pública e I+D+I. Junta de Castilla y León. Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
[R] use of system() under Linux
Hi,

I want to use the system() command to execute a command and return the result in an R variable, so I am using intern=TRUE. On the other hand, I want to evaluate the return value of the command to determine whether the command was successful. According to the help, these two objectives are mutually exclusive: either the one or the other. Is this true, or is there another way of accomplishing this?

My preferred return value would be a list consisting of three entries: the return code of the command, stderr, and the result.

Thanks, Rainer
--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology, Faculty of Science, Natural Sciences Building, Private Bag X1, University of Stellenbosch, Matieland 7602, South Africa
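One possible way to get all three pieces in a single call is system2(), which exists in current versions of R and can capture standard output while sending standard error to a file. The sketch below assumes a Unix-like shell; run_command is a hypothetical helper name, not something from this thread.

```r
# Hypothetical helper: run a command and return exit status, stdout and stderr.
# Assumes a Unix-like system and a version of R that provides system2().
run_command <- function(command, args = character()) {
  err_file <- tempfile()
  on.exit(unlink(err_file))
  # stdout = TRUE captures output as a character vector; stderr goes to a file.
  # A non-zero exit status is attached as the "status" attribute (with a warning).
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = err_file)
  )
  status <- attr(out, "status")
  list(
    status = if (is.null(status)) 0L else as.integer(status),
    stdout = as.character(out),
    stderr = readLines(err_file, warn = FALSE)
  )
}

# Example: a shell command that writes to both streams and exits with code 3.
res <- run_command("sh", c("-c", shQuote("echo result; echo oops >&2; exit 3")))
str(res)
```

A status of 0 is reported when no "status" attribute is present, which is how a successful command shows up.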
Re: [R] Likelihood between observed and predicted response
Christophe LOOTS (Christophe.Loots at ifremer.fr) writes:

> Thank you so much for your help. The function dbinom seems to work very well. However, I'm a bit lost with the dnorm function. Apparently, I have to compute the mean mu and the standard deviation sd, but what does it mean exactly? I only have a vector of predicted response and a vector of observed response that I would like to compare! What are mu and sigma?

mu is the mean (which you might as well set to the predicted value). sd is the standard deviation; in order to calculate the likelihood in this case, you'll need an *independent* estimate (from somewhere) of the standard deviation. Without thinking about it too carefully, I think you could probably get this from sqrt(sum((predicted-observed)^2)/(n-1)).

> Thanks again. Christophe
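The advice above can be sketched as follows. The two vectors are made up purely for illustration, and the standard deviation uses the residual-based formula suggested in the reply; whether that estimate is appropriate depends on the model.

```r
# Illustrative vectors (made up for this sketch, not from the thread).
observed  <- c(2.1, 3.9, 6.2, 7.8, 10.1)
predicted <- c(2.0, 4.0, 6.0, 8.0, 10.0)
n <- length(observed)

# Residual-based estimate of the standard deviation, as suggested above.
sigma <- sqrt(sum((predicted - observed)^2) / (n - 1))

# Gaussian log-likelihood: each observation has mean equal to its prediction.
log_lik <- sum(dnorm(observed, mean = predicted, sd = sigma, log = TRUE))
log_lik
```

Summing log densities (log = TRUE) rather than multiplying densities avoids numerical underflow when there are many observations.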
Re: [R] adding layers in ggplot2 (data and code included)
Hi Juliet,

On Sun, Sep 21, 2008 at 11:47 PM, Juliet Hannah wrote:
> Here is some sample data:
>
>   mydata <- read.table(textConnection("Est Group Tri
>   0 0 4.639644
>   1 0 4.579189
>   2 0 4.590714
>   0 1 4.443696
>   1 1 4.588243
>   2 1 4.650505
>   0 2 4.296608
>   1 2 4.826036
>   2 2 4.765386"),header=TRUE); closeAllConnections();
>
> I can form two plots, scatter and lines, as follows:
>
>   p <- ggplot(mydata, aes(x=Est, y=Tri))
>   p + geom_point(aes(colour=factor(Group),shape=factor(Group)))
>
> and
>
>   p + geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)
>
> However, I am unable to have the plots together. I obtain the following error:
>
>   p + geom_point(aes(colour=factor(Group),shape=factor(Group)))+geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)
>   Error in `[.data.frame`(df, , var) : undefined columns selected

Are you using R 2.7.2? Something in R changed between R 2.7.1 and R 2.7.2 that breaks certain ggplot plots (your code works fine for me without modification). It's on my to-do list to fix.

You can also simplify your code a little by relying on defaults set in the ggplot() call:

  ggplot(mydata, aes(Est, Tri, colour = factor(Group))) +
    geom_point(aes(shape = factor(Group))) +
    geom_smooth(method = lm, se = F)

(Andpleaseusespacesotherwiseitsveryhardtoreadyourcode)

Hadley

> Thanks, Juliet

--
http://had.co.nz/
[R] SmoothScatter plot range issue
Hello,

I am attempting to use smoothScatter to plot a heatmap of locations of events in an x-y axis. When I plot the heatmap without passing xlim and ylim parameters, it fills the plot area but the perspective is a bit skewed. I would like to standardize these plots to a uniform window size that does not depend on the range of values in the dataframe.

However, when I resize the plot using xlim or ylim, there is a light blue background that surrounds the immediate area of the data (corresponding to the range of the points listed in the dataframe), surrounded by extra white space for the new xlim and ylim values I have added. Some of the rings around the data points are also cut off at the margins. I would like to stop the plot from being cut off, and want this light blue range to extend throughout the entire area of the resized plot. I have attempted to add NAs, but it has no effect on expanding this light blue plot area.

Code is below. xyz is a dataframe containing two columns with corresponding x and y values.

  library(geneplotter)
  library(RColorBrewer)
  layout(matrix(1:1, ncol=2, byrow=TRUE))
  smoothScatter(xyz, nrpoints=0, xlim=c(-3,3), ylim=c(0,5), colramp=colorRampPalette(c("#f8f8ff", "white", "#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red", "#7E2217")))
  ###END

Thanks very much for any help,
Jason
[R] paste with list
Hello,

I guess the solution is rather simple, but whatever I tried, I didn't manage to get the result I want: I have several vectors of equal length in a list and I'd like to combine all first elements into a single string, all second elements into a single string, ..., all n-th elements into a single string.

  # Example code (how it should look):
  t1 <- c(1,2,3)
  t2 <- c(3.4,5.5,1.1)
  paste(t1, t2, sep="\t")

  # and now how the data is available
  tl <- list(t1,t2)
  ??? what do I have to do to get the same output ???

Can anybody help me?

Antje
[R] zoo: hourly values (local time) not unique
Hi!

I've got a time series as a zoo object which contains hourly values. My problem is that these values occur in every real hour with regard to daylight saving time. That is, on the last Sunday in March I'll have 23 values, whereas the last Sunday in October contains 25 values instead of 24. Thus, if I try to aggregate the data using for example tapply (e.g. to get a monthly mean), I get the error

  some methods for zoo objects do not work if the index entries in 'order.by' are not unique

Any idea how I can solve this without having to remove/add an hour each year manually? Or, as I'm quite new to R, how could I easily manipulate my data so that the missing hour is introduced and the double hour is cut from the data (and the index)?

I'd really appreciate your help! Thanks in advance,
Arne
Re: [R] paste with list
try this:

  t1 <- c(1, 2, 3)
  t2 <- c(3.4, 5.5, 1.1)
  tl <- list(t1, t2)
  do.call(paste, c(tl, sep = "\t"))

I hope it helps.

Best, Dimitris
--
Dimitris Rizopoulos, Assistant Professor, Department of Biostatistics, Erasmus Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
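For readers wondering why this works, here is the same idea as a small self-contained check. do.call() hands each list element to paste() as a separate argument, and the extra sep entry becomes a named argument.

```r
t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
tl <- list(t1, t2)

# do.call() passes each list element as a separate argument to paste(),
# so this is equivalent to paste(t1, t2, sep = "\t") for any number of vectors.
joined <- do.call(paste, c(tl, sep = "\t"))
identical(joined, paste(t1, t2, sep = "\t"))
```

Because the list is expanded at call time, the same line works unchanged when tl holds more than two vectors.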
Re: [R] paste with list
Try this:

  paste(tl[[1]], tl[[2]], sep="\t")

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] paste with list
Great! That's exactly what I was looking for. (I see, I still have a lot to learn...)

Thank you!
Antje

Dimitris Rizopoulos wrote:
> try this: t1 <- c(1, 2, 3); t2 <- c(3.4, 5.5, 1.1); tl <- list(t1, t2); do.call(paste, c(tl, sep = "\t"))
Re: [R] zoo: hourly values (local time) not unique
See question #1 in the zoo FAQ:

  library(zoo)
  vignette("zoo-faq")

Also, in the upcoming zoo 1.6-0 (not yet on CRAN, but in the development version at R-Forge, http://r-forge.r-project.org/projects/zoo/) there is a set of make.unique functions and a make.unique= argument in read.zoo, which will provide additional capabilities for uniquifying series.
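Independently of the FAQ's answer, one way to see where the duplicates come from is to compare local wall-clock labels with the underlying instants. The base-R sketch below assumes the timestamps can be stored as POSIXct in UTC and that the local zone is Europe/Berlin; both are assumptions for illustration, and the values are made up.

```r
# Six consecutive hourly timestamps around the end of daylight saving time
# in central Europe (26 October 2008), stored as unique instants in UTC.
idx <- seq(as.POSIXct("2008-10-25 22:00", tz = "UTC"), by = "hour", length.out = 6)
values <- c(10, 11, 12, 13, 14, 15)  # made-up hourly measurements

# Local wall-clock labels repeat the 02:00 hour, which is what makes a
# local-time index non-unique.
local_labels <- format(idx, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")
has_duplicate_labels <- anyDuplicated(local_labels) > 0

# The UTC instants stay unique, so they are safe to use as an index.
instants_unique <- anyDuplicated(idx) == 0

# Aggregation can still be grouped by local calendar month.
monthly_mean <- tapply(values, format(idx, "%Y-%m", tz = "Europe/Berlin"), mean)
monthly_mean
```

With a unique index of this kind, no hour needs to be added or removed by hand; only the grouping labels are computed in local time.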
Re: [R] Time series (ts) questions.
Try this to append 100 to the end of the series, say:

  tt <- ts(1:12, frequency=5) # sample data
  ts(c(tt, 100), start = start(tt), frequency = frequency(tt))

On Mon, Sep 22, 2008 at 2:17 AM, Kevin wrote:
> I have been working with the base time series object (ts) and I have a couple of questions that hopefully this group can help me with:
>
> 1) What is the best way to append an observation to an existing time series? Suppose I have a time series:
>
>   t <- ts(1:12, frequency=5)
>
> This would generate two complete cycles and one remainder. Now I would like to append an observation to this time series. I could use 'c', but then I would need to rebuild the whole time series and I would need to know the frequency etc. I would like some operation like '+' that would simply append the value to the end of the time series (incrementing the last time value so things like cycle() still output the correct values), but alas t + 10 is already taken as an equally useful operation: adding 10 to each element in the time series (rather than, in this case, appending ts(10,frequency) with a time value of 13 to the time series).
>
> 2) What is the best way to get the last time value in a time series? I can do something like:
>
>   (start(t)[2] - 1) + (end(t)[1]-1) * frequency(t) + end(t)[2]
>
> But there has to be an easier way. Thank you. Kevin
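The suggestion above can be wrapped in a small function, and the second question (the last time value) can be read off with end() or tsp(). append_ts is a hypothetical helper name used only for this sketch.

```r
tt <- ts(1:12, frequency = 5)  # sample data, as above

# Hypothetical helper: append values while keeping start and frequency.
append_ts <- function(x, values) {
  ts(c(x, values), start = start(x), frequency = frequency(x))
}

tt2 <- append_ts(tt, 100)
end(tt2)         # cycle and position of the last observation
tsp(tt2)[2]      # time of the last observation as a single number
cycle(tt2)[13]   # position within the cycle of the appended value
```

The caller does not need to know the frequency, because it is read from the existing series.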
Re: [R] Manage huge database
> What are you going to do with the data once you have read it in? Are all the data items numeric? If they are numeric, you would need at least 8GB to hold one copy and probably a machine with 32GB if you wanted to do any manipulation on the data.

Well, I will only analyze sets of variables; I can't manage the full 50 variables at a time, of course. Each time I run an analysis I will extract the information I need, which is why I wanted an easy way to extract parts of the file.

Best regards,
Jose Lozano
Re: [R] Manage huge database
> So is each line just ACCGTATAT etc etc?

Exactly: A_G, A_A, G_G and such.

> If you have fixed width fields in a file, so that every line is the same length, then you can use random access methods to get to a particular value - just multiply the line length by the row number you

Nice hint! I didn't think of this. But I fear that if I have missing values in the file I won't be able to read the right information...

> When doing this, it's a good idea to test your dataset first to make sure the lines and fields are right.

Yes, I am trying to figure out whether all the lines have exactly the same length, so that I can use a random access method to read the file.

Thanks,
Jose Lozano
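The random access idea quoted above can be sketched in R with seek() and readChar(). The sketch assumes 3-character genotypes separated by commas and Unix line endings, so that every field takes 4 bytes; read_field is a hypothetical helper, and the test file is made up. Missing values would have to be padded to the same width for the offsets to stay correct.

```r
# Build a small fixed-width test file: 3-character genotypes separated by
# commas, so every field occupies 4 bytes including its delimiter or newline.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeLines(c("A_G,A_A,G_G", "C_T,A_G,T_T", "G_G,C_C,A_T"), con, sep = "\n")
close(con)

field_width <- 3L
n_fields    <- 3L
line_length <- n_fields * (field_width + 1L)  # bytes per line, newline included

# Hypothetical helper: read one field by computing its byte offset.
read_field <- function(path, row, col) {
  con <- file(path, "rb")
  on.exit(close(con))
  offset <- (row - 1L) * line_length + (col - 1L) * (field_width + 1L)
  seek(con, where = offset)
  readChar(con, nchars = field_width)
}

read_field(path, 2, 3)
```

A quick sanity check before relying on the offsets is that the file size is an exact multiple of line_length.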
[R] Combine data frames using column names as key
Hi guys,

Suppose I have 2 data frames, i.e.:

        values
  one     0.32
  two     0.25
  three   0.11

and

        values
  two     0.66
  one     0.74
  three   0.19

N.B. the first column is the row names in both cases. How can I combine them on the row names column? I.e. to make something like:

        values.1  values.2
  one     0.32      0.74
  two     0.25      0.66
  three   0.11      0.19

I guess it's data.frame or cbind, but I keep getting errors when I try to combine them on row names...

Many many thanks,
Jim
Re: [R] Combine data frames using column names as key
Try:

  data.frame(merge(df1, df2, by = "row.names"), row.names = 1)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
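Applied to the two data frames from the question, the one-liner above might look like this. The suffixes argument is an addition for this sketch, chosen so that the column names match the desired output.

```r
df1 <- data.frame(values = c(0.32, 0.25, 0.11), row.names = c("one", "two", "three"))
df2 <- data.frame(values = c(0.66, 0.74, 0.19), row.names = c("two", "one", "three"))

# by = "row.names" joins on the row names; the key comes back as the first
# column, which data.frame(..., row.names = 1) turns back into row names.
merged <- data.frame(
  merge(df1, df2, by = "row.names", suffixes = c(".1", ".2")),
  row.names = 1
)
merged[c("one", "two", "three"), ]
```

merge() sorts by the key, so the final indexing step only restores the original row order.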
Re: [R] Manage huge database
Why don't you make one pass through your data and encode your characters as integers (it would appear that you only have 16 combinations)? You might also want to consider using the 'raw' object type, since raw values take up only one byte of storage each; compared with integers, this will reduce your storage requirements by a factor of 4. Then store each row in a 'filehash' object so you can quickly retrieve a row at a time and index directly to the byte(s) that hold the information you want.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
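The encoding step suggested above could be sketched like this. It assumes genotypes are written as two bases joined by an underscore, and it reserves code 0 for missing values; the function names are hypothetical, and the filehash storage step is not shown.

```r
bases <- c("A", "C", "G", "T")
# All 16 ordered pairs, e.g. "A_A", "C_A", ..., "T_T".
genotype_levels <- as.vector(outer(bases, bases, paste, sep = "_"))

# Hypothetical helpers: one byte per genotype, with 0 reserved for missing.
encode_genotypes <- function(x) {
  codes <- match(x, genotype_levels)
  codes[is.na(codes)] <- 0L
  as.raw(codes)
}
decode_genotypes <- function(r) {
  c(NA_character_, genotype_levels)[as.integer(r) + 1L]
}

x <- c("A_G", "A_A", "G_G", NA, "T_C")
r <- encode_genotypes(x)
identical(decode_genotypes(r), x)
```

Reserving a code for missing values also addresses the earlier worry about gaps in the file, since every genotype then occupies exactly one byte.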
Re: [R] Manage huge database
2008/9/22 jim holtman wrote:
> Why don't you make one pass through your data and encode you characters as integers (it would appear that you only have 16 combinations). You might also want to consider using the 'raw' object since these only take up one byte of storage -- will reduce your storage requirements by 4. Then store each row in a 'filehash' object so you can quickly retrieve a row at a time and then index directly to the byte(s) that have the information that you want.

My original response of specifying a relational database now seems somewhat comical :)

Barry
[R] Relative novice: Working with fitdistr(MASS): 3 questions
OK, I am now at the point where I can use fitdistr to obtain a fit of one of the standard distributions to my data. It is quite remarkable how different the parameters are for different samples from the same system. Clearly the system itself is not stationary.

Anyway, question 1: I require a visual perspective of the fit I get. I can use hist.scott to get a histogram (and just have to figure out how to get finer granularity from it: my samples are taken weekly, but the histogram bars cover two weeks of data, and the most interesting changes happen in the first three to four weeks; after that things slow down tremendously), but how would I overlay a plot of the best distribution I get from fitdistr over it?

Second question: I don't see anything in the documentation for fitdistr about using the fitted distribution to integrate over some range of values. I sample weekly, and for each sample I get a certain number of events each week for about three months. I need to be able to use the distribution to estimate the number of such events next week or the week after, and how long it will be until the probability of such an event is so low that no more of them are likely to be observed from that sample ever. What package or functions should I be looking at to get this done?

Third question: I see nothing in the docs about non-central distributions. The distribution most likely to fit is Cauchy, but we know that there is skew that depends on the magnitude: large positive deviates are more common than large negative deviates, but extremely large positive deviates are less common than extremely large negative deviates. What we don't know is how significant such skewness is for the overall distribution. How can I assess this, or can I assess this, using fitdistr (or some other function I haven't found yet)?

Thanks
Ted
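For the first two questions, one possible approach is to draw the histogram on the density scale, add the fitted density with curve(), and use the fitted distribution function for probabilities over a range. The sketch below uses simulated Cauchy data as a stand-in for the real samples, and the bin width of 0.5 is an arbitrary choice; it does not address the skewness question.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
x <- rcauchy(300, location = 2, scale = 1)  # simulated stand-in for the real data

fit <- fitdistr(x, "cauchy")
loc <- fit$estimate[["location"]]
scl <- fit$estimate[["scale"]]

# Question 1: histogram on the density scale with the fitted curve on top.
# The breaks argument controls granularity. Cauchy samples have extreme
# outliers, so bins cover the full range but only the centre is displayed.
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
hist(x, breaks = breaks, freq = FALSE, xlim = c(loc - 10, loc + 10),
     main = "Data with fitted Cauchy density")
curve(dcauchy(x, location = loc, scale = scl), add = TRUE, lwd = 2)

# Question 2: probability of a value between 3 and 4 under the fitted model,
# obtained from the distribution function rather than numerical integration.
p_interval <- pcauchy(4, loc, scl) - pcauchy(3, loc, scl)
p_interval
```

Multiplying such a probability by the expected total number of events gives a rough expected count for that interval, under the assumption that the fitted model holds.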
Re: [R] Manage huge database
On 22-Sep-08 11:00:30, José E. Lozano wrote:
>> So is each line just ACCGTATAT etc etc?
> Exactly: A_G, A_A, G_G and such.
>> If you have fixed width fields in a file, so that every line is the same length, then you can use random access methods to get to a particular value - just multiply the line length by the row number you
> Nice hint! I didn't think of this. But I fear that if I have missing values in the file I won't be able to read the right information...
>> When doing this, it's a good idea to test your dataset first to make sure the lines and fields are right.
> Yes, I am trying to figure out whether all the lines have exactly the same length, to use a random access method to read it.

If you were using Linux, I would suggest a command along the lines of

  cat filename | awk '{print(length($0))}'

which would give you the length of each line. But since you have around 2000 lines, to simply check whether they all have the same length (in bytes/characters) you can extend the above to

  cat filename | awk '{print(length($0))}' | sort -u

which will present you with all the different line lengths. If they are all the same length you will get one number. I just tested this on a file with lines exceeding 500,000 characters in length, and it worked perfectly well even for such long lines.

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-08 Time: 17:03:21
-- XFMail --
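On systems without awk, the same check can be done from within R. The sketch below reads one line at a time and collects the distinct line lengths in bytes; line_lengths is a hypothetical helper name, and the test file is made up.

```r
# Hypothetical helper: distinct line lengths (in bytes) of a text file,
# read one line at a time so that very long lines need not all be in memory.
line_lengths <- function(path) {
  con <- file(path, "r")
  on.exit(close(con))
  lengths_seen <- integer(0)
  repeat {
    line <- readLines(con, n = 1L, warn = FALSE)
    if (length(line) == 0L) break
    lengths_seen <- union(lengths_seen, nchar(line, type = "bytes"))
  }
  sort(lengths_seen)
}

path <- tempfile()
writeLines(c("A_G,A_A", "G_G,C_T", "A_A"), path)
line_lengths(path)
```

As with sort -u, a single number in the result means every line has the same length.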
[R] how to set rownames / colnames for matrices in a list
Hello,

I have another stupid question. I hope you can give me a hint on how to solve this: I have a list, and one element is again a list containing matrices, all of the same dimensions. Now I'd like to set the dimnames for all matrices. Example code:

  m1 <- matrix(1:25, nrow=5)
  m2 <- matrix(26:50, nrow=5)
  # ... there can be many more than two matrices
  l <- list()
  l[[1]] <- list(m1,m2)
  r_names <- LETTERS[1:5]
  c_names <- LETTERS[6:10]

How can I apply these names to any number of matrices within this list-list?

Ciao, Antje
[R] Using wildcards in subsets
Hi there,

I am looking for a way to use wildcards in a subset. This is not working:

  subset(data, colname1 == "value" & colname2 == "value*", select = colx:coly)

Is there a way to use wildcards here?

Thanks for your help,
Daniel
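The == operator compares strings literally, so "value*" only matches that exact text. One common alternative is pattern matching with grepl(), sketched below on made-up data; the column names site and label are hypothetical.

```r
# Made-up data for illustration; the column names are hypothetical.
data <- data.frame(
  site  = c("north", "north", "south", "north"),
  label = c("value1", "other", "value2", "valueX"),
  colx  = 1:4,
  coly  = 5:8
)

# grepl() with an anchored regular expression plays the role of "value*".
res <- subset(data, site == "north" & grepl("^value", label), select = colx:coly)
res

# glob2rx() converts a shell-style wildcard into the equivalent regex.
pattern <- glob2rx("value*")
pattern
```

Using glob2rx() lets the wildcard stay in its familiar form while grepl() does the matching.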
Re: [R] how to set rownames / colnames for matrices in a list
Hi,

If all your matrices have the same size, you should work with an array and not with a list. Then you can use dimnames to set the names of the rows, columns, and so on.

Alain
--
Alain Guillet, Statistician and Computer Scientist
Institut de statistique - Université catholique de Louvain
Bureau d.126, Voie du Roman Pays, 20, B-1348 Louvain-la-Neuve, Belgium
tel: +32 10 47 30 50
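Both routes can be sketched with the objects from the question: lapply() keeps the nested list structure, while array() follows the suggestion above. This is one possible way to write each, not the only one.

```r
m1 <- matrix(1:25, nrow = 5)
m2 <- matrix(26:50, nrow = 5)
l <- list()
l[[1]] <- list(m1, m2)
r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

# List approach: lapply() returns each matrix with its dimnames set.
l[[1]] <- lapply(l[[1]], function(m) {
  dimnames(m) <- list(r_names, c_names)
  m
})

# Array approach: stack the matrices along a third dimension.
arr <- array(unlist(l[[1]]), dim = c(5, 5, length(l[[1]])),
             dimnames = list(r_names, c_names, NULL))
arr["B", "G", 2]
```

Either version works for any number of matrices, as long as they all share the same dimensions.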
Re: [R] Need help creating spatial correlation for MC simulation
Thank you for the input. Which command in the spatstat package am I looking for? The documentation is unclear to me.

milton ruser wrote:
> Dear J.J. Harden,
> I think that in spatstat you will find several ways of simulating spatial patterns (point or line) that may be what you are looking for. If not, please let me know and maybe we can work out a solution.
> Best wishes, miltinho astronauta, brazil
>
> On Wed, Sep 17, 2008 at 7:36 PM, jjh21 wrote:
>> I want to create a dataset in R with spatial correlation (i.e. clustering) built in for a linear regression analysis. Any tips on how to do this? Thanks.
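As a generic alternative that does not use spatstat, spatially correlated regression errors can be simulated in base R from a distance-based covariance matrix. The exponential covariance, the decay parameter of 0.3, and the regression coefficients below are all assumptions chosen for illustration.

```r
set.seed(42)
n <- 100
coords <- cbind(x = runif(n), y = runif(n))   # random locations in the unit square

# Exponential covariance: correlation decays with distance between locations.
range_par <- 0.3                               # assumed decay parameter
sigma_mat <- exp(-as.matrix(dist(coords)) / range_par)

# Correlated errors via the Cholesky factor: t(R) %*% z has covariance sigma_mat.
R <- chol(sigma_mat)
errors <- as.vector(t(R) %*% rnorm(n))

# Linear model data with spatially clustered errors.
covariate <- rnorm(n)
response  <- 1 + 2 * covariate + errors
sim <- data.frame(coords, covariate, response)
coef(lm(response ~ covariate, data = sim))
```

Wrapping the last few lines in a loop gives repeated draws for a Monte Carlo study, with the locations and covariance matrix held fixed.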
Re: [R] SmoothScatter plot range issue
Hi, Bioconductor.org is the home of the geneplotter package. You will get a quicker response if you ask there. /Henrik

On Mon, Sep 22, 2008 at 7:06 AM, Jason Pare [EMAIL PROTECTED] wrote:

Hello, I am attempting to use smoothScatter to plot a heatmap of locations of events on x-y axes. When I plot the heatmap without passing xlim and ylim parameters, it fills the plot area but the perspective is a bit skewed. I would like to standardize these plots to a uniform window size that does not depend on the range of values in the data frame.

However, when I resize the plot using xlim or ylim, there is a light blue background that surrounds the immediate area of the data (corresponding to the range of the points listed in the data frame), surrounded by extra white space for the new xlim and ylim values I have added. Some of the rings around the data points are also cut off at the margins. I would like to stop the plot from being cut off, and want this light blue range to extend throughout the entire area of the resized plot. I have attempted to add NAs, but that has no effect on expanding this light blue plot area.

Code is below. xyz is a data frame containing two columns with corresponding x and y values.

    library(geneplotter)
    library(RColorBrewer)
    layout(matrix(1:1, ncol=2, byrow=TRUE))
    smoothScatter(xyz, nrpoints=0, xlim=c(-3,3), ylim=c(0,5),
                  colramp=colorRampPalette(c("#f8f8ff", "white", "#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red", "#7E2217")))
    ###END

Thanks very much for any help, Jason
[R] Statistical question re assessing fit of distribution functions.
I am in a situation where I have to fit a distribution, such as Cauchy or normal, to an empirical dataset. Well and good, that is easy. But I wanted to assess just how good the fit is, using ks.test. I am concerned about the following note in the docs (about the example provided): "Note that the distribution theory is not valid here as we have estimated the parameters of the normal distribution from the same sample". This implies I should not use ks.test(x, "pnorm", mean = 1.187, sd = 0.917), where the numbers shown are estimated from 'x'. If this is so, how do I get a correct test?

I know I cannot use different samples, because the parameters differ so much from one sample to the next: using parameters estimated from the week-one sample to define the distribution function for ks.test will give a poor fit for the data from week two. And the sample size is small enough that I would not have confidence in parameters estimated from a portion of a sample to fit against the remainder of the sample.

Thanks, Ted
Re: [R] Manage huge database
Try this:

    read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains the single line (assuming you want fields 101 through 110 and none other):

    { for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

or just use cut. I tried the gawk command above on Windows Vista with an artificial file of 500,000 columns and 2 rows and it seemed instantaneous. On Windows the above uses gawk from Rtools, available at http://www.murdoch-sutherland.com/Rtools/, or you can install gawk separately. Rtools also has cut if you prefer that.

On Mon, Sep 22, 2008 at 2:50 AM, José E. Lozano [EMAIL PROTECTED] wrote:

Hello, Recently I have been trying to open a huge database with no success. It's a 4GB csv plain text file with around 2000 rows and over 500,000 columns/variables. I have tried with The SAS System, but it reads only around 5000 columns, no more. R hangs when opening it. Is there any way to work with parts (a set of columns) of this database, since it's impossible to manage it all at once? Is there any way to establish a link to the csv file and to state the columns you want to fetch every time you run an analysis? I've been searching the net, but found little about this topic. Best regards, Jose Lozano
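For completeness, here is a base-R sketch of the same column selection without gawk. It is not the method proposed in the message: it relies on the documented colClasses = "NULL" feature of read.table, which skips a column. R still has to scan every line, so for a file with 500,000 columns it is likely to be much slower than gawk or cut. The temporary file and its dimensions are made up for illustration.

```r
# Build a small stand-in file: 3 rows, 20 space-separated columns.
tmp <- tempfile(fileext = ".dat")
m <- matrix(1:60, nrow = 3)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

# Keep columns 11 to 15 only; "NULL" tells read.table to drop a column.
keep <- 11:15
classes <- rep("NULL", ncol(m))
classes[keep] <- "numeric"
part <- read.table(tmp, colClasses = classes)

dim(part)     # 3 rows, 5 columns
unlink(tmp)   # remove the temporary file
```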
[R] changing the text offset for axis labels
Hi, all, I was wondering if there is a way to change the offset of axis labels from the axis. In other words, I need the axis labels closer to the axis than the default. Thanks for the help. Best wishes, Art Roberts, University of Washington, Seattle, WA
[R] as.day() Function (zoo question)
I was going to look at the as.yearmon function in the zoo package and write an as.day function to aggregate a time series of 96 observations per day into the mean for each day, but I don't know how to look at the code so that I can convert it into something I can use. On top of that, I believe it is probably an S3 method, and I haven't quite gotten that far in my programming experience. I want the mean for each day. The real data set has NAs randomly interspersed.

    library(chron)
    library(zoo)
    t1 <- chron("1/1/2006", "00:00:00")
    t2 <- chron("12/31/2006", "23:45:00")
    deltat <- times("00:15:00")
    tt <- seq(t1, t2, by = times("00:15:00"))
    value <- rnorm(35040)
    z <- zoo(value, tt)

thanks

-- Stephen Sefick, Research Scientist, Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
Re: [R] as.day() Function (zoo question)
chron values are represented as day + fraction of a day, so try this:

    aggregate(z, floor, mean)

On Mon, Sep 22, 2008 at 12:56 PM, stephen sefick [EMAIL PROTECTED] wrote:

I was going to look at the as.yearmon function in the zoo package and write an as.day function to aggregate a time series of 96 observations per day into the mean for each day [...]
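The idea behind floor can be sketched in base R without zoo or chron: if time is stored as day + fraction of a day, floor() maps every timestamp to its day, and the values are then averaged per day. The data below are invented, with four observations per day instead of 96 to keep it short. Since the real series contains NAs, the sketch passes na.rm = TRUE; as far as I know, aggregate(z, floor, mean, na.rm = TRUE) does the same for a zoo series, because extra arguments are handed on to the function.

```r
# Times stored as "day + fraction of a day", four observations per day
# (the thread has 96 per day; four keeps the example small).
tt <- seq(0, by = 0.25, length.out = 12)      # days 0, 1 and 2
value <- c(1, 2, NA, 3,  10, 10, 10, 10,  5, NA, NA, 7)

# floor(tt) maps every timestamp to its day; tapply averages within each day,
# ignoring the missing values.
daily <- tapply(value, floor(tt), mean, na.rm = TRUE)
daily
```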
Re: [R] as.day() Function (zoo question)
perfect thanks

On Mon, Sep 22, 2008 at 1:07 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:

chron values are represented as day + fraction of a day so: try this: aggregate(z, floor, mean)

-- Stephen Sefick, Research Scientist, Southeastern Natural Sciences Academy
Re: [R] reading in results from system(). There must be an easier way...
Sorry, I misunderstood what I was doing and misspoke. I don't think there's a bug. I had called COMMAND within read.delim. Thanks for all of your help and sorry for the misinformation.

Sincerely, Mike

Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN 37996-1610, phone: (865) 974-6453, fax: (865) 974-6042, web: http://eeb.bio.utk.edu/gilchrist.asp

On Thu, 18 Sep 2008, Henrik Bengtsson wrote:

On Thu, Sep 18, 2008 at 1:39 PM, Michael A. Gilchrist [EMAIL PROTECTED] wrote:

Wow, that's elegant and simple. It's also faster than my approach. NB, you don't need to use close(); read.delim() closes the pipe when it's done reading.

[Henrik:] If read.delim() closes the connection in this case, it's a bug. It should only close the connection if it opens it. /Henrik

[Mike, continued:] Thank you all for your suggestions, they really helped me with this problem and to understand R just a bit better. Sincerely, Mike

On Fri, 12 Sep 2008, Prof Brian Ripley wrote:

Why not use

    con <- pipe(COMMAND)
    foo <- read.delim(con, colClasses="numeric")
    close(con)

? See the 'R Data Input/Output Manual'.

On Fri, 12 Sep 2008, Michael A. Gilchrist wrote:

Hello, I am currently using R to run an external program and then read the results the external program sends to stdout, which are tsv data. When R reads the results in, it converts them to a list of strings which I then have to manipulate with a whole slew of commands (figuring out how to do that was a real challenge for a newbie like myself); see below. Here's the code I'm using. COMMAND runs the external program.

    rawInput = system(COMMAND, intern=TRUE);               ## read in tsv values
    rawInput = strsplit(rawInput, split="\t");             ## split elements w/in the list of character strings by \t
    rawInput = unlist(rawInput);                           ## unlist, making it one long vector
    mode(rawInput) = "double";                             ## convert from strings to double
    finalInput = data.frame(t(matrix(rawInput, nrow=6)));  ## convert

Because I will be doing this 100,000s of times as part of an optimization problem, I am interested in learning a more efficient way of doing this conversion. Any suggestions would be appreciated. Thanks in advance. Mike

-- Brian D. Ripley, [EMAIL PROTECTED], Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595
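A small runnable sketch of the pipe approach discussed in this thread, in the form Mike ended up with (the pipe created inside the read.delim call). The command below is a hypothetical stand-in for the external program and assumes a Unix-like shell with printf available. Because read.delim is handed an unopened connection, it opens it and closes it again itself, so no separate close() is needed in this form.

```r
# Stand-in for the external program: prints two rows of six tab-separated
# numbers. (Hypothetical command; assumes a POSIX shell with printf.)
COMMAND <- "printf '1\\t2\\t3\\t4\\t5\\t6\\n7\\t8\\t9\\t10\\t11\\t12\\n'"

# read.delim opens the pipe, parses the tab-separated output straight into a
# numeric data frame, and closes the pipe when it is done.
foo <- read.delim(pipe(COMMAND), header = FALSE, colClasses = "numeric")
foo
```

header = FALSE is set here because the stand-in output has no header row; drop it if the real program prints column names first.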
Re: [R] Statistical question re assessing fit of distribution functions.
If one of the goals is the normality test, then there may be better alternatives to the Kolmogorov-Smirnov test. See an explanation on: http://graphpad.com/FAQ/viewfaq.cfm?faq=959 The R implementation: ?shapiro.test

A casual search also turned this up: http://tolstoy.newcastle.edu.au/R/help/04/09/3201.html http://tolstoy.newcastle.edu.au/R/help/04/08/3121.html http://www.karlin.mff.cuni.cz/~pawlas/2008/MAI061/dagost.R

Best, Timur

-- Timur Shtatland, Ph.D., Senior Bioinformatics Scientist, Agencourt Bioscience Corporation - A Beckman Coulter Company, 500 Cummings Center, Suite 2450, Beverly, MA 01915, www.agencourt.com

On Mon, Sep 22, 2008 at 12:26 PM, Ted Byers [EMAIL PROTECTED] wrote:

I am in a situation where I have to fit a distrution, such as cauchy or normal, to an empirical dataset. Well and good, that is easy. But I wanted to assess just how good the fit is, using ks.test. [...]
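One common way around the "parameters estimated from the same sample" problem is a Monte Carlo (parametric bootstrap) version of the KS test, in the spirit of the Lilliefors test. This is not something proposed in the thread, only a sketch under stated assumptions: the sample x is simulated, the fitted family is normal, and 500 replicates is an arbitrary choice. The key point is that the parameters are re-estimated on every simulated sample, so the reference distribution of the statistic accounts for the estimation step.

```r
set.seed(1)
x <- rnorm(50, mean = 1.2, sd = 0.9)   # stand-in for the empirical sample

# KS statistic with the normal parameters estimated from the same data.
ks_stat <- function(y) {
  unname(ks.test(y, "pnorm", mean = mean(y), sd = sd(y))$statistic)
}
d_obs <- ks_stat(x)

# Null distribution of the statistic: simulate from the fitted normal and
# re-estimate the parameters on every simulated sample.
d_sim <- replicate(500, ks_stat(rnorm(length(x), mean(x), sd(x))))

# Monte Carlo p-value (with the usual +1 correction).
p_boot <- (1 + sum(d_sim >= d_obs)) / (1 + length(d_sim))
p_boot

# For normality specifically, the Shapiro-Wilk test mentioned above:
shapiro.test(x)$p.value
```

For a Cauchy fit the same scheme would apply with a different estimator and simulator (for example location and scale from a fitting routine, then rcauchy), but that part is left out here.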
Re: [R] Hmisc and Ubuntu (aptitude install)
Thank You All,

I think all of this may have been due to shared library conflict headaches. At one point, I inadvertently upgraded my Perl install to 5.10, and I think that messed up a lot of my libraries. I have now started with a clean Ubuntu install, and am going to see if I can work my way back up to installing R and making that work. I will recontact the list if this problem persists through this reimaging of my server.

Thanks again, Matt

On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote:

On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:

Matthew, As per the CRAN Ubuntu README http://cran.r-project.org/bin/linux/ubuntu/ install the Ubuntu r-base-dev package to compile R packages from sources.

[Dirk:] Well there should be a working r-cran-hmisc package. You simply got a '404' error indicating that your network access (using http) to the external Ubuntu mirror was broken. Fix that, or download the package by hand. It may be easier to just install the missing package. That said, Vincent is of course entirely correct on the need for r-base-dev. Dirk

Vincent

On Mon, Sep 22, at 00:08, Matthew Pettis wrote:

Hi, I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install. I tried getting Hmisc from within R by issuing the standard 'install.packages' command, but it said I needed 'gfortran' to compile. I thought I could circumvent this by using 'aptitude' to get the package 'r-cran-hmisc', but when I got it, the package had critical missing parts (got 404s). So, I'll be trying to go back and download 'gfortran', but can anybody tell me if this aptitude ubuntu package should be kept up to date and is just currently overlooked? Thanks, Matt

-- Three out of two people have difficulties with fractions.

-- It is from the wellspring of our despair and the places that we are broken that we come to repair the world. -- Murray Waas
Re: [R] changing the text offset for axis labels
Look at ?par and scroll down to the section on 'mgp'. Or you can suppress the axis when you make the plot, then use the axis function to include it with more control (see ?axis). Hope this helps,

-- Gregory (Greg) L. Snow Ph.D., Statistical Data Center, Intermountain Healthcare, [EMAIL PROTECTED], 801.408.8111

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Arthur Roberts. Sent: Monday, September 22, 2008 10:23 AM. To: [EMAIL PROTECTED] Subject: [R] changing the text offset for axis labels

Hi, all, I was wondering if there is a way to change the offset of axis labels from the axis. [...]
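A short sketch of both suggestions. mgp is a vector of three margin-line positions: axis title, tick labels, and axis line, with default c(3, 1, 0). The specific values below are only examples, and the plot is drawn to a null PDF device so the sketch does not write a file.

```r
pdf(NULL)   # null device: nothing is written to disk

# Option 1: move titles and tick labels closer to the axes for all plots.
old <- par(mgp = c(1.5, 0.5, 0))   # default is c(3, 1, 0)
plot(1:10, xlab = "x", ylab = "y")
par(old)                           # restore the previous setting

# Option 2: suppress the axes, then draw each one with its own settings.
plot(1:10, axes = FALSE, xlab = "x", ylab = "y")
axis(1, mgp = c(3, 0.3, 0))        # tick labels closer on the x axis only
axis(2)
box()

dev.off()
```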
[R] How to execute external programs with R?
Hi, all, Could anyone give me advice on how to execute external programs with R? It would be greatly appreciated. Art Roberts, University of Washington
Re: [R] How to execute external programs with R?
On 9/22/2008 2:50 PM, Arthur Roberts wrote:

Hi, all, Could anyone give me advise on who the execute external programs with R? It would be greatly appreciated.

The system() or shell() functions can do this; Windows also has shell.exec(). Duncan Murdoch
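A minimal sketch of system(), assuming a Unix-like system where commands are run through a shell; echo and exit are used only as stand-ins for a real external program.

```r
# Run a command and capture its standard output as a character vector.
# (echo is a stand-in; any program on the PATH can be called this way.)
out <- system("echo hello from the shell", intern = TRUE)
out

# Without intern = TRUE the return value is the command's exit status.
status <- system("exit 0")
status
```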
[R] Building binary package fails because of missing dependent package
On an (Intel Leopard) Mac I am trying to build a package (mxFinance) which depends on another package (mxGraphics). The dependency is declared 1) with a 'Depends:' in DESCRIPTION and 2) with an import in NAMESPACE. The build fails if the dependent package (mxGraphics) is not installed in the R.framework.

Do I need to have installed all packages which are required by the packages to be built as binary (source builds are ok)?

Cheers, Hans-Peter

---

    Macintosh:mxFinance chappi$ R CMD BUILD --binary mxFinance
    * checking for file 'mxFinance/DESCRIPTION' ... OK
    * preparing 'mxFinance':
    * checking DESCRIPTION meta-information ... OK
    * cleaning src
    * removing junk files
    * checking for LF line-endings in source and make files
    * checking for empty or unneeded directories
    * building binary distribution
    * Installing *source* package 'mxFinance' ...
    ** libs
    ** arch - i386
    gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -msse3 -fPIC -g -O2 -march=nocona -c init.c -o init.o
    gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o mxFinance.so init.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
    ld: warning, duplicate dylib /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
    ** arch - ppc
    gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/ppc -I/usr/local/include -fPIC -g -O2 -c init.c -o init.o
    gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -std=gnu99 -dynamiclib -Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o mxFinance.so init.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
    ld: warning, duplicate dylib /Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
    ** R
    ** data
    ** preparing package for lazy loading
    Loading required package: mxGraphics
    Warning in library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) : there is no package called 'mxGraphics'
    Error: package 'mxGraphics' could not be loaded
    Execution halted
    ERROR: lazy loading failed for package 'mxFinance'
    ** Removing '/var/folders/xr/xr01D7JAEtGe4S5uaDQSgTI/-Tmp-/Rinst881133514/mxFinance'
    ERROR
    * installation failed
[R] gbm error
Good afternoon,

Has anyone tried using Dr. Elith's BRT script? I cannot seem to run gbm.step from the installed gbm package. Is it something external to gbm? When I run the script itself:

    gbm.step(data=model.data, gbm.x = colx:coly, gbm.y = colz, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)

I keep encountering the same error:

    ERROR: unexpected ')' in bag.fraction = 0.5)

I've tried all sorts of variations, such as

    sep22BRT.lr01 <- gbm{data=sep22BRT, gbm.x = sep22BRT[,3:42], gbm.y = sep22BRT[,1], family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5}

and cannot find the problem. Is there a glaring error that I am overlooking?

Darin Brooks, Geomatics/GIS/Remote Sensing Coordinator, Kim Forest Management Ltd., Cranbrook Office, Cranbrook, BC
[R] change the panel name in xyplot
Hi, I am trying to change the panel names in an xyplot, without success. Look at this example from the xyplot manual:

    xyplot(Murder ~ Population | state.region, data=states)

The panel titles are Northeast, South, North Central and West, which are the factor levels of state.region. I need to change some of the names and, for example, put some of them in italics. I cannot find how to change this. I looked for it in Deepayan Sarkar's lattice book, but did not find the way. Any help?

Thanks, Ronaldo

-- You can't make a program without broken egos.

-- Prof. Ronaldo Reis Júnior, UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional, Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia, CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil. Fone: (38) 3229-8192 | [EMAIL PROTECTED] | http://www.ppgcb.unimontes.br/lecc | ICQ#: 5692561 | LinuxUser#: 205366

-- Please DO NOT SEND Word or PowerPoint files. Prefer PDF, text, OpenOffice (ODF), HTML, or RTF.
Re: [R] change the panel name in xyplot
Try this:

    xyplot(Murder ~ Population | state.region, data = states,
           strip = strip.custom(factor.levels = c(expression(italic("A")), "B", "C", "D")))

On Mon, Sep 22, 2008 at 4:33 PM, Ronaldo Reis Junior [EMAIL PROTECTED] wrote:

Hi, I try to change the panel name in a xyplot without success. [...]

-- Henrique Dallazuanna, Curitiba-Paraná-Brasil, 25° 25' 40" S 49° 16' 22" O
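A self-contained version of the suggestion above, as a sketch. The states data frame is built the way the xyplot help page does it; the replacement labels are listed in the order of levels(state.region), and the choice to italicize only the first one is just for illustration.

```r
library(lattice)

# The 'states' data frame, constructed as on the xyplot help page.
states <- data.frame(state.x77,
                     state.name = dimnames(state.x77)[[1]],
                     state.region = state.region)

# New strip labels, in the order of levels(state.region);
# the first one is set in italics via plotmath.
strip_labels <- c(expression(italic("Northeast")),
                  "South", "North Central", "West")

p <- xyplot(Murder ~ Population | state.region, data = states,
            strip = strip.custom(factor.levels = strip_labels))
# print(p) draws the plot on the current graphics device
```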
[R] findInterval(), binary search, log(N) complexity
Dear R users,

the help for findInterval(x, vec) suggests a logarithmic dependence on N (= length(vec)), which would imply a binary search type algorithm. However, when I test this hypothesis in the following manner:

    set.seed(-3645)
    l <- vector()
    N.seq <- c(5000, 50, 100, 1000, 5000)
    k <- 1
    for (N in N.seq) {
      tmp <- sort(round(stats::rt(N, df=2), 2))
      l[k] <- system.time(it3 <- findInterval(-1, tmp))[2]
      k <- k + 1
    }
    plot(N.seq, l, type="b", xlab="length(vec)", ylab="CPU time")

the resulting plot suggests a linear relationship. I must be missing something here?

Thanks! Markus
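The thread gives no answer here, so the following is only a sketch with one possible explanation. The search step itself can be written as a binary search that needs about log2(N) comparisons, as below. However, findInterval also has to verify that vec is sorted, and that check is a linear scan over vec, so the total time for a single lookup can well grow linearly with N. Note also that the third element of system.time() is elapsed time and the first is user CPU time; the second, used in the code above, is system time.

```r
# Hand-written binary search: number of elements of sorted 'vec' that are <= x.
# Each loop iteration halves the search range, so the cost is O(log N).
bin_search <- function(x, vec) {
  lo <- 0L
  hi <- length(vec)
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L
    if (vec[mid] <= x) lo <- mid else hi <- mid - 1L
  }
  lo
}

set.seed(1)
vec <- sort(round(rt(1e5, df = 2), 2))

# Same answer as findInterval for a single query point.
bin_search(-1, vec) == findInterval(-1, vec)

# The sortedness check, by contrast, has to look at every element of vec.
is.unsorted(vec)
```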
Re: [R] How to find a shift between two curves or data sets
Dear Hans, Thanks for your reply. I will read that book. Cheers!
Re: [R] graphing netCDF files
Hi Steve,

If you read your netCDF files into R you end up with sp-classes which can be displayed using spplot. But you do not seem to use rgdal. If you can make a data.frame with the x, y and z coordinates, this can quite easily be transformed into an sp-class:

    library(sp)
    dat = data.frame(x = UTMx, y = UTMy, z = wat.data2001q1[,,i])
    coordinates(dat) = ~x+y  # tell spplot what the names of the columns with the x and y coordinates are
    gridded(dat) = TRUE      # make clear it is a grid
    spplot(dat)

For more details see the documentation for the sp-package, especially spplot. These kinds of questions are more suitable for the r-sig-geo mailing list than for the general r-help list.

hope this helps, Paul

[EMAIL PROTECTED] wrote:

Hello, I'm working with a large hydrological data set stored in netCDF format. The file stores x and y coordinates in the UTM projected coordinate system, yet when I use image to graphically display the z variable, the image is distorted in the sense that it does not plot the map in the correct spatial organization. I'm wondering if I need to define the projection of the netCDF file with rgdal or proj4 routines first, before I send it to the graphics device.

[Paul:] Defining the projection is not needed.

My code is as follows:

    q1_2001 <- open.ncdf("H:\\SKF_DESKTOP FILES\\My Documents\\EDEN\\EDEN\\Surfaces\\2000_q1.nc", readunlimi=FALSE)  # opens ncdf file for reading
    wat.data2000q1 <- get.var.ncdf(q1_2001, verbose=FALSE)  # gets the real information
    # GENERAL EXAMINATION OF HEADER DATA in the wat.data file
    day <- get.var.ncdf(q1_2001, "time")  # length(day) 91 days in quarter
    UTMx <- get.var.ncdf(q1_2001, "x")    # columns (eastings), should return 405
    UTMy <- get.var.ncdf(q1_2001, "y")    # rows (northings), should return 287
    # plot first 91 days (3 months of the year)
    for(i in 1:91) {
      !is.na( image(UTMx, UTMy, z = wat.data2001q1[,,i], col=brewer.pal(8, "YlGnBu"), axes=T, pty="s", ylab="UTM Northing", xlab="UTM Easting", main = "First Quater 2001") )
    }

As I indicated above, the map is displayed on the graphics device. However, the orientation is distorted, pulling the x axis too wide and the y axis too tall. How can I set the graphics device to know the orientation and scaling (if these are the correct terms) in order to display this map correctly? All insights will be greatly appreciated.

Thanks, Steve

Steve Friedman Ph. D., Spatial Statistical Analyst, Everglades and Dry Tortugas National Park, 950 N Krome Ave (3rd Floor), Homestead, Florida 33034, Office (305) 224 - 4282, [EMAIL PROTECTED]

-- Drs. Paul Hiemstra, Department of Physical Geography, Faculty of Geosciences, University of Utrecht, Heidelberglaan 2, P.O. Box 80.115, 3508 TC Utrecht, Phone: +31302535773, Fax: +31302531145, http://intamap.geo.uu.nl/~paul
[R] Profiling on Multicore and Parallel Systems
Hello All,

In general, when we use Rprof for performance evaluation on multicore systems, the output reports time on the basis of user time, and the sampling time equals the user time as reported by system.time. This does not seem to be the right behavior when R is linked to BLAS/LAPACK or other libraries optimized for parallel or multicore architectures: there the user time can exceed the elapsed time, and one would be more interested in the elapsed time per routine, as returned by gettimeofday(), than in the user time returned by getrusage().

Could anyone provide pointers on how best to profile R on parallel and multicore systems?

Regards,
-- 
Imanpreet Singh Arora
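To make the user-time versus elapsed-time distinction in this question concrete, here is a small sketch using system.time(). The matrix size is arbitrary and chosen only for illustration; whether the ratio exceeds 1 depends on whether the BLAS that R is linked against is multithreaded.

```r
# Time a BLAS-backed operation and compare CPU time with wall-clock time
m  <- matrix(rnorm(250000), nrow = 500)
tm <- system.time(crossprod(m))

# user.self and sys.self are CPU times; elapsed is wall-clock time
tm[c("user.self", "sys.self", "elapsed")]

# A ratio well above 1 suggests that several threads were busy at once
cpu_to_wall <- unname((tm["user.self"] + tm["sys.self"]) / tm["elapsed"])
cpu_to_wall
```

For very fast calls the elapsed time may be reported as 0, in which case the ratio is not meaningful.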
Re: [R] Manage huge database
On Mon, 22 Sep 2008, Martin Morgan wrote: José E. Lozano [EMAIL PROTECTED] writes: Maybe you've not lurked on R-help for long enough :) Apologies! Probably. So, how much design is in this data? If none, and what you've basically got is a 2000x50 grid of numbers, then maybe a more raw Exactly, raw data, but a little more complex since all the 50 variables are in text format, so the width is around 2,500,000. snip Is genetic DNA data (individuals genotyped), hence the large amount of columns to analyze. The Bioconductor package snpMatrix is designed for this type of data. See http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html and if that looks promising source('http://bioconductor.org/biocLite.R') biocLite('snpMatrix') Likely you'll quickly want a 64 bit (linux or Mac) machine. netCDF is another useful option -- we have been using the ncdf package for large genomic datasets. We read the data in one person at a time and write to netCDF. For analysis we can then read any subsets. Since we have imputed SNP data as well as measured this comes to about 2.5 million variables on 4000 people for one of our data sets. -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hmisc and Ubuntu (aptitude install)
Hi All, After rebuilding my Ubuntu image, I followed the instruction in this thread, and everything worked out fine -- thank you again. So, I'll just add: if you use R and perl, and don't have to download perl5.10, then don't do it, at least not yet. Or, if you do, then you will have a lot of shared object tweaking. Matt On Mon, Sep 22, 2008 at 1:22 PM, Matthew Pettis [EMAIL PROTECTED] wrote: Thank You All, I think all of this may have been due to shared library conflict headaches. At one point, I inadvertently upgraded my Perl install to 5.10, and I think that messed up a lot of my libraries. I have now started with a clean Ubuntu install, and am going to see if I can work my way back up to installing R and making that work. I will recontact the list if this problem persists through this reimaging of my server. Thanks again, Matt On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote: On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote: Matthew, As per the CRAN Ubuntu README http://cran.r-project.org/bin/linux/ubuntu/ install the Ubuntu r-base-dev package to compile R packages from sources. Well there should be a working r-cran-hmisc package. You simply got a '404' error indicating that your network access (using http) to the external Ubuntu mirror was broken. Fix that, or download the package by hand. It may be easier to just install the missing package. That said, Vincent is of course entirely correct on the need for r-base-dev. Dirk Vincent Le lun. 22 sept. à 00:08, Matthew Pettis a écrit : Hi, I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install. I tried getting Hmisc from within R by issuing the standard 'install.packages' command, but it said I needed 'gfortran' to compile. I thought I could circumvent this by using 'aptitude' to get the package 'r-cran-hmisc', but when I got it, the package had critical missing parts (got 404s). 
So, I'll be trying to go back and download 'gfortran', but can anybody tell me if this aptitude Ubuntu package should be kept up to date and is just currently overlooked? Thanks, Matt -- It is from the wellspring of our despair and the places that we are broken that we come to repair the world. -- Murray Waas -- Three out of two people have difficulties with fractions.
Re: [R] findInterval(), binary search, log(N) complexity
On 9/22/2008 1:51 PM, Markus Loecher wrote:
> Dear R users,
>
> the help for findInterval(x, vec) suggests a logarithmic dependence on N (= length(vec)), which would imply a binary search type algorithm. However, when I test this hypothesis, in the following manner:

R is open source. Why test things this way, when you can look at the source? You don't even need to go to C code for this:

    > findInterval
    function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
    {
        if (any(is.na(vec)))
            stop("'vec' contains NAs")
        if (is.unsorted(vec))
            stop("'vec' must be sorted non-decreasingly")
        if (has.na <- any(ix <- is.na(x)))
            x <- x[!ix]
        nx <- length(x)
        index <- integer(nx)
        .C("find_interv_vec", xt = as.double(vec), n = as.integer(length(vec)),
            x = as.double(x), nx = as.integer(nx), as.logical(rightmost.closed),
            as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
            PACKAGE = "base")
        if (has.na) {
            ii <- as.integer(ix)
            ii[ix] <- NA
            ii[!ix] <- index
            ii
        }
        else index
    }
    <environment: namespace:base>

Notice the is.unsorted test. How could that be anything other than linear execution time in N? Similarly for any(is.na(vec)). If you know the answers to those tests (as you do in your simulation), you could presumably get O(log(n)) behaviour by writing a new function that skipped them. But you could take a look at the source code (in https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to check, or if you notice any weird timings.

Duncan Murdoch

>     set.seed(-3645); l <- vector(); N.seq <- c(5000, 50, 100, 1000, 5000); k <- 1
>     for (N in N.seq){
>       tmp <- sort(round(stats::rt(N, df=2), 2));
>       l[k] <- system.time(it3 <- findInterval(-1, tmp))[2]; k <- k + 1;
>     }
>     plot(N.seq, l, type="b", xlab="length(vec)", ylab="CPU time");
>
> the resulting plot suggests a linear relationship. I must be missing something here?
>
> Thanks!
> Markus
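The point made in this thread, that the sortedness and NA checks are what cost linear time, can be illustrated with a plain-R binary search that skips those checks. This is a sketch only: find_interval_sorted is a hypothetical helper written for this illustration, not part of base R, and it assumes that vec is already sorted non-decreasingly and contains no NAs.

```r
# Hypothetical helper: binary search for a single value x in a sorted vector.
# Returns the largest i with vec[i] <= x, or 0 if x is below vec[1],
# which is what findInterval() reports with its default arguments.
find_interval_sorted <- function(x, vec) {
  lo <- 0L            # vec[1..lo] are known to be <= x
  hi <- length(vec)   # vec[(hi + 1)..] are known to be > x
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L
    if (vec[mid] <= x) lo <- mid else hi <- mid - 1L
  }
  lo
}

# Compare with findInterval() on data like that in the original post
set.seed(1)
vec <- sort(round(rt(1000, df = 2), 2))
find_interval_sorted(-1, vec) == findInterval(-1, vec)
```

Each loop iteration halves the remaining range, so the number of comparisons grows with log2(length(vec)); in interpreted R the constant factor is large, so this shows the idea rather than a faster replacement.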
[R] lme problems
Hi, I'm analysing a dataset in which the same 5 subjects (male.pair) were subjected to two treatments (treatment) and were measured for 12 successive days within each treatment (layingday). Overall 5*2*12=120 observations. I want to test the effect of treatment, time (layingday) and their interaction. I have done so through the ANOVA below: bmc3-aov(Mean1~treatment*layingday+Error(male.pair/treatment/layingday)) summary(bmc3) Error: male.pair Df Sum Sq Mean Sq F value Pr(F) Residuals 1 0.13850 0.13850 Error: male.pair:treatment Df Sum Sq Mean Sq treatment 1 0.60525 0.60525 Error: male.pair:treatment:layingday Df Sum Sq Mean Sq layingday 1 0.64037 0.64037 Error: Within Df Sum Sq Mean Sq F valuePr(F) treatment 1 0.02015 0.02015 0.73400.3934 layingday 1 0.52937 0.52937 19.2878 2.545e-05 *** treatment:layingday 1 0.02959 0.02959 1.07820.3013 Residuals 113 3.10135 0.02745 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 I then wanted to compare this outcome with an lme, and used the model below. However, its outcome doesn't make much sense to me. bmc4- lme(Mean1 ~ treatment*layingday, random = ~1|male.pair) summary(bmc4) Linear mixed-effects model fit by REML Data: NULL AIC BIC logLik -118.4522 -101.9306 65.22609 Random effects: Formula: ~1 | male.pair (Intercept) Residual StdDev: 0.1313573 0.1185902 Fixed effects: Mean1 ~ treatment * layingday Value Std.Error DF t-value p-value (Intercept) 0.5311005 0.09369140 112 5.668615 0. treatment0.0495373 0.04616116 112 1.073138 0.2855 layingday -0.0488055 0.00991701 112 -4.921389 0. 
treatment:layingday 0.0138449 0.00627207 112 2.207388 0.0293 Correlation: (Intr) trtmnt lyngdy treatment -0.739 layingday -0.688 0.838 treatment:layingday 0.653 -0.883 -0.949 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -2.44529424 -0.68505388 0.01663401 0.59009515 3.53354000 Number of Observations: 120 Number of Groups: 5 I struggle to understand the discrepancy in df between the anova and lme, and the fact that the interaction term is not significant in the anova but significant in lme. Any help would be greatly appreciated. Best Tom -- Dr. Tommaso Pizzari Edward Grey Institute, Dept of Zoology, University of Oxford, Oxford OX1 3PS Tel: (44) 1865 271279, Fax: (44) 1865 271168 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research
on 09/22/2008 11:26 AM Bert Chan wrote: Warranty on Accuracy, Precision, Legality, ... of R in Research (These questions may well have been raised.) What is the implied warranty of using R for research publications, consulting, etc.? Alternately, how does one obtain such a warranty? Your answers will be much appreciated. Perhaps you can point me to some websites which discussed this subject in the past. Thanks regards - Bert (Bertram K. C. Chan, PhD) As per the banner that appears whenever you start up R: R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. The suitability of R for any particular application is entirely up to the user. Legally, there is nothing preventing you from using R for such applications relative to the license under which R is made available. You did not indicate the specific type of research you have in mind, but if it might be in the domain of clinical trials, please review: http://www.r-project.org/doc/R-FDA.pdf HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coefficients, OR and 95% CL
Dear R-users,

After running a logistic regression, I need to calculate the OR by exponentiating the coefficient, and then I need the 95% CL for the OR as well. For the following example (taken from P. Dalgaard's book), what would be the most straightforward method of getting what I need? Could anyone enlighten me please?

Thank you!
Lucho

    > summary(glm(menarche~age,binomial))

    Call:
    glm(formula = menarche ~ age, family = binomial)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -4.68654  -0.13049  -0.01067   0.09608   2.35254

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -17.9175     1.7074  -10.49   <2e-16 ***
    age           1.3549     0.1296   10.45   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 974.31  on 703  degrees of freedom
    Residual deviance: 223.95  on 702  degrees of freedom
      (635 observations deleted due to missingness)
    AIC: 227.95

    Number of Fisher Scoring iterations: 9
Re: [R] Coefficients, OR and 95% CL
Dear Luciano,

See ?logistic.display in the epicalc package. If glm1 is your model, something like

    logistic.display(glm1)

should do the job.

HTH,
Jorge

On Mon, Sep 22, 2008 at 5:28 PM, Luciano La Sala wrote:
> After running a logistic regression, I need to calculate OR by exponentiating the coefficient, and then I need the 95% CL for the OR as well.
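For readers without epicalc, the exponentiation asked about in this thread can also be done with base R alone. The sketch below first reproduces the arithmetic from the posted estimate and standard error for age, then shows the same steps on a fitted model. The simulated data merely stand in for the book's data set and are an assumption of this example; confint.default() gives Wald-type limits, which is one of several possible interval methods.

```r
# (1) By hand, from the posted output: estimate 1.3549, standard error 0.1296
b  <- 1.3549
se <- 0.1296
or_age <- exp(b)                               # odds ratio per year of age
ci_age <- exp(b + c(-1, 1) * qnorm(0.975) * se)  # Wald 95% limits on the OR scale
round(c(OR = or_age, lower = ci_age[1], upper = ci_age[2]), 2)

# (2) From a fitted model; the data here are simulated for illustration only
set.seed(123)
age      <- runif(300, 9, 18)
menarche <- rbinom(300, 1, plogis(-17.9 + 1.35 * age))
fit      <- glm(menarche ~ age, family = binomial)

or      <- exp(coef(fit))             # odds ratios
ci_wald <- exp(confint.default(fit))  # Wald limits, exponentiated
cbind(OR = or, ci_wald)
```

Calling confint(fit) instead of confint.default(fit) gives profile-likelihood limits, which can differ noticeably from the Wald limits in small samples.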
[R] Deleting multiple variables
Hi All,

I have searched the web for a simple solution but have been unable to find one. Can anyone recommend a neat way of deleting multiple variables? I see that I need to use dataframe$VAR <- NULL to get rid of one variable. In my situation I need to delete all variables between two points. I've used the 'which' function to find these and have assigned the result to myvars:

    > myvars
    [1]  2 17

but I can't figure out how I should apply this. Should I loop through the values? (Pseudocode below.)

    for (x in c(myvars[1]:myvars[2])) (M_UC$x <- NULL)

Any help gratefully received.

Mike
Re: [R] Deleting multiple variables
Mike,

how about

    M_UC <- M_UC[, -(myvars[1]:myvars[2])]

?

Andrew

On Mon, Sep 22, 2008 at 11:04:34PM +0100, Michael Pearmain wrote:
> In my situation i need to delete all vars between two points. I've used the 'which' function to find these out and have assigned to myvar

-- 
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
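The negative-index suggestion in this thread can be tried on a toy data frame. The 20-column frame below is invented for illustration; only the names M_UC and myvars come from the original posts. Adding drop = FALSE is an extra precaution, not part of the original reply: it keeps the result a data frame even if only one column survives.

```r
# Toy stand-in for M_UC: 2 rows, 20 columns named X1 ... X20
M_UC   <- data.frame(matrix(1:40, nrow = 2))
myvars <- c(2, 17)

# Drop every column from position 2 to position 17 in one step
drop_idx <- myvars[1]:myvars[2]
M_UC     <- M_UC[, -drop_idx, drop = FALSE]

names(M_UC)
```

Unlike a loop that sets one column to NULL at a time, this removes all columns in a single step, so the positions do not shift while columns are being deleted.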
[R] Weights for polr
Hello, I'm estimating an ordered logit model on a probability weighted survey sample. polr permits case weights with the weights option, but I cannot figure out from existing documentation what it actually does with these weights. I'm concerned about this because I get somewhat different results using Stata's ologit command with the pweights option and very different results using proc logistic in SAS with its weight option. So my basic question is whether or not it is appropriate to use the weight option for polr with my data. Best, Greg . Gregory Wawro [EMAIL PROTECTED] Associate Professor phone: 212-854-8540 Dept. of Political Science fax:212-222-0598 741 International Affairs http://www.columbia.edu/~gjw10/ Columbia University New York, NY 10027 . __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help for SUR model
I am an R beginner and trying to run a SUR model in R framework. subset(esasp500, Obs =449 Obs=197, select = -Date) -ev13sub c(Obs=397) c(Obs=399) -d13 c(Obs=400) c(Obs=449) -f13 SP500*f13 -SP500f13 BBC~SP500+d13+SP500f13 -sur132 BOW~SP500+d13+SP500f13 -sur133 CSK~SP500+d13+SP500f13 -sur134 DTC~SP500+d13+SP500f13 -sur135 GP~SP500+d13+SP500f13 -sur136 HAN~SP500+d13+SP500f13 -sur137 IP~SP500+d13+SP500f13 -sur138 KMB~SP500+d13+SP500f13 -sur139 LPX~SP500+d13+SP500f13 -sur1310 MWV~SP500+d13+SP500f13 -sur1311 PCH~SP500+d13+SP500f13 -sur1312 PCL~SP500+d13+SP500f13 -sur1313 PNR~SP500+d13+SP500f13 -sur1314 POP~SP500+d13+SP500f13 -sur1315 SON~SP500+d13+SP500f13 -sur1316 TIN~SP500+d13+SP500f13 -sur1317 W~SP500+d13+SP500f13-sur1318 WPP~SP500+d13+SP500f13 -sur1319 WY~SP500+d13+SP500f13 -sur1320 system13 - list(sur132, sur133, sur134, sur135, sur136, sur137, sur138, sur139, sur1310, sur1311, sur1312, sur1313, sur1314, sur1315, sur1316, sur1317, sur1318,sur1319,sur1320) labels13 - ist(sur132,sur133,sur134,sur135,sur136,sur137,sur138,sur1 39,sur1310,sur1311,sur1312,sur1313,sur1314,sur1315,sur1316 ,sur1317,sur1318,sur1319,sur1320) res13 - systemfit(SUR, system13,labels13, data=ev13sub) summary(res13) But the results show Error: could not find function systemfit. So, how to write a R code to implement the formula and get right results. Thanks, Bill [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
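The message "could not find function" generally means that the package providing the function has not been attached in the current session. The following sketch shows a small SUR fit with the systemfit package on simulated data. The data and the two-equation system are invented for illustration (only the variable names SP500, BBC and BOW echo the post), and the call uses the formula-list interface of current systemfit releases, which may differ from the version available when this message was written.

```r
# Simulated stand-in for the returns data in the post
set.seed(1)
n   <- 50
toy <- data.frame(SP500 = rnorm(n))
toy$BBC <- 0.5 * toy$SP500 + rnorm(n)
toy$BOW <- 1.2 * toy$SP500 + rnorm(n)

# A system of equations is a (named) list of formulas
eq_system <- list(BBC = BBC ~ SP500, BOW = BOW ~ SP500)

# Fit only if the package is installed; otherwise say how to get it
if (requireNamespace("systemfit", quietly = TRUE)) {
  fit <- systemfit::systemfit(eq_system, method = "SUR", data = toy)
  print(summary(fit))
} else {
  message("Run install.packages('systemfit') and then library(systemfit)")
}
```

After library(systemfit), the function can be called as systemfit(...) without the systemfit:: prefix.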
[R] Prediction errors from forecast()?
Hello,

I am using forecast() in the forecast package to predict future values of an ARIMA model fit to a time series. I have read most of the documentation for the forecast package, but I can't figure out how to obtain the forecast variance for the predicted values. I tried using the argument se.fit=TRUE, hoping this would work since forecast() calls predict(). Is there an easy way to do this? Sample code is below.

    ar <- Arima(as.matrix(Y), order = c(1,0,0), include.drift = TRUE)
    f <- forecast(ar, h = 9, se.fit = TRUE)
    summary(f)

Thanks,
Laura
Re: [R] Weights for polr
On Mon, 22 Sep 2008, Gregory Wawro wrote:

> Hello, I'm estimating an ordered logit model on a probability weighted survey sample.

You could use svyolr() in the survey package.

> polr permits case weights with the weights option, but I cannot figure out from existing documentation what it actually does with these weights.

They are frequency weights.

> I'm concerned about this because I get somewhat different results using Stata's ologit command with the pweights option

You should get the same point estimates, but different standard errors.

> and very different results using proc logistic in SAS with its weight option.

Again, it should be the same point estimates but different standard errors.

> So my basic question is whether or not it is appropriate to use the weight option for polr with my data.

No.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
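The statement that polr treats its weights as frequency weights can be checked on a small example: fitting with integer weights should give essentially the same coefficients as fitting to a data set in which each row is repeated that many times. The data below are simulated purely for illustration, and the example assumes the MASS package (which provides polr) is installed.

```r
library(MASS)

# Simulated ordered response with three levels and integer case weights
set.seed(1)
d   <- data.frame(x = rnorm(60))
d$y <- cut(d$x + rlogis(60), breaks = c(-Inf, -1, 1, Inf),
           labels = c("low", "mid", "high"), ordered_result = TRUE)
d$w <- sample(1:3, 60, replace = TRUE)

# Fit once with weights, once with each row physically replicated w times
fit_w   <- polr(y ~ x, data = d, weights = w)
d_rep   <- d[rep(seq_len(nrow(d)), d$w), ]
fit_rep <- polr(y ~ x, data = d_rep)

# The point estimates agree up to optimizer tolerance
all.equal(coef(fit_w), coef(fit_rep), tolerance = 1e-3)
```

This is why the point estimates match those from survey software while the standard errors do not: the replicated-rows interpretation treats the weighted sample as if it contained sum(w) independent observations.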
Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research
On Mon, Sep 22, 2008 at 4:07 PM, Marc Schwartz wrote:
> As per the banner that appears whenever you start up R:
>
> R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

And surely this is the most that any software could provide? SAS has:

EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE IN AN AGREEMENT BETWEEN YOU AND SAS, ALL INFORMATION, SOFTWARE, PRODUCTS AND SERVICES ARE PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

Hadley
-- 
http://had.co.nz/
[R] Prediction errors from forecast()?
Sorry, I am resending in plain text.

Hello,

I am using forecast() in the forecast package to predict future values of an ARIMA model fit to a time series. I have read most of the documentation for the forecast package, but I can't figure out how to obtain the forecast variance for the predicted values. I tried using the argument se.fit=TRUE, hoping this would work since forecast() calls predict(). Is there an easy way to do this? Sample code is below.

    ar <- Arima(as.matrix(Y), order = c(1,0,0), include.drift = TRUE)
    f <- forecast(ar, h = 9, se.fit = TRUE)
    summary(f)

Thanks,
Laura
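One way to get at forecast standard errors, sketched here with base R only, is to call predict() on a model fitted with stats::arima(); the result contains both the predictions and their standard errors. The simulated series stands in for Y and is an assumption of this example, and the model has no drift term, so it is not a drop-in replacement for the Arima() call in the question.

```r
# Simulated AR(1) series standing in for Y
set.seed(42)
Y   <- arima.sim(model = list(ar = 0.6), n = 200)
fit <- arima(Y, order = c(1, 0, 0))

# predict() on an arima fit returns a list with components pred and se
p <- predict(fit, n.ahead = 9)
forecast_var <- p$se^2   # forecast variance at horizons 1 to 9

cbind(pred = p$pred, se = p$se, var = forecast_var)
```

With the forecast package itself, the returned object stores prediction intervals (components such as lower and upper), from which a standard error can be backed out by dividing the half-width by the matching normal quantile; treat that as a pointer to check against the package documentation rather than a tested recipe.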
[R] sort a data matrix by all the values and keep the names
Dear all,

If I have a data frame x <- data.frame(x1=c(1,7), x2=c(4,6), x3=c(8,2)):

    x1 x2 x3
     1  4  8
     7  6  2

I want to sort the whole data and get this:

    x1 1
    x3 2
    x2 4
    x2 6
    x1 7
    x3 8

If I do sort(x), R reports:

    Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) :
      unimplemented type 'list' in 'orderVector1'

The only way I can sort all the data is by converting it to a matrix:

    > sort(as.matrix(x))
    [1] 1 2 4 6 7 8

But now I have lost all the names attributes. Is it possible to sort a data frame and keep all the names?

Thanks!
Zhihua Li
[R] perl expression question
If I have the string below, does someone know a regular expression to get just the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.

    stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
Re: [R] sort a data matrix by all the values and keep the names
One possibility is:

    x <- data.frame(x1=c(1,7), x2=c(4,6), x3=c(8,2))
    names <- t(matrix(rep(names(x), times=nrow(x)), nrow=ncol(x)))
    m <- as.matrix(x)
    ind <- order(m)
    df <- data.frame(name=names[ind], value=m[ind])

    > df
      name value
    1   x1     1
    2   x3     2
    3   x2     4
    4   x2     6
    5   x1     7
    6   x3     8

--- On Tue, 23/9/08, zhihuali wrote:
> Is it possible to sort a data frame and keep all the names?
[R] R-2.7.2 infected?
I tried downloading R-2.7.2 (http://cran.cnr.berkeley.edu/bin/windows/base/R-2.7.2-win32.exe, both from Berkeley and CRAN), and both times I got a warning from Computer Associates eTrust Antivirus (version 7.1.710) that the Win32/Adclicker.JO trojan was detected:

The Win32/Adclicker.JO was detected in C:\USERS\USER\APPDATA\LOCAL\MICROSOFT\WINDOWS\TEMPORARY INTERNET FILES\LOW\CONTENT.IE5\61HAYRTG\R-2.7.2-WIN32[1].EXE.

Has anyone else seen this?

Thanks,
Dave
Re: [R] sort a data matrix by all the values and keep the names
On Mon, Sep 22, 2008 at 6:54 PM, zhihuali wrote:
> Is it possible to sort a data frame and keep all the names?

Here's one way:

    dfm <- melt(x, id = c())
    dfm[order(dfm$value), ]

Hadley
-- 
http://had.co.nz/
Re: [R] sort a data matrix by all the values and keep the names
This is exactly what I wanted! Thank you so much!

Z

Date: Mon, 22 Sep 2008 19:21:43 -0500
From: [EMAIL PROTECTED]
Subject: RE: [R] sort a data matrix by all the values and keep the names
To: [EMAIL PROTECTED]

> Hi: there might be a quicker way, but you can use stack and order. stack creates a data frame with two columns, values and ind, with ind holding the associated column names. order(temp$values) returns the indices of the ordered values, so you index by that to sort the rows.
>
>     temp <- stack(x)
>     print(temp)
>     print(str(temp))
>     sortedx <- temp[order(temp$values),]
>     print(sortedx)
>
> On Mon, Sep 22, 2008 at 7:54 PM, zhihuali wrote:
> > Is it possible to sort a data frame and keep all the names?
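The stack-and-order approach quoted in this message can be run end to end on the data frame from the original question. This is a self-contained sketch in base R; nothing here goes beyond the functions named in the thread.

```r
# Data frame from the original question
x <- data.frame(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2))

# stack() returns one row per cell: 'values' holds the number,
# 'ind' holds the name of the column it came from
temp <- stack(x)

# Reorder the rows by value; the column names travel along in 'ind'
sortedx <- temp[order(temp$values), ]
print(sortedx)
```

The ind column of the result reads x1, x3, x2, x2, x1, x3 against the values 1, 2, 4, 6, 7, 8, which is the pairing asked for in the question.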
Re: [R] suppress legend in ggplot(data, aes(y=Y, x=X,fill=Z))?
On Sun, Sep 21, 2008 at 5:25 PM, Tom Bonen wrote:
> hi, is there any way to suppress the legend in ggplot(data, aes(y=Y, x=X, fill=Z))? I'd like the values to be displayed in different colors as specified by fill=, and this works just fine, but I do not want the legend on the right that is automatically created when fill is specified.

Hi Tom,

    + opts(legend.position = "none")

should do the trick.

Hadley
-- 
http://had.co.nz/
Re: [R] perl expression question
Hi Mark,

    > stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
    > gsub(".*/([^/]+)$", "\\1", stock)
    [1] "BLC.NYSE"

--- On Tue, 23/9/08, [EMAIL PROTECTED] wrote:
> If I have the string below. does someone know a regular expression to just get the BLC.NYSE.
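As a complement to the regular expression in this reply, base R also has a function for exactly this job when the string is a file path. The sketch below shows both routes side by side on the string from the question; sub() is used rather than gsub() only because a single replacement is needed.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# Route 1: regular expression. ".*/" greedily consumes everything up to the
# last slash; "([^/]+)$" captures the run of non-slash characters at the end.
via_regex <- sub(".*/([^/]+)$", "\\1", stock)

# Route 2: basename() strips the directory part of a path
via_basename <- basename(stock)

via_regex
via_basename
```

Both routes give the last path component; basename() avoids having to reason about the pattern at all.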
[R] How to view or export values of 'names' in a lm
Hello,

I have been using:

model <- lm(y ~ x + I(x^2))

I am mainly interested in the values of the residuals. If I use the 'names' command I get the following:

names(model)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

I know I can view 'residuals' or 'resid', but how can I view all of the components listed by 'names' together or, perhaps even better, how can I export them? If this is a case of "read the manual", could someone direct me to where this is discussed?

Thank you kindly,
JE
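One possible way to inspect and export these components in base R is sketched below. This is not a reply from the list. The x and y values are made up, and the output file name is hypothetical.

```r
set.seed(1)  # made-up data, for illustration only
x <- 1:20
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(20, sd = 0.2)

model <- lm(y ~ x + I(x^2))

# str() gives a compact overview of every component listed by names(model)
str(model, max.level = 1)

# Individual components are reached with $ or the matching extractor function
res <- residuals(model)   # same values as model$residuals
fit <- fitted(model)      # same values as model$fitted.values

# Collect the per-observation pieces in a data frame and write them to CSV
out <- data.frame(x = x, y = y, fitted = fit, residual = res)
out_file <- file.path(tempdir(), "model_values.csv")  # hypothetical file name
write.csv(out, out_file, row.names = FALSE)
```

Only the per-observation components (residuals, fitted values) fit naturally into one table. Components such as qr, call and terms are structured objects and are better inspected with str() or saved with save().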
Re: [R] sort a data matrix by all the values and keep the names
Is something missing in the melt()?

x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
require(reshape)
Loading required package: reshape
dfm <- melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
  missing value where TRUE/FALSE needed
dfm[order(dfm$value), ]
Error: object "dfm" not found
x
  x1 x2 x3
1  1  4  8
2  7  6  2
melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
  missing value where TRUE/FALSE needed

Steve McKinney

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of hadley wickham
Sent: Mon 9/22/2008 5:47 PM
To: zhihuali
Cc: [EMAIL PROTECTED]
Subject: Re: [R] sort a data matrix by all the values and keep the names

On Mon, Sep 22, 2008 at 6:54 PM, zhihuali [EMAIL PROTECTED] wrote:
> Dear all,
>
> If I have a data frame x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):
>
> x1 x2 x3
>  1  4  8
>  7  6  2
>
> I want to sort the whole data and get this:
>
> x1 1
> x3 2
> x2 4
> x2 6
> x1 7
> x3 8
>
> If I do sort(x), R reports:
>
> Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) :
>   unimplemented type 'list' in 'orderVector1'
>
> The only way I can sort all the data is by converting it to a matrix:
>
> sort(as.matrix(x))
> [1] 1 2 4 6 7 8
>
> But now I have lost all the names attributes. Is it possible to sort a data frame and keep all the names?

Here's one way:

dfm <- melt(x, id = c())
dfm[order(dfm$value), ]

Hadley

--
http://had.co.nz/
[R] plot implicit function
Hi,

I would like to know how to plot an implicit function, for example f(x, y) = 0. I'd like to draw it as a curve in the x-y plane.

Thanks,
Ying
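One common base-R approach, offered here as a sketch rather than as a reply from the list, is to evaluate f on a grid and draw only its zero contour. The unit circle is an assumed example function, and the grid limits and resolution are arbitrary choices.

```r
# Assumed example: the unit circle, f(x, y) = x^2 + y^2 - 1 = 0
f <- function(x, y) x^2 + y^2 - 1

# Evaluate f on a regular grid; outer() needs f to be vectorised
xs <- seq(-1.5, 1.5, length.out = 201)
ys <- seq(-1.5, 1.5, length.out = 201)
z <- outer(xs, ys, f)

# Draw only the level set f(x, y) = 0
contour(xs, ys, z, levels = 0, drawlabels = FALSE, asp = 1,
        xlab = "x", ylab = "y")

# The same curve as coordinates, without plotting
curve_xy <- contourLines(xs, ys, z, levels = 0)
```

The curve is only as accurate as the grid: contour() interpolates linearly between grid points, so a finer grid is needed where f changes quickly. If f is not vectorised, wrapping it in Vectorize() before calling outer() is one option.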
Re: [R] perl expression question
Hi Mark,

do you mean the regex to get the portion of the address after the final slash? Something like

gsub(".*/([^/]*$)", "\\1", stock, fixed=FALSE)

Cheers
Andrew

On Mon, Sep 22, 2008 at 07:29:25PM -0500, [EMAIL PROTECTED] wrote:
> If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.
>
> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

--
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
Re: [R] perl expression question
Try this:

sub(".*/", "", stock)
[1] "BLC.NYSE"

On Mon, Sep 22, 2008 at 8:29 PM, [EMAIL PROTECTED] wrote:
> If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.
>
> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
Re: [R] perl expression question
By the way, although a regular expression solution was asked for, if one widens the question to any solution, then R does have a function specifically for this case:

basename(stock)
[1] "BLC.NYSE"

On Mon, Sep 22, 2008 at 9:23 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> Try this:
>
> sub(".*/", "", stock)
> [1] "BLC.NYSE"
>
> On Mon, Sep 22, 2008 at 8:29 PM, [EMAIL PROTECTED] wrote:
>> If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.
>>
>> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
Re: [R] perl expression question
If this is a path name, then 'basename' will work for you:

stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
basename(stock)
[1] "BLC.NYSE"

On Mon, Sep 22, 2008 at 8:29 PM, [EMAIL PROTECTED] wrote:
> If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.
>
> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
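The three suggestions made in this thread can be compared side by side. The sketch below only collects the calls already posted; the variable names by_sub, by_gsub and by_base are introduced here for illustration.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# Greedy ".*/" consumes everything up to and including the last slash
by_sub <- sub(".*/", "", stock)

# Capture the run of non-slash characters anchored at the end of the string
by_gsub <- gsub(".*/([^/]+)$", "\\1", stock)

# basename() does the same job for path-like strings without a regex
by_base <- basename(stock)

print(c(by_sub, by_gsub, by_base))
```

The regex versions work on any slash-separated string, while basename() treats its argument as a file path, so the two can differ on inputs with trailing slashes.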
Re: [R] sort a data matrix by all the values and keep the names
Hmm, maybe it only works in my development version (to be released very, very soon).

Hadley

On Mon, Sep 22, 2008 at 8:02 PM, Steven McKinney [EMAIL PROTECTED] wrote:
> Is something missing in the melt()?
>
> x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
> require(reshape)
> Loading required package: reshape
> dfm <- melt(x, id = c())
> Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
>   missing value where TRUE/FALSE needed
> dfm[order(dfm$value), ]
> Error: object "dfm" not found
> x
>   x1 x2 x3
> 1  1  4  8
> 2  7  6  2
> melt(x, id = c())
> Error in if (!missing(id.var) && !(id.var %in% varnames)) { :
>   missing value where TRUE/FALSE needed
>
> Steve McKinney
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of hadley wickham
> Sent: Mon 9/22/2008 5:47 PM
> To: zhihuali
> Cc: [EMAIL PROTECTED]
> Subject: Re: [R] sort a data matrix by all the values and keep the names
>
> On Mon, Sep 22, 2008 at 6:54 PM, zhihuali [EMAIL PROTECTED] wrote:
>> Dear all,
>>
>> If I have a data frame x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):
>>
>> x1 x2 x3
>>  1  4  8
>>  7  6  2
>>
>> I want to sort the whole data and get this:
>>
>> x1 1
>> x3 2
>> x2 4
>> x2 6
>> x1 7
>> x3 8
>>
>> If I do sort(x), R reports:
>>
>> Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) :
>>   unimplemented type 'list' in 'orderVector1'
>>
>> The only way I can sort all the data is by converting it to a matrix:
>>
>> sort(as.matrix(x))
>> [1] 1 2 4 6 7 8
>>
>> But now I have lost all the names attributes. Is it possible to sort a data frame and keep all the names?
>
> Here's one way:
>
> dfm <- melt(x, id = c())
> dfm[order(dfm$value), ]
>
> Hadley
>
> --
> http://had.co.nz/

--
http://had.co.nz/
[R] Error: subscript out of bounds.
Consider:

x <- array(1:12,dim=12)
x[13]
[1] NA
m <- array(1:12,dim=c(3,4))
m[3,5]
Error: subscript out of bounds

Can anyone tell me if there is a Good Reason for the difference in behaviour between 1-dimensional and higher-dimensional arrays? In a bit of code that I was working on I expected the NA behaviour and didn't get it, of course. Then I had to take evasive action to avoid the error. Naive young thing that I am, I would prefer the NA behaviour to be universal. But I expect that, as usual, I'm overlooking something.

cheers,

Rolf Turner
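The behaviour described above, and one possible form of the "evasive action", can be sketched as follows. The helper safe_get is a hypothetical name introduced here; it is not from the post and is only one of several ways to get NA instead of an error.

```r
x <- array(1:12, dim = 12)
m <- array(1:12, dim = c(3, 4))

x[13]   # vector-style index past the end gives NA

# A matrix-style index outside dim(m) raises an error; catch it and return NA
safe_get <- function(a, i, j) {
  tryCatch(a[i, j], error = function(e) NA)
}
oob <- safe_get(m, 3, 5)

# A single (vector-style) index on the matrix behaves like the 1-d case
m_linear <- m[13]
```

Checking the indices against dim(m) before subsetting avoids the error without tryCatch, which may be preferable when the lookup sits inside a tight loop.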
Re: [R] R-2.7.2 infected?
Dave DeBarr wrote:
> I tried downloading R-2.7.2 (http://cran.cnr.berkeley.edu/bin/windows/base/R-2.7.2-win32.exe, both from Berkeley and cran) and both times I got a warning from Computer Associates eTrust Antivirus (version 7.1.710) that the Win32/Adclicker.JO trojan was detected:
>
> The Win32/Adclicker.JO was detected in C:\USERS\USER\APPDATA\LOCAL\MICROSOFT\WINDOWS\TEMPORARY INTERNET FILES\LOW\CONTENT.IE5\61HAYRTG\R-2.7.2-WIN32[1].EXE.
>
> Has anyone else seen this?

You're the first to report it, and 2.7.2 has been out for almost a month, so I think it's likely that the CRAN copy is uninfected. Did you check the md5 checksum on it? It matches on the original, so if it doesn't match at your end, you've got a bad download. If it matches and you still get the virus checker reporting, please let me know the details about that infection, and I'll try to do a manual inspection for it.

Duncan Murdoch
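The md5 check suggested above can be done from an existing R installation with tools::md5sum. In this sketch a small temporary file stands in for the downloaded installer, and the published digest is a placeholder rather than the real checksum of R-2.7.2-win32.exe.

```r
# Stand-in for the downloaded installer: a small temporary file
installer <- tempfile(fileext = ".exe")
writeBin(charToRaw("not a real installer"), installer)

# tools::md5sum() returns a named character vector of hex digests
local_md5 <- unname(tools::md5sum(installer))
print(local_md5)

# Placeholder digest; in practice, copy the value published with the installer
published_md5 <- "00000000000000000000000000000000"

# FALSE here because of the placeholder; a mismatch means a bad download
checksum_ok <- identical(tolower(local_md5), tolower(published_md5))
```

A matching checksum only shows that the local file is identical to the published one; it says nothing about whether the antivirus report is a false positive.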
[R] Create groups from data to compute lm?
Hello,

Below are the header and the first two rows of my dataset. This dataset has 5749 rows, and I want to select only certain rows to be used, based on existing grouping values. I am trying to group the data based on the values under 'ex_bin' (e.g. a group for each of 250, 251, 252, 500, 501, 502). I would then like to perform an lm for each group.

My data:

all[1:2,]
year extent scape bi_ca r ex_bin PriNo pri1234 pri_ex sc_ex Sc_ex_pri sc_ec_p1234 PD LPI ED LSI
13 25 1 1 3251251 1 1 26 125 11251125 21.6565 62.6602 82.0769 15.8792
23 25 1 1 3251251 1 1 26 125 11251125 19.3076 27.6264 111.2014 20.7889
  PAFRAC  PROX_MN   ENN_MN  CONTAG    pfor  purban
1  1.440 319.6529 114.8314 62.0965 69.4891 12.3124
2  1.467 396.1949 105.3712 52.9186 38.1179 15.1906

I tried using:

all.lm <- lm(pfor ~ PD, data = all, subset = (ex_bin == 250))

but this resulted in a bogus analysis filled with NAs. I then tried to use getGroups:

all.group <- getGroups(data = all, ex_bin = "250")
Error in getGroups(data = all, ex_bin = "250") :
  unused argument(s) (ex_bin = "250")

Again, no success. Am I approaching this correctly?

Thank you kindly,

Regards,
M Just
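One way to fit a separate lm for each ex_bin value in base R is to split the rows and apply lm to each piece. The sketch below uses synthetic data with the same column names (ex_bin, PD, pfor); the values, the group sizes and the name all_data are assumptions, so it does not reproduce or diagnose the NA result reported above.

```r
set.seed(42)  # synthetic stand-in for the poster's data
all_data <- data.frame(
  ex_bin = rep(c(250, 251, 252, 500, 501, 502), each = 10),
  PD     = runif(60, 15, 25)
)
all_data$pfor <- 80 - 2 * all_data$PD + rnorm(60)

# One model for a single group, via the subset argument of lm()
fit_250 <- lm(pfor ~ PD, data = all_data, subset = ex_bin == 250)

# One model per group: split the rows by ex_bin, then fit within each piece
fits <- lapply(split(all_data, all_data$ex_bin),
               function(d) lm(pfor ~ PD, data = d))

# Coefficients for every group, one row per ex_bin value
coefs <- t(sapply(fits, coef))
print(coefs)
```

If a real group yields NA coefficients, it is worth checking with table(all$ex_bin) and str(all) that the group is non-empty, that PD varies within it, and that ex_bin is stored as a number rather than as a factor or character.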
Re: [R] R-2.7.2 infected?
Could this be an intentional attack to compromise a very popular download and infect thousands of people? What could the motivations be? I hope it's not some corporate thug here. What exactly does the Win32/Adclicker.JO trojan do?

Ajay
www.decisionstats.com
www.iwannacrib.com

On Tue, Sep 23, 2008 at 9:11 AM, Duncan Murdoch [EMAIL PROTECTED] wrote:
> Dave DeBarr wrote:
>> I tried downloading R-2.7.2 (http://cran.cnr.berkeley.edu/bin/windows/base/R-2.7.2-win32.exe, both from Berkeley and cran) and both times I got a warning from Computer Associates eTrust Antivirus (version 7.1.710) that the Win32/Adclicker.JO trojan was detected:
>>
>> The Win32/Adclicker.JO was detected in C:\USERS\USER\APPDATA\LOCAL\MICROSOFT\WINDOWS\TEMPORARY INTERNET FILES\LOW\CONTENT.IE5\61HAYRTG\R-2.7.2-WIN32[1].EXE.
>>
>> Has anyone else seen this?
>
> You're the first to report it, and 2.7.2 has been out for almost a month, so I think it's likely that the CRAN copy is uninfected. Did you check the md5 checksum on it? It matches on the original, so if it doesn't match at your end, you've got a bad download. If it matches and you still get the virus checker reporting, please let me know the details about that infection, and I'll try to do a manual inspection for it.
>
> Duncan Murdoch

--
Regards,
Ajay Ohri
http://tinyurl.com/liajayohri