[R] Unique subsetting question
Hi all,

I'm looking at a large data set, and I'm interested in removing rows where only one variable is duplicated. Here's an example:

presidents
     Qtr1 Qtr2 Qtr3 Qtr4
1945   NA   87   82   75
1946   63   50   43   32
1947   35   60   54   55
1948   36   39   NA   NA
1949   69   57   57   51
1950   45   37   46   39
1951   36   24   32   23
1952   25   32   NA   32
1953   59   74   75   60
1954   71   61   71   57
1955   71   68   79   73
1956   76   71   67   75
1957   79   62   63   57
1958   60   49   48   52
1959   57   62   61   66
1960   71   62   61   57
1961   72   83   71   78
1962   79   71   62   74
1963   76   64   62   57
1964   80   73   69   69
1965   71   64   69   62
1966   63   46   56   44
1967   44   52   38   46
1968   36   49   35   44
1969   59   65   65   56
1970   66   53   61   52
1971   51   48   54   49
1972   49   61   NA   NA
1973   68   44   40   27
1974   28   25   24   24

See how in 1954 and 1955 the Qtr1 approval rating is the same? Let's say I wanted to return the presidents data frame, but with only unique values in Qtr1. It doesn't matter which years are displayed for duplicated values; it just matters that each value is not displayed more than once. Is there any way I can do this but still have it be a data frame that shows the Qtr2, Qtr3, and Qtr4 values?

Thanks in advance,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550453.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
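A minimal sketch of one way to do this in base R. One thing to flag: presidents ships with R as a quarterly time series (class "ts"), not a data frame, so the sketch first reshapes it into one row per year. The names p and p_unique are just for illustration. duplicated() on the Qtr1 column flags every repeat after the first, so negating it keeps one row per distinct Qtr1 value, together with that row's Qtr2 to Qtr4 values.

```r
# presidents is a quarterly "ts" object, so reshape it to one row per year
p <- as.data.frame(matrix(presidents, ncol = 4, byrow = TRUE,
                          dimnames = list(1945:1974, paste0("Qtr", 1:4))))

# duplicated() flags the 2nd, 3rd, ... occurrence of each Qtr1 value
# (NA counts as a value); negating it keeps the first row for each value
p_unique <- p[!duplicated(p$Qtr1), ]
head(p_unique, 10)
```

To keep the last occurrence of each value instead (1955 rather than 1954, say), duplicated(p$Qtr1, fromLast = TRUE) can be used the same way.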
Re: [R] Unique subsetting question
I understand how duplicated() and unique() work when every part of a given row is duplicated, and how to find duplicated values if I'm only looking at that first column. In this case, though, the rows for 1954 and 1955 are not completely the same; only quarter 1 is duplicated, so I'm not sure how to apply either duplicated() or unique().

Thanks,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550651.html
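The difference can be seen with just the two rows in question. This is an illustrative sketch with the values copied from the table in the first message: duplicated() on the whole data frame compares entire rows, while duplicated() on a single column compares only that column, and the resulting logical vector can still be used to pick whole rows.

```r
# The 1954 and 1955 rows from the presidents table
d <- data.frame(Qtr1 = c(71, 71), Qtr2 = c(61, 68),
                Qtr3 = c(71, 79), Qtr4 = c(57, 73),
                row.names = c("1954", "1955"))

duplicated(d)             # FALSE FALSE: the rows differ, so neither is a whole-row duplicate
duplicated(d$Qtr1)        # FALSE TRUE: the second 71 in Qtr1 is flagged
d[!duplicated(d$Qtr1), ]  # keeps the 1954 row, with all four quarters
```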
Re: [R] Unique subsetting question
I just figured that out, but the real data I'm using is a data frame for sure, so I'll find another example.

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550736.html
Re: [R] Unique subsetting question
How about this:

s = c("aa", "bb", "cc", "", "aa", "dd", "", "aa")
n = c(2, 3, 5, 6, 7, 8, 9, 3)
b = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
df = data.frame(n, s, b)  # df is a data frame

I want to display df with no value in s occurring more than once. Also, I want to delete the rows where s is the empty string "".

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550769.html
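A possible answer in base R, sketched under the assumption that the characters stripped from the example were quotes, so the blanks in s are empty strings "". Both conditions go into one logical index, so the n and b columns come along unchanged; stringsAsFactors = FALSE is added only so that s stays a character column on older versions of R.

```r
s <- c("aa", "bb", "cc", "", "aa", "dd", "", "aa")
n <- c(2, 3, 5, 6, 7, 8, 9, 3)
b <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
df <- data.frame(n, s, b, stringsAsFactors = FALSE)

# Keep a row only if s is non-empty AND this is the first time its s value appears
keep <- df$s != "" & !duplicated(df$s)
res <- df[keep, ]
res   # rows 1, 2, 3 and 6: aa, bb, cc, dd
```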
Re: [R] Unique subsetting question
Thanks, that works for what I'm trying to do. I was also wondering: in the data frame example you gave, if I just wanted to get rid of the rows where the value of a is 5, how would I do that?

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550836.html
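The replier's data frame isn't quoted in this thread, so the sketch below uses a made-up data frame with a numeric column a. Two idioms are shown because they behave differently when a contains NA: subset() treats an NA comparison as "drop the row", while %in% keeps NA rows. A plain df[df$a != 5, ] would instead return a row of NAs for each NA in a.

```r
# Hypothetical stand-in for the example data frame from the reply
df <- data.frame(a = c(1, 5, 3, 5, NA), label = c("p", "q", "r", "s", "t"),
                 stringsAsFactors = FALSE)

subset(df, a != 5)     # drops the 5s, and also the NA row (NA != 5 is NA)
df[!(df$a %in% 5), ]   # drops only the 5s; the NA row is kept
```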
Re: [R] Unique subsetting question
Oops, yeah, I didn't see that.

Thanks,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550865.html
[R] Finding the right url for RCurl
Hi all,

I am using RCurl to try to download data from a website, but I'm having trouble finding out what URL to use. Here is the site:

http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX

See how in the upper right, above the displayed sheet, there's a link to download the data as a .csv file? When I copy the link's URL and paste it into getURL() in R, it doesn't work. That's no surprise, because what gets pasted isn't actually a URL. I was just wondering if there's any way around this.

Thanks in advance,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Finding-the-right-url-for-RCurl-tp2314163p2314163.html
Re: [R] Finding the right url for RCurl
Thanks for the help so far. One interesting thing about this particular page is that the data displayed on the website actually differ from the data you can access with the download link. The XML package command works, but the table it produces in R has the following column names:

x1 = readHTMLTable("http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
                   which = 13, header = TRUE)
colnames(x1)
[1] Coupon Rate Maturity Date Ratingâ\u0080 % Weight
Warning message:
it is not known that wchar_t is Unicode on this platform

whereas the .csv file you can get with the link has 8 columns, including a PositionDate column, a Shares column, etc., that aren't present in the page's table. What makes this even more confusing is that the XML table contains MORE information than is presented on the page, such as Maturity Date. What I'm really looking for is a way to access the .csv file, so I doubt that reading info from the web page will be sufficient, since the page seems to display different data.

--Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Finding-the-right-url-for-RCurl-tp2314163p2315461.html
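Once a real download URL is known, RCurl's getURL() returns the file's contents as a single character string, and read.csv() can parse that string directly through its text argument, with no temporary file. A minimal sketch: the CSV text below is made up (only the PositionDate and Shares column names come from the message above), and it does not solve the harder problem of finding the URL itself.

```r
# In practice this would be something like csv_text <- getURL(csv_url),
# where csv_url is the (still unknown) download address. Made-up stand-in:
csv_text <- "PositionDate,Shares,Name
2010-08-02,100,Example A
2010-08-02,250,Example B"

holdings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(holdings)
```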
[R] Odd crash with tcl/tk
Hi,

Recently I've been trying to use packages in R that require loading the Tcl/Tk interface. However, I get a strange result and a crash that I haven't been able to find any discussion of on these boards (or any others). When I enter library(tcltk), R prints "Loading Tcl/Tk interface ..." but then never says "done" or displays any sort of error message. It looks like this:

x11()
library(tcltk)
Loading Tcl/Tk interface ...

Now you can type additional commands in, at your peril! For example, if I type the text library, nothing happens, but typing library( causes R to freeze up irreparably, with

executing: try(gsub('\\s+','',paste(capture.output(print(args(library,collapse=)),silent=TRUE)

displayed at the bottom. When this happens, there's nothing you can do but restart R, because it's completely frozen.

I'm running R version 2.11.1 Patched (2010-07-27 r52627) [R.app GUI 1.35 (5603) i386-apple-darwin9.8.0] with XQuartz 2.3.5 (xorg-server 1.4.2-apple53) on a Mac (Snow Leopard).

Thanks in advance for any help/suggestions,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Odd-crash-with-tcl-tk-tp2305032p2305032.html
[R] Command that is conditional upon file retrieval: is it possible?
Hi all,

I'm currently working on an R program where I have to access an FTP server to download some of the data I need. However, the people who post the files I access are sometimes inconsistent about when they post them, whether they post at all, etc. Here's some of the code I use:

library(RCurl)
url1 = paste("ftp://user:passw...@a.great.website.com/", file, num1, ".csv", sep = "")
data1 = getURL(url1)
write(data1, file = paste(inMyFolder, num1, ".csv", sep = ""))

Sometimes this process works perfectly, and sometimes data1 = getURL(url1) gives me an error message like this:

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : RETR response: 550

That's because that particular file isn't on the FTP server yet. Now, let's just imagine that there's another way for me to get the file elsewhere, and I can drag it into the working directory under the same name as the file I'm telling R to write right after fetching it from the FTP server. So here's my question: is it possible to write a command that writes the file if data1 = getURL(url1) runs without an error, but doesn't write it if the file can't be found? As things stand, if I got the error message, dragged the file into the working directory and ran the program again, R would overwrite my good file with an empty one, because in all cases I'm telling it to write a file with that name containing whatever is in data1.

Thanks in advance,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Command-that-is-conditional-upon-file-retrieval-is-it-possible-tp2297811p2297811.html
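One way to do this, sketched in base R: wrap the download in tryCatch() and only call write() when no error was signalled. To keep the sketch testable without an FTP server, the download is passed in as a function; save_if_fetched() is a hypothetical helper name, not part of RCurl. The check for an empty result is an extra assumption, in case a download ever "succeeds" with nothing in it.

```r
# Hypothetical helper: run fetch() and write its result to dest only if the
# download succeeded. Any error (e.g. "RETR response: 550") leaves dest untouched.
save_if_fetched <- function(fetch, dest) {
  data <- tryCatch(fetch(), error = function(e) {
    message("Not writing ", dest, ": ", conditionMessage(e))
    NULL
  })
  # Extra guard (assumption): also skip a download that came back empty
  if (is.null(data) || !any(nzchar(data))) return(invisible(FALSE))
  write(data, file = dest)
  invisible(TRUE)
}

# Intended use with the code above (not run here):
# save_if_fetched(function() getURL(url1),
#                 paste(inMyFolder, num1, ".csv", sep = ""))
```

With this in place, re-running the script after dragging the file in by hand leaves that file alone as long as the FTP download still fails.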
[R] Search and extract string function
Hi all,

I'm trying to write a function that will search and extract from a long character string, but with a twist: I want to use the characters before and the characters after what I want to extract as reference points. For example, say I'm working with data entries that look like this:

Drink=Coffee:Location=Office:Time=Morning:Market=Flat
Drink=Water:Location=Office:Time=Afternoon:Market=Up
Drink=Water:Location=Gym:Time=Evening:Market=Closed
Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
...

For my function, I'd like to find what's located between "Location=" and ":Time=" in every instance and extract it, to return something like "Office", "Office", "Gym", "Restaurant".

In a previous discussion I found (http://tolstoy.newcastle.edu.au/R/help/05/03/0344.html), someone wrote a function that finds and substitutes text in a string, based on pre and post arguments:

interp <- function(x, e = parent.frame(), pre = "\\$", post = "") {
  for (el in ls(e)) {
    tag <- paste(pre, el, post, sep = "")
    if (length(grep(tag, x)))
      x <- gsub(tag, eval(parse(text = el), e), x)
  }
  x
}

I'm not sure how to modify it, however, to do what I want it to do. Any suggestions?

Thanks in advance,
Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Search-and-extract-string-function-tp2290268p2290268.html
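Rather than adapting interp(), which substitutes text, here is a sketch using a different technique: sub() with a capture group. The pattern matches everything up to pre, the text in between, and everything after post, and the whole string is replaced by the captured middle part. extract_between() is a hypothetical name. perl = TRUE is used so that \Q...\E can treat pre and post as literal text (useful if they ever contain characters such as $ or .); strings where the markers don't occur come back as NA.

```r
# Hypothetical helper: return the text between the literal markers pre and post,
# or NA where they are not found.
extract_between <- function(x, pre, post) {
  # \\Q...\\E makes PCRE match pre and post literally
  pattern <- paste0("^.*?\\Q", pre, "\\E(.*?)\\Q", post, "\\E.*$")
  out <- sub(pattern, "\\1", x, perl = TRUE)
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

x <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat",
       "Drink=Water:Location=Office:Time=Afternoon:Market=Up",
       "Drink=Water:Location=Gym:Time=Evening:Market=Closed",
       "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed")
extract_between(x, "Location=", ":Time=")
# returns c("Office", "Office", "Gym", "Restaurant")
```

The same call with "Time=" and ":Market=" would pull out the Time values.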
Re: [R] Search and extract string function
Actually, I have one more question that's somewhat related. I'm starting out by importing a .txt file that isn't divided into separate entries and is at times inconsistent with regard to spacing, indents, etc., so I can't rely on those. It looks something like this:

Drink=Coffee:Location=Office:Time=Morning:Market=Flat Drink=Water:Location=Office:Time=Afternoon:Market=Up Drink=Water:Location=Gym:Time=Evening:Market=Closed Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed Drink=Coffee:Location=Office:Time=Morning:Market=Flat Drink=Water:Location=Office:Time=Afternoon:Market=Up Drink=Water:Location=Gym:Time=Evening:Market=Closed Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed Drink=Coffee:Location=Office:Time=Morning:Market=Flat Drink=Water:Location=Office:Time=Afternoon:Market=Up Drink=Water:Location=Gym:Time=Evening:Market=Closed Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed

How can I take a single string like this and split it into twelve separate strings, like this:

FixedData
[1] Drink=Coffee:Location=Office:Time=Morning:Market=Flat
[2] Drink=Water:Location=Office:Time=Afternoon:Market=Up
[3] Drink=Water:Location=Gym:Time=Evening:Market=Closed
[4] Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
[5] Drink=Coffee:Location=Office:Time=Morning:Market=Flat
[6] Drink=Water:Location=Office:Time=Afternoon:Market=Up
[7] Drink=Water:Location=Gym:Time=Evening:Market=Closed
[8] Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
[9] Drink=Coffee:Location=Office:Time=Morning:Market=Flat
[10] Drink=Water:Location=Office:Time=Afternoon:Market=Up
[11] Drink=Water:Location=Gym:Time=Evening:Market=Closed
[12] Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed

Thanks again for all of the help!
--Andrew

-- 
View this message in context: http://r.789695.n4.nabble.com/Search-and-extract-string-function-tp2290268p2290375.html
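A minimal sketch, assuming every entry begins with "Drink=", that this text never appears inside an entry, and that anything before the first "Drink=" is just whitespace. It splits the whole text on that marker with fixed = TRUE (so the marker is matched literally, not as a regular expression), trims the irregular whitespace around each piece, drops the empty piece before the first marker, and puts the marker back. split_entries() and the file name in the comment are hypothetical.

```r
split_entries <- function(txt, key = "Drink=") {
  # readLines() returns one string per line; glue them into one string first
  txt <- paste(txt, collapse = " ")
  pieces <- strsplit(txt, key, fixed = TRUE)[[1]]
  pieces <- trimws(pieces)          # remove stray spaces, tabs and newlines
  pieces <- pieces[nzchar(pieces)]  # the piece before the first key is empty
  paste0(key, pieces)
}

# e.g. FixedData <- split_entries(readLines("drinks.txt"))
raw <- "Drink=Coffee:Location=Office:Time=Morning:Market=Flat   Drink=Water:Location=Office:Time=Afternoon:Market=Up
    Drink=Water:Location=Gym:Time=Evening:Market=Closed Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
FixedData <- split_entries(raw)
FixedData
```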