Re: [R] Error message
Here are the first few bytes, xxd -l 128 X1.RData : 0000: 8d5a 35f8 1ac5 cc14 a04e be5c 572f a3ad .Z5..N.\W/.. 0010: 6210 7024 9b58 93c7 34d0 acb7 7a82 3f99 b.p$.X..4...z.?. 0020: 66ce 0ebb 2057 ec36 55b4 0ece a036 695a f... W.6U6iZ 0030: 258b 3493 b661 f620 f7fe ada7 158a 15f7 %.4..a. 0040: e016 a548 6fcb 20c8 6fb4 493d adc9 ea4a ...Ho. .o.I=...J 0050: 0a2b b7cf a416 336e 5e4e abc5 9874 7be3 .+3n^N...t{. 0060: 5a5a 3405 fe35 8a3d ad80 0dc0 ca3e ea7a ZZ4..5.=.>.z 0070: e628 b220 ee50 0b9f 3a81 e971 8a19 4f54 .(. .P..:..q..OT On Fri, Mar 22, 2024 at 2:36 PM Ivan Krylov wrote: > > On Fri, 22 Mar 2024 14:31:17 -0500 > Val writes: > > > How do I get the first few bytes? > > What does file.info('X1.RData') say? > > Do you get any output if you run print(readBin('X1.RData', raw(), 128))? > > If this is happening on a Linux or macOS machine, the operating system > command xxd -l 128 X1.RData will give the same output in a more > readable manner, but the readBin(...) output from R should be fine too. > > -- > Best regards, > Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
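The magic-number check being discussed can be reproduced with a small sketch (the temp file below stands in for X1.RData, which we don't have). save() gzip-compresses by default, so a healthy .RData file starts with the bytes 1f 8b; an uncompressed one (compress = FALSE) starts with "RDX2\n" or "RDX3\n". The dump above begins 8d 5a, which matches neither, consistent with the "bad restore file magic number" error.

```r
## Minimal sketch: inspect the magic number of an .RData file.
path <- tempfile(fileext = ".RData")   # illustrative file, stands in for X1.RData
x <- 1:5
save(x, file = path)                   # save() gzip-compresses by default

first <- readBin(path, raw(), 8)       # same idea as readBin('X1.RData', raw(), 128)
identical(first[1:2], as.raw(c(0x1f, 0x8b)))   # TRUE for a gzip-compressed save()
```

If the first bytes are neither a known compression signature (gzip 1f 8b, bzip2 "BZh", xz fd 37 7a 58 5a) nor "RDX", the file was truncated or overwritten by something other than save().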
Re: [R] Error message
Yes, X1.RData is large (more than 40M rows). How do I get the first few bytes? On Fri, Mar 22, 2024 at 2:20 PM Ivan Krylov wrote: > > On Fri, 22 Mar 2024 14:02:09 -0500 > Val writes: > > > X2.R > > load("X1.RData") > > > > I am getting this error message: > > Error in load("X1.RData", : > > bad restore file magic number (file may be corrupted) .. no data > > loaded. > > This error happens very early when R tries to load the file, right > at the first few bytes. Is "X1.RData" large? Can you share it, or at > least a hexadecimal dump of the first few hundred bytes? > > -- > Best regards, > Ivan
[R] Error message
Hi all, I am creating an X1.RData file using R 4.2.2. x1.R save(datafilename, file="X1.RData") When I try to load this file using another script X2.R load("X1.RData") I get this error message: Error in load("X1.RData", : bad restore file magic number (file may be corrupted) .. no data loaded. I am using the same R version (4.2.2). What could be the cause of this error message, and how do I fix it? Thank you,
Re: [R] Multiply
Thank you, Avi and Ivan. Worked for this particular Example. Yes, I am looking for something with a more general purpose. I think Ivan's suggestion works for this. multiplication=as.matrix(dat1[,-1]) %*% as.matrix(dat2[match(dat1[,1], dat2[,1]),-1]) Res=data.frame(ID = dat1[,1], Index = multiplication) On Fri, Aug 4, 2023 at 10:59 AM wrote: > > Val, > > A data.frame is not quite the same thing as a matrix. > > But as long as everything is numeric, you can convert both data.frames to > matrices, perform the computations needed and, if you want, convert it back > into a data.frame. > > BUT it must be all numeric and you violate that requirement by having a > character column for ID. You need to eliminate that temporarily: > > dat1 <- read.table(text="ID, x, y, z > A, 10, 34, 12 > B, 25, 42, 18 > C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) > > mat1 <- as.matrix(dat1[,2:4]) > > The result is: > > > mat1 > x y z > [1,] 10 34 12 > [2,] 25 42 18 > [3,] 14 20 8 > > Now do the second matrix, perhaps in one step: > > mat2 <- as.matrix(read.table(text="ID, weight, weiht2 > A, 0.25, 0.35 > B, 0.42, 0.52 > C, 0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)[,2:3]) > > > Do note some people use read.csv() instead of read.table, albeit it simply > calls read.table after setting some parameters like the comma. > > The result is what you asked for, including spelling weight wrong once.: > > > mat2 > weight weiht2 > [1,] 0.25 0.35 > [2,] 0.42 0.52 > [3,] 0.65 0.75 > > Now you wanted to multiply as in matrix multiplication. > > > mat1 %*% mat2 > weight weiht2 > [1,] 24.58 30.18 > [2,] 35.59 44.09 > [3,] 17.10 21.30 > > Of course, you wanted different names for the columns and you can do that > easily enough: > > result <- mat1 %*% mat2 > > colnames(result) <- c("index1", "index2") > > > But this is missing something: > > > result > index1 index2 > [1,] 24.58 30.18 > [2,] 35.59 44.09 > [3,] 17.10 21.30 > > Do you want a column of ID numbers on the left? 
If numeric, you can keep it > in a matrix in one of many ways but if you want to go back to the data.frame > format and re-use the ID numbers, there are again MANY ways. But note mixing > characters and numbers can inadvertently convert everything to characters. > > Here is one solution. Not the only one nor the best one but reasonable: > > recombined <- data.frame(index=dat1$ID, > index1=result[,1], > index2=result[,2]) > > > > recombined > index index1 index2 > 1 A 24.58 30.18 > 2 B 35.59 44.09 > 3 C 17.10 21.30 > > If for some reason you need a more general purpose way to do this for > arbitrary conformant matrices, you can write a function that does this in a > more general way but perhaps a better idea might be a way to store your > matrices in files in a way that can be read back in directly or to not > include indices as character columns but as row names. > > > > > > > -Original Message- > From: R-help On Behalf Of Val > Sent: Friday, August 4, 2023 10:54 AM > To: r-help@R-project.org (r-help@r-project.org) > Subject: [R] Multiply > > Hi all, > > I want to multiply two data frames as shown below, > > dat1 <-read.table(text="ID, x, y, z > A, 10, 34, 12 > B, 25, 42, 18 > C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) > > dat2 <-read.table(text="ID, weight, weiht2 > A, 0.25, 0.35 > B, 0.42, 0.52 > C, 0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F) > > Desired result > > ID Index1 Index2 > 1 A 24.58 30.18 > 2 B 35.59 44.09 > 3 C 17.10 21.30 > > Here is my attempt, but did not work > > dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN= > function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F) > > > Any help? > > Thank you, > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
[R] Multiply
Hi all, I want to multiply two data frames as shown below, dat1 <-read.table(text="ID, x, y, z A, 10, 34, 12 B, 25, 42, 18 C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) dat2 <-read.table(text="ID, weight, weiht2 A, 0.25, 0.35 B, 0.42, 0.52 C, 0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F) Desired result ID Index1 Index2 1 A 24.58 30.18 2 B 35.59 44.09 3 C 17.10 21.30 Here is my attempt, but did not work dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN= function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F) Any help? Thank you, __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
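The approach Val settled on upthread (match the rows by ID, then matrix-multiply the numeric parts) can be wrapped as a small reusable function. A sketch; the function name and the out_names argument are mine, not from the thread, and it assumes the ID sits in column 1 of both frames with all remaining columns numeric:

```r
# Multiply the numeric parts of two data frames after aligning rows by ID.
# Assumes column 1 holds the ID in both frames; all other columns are numeric.
multiply_by_id <- function(d1, d2, out_names = NULL) {
  m <- as.matrix(d1[, -1]) %*% as.matrix(d2[match(d1[, 1], d2[, 1]), -1])
  if (!is.null(out_names)) colnames(m) <- out_names
  data.frame(ID = d1[, 1], m)
}

dat1 <- read.table(text = "ID, x, y, z
A, 10, 34, 12
B, 25, 42, 18
C, 14, 20, 8", sep = ",", header = TRUE, strip.white = TRUE)

dat2 <- read.table(text = "ID, weight, weiht2
A, 0.25, 0.35
B, 0.42, 0.52
C, 0.65, 0.75", sep = ",", header = TRUE, strip.white = TRUE)

res <- multiply_by_id(dat1, dat2, out_names = c("Index1", "Index2"))
res
#   ID Index1 Index2
# 1  A  24.58  30.18
# 2  B  35.59  44.09
# 3  C  17.10  21.30
```

Because match() aligns dat2 to dat1's IDs, the result is correct even when the two frames list the IDs in different orders.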
Re: [R] Correlate
Thank you John for your help and advice. On Fri, Aug 26, 2022 at 11:04 AM John Fox wrote: > > Dear Val, > > On 2022-08-26 10:41 a.m., Val wrote: > > Hi John and Timothy > > > > Thank you for your suggestion and help. Using the sample data, I did > > carry out a test run and found a difference in the correlation result. > > > > Option 1. > > data_cor <- cor(dat[ , colnames(dat) != "x1"], # Calculate correlations > > dat$x1, method = "pearson", use = "complete.obs") > > resulted > > [,1] > > x2 -0.5845835 > > x3 -0.4664220 > > x4 0.7202837 > > > > Option 2. > > for(i in colnames(dat)){ > >print(cor.test(dat[,i], dat$x1, method = "pearson", use = > > "complete.obs")$estimate) > > } > > [,1] > > x2 -0.7362030 > > x3 -0.04935132 > > x4 0.85766290 > > > > This was crosschecked using Excel and other softwares and all matches > > with option 2. > > One of the factors that contributed for this difference is loss of > > information when we are using na.rm(). This is because that if x2 has > > missing value but x3 and x4 don’t have then na.rm() removed entire > > row information including x3 and x4. > > Yes, I already explained that in my previous message. > > As well, cor() is capable of computing pairwise-complete correlations -- > see ?cor. > > There's not an obvious right answer here, however. Using > pairwise-complete correlations can produce inconsistent (i.e., > non-positive semi-definite) correlation matrices because correlations > are computed on different subsets of the data. > > There are much better ways to deal with missing data. > > > > > My question is there a way to extract the number of rows (N) used in > > the correlation analysis?. 
> > I'm sure that there are many ways, but here is one that is very > simple-minded and should be reasonably efficient for ~250 variables: > > > (nc <- ncol(dat)) > [1] 4 > > > R <- N <- matrix(NA, nc, nc) > > diag(R) <- 1 > > for (i in 1:(nc - 1)){ > + for (j in (i + 1):nc){ > + R[i, j] <- R[j, i] <- cor(dat[, i], dat[, j], use="complete.obs") > + N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)])) > + } > + } > > > round(R, 3) > [,1] [,2] [,3] [,4] > [1,] 1.000 -0.736 -0.049 0.858 > [2,] -0.736 1.000 0.458 -0.428 > [3,] -0.049 0.458 1.000 0.092 > [4,] 0.858 -0.428 0.092 1.000 > > > N > [,1] [,2] [,3] [,4] > [1,] NA 8 8 8 > [2,] 8 NA 8 8 > [3,] 8 8 NA 8 > [4,] 8 8 8 NA > > > round(cor(dat, use="pairwise.complete.obs"), 3) # check > x1 x2 x3 x4 > x1 1.000 -0.736 -0.049 0.858 > x2 -0.736 1.000 0.458 -0.428 > x3 -0.049 0.458 1.000 0.092 > x4 0.858 -0.428 0.092 1.000 > > More generally, I think that it's a good idea to learn a little bit > about R programming if you intend to use R in your work. You'll then be > able to solve problems like this yourself. > > I hope this helps, > John > > > Thank you, > > > > On Mon, Aug 22, 2022 at 1:00 PM John Fox wrote: > >> > >> Dear Val, > >> > >> On 2022-08-22 1:33 p.m., Val wrote: > >>> For the time being I am assuming the relationship across variables > >>> is linear. I want to get the values first and detailed examination of > >>> the relationships will follow later. > >> > >> This seems backwards to me, but I'll refrain from commenting further on > >> whether what you want to do makes sense and instead address how to do it > >> (not, BTW, because I disagree with Bert's and Tim's remarks). > >> > >> Please see below: > >> > >>> > >>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron > >>> wrote: > >>>> > >>>> I (maybe) agree, but I would go further than that. There are assumptions > >>>> associated with the test that are missing. It is not clear that the > >>>> relationships are all linear.
Regardless of a "significant outcome" all > >>>> of the relationships need to be explored in more detail than what is > >>>> provided in the correlation test. > >>>> > >>>> Multiplicity adjustment as in : > >>>> https://www.sciencedirect.com/science/article/pii/S019724561069 is > >>>&g
Re: [R] Correlate
For the time being I am assuming the relationship across variables is linear. I want to get the values first; detailed examination of the relationships will follow later. On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron wrote: > > I (maybe) agree, but I would go further than that. There are assumptions > associated with the test that are missing. It is not clear that the > relationships are all linear. Regardless of a "significant outcome" all of > the relationships need to be explored in more detail than what is provided in > the correlation test. > > Multiplicity adjustment as in : > https://www.sciencedirect.com/science/article/pii/S019724561069 is not an > issue that I can see in these data from the information provided. At least > not in the same sense as used in the link. > > My first guess at the meaning of "multiplicity adjustment" was closer to the > experimentwise error rate in a multiple comparison procedure. > https://dictionary.apa.org/experiment-wise-error-rate Essentially, the type 1 > error rate is inflated the more tests you do, and if you perform enough tests > you find significant outcomes by chance alone. There is great significance in > the Redskins rule: https://en.wikipedia.org/wiki/Redskins_Rule. > > A simple solution is to apply a Bonferroni correction where alpha is divided > by the number of comparisons. If there are 250, then 0.05/250 = 0.0002. > Another approach is to try to discuss the outcomes in a way that makes sense. > What is the connection between a football team's last home game and the > election result that would enable me to take another team and apply their > last home game result to the outcome of a different election? > > Another complication is if variables x2 through x250 are themselves > correlated. Not enough information was provided in the problem to know if > this is an issue, but 250 orthogonal variables in a real dataset would be a > bit unusual considering the experimentwise error rate previously mentioned.
> > Large datasets can be very messy. > > > Tim > > -Original Message- > From: Bert Gunter > Sent: Monday, August 22, 2022 12:07 PM > To: Ebert,Timothy Aaron > Cc: Val ; r-help@R-project.org (r-help@r-project.org) > > Subject: Re: [R] Correlate > > [External Email] > > ... But of course the p-values are essentially meaningless without some sort > of multiplicity adjustment. > (search on "multiplicity adjustment" for details). :-( > > -- Bert > > > On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron wrote: > > > > A somewhat clunky solution: > > for(i in colnames(dat)){ > > print(cor.test(dat[,i], dat$x1, method = "pearson", use = > > "complete.obs")$estimate) > > print(cor.test(dat[,i], dat$x1, method = "pearson", use = > > "complete.obs")$p.value) } > > > > Rather than printing you could set up an array or list to save the results. > > > > > > Tim > > > > -Original Message- > > From: R-help On Behalf Of Val > > Sent: Monday, August 22, 2022 11:09 AM > > To: r-help@R-project.org (r-help@r-project.org) > > Subject: [R] Correlate > > > > [External Email] > > > > Hi all, > > > > I have a data set with ~250 variables (columns). I want to calculate > > the correlation of one variable with the rest of the other variables > > and also want the p-values for each correlation. Please see the > > sample data and my attempt. I have got the correlation but unable to > > get the p-values > > > > dat <- read.table(text="x1 x2 x3 x4 > > 1.68 -0.96 -1.25 0.61 > > -0.06 0.41 0.06 -0.96 > > . 0.08 1.14 1.42 > > 0.80 -0.67 0.53 -0.68 > > 0.23 -0.97 -1.18 -0.78 > > -1.03 1.11 -0.61 .
> > 2.15 . 0.02 0.66 > > 0.35 -0.37 -0.26 0.39 > > -0.66 0.89 . -1.49 > > 0.11 1.52 0.73 -1.03",header=TRUE) > > > > #change all to numeric > > dat[] <- lapply(dat, function(x) as.numeric(as.character(x))) > > > > data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1, method = > > "pearson", use = "complete.obs") > > > > Result > > [,1] > > x2 -0.5845835 > > x3 -0.4664220 > > x4 0.7202837 > > > > How do I get the p-values ? > > > > Thank you,
[R] Correlate
Hi all, I have a data set with ~250 variables (columns). I want to calculate the correlation of one variable with the rest of the other variables and also want the p-values for each correlation. Please see the sample data and my attempt. I have got the correlation but unable to get the p-values dat <- read.table(text="x1 x2 x3 x4 1.68 -0.96 -1.25 0.61 -0.06 0.41 0.06 -0.96 . 0.08 1.14 1.42 0.80 -0.67 0.53 -0.68 0.23 -0.97 -1.18 -0.78 -1.03 1.11 -0.61 . 2.15 . 0.02 0.66 0.35 -0.37 -0.26 0.39 -0.66 0.89 . -1.49 0.11 1.52 0.73 -1.03",header=TRUE) #change all to numeric dat[] <- lapply(dat, function(x) as.numeric(as.character(x))) data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1, method = "pearson", use = "complete.obs") Result [,1] x2 -0.5845835 x3 -0.4664220 x4 0.7202837 How do I get the p-values ? Thank you,
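Pulling the thread's pieces together: estimates and p-values can be collected into one data frame in a single pass, along the lines of Tim's loop. Note that cor.test() drops incomplete cases per pair of variables, which is why its estimates match the pairwise numbers discussed later in the thread rather than cor(..., use = "complete.obs"). A sketch using the sample data:

```r
# Sample data from the question; "." marks missing values
dat <- read.table(text = "x1 x2 x3 x4
1.68 -0.96 -1.25 0.61
-0.06 0.41 0.06 -0.96
. 0.08 1.14 1.42
0.80 -0.67 0.53 -0.68
0.23 -0.97 -1.18 -0.78
-1.03 1.11 -0.61 .
2.15 . 0.02 0.66
0.35 -0.37 -0.26 0.39
-0.66 0.89 . -1.49
0.11 1.52 0.73 -1.03", header = TRUE)

# "." makes the columns character; coerce to numeric (the "." entries become NA)
dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))

# One row per variable: correlation with x1 plus its p-value.
# cor.test() itself keeps only the complete cases of each (x, x1) pair.
res <- do.call(rbind, lapply(setdiff(names(dat), "x1"), function(v) {
  ct <- cor.test(dat[[v]], dat$x1, method = "pearson")
  data.frame(variable = v, estimate = unname(ct$estimate), p.value = ct$p.value)
}))
res
```

With ~250 columns this stays a three-row-per-variable data frame rather than printed output, which is what the original poster asked for.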
Re: [R] Row exclude
Thank you David. What about if I want to list the excluded rows? I used this (dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ]) It did not work. The desired output is, Alex, 20, 13X John, 3BC, 175 Jack3, 34, 140 Thank you, On Sat, Jan 29, 2022 at 10:15 PM David Carlson wrote: > It is possible that there would be errors on the same row for different > columns. This does not happen in your example. If row 4 was "John6, 3BC, > 175X" then row 4 would be included 3 times, but we only need to remove it > once. Removing the duplicates is not necessary since R would not get > confused, but length(unique(c(BadName, BadAge, BadWeight))) indicates how > many lines are being removed. > > David > > On Sat, Jan 29, 2022 at 8:32 PM Val wrote: > >> Thank you David for your help. >> >> I just have one question on this. What is the purpose of using the >> "unique" function on this? >> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ]) >> >> I got the same result without using it. >> (dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ]) >> >> My concern is when I am applying this for the large data set the >> "unique" function may consume resources (time and memory). >> >> Thank you. >> >> On Sat, Jan 29, 2022 at 12:30 AM David Carlson wrote: >>> Given that you know which columns should be numeric and which should be >>> character, finding characters in numeric columns or numbers in character >>> columns is not difficult. Your data frame consists of three character >>> columns so you can use regular expressions as Bert mentioned.
First you >>> should strip the whitespace out of your data: >>> >>> dat1 <-read.table(text="Name, Age, Weight >>> Alex, 20, 13X >>> Bob, 25, 142 >>> Carol, 24, 120 >>> John, 3BC, 175 >>> Katy, 35, 160 >>> Jack3, 34, 140",sep=",", header=TRUE, stringsAsFactors=FALSE, >>> strip.white=TRUE) >>> >>> Now check to see if all of the fields are character as expected. >>> >>> sapply(dat1, typeof) >>> #Name Age Weight >>> # "character" "character" "character" >>> >>> Now identify character variables containing numbers and numeric >>> variables containing characters: >>> >>> BadName <- which(grepl("[[:digit:]]", dat1$Name)) >>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age)) >>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight)) >>> >>> Next remove those rows: >>> >>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ]) >>> #Name Age Weight >>> # 2 Bob 25 142 >>> # 3 Carol 24 120 >>> # 5 Katy 35 160 >>> >>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <- >>> as.numeric(dat2$Age). >>> >>> David Carlson >>> >>> >>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter >>> wrote: >>> >>>> As character 'polluted' entries will cause a column to be read in (via >>>> read.table and relatives) as factor or character data, this sounds like a >>>> job for regular expressions. If you are not familiar with this subject, >>>> time to learn. And, yes, some heavy lifting will be required. >>>> See ?regexp for a start maybe? Or the stringr package? >>>> >>>> Cheers, >>>> Bert >>>> >>>> >>>> >>>> >>>> On Fri, Jan 28, 2022
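Putting David's steps together in one runnable piece, with a logical mask instead of `-which(...)` indexing; the mask also answers the follow-up about listing the excluded rows:

```r
dat1 <- read.table(text = "Name, Age, Weight
Alex, 20, 13X
Bob, 25, 142
Carol, 24, 120
John, 3BC, 175
Katy, 35, 160
Jack3, 34, 140", sep = ",", header = TRUE, strip.white = TRUE)

# A row is bad if the name contains a digit, or a numeric column contains a letter
bad <- grepl("[[:digit:]]", dat1$Name) |
       grepl("[[:alpha:]]", dat1$Age)  |
       grepl("[[:alpha:]]", dat1$Weight)

kept     <- dat1[!bad, ]   # clean rows (Bob, Carol, Katy)
excluded <- dat1[bad, ]    # the listing asked about upthread (Alex, John, Jack3)
```

One design note: indexing with the logical vector sidesteps a pitfall of `dat1[-which(bad), ]`; if nothing matches, `which()` returns a zero-length vector and the negative index drops every row, whereas `dat1[!bad, ]` correctly keeps them all.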
Re: [R] Row exclude
Hi all, Thank you so much for the useful help and many options that you gave me. Sorry for the delay response, I was away for a while On Sat, Jan 29, 2022 at 3:35 PM Avi Gross via R-help wrote: > Rui has indeed improved my first attempt in several ways so my comments > are now focused on another level. There is seemingly endless discussion > here about what is base R. Questions as well as Answers that go beyond base > R are often challenged and I understand why, even if I personally don't > worry about it. > > As I see it, R has many levels, like many modern programming languages, > and some are built-in by default, while others are add-ons of various kinds > and some are now seen as more commonly used than others. Some here, and NOT > ME, seem particularly annoyed by the concept of the tidyverse existing or > the corporate nature of RSTUDIO. I say, the more the better as long as they > are well-designed and robust and efficient enough. > > There are many ways you can use R in simple mode to the point where you do > not even use vectors as intended but use loops to say add corresponding > entries in two vectors one item at a time using an index, as you might do > with earlier languages. That is perfectly valid in R, albeit not using the > language as intended as A+B in R does that for you fairly trivially, albeit > hiding a kind of loop being done behind the scenes. But if the two vectors > are not the same length, it can lead to subtle errors if it recycles or > broadcasts the shorter one as needed UNLESS that was intended. > > Like many languages, R has additional modes of a sort. it is very loosely > Object-Oriented and some solutions to problems may make use of that or > other features not always found in other languages such as being able to > attach attributes of arbitrary nature to things. 
But someone taking a > beginner course in R, or just using it in simple ways, generally does not > know or care and being given a possible solution like that may not be very > helpful. > > R is fully a functional programming language and experienced users, like > Rui clearly is, can make serious use of many paradigms like map/reduce to > create what often are quite abstract solutions that can be tailored to do > all kinds of things by simply changing the functions invoked or in this > case also the data invoked. I was tempted to use a variant of his solution > using the pmap() function that I am familiar with but it is not base R, but > part of the "purr" package which is in the not-appreciated-here package of > packages called the tidyverse, LOL! > > Pmap can take an arbitrary data.frame and look at it one row at a time and > apply a function that sees all the columns. That function can be written so > it applies your logic to each column entry for that row that you wish and > combines the calculations to return something like TRUE/FALSE. In this > case, it could be code connecting use of a regular expression on each > column entry combined by the usual logical connectives like AND and NOT > (using R notation) to return a TRUE or FALSE that pmap then combines into a > vector and you use that to index the data.frame to keep only valid rows. > BUT, I reconsidered using it here as it is a tad advanced and not pure R. > Nor do I claim it is better than what Rui and others could come up with. It > is just not as simple as the case we are looking at. > > R has another facet that needs to be used carefully that significantly > alters some approaches as compared to a language like Python which has a > much nicer object-oriented set of tools but does not have some of the > delayed evaluation R supports and that sometimes get in the way as some > people expect them to be evaluated sooner, or at all. 
I see strengths and > weaknesses and try to use a language suited for my needs that also uses it > mostly as intended. > > I also ask if we have met the needs of the person who asked this question. > If they do not reply and merely REPOST the same question with a shorter > subject-line, then I suggest we all wasted our time trying. Proper > etiquette, I might think, is to reply to some work show by others IN PUBLIC > and especially to explain anything being asked by us and to let us know > what worked for them or met their needs or show a portion of what code they > finally implemented. Some of that may yet happen, but can anyone blame me > for being a tad suspicious this time? > > I tend to be interested in deeper discussions and many are outside the > scope of this forum. So I acknowledge that discussing alternate methods > including more abstract ones using functional programming or other tricks, > is a bit outside what is expected here. > > I want though to add one more idea. Can we agree that the user may have a > more general concept to be considered here. That is the concept of having a > data.frame where each column is purely numeric consisting of just 0 through > 9 with perhaps no spaces, periods or commas or
Re: [R] Linear
str(dat2) 'data.frame': 37654 obs. ...: $ Yld: int $ A : int $ B : chr $ C : chr On Wed, Jan 26, 2022 at 10:49 AM Bert Gunter wrote: > What does str(dat2) give? > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > On Wed, Jan 26, 2022 at 7:37 AM Val wrote: > > > > Hi all, > > > > I am trying to get the lsmeans for one of the factors fitted in the > > following model > > > > Model1 = lm(Yld ~ A + B + C, data = dat2) > > M_lsm = as.data.frame(lsmeans(Model1, "C")) > > > > My problem is, I am getting this error message. > > "Error: The rows of your requested reference grid would be 81412, which > > exceeds the limit of 1 (not including any multivariate responses)". > > > > How do I fix this? > > > > Thank you
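Two things follow from the str() output above. First, B and C are character columns, so lsmeans/emmeans treats every distinct value as a factor level; a high-cardinality character column is what inflates the reference grid to 81412 rows. Second, if the grid size is legitimate, the cap is just an option in the emmeans package (the successor to lsmeans) and can be raised with emm_options(rg.limit = ...). A sketch on toy data, since dat2 itself is not available:

```r
library(emmeans)

# Toy stand-in for dat2 (the thread's 37654-row data frame is not shown);
# note B and C are explicit factors with few levels, unlike the chr columns above.
dat2 <- data.frame(Yld = rnorm(40),
                   A   = rnorm(40),
                   B   = factor(rep(c("b1", "b2"), 20)),
                   C   = factor(rep(c("c1", "c2", "c3", "c4"), 10)))

Model1 <- lm(Yld ~ A + B + C, data = dat2)

emm_options(rg.limit = 200000)  # raise the reference-grid cap (default 10000)
M_lsm <- as.data.frame(emmeans(Model1, "C"))   # one row per level of C
```

Raising rg.limit treats the symptom; checking that B and C really should contribute that many grid rows treats the cause.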
[R] Linear
Hi all, I am trying to get the lsmeans for one of the factors fitted in the following model Model1 = lm(Yld ~ A + B + C, data = dat2) M_lsm = as.data.frame(lsmeans(Model1, "C")) My problem is, I am getting this error message. "Error: The rows of your requested reference grid would be 81412, which exceeds the limit of 1 (not including any multivariate responses)". How do I fix this? Thank you
Re: [R] Date
Thank you All. The issue was not reading a different file. I just mistyped the column name: instead of typing My_date I typed mydate in the email. The problem is solved by using this dat=read.csv("myfile.csv",stringsAsFactors=FALSE) suggested by Jim. On Thu, Nov 4, 2021 at 7:58 PM Jeff Newmiller wrote: > > Then you are looking at a different file... check your filenames. You have > imported the column as character, and R has not yet recognized that it is > supposed to be a date, so it can only show what it found. > > You will almost certainly find your error if you make a reproducible example. > > On November 4, 2021 5:30:22 PM PDT, Val wrote: > >Jeff, > > > >The date from my data file looks as follows in the Linux environment, > >My_date > >2019-09-16 > >2021-02-21 > >2021-02-22 > >2017-10-11 > >2017-10-10 > >2018-11-11 > >2017-10-27 > >2017-10-30 > >2019-05-20 > > > >On Thu, Nov 4, 2021 at 5:00 PM Jeff Newmiller > >wrote: > >> > >> You are claiming behavior that is not something R does, but is something > >> Excel does constantly. > >> > >> Compare what your data file looks like using a text editor with what R has > >> imported. Absolutely do not use a spreadsheet program to do this. > >> > >> On November 4, 2021 2:43:25 PM PDT, Val wrote: > >> >Hi All, > >> > > >> >I am reading a csv file and one of the columns is named as "mydate" > >> > with this form, 2019-09-16. > >> > > >> >I am reading this file as > >> > > >> >dat=read.csv("myfile.csv") > >> > the structure of the data looks as follows > >> > > >> >str(dat) > >> >mydate : chr "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ... > >> > > >> >Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy > >> >When I tried to change this as a Date using > >> > > >> >as.Date(mydate, format="%m/%d/%Y") > >> >I am getting this error message > >> >Error in charToDate(x) : > >> > character string is not in a standard unambiguous format > >> > > >> >My question is, > >> >1.
how can I read the file as it is (i.e., without changing the date > >> >format) ? > >> >2. why does R change the date format? > >> > > >> >Thank you, > >> > >> -- > >> Sent from my phone. Please excuse my brevity. > > -- > Sent from my phone. Please excuse my brevity.
Re: [R] Date
Jeff, The date from my data file looks as follows in the Linux environment, My_date 2019-09-16 2021-02-21 2021-02-22 2017-10-11 2017-10-10 2018-11-11 2017-10-27 2017-10-30 2019-05-20 On Thu, Nov 4, 2021 at 5:00 PM Jeff Newmiller wrote: > > You are claiming behavior that is not something R does, but is something > Excel does constantly. > > Compare what your data file looks like using a text editor with what R has > imported. Absolutely do not use a spreadsheet program to do this. > > On November 4, 2021 2:43:25 PM PDT, Val wrote: > >Hi All, > > > >I am reading a csv file and one of the columns is named as "mydate" > > with this form, 2019-09-16. > > > >I am reading this file as > > > >dat=read.csv("myfile.csv") > > the structure of the data looks as follows > > > >str(dat) > >mydate : chr "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ... > > > >Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy > >When I tried to change this as a Date using > > > >as.Date(mydate, format="%m/%d/%Y") > >I am getting this error message > >Error in charToDate(x) : > > character string is not in a standard unambiguous format > > > >My question is, > >1. how can I read the file as it is (i.e., without changing the date format) > >? > >2. why does R change the date format? > > > >Thank you, > > -- > Sent from my phone. Please excuse my brevity.
[R] Date
Hi All,

I am reading a csv file and one of the columns is named "mydate", with values in this form: 2019-09-16.

I am reading this file as

dat=read.csv("myfile.csv")

The structure of the data looks as follows:

str(dat)
mydate : chr "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ...

Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy. When I tried to convert this to a Date using

as.Date(as.Date(mydate, format="%m/%d/%Y"))

I am getting this error message:

Error in charToDate(x) :
  character string is not in a standard unambiguous format

My questions are:
1. How can I read the file as it is (i.e., without changing the date format)?
2. Why does R change the date format?

Thank you,
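Jeff's point in the replies can be checked directly: read.csv() does not rewrite date strings, so if str(dat) shows mm/dd/YYYY, that is what the file (often last saved by a spreadsheet program) actually contains. A small sketch, assuming the file name "myfile.csv" from the post:

```r
# Look at the raw text of the file, bypassing any spreadsheet program.
head(readLines("myfile.csv"), 3)

# Read the column as plain character, then convert with an explicit format.
dat <- read.csv("myfile.csv", colClasses = "character")
dat$mydate <- as.Date(dat$mydate, format = "%m/%d/%Y")
str(dat$mydate)
```

If the raw lines already show mm/dd/YYYY, the reformatting happened before R was ever involved.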
Re: [R] by group
Thank you all for your help! On Mon, Nov 1, 2021 at 8:47 PM Bert Gunter wrote: > > ... maybe not. According to Rdocumentation.org: > > reshape2's status is: > > reshape2 is retired: only changes necessary to keep it on CRAN will be > made. We recommend using tidyr <http://tidyr.tidyverse.org/> instead. > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Mon, Nov 1, 2021 at 5:55 PM Rasmus Liland wrote: > > > Dear Val, > > > > also consider using reshape2::dcast > > > > dat <- structure(list(Year = c(2001L, > > 2001L, 2001L, 2001L, 2001L, 2001L, > > 2002L, 2002L, 2002L, 2002L, 2002L, > > 2002L, 2003L, 2003L, 2003L, 2003L, > > 2003L, 2003L), Sex = c("M", "M", "M", > > "F", "F", "F", "M", "M", "M", "F", "F", > > "F", "M", "M", "M", "F", "F", "F"), wt = > > c(15L, 14L, 16L, 12L, 11L, 13L, 14L, > > 18L, 17L, 11L, 15L, 14L, 18L, 13L, 14L, > > 15L, 10L, 11L)), class = "data.frame", > > row.names = c(NA, -18L)) > > > > reshape2::dcast(data=dat, > > formula=Year~Sex, > > value.var="wt", > > fun.aggregate=mean) > > > > yielding > > > > YearFM > > 1 2001 12.0 15.0 > > 2 2002 13.3 16.3 > > 3 2003 12.0 15.0 > > > > Best, > > Rasmus > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
Re: [R] by group
Thank you Avi,

One question, I am getting this error from this script:

> dat %>%
+ + group_by(Year, Sex) %>%
+ + summarize( M = mean(wt, na.rm=TRUE)) %>%
+ + pivot_wider(names_from = Sex, values_from = M) %>%
+ + as.data.frame %>%
+ + round(1)
Error in group_by(Year, Sex) : object 'Year' not found

Why am I getting this?

On Mon, Nov 1, 2021 at 7:07 PM Avi Gross via R-help wrote:
>
> Understood Val. So you need to save the output in something like a data.frame
> which can then be saved as a CSV file or whatever else makes sense to be read
> in by a later program. As noted, by() does not produce the output in a usable
> way.
>
> But you mentioned efficient, and that is another whole ball of wax. For small
> amounts of data it may not matter much. And some processes may look slower
> but turn out to be more efficient if compiled as C/C++ or ...
>
> Sometimes it might be more efficient to change the format of your data before
> the analysis, albeit if the output is much smaller, maybe best later.
>
> Good luck.
>
> -Original Message-
> From: Val
> Sent: Monday, November 1, 2021 7:44 PM
> To: Avi Gross
> Cc: r-help mailing list
> Subject: Re: [R] by group
>
> Thank you all!
> I can assure you that this is not HW. This is a sample of my large data set
> and I want a simple and efficient approach to get the
> desired output in that particular format. That file will be saved
> and used as an input file for another external process.
>
> val
>
> On Mon, Nov 1, 2021 at 6:08 PM Avi Gross via R-help wrote:
> >
> > Jim,
> >
> > Your code gives the output in quite a different format and as an
> > object of class "by" that is not easily convertible to a data.frame.
> > So, yes, it is an answer that produces the right numbers but not in
> > the places or data structures I think they (or if it is HW ...) wanted.
> >
> > Trivial standard cases are often handled by a single step but more
> > complex ones often suggest a multi-part approach.
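The doubled '+' continuation prompts in the transcript suggest the pipeline was not entered as one expression; for instance, pasting prompt characters from an earlier transcript back into the console would make group_by(Year, Sex) run on its own, without dat, where 'Year' is indeed not found. A hedged sketch of the pipeline entered cleanly (assuming dplyr >= 1.0, for the .groups= argument, and tidyr):

```r
library(dplyr)
library(tidyr)

dat %>%
  group_by(Year, Sex) %>%
  summarize(M = mean(wt, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Sex, values_from = M) %>%
  as.data.frame() %>%
  round(1)
```

Run as a single expression, every step receives the previous step's result, so Year and Sex are looked up inside dat.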
> > > > Of course Val gets to decide what approach works best for them within > > whatever constraints we here are not made aware of. If this is a class > > assignment, it likely would be using only tools discussed in the > > class. So I would not suggest using a dplyr/tidyverse approach if that > > is not covered or even part of a class. If this is a project in the > > real world, it becomes a matter of programming taste and convenience and so > > on. > > > > Maybe Val can share more about the situation so we can see what is > > helpful and what is not. Realistically, I can think of way too many > > ways to get the required output. > > > > -Original Message- > > From: R-help On Behalf Of Jim Lemon > > Sent: Monday, November 1, 2021 6:25 PM > > To: Val ; r-help mailing list > > > > Subject: Re: [R] by group > > > > Hi Val, > > I think you answered your own question: > > > > by(dat$wt,dat[,c("Sex","Year")],mean) > > > > Jim > > > > On Tue, Nov 2, 2021 at 8:09 AM Val wrote: > > > > > > Hi All, > > > > > > How can I generate mean by group. The sample data looks like as > > > follow, dat<-read.table(text="Year Sex wt > > > 2001 M 15 > > > 2001 M 14 > > > 2001 M 16 > > > 2001 F 12 > > > 2001 F 11 > > > 2001 F 13 > > > 2002 M 14 > > > 2002 M 18 > > > 2002 M 17 > > > 2002 F 11 > > > 2002 F 15 > > > 2002 F 14 > > > 2003 M 18 > > > 2003 M 13 > > > 2003 M 14 > > > 2003 F 15 > > > 2003 F 10 > > > 2003 F 11 ",header=TRUE) > > > > > > The desired output is, > > > MF > > > 20011512 > > > 200216.33 13.33 > > > 200315 12 > > > > > > Thank you, > > > > > > __ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. 
> > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the
Re: [R] by group
Thank you all! I can assure you that this is not HW. This is a sample of my large data set and I want a simple and efficient approach to get the desired output in that particular format. That file will be saved and used as an input file for another external process. val On Mon, Nov 1, 2021 at 6:08 PM Avi Gross via R-help wrote: > > Jim, > > Your code gives the output in quite a different format and as an object of > class "by" that is not easily convertible to a data.frame. So, yes, it is an > answer that produces the right numbers but not in the places or data > structures I think they (or if it is HW ...) wanted. > > Trivial standard cases are often handled by a single step but more complex > ones often suggest a multi-part approach. > > Of course Val gets to decide what approach works best for them within > whatever constraints we here are not made aware of. If this is a class > assignment, it likely would be using only tools discussed in the class. So I > would not suggest using a dplyr/tidyverse approach if that is not covered or > even part of a class. If this is a project in the real world, it becomes a > matter of programming taste and convenience and so on. > > Maybe Val can share more about the situation so we can see what is helpful > and what is not. Realistically, I can think of way too many ways to get the > required output. > > -Original Message- > From: R-help On Behalf Of Jim Lemon > Sent: Monday, November 1, 2021 6:25 PM > To: Val ; r-help mailing list > Subject: Re: [R] by group > > Hi Val, > I think you answered your own question: > > by(dat$wt,dat[,c("Sex","Year")],mean) > > Jim > > On Tue, Nov 2, 2021 at 8:09 AM Val wrote: > > > > Hi All, > > > > How can I generate mean by group. 
The sample data looks like as > > follow, dat<-read.table(text="Year Sex wt > > 2001 M 15 > > 2001 M 14 > > 2001 M 16 > > 2001 F 12 > > 2001 F 11 > > 2001 F 13 > > 2002 M 14 > > 2002 M 18 > > 2002 M 17 > > 2002 F 11 > > 2002 F 15 > > 2002 F 14 > > 2003 M 18 > > 2003 M 13 > > 2003 M 14 > > 2003 F 15 > > 2003 F 10 > > 2003 F 11 ",header=TRUE) > > > > The desired output is, > > MF > > 20011512 > > 200216.33 13.33 > > 200315 12 > > > > Thank you, > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] by group
Hi All,

How can I generate means by group? The sample data looks as follows:

dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11 ",header=TRUE)

The desired output is:

         M      F
2001    15     12
2002    16.33  13.33
2003    15     12

Thank you,
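For completeness, a minimal base-R sketch of the same computation (no packages needed): tapply() with two grouping factors returns the Year-by-Sex matrix of means directly.

```r
# Rows are Years, columns are Sex levels (alphabetical, so F before M).
means <- tapply(dat$wt, list(dat$Year, dat$Sex), mean)
round(means, 2)
#          F     M
# 2001 12.00 15.00
# 2002 13.33 16.33
# 2003 12.00 15.00
```

Since the result feeds an external process, it can be written out with write.csv(as.data.frame(round(means, 2)), "means.csv") ("means.csv" is an arbitrary name for illustration).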
Re: [R] Read
Let us take the max space is two and the output should not be fixed filed but preferable a csv file. On Mon, Feb 22, 2021 at 8:05 PM jim holtman wrote: > > Messed up did not see your 'desired' output which will be hard since there is > not a consistent number of spaces that would represent the desired column > number. Do you have any hit as to how to interpret the spacing especially > you have several hundred more lines? Is the output supposed to the 'fixed' > field? > > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > Tell me what you want to do, not how you want to do it. > > > On Mon, Feb 22, 2021 at 5:00 PM jim holtman wrote: >> >> Try this: >> >> > library(tidyverse) >> >> > text <- "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4 >> > D44 \n51 D53\n60 D62 " >> >> > # read in the data as characters and replace multiple blanks with single >> > blank >> > input <- read_lines(text) >> >> > input <- str_replace_all(input, ' +', ' ') >> >> > mydata <- read_delim(input, ' ', col_names = TRUE) >> Warning: 5 parsing failures. >> row col expectedactual file >> 1 -- 4 columns 3 columns literal data >> 2 -- 4 columns 3 columns literal data >> 4 -- 4 columns 3 columns literal data >> 5 -- 4 columns 2 columns literal data >> 6 -- 4 columns 3 columns literal data >> >> > mydata >> # A tibble: 6 x 4 >> x1 x2x3x4 >> >> 1 1 B12 NANA >> 2 2 C23 NANA >> 3 322 B32 D34 NA >> 4 4 D44 NANA >> 551 D53 NANA >> 660 D62 NANA >> > >> >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> Tell me what you want to do, not how you want to do it. >> >> >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> Tell me what you want to do, not how you want to do it. >> >> >> On Mon, Feb 22, 2021 at 4:49 PM Val wrote: >>> >>> That is my problem. The spacing between columns is not consistent. It >>> may be single space or multiple spaces (two or three). 
>>> >>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap >>> wrote: >>> > >>> > You said the column values were separated by space characters. >>> > Copying the text from gmail shows that some column names and column >>> > values are separated by single spaces (e.g., between x1 and x2) and >>> > some by multiple spaces (e.g., between x3 and x4. Did the mail mess >>> > up the spacing or is there some other way to tell where the omitted >>> > values are? >>> > >>> > -Bill >>> > >>> > On Mon, Feb 22, 2021 at 2:54 PM Val wrote: >>> > > >>> > > I Tried that one and it did not work. Please see the error message >>> > > Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 >>> > > \n322 B32 D34 \n4D44 \n51 D53\n60 D62 ", >>> > > : >>> > > more columns than column names >>> > > >>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap >>> > > wrote: >>> > > > >>> > > > Since the columns in the file are separated by a space character, " ", >>> > > > add the read.table argument sep=" ". >>> > > > >>> > > > -Bill >>> > > > >>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val wrote: >>> > > > > >>> > > > > Hi all, I am trying to read a messy data but facing difficulty. >>> > > > > The >>> > > > > data has several columns separated by blank space(s). Each column >>> > > > > value may have different lengths across the rows. The first >>> > > > > row(header) has four columns. However, each row may not have the >>> > > > > four >>> > > > > column values. For instance, the first data row has only the first >>> > > > > two column values. The fourth data row has the first and last column >>> > > > > values, the second and the third column values are missing for this >>> > > > > row.. How do I read this data set correct
Re: [R] Read
That is my problem. The spacing between columns is not consistent. It may be single space or multiple spaces (two or three). On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap wrote: > > You said the column values were separated by space characters. > Copying the text from gmail shows that some column names and column > values are separated by single spaces (e.g., between x1 and x2) and > some by multiple spaces (e.g., between x3 and x4. Did the mail mess > up the spacing or is there some other way to tell where the omitted > values are? > > -Bill > > On Mon, Feb 22, 2021 at 2:54 PM Val wrote: > > > > I Tried that one and it did not work. Please see the error message > > Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 > > \n322 B32 D34 \n4D44 \n51 D53\n60 D62 ", > > : > > more columns than column names > > > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap > > wrote: > > > > > > Since the columns in the file are separated by a space character, " ", > > > add the read.table argument sep=" ". > > > > > > -Bill > > > > > > On Mon, Feb 22, 2021 at 2:21 PM Val wrote: > > > > > > > > Hi all, I am trying to read a messy data but facing difficulty. The > > > > data has several columns separated by blank space(s). Each column > > > > value may have different lengths across the rows. The first > > > > row(header) has four columns. However, each row may not have the four > > > > column values. For instance, the first data row has only the first > > > > two column values. The fourth data row has the first and last column > > > > values, the second and the third column values are missing for this > > > > row.. How do I read this data set correctly? Here is my sample data > > > > set, output and desired output. To make it clear to each data point > > > > I have added the row and column numbers. I cannot use fixed width > > > > format reading because each row may have different length for a > > > > given column. 
> > > > > > > > dat<-read.table(text="x1 x2 x3 x4 > > > > 1 B22 > > > > 2 C33 > > > > 322 B22 D34 > > > > 4 D44 > > > > 51 D53 > > > > 60 D62",header=T, fill=T,na.strings=c("","NA")) > > > > > > > > Output > > > > x1 x2 x3 x4 > > > > 1 1 B12 NA > > > > 2 2C23 NA > > > > 3 322 B32 D34 NA > > > > 4 4 D44NA > > > > 5 51 D53 NA > > > > 6 60 D62NA > > > > > > > > > > > > Desired output > > > >x1 x2 x3 x4 > > > > 1 1B22 NA > > > > 2 2 C33 NA > > > > 3 322 B32NA D34 > > > > 4 4 NA D44 > > > > 5 51D53 NA > > > > 6 60 D62 NA > > > > > > > > Thank you, > > > > > > > > __ > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > PLEASE do read the posting guide > > > > http://www.R-project.org/posting-guide.html > > > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Read
I Tried that one and it did not work. Please see the error message Error in read.table(text = "x1 x2 x3 x4\n1 B12 \n2 C23 \n322 B32 D34 \n4D44 \n51 D53\n60 D62 ", : more columns than column names On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap wrote: > > Since the columns in the file are separated by a space character, " ", > add the read.table argument sep=" ". > > -Bill > > On Mon, Feb 22, 2021 at 2:21 PM Val wrote: > > > > Hi all, I am trying to read a messy data but facing difficulty. The > > data has several columns separated by blank space(s). Each column > > value may have different lengths across the rows. The first > > row(header) has four columns. However, each row may not have the four > > column values. For instance, the first data row has only the first > > two column values. The fourth data row has the first and last column > > values, the second and the third column values are missing for this > > row.. How do I read this data set correctly? Here is my sample data > > set, output and desired output. To make it clear to each data point > > I have added the row and column numbers. I cannot use fixed width > > format reading because each row may have different length for a > > given column. > > > > dat<-read.table(text="x1 x2 x3 x4 > > 1 B22 > > 2 C33 > > 322 B22 D34 > > 4 D44 > > 51 D53 > > 60 D62",header=T, fill=T,na.strings=c("","NA")) > > > > Output > > x1 x2 x3 x4 > > 1 1 B12 NA > > 2 2C23 NA > > 3 322 B32 D34 NA > > 4 4 D44NA > > 5 51 D53 NA > > 6 60 D62NA > > > > > > Desired output > >x1 x2 x3 x4 > > 1 1B22 NA > > 2 2 C33 NA > > 3 322 B32NA D34 > > 4 4 NA D44 > > 5 51D53 NA > > 6 60 D62 NA > > > > Thank you, > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
[R] Read
Hi all,

I am trying to read a messy data file but facing difficulty. The data has several columns separated by blank space(s). Each column value may have a different length across the rows. The first row (header) has four columns; however, a given row may not have all four column values. For instance, the first data row has only the first two column values. The fourth data row has the first and last column values; the second and third column values are missing for this row. How do I read this data set correctly? Here is my sample data set, the output, and the desired output. To make each data point clear, I have added the row and column numbers. I cannot use fixed-width reading because each row may have a different length for a given column.

dat<-read.table(text="x1 x2 x3 x4
1 B22
2 C33
322 B22 D34
4 D44
51 D53
60 D62",header=T, fill=T,na.strings=c("","NA"))

Output
   x1  x2   x3  x4
1   1  B12      NA
2   2  C23      NA
3 322  B32  D34 NA
4   4  D44      NA
5  51  D53      NA
6  60  D62      NA

Desired output
   x1   x2   x3   x4
1   1  B22   NA   NA
2   2   NA  C33   NA
3 322  B32   NA  D34
4   4   NA   NA  D44
5  51   NA  D53   NA
6  60  D62   NA   NA

Thank you,
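None of the replies could recover the original spacing from the mail, so any fix has to lean on a delimiter convention. A hedged sketch, assuming (as stated elsewhere in the thread) at most two spaces separate fields, with a run of two consecutive spaces marking an empty field; the file name "messy.txt" is made up for illustration:

```r
lines <- readLines("messy.txt")
# Turn each empty slot (two consecutive spaces) into an explicit NA token,
# leaving the header row alone.
body <- gsub("  ", " NA ", lines[-1])
dat  <- read.table(text = c(lines[1], body), header = TRUE)
```

This stands or falls with the spacing convention: if a missing trailing field leaves no extra spaces at all, fill=TRUE plus hand repair is still needed.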
Re: [R] Undesired result
Very helpful and thank you so much! On Wed, Feb 17, 2021 at 12:50 PM Duncan Murdoch wrote: > > On 17/02/2021 9:50 a.m., Val wrote: > > HI All, > > > > I am reading a data file which has different date formats. I wanted to > > standardize to one format and used a library anytime but got > > undesired results as shown below. It gave me year 2093 instead of 1993 > > > > > > library(anytime) > > DFX<-read.table(text="name ddate > >A 19-10-02 > >D 11/19/2006 > >F 9/9/2011 > >G1 12/29/2010 > >AA 10/18/93 ",header=TRUE) > > getFormats() > > addFormats(c("%d-%m-%y")) > > addFormats(c("%m-%d-%y")) > > addFormats(c("%Y/%d/%m")) > > addFormats(c("%m/%d/%y")) > > > > DFX$anew=anydate(DFX$ddate) > > > > Output > > name ddate anew > > 1A 19-10-02 2002-10-19 > > 2D 11/19/2006 2020-11-19 > > 3F 9/9/2011 2011-09-09 > > 4 G1 12/29/2010 2020-12-29 > > 5 AA 10/18/93 2093-10-18 > > > > The problem is in the last row. It should be 1993-10-18 instead of > > 2093-10-18 > > > > How do I correct this? > > This looks a little tricky. The basic idea is that the %y format has to > guess at the century, but the guess depends on things specific to your > system. So what would be nice is to say "two digit dates should be > assumed to fall between 1922 and 2021", but there's no way to do that > directly. > > What you could do is recognize when you have a two digit year, and then > force the result into the range you want. Here's a function that does > that, but it's not really tested much at all, so be careful if you use > it. (One thing: I recommend the 'useR = TRUE' option to anydate(); it > worked better in my tests than the default.) 
> > adjustCentury <- function(inputString, >outputDate = anydate(inputString, useR = TRUE), >start = "1922-01-01") { > >start <- as.Date(start) > >twodigityear <- !grepl("[[:digit:]]{4}", inputString) > >while (length(bad <- which(twodigityear & outputDate < start))) { > for (i in bad) { >longdate <- as.POSIXlt(outputDate[i]) >longdate$year <- longdate$year + 100 >outputDate[i] <- as.Date(longdate) > } >} >longdate <- as.POSIXlt(start) >longdate$year <- longdate$year + 100 >finish <- as.Date(longdate) > >while (length(bad <- which(twodigityear & outputDate >= finish))) { > for (i in bad) { >longdate <- as.POSIXlt(outputDate[i]) >longdate$year <- longdate$year - 100 >outputDate[i] <- as.Date(longdate) > } >} >outputDate > } > > library(anytime) > DFX<-read.table(text="name ddate >A 19-10-02 >D 11/19/2006 >F 9/9/2011 >G1 12/29/2010 >AA 10/18/93 >BB 10/18/1893 >CC 10/18/2093",header=TRUE) > > addFormats(c("%d-%m-%y")) > addFormats(c("%m-%d-%y")) > addFormats(c("%Y/%d/%m")) > addFormats(c("%m/%d/%y")) > > DFX$anew=adjustCentury(DFX$ddate, start = "1921-01-01") > DFX > #> name ddate anew > #> 1A 19-10-02 2019-10-02 > #> 2D 11/19/2006 2006-11-19 > #> 3F 9/9/2011 2011-09-09 > #> 4 G1 12/29/2010 2010-12-29 > #> 5 AA 10/18/93 1993-10-18 > #> 6 BB 10/18/1893 1893-10-18 > #> 7 CC 10/18/2093 2093-10-18 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Undesired result
Hi All,

I am reading a data file which has different date formats. I wanted to standardize to one format and used the anytime package, but got undesired results as shown below: it gave me year 2093 instead of 1993.

library(anytime)
DFX<-read.table(text="name ddate
A 19-10-02
D 11/19/2006
F 9/9/2011
G1 12/29/2010
AA 10/18/93 ",header=TRUE)
getFormats()
addFormats(c("%d-%m-%y"))
addFormats(c("%m-%d-%y"))
addFormats(c("%Y/%d/%m"))
addFormats(c("%m/%d/%y"))

DFX$anew=anydate(DFX$ddate)

Output
  name      ddate       anew
1    A   19-10-02 2002-10-19
2    D 11/19/2006 2020-11-19
3    F   9/9/2011 2011-09-09
4   G1 12/29/2010 2020-12-29
5   AA   10/18/93 2093-10-18

The problem is in the last row: it should be 1993-10-18 instead of 2093-10-18.

How do I correct this?

Thank you.
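Besides adjusting the century after parsing, a simpler base-R workaround is possible, sketched here under the assumption that every two-digit year in this file belongs to the 1900s: widen those years before handing the strings to anydate().

```r
dd <- DFX$ddate
# A trailing "/yy" (exactly two digits right after the last slash)
# marks a two-digit year; four-digit years do not match this pattern.
two_digit <- grepl("/[0-9]{2}$", dd)
dd[two_digit] <- sub("/([0-9]{2})$", "/19\\1", dd[two_digit])
# "10/18/93" becomes "10/18/1993"; "11/19/2006" is left untouched.
DFX$anew <- anydate(dd)
```

The assumption is the whole trick: if the file can also contain 20xx two-digit years, a cutoff rule (like the adjustCentury() function elsewhere in the thread) is needed instead.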
[R] create
Hi all,

I have a sample of data as shown below:

dt <-read.table(text="name Item check
A DESK NORF
B RANGE GARRA
C CLOCK PALM
D DESK RR
E ALARM DESPRF
H DESK RF
K DESK CORR
K WARF CORR
G NONE RF ",header=TRUE, fill=T)

I want to create another column (flag2) and assign the value 0 or 1: if the check column value is within the code2 list and Item is DESK, then flag2 = 1, otherwise 0.

code2=c("RR","RF")
index2=grep(paste(code2,collapse="|"),dt$check)
dt$flag2=0
dt$flag2[index2]=1

How can I add the second condition? The desired output is shown below:

  name  Item  check flag2
1    A  DESK   NORF     0
2    B RANGE  GARRA     0
3    C CLOCK   PALM     0
4    D  DESK     RR     1
5    E ALARM DESPRF     0
6    H  DESK     RF     1
7    K  DESK   CORR     0
8    K  WARF   CORR     0
9    G  NONE     RF     0

Thank you,
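One way to add the Item test in a single step. Note that grep() matches substrings, which is why "CORR" and "NORF" would wrongly match "RR"/"RF"; the desired output calls for exact matches, which %in% provides:

```r
code2 <- c("RR", "RF")
# 1 only when both conditions hold: check is exactly "RR" or "RF"
# (not merely containing them) AND Item is "DESK".
dt$flag2 <- as.integer(dt$check %in% code2 & dt$Item == "DESK")
```

This reproduces the desired output: only the DESK/RR and DESK/RF rows get flag2 = 1, while CORR, NORF, and the non-DESK RF row stay 0.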
Re: [R] Split
Thank you again for your help and giving me the opportunity to choose the efficient method. For a small data set there is no discernable difference between the different approaches. I will carry out a comparison using the large data set. On Wed, Sep 23, 2020 at 11:52 AM LMH wrote: > > Below is a script in bash the uses the awk tokenizer to do the work. > > This assumes that your input and output delimiter is space. The number of > consecutive delimiters in > the input is not important. This also assumes that the input file does not > have a header row. That > is easy to modify if you want. I always keep header rows in my data files as > I think that removing > them is asking for trouble down the road. > > I added a NULL for cases where there is no value for the last field. You > could use "." if you want. > > You should be able to find how to run this from inside R if you want. You > will, of course, need a > bash environment to run this, so if you are not in linux you will need cygwin > or something similar. > > This should be very fast, but let me know if needs to be faster. If the X1_X2 > variant occurs less > frequently than not then we should switch the order in which the logic > evaluates the options. > > LMH > > > #! /bin/bash > > # input filename > input_file=$1 > > # output filename > output_file=$2 > > # make sure the input file exists > if [ ! -f $input_file ]; then >echo $input_file " cannot be found" >exit 0 > fi > > # create the output file > touch $output_file > > # make sure the output was created > if [ ! 
-f $output_file ]; then >echo $output_file " was not created" >exit 0 > fi > > # write the header row > echo "ID1 ID2 Y1 X1 X2" >> $output_file > > # character to find in the third token > look_for='_' > > # process with awk > # if the 3rd token contains '_' > # split the third token on '_' into F[1] and F[2] > # print the first two tokens, the indicator value of 1, and the split > fields F[1] and F[2] > # otherwise, > # print the first two tokens, the indicator value of 0, the 3rd token, and > NULL > > cat $input_file | \ > awk -v find_char=$look_for '{ if($3 ~ find_char) { { split ($3, F, "_") } > { print $1, $2, "1", F[1], > F[2] } > } > else { print $1, $2, "0", $3, "NULL" } > }' >> $output_file > > > > > > > > Val wrote: > > Thank you all for the help! > > > > LMH, Yes I would like to see the alternative. I am using this for a > > large data set and if the alternative is more efficient than this > > then I would be happy. > > > > On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter wrote: > >> > >> To be clear, I think Rui's solution is perfectly fine and probably better > >> than what I offer below. But just for fun, I wanted to do it without the > >> lapply(). Here is one way. I think my comments suffice to explain. > >> > >>> ## which are the non "_" indices? > >>> wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE) > >>> ## paste "_." to these > >>> F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_") > >>> ## Now strsplit() and unlist() them to get a vector > >>> z <- unlist(strsplit(F1$text, "_")) > >>> ## now cbind() to the data frame > >>> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE)) > >>> F1 > >> ID1 ID2 text1 2 > >> 1 A1 B1 NONE_. NONE . > >> 2 A1 B1 cf_12 cf 12 > >> 3 A1 B1 NONE_. NONE . 
> >> 4 A2 B2 X2_25 X2 25 > >> 5 A2 B3 fd_15 fd 15 > >>> ## You can change the names of the 2 columns yourself > >> > >> Cheers, > >> Bert > >> > >> Bert Gunter > >> > >> "The trouble with having an open mind is that people keep coming along and > >> sticking things into it." > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >> > >> > >> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas wrote: > >>> > >>> Hello, > >>> > >>> A base R solution with strsplit, like in your code. > >>> > >>> F1$Y1 <- +grepl("_", F1$text) > >>> > >>>
Re: [R] Split
Thank you all for the help! LMH, Yes I would like to see the alternative. I am using this for a large data set and if the alternative is more efficient than this then I would be happy. On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter wrote: > > To be clear, I think Rui's solution is perfectly fine and probably better > than what I offer below. But just for fun, I wanted to do it without the > lapply(). Here is one way. I think my comments suffice to explain. > > > ## which are the non "_" indices? > > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE) > > ## paste "_." to these > > F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_") > > ## Now strsplit() and unlist() them to get a vector > > z <- unlist(strsplit(F1$text, "_")) > > ## now cbind() to the data frame > > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE)) > > F1 > ID1 ID2 text1 2 > 1 A1 B1 NONE_. NONE . > 2 A1 B1 cf_12 cf 12 > 3 A1 B1 NONE_. NONE . > 4 A2 B2 X2_25 X2 25 > 5 A2 B3 fd_15 fd 15 > >## You can change the names of the 2 columns yourself > > Cheers, > Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas wrote: >> >> Hello, >> >> A base R solution with strsplit, like in your code. >> >> F1$Y1 <- +grepl("_", F1$text) >> >> tmp <- strsplit(as.character(F1$text), "_") >> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x) >> tmp <- do.call(rbind, tmp) >> colnames(tmp) <- c("X1", "X2") >> F1 <- cbind(F1[-3], tmp)# remove the original column >> rm(tmp) >> >> F1 >> # ID1 ID2 Y1 X1 X2 >> #1 A1 B1 0 NONE . >> #2 A1 B1 1 cf 12 >> #3 A1 B1 0 NONE . >> #4 A2 B2 1 X2 25 >> #5 A2 B3 1 fd 15 >> >> >> Note that cbind dispatches on F1, an object of class "data.frame". >> Therefore it's the method cbind.data.frame that is called and the result >> is also a df, though tmp is a "matrix". 
>> >> >> Hope this helps, >> >> Rui Barradas >> >> >> Às 20:07 de 22/09/20, Rui Barradas escreveu: >> > Hello, >> > >> > Something like this? >> > >> > >> > F1$Y1 <- +grepl("_", F1$text) >> > F1 <- F1[c(1, 2, 4, 3)] >> > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill = >> > "right") >> > F1 >> > >> > >> > Hope this helps, >> > >> > Rui Barradas >> > >> > Às 19:55 de 22/09/20, Val escreveu: >> >> HI All, >> >> >> >> I am trying to create new columns based on another column string >> >> content. First I want to identify rows that contain a particular >> >> string. If it contains, I want to split the string and create two >> >> variables. >> >> >> >> Here is my sample of data. >> >> F1<-read.table(text="ID1 ID2 text >> >> A1 B1 NONE >> >> A1 B1 cf_12 >> >> A1 B1 NONE >> >> A2 B2 X2_25 >> >> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F) >> >> If the variable "text" contains this "_" I want to create an indicator >> >> variable as shown below >> >> >> >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0) >> >> >> >> >> >> Then I want to split that string in to two, before "_" and after "_" >> >> and create two variables as shown below >> >> x1= strsplit(as.character(F1$text),'_',2) >> >> >> >> My problem is how to combine this with the original data frame. The >> >> desired output is shown below, >> >> >> >> >> >> ID1 ID2 Y1 X1X2 >> >> A1 B10 NONE . >> >> A1 B1 1cf12 >> >> A1 B1 0 NONE . >> >> A2 B2 1X225 >> >> A2 B3 1fd15 >> >> >> >> Any help? >> >> Thank you. >> >> >> >> __ >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. 
>> >> >> > >> > __ >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Split
Hi All,

I am trying to create new columns based on another column's string content. First I want to identify rows that contain a particular string; if a row contains it, I want to split the string and create two variables.

Here is my sample of data:

F1 <- read.table(text="ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header=TRUE, stringsAsFactors=FALSE)

If the variable "text" contains "_", I want to create an indicator variable as shown below:

F1$Y1 <- ifelse(grepl("_", F1$text), 1, 0)

Then I want to split that string into two parts, before "_" and after "_", and create two variables as shown below:

x1 <- strsplit(as.character(F1$text), '_', 2)

My problem is how to combine this with the original data frame. The desired output is shown below:

ID1 ID2 Y1   X1 X2
A1  B1   0 NONE  .
A1  B1   1   cf 12
A1  B1   0 NONE  .
A2  B2   1   X2 25
A2  B3   1   fd 15

Any help?
Thank you.
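For reference, the approach discussed in this thread (essentially Rui Barradas's strsplit solution plus the indicator column) can be collected into one base-R sketch; the column names are my own choice:

```r
F1 <- read.table(text="ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header=TRUE, stringsAsFactors=FALSE)

# indicator: 1 if "text" contains an underscore, 0 otherwise
F1$Y1 <- +grepl("_", F1$text, fixed=TRUE)

# split on "_", padding rows without one so every piece has length 2
parts <- strsplit(F1$text, "_", fixed=TRUE)
parts <- lapply(parts, function(x) if (length(x) == 1) c(x, ".") else x)
m <- do.call(rbind, parts)
colnames(m) <- c("X1", "X2")

# cbind dispatches on the data frame, so the result is a data frame
F1 <- cbind(F1[c("ID1", "ID2", "Y1")], m)
F1
```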
[R] date
Hi All,

I am trying to sort dates within a group. My sample data is:

df <- read.table(text="ID date
A1 09/17/04
A1 01/27/05
A1 05/07/03
A2 05/21/17
A2 09/12/16
A3 01/25/13
A4 09/27/19", header=TRUE, stringsAsFactors=FALSE)

df$date2 <- as.Date(strptime(df$date, format="%m/%d/%y"))
df$date <- NULL

I want to sort date2 from most recent to oldest within the ID group, so I used:

df <- df[order(df$ID, rev(df$date2)), ]

It did not work; the output is shown below.

  ID      date2
2 A1 2005-01-27
3 A1 2003-05-07
1 A1 2004-09-17
5 A2 2016-09-12
4 A2 2017-05-21
6 A3 2013-01-25
7 A4 2019-09-27

What am I missing?
Thank you.
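The catch here is that rev() reverses the vector of values before order() ranks them, which is not the same as reversing the sort direction of that one key. A minimal sketch of the usual fix, negating the date's numeric representation:

```r
df <- read.table(text="ID date
A1 09/17/04
A1 01/27/05
A1 05/07/03
A2 05/21/17
A2 09/12/16
A3 01/25/13
A4 09/27/19", header=TRUE, stringsAsFactors=FALSE)
df$date2 <- as.Date(df$date, format="%m/%d/%y")
df$date <- NULL

# negate only the date key: ID ascending, date2 descending within ID
df <- df[order(df$ID, -as.numeric(df$date2)), ]
df
```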
[R] sort
Hi All,

I have a sample data frame:

DF1 <- read.table(text="name ddate
A 2019-10-28
A 2018-01-25
A 2020-01-12
A 2017-10-20
B 2020-11-20
B 2019-10-20
B 2017-05-20
B 2020-01-20
c 2009-10-01", header=TRUE)

1. I want to sort by name and ddate in decreasing order; the output should look as follows:

A 2020-01-12
A 2019-01-12
A 2018-01-25
A 2017-10-20
B 2020-11-21
B 2020-11-01
B 2019-10-20
B 2017-05-20
c 2009-10-01

2. Take the top two rows by group (name); the output should look like:

A 2020-01-12
A 2019-01-12
B 2020-11-21
B 2020-11-01
c 2009-10-01

3. Within each group (name), get the date difference between the first and second rows' dates. If a group has only one row, the difference should be 0. The final output is:

name diff
A 365
B 20
c 0

Here is my attempt; I have an issue at the sorting step:

DF1$DTime <- as.POSIXct(DF1$ddate, format = "%Y-%m-%d")
DF2 <- DF1[order(DF1$name, as.Date(DF1$DTime), decreasing = TRUE), ]   # not working

Any help?
Thank you
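One base-R sketch of the three steps (sort descending within group, take the top two rows, then difference the dates); the split()/lapply() approach is my own choice, not something from the thread, and the resulting numbers come from the sample input rather than the (slightly inconsistent) example output above:

```r
DF1 <- read.table(text="name ddate
A 2019-10-28
A 2018-01-25
A 2020-01-12
A 2017-10-20
B 2020-11-20
B 2019-10-20
B 2017-05-20
B 2020-01-20
c 2009-10-01", header=TRUE, stringsAsFactors=FALSE)
DF1$ddate <- as.Date(DF1$ddate)

# 1. sort by name, then date descending within name
DF1 <- DF1[order(DF1$name, -as.numeric(DF1$ddate)), ]

# 2. keep the top two rows per name
top2 <- do.call(rbind, lapply(split(DF1, DF1$name), head, 2))

# 3. difference in days between the two most recent dates per name;
#    0 when a group has a single row
diffs <- sapply(split(top2$ddate, top2$name),
                function(d) if (length(d) < 2) 0 else as.numeric(d[1] - d[2]))
data.frame(name = names(diffs), diff = diffs, row.names = NULL)
```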
Re: [R] Label
Thank you Jim, Is it possible to format the label box? The labels(numbers) are surrounded by a big square and wanted to remove it. I just want display only the number. I searched up the documentation for "barlabels" and there is no such example barlabels(xpos,ypos,labels=NULL,cex=1,prop=0.5,miny=0,offset=0,...) Thank you. On Thu, Apr 2, 2020 at 9:38 PM Jim Lemon wrote: > > Hi Val, > > library(plotrix) > barpos<-barplot(dat$count, names.arg=c("A", "B", "C","D"), > col="blue", > ylim = c(0,30), > ylab = "Count", > xlab = "Grade") > barlabels(barpos,dat$count,prop=1) > > Jim > > On Fri, Apr 3, 2020 at 1:31 PM Val wrote: > > > > Hi all, > > > > I have a sample of data set, > > > > dat <- read.table(header=TRUE, text='Lab count > > A 24 > > B 19 > > C 30 > > D 18') > > > > barplot(dat$count, names.arg=c("A", "B", "C","D"), > > col="blue", > > ylim = c(0,30), > > ylab = "Count", > > xlab = "Grade") > > > > I want add the number of counts at the top of each bar plot. How can I do > > that? > > Thank you in advance > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Label
Hi all,

I have a sample data set:

dat <- read.table(header=TRUE, text='Lab count
A 24
B 19
C 30
D 18')

barplot(dat$count, names.arg=c("A", "B", "C", "D"),
        col="blue",
        ylim = c(0, 30),
        ylab = "Count",
        xlab = "Grade")

I want to add the number of counts at the top of each bar. How can I do that?
Thank you in advance
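An alternative using only base graphics: barplot() returns the x positions of the bar centres, and text() draws plain labels with no surrounding box (which also addresses the follow-up about removing the label box). A minimal sketch:

```r
dat <- read.table(header=TRUE, text='Lab count
A 24
B 19
C 30
D 18')

# barplot() invisibly returns the x positions of the bar centres
barpos <- barplot(dat$count, names.arg=as.character(dat$Lab),
                  col="blue",
                  ylim=c(0, max(dat$count) * 1.1),  # headroom for labels
                  ylab="Count", xlab="Grade")

# plain numbers just above each bar -- no surrounding box
text(barpos, dat$count, labels=dat$count, pos=3)
```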
Re: [R] Mixed format
Thank you all for your help. My data has mixed format such as %m/%d/%y,%d/%m/%y,%m-%d-%y,%d-%m-%y etc. and the library (anytime) handles it very well!! Thank you again. On Tue, Jan 21, 2020 at 5:28 AM Rui Barradas wrote: > > Hello, > > Inline. > > Às 09:22 de 21/01/20, Chris Evans escreveu: > > I think that might risk giving the wrong date for a date like 1/3/1990 > > which I think in Val's data is mdy data not dmy. > > > > As I read the data, where the separator is "/" the format is mdy and where > > the separator is "-" it's dmy. > > Maybe you're right. But I really don't know, in my country (Portugal) we > use "/" and dmy. Anyway, what's important is that the OP must have a > much better understanding of the data, the way it is posted is likely to > cause errors. See, for instance, the expected output with numbers > greater than 12 in the 1st and 2nd places, depending on the row. > > > So I would > > go for: > > > > library(lubridate) > > DFX$dnew[grep("-", DFX$ddate, fixed = TRUE)] <- dmy(DFX$ddate[grep("-", > > DFX$ddate, fixed = TRUE)]) > > DFX$dnew[grep("/", DFX$ddate, fixed = TRUE)] <- mdy(DFX$ddate[grep("/", > > DFX$ddate, fixed = TRUE)]) > > DFX <- DFX[!is.na(DFX$dnew),] > > DFX > > > >name ddate dnew > > 1A 19-10-02 2002-10-19 > > 2B 22-11-20 2020-11-22 > > 3C 19-01-15 2015-01-19 > > 4D 11/19/2006 2006-11-19 > > 5F 9/9/2011 2011-09-09 > > 6G 12/29/2010 2010-12-29 > > > > But I am so much in awe of Rui's skills with R, and that of most of the > > regular commentators here, that I submit > > this a little nervously! > > Thanks! > > Rui Barradas > > > > Many thanks to all who teach me so much here, lovely, if I am correct, to > > contribute for a change! > > > > Chris > > > > > > - Original Message - > >> From: "Rui Barradas" > >> To: "Val" , "r-help@R-project.org > >> (r-help@r-project.org)" > >> Sent: Tuesday, 21 January, 2020 00:40:29 > >> Subject: Re: [R] Mixed format > > > >> Hello, > >> > >> The following strategy works with your data. 
> >> It uses the fact that most dates are in one of 3 formats, dmy, mdy, ymd. > >> It tries those formats one by one, after each try looks for NA's in the > >> new column. > >> > >> > >> # first round, format is dmy > >> DFX$dnew <- lubridate::dmy(DFX$ddate) > >> na <- is.na(DFX$dnew) > >> > >> # second round, format is mdy > >> DFX$dnew[na] <- lubridate::mdy(DFX$ddate[na]) > >> na <- is.na(DFX$dnew) > >> > >> # last round, format is ymd > >> DFX$dnew[na] <- lubridate::ymd(DFX$ddate[na]) > >> > >> # remove what didn't fit any format > >> DFX <- DFX[!is.na(DFX$dnew), ] > >> DFX > >> > >> > >> Hope this helps, > >> > >> Rui Barradas > >> > >> Às 22:58 de 20/01/20, Val escreveu: > >>> Hi All, > >>> > >>> I have a data frame where one column is a mixed date format, > >>> a date in the form "%m-%d-%y" and "%m/%d/%Y", also some are not in date > >>> format. > >>> > >>> Is there a way to delete the rows that contain non-dates and > >>> standardize the dates in one date format like %m-%d-%Y? > >>> Please see my sample data and desired output > >>> > >>> DFX<-read.table(text="name ddate > >>> A 19-10-02 > >>> B 22-11-20u > >>> C 19-01-15 > >>> D 11/19/2006 > >>> F 9/9/2011 > >>> G 12/29/2010 > >>> H DEX",header=TRUE) > >>> > >>> Desired output > >>> name ddate > >>> A 19-10-2002 > >>> B 22-11-2020 > >>> C 19-01-2015 > >>> D 11-19-2006 > >>> F 09-09-2011 > >>> G 12-29-2010 > >>> > >>> Thank you > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. 
> >>> > >> > >> __ > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Mixed format
Hi All,

I have a data frame where one column is in a mixed date format: some dates are in the form "%m-%d-%y", some in "%m/%d/%Y", and some entries are not dates at all.

Is there a way to delete the rows that contain non-dates and standardize the dates in one format like %m-%d-%Y? Please see my sample data and desired output.

DFX <- read.table(text="name ddate
A 19-10-02
B 22-11-20
C 19-01-15
D 11/19/2006
F 9/9/2011
G 12/29/2010
H DEX", header=TRUE)

Desired output:

name ddate
A 19-10-2002
B 22-11-2020
C 19-01-2015
D 11-19-2006
F 09-09-2011
G 12-29-2010

Thank you
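Following Chris Evans's observation in the reply thread that the "-" entries look like day-month-year while the "/" entries look like month-day-year, a lubridate-based sketch; that split is an assumption about the data (as the thread itself warns), and the final column is printed as %m-%d-%Y as requested, even though the example output above mixes conventions:

```r
library(lubridate)

DFX <- read.table(text="name ddate
A 19-10-02
B 22-11-20
C 19-01-15
D 11/19/2006
F 9/9/2011
G 12/29/2010
H DEX", header=TRUE, stringsAsFactors=FALSE)

# assumption: "-" entries are day-month-year, "/" entries month-day-year
dash <- grepl("-", DFX$ddate, fixed=TRUE)
DFX$dnew <- as.Date(NA)
DFX$dnew[dash]  <- dmy(DFX$ddate[dash])
DFX$dnew[!dash] <- mdy(DFX$ddate[!dash])   # non-dates like "DEX" become NA

# drop rows that matched neither format, then standardize the display
DFX <- DFX[!is.na(DFX$dnew), ]
DFX$ddate <- format(DFX$dnew, "%m-%d-%Y")
DFX[c("name", "ddate")]
```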
[R] date
Hi All,

I want to convert a character date mm/dd/yy to yyyy-mm-dd. The sample data and my attempt are shown below.

gs <- read.table(text="ID date
A1 09/27/03
A2 05/27/16
A3 01/25/13
A4 09/27/19", header=TRUE, stringsAsFactors=FALSE)

Desired output:

ID date     d1
A1 09/27/03 2003-09-27
A2 05/27/16 2016-05-27
A3 01/25/13 2013-01-25
A4 09/27/19 2019-09-27

I used:

gs$d1 <- as.Date(as.character(gs$date), format = "%Y-%m-%d")

but I got NAs. How do I get my desired result?
Thank you.
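The NAs arise because the `format` argument of as.Date() must describe the input strings (here month/day/two-digit year), not the desired output; a Date object then prints as YYYY-MM-DD by default. A minimal sketch:

```r
gs <- read.table(text="ID date
A1 09/27/03
A2 05/27/16
A3 01/25/13
A4 09/27/19", header=TRUE, stringsAsFactors=FALSE)

# 'format' describes the INPUT: %m/%d/%y is month/day/2-digit year;
# the resulting Date prints as YYYY-MM-DD automatically
gs$d1 <- as.Date(gs$date, format = "%m/%d/%y")
gs
```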
[R] Conditions
Hi All,

I am having a little issue with my ifelse statement. The data frame looks as follows:

dat2 <- read.table(text="ID d1 d2 d3
A 0 25 35
B 12 22 0
C 0 0 31
E 10 20 30
F 0 0 0", header=TRUE, stringsAsFactors=FALSE)

I want to create d4 and set its value based on the following conditions:

If d1 != 0, then d4 = d1.
If d1 == 0 and d2 != 0, then d4 = d2.
If d1 == 0 and d2 == 0 and d3 != 0, then d4 = d3.
If d1, d2 and d3 are all 0, then d4 = 0.

Here is the desired output and my attempt:

ID d1 d2 d3 d4
A   0 25 35 25
B  12 22  0 12
C   0  0 31 31
E  10 20 30 10
F   0  0  0  0

My attempt:

dat2$d4 <- 0
dat2$d4 <- ifelse((dat2$d1 == "0"), dat2$d2, ifelse(dat2$d2 == "0"), dat2$d3, 0)

but it is not working.
Thank you.
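The attempt fails because the inner ifelse() is closed too early, leaving it with only one argument. Each ifelse() needs all three arguments inside its own parentheses; nesting them expresses "first non-zero of d1, d2, d3, else 0":

```r
dat2 <- read.table(text="ID d1 d2 d3
A 0 25 35
B 12 22 0
C 0 0 31
E 10 20 30
F 0 0 0", header=TRUE, stringsAsFactors=FALSE)

# nested three-argument ifelse(): take the first non-zero column
dat2$d4 <- ifelse(dat2$d1 != 0, dat2$d1,
           ifelse(dat2$d2 != 0, dat2$d2,
           ifelse(dat2$d3 != 0, dat2$d3, 0)))
dat2
```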
Re: [R] File conca.
Thank you Petr and Jeff fro your suggestions. I made some improvement but still need some tweaking. I could not get correctly the folders names added to each row. Only the last forename was added. table(Alldata$oldername) resulted week2 25500 Please see the complete, folders=c("week1","week2") for(i in folders){ path=paste("\data\"", i , sep = "") wd <- setwd(path) Flist = list.files(path,pattern = "^WT") dataA = lapply(Flist, function(x)read.csv(x, header=T)) setwd(wd) temp = do.call("rbind", Alldata) temp$foldername <- i Alldata <- temp Alldata <- rbind(Alldata, temp) } ### Any suggestion please? On Tue, Nov 5, 2019 at 2:13 AM PIKAL Petr wrote: > > Hi > > Help with such operations is rather tricky as only you know exact structrure > of your folders. > > see some hints in line > > > -Original Message- > > From: R-help On Behalf Of Val > > Sent: Tuesday, November 5, 2019 4:33 AM > > To: r-help@R-project.org (r-help@r-project.org) > > Subject: [R] File conca. > > > > Hi All, > > > > I have data files in several folders and want combine all these files in > one > > file. In each folder there are several files and these > > files have the same structure but different names. First, in each > > folder I want to concatenate(rbind) all files in to one file. While I am > > reading each files and concatenating (rbind) all files, I want to added > the > > folder name as one variable in each row. I am reading the folder names > > from a file and for demonstration I am using only two folders as shown > > below. > > Data\week1 # folder name 1 > >WT13.csv > >WT26.csv ... 
> >WT10.csv > > Data\week2#folder name 2 > >WT02.csv > >WT12.csv > > > > Below please find my attempt, > > > > folders=c("week1","week2") > > for(i in folders){ > > path=paste("\data\"", i , sep = "") > > setwd(path) > > you should use > wd <- setwd(path) > > which keeps the original directory for subsequent use > > > Flist = list.files(path,pattern = "^WT") > > dataA = lapply(Flist, function(x)read.csv(x, header=T)) > > Alldata = do.call("rbind", dataA) # combine all files > > Alldata$foldername=i # adding the folder name > > > > now you can do > > setwd(wd) > > to return to original directory > } > > > The above works for for one folder but how can I do it for more than one > > folders? > > You also need to decide if you want all data from all folders in one object > called Alldata or if you want several Alldata objects, one for each folder. > > In second case you could use list structure for Alldata. In the first case > you could store data from each folder in some temporary object and use rbind > directly. > > something like > > temp <- do.call("rbind", dataA) > temp$foldername <- i > > Alldata <- temp > in the first cycle > and > Alldata <- rbind(Alldata, temp) > in second and all others. > > Or you could initiate first Alldata manually and use only > Alldata <- rbind(Alldata, temp) > > in your loop. > > Cheers > Petr > > > > > Thank you in advance, > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] File conca.
Hi All,

I have data files in several folders and want to combine all these files into one file. In each folder there are several files; the files have the same structure but different names. First, in each folder I want to concatenate (rbind) all files into one file. While reading and concatenating the files, I want to add the folder name as one variable in each row. I am reading the folder names from a file; for demonstration I am using only two folders, as shown below.

Data\week1   # folder name 1
   WT13.csv
   WT26.csv ...
   WT10.csv
Data\week2   # folder name 2
   WT02.csv
   WT12.csv

Below please find my attempt:

folders <- c("week1", "week2")
for (i in folders) {
  path <- paste("\\data\\", i, sep = "")
  setwd(path)
  Flist <- list.files(path, pattern = "^WT")
  dataA <- lapply(Flist, function(x) read.csv(x, header = TRUE))
  Alldata <- do.call("rbind", dataA)   # combine all files
  Alldata$foldername <- i              # adding the folder name
}

The above works for one folder, but how can I do it for more than one folder?

Thank you in advance,
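A sketch that sidesteps the setwd() bookkeeping discussed in the replies by using full file paths throughout; the "data" directory name and the file pattern are assumptions based on the post:

```r
folders <- c("week1", "week2")

# read every WT*.csv under each folder, tagging rows with the folder
# name, without ever changing the working directory
pieces <- lapply(folders, function(f) {
  path  <- file.path("data", f)                 # assumed layout: data/week1, data/week2
  files <- list.files(path, pattern = "^WT.*\\.csv$", full.names = TRUE)
  dat   <- do.call(rbind, lapply(files, read.csv))
  if (!is.null(dat) && nrow(dat) > 0) dat$foldername <- f
  dat
})

# one rbind over the per-folder pieces gives the combined data
Alldata <- do.call(rbind, pieces)
table(Alldata$foldername)
```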
[R] If statement
Hi all,

I am trying to use an if/else statement to create two new columns based on two existing columns. Below please find my sample data:

dat1 <- read.table(text="ID a b c d
A private couple 25 35
B private single 24 38
C none single 28 32
E none none 20 36", header=TRUE, stringsAsFactors=FALSE)

dat1$z <- "Zero"
dat1$y <- 0

If a is "private" and b is either "couple" or "single", then z = a's value and y = c's value.
If a is "none" and b is either "couple" or "single", then z = b's value and y = d's value.
Otherwise z = "Zero" and y = 0.

The desired output looks like:

  ID       a      b  c  d       z  y
1  A private couple 25 35 private 25
2  B private single 24 38 private 24
3  C    none single 28 32  single 32
4  E    none   none 20 36    Zero  0

My attempt:

if (dat1$a == "private" & (dat1$b == "couple" | dat1$b == "single")) {
  dat1$z <- dat1$a
  dat1$y <- dat1$c
} else if (dat1$a == "none" & (dat1$b == "couple" | dat1$b == "single")) {
  dat1$z <- dat1$b
  dat1$y <- dat1$c
} else {
  # default value
}

did not work. How could I fix this?
Thank you in advance
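if() looks only at a single TRUE/FALSE, so a row-by-row rule needs the vectorized ifelse(). A sketch reproducing the desired output; note the second branch takes y from column d, matching the example row for ID C:

```r
dat1 <- read.table(text="ID a b c d
A private couple 25 35
B private single 24 38
C none single 28 32
E none none 20 36", header=TRUE, stringsAsFactors=FALSE)

# one logical vector per row: b is "couple" or "single"
pair <- dat1$b %in% c("couple", "single")

dat1$z <- ifelse(dat1$a == "private" & pair, dat1$a,
          ifelse(dat1$a == "none"    & pair, dat1$b, "Zero"))
dat1$y <- ifelse(dat1$a == "private" & pair, dat1$c,
          ifelse(dat1$a == "none"    & pair, dat1$d, 0))
dat1
```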
Re: [R] new_index
Thank you Jeff and all. I wish to go back to my student life. ID is not necessary in dat2, sorry for that. On Sat, Sep 7, 2019 at 5:10 PM Jeff Newmiller wrote: > > Val has been posting to this list for almost a decade [1] so seems unlikely > to be a student... but in all this time has yet to figure out how to post in > plain text to avoid corruption of code on this plain text mailing list. The > ability to generate small examples has improved, though execution still seems > hazy. Why is there an ID column in dat2 at all? > > Try > > dat3 <- dat1[ 1,, drop=FALSE ] > dat3$Index <- as.matrix( dat1[ -1 ] ) %*% dat2$weight > > [1] https://stat.ethz.ch/pipermail/r-help/2010-March/233533.html > > On September 7, 2019 12:38:12 PM PDT, Bert Gunter > wrote: > >dat1 is wrong also. It should read: > > > >dat1 <-read.table(text="ID, x, y, z > > A, 10, 34, 12 > > B, 25, 42, 18 > > C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) > > > >Is this a homework problem? This list has a no homework policy. > > > >Cheers, > >Bert > > > >Bert Gunter > > > >"The trouble with having an open mind is that people keep coming along > >and > >sticking things into it." > >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > >On Sat, Sep 7, 2019 at 12:24 PM Val wrote: > > > >> Hi all > >> > >> Correction for my previous posting. > >> dat2 should be read as > >> dat2 <-read.table(text="ID, weight > >> A, 0.25 > >> B, 0.42 > >> C, 0.65 ",sep=",",header=TRUE,stringsAsFactors=F) > >> > >> On Sat, Sep 7, 2019 at 1:46 PM Val wrote: > >> > > >> > Hi All, > >> > > >> > I have two data frames with thousand rows and several columns. 
> >My > >> > samples of the data frames are shown below > >> > > >> > dat1 <-read.table(text="ID, x, y, z > >> > ID , x, y, z > >> > A, 10, 34, 12 > >> > B, 25, 42, 18 > >> > C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) > >> > > >> > dat2 <-read.table(text="ID, x, y, z > >> > ID, weight > >> > A, 0.25 > >> > B, 0.42 > >> > C, 0.65 ",sep=",",header=TRUE,stringsAsFactors=F) > >> > > >> > My goal is to create an index value for each ID by mutliplying > >the > >> > first row of dat1 by the second column of dat2. > >> > > >> > (10*0.25 ) + (34*0.42) + (12*0.65)= 24.58 > >> > (25*0.25 ) + (42*0.42) + (18*0.65)= 35.59 > >> > (14*0.25 ) + (20*0.42) + ( 8*0.65)= 19.03 > >> > > >> > The desired out put is > >> > dat3 > >> > ID, Index > >> > A 24.58 > >> > B 35.59 > >> > C 19.03 > >> > > >> > How do I do it in an efficent way? > >> > > >> > Thank you, > >> > >> __ > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > [[alternative HTML version deleted]] > > > >__ > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >https://stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, reproducible code. > > -- > Sent from my phone. Please excuse my brevity. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] new_index
Hi all Correction for my previous posting. dat2 should be read as dat2 <-read.table(text="ID, weight A, 0.25 B, 0.42 C, 0.65 ",sep=",",header=TRUE,stringsAsFactors=F) On Sat, Sep 7, 2019 at 1:46 PM Val wrote: > > Hi All, > > I have two data frames with thousand rows and several columns. My > samples of the data frames are shown below > > dat1 <-read.table(text="ID, x, y, z > ID , x, y, z > A, 10, 34, 12 > B, 25, 42, 18 > C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) > > dat2 <-read.table(text="ID, x, y, z > ID, weight > A, 0.25 > B, 0.42 > C, 0.65 ",sep=",",header=TRUE,stringsAsFactors=F) > > My goal is to create an index value for each ID by mutliplying the > first row of dat1 by the second column of dat2. > > (10*0.25 ) + (34*0.42) + (12*0.65)= 24.58 > (25*0.25 ) + (42*0.42) + (18*0.65)= 35.59 > (14*0.25 ) + (20*0.42) + ( 8*0.65)= 19.03 > > The desired out put is > dat3 > ID, Index > A 24.58 > B 35.59 > C 19.03 > > How do I do it in an efficent way? > > Thank you, __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] new_index
Hi All, I have two data frames with thousand rows and several columns. My samples of the data frames are shown below dat1 <-read.table(text="ID, x, y, z ID , x, y, z A, 10, 34, 12 B, 25, 42, 18 C, 14, 20, 8 ",sep=",",header=TRUE,stringsAsFactors=F) dat2 <-read.table(text="ID, x, y, z ID, weight A, 0.25 B, 0.42 C, 0.65 ",sep=",",header=TRUE,stringsAsFactors=F) My goal is to create an index value for each ID by mutliplying the first row of dat1 by the second column of dat2. (10*0.25 ) + (34*0.42) + (12*0.65)= 24.58 (25*0.25 ) + (42*0.42) + (18*0.65)= 35.59 (14*0.25 ) + (20*0.42) + ( 8*0.65)= 19.03 The desired out put is dat3 ID, Index A 24.58 B 35.59 C 19.03 How do I do it in an efficent way? Thank you, __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
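Using the corrected dat2 from the follow-up post, Jeff Newmiller's matrix-product suggestion can be written out as below; the data.frame wrapper is my own:

```r
dat1 <- read.table(text="ID, x, y, z
A, 10, 34, 12
B, 25, 42, 18
C, 14, 20, 8", sep=",", header=TRUE, stringsAsFactors=FALSE)

dat2 <- read.table(text="ID, weight
A, 0.25
B, 0.42
C, 0.65", sep=",", header=TRUE, stringsAsFactors=FALSE)

# one matrix product computes every row's weighted sum at once:
# e.g. 10*0.25 + 34*0.42 + 12*0.65 = 24.58 for ID A
dat3 <- data.frame(ID = dat1$ID,
                   Index = as.vector(as.matrix(dat1[-1]) %*% dat2$weight))
dat3
```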
Re: [R] read
Thank you Jeff! That was so easy command. On Thu, Aug 8, 2019 at 11:06 PM Bert Gunter wrote: > > I stand corrected! > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Thu, Aug 8, 2019 at 7:11 PM Jeff Newmiller > wrote: >> >> Val 1 >> Bert 0 >> >> On August 8, 2019 5:22:13 PM PDT, Bert Gunter wrote: >> >read.table() does not have a "text" argument, so maybe you need to go >> >back >> >and go through a tutorial or two to learn R basics (e.g. about function >> >calls and function arguments ?) >> >See ?read.table (of course) >> > >> >Cheers, >> > >> >Bert Gunter >> > >> >"The trouble with having an open mind is that people keep coming along >> >and >> >sticking things into it." >> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> > >> > >> >On Thu, Aug 8, 2019 at 5:11 PM Val wrote: >> > >> >> Hi all, >> >> >> >> I am trying to red data where single and double quotes are embedded >> >> in some of the fields and prevented to read the data. As an example >> >> please see below. >> >> >> >> vld<-read.table(text="name prof >> >> A '4.5 >> >> B "3.2 >> >> C 5.5 ",header=TRUE) >> >> >> >> Error in read.table(text = "name prof \n A '4.5\n B >> >> 3.2 \n C 5.5 ", : >> >> incomplete final line found by readTableHeader on 'text' >> >> >> >> Is there a way how to read this data and gt the following output >> >> name prof >> >> 1A 4.5 >> >> 2B 3.2 >> >> 3C 5.5 >> >> >> >> Thank you inadvertence >> >> >> >> __ >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. 
>> >> >> > >> > [[alternative HTML version deleted]] >> > >> >__ >> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >https://stat.ethz.ch/mailman/listinfo/r-help >> >PLEASE do read the posting guide >> >http://www.R-project.org/posting-guide.html >> >and provide commented, minimal, self-contained, reproducible code. >> >> -- >> Sent from my phone. Please excuse my brevity. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read
Thank you all, I can read the text file but the problem was there is a single quote embedded in the first row of second column. This quote causes the problem vld<-read.table(text="name prof A '4.5 B "3.2 C 5.5 ",header=TRUE) On Thu, Aug 8, 2019 at 7:24 PM Anaanthan Pillai wrote: > > data <- read.table(header=TRUE, text=' > name prof > A 4.5 > B 3.2 > C 5.5 > ') > > On 9 Aug 2019, at 8:11 AM, Val wrote: > > > > Hi all, > > > > I am trying to red data where single and double quotes are embedded > > in some of the fields and prevented to read the data. As an example > > please see below. > > > > vld<-read.table(text="name prof > > A '4.5 > > B "3.2 > > C 5.5 ",header=TRUE) > > > > Error in read.table(text = "name prof \n A '4.5\n B > > 3.2 \n C 5.5 ", : > > incomplete final line found by readTableHeader on 'text' > > > > Is there a way how to read this data and gt the following output > > name prof > > 1A 4.5 > > 2B 3.2 > > 3C 5.5 > > > > Thank you inadvertence > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read
Hi all,

I am trying to read data where single and double quotes are embedded in some of the fields, which prevents the data from being read. As an example, please see below.

vld <- read.table(text="name prof
A '4.5
B "3.2
C 5.5 ", header=TRUE)

Error in read.table(text = "name prof \n A '4.5\n B 3.2 \n C 5.5 ", :
  incomplete final line found by readTableHeader on 'text'

Is there a way to read this data and get the following output?

  name prof
1    A  4.5
2    B  3.2
3    C  5.5

Thank you in advance
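One way past stray quote characters (in a file, where they don't also break R's own string syntax) is to disable quote handling with quote = "" and strip the quotes afterwards; whether this matches the one-liner praised elsewhere in the thread is an assumption:

```r
# stand-in for the file contents; the embedded ' and " are the problem
txt <- c("name prof",
         "A '4.5",
         'B "3.2',
         "C 5.5")

# quote = "" makes read.table treat ' and " as ordinary characters
vld <- read.table(text = txt, header = TRUE, quote = "",
                  stringsAsFactors = FALSE)

# strip the stray quotes and restore the numeric type
vld$prof <- as.numeric(gsub("['\"]", "", vld$prof))
vld
```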
Re: [R] create
Sorry for the confusion, my sample data does not represent the actual data set. The range of value can be from -ve to +ve values and 0 could be a true value of an observation. So, instead of replacing missing value by zero, I want exclude them from the calculation. On Sat, Apr 13, 2019 at 10:42 PM Jeff Newmiller wrote: > > Looks to me like your initial request contradicts your clarification. Can you > explain this discrepancy? > > On April 13, 2019 8:29:59 PM PDT, Val wrote: > >Hi Bert and Jim, > >Thank you for the suggestion. > >However, those missing values should not be replaced by 0's. > >I want exclude those missing values from the calculation and create > >the index using only the non-missing values. > > > > > >On Sat, Apr 13, 2019 at 10:14 PM Jim Lemon > >wrote: > >> > >> Hi Val, > >> For this particular problem, you can just replace NAs with zeros. > >> > >> vdat[is.na(vdat)]<-0 > >> vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3) > >> vdat > >> obs Year x1 x2 x3 xy > >> 1 1 2001 25 10 10 130 > >> 2 2 2001 0 15 25 150 > >> 3 3 2001 50 10 0 150 > >> 4 4 2001 20 0 60 220 > >> > >> Note that this is not a general solution to the problem of NA values. > >> > >> Jim > >> > >> On Sun, Apr 14, 2019 at 12:54 PM Val wrote: > >> > > >> > Hi All, > >> > I have a data frame with several columns and I want to create > >> > another column by using the values of the other columns. My > >> > problem is that some the row values for some columns have missing > >> > values and I could not get the result I waned . > >> > > >> > Here is the sample of my data and my attempt. 
> >> > > >> > vdat<-read.table(text="obs, Year, x1, x2, x3 > >> > 1, 2001, 25 ,10, 10 > >> > 2, 2001, , 15, 25 > >> > 3, 2001, 50, 10, > >> > 4, 2001, 20, , 60",sep=",",header=TRUE,stringsAsFactors=F) > >> > vdat$xy <- 0 > >> > vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3) > >> > vdat > >> > > >> > obs Year x1 x2 x3 xy > >> > 1 1 2001 25 10 10 130 > >> > 2 2 2001 NA 15 25 NA > >> > 3 3 2001 50 10 NA NA > >> > 4 4 2001 20 NA 60 NA > >> > > >> > The desired result si this, > >> > > >> >obs Year x1 x2 x3 xy > >> > 1 1 2001 25 10 10 130 > >> > 2 2 2001 NA 15 25 150 > >> > 3 3 2001 50 10 NA 150 > >> > 4 4 2001 20 NA 60 220 > >> > > >> > How do I get my desired result? > >> > Thank you > >> > > >> > __ > >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> > https://stat.ethz.ch/mailman/listinfo/r-help > >> > PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >> > and provide commented, minimal, self-contained, reproducible code. > > > >__ > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >https://stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, reproducible code. > > -- > Sent from my phone. Please excuse my brevity. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create
Hi Bert and Jim, Thank you for the suggestion. However, those missing values should not be replaced by 0's. I want exclude those missing values from the calculation and create the index using only the non-missing values. On Sat, Apr 13, 2019 at 10:14 PM Jim Lemon wrote: > > Hi Val, > For this particular problem, you can just replace NAs with zeros. > > vdat[is.na(vdat)]<-0 > vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3) > vdat > obs Year x1 x2 x3 xy > 1 1 2001 25 10 10 130 > 2 2 2001 0 15 25 150 > 3 3 2001 50 10 0 150 > 4 4 2001 20 0 60 220 > > Note that this is not a general solution to the problem of NA values. > > Jim > > On Sun, Apr 14, 2019 at 12:54 PM Val wrote: > > > > Hi All, > > I have a data frame with several columns and I want to create > > another column by using the values of the other columns. My > > problem is that some the row values for some columns have missing > > values and I could not get the result I waned . > > > > Here is the sample of my data and my attempt. > > > > vdat<-read.table(text="obs, Year, x1, x2, x3 > > 1, 2001, 25 ,10, 10 > > 2, 2001, , 15, 25 > > 3, 2001, 50, 10, > > 4, 2001, 20, , 60",sep=",",header=TRUE,stringsAsFactors=F) > > vdat$xy <- 0 > > vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3) > > vdat > > > > obs Year x1 x2 x3 xy > > 1 1 2001 25 10 10 130 > > 2 2 2001 NA 15 25 NA > > 3 3 2001 50 10 NA NA > > 4 4 2001 20 NA 60 NA > > > > The desired result si this, > > > >obs Year x1 x2 x3 xy > > 1 1 2001 25 10 10 130 > > 2 2 2001 NA 15 25 150 > > 3 3 2001 50 10 NA 150 > > 4 4 2001 20 NA 60 220 > > > > How do I get my desired result? > > Thank you > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
[R] create
Hi All, I have a data frame with several columns and I want to create another column by using the values of the other columns. My problem is that some of the row values for some columns are missing, and I could not get the result I wanted.

Here is a sample of my data and my attempt.

vdat<-read.table(text="obs, Year, x1, x2, x3
1, 2001, 25 ,10, 10
2, 2001, , 15, 25
3, 2001, 50, 10,
4, 2001, 20, , 60",sep=",",header=TRUE,stringsAsFactors=F)
vdat$xy <- 0
vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
vdat

  obs Year x1 x2 x3  xy
1   1 2001 25 10 10 130
2   2 2001 NA 15 25  NA
3   3 2001 50 10 NA  NA
4   4 2001 20 NA 60  NA

The desired result is this:

  obs Year x1 x2 x3  xy
1   1 2001 25 10 10 130
2   2 2001 NA 15 25 150
3   3 2001 50 10 NA 150
4   4 2001 20 NA 60 220

How do I get my desired result?
Thank you
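One way to get the desired result is a sketch along these lines: multiply each column by its weight (2, 5, 3, taken from the formula in the post) and use rowSums(..., na.rm = TRUE), which drops missing terms from the sum instead of propagating NA.

```r
vdat <- read.table(text = "obs, Year, x1, x2, x3
1, 2001, 25, 10, 10
2, 2001, , 15, 25
3, 2001, 50, 10,
4, 2001, 20, , 60", sep = ",", header = TRUE)
w <- c(2, 5, 3)  # weights for x1, x2, x3, from the formula in the post
# sweep() multiplies each column by its weight; rowSums(..., na.rm = TRUE)
# then skips the NA terms rather than turning the whole row sum into NA.
vdat$xy <- unname(rowSums(sweep(vdat[, c("x1", "x2", "x3")], 2, w, `*`),
                          na.rm = TRUE))
vdat$xy  # 130 150 150 220, matching the desired output
```

Note this treats a missing value as contributing 0 to the index, which is what the desired output implies; whether that is statistically sensible is a separate question.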
Re: [R] Select
Thank you very much Jeff, Goran and David for your help. On Mon, Feb 11, 2019 at 6:22 PM Jeff Newmiller wrote: > > N <- 8 # however many times you want to do this > ans <- lapply( seq.int( N ) > , function( n ) { > idx <- sample( nrow( mydat ) ) > mydat[ idx[ seq.int( which( 40 < cumsum( mydat[ idx, > "count" ] ) )[ 1 ] ) ], ] > } > ) > > > On Mon, 11 Feb 2019, Val wrote: > > > Sorry Jeff and David for not being clear! > > > > The total sample size should be at least 40, but the selection should > > be based on group ID. A different combination of Group ID could give > > at least 40. > > If I select group G1 with 25 count and G2 and with 15 counts > > then I can get a minimum of 40 counts. So G1 and G2 are > > selected. > > G1 25 > > G2 15 > > > > In another scenario, if G2, G3 and G4 are selected then the total > > count will be 58 which is greater than 40. So G2 , G3 and G4 could > > be selected. > > G2 15 > > G3 12 > > G4 31 > > > > So the restriction is to find group IDs that give a minim of 40. > > Once, I reached a minim of 40 then stop selecting group and output > > the data.. > > > > I am hope this helps > > > > > > > > > > On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller > > wrote: > >> > >> This constraint was not clear in your original sample data set. Can you > >> expand the data set to clarify how this requirement REALLY works? > >> > >> On February 11, 2019 3:00:15 PM PST, Val wrote: > >>> Thank you David. > >>> > >>> However, this will not work for me. If the group ID selected then all > >>> of its observation should be included. > >>> > >>> On Mon, Feb 11, 2019 at 4:51 PM David L Carlson > >>> wrote: > >>>> > >>>> First expand your data frame into a vector where G1 is repeated 25 > >>> times, G2 is repeated 15 times, etc. 
Then draw random samples of 40 > >>> from that vector: > >>>> > >>>>> grp <- rep(mydat$group, mydat$count) > >>>>> grp.sam <- sample(grp, 40) > >>>>> table(grp.sam) > >>>> grp.sam > >>>> G1 G2 G3 G4 G5 > >>>> 10 9 5 13 3 > >>>> > >>>> > >>>> David L Carlson > >>>> Department of Anthropology > >>>> Texas A University > >>>> College Station, TX 77843-4352 > >>>> > >>>> > >>>> -Original Message- > >>>> From: R-help On Behalf Of Val > >>>> Sent: Monday, February 11, 2019 4:36 PM > >>>> To: r-help@R-project.org (r-help@r-project.org) > >>> > >>>> Subject: [R] Select > >>>> > >>>> Hi all, > >>>> > >>>> I have a data frame with tow variables group and its size. > >>>> mydat<- read.table( text='group count > >>>> G1 25 > >>>> G2 15 > >>>> G3 12 > >>>> G4 31 > >>>> G5 10' , header = TRUE, as.is = TRUE ) > >>>> > >>>> I want to select group ID randomly (without replacement) until > >>> the > >>>> sum of count reaches 40. > >>>> So, in the first case, the data frame could be > >>>>G4 31 > >>>>65 10 > >>>> > >>>> In other case, it could be > >>>> G5 10 > >>>> G2 15 > >>>> G3 12 > >>>> > >>>> How do I put sum of count variable is a minimum of 40 restriction? > >>>> > >>>> Than k you in advance > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> I want to select group ids randomly until I reach the > >>>> > >>>> __ > >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>>> and provide commented, minimal, self-contained, reproducible code. > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > >> -- > >> Sent from my phone. Please excuse my brevity. 
Re: [R] Select
Sorry Jeff and David for not being clear! The total sample size should be at least 40, but the selection should be based on group ID. A different combination of Group ID could give at least 40. If I select group G1 with 25 count and G2 and with 15 counts then I can get a minimum of 40 counts. So G1 and G2 are selected. G1 25 G2 15 In another scenario, if G2, G3 and G4 are selected then the total count will be 58 which is greater than 40. So G2 , G3 and G4 could be selected. G2 15 G3 12 G4 31 So the restriction is to find group IDs that give a minim of 40. Once, I reached a minim of 40 then stop selecting group and output the data.. I am hope this helps On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller wrote: > > This constraint was not clear in your original sample data set. Can you > expand the data set to clarify how this requirement REALLY works? > > On February 11, 2019 3:00:15 PM PST, Val wrote: > >Thank you David. > > > >However, this will not work for me. If the group ID selected then all > >of its observation should be included. > > > >On Mon, Feb 11, 2019 at 4:51 PM David L Carlson > >wrote: > >> > >> First expand your data frame into a vector where G1 is repeated 25 > >times, G2 is repeated 15 times, etc. Then draw random samples of 40 > >from that vector: > >> > >> > grp <- rep(mydat$group, mydat$count) > >> > grp.sam <- sample(grp, 40) > >> > table(grp.sam) > >> grp.sam > >> G1 G2 G3 G4 G5 > >> 10 9 5 13 3 > >> > >> > >> David L Carlson > >> Department of Anthropology > >> Texas A University > >> College Station, TX 77843-4352 > >> > >> > >> -Original Message- > >> From: R-help On Behalf Of Val > >> Sent: Monday, February 11, 2019 4:36 PM > >> To: r-help@R-project.org (r-help@r-project.org) > > > >> Subject: [R] Select > >> > >> Hi all, > >> > >> I have a data frame with tow variables group and its size. 
> >> mydat<- read.table( text='group count > >> G1 25 > >> G2 15 > >> G3 12 > >> G4 31 > >> G5 10' , header = TRUE, as.is = TRUE ) > >> > >> I want to select group ID randomly (without replacement) until > >the > >> sum of count reaches 40. > >> So, in the first case, the data frame could be > >>G4 31 > >>65 10 > >> > >> In other case, it could be > >> G5 10 > >> G2 15 > >> G3 12 > >> > >> How do I put sum of count variable is a minimum of 40 restriction? > >> > >> Than k you in advance > >> > >> > >> > >> > >> > >> > >> I want to select group ids randomly until I reach the > >> > >> __ > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > > >__ > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >https://stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, reproducible code. > > -- > Sent from my phone. Please excuse my brevity. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Select
Thank you David. However, this will not work for me. If the group ID selected then all of its observation should be included. On Mon, Feb 11, 2019 at 4:51 PM David L Carlson wrote: > > First expand your data frame into a vector where G1 is repeated 25 times, G2 > is repeated 15 times, etc. Then draw random samples of 40 from that vector: > > > grp <- rep(mydat$group, mydat$count) > > grp.sam <- sample(grp, 40) > > table(grp.sam) > grp.sam > G1 G2 G3 G4 G5 > 10 9 5 13 3 > > > David L Carlson > Department of Anthropology > Texas A University > College Station, TX 77843-4352 > > > -Original Message- > From: R-help On Behalf Of Val > Sent: Monday, February 11, 2019 4:36 PM > To: r-help@R-project.org (r-help@r-project.org) > Subject: [R] Select > > Hi all, > > I have a data frame with tow variables group and its size. > mydat<- read.table( text='group count > G1 25 > G2 15 > G3 12 > G4 31 > G5 10' , header = TRUE, as.is = TRUE ) > > I want to select group ID randomly (without replacement) until the > sum of count reaches 40. > So, in the first case, the data frame could be >G4 31 >65 10 > > In other case, it could be > G5 10 > G2 15 > G3 12 > > How do I put sum of count variable is a minimum of 40 restriction? > > Than k you in advance > > > > > > > I want to select group ids randomly until I reach the > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Select
Hi all,

I have a data frame with two variables, group and its size.

mydat<- read.table( text='group count
G1 25
G2 15
G3 12
G4 31
G5 10' , header = TRUE, as.is = TRUE )

I want to select group IDs randomly (without replacement) until the sum of count reaches 40. So, in the first case, the data frame could be

G4 31
G5 10

In another case, it could be

G5 10
G2 15
G3 12

How do I impose the restriction that the sum of the count variable is at least 40?

Thank you in advance
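Jeff's one-liner later in the thread can be unpacked into smaller steps. A sketch, assuming "at least 40" means stopping at the first group that pushes the running total to 40 or beyond (set.seed is only there to make the draw reproducible):

```r
mydat <- read.table(text = "group count
G1 25
G2 15
G3 12
G4 31
G5 10", header = TRUE)
set.seed(1)                                # reproducible draw only
idx  <- sample(nrow(mydat))                # groups in random order, no replacement
stop <- which(cumsum(mydat$count[idx]) >= 40)[1]  # first position reaching 40
picked <- mydat[idx[seq_len(stop)], ]      # keep whole groups up to that point
sum(picked$count)  # always >= 40
```

Because whole groups are kept, the total can overshoot 40 (e.g. 58 in the G2/G3/G4 scenario); it can never fall short, since all five counts sum to 93.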
Re: [R] character comp
Thank you Erin and Rui! On Sat, Feb 9, 2019 at 1:08 PM Erin Hodgess wrote: > > Nice, Rui! Thanks > > On Sat, Feb 9, 2019 at 11:55 AM Rui Barradas wrote: >> >> Hello, >> >> The following will do it. >> >> mydataframe$dvar <- c(sapply(mydataframe[-1], nchar) %*% c(1, -1)) >> >> >> Hope this helps, >> >> Rui Barradas >> >> Às 18:05 de 09/02/2019, Val escreveu: >> > Hi All, >> > In a given data frame I want to compare character values of two columns. >> > My sample data looks like as follow, >> > >> > mydataframe <- read.table( text='ID var1 var2 >> >R1 AA AAA >> >R2 AAA AAA >> >R3A >> >R4 AA A >> >R5 A AAA', header = TRUE, as.is = TRUE ) >> > >> > For each ID, I want create the third column "dvar" as difference >> > between var1 and var2 >> > Row1( R1) the "dvar" value will be -1 and the complete desired out >> > put looks like as follow. >> > >> > IDvar1 var2 dvar >> > R1 AAAAA-1 >> > R2 AAA AAA 0 >> > R3A-3 >> > R4 AA A1 >> > R5A AAA -2 >> > >> > How do i do this? Any help please? >> > Thank you >> > >> > __ >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Erin Hodgess, PhD > mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] character comp
Hi Erin, Yes, it is always A's. On Sat, Feb 9, 2019 at 12:22 PM Erin Hodgess wrote: > > Will it always be A’s or will there be a mix please? > > On Sat, Feb 9, 2019 at 11:06 AM Val wrote: >> >> Hi All, >> In a given data frame I want to compare character values of two columns. >> My sample data looks like as follow, >> >> mydataframe <- read.table( text='ID var1 var2 >> R1 AA AAA >> R2 AAA AAA >> R3A >> R4 AA A >> R5 A AAA', header = TRUE, as.is = TRUE ) >> >> For each ID, I want create the third column "dvar" as difference >> between var1 and var2 >> Row1( R1) the "dvar" value will be -1 and the complete desired out >> put looks like as follow. >> >> IDvar1 var2 dvar >> R1 AAAAA-1 >> R2 AAA AAA 0 >> R3A-3 >> R4 AA A1 >> R5A AAA -2 >> >> How do i do this? Any help please? >> Thank you >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Erin Hodgess, PhD > mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] character comp
Hi All, In a given data frame I want to compare character values of two columns. My sample data looks as follows:

mydataframe <- read.table( text='ID var1 var2
R1 AA AAA
R2 AAA AAA
R3A
R4 AA A
R5 A AAA', header = TRUE, as.is = TRUE )

For each ID, I want to create a third column "dvar" as the difference in length between var1 and var2. For row 1 (R1) the "dvar" value will be -1, and the complete desired output looks as follows:

ID  var1 var2 dvar
R1  AA   AAA   -1
R2  AAA  AAA    0
R3A            -3
R4  AA   A      1
R5  A    AAA   -2

How do I do this? Any help please?
Thank you
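Rui's one-liner later in the thread computes nchar(var1) - nchar(var2) via a matrix product. The same idea in longhand is sketched below. Note the whitespace in the posted R3 row was mangled in the archive, so the values used there (var1 = "A", var2 = "AAAA") are a guess chosen to be consistent with the desired dvar of -3:

```r
# R3 row is a reconstruction (assumption), chosen so that dvar = -3.
mydataframe <- read.table(text = "ID var1 var2
R1 AA AAA
R2 AAA AAA
R3 A AAAA
R4 AA A
R5 A AAA", header = TRUE)
# dvar = number of characters in var1 minus number of characters in var2
mydataframe$dvar <- nchar(mydataframe$var1) - nchar(mydataframe$var2)
mydataframe$dvar  # -1 0 -3 1 -2
```

This works for any character content, not only runs of A's, since it compares only string lengths.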
Re: [R] Read
Thank you Jeff and all. My data is very messy and it is nice trick suggested by Jeff to handle it On Fri, Nov 9, 2018 at 8:42 PM Jeff Newmiller wrote: > > Your file has 5 commas in the first data row, but only 4 in the header. R > interprets this to mean your first column is intended to be row names (has > no corresponding column label) rather than data. (Row names are "outside" > the data frame... use str(dsh) to get a better picture.) > > Basically, your file does not conform to consistent practices for csv > files of having the same number of commas in every row. If at all possible > I would eliminate the extra comma. If you have many of these broken files, > you might need to read the data in pieces... e.g. > > dsh <- read.csv( "dat.csv", header=FALSE, skip=1 ) > dsh <- dsh[ , -length( dsh ) ] > dshh <- read.csv( "dat.csv", header=TRUE, nrow=1) > names( dsh ) <- names( dshh ) > > On Fri, 9 Nov 2018, Val wrote: > > > HI all, > > I am trying to read a csv file, but have a problem in the row names. > > After reading, the name of the first column is now "row.names" and > > all other column names are shifted to the right. The value of the last > > column become all NAs( as an extra column). > > > > My sample data looks like as follow, > > filename = dat.csv > > The first row has a missing value at column 3 and 5. 
The last row has > > a missing value at column 1 and 5 > > x1,x2,x3,x4,x5 > > 12,13,,14,, > > 22,23,24,25,26 > > ,33,34,34, > > To read the file I used this > > > > dsh<-read.csv(file="dat.csv",sep=",",row.names=NULL,fill=TRUE,header=TRUE,comment.char > > = "", quote = "", stringsAsFactors = FALSE) > > > > The output from the above is > > dsh > > > > row.names x1 x2 x3 x4 x5 > > 112 13 NA 14 NA NA > > 222 23 24 25 26 NA > > 3 33 34 34 NA NA > > > > The name of teh frist column is row,banes and all values of last columns is > > NAs > > > > > > However, the desired output should be > > x1 x2 x3 x4 x5 > > 12 13 NA 14 NA > > 22 23 24 25 26 > > NA 33 34 34 NA > > > > > > How can I fix this? > > Thank you in advance > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > --- > Jeff NewmillerThe . . Go Live... > DCN:Basics: ##.#. ##.#. Live Go... >Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/BatteriesO.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --- __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Read
Hi all, I am trying to read a csv file but have a problem with the row names. After reading, the name of the first column is "row.names" and all the other column names are shifted to the right. The values of the last column become all NAs (as an extra column).

My sample data looks as follows (filename = dat.csv). The first row has missing values at columns 3 and 5; the last row has missing values at columns 1 and 5.

x1,x2,x3,x4,x5
12,13,,14,,
22,23,24,25,26
,33,34,34,

To read the file I used this:

dsh<-read.csv(file="dat.csv",sep=",",row.names=NULL,fill=TRUE,header=TRUE,comment.char = "", quote = "", stringsAsFactors = FALSE)

The output from the above is

dsh
  row.names x1 x2 x3 x4 x5
1        12 13 NA 14 NA NA
2        22 23 24 25 26 NA
3           33 34 34 NA NA

The name of the first column is row.names, and the last column is all NAs.

However, the desired output should be

  x1 x2 x3 x4 x5
  12 13 NA 14 NA
  22 23 24 25 26
  NA 33 34 34 NA

How can I fix this?
Thank you in advance
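Jeff's two-pass trick from the reply can be sketched as follows. Here the broken file is recreated in a temporary file so the example is self-contained, and the header names are taken directly from the first line rather than from a second read.csv call:

```r
# Recreate the broken file: the header has 4 commas but the first data row
# has 5, which is what triggers read.csv's row-names heuristic.
f <- tempfile(fileext = ".csv")
writeLines(c("x1,x2,x3,x4,x5",
             "12,13,,14,,",
             "22,23,24,25,26",
             ",33,34,34,"), f)
hdr <- strsplit(readLines(f, n = 1), ",")[[1]]  # the real column names
dsh <- read.csv(f, header = FALSE, skip = 1)    # body only, no name matching
dsh <- dsh[, seq_along(hdr)]                    # drop the spurious extra column
names(dsh) <- hdr
dsh
```

If possible, fixing the extra trailing comma at the source is still the cleaner solution.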
[R] select and hold missing
I have a data frame:

dfc <- read.table( text= 'week v1 v2
w1 11 11
w1 . 42
w1 31 32
w2 31 52
w2 41 .
w3 51 82
w2 11 22
w3 11 12
w4 21 202
w1 31 72
w2 71 52', header = TRUE, as.is = TRUE, na.strings=c("",".","NA") )

I want to create a new variable diff = v2-v1 and remove rows based on this "diff" value as shown below.

dfc$diff <- dfc$v2 - dfc$v1

I want to remove rows where diff is <= 0 or >= 100, and keep all other rows, including those where diff is NA:

dfca <- dfc[((dfc$diff) > 0) & ((dfc$diff) < 100), ]

However, the result is not what I wanted. I want the output as follows:

week v1 v2 diff
w1   NA 42   NA
w1   31 32    1
w2   31 52   21
w2   41 NA   NA
w3   51 82   31
w2   11 22   11
w3   11 12    1
w1   31 72   41

However, I got this. Why is it setting all row values to NA?

week v1 v2 diff
NA   NA NA   NA
w1   31 32    1
w2   31 52   21
NA   NA NA   NA
w3   51 82   31
w2   11 22   11
w3   11 12    1
w1   31 72   41

Any help? Thank you.
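The all-NA rows appear because indexing a data frame with a logical vector that contains NA returns an NA row for each NA position. A sketch of a fix that keeps the NA rows by testing for them explicitly (the data below is reconstructed from the whitespace-mangled post):

```r
dfc <- read.table(text = "week v1 v2
w1 11 11
w1 . 42
w1 31 32
w2 31 52
w2 41 .
w3 51 82
w2 11 22
w3 11 12
w4 21 202
w1 31 72
w2 71 52", header = TRUE, na.strings = c("", ".", "NA"))
dfc$diff <- dfc$v2 - dfc$v1
# is.na(...) makes the NA positions explicitly TRUE, so the logical index
# never contains NA and the NA rows are kept as-is.
keep <- is.na(dfc$diff) | (dfc$diff > 0 & dfc$diff < 100)
dfca <- dfc[keep, ]
nrow(dfca)  # 8, matching the desired output
```

Without the is.na() term, rows where diff is NA make the condition NA, and `[` returns placeholder NA rows for them, which is exactly what the question observed.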
Re: [R] exclude
Thank you Bert and Jim.

Jim, FYI, I get an error message: Error in allstates : object 'allstates' not found

Bert, it is working. However, what if I want to include only some years, for example 2003, 2004 and 2007, and continue the analysis as before? Where should I define the years to get the following?

     2003 2004 2007
AL      2    1    1
NY      1    1    2

Thank you again.

On Thu, May 17, 2018 at 8:48 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> ... and similar to Jim's suggestion but perhaps slightly simpler (or not!):
>
> > cross <- xtabs( Y ~ stat + year, data = tdat)
> > keep <- apply(cross, 1, all)
> > keep <- names(keep)[keep]
> > cross[keep,]
>     year
> stat 2003 2004 2006 2007 2009 2010
>   AL   38   21   20   12   16   15
>   NY   50   51   57   98  183  230
>
> ## for counts just do:
> > xtabs( ~ stat + year, data = tdat[tdat$stat %in% keep, ])
>     year
> stat 2003 2004 2006 2007 2009 2010
>   AL    2    1    1    1    1    1
>   NY    1    1    1    2    2    3
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, May 17, 2018 at 5:48 PM, Val <valkr...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a sample data set shown below.
>> tdat <- read.table(textConnection("stat year Y
>> AL 2003 25
>> AL 2003 13
>> AL 2004 21
>> AL 2006 20
>> AL 2007 12
>> AL 2009 16
>> AL 2010 15
>> FL 2006 63
>> FL 2007 14
>> FL 2007 25
>> FL 2009 64
>> FL 2009 47
>> FL 2010 48
>> NY 2003 50
>> NY 2004 51
>> NY 2006 57
>> NY 2007 62
>> NY 2007 36
>> NY 2009 87
>> NY 2009 96
>> NY 2010 91
>> NY 2010 59
>> NY 2010 80"),header = TRUE,stringsAsFactors=FALSE)
>>
>> There are three states; I want to select the states that have records in all
>> years.
>> Example,
>> xtabs(Y~stat+year, tdat)
>> This gave me the following:
>>
>> stat 2003 2004 2006 2007 2009 2010
>>   AL   38   21   20   12   16   15
>>   FL    0    0   63   39  111   48
>>   NY   50   51   57   98  183  230
>>
>> FL does not have records in all years, and I want to exclude it; the result
>> I want is as follows:
>>
>> stat 2003 2004 2006 2007 2009 2010
>>   AL   38   21   20   12   16   15
>>   NY   50   51   57   98  183  230
>>
>> Also, how do I get the counts by state and year?
>>
>> Desired result:
>>
>>      2003 2004 2006 2007 2009 2010
>> AL      2    1    1    1    1    1
>> NY      1    1    1    2    2    3
>>
>> Thank you
[R] exclude
Hi All, I have a sample data set shown below.

tdat <- read.table(textConnection("stat year Y
AL 2003 25
AL 2003 13
AL 2004 21
AL 2006 20
AL 2007 12
AL 2009 16
AL 2010 15
FL 2006 63
FL 2007 14
FL 2007 25
FL 2009 64
FL 2009 47
FL 2010 48
NY 2003 50
NY 2004 51
NY 2006 57
NY 2007 62
NY 2007 36
NY 2009 87
NY 2009 96
NY 2010 91
NY 2010 59
NY 2010 80"),header = TRUE,stringsAsFactors=FALSE)

There are three states; I want to select the states that have records in all years. Example:

xtabs(Y~stat+year, tdat)

This gave me the following:

stat 2003 2004 2006 2007 2009 2010
  AL   38   21   20   12   16   15
  FL    0    0   63   39  111   48
  NY   50   51   57   98  183  230

FL does not have records in all years, and I want to exclude it; the result I want is as follows:

stat 2003 2004 2006 2007 2009 2010
  AL   38   21   20   12   16   15
  NY   50   51   57   98  183  230

Also, how do I get the counts by state and year?

Desired result:

     2003 2004 2006 2007 2009 2010
AL      2    1    1    1    1    1
NY      1    1    1    2    2    3

Thank you
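The two parts of this question (keep only the states with records in every year, then tabulate) can be sketched compactly; the last line shows one way to restrict the table to chosen years, which is also asked as a follow-up in this thread:

```r
tdat <- read.table(text = "stat year Y
AL 2003 25
AL 2003 13
AL 2004 21
AL 2006 20
AL 2007 12
AL 2009 16
AL 2010 15
FL 2006 63
FL 2007 14
FL 2007 25
FL 2009 64
FL 2009 47
FL 2010 48
NY 2003 50
NY 2004 51
NY 2006 57
NY 2007 62
NY 2007 36
NY 2009 87
NY 2009 96
NY 2010 91
NY 2010 59
NY 2010 80", header = TRUE)
cross <- xtabs(~ stat + year, data = tdat)          # record counts per cell
keep  <- rownames(cross)[apply(cross > 0, 1, all)]  # states present every year
xtabs(Y ~ stat + year, data = subset(tdat, stat %in% keep))  # sums of Y
xtabs(~ stat + year, data = subset(tdat, stat %in% keep))    # counts
# Restricting to particular years (the follow-up question):
xtabs(~ stat + year,
      data = subset(tdat, stat %in% keep & year %in% c(2003, 2004, 2007)))
```

The `year %in% c(...)` filter is the natural place to define the chosen years: subsetting the data before calling xtabs keeps those columns out of the table entirely.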
Re: [R] include
Thank you all for your help and sorry for that. On Sun, Feb 25, 2018 at 12:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > Jim has been exceedingly patient (and may well continue to be so), but > this smells like "failure to launch". At what point will you start showing > your (failed) attempts at solving your own problems so we can help you work > on your specific weaknesses and become self-sufficient? > -- > Sent from my phone. Please excuse my brevity. > > On February 25, 2018 7:55:55 AM PST, Val <valkr...@gmail.com> wrote: > >HI Jim and all, > > > >I want to put one more condition. Include col2 and col3 if they are > >not > >in col1. > > > >Here is the data > >mydat <- read.table(textConnection("Col1 Col2 col3 > >K2 X1 NA > >Z1 K1 K2 > >Z2 NA NA > >Z3 X1 NA > >Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > > > >The desired out put would be > > > > Col1 Col2 col3 > >1X100 > >2K100 > >3Y100 > >4W100 > >6K2 X10 > >7Z1 K1 K2 > >8Z200 > >9Z3 X10 > >10 Z4 Y1 W1 > > > >K2 is already is already in col1 and should not be added. > > > >Thank you in advance > > > > > > > > > > > > > > > >On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com> > >wrote: > > > >> Hi Val, > >> My fault - I assumed that the NA would be first in the result > >produced > >> by "unique": > >> > >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> Z1 K1 K2 > >> Z2 NA NA > >> Z3 X1 NA > >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > >> val23<-unique(unlist(mydat[,c("Col2","col3")])) > >> napos<-which(is.na(val23)) > >> preval<-data.frame(Col1=val23[-napos], > >> Col2=NA,col3=NA) > >> mydat<-rbind(preval,mydat) > >> mydat[is.na(mydat)]<-"0" > >> mydat > >> > >> Jim > >> > >> On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote: > >> > Thank you Jim, > >> > > >> > I read the data as you suggested but I could not find K1 in col1. 
> >> > > >> > rbind(preval,mydat) > >> > Col1 Col2 col3 > >> > 1 > >> > 2 X1 > >> > 3 Y1 > >> > 4 K2 > >> > 5 W1 > >> > 6 Z1 K1 K2 > >> > 7 Z2 > >> > 8 Z3 X1 > >> > 9 Z4 Y1 W1 > >> > > >> > > >> > > >> > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> > >wrote: > >> >> > >> >> hi Val, > >> >> Your problem seems to be that the data are read in as a factor. > >The > >> >> simplest way I can think of to get around this is: > >> >> > >> >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> >> Z1 K1 K2 > >> >> Z2 NA NA > >> >> Z3 X1 NA > >> >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > >> >> > >preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >> Col2=NA,col3=NA) > >> >> rbind(preval,mydat) > >> >> mydat[is.na(mydat)]<-"0" > >> >> > >> >> Jiim > >> >> > >> >> > >> >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote: > >> >> > Sorry , I hit the send key accidentally here is my complete > >message. > >> >> > > >> >> > Thank you Jim and all, I got it. > >> >> > > >> >> > I have one more question on the original question > >> >> > > >> >> > What does this "[-1] " do? > >> >> > > >preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >> >Col2=NA,col3=NA) > >> >> > > >> >> > > >> >> > mydat <- read.table(textConnection("Col1 Col2 col3 > >> >> > Z1 K1 K2 > >> >> > Z2 NA NA > >> >> > Z3 X1 NA > >> >> > Z4 Y1 W1"),header = TRUE) > >
Re: [R] include
HI Jim and all, I want to put one more condition. Include col2 and col3 if they are not in col1. Here is the data mydat <- read.table(textConnection("Col1 Col2 col3 K2 X1 NA Z1 K1 K2 Z2 NA NA Z3 X1 NA Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) The desired out put would be Col1 Col2 col3 1X100 2K100 3Y100 4W100 6K2 X10 7Z1 K1 K2 8Z200 9Z3 X10 10 Z4 Y1 W1 K2 is already is already in col1 and should not be added. Thank you in advance On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com> wrote: > Hi Val, > My fault - I assumed that the NA would be first in the result produced > by "unique": > > mydat <- read.table(textConnection("Col1 Col2 col3 > Z1 K1 K2 > Z2 NA NA > Z3 X1 NA > Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > val23<-unique(unlist(mydat[,c("Col2","col3")])) > napos<-which(is.na(val23)) > preval<-data.frame(Col1=val23[-napos], > Col2=NA,col3=NA) > mydat<-rbind(preval,mydat) > mydat[is.na(mydat)]<-"0" > mydat > > Jim > > On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote: > > Thank you Jim, > > > > I read the data as you suggested but I could not find K1 in col1. > > > > rbind(preval,mydat) > > Col1 Col2 col3 > > 1 > > 2 X1 > > 3 Y1 > > 4 K2 > > 5 W1 > > 6 Z1 K1 K2 > > 7 Z2 > > 8 Z3 X1 > > 9 Z4 Y1 W1 > > > > > > > > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote: > >> > >> hi Val, > >> Your problem seems to be that the data are read in as a factor. The > >> simplest way I can think of to get around this is: > >> > >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> Z1 K1 K2 > >> Z2 NA NA > >> Z3 X1 NA > >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> Col2=NA,col3=NA) > >> rbind(preval,mydat) > >> mydat[is.na(mydat)]<-"0" > >> > >> Jiim > >> > >> > >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote: > >> > Sorry , I hit the send key accidentally here is my complete message. 
> >> > > >> > Thank you Jim and all, I got it. > >> > > >> > I have one more question on the original question > >> > > >> > What does this "[-1] " do? > >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >Col2=NA,col3=NA) > >> > > >> > > >> > mydat <- read.table(textConnection("Col1 Col2 col3 > >> > Z1 K1 K2 > >> > Z2 NA NA > >> > Z3 X1 NA > >> > Z4 Y1 W1"),header = TRUE) > >> > > >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >Col2=NA,col3=NA) > >> > rbind(unique(preval),mydat) > >> > > >> > > >> > Col1 Col2 col3 > >> > 1 > >> > 2 X1 > >> > 3 Y1 > >> > 4 K2 > >> > 5 W1 > >> > 6 Z1 K1 K2 > >> > 7 Z2 > >> > 8 Z3 X1 > >> > 9 Z4 Y1 W1 > >> > > >> > I could not find K1 in the first col1. Is that possible to fix this? > >> > > >> > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote: > >> > > >> >> Thank you Jim and all, I got it. > >> >> > >> >> I have one more question on the original question > >> >> > >> >> What does this "[-1] " do? > >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2", > "col3")]))[-1], > >> >>Col2=NA,col3=NA) > >> >> > >> >> > >> >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> >> Z1 K1 K2 > >> >> Z2 NA NA > >> >> Z3 X1 NA > >> >> Z4 Y1 W1"),header = TRUE) > >> >> > >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2", > "col3")]))[-1], > >> >>
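The extra condition in this thread (add a Col2/col3 value only if it is not already in Col1) can be handled with `setdiff()`. A minimal sketch, not from the thread itself, using the same sample data as above:

```r
# Sample data from the thread: K2 appears both as a parent value and in Col1
mydat <- read.table(textConnection("Col1 Col2 col3
K2 X1 NA
Z1 K1 K2
Z2 NA NA
Z3 X1 NA
Z4 Y1 W1"), header = TRUE, stringsAsFactors = FALSE)

# Values seen in Col2/col3, minus NAs and minus anything already in Col1
vals <- unique(unlist(mydat[, c("Col2", "col3")]))
vals <- setdiff(vals[!is.na(vals)], mydat$Col1)

preval <- data.frame(Col1 = vals, Col2 = NA, col3 = NA)
out <- rbind(preval, mydat)
out[is.na(out)] <- "0"
out
```

This yields X1, K1, Y1 and W1 as new leading rows but skips K2, matching the desired output in the post above.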
Re: [R] include
Thank you so much Jim! On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com> wrote: > Hi Val, > My fault - I assumed that the NA would be first in the result produced > by "unique": > > mydat <- read.table(textConnection("Col1 Col2 col3 > Z1 K1 K2 > Z2 NA NA > Z3 X1 NA > Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > val23<-unique(unlist(mydat[,c("Col2","col3")])) > napos<-which(is.na(val23)) > preval<-data.frame(Col1=val23[-napos], > Col2=NA,col3=NA) > mydat<-rbind(preval,mydat) > mydat[is.na(mydat)]<-"0" > mydat > > Jim > > On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote: > > Thank you Jim, > > > > I read the data as you suggested but I could not find K1 in col1. > > > > rbind(preval,mydat) > > Col1 Col2 col3 > > 1 > > 2 X1 > > 3 Y1 > > 4 K2 > > 5 W1 > > 6 Z1 K1 K2 > > 7 Z2 > > 8 Z3 X1 > > 9 Z4 Y1 W1 > > > > > > > > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote: > >> > >> hi Val, > >> Your problem seems to be that the data are read in as a factor. The > >> simplest way I can think of to get around this is: > >> > >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> Z1 K1 K2 > >> Z2 NA NA > >> Z3 X1 NA > >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> Col2=NA,col3=NA) > >> rbind(preval,mydat) > >> mydat[is.na(mydat)]<-"0" > >> > >> Jiim > >> > >> > >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote: > >> > Sorry , I hit the send key accidentally here is my complete message. > >> > > >> > Thank you Jim and all, I got it. > >> > > >> > I have one more question on the original question > >> > > >> > What does this "[-1] " do? 
> >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >Col2=NA,col3=NA) > >> > > >> > > >> > mydat <- read.table(textConnection("Col1 Col2 col3 > >> > Z1 K1 K2 > >> > Z2 NA NA > >> > Z3 X1 NA > >> > Z4 Y1 W1"),header = TRUE) > >> > > >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >> >Col2=NA,col3=NA) > >> > rbind(unique(preval),mydat) > >> > > >> > > >> > Col1 Col2 col3 > >> > 1 > >> > 2 X1 > >> > 3 Y1 > >> > 4 K2 > >> > 5 W1 > >> > 6 Z1 K1 K2 > >> > 7 Z2 > >> > 8 Z3 X1 > >> > 9 Z4 Y1 W1 > >> > > >> > I could not find K1 in the first col1. Is that possible to fix this? > >> > > >> > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote: > >> > > >> >> Thank you Jim and all, I got it. > >> >> > >> >> I have one more question on the original question > >> >> > >> >> What does this "[-1] " do? > >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2", > "col3")]))[-1], > >> >>Col2=NA,col3=NA) > >> >> > >> >> > >> >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> >> Z1 K1 K2 > >> >> Z2 NA NA > >> >> Z3 X1 NA > >> >> Z4 Y1 W1"),header = TRUE) > >> >> > >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2", > "col3")]))[-1], > >> >>Col2=NA,col3=NA) > >> >> rbind(unique(preval),mydat) > >> >> > >> >> > >> >> Col1 Col2 col3 > >> >> 1 > >> >> 2 X1 > >> >> 3 Y1 > >> >> 4 K2 > >> >> 5 W1 > >> >> 6 Z1 K1 K2 > >> >> 7 Z2 > >> >> 8 Z3 X1 > >> >> 9 Z4 Y1 W1 > >> >> > >> >> > >> &
Re: [R] include
Thank you Jim, I read the data as you suggested but I could not find K1 in col1. rbind(preval,mydat) Col1 Col2 col3 1 2 X1 3 Y1 4 K2 5 W1 6 Z1 K1 K2 7 Z2 8 Z3 X1 9 Z4 Y1 W1 On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote: > hi Val, > Your problem seems to be that the data are read in as a factor. The > simplest way I can think of to get around this is: > > mydat <- read.table(textConnection("Col1 Col2 col3 > Z1 K1 K2 > Z2 NA NA > Z3 X1 NA > Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE) > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > Col2=NA,col3=NA) > rbind(preval,mydat) > mydat[is.na(mydat)]<-"0" > > Jiim > > > On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote: > > Sorry , I hit the send key accidentally here is my complete message. > > > > Thank you Jim and all, I got it. > > > > I have one more question on the original question > > > > What does this "[-1] " do? > > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >Col2=NA,col3=NA) > > > > > > mydat <- read.table(textConnection("Col1 Col2 col3 > > Z1 K1 K2 > > Z2 NA NA > > Z3 X1 NA > > Z4 Y1 W1"),header = TRUE) > > > > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >Col2=NA,col3=NA) > > rbind(unique(preval),mydat) > > > > > > Col1 Col2 col3 > > 1 > > 2 X1 > > 3 Y1 > > 4 K2 > > 5 W1 > > 6 Z1 K1 K2 > > 7 Z2 > > 8 Z3 X1 > > 9 Z4 Y1 W1 > > > > I could not find K1 in the first col1. Is that possible to fix this? > > > > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote: > > > >> Thank you Jim and all, I got it. > >> > >> I have one more question on the original question > >> > >> What does this "[-1] " do? 
> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >>Col2=NA,col3=NA) > >> > >> > >> mydat <- read.table(textConnection("Col1 Col2 col3 > >> Z1 K1 K2 > >> Z2 NA NA > >> Z3 X1 NA > >> Z4 Y1 W1"),header = TRUE) > >> > >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > >>Col2=NA,col3=NA) > >> rbind(unique(preval),mydat) > >> > >> > >> Col1 Col2 col3 > >> 1 > >> 2 X1 > >> 3 Y1 > >> 4 K2 > >> 5 W1 > >> 6 Z1 K1 K2 > >> 7 Z2 > >> 8 Z3 X1 > >> 9 Z4 Y1 W1 > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch < > murdoch.dun...@gmail.com> > >> wrote: > >> > >>> On 24/02/2018 1:53 PM, William Dunlap via R-help wrote: > >>> > >>>> x1 = rbind(unique(preval),mydat) > >>>>x2 <- x1[is.na(x1)] <- 0 > >>>>x2 # gives 0 > >>>> > >>>> Why introduce the 'x2'? x1[...] <- 0 alters x1 in place and I think > >>>> that > >>>> altered x1 is what you want. > >>>> > >>>> You asked why x2 was zero. The value of the expression > >>>> f(a) <- b > >>>> and assignments are processed right to left so > >>>> x2 <- x[!is.na(x1)] <- 0 > >>>> is equivalent to > >>>> x[!is.na(x1)] <- 0 > >>>> x2 <- 0 > >>>> > >>> > >>> That's not right in general, is it? I'd think that should be > >>> > >>> x[!is.na(x1)] <- 0 > >>> x2 <- x1 > >>> > >>> Of course, in this example, x1 is 0, so it gives the same answer. > >>> > >>> Duncan Murdoch > >>> > >>> > >>> > >>>> > >>>> Bill Dunlap > >>>> TIBCO Software > >>>> wdunlap tibco.com > >>>> > >>>> On
Re: [R] include
Sorry , I hit the send key accidentally here is my complete message. Thank you Jim and all, I got it. I have one more question on the original question What does this "[-1] " do? preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], Col2=NA,col3=NA) mydat <- read.table(textConnection("Col1 Col2 col3 Z1 K1 K2 Z2 NA NA Z3 X1 NA Z4 Y1 W1"),header = TRUE) preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], Col2=NA,col3=NA) rbind(unique(preval),mydat) Col1 Col2 col3 1 2 X1 3 Y1 4 K2 5 W1 6 Z1 K1 K2 7 Z2 8 Z3 X1 9 Z4 Y1 W1 I could not find K1 in the first col1. Is that possible to fix this? On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote: > Thank you Jim and all, I got it. > > I have one more question on the original question > > What does this "[-1] " do? > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], >Col2=NA,col3=NA) > > > mydat <- read.table(textConnection("Col1 Col2 col3 > Z1 K1 K2 > Z2 NA NA > Z3 X1 NA > Z4 Y1 W1"),header = TRUE) > > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], >Col2=NA,col3=NA) > rbind(unique(preval),mydat) > > > Col1 Col2 col3 > 1 > 2 X1 > 3 Y1 > 4 K2 > 5 W1 > 6 Z1 K1 K2 > 7 Z2 > 8 Z3 X1 > 9 Z4 Y1 W1 > > > > > > > > > > > > > > > > On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch <murdoch.dun...@gmail.com> > wrote: > >> On 24/02/2018 1:53 PM, William Dunlap via R-help wrote: >> >>> x1 = rbind(unique(preval),mydat) >>>x2 <- x1[is.na(x1)] <- 0 >>>x2 # gives 0 >>> >>> Why introduce the 'x2'? x1[...] <- 0 alters x1 in place and I think >>> that >>> altered x1 is what you want. >>> >>> You asked why x2 was zero. The value of the expression >>> f(a) <- b >>> and assignments are processed right to left so >>> x2 <- x[!is.na(x1)] <- 0 >>> is equivalent to >>> x[!is.na(x1)] <- 0 >>> x2 <- 0 >>> >> >> That's not right in general, is it? 
I'd think that should be >> >> x[!is.na(x1)] <- 0 >> x2 <- x1 >> >> Of course, in this example, x1 is 0, so it gives the same answer. >> >> Duncan Murdoch >> >> >> >>> >>> Bill Dunlap >>> TIBCO Software >>> wdunlap tibco.com >>> >>> On Sat, Feb 24, 2018 at 9:59 AM, Val <valkr...@gmail.com> wrote: >>> >>> Thank you Jim >>>> >>>> I wanted a final data frame after replacing the NA's to "0" >>>> >>>> x1 = rbind(unique(preval),mydat) >>>> x2 <- x1[is.na(x1)] <- 0 >>>> x2 >>>> but I got this, >>>> >>>> [1] 0 >>>> >>>> why I am getting this? >>>> >>>> >>>> On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com> >>>> wrote: >>>> >>>> Hi Val, >>>>> Try this: >>>>> >>>>> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], >>>>> Col2=NA,col3=NA) >>>>> rbind(preval,mydat) >>>>> >>>>> Jim >>>>> >>>>> On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> I am reading a file as follow, >>>>>> >>>>>> mydat <- read.table(textConnection("Col1 Col2 col3 >>>>>> Z2 NA NA >>>>>> Z3 X1 NA >>>>>> Z4 Y1 W1"),header = TRUE) >>>>>> >>>>>> 1. "NA" are missing should be replace by 0 >>>>>> 2. value that are in COl2 and Col3 should be included in col1 >>>>>> before >>>>>> they appear >>>>>> in col2 and col3. So the output data looks like as follow, >>>>>> >>>>>> X1 0 0 >>>>>> Y1 0 0 >>>>>> W1 0 0 >>>>>> Z2 0 0 >>>>>> Z3 X1 0 >>>>>> Z4 Y1 W1 >>>>>> >&g
Re: [R] include
Thank you Jim and all, I got it. I have one more question on the original question What does this "[-1] " do? preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], Col2=NA,col3=NA) mydat <- read.table(textConnection("Col1 Col2 col3 Z1 K1 K2 Z2 NA NA Z3 X1 NA Z4 Y1 W1"),header = TRUE) preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], Col2=NA,col3=NA) rbind(unique(preval),mydat) Col1 Col2 col3 1 2 X1 3 Y1 4 K2 5 W1 6 Z1 K1 K2 7 Z2 8 Z3 X1 9 Z4 Y1 W1 On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote: > On 24/02/2018 1:53 PM, William Dunlap via R-help wrote: > >> x1 = rbind(unique(preval),mydat) >>x2 <- x1[is.na(x1)] <- 0 >>x2 # gives 0 >> >> Why introduce the 'x2'? x1[...] <- 0 alters x1 in place and I think that >> altered x1 is what you want. >> >> You asked why x2 was zero. The value of the expression >> f(a) <- b >> and assignments are processed right to left so >> x2 <- x[!is.na(x1)] <- 0 >> is equivalent to >> x[!is.na(x1)] <- 0 >> x2 <- 0 >> > > That's not right in general, is it? I'd think that should be > > x[!is.na(x1)] <- 0 > x2 <- x1 > > Of course, in this example, x1 is 0, so it gives the same answer. > > Duncan Murdoch > > > >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com >> >> On Sat, Feb 24, 2018 at 9:59 AM, Val <valkr...@gmail.com> wrote: >> >> Thank you Jim >>> >>> I wanted a final data frame after replacing the NA's to "0" >>> >>> x1 = rbind(unique(preval),mydat) >>> x2 <- x1[is.na(x1)] <- 0 >>> x2 >>> but I got this, >>> >>> [1] 0 >>> >>> why I am getting this? 
>>> >>> >>> On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com> >>> wrote: >>> >>> Hi Val, >>>> Try this: >>>> >>>> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], >>>> Col2=NA,col3=NA) >>>> rbind(preval,mydat) >>>> >>>> Jim >>>> >>>> On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote: >>>> >>>>> Hi All, >>>>> >>>>> I am reading a file as follow, >>>>> >>>>> mydat <- read.table(textConnection("Col1 Col2 col3 >>>>> Z2 NA NA >>>>> Z3 X1 NA >>>>> Z4 Y1 W1"),header = TRUE) >>>>> >>>>> 1. "NA" are missing should be replace by 0 >>>>> 2. value that are in COl2 and Col3 should be included in col1 before >>>>> they appear >>>>> in col2 and col3. So the output data looks like as follow, >>>>> >>>>> X1 0 0 >>>>> Y1 0 0 >>>>> W1 0 0 >>>>> Z2 0 0 >>>>> Z3 X1 0 >>>>> Z4 Y1 W1 >>>>> >>>>> Thank you in advance >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> __ >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/ >>>>> >>>> posting-guide.html >>>> >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/ >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
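The point Bill makes above about chained assignment can be checked directly: in R, the value of a replacement-function assignment such as `x1[i] <- 0` is the right-hand side, so chaining it into a second assignment stores `0`, not the modified object. A small sketch (not from the thread):

```r
x1 <- data.frame(a = c(1, NA, 3))

# The value of the expression `x1[is.na(x1)] <- 0` is the RHS (0),
# so x2 receives 0 even though x1 itself is modified in place
x2 <- x1[is.na(x1)] <- 0
x2   # 0
x1   # NA has been replaced by 0

# The usual idiom: modify in place, then copy if a second name is needed
x1[is.na(x1)] <- 0
x2 <- x1
```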
Re: [R] include
Thank you Jim I wanted a final data frame after replacing the NA's to "0" x1 = rbind(unique(preval),mydat) x2 <- x1[is.na(x1)] <- 0 x2 but I got this, [1] 0 why I am getting this? On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com> wrote: > Hi Val, > Try this: > > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1], > Col2=NA,col3=NA) > rbind(preval,mydat) > > Jim > > On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote: > > Hi All, > > > > I am reading a file as follow, > > > > mydat <- read.table(textConnection("Col1 Col2 col3 > > Z2 NA NA > > Z3 X1 NA > > Z4 Y1 W1"),header = TRUE) > > > > 1. "NA" are missing should be replace by 0 > > 2. value that are in COl2 and Col3 should be included in col1 before > > they appear > > in col2 and col3. So the output data looks like as follow, > > > > X1 0 0 > > Y1 0 0 > > W1 0 0 > > Z2 0 0 > > Z3 X1 0 > > Z4 Y1 W1 > > > > Thank you in advance > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] include
Hi All, I am reading a file as follow, mydat <- read.table(textConnection("Col1 Col2 col3 Z2 NA NA Z3 X1 NA Z4 Y1 W1"),header = TRUE) 1. "NA" are missing should be replace by 0 2. value that are in COl2 and Col3 should be included in col1 before they appear in col2 and col3. So the output data looks like as follow, X1 0 0 Y1 0 0 W1 0 0 Z2 0 0 Z3 X1 0 Z4 Y1 W1 Thank you in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] find unique and summerize
Thank you so much Rui! On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote: > Hello, > > Please always cc the list. > > As for the question, I believe the following does it. > > a <- strsplit(mydata$ID, "[[:alpha:]]+") > b <- strsplit(mydata$ID, "[[:digit:]]+") > > a <- sapply(a, `[`, 1) > c <- sapply(a, `[`, 2) > b <- sapply(b, function(x) x[x != ""]) > > c2 <- sprintf("%010d", as.integer(c)) > > newID <- paste0(a, b, c2) > > > Hope this helps, > > Rui Barradas > > On 2/4/2018 2:01 AM, Val wrote: > >> Thank you so much again for your help! >> >> I have one more question related to this. >> >> 1. How do I further split this "358USA1540165 " into three parts. >> a) 358 >> b) USA >> c) 1540165 >> >> I want to add leading zeros to the third part like "0001540165" >> and then combine b and c to get this USA1540165 >> so USA1540165 changed to USA1540165 >> >> The other one is that the data set has several country codes and if I >> want to limit my data set to only certain country codes , how do I do that. >> >> Thank you again >> >> >> >> >> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas <ruipbarra...@sapo.pt >> <mailto:ruipbarra...@sapo.pt>> wrote: >> >> Hello, >> >> As for the first question, instead of writing a xlsx file, maybe it >> is easier to write a csv file and then open it with Excel. >> >> tbl2 <- addmargins(tbl1) >> write.csv(tbl2, "tt1.csv") >> >> As for the second question, the following does it. >> >> inx <- apply(tbl1, 1, function(x) all(x != 0)) >> tbl1b <- addmargins(tbl1[inx, ]) >> tbl1b >> >> >> Hope this helps, >> >> Rui Barradas >> >> On 2/3/2018 4:42 PM, Val wrote: >> >> Thank you so much Rui. >> >> 1. How do I export this table to excel file? >> I used this >> tbl1 <- table(Country, IDNum) >> tbl2=addmargins(tbl1) >> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", >> row.names=FALSE) >> The above did not give me that table. >> >> >> 2. I want select those unique Ids that do have records in all >> countries. 
>>From the above data set, this ID "FIN1540166" should be >> excluded from the summary table and the table looks like as follow >> >> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 >> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065 >> >> Thank you again >> >> >> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas >> <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt> >> <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>> >> wrote: >> >> Hello, >> >> Thanks for the reproducible example. >> See if the following does what you want. >> >> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) >> Country <- sub("^\\d+(.*)", "\\1", mydata$ID) >> >> tbl1 <- table(Country, IDNum) >> addmargins(tbl1) >> >> tbl2 <- xtabs(Y ~ Country + IDNum, mydata) >> addmargins(tbl2) >> >> >> Hope this helps, >> >> Rui Barradas >> >> >> On 2/3/2018 3:00 AM, Val wrote: >> >> Hi all, >> >> I have a data set need to be summarized by unique ID >> (count and >> sum of a >> variable) >> A unique individual ID (country name Abbreviation >>followed by >> an integer >> numbers) may have observation in several countries. >> Then the ID was >> changed by adding the country code as a prefix and >>new ID was >> constructed >> or recorded like (country code, + the original unique >> ID Example >> original ID "CAN1540164" , if this ID has an >> observation in >>
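The split-and-pad question in this thread can be done with three `sub()` calls plus `sprintf()`. Note that the reply above assigns to `a` twice, so the second `sapply(a, ...)` no longer sees the split pieces; a reworked sketch (variable names are mine, not from the thread), including a filter for an allowed set of country codes:

```r
ids <- c("1CAN1540164", "358USA1540165", "44USA1540165")

country <- sub("^([0-9]+).*", "\\1", ids)           # leading country code
alpha   <- sub("^[0-9]+([A-Za-z]+).*", "\\1", ids)  # letter part, e.g. "CAN"
num     <- sub("^[0-9]+[A-Za-z]+", "", ids)         # trailing number

# Pad the numeric part to 10 digits, then rebuild the original-style ID
newID <- paste0(alpha, sprintf("%010d", as.integer(num)))
newID   # e.g. "CAN0001540164"

# Keep only records whose country code is in an allowed set
keep <- country %in% c("1", "358")
ids[keep]
```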
Re: [R] find unique and summerize
Thank you so much Rui. 1. How do I export this table to excel file? I used this tbl1 <- table(Country, IDNum) tbl2=addmargins(tbl1) write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE) The above did not give me that table. 2. I want select those unique Ids that do have records in all countries. From the above data set, this ID "FIN1540166" should be excluded from the summary table and the table looks like as follow IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065 Thank you again On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote: > Hello, > > Thanks for the reproducible example. > See if the following does what you want. > > IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) > Country <- sub("^\\d+(.*)", "\\1", mydata$ID) > > tbl1 <- table(Country, IDNum) > addmargins(tbl1) > > tbl2 <- xtabs(Y ~ Country + IDNum, mydata) > addmargins(tbl2) > > > Hope this helps, > > Rui Barradas > > > On 2/3/2018 3:00 AM, Val wrote: > >> Hi all, >> >> I have a data set need to be summarized by unique ID (count and sum of a >> variable) >> A unique individual ID (country name Abbreviation followed by an integer >> numbers) may have observation in several countries. Then the ID was >> changed by adding the country code as a prefix and new ID was >> constructed >> or recorded like (country code, + the original unique ID Example >> original ID "CAN1540164" , if this ID has an observation in CANADA then >> the ID was changed to"1CAN1540164". 
From this new ID I want get out >> the country code get the original unique ID and summarize the data by >> unique ID and country code >> >> The data set look like >> mydata <- read.table(textConnection("GR ID iflag Y >> A 1CAN1540164 1 20 >> A 1CAN1540164 1 12 >> A 1CAN1540164 1 15 >> A 44CAN1540164 1 30 >> A 44CAN1540164 1 24 >> A 44CAN1540164 1 25 >> A 44CAN1540164 1 11 >> A 33CAN1540164 1 12 >> A 33CAN1540164 1 23 >> A 33CAN1540164 1 65 >> A 33CAN1540164 1 41 >> A 358CAN1540164 1 28 >> A 358CAN1540164 1 32 >> A 358CAN1540164 1 41 >> A 358CAN1540164 1 54 >> A 358CAN1540164 1 29 >> A 358CAN1540164 1 64 >> B 1USA1540165 1 125 >> B 1USA1540165 1 165 >> B 44USA1540165 1 171 >> B 33USA1540165 1 254 >> B 33USA1540165 1 241 >> B 33USA1540165 1 262 >> B 358USA1540165 1 321 >> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE) >> >> From the above data there are three unique IDs and four country codes >> (1, >> 44, 33 and 358) >> >> I want the following two tables >> >> Table 1. count the unique ID by country code >>1 44 33 358 TOT >> CAN1540164 34 4 617 >> USA1540165 2 1 3 1 7 >> FIN1540166 - - - 1 1 >> TOT 55 7 8 25 >> >> >> Table 2 Sum of Y variable by unique ID and country. code >> >>1 44 33 358 TOT >> CAN154016447 90 141 248 526 >> USA1540165 290 171 757 321 1539 >> FIN1540166-- - 225 225 >> TOT 337 261 898794 2290 >> >> >> How do I do it in R? >> >> The first step is to get the unique country codes unique ID by splitting >> the new ID >> >> Thank you in advance >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> >> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] find unique and summerize
Hi all, I have a data set need to be summarized by unique ID (count and sum of a variable) A unique individual ID (country name Abbreviation followed by an integer numbers) may have observation in several countries. Then the ID was changed by adding the country code as a prefix and new ID was constructed or recorded like (country code, + the original unique ID Example original ID "CAN1540164" , if this ID has an observation in CANADA then the ID was changed to"1CAN1540164". From this new ID I want get out the country code get the original unique ID and summarize the data by unique ID and country code The data set look like mydata <- read.table(textConnection("GR ID iflag Y A 1CAN1540164 1 20 A 1CAN1540164 1 12 A 1CAN1540164 1 15 A 44CAN1540164 1 30 A 44CAN1540164 1 24 A 44CAN1540164 1 25 A 44CAN1540164 1 11 A 33CAN1540164 1 12 A 33CAN1540164 1 23 A 33CAN1540164 1 65 A 33CAN1540164 1 41 A 358CAN1540164 1 28 A 358CAN1540164 1 32 A 358CAN1540164 1 41 A 358CAN1540164 1 54 A 358CAN1540164 1 29 A 358CAN1540164 1 64 B 1USA1540165 1 125 B 1USA1540165 1 165 B 44USA1540165 1 171 B 33USA1540165 1 254 B 33USA1540165 1 241 B 33USA1540165 1 262 B 358USA1540165 1 321 C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE) >From the above data there are three unique IDs and four country codes (1, 44, 33 and 358) I want the following two tables Table 1. count the unique ID by country code 1 44 33 358 TOT CAN1540164 34 4 617 USA1540165 2 1 3 1 7 FIN1540166 - - - 1 1 TOT 55 7 8 25 Table 2 Sum of Y variable by unique ID and country. code 1 44 33 358 TOT CAN154016447 90 141 248 526 USA1540165 290 171 757 321 1539 FIN1540166-- - 225 225 TOT 337 261 898794 2290 How do I do it in R? 
The first step is to get the unique country codes unique ID by splitting the new ID Thank you in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] match and new columns
Hi Bill, I put stringsAsFactors = FALSE still did not work. tdat <- read.table(textConnection("A B C Y A12 B03 C04 0.70 A23 B05 C06 0.05 A14 B06 C07 1.20 A25 A23 A12 3.51 A16 A25 A14 2,16"),header = TRUE ,stringsAsFactors = FALSE) tdat$D <- 0 tdat$E <- 0 tdat$D <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0)) tdat$E <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0)) tdat I got this, A B C Y DE 1 A12 B03 C04 0.7000 2 A23 B05 C06 0.0500 3 A14 B06 C07 1.2000 4 A25 A23 A12 3.51 5 A16 A25 A14 2,16 On Wed, Dec 13, 2017 at 7:23 PM, William Dunlap <wdun...@tibco.com> wrote: > Use the stringsAsFactors=FALSE argument to read.table when > making your data.frame - factors are getting in your way here. > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Wed, Dec 13, 2017 at 3:02 PM, Val <valkr...@gmail.com> wrote: > >> Thank you Rui, >> I did not get the desired result. Here is the output from your script >> >>A B CY D E >> 1 A12 <https://maps.google.com/?q=1+A12=gmail=g> B03 C04 >> 0.70 0 0 >> 2 A23 B05 C06 0.05 0 0 >> 3 A14 <https://maps.google.com/?q=3+A14=gmail=g> B06 C07 >> 1.20 0 0 >> 4 A25 A23 A12 3.51 1 1 >> 5 A16 A25 A14 2,16 4 >> <https://maps.google.com/?q=A14+2,16+4=gmail=g> 4 >> >> >> On Wed, Dec 13, 2017 at 4:36 PM, Rui Barradas <ruipbarra...@sapo.pt> >> wrote: >> >> > Hello, >> > >> > Here is one way. 
>> > >> > tdat$D <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0) >> > tdat$E <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0) >> > >> > >> > Hope this helps, >> > >> > Rui Barradas >> > >> > >> > On 12/13/2017 9:36 PM, Val wrote: >> > >> >> Hi all, >> >> >> >> I have a data frame >> >> tdat <- read.table(textConnection("A B C Y >> >> A12 B03 C04 0.70 >> >> A23 B05 C06 0.05 >> >> A14 B06 C07 1.20 >> >> A25 A23 A12 3.51 >> >> A16 A25 A14 2,16 >> <https://maps.google.com/?q=A14+2,16=gmail=g>"),header = >> TRUE) >> >> >> >> I want match tdat$B with tdat$A and populate the column values of >> >> tdat$A >> >> ( col A and Col B) in the newly created columns (col D and col E). >> >> please >> >> find my attempt and the desired output below >> >> >> >> Desired output >> >> A B C Y D E >> >> A12 B03 C04 0.70 0 0 >> >> A23 B05 C06 0.05 0 0 >> >> A14 B06 C07 1.20 0 0 >> >> A25 A23 A12 3.51 B05 C06 >> >> A16 A25 A14 2,16 A23 A12 >> <https://maps.google.com/?q=2,16+A23+A12=gmail=g> >> >> >> >> my attempt, >> >> >> >> tdat$D <- 0 >> >> tdat$E <- 0 >> >> >> >> if(tdat$B %in% tdat$A) >> >>{ >> >>tdat$D <- tdat$A[tdat$B] >> >>tdat$E <- tdat$A[tdat$C] >> >> } >> >> but did not work. >> >> >> >> Thank you in advance >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> __ >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide http://www.R-project.org/posti >> >> ng-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] match and new columns
Thank you Rui, I did not get the desired result. Here is the output from your script A B CY D E 1 A12 B03 C04 0.70 0 0 2 A23 B05 C06 0.05 0 0 3 A14 B06 C07 1.20 0 0 4 A25 A23 A12 3.51 1 1 5 A16 A25 A14 2,16 4 4 On Wed, Dec 13, 2017 at 4:36 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote: > Hello, > > Here is one way. > > tdat$D <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0) > tdat$E <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0) > > > Hope this helps, > > Rui Barradas > > > On 12/13/2017 9:36 PM, Val wrote: > >> Hi all, >> >> I have a data frame >> tdat <- read.table(textConnection("A B C Y >> A12 B03 C04 0.70 >> A23 B05 C06 0.05 >> A14 B06 C07 1.20 >> A25 A23 A12 3.51 >> A16 A25 A14 2,16"),header = TRUE) >> >> I want match tdat$B with tdat$A and populate the column values of >> tdat$A >> ( col A and Col B) in the newly created columns (col D and col E). >> please >> find my attempt and the desired output below >> >> Desired output >> A B C Y D E >> A12 B03 C04 0.70 0 0 >> A23 B05 C06 0.05 0 0 >> A14 B06 C07 1.20 0 0 >> A25 A23 A12 3.51 B05 C06 >> A16 A25 A14 2,16 A23 A12 >> >> my attempt, >> >> tdat$D <- 0 >> tdat$E <- 0 >> >> if(tdat$B %in% tdat$A) >>{ >>tdat$D <- tdat$A[tdat$B] >>tdat$E <- tdat$A[tdat$C] >> } >> but did not work. >> >> Thank you in advance >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] match and new columns
Hi all, I have a data frame tdat <- read.table(textConnection("A B C Y A12 B03 C04 0.70 A23 B05 C06 0.05 A14 B06 C07 1.20 A25 A23 A12 3.51 A16 A25 A14 2,16"),header = TRUE) I want match tdat$B with tdat$A and populate the column values of tdat$A ( col A and Col B) in the newly created columns (col D and col E). please find my attempt and the desired output below Desired output A B C Y D E A12 B03 C04 0.70 0 0 A23 B05 C06 0.05 0 0 A14 B06 C07 1.20 0 0 A25 A23 A12 3.51 B05 C06 A16 A25 A14 2,16 A23 A12 my attempt, tdat$D <- 0 tdat$E <- 0 if(tdat$B %in% tdat$A) { tdat$D <- tdat$A[tdat$B] tdat$E <- tdat$A[tdat$C] } but did not work. Thank you in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
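The attempts in this thread index `tdat$A` by the *value* of `tdat$B`, which is why they fail (with factors it silently indexes by level codes; with characters it returns `NA`). `match()` returns the row positions directly. A sketch with `stringsAsFactors = FALSE`, as suggested in the thread (I read the `2,16` in the sample as `2.16` so `Y` stays numeric):

```r
tdat <- read.table(textConnection("A B C Y
A12 B03 C04 0.70
A23 B05 C06 0.05
A14 B06 C07 1.20
A25 A23 A12 3.51
A16 A25 A14 2.16"), header = TRUE, stringsAsFactors = FALSE)

# Row of column A that each B value points at (NA when B is not found in A)
idx <- match(tdat$B, tdat$A)

# Pull that row's B and C values; "0" where there was no match
tdat$D <- ifelse(is.na(idx), "0", tdat$B[idx])
tdat$E <- ifelse(is.na(idx), "0", tdat$C[idx])
tdat
```

This reproduces the desired output: rows 4 and 5 get `B05 C06` and `A23 A12`, the rest get `0 0`.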
[R] family
Hi all,

I am reading a huge data set (more than 12M rows) that contains family information: Offspring, Parent1 and Parent2. Each parent should also appear in the first column as an offspring, on a row that comes before its own offspring's row; if a parent's own parents are unknown, its Parent1 and Parent2 should be set to zero. The first column should be unique. Here is my sample data set and desired output.

fam <- read.table(textConnection("
offspring Parent1 Parent2
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla
"), header = TRUE)

Desired output:

Offspring Parent1 Parent2
Alex1 0 0
Alexa 0 0
Abbot 0 0
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla

Thank you.
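[A sketch for the founders step: find every parent that never appears in the offspring column and prepend it with unknown (0) parents. Note this handles only the missing-founder rows; it does not re-sort non-founder rows into parents-before-offspring order, which the sample data already satisfies.]

```r
fam <- read.table(textConnection("offspring Parent1 Parent2
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla"), header = TRUE, stringsAsFactors = FALSE)

# Parents that never appear as offspring ("0" marks an unknown parent)
founders <- setdiff(unique(c(fam$Parent1, fam$Parent2)),
                    c(fam$offspring, "0"))

# Prepend founder rows with both parents unknown, then the original rows
fam2 <- rbind(data.frame(offspring = founders,
                         Parent1 = "0", Parent2 = "0",
                         stringsAsFactors = FALSE),
              fam)
```

On the sample, `founders` is Alex1, Alexa and Abbot, giving the desired eight-row output.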
[R] remove
Hi all,

I have a date issue and would appreciate any help. I am reading field data, and in one of the columns I expect a date, but the column also contains non-date values such as character strings and empty fields. Here is a sample of my data:

KL <- read.table(header = TRUE, text = 'ID date
711 Dead
712 Uknown
713 20-11-08
714 11-28-07
301
302 09-02-02
303 09-21-02', stringsAsFactors = FALSE, fill = TRUE)

str(KL)
'data.frame': 7 obs. of 2 variables:
 $ ID  : int 711 712 713 714 301 302 303
 $ date: chr "Dead" "Uknown" "20-11-08" "11-28-07" ...

I wanted to convert the date column as follows:

if (max(unique(nchar(as.character(KL$date==10) {
  KL$date <- as.Date(KL$date,"%m/%d/%Y")
}

but it is not working. How could I remove the entire rows that do not have a date format, and then do the conversion?

Thank you in advance
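[A sketch of one approach, assuming the dates are mm-dd-yy (the "20-11-08" entry is ambiguous and would be dropped under this assumption; the blank field for ID 301 is written as NA here to keep the example self-contained). `as.Date()` returns NA for anything that does not parse, which gives a natural row filter:]

```r
KL <- read.table(header = TRUE, text = 'ID date
711 Dead
712 Uknown
713 20-11-08
714 11-28-07
301 NA
302 09-02-02
303 09-21-02', stringsAsFactors = FALSE)

# NA wherever the string does not parse as a mm-dd-yy date
d <- as.Date(KL$date, format = "%m-%d-%y")

# Keep only rows that parsed, and store the converted dates
KL_clean <- KL[!is.na(d), ]
KL_clean$date <- d[!is.na(d)]
```

If the column mixes two-digit-year and four-digit-year conventions, each format needs its own `as.Date()` pass before filtering.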
Re: [R] New var
Thank you Jeff and All, Within a given time period (say 700 days, from the start day), I am expecting measurements taken at each time interval;. In this case "0" means measurement taken, "1" not taken (stopped or opted out and " -1" don't consider that time period for that individual. This will be compared with the actual measurements taken (Observed- expected) within each time interval. On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > # read.table is NOT part of the data.table package > #library(data.table) > DFM <- read.table( text= > 'obs start end > 1 2/1/2015 1/1/2017 > 2 4/11/2010 1/1/2011 > 3 1/4/2006 5/3/2007 > 4 10/1/2007 1/1/2008 > 5 6/1/2011 1/1/2012 > 6 10/5/2004 12/1/2004 > ',header = TRUE, stringsAsFactors = FALSE) > # cleaner way to compute D > DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" ) > DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" ) > DFM$D <- as.numeric( DFM$end - DFM$start, units="days" ) > # categorize your data into groups > DFM$bin <- cut( DFM$D > , breaks=c( seq( 0, 500, 100 ), Inf ) > , right=FALSE # do not include the right edge > , ordered_result = TRUE > ) > # brute force method you should have been able to figure out to show us > some work > DFM$t1 <- ifelse( DFM$D < 100, 1, 0 ) > DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1, > 0 ) ) > DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1, > 0 ) ) > DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1, > 0 ) ) > DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1, > 0 ) ) > # brute force method with ordered factor > DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 ) > DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" < > DFM$bin, 0, -1 ) ) > DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" < > DFM$bin, 0, -1 ) ) > DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" < > DFM$bin, 0, -1 ) ) > DFM$tf5 <- ifelse( 
"[400,500)" == DFM$bin, 1, ifelse( "[400,500)" < > DFM$bin, 0, -1 ) ) > # less obvious approach using the fact that factors are integers > # and using the outer function to find all combinations of elements of two > vectors > # and the sign function > DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin ) > , 1:5 > , FUN = function(x,y) { > z <- sign(y-x)+1L > ifelse( 2 == z, -1L, z ) > } > ) > > # my result, provided using dput for precise representation > DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710, > 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167, > 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700, > 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L, > 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)", > "[400,500)", "[500,Inf)"), class = c("ordered", "factor")), t1 = c(0, > 0, 0, 1, 0, 1), t2 = c(0, 0, 0, -1, 0, -1), t3 = c(0, 1, 0, -1, > 1, -1), t4 = c(0, -1, 0, -1, -1, -1), t5 = c(0, -1, 1, -1, -1, > -1), tf1 = c(0, 0, 0, 1, 0, 1), tf2 = c(0, 0, 0, -1, 0, -1), > tf3 = c(0, 1, 0, -1, 1, -1), tf4 = c(0, -1, 0, -1, -1, -1 > ), tf5 = c(0, -1, 1, -1, -1, -1), tm1 = c(0, 0, 0, 1, 0, > 1), tm2 = c(0, 0, 0, -1, 0, -1), tm3 = c(0, 1, 0, -1, 1, > -1), tm4 = c(0, -1, 0, -1, -1, -1), tm5 = c(0, -1, 1, -1, > -1, -1)), row.names = c(NA, -6L), .Names = c("obs", "start", > "end", "D", "bin", "t1", "t2", "t3", "t4", "t5", "tf1", "tf2", > "tf3", "tf4", "tf5", "tm1", "tm2", "tm3", "tm4", "tm5"), class = > "data.frame") > > You did not address Bert's request for some context, but I am curious how > he or Peter would have approached this problem, so I encourage you do > provide some insight on the list as to why you are doing this. > > > On Sat, 3 Jun 2017, Val wrote: > > Thank you all for the useful suggestion. I did
Re: [R] New var
Thank you all for the useful suggestion. I did some of my homework. library(data.table) DFM <- read.table(header=TRUE, text='obs start end 1 2/1/2015 1/1/2017 2 4/11/2010 1/1/2011 3 1/4/2006 5/3/2007 4 10/1/2007 1/1/2008 5 6/1/2011 1/1/2012 6 10/5/2004 12/1/2004',stringsAsFactors = FALSE) DFM DFM$D =as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"), as.Date(DFM$start,format="%m/%d/%Y"), units = "days")) DFM output. obs start end D 1 1 2/1/2015 1/1/2017 700 2 2 4/11/2010 1/1/2011 265 3 3 1/4/2006 5/3/2007 484 4 4 10/1/2007 1/1/2008 92 5 5 6/1/2011 1/1/2012 214 6 6 10/5/2004 12/1/2004 57 My problem is how do I get the other new variables obs start end D t1,t2,t3,t4, t5 1, 2/1/2015, 1/1/2017, 700,0,0,0,0,0 2, 4/11/2010, 1/1/2011, 265,0,0,1,-1,-1 3, 1/4/2006, 5/3/2007, 484,0,0,0,0,1 4, 10/1/2007, 1/1/2008, 92,1,-1,-1,-1,-1 5, 6/1/2011, 1/1/2012, 214,0,0,1,-1,-1 6, 10/15/2004,12/1/2004,47,1,-1,-1,-1,-1 Thank you again. On Sat, Jun 3, 2017 at 12:13 AM, Bert Gunter <bgunter.4...@gmail.com> wrote: > Ii is difficult to provide useful help, because you have failed to > read and follow the posting guide. In particular: > > 1. Plain text, not HTML. > 2. Use dput() or provide code to create your example. Text printouts > such as that which you gave require some work to wrangle into into an > example that we can test. > > Specifically: > > 3. Have you gone through any R tutorials?-- it sure doesn't look like > it. We do expect some effort to learn R before posting. > > 4. What is the format of your date columns? character, factors, > POSIX,...? See ?date-time for details. Note particularly the > "difftime" link to obtain intervals. > > 5. ?ifelse for vectorized conditionals. > > Also, you might want to explain the context of what you are trying to > do. I strongly suspect you shouldn't be doing it at all, but that is > just a guess. > > Be sure to cc your reply to the list, not just to me. 
> > Cheers, > Bert > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Fri, Jun 2, 2017 at 8:49 PM, Val <valkr...@gmail.com> wrote: >> Hi all, >> >> I have a data set with time interval and depending on the interval I want >> to create 5 more variables . Sample data below >> >> obs, Start, End >> 1,2/1/2015, 1/1/2017 >> 2,4/11/2010, 1/1/2011 >> 3,1/4/2006, 5/3/2007 >> 4,10/1/2007, 1/1/2008 >> 5,6/1/2011, 1/1/2012 >> 6,10/15/2004,12/1/2004 >> >> First, I want get interval between the start date and end dates >> (End-start). >> >> obs, Start , end, datediff >> 1,2/1/2015, 1/1/2017, 700 >> 2,4/11/2010, 1/1/2011, 265 >> 3,1/4/2006, 5/3/2007, 484 >> 4,10/1/2007, 1/1/2008, 92 >> 5,6/1/2011, 1/1/2012, 214 >> 6,10/15/2004,12/1/2004,47 >> >> Second. I want create 5 more variables t1, t2, t3, t4 and t5 >> The value of each variable is defined as follows >> if datediff < 100 then t1=1, t2=t3=t4=t5=-1. >> if datediff >= 100 and < 200 then t1=0, t2=1,t3=t4=t5=-1, >> if datediff >= 200 and < 300 then t1=0, t2=0,t3=1,t4=t5=-1, >> if datediff >= 300 and < 400 then t1=0, t2=0,t3=0,t4=1,t5=-1, >> if datediff >= 400 and < 500 then t1=0, t2=0,t3=0,t4=0,t5=1, >> if datediff >= 500 then t1=0, t2=0,t3=0,t4=0,t5=0 >> >> The complete out put looks like as follow. >> obs, start, end,datediff, t1, t2, t3, t4, t5 >> 1,2/1/2015, 1/1/2017,700, 0, 0, 0, 0, 0 >> 2, 4/11/2010, 1/1/2011,265, 0, 0, 1, -1, -1 >> 3,1/4/2006, 5/3/2007,484, 0, 0, 0, 0, 1 >> 4, 10/1/2007, 1/1/2008, 92, 1, -1, -1,-1, -1 >> 5 ,6/1/2011,1/1/2012, 214, 0, 0, 1,-1, -1 >> 6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1 >> >> Thank you. 
>> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] New var
Hi all,

I have a data set with a time interval, and depending on the interval I want to create 5 more variables. Sample data below:

obs, Start, End
1,2/1/2015, 1/1/2017
2,4/11/2010, 1/1/2011
3,1/4/2006, 5/3/2007
4,10/1/2007, 1/1/2008
5,6/1/2011, 1/1/2012
6,10/15/2004,12/1/2004

First, I want to get the interval between the start and end dates (End - Start):

obs, Start, end, datediff
1,2/1/2015, 1/1/2017, 700
2,4/11/2010, 1/1/2011, 265
3,1/4/2006, 5/3/2007, 484
4,10/1/2007, 1/1/2008, 92
5,6/1/2011, 1/1/2012, 214
6,10/15/2004,12/1/2004,47

Second, I want to create 5 more variables t1, t2, t3, t4 and t5. The value of each variable is defined as follows:

if datediff < 100 then t1=1, t2=t3=t4=t5=-1
if datediff >= 100 and < 200 then t1=0, t2=1, t3=t4=t5=-1
if datediff >= 200 and < 300 then t1=0, t2=0, t3=1, t4=t5=-1
if datediff >= 300 and < 400 then t1=0, t2=0, t3=0, t4=1, t5=-1
if datediff >= 400 and < 500 then t1=0, t2=0, t3=0, t4=0, t5=1
if datediff >= 500 then t1=0, t2=0, t3=0, t4=0, t5=0

The complete output looks as follows:

obs, start, end, datediff, t1, t2, t3, t4, t5
1, 2/1/2015, 1/1/2017, 700, 0, 0, 0, 0, 0
2, 4/11/2010, 1/1/2011, 265, 0, 0, 1, -1, -1
3, 1/4/2006, 5/3/2007, 484, 0, 0, 0, 0, 1
4, 10/1/2007, 1/1/2008, 92, 1, -1, -1, -1, -1
5, 6/1/2011, 1/1/2012, 214, 0, 0, 1, -1, -1
6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1

Thank you.
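[The rules above reduce to: t_k is 1 in the 100-day band containing datediff, -1 in every later band, 0 in every earlier band, and all 0 at 500+. That pattern can be generated in one shot with `findInterval()` and `outer()`, a compact variant of Jeff's solutions elsewhere in the thread:]

```r
DFM <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'obs start end
1 2/1/2015 1/1/2017
2 4/11/2010 1/1/2011
3 1/4/2006 5/3/2007
4 10/1/2007 1/1/2008
5 6/1/2011 1/1/2012
6 10/5/2004 12/1/2004')

DFM$D <- as.numeric(as.Date(DFM$end,   "%m/%d/%Y") -
                    as.Date(DFM$start, "%m/%d/%Y"))

# Band index: 1 = [0,100), 2 = [100,200), ..., 6 = [500,Inf)
band <- findInterval(DFM$D, c(0, 100, 200, 300, 400, 500))

# 1 in the band itself, -1 in later bands, 0 in earlier ones; band 6 is all 0
tmat <- outer(band, 1:5,
              function(b, k) ifelse(k == b, 1L,
                              ifelse(k > b & b <= 5, -1L, 0L)))
colnames(tmat) <- paste0("t", 1:5)
DFM <- cbind(DFM, tmat)
```

For example, D = 265 falls in band 3, giving t1=0, t2=0, t3=1, t4=-1, t5=-1 as in the desired output.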
[R] missing and replace
Hi all,

I have a data frame with three variables. Some of the variables have missing values (represented by NA), and I want to replace those missing values with the mean of that variable, rounded to the nearest whole number. In this sample data, variables y and z have missing values; the means of y and z are 152.25 and 359.5, respectively.

DF1 <- read.table(header = TRUE, text = 'ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360')

mean x = 36.2
mean y = 152.25
mean z = 359.5

Output:

ID1  x   y   z
1   25 122 352
2   30 135 376
3   40 152 350
4   26 157 360
5   60 195 360

Thank you in advance
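[A sketch of the replacement: compute each column's mean with `na.rm = TRUE`, round it, and assign it into the NA positions. Note `round(359.5)` gives 360 under R's round-half-to-even rule, matching the desired output.]

```r
DF1 <- read.table(header = TRUE, text = 'ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360')

# Replace NAs in each measurement column with the rounded column mean
for (v in c("x", "y", "z")) {
  m <- round(mean(DF1[[v]], na.rm = TRUE))
  DF1[[v]][is.na(DF1[[v]])] <- m
}
```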
[R] combination
Hi all,

I have two variables, x and y. x has five observations and y has three. I want to combine each element of x with each element of y to produce 15 observations. Below are my sample data and desired output.

data

x Y
1 A
2 B
3 C
4
5

Output

1 A
1 B
1 C
2 A
2 B
2 C
3 A
3 B
3 C
4 A
4 B
4 C
5 A
5 B
5 C

Thank you in advance
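[This is a Cartesian product, which `expand.grid()` produces directly. It varies its first argument fastest, so putting y first and then reordering the columns gives the row order shown above:]

```r
x <- 1:5
y <- c("A", "B", "C")

# y varies fastest (1 A, 1 B, 1 C, 2 A, ...); reorder columns to x, y
out <- expand.grid(y = y, x = x, stringsAsFactors = FALSE)[, c("x", "y")]
```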
[R] create new
Hi all,

I have several variables, and one group contains three of them (x1, x2 and x3). Sample of data (Year, x1, x3 and x2):

mydat <- read.table(header = TRUE, text = '
Year x1 x3 x2
Year1 10 12 0
Year2 0 15 0
Year3 0 0 20
Year4 25 0 12
Year5 15 25 12
Year6 0 16 14
Year7 0 10 0')

I want to create another variable (x4) based on the following conditions:

if x1 > 0 then x4 = x1, regardless of the x2 and x3 values
if x1 = 0 and x2 > 0 then x4 = x2
if x1 = 0 and x2 = 0 then x4 = x3

The desired output looks as follows:

Year  x1 x3 x2 x4
Year1 10 12  0 10
Year2  0 15  0 15
Year3  0  0 20 20
Year4 25  0 12 25
Year5 15 25 12 15
Year6  0 16 14 14
Year7  0 10  0 10

Thank you in advance
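[The three-way rule is a nested vectorized conditional, so a pair of `ifelse()` calls covers it:]

```r
mydat <- read.table(header = TRUE, text = '
Year x1 x3 x2
Year1 10 12 0
Year2 0 15 0
Year3 0 0 20
Year4 25 0 12
Year5 15 25 12
Year6 0 16 14
Year7 0 10 0')

# x1 wins if positive; otherwise x2 if positive; otherwise fall back to x3
mydat$x4 <- ifelse(mydat$x1 > 0, mydat$x1,
            ifelse(mydat$x2 > 0, mydat$x2, mydat$x3))
```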
[R] screen
Hi all,

I have some data to be screened based on a recording flag (obs): 1 = proper record, 0 = improper record. The recording period starts in week 1, but not all families start recording observations properly in the same week.

DF2 <- read.table(header = TRUE, text = 'family time obs
A WEEK1 0
A WEEK1 0
A WEEK1 0
A WEEK2 1
A WEEK2 0
A WEEK3 1
A WEEK3 0
B WEEK1 1
B WEEK1 0
B WEEK1 1
B WEEK2 0
B WEEK2 0
B WEEK3 1
B WEEK3 0
C WEEK3 0
C WEEK3 0
C WEEK4 1
C WEEK4 1')

For example, in week 1 all records of family "A" are 0 (improper), but starting in week 2 the family begins recording proper (1) records as well. I then create a table showing, for each family within each week, the ratio of proper records to total records. If the ratio is zero and the family has no prior proper recordings, I want to delete those records. However, once a family has started showing proper records ("1"), I want to keep its records even if the ratio in a subsequent week is 0 (for example, the week 2 records of family B).

Here is the summary table:

      WEEK1 WEEK2 WEEK3 WEEK4
A     0     0.5   0.5   .
B     0.33  0     0.5   .
C     .     .     0     1

From the above table:
For A - exclude all week 1 records and keep the rest, because it was not yet recording properly.
For B - keep all records, as it started recording properly from the beginning.
For C - keep only the week 4 records, because all of them are 1's.

The final, desired result is:

A WEEK2 1
A WEEK2 0
A WEEK3 1
A WEEK3 0
B WEEK1 1
B WEEK1 0
B WEEK1 1
B WEEK2 0
B WEEK2 0
B WEEK3 1
B WEEK3 0
C WEEK4 1
C WEEK4 1

and the summary table then looks as follows:

      WEEK1 WEEK2 WEEK3 WEEK4
A     .     0.5   0.5   .
B     0.33  0     0.5   .
C     .     .     .     1

Thank you in advance
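[The keep/drop rule reduces to: for each family, find the first week containing at least one proper (obs == 1) record, and keep all rows from that week onward. A sketch:]

```r
DF2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'family time obs
A WEEK1 0
A WEEK1 0
A WEEK1 0
A WEEK2 1
A WEEK2 0
A WEEK3 1
A WEEK3 0
B WEEK1 1
B WEEK1 0
B WEEK1 1
B WEEK2 0
B WEEK2 0
B WEEK3 1
B WEEK3 0
C WEEK3 0
C WEEK3 0
C WEEK4 1
C WEEK4 1')

wk <- as.integer(sub("WEEK", "", DF2$time))

# First week in which each family has at least one proper record
first_ok <- tapply(wk[DF2$obs == 1], DF2$family[DF2$obs == 1], min)

# Keep rows from that week onward; families with no 1s at all drop entirely
keep <- wk >= first_ok[DF2$family]
keep[is.na(keep)] <- FALSE
res <- DF2[keep, ]

# Ratio of proper records per family and week, for the summary table
with(res, tapply(obs, list(family, time), mean))
```

Here `first_ok` is A = 2, B = 1, C = 4, which yields exactly the 13 desired rows.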
[R] replace
Hi all,

If the first name is "Alex", I want to concatenate the second column (YR) onto it, producing "Alex" joined with the year value.

DF1 <- read.table(header = TRUE, text = 'first YR
Alex 2001
Bob 2001
Cory 2001
Cory 2002
Bob 2002
Bob 2003
Alex 2002
Alex 2003
Alex 2004')

Desired output data frame (DF2):

Alex-2001 2001
Bob       2001
Cory      2001
Cory      2002
Bob       2002
Bob       2003
Alex-2002 2002
Alex-2003 2003
Alex-2004 2004

I tried this, but it did not work:

DF1$first[DF1$first=="Alex"] <- paste(DF1$first, DF1$YR, sep='-')

Thank you in advance
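[The attempt fails because the left side selects only the "Alex" rows while the right side builds a value for every row, so the lengths disagree (and with factor columns the assignment can also introduce NA levels). Subsetting both sides with the same logical index fixes it:]

```r
DF1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'first YR
Alex 2001
Bob 2001
Cory 2001
Cory 2002
Bob 2002
Bob 2003
Alex 2002
Alex 2003
Alex 2004')

# Use the same logical index on both sides so lengths match
sel <- DF1$first == "Alex"
DF1$first[sel] <- paste(DF1$first[sel], DF1$YR[sel], sep = "-")
```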
Re: [R] [FORGED] if and
Thank you Rolf and Bert! I found the problem and this if(country="USA" & year-month = "FEB2015" | "FEB2012" ){ has be changed to this if(country="USA" & year-month == "FEB2015" | year-month == "FEB2012" ){ On Mon, Feb 27, 2017 at 8:45 PM, Bert Gunter <bgunter.4...@gmail.com> wrote: > I note that you have "Year-month" (capital 'Y') and "year-month" in > your code; case matters in R. > > Otherwise, Rolf's advice applies. > > Cheers, > Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Mon, Feb 27, 2017 at 6:16 PM, Rolf Turner <r.tur...@auckland.ac.nz> > wrote: > > On 28/02/17 14:47, Val wrote: > >> > >> Currently I have about six or more scripts that do the same job. I > >> thought it might be possible and more efficient to use one script by > using > >> IF ELSE statements. Here is an example but this will be expandable for > >> several countries ans year-months > >> > >> > >> Year-month = FEB2015, FEB2012, Feb2010 > >> country = USA, CAN.MEX > >> First I want to do if country = USA and year-month = FEB2015, FEB2012 do > >> the statements > >> second if country = CAN and year-month =Feb2010 do the statements > >> > >> > >> if(country="USA" & year-month = "FEB2015" | "FEB2012" ){ > >> statemnt1 > >> . > >> statemnt10 > >> > >> } else if (country="USA" & year-month ="FEB2015") { > >> statemnt1 > >> . > >> statemnt10 > >> } > >> > >> else > >> { > >> statemnt1 > >> . > >> statemnt10 > >> } > >> > >> The above script did not work. is there a different ways of doing it? > > > > > > Uh, yes. Get the syntax right. Use R, when you are using R. > > > > Looking at ?Syntax and ?Logic might help you a bit. > > > > Other than that, there's not much that one can say without seeing a > > reproducible example. 
And if you sat down and wrote out a *reproducible > > example*, using correct R syntax, you probably wouldn't need any > assistance > > from R-help. > > > > Have you read any of the readily available R tutorials? If not do so. If > > so, read them again and actually take note of what they say! > > > > cheers, > > > > Rolf Turner > > > > -- > > Technical Editor ANZJS > > Department of Statistics > > University of Auckland > > Phone: +64-9-373-7599 ext. 88276 > > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] if and
Currently I have about six or more scripts that do the same job. I thought it might be possible, and more efficient, to use one script with IF/ELSE statements. Here is an example, but this will be expandable to several countries and year-months.

Year-month = FEB2015, FEB2012, Feb2010
country = USA, CAN, MEX

First, if country = USA and year-month is FEB2015 or FEB2012, run one set of statements; second, if country = CAN and year-month = Feb2010, run another set.

if(country="USA" & year-month = "FEB2015" | "FEB2012" ){
  statement1
  .
  statement10
} else if (country="USA" & year-month ="FEB2015") {
  statement1
  .
  statement10
} else {
  statement1
  .
  statement10
}

The above script did not work. Is there a different way of doing it?

Thank you in advance.
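[Three separate fixes are needed: `year-month` is not a valid R name (the `-` is subtraction), comparison is `==` rather than `=`, and "either FEB2015 or FEB2012" must compare both values, which `%in%` expresses cleanly. A sketch with placeholder statements:]

```r
country    <- "USA"
year_month <- "FEB2015"   # "year-month" is not a valid name; use year_month

if (country == "USA" && year_month %in% c("FEB2015", "FEB2012")) {
  message("USA branch: FEB2015 or FEB2012")   # statements 1..10 go here
} else if (country == "CAN" && year_month == "FEB2010") {
  message("CAN branch: FEB2010")
} else {
  message("default branch")
}
```

Note `if()` needs a single TRUE/FALSE, so the scalar `&&` form is appropriate here; `&` and `|` are the element-wise vector operators.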
Re: [R] remove
Hi Jeff and All, When I examined the excluded data, ie., first name with with different last names, I noticed that some last names were not recorded or instance, I modified the data as follows DF <- read.table( text= 'first week last Alex1 West Bob 1 John Cory1 Jack Cory2 - Bob 2 John Bob 3 John Alex2 Joseph Alex3 West Alex4 West ', header = TRUE, as.is = TRUE ) err2 <- ave( seq_along( DF$first ) , DF[ , "first", drop = FALSE] , FUN = function( n ) { length( unique( DF[ n, "last" ] ) ) } ) result2 <- DF[ 1 == err2, ] result2 first week last 2 Bob1 John 5 Bob2 John 6 Bob3 John However, I want keep Cory's record. It is assumed that not recorded should have the same last name. Final out put should be first week last Bob1 John Bob2 John Bob3 John Cory1 Jack Cory2 - Thank you again! On Sun, Feb 12, 2017 at 7:28 PM, Val <valkr...@gmail.com> wrote: > Sorry Jeff, I did not finish my email. I accidentally touched the send > button. > My question was the > when I used this one > length(unique(result2$first)) > vs > dim(result2[!duplicated(result2[,c('first')]),]) [1] > > I did get different results but now I found out the problem. > > Thank you!. > > > > > > > > > On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller > <jdnew...@dcn.davis.ca.us> wrote: >> Your question mystifies me, since it looks to me like you already know the >> answer. >> -- >> Sent from my phone. Please excuse my brevity. >> >> On February 12, 2017 3:30:49 PM PST, Val <valkr...@gmail.com> wrote: >>>Hi Jeff and all, >>> How do I get the number of unique first names in the two data sets? >>> >>>for the first one, >>>result2 <- DF[ 1 == err2, ] >>>length(unique(result2$first)) >>> >>> >>> >>> >>>On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller >>><jdnew...@dcn.davis.ca.us> wrote: >>>> The "by" function aggregates and returns a result with generally >>>fewer rows >>>> than the original data. 
Since you are looking to index the rows in >>>the >>>> original data set, the "ave" function is better suited because it >>>always >>>> returns a vector that is just as long as the input vector: >>>> >>>> # I usually work with character data rather than factors if I plan >>>> # to modify the data (e.g. removing rows) >>>> DF <- read.table( text= >>>> 'first week last >>>> Alex1 West >>>> Bob 1 John >>>> Cory1 Jack >>>> Cory2 Jack >>>> Bob 2 John >>>> Bob 3 John >>>> Alex2 Joseph >>>> Alex3 West >>>> Alex4 West >>>> ', header = TRUE, as.is = TRUE ) >>>> >>>> err <- ave( DF$last >>>> , DF[ , "first", drop = FALSE] >>>> , FUN = function( lst ) { >>>> length( unique( lst ) ) >>>> } >>>> ) >>>> result <- DF[ "1" == err, ] >>>> result >>>> >>>> Notice that the ave function returns a vector of the same type as was >>>given >>>> to it, so even though the function returns a numeric the err >>>> vector is character. >>>> >>>> If you wanted to be able to examine more than one other column in >>>> determining the keep/reject decision, you could do: >>>> >>>> err2 <- ave( seq_along( DF$first ) >>>>, DF[ , "first", drop = FALSE] >>>>, FUN = function( n ) { >>>> length( unique( DF[ n, "last" ] ) ) >>>> } >>>>) >>>> result2 <- DF[ 1 == err2, ] >>>> result2 >>>> >>>> and then you would have the option to re-use the "n" index to look at >>>other >>>> columns as well. >>>> >>>> Finally, here is a dplyr solution: >>>> >>>> library(dplyr) >>>> result3 <- ( DF >>>>%>% group_by( first ) # like a prep for ave or by >>>>%>% mutate( err = length( unique( last ) ) ) # similar to >>>ave >>>>%>% filter( 1 == err ) # drop the rows with too many last >
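[For the follow-up twist above — an unrecorded last name ("-") should be treated as agreeing with the known one — Jeff's `ave()` pattern still works if the placeholder is excluded before counting distinct last names. A sketch:]

```r
DF <- read.table(text = 'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 -
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West', header = TRUE, as.is = TRUE)

# Count distinct last names per first name, ignoring the "-" placeholder
n_last <- ave(seq_along(DF$first), DF$first,
              FUN = function(i) length(unique(setdiff(DF$last[i], "-"))))

# Keep people with at most one recorded last name; Cory's "-" is assumed
# to match the known "Jack", so Cory's rows survive
result <- DF[n_last <= 1, ]
```

On this sample, `result` holds the three Bob rows and both Cory rows, with all Alex rows dropped.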
Re: [R] remove
Sorry Jeff, I did not finish my email. I accidentally touched the send button. My question was the when I used this one length(unique(result2$first)) vs dim(result2[!duplicated(result2[,c('first')]),]) [1] I did get different results but now I found out the problem. Thank you!. On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > Your question mystifies me, since it looks to me like you already know the > answer. > -- > Sent from my phone. Please excuse my brevity. > > On February 12, 2017 3:30:49 PM PST, Val <valkr...@gmail.com> wrote: >>Hi Jeff and all, >> How do I get the number of unique first names in the two data sets? >> >>for the first one, >>result2 <- DF[ 1 == err2, ] >>length(unique(result2$first)) >> >> >> >> >>On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller >><jdnew...@dcn.davis.ca.us> wrote: >>> The "by" function aggregates and returns a result with generally >>fewer rows >>> than the original data. Since you are looking to index the rows in >>the >>> original data set, the "ave" function is better suited because it >>always >>> returns a vector that is just as long as the input vector: >>> >>> # I usually work with character data rather than factors if I plan >>> # to modify the data (e.g. removing rows) >>> DF <- read.table( text= >>> 'first week last >>> Alex1 West >>> Bob 1 John >>> Cory1 Jack >>> Cory2 Jack >>> Bob 2 John >>> Bob 3 John >>> Alex2 Joseph >>> Alex3 West >>> Alex4 West >>> ', header = TRUE, as.is = TRUE ) >>> >>> err <- ave( DF$last >>> , DF[ , "first", drop = FALSE] >>> , FUN = function( lst ) { >>> length( unique( lst ) ) >>> } >>> ) >>> result <- DF[ "1" == err, ] >>> result >>> >>> Notice that the ave function returns a vector of the same type as was >>given >>> to it, so even though the function returns a numeric the err >>> vector is character. 
>>> >>> If you wanted to be able to examine more than one other column in >>> determining the keep/reject decision, you could do: >>> >>> err2 <- ave( seq_along( DF$first ) >>>, DF[ , "first", drop = FALSE] >>>, FUN = function( n ) { >>> length( unique( DF[ n, "last" ] ) ) >>> } >>>) >>> result2 <- DF[ 1 == err2, ] >>> result2 >>> >>> and then you would have the option to re-use the "n" index to look at >>other >>> columns as well. >>> >>> Finally, here is a dplyr solution: >>> >>> library(dplyr) >>> result3 <- ( DF >>>%>% group_by( first ) # like a prep for ave or by >>>%>% mutate( err = length( unique( last ) ) ) # similar to >>ave >>>%>% filter( 1 == err ) # drop the rows with too many last >>names >>>%>% select( -err ) # drop the temporary column >>>%>% as.data.frame # convert back to a plain-jane data >>frame >>>) >>> result3 >>> >>> which uses a small set of verbs in a pipeline of functions to go from >>input >>> to result in one pass. >>> >>> If your data set is really big (running out of memory big) then you >>might >>> want to investigate the data.table or sqlite packages, either of >>which can >>> be combined with dplyr to get a standardized syntax for managing >>larger >>> amounts of data. However, most people actually aren't running out of >>memory >>> so in most cases the extra horsepower isn't actually needed. >>> >>> >>> On Sun, 12 Feb 2017, P Tennant wrote: >>> >>>> Hi Val, >>>> >>>> The by() function could be used here. With the dataframe dfr: >>>> >>>> # split the data by first name and check for more than one last name >>for >>>> each first name >>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1) >>>> # make the result more easily manipulated >>>> res <- as.table(res) >>>> res >>>> # first >>>> # Alex Bob Cory >
Re: [R] remove
Hi Jeff and all, How do I get the number of unique first names in the two data sets? for the first one, result2 <- DF[ 1 == err2, ] length(unique(result2$first)) On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote: > The "by" function aggregates and returns a result with generally fewer rows > than the original data. Since you are looking to index the rows in the > original data set, the "ave" function is better suited because it always > returns a vector that is just as long as the input vector: > > # I usually work with character data rather than factors if I plan > # to modify the data (e.g. removing rows) > DF <- read.table( text= > 'first week last > Alex1 West > Bob 1 John > Cory1 Jack > Cory2 Jack > Bob 2 John > Bob 3 John > Alex2 Joseph > Alex3 West > Alex4 West > ', header = TRUE, as.is = TRUE ) > > err <- ave( DF$last > , DF[ , "first", drop = FALSE] > , FUN = function( lst ) { > length( unique( lst ) ) > } > ) > result <- DF[ "1" == err, ] > result > > Notice that the ave function returns a vector of the same type as was given > to it, so even though the function returns a numeric the err > vector is character. > > If you wanted to be able to examine more than one other column in > determining the keep/reject decision, you could do: > > err2 <- ave( seq_along( DF$first ) >, DF[ , "first", drop = FALSE] >, FUN = function( n ) { > length( unique( DF[ n, "last" ] ) ) > } >) > result2 <- DF[ 1 == err2, ] > result2 > > and then you would have the option to re-use the "n" index to look at other > columns as well. 
> > Finally, here is a dplyr solution: > > library(dplyr) > result3 <- ( DF >%>% group_by( first ) # like a prep for ave or by >%>% mutate( err = length( unique( last ) ) ) # similar to ave >%>% filter( 1 == err ) # drop the rows with too many last names >%>% select( -err ) # drop the temporary column >%>% as.data.frame # convert back to a plain-jane data frame >) > result3 > > which uses a small set of verbs in a pipeline of functions to go from input > to result in one pass. > > If your data set is really big (running out of memory big) then you might > want to investigate the data.table or sqlite packages, either of which can > be combined with dplyr to get a standardized syntax for managing larger > amounts of data. However, most people actually aren't running out of memory > so in most cases the extra horsepower isn't actually needed. > > > On Sun, 12 Feb 2017, P Tennant wrote: > >> Hi Val, >> >> The by() function could be used here. With the dataframe dfr: >> >> # split the data by first name and check for more than one last name for >> each first name >> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1) >> # make the result more easily manipulated >> res <- as.table(res) >> res >> # first >> # Alex Bob Cory >> # TRUE FALSE FALSE >> >> # then use this result to subset the data >> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ] >> # sort if needed >> nw.dfr[order(nw.dfr$first) , ] >> >> first week last >> 2 Bob1 John >> 5 Bob2 John >> 6 Bob3 John >> 3 Cory1 Jack >> 4 Cory2 Jack >> >> >> Philip >> >> On 12/02/2017 4:02 PM, Val wrote: >>> >>> Hi all, >>> I have a big data set and want to remove rows conditionally. >>> In my data file each person were recorded for several weeks. Somehow >>> during the recording periods, their last name was misreported. For >>> each person, the last name should be the same. Otherwise remove from >>> the data. Example, in the following data set, Alex was found to have >>> two last names . 
>>> >>> Alex West >>> Alex Joseph >>> >>> Alex should be removed from the data. if this happens then I want >>> remove all rows with Alex. Here is my data set >>> >>> df<- read.table(header=TRUE, text='first week last >>> Alex1 West >>> Bob 1 John >>> Cory1 Jack >>> Cory2 Jack >>> Bob 2 John >>> Bob 3 John >>> Alex2 Joseph >>> Alex3 West >>> Alex4 W
Re: [R] [FORGED] Re: remove
Thank you Rainer,

The question was:
1. Identify those first names with more than one last name.
2. Once identified (like Alex), exclude them, because such records are not reliable.

On Sun, Feb 12, 2017 at 11:17 AM, Rainer Schuermann <rainer.schuerm...@gmx.net> wrote:
> I may not be understanding the question well enough but for me
>
> df[ df[ , "first"] != "Alex", ]
>
> seems to do the job:
>
>   first week last
>
> Rainer
>
> On Sonntag, 12. Februar 2017 19:04:19 CET Rolf Turner wrote:
>> On 12/02/17 18:36, Bert Gunter wrote:
>> > Basic stuff!
>> >
>> > Either subscripting or ?subset.
>> >
>> > There are many good R tutorials on the web. You should spend some
>> > (more?) time with some.
>>
>> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
>> seem basic to me. The only way that I can see how to go at it is via
>> a for loop:
>>
>> rdln <- function(X) {
>>     # Remove discordant last names.
>>     ok <- logical(nrow(X))
>>     for (nm in unique(X$first)) {
>>         xxx <- unique(X$last[X$first == nm])
>>         if (length(xxx) == 1) ok[X$first == nm] <- TRUE
>>     }
>>     Y <- X[ok, ]
>>     Y <- Y[order(Y$first), ]
>>     rownames(Y) <- 1:nrow(Y)
>>     Y
>> }
>>
>> Calling the toy data frame "melvin" rather than "df" (since "df" is the
>> name of the built-in F density function, it is bad form to use it as the
>> name of another object) I get:
>>
>> > rdln(melvin)
>>   first week last
>> 1   Bob    1 John
>> 2   Bob    2 John
>> 3   Bob    3 John
>> 4  Cory    1 Jack
>> 5  Cory    2 Jack
>>
>> which is the desired output. If there is a "basic stuff" way to do this
>> I'd like to see it. Perhaps I will then be toadally embarrassed, but
>> they say that this is good for one.
>>
>> cheers,
>>
>> Rolf
>>
>> > On Sat, Feb 11, 2017 at 9:02 PM, Val <valkr...@gmail.com> wrote:
>> >> Hi all,
>> >> I have a big data set and want to remove rows conditionally.
>> >> In my data file each person was recorded for several weeks. Somehow
>> >> during the recording periods, their last name was misreported. For
>> >> each person, the last name should be the same. Otherwise remove from
>> >> the data. Example, in the following data set, Alex was found to have
>> >> two last names.
>> >>
>> >> Alex West
>> >> Alex Joseph
>> >>
>> >> Alex should be removed from the data. If this happens then I want to
>> >> remove all rows with Alex. Here is my data set
>> >>
>> >> df <- read.table(header=TRUE, text='first week last
>> >> Alex 1 West
>> >> Bob 1 John
>> >> Cory 1 Jack
>> >> Cory 2 Jack
>> >> Bob 2 John
>> >> Bob 3 John
>> >> Alex 2 Joseph
>> >> Alex 3 West
>> >> Alex 4 West ')
>> >>
>> >> Desired output
>> >>
>> >>   first week last
>> >> 1   Bob    1 John
>> >> 2   Bob    2 John
>> >> 3   Bob    3 John
>> >> 4  Cory    1 Jack
>> >> 5  Cory    2 Jack

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
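For later readers of this thread: Rolf's loop can also be collapsed into a couple of vectorized base-R lines. This is a sketch against the toy data frame reconstructed from the posts above, using tapply() to count distinct last names per first name:

```r
# Toy data from the thread (reconstructed)
melvin <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
')

# number of distinct last names per first name
n.last <- tapply(melvin$last, melvin$first, function(x) length(unique(x)))

# keep only first names with exactly one last name, then sort
res <- melvin[melvin$first %in% names(n.last)[n.last == 1L], ]
res <- res[order(res$first), ]
rownames(res) <- NULL
res
```

This gives the five Bob/Cory rows and drops Alex, matching Rolf's rdln() output.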
Re: [R] remove
Jeff, Rolf and Philip,

Thank you very much for your suggestions. Jeff, you suggested that if my data is big I should consider data.table. My data is "big" (more than 200M records) and I will see if this function works. Thank you again.

On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first week last
> Alex 1 West
> Bob 1 John
> Cory 1 Jack
> Cory 2 Jack
> Bob 2 John
> Bob 3 John
> Alex 2 Joseph
> Alex 3 West
> Alex 4 West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>           , DF[ , "first", drop = FALSE ]
>           , FUN = function( lst ) {
>               length( unique( lst ) )
>             }
>           )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric, the err
> vector is character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>            , DF[ , "first", drop = FALSE ]
>            , FUN = function( n ) {
>                length( unique( DF[ n, "last" ] ) )
>              }
>            )
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
>
> Finally, here is a dplyr solution:
>
> library(dplyr)
> result3 <- ( DF
>   %>% group_by( first )                         # like a prep for ave or by
>   %>% mutate( err = length( unique( last ) ) )  # similar to ave
>   %>% filter( 1 == err )   # drop the rows with too many last names
>   %>% select( -err )       # drop the temporary column
>   %>% as.data.frame        # convert back to a plain-jane data frame
> )
> result3
>
> which uses a small set of verbs in a pipeline of functions to go from input
> to result in one pass.
>
> If your data set is really big (running out of memory big) then you might
> want to investigate the data.table or sqlite packages, either of which can
> be combined with dplyr to get a standardized syntax for managing larger
> amounts of data. However, most people actually aren't running out of memory,
> so in most cases the extra horsepower isn't actually needed.
>
> On Sun, 12 Feb 2017, P Tennant wrote:
>
>> Hi Val,
>>
>> The by() function could be used here. With the dataframe dfr:
>>
>> # split the data by first name and check for more than one last name
>> # for each first name
>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>> # make the result more easily manipulated
>> res <- as.table(res)
>> res
>> # first
>> #  Alex   Bob  Cory
>> #  TRUE FALSE FALSE
>>
>> # then use this result to subset the data
>> nw.dfr <- dfr[!dfr$first %in% names(res[res]), ]
>> # sort if needed
>> nw.dfr[order(nw.dfr$first), ]
>>
>>   first week last
>> 2   Bob    1 John
>> 5   Bob    2 John
>> 6   Bob    3 John
>> 3  Cory    1 Jack
>> 4  Cory    2 Jack
>>
>> Philip
>>
>> On 12/02/2017 4:02 PM, Val wrote:
>>> Hi all,
>>> I have a big data set and want to remove rows conditionally.
>>> In my data file each person was recorded for several weeks. Somehow
>>> during the recording periods, their last name was misreported. For
>>> each person, the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names.
>>>
>>> Alex West
>>> Alex Joseph
>>>
>>> Alex should be removed from the data. If this happens then I want to
>>> remove all rows with Alex. Here is my data set
>>>
>>> df <- read.table(header=TRUE, text='first week last
>>> Alex 1 West
>>> Bob 1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob 2 John
>>> Bob 3 John
>>>
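Since data.table came up for the 200M-record case, here is a minimal sketch of the same filter in data.table syntax. This assumes the data.table package is installed; the data frame is the same toy example as in Jeff's post, written out inline:

```r
library(data.table)

DT <- data.table(first = c("Alex", "Bob", "Cory", "Cory", "Bob", "Bob", "Alex", "Alex", "Alex"),
                 week  = c(1, 1, 1, 2, 2, 3, 2, 3, 4),
                 last  = c("West", "John", "Jack", "Jack", "John", "John", "Joseph", "West", "West"))

# keep only the groups whose last name is unique, in one grouped pass
result <- DT[, if (uniqueN(last) == 1L) .SD, by = first]
result
```

The `if (cond) .SD` idiom returns a group's rows only when the condition holds, so the Alex rows drop out without an explicit join or second pass.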
[R] remove
Hi all,

I have a big data set and want to remove rows conditionally. In my data file each person was recorded for several weeks. Somehow during the recording periods, their last name was misreported. For each person, the last name should be the same; otherwise remove them from the data. For example, in the following data set, Alex was found to have two last names:

Alex West
Alex Joseph

Alex should be removed from the data. If this happens then I want to remove all rows with Alex. Here is my data set:

df <- read.table(header=TRUE, text='first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West ')

Desired output

  first week last
1   Bob    1 John
2   Bob    2 John
3   Bob    3 John
4  Cory    1 Jack
5  Cory    2 Jack

Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] output
Hi Marc and all,

Last time you suggested the WriteXLS function to write more than 65,000 rows to Excel. Creating the file worked fine. Now I want to read the file back in, but I have a problem: the file has more than one sheet. Here are the script and the error message.

datx <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
dat <- data.frame(datx(11, 10, 2))
WriteXLS(dat, "test5.xlsx", row.names = FALSE)

I created several sheets by copying the first sheet, then

t1 <- read.xls("Test6.xlsx", 2, stringsAsFactors = FALSE)

gives the error message

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input

Thank you in advance

On Tue, Dec 13, 2016 at 5:07 PM, Val <valkr...@gmail.com> wrote:
> Marc,
> Thank you so much! That was a helpful comment.
>
> On Mon, Dec 12, 2016 at 10:09 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
>> Hi,
>>
>> With the WriteXLS() function, from the package of the same name, if you
>> specify '.xlsx' for the file name extension, the function will create an
>> Excel 2007 compatible file, which can handle worksheets of up to 1,048,576
>> rows by 16,384 columns.
>>
>> Thus:
>>
>> WriteXLS(dat, "test4.xlsx", row.names = FALSE)
>>
>> That is all described in the help file for the function.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>> On Dec 12, 2016, at 6:51 PM, Val <valkr...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have a data frame with more than 100,000 rows.
>>>
>>> datx <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
>>> dat <- datx(11, 10, 2)
>>>
>>> 1)
>>> WriteXLS(dat, "test4.xls", row.names = FALSE)
>>> Error in WriteXLS(dat, "test4.xls", row.names = FALSE) :
>>>   One or more of the data frames named in 'x' exceeds 65,535 rows or 256
>>>   columns
>>>
>>> I noticed that *.xls has row and column limitations.
>>>
>>> How can I take the excess rows to the next sheet?
>>>
>>> 2) I also tried to use xlsx and have a problem:
>>>
>>> write.xlsx(dat, "test3.xlsx", sheetName = "sheet1", row.names = FALSE)
>>> Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") :
>>>   java.lang.OutOfMemoryError: Java heap space
>>>
>>> Any help?
>>> Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
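One note for the archives: WriteXLS only writes; a separate package is needed to read the file back. An untested sketch, assuming the readxl package is installed and that Test6.xlsx (the hypothetical file from the post) really has a populated second sheet:

```r
library(readxl)

path <- "Test6.xlsx"   # hypothetical file from the post
if (file.exists(path)) {
  print(excel_sheets(path))          # list the sheet names first
  t1 <- read_excel(path, sheet = 2)  # read the second sheet by position
}
```

Checking excel_sheets() first also helps diagnose the "no lines available in input" error, since a copied-but-empty sheet would read as no data.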
[R] output
Hi all, I have a data frame with more than 100,000 rows. datx <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) } dat <- datx(11,10,2) 1) WriteXLS(dat, "test4.xls", row.names=FALSE) Error in WriteXLS(dat, "test4.xls", row.names = FALSE) : One or more of the data frames named in 'x' exceeds 65,535 rows or 256 columns I noticed that *.xls has row and column limitations. How can I take the excess row to the next sheet? 2) I also tried to use xlsx and have a problem write.xlsx(dat, "test3.xlsx",sheetName="sheet1", row.names=FALSE) Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") : java.lang.OutOfMemoryError: Java heap space.jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") Any help ? Thank you in advance __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] data
Hi all,

I am trying to read and summarize a big data frame (>10M records). Here is a sample of my data:

state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100

I want to get the total count by state and the number of cities by state. The x variable is either 100 or 200, and I want to count each. The result should look like this:

state,city,count,100's,200's
1,3,7,4,3
2,4,8,4,4

At present I am doing it in several steps and it is taking too long. Is there an efficient way of doing this?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
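The whole summary above can be done in one pass with base R. A sketch using split()/lapply() on the sample data (for >10M records a data.table grouping would be faster, but this shows the shape of the answer):

```r
dat <- read.csv(text = "state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100")

res <- do.call(rbind, lapply(split(dat, dat$state), function(d) {
  data.frame(state = d$state[1],
             city  = length(unique(d$city)),  # number of distinct cities
             count = nrow(d),                 # total records in the state
             x100  = sum(d$x == 100),         # how many x == 100
             x200  = sum(d$x == 200))         # how many x == 200
}))
res
#   state city count x100 x200
# 1     1    3     7    4    3
# 2     2    4     8    4    4
```

This matches the desired output: state 1 has 3 cities, 7 records, four 100s and three 200s; state 2 has 4 cities, 8 records, four of each.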
Re: [R] files
Thank you Sarah,

Some of the files are not csv; some are *.txt, space delimited:

Bdat.txt
Bdat123.txt
Bdat456.txt

How do I do that?

On Tue, Nov 29, 2016 at 8:28 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Something like this:
>
> filelist <- list.files(pattern="^test")
> myfiles <- lapply(filelist, read.csv)
> myfiles <- do.call(rbind, myfiles)
>
> On Tue, Nov 29, 2016 at 9:11 PM, Val <valkr...@gmail.com> wrote:
>> Hi all,
>>
>> In one folder I have several files and I want to
>> combine/concatenate (rbind) them based on some condition.
>> Here is a sample of the files in one folder:
>>    test.csv
>>    test123.csv
>>    test456.csv
>>    Adat.csv
>>    Adat123.csv
>>    Adat456.csv
>>
>> I want to create 2 files as follows:
>>
>> test_all = rbind(test.csv, test123.csv, test456.csv)
>> Adat_all = rbind(Adat.csv, Adat123.csv, Adat456.csv)
>>
>> The actual number of files is large; is there an efficient way
>> of doing it?
>>
>> Thank you
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
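Sarah's recipe works unchanged for the space-delimited files; only the filename pattern and the reader function differ. A sketch, assuming the Bdat* files have header rows:

```r
# pick up Bdat.txt, Bdat123.txt, Bdat456.txt, ... from the working directory
filelist <- list.files(pattern = "^Bdat.*\\.txt$")

# read.table's defaults treat any whitespace as the field separator
myfiles  <- lapply(filelist, read.table, header = TRUE, stringsAsFactors = FALSE)
Bdat_all <- do.call(rbind, myfiles)
```

If the files lack headers, drop `header = TRUE` and supply `col.names` instead so the rbind columns line up.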
[R] files
Hi all,

In one folder I have several files and I want to combine/concatenate (rbind) them based on some condition. Here is a sample of the files in one folder:

   test.csv
   test123.csv
   test456.csv
   Adat.csv
   Adat123.csv
   Adat456.csv

I want to create 2 files as follows:

test_all = rbind(test.csv, test123.csv, test456.csv)
Adat_all = rbind(Adat.csv, Adat123.csv, Adat456.csv)

The actual number of files is large; is there an efficient way of doing it?

Thank you

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] read
Hi Jeff and John,

Thank you for your response. In each folder I am expecting a single file name (either dat or dat.csv), so will this work? Is the following correct?

fns <- list.files(mydir)
if (is.element(pattern="dat(\\.[^.]+)$", fns))

Thank you again.

On Mon, Nov 28, 2016 at 7:20 PM, Jeff Newmiller wrote:
> No, and yes, depending what you mean.
>
> No, because you have to supply the file name to open it... you cannot
> directly use wildcards to open files.
>
> Yes, because the list.files function can be used to match all file names
> fitting a regex pattern, and you can use those filenames to open the files.
>
> E.g.
>
> fns <- list.files( pattern="dat(\\.[^.]+)$" )
> dtaL <- lapply( fns, function(fn){ read.csv( fn, stringsAsFactors=FALSE ) } )
>
> If you only expect one file to be in any given directory, you can skip the
> lapply and just read the file, or you can extract the data frame from the
> list using dtaL[[ 1 ]].
>
> ?list.files
> ?regex for help on patterns
> --
> Sent from my phone. Please excuse my brevity.
>
> On November 28, 2016 2:23:23 PM PST, Ashta wrote:
>> Hi all,
>>
>> I have a script that reads a file (dat.csv) from several folders.
>> However, in some folders the file name is (dat) without csv and in
>> other folders it is dat.csv. The format of the data is the same (only
>> the file name differs, with and without "csv").
>>
>> Is it possible to read these files depending on their name in one go,
>> like read.csv("dat.csv")? How can I read both types of file names?
>>
>> Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Variable
Hi all,

I am trying to get shell variable(s) into my R script on Linux. How do I get them?

My shell script is t1.sh:

#!/bin/bash
Name=Alex; export Name
Age=25; export Age

How do I get the Name and Age variables in my R script? My R script, test.R, should print:

"Your Name is $Name and you are $Age years old"

My other shell script, which calls the R script, is test.sh:

#!/bin/bash
source t1.sh
Rscript test.R

So by running ./test.sh I want to get:

Your Name is Alex and you are 25 years old

I can define those variables in R, but that is not my intention.

Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
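For the archives: variables exported from the shell reach the Rscript process through its environment, so test.R can read them with Sys.getenv(). A minimal sketch of such a test.R:

```r
# test.R -- read the exported shell variables
name <- Sys.getenv("Name")   # returns "" if the variable is not set
age  <- Sys.getenv("Age")

cat(sprintf("Your Name is %s and you are %s years old\n", name, age))
```

Because test.sh sources t1.sh before calling `Rscript test.R`, the exported Name and Age are inherited by the R process and the script prints the expected line.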