Re: [R] Error message

2024-03-22 Thread Val
Here are the first few bytes:
xxd -l 128 X1.RData
0000: 8d5a 35f8 1ac5 cc14 a04e be5c 572f a3ad  .Z5......N.\W/..
0010: 6210 7024 9b58 93c7 34d0 acb7 7a82 3f99  b.p$.X..4...z.?.
0020: 66ce 0ebb 2057 ec36 55b4 0ece a036 695a  f... W.6U....6iZ
0030: 258b 3493 b661 f620 f7fe ada7 158a 15f7  %.4..a. ........
0040: e016 a548 6fcb 20c8 6fb4 493d adc9 ea4a  ...Ho. .o.I=...J
0050: 0a2b b7cf a416 336e 5e4e abc5 9874 7be3  .+....3n^N...t{.
0060: 5a5a 3405 fe35 8a3d ad80 0dc0 ca3e ea7a  ZZ4..5.=.....>.z
0070: e628 b220 ee50 0b9f 3a81 e971 8a19 4f54  .(. .P..:..q..OT
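
None of the usual RData signatures appear in the dump above (a gzip-compressed save() file starts with the bytes 1f 8b, bzip2 with "BZh", xz with fd 37 7a 58 5a, and an uncompressed save starts with the ASCII header "RDX2"/"RDX3"), which is consistent with the "bad restore file magic number" error. A minimal sketch of such a check from R -- illustrative only; looks_like_rdata is not a function from this thread:

## Sketch: does the file start with a signature that load() would accept?
looks_like_rdata <- function(path) {
  magic <- readBin(path, what = "raw", n = 5)
  gz  <- identical(magic[1:2], as.raw(c(0x1f, 0x8b)))               # gzip
  bz  <- identical(magic[1:3], charToRaw("BZh"))                    # bzip2
  xz  <- identical(magic, as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a)))  # xz
  rdx <- identical(magic[1:4], charToRaw("RDX2")) ||                # uncompressed
         identical(magic[1:4], charToRaw("RDX3"))                   # save() header
  gz || bz || xz || rdx
}
# looks_like_rdata("X1.RData")   # FALSE for the dump shown above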

On Fri, Mar 22, 2024 at 2:36 PM Ivan Krylov  wrote:
>
> On Fri, 22 Mar 2024 14:31:17 -0500
> Val  wrote:
>
> > How do I get the first   few bytes?
>
> What does file.info('X1.RData') say?
>
> Do you get any output if you run print(readBin('X1.RData', raw(), 128))?
>
> If this is happening on a Linux or macOS machine, the operating system
> command xxd -l 128 X1.RData will give the same output in a more
> readable manner, but the readBin(...) output from R should be fine too.
>
> --
> Best regards,
> Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message

2024-03-22 Thread Val
Yes, X1.RData is large (more than 40M rows).
How do I get the first few bytes?

On Fri, Mar 22, 2024 at 2:20 PM Ivan Krylov  wrote:
>
> On Fri, 22 Mar 2024 14:02:09 -0500
> Val  wrote:
>
> > X2.R
> > load("X1.RData")
> >
> > I am getting this error message:
> >  Error in load("X1.RData", :
> >  bad restore file magic number (file may be corrupted)  .. no data
> > loaded.
>
> This error happens very early when R tries to load the file, right
> at the first few bytes. Is "X1.RData" large? Can you share it, or at
> least a hexadecimal dump of the first few hundred bytes?
>
> --
> Best regards,
> Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error message

2024-03-22 Thread Val
Hi all,

I am creating an X1.RData file using R 4.2.2.
   x1.R
save(datafilename, file="X1.RData")

When I try to load this file using another script,

X2.R
load("X1.RData")

I am getting this error message:
 Error in load("X1.RData", :
 bad restore file magic number (file may be corrupted)  .. no data  loaded.

I am using the same R version (R 4.2.2).

What could be the cause of this error message, and how do I fix it?

Thank you,
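
For what it is worth, a minimal self-contained round trip that works when the file is intact; the object below is a placeholder, not Val's actual data, and a common cause of the reported error is the file being truncated or altered between the save and the load:

## Sketch: save an object and immediately re-load it in the same session
datafilename <- data.frame(id = 1:3, y = c(0.1, 0.2, 0.3))   # placeholder object
save(datafilename, file = "X1.RData")
rm(datafilename)
load("X1.RData")     # restores 'datafilename' if the file was written completely
str(datafilename)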

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiply

2023-08-04 Thread Val
Thank you, Avi and Ivan.  That worked for this particular example.

Yes, I am looking for something more general-purpose.
I think Ivan's suggestion works for this.

multiplication=as.matrix(dat1[,-1]) %*% as.matrix(dat2[match(dat1[,1],
dat2[,1]),-1])
Res=data.frame(ID = dat1[,1], Index = multiplication)
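
For the archive, the same idea as a compact sketch with the column names from the desired output; the numbers match those Avi computes below:

## Sketch: align dat2 to dat1 by ID, matrix-multiply, and rebuild a data frame
m1  <- as.matrix(dat1[, -1])
m2  <- as.matrix(dat2[match(dat1[, 1], dat2[, 1]), -1])
res <- data.frame(ID = dat1[, 1], round(m1 %*% m2, 2))
names(res)[2:3] <- c("Index1", "Index2")
res
#   ID Index1 Index2
# 1  A  24.58  30.18
# 2  B  35.59  44.09
# 3  C  17.10  21.30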

On Fri, Aug 4, 2023 at 10:59 AM  wrote:
>
> Val,
>
> A data.frame is not quite the same thing as a matrix.
>
> But as long as everything is numeric, you can convert both data.frames to
> matrices, perform the computations needed and, if you want, convert it back
> into a data.frame.
>
> BUT it must be all numeric and you violate that requirement by having a
> character column for ID. You need to eliminate that temporarily:
>
> dat1 <- read.table(text="ID, x, y, z
>  A, 10,  34, 12
>  B, 25,  42, 18
>  C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
>
> mat1 <- as.matrix(dat1[,2:4])
>
> The result is:
>
> > mat1
>   x  y  z
> [1,] 10 34 12
> [2,] 25 42 18
> [3,] 14 20  8
>
> Now do the second matrix, perhaps in one step:
>
> mat2 <- as.matrix(read.table(text="ID, weight, weiht2
>  A,  0.25, 0.35
>  B,  0.42, 0.52
>  C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)[,2:3])
>
>
> Do note some people use read.csv() instead of read.table, albeit it simply
> calls read.table after setting some parameters like the comma.
>
> The result is what you asked for, including "weight" misspelled once:
>
> > mat2
>  weight weiht2
> [1,]   0.25   0.35
> [2,]   0.42   0.52
> [3,]   0.65   0.75
>
> Now you wanted to multiply as in matrix multiplication.
>
> > mat1 %*% mat2
>  weight weiht2
> [1,]  24.58  30.18
> [2,]  35.59  44.09
> [3,]  17.10  21.30
>
> Of course, you wanted different names for the columns and you can do that
> easily enough:
>
> result <- mat1 %*% mat2
>
> colnames(result) <- c("index1", "index2")
>
>
> But this is missing something:
>
> > result
>  index1 index2
> [1,]  24.58  30.18
> [2,]  35.59  44.09
> [3,]  17.10  21.30
>
> Do you want a column of ID numbers on the left? If numeric, you can keep it
> in a matrix in one of many ways but if you want to go back to the data.frame
> format and re-use the ID numbers, there are again MANY ways. But note mixing
> characters and numbers can inadvertently convert everything to characters.
>
> Here is one solution. Not the only one nor the best one but reasonable:
>
> recombined <- data.frame(index=dat1$ID,
>  index1=result[,1],
>  index2=result[,2])
>
>
> > recombined
>   index index1 index2
> 1 A  24.58  30.18
> 2 B  35.59  44.09
> 3 C  17.10  21.30
>
> If for some reason you need a more general purpose way to do this for
> arbitrary conformant matrices, you can write a function that does this in a
> more general way but perhaps a better idea might be a way to store your
> matrices in files in a way that can be read back in directly or to not
> include indices as character columns but as row names.
>
>
>
>
>
>
> -Original Message-
> From: R-help  On Behalf Of Val
> Sent: Friday, August 4, 2023 10:54 AM
> To: r-help@R-project.org (r-help@r-project.org) 
> Subject: [R] Multiply
>
> Hi all,
>
> I want to multiply two  data frames as shown below,
>
> dat1 <-read.table(text="ID, x, y, z
>  A, 10,  34, 12
>  B, 25,  42, 18
>  C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
>
> dat2 <-read.table(text="ID, weight, weiht2
>  A,  0.25, 0.35
>  B,  0.42, 0.52
>  C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)
>
> Desired result
>
> ID  Index1 Index2
> 1  A 24.58 30.18
> 2  B 35.59 44.09
> 3  C 17.10 21.30
>
> Here is my attempt,  but did not work
>
> dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN=
> function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F)
>
>
> Any help?
>
> Thank you,
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiply

2023-08-04 Thread Val
Hi all,

I want to multiply two  data frames as shown below,

dat1 <-read.table(text="ID, x, y, z
 A, 10,  34, 12
 B, 25,  42, 18
 C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

dat2 <-read.table(text="ID, weight, weiht2
 A,  0.25, 0.35
 B,  0.42, 0.52
 C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)

Desired result

ID  Index1 Index2
1  A 24.58 30.18
2  B 35.59 44.09
3  C 17.10 21.30

Here is my attempt, but it did not work:

dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN=
function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F)


Any help?

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correlate

2022-08-26 Thread Val
Thank you John for your help and advice.
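
For the archive, the pairwise-N matrix that John builds below with a double loop can also be obtained in one step from the missingness indicator; a sketch only, using the dat defined in the original post further down:

## Sketch: pairwise-complete counts and correlations without an explicit loop
N <- crossprod(!is.na(dat))                  # rows where both columns are observed
R <- cor(dat, use = "pairwise.complete.obs")
round(R, 3)
N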

On Fri, Aug 26, 2022 at 11:04 AM John Fox  wrote:
>
> Dear Val,
>
> On 2022-08-26 10:41 a.m., Val wrote:
> > Hi John and Timothy
> >
> > Thank you for your suggestion and help. Using the sample data, I did
> > carry out a test run and found a difference in the correlation result.
> >
> > Option 1.
> > data_cor <- cor(dat[ , colnames(dat) != "x1"],  # Calculate correlations
> >  dat$x1, method = "pearson", use = "complete.obs")
> > resulted
> >   [,1]
> >  x2 -0.5845835
> >  x3 -0.4664220
> >  x4  0.7202837
> >
> > Option 2.
> >   for(i in colnames(dat)){
> >print(cor.test(dat[,i], dat$x1, method = "pearson", use =
> > "complete.obs")$estimate)
> >  }
> > [,1]
> > x2  -0.7362030
> > x3  -0.04935132
> > x4   0.85766290
> >
> > This was crosschecked using Excel and other software, and everything
> > matches option 2.
> > One of the factors that contributes to this difference is loss of
> > information when using only complete observations: if x2 has a missing
> > value but x3 and x4 do not, the entire row, including x3 and x4, is removed.
>
> Yes, I already explained that in my previous message.
>
> As well, cor() is capable of computing pairwise-complete correlations --
> see ?cor.
>
> There's not an obvious right answer here, however. Using
> pairwise-complete correlations can produce inconsistent (i.e.,
> non-positive semi-definite) correlation matrices because correlations
> are computed on different subsets of the data.
>
> There are much better ways to deal with missing data.
>
> >
> > My question is: is there a way to extract the number of rows (N) used in
> > the correlation analysis?
>
> I'm sure that there are many ways, but here is one that is very
> simple-minded and should be reasonably efficient for ~250 variables:
>
>  > (nc <- ncol(dat))
> [1] 4
>
>  > R <- N <- matrix(NA, nc, nc)
>  > diag(R) <- 1
>  > for (i in 1:(nc - 1)){
> +   for (j in (i + 1):nc){
> + R[i, j] <- R[j, i] <-cor(dat[, i], dat[, j], use="complete.obs")
> + N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)]))
> +   }
> + }
>
>  > round(R, 3)
> [,1]   [,2]   [,3]   [,4]
> [1,]  1.000 -0.736 -0.049  0.858
> [2,] -0.736  1.000  0.458 -0.428
> [3,] -0.049  0.458  1.000  0.092
> [4,]  0.858 -0.428  0.092  1.000
>
>  > N
>   [,1] [,2] [,3] [,4]
> [1,]   NA888
> [2,]8   NA88
> [3,]88   NA8
> [4,]888   NA
>
>  > round(cor(dat, use="pairwise.complete.obs"), 3) # check
> x1     x2 x3 x4
> x1  1.000 -0.736 -0.049  0.858
> x2 -0.736  1.000  0.458 -0.428
> x3 -0.049  0.458  1.000  0.092
> x4  0.858 -0.428  0.092  1.000
>
> More generally, I think that it's a good idea to learn a little bit
> about R programming if you intend to use R in your work. You'll then be
> able to solve problems like this yourself.
>
> I hope this helps,
>   John
>
> > Thank you,
> >
> > On Mon, Aug 22, 2022 at 1:00 PM John Fox  wrote:
> >>
> >> Dear Val,
> >>
> >> On 2022-08-22 1:33 p.m., Val wrote:
> >>> For the time being  I am assuming the relationship across  variables
> >>> is linear.  I want get the values first  and detailed examining  of
> >>> the relationship will follow later.
> >>
> >> This seems backwards to me, but I'll refrain from commenting further on
> >> whether what you want to do makes sense and instead address how to do it
> >> (not, BTW, because I disagree with Bert's and Tim's remarks).
> >>
> >> Please see below:
> >>
> >>>
> >>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron  
> >>> wrote:
> >>>>
> >>>> I (maybe) agree, but I would go further than that. There are assumptions 
> >>>> associated with the test that are missing. It is not clear that the 
> >>>> relationships are all linear. Regardless of a "significant outcome" all 
> >>>> of the relationships need to be explored in more detail than what is 
> >>>> provided in the correlation test.
> >>>>
> >>>> Multiplicity adjustment as in : 
> >>>> https://www.sciencedirect.com/science/article/pii/S019724561069 is 

Re: [R] Correlate

2022-08-22 Thread Val
For the time being I am assuming the relationships across variables
are linear.  I want to get the values first; detailed examination of
the relationships will follow later.

On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron  wrote:
>
> I (maybe) agree, but I would go further than that. There are assumptions 
> associated with the test that are missing. It is not clear that the 
> relationships are all linear. Regardless of a "significant outcome" all of 
> the relationships need to be explored in more detail than what is provided in 
> the correlation test.
>
> Multiplicity adjustment as in : 
> https://www.sciencedirect.com/science/article/pii/S019724561069 is not an 
> issue that I can see in these data from the information provided. At least 
> not in the same sense as used in the link.
>
> My first guess at the meaning of "multiplicity adjustment" was closer to the 
> experimentwise error rate in a multiple comparison procedure. 
> https://dictionary.apa.org/experiment-wise-error-rate. Essentially, the type 1 
> error rate is inflated the more test you do and if you perform enough tests 
> you find significant outcomes by chance alone. There is great significance in 
> the Redskins rule: https://en.wikipedia.org/wiki/Redskins_Rule.
>
> A simple solution is to apply a Bonferroni correction where alpha is divided 
> by the number of comparisons. If there are 250, then 0.05/250 = 0.0002. 
> Another approach is to try to discuss the outcomes in a way that makes sense. 
> What is the connection between a football team's last home game and the 
> election result that would enable me to take another team and apply their 
> last home game result to the outcome of a different election?
>
> Another complication is if variables x2 through x250 are themselves 
> correlated. Not enough information was provided in the problem to know if 
> this is an issue, but 250 orthogonal variables in a real dataset would be a 
> bit unusual considering the experimentwise error rate previously mentioned.
>
> Large datasets can be very messy.
>
>
> Tim
>
> -Original Message-
> From: Bert Gunter 
> Sent: Monday, August 22, 2022 12:07 PM
> To: Ebert,Timothy Aaron 
> Cc: Val ; r-help@R-project.org (r-help@r-project.org) 
> 
> Subject: Re: [R] Correlate
>
> [External Email]
>
> ... But of course the p-values are essentially meaningless without some sort 
> of multiplicity adjustment.
> (search on "multiplicity adjustment" for details). :-(
>
> -- Bert
>
>
> On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron  wrote:
> >
> > A somewhat clunky solution:
> > for(i in colnames(dat)){
> >   print(cor.test(dat[,i], dat$x1, method = "pearson", use = 
> > "complete.obs")$estimate)
> >   print(cor.test(dat[,i], dat$x1, method = "pearson", use =
> > "complete.obs")$p.value) }
> >
> > Rather than printing you could set up an array or list to save the results.
> >
> >
> > Tim
> >
> > -Original Message-
> > From: R-help  On Behalf Of Val
> > Sent: Monday, August 22, 2022 11:09 AM
> > To: r-help@R-project.org (r-help@r-project.org) 
> > Subject: [R] Correlate
> >
> > [External Email]
> >
> > Hi all,
> >
> > I have a data set with  ~250  variables(columns).  I want to calculate
> > the correlation of  one variable with the rest of the other variables
> > and also want  the p-values  for each correlation.  Please see the
> > sample data and my attempt.  I  have got the correlation but unable to
> > get the p-values
> >
> > dat <- read.table(text="x1 x2 x3 x4
> >1.68 -0.96 -1.25  0.61
> >   -0.06  0.41  0.06 -0.96
> >   .     0.08  1.14  1.42
> >0.80 -0.67  0.53 -0.68
> >0.23 -0.97 -1.18 -0.78
> >   -1.03  1.11 -0.61     .
> >    2.15     .  0.02  0.66
> >0.35 -0.37 -0.26  0.39
> >   -0.66  0.89     .  -1.49
> >0.11  1.52  0.73  -1.03",header=TRUE)
> >
> > #change all to numeric
> > dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))
> >
> > data_cor <- cor(dat[ , colnames(dat) != "x1"],  dat$x1, method =
> > "pearson", use = "complete.obs")
> >
> > Result
> >   [,1]
> > x2 -0.5845835
> > x3 -0.4664220
> > x4  0.7202837
> >
> > How do I get the p-values ?
> >
> > Thank you,
> >
> > __
> > R-help@r-project.org mailing list -- To

[R] Correlate

2022-08-22 Thread Val
Hi all,

I have a data set with ~250 variables (columns).  I want to calculate
the correlation of one variable with each of the other variables,
and I also want the p-value for each correlation.  Please see the
sample data and my attempt.  I have got the correlations but am unable to
get the p-values.

dat <- read.table(text="x1 x2 x3 x4
   1.68 -0.96 -1.25  0.61
  -0.06  0.41  0.06 -0.96
   .     0.08  1.14  1.42
   0.80 -0.67  0.53 -0.68
   0.23 -0.97 -1.18 -0.78
  -1.03  1.11 -0.61     .
   2.15     .  0.02  0.66
   0.35 -0.37 -0.26  0.39
  -0.66  0.89     .  -1.49
   0.11  1.52  0.73  -1.03",header=TRUE)

#change all to numeric
dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))

data_cor <- cor(dat[ , colnames(dat) != "x1"],  dat$x1, method =
"pearson", use = "complete.obs")

Result
  [,1]
x2 -0.5845835
x3 -0.4664220
x4  0.7202837

How do I get the p-values ?

Thank you,
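
For the archive, a compact version of the cor.test() loop suggested elsewhere in this thread, keeping the estimates and p-values together; the result layout is illustrative only:

## Sketch: correlate every other column with x1 and keep the p-values
vars <- setdiff(names(dat), "x1")
res  <- t(sapply(vars, function(v) {
  ct <- cor.test(dat[[v]], dat$x1, method = "pearson")  # drops incomplete pairs
  c(estimate = unname(ct$estimate), p.value = ct$p.value)
}))
res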

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Row exclude

2022-01-30 Thread Val
Thank you David.

What if I want to list the excluded rows?
I used this
(dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ])

It did not work. The desired output is,
  Alex,  20,  13X
 John,  3BC, 175
 Jack3, 34,  140

Thank you,
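
For the archive, a sketch of one way to list the flagged rows using the BadName/BadAge/BadWeight indices David defines below; sorting keeps the original row order (untested against the full data):

## Sketch: rows flagged by any of the three checks, in their original order
bad <- sort(unique(c(BadName, BadAge, BadWeight)))
dat1[bad, ]
#    Name Age Weight
# 1  Alex  20    13X
# 4  John 3BC    175
# 6 Jack3  34    140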

On Sat, Jan 29, 2022 at 10:15 PM David Carlson  wrote:

> It is possible that there would be errors on the same row for different
> columns. This does not happen in your example. If row 4 was "John6, 3BC,
> 175X" then row 4 would be included 3 times, but we only need to remove it
> once. Removing the duplicates is not necessary since R would not get
> confused, but length(unique(c(BadName, BadAge, BadWeight)) indicates how
> many lines are being removed.
>
> David
>
> On Sat, Jan 29, 2022 at 8:32 PM Val  wrote:
>
>> Thank you David for your help.
>>
>> I just have one question on this. What is the purpose of  using the
>> "unique" function on this?
>>   (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>
>> I got the same result without using it.
>>(dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ])
>>
>> My concern is when I am applying this for the large data set the
>> "unique"  function may consume resources(time  and memory).
>>
>> Thank you.
>>
>> On Sat, Jan 29, 2022 at 12:30 AM David Carlson  wrote:
>>
>>> Given that you know which columns should be numeric and which should be
>>> character, finding characters in numeric columns or numbers in character
>>> columns is not difficult. Your data frame consists of three character
>>> columns so you can use regular expressions as Bert mentioned. First you
>>> should strip the whitespace out of your data:
>>>
>>> dat1 <-read.table(text="Name, Age, Weight
>>>   Alex,  20,  13X
>>>   Bob,  25,  142
>>>   Carol, 24,  120
>>>   John,  3BC,  175
>>>   Katy,  35,  160
>>>   Jack3, 34,  140",sep=",", header=TRUE, stringsAsFactors=FALSE,
>>> strip.white=TRUE)
>>>
>>> Now check to see if all of the fields are character as expected.
>>>
>>> sapply(dat1, typeof)
>>> #Name Age  Weight
>>> # "character" "character" "character"
>>>
>>> Now identify character variables containing numbers and numeric
>>> variables containing characters:
>>>
>>> BadName <- which(grepl("[[:digit:]]", dat1$Name))
>>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
>>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
>>>
>>> Next remove those rows:
>>>
>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>> #Name Age Weight
>>> #  2   Bob  25142
>>> #  3 Carol  24120
>>> #  5  Katy  35160
>>>
>>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
>>> as.numeric(dat2$Age).
>>>
>>> David Carlson
>>>
>>>
>>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter 
>>> wrote:
>>>
>>>>
>>>> As character 'polluted' entries will cause a column to be read in (via
>>>> read.table and relatives) as factor or character data, this sounds like a
>>>> job for regular expressions. If you are not familiar with this subject,
>>>> time to learn. And, yes, some heavy lifting will be required.
>>>> See ?regexp for a start maybe? Or the stringr package?
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jan 28, 2022

Re: [R] Row exclude

2022-01-29 Thread Val
Hi  all,
Thank you so much for the useful help and many options that you gave me.
Sorry for the delayed response; I was away for a while.

On Sat, Jan 29, 2022 at 3:35 PM Avi Gross via R-help 
wrote:

> Rui has indeed improved my first attempt in several ways so my comments
> are now focused on another level. There is seemingly endless discussion
> here about what is base R. Questions as well as Answers that go beyond base
> R are often challenged and I understand why, even if I personally don't
> worry about it.
>
> As I see it, R has many levels, like many modern programming languages,
> and some are built-in by default, while others are add-ons of various kinds
> and some are now seen as more commonly used than others. Some here, and NOT
> ME, seem particularly annoyed by the concept of the tidyverse existing or
> the corporate nature of RSTUDIO. I say, the more the better as long as they
> are well-designed and robust and efficient enough.
>
> There are many ways you can use R in simple mode to the point where you do
> not even use vectors as intended but use loops to say add corresponding
> entries in two vectors one item at a time using an index, as you might do
> with earlier languages. That is perfectly valid in R, albeit not using the
> language as intended as A+B in R does that for you fairly trivially, albeit
> hiding a kind of loop being done behind the scenes. But if the two vectors
> are not the same length, it can lead to subtle errors if it recycles or
> broadcasts the shorter one as needed UNLESS that was intended.
>
> Like many languages, R has additional modes of a sort. it is very loosely
> Object-Oriented and some solutions to problems may make use of that or
> other features not always found in other languages such as being able to
> attach attributes of arbitrary nature to things. But someone taking a
> beginner course in R, or just using it in simple ways, generally does not
> know or care and being given a possible solution like that may not be very
> helpful.
>
> R is fully a functional programming language and experienced users, like
> Rui clearly is, can make serious use of many paradigms like map/reduce to
> create what often are quite abstract solutions that can be tailored to do
> all kinds of things by simply changing the functions invoked or in this
> case also the data invoked. I was tempted to use a variant of his solution
> using the pmap() function that I am familiar with but it is not base R, but
> part of the "purrr" package which is in the not-appreciated-here package of
> packages called the tidyverse, LOL!
>
> Pmap can take an arbitrary data.frame and look at it one row at a time and
> apply a function that sees all the columns. That function can be written so
> it applies your logic to each column entry for that row that you wish and
> combines the calculations to return something like TRUE/FALSE. In this
> case, it could be code connecting use of a regular expression on each
> column entry combined by the usual logical connectives like AND and NOT
> (using R notation) to return a TRUE or FALSE that pmap then combines into a
> vector and you use that to index the data.frame to keep only valid rows.
> BUT, I reconsidered using it here as it is a tad advanced and not pure R.
> Nor do I claim it is better than what Rui and others could come up with. It
> is just not as simple as the case we are looking at.
>
> R has another facet that needs to be used carefully that significantly
> alters some approaches as compared to a language like Python which has a
> much nicer object-oriented set of tools but does not have some of the
> delayed evaluation R supports and that sometimes get in the way as some
> people expect them to be evaluated sooner, or at all. I see strengths and
> weaknesses and try to use a language suited for my needs that also uses it
> mostly as intended.
>
> I also ask if we have met the needs of the person who asked this question.
> If they do not reply and merely REPOST the same question with a shorter
> subject-line, then I suggest we all wasted our time trying. Proper
> etiquette, I might think, is to reply to some work show by others IN PUBLIC
> and especially to explain anything being asked by us and to let us know
> what worked for them or met their needs or show a portion of what code they
> finally implemented. Some of that may yet happen, but can anyone blame me
> for being a tad suspicious this time?
>
> I tend to be interested in deeper discussions and many are outside the
> scope of this forum. So I acknowledge that discussing alternate methods
> including more abstract ones using functional programming or other tricks,
> is a bit outside what is expected here.
>
> I want though to add one more idea. Can we agree that the user may have a
> more general concept to be considered here. That is the concept of having a
> data.frame where each column is purely numeric consisting of just 0 through
> 9 with perhaps no spaces, periods or commas or 

Re: [R] Linear

2022-01-26 Thread Val
str(dat2)
'data.frame': 37654 obs. ...:
 $ Yld: int
 $ A   : int
 $ B   : chr
 $ C   : chr
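
For context, the 81412 in the error is the size of the reference grid, which lsmeans/emmeans builds over all combinations of the factor levels (numeric covariates such as A are reduced to a single value, their mean). A quick sketch to check whether the number matches the product of the B and C levels:

## Sketch: expected number of reference-grid rows for Model1
with(dat2, length(unique(B)) * length(unique(C)))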

On Wed, Jan 26, 2022 at 10:49 AM Bert Gunter  wrote:

> What does str(dat2) give?
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, Jan 26, 2022 at 7:37 AM Val  wrote:
> >
> > Hi all,
> >
> > I am trying to get the lsmeans for one of the factors  fitted in the
> > following model
> >
> > Model1 = lm(Yld ~ A + B + C, data = dat2)
> > M_lsm = as.data.frame(lsmeans(Model1, "C"))
> >
> > My problem is, I am getting this error message.
> > "Error: The rows of your requested reference grid would be 81412, which
> > exceeds the limit of 1 (not including any multivariate responses)".
> >
> > How do I fix this?
> >
> > Thank you
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear

2022-01-26 Thread Val
Hi all,

I am trying to get the lsmeans for one of the factors  fitted in the
following model

Model1 = lm(Yld ~ A + B + C, data = dat2)
M_lsm = as.data.frame(lsmeans(Model1, "C"))

My problem is, I am getting this error message.
"Error: The rows of your requested reference grid would be 81412, which
exceeds the limit of 1 (not including any multivariate responses)".

How do I fix this?

Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date

2021-11-05 Thread Val
Thank you All.
The issue was not reading a different file. I just mistyped the column
name, typing mydate instead of My_date in the email.  The
problem is solved by using this:
dat = read.csv("myfile.csv", stringsAsFactors = FALSE)
suggested by Jim.
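
For completeness, a sketch of the remaining conversion step once the column comes in as character in yyyy-mm-dd form (column name as in the corrected call):

## Sketch: convert the character column to Date; ISO yyyy-mm-dd needs no format string
dat <- read.csv("myfile.csv", stringsAsFactors = FALSE)
dat$My_date <- as.Date(dat$My_date)
str(dat$My_date)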

On Thu, Nov 4, 2021 at 7:58 PM Jeff Newmiller  wrote:
>
> Then you are looking at a different file... check your filenames. You have 
> imported the column as character, and R has not yet recognized that it is 
> supposed to be a date, so it can only show what it found.
>
> You will almost certainly find your error if you make a reproducible example.
>
> On November 4, 2021 5:30:22 PM PDT, Val  wrote:
> >Jeff,
> >
> >The date from y data file looks like as follow in the Linux environment,
> >My_date
> >2019-09-16
> >2021-02-21
> >2021-02-22
> >2017-10-11
> >2017-10-10
> >2018-11-11
> >2017-10-27
> >2017-10-30
> >2019-05-20
> >
> >On Thu, Nov 4, 2021 at 5:00 PM Jeff Newmiller  
> >wrote:
> >>
> >> You are claiming behavior that is not something R does, but is something 
> >> Excel does constantly.
> >>
> >> Compare what your data file looks like using a text editor with what R has 
> >> imported. Absolutely do not use a spreadsheet program to do this.
> >>
> >> On November 4, 2021 2:43:25 PM PDT, Val  wrote:
> >> >Hi All,
> >> >
> >> >I am  reading a csv file  and one of the columns is named as  "mydate"
> >> > with this form, 2019-09-16.
> >> >
> >> >I am reading this file as
> >> >
> >> >dat=read.csv("myfile.csv")
> >> > the structure of the data looks like as follow
> >> >
> >> >str(dat)
> >> >mydate : chr  "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ...
> >> >
> >> >Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy.
> >> >When I tried to change this   as a Date using
> >> >
> >> >as.Date(as.Date(mydate, format="%m/%d/%Y" )
> >> >I am getting this error message
> >> >Error in charToDate(x) :
> >> >  character string is not in a standard unambiguous format
> >> >
> >> >My question is,
> >> >1. how can I read the file as it is (i.e., without changing the date 
> >> >format) ?
> >> >2. why does R change the date format?
> >> >
> >> >Thank you,
> >> >
> >> >__
> >> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >https://stat.ethz.ch/mailman/listinfo/r-help
> >> >PLEASE do read the posting guide 
> >> >http://www.R-project.org/posting-guide.html
> >> >and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Sent from my phone. Please excuse my brevity.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Date

2021-11-04 Thread Val
Jeff,

The dates from my data file look as follows in the Linux environment:
My_date
2019-09-16
2021-02-21
2021-02-22
2017-10-11
2017-10-10
2018-11-11
2017-10-27
2017-10-30
2019-05-20

On Thu, Nov 4, 2021 at 5:00 PM Jeff Newmiller  wrote:
>
> You are claiming behavior that is not something R does, but is something 
> Excel does constantly.
>
> Compare what your data file looks like using a text editor with what R has 
> imported. Absolutely do not use a spreadsheet program to do this.
>
> On November 4, 2021 2:43:25 PM PDT, Val  wrote:
> >Hi All,
> >
> >I am  reading a csv file  and one of the columns is named as  "mydate"
> > with this form, 2019-09-16.
> >
> >I am reading this file as
> >
> >dat=read.csv("myfile.csv")
> > the structure of the data looks like as follow
> >
> >str(dat)
> >mydate : chr  "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ...
> >
> >Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy.
> >When I tried to change this   as a Date using
> >
> >as.Date(as.Date(mydate, format="%m/%d/%Y" )
> >I am getting this error message
> >Error in charToDate(x) :
> >  character string is not in a standard unambiguous format
> >
> >My question is,
> >1. how can I read the file as it is (i.e., without changing the date format) 
> >?
> >2. why does R change the date format?
> >
> >Thank you,
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Date

2021-11-04 Thread Val
Hi All,

I am  reading a csv file  and one of the columns is named as  "mydate"
 with this form, 2019-09-16.

I am reading this file as

dat=read.csv("myfile.csv")
 the structure of the data looks like as follow

str(dat)
mydate : chr  "09/16/2019" "02/21/2021" "02/22/2021" "10/11/2017" ...

Please note the format has changed from yyyy-mm-dd to mm/dd/yyyy.
When I tried to change this   as a Date using

as.Date(as.Date(mydate, format="%m/%d/%Y" )
I am getting this error message
Error in charToDate(x) :
  character string is not in a standard unambiguous format

My question is,
1. how can I read the file as it is (i.e., without changing the date format) ?
2. why does R change the date format?

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] by group

2021-11-01 Thread Val
Thank you all for your help!

On Mon, Nov 1, 2021 at 8:47 PM Bert Gunter  wrote:
>
> ... maybe not. According to Rdocumentation.org:
>
> reshape2's status is:
>
> reshape2 is retired: only changes necessary to keep it on CRAN will be
> made. We recommend using tidyr <http://tidyr.tidyverse.org/> instead.
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Nov 1, 2021 at 5:55 PM Rasmus Liland  wrote:
>
> > Dear Val,
> >
> > also consider using reshape2::dcast
> >
> > dat <- structure(list(Year = c(2001L,
> > 2001L, 2001L, 2001L, 2001L, 2001L,
> > 2002L, 2002L, 2002L, 2002L, 2002L,
> > 2002L, 2003L, 2003L, 2003L, 2003L,
> > 2003L, 2003L), Sex = c("M", "M", "M",
> > "F", "F", "F", "M", "M", "M", "F", "F",
> > "F", "M", "M", "M", "F", "F", "F"), wt =
> > c(15L, 14L, 16L, 12L, 11L, 13L, 14L,
> > 18L, 17L, 11L, 15L, 14L, 18L, 13L, 14L,
> > 15L, 10L, 11L)), class = "data.frame",
> > row.names = c(NA, -18L))
> >
> > reshape2::dcast(data=dat,
> > formula=Year~Sex,
> > value.var="wt",
> > fun.aggregate=mean)
> >
> > yielding
> >
> >   Year    F    M
> > 1 2001 12.0 15.0
> > 2 2002 13.3 16.3
> > 3 2003 12.0 15.0
> >
> > Best,
> > Rasmus
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] by group

2021-11-01 Thread Val
Thank you Avi,

One question: I am getting this error from this script

> dat %>%
+   +   group_by(Year, Sex) %>%
+   +   summarize( M = mean(wt, na.rm=TRUE)) %>%
+   +   pivot_wider(names_from = Sex, values_from = M) %>%
+   +   as.data.frame %>%
+   +   round(1)
Error in group_by(Year, Sex) : object 'Year' not found
Why am I getting this?
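
For the archive: the stray "+   +" at the start of each pasted line (copied console continuation prompts) most likely breaks the pipe, so group_by() is evaluated without the data and looks for an object named Year. A sketch of the pipeline without the prompts, assuming dplyr and tidyr are loaded:

library(dplyr)
library(tidyr)

dat %>%
  group_by(Year, Sex) %>%
  summarize(M = mean(wt, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Sex, values_from = M) %>%
  as.data.frame()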



On Mon, Nov 1, 2021 at 7:07 PM Avi Gross via R-help
 wrote:
>
> Understood Val. So you need to save the output in something like a data.frame 
> which can then be saved as a CSV file or whatever else makes sense to be read 
> in by a later program. As noted, by() does not produce the output in a usable 
> way.
>
> But you mentioned efficient, and that is another whole ball of wax. For small 
> amounts of data it may not matter much. And some processes may look slower 
> but turn out to be more efficient if compiled as C/C++ or ...
>
> Sometimes it might be more efficient to change the format of your data before 
> the analysis, albeit if the output is much smaller, maybe best later.
>
> Good luck.
>
> -Original Message-
> From: Val 
> Sent: Monday, November 1, 2021 7:44 PM
> To: Avi Gross 
> Cc: r-help mailing list 
> Subject: Re: [R] by group
>
> Thank you all!
> I can assure you that this is not  HW. This is a sample of my large data set 
> and I want a simple  and efficient approach to get the
> desired  output   in that particular format.  That file will be saved
> and used  as an input file for another external process.
>
> val
>
>
>
>
>
>
>
> On Mon, Nov 1, 2021 at 6:08 PM Avi Gross via R-help  
> wrote:
> >
> > Jim,
> >
> > Your code gives the output in quite a different format and as an
> > object of class "by" that is not easily convertible to a data.frame.
> > So, yes, it is an answer that produces the right numbers but not in
> > the places or data structures I think they (or if it is HW ...) wanted.
> >
> > Trivial standard cases are often handled by a single step but more
> > complex ones often suggest a multi-part approach.
> >
> > Of course Val gets to decide what approach works best for them within
> > whatever constraints we here are not made aware of. If this is a class
> > assignment, it likely would be using only tools discussed in the
> > class. So I would not suggest using a dplyr/tidyverse approach if that
> > is not covered or even part of a class. If this is a project in the
> > real world, it becomes a matter of programming taste and convenience and so 
> > on.
> >
> > Maybe Val can share more about the situation so we can see what is
> > helpful and what is not. Realistically, I can think of way too many
> > ways to get the required output.
> >
> > -Original Message-
> > From: R-help  On Behalf Of Jim Lemon
> > Sent: Monday, November 1, 2021 6:25 PM
> > To: Val ; r-help mailing list
> > 
> > Subject: Re: [R] by group
> >
> > Hi Val,
> > I think you answered your own question:
> >
> > by(dat$wt,dat[,c("Sex","Year")],mean)
> >
> > Jim
> >
> > On Tue, Nov 2, 2021 at 8:09 AM Val  wrote:
> > >
> > > Hi All,
> > >
> > > How can I generate mean by group. The sample data looks like as
> > > follow, dat<-read.table(text="Year Sex wt
> > > 2001 M 15
> > > 2001 M 14
> > > 2001 M 16
> > > 2001 F 12
> > > 2001 F 11
> > > 2001 F 13
> > > 2002 M 14
> > > 2002 M 18
> > > 2002 M 17
> > > 2002 F 11
> > > 2002 F 15
> > > 2002 F 14
> > > 2003 M 18
> > > 2003 M 13
> > > 2003 M 14
> > > 2003 F 15
> > > 2003 F 10
> > > 2003 F 11  ",header=TRUE)
> > >
> > > The desired output is,
> > >          M       F
> > > 2001     15      12
> > > 2002     16.33   13.33
> > > 2003     15      12
> > >
> > > Thank you,
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the 

Re: [R] by group

2021-11-01 Thread Val
Thank you all!
I can assure you that this is not  HW. This is a sample of my large
data set and I want a simple  and efficient approach to get the
desired  output   in that particular format.  That file will be saved
and used  as an input file for another external process.

val







On Mon, Nov 1, 2021 at 6:08 PM Avi Gross via R-help
 wrote:
>
> Jim,
>
> Your code gives the output in quite a different format and as an object of
> class "by" that is not easily convertible to a data.frame. So, yes, it is an
> answer that produces the right numbers but not in the places or data
> structures I think they (or if it is HW ...) wanted.
>
> Trivial standard cases are often handled by a single step but more complex
> ones often suggest a multi-part approach.
>
> Of course Val gets to decide what approach works best for them within
> whatever constraints we here are not made aware of. If this is a class
> assignment, it likely would be using only tools discussed in the class. So I
> would not suggest using a dplyr/tidyverse approach if that is not covered or
> even part of a class. If this is a project in the real world, it becomes a
> matter of programming taste and convenience and so on.
>
> Maybe Val can share more about the situation so we can see what is helpful
> and what is not. Realistically, I can think of way too many ways to get the
> required output.
>
> -Original Message-
> From: R-help  On Behalf Of Jim Lemon
> Sent: Monday, November 1, 2021 6:25 PM
> To: Val ; r-help mailing list 
> Subject: Re: [R] by group
>
> Hi Val,
> I think you answered your own question:
>
> by(dat$wt,dat[,c("Sex","Year")],mean)
>
> Jim
>
> On Tue, Nov 2, 2021 at 8:09 AM Val  wrote:
> >
> > Hi All,
> >
> > How can I generate mean by group. The sample data looks like as
> > follow, dat<-read.table(text="Year Sex wt
> > 2001 M 15
> > 2001 M 14
> > 2001 M 16
> > 2001 F 12
> > 2001 F 11
> > 2001 F 13
> > 2002 M 14
> > 2002 M 18
> > 2002 M 17
> > 2002 F 11
> > 2002 F 15
> > 2002 F 14
> > 2003 M 18
> > 2003 M 13
> > 2003 M 14
> > 2003 F 15
> > 2003 F 10
> > 2003 F 11  ",header=TRUE)
> >
> > The desired output is,
> >          M       F
> > 2001     15      12
> > 2002     16.33   13.33
> > 2003     15      12
> >
> > Thank you,
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] by group

2021-11-01 Thread Val
Hi All,

How can I generate mean by group. The sample data looks like as follow,
dat<-read.table(text="Year Sex wt
2001 M 15
2001 M 14
2001 M 16
2001 F 12
2001 F 11
2001 F 13
2002 M 14
2002 M 18
2002 M 17
2002 F 11
2002 F 15
2002 F 14
2003 M 18
2003 M 13
2003 M 14
2003 F 15
2003 F 10
2003 F 11  ",header=TRUE)

The desired output is,
         M       F
2001     15      12
2002     16.33   13.33
2003     15      12
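
A base-R sketch that produces this wide Year-by-Sex table directly (column order may differ):

## Sketch: group means as a Year x Sex table
round(tapply(dat$wt, list(dat$Year, dat$Sex), mean), 2)
#          F     M
# 2001 12.00 15.00
# 2002 13.33 16.33
# 2003 12.00 15.00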

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read

2021-02-22 Thread Val
Let us take the max spacing to be two, and the output should not be a fixed
field but preferably a csv file.

On Mon, Feb 22, 2021 at 8:05 PM jim holtman  wrote:
>
> Messed up -- I did not see your 'desired' output, which will be hard since there is 
> not a consistent number of spaces that would represent the desired column 
> number.  Do you have any hint as to how to interpret the spacing, especially since 
> you have several hundred more lines?  Is the output supposed to be a 'fixed' 
> field?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Mon, Feb 22, 2021 at 5:00 PM jim holtman  wrote:
>>
>> Try this:
>>
>> > library(tidyverse)
>>
>> > text <-  "x1  x2  x3 x4\n1 B12 \n2   C23 \n322 B32  D34 \n4
>> > D44 \n51 D53\n60 D62 "
>>
>> > # read in the data as characters and replace multiple blanks with single 
>> > blank
>> > input <- read_lines(text)
>>
>> > input <- str_replace_all(input, ' +', ' ')
>>
>> > mydata <- read_delim(input, ' ', col_names = TRUE)
>> Warning: 5 parsing failures.
>> row col  expectedactual file
>>   1  -- 4 columns 3 columns literal data
>>   2  -- 4 columns 3 columns literal data
>>   4  -- 4 columns 3 columns literal data
>>   5  -- 4 columns 2 columns literal data
>>   6  -- 4 columns 3 columns literal data
>>
>> > mydata
>> # A tibble: 6 x 4
>>      x1 x2    x3    x4
>>  
>> 1     1 B12   NA    NA
>> 2     2 C23   NA    NA
>> 3   322 B32   D34   NA
>> 4     4 D44   NA    NA
>> 5    51 D53   NA    NA
>> 6    60 D62   NA    NA
>> >
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>>
>> On Mon, Feb 22, 2021 at 4:49 PM Val  wrote:
>>>
>>> That is my problem. The spacing between columns is not consistent.  It
>>>   may be  single space  or multiple spaces (two or three).
>>>
>>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap  
>>> wrote:
>>> >
>>> > You said the column values were separated by space characters.
>>> > Copying the text from gmail shows that some column names and column
>>> > values are separated by single spaces (e.g., between x1 and x2) and
>>> > some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
>>> > up the spacing or is there some other way to tell where the omitted
>>> > values are?
>>> >
>>> > -Bill
>>> >
>>> > On Mon, Feb 22, 2021 at 2:54 PM Val  wrote:
>>> > >
>>> > > I Tried that one and it did not work. Please see the error message
>>> > > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
>>> > > \n322 B32  D34 \n4D44 \n51 D53\n60 D62 ",
>>> > > :
>>> > >   more columns than column names
>>> > >
>>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap  
>>> > > wrote:
>>> > > >
>>> > > > Since the columns in the file are separated by a space character, " ",
>>> > > > add the read.table argument sep=" ".
>>> > > >
>>> > > > -Bill
>>> > > >
>>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
>>> > > > >
>>> > > > > Hi all, I am trying to read a messy data  but facing  difficulty.  
>>> > > > > The
>>> > > > > data has several columns separated by blank space(s).  Each column
>>> > > > > value may have different lengths across the rows.   The first
>>> > > > > row(header) has four columns. However, each row may not have the 
>>> > > > > four
>>> > > > > column values.  For instance, the first data row has only the first
>>> > > > > two column values. The fourth data row has the first and last column
>>> > > > > values, the second and the third column values are missing for this
>>> > > > > row..  How do I read this data set correct

Re: [R] Read

2021-02-22 Thread Val
That is my problem. The spacing between columns is not consistent.  It
may be a single space or multiple spaces (two or three).

On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap  wrote:
>
> You said the column values were separated by space characters.
> Copying the text from gmail shows that some column names and column
> values are separated by single spaces (e.g., between x1 and x2) and
> some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
> up the spacing or is there some other way to tell where the omitted
> values are?
>
> -Bill
>
> On Mon, Feb 22, 2021 at 2:54 PM Val  wrote:
> >
> > I Tried that one and it did not work. Please see the error message
> > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
> > \n322 B32  D34 \n4D44 \n51 D53\n60 D62 ",
> > :
> >   more columns than column names
> >
> > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap  
> > wrote:
> > >
> > > Since the columns in the file are separated by a space character, " ",
> > > add the read.table argument sep=" ".
> > >
> > > -Bill
> > >
> > > On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
> > > >
> > > > Hi all, I am trying to read a messy data  but facing  difficulty.  The
> > > > data has several columns separated by blank space(s).  Each column
> > > > value may have different lengths across the rows.   The first
> > > > row(header) has four columns. However, each row may not have the four
> > > > column values.  For instance, the first data row has only the first
> > > > two column values. The fourth data row has the first and last column
> > > > values, the second and the third column values are missing for this
> > > > row..  How do I read this data set correctly? Here is my sample data
> > > > set, output and desired output.   To make it clear to each data point
> > > > I have added the row and column numbers. I cannot use fixed width
> > > > format reading because each row  may have different length for  a
> > > > given column.
> > > >
> > > > dat<-read.table(text="x1  x2  x3 x4
> > > > 1 B22
> > > > 2 C33
> > > > 322 B22  D34
> > > > 4 D44
> > > > 51 D53
> > > > 60 D62",header=T, fill=T,na.strings=c("","NA"))
> > > >
> > > > Output
> > > >   x1  x2 x3 x4
> > > > 1   1 B12  NA
> > > > 2   2C23   NA
> > > > 3 322  B32  D34   NA
> > > > 4   4   D44NA
> > > > 5  51 D53 NA
> > > > 6  60 D62NA
> > > >
> > > >
> > > > Desired output
> > > >x1   x2 x3   x4
> > > > 1   1B22   NA
> > > > 2   2 C33 NA
> > > > 3 322  B32NA  D34
> > > > 4   4  NA  D44
> > > > 5  51D53 NA
> > > > 6  60   D62  NA
> > > >
> > > > Thank you,
> > > >
> > > > __
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide 
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read

2021-02-22 Thread Val
I tried that one and it did not work. Please see the error message:
Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
\n322 B32  D34 \n4D44 \n51 D53\n60 D62 ",
:
  more columns than column names
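
For the archive, a tiny illustration (not Val's file) of why sep = " " leads to "more columns than column names": with an explicit single-space separator, every additional space becomes another empty field:

## Sketch: runs of spaces create extra empty fields when the separator is a single space
strsplit("a b  c", split = " ")[[1]]
# [1] "a" "b" ""  "c"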

On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap  wrote:
>
> Since the columns in the file are separated by a space character, " ",
> add the read.table argument sep=" ".
>
> -Bill
>
> On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
> >
> > Hi all, I am trying to read a messy data  but facing  difficulty.  The
> > data has several columns separated by blank space(s).  Each column
> > value may have different lengths across the rows.   The first
> > row(header) has four columns. However, each row may not have the four
> > column values.  For instance, the first data row has only the first
> > two column values. The fourth data row has the first and last column
> > values, the second and the third column values are missing for this
> > row..  How do I read this data set correctly? Here is my sample data
> > set, output and desired output.   To make it clear to each data point
> > I have added the row and column numbers. I cannot use fixed width
> > format reading because each row  may have different length for  a
> > given column.
> >
> > dat<-read.table(text="x1  x2  x3 x4
> > 1 B22
> > 2 C33
> > 322 B22  D34
> > 4 D44
> > 51 D53
> > 60 D62",header=T, fill=T,na.strings=c("","NA"))
> >
> > Output
> >   x1  x2 x3 x4
> > 1   1 B12  NA
> > 2   2C23   NA
> > 3 322  B32  D34   NA
> > 4   4   D44NA
> > 5  51 D53 NA
> > 6  60 D62NA
> >
> >
> > Desired output
> >x1   x2 x3   x4
> > 1   1B22   NA
> > 2   2 C33 NA
> > 3 322  B32NA  D34
> > 4   4  NA  D44
> > 5  51D53 NA
> > 6  60   D62  NA
> >
> > Thank you,
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Read

2021-02-22 Thread Val
Hi all, I am trying to read a messy data file but am facing difficulty.  The
data has several columns separated by blank space(s).  Each column
value may have a different length across the rows.  The first
row (header) has four columns. However, each row may not have all four
column values.  For instance, the first data row has only the first
two column values. The fourth data row has the first and last column
values; the second and the third column values are missing for this
row.  How do I read this data set correctly? Here are my sample data
set, its output, and the desired output.  To make each data point clear,
I have added the row and column numbers. I cannot use fixed-width
format reading because each row may have a different length for a
given column.

dat<-read.table(text="x1  x2  x3 x4
1 B22
2 C33
322 B22  D34
4 D44
51 D53
60 D62",header=T, fill=T,na.strings=c("","NA"))

Output
  x1  x2 x3 x4
1   1 B12  NA
2   2C23   NA
3 322  B32  D34   NA
4   4   D44NA
5  51 D53 NA
6  60 D62NA


Desired output
   x1   x2 x3   x4
1   1B22   NA
2   2 C33 NA
3 322  B32NA  D34
4   4  NA  D44
5  51D53 NA
6  60   D62  NA

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Undesired result

2021-02-17 Thread Val
Very helpful and thank you so much!


On Wed, Feb 17, 2021 at 12:50 PM Duncan Murdoch
 wrote:
>
> On 17/02/2021 9:50 a.m., Val wrote:
> > HI All,
> >
> > I am reading a data file which has different date formats. I wanted to
> > standardize to one format and used  a library anytime but got
> > undesired results as shown below. It gave me year 2093 instead of 1993
> >
> >
> > library(anytime)
> > DFX<-read.table(text="name ddate
> >A  19-10-02
> >D  11/19/2006
> >F  9/9/2011
> >G1  12/29/2010
> >AA   10/18/93 ",header=TRUE)
> >  getFormats()
> >  addFormats(c("%d-%m-%y"))
> >  addFormats(c("%m-%d-%y"))
> >  addFormats(c("%Y/%d/%m"))
> >  addFormats(c("%m/%d/%y"))
> >
> > DFX$anew=anydate(DFX$ddate)
> >
> > Output
> >   name  ddate   anew
> > 1A   19-10-02 2002-10-19
> > 2D 11/19/2006 2020-11-19
> > 3F   9/9/2011 2011-09-09
> > 4   G1 12/29/2010 2020-12-29
> > 5   AA   10/18/93 2093-10-18
> >
> > The problem is in the last row. It should be  1993-10-18 instead of 
> > 2093-10-18
> >
> > How do I correct this?
>
> This looks a little tricky.  The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system.  So what would be nice is to say "two digit dates should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.
>
> What you could do is recognize when you have a two digit year, and then
> force the result into the range you want.  Here's a function that does
> that, but it's not really tested much at all, so be careful if you use
> it.  (One thing:  I recommend the 'useR = TRUE' option to anydate(); it
> worked better in my tests than the default.)
>
> adjustCentury <- function(inputString,
>outputDate = anydate(inputString, useR = TRUE),
>start = "1922-01-01") {
>
>start <- as.Date(start)
>
>twodigityear <- !grepl("[[:digit:]]{4}", inputString)
>
>while (length(bad <- which(twodigityear & outputDate < start))) {
>  for (i in bad) {
>longdate <- as.POSIXlt(outputDate[i])
>longdate$year <- longdate$year + 100
>outputDate[i] <- as.Date(longdate)
>  }
>}
>longdate <- as.POSIXlt(start)
>longdate$year <- longdate$year + 100
>finish <- as.Date(longdate)
>
>while (length(bad <- which(twodigityear & outputDate >= finish))) {
>  for (i in bad) {
>longdate <- as.POSIXlt(outputDate[i])
>longdate$year <- longdate$year - 100
>outputDate[i] <- as.Date(longdate)
>  }
>}
>outputDate
> }
>
> library(anytime)
> DFX<-read.table(text="name ddate
>A  19-10-02
>D  11/19/2006
>F  9/9/2011
>G1  12/29/2010
>AA   10/18/93
>BB   10/18/1893
>CC   10/18/2093",header=TRUE)
>
> addFormats(c("%d-%m-%y"))
> addFormats(c("%m-%d-%y"))
> addFormats(c("%Y/%d/%m"))
> addFormats(c("%m/%d/%y"))
>
> DFX$anew=adjustCentury(DFX$ddate, start = "1921-01-01")
> DFX
> #>   name  ddate   anew
> #> 1A   19-10-02 2019-10-02
> #> 2D 11/19/2006 2006-11-19
> #> 3F   9/9/2011 2011-09-09
> #> 4   G1 12/29/2010 2010-12-29
> #> 5   AA   10/18/93 1993-10-18
> #> 6   BB 10/18/1893 1893-10-18
> #> 7   CC 10/18/2093 2093-10-18

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Undesired result

2021-02-17 Thread Val
HI All,

I am reading a data file which has several different date formats. I wanted to
standardize them to one format and used the anytime package, but got an
undesired result as shown below: it gave me the year 2093 instead of 1993.


library(anytime)
DFX<-read.table(text="name ddate
  A  19-10-02
  D  11/19/2006
  F  9/9/2011
  G1  12/29/2010
  AA   10/18/93 ",header=TRUE)
getFormats()
addFormats(c("%d-%m-%y"))
addFormats(c("%m-%d-%y"))
addFormats(c("%Y/%d/%m"))
addFormats(c("%m/%d/%y"))

DFX$anew=anydate(DFX$ddate)

Output
 name  ddate   anew
1A   19-10-02 2002-10-19
2D 11/19/2006 2020-11-19
3F   9/9/2011 2011-09-09
4   G1 12/29/2010 2020-12-29
5   AA   10/18/93 2093-10-18

The problem is in the last row. It should be  1993-10-18 instead of 2093-10-18

How do I correct this?
Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] create

2021-01-27 Thread Val
Hi all, I have a sample of data as shown below,

 dt <-read.table(text="name Item check
 A  DESK  NORF
 B  RANGE   GARRA
 C  CLOCKPALM
 D  DESK  RR
 E  ALARMDESPRF
 H  DESK   RF
 K  DESK  CORR
 K  WARF CORR
 G  NONE  RF ",header=TRUE, fill=T)

I want to create another column (flag2) and assign it 0 or 1:
if the check column value is within the code2 list and Item is DESK,
then flag2 = 1, otherwise 0.

code2=c("RR","RF")
index2=grep(paste(code2,collapse="|"),dt$check)

dt$flag2=0
dt$flag2[index2]=1
How can I add the second condition?


Desired output  is  shown below
 name Itemcheckflag2
1A   DESK NORF  0
2B   RANGE  GARRA   0
3C   CLOCK  PALM  0
4D  DESK  RR  1
5E  ALARM   DESPRF  0
6H  DESK  RF   1
7K DESK  CORR  0
8K WARF CORR  0
9G NONE  RF   0

Thank you,
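
A minimal base-R sketch of one way to combine the two conditions. It assumes
an exact match on the check codes is intended, since the desired output leaves
CORR and NORF at 0; swap %in% for the grep()/grepl() pattern above if partial
matches are really wanted.

code2 <- c("RR", "RF")
# flag2 is 1 only when check is exactly RR or RF *and* Item is DESK
dt$flag2 <- as.integer(dt$check %in% code2 & dt$Item == "DESK")
dt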

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split

2020-09-23 Thread Val
Thank you again for your help  and giving me the opportunity to choose
the efficient method.  For a small data set there is no discernable
difference between the different approaches.  I will carry out a
comparison using  the large data set.


On Wed, Sep 23, 2020 at 11:52 AM LMH  wrote:
>
> Below is a script in bash the uses the awk tokenizer to do the work.
>
> This assumes that your input and output delimiter is space. The number of 
> consecutive delimiters in
> the input is not important. This also assumes that the input file does not 
> have a header row. That
> is easy to modify if you want. I always keep header rows in my data files as 
> I think that removing
> them is asking for trouble down the road.
>
> I added a NULL for cases where there is no value for the last field. You 
> could use "." if you want.
>
> You should be able to find how to run this from inside R if you want. You 
> will, of course, need a
> bash environment to run this, so if you are not in linux you will need cygwin 
> or something similar.
>
> This should be very fast, but let me know if needs to be faster. If the X1_X2 
> variant occurs less
> frequently than not then we should switch the order in which the logic 
> evaluates the options.
>
> LMH
>
>
> #! /bin/bash
>
> # input filename
> input_file=$1
>
> # output filename
> output_file=$2
>
> # make sure the input file exists
> if [ ! -f $input_file ]; then
>echo $input_file "  cannot be found"
>exit 0
> fi
>
> # create the output file
> touch $output_file
>
> # make sure the output was created
> if [ ! -f $output_file ]; then
>echo $output_file "  was not created"
>exit 0
> fi
>
> # write the header row
> echo "ID1 ID2 Y1 X1 X2" >> $output_file
>
> # character to find in the third token
> look_for='_'
>
> # process with awk
> # if the 3rd token contains '_'
> #   split the third token on '_' into F[1] and F[2]
> #   print the first two tokens, the indicator value of 1, and the split 
> fields F[1] and F[2]
> # otherwise,
> #   print the first two tokens, the indicator value of 0, the 3rd token, and 
> NULL
>
> cat $input_file | \
> awk -v find_char=$look_for '{ if($3 ~ find_char) { { split ($3, F, "_") }
>        { print $1, $2, "1", F[1], 
> F[2] }
>  }
>   else { print $1, $2, "0", $3, "NULL" }
> }' >> $output_file
>
>
>
>
>
>
>
> Val wrote:
> > Thank you all for the help!
> >
> > LMH, Yes I would like to see the alternative.  I am using this for a
> > large data set and if the  alternative is more efficient than this
> > then I would be happy.
> >
> > On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter  wrote:
> >>
> >> To be clear, I think Rui's solution is perfectly fine and probably better 
> >> than what I offer below. But just for fun, I wanted to do it without the 
> >> lapply().  Here is one way. I think my comments suffice to explain.
> >>
> >>> ## which are the  non "_" indices?
> >>> wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
> >>> ## paste "_." to these
> >>> F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
> >>> ## Now strsplit() and unlist() them to get a vector
> >>> z <- unlist(strsplit(F1$text, "_"))
> >>> ## now cbind() to the data frame
> >>> F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> >>> F1
> >>   ID1 ID2   text1  2
> >> 1  A1  B1 NONE_. NONE  .
> >> 2  A1  B1  cf_12   cf 12
> >> 3  A1  B1 NONE_. NONE  .
> >> 4  A2  B2  X2_25   X2 25
> >> 5  A2  B3  fd_15   fd 15
> >>> ## You can change the names of the 2 columns yourself
> >>
> >> Cheers,
> >> Bert
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along and 
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas  wrote:
> >>>
> >>> Hello,
> >>>
> >>> A base R solution with strsplit, like in your code.
> >>>
> >>> F1$Y1 <- +grepl("_", F1$text)
> >>>
> >>>

Re: [R] Split

2020-09-22 Thread Val
Thank you all for the help!

LMH, Yes I would like to see the alternative.  I am using this for a
large data set and if the  alternative is more efficient than this
then I would be happy.

On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter  wrote:
>
> To be clear, I think Rui's solution is perfectly fine and probably better 
> than what I offer below. But just for fun, I wanted to do it without the 
> lapply().  Here is one way. I think my comments suffice to explain.
>
> > ## which are the  non "_" indices?
> > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
> > ## paste "_." to these
> > F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
> > ## Now strsplit() and unlist() them to get a vector
> > z <- unlist(strsplit(F1$text, "_"))
> > ## now cbind() to the data frame
> > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> > F1
>   ID1 ID2   text1  2
> 1  A1  B1 NONE_. NONE  .
> 2  A1  B1  cf_12   cf 12
> 3  A1  B1 NONE_. NONE  .
> 4  A2  B2  X2_25   X2 25
> 5  A2  B3  fd_15   fd 15
> >## You can change the names of the 2 columns yourself
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas  wrote:
>>
>> Hello,
>>
>> A base R solution with strsplit, like in your code.
>>
>> F1$Y1 <- +grepl("_", F1$text)
>>
>> tmp <- strsplit(as.character(F1$text), "_")
>> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
>> tmp <- do.call(rbind, tmp)
>> colnames(tmp) <- c("X1", "X2")
>> F1 <- cbind(F1[-3], tmp)# remove the original column
>> rm(tmp)
>>
>> F1
>> #  ID1 ID2 Y1   X1 X2
>> #1  A1  B1  0 NONE  .
>> #2  A1  B1  1   cf 12
>> #3  A1  B1  0 NONE  .
>> #4  A2  B2  1   X2 25
>> #5  A2  B3  1   fd 15
>>
>>
>> Note that cbind dispatches on F1, an object of class "data.frame".
>> Therefore it's the method cbind.data.frame that is called and the result
>> is also a df, though tmp is a "matrix".
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> Às 20:07 de 22/09/20, Rui Barradas escreveu:
>> > Hello,
>> >
>> > Something like this?
>> >
>> >
>> > F1$Y1 <- +grepl("_", F1$text)
>> > F1 <- F1[c(1, 2, 4, 3)]
>> > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_", fill =
>> > "right")
>> > F1
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> > Às 19:55 de 22/09/20, Val escreveu:
>> >> HI All,
>> >>
>> >> I am trying to create   new columns based on another column string
>> >> content. First I want to identify rows that contain a particular
>> >> string.  If it contains, I want to split the string and create two
>> >> variables.
>> >>
>> >> Here is my sample of data.
>> >> F1<-read.table(text="ID1  ID2  text
>> >> A1 B1   NONE
>> >> A1 B1   cf_12
>> >> A1 B1   NONE
>> >> A2 B2   X2_25
>> >> A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)
>> >> If the variable "text" contains this "_" I want to create an indicator
>> >> variable as shown below
>> >>
>> >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
>> >>
>> >>
>> >> Then I want to split that string in to two, before "_" and after "_"
>> >> and create two variables as shown below
>> >> x1= strsplit(as.character(F1$text),'_',2)
>> >>
>> >> My problem is how to combine this with the original data frame. The
>> >> desired  output is shown   below,
>> >>
>> >>
>> >> ID1 ID2  Y1   X1X2
>> >> A1  B10   NONE   .
>> >> A1  B1   1cf12
>> >> A1  B1   0  NONE   .
>> >> A2  B2   1X225
>> >> A2  B3   1fd15
>> >>
>> >> Any help?
>> >> Thank you.
>> >>
>> >> __
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Split

2020-09-22 Thread Val
HI All,

I am trying to create new columns based on another column's string
content. First I want to identify rows that contain a particular
string.  If a row contains it, I want to split the string and create two
variables.

Here is my sample of data.
F1<-read.table(text="ID1  ID2  text
A1 B1   NONE
A1 B1   cf_12
A1 B1   NONE
A2 B2   X2_25
A2 B3   fd_15  ",header=TRUE,stringsAsFactors=F)
If the variable "text" contains this "_" I want to create an indicator
variable as shown below

F1$Y1 <- ifelse(grepl("_", F1$text),1,0)


Then I want to split that string into two, before "_" and after "_"
and create two variables as shown below
x1= strsplit(as.character(F1$text),'_',2)

My problem is how to combine this with the original data frame. The
desired  output is shown   below,


ID1 ID2  Y1   X1X2
A1  B10   NONE   .
A1  B1   1cf12
A1  B1   0  NONE   .
A2  B2   1X225
A2  B3   1fd15

Any help?
Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] date

2020-09-21 Thread Val
Hi All,

I am trying to sort dates within a group. My sample data is

df <-read.table(text="ID date
A1   09/17/04
A1   01/27/05
A1   05/07/03
A2   05/21/17
A2   09/12/16
A3   01/25/13
A4   09/27/19",header=TRUE,stringsAsFactors=F)
df$date2 = as.Date(strptime(df$date,format="%m/%d/%y"))
df$date =NULL

I want to sort date2 from most recent to oldest within the ID group, and
I used this:
df <- df[order(df$ID, rev((df$date2))),]. It did not work and the
output is shown below.

ID  date2
2 A1 2005-01-27
3 A1 2003-05-07
1 A1 2004-09-17
5 A2 2016-09-12
4 A2 2017-05-21
6 A3 2013-01-25
7 A4 2019-09-27
What am I missing?
Thank you.
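
A minimal sketch of one fix, using df$date2 as built above: rev() reverses
the vector itself rather than the sort direction, so negate the numeric form
of the date instead.

# ID ascending, date2 descending within ID
df <- df[order(df$ID, -as.numeric(df$date2)), ]
df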

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sort

2020-05-14 Thread Val
HI All,
I have a sample of data frame
DF1<-read.table(text="name ddate
  A  2019-10-28
  A  2018-01-25
  A  2020-01-12
  A  2017-10-20
  B  2020-11-20
  B  2019-10-20
  B  2017-05-20
  B  2020-01-20
  c  2009-10-01  ",header=TRUE)

1. I want to sort by name and ddate in decreasing order; the output
should look as follows
   A  2020-01-12
   A  2019-01-12
   A  2018-01-25
   A  2017-10-20
   B  2020-11-21
  B  2020-11-01
  B  2019-10-20
  B  2017-05-20
  c  2009-10-01

2.  Take the top two rows by group (name); the output should look like
   A  2020-01-12
   A  2019-01-12
   B  2020-11-21
   B  2020-11-01
c  2009-10-01

3.  Within each group (name) get the date difference between the
first and second rows' dates. If a group has only one row then the
difference should be 0.

The final out put is
Name diff
   A  365
B  20
C  0

Here is my attempt; I have an issue with the sorting.
DF1$DTime <- as.POSIXct(DF1$ddate , format = "%Y-%m-%d")
DF2 <- DF1[order(DF1$name, ((as.Date(DF1$DTime, decreasing = TRUE, ]

not working
Any help?

Thank you
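
A minimal base-R sketch of the three steps, taking the listed DF1 as-is (the
object names top2 and diffs are only for illustration):

DF1$ddate <- as.Date(DF1$ddate)
# 1. sort by name, then by date descending within name
DF2 <- DF1[order(DF1$name, -as.numeric(DF1$ddate)), ]
# 2. keep at most the top two rows per name
top2 <- do.call(rbind, lapply(split(DF2, DF2$name), head, 2))
# 3. days between the two dates, 0 when a name has a single row
diffs <- sapply(split(top2$ddate, top2$name),
                function(d) if (length(d) < 2) 0 else as.numeric(d[1] - d[2]))
data.frame(name = names(diffs), diff = diffs)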

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Label

2020-04-03 Thread Val
Thank you Jim,

Is it possible to format the label box? The labels (numbers) are
surrounded by a big square and I want to remove it, so that only the
number is displayed.  I searched the documentation for
"barlabels" and there is no such example:

barlabels(xpos,ypos,labels=NULL,cex=1,prop=0.5,miny=0,offset=0,...)

Thank you.
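
A minimal base-graphics sketch that avoids the box entirely: text() draws
plain numbers above the bars (ylim is widened a little here so the top label
is not clipped).

barpos <- barplot(dat$count, names.arg = c("A", "B", "C", "D"),
                  col = "blue",
                  ylim = c(0, 35),
                  ylab = "Count", xlab = "Grade")
# pos = 3 puts each label just above the given y value, with no surrounding box
text(barpos, dat$count, labels = dat$count, pos = 3)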

On Thu, Apr 2, 2020 at 9:38 PM Jim Lemon  wrote:
>
> Hi Val,
>
> library(plotrix)
> barpos<-barplot(dat$count, names.arg=c("A", "B", "C","D"),
>  col="blue",
>  ylim = c(0,30),
>  ylab = "Count",
>      xlab = "Grade")
> barlabels(barpos,dat$count,prop=1)
>
> Jim
>
> On Fri, Apr 3, 2020 at 1:31 PM Val  wrote:
> >
> > Hi all,
> >
> > I have a sample of data set,
> >
> > dat <- read.table(header=TRUE, text='Lab count
> > A 24
> > B 19
> > C 30
> > D 18')
> >
> > barplot(dat$count, names.arg=c("A", "B", "C","D"),
> > col="blue",
> > ylim = c(0,30),
> > ylab = "Count",
> > xlab = "Grade")
> >
> > I want add the number of counts at the top of each bar plot. How can I do 
> > that?
> > Thank you in advance
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Label

2020-04-02 Thread Val
Hi all,

I have a sample of data set,

dat <- read.table(header=TRUE, text='Lab count
A 24
B 19
C 30
D 18')

barplot(dat$count, names.arg=c("A", "B", "C","D"),
col="blue",
ylim = c(0,30),
ylab = "Count",
xlab = "Grade")

I want to add the count at the top of each bar. How can I do that?
Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixed format

2020-01-24 Thread Val
Thank you all for your help.
My data has mixed formats such as
%m/%d/%y, %d/%m/%y, %m-%d-%y, %d-%m-%y, etc., and
the anytime package handles them very well!

Thank you again.
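
A minimal sketch of roughly what that anytime-based approach can look like;
the exact format list is an assumption based on the mix described above, and
an ambiguous value such as 01/02/03 follows whichever format matches first.

library(anytime)
addFormats(c("%m/%d/%y", "%d/%m/%y", "%m-%d-%y", "%d-%m-%y"))
DFX$dnew <- anydate(DFX$ddate)
DFX <- DFX[!is.na(DFX$dnew), ]   # drop rows that are not dates at all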


On Tue, Jan 21, 2020 at 5:28 AM Rui Barradas  wrote:
>
> Hello,
>
> Inline.
>
> Às 09:22 de 21/01/20, Chris Evans escreveu:
> > I think that might risk giving the wrong date for a date like 1/3/1990 
> > which I think in Val's data is mdy data not dmy.
> >
> > As I read the data, where the separator is "/" the format is mdy and where 
> > the separator is "-" it's dmy.
>
> Maybe you're right. But I really don't know, in my country (Portugal) we
> use "/" and dmy. Anyway, what's important is that the OP must have a
> much better understanding of the data, the way it is posted is likely to
> cause errors. See, for instance, the expected output with numbers
> greater than 12 in the 1st and 2nd places, depending on the row.
>
>
> So I would
> > go for:
> >
> > library(lubridate)
> > DFX$dnew[grep("-", DFX$ddate, fixed = TRUE)] <- dmy(DFX$ddate[grep("-", 
> > DFX$ddate, fixed = TRUE)])
> > DFX$dnew[grep("/", DFX$ddate, fixed = TRUE)] <- mdy(DFX$ddate[grep("/", 
> > DFX$ddate, fixed = TRUE)])
> > DFX <- DFX[!is.na(DFX$dnew),]
> > DFX
> >
> >name  ddate   dnew
> > 1A   19-10-02 2002-10-19
> > 2B   22-11-20 2020-11-22
> > 3C   19-01-15 2015-01-19
> > 4D 11/19/2006 2006-11-19
> > 5F   9/9/2011 2011-09-09
> > 6G 12/29/2010 2010-12-29
> >
> > But I am so much in awe of Rui's skills with R, and that of most of the 
> > regular commentators here, that I submit
> > this a little nervously!
>
> Thanks!
>
> Rui Barradas
> >
> > Many thanks to all who teach me so much here, lovely, if I am correct, to 
> > contribute for a change!
> >
> > Chris
> >
> >
> > - Original Message -
> >> From: "Rui Barradas" 
> >> To: "Val" , "r-help@R-project.org 
> >> (r-help@r-project.org)" 
> >> Sent: Tuesday, 21 January, 2020 00:40:29
> >> Subject: Re: [R] Mixed format
> >
> >> Hello,
> >>
> >> The following strategy works with your data.
> >> It uses the fact that most dates are in one of 3 formats, dmy, mdy, ymd.
> >> It tries those formats one by one, after each try looks for NA's in the
> >> new column.
> >>
> >>
> >> # first round, format is dmy
> >> DFX$dnew <- lubridate::dmy(DFX$ddate)
> >> na <- is.na(DFX$dnew)
> >>
> >> # second round, format is mdy
> >> DFX$dnew[na] <- lubridate::mdy(DFX$ddate[na])
> >> na <- is.na(DFX$dnew)
> >>
> >> # last round, format is ymd
> >> DFX$dnew[na] <- lubridate::ymd(DFX$ddate[na])
> >>
> >> # remove what didn't fit any format
> >> DFX <- DFX[!is.na(DFX$dnew), ]
> >> DFX
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> Às 22:58 de 20/01/20, Val escreveu:
> >>> Hi All,
> >>>
> >>> I have a data frame where one column is  a mixed date format,
> >>> a date in the form "%m-%d-%y"  and "%m/%d/%Y", also some are not in date 
> >>> format.
> >>>
> >>> Is there a way to delete the rows that contain non-dates  and
> >>> standardize the dates in one date format like  %m-%d-%Y?
> >>> Please see my  sample data and desired output
> >>>
> >>> DFX<-read.table(text="name ddate
> >>> A  19-10-02
> >>> B  22-11-20u
> >>> C  19-01-15
> >>> D  11/19/2006
> >>> F  9/9/2011
> >>> G  12/29/2010
> >>> H  DEX",header=TRUE)
> >>>
> >>> Desired output
> >>> name ddate
> >>> A  19-10-2002
> >>> B  22-11-2020
> >>> C  19-01-2015
> >>> D  11-19-2006
> >>> F  09-09-2011
> >>> G  12-29-2010
> >>>
> >>> Thank you
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mixed format

2020-01-20 Thread Val
Hi All,

I have a data frame where one column holds dates in mixed formats,
some in the form "%m-%d-%y" and some in "%m/%d/%Y"; a few entries are not dates at all.

Is there a way to delete the rows that contain non-dates  and
standardize the dates in one date format like  %m-%d-%Y?
Please see my  sample data and desired output

DFX<-read.table(text="name ddate
  A  19-10-02
  B  22-11-20
  C  19-01-15
  D  11/19/2006
  F  9/9/2011
  G  12/29/2010
  H  DEX",header=TRUE)

Desired output
name ddate
A  19-10-2002
B  22-11-2020
C  19-01-2015
D  11-19-2006
F  09-09-2011
G  12-29-2010

Thank you

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] date

2019-12-17 Thread Val
Hi All,

I wanted to convert character dates from mm/dd/yy to yyyy-mm-dd format.
The sample data and my attempt are shown below.

gs <-read.table(text="ID date
A1   09/27/03
A2   05/27/16
A3   01/25/13
A4   09/27/19",header=TRUE,stringsAsFactors=F)

Desired output
  ID date  d1
 A1 09/27/03 2003-09-27
 A2 05/27/16 2016-05-27
 A3 01/25/13 2013-01-25
 A4 09/27/19 2019-09-27

I used this
gs$d1 = as.Date(as.character(gs$date), format = "%Y-%m-%d")

but I got NA's.

How do I get my desired result?
Thank you.
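
A minimal sketch of the fix: the format argument must describe the input
(mm/dd/yy), not the desired output; a Date then prints as yyyy-mm-dd by
default.

gs$d1 <- as.Date(gs$date, format = "%m/%d/%y")
gs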

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Conditions

2019-11-26 Thread Val
Hi All, I am having a little issue with my ifelse statement.
The data frame looks as follows.

dat2 <-read.table(text="ID  d1 d2 d3
A 0 25 35
B 12 22  0
C 0  0  31
E 10 20 30
F 0  0   0",header=TRUE,stringsAsFactors=F)
I want to create d4 and set the value based on the following conditions.
If d1 != 0 then d4 = d1;
if d1 = 0 and d2 != 0 then d4 = d2;
if d1 = 0 and d2 = 0 and d3 != 0 then d4 = d3;
if d1, d2 and d3 are all 0 then d4 = 0.

Here is the desired output and my attempt
 ID d1 d2 d3 d4
  A  0 25 35  25
  B 12 22  0  12
  C  0  0 31   31
  E 10 20 30  10
  F  0  0  0  0

My attempt
dat2$d4 <-  0
dat2$d4  <- ifelse((dat2$d1 =="0"), dat2$d2, ifelse(dat2$d2 == "0"), dat2$d3, 0)
but not working.

Thank you.
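
A minimal sketch with nested ifelse() calls, comparing against the number 0
rather than the string "0" and keeping each ifelse() closed around its own
yes/no arguments:

dat2$d4 <- ifelse(dat2$d1 != 0, dat2$d1,
           ifelse(dat2$d2 != 0, dat2$d2,
           ifelse(dat2$d3 != 0, dat2$d3, 0)))
dat2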

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] File conca.

2019-11-05 Thread Val
Thank you Petr and Jeff for your suggestions.

I made some improvement but still need some tweaking.  I could not
get the folder names added correctly to each row; only the last
folder name was added.
table(Alldata$foldername) resulted in
   week2
25500

Please see the complete code:


folders=c("week1","week2")
for(i in folders){
  path=paste("\data\"", i , sep = "")
  wd <-  setwd(path)
  Flist = list.files(path,pattern = "^WT")
  dataA =  lapply(Flist, function(x)read.csv(x, header=T))
  setwd(wd)
  temp = do.call("rbind", Alldata)
  temp$foldername <- i
  Alldata <- temp
  Alldata <- rbind(Alldata, temp)
}
###
Any suggestion please?
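
A minimal sketch of one way to rework the loop (the file.path() layout is an
assumption; adjust it to the real directory structure). The key changes are
binding dataA rather than Alldata, collecting each folder's result in a list,
and binding everything once at the end:

folders  <- c("week1", "week2")
all_list <- list()
for (i in folders) {
  path  <- file.path("data", i)
  Flist <- list.files(path, pattern = "^WT", full.names = TRUE)
  dataA <- lapply(Flist, read.csv, header = TRUE)
  temp  <- do.call(rbind, dataA)   # all files of this folder
  temp$foldername <- i             # tag every row with its folder
  all_list[[i]] <- temp
}
Alldata <- do.call(rbind, all_list)
table(Alldata$foldername)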


On Tue, Nov 5, 2019 at 2:13 AM PIKAL Petr  wrote:
>
> Hi
>
> Help with such operations is rather tricky as only you know exact structrure
> of your folders.
>
> see some hints in line
>
> > -Original Message-
> > From: R-help  On Behalf Of Val
> > Sent: Tuesday, November 5, 2019 4:33 AM
> > To: r-help@R-project.org (r-help@r-project.org) 
> > Subject: [R] File conca.
> >
> > Hi All,
> >
> > I have data files in several folders and want combine all  these files in
> one
> > file.  In each folder  there are several files  and these
> > files have the same structure but different names.   First, in each
> > folder  I want to concatenate(rbind) all files in to one file. While I am
> > reading each files and concatenating (rbind) all files, I want to added
> the
> > folder name as one variable  in each row. I am reading the folder names
> > from a file and for demonstration I am using only two folders  as shown
> > below.
> > Data\week1 # folder name 1
> >WT13.csv
> >WT26.csv   ...
> >WT10.csv
> > Data\week2#folder name 2
> >WT02.csv
> >WT12.csv
> >
> > Below please find  my attempt,
> >
> > folders=c("week1","week2")
> > for(i in folders){
> >   path=paste("\data\"", i , sep = "")
> >   setwd(path)
>
> you should use
> wd <- setwd(path)
>
> which keeps the original directory for subsequent use
>
> >   Flist = list.files(path,pattern = "^WT")
> >   dataA =  lapply(Flist, function(x)read.csv(x, header=T))
> >   Alldata = do.call("rbind", dataA) # combine all files
> >   Alldata$foldername=i  # adding the folder name
> >
>
> now you can do
>
> setwd(wd)
>
> to return to original directory
> }
>
> > The above works for  for one folder but how can I do it for more than one
> > folders?
>
> You also need to decide if you want all data from all folders in one object
> called Alldata or if you want several Alldata objects, one for each folder.
>
> In second case you could use list structure for Alldata. In the first case
> you could store data from each folder in some temporary object and use rbind
> directly.
>
> something like
>
> temp <- do.call("rbind", dataA)
> temp$foldername <- i
>
> Alldata <- temp
> in the first cycle
> and
> Alldata <- rbind(Alldata, temp)
> in second and all others.
>
> Or you could initiate first Alldata manually and use only
> Alldata <- rbind(Alldata, temp)
>
> in your loop.
>
> Cheers
> Petr
>
> >
> > Thank you in advance,
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] File conca.

2019-11-04 Thread Val
Hi All,

I have data files in several folders and want to combine all these files
into one file.  In each folder there are several files, and these
files have the same structure but different names.  First, in each
folder I want to concatenate (rbind) all files into one file. While I
am reading and concatenating (rbind) the files, I want to
add the folder name as a variable in each row. I am reading the
folder names from a file; for demonstration I am using only two
folders, as shown below.
Data\week1 # folder name 1
   WT13.csv
   WT26.csv   ...
   WT10.csv
Data\week2#folder name 2
   WT02.csv
   WT12.csv

Below please find  my attempt,

folders=c("week1","week2")
for(i in folders){
  path=paste("\data\"", i , sep = "")
  setwd(path)
  Flist = list.files(path,pattern = "^WT")
  dataA =  lapply(Flist, function(x)read.csv(x, header=T))
  Alldata = do.call("rbind", dataA) # combine all files
  Alldata$foldername=i  # adding the folder name
}
The above works for  for one folder but how can I do it for more than
one folders?

Thank you in advance,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] If statement

2019-09-12 Thread Val
Hi all,

I am trying to use the  if else statement and create  two new columns
based on the existing two columns.  Below please find my sample data,

dat1 <-read.table(text="ID  a b c d
A private couple  25 35
B private single  24 38
C none  single28 32
E none none 20 36 ",header=TRUE,stringsAsFactors=F)

dat1$z <- "Zero"
dat1$y <-  0

if a is "private" and (b is either "couple" rr "single"
then  z value = a's value   and y value = c's value
if a is "none" and  ( b is either couple of single then  z= private
  then  z value =b's value  qnd  y value= d's value
else z value= Zero and y value=0

the desired out put looks like
ID  a  b  c d z   y
1  A private couple 25 35 private 25
2  B private single 24 38 private 24
3  Cnone single 28 32 single  32
4  Enone   none 20 36 Zero0

my attempt

if (dat1$a =="private"  &  (dat1$b =="couple"| dat1$b =="single"))
{
  dat1$z  <-   dat1$a
  dat1$y  <-   dat1$c
}

else if (dat1$a =="none"  &  (dat1$b =="couple"| dat1$b =="single")) {
dat1$z  <-   dat1$b
dat1$y  <-   dat1$c
  }
else
{ default value}
did not work; how could I fix this?
Thank you in advance
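
A minimal sketch with vectorised ifelse() (if() only looks at the first
element of a vector, which is why the version above fails); the second branch
takes y from column d, matching the desired output:

ok <- dat1$b %in% c("couple", "single")
dat1$z <- ifelse(dat1$a == "private" & ok, dat1$a,
          ifelse(dat1$a == "none"    & ok, dat1$b, "Zero"))
dat1$y <- ifelse(dat1$a == "private" & ok, dat1$c,
          ifelse(dat1$a == "none"    & ok, dat1$d, 0))
dat1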

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] new_index

2019-09-07 Thread Val
Thank you Jeff and all.   I wish to go back to my student life.
ID  is not  necessary in  dat2, sorry for that.

On Sat, Sep 7, 2019 at 5:10 PM Jeff Newmiller  wrote:
>
> Val has been posting to this list for almost a decade [1] so seems unlikely 
> to be a student... but in all this time has yet to figure out how to post in 
> plain text to avoid corruption of code on this plain text mailing list. The 
> ability to generate small examples has improved, though execution still seems 
> hazy. Why is there an ID column in dat2 at all?
>
> Try
>
> dat3 <- dat1[ 1,, drop=FALSE ]
> dat3$Index <- as.matrix( dat1[ -1 ] ) %*% dat2$weight
>
> [1] https://stat.ethz.ch/pipermail/r-help/2010-March/233533.html
>
> On September 7, 2019 12:38:12 PM PDT, Bert Gunter  
> wrote:
> >dat1 is wrong also. It should read:
> >
> >dat1 <-read.table(text="ID, x, y, z
> >  A, 10,  34, 12
> >  B, 25,  42, 18
> >   C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
> >
> >Is this a homework problem?  This list has a no homework policy.
> >
> >Cheers,
> >Bert
> >
> >Bert Gunter
> >
> >"The trouble with having an open mind is that people keep coming along
> >and
> >sticking things into it."
> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> >On Sat, Sep 7, 2019 at 12:24 PM Val  wrote:
> >
> >> Hi  all
> >>
> >> Correction for my previous posting.
> >> dat2 should be read as
> >> dat2 <-read.table(text="ID, weight
> >> A,  0.25
> >> B,  0.42
> >> C,  0.65 ",sep=",",header=TRUE,stringsAsFactors=F)
> >>
> >> On Sat, Sep 7, 2019 at 1:46 PM Val  wrote:
> >> >
> >> > Hi All,
> >> >
> >> > I have two data frames   with thousand  rows  and several columns.
> >My
> >> > samples of the data frames are shown below
> >> >
> >> > dat1 <-read.table(text="ID, x, y, z
> >> > ID , x, y, z
> >> > A, 10,  34, 12
> >> > B, 25,  42, 18
> >> > C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
> >> >
> >> > dat2 <-read.table(text="ID, x, y, z
> >> > ID, weight
> >> > A,  0.25
> >> > B,  0.42
> >> > C,  0.65 ",sep=",",header=TRUE,stringsAsFactors=F)
> >> >
> >> > My goal is to  create an index value  for each ID  by mutliplying
> >the
> >> > first row of dat1 by the second  column of dat2.
> >> >
> >> >   (10*0.25 ) + (34*0.42) + (12*0.65)=  24.58
> >> >   (25*0.25 ) + (42*0.42) + (18*0.65)=  35.59
> >> >   (14*0.25 ) + (20*0.42) + (  8*0.65)=  19.03
> >> >
> >> > The  desired out put is
> >> > dat3
> >> > ID, Index
> >> > A 24.58
> >> > B  35.59
> >> > C  19.03
> >> >
> >> > How do I do it in an efficent way?
> >> >
> >> > Thank you,
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >   [[alternative HTML version deleted]]
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] new_index

2019-09-07 Thread Val
Hi  all

Correction for my previous posting.
dat2 should be read as
dat2 <-read.table(text="ID, weight
A,  0.25
B,  0.42
C,  0.65 ",sep=",",header=TRUE,stringsAsFactors=F)

On Sat, Sep 7, 2019 at 1:46 PM Val  wrote:
>
> Hi All,
>
> I have two data frames   with thousand  rows  and several columns. My
> samples of the data frames are shown below
>
> dat1 <-read.table(text="ID, x, y, z
> ID , x, y, z
> A, 10,  34, 12
> B, 25,  42, 18
> C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
>
> dat2 <-read.table(text="ID, x, y, z
> ID, weight
> A,  0.25
> B,  0.42
> C,  0.65 ",sep=",",header=TRUE,stringsAsFactors=F)
>
> My goal is to  create an index value  for each ID  by mutliplying the
> first row of dat1 by the second  column of dat2.
>
>   (10*0.25 ) + (34*0.42) + (12*0.65)=  24.58
>   (25*0.25 ) + (42*0.42) + (18*0.65)=  35.59
>   (14*0.25 ) + (20*0.42) + (  8*0.65)=  19.03
>
> The  desired out put is
> dat3
> ID, Index
> A 24.58
> B  35.59
> C  19.03
>
> How do I do it in an efficent way?
>
> Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] new_index

2019-09-07 Thread Val
Hi All,

I have two data frames   with thousand  rows  and several columns. My
samples of the data frames are shown below

dat1 <-read.table(text="ID, x, y, z
ID , x, y, z
A, 10,  34, 12
B, 25,  42, 18
C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

dat2 <-read.table(text="ID, x, y, z
ID, weight
A,  0.25
B,  0.42
C,  0.65 ",sep=",",header=TRUE,stringsAsFactors=F)

My goal is to create an index value for each ID by multiplying each
row of dat1 by the weight column of dat2.

  (10*0.25 ) + (34*0.42) + (12*0.65)=  24.58
  (25*0.25 ) + (42*0.42) + (18*0.65)=  35.59
  (14*0.25 ) + (20*0.42) + (  8*0.65)=  19.03

The  desired out put is
dat3
ID, Index
A 24.58
B  35.59
C  19.03

How do I do it in an efficient way?

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read

2019-08-09 Thread Val
Thank you Jeff! That was such an easy command.
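
A minimal sketch of that kind of quote-handling fix, assuming the data sit in
a file such as prof.txt (the file name is only for illustration):

# quote = "" stops read.table from treating ' and " as string delimiters
vld <- read.table("prof.txt", header = TRUE, quote = "")
# strip any stray quote characters before converting to numeric
vld$prof <- as.numeric(gsub("['\"]", "", vld$prof))
vld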

On Thu, Aug 8, 2019 at 11:06 PM Bert Gunter  wrote:
>
> I stand corrected!
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Aug 8, 2019 at 7:11 PM Jeff Newmiller  
> wrote:
>>
>> Val 1
>> Bert 0
>>
>> On August 8, 2019 5:22:13 PM PDT, Bert Gunter  wrote:
>> >read.table() does not have a "text" argument, so maybe you need to go
>> >back
>> >and go through a tutorial or two to learn R basics (e.g. about function
>> >calls and function arguments ?)
>> >See ?read.table  (of course)
>> >
>> >Cheers,
>> >
>> >Bert Gunter
>> >
>> >"The trouble with having an open mind is that people keep coming along
>> >and
>> >sticking things into it."
>> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >
>> >
>> >On Thu, Aug 8, 2019 at 5:11 PM Val  wrote:
>> >
>> >> Hi all,
>> >>
>> >> I am trying to red data where single and double quotes are embedded
>> >> in some of the fields and prevented to read the data.   As an example
>> >> please see below.
>> >>
>> >> vld<-read.table(text="name prof
>> >>   A  '4.5
>> >>   B   "3.2
>> >>   C   5.5 ",header=TRUE)
>> >>
>> >> Error in read.table(text = "name prof \n  A  '4.5\n  B
>> >> 3.2 \n  C   5.5 ",  :
>> >>   incomplete final line found by readTableHeader on 'text'
>> >>
>> >> Is there a way how to  read this data and gt the following output
>> >>   name prof
>> >> 1A  4.5
>> >> 2B  3.2
>> >> 3C  5.5
>> >>
>> >> Thank you inadvertence
>> >>
>> >> __
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >   [[alternative HTML version deleted]]
>> >
>> >__
>> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> >http://www.R-project.org/posting-guide.html
>> >and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read

2019-08-08 Thread Val
Thank you all, I can read the text file, but the problem was that there is
a single quote embedded in the first row of the second column. This
quote causes the problem:

vld<-read.table(text="name prof
  A  '4.5
  B   "3.2
  C   5.5 ",header=TRUE)

On Thu, Aug 8, 2019 at 7:24 PM Anaanthan Pillai
 wrote:
>
> data <- read.table(header=TRUE, text='
>  name prof
>   A  4.5
>   B  3.2
>   C  5.5
>  ')
> > On 9 Aug 2019, at 8:11 AM, Val  wrote:
> >
> > Hi all,
> >
> > I am trying to red data where single and double quotes are embedded
> > in some of the fields and prevented to read the data.   As an example
> > please see below.
> >
> > vld<-read.table(text="name prof
> >  A  '4.5
> >  B   "3.2
> >  C   5.5 ",header=TRUE)
> >
> > Error in read.table(text = "name prof \n  A  '4.5\n  B
> > 3.2 \n  C   5.5 ",  :
> >  incomplete final line found by readTableHeader on 'text'
> >
> > Is there a way how to  read this data and gt the following output
> >  name prof
> > 1A  4.5
> > 2B  3.2
> > 3C  5.5
> >
> > Thank you inadvertence
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] read

2019-08-08 Thread Val
Hi all,

I am trying to read data where single and double quotes are embedded
in some of the fields and prevent the data from being read.  As an example
please see below.

vld<-read.table(text="name prof
  A  '4.5
  B   "3.2
  C   5.5 ",header=TRUE)

Error in read.table(text = "name prof \n  A  '4.5\n  B
3.2 \n  C   5.5 ",  :
  incomplete final line found by readTableHeader on 'text'

Is there a way how to  read this data and gt the following output
  name prof
1A  4.5
2B  3.2
3C  5.5

Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create

2019-04-13 Thread Val
Sorry for the confusion, my sample data does not represent the
actual data set.

The values can range from negative to positive, and 0 could be a
true value of an observation. So, instead of replacing missing values
with zero, I want to exclude them from the calculation.
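
A minimal sketch that skips the NAs instead of zero-filling them, using
rowSums(..., na.rm = TRUE) on the weighted columns (a row that is NA in all
three columns would still come out as 0):

vdat$xy <- rowSums(cbind(2 * vdat$x1, 5 * vdat$x2, 3 * vdat$x3),
                   na.rm = TRUE)
vdat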

On Sat, Apr 13, 2019 at 10:42 PM Jeff Newmiller
 wrote:
>
> Looks to me like your initial request contradicts your clarification. Can you 
> explain this discrepancy?
>
> On April 13, 2019 8:29:59 PM PDT, Val  wrote:
> >Hi Bert and Jim,
> >Thank you for the suggestion.
> >However, those missing values should not be replaced by 0's.
> >I want exclude those missing values from the calculation and create
> >the index using only the non-missing values.
> >
> >
> >On Sat, Apr 13, 2019 at 10:14 PM Jim Lemon 
> >wrote:
> >>
> >> Hi Val,
> >> For this particular problem, you can just replace NAs with zeros.
> >>
> >> vdat[is.na(vdat)]<-0
> >> vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
> >> vdat
> >>  obs Year x1 x2 x3  xy
> >> 1   1 2001 25 10 10 130
> >> 2   2 2001  0 15 25 150
> >> 3   3 2001 50 10  0 150
> >> 4   4 2001 20  0 60 220
> >>
> >> Note that this is not a general solution to the problem of NA values.
> >>
> >> Jim
> >>
> >> On Sun, Apr 14, 2019 at 12:54 PM Val  wrote:
> >> >
> >> > Hi All,
> >> > I have a data frame  with several  columns  and I want to  create
> >> > another  column  by using  the values of the other columns.  My
> >> > problem is that some the row values  for some columns  have missing
> >> > values  and I could not get  the result I waned .
> >> >
> >> > Here is the sample of my data and my attempt.
> >> >
> >> > vdat<-read.table(text="obs, Year, x1, x2, x3
> >> > 1,  2001, 25 ,10, 10
> >> > 2,  2001,  ,  15, 25
> >> > 3,  2001,  50, 10,
> >> > 4,  2001,  20, , 60",sep=",",header=TRUE,stringsAsFactors=F)
> >> > vdat$xy <- 0
> >> > vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
> >> > vdat
> >> >
> >> >  obs Year x1 x2 x3  xy
> >> > 1   1 2001 25 10 10 130
> >> > 2   2 2001 NA 15 25  NA
> >> > 3   3 2001 50 10 NA  NA
> >> > 4   4 2001 20 NA 60  NA
> >> >
> >> > The desired result si this,
> >> >
> >> >obs Year x1 x2 x3   xy
> >> > 1   1 2001 25 10 10   130
> >> > 2   2 2001 NA 15 25  150
> >> > 3   3 2001 50 10 NA  150
> >> > 4   4 2001 20 NA 60  220
> >> >
> >> > How do I get my desired result?
> >> > Thank you
> >> >
> >> > __
> >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create

2019-04-13 Thread Val
Hi Bert and Jim,
Thank you for the suggestion.
However, those missing values should not be replaced by 0's.
I want to exclude those missing values from the calculation and create
the index using only the non-missing values.


On Sat, Apr 13, 2019 at 10:14 PM Jim Lemon  wrote:
>
> Hi Val,
> For this particular problem, you can just replace NAs with zeros.
>
> vdat[is.na(vdat)]<-0
> vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
> vdat
>  obs Year x1 x2 x3  xy
> 1   1 2001 25 10 10 130
> 2   2 2001  0 15 25 150
> 3   3 2001 50 10  0 150
> 4   4 2001 20  0 60 220
>
> Note that this is not a general solution to the problem of NA values.
>
> Jim
>
> On Sun, Apr 14, 2019 at 12:54 PM Val  wrote:
> >
> > Hi All,
> > I have a data frame  with several  columns  and I want to  create
> > another  column  by using  the values of the other columns.  My
> > problem is that some the row values  for some columns  have missing
> > values  and I could not get  the result I waned .
> >
> > Here is the sample of my data and my attempt.
> >
> > vdat<-read.table(text="obs, Year, x1, x2, x3
> > 1,  2001, 25 ,10, 10
> > 2,  2001,  ,  15, 25
> > 3,  2001,  50, 10,
> > 4,  2001,  20, , 60",sep=",",header=TRUE,stringsAsFactors=F)
> > vdat$xy <- 0
> > vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
> > vdat
> >
> >  obs Year x1 x2 x3  xy
> > 1   1 2001 25 10 10 130
> > 2   2 2001 NA 15 25  NA
> > 3   3 2001 50 10 NA  NA
> > 4   4 2001 20 NA 60  NA
> >
> > The desired result si this,
> >
> >obs Year x1 x2 x3   xy
> > 1   1 2001 25 10 10   130
> > 2   2 2001 NA 15 25  150
> > 3   3 2001 50 10 NA  150
> > 4   4 2001 20 NA 60  220
> >
> > How do I get my desired result?
> > Thank you
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] create

2019-04-13 Thread Val
Hi All,
I have a data frame with several columns and I want to create
another column using the values of the other columns.  My
problem is that some of the row values for some columns are missing,
and I could not get the result I wanted.

Here is the sample of my data and my attempt.

vdat<-read.table(text="obs, Year, x1, x2, x3
1,  2001, 25 ,10, 10
2,  2001,  ,  15, 25
3,  2001,  50, 10,
4,  2001,  20, , 60",sep=",",header=TRUE,stringsAsFactors=F)
vdat$xy <- 0
vdat$xy <- 2*(vdat$x1) + 5*(vdat$x2) + 3*(vdat$x3)
vdat

 obs Year x1 x2 x3  xy
1   1 2001 25 10 10 130
2   2 2001 NA 15 25  NA
3   3 2001 50 10 NA  NA
4   4 2001 20 NA 60  NA

The desired result is this:

   obs Year x1 x2 x3   xy
1   1 2001 25 10 10   130
2   2 2001 NA 15 25  150
3   3 2001 50 10 NA  150
4   4 2001 20 NA 60  220

How do I get my desired result?
Thank you

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Select

2019-02-11 Thread Val
Thank you very much Jeff, Goran and David  for your help.


On Mon, Feb 11, 2019 at 6:22 PM Jeff Newmiller  wrote:
>
> N <- 8 # however many times you want to do this
> ans <- lapply( seq.int( N )
>   , function( n ) {
>   idx <- sample( nrow( mydat ) )
>   mydat[ idx[ seq.int( which( 40 < cumsum( mydat[ idx, 
> "count" ] ) )[ 1 ] ) ], ]
> }
>   )
>
>
> On Mon, 11 Feb 2019, Val wrote:
>
> > Sorry Jeff and David  for not being clear!
> >
> > The total sample size should be at least 40, but the selection should
> > be based on group ID.  A different combination of Group ID could give
> > at least  40.
> > If I select  group G1   with 25  count and  G2  and with 15  counts
> > then   I can get  a minimum of 40  counts.   So G1 and G2 are
> > selected.
> > G1  25
> > G2  15
> >
> > In another scenario, if G2, G3 and G4  are  selected  then the total
> > count will be 58 which is  greater than 40. So G2 , G3 and G4  could
> > be selected.
> > G2 15
> > G3 12
> > G4 31
> >
> > So the restriction is to  find group IDs  that give a minim of  40.
> > Once, I reached a minim of 40 then stop selecting group  and output
> > the data..
> >
> > I am hope this helps
> >
> >
> >
> >
> > On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller  
> > wrote:
> >>
> >> This constraint was not clear in your original sample data set. Can you 
> >> expand the data set to clarify how this requirement REALLY works?
> >>
> >> On February 11, 2019 3:00:15 PM PST, Val  wrote:
> >>> Thank you David.
> >>>
> >>> However, this will not work for me. If the group ID selected then all
> >>> of its observation should be included.
> >>>
> >>> On Mon, Feb 11, 2019 at 4:51 PM David L Carlson 
> >>> wrote:
> >>>>
> >>>> First expand your data frame into a vector where G1 is repeated 25
> >>> times, G2 is repeated 15 times, etc. Then draw random samples of 40
> >>> from that vector:
> >>>>
> >>>>> grp <- rep(mydat$group, mydat$count)
> >>>>> grp.sam <- sample(grp, 40)
> >>>>> table(grp.sam)
> >>>> grp.sam
> >>>> G1 G2 G3 G4 G5
> >>>> 10  9  5 13  3
> >>>>
> >>>> 
> >>>> David L Carlson
> >>>> Department of Anthropology
> >>>> Texas A&M University
> >>>> College Station, TX 77843-4352
> >>>>
> >>>>
> >>>> -Original Message-
> >>>> From: R-help  On Behalf Of Val
> >>>> Sent: Monday, February 11, 2019 4:36 PM
> >>>> To: r-help@R-project.org (r-help@r-project.org)
> >>> 
> >>>> Subject: [R] Select
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I have a data frame  with tow variables  group and its size.
> >>>> mydat<- read.table( text='group  count
> >>>> G1 25
> >>>> G2 15
> >>>> G3 12
> >>>> G4 31
> >>>> G5 10' , header = TRUE, as.is = TRUE )
> >>>>
> >>>> I want to select   group ID randomly (without replacement)  until
> >>> the
> >>>> sum of count reaches 40.
> >>>> So, in  the first case, the data frame could be
> >>>>G4 31
> >>>>65 10
> >>>>
> >>>> In other case, it could be
> >>>>   G5 10
> >>>>   G2 15
> >>>>   G3 12
> >>>>
> >>>> How do I put sum of count variable   is  a minimum of 40 restriction?
> >>>>
> >>>> Than k you in advance
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> I want to select group  ids randomly until I reach the
> >>>>
> >>>> __
> >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >
>
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live Go...
>Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> ---

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Select

2019-02-11 Thread Val
Sorry Jeff and David  for not being clear!

The total sample size should be at least 40, but the selection should
be based on group ID.  A different combination of Group ID could give
 at least  40.
If I select group G1 with a count of 25 and G2 with a count of 15,
then I get at least 40 counts, so G1 and G2 are
selected.
G1  25
G2  15

In another scenario, if G2, G3 and G4  are  selected  then the total
count will be 58 which is  greater than 40. So G2 , G3 and G4  could
be selected.
 G2 15
 G3 12
 G4 31

So the restriction is to find group IDs that give a minimum of 40.
Once I reach a minimum of 40, I stop selecting groups and output
the data.

I hope this helps




On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller  wrote:
>
> This constraint was not clear in your original sample data set. Can you 
> expand the data set to clarify how this requirement REALLY works?
>
> On February 11, 2019 3:00:15 PM PST, Val  wrote:
> >Thank you David.
> >
> >However, this will not work for me. If the group ID selected then all
> >of its observation should be included.
> >
> >On Mon, Feb 11, 2019 at 4:51 PM David L Carlson 
> >wrote:
> >>
> >> First expand your data frame into a vector where G1 is repeated 25
> >times, G2 is repeated 15 times, etc. Then draw random samples of 40
> >from that vector:
> >>
> >> > grp <- rep(mydat$group, mydat$count)
> >> > grp.sam <- sample(grp, 40)
> >> > table(grp.sam)
> >> grp.sam
> >> G1 G2 G3 G4 G5
> >> 10  9  5 13  3
> >>
> >> 
> >> David L Carlson
> >> Department of Anthropology
> >> Texas A University
> >> College Station, TX 77843-4352
> >>
> >>
> >> -Original Message-
> >> From: R-help  On Behalf Of Val
> >> Sent: Monday, February 11, 2019 4:36 PM
> >> To: r-help@R-project.org (r-help@r-project.org)
> >
> >> Subject: [R] Select
> >>
> >> Hi all,
> >>
> >> I have a data frame  with tow variables  group and its size.
> >> mydat<- read.table( text='group  count
> >> G1 25
> >> G2 15
> >> G3 12
> >> G4 31
> >> G5 10' , header = TRUE, as.is = TRUE )
> >>
> >> I want to select   group ID randomly (without replacement)  until
> >the
> >> sum of count reaches 40.
> >> So, in  the first case, the data frame could be
> >>G4 31
> >>65 10
> >>
> >> In other case, it could be
> >>   G5 10
> >>   G2 15
> >>   G3 12
> >>
> >> How do I put sum of count variable   is  a minimum of 40 restriction?
> >>
> >> Than k you in advance
> >>
> >>
> >>
> >>
> >>
> >>
> >> I want to select group  ids randomly until I reach the
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Select

2019-02-11 Thread Val
Thank you David.

However, this will not work for me. If a group ID is selected, then all
of its observations should be included.

On Mon, Feb 11, 2019 at 4:51 PM David L Carlson  wrote:
>
> First expand your data frame into a vector where G1 is repeated 25 times, G2 
> is repeated 15 times, etc. Then draw random samples of 40 from that vector:
>
> > grp <- rep(mydat$group, mydat$count)
> > grp.sam <- sample(grp, 40)
> > table(grp.sam)
> grp.sam
> G1 G2 G3 G4 G5
> 10  9  5 13  3
>
> 
> David L Carlson
> Department of Anthropology
> Texas A University
> College Station, TX 77843-4352
>
>
> -Original Message-
> From: R-help  On Behalf Of Val
> Sent: Monday, February 11, 2019 4:36 PM
> To: r-help@R-project.org (r-help@r-project.org) 
> Subject: [R] Select
>
> Hi all,
>
> I have a data frame  with tow variables  group and its size.
> mydat<- read.table( text='group  count
> G1 25
> G2 15
> G3 12
> G4 31
> G5 10' , header = TRUE, as.is = TRUE )
>
> I want to select   group ID randomly (without replacement)  until  the
> sum of count reaches 40.
> So, in  the first case, the data frame could be
>G4 31
>65 10
>
> In other case, it could be
>   G5 10
>   G2 15
>   G3 12
>
> How do I put sum of count variable   is  a minimum of 40 restriction?
>
> Than k you in advance
>
>
>
>
>
>
> I want to select group  ids randomly until I reach the
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Select

2019-02-11 Thread Val
Hi all,

I have a data frame with two variables: group and its count.
mydat<- read.table( text='group  count
G1 25
G2 15
G3 12
G4 31
G5 10' , header = TRUE, as.is = TRUE )

I want to select group IDs randomly (without replacement) until the
sum of count reaches 40.
So, in one case, the selected data frame could be
   G4 31
   G5 10

In other case, it could be
  G5 10
  G2 15
  G3 12

How do I impose the restriction that the sum of the count variable is at least 40?

Thank you in advance






I want to select group IDs randomly until I reach the minimum total count of 40.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] character comp

2019-02-09 Thread Val
Thank you Erin and Rui!


On Sat, Feb 9, 2019 at 1:08 PM Erin Hodgess  wrote:
>
> Nice, Rui!  Thanks
>
> On Sat, Feb 9, 2019 at 11:55 AM Rui Barradas  wrote:
>>
>> Hello,
>>
>> The following will do it.
>>
>> mydataframe$dvar <- c(sapply(mydataframe[-1], nchar) %*% c(1, -1))
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Às 18:05 de 09/02/2019, Val escreveu:
>> > Hi  All,
>> > In a given data frame I  want to compare character values of two columns.
>> > My sample data looks like as follow,
>> >
>> > mydataframe <- read.table( text='ID  var1 var2
>> >R1   AA  AAA
>> >R2   AAA AAA
>> >R3A  
>> >R4   AA   A
>> >R5   A  AAA', header = TRUE, as.is = TRUE )
>> >
>> > For each ID, I want  create the third column "dvar" as  difference
>> > between var1 and var2
>> >   Row1( R1)   the "dvar" value will be -1 and the complete  desired out
>> > put looks like as follow.
>> >
>> >   IDvar1 var2   dvar
>> >   R1   AAAAA-1
>> >   R2  AAA  AAA  0
>> >   R3A-3
>> >   R4   AA   A1
>> >   R5A AAA  -2
>> >
>> > How do i do this? Any help please?
>> > Thank you
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Erin Hodgess, PhD
> mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] character comp

2019-02-09 Thread Val
Hi Erin,  Yes, it is always  A's.

On Sat, Feb 9, 2019 at 12:22 PM Erin Hodgess  wrote:
>
> Will it always be A’s or will there be a mix please?
>
> On Sat, Feb 9, 2019 at 11:06 AM Val  wrote:
>>
>> Hi  All,
>> In a given data frame I  want to compare character values of two columns.
>> My sample data looks like as follow,
>>
>> mydataframe <- read.table( text='ID  var1 var2
>>   R1   AA  AAA
>>   R2   AAA AAA
>>   R3A  
>>   R4   AA   A
>>   R5   A  AAA', header = TRUE, as.is = TRUE )
>>
>> For each ID, I want  create the third column "dvar" as  difference
>> between var1 and var2
>>  Row1( R1)   the "dvar" value will be -1 and the complete  desired out
>> put looks like as follow.
>>
>>  IDvar1 var2   dvar
>>  R1   AAAAA-1
>>  R2  AAA  AAA  0
>>  R3A-3
>>  R4   AA   A1
>>  R5A AAA  -2
>>
>> How do i do this? Any help please?
>> Thank you
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Erin Hodgess, PhD
> mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] character comp

2019-02-09 Thread Val
Hi  All,
In a given data frame I want to compare the character values of two columns.
My sample data looks as follows:

mydataframe <- read.table( text='ID  var1  var2
  R1   AA    AAA
  R2   AAA   AAA
  R3   A     AAAA
  R4   AA    A
  R5   A     AAA', header = TRUE, as.is = TRUE )

For each ID, I want to create a third column "dvar" as the difference
in length (number of characters) between var1 and var2.
For row 1 (R1) the "dvar" value will be -1, and the complete desired
output looks as follows.

 ID   var1   var2   dvar
 R1   AA     AAA      -1
 R2   AAA    AAA       0
 R3   A      AAAA     -3
 R4   AA     A         1
 R5   A      AAA      -2

How do I do this? Any help, please?
Thank you
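
For reference, a minimal sketch of one way to get this (assuming var1 and var2
are read as character, e.g. with as.is = TRUE as above):

mydataframe$dvar <- nchar(mydataframe$var1) - nchar(mydataframe$var2)
mydataframe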

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read

2018-11-10 Thread Val
Thank you Jeff and all.

My data is very messy and this is a nice trick suggested by Jeff to handle it.

On Fri, Nov 9, 2018 at 8:42 PM Jeff Newmiller  wrote:
>
> Your file has 5 commas in the first data row, but only 4 in the header. R
> interprets this to mean your first column is intended to be row names (has
> no corresponding column label) rather than data. (Row names are "outside"
> the data frame... use str(dsh) to get a better picture.)
>
> Basically, your file does not conform to consistent practices for csv
> files of having the same number of commas in every row. If at all possible
> I would eliminate the extra comma. If you have many of these broken files,
> you might need to read the data in pieces... e.g.
>
> dsh <- read.csv( "dat.csv", header=FALSE, skip=1 )
> dsh <- dsh[ , -length( dsh ) ]
> dshh <- read.csv( "dat.csv", header=TRUE, nrow=1)
> names( dsh ) <- names( dshh )
>
> On Fri, 9 Nov 2018, Val wrote:
>
> > HI all,
> > I am trying to read a csv file, but  have a problem in the row names.
> > After reading, the name of the first column is now "row.names" and
> > all other column names are shifted to the right. The value of the last
> > column become all NAs( as an extra column).
> >
> > My sample data looks like as follow,
> > filename = dat.csv
> > The first row has a missing value at column 3 and 5. The last row has
> > a missing value at column 1 and  5
> > x1,x2,x3,x4,x5
> > 12,13,,14,,
> > 22,23,24,25,26
> > ,33,34,34,
> > To read the file I used this
> >
> > dsh<-read.csv(file="dat.csv",sep=",",row.names=NULL,fill=TRUE,header=TRUE,comment.char
> > = "", quote = "", stringsAsFactors = FALSE)
> >
> > The output  from the above  is
> > dsh
> >
> > row.names x1 x2 x3 x4 x5
> > 112 13 NA 14 NA  NA
> > 222 23 24 25 26  NA
> > 3 33 34 34 NA  NA
> >
> > The name of teh frist column is row,banes and all values of last columns is 
> > NAs
> >
> >
> > However, the desired output should be
> > x1 x2 x3 x4 x5
> > 12 13 NA 14 NA
> > 22 23 24 25 26
> > NA 33 34 34 NA
> >
> >
> > How can I fix this?
> > Thank you in advance
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live Go...
>Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> ---

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Read

2018-11-09 Thread Val
HI all,
I am trying to read a csv file but have a problem with the row names.
After reading, the name of the first column is now "row.names" and
all the other column names are shifted to the right. The values of the last
column become all NAs (as an extra column).

My sample data looks as follows,
filename = dat.csv
The first row has missing values at columns 3 and 5. The last row has
missing values at columns 1 and 5.
x1,x2,x3,x4,x5
12,13,,14,,
22,23,24,25,26
,33,34,34,
To read the file I used this

dsh<-read.csv(file="dat.csv",sep=",",row.names=NULL,fill=TRUE,header=TRUE,comment.char
= "", quote = "", stringsAsFactors = FALSE)

The output  from the above  is
dsh

  row.names x1 x2 x3 x4 x5
1        12 13 NA 14 NA NA
2        22 23 24 25 26 NA
3        NA 33 34 34 NA NA

The name of the first column is "row.names" and all the values of the last column are NAs.


However, the desired output should be
 x1 x2 x3 x4 x5
 12 13 NA 14 NA
 22 23 24 25 26
 NA 33 34 34 NA


How can I fix this?
Thank you in advance
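
A sketch of one possible workaround, in the spirit of the piece-wise reading
suggested elsewhere in the thread (read the header and the body separately,
then drop the spurious extra column created by the trailing commas):

hdr  <- scan("dat.csv", what = "", sep = ",", nlines = 1)   # column names
body <- read.csv("dat.csv", header = FALSE, skip = 1)       # may gain an extra NA column
body <- body[, seq_along(hdr)]                              # keep only the named columns
names(body) <- hdr
body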

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] select and hold missing

2018-09-12 Thread Val
I have a data
dfc <- read.table( text= 'week v1 v2
  w1  11  11
  w1  .42
  w1  31  32
  w2  31  52
  w2  41  .
  w3  51  82
  w2  11  22
  w3  11  12
  w4  21  202
  w1  31  72
  w2  71  52', header = TRUE, as.is = TRUE, na.strings=c("",".","NA") )

I want to create this new variable diff = v2 - v1 and remove rows based
on this "diff" value as shown below.
dfc$diff <- dfc$v2 - dfc$v1
I want to remove rows where diff is <= 0 or >= 100, and keep all other
rows, including those where diff is NA.
dfca <- dfc[((dfc$diff) > 0) & ((dfc$diff) < 100), ]

 However, the result is not what I wanted. I want the output as follows:
  week v1 v2 diff
  w1 NA  42  NA
  w1 31 321
  w2 31 52   21
  w2 41  NA  NA
  w3 51 82   31
  w2 11 22   11
  w3 11 121
  w1 31 72   41

However, I got this. Why is it setting all the row values to NA?
   week v1 v2 diff
   NA NA   NA
  w1 31 321
 w2 31 52   21
  NA NA   NA
  w3 51 82   31
  w2 11 22   11
  w3 11 121
  w1 31 72   41

Any help ?
Thank you.
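
For what it is worth: the all-NA rows appear because a comparison with NA
gives NA, and subsetting a data frame with NA in a logical index returns a
row of NAs. A minimal sketch that keeps the NA rows explicitly:

dfc$diff <- dfc$v2 - dfc$v1
dfca <- dfc[is.na(dfc$diff) | (dfc$diff > 0 & dfc$diff < 100), ]
dfca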

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exclude

2018-05-17 Thread Val
Thank you Bert and Jim,
Jim, FYI, I get this error message:

Error in allstates : object 'allstates' not found

Bert, it is working. However, I want to choose to include only some years,
for example 2003, 2004 and 2007, and continue the analysis as before. Where should
I define the years to get the following?
       2003  2004  2007
  AL      2     1     1
  NY      1     1     2

Thank you again.
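
A sketch of one way to do this, building on the 'keep' vector from Bert's
code quoted below (the 'years' vector here is just an example):

years <- c(2003, 2004, 2007)
sub   <- tdat[tdat$year %in% years & tdat$stat %in% keep, ]
xtabs(~ stat + year, data = sub)    # counts by state for the chosen years only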








On Thu, May 17, 2018 at 8:48 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> ... and similar to Jim's suggestion but perhaps slightly simpler (or not!):
>
> > cross <- xtabs( Y ~ stat + year, data = tdat)
> > keep <- apply(cross, 1, all)
> > keep <- names(keep)[keep]
> > cross[keep,]
> year
> stat 2003 2004 2006 2007 2009 2010
>   AL   38   21   20   12   16   15
>   NY   50   51   57   98  183  230
>
>
>
> > ## for counts just do:
> > xtabs( ~ stat + year, data = tdat[tdat$stat %in% keep, ])
> year
> stat 2003 2004 2006 2007 2009 2010
>   AL211111
>   NY111223
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, May 17, 2018 at 5:48 PM, Val <valkr...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a sample of  data set show as below.
>> tdat <- read.table(textConnection("stat year Y
>> AL 200325
>> AL 200313
>> AL 200421
>> AL 200620
>> AL 200712
>> AL 200916
>> AL 201015
>> FL 200663
>> FL 200714
>> FL 200725
>> FL 200964
>> FL 200947
>> FL 201048
>> NY 200350
>> NY 200451
>> NY 200657
>> NY 200762
>> NY 200736
>> NY 200987
>> NY 200996
>> NY 201091
>> NY 201059
>> NY 201080"),header = TRUE,stringsAsFactors=FALSE)
>>
>> There are three states, I wan tto select states taht do ahve records in
>> all
>> year.
>> Example,
>> xtabs(Y~stat+year, tdat)
>>  This gave me the following
>>
>>  stat 2003 2004 2006 2007 2009 2010
>>   AL   38   21   20   12   16   15
>>   FL00   63   39  111   48
>>   NY   50   51   57   98  183  230
>>
>> Fl state does not have recrods in all year  and I wan to exclude from this
>> and I want teh result   as follow
>>
>>  stat 2003 2004 2006 2007 2009 2010
>>   AL   38   21   20   12   16   15
>>   NY   50   51   57   98  183  230
>>
>> The other thing, how do I get teh counts state by year?
>>
>> Desired result,
>>
>>20032004   2006   2007   20092010
>> AL  2   1  1   1  1 1
>> NY 11 12  23
>>
>> Thank you
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] exclude

2018-05-17 Thread Val
Hi All,

I have a sample data set shown below.
tdat <- read.table(textConnection("stat year Y
AL 2003 25
AL 2003 13
AL 2004 21
AL 2006 20
AL 2007 12
AL 2009 16
AL 2010 15
FL 2006 63
FL 2007 14
FL 2007 25
FL 2009 64
FL 2009 47
FL 2010 48
NY 2003 50
NY 2004 51
NY 2006 57
NY 2007 62
NY 2007 36
NY 2009 87
NY 2009 96
NY 2010 91
NY 2010 59
NY 2010 80"),header = TRUE,stringsAsFactors=FALSE)

There are three states; I want to select the states that have records in all
years.
Example,
xtabs(Y~stat+year, tdat)
 This gave me the following

 stat 2003 2004 2006 2007 2009 2010
  AL   38   21   20   12   16   15
  FL00   63   39  111   48
  NY   50   51   57   98  183  230

FL does not have records in all years and I want to exclude it, so the result
should be as follows:

 stat 2003 2004 2006 2007 2009 2010
  AL   38   21   20   12   16   15
  NY   50   51   57   98  183  230

The other thing: how do I get the counts by state and year?

Desired result,

     2003   2004   2006   2007   2009   2010
AL      2      1      1      1      1      1
NY      1      1      1      2      2      3

Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] include

2018-02-25 Thread Val
Thank you all for your help and sorry for that.

On Sun, Feb 25, 2018 at 12:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> Jim has been exceedingly patient (and may well continue to be so), but
> this smells like "failure to launch". At what point will you start showing
> your (failed) attempts at solving your own problems so we can help you work
> on your specific weaknesses and become self-sufficient?
> --
> Sent from my phone. Please excuse my brevity.
>
> On February 25, 2018 7:55:55 AM PST, Val <valkr...@gmail.com> wrote:
> >HI Jim and all,
> >
> >I want to put one more condition.   Include col2 and col3 if they are
> >not
> >in col1.
> >
> >Here is the data
> >mydat <- read.table(textConnection("Col1 Col2 col3
> >K2 X1 NA
> >Z1 K1 K2
> >Z2 NA NA
> >Z3 X1 NA
> >Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> >
> >The desired out put would be
> >
> >  Col1 Col2 col3
> >1X100
> >2K100
> >3Y100
> >4W100
> >6K2   X10
> >7Z1   K1   K2
> >8Z200
> >9Z3   X10
> >10   Z4   Y1   W1
> >
> >K2 is already is already in col1 and should not be added.
> >
> >Thank you in advance
> >
> >
> >
> >
> >
> >
> >
> >On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com>
> >wrote:
> >
> >> Hi Val,
> >> My fault - I assumed that the NA would be first in the result
> >produced
> >> by "unique":
> >>
> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> Z1 K1 K2
> >> Z2 NA NA
> >> Z3 X1 NA
> >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> >> val23<-unique(unlist(mydat[,c("Col2","col3")]))
> >> napos<-which(is.na(val23))
> >> preval<-data.frame(Col1=val23[-napos],
> >>  Col2=NA,col3=NA)
> >> mydat<-rbind(preval,mydat)
> >> mydat[is.na(mydat)]<-"0"
> >> mydat
> >>
> >> Jim
> >>
> >> On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote:
> >> > Thank you Jim,
> >> >
> >> > I read the data as you suggested but I could not find K1 in   col1.
> >> >
> >> > rbind(preval,mydat)
> >> >   Col1 Col2 col3
> >> > 1   
> >> > 2   X1  
> >> > 3   Y1  
> >> > 4   K2  
> >> > 5   W1  
> >> > 6   Z1   K1   K2
> >> > 7   Z2  
> >> > 8   Z3   X1 
> >> > 9   Z4   Y1   W1
> >> >
> >> >
> >> >
> >> > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com>
> >wrote:
> >> >>
> >> >> hi Val,
> >> >> Your problem seems to be that the data are read in as a factor.
> >The
> >> >> simplest way I can think of to get around this is:
> >> >>
> >> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> >> Z1 K1 K2
> >> >> Z2 NA NA
> >> >> Z3 X1 NA
> >> >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> >> >>
> >preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >>  Col2=NA,col3=NA)
> >> >> rbind(preval,mydat)
> >> >> mydat[is.na(mydat)]<-"0"
> >> >>
> >> >> Jiim
> >> >>
> >> >>
> >> >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote:
> >> >> > Sorry , I hit the send key accidentally  here is my complete
> >message.
> >> >> >
> >> >> > Thank you Jim   and all, I got it.
> >> >> >
> >> >> > I have one more question on the original question
> >> >> >
> >> >> >  What does this  "[-1] "  do?
> >> >> >
> >preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >> >Col2=NA,col3=NA)
> >> >> >
> >> >> >
> >> >> > mydat <- read.table(textConnection("Col1 Col2 col3
> >> >> > Z1 K1 K2
> >> >> > Z2 NA NA
> >> >> > Z3 X1 NA
> >> >> > Z4 Y1 W1"),header = TRUE)
> >

Re: [R] include

2018-02-25 Thread Val
HI Jim and all,

I want to add one more condition: include Col2 and col3 values only if they
are not already in Col1.

Here is the data
mydat <- read.table(textConnection("Col1 Col2 col3
K2 X1 NA
Z1 K1 K2
Z2 NA NA
Z3 X1 NA
Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)

The desired out put would be

  Col1 Col2 col3
1X100
2K100
3Y100
4W100
6K2   X10
7Z1   K1   K2
8Z200
9Z3   X10
10   Z4   Y1   W1

K2 is already in Col1 and should not be added again.

Thank you in advance
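
A sketch of one way to handle this extra condition, along the lines of Jim's
code quoted below (assuming mydat is read with stringsAsFactors = FALSE as above):

vals <- unique(unlist(mydat[, c("Col2", "col3")]))
vals <- vals[!is.na(vals) & !(vals %in% mydat$Col1)]   # drop NA and IDs already in Col1
preval <- data.frame(Col1 = vals, Col2 = NA, col3 = NA)
out <- rbind(preval, mydat)
out[is.na(out)] <- "0"
out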







On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com> wrote:

> Hi Val,
> My fault - I assumed that the NA would be first in the result produced
> by "unique":
>
> mydat <- read.table(textConnection("Col1 Col2 col3
> Z1 K1 K2
> Z2 NA NA
> Z3 X1 NA
> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> val23<-unique(unlist(mydat[,c("Col2","col3")]))
> napos<-which(is.na(val23))
> preval<-data.frame(Col1=val23[-napos],
>  Col2=NA,col3=NA)
> mydat<-rbind(preval,mydat)
> mydat[is.na(mydat)]<-"0"
> mydat
>
> Jim
>
> On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote:
> > Thank you Jim,
> >
> > I read the data as you suggested but I could not find K1 in   col1.
> >
> > rbind(preval,mydat)
> >   Col1 Col2 col3
> > 1   
> > 2   X1  
> > 3   Y1  
> > 4   K2  
> > 5   W1  
> > 6   Z1   K1   K2
> > 7   Z2  
> > 8   Z3   X1 
> > 9   Z4   Y1   W1
> >
> >
> >
> > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> >>
> >> hi Val,
> >> Your problem seems to be that the data are read in as a factor. The
> >> simplest way I can think of to get around this is:
> >>
> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> Z1 K1 K2
> >> Z2 NA NA
> >> Z3 X1 NA
> >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >>  Col2=NA,col3=NA)
> >> rbind(preval,mydat)
> >> mydat[is.na(mydat)]<-"0"
> >>
> >> Jiim
> >>
> >>
> >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote:
> >> > Sorry , I hit the send key accidentally  here is my complete message.
> >> >
> >> > Thank you Jim   and all, I got it.
> >> >
> >> > I have one more question on the original question
> >> >
> >> >  What does this  "[-1] "  do?
> >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >Col2=NA,col3=NA)
> >> >
> >> >
> >> > mydat <- read.table(textConnection("Col1 Col2 col3
> >> > Z1 K1 K2
> >> > Z2 NA NA
> >> > Z3 X1 NA
> >> > Z4 Y1 W1"),header = TRUE)
> >> >
> >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >Col2=NA,col3=NA)
> >> > rbind(unique(preval),mydat)
> >> >
> >> >
> >> >  Col1 Col2 col3
> >> > 1   
> >> > 2   X1  
> >> > 3   Y1  
> >> > 4   K2  
> >> > 5   W1  
> >> > 6   Z1   K1   K2
> >> > 7   Z2  
> >> > 8   Z3   X1 
> >> > 9   Z4   Y1   W1
> >> >
> >> > I could not find K1 in the first   col1. Is that possible to fix this?
> >> >
> >> > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote:
> >> >
> >> >> Thank you Jim   and all, I got it.
> >> >>
> >> >> I have one more question on the original question
> >> >>
> >> >>  What does this  "[-1] "  do?
> >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2",
> "col3")]))[-1],
> >> >>Col2=NA,col3=NA)
> >> >>
> >> >>
> >> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> >> Z1 K1 K2
> >> >> Z2 NA NA
> >> >> Z3 X1 NA
> >> >> Z4 Y1 W1"),header = TRUE)
> >> >>
> >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2",
> "col3")]))[-1],
> >> >>   

Re: [R] include

2018-02-24 Thread Val
Thank you so much Jim!

On Sat, Feb 24, 2018 at 6:38 PM, Jim Lemon <drjimle...@gmail.com> wrote:

> Hi Val,
> My fault - I assumed that the NA would be first in the result produced
> by "unique":
>
> mydat <- read.table(textConnection("Col1 Col2 col3
> Z1 K1 K2
> Z2 NA NA
> Z3 X1 NA
> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> val23<-unique(unlist(mydat[,c("Col2","col3")]))
> napos<-which(is.na(val23))
> preval<-data.frame(Col1=val23[-napos],
>  Col2=NA,col3=NA)
> mydat<-rbind(preval,mydat)
> mydat[is.na(mydat)]<-"0"
> mydat
>
> Jim
>
> On Sun, Feb 25, 2018 at 11:27 AM, Val <valkr...@gmail.com> wrote:
> > Thank you Jim,
> >
> > I read the data as you suggested but I could not find K1 in   col1.
> >
> > rbind(preval,mydat)
> >   Col1 Col2 col3
> > 1   
> > 2   X1  
> > 3   Y1  
> > 4   K2  
> > 5   W1  
> > 6   Z1   K1   K2
> > 7   Z2  
> > 8   Z3   X1 
> > 9   Z4   Y1   W1
> >
> >
> >
> > On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> >>
> >> hi Val,
> >> Your problem seems to be that the data are read in as a factor. The
> >> simplest way I can think of to get around this is:
> >>
> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> Z1 K1 K2
> >> Z2 NA NA
> >> Z3 X1 NA
> >> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >>  Col2=NA,col3=NA)
> >> rbind(preval,mydat)
> >> mydat[is.na(mydat)]<-"0"
> >>
> >> Jiim
> >>
> >>
> >> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote:
> >> > Sorry , I hit the send key accidentally  here is my complete message.
> >> >
> >> > Thank you Jim   and all, I got it.
> >> >
> >> > I have one more question on the original question
> >> >
> >> >  What does this  "[-1] "  do?
> >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >Col2=NA,col3=NA)
> >> >
> >> >
> >> > mydat <- read.table(textConnection("Col1 Col2 col3
> >> > Z1 K1 K2
> >> > Z2 NA NA
> >> > Z3 X1 NA
> >> > Z4 Y1 W1"),header = TRUE)
> >> >
> >> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >> >Col2=NA,col3=NA)
> >> > rbind(unique(preval),mydat)
> >> >
> >> >
> >> >  Col1 Col2 col3
> >> > 1   
> >> > 2   X1  
> >> > 3   Y1  
> >> > 4   K2  
> >> > 5   W1  
> >> > 6   Z1   K1   K2
> >> > 7   Z2  
> >> > 8   Z3   X1 
> >> > 9   Z4   Y1   W1
> >> >
> >> > I could not find K1 in the first   col1. Is that possible to fix this?
> >> >
> >> > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote:
> >> >
> >> >> Thank you Jim   and all, I got it.
> >> >>
> >> >> I have one more question on the original question
> >> >>
> >> >>  What does this  "[-1] "  do?
> >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2",
> "col3")]))[-1],
> >> >>Col2=NA,col3=NA)
> >> >>
> >> >>
> >> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> >> Z1 K1 K2
> >> >> Z2 NA NA
> >> >> Z3 X1 NA
> >> >> Z4 Y1 W1"),header = TRUE)
> >> >>
> >> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2",
> "col3")]))[-1],
> >> >>Col2=NA,col3=NA)
> >> >> rbind(unique(preval),mydat)
> >> >>
> >> >>
> >> >>  Col1 Col2 col3
> >> >> 1   
> >> >> 2   X1  
> >> >> 3   Y1  
> >> >> 4   K2  
> >> >> 5   W1  
> >> >> 6   Z1   K1   K2
> >> >> 7   Z2  
> >> >> 8   Z3   X1 
> >> >> 9   Z4   Y1   W1
> >> >>
> >> >>
> >> &

Re: [R] include

2018-02-24 Thread Val
Thank you Jim,

I read the data as you suggested but I could not find K1 in   col1.

rbind(preval,mydat)
  Col1 Col2 col3
1   
2   X1  
3   Y1  
4   K2  
5   W1  
6   Z1   K1   K2
7   Z2  
8   Z3   X1 
9   Z4   Y1   W1



On Sat, Feb 24, 2018 at 6:18 PM, Jim Lemon <drjimle...@gmail.com> wrote:

> hi Val,
> Your problem seems to be that the data are read in as a factor. The
> simplest way I can think of to get around this is:
>
> mydat <- read.table(textConnection("Col1 Col2 col3
> Z1 K1 K2
> Z2 NA NA
> Z3 X1 NA
> Z4 Y1 W1"),header = TRUE,stringsAsFactors=FALSE)
> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>  Col2=NA,col3=NA)
> rbind(preval,mydat)
> mydat[is.na(mydat)]<-"0"
>
> Jiim
>
>
> On Sun, Feb 25, 2018 at 11:05 AM, Val <valkr...@gmail.com> wrote:
> > Sorry , I hit the send key accidentally  here is my complete message.
> >
> > Thank you Jim   and all, I got it.
> >
> > I have one more question on the original question
> >
> >  What does this  "[-1] "  do?
> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >Col2=NA,col3=NA)
> >
> >
> > mydat <- read.table(textConnection("Col1 Col2 col3
> > Z1 K1 K2
> > Z2 NA NA
> > Z3 X1 NA
> > Z4 Y1 W1"),header = TRUE)
> >
> > preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >Col2=NA,col3=NA)
> > rbind(unique(preval),mydat)
> >
> >
> >  Col1 Col2 col3
> > 1   
> > 2   X1  
> > 3   Y1  
> > 4   K2  
> > 5   W1  
> > 6   Z1   K1   K2
> > 7   Z2  
> > 8   Z3   X1 
> > 9   Z4   Y1   W1
> >
> > I could not find K1 in the first   col1. Is that possible to fix this?
> >
> > On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote:
> >
> >> Thank you Jim   and all, I got it.
> >>
> >> I have one more question on the original question
> >>
> >>  What does this  "[-1] "  do?
> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >>Col2=NA,col3=NA)
> >>
> >>
> >> mydat <- read.table(textConnection("Col1 Col2 col3
> >> Z1 K1 K2
> >> Z2 NA NA
> >> Z3 X1 NA
> >> Z4 Y1 W1"),header = TRUE)
> >>
> >> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
> >>Col2=NA,col3=NA)
> >> rbind(unique(preval),mydat)
> >>
> >>
> >>  Col1 Col2 col3
> >> 1   
> >> 2   X1  
> >> 3   Y1  
> >> 4   K2  
> >> 5   W1  
> >> 6   Z1   K1   K2
> >> 7   Z2  
> >> 8   Z3   X1 
> >> 9   Z4   Y1   W1
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch <
> murdoch.dun...@gmail.com>
> >> wrote:
> >>
> >>> On 24/02/2018 1:53 PM, William Dunlap via R-help wrote:
> >>>
> >>>> x1 =  rbind(unique(preval),mydat)
> >>>>x2 <- x1[is.na(x1)] <- 0
> >>>>x2  # gives 0
> >>>>
> >>>> Why introduce the 'x2'?   x1[...] <- 0 alters x1 in place and I think
> >>>> that
> >>>> altered x1 is what you want.
> >>>>
> >>>> You asked why x2 was zero.  The value of the expression
> >>>> f(a) <- b
> >>>> and assignments are processed right to left so
> >>>> x2 <- x[!is.na(x1)] <- 0
> >>>> is equivalent to
> >>>> x[!is.na(x1)] <- 0
> >>>> x2 <- 0
> >>>>
> >>>
> >>> That's not right in general, is it?  I'd think that should be
> >>>
> >>> x[!is.na(x1)] <- 0
> >>> x2 <- x1
> >>>
> >>> Of course, in this example, x1 is 0, so it gives the same answer.
> >>>
> >>> Duncan Murdoch
> >>>
> >>>
> >>>
> >>>>
> >>>> Bill Dunlap
> >>>> TIBCO Software
> >>>> wdunlap tibco.com
> >>>>
> >>>> On 

Re: [R] include

2018-02-24 Thread Val
Sorry, I hit the send key accidentally; here is my complete message.

Thank you Jim   and all, I got it.

I have one more question on the original question

 What does this  "[-1] "  do?
preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
   Col2=NA,col3=NA)


mydat <- read.table(textConnection("Col1 Col2 col3
Z1 K1 K2
Z2 NA NA
Z3 X1 NA
Z4 Y1 W1"),header = TRUE)

preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
   Col2=NA,col3=NA)
rbind(unique(preval),mydat)


 Col1 Col2 col3
1   
2   X1  
3   Y1  
4   K2  
5   W1  
6   Z1   K1   K2
7   Z2  
8   Z3   X1 
9   Z4   Y1   W1

I could not find K1 in Col1. Is it possible to fix this?

On Sat, Feb 24, 2018 at 5:59 PM, Val <valkr...@gmail.com> wrote:

> Thank you Jim   and all, I got it.
>
> I have one more question on the original question
>
>  What does this  "[-1] "  do?
> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>Col2=NA,col3=NA)
>
>
> mydat <- read.table(textConnection("Col1 Col2 col3
> Z1 K1 K2
> Z2 NA NA
> Z3 X1 NA
> Z4 Y1 W1"),header = TRUE)
>
> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>Col2=NA,col3=NA)
> rbind(unique(preval),mydat)
>
>
>  Col1 Col2 col3
> 1   
> 2   X1  
> 3   Y1  
> 4   K2  
> 5   W1  
> 6   Z1   K1   K2
> 7   Z2  
> 8   Z3   X1 
> 9   Z4   Y1   W1
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch <murdoch.dun...@gmail.com>
> wrote:
>
>> On 24/02/2018 1:53 PM, William Dunlap via R-help wrote:
>>
>>> x1 =  rbind(unique(preval),mydat)
>>>x2 <- x1[is.na(x1)] <- 0
>>>x2  # gives 0
>>>
>>> Why introduce the 'x2'?   x1[...] <- 0 alters x1 in place and I think
>>> that
>>> altered x1 is what you want.
>>>
>>> You asked why x2 was zero.  The value of the expression
>>> f(a) <- b
>>> and assignments are processed right to left so
>>> x2 <- x[!is.na(x1)] <- 0
>>> is equivalent to
>>> x[!is.na(x1)] <- 0
>>> x2 <- 0
>>>
>>
>> That's not right in general, is it?  I'd think that should be
>>
>> x[!is.na(x1)] <- 0
>> x2 <- x1
>>
>> Of course, in this example, x1 is 0, so it gives the same answer.
>>
>> Duncan Murdoch
>>
>>
>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Sat, Feb 24, 2018 at 9:59 AM, Val <valkr...@gmail.com> wrote:
>>>
>>> Thank you Jim
>>>>
>>>> I wanted a final data frame  after replacing the NA's to "0"
>>>>
>>>> x1 =  rbind(unique(preval),mydat)
>>>> x2 <- x1[is.na(x1)] <- 0
>>>> x2
>>>>   but I got this,
>>>>
>>>> [1] 0
>>>>
>>>> why I am getting this?
>>>>
>>>>
>>>> On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Val,
>>>>> Try this:
>>>>>
>>>>> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>>>>>   Col2=NA,col3=NA)
>>>>> rbind(preval,mydat)
>>>>>
>>>>> Jim
>>>>>
>>>>> On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am reading a file as follow,
>>>>>>
>>>>>> mydat <- read.table(textConnection("Col1 Col2 col3
>>>>>> Z2 NA NA
>>>>>> Z3 X1 NA
>>>>>> Z4 Y1 W1"),header = TRUE)
>>>>>>
>>>>>> 1. "NA" are   missing  should be replace by 0
>>>>>> 2.  value that are in COl2 and Col3  should be included  in col1
>>>>>> before
>>>>>> they appear
>>>>>> in col2 and col3. So the output data looks like as follow,
>>>>>>
>>>>>> X1  0  0
>>>>>> Y1  0  0
>>>>>> W1  0  0
>>>>>> Z2  0  0
>>>>>> Z3 X1  0
>>>>>> Z4 Y1 W1
>>>>>>
>&g

Re: [R] include

2018-02-24 Thread Val
Thank you Jim   and all, I got it.

I have one more question about the original solution.

What does this "[-1]" do?
preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
   Col2=NA,col3=NA)


mydat <- read.table(textConnection("Col1 Col2 col3
Z1 K1 K2
Z2 NA NA
Z3 X1 NA
Z4 Y1 W1"),header = TRUE)

preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
   Col2=NA,col3=NA)
rbind(unique(preval),mydat)


 Col1 Col2 col3
1   
2   X1  
3   Y1  
4   K2  
5   W1  
6   Z1   K1   K2
7   Z2  
8   Z3   X1 
9   Z4   Y1   W1
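
For reference, "[-1]" is just negative indexing: it drops the first element of
the vector returned by unique(unlist(...)) (in Jim's code that first element
was assumed to be the NA). A tiny illustration:

x <- c(NA, "X1", "Y1")
x[-1]    # drops the first element, leaving "X1" "Y1"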















On Sat, Feb 24, 2018 at 5:04 PM, Duncan Murdoch <murdoch.dun...@gmail.com>
wrote:

> On 24/02/2018 1:53 PM, William Dunlap via R-help wrote:
>
>> x1 =  rbind(unique(preval),mydat)
>>x2 <- x1[is.na(x1)] <- 0
>>x2  # gives 0
>>
>> Why introduce the 'x2'?   x1[...] <- 0 alters x1 in place and I think that
>> altered x1 is what you want.
>>
>> You asked why x2 was zero.  The value of the expression
>> f(a) <- b
>> and assignments are processed right to left so
>> x2 <- x[!is.na(x1)] <- 0
>> is equivalent to
>> x[!is.na(x1)] <- 0
>> x2 <- 0
>>
>
> That's not right in general, is it?  I'd think that should be
>
> x[!is.na(x1)] <- 0
> x2 <- x1
>
> Of course, in this example, x1 is 0, so it gives the same answer.
>
> Duncan Murdoch
>
>
>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Sat, Feb 24, 2018 at 9:59 AM, Val <valkr...@gmail.com> wrote:
>>
>> Thank you Jim
>>>
>>> I wanted a final data frame  after replacing the NA's to "0"
>>>
>>> x1 =  rbind(unique(preval),mydat)
>>> x2 <- x1[is.na(x1)] <- 0
>>> x2
>>>   but I got this,
>>>
>>> [1] 0
>>>
>>> why I am getting this?
>>>
>>>
>>> On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com>
>>> wrote:
>>>
>>> Hi Val,
>>>> Try this:
>>>>
>>>> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>>>>   Col2=NA,col3=NA)
>>>> rbind(preval,mydat)
>>>>
>>>> Jim
>>>>
>>>> On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am reading a file as follow,
>>>>>
>>>>> mydat <- read.table(textConnection("Col1 Col2 col3
>>>>> Z2 NA NA
>>>>> Z3 X1 NA
>>>>> Z4 Y1 W1"),header = TRUE)
>>>>>
>>>>> 1. "NA" are   missing  should be replace by 0
>>>>> 2.  value that are in COl2 and Col3  should be included  in col1 before
>>>>> they appear
>>>>> in col2 and col3. So the output data looks like as follow,
>>>>>
>>>>> X1  0  0
>>>>> Y1  0  0
>>>>> W1  0  0
>>>>> Z2  0  0
>>>>> Z3 X1  0
>>>>> Z4 Y1 W1
>>>>>
>>>>> Thank you in advance
>>>>>
>>>>>  [[alternative HTML version deleted]]
>>>>>
>>>>> __
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/
>>>>>
>>>> posting-guide.html
>>>>
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>  [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] include

2018-02-24 Thread Val
Thank you Jim

I wanted the final data frame after replacing the NAs with "0".

x1 =  rbind(unique(preval),mydat)
x2 <- x1[is.na(x1)] <- 0
x2
 but I got this,

[1] 0

Why am I getting this?


On Sat, Feb 24, 2018 at 12:17 AM, Jim Lemon <drjimle...@gmail.com> wrote:

> Hi Val,
> Try this:
>
> preval<-data.frame(Col1=unique(unlist(mydat[,c("Col2","col3")]))[-1],
>  Col2=NA,col3=NA)
> rbind(preval,mydat)
>
> Jim
>
> On Sat, Feb 24, 2018 at 3:34 PM, Val <valkr...@gmail.com> wrote:
> > Hi All,
> >
> > I am reading a file as follow,
> >
> > mydat <- read.table(textConnection("Col1 Col2 col3
> > Z2 NA NA
> > Z3 X1 NA
> > Z4 Y1 W1"),header = TRUE)
> >
> > 1. "NA" are   missing  should be replace by 0
> > 2.  value that are in COl2 and Col3  should be included  in col1 before
> > they appear
> > in col2 and col3. So the output data looks like as follow,
> >
> > X1  0  0
> > Y1  0  0
> > W1  0  0
> > Z2  0  0
> > Z3 X1  0
> > Z4 Y1 W1
> >
> > Thank you in advance
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] include

2018-02-23 Thread Val
Hi All,

I am reading a file as follow,

mydat <- read.table(textConnection("Col1 Col2 col3
Z2 NA NA
Z3 X1 NA
Z4 Y1 W1"),header = TRUE)

1. "NA" means missing and should be replaced by 0.
2. Values that are in Col2 and col3 should be included in Col1 before
they appear
in Col2 and col3. So the output data looks as follows,

X1  0  0
Y1  0  0
W1  0  0
Z2  0  0
Z3 X1  0
Z4 Y1 W1

Thank you in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] find unique and summerize

2018-02-04 Thread Val
Thank you so much Rui!

On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Please always cc the list.
>
> As for the question, I believe the following does it.
>
> a <- strsplit(mydata$ID, "[[:alpha:]]+")
> b <- strsplit(mydata$ID, "[[:digit:]]+")
>
> a <- sapply(a, `[`, 1)
> c <- sapply(a, `[`, 2)
> b <- sapply(b, function(x) x[x != ""])
>
> c2 <- sprintf("%010d", as.integer(c))
>
> newID <- paste0(a, b, c2)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 2/4/2018 2:01 AM, Val wrote:
>
>> Thank you so much again for your help!
>>
>> I have one more question related to this.
>>
>> 1. How do I further split  this "358USA1540165 " into three parts.
>> a) 358
>> b) USA
>> c) 1540165
>>
>> I want to add leading zeros to the third part  like "0001540165"
>> and then combine   b and c  to get this USA1540165
>> so USA1540165  changed to USA1540165
>>
>> The other one is that the data set has several country codes and if I
>> want to limit my data set to only certain country codes , how do I do that.
>>
>> Thank you again
>>
>>
>>
>>
>> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas <ruipbarra...@sapo.pt
>> <mailto:ruipbarra...@sapo.pt>> wrote:
>>
>> Hello,
>>
>> As for the first question, instead of writing a xlsx file, maybe it
>> is easier to write a csv file and then open it with Excel.
>>
>> tbl2 <- addmargins(tbl1)
>> write.csv(tbl2, "tt1.csv")
>>
>> As for the second question, the following does it.
>>
>> inx <- apply(tbl1, 1, function(x) all(x != 0))
>> tbl1b <- addmargins(tbl1[inx, ])
>> tbl1b
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 2/3/2018 4:42 PM, Val wrote:
>>
>> Thank you so much Rui.
>>
>> 1. How do I export this table to excel file?
>> I used this
>> tbl1 <- table(Country, IDNum)
>> tbl2=addmargins(tbl1)
>> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary",
>> row.names=FALSE)
>> The above did not give me that table.
>>
>>
>> 2. I want select those unique Ids that do have records in all
>> countries.
>>From the above data set, this ID  "FIN1540166"  should be
>> excluded from the summary table and the table looks like as follow
>>
>> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526
>> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065
>>
>> Thank you again
>>
>>
>> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas
>> <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>
>> <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>>
>> wrote:
>>
>>  Hello,
>>
>>  Thanks for the reproducible example.
>>  See if the following does what you want.
>>
>>  IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
>>  Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>>
>>  tbl1 <- table(Country, IDNum)
>>  addmargins(tbl1)
>>
>>  tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
>>  addmargins(tbl2)
>>
>>
>>  Hope this helps,
>>
>>  Rui Barradas
>>
>>
>>  On 2/3/2018 3:00 AM, Val wrote:
>>
>>  Hi all,
>>
>>  I have a data set  need to be summarized by unique ID
>> (count and
>>  sum of a
>>  variable)
>>  A unique individual ID (country name  Abbreviation
>>followed by
>>  an integer
>>  numbers)  may  have observation in several countries.
>> Then the ID was
>>  changed by adding the country code as a prefix  and
>>new ID was
>>  constructed
>>  or recorded like (country code, + the original unique
>> ID  Example
>>  original ID   "CAN1540164" , if this ID has an
>> observation in
>>

Re: [R] find unique and summerize

2018-02-03 Thread Val
Thank you so much Rui.

1. How do I export this table to an Excel file?
I used this:
  tbl1 <- table(Country, IDNum)
  tbl2 <- addmargins(tbl1)
  write.xlsx(tbl2, "tt1.xlsx", sheetName = "summary", row.names = FALSE)
The above did not give me that table.


2. I want to select those unique IDs that have records in all countries.
From the above data set, the ID "FIN1540166" should be excluded from
the summary table, and the table looks as follows:

            Country
IDNum            1   33  358   44  Sum
  CAN1540164    47  141  248   90  526
  USA1540165   290  757  321  171 1539
  Sum          337  898  569  261 2065

Thank you again
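
A sketch of one way to do both, reusing the Country and IDNum vectors from
Rui's code quoted below (as.data.frame.matrix keeps the wide table layout in
the csv, which Excel can open directly):

tbl1 <- table(Country, IDNum)
inx  <- apply(tbl1, 2, function(x) all(x != 0))       # IDs seen in every country code
tbl2 <- addmargins(xtabs(Y ~ IDNum + Country, mydata)[inx, ])
write.csv(as.data.frame.matrix(tbl2), "tt1.csv")      # then open tt1.csv in Excel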


On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Thanks for the reproducible example.
> See if the following does what you want.
>
> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
> Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>
> tbl1 <- table(Country, IDNum)
> addmargins(tbl1)
>
> tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
> addmargins(tbl2)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 2/3/2018 3:00 AM, Val wrote:
>
>> Hi all,
>>
>> I have a data set  need to be summarized by unique ID (count and sum of a
>> variable)
>> A unique individual ID (country name  Abbreviation  followed by an integer
>> numbers)  may  have observation in several countries. Then the  ID was
>> changed by adding the country code as a prefix  and  new ID was
>> constructed
>> or recorded like (country code, + the original unique ID  Example
>> original ID   "CAN1540164" , if this ID has an observation in CANADA then
>> the ID was changed to"1CAN1540164".   From this new ID I want get out
>> the country code  get the  original unique ID  and   summarize the data by
>> unique ID and country code
>>
>> The data set look like
>> mydata <- read.table(textConnection("GR ID iflag Y
>> A 1CAN1540164 1 20
>> A 1CAN1540164 1 12
>> A 1CAN1540164 1 15
>> A 44CAN1540164 1 30
>> A 44CAN1540164 1 24
>> A 44CAN1540164 1 25
>> A 44CAN1540164 1 11
>> A 33CAN1540164 1 12
>> A 33CAN1540164 1 23
>> A 33CAN1540164 1 65
>> A 33CAN1540164 1 41
>> A 358CAN1540164 1 28
>> A 358CAN1540164 1 32
>> A 358CAN1540164 1 41
>> A 358CAN1540164 1 54
>> A 358CAN1540164 1 29
>> A 358CAN1540164 1 64
>> B 1USA1540165 1 125
>> B 1USA1540165 1 165
>> B 44USA1540165 1 171
>> B 33USA1540165 1 254
>> B 33USA1540165 1 241
>> B 33USA1540165 1 262
>> B 358USA1540165 1 321
>> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
>>
>>  From the above data there are three unique IDs and  four country codes
>> (1,
>> 44, 33 and 358)
>>
>> I want the following two tables
>>
>> Table 1. count  the  unique ID by country code
>>1   44   33   358 TOT
>> CAN1540164 34 4  617
>> USA1540165  2   1  3 1  7
>> FIN1540166   - -   -  1 1
>> TOT 55  7  8   25
>>
>>
>> Table 2  Sum of Y variable by unique ID and country. code
>>
>>1   44   33  358  TOT
>> CAN154016447 90  141  248   526
>> USA1540165   290   171  757  321 1539
>> FIN1540166-- - 225   225
>>  TOT  337 261  898794 2290
>>
>>
>> How do I do it in R?
>>
>>   The first step is to get the unique country codes unique ID by splitting
>> the new ID
>>
>> Thank you in advance
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] find unique and summerize

2018-02-02 Thread Val
Hi all,

I have a data set that needs to be summarized by unique ID (count and sum of a
variable).
A unique individual ID (a country name abbreviation followed by an integer
number) may have observations in several countries. The ID was then
changed by adding the country code as a prefix, so a new ID was constructed
or recorded as (country code + the original unique ID). For example, for the
original ID "CAN1540164", if this ID has an observation in Canada then
the ID was changed to "1CAN1540164". From this new ID I want to get
the country code and the original unique ID, and summarize the data by
unique ID and country code.

The data set looks like this:
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)

From the above data there are three unique IDs and four country codes (1,
44, 33 and 358).

I want the following two tables

Table 1. Count of the unique IDs by country code
              1   44   33  358  TOT
CAN1540164    3    4    4    6   17
USA1540165    2    1    3    1    7
FIN1540166    -    -    -    1    1
TOT           5    5    7    8   25


Table 2. Sum of the Y variable by unique ID and country code

               1   44   33  358   TOT
CAN1540164    47   90  141  248   526
USA1540165   290  171  757  321  1539
FIN1540166     -    -    -  225   225
TOT          337  261  898  794  2290


How do I do this in R?

The first step is to get the country code and the original unique ID by splitting
the new ID.

Thank you in advance
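
A sketch of the splitting step (the leading digits are the country code, the
rest is the original ID), followed by the two summary tables; this assumes
mydata as read above:

mydata$Country <- sub("^(\\d+).*", "\\1", mydata$ID)     # leading digits
mydata$IDNum   <- sub("^\\d+", "", mydata$ID)            # everything after them

addmargins(table(mydata$IDNum, mydata$Country))          # Table 1: counts
addmargins(xtabs(Y ~ IDNum + Country, data = mydata))    # Table 2: sums of Y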

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] match and new columns

2017-12-13 Thread Val
Hi Bill,

I put in stringsAsFactors = FALSE,
but it still did not work.

tdat <- read.table(textConnection("A B C Y
A12 B03 C04 0.70
A23 B05 C06 0.05
A14 B06 C07 1.20
A25 A23 A12 3.51
A16 A25 A14 2,16"),header = TRUE ,stringsAsFactors = FALSE)
tdat$D <- 0
tdat$E <- 0

tdat$D <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0))
tdat$E <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0))
tdat

I got this,

     A   B   C    Y    D    E
1  A12 B03 C04 0.70    0    0
2  A23 B05 C06 0.05    0    0
3  A14 B06 C07 1.20    0    0
4  A25 A23 A12 3.51 <NA> <NA>
5  A16 A25 A14 2,16 <NA> <NA>





On Wed, Dec 13, 2017 at 7:23 PM, William Dunlap <wdun...@tibco.com> wrote:

> Use the stringsAsFactors=FALSE argument to read.table when
> making your data.frame - factors are getting in your way here.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Dec 13, 2017 at 3:02 PM, Val <valkr...@gmail.com> wrote:
>
>> Thank you Rui,
>> I did not get the desired result. Here is the output from your script
>>
> >    A   B   C    Y D E
> >1 A12 B03 C04 0.70 0 0
> >2 A23 B05 C06 0.05 0 0
> >3 A14 B06 C07 1.20 0 0
> >4 A25 A23 A12 3.51 1 1
> >5 A16 A25 A14 2,16 4 4
>>
>>
>> On Wed, Dec 13, 2017 at 4:36 PM, Rui Barradas <ruipbarra...@sapo.pt>
>> wrote:
>>
>> > Hello,
>> >
>> > Here is one way.
>> >
>> > tdat$D <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0)
>> > tdat$E <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0)
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> >
>> > On 12/13/2017 9:36 PM, Val wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have a data frame
>> >> tdat <- read.table(textConnection("A B C Y
>> >> A12 B03 C04 0.70
>> >> A23 B05 C06 0.05
>> >> A14 B06 C07 1.20
>> >> A25 A23 A12 3.51
>> >> A16 A25 A14 2,16"),header = TRUE)
>> >>
>> >> I want match tdat$B with tdat$A and populate the  column   values of
>> >> tdat$A
>> >> ( col A and Col B) in the newly created columns (col D and col  E).
>> >> please
>> >> find my attempt and the desired output below
>> >>
>> >> Desired output
>> >> A B C Y  D E
>> >> A12 B03 C04 0.70  0  0
>> >> A23 B05 C06 0.05  0  0
>> >> A14 B06 C07 1.20  0  0
>> >> A25 A23 A12 3.51 B05 C06
>> >> A16 A25 A14 2,16 A23 A12
>> >>
>> >> my attempt,
>> >>
>> >> tdat$D <- 0
>> >> tdat$E <- 0
>> >>
>> >> if(tdat$B %in% tdat$A)
>> >>{
>> >>tdat$D <- tdat$A[tdat$B]
>> >>tdat$E <- tdat$A[tdat$C]
>> >> }
>> >>   but did not work.
>> >>
>> >> Thank you in advance
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posti
>> >> ng-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] match and new columns

2017-12-13 Thread Val
Thank you Rui,
I did not get the desired result. Here is the output from your script

   A   B   C    Y D E
1 A12 B03 C04 0.70 0 0
2 A23 B05 C06 0.05 0 0
3 A14 B06 C07 1.20 0 0
4 A25 A23 A12 3.51 1 1
5 A16 A25 A14 2,16 4 4


On Wed, Dec 13, 2017 at 4:36 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Here is one way.
>
> tdat$D <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0)
> tdat$E <- ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 12/13/2017 9:36 PM, Val wrote:
>
>> Hi all,
>>
>> I have a data frame
>> tdat <- read.table(textConnection("A B C Y
>> A12 B03 C04 0.70
>> A23 B05 C06 0.05
>> A14 B06 C07 1.20
>> A25 A23 A12 3.51
>> A16 A25 A14 2,16"),header = TRUE)
>>
>> I want match tdat$B with tdat$A and populate the  column   values of
>> tdat$A
>> ( col A and Col B) in the newly created columns (col D and col  E).
>> please
>> find my attempt and the desired output below
>>
>> Desired output
>> A B C Y  D E
>> A12 B03 C04 0.70  0  0
>> A23 B05 C06 0.05  0  0
>> A14 B06 C07 1.20  0  0
>> A25 A23 A12 3.51 B05 C06
>> A16 A25 A14 2,16 A23 A12
>>
>> my attempt,
>>
>> tdat$D <- 0
>> tdat$E <- 0
>>
>> if(tdat$B %in% tdat$A)
>>{
>>tdat$D <- tdat$A[tdat$B]
>>tdat$E <- tdat$A[tdat$C]
>> }
>>   but did not work.
>>
>> Thank you in advance
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] match and new columns

2017-12-13 Thread Val
Hi all,

I have a data frame
tdat <- read.table(textConnection("A B C Y
A12 B03 C04 0.70
A23 B05 C06 0.05
A14 B06 C07 1.20
A25 A23 A12 3.51
A16 A25 A14 2,16"),header = TRUE)

I want to match tdat$B with tdat$A and, where a match is found, populate the
matched row's B and C values into the newly created columns (col D and col E). Please
find my attempt and the desired output below.

Desired output
A B C Y  D E
A12 B03 C04 0.70  0  0
A23 B05 C06 0.05  0  0
A14 B06 C07 1.20  0  0
A25 A23 A12 3.51 B05 C06
A16 A25 A14 2,16 A23 A12

my attempt,

tdat$D <- 0
tdat$E <- 0

if(tdat$B %in% tdat$A)
  {
  tdat$D <- tdat$A[tdat$B]
  tdat$E <- tdat$A[tdat$C]
}
 but did not work.

Thank you in advance


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] family

2017-11-17 Thread Val
Hi all,
I am reading a huge data set (12M rows) that contains family information:
Offspring, Parent1 and Parent2.

Each Parent1 and Parent2 should also appear in the first column as an
offspring, on a row before their own offspring's records. Their parent
information (Parent1 and Parent2) should be set to zero if unknown. Also,
the first column should be unique.


Here is my sample data  set  and desired output.


fam <- read.table(textConnection(" offspring  Parent1 Parent2
Smith Alex1  Alexa
Carla Alex1 0
Jacky Smith   Abbot
Jack  0   Jacky
Almo  JackCarla
 "),header = TRUE)



desired output.
Offspring Parent1 Parent2
Alex1  00
Alexa  00
Abbot  00
SmithAlex1  Alexa
CarlaAlex1  0
JackySmith   Abbot
Jack   0 Jacky
Almo JackCarla

Thank you.
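
A minimal sketch of one possible approach, assuming parents are stored as
character strings and "0" marks an unknown parent (the same idea scales to
12M rows, though a dedicated pedigree package may be faster):

fam[] <- lapply(fam, as.character)                    # characters, not factors
founders <- setdiff(unique(c(fam$Parent1, fam$Parent2)),
                    c(fam$offspring, "0"))            # parents never listed as offspring
ped <- rbind(data.frame(offspring = founders, Parent1 = "0", Parent2 = "0",
                        stringsAsFactors = FALSE),
             fam)
ped <- ped[!duplicated(ped$offspring), ]              # keep the first column unique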

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] remove

2017-06-10 Thread Val
Hi all,
I have  a date  issue and would appreciate any help.

I am reading field data, and in one of the columns I am expecting a date,
but it also contains non-date values such as character strings and empty spaces.
Here is a sample of my data.

KL <- read.table(header=TRUE, text='ID date
711 Dead
712 Uknown
713 20-11-08
714 11-28-07
301
302 09-02-02
303 09-21-02',stringsAsFactors = FALSE, fill =T)

str(KL)
data.frame': 7 obs. of  2 variables:
 $ ID  : int  711 712 713 714 301 302 303
 $ date: chr  "Dead" "Uknown" "20-11-08" "11-28-07" .

I wanted to convert the date column as follows.
if (max(unique(nchar(as.character(KL$date==10) {
  KL$date <- as.Date(KL$date,"%m/%d/%Y")
}
but not working.


How could I remove the entire rows that do not have a date format and
then do the conversion?
thank you in advance
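
A minimal sketch, assuming the dates are month-day-year with dashes (the sample
uses "-" and two-digit years, so the "%m/%d/%Y" format never matches); rows whose
text cannot be parsed in that format come back as NA and are dropped, so mixed
formats such as 20-11-08 would need extra handling:

d   <- as.Date(KL$date, format = "%m-%d-%y")  # NA for "Dead", "Uknown", "", ...
KL2 <- KL[!is.na(d), ]                        # keep only rows that parsed as a date
KL2$date <- d[!is.na(d)]                      # store them as Date class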

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New var

2017-06-04 Thread Val
Thank you Jeff and All,

Within a given time period (say 700 days from the start day), I am expecting
measurements taken at each time interval. In this case "0" means a measurement
was taken, "1" means not taken (stopped or opted out), and "-1" means do not
consider that time period for that individual. This will be compared with the
actual measurements taken (observed - expected) within each time interval.




On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> # read.table is NOT part of the data.table package
> #library(data.table)
> DFM <- read.table( text=
> 'obs start end
> 1 2/1/2015   1/1/2017
> 2 4/11/2010  1/1/2011
> 3 1/4/2006   5/3/2007
> 4 10/1/2007  1/1/2008
> 5 6/1/2011   1/1/2012
> 6 10/5/2004 12/1/2004
> ',header = TRUE, stringsAsFactors = FALSE)
> # cleaner way to compute D
> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" )
> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" )
> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" )
> # categorize your data into groups
> DFM$bin <- cut( DFM$D
>   , breaks=c( seq( 0, 500, 100 ), Inf )
>   , right=FALSE # do not include the right edge
>   , ordered_result = TRUE
>   )
> # brute force method you should have been able to figure out to show us
> some work
> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 )
> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1,
> 0 ) )
> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1,
> 0 ) )
> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1,
> 0 ) )
> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1,
> 0 ) )
> # brute force method with ordered factor
> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 )
> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" <
> DFM$bin, 0, -1 ) )
> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" <
> DFM$bin, 0, -1 ) )
> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" <
> DFM$bin, 0, -1 ) )
> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" <
> DFM$bin, 0, -1 ) )
> # less obvious approach using the fact that factors are integers
> # and using the outer function to find all combinations of elements of two
> vectors
> # and the sign function
> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin )
> , 1:5
> , FUN = function(x,y) {
>   z <- sign(y-x)+1L
>   ifelse( 2 == z, -1L, z )
>   }
> )
>
> # my result, provided using dput for precise representation
> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710,
> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167,
> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700,
> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L,
> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)",
> "[400,500)", "[500,Inf)"), class = c("ordered", "factor")), t1 = c(0,
> 0, 0, 1, 0, 1), t2 = c(0, 0, 0, -1, 0, -1), t3 = c(0, 1, 0, -1,
> 1, -1), t4 = c(0, -1, 0, -1, -1, -1), t5 = c(0, -1, 1, -1, -1,
> -1), tf1 = c(0, 0, 0, 1, 0, 1), tf2 = c(0, 0, 0, -1, 0, -1),
> tf3 = c(0, 1, 0, -1, 1, -1), tf4 = c(0, -1, 0, -1, -1, -1
> ), tf5 = c(0, -1, 1, -1, -1, -1), tm1 = c(0, 0, 0, 1, 0,
> 1), tm2 = c(0, 0, 0, -1, 0, -1), tm3 = c(0, 1, 0, -1, 1,
> -1), tm4 = c(0, -1, 0, -1, -1, -1), tm5 = c(0, -1, 1, -1,
> -1, -1)), row.names = c(NA, -6L), .Names = c("obs", "start",
> "end", "D", "bin", "t1", "t2", "t3", "t4", "t5", "tf1", "tf2",
> "tf3", "tf4", "tf5", "tm1", "tm2", "tm3", "tm4", "tm5"), class =
> "data.frame")
>
> You did not address Bert's request for some context, but I am curious how
> he or Peter would have approached this problem, so I encourage you do
> provide some insight on the list as to why you are doing this.
>
>
> On Sat, 3 Jun 2017, Val wrote:
>
> Thank you all for the useful suggestion. I did 

Re: [R] New var

2017-06-03 Thread Val
Thank you all for the useful suggestion. I did some of my homework.

library(data.table)
DFM <- read.table(header=TRUE, text='obs start end
1 2/1/2015   1/1/2017
2 4/11/2010  1/1/2011
3 1/4/2006   5/3/2007
4 10/1/2007  1/1/2008
5 6/1/2011   1/1/2012
6 10/5/2004 12/1/2004',stringsAsFactors = FALSE)
DFM

DFM$D =as.numeric(difftime(as.Date(DFM$end,format="%m/%d/%Y"),
as.Date(DFM$start,format="%m/%d/%Y"), units = "days"))
DFM

output.
 obs start   end   D
1   1  2/1/2015  1/1/2017 700
2   2 4/11/2010  1/1/2011 265
3   3  1/4/2006  5/3/2007 484
4   4 10/1/2007  1/1/2008  92
5   5  6/1/2011  1/1/2012 214
6   6 10/5/2004 12/1/2004  57

My problem is how do I get the other new variables

obs start   end   D  t1,t2,t3,t4, t5
1, 2/1/2015,  1/1/2017, 700,0,0,0,0,0
2, 4/11/2010, 1/1/2011, 265,0,0,1,-1,-1
3, 1/4/2006,  5/3/2007, 484,0,0,0,0,1
4, 10/1/2007, 1/1/2008, 92,1,-1,-1,-1,-1
5, 6/1/2011,  1/1/2012, 214,0,0,1,-1,-1
6, 10/15/2004,12/1/2004,47,1,-1,-1,-1,-1

Thank you again.
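
A minimal sketch for the t1..t5 part, assuming D has been computed as above;
findInterval() gives the 100-day bin (6 means 500 or more):

bin  <- findInterval(DFM$D, c(0, 100, 200, 300, 400, 500))   # 1..6
tmat <- t(vapply(bin, function(b) {
  v <- integer(5)
  v[seq_len(5) > b] <- -1L      # intervals after the one containing D get -1
  if (b <= 5) v[b] <- 1L        # the interval containing D gets 1
  v
}, integer(5)))
colnames(tmat) <- paste0("t", 1:5)
DFM <- cbind(DFM, tmat)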



On Sat, Jun 3, 2017 at 12:13 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Ii is difficult to provide useful help, because you have failed to
> read and follow the posting guide. In particular:
>
> 1. Plain text, not HTML.
> 2. Use dput() or provide code to create your example. Text printouts
> such as that which you gave require some work to wrangle into into an
> example that we can test.
>
> Specifically:
>
> 3. Have you gone through any R tutorials?-- it sure doesn't look like
> it. We do expect some effort to learn R before posting.
>
> 4. What is the format of your date columns? character, factors,
> POSIX,...? See ?date-time for details. Note particularly the
> "difftime" link to obtain intervals.
>
> 5. ?ifelse  for vectorized conditionals.
>
> Also, you might want to explain the context of what you are trying to
> do. I strongly suspect you shouldn't be doing it at all, but that is
> just a guess.
>
> Be sure to cc your reply to the list, not just to me.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Jun 2, 2017 at 8:49 PM, Val <valkr...@gmail.com> wrote:
>> Hi all,
>>
>> I have a data set with time interval and depending on the interval I want
>> to create 5 more variables . Sample data below
>>
>> obs,   Start,   End
>> 1,2/1/2015,  1/1/2017
>> 2,4/11/2010, 1/1/2011
>> 3,1/4/2006,  5/3/2007
>> 4,10/1/2007, 1/1/2008
>> 5,6/1/2011,  1/1/2012
>> 6,10/15/2004,12/1/2004
>>
>> First, I want get  interval between the start date and end dates
>> (End-start).
>>
>>  obs,  Start , end, datediff
>> 1,2/1/2015,  1/1/2017, 700
>> 2,4/11/2010, 1/1/2011, 265
>> 3,1/4/2006,  5/3/2007, 484
>> 4,10/1/2007, 1/1/2008, 92
>> 5,6/1/2011,  1/1/2012, 214
>> 6,10/15/2004,12/1/2004,47
>>
>> Second. I want create 5 more variables  t1, t2, t3, t4 and  t5
>> The value of each variable is defined as follows
>> if datediff <   100 then  t1=1,  t2=t3=t4=t5=-1.
>> if datediff >= 100 and  < 200 then  t1=0, t2=1,t3=t4=t5=-1,
>> if datediff >= 200 and  < 300 then  t1=0, t2=0,t3=1,t4=t5=-1,
>> if datediff >= 300 and  < 400 then  t1=0, t2=0,t3=0,t4=1,t5=-1,
>> if datediff >= 400 and  < 500 then  t1=0, t2=0,t3=0,t4=0,t5=1,
>> if datediff >= 500 then  t1=0, t2=0,t3=0,t4=0,t5=0
>>
>> The complete out put looks like as follow.
>> obs, start, end,datediff,   t1, t2, t3, t4, t5
>> 1,2/1/2015,   1/1/2017,700, 0,  0,  0,  0,  0
>> 2,  4/11/2010,   1/1/2011,265, 0,  0,  1, -1,  -1
>> 3,1/4/2006,   5/3/2007,484, 0,  0,  0, 0,   1
>> 4,   10/1/2007,  1/1/2008,  92, 1, -1, -1,-1,  -1
>> 5 ,6/1/2011,1/1/2012,  214,  0,  0,  1,-1,  -1
>> 6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1
>>
>> Thank you.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] New var

2017-06-02 Thread Val
Hi all,

I have a data set with time interval and depending on the interval I want
to create 5 more variables . Sample data below

obs,   Start,   End
1,2/1/2015,  1/1/2017
2,4/11/2010, 1/1/2011
3,1/4/2006,  5/3/2007
4,10/1/2007, 1/1/2008
5,6/1/2011,  1/1/2012
6,10/15/2004,12/1/2004

First, I want get  interval between the start date and end dates
(End-start).

 obs,  Start , end, datediff
1,2/1/2015,  1/1/2017, 700
2,4/11/2010, 1/1/2011, 265
3,1/4/2006,  5/3/2007, 484
4,10/1/2007, 1/1/2008, 92
5,6/1/2011,  1/1/2012, 214
6,10/15/2004,12/1/2004,47

Second. I want create 5 more variables  t1, t2, t3, t4 and  t5
The value of each variable is defined as follows
if datediff <   100 then  t1=1,  t2=t3=t4=t5=-1.
if datediff >= 100 and  < 200 then  t1=0, t2=1,t3=t4=t5=-1,
if datediff >= 200 and  < 300 then  t1=0, t2=0,t3=1,t4=t5=-1,
if datediff >= 300 and  < 400 then  t1=0, t2=0,t3=0,t4=1,t5=-1,
if datediff >= 400 and  < 500 then  t1=0, t2=0,t3=0,t4=0,t5=1,
if datediff >= 500 then  t1=0, t2=0,t3=0,t4=0,t5=0

The complete output looks as follows.
obs, start, end,datediff,   t1, t2, t3, t4, t5
1,2/1/2015,   1/1/2017,700, 0,  0,  0,  0,  0
2,  4/11/2010,   1/1/2011,265, 0,  0,  1, -1,  -1
3,1/4/2006,   5/3/2007,484, 0,  0,  0, 0,   1
4,   10/1/2007,  1/1/2008,  92, 1, -1, -1,-1,  -1
5 ,6/1/2011,1/1/2012,  214,  0,  0,  1,-1,  -1
6, 10/15/2004, 12/1/2004, 47, 1, -1, -1, -1, -1

Thank you.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] missing and replace

2017-04-26 Thread Val
HI all,

I have a data frame with three variables. Some of the variables do
have missing values and I want to replace those missing values
(1represented by NA) with the mean value of that variable. In this
sample data,  variable z and y do have missing values. The mean value
of y  and z are152. 25  and 359.5, respectively . I want replace those
missing values  by the respective mean value ( rounded to the nearest
whole number).

DF1 <- read.table(header=TRUE, text='ID1 x y z
1  25  122352
2  30  135376
3  40   NA350
4  26  157NA
5  60  195360')
mean x= 36.2
mean y=152.25
mean z= 359.5

output
ID1  x  y  z
1   25 122   352
2   30 135   376
3   40 152   350
4   26 157   360
5   60 195   360


Thank you in advance
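
A minimal sketch, filling the NAs column by column with the rounded column mean
(round() rounds .5 to the nearest even number, which gives 360 for z):

for (v in c("x", "y", "z")) {
  m <- round(mean(DF1[[v]], na.rm = TRUE))   # mean of the non-missing values
  DF1[[v]][is.na(DF1[[v]])] <- m             # replace the missing ones
}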

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] combination

2017-04-12 Thread Val
Hi all,
I have two variables, x and y. x has five observations and y has three.
I want to combine each element of x with each element of y to produce
15 observations. Below are my sample data and desired output.

data
x   Y
1   A
2   B
3   C
4
5

Output
1  A
1  B
1  C
2  A
2  B
2  C
3  A
3  B
3  C
4  A
4  B
4  C
5  A
5  B
5  C

Thank you in advance
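
A minimal sketch with expand.grid(), assuming x holds the five values and y the
three letters (the blanks in the y column are dropped first):

x <- 1:5
y <- c("A", "B", "C")
out <- expand.grid(Y = y, x = x, stringsAsFactors = FALSE)[, c("x", "Y")]
out   # 15 rows: 1 A, 1 B, 1 C, 2 A, ...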

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] create new

2017-03-24 Thread Val
Hi all,


I have several variables in a group and one group  contains three
variables. Sample of data ( Year, x1, x3 and x2)

mydat <- read.table(header=TRUE, text=' Year  x1  x3  x2
Year1  10  120
Year2   0  150
Year3   0   020
Year4  25   0   12
Year5  15  25   12
Year6   0  16   14
Year7   0  100')

I want create another variable( x4) based on the following condition.

if x1  > 0  then x4 = x1; regardless of  x2 and x3 values.
if x1  = 0  and x2  > 0 then x4 = x2;
if x1  = 0 and  x2  = 0 then x4 = x3

The desired output looks as follows:
Yearx1  x3  x2   x4
Year1  10  120  10
Year20  150  15
Year300   20  20
Year4  250   12  25
Year5  15  25   12  15
Year60  16   14  14
Year70  10 0  10

Thank you in advance
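
A minimal sketch with nested, vectorised ifelse() calls:

mydat$x4 <- ifelse(mydat$x1 > 0, mydat$x1,
            ifelse(mydat$x2 > 0, mydat$x2, mydat$x3))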

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] screen

2017-03-15 Thread Val
HI all,

I have some data to be screened  based on the recording flag (obs).
Some families recorded properly (1) and others did not (0); 0 = improper
and 1 = proper.

The recording period starts at week1. Families may not all start recording
observations properly in the same week.

  DF2 <- read.table(header=TRUE, text='family time obs
A  WEEK1 0
A  WEEK1 0
A  WEEK1 0
A  WEEK2 1
A  WEEK2 0
A  WEEK3 1
A  WEEK3 0
B  WEEK1 1
B  WEEK1 0
B  WEEK1 1
B  WEEK2 0
B  WEEK2 0
B  WEEK3 1
B  WEEK3 0
C  WEEK3 0
C  WEEK3 0
C  WEEK4 1
C  WEEK4 1')

Example, in week1  all records of family "A" are 0 (improper), but
starting the week2 they start recording proper (1) records as well.
Then I create a table that shows me the ratio of proper records to the
total records for each family within week. If the ratio is zero and
there is no prior proper recordings for that family then I want to
delete those records.

However,  once any family started showing proper records  as "1"  and
even if in the  the subsequent week the ratio is 0  then I want keep
that record for that family. Example records of week2 for family B

Here is the summary table

  WEEK1  WEEK2WEEK3WEEK4
A  00.5  0.5   .
B   0.33   00.5   .
C  .   . 01

From the above table:
For A - I want to exclude all records of week1 and keep the rest, because they
were not recording properly.
For B - Keep all records, as they started recording properly from the beginning.
For C - Keep only the week4 records, because all records are 1's.

Final and desired  result will be

A WEEK2 1
A WEEK2 0
A WEEK3 1
A WEEK3 0
B WEEK1 1
B WEEK1 0
B WEEK1 1
B WEEK2 0
B WEEK2 0
B WEEK3 1
B WEEK3 0
C WEEK4 1
C WEEK4 1


and the summary table looks like as follows

   WEEK1  WEEK2  WEEK3  WEEK4
A .0.5 0.5.
B  0.330 0.5.
C   .  . .1

Thank you in advance
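
A minimal sketch of one possible reading of the rule: within each family, keep
every row from the first week that contains at least one proper (obs == 1)
record onward (a family with no proper records at all drops out, with a warning):

wk <- as.numeric(sub("WEEK", "", DF2$time))                 # 1, 2, 3, 4
first_ok <- ave(ifelse(DF2$obs == 1, wk, NA), DF2$family,
                FUN = function(x) min(x, na.rm = TRUE))     # first proper week per family
keep <- DF2[wk >= first_ok, ]

with(keep, tapply(obs, list(family, time), mean))           # summary table after screening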

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] replace

2017-03-13 Thread Val
HI all,

If the first name is Alex, then I want to concatenate the second column (YR)
to Alex, to produce Alex plus the second column value (e.g. Alex-2001).

DF1 <- read.table(header=TRUE, text='first YR
Alex2001
Bob 2001
Cory2001
Cory2002
Bob 2002
Bob 2003
Alex2002
Alex2003
Alex2004')


Output
data frame
DF2
Alex-2001   2001
Bob 2001
Cory2001
Cory2002
Bob 2002
Bob 2003
Alex-2002   2002
Alex-2003   2003
Alex-2004   2004

I tried this one but did not work.
DF1$first[DF1$first=="Alex"] <-  paste(DF1$first, DF1$YR, sep='-')

Thank you in advance
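
A minimal sketch: subset both sides of the assignment so the lengths match, and
convert the factor column to character first:

DF1$first <- as.character(DF1$first)       # avoid factor-level problems
idx <- DF1$first == "Alex"
DF1$first[idx] <- paste(DF1$first[idx], DF1$YR[idx], sep = "-")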


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] if and

2017-02-27 Thread Val
Thank you  Rolf and Bert!

I found the problem and this

if(country="USA" & year-month = "FEB2015" | "FEB2012" ){
has to be changed to this
if(country="USA" & year-month == "FEB2015" | year-month == "FEB2012" ){
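
A minimal sketch of the same test written as valid R syntax (year_month is used
because year-month is not a legal object name, == is the comparison operator,
and %in% covers several months at once):

if (country == "USA" && year_month %in% c("FEB2015", "FEB2012")) {
  # statements for USA in FEB2015 or FEB2012
} else if (country == "CAN" && year_month == "Feb2010") {
  # statements for CAN in Feb2010
} else {
  # default statements
}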

On Mon, Feb 27, 2017 at 8:45 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> I note that you have "Year-month" (capital 'Y') and "year-month" in
> your code; case matters in R.
>
> Otherwise, Rolf's advice applies.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Feb 27, 2017 at 6:16 PM, Rolf Turner <r.tur...@auckland.ac.nz>
> wrote:
> > On 28/02/17 14:47, Val wrote:
> >>
> >> Currently I have  about  six or more  scripts that do the same job.  I
> >> thought it might be possible and more efficient to use one script by
> using
> >> IF ELSE statements. Here is an example but this will be expandable for
> >> several countries ans year-months
> >>
> >>
> >> Year-month = FEB2015, FEB2012,  Feb2010
> >>  country  = USA, CAN.MEX
> >> First I want to do if country = USA and year-month = FEB2015, FEB2012 do
> >> the statements
> >> second if country = CAN and year-month =Feb2010 do  the statements
> >>
> >>
> >> if(country="USA" & year-month = "FEB2015" | "FEB2012" ){
> >> statemnt1
> >> .
> >> statemnt10
> >>
> >> } else if (country="USA" & year-month ="FEB2015") {
> >> statemnt1
> >> .
> >> statemnt10
> >> }
> >>
> >> else
> >> {
> >> statemnt1
> >> .
> >> statemnt10
> >> }
> >>
> >> The above script did not work. is there a different ways of doing it?
> >
> >
> > Uh, yes.  Get the syntax right.  Use R, when you are using R.
> >
> > Looking at ?Syntax and ?Logic might help you a bit.
> >
> > Other than that, there's not much that one can say without seeing a
> > reproducible example.  And if you sat down and wrote out a *reproducible
> > example*, using correct R syntax, you probably wouldn't need any
> assistance
> > from R-help.
> >
> > Have you read any of the readily available R tutorials?  If not do so. If
> > so, read them again and actually take note of what they say!
> >
> > cheers,
> >
> > Rolf Turner
> >
> > --
> > Technical Editor ANZJS
> > Department of Statistics
> > University of Auckland
> > Phone: +64-9-373-7599 ext. 88276
> >
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] if and

2017-02-27 Thread Val
Currently I have  about  six or more  scripts that do the same job.  I
thought it might be possible and more efficient to use one script by using
IF ELSE statements. Here is an example but this will be expandable for
several countries and year-months


Year-month = FEB2015, FEB2012,  Feb2010
 country  = USA, CAN.MEX
First I want to do if country = USA and year-month = FEB2015, FEB2012 do
the statements
second if country = CAN and year-month =Feb2010 do  the statements


if(country="USA" & year-month = "FEB2015" | "FEB2012" ){
statemnt1
.
statemnt10

} else if (country="USA" & year-month ="FEB2015") {
statemnt1
.
statemnt10
}

else
{
statemnt1
.
statemnt10
}

The above script did not work. is there a different ways of doing it?

Thank you in advance
.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove

2017-02-12 Thread Val
Hi Jeff and All,

When I examined the excluded data, i.e., first names with different last
names, I noticed that some last names were not recorded.
For instance, I modified the data as follows:
DF <- read.table( text=
'first  week last
Alex1  West
Bob 1  John
Cory1  Jack
Cory2 -
Bob 2  John
Bob 3  John
Alex2  Joseph
Alex3  West
Alex4  West
', header = TRUE, as.is = TRUE )


err2 <- ave( seq_along( DF$first )
   , DF[ , "first", drop = FALSE]
   , FUN = function( n ) {
  length( unique( DF[ n, "last" ] ) )
 }
   )
result2 <- DF[ 1 == err2, ]
result2

first week last
2   Bob1 John
5   Bob2 John
6   Bob3 John

However, I want to keep Cory's records. It is assumed that an unrecorded
last name is the same as the recorded one.

Final out put should be

first week last
   Bob1 John
   Bob2 John
   Bob3 John
  Cory1  Jack
  Cory2   -

Thank you again!
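
A minimal sketch, treating "-" as an unrecorded last name that is assumed to
agree with the recorded one, so a first name is kept when it has at most one
distinct recorded last name:

n_last <- ave(seq_along(DF$first), DF$first,
              FUN = function(i) {
                rec <- DF$last[i]                # last names seen for this first name
                length(unique(rec[rec != "-"]))  # count only the recorded ones
              })
result <- DF[n_last <= 1, ]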

On Sun, Feb 12, 2017 at 7:28 PM, Val <valkr...@gmail.com> wrote:
> Sorry  Jeff, I did not finish my email. I accidentally touched the send 
> button.
> My question was the
> when I used this one
> length(unique(result2$first))
>  vs
> dim(result2[!duplicated(result2[,c('first')]),]) [1]
>
> I did get different results but now I found out the problem.
>
> Thank you!.
>
>
>
>
>
>
>
>
> On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller
> <jdnew...@dcn.davis.ca.us> wrote:
>> Your question mystifies me, since it looks to me like you already know the 
>> answer.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On February 12, 2017 3:30:49 PM PST, Val <valkr...@gmail.com> wrote:
>>>Hi Jeff and all,
>>> How do I get the  number of unique first names   in the two data sets?
>>>
>>>for the first one,
>>>result2 <- DF[ 1 == err2, ]
>>>length(unique(result2$first))
>>>
>>>
>>>
>>>
>>>On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
>>><jdnew...@dcn.davis.ca.us> wrote:
>>>> The "by" function aggregates and returns a result with generally
>>>fewer rows
>>>> than the original data. Since you are looking to index the rows in
>>>the
>>>> original data set, the "ave" function is better suited because it
>>>always
>>>> returns a vector that is just as long as the input vector:
>>>>
>>>> # I usually work with character data rather than factors if I plan
>>>> # to modify the data (e.g. removing rows)
>>>> DF <- read.table( text=
>>>> 'first  week last
>>>> Alex1  West
>>>> Bob 1  John
>>>> Cory1  Jack
>>>> Cory2  Jack
>>>> Bob 2  John
>>>> Bob 3  John
>>>> Alex2  Joseph
>>>> Alex3  West
>>>> Alex4  West
>>>> ', header = TRUE, as.is = TRUE )
>>>>
>>>> err <- ave( DF$last
>>>>   , DF[ , "first", drop = FALSE]
>>>>   , FUN = function( lst ) {
>>>>   length( unique( lst ) )
>>>> }
>>>>   )
>>>> result <- DF[ "1" == err, ]
>>>> result
>>>>
>>>> Notice that the ave function returns a vector of the same type as was
>>>given
>>>> to it, so even though the function returns a numeric the err
>>>> vector is character.
>>>>
>>>> If you wanted to be able to examine more than one other column in
>>>> determining the keep/reject decision, you could do:
>>>>
>>>> err2 <- ave( seq_along( DF$first )
>>>>, DF[ , "first", drop = FALSE]
>>>>, FUN = function( n ) {
>>>>   length( unique( DF[ n, "last" ] ) )
>>>>  }
>>>>)
>>>> result2 <- DF[ 1 == err2, ]
>>>> result2
>>>>
>>>> and then you would have the option to re-use the "n" index to look at
>>>other
>>>> columns as well.
>>>>
>>>> Finally, here is a dplyr solution:
>>>>
>>>> library(dplyr)
>>>> result3 <- (   DF
>>>>%>% group_by( first ) # like a prep for ave or by
>>>>%>% mutate( err = length( unique( last ) ) ) # similar to
>>>ave
>>>>%>% filter( 1 == err ) # drop the rows with too many last
>

Re: [R] remove

2017-02-12 Thread Val
Sorry  Jeff, I did not finish my email. I accidentally touched the send button.
My question was: when I used this one
length(unique(result2$first))
 vs
dim(result2[!duplicated(result2[,c('first')]),]) [1]

I did get different results but now I found out the problem.

Thank you!.








On Sun, Feb 12, 2017 at 6:31 PM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> Your question mystifies me, since it looks to me like you already know the 
> answer.
> --
> Sent from my phone. Please excuse my brevity.
>
> On February 12, 2017 3:30:49 PM PST, Val <valkr...@gmail.com> wrote:
>>Hi Jeff and all,
>> How do I get the  number of unique first names   in the two data sets?
>>
>>for the first one,
>>result2 <- DF[ 1 == err2, ]
>>length(unique(result2$first))
>>
>>
>>
>>
>>On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>> The "by" function aggregates and returns a result with generally
>>fewer rows
>>> than the original data. Since you are looking to index the rows in
>>the
>>> original data set, the "ave" function is better suited because it
>>always
>>> returns a vector that is just as long as the input vector:
>>>
>>> # I usually work with character data rather than factors if I plan
>>> # to modify the data (e.g. removing rows)
>>> DF <- read.table( text=
>>> 'first  week last
>>> Alex1  West
>>> Bob 1  John
>>> Cory1  Jack
>>> Cory2  Jack
>>> Bob 2  John
>>> Bob 3  John
>>> Alex2  Joseph
>>> Alex3  West
>>> Alex4  West
>>> ', header = TRUE, as.is = TRUE )
>>>
>>> err <- ave( DF$last
>>>   , DF[ , "first", drop = FALSE]
>>>   , FUN = function( lst ) {
>>>   length( unique( lst ) )
>>> }
>>>   )
>>> result <- DF[ "1" == err, ]
>>> result
>>>
>>> Notice that the ave function returns a vector of the same type as was
>>given
>>> to it, so even though the function returns a numeric the err
>>> vector is character.
>>>
>>> If you wanted to be able to examine more than one other column in
>>> determining the keep/reject decision, you could do:
>>>
>>> err2 <- ave( seq_along( DF$first )
>>>, DF[ , "first", drop = FALSE]
>>>, FUN = function( n ) {
>>>   length( unique( DF[ n, "last" ] ) )
>>>  }
>>>)
>>> result2 <- DF[ 1 == err2, ]
>>> result2
>>>
>>> and then you would have the option to re-use the "n" index to look at
>>other
>>> columns as well.
>>>
>>> Finally, here is a dplyr solution:
>>>
>>> library(dplyr)
>>> result3 <- (   DF
>>>%>% group_by( first ) # like a prep for ave or by
>>>%>% mutate( err = length( unique( last ) ) ) # similar to
>>ave
>>>%>% filter( 1 == err ) # drop the rows with too many last
>>names
>>>%>% select( -err ) # drop the temporary column
>>>%>% as.data.frame # convert back to a plain-jane data
>>frame
>>>)
>>> result3
>>>
>>> which uses a small set of verbs in a pipeline of functions to go from
>>input
>>> to result in one pass.
>>>
>>> If your data set is really big (running out of memory big) then you
>>might
>>> want to investigate the data.table or sqlite packages, either of
>>which can
>>> be combined with dplyr to get a standardized syntax for managing
>>larger
>>> amounts of data. However, most people actually aren't running out of
>>memory
>>> so in most cases the extra horsepower isn't actually needed.
>>>
>>>
>>> On Sun, 12 Feb 2017, P Tennant wrote:
>>>
>>>> Hi Val,
>>>>
>>>> The by() function could be used here. With the dataframe dfr:
>>>>
>>>> # split the data by first name and check for more than one last name
>>for
>>>> each first name
>>>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>>>> # make the result more easily manipulated
>>>> res <- as.table(res)
>>>> res
>>>> # first
>>>> # Alex   Bob  Cory
>

Re: [R] remove

2017-02-12 Thread Val
Hi Jeff and all,
 How do I get the  number of unique first names   in the two data sets?

for the first one,
result2 <- DF[ 1 == err2, ]
length(unique(result2$first))




On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first  week last
> Alex1  West
> Bob 1  John
> Cory1  Jack
> Cory2  Jack
> Bob 2  John
> Bob 3  John
> Alex2  Joseph
> Alex3  West
> Alex4  West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>   , DF[ , "first", drop = FALSE]
>   , FUN = function( lst ) {
>   length( unique( lst ) )
> }
>   )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric the err
> vector is character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>, DF[ , "first", drop = FALSE]
>, FUN = function( n ) {
>   length( unique( DF[ n, "last" ] ) )
>  }
>)
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
>
> Finally, here is a dplyr solution:
>
> library(dplyr)
> result3 <- (   DF
>%>% group_by( first ) # like a prep for ave or by
>%>% mutate( err = length( unique( last ) ) ) # similar to ave
>%>% filter( 1 == err ) # drop the rows with too many last names
>%>% select( -err ) # drop the temporary column
>%>% as.data.frame # convert back to a plain-jane data frame
>)
> result3
>
> which uses a small set of verbs in a pipeline of functions to go from input
> to result in one pass.
>
> If your data set is really big (running out of memory big) then you might
> want to investigate the data.table or sqlite packages, either of which can
> be combined with dplyr to get a standardized syntax for managing larger
> amounts of data. However, most people actually aren't running out of memory
> so in most cases the extra horsepower isn't actually needed.
>
>
> On Sun, 12 Feb 2017, P Tennant wrote:
>
>> Hi Val,
>>
>> The by() function could be used here. With the dataframe dfr:
>>
>> # split the data by first name and check for more than one last name for
>> each first name
>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>> # make the result more easily manipulated
>> res <- as.table(res)
>> res
>> # first
>> # Alex   Bob  Cory
>> # TRUE FALSE FALSE
>>
>> # then use this result to subset the data
>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>> # sort if needed
>> nw.dfr[order(nw.dfr$first) , ]
>>
>>  first week last
>> 2   Bob1 John
>> 5   Bob2 John
>> 6   Bob3 John
>> 3  Cory1 Jack
>> 4  Cory2 Jack
>>
>>
>> Philip
>>
>> On 12/02/2017 4:02 PM, Val wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to  remove rows conditionally.
>>> In my data file  each person were recorded  for several weeks. Somehow
>>> during the recording periods, their last name was misreported.   For
>>> each person,   the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names .
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed  from the data.  if this happens then I want
>>> remove  all rows with Alex. Here is my data set
>>>
>>> df<- read.table(header=TRUE, text='first  week last
>>> Alex1  West
>>> Bob 1  John
>>> Cory1  Jack
>>> Cory2  Jack
>>> Bob 2  John
>>> Bob 3  John
>>> Alex2  Joseph
>>> Alex3  West
>>> Alex4  W

Re: [R] [FORGED] Re: remove

2017-02-12 Thread Val
Thank you Rainer,

The question was :-
1. Identify those first names with different last names or more than
one last names.
2. Once identified (like Alex)  then exclude them.  This is because
not reliable record.

On Sun, Feb 12, 2017 at 11:17 AM, Rainer Schuermann
<rainer.schuerm...@gmx.net> wrote:
> I may not be understanding the question well enough but for me
>
> df[ df[ , "first"]  != "Alex", ]
>
> seems to do the job:
>
>   first week last
>
> Rainer
>
>
>
>
> On Sonntag, 12. Februar 2017 19:04:19 CET Rolf Turner wrote:
>>
>> On 12/02/17 18:36, Bert Gunter wrote:
>> > Basic stuff!
>> >
>> > Either subscripting or ?subset.
>> >
>> > There are many good R tutorials on the web. You should spend some
>> > (more?) time with some.
>>
>> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't
>> seem basic to me.  The only way that I can see how to go at it is via
>> a for loop:
>>
>> rdln <- function(X) {
>> # Remove discordant last names.
>>  ok <- logical(nrow(X))
>>  for(nm in unique(X$first)) {
>>  xxx <- unique(X$last[X$first==nm])
>>  if(length(xxx)==1) ok[X$first==nm] <- TRUE
>>  }
>>  Y <- X[ok,]
>>  Y <- Y[order(Y$first),]
>>  rownames(Y) <- 1:nrow(Y)
>>  Y
>> }
>>
>> Calling the toy data frame "melvin" rather than "df" (since "df" is the
>> name of the built in F density function, it is bad form to use it as the
>> name of another object) I get:
>>
>>  > rdln(melvin)
>>first week last
>> 1   Bob1 John
>> 2   Bob2 John
>> 3   Bob3 John
>> 4  Cory1 Jack
>> 5  Cory2 Jack
>>
>> which is the desired output.  If there is a "basic stuff" way to do this
>> I'd like to see it.  Perhaps I will then be toadally embarrassed, but
>> they say that this is good for one.
>>
>> cheers,
>>
>> Rolf
>>
>> > On Sat, Feb 11, 2017 at 9:02 PM, Val <valkr...@gmail.com> wrote:
>> >> Hi all,
>> >> I have a big data set and want to  remove rows conditionally.
>> >> In my data file  each person were recorded  for several weeks. Somehow
>> >> during the recording periods, their last name was misreported.   For
>> >> each person,   the last name should be the same. Otherwise remove from
>> >> the data. Example, in the following data set, Alex was found to have
>> >> two last names .
>> >>
>> >> Alex   West
>> >> Alex   Joseph
>> >>
>> >> Alex should be removed  from the data.  if this happens then I want
>> >> remove  all rows with Alex. Here is my data set
>> >>
>> >> df <- read.table(header=TRUE, text='first  week last
>> >> Alex1  West
>> >> Bob 1  John
>> >> Cory1  Jack
>> >> Cory2  Jack
>> >> Bob 2  John
>> >> Bob 3  John
>> >> Alex2  Joseph
>> >> Alex3  West
>> >> Alex4  West ')
>> >>
>> >> Desired output
>> >>
>> >>   first  week last
>> >> 1 Bob 1   John
>> >> 2 Bob 2   John
>> >> 3 Bob 3   John
>> >> 4 Cory 1   Jack
>> >> 5 Cory 2   Jack
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove

2017-02-12 Thread Val
 Jeff, Rolf and Philip.
Thank you very much for your suggestion.

Jeff, you suggested that if the data is big I should consider data.table.
My data is "big": it is more than 200M records, and I will see if this
approach works.

Thank you again.
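
A minimal data.table sketch of the same filter (untested at that scale);
uniqueN() counts the distinct last names per first name:

library(data.table)
DT <- as.data.table(DF)                    # or fread() the big file directly
DT[, n_last := uniqueN(last), by = first]  # distinct last names per first name
result <- DT[n_last == 1L][, n_last := NULL][]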


On Sun, Feb 12, 2017 at 12:42 AM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> The "by" function aggregates and returns a result with generally fewer rows
> than the original data. Since you are looking to index the rows in the
> original data set, the "ave" function is better suited because it always
> returns a vector that is just as long as the input vector:
>
> # I usually work with character data rather than factors if I plan
> # to modify the data (e.g. removing rows)
> DF <- read.table( text=
> 'first  week last
> Alex1  West
> Bob 1  John
> Cory1  Jack
> Cory2  Jack
> Bob 2  John
> Bob 3  John
> Alex2  Joseph
> Alex3  West
> Alex4  West
> ', header = TRUE, as.is = TRUE )
>
> err <- ave( DF$last
>   , DF[ , "first", drop = FALSE]
>   , FUN = function( lst ) {
>   length( unique( lst ) )
> }
>   )
> result <- DF[ "1" == err, ]
> result
>
> Notice that the ave function returns a vector of the same type as was given
> to it, so even though the function returns a numeric the err
> vector is character.
>
> If you wanted to be able to examine more than one other column in
> determining the keep/reject decision, you could do:
>
> err2 <- ave( seq_along( DF$first )
>, DF[ , "first", drop = FALSE]
>, FUN = function( n ) {
>   length( unique( DF[ n, "last" ] ) )
>  }
>)
> result2 <- DF[ 1 == err2, ]
> result2
>
> and then you would have the option to re-use the "n" index to look at other
> columns as well.
>
> Finally, here is a dplyr solution:
>
> library(dplyr)
> result3 <- (   DF
>%>% group_by( first ) # like a prep for ave or by
>%>% mutate( err = length( unique( last ) ) ) # similar to ave
>%>% filter( 1 == err ) # drop the rows with too many last names
>%>% select( -err ) # drop the temporary column
>%>% as.data.frame # convert back to a plain-jane data frame
>)
> result3
>
> which uses a small set of verbs in a pipeline of functions to go from input
> to result in one pass.
>
> If your data set is really big (running out of memory big) then you might
> want to investigate the data.table or sqlite packages, either of which can
> be combined with dplyr to get a standardized syntax for managing larger
> amounts of data. However, most people actually aren't running out of memory
> so in most cases the extra horsepower isn't actually needed.
>
>
> On Sun, 12 Feb 2017, P Tennant wrote:
>
>> Hi Val,
>>
>> The by() function could be used here. With the dataframe dfr:
>>
>> # split the data by first name and check for more than one last name for
>> each first name
>> res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
>> # make the result more easily manipulated
>> res <- as.table(res)
>> res
>> # first
>> # Alex   Bob  Cory
>> # TRUE FALSE FALSE
>>
>> # then use this result to subset the data
>> nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
>> # sort if needed
>> nw.dfr[order(nw.dfr$first) , ]
>>
>>  first week last
>> 2   Bob1 John
>> 5   Bob2 John
>> 6   Bob3 John
>> 3  Cory1 Jack
>> 4  Cory2 Jack
>>
>>
>> Philip
>>
>> On 12/02/2017 4:02 PM, Val wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to  remove rows conditionally.
>>> In my data file  each person were recorded  for several weeks. Somehow
>>> during the recording periods, their last name was misreported.   For
>>> each person,   the last name should be the same. Otherwise remove from
>>> the data. Example, in the following data set, Alex was found to have
>>> two last names .
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed  from the data.  if this happens then I want
>>> remove  all rows with Alex. Here is my data set
>>>
>>> df<- read.table(header=TRUE, text='first  week last
>>> Alex1  West
>>> Bob 1  John
>>> Cory1  Jack
>>> Cory2  Jack
>>> Bob 2  John
>>> Bob 3  John
>>>

[R] remove

2017-02-11 Thread Val
Hi all,
I have a big data set and want to  remove rows conditionally.
In my data file  each person were recorded  for several weeks. Somehow
during the recording periods, their last name was misreported.   For
each person,   the last name should be the same. Otherwise remove from
the data. Example, in the following data set, Alex was found to have
two last names .

Alex   West
Alex   Joseph

Alex should be removed  from the data.  if this happens then I want
remove  all rows with Alex. Here is my data set

df <- read.table(header=TRUE, text='first  week last
Alex1  West
Bob 1  John
Cory1  Jack
Cory2  Jack
Bob 2  John
Bob 3  John
Alex2  Joseph
Alex3  West
Alex4  West ')

Desired output

  first  week last
1 Bob 1   John
2 Bob 2   John
3 Bob 3   John
4 Cory 1   Jack
5 Cory 2   Jack

Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] output

2017-01-17 Thread Val
Hi Marc and all,

Last time you suggested that I use the WriteXLS function to write more than
65,000 rows to Excel. Creating the file worked fine. Now I want to read the
file back (with read.xls), but I have a problem: the file has more than one
sheet. Here is the script and the error message.

datx <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
dat <-data.frame(datx(11,10,2))
WriteXLS(dat, "test5.xlsx", row.names=FALSE)
 I created several sheets  by copying the first sheet

t1<- read.xls("Test6.xlsx",2, stringsAsFactors=FALSE)

I am getting an error message of
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  no lines available in input

Thank you in advance
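
WriteXLS only writes; one option for reading a multi-sheet .xlsx back in is the
readxl package (a sketch, assuming the second sheet is the one wanted):

library(readxl)
t1 <- read_excel("Test6.xlsx", sheet = 2)   # sheet by position or by name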


On Tue, Dec 13, 2016 at 5:07 PM, Val <valkr...@gmail.com> wrote:
> Marc,
> Thank you so much! That was helpful comment.
>
>
> On Mon, Dec 12, 2016 at 10:09 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
>> Hi,
>>
>> With the WriteXLS() function, from the package of the same name, if you 
>> specify '.xlsx' for the file name extension, the function will create an 
>> Excel 2007 compatible file, which can handle worksheets of up to 1,048,576 
>> rows by 16,384 columns.
>>
>> Thus:
>>
>>   WriteXLS(dat, "test4.xlsx", row.names = FALSE)
>>
>> That is all described in the help file for the function.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Dec 12, 2016, at 6:51 PM, Val <valkr...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have a data frame with more than 100,000 rows.
>>>
>>> datx <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
>>> dat <- datx(11,10,2)
>>>
>>> 1)
>>> WriteXLS(dat, "test4.xls", row.names=FALSE)
>>> Error in WriteXLS(dat, "test4.xls", row.names = FALSE) :
>>>  One or more of the data frames named in 'x' exceeds 65,535 rows or 256 
>>> columns
>>>
>>> I noticed that *.xls has  row and column limitations.
>>>
>>> How can I take the excess row to the next sheet?
>>>
>>> 2) I also tried to use xlsx and have a problem
>>>
>>> write.xlsx(dat, "test3.xlsx",sheetName="sheet1", row.names=FALSE)
>>> Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") :
>>>  java.lang.OutOfMemoryError: Java heap space
>>> .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook")
>>>
>>> Any help ?
>>> Thank you in advance
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] output

2016-12-12 Thread Val
Hi all,

I have a data frame with more than 100,000 rows.

datx <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
dat <- datx(11,10,2)

1)
WriteXLS(dat, "test4.xls", row.names=FALSE)
Error in WriteXLS(dat, "test4.xls", row.names = FALSE) :
  One or more of the data frames named in 'x' exceeds 65,535 rows or 256 columns

I noticed that *.xls has  row and column limitations.

How can I take the excess row to the next sheet?

2) I also tried to use xlsx and have a problem

write.xlsx(dat, "test3.xlsx",sheetName="sheet1", row.names=FALSE)
Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") :
  java.lang.OutOfMemoryError: Java heap space
.jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook")

Any help ?
Thank you in advance
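
A minimal sketch for the first question, assuming a WriteXLS version that accepts
a named list of data frames: split the rows into 65,000-row chunks and write each
chunk to its own worksheet (the sheet names here are made up):

library(WriteXLS)
dat    <- as.data.frame(dat)                               # WriteXLS expects data frames
chunks <- split(dat, (seq_len(nrow(dat)) - 1) %/% 65000)   # 65,000 rows per piece
names(chunks) <- paste0("sheet", seq_along(chunks))
WriteXLS(chunks, "test4.xls", row.names = FALSE)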

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data

2016-12-03 Thread Val
Hi all,

I am trying to read and summarize  a big data frame( >10M records)

Here is the sample of my data
state,city,x
1,12,100
1,12,100
1,12,200
1,13,200
1,13,100
1,13,100
1,14,200
2,21,200
2,21,200
2,21,100
2,23,100
2,23,200
2,34,200
2,34,100
2,35,100

I want to get the total count by state and the number of cities by state.
The x variable is either 100 or 200, and I want a count of each.

The result should look as follows.

state,city,count,100's,200's
1,3,7,4,3
2,4,8,4,4

At present I am doing it in several steps and it is taking too long.

Is there an efficient way of doing this?
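
A minimal data.table sketch; fread() plus grouped counts is usually fast enough
for 10M+ rows (the file name is made up, columns as in the sample):

library(data.table)
dt  <- fread("mydata.csv")                    # state, city, x
res <- dt[, .(city  = uniqueN(city),          # number of distinct cities
              count = .N,                     # total records
              n100  = sum(x == 100),
              n200  = sum(x == 200)),
          by = state]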

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] files

2016-11-29 Thread Val
Thank you Sarah,

Some of the files are not csv; some are *.txt and space delimited, for example:
   Bdat.txt
   Bdat123.txt
   Bdat456.txt
How do I do that?
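
A minimal sketch: the same list.files()/lapply() pattern, but with read.table()
for the space-delimited files (header = TRUE is an assumption):

bfiles   <- list.files(pattern = "^Bdat.*\\.txt$")
Bdat_all <- do.call(rbind,
                    lapply(bfiles, read.table,
                           header = TRUE, stringsAsFactors = FALSE))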



On Tue, Nov 29, 2016 at 8:28 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Something like this:
>
> filelist <- list.files(pattern="^test")
> myfiles <- lapply(filelist, read.csv)
> myfiles <- do.call(rbind, myfiles)
>
>
>
> On Tue, Nov 29, 2016 at 9:11 PM, Val <valkr...@gmail.com> wrote:
>> Hi all,
>>
>> In one folder  I have several files  and  I want
>> combine/concatenate(rbind) based on some condition .
>> Here is  the sample of the files in one folder
>>test.csv
>>test123.csv
>>test456.csv
>>Adat.csv
>>Adat123.csv
>>Adat456.csv
>>
>> I want to create 2  files as follows
>>
>> test_all  = rbind(test.csv, test123.csv,test456.csv)
>> Adat_al l= rbind(Adat.csv, Adat123.csv,Adat456.csv)
>>
>> The actual number of  of files are many and  is there an efficient way
>> of doing it?
>>
>> Thank you
>>
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] files

2016-11-29 Thread Val
Hi all,

In one folder I have several files and I want to combine/concatenate (rbind)
them based on some condition.
Here is  the sample of the files in one folder
   test.csv
   test123.csv
   test456.csv
   Adat.csv
   Adat123.csv
   Adat456.csv

I want to create 2  files as follows

test_all  = rbind(test.csv, test123.csv,test456.csv)
Adat_al l= rbind(Adat.csv, Adat123.csv,Adat456.csv)

The actual number of files is large; is there an efficient way
of doing it?

Thank you
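
A minimal sketch that groups the files by their alphabetic prefix and stacks each
group, so new prefixes do not need their own rbind() line (identical columns
within a group are assumed):

files  <- list.files(pattern = "\\.csv$")
prefix <- sub("[0-9]*\\.csv$", "", files)         # "test", "Adat", ...
stacks <- lapply(split(files, prefix),
                 function(f) do.call(rbind, lapply(f, read.csv)))
# stacks$test is test_all, stacks$Adat is Adat_all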

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read

2016-11-28 Thread Val
Hi Jeff  and John,

Thank you for your response.
In each folder, I am expecting a single file name (either dat or
dat.csv).v  so will this work?


Is the following correct?
fns  <- list.files(mydir)
if (is.element(pattern="dat(\\.[^.]+)$",fns ))

Thank you again.
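
A minimal sketch: the matching is done by list.files() itself, so is.element()
is not needed; with one expected file per folder the first match can be read
directly (the two layouts are assumed identical, as stated):

fns <- list.files(mydir, pattern = "^dat(\\.csv)?$", full.names = TRUE)
if (length(fns) >= 1) {
  dat <- read.csv(fns[1], stringsAsFactors = FALSE)   # same layout with or without .csv
}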

On Mon, Nov 28, 2016 at 7:20 PM, Jeff Newmiller
 wrote:
> No, and yes, depending what you mean.
>
> No, because you have to supply the file name to open it... you cannot 
> directly use wildcards to open files.
>
> Yes,  because the list.files function can be used to match all file names 
> fitting a regex pattern, and you can use those filenames to open the files.
>
> E.g.
>
> fns  <- list.files( pattern="dat(\\.[^.]+)$" )
> dtaL <- lapply( fns, function(fn){ read.csv( fn, stringsAsFactors=FALSE ) } )
>
> If you only expect one file to be in any given directory, you can skip the 
> lapply and just read the file, or you can extract the data frame from the 
> list using dtaL[[ 1 ]].
>
> ?list.files
> ?regex for help on patterns
> --
> Sent from my phone. Please excuse my brevity.
>
> On November 28, 2016 2:23:23 PM PST, Ashta  wrote:
>>Hi all,
>>
>>I have a script that  reads a file (dat.csv)  from several folders.
>>However, in some folders the file name is (dat) with out csv  and in
>>other folders it is dat.csv.  The format of data is the same(only the
>>file name differs  with and without "csv".
>>
>>Is it possible to read these files  depending on their name in one?
>>like read.csv("dat.csv"). How can I read both type of file names?
>>
>>Thank you in advance
>>
>>__
>>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Variable

2016-11-24 Thread Val
Hi all,

I am trying to get shell variable(s) into my R script  in Linux . How
do I get them?

my shell script is
t1.sh
 #!/bin/bash
   Name=Alex; export Name
   Age=25; export Age


How do get the Name and Age variables in my R script?

My R script is

test.R
print " Your Name is $Name and  you are $Age  years old"

My another shell script that call the R script is

test.sh
 #!/bin/bash
 source  t1.sh
 Rscript test.R
So by running this script  ./test.sh

I want get:  Your Name is Alex and  you are 25  years old

I can define those variables in R  but that is not my intention.

Thank you in advance
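
A minimal sketch for test.R: variables exported from the shell become environment
variables inside R, so Sys.getenv() reads them:

name <- Sys.getenv("Name")
age  <- Sys.getenv("Age")
cat(sprintf("Your Name is %s and you are %s years old\n", name, age))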

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

