Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Avi Gross via R-help
Abou,

 

I am not trying to be negative. Assuming you are a professor of Statistics, 
your request seems odd as what you are asking about is very routine in much of 
statistical work where you want to make a model or something using just part of 
your data and need to reserve some to check if you perhaps trained an algorithm 
too much for the original data used.

 

A simple online search before asking questions here is appreciated. I did a 
quick search for something like “R split data into three parts” and see several 
applicable answers.

 

There are people on this forum who actually get paid to do nontrivial tasks and 
do not mind help in spots but feel sort of used if expected to write a serious 
amount of code and perhaps then be asked to redo it with more bells and 
whistles added. A recent badly phrased request comes to mind where several of 
us provided an answer only to find out it was for a different scenario, …

 

So let me continue with a serious answer. May we assume you KNOW how to read 
the data in to something like a data.frame? If so, and if you see no need or 
value in doing this the hard way, then your question could have been to ask if 
there is an R built-in function or perhaps a package already set to solve it 
quickly. Again, a simple online search can do wonders. Here, for example, is a 
package called caret, and this page discusses splitting data multiple ways:

 

https://topepo.github.io/caret/data-splitting.html

 

There are other such pages suggesting how to do it using base R.

 

Here is one that gives an example of how to make three unequal partitions:

 

inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 0.2))

 

 

There is more to do below but in the above, you would use whatever names you 
want instead of train/valid/test and set all three to 0.33 and so on.
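
For anyone without an add-on package installed, the same kind of proportion-based split can be sketched in base R. The helper name split_by_proportion is made up for this illustration; it is not the partition() function quoted above and only assumes that rounded group sizes are acceptable.

```r
# Hypothetical helper (not from any package): randomly assign the indices of x
# to named groups whose sizes follow the proportions in p.
split_by_proportion <- function(x, p) {
  n <- length(x)
  # Cumulative cut points, rounded so that the group sizes add up to n.
  cuts <- round(cumsum(p) / sum(p) * n)
  sizes <- diff(c(0, cuts))
  # One label per element, shuffled so that membership is random.
  labels <- sample(rep(names(p), times = sizes))
  split(seq_len(n), factor(labels, levels = names(p)))
}

set.seed(1)  # only to make the illustration repeatable
inds <- split_by_proportion(iris$Sepal.Length,
                            p = c(train = 0.6, valid = 0.2, test = 0.2))
lengths(inds)
```

Each element of inds holds row positions, so iris[inds$train, ] would pick out the first group.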

 

I repeat, that what you want to do strikes some of us as a fairly routine thing 
to do and lots of people have written how they have done it and you can pick 
and choose, or redo it on your own. If what you have is a homework assignment, 
the appropriate thing is to have you learn to use some technique yourself and 
perhaps get minor help when it fails. But if you will be doing this regularly, 
use of some packages is highly valuable.

 

Good Luck.

 

 

 

 

 

From: AbouEl-Makarim Aboueissa  
Sent: Thursday, September 2, 2021 9:51 PM
To: Avi Gross 
Cc: R mailing list 
Subject: Re: [R] Splitting a data column randomly into 3 groups

 

Sorry, please forget about it. I believe that I am very serious when I posted 
my question.

 

with thanks

abou


__

AbouEl-Makarim Aboueissa, PhD

 

Professor, Statistics and Data Science

Graduate Coordinator

Department of Mathematics and Statistics

University of Southern Maine

 

 

 

On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help <r-help@r-project.org> wrote:

What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework 
for others. Help is supposed to be after they try and encounter issues that we 
may help with.

So think about your problem. You supplied data in a file that is NOT in CSV 
format but is in Tab separated format.

You need to get it in to your program and store it in something. It looks like 
you have 204 items so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose 
random number between 1 and 204. How do you do that? You need 1/3 of the length 
of the object items, in your case 68.

Now extract the items with  those indices into say A1. Extract all the rest 
into a temporary item.

Make another 68 random indices, with no overlap, and copy those items into A2 
and the ones that do not have those into A3 and you are sort of done, other 
than some cleanup or whatever.

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick 
anything in particular.

Had you shown some text and code along the lines of the above and just wanted 
to know how to copy just the ones that were not selected, we could easily ...


-Original Message-
From: R-help <r-help-boun...@r-project.org> On Behalf Of AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list <r-help@r-project.org>
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the 
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science* *Graduate Coordinator*

*Department of Mathematics and Statistics* *University of Southern Maine*

__
R-help@r-project.org   mailing list -- To 
UNSUBSCRIBE and more, see

Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Jim Lemon
Hi Abou,
One way is to shuffle the original data frame using sample() and
split up the result into three equal parts.
I was going to provide example code, but Avi's response popped up and
I kind of agree with him.
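
For later readers of the archive, the shuffle-and-split idea could look roughly like the sketch below. The data frame dat is a made-up stand-in for the posted file (204 rows, columns ID and Data), and the seed is there only so the illustration is repeatable.

```r
# Made-up stand-in for the posted file: 204 rows with columns ID and Data.
set.seed(42)
dat <- data.frame(ID = 1:204, Data = sample(1:500, 204))

# Shuffle the rows, then cut the shuffled frame into three consecutive blocks.
n <- nrow(dat)
shuffled <- dat[sample(n), ]
block <- ceiling(seq_len(n) * 3 / n)   # 1, 1, ..., 2, 2, ..., 3, 3
parts <- split(shuffled, block)

sapply(parts, nrow)
```

Because the rows are shuffled first, taking consecutive blocks afterwards is enough to make the three groups random.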

Jim

On Fri, Sep 3, 2021 at 11:31 AM AbouEl-Makarim Aboueissa
 wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread AbouEl-Makarim Aboueissa
Sorry, please forget about it. I believe that I am very serious when I
posted my question.

with thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help 
wrote:

> What is stopping you Abou?
>
> Some of us here start wondering if we have better things to do than
> homework for others. Help is supposed to be after they try and encounter
> issues that we may help with.
>
> So think about your problem. You supplied data in a file that is NOT in
> CSV format but is in Tab separated format.
>
> You need to get it in to your program and store it in something. It looks
> like you have 204 items so 1/3 of those would be exactly 68.
>
> So if your data is in an object like a vector or data.frame, you want to
> choose random number between 1 and 204. How do you do that? You need 1/3 of
> the length of the object items, in your case 68.
>
> Now extract the items with  those indices into say A1. Extract all the
> rest into a temporary item.
>
> Make another 68 random indices, with no overlap, and copy those items into
> A2 and the ones that do not have those into A3 and you are sort of done,
> other than some cleanup or whatever.
>
> There are many ways to do the above and I am sure packages too.
>
> But since you have made no visible effort, I personally am not going to
> pick anything in particular.
>
> Had you shown some text and code along the lines of the above and just
> wanted to know how to copy just the ones that were not selected, we could
> easily ...
>
>
> -Original Message-
> From: R-help  On Behalf Of AbouEl-Makarim
> Aboueissa
> Sent: Thursday, September 2, 2021 9:30 PM
> To: R mailing list 
> Subject: [R] Splitting a data column randomly into 3 groups
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science* *Graduate Coordinator*
>
> *Department of Mathematics and Statistics* *University of Southern Maine*
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Avi Gross via R-help
What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework 
for others. Help is supposed to be after they try and encounter issues that we 
may help with.

So think about your problem. You supplied data in a file that is NOT in CSV 
format but is in Tab separated format.

You need to get it in to your program and store it in something. It looks like 
you have 204 items so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose 
random number between 1 and 204. How do you do that? You need 1/3 of the length 
of the object items, in your case 68.

Now extract the items with  those indices into say A1. Extract all the rest 
into a temporary item.

Make another 68 random indices, with no overlap, and copy those items into A2 
and the ones that do not have those into A3 and you are sort of done, other 
than some cleanup or whatever.
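
The steps above can be sketched in base R as follows. The vector x is a made-up stand-in for the posted column, and the names idx1, idx2, idx3 and rest are chosen only for this illustration; A1, A2 and A3 follow the names used in the description.

```r
# Made-up stand-in for the posted data: 204 values.
set.seed(123)  # only so the illustration is repeatable
x <- sample(1:500, 204)

n <- length(x)
k <- n %/% 3                       # 68 when n is 204

idx1 <- sample(n, k)               # first 68 random positions
rest <- setdiff(seq_len(n), idx1)  # positions not used yet
idx2 <- sample(rest, k)            # another 68, no overlap with idx1
idx3 <- setdiff(rest, idx2)        # whatever is left over

A1 <- x[idx1]
A2 <- x[idx2]
A3 <- x[idx3]
```

One thing to watch if this is adapted: sample(rest, k) treats a length-one rest as "1 to that number", so the pattern is only safe while rest has more than one element.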

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick 
anything in particular.

Had you shown some text and code along the lines of the above and just wanted 
to know how to copy just the ones that were not selected, we could easily ...


-Original Message-
From: R-help  On Behalf Of AbouEl-Makarim 
Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list 
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the 
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science* *Graduate Coordinator*

*Department of Mathematics and Statistics* *University of Southern Maine*



[R] Splitting a data column randomly into 3 groups

2021-09-02 Thread AbouEl-Makarim Aboueissa
Dear All:

How to split a column data *randomly* into three groups. Please see the
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*
ID Data
1   366
2   394
3   222
4   396
5   399
6   158
7   361
8   426
9   255
10  32
11  31
12  53
13  377
14  405
15  448
16  362
17  260
18  90
19  95
20  8
21  385
22  306
23  154
24  345
25  136
26  39
27  472
28  19
29  404
30  463
31  134
32  72
33  477
34  22
35  240
36  389
37  482
38  287
39  180
40  140
41  456
42  403
43  81
44  425
45  57
46  251
47  421
48  343
49  310
50  62
51  412
52  93
53  111
54  148
55  311
56  430
57  12
58  100
59  437
60  363
61  126
62  367
63  165
64  272
65  171
66  167
67  234
68  113
69  315
70  175
71  484
72  379
73  474
74  216
75  250
76  177
77  293
78  133
79  203
80  408
81  150
82  155
83  223
84  381
85  336
86  368
87  290
88  359
89  333
90  219
91  455
92  427
93  444
94  178
95  302
96  221
97  248
98  160
99  304
100 56
101 25
102 400
103 485
104 89
105 254
106 186
107 283
108 431
109 188
110 354
111 119
112 67
113 415
114 346
115 319
116 344
117 121
118 34
119 288
120 416
121 308
122 340
123 166
124 443
125 388
126 286
127 245
128 406
129 253
130 395
131 274
132 428
133 329
134 410
135 127
136 420
137 187
138 244
139 125
140 137
141 206
142 205
143 327
144 211
145 7
146 192
147 317
148 60
149 54
150 4
151 434
152 233
153 47
154 280
155 76
156 398
157 320
158 347
159 453
160 465
161 382
162 476
163 213
164 418
165 409
166 230
167 3
168 229
169 436
170 262
171 77
172 207
173 118
174 99
175 243
176 27
177 479
178 438
179 152
180 109
181 330
182 17
183 179
184 323
185 124
186 296
187 435
188 225
189 128
190 84
191 316
192 195
193 74
194 138
195 149
196 63
197 249
198 104
199 35
200 228
201 44
202 275
203 259
204 356


Re: [R] plotting some rows in different color

2021-09-02 Thread Jim Lemon
Hi Eliza
This seems to work:

plot(BFA3[,1],BFA3[,4],
 pch=16, xlab = "", ylab = "",col=(BFA3[,2]==BFA3[,3])+2,axes=FALSE)

but I have no idea what you are trying to do with the

as.numeric(as.Date(...))

business.
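
In case the col= argument above looks cryptic, here is a small sketch of what it does. The matrix m is made up for illustration and only mimics the layout assumed for BFA3 (column 1 for x, columns 2 and 3 compared, column 4 for y).

```r
# Small made-up matrix in the same layout as assumed for BFA3.
m <- cbind(1:6,
           c(1, 2, 3, 4, 5, 6),
           c(1, 0, 3, 0, 5, 0),
           c(10, 20, 15, 30, 25, 40))

# TRUE/FALSE counts as 1/0 in arithmetic, so adding 2 gives palette colour 3
# where the two columns match and palette colour 2 where they differ.
cols <- (m[, 2] == m[, 3]) + 2
cols

plot(m[, 1], m[, 4], pch = 16, xlab = "", ylab = "", col = cols, axes = FALSE)
```

Any two colours can be used instead by indexing a vector, for example c("grey", "blue")[(m[, 2] == m[, 3]) + 1].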

Jim

On Fri, Sep 3, 2021 at 8:44 AM Eliza Botto  wrote:
>
> Dear useRs,
>
> For the following dataset,
>
> dput(BFA3)
>
> structure(c(17532, 17533, 17534, 17535, 17536, 17537, 17538,
> 17539, 17540, 17541, 17542, 17543, 17544, 17545, 17546, 17547,
> 17548, 17549, 17550, 17551, 17552, 17553, 17554, 17555, 17556,
> 17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564, 17565,
> 17566, 17567, 17568, 17569, 17570, 17571, 17572, 17573, 17574,
> 17575, 17576, 17577, 17578, 17579, 17580, 17581, 17582, 17583,
> 17584, 17585, 17586, 17587, 17588, 17589, 17590, 17591, 17592,
> 17593, 17594, 17595, 17596, 17597, 17598, 17599, 17600, 17601,
> 17602, 17603, 17604, 17605, 17606, 17607, 17608, 17609, 17610,
> 17611, 17612, 17613, 17614, 17615, 17616, 17617, 17618, 17619,
> 17620, 17621, 17622, 17623, 17624, 17625, 17626, 17627, 17628,
> 17629, 17630, 17631, 17632, 17633, 17634, 17635, 17636, 17637,
> 17638, 17639, 17640, 17641, 17642, 17643, 17644, 17645, 17646,
> 17647, 17648, 17649, 17650, 17651, 17652, 17653, 17654, 17655,
> 17656, 17657, 17658, 17659, 17660, 17661, 17662, 17663, 17664,
> 17665, 17666, 17667, 17668, 17669, 17670, 17671, 17672, 17673,
> 17674, 17675, 17676, 17677, 17678, 17679, 17680, 17681, 17682,
> 17683, 17684, 17685, 17686, 17687, 17688, 17689, 17690, 17691,
> 17692, 17693, 17694, 17695, 17696, 17697, 17698, 17699, 17700,
> 17701, 17702, 17703, 17704, 17705, 17706, 17707, 17708, 17709,
> 17710, 17711, 17712, 17713, 17714, 17715, 17716, 17717, 17718,
> 17719, 17720, 17721, 17722, 17723, 17724, 17725, 17726, 17727,
> 17728, 17729, 17730, 17731, 17732, 17733, 17734, 17735, 17736,
> 17737, 17738, 17739, 17740, 17741, 17742, 17743, 17744, 17745,
> 17746, 17747, 17748, 17749, 17750, 17751, 17752, 17753, 17754,
> 17755, 17756, 17757, 17758, 17759, 17760, 17761, 17762, 17763,
> 17764, 17765, 17766, 17767, 17768, 17769, 17770, 17771, 17772,
> 17773, 17774, 17775, 17776, 17777, 17778, 17779, 17780, 17781,
> 17782, 17783, 17784, 17785, 17786, 17787, 17788, 17789, 17790,
> 17791, 17792, 17793, 17794, 17795, 17796, 17797, 17798, 17799,
> 17800, 17801, 17802, 17803, 17804, 17805, 17806, 17807, 17808,
> 17809, 17810, 17811, 17812, 17813, 17814, 17815, 17816, 17817,
> 17818, 17819, 17820, 17821, 17822, 17823, 17824, 17825, 17826,
> 17827, 17828, 17829, 17830, 17831, 17832, 17833, 17834, 17835,
> 17836, 17837, 17838, 17839, 17840, 17841, 17842, 17843, 17844,
> 17845, 17846, 17847, 17848, 17849, 17850, 17851, 17852, 17853,
> 17854, 17855, 17856, 17857, 17858, 17859, 17860, 17861, 17862,
> 17863, 17864, 17865, 17866, 17867, 17868, 17869, 17870, 17871,
> 17872, 17873, 17874, 17875, 17876, 17877, 17878, 17879, 17880,
> 17881, 17882, 17883, 17884, 17885, 17886, 17887, 17888, 17889,
> 17890, 17891, 17892, 17893, 17894, 17895, 17896, 8.36875, 15.12875,
> 14.2825, 12.355, 13.1825, 88.58375, 47.52125, 26.53375, 22.85875,
> 12.4925, 9.86875, 13.125, 14.055, 14.0175, 14.70625, 15.46125,
> 11.8725, 17.505, 19.8575, 74.875445, 62.018935, 23.9481046910236,
> 9.68, 9.6175, 9.78, 9.8875, 9.5125, 9.885, 9.99, 14.16625, 11.99375,
> 10.7875, 9.85625, 12.17125, 11.76625, 11.0425, 9.28875, 9.425,
> 23.29375, 12.66875, 9.6, 10.06875, 10.055, 30.89625, 36.69375,
> 16.63875, 11.84625, 12.825, 14.94, 11.495, 10.795, 9.14625, 10.17875,
> 11.0525, 10.0175, 10.67625, 10.4325, 12.5175, 13.93, 14.1675,
> 17.3175, 18.3875, 12.06, 10.3125, 9.94125, 10.8575, 11.2425,
> 13.28875, 43.885, 76.225, 125.277272727273, 198.285, 181.40125,
> 113.38875, 89.7925, 108.27875, 100.24375, 103.57, 189.015, 190.8925,
> 113.76875, 109.055, 96.4925, 99.04625, 127.47125, 300.86, 250.0725,
> 108.72125, 54.61, 39.76625, 30.11875, 31.46875, 40.73, 40.63,
> 27.48125, 24.9125, 24.105, 53.65625, 209.77125, 281.53125, 460.90875,
> 296.4225, 349.58375, 494.825, 404.4475, 510.68125, 681.65625,
> 637.78125, 838.40125, 740.0875, 601.81375, 246.75625, 127.1725,
> 92.36875, 78.11875, 73.61625, 62.77875, 59.87, 106.36, 115.31125,
> 64.025, 96.30125, 97.50625, 92.875, 92.49875, 89.295, 84.46375,
> 80.05625, 80.745, 114.13, 91.3225, 79.72125, 70.555, 30.8975,
> 14.28625, 13.02875, 93.59125, 246.7875, 54.37125, 29.45375, 16.2725,
> 15.175, 15.1475, 16.27875, 15.0575, 14.0425, 11.675, 12.9275,
> 11.26, 12.56, 183.555, 413.2025, 111.46375, 43.01375, 27.66125,
> 17.55875, 15.28, 14.88875, 14.60875, 14.44625, 281.95125, 85.16875,
> 24.6675, 14.88875, 15.02, 23.35125, 65.385, 83.95, 37.675, 22.31375,
> 15.1075, 15.02625, 96.39, 1856.72375, 612.275, 97.04875, 46.065,
> 28.62125, 23.22875, 234.78375, 58.21375, 33.29, 55.595, 66.57375,
> 81.39875, 42.84625, 26.945, 20.00375, 14.26875, 14.87625, 82.975,
> 85.12125, 35.7575, 26.875, 40.36375, 28.63875, 15.68, 13.70125,
> 29.42625, 51.81125, 26.6125, 15.56375, 13.725, 191.72625, 376.08625,
> 66.27875, 72.0275, 47.50375, 26.555, 

[R] plotting some rows in different color

2021-09-02 Thread Eliza Botto
Dear useRs,

For the following dataset,

dput(BFA3)

structure(c(17532, 17533, 17534, 17535, 17536, 17537, 17538,
17539, 17540, 17541, 17542, 17543, 17544, 17545, 17546, 17547,
17548, 17549, 17550, 17551, 17552, 17553, 17554, 17555, 17556,
17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564, 17565,
17566, 17567, 17568, 17569, 17570, 17571, 17572, 17573, 17574,
17575, 17576, 17577, 17578, 17579, 17580, 17581, 17582, 17583,
17584, 17585, 17586, 17587, 17588, 17589, 17590, 17591, 17592,
17593, 17594, 17595, 17596, 17597, 17598, 17599, 17600, 17601,
17602, 17603, 17604, 17605, 17606, 17607, 17608, 17609, 17610,
17611, 17612, 17613, 17614, 17615, 17616, 17617, 17618, 17619,
17620, 17621, 17622, 17623, 17624, 17625, 17626, 17627, 17628,
17629, 17630, 17631, 17632, 17633, 17634, 17635, 17636, 17637,
17638, 17639, 17640, 17641, 17642, 17643, 17644, 17645, 17646,
17647, 17648, 17649, 17650, 17651, 17652, 17653, 17654, 17655,
17656, 17657, 17658, 17659, 17660, 17661, 17662, 17663, 17664,
17665, 17666, 17667, 17668, 17669, 17670, 17671, 17672, 17673,
17674, 17675, 17676, 17677, 17678, 17679, 17680, 17681, 17682,
17683, 17684, 17685, 17686, 17687, 17688, 17689, 17690, 17691,
17692, 17693, 17694, 17695, 17696, 17697, 17698, 17699, 17700,
17701, 17702, 17703, 17704, 17705, 17706, 17707, 17708, 17709,
17710, 17711, 17712, 17713, 17714, 17715, 17716, 17717, 17718,
17719, 17720, 17721, 17722, 17723, 17724, 17725, 17726, 17727,
17728, 17729, 17730, 17731, 17732, 17733, 17734, 17735, 17736,
17737, 17738, 17739, 17740, 17741, 17742, 17743, 17744, 17745,
17746, 17747, 17748, 17749, 17750, 17751, 17752, 17753, 17754,
17755, 17756, 17757, 17758, 17759, 17760, 17761, 17762, 17763,
17764, 17765, 17766, 17767, 17768, 17769, 17770, 17771, 17772,
17773, 17774, 17775, 17776, 17777, 17778, 17779, 17780, 17781,
17782, 17783, 17784, 17785, 17786, 17787, 17788, 17789, 17790,
17791, 17792, 17793, 17794, 17795, 17796, 17797, 17798, 17799,
17800, 17801, 17802, 17803, 17804, 17805, 17806, 17807, 17808,
17809, 17810, 17811, 17812, 17813, 17814, 17815, 17816, 17817,
17818, 17819, 17820, 17821, 17822, 17823, 17824, 17825, 17826,
17827, 17828, 17829, 17830, 17831, 17832, 17833, 17834, 17835,
17836, 17837, 17838, 17839, 17840, 17841, 17842, 17843, 17844,
17845, 17846, 17847, 17848, 17849, 17850, 17851, 17852, 17853,
17854, 17855, 17856, 17857, 17858, 17859, 17860, 17861, 17862,
17863, 17864, 17865, 17866, 17867, 17868, 17869, 17870, 17871,
17872, 17873, 17874, 17875, 17876, 17877, 17878, 17879, 17880,
17881, 17882, 17883, 17884, 17885, 17886, 17887, 17888, 17889,
17890, 17891, 17892, 17893, 17894, 17895, 17896, 8.36875, 15.12875,
14.2825, 12.355, 13.1825, 88.58375, 47.52125, 26.53375, 22.85875,
12.4925, 9.86875, 13.125, 14.055, 14.0175, 14.70625, 15.46125,
11.8725, 17.505, 19.8575, 74.875445, 62.018935, 23.9481046910236,
9.68, 9.6175, 9.78, 9.8875, 9.5125, 9.885, 9.99, 14.16625, 11.99375,
10.7875, 9.85625, 12.17125, 11.76625, 11.0425, 9.28875, 9.425,
23.29375, 12.66875, 9.6, 10.06875, 10.055, 30.89625, 36.69375,
16.63875, 11.84625, 12.825, 14.94, 11.495, 10.795, 9.14625, 10.17875,
11.0525, 10.0175, 10.67625, 10.4325, 12.5175, 13.93, 14.1675,
17.3175, 18.3875, 12.06, 10.3125, 9.94125, 10.8575, 11.2425,
13.28875, 43.885, 76.225, 125.277272727273, 198.285, 181.40125,
113.38875, 89.7925, 108.27875, 100.24375, 103.57, 189.015, 190.8925,
113.76875, 109.055, 96.4925, 99.04625, 127.47125, 300.86, 250.0725,
108.72125, 54.61, 39.76625, 30.11875, 31.46875, 40.73, 40.63,
27.48125, 24.9125, 24.105, 53.65625, 209.77125, 281.53125, 460.90875,
296.4225, 349.58375, 494.825, 404.4475, 510.68125, 681.65625,
637.78125, 838.40125, 740.0875, 601.81375, 246.75625, 127.1725,
92.36875, 78.11875, 73.61625, 62.77875, 59.87, 106.36, 115.31125,
64.025, 96.30125, 97.50625, 92.875, 92.49875, 89.295, 84.46375,
80.05625, 80.745, 114.13, 91.3225, 79.72125, 70.555, 30.8975,
14.28625, 13.02875, 93.59125, 246.7875, 54.37125, 29.45375, 16.2725,
15.175, 15.1475, 16.27875, 15.0575, 14.0425, 11.675, 12.9275,
11.26, 12.56, 183.555, 413.2025, 111.46375, 43.01375, 27.66125,
17.55875, 15.28, 14.88875, 14.60875, 14.44625, 281.95125, 85.16875,
24.6675, 14.88875, 15.02, 23.35125, 65.385, 83.95, 37.675, 22.31375,
15.1075, 15.02625, 96.39, 1856.72375, 612.275, 97.04875, 46.065,
28.62125, 23.22875, 234.78375, 58.21375, 33.29, 55.595, 66.57375,
81.39875, 42.84625, 26.945, 20.00375, 14.26875, 14.87625, 82.975,
85.12125, 35.7575, 26.875, 40.36375, 28.63875, 15.68, 13.70125,
29.42625, 51.81125, 26.6125, 15.56375, 13.725, 191.72625, 376.08625,
66.27875, 72.0275, 47.50375, 26.555, 16.58625, 16.9275, 15.26875,
33.3125, 64.98625, 66.93875, 194.75875, 65.15, 29.03375, 15.545,
14.83625, 14.89, 15.08875, 14.71, 146.1525, 112.855, 34.10625,
16.46625, 15.0175, 15.06125, 13.94625, 12.1075, 14.265, 14.30125,
13.77125, 12.51, 181.65625, 82.07875, 59.46125, 209.9875, 42.5525,
22.19, 32.95, 19.89875, 37.7175, 29.62875, 41.705, 34.1225, 23.7275,
20.565, 17.61125, 16.53125, 15.75125, 119.1025, 79.8675, 27.6375,

Re: [R] Show only header of str() function

2021-09-02 Thread Luigi Marongiu
Thanks, that is perfect!

On Thu, Sep 2, 2021 at 7:02 PM Deepayan Sarkar
 wrote:
>
> On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann  
> wrote:
> >
> > On Thu, 02 Sep 2021, Luigi Marongiu writes:
> >
> > > Hello, is it possible to show only the header (that is: `'data.frame':
> > > x obs. of  y variables:` part) of the str function?
> > > Thank you
> >
> > Perhaps one more solution. You could limit the number
> > of list components to be printed, though it will leave
> > a "truncated" message.
> >
> > str(iris, list.len = 0)
> > ## 'data.frame':150 obs. of  5 variables:
> > ##   [list output truncated]
>
> Or use 'max.level', which is also generally useful for nested lists:
>
> str(iris, max.level=0)
> ## 'data.frame':150 obs. of  5 variables:
>
> Best,
> -Deepayan
>
> > Since 'str' is a generic function, you could also
> > define a new 'str' method. Perhaps something along
> > those lines:
> >
> > str.data.frame.oneline <- function (object, ...) {
> >     cat("'data.frame':\t", nrow(object), " obs. of  ",
> >         (p <- length(object)),
> >         " variable", if (p != 1) "s", "\n", sep = "")
> >     invisible(NULL)
> > }
> >
> > (which is essentially taken from 'str.data.frame').
> >
> > Then:
> >
> > class(iris) <- c("data.frame.oneline", class(iris))
> >
> > str(iris)
> > ## 'data.frame':  150 obs. of  5 variables
> >
> > str(list(a = 1,
> >  list(b = 2,
> >   c = iris)))
> > ## List of 2
> > ##  $ a: num 1
> > ##  $  :List of 2
> > ##   ..$ b: num 2
> > ##   ..$ c:'data.frame':   150 obs. of  5 variables
> >
> >
> >
> >
> > --
> > Enrico Schumann
> > Lucerne, Switzerland
> > http://enricoschumann.net
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



-- 
Best regards,
Luigi



Re: [R] Show only header of str() function

2021-09-02 Thread Rui Barradas

Hello,

I believe but do not have references that str was meant for interactive 
use, not for use in a script or package. If this is the case, then it 
should be rare to have to output to an object such as a character vector.


As for my solution, it is far from perfect, I try to avoid 
capture.output and, once again, limit its use to interactive R. It is 
overkill but I use it so few times that performance issues probably do 
not matter. It is sometimes a convenient way of solving an immediate 
problem and once done, move on.


As for the OP, Enrico's solution seems better, even with the 2nd printed 
line. Unless the 1st line is to be processed (?).


Hope this helps,

Rui Barradas

At 17:32 on 02/09/21, Avi Gross via R-help wrote:

Thanks for the interesting method Rui. So that is a way to do a redirect of 
output not to a sinkfile but to an in-memory variable as a textConnection.

Of course, one has to wonder why the makers of str thought it would be too 
inefficient to have an option that returns the output in a form that can be 
captured directly, not just to the screen.

I have in the past done odd things such as using sink() to capture the output 
of a program that wrote another program dynamically in a loop. The saved file 
could then be used with source(). So a similar technique can capture the output 
from str() or cat() or whatever normally only writes to the screen and then the 
file can be read in to get the first line or whatever you need. I have had to 
play games to get the right output from some statistical programs too as it was 
assumed the user would read it, and sometimes had to cherry pick what I needed 
directly from within the underlying object.

I suspect one reason R has so many packages, including the tidyverse I like to 
use, is that the original R was designed in another time and in many places is 
not very consistent. I wonder how hard it would be to change some programs to 
simply accept an additional argument like sink() has, where you can say 
split=TRUE and get a copy of what is being diverted to also come to the screen. 
I find cat() to be a very useful way to put more complicated output together 
than say print(), but since it does not itself return the text for capture into 
variables, I end up having to use other methods such as the glue() function or 
something like print(sprintf("Hello %s, I have %d left.\n", "Brian", 5))

But you work with what you have. Your solution works albeit having read the 
function definition, is quite a bit of overkill when I read the code as it does 
things not needed. But as noted, if efficiency matters and you are only looking 
at data.frame style objects, there are cheaper solutions.
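
As a rough illustration of getting printed text into a variable, the sketch below uses capture.output(), the same function Rui's header_str() relies on, together with sprintf(). The variable names hdr, msg1 and msg2 are made up for this example, and the max.level = 0 argument is the one suggested elsewhere in this thread.

```r
# capture.output() collects what would be printed into a character vector.
hdr <- capture.output(str(iris, max.level = 0))
hdr

# The same wrapper works around cat(); sprintf() builds the string directly
# without printing anything.
msg1 <- capture.output(cat("Hello ", "Brian", ", I have ", 5, " left.\n", sep = ""))
msg2 <- sprintf("Hello %s, I have %d left.", "Brian", 5L)
identical(msg1, msg2)
```

The wrapper costs a text connection each time, so it fits interactive use better than a tight loop.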


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, September 2, 2021 7:31 AM
To: Luigi Marongiu ; r-help 
Subject: Re: [R] Show only header of str() function

Hello,

Not perfect but works for data.frames:


header_str <- function(x){
  capture.output(str(x))[[1]]
}
header_str(iris)
header_str(AirPassengers)
header_str(1:10)


Hope this helps,

Rui Barradas

At 12:02 on 02/09/21, Luigi Marongiu wrote:

Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of  y variables:` part) of the str function?
Thank you





Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Duncan Murdoch

On 02/09/2021 3:20 p.m., Greg Minshall wrote:

Andrew,


x[] <- lapply(x, function(xx) {
 xx[is.nan(xx)] <- NA_real_
 xx
})

is different from

x <- lapply(x, function(xx) {
 xx[is.nan(xx)] <- NA_real_
 xx
})


indeed, the two are different -- but some ignorance of mine is exposed.
i wonder, can you explain why the two are different?


x <- lapply(...) says "set x to the list on the RHS", so x becomes a 
list, not a dataframe.


x[] <- lapply(...) says "set the values in x to the values in the list 
on the RHS", so x retains its class.
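
A small sketch may make the difference visible. The data frame and the helper name nan_to_na are made up for this illustration; the function body is the one from the code quoted above.

```r
# Made-up data frame with NaN in two numeric columns.
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

nan_to_na <- function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
}

y <- x
y[] <- lapply(y, nan_to_na)   # fills the existing data frame, class is kept
z <- lapply(x, nan_to_na)     # plain assignment, result is an ordinary list

class(y)
class(z)
```

Here y is still a data.frame while z is a list with the same column contents.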


Duncan Murdoch



Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Jeff Newmiller
Regardless of whether you use the lower-level split function, or the 
higher-level aggregate function, or the tidyverse group_by function, the key is 
learning how to create the column that is the same for all records 
corresponding to the time interval of interest.

If you convert the sampdate to POSIXct, the tz IS important, because most of us 
use local timezones that respect daylight savings time, and a naive conversion 
of standard time will run into trouble if R is assuming daylight savings time 
applies. The lubridate package gets around this by always assuming UTC and 
giving you a function to "fix" the timezone after the conversion. I prefer to 
always be specific about timezones, at least by using something like

Sys.setenv( TZ = "Etc/GMT+8" )

which does not respect daylight savings.

Regarding using character data for identifying the month, in order to have 
clean plots of the data I prefer to use the trunc function but it returns a 
POSIXlt so I convert it to POSIXct:

discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate, units = 
"months" ) )

Then any of various ways can be used to aggregate the records by that column.
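
One of those ways, sketched with base R's aggregate() on synthetic data (the data frame below is a stand-in, not the real discharge file, and the result names `monthly_mean` and `monthly_sd` are arbitrary):

```r
# Synthetic stand-in: one reading per day for 90 days, fixed-offset zone
discharge <- data.frame(
  sampdate = as.POSIXct("2016-03-01 12:00", tz = "Etc/GMT+8") +
    (0:89) * 86400,
  cfs = rep(c(100, 200, 300), each = 30)
)

# Grouping column: the first instant of the month each record falls in
discharge$sampmonthbegin <- as.POSIXct(
  trunc(discharge$sampdate, units = "months")
)

# One row per month; the grouping column stays a date-time, which plots cleanly
monthly_mean <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = mean)
monthly_sd   <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = sd)
```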

On September 2, 2021 12:10:15 PM PDT, Andrew Simmons  wrote:
>You could use 'split' to create a list of data frames, and then apply a
>function to each to get the means and sds.
>
>
>cols <- "cfs"  # add more as necessary
>S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
>means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
>sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm =
>TRUE)))
>
>On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard 
>wrote:
>
>> On Thu, 2 Sep 2021, Rich Shepard wrote:
>>
>> > If I correctly understand the output of as.POSIXlt each date and time
>> > element is separate, so input such as 2016-03-03 12:00 would now be 2016
>> 03
>> > 03 12 00 (I've not read how the elements are separated). (The TZ is not
>> > important because all data are either PST or PDT.)
>>
>> Using this script:
>> discharge <- read.csv('../data/water/discharge.dat', header = TRUE, sep =
>> ',', stringsAsFactors = FALSE)
>> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
>>   format = '%Y-%m-%d %H:%M',
>>   optional = 'logical')
>> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
>>
>> I get this result:
>> > head(discharge)
>>   sampdatecfs
>> 1 2016-03-03 12:00:00 149000
>> 2 2016-03-03 12:10:00 15
>> 3 2016-03-03 12:20:00 151000
>> 4 2016-03-03 12:30:00 156000
>> 5 2016-03-03 12:40:00 154000
>> 6 2016-03-03 12:50:00 15
>>
>> I'm completely open to suggestions on using this output to calculate
>> monthly
>> means and sds.
>>
>> If dplyr:summarize() will do so please show me how to modify this command:
>> disc_monthly <- ( discharge
>>  %>% group_by(sampdate)
>>  %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
>> because it produces daily means, not monthly means.
>>
>> TIA,
>>
>> Rich
>>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard

On Thu, 2 Sep 2021, Andrew Simmons wrote:


You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.

cols <- "cfs"  # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm =
TRUE)))


Andrew,

Thank you for the valuable lesson. This is new to me and I know I'll have
use for it in the future, too.

Much appreciated!

Stay well,

Rich



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Greg Minshall
Andrew,

> x[] <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
> 
> is different from
> 
> x <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })

indeed, the two are different -- but some ignorance of mine is exposed.
i wonder, can you explain why the two are different?

is it because of (or, "is the clue...") this in the "Value:" section of
: ?"[<-.data.frame"

 For '[<-', '[[<-' and '$<-', a data frame.

?

cheers, Greg



Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Andrew Simmons
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.


cols <- "cfs"  # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm =
TRUE)))
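
For anyone who wants to try this without the real file, here is a self-contained run on a tiny invented data set (the four readings and the final `monthly` summary table are assumptions added for illustration):

```r
# Invented readings spanning two months
discharge <- data.frame(
  sampdate = as.POSIXct(c("2016-03-03 12:00", "2016-03-03 12:10",
                          "2016-04-01 00:00", "2016-04-01 00:10"),
                        tz = "UTC", format = "%Y-%m-%d %H:%M"),
  cfs = c(100, 200, 300, 500)
)

cols <- "cfs"
# One data frame per "YYYY-MM" label
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = TRUE)))

# Optional: gather both results into one table keyed by month
monthly <- data.frame(month = rownames(means),
                      mean  = means[, "cfs"],
                      sd    = sds[, "cfs"],
                      row.names = NULL)
```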

On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard 
wrote:

> On Thu, 2 Sep 2021, Rich Shepard wrote:
>
> > If I correctly understand the output of as.POSIXlt each date and time
> > element is separate, so input such as 2016-03-03 12:00 would now be 2016
> 03
> > 03 12 00 (I've not read how the elements are separated). (The TZ is not
> > important because all data are either PST or PDT.)
>
> Using this script:
> discharge <- read.csv('../data/water/discharge.dat', header = TRUE, sep =
> ',', stringsAsFactors = FALSE)
> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
>   format = '%Y-%m-%d %H:%M',
>   optional = 'logical')
> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
>
> I get this result:
> > head(discharge)
>   sampdatecfs
> 1 2016-03-03 12:00:00 149000
> 2 2016-03-03 12:10:00 15
> 3 2016-03-03 12:20:00 151000
> 4 2016-03-03 12:30:00 156000
> 5 2016-03-03 12:40:00 154000
> 6 2016-03-03 12:50:00 15
>
> I'm completely open to suggestions on using this output to calculate
> monthly
> means and sds.
>
> If dplyr:summarize() will do so please show me how to modify this command:
> disc_monthly <- ( discharge
>  %>% group_by(sampdate)
>  %>% summarize(exp_value = mean(cfs, na.rm = TRUE))
> because it produces daily means, not monthly means.
>
> TIA,
>
> Rich
>


Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard

On Thu, 2 Sep 2021, Rich Shepard wrote:


If I correctly understand the output of as.POSIXlt each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either PST or PDT.)


Using this script:
discharge <- read.csv('../data/water/discharge.dat', header = TRUE, sep = ',', 
stringsAsFactors = FALSE)
discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
 format = '%Y-%m-%d %H:%M',
 optional = 'logical')
discharge$cfs <- as.numeric(discharge$cfs, length = 6)

I get this result:

head(discharge)

             sampdate    cfs
1 2016-03-03 12:00:00 149000
2 2016-03-03 12:10:00 15
3 2016-03-03 12:20:00 151000
4 2016-03-03 12:30:00 156000
5 2016-03-03 12:40:00 154000
6 2016-03-03 12:50:00 15

I'm completely open to suggestions on using this output to calculate monthly
means and sds.

If dplyr::summarize() will do so please show me how to modify this command:
disc_monthly <- ( discharge
%>% group_by(sampdate)
%>% summarize(exp_value = mean(cfs, na.rm = TRUE)) )
because it produces daily means, not monthly means.

TIA,

Rich



Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard

On Mon, 30 Aug 2021, Richard O'Keefe wrote:


x <- rnorm(samples.per.day * 365)
length(x)

[1] 105120

Reshape the fake data into a matrix where each row represents one
24-hour period.


m <- matrix(x, ncol=samples.per.day, byrow=TRUE)


Richard,

Now I understand the need to keep the date and time as a single datetime
column; separately dplyr's summarize() provides daily means (too many data
points to plot over 3-5 years). I reformatted the data to provide a
sampledatetime column and a values column.

If I correctly understand the output of as.POSIXlt each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either PST or PDT.)


Now we can summarise the rows any way we want.
The basic tool here is ?apply.
?rowMeans is said to be faster than using apply to calculate means,
so we'll use that.  There is no *rowSds so we have to use apply
for the standard deviation.  I use ?head because I don't want to
post tens of thousands of meaningless numbers.
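
A compact sketch of that summary step, assuming 288 samples per day (5-minute data), which is consistent with the length of 105120 shown above; the seed and the names `daily.mean` and `daily.sd` are arbitrary:

```r
set.seed(1)                       # arbitrary, only for reproducibility
samples.per.day <- 288            # assumed: 24 * 60 / 5
x <- rnorm(samples.per.day * 365)

# Each row holds the readings of one 24-hour period
m <- matrix(x, ncol = samples.per.day, byrow = TRUE)

daily.mean <- rowMeans(m)         # one mean per row (day)
daily.sd   <- apply(m, 1, sd)     # MARGIN = 1 applies sd() to each row

head(cbind(daily.mean, daily.sd))
```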


If I create a matrix using the above syntax the resulting rows contain all
recorded values for a specific day. What would be the syntax to collect all
values for each month?

This would result in 12 rows per year; the periods of record for the five
variables available from that gauge station vary in length.

Regards,

Rich



Re: [R] read.csv() error

2021-09-02 Thread Rich Shepard

On Thu, 2 Sep 2021, Enrico Schumann wrote:


There is no column 'ht'.


Enrico,

New eyeballs caught my change in variable name that I kept missing.

Thanks very much,

Rich



Re: [R] Show only header of str() function

2021-09-02 Thread Deepayan Sarkar
On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann  wrote:
>
> On Thu, 02 Sep 2021, Luigi Marongiu writes:
>
> > Hello, is it possible to show only the header (that is: `'data.frame':
> > x obs. of  y variables:` part) of the str function?
> > Thank you
>
> Perhaps one more solution. You could limit the number
> of list components to be printed, though it will leave
> a "truncated" message.
>
> str(iris, list.len = 0)
> ## 'data.frame':150 obs. of  5 variables:
> ##   [list output truncated]

Or use 'max.level', which is also generally useful for nested lists:

str(iris, max.level=0)
## 'data.frame':  150 obs. of  5 variables:

Best,
-Deepayan

> Since 'str' is a generic function, you could also
> define a new 'str' method. Perhaps something along
> these lines:
>
> str.data.frame.oneline <- function (object, ...) {
> cat("'data.frame':\t", nrow(object), " obs. of  ",
> (p <- length(object)),
> " variable", if (p != 1) "s", "\n", sep = "")
> invisible(NULL)
> }
>
> (which is essentially taken from 'str.data.frame').
>
> Then:
>
> class(iris) <- c("data.frame.oneline", class(iris))
>
> str(iris)
> ## 'data.frame':  150 obs. of  5 variables
>
> str(list(a = 1,
>  list(b = 2,
>   c = iris)))
> ## List of 2
> ##  $ a: num 1
> ##  $  :List of 2
> ##   ..$ b: num 2
> ##   ..$ c:'data.frame':   150 obs. of  5 variables
>
>
>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>


Re: [R] read.csv() error

2021-09-02 Thread Enrico Schumann
On Thu, 02 Sep 2021, Rich Shepard writes:

> The first three commands in the script are:
> stage <- read.csv('../data/water/gauge-ht.dat', header
> = TRUE, sep = ',', stringsAsFactors = FALSE)
> stage$sampdate <- as.Date(stage$sampdate)
> stage$ht <- as.numeric(stage$ht, length = 6)
>
> Running the script produces this error:
>> source('stage.R')
> Error in `$<-.data.frame`(`*tmp*`, ht, value = numeric(0)) :
>   replacement has 0 rows, data has 486336
>
> Sample lines from the data file:
> sampdate,samptime,elev
> 2007-10-01,01:00,2.80
> 2007-10-01,01:15,2.71
> 2007-10-01,01:30,2.63
> 2007-10-01,01:45,2.53
> 2007-10-01,02:00,2.45
> 2007-10-01,02:15,2.36
> 2007-10-01,02:30,2.27
> 2007-10-01,02:45,2.17
> 2007-10-01,03:00,2.07
>
> Maximum value for elev is about 11.00, 5 digits.
>
> I don't understand this error because the equivalent commands for another
> data source file completes without error.
>
> What is that error message telling me?
>
> TIA,
>
> Rich
>

(Sorry, sent too early.)

There is no column 'ht'.

df <- data.frame(a = 1:5)
df$b <- as.numeric(df$b)
## Error in `$<-.data.frame`(`*tmp*`, b, value = numeric(0)) : 
##   replacement has 0 rows, data has 5
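
Applied to the posted sample lines, a quick check along these lines shows the mismatch (reading from `text =` instead of the real file is only for illustration):

```r
# A few lines mimicking the data file, read from text instead of disk
stage <- read.csv(text = "sampdate,samptime,elev
2007-10-01,01:00,2.80
2007-10-01,01:15,2.71", stringsAsFactors = FALSE)

names(stage)              # the header row defines the column names
"ht" %in% names(stage)    # FALSE: there is no such column
is.null(stage$ht)         # TRUE: a missing column extracts as NULL

# as.numeric(NULL) is numeric(0), hence "replacement has 0 rows";
# converting the column that actually exists works
stage$elev <- as.numeric(stage$elev)
```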

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



Re: [R] read.csv() error

2021-09-02 Thread Enrico Schumann
On Thu, 02 Sep 2021, Rich Shepard writes:

> The first three commands in the script are:
> stage <- read.csv('../data/water/gauge-ht.dat', header
> = TRUE, sep = ',', stringsAsFactors = FALSE)
> stage$sampdate <- as.Date(stage$sampdate)
> stage$ht <- as.numeric(stage$ht, length = 6)
>
> Running the script produces this error:
>> source('stage.R')
> Error in `$<-.data.frame`(`*tmp*`, ht, value = numeric(0)) :
>   replacement has 0 rows, data has 486336
>
> Sample lines from the data file:
> sampdate,samptime,elev
> 2007-10-01,01:00,2.80
> 2007-10-01,01:15,2.71
> 2007-10-01,01:30,2.63
> 2007-10-01,01:45,2.53
> 2007-10-01,02:00,2.45
> 2007-10-01,02:15,2.36
> 2007-10-01,02:30,2.27
> 2007-10-01,02:45,2.17
> 2007-10-01,03:00,2.07
>
> Maximum value for elev is about 11.00, 5 digits.
>
> I don't understand this error because the equivalent commands for another
> data source file completes without error.
>
> What is that error message telling me?
>
> TIA,
>
> Rich
>

There is no column 'ht'.

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



Re: [R] Show only header of str() function

2021-09-02 Thread Avi Gross via R-help
Thanks for the interesting method Rui. So that is a way to do a redirect of 
output not to a sinkfile but to an in-memory variable as a textConnection.

Of course, one has to wonder why the makers of str thought it would be too 
inefficient to have an option that returns the output in a form that can be 
captured directly, not just to the screen. 

I have in the past done odd things such as using sink() to capture the output 
of a program that wrote another program dynamically in a loop. The saved file 
could then be used with source(). So a similar technique can capture the output 
from str() or cat() or whatever normally only writes to the screen and then the 
file can be read in to get the first line or whatever you need. I have had to 
play games to get the right output from some statistical programs too as it was 
assumed the user would read it, and sometimes had to cherry pick what I needed 
directly from within the underlying object.

I suspect one reason R has so many packages including the tidyverse I like to 
use, is because the original R was designed in another time and in many places 
is not very consistent. I wonder how hard it would be to change some programs 
to simply accept an additional argument like sink() has where you can say 
split=TRUE and get a copy of what is being diverted to also come to the screen. 
I find cat() to be a very useful way to put more complicated output together 
than say print() but since it does not allow capture of the text into 
variables, I end up having to use other methods such as the glue() function or 
something like print(sprintf("Hello %s, I have %d left.\n", "Brian", 5))

But you work with what you have. Your solution works albeit having read the 
function definition, is quite a bit of overkill when I read the code as it does 
things not needed. But as noted, if efficiency matters and you are only looking 
at data.frame style objects, there are cheaper solutions.
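
For reference, base R does offer ways to get such text into a variable. A brief sketch, assuming the built-in iris data set; the variable names are arbitrary:

```r
# capture.output() collects what str() would print, one string per line
first_line <- capture.output(str(iris))[1]

# sprintf() builds the string directly, with no printing involved
msg <- sprintf("Hello %s, I have %d left.", "Brian", 5L)

# cat() output can be captured the same way when that is more convenient
captured <- capture.output(cat("Hello", "Brian", "\n"))
```

Whether this is preferable to glue() or similar helpers is a matter of taste.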


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, September 2, 2021 7:31 AM
To: Luigi Marongiu ; r-help 
Subject: Re: [R] Show only header of str() function

Hello,

Not perfect but works for data.frames:


header_str <- function(x){
   capture.output(str(x))[[1]]
}
header_str(iris)
header_str(AirPassengers)
header_str(1:10)


Hope this helps,

Rui Barradas

Às 12:02 de 02/09/21, Luigi Marongiu escreveu:
> Hello, is it possible to show only the header (that is: `'data.frame':
> x obs. of  y variables:` part) of the str function?
> Thank you
>



[R] read.csv() error

2021-09-02 Thread Rich Shepard

The first three commands in the script are:
stage <- read.csv('../data/water/gauge-ht.dat', header = TRUE, sep = ',', 
stringsAsFactors = FALSE)

stage$sampdate <- as.Date(stage$sampdate)
stage$ht <- as.numeric(stage$ht, length = 6)

Running the script produces this error:

source('stage.R')

Error in `$<-.data.frame`(`*tmp*`, ht, value = numeric(0)) :
  replacement has 0 rows, data has 486336

Sample lines from the data file:
sampdate,samptime,elev
2007-10-01,01:00,2.80
2007-10-01,01:15,2.71
2007-10-01,01:30,2.63
2007-10-01,01:45,2.53
2007-10-01,02:00,2.45
2007-10-01,02:15,2.36
2007-10-01,02:30,2.27
2007-10-01,02:45,2.17
2007-10-01,03:00,2.07

Maximum value for elev is about 11.00, 5 digits.

I don't understand this error because the equivalent commands for another
data source file completes without error.

What is that error message telling me?

TIA,

Rich



Re: [R] Show only header of str() function

2021-09-02 Thread Enrico Schumann
On Thu, 02 Sep 2021, Luigi Marongiu writes:

> Hello, is it possible to show only the header (that is: `'data.frame':
> x obs. of  y variables:` part) of the str function?
> Thank you

Perhaps one more solution. You could limit the number
of list components to be printed, though it will leave
a "truncated" message.

str(iris, list.len = 0)
## 'data.frame':  150 obs. of  5 variables:
##   [list output truncated]

Since 'str' is a generic function, you could also
define a new 'str' method. Perhaps something along
these lines:

str.data.frame.oneline <- function (object, ...) {
cat("'data.frame':\t", nrow(object), " obs. of  ",
(p <- length(object)), 
" variable", if (p != 1) "s", "\n", sep = "")
invisible(NULL)
}

(which is essentially taken from 'str.data.frame').

Then:

class(iris) <- c("data.frame.oneline", class(iris))

str(iris)
## 'data.frame':  150 obs. of  5 variables

str(list(a = 1,
 list(b = 2,
  c = iris)))
## List of 2
##  $ a: num 1
##  $  :List of 2
##   ..$ b: num 2
##   ..$ c:'data.frame':   150 obs. of  5 variables




-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



Re: [R] Show only header of str() function

2021-09-02 Thread Avi Gross via R-help
Luigi,

If you are sure you are looking at something like a data.frame, and all you
want to know is how many rows and how many columns are in it, then str() is
perhaps too detailed a tool.

The functions nrow() and ncol() tell you what you want and you can get both
together with dim(). You can, of course, print out whatever message you want
using the numbers supplied by throwing together some function like this:

sstr <- function(x) {
  cat(nrow(x), "obs. of", ncol(x), "variables\n")  # cat() adds the spaces
}

Calling that instead of str may meet your needs.  Of course, unlike str, it
will not work on arbitrary data structures.
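
If the text is wanted as a value rather than printed, a variant built on dim() and sprintf() might look like this (the name `dim_header` is hypothetical):

```r
# Hypothetical helper: returns the header text instead of printing it
dim_header <- function(x) {
  stopifnot(is.data.frame(x))      # only meaningful for data frames
  d <- dim(x)                      # c(rows, columns)
  sprintf("%d obs. of %d variables", d[1], d[2])
}

dim_header(iris)   # "150 obs. of 5 variables"
```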

Note the output of str() goes straight to the screen, similar to what cat
does. Capturing the output to say chop out just the first line is not
therefore a simple option. 


-Original Message-
From: R-help  On Behalf Of Luigi Marongiu
Sent: Thursday, September 2, 2021 7:02 AM
To: r-help 
Subject: [R] Show only header of str() function

Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of  y variables:` part) of the str function?
Thank you

--
Best regards,
Luigi



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Luigi Marongiu
Thank you!

On Thu, Sep 2, 2021 at 4:17 PM Andrew Simmons  wrote:
>
> It seems like you might've missed one more thing, you need the brackets next 
> to 'x' to get it to work.
>
>
> x[] <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
>
> is different from
>
> x <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
>
> Also, if all of your data is numeric, it might be better to convert to a 
> matrix before doing your calculations. For example:
>
> x <- as.matrix(x)
> x[is.nan(x)] <- NA_real_
>
> I'd also suggest this same solution for the other question you posted,
>
> x[x == 0] <- NA
>
> On Thu, Sep 2, 2021 at 10:01 AM Luigi Marongiu  
> wrote:
>>
>> Sorry,
>> still I don't get it:
>> ```
>> > dim(df)
>> [1] 302 626
>> > # clean
>> > df <- lapply(x, function(xx) {
>> +   xx[is.nan(xx)] <- NA
>> +   xx
>> + })
>> > dim(df)
>> NULL
>> ```
>>
>> On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons  wrote:
>> >
>> > You removed the second line 'xx' from the function, put it back and it 
>> > should work
>> >
>> > On Thu, Sep 2, 2021, 09:45 Luigi Marongiu  wrote:
>> >>
>> >> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
>> >> still get NaN when using the summary function, for instance one of the
>> >> columns give:
>> >> ```
>> >> Min.   : NA
>> >> 1st Qu.: NA
>> >> Median : NA
>> >> Mean   :NaN
>> >> 3rd Qu.: NA
>> >> Max.   : NA
>> >> NA's   :110
>> >> ```
>> >> I tried to implement the second solution but:
>> >> ```
>> >> df <- lapply(x, function(xx) {
>> >>   xx[is.nan(xx)] <- NA
>> >> })
>> >> > str(df)
>> >> List of 1
>> >>  $ sd_ef_rash_loc___palm: logi NA
>> >> ```
>> >> What am I getting wrong?
>> >> Thanks
>> >>
>> >> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons  wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> >
>> >> > I would use something like:
>> >> >
>> >> >
>> >> > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |> 
>> >> > as.data.frame()
>> >> > x[] <- lapply(x, function(xx) {
>> >> > xx[is.nan(xx)] <- NA_real_
>> >> > xx
>> >> > })
>> >> >
>> >> >
>> >> > This prevents attributes from being changed in 'x', but accomplishes 
>> >> > the same thing as you have above, I hope this helps!
>> >> >
>> >> > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu 
>> >> >  wrote:
>> >> >>
>> >> >> Hello,
>> >> >> I have some NaN values in some elements of a dataframe that I would
>> >> >> like to convert to NA.
>> >> >> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
>> >> >> Is there an alternative for the global modification at once of all
>> >> >> instances?
>> >> >> I have seen from
>> >> >> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>> >> >> that one could use:
>> >> >> ```
>> >> >>
>> >> >> is.nan.data.frame <- function(x)
>> >> >> do.call(cbind, lapply(x, is.nan))
>> >> >>
>> >> >> data123[is.nan(data123)] <- 0
>> >> >> ```
>> >> >> replacing 0 with NA, but I got
>> >> >> ```
>> >> >> str(df)
>> >> >> > logi NA
>> >> >> ```
>> >> >> when modifying my dataframe df.
>> >> >> What would be the correct syntax?
>> >> >> Thank you
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Best regards,
>> >> >> Luigi
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Luigi
>>
>>
>>
>> --
>> Best regards,
>> Luigi



-- 
Best regards,
Luigi



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Andrew Simmons
It seems like you might've missed one more thing, you need the brackets
next to 'x' to get it to work.


x[] <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})

is different from

x <- lapply(x, function(xx) {
xx[is.nan(xx)] <- NA_real_
xx
})

Also, if all of your data is numeric, it might be better to convert to a
matrix before doing your calculations. For example:

x <- as.matrix(x)
x[is.nan(x)] <- NA_real_

I'd also suggest this same solution for the other question you posted,

x[x == 0] <- NA
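
A short run of the matrix route on an invented all-numeric data frame (the values and the name `m` are for illustration only):

```r
# Invented all-numeric data frame containing NaN and 0 values
x <- data.frame(a = c(1, NaN, 0), b = c(0, 2, NaN))

m <- as.matrix(x)          # sensible because every column is numeric
m[is.nan(m)] <- NA_real_   # is.nan() on a matrix gives a logical matrix
m[m == 0] <- NA            # entries where the comparison is NA are left as is

m
```

The result stays a 3 x 2 numeric matrix; as.data.frame(m) converts it back if a data frame is needed afterwards.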

On Thu, Sep 2, 2021 at 10:01 AM Luigi Marongiu 
wrote:

> Sorry,
> still I don't get it:
> ```
> > dim(df)
> [1] 302 626
> > # clean
> > df <- lapply(x, function(xx) {
> +   xx[is.nan(xx)] <- NA
> +   xx
> + })
> > dim(df)
> NULL
> ```
>
> On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons  wrote:
> >
> > You removed the second line 'xx' from the function, put it back and it
> should work
> >
> > On Thu, Sep 2, 2021, 09:45 Luigi Marongiu 
> wrote:
> >>
> >> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
> >> still get NaN when using the summary function, for instance one of the
> >> columns give:
> >> ```
> >> Min.   : NA
> >> 1st Qu.: NA
> >> Median : NA
> >> Mean   :NaN
> >> 3rd Qu.: NA
> >> Max.   : NA
> >> NA's   :110
> >> ```
> >> I tried to implement the second solution but:
> >> ```
> >> df <- lapply(x, function(xx) {
> >>   xx[is.nan(xx)] <- NA
> >> })
> >> > str(df)
> >> List of 1
> >>  $ sd_ef_rash_loc___palm: logi NA
> >> ```
> >> What am I getting wrong?
> >> Thanks
> >>
> >> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons 
> wrote:
> >> >
> >> > Hello,
> >> >
> >> >
> >> > I would use something like:
> >> >
> >> >
> >> > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
> as.data.frame()
> >> > x[] <- lapply(x, function(xx) {
> >> > xx[is.nan(xx)] <- NA_real_
> >> > xx
> >> > })
> >> >
> >> >
> >> > This prevents attributes from being changed in 'x', but accomplishes
> the same thing as you have above, I hope this helps!
> >> >
> >> > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu <
> marongiu.lu...@gmail.com> wrote:
> >> >>
> >> >> Hello,
> >> >> I have some NaN values in some elements of a dataframe that I would
> >> >> like to convert to NA.
> >> >> The command `df1$col[is.nan(df1$col)]<-NA` allows to work
> column-wise.
> >> >> Is there an alternative for the global modification at once of all
> >> >> instances?
> >> >> I have seen from
> >> >>
> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> >> >> that one could use:
> >> >> ```
> >> >>
> >> >> is.nan.data.frame <- function(x)
> >> >> do.call(cbind, lapply(x, is.nan))
> >> >>
> >> >> data123[is.nan(data123)] <- 0
> >> >> ```
> >> >> replacing 0 with NA, but I got
> >> >> ```
> >> >> str(df)
> >> >> > logi NA
> >> >> ```
> >> >> when modifying my dataframe df.
> >> >> What would be the correct syntax?
> >> >> Thank you
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Best regards,
> >> >> Luigi
> >> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Luigi
>
>
>
> --
> Best regards,
> Luigi
>



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Luigi Marongiu
Sorry,
still I don't get it:
```
> dim(df)
[1] 302 626
> # clean
> df <- lapply(x, function(xx) {
+   xx[is.nan(xx)] <- NA
+   xx
+ })
> dim(df)
NULL
```

On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons  wrote:
>
> You removed the second line 'xx' from the function, put it back and it should 
> work
>
> On Thu, Sep 2, 2021, 09:45 Luigi Marongiu  wrote:
>>
>> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
>> still get NaN when using the summary function, for instance one of the
>> columns give:
>> ```
>> Min.   : NA
>> 1st Qu.: NA
>> Median : NA
>> Mean   :NaN
>> 3rd Qu.: NA
>> Max.   : NA
>> NA's   :110
>> ```
>> I tried to implement the second solution but:
>> ```
>> df <- lapply(x, function(xx) {
>>   xx[is.nan(xx)] <- NA
>> })
>> > str(df)
>> List of 1
>>  $ sd_ef_rash_loc___palm: logi NA
>> ```
>> What am I getting wrong?
>> Thanks
>>
>> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons  wrote:
>> >
>> > Hello,
>> >
>> >
>> > I would use something like:
>> >
>> >
>> > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |> 
>> > as.data.frame()
>> > x[] <- lapply(x, function(xx) {
>> > xx[is.nan(xx)] <- NA_real_
>> > xx
>> > })
>> >
>> >
>> > This prevents attributes from being changed in 'x', but accomplishes the 
>> > same thing as you have above, I hope this helps!
>> >
>> > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu  
>> > wrote:
>> >>
>> >> Hello,
>> >> I have some NaN values in some elements of a dataframe that I would
>> >> like to convert to NA.
>> >> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
>> >> Is there an alternative for the global modification at once of all
>> >> instances?
>> >> I have seen from
>> >> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>> >> that once could use:
>> >> ```
>> >>
>> >> is.nan.data.frame <- function(x)
>> >> do.call(cbind, lapply(x, is.nan))
>> >>
>> >> data123[is.nan(data123)] <- 0
>> >> ```
>> >> replacing o with NA, but I got
>> >> ```
>> >> str(df)
>> >> > logi NA
>> >> ```
>> >> when modifying my dataframe df.
>> >> What would be the correct syntax?
>> >> Thank you
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Luigi
>> >>
>> >> __
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide 
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Best regards,
>> Luigi



-- 
Best regards,
Luigi



Re: [R] Loop over columns of dataframe and change values condtionally

2021-09-02 Thread Rui Barradas

Hello,

In the particular case you have, to change to NA based on condition, use 
`is.na<-`.


Here is some test data, 3 times the same df.


set.seed(2021)
df3 <- df2 <- df1 <- data.frame(
  x = c(0, 0, 1, 2, 3),
  y = c(1, 2, 3, 0, 0),
  z = rbinom(5, 1, prob = c(0.25, 0.75)),
  a = letters[1:5]
)


# change all columns
is.na(df1) <- df1 == 0
df1

# only one column
is.na(df2[, 2]) <- df2[, 2] == 0
df2

# change several columns given by an index
is.na(df3[c(1, 3)]) <- df3[c(1, 3)] == 0
df3
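
For the loop form asked about in the original question, a minimal sketch could look like the following. The data frame `df` is made up for illustration, and `which()` is used here so that NAs already present in a column do not end up in the index.

```r
# Hypothetical example data, not taken from the thread
df <- data.frame(
  x = c(0, 0, 1, 2, 3),
  y = c(1, NA, 3, 0, 0),
  a = letters[1:5],
  stringsAsFactors = FALSE
)

# Visit each column in turn and set its zeros to NA
for (i in seq_along(df)) {
  zero <- which(df[[i]] == 0)   # positions of zeros; NA comparisons are dropped
  df[[i]][zero] <- NA
}
df
```

The `is.na<-` form above does the same without an explicit loop; the loop is only useful when each column needs its own condition.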


Hope this helps,

Rui Barradas


Às 14:35 de 02/09/21, Luigi Marongiu escreveu:

Hello,
it is possible to select the columns of a dataframe in sequence with:
```
for(i in 1:ncol(df)) {
   df[ , i]
}
# or
for(i in 1:ncol(df)) {
   df[ i]
}
```
And change all values with, for instance:
```
for(i in 1:ncol(df)) {
   df[ , i] <- df[ , i] + 10
}
```
Is it possible to apply a condition? What would be the syntax?
For instance, to change all 0s in a column to NA would `df[i][df[i ==
0] = NA` be right?
Thank you






Re: [R] Loop over columns of dataframe and change values condtionally

2021-09-02 Thread PIKAL Petr
Hi

you could operate with whole data frame (sometimes)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

change all

> head(iris[,1:4]+10)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1         15.1        13.5         11.4        10.2
2         14.9        13.0         11.4        10.2
3         14.7        13.2         11.3        10.2
4         14.6        13.1         11.5        10.2
5         15.0        13.6         11.4        10.2
6         15.4        13.9         11.7        10.4

change only some
> iris[,1:4][iris[,1:4]<2] <- iris[,1:4][iris[,1:4]<2]+10
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5         11.4        10.2  setosa
2          4.9         3.0         11.4        10.2  setosa
3          4.7         3.2         11.3        10.2  setosa
4          4.6         3.1         11.5        10.2  setosa
5          5.0         3.6         11.4        10.2  setosa
6          5.4         3.9         11.7        10.4  setosa


Cheers
Petr


> -----Original Message-----
> From: R-help  On Behalf Of Luigi Marongiu
> Sent: Thursday, September 2, 2021 3:35 PM
> To: r-help 
> Subject: [R] Loop over columns of dataframe and change values condtionally
> 
> Hello,
> it is possible to select the columns of a dataframe in sequence with:
> ```
> for(i in 1:ncol(df)) {
>   df[ , i]
> }
> # or
> for(i in 1:ncol(df)) {
>   df[ i]
> }
> ```
> And change all values with, for instance:
> ```
> for(i in 1:ncol(df)) {
>   df[ , i] <- df[ , i] + 10
> }
> ```
> Is it possible to apply a condition? What would be the syntax?
> For instance, to change all 0s in a column to NA would `df[i][df[i == 0] = NA`
> be right?
> Thank you
> 
> 
> --
> Best regards,
> Luigi
> 


Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Andrew Simmons
You removed the second line 'xx' from the function, put it back and it
should work
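
A small sketch of why the last line matters, using a made-up vector `v` and two illustrative function names that are not from the thread: an R function returns the value of its last expression, and the value of an assignment such as `xx[...] <- NA` is the right-hand side, here `NA`.

```r
# Without a final 'xx', the function returns the value of the assignment (NA)
drop_last_line <- function(xx) {
  xx[is.nan(xx)] <- NA
}

# With 'xx' as the last expression, the modified vector is returned
keep_last_line <- function(xx) {
  xx[is.nan(xx)] <- NA
  xx
}

v <- c(1, NaN, 3)        # made-up test vector
str(drop_last_line(v))   # a single logical NA, as in the str() output quoted below
str(keep_last_line(v))   # the numeric vector with NaN replaced by NA
```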

On Thu, Sep 2, 2021, 09:45 Luigi Marongiu  wrote:

> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
> still get NaN when using the summary function, for instance one of the
> columns give:
> ```
> Min.   : NA
> 1st Qu.: NA
> Median : NA
> Mean   :NaN
> 3rd Qu.: NA
> Max.   : NA
> NA's   :110
> ```
> I tried to implement the second solution but:
> ```
> df <- lapply(x, function(xx) {
>   xx[is.nan(xx)] <- NA
> })
> > str(df)
> List of 1
>  $ sd_ef_rash_loc___palm: logi NA
> ```
> What am I getting wrong?
> Thanks
>
> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons  wrote:
> >
> > Hello,
> >
> >
> > I would use something like:
> >
> >
> > x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
> >     as.data.frame()
> > x[] <- lapply(x, function(xx) {
> >     xx[is.nan(xx)] <- NA_real_
> >     xx
> > })
> >
> >
> > This prevents attributes from being changed in 'x', but accomplishes the
> > same thing as you have above, I hope this helps!
> >
> > On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu
> > wrote:
> >>
> >> Hello,
> >> I have some NaN values in some elements of a dataframe that I would
> >> like to convert to NA.
> >> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
> >> Is there an alternative for the global modification at once of all
> >> instances?
> >> I have seen from
> >>
> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> >> that once could use:
> >> ```
> >>
> >> is.nan.data.frame <- function(x)
> >> do.call(cbind, lapply(x, is.nan))
> >>
> >> data123[is.nan(data123)] <- 0
> >> ```
> >> replacing o with NA, but I got
> >> ```
> >> str(df)
> >> > logi NA
> >> ```
> >> when modifying my dataframe df.
> >> What would be the correct syntax?
> >> Thank you
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Luigi
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Best regards,
> Luigi
>



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Luigi Marongiu
`data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
still get NaN when using the summary function, for instance one of the
columns give:
```
Min.   : NA
1st Qu.: NA
Median : NA
Mean   :NaN
3rd Qu.: NA
Max.   : NA
NA's   :110
```
I tried to implement the second solution but:
```
df <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA
})
> str(df)
List of 1
 $ sd_ef_rash_loc___palm: logi NA
```
What am I getting wrong?
Thanks

On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons  wrote:
>
> Hello,
>
>
> I would use something like:
>
>
> x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |> 
> as.data.frame()
> x[] <- lapply(x, function(xx) {
> xx[is.nan(xx)] <- NA_real_
> xx
> })
>
>
> This prevents attributes from being changed in 'x', but accomplishes the same 
> thing as you have above, I hope this helps!
>
> On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu  
> wrote:
>>
>> Hello,
>> I have some NaN values in some elements of a dataframe that I would
>> like to convert to NA.
>> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
>> Is there an alternative for the global modification at once of all
>> instances?
>> I have seen from
>> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>> that once could use:
>> ```
>>
>> is.nan.data.frame <- function(x)
>> do.call(cbind, lapply(x, is.nan))
>>
>> data123[is.nan(data123)] <- 0
>> ```
>> replacing o with NA, but I got
>> ```
>> str(df)
>> > logi NA
>> ```
>> when modifying my dataframe df.
>> What would be the correct syntax?
>> Thank you
>>
>>
>>
>> --
>> Best regards,
>> Luigi
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



-- 
Best regards,
Luigi



[R] Loop over columns of dataframe and change values condtionally

2021-09-02 Thread Luigi Marongiu
Hello,
it is possible to select the columns of a dataframe in sequence with:
```
for(i in 1:ncol(df)) {
  df[ , i]
}
# or
for(i in 1:ncol(df)) {
  df[ i]
}
```
And change all values with, for instance:
```
for(i in 1:ncol(df)) {
  df[ , i] <- df[ , i] + 10
}
```
Is it possible to apply a condition? What would be the syntax?
For instance, to change all 0s in a column to NA would `df[i][df[i ==
0] = NA` be right?
Thank you


-- 
Best regards,
Luigi



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Andrew Simmons
Hello,


I would use something like:


x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
    as.data.frame()
x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})


This prevents attributes from being changed in 'x', but accomplishes the
same thing as you have above, I hope this helps!
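
To illustrate the difference between `x[] <- lapply(...)` and a plain assignment of the `lapply()` result, here is a minimal sketch on made-up data. The helper name `nan_to_na` is only for this example.

```r
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 2, 4))  # made-up data

nan_to_na <- function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
}

as_list <- lapply(x, nan_to_na)   # a plain list of columns; dim() is NULL
x[] <- lapply(x, nan_to_na)       # fills the existing data frame in place

dim(as_list)
dim(x)
```

Assigning into `x[]` keeps the class, row names and dimensions of `x`, whereas `lapply()` on its own always returns a list.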

On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu 
wrote:

> Hello,
> I have some NaN values in some elements of a dataframe that I would
> like to convert to NA.
> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
> Is there an alternative for the global modification at once of all
> instances?
> I have seen from
>
> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> that once could use:
> ```
>
> is.nan.data.frame <- function(x)
> do.call(cbind, lapply(x, is.nan))
>
> data123[is.nan(data123)] <- 0
> ```
> replacing o with NA, but I got
> ```
> str(df)
> > logi NA
> ```
> when modifying my dataframe df.
> What would be the correct syntax?
> Thank you
>
>
>
> --
> Best regards,
> Luigi
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread PIKAL Petr
Hi

what about

data[sapply(data, is.nan)] <- NA

Cheers
Petr
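
A quick check of this one-liner on made-up data is sketched below. It also shows one possible reason why `summary()` can still print `NaN` afterwards: if a column ends up with no non-missing values, its mean is computed over zero values and is reported as NaN. Whether that is what happens in the data discussed in this thread is only a guess, since the data are not shown.

```r
# Made-up data: column 'b' holds nothing but NaN
data <- data.frame(a = c(1, NaN, 3), b = c(NaN, NaN, NaN))

data[sapply(data, is.nan)] <- NA   # logical matrix index over all cells
any(sapply(data, is.nan))          # checks whether any NaN is left in the data

# summary() still prints Mean as NaN for 'b': the mean of zero remaining
# values is NaN, even though the column itself now contains only NA
summary(data$b)
mean(numeric(0))
```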

> -----Original Message-----
> From: R-help  On Behalf Of Luigi Marongiu
> Sent: Thursday, September 2, 2021 3:18 PM
> To: r-help 
> Subject: [R] How to globally convert NaN to NA in dataframe?
> 
> Hello,
> I have some NaN values in some elements of a dataframe that I would like to
> convert to NA.
> The command `df1$col[is.nan(df1$col)]<-NA` allows to work column-wise.
> Is there an alternative for the global modification at once of all instances?
> I have seen from
> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
> that once could use:
> ```
> 
> is.nan.data.frame <- function(x)
> do.call(cbind, lapply(x, is.nan))
> 
> data123[is.nan(data123)] <- 0
> ```
> replacing o with NA, but I got
> ```
> str(df)
> > logi NA
> ```
> when modifying my dataframe df.
> What would be the correct syntax?
> Thank you
> 
> 
> 
> --
> Best regards,
> Luigi
> 


[R] How to globally convert NaN to NA in dataframe?

2021-09-02 Thread Luigi Marongiu
Hello,
I have some NaN values in some elements of a dataframe that I would
like to convert to NA.
The command `df1$col[is.nan(df1$col)]<-NA` works one column at a time.
Is there a way to modify all instances at once across the whole
dataframe?
I have seen from
https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
that one could use:
```

is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))

data123[is.nan(data123)] <- 0
```
replacing 0 with NA, but I got
```
str(df)
> logi NA
```
when modifying my dataframe df.
What would be the correct syntax?
Thank you



-- 
Best regards,
Luigi



Re: [R] Show only header of str() function

2021-09-02 Thread Luigi Marongiu
Thank you! better than dim() anyway.
Best regards
Luigi

On Thu, Sep 2, 2021 at 1:31 PM Rui Barradas  wrote:
>
> Hello,
>
> Not perfect but works for data.frames:
>
>
> header_str <- function(x){
>capture.output(str(x))[[1]]
> }
> header_str(iris)
> header_str(AirPassengers)
> header_str(1:10)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 12:02 de 02/09/21, Luigi Marongiu escreveu:
> > Hello, is it possible to show only the header (that is: `'data.frame':
> > x obs. of  y variables:` part) of the str function?
> > Thank you
> >



-- 
Best regards,
Luigi



Re: [R] Show only header of str() function

2021-09-02 Thread Rui Barradas

Hello,

Not perfect but works for data.frames:


header_str <- function(x){
  capture.output(str(x))[[1]]
}
header_str(iris)
header_str(AirPassengers)
header_str(1:10)
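
If only data frames matter, the header can also be rebuilt from the class and dimensions rather than captured from `str()`. This is a rough sketch with a made-up function name, `header_df`, and its spacing is not identical to what `str()` prints.

```r
# Hypothetical helper: builds a str()-like header for data frames only
header_df <- function(x) {
  stopifnot(is.data.frame(x))
  sprintf("'%s': %d obs. of %d variables", class(x)[1], nrow(x), ncol(x))
}

header_df(iris)
```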


Hope this helps,

Rui Barradas

Às 12:02 de 02/09/21, Luigi Marongiu escreveu:

Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of  y variables:` part) of the str function?
Thank you





[R] Show only header of str() function

2021-09-02 Thread Luigi Marongiu
Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of  y variables:` part) of the str function?
Thank you

-- 
Best regards,
Luigi



Re: [R] What if there's nothing to dispatch on?

2021-09-02 Thread Rolf Turner


On Wed, 1 Sep 2021 19:29:32 -0400
Duncan Murdoch  wrote:



> I don't know the header of your foo() method, but let's suppose foo()
> is
> 
>foo <- function(x, data, ...) {
>  UseMethod("foo")
>}
> 
> with
> 
>foo.formula <- function(x, data, ...) {
>  # do something with the formula x
>}
> 
>foo.default <- function(x, data, ...) {
>  # do the default thing.
>}
> 
> Now you have
> 
>xxx <- data.frame(u = 1:10, v = rnorm(10))
>foo(x = u, y = v, data = xxx)
> 
> You want this to dispatch to the default method, because u is not a 
> formula, it's a column in xxx.  But how do you know that?  Maybe in
> some other part of your code you have
> 
>u <- someresponse ~ somepredictor

Well I *don't* have such code anywhere, but a user could have such a
formula saved in the global environment.

> So now u *is* a formula, and this will dispatch to the formula
> method, causing havoc.
> 
> I think Bill's suggestion doesn't help here.  To do what you want to
> do doesn't really match what S3 is designed to do.

Yes.  I have come to realise that and have moved away from the S3
classes and method approach. I now have a solution with which I am
basically satisfied.  But I now understand the problem that you raised.
(Sorry to be so slow!  And thank you for the explanation.)
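
For readers following along, the hazard can be reproduced with a stripped-down version of the quoted example. The method bodies below are placeholders that only report which method ran.

```r
foo <- function(x, data, ...) UseMethod("foo")
foo.formula <- function(x, data, ...) "formula method"   # placeholder body
foo.default <- function(x, data, ...) "default method"   # placeholder body

xxx <- data.frame(u = 1:10, v = rnorm(10))

# A formula that happens to share its name with a column of xxx
u <- someresponse ~ somepredictor

# UseMethod() evaluates x in the caller's environment, finds the formula u,
# and dispatches on it; the column xxx$u is never consulted
foo(x = u, y = v, data = xxx)
```

Here `y = v` is absorbed by `...` and never evaluated, so the missing object `v` causes no error.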

We need to guard against the possibility that a user may invoke the
"non-formula" syntax, foo(x,y,data)  where x is the predictor and y is
the response, and inadvertently trigger the formula syntax because
there is a pre-constructed formula, with the same name as x, hanging
about.

Not really very likely, but certainly not impossible.

I think that the following works:  suppose that x turns out (using
your handy-dandy try() trick) to be a formula.

x1 <-try(x,silent=TRUE)

If inherits(x1,"formula") firstly check whether this formula exists in
the global environment:

nmx <- deparse(substitute(x))
if(exists(nmx,envir=.GlobalEnv)) {
    stop("a formula named ", nmx, " exists in the global environment")
}

I have also added an argument forceFormula=FALSE, which if set to TRUE
prevents the error from being thrown.   Just in case using the formula
named by x *really is* what the user wants to do!
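
Put together, the guard might be sketched as follows. This is not the real function: `foo` here is a stand-in that only reports which syntax it detected, the error message is invented, and the checks `is.name()` and `inherits = FALSE` are additions of this sketch rather than part of the description above.

```r
# Hypothetical stand-in for the real function; only the guard logic is shown
foo <- function(x, y, data, forceFormula = FALSE) {
  x1 <- try(x, silent = TRUE)
  if (inherits(x1, "formula")) {
    xexpr <- substitute(x)
    # A bare name that resolves to a formula in the global environment is
    # suspicious: the user may have meant a column of 'data' instead
    if (is.name(xexpr) && !forceFormula &&
        exists(as.character(xexpr), envir = .GlobalEnv, inherits = FALSE)) {
      stop("'", as.character(xexpr),
           "' is a formula in the global environment; ",
           "set forceFormula = TRUE if that is what you want")
    }
    return("formula syntax")
  }
  "non-formula syntax"
}

xxx <- data.frame(u = 1:10, v = rnorm(10))

foo(someresponse ~ somepredictor, data = xxx)   # literal formula, accepted

assign("u", someresponse ~ somepredictor, envir = .GlobalEnv)
try(foo(u, v, data = xxx))                      # guard stops with an error
foo(u, v, data = xxx, forceFormula = TRUE)      # explicit override
```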

I've tested this out a bit (in my real application) and it seems to
work.  I'm sure that there are other pitfalls and Traps for Young
Players.  E.g. someone might call my function from inside
another function in which the offending formula is constructed.
So the offending formula *won't* be found in the global environment and
the error won't be triggered.  Psigh! Somebody will always be able to
find a way to break things. See fortunes::fortune(15).

However I think the code that I have written is reasonably robust, and
does what I want.  (BTW I want the function to accommodate the
"non-formula" syntax, as well as the formula syntax, to maintain some
semblance of backwards-compatibility.)

Thanks again for (a) the try() trick, and (b) pointing out the lurking
danger.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] conditional replacement of elements of matrix with another matrix column

2021-09-02 Thread Rui Barradas

Hello,

With the new data, here are two ways.
The first with a for loop. I find it simple and readable.


for(b in unique(B[,1])){
  A[which(A[,1] == b), 2] <- B[which(B[,1] == b), 2]
}
na <- is.na(A[,2])
A[!na, 2]

sum(!na)   # [1] 216
sum(A[,1] %in% B[,1])  # [1] 216

# Another way, with merge
mrg <- merge(as.data.frame(A), as.data.frame(B), by = "V1",
             all.x = TRUE)[c(1, 3)]

sum(!is.na(mrg[[2]]))  # [1] 216

identical(A[,2], mrg[[2]])  # [1] TRUE


Note that mrg is a data.frame, you can coerce back to matrix


A <- as.matrix(mrg)
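
A third option, sketched here on small made-up matrices rather than the posted data, is a lookup with `match()`. It assumes each key appears at most once in the first column of `B`; the column names `V1` and `V2` are set by hand in this example.

```r
# Made-up stand-ins for A and B: keys in column V1, values in column V2
A <- cbind(V1 = rep(c(17897, 17898, 17899), each = 2), V2 = NA_real_)
B <- cbind(V1 = c(17897, 17899), V2 = c(0.5, 1.5))

# For every row of A, find the row of B with the same key (NA if none)
idx <- match(A[, "V1"], B[, "V1"])
A[, "V2"] <- B[idx, "V2"]
A
```

Rows of `A` whose key is absent from `B` keep `NA` in the second column, and the row order of `A` is left unchanged.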


Hope this helps,

Rui Barradas


Às 23:00 de 01/09/21, Eliza Botto escreveu:

I thank you all. But the code doesn't work on my different dataset where A and 
B have different column lengths. For example,


dput(A)

structure(c(17897, 17897, 17897, 17897, 17897, 17897, 17897,
17897, 17897, 17897, 17897, 17897, 17897, 17897, 17897, 17897,
17897, 17897, 17897, 17897, 17897, 17897, 17897, 17897, 17898,
17898, 17898, 17898, 17898, 17898, 17898, 17898, 17898, 17898,
17898, 17898, 17898, 17898, 17898, 17898, 17898, 17898, 17898,
17898, 17898, 17898, 17898, 17898, 17899, 17899, 17899, 17899,
17899, 17899, 17899, 17899, 17899, 17899, 17899, 17899, 17899,
17899, 17899, 17899, 17899, 17899, 17899, 17899, 17899, 17899,
17899, 17899, 17900, 17900, 17900, 17900, 17900, 17900, 17900,
17900, 17900, 17900, 17900, 17900, 17900, 17900, 17900, 17900,
17900, 17900, 17900, 17900, 17900, 17900, 17900, 17900, 17901,
17901, 17901, 17901, 17901, 17901, 17901, 17901, 17901, 17901,
17901, 17901, 17901, 17901, 17901, 17901, 17901, 17901, 17901,
17901, 17901, 17901, 17901, 17901, 17902, 17902, 17902, 17902,
17902, 17902, 17902, 17902, 17902, 17902, 17902, 17902, 17902,
17902, 17902, 17902, 17902, 17902, 17902, 17902, 17902, 17902,
17902, 17902, 17903, 17903, 17903, 17903, 17903, 17903, 17903,
17903, 17903, 17903, 17903, 17903, 17903, 17903, 17903, 17903,
17903, 17903, 17903, 17903, 17903, 17903, 17903, 17903, 17904,
17904, 17904, 17904, 17904, 17904, 17904, 17904, 17904, 17904,
17904, 17904, 17904, 17904, 17904, 17904, 17904, 17904, 17904,
17904, 17904, 17904, 17904, 17904, 17905, 17905, 17905, 17905,
17905, 17905, 17905, 17905, 17905, 17905, 17905, 17905, 17905,
17905, 17905, 17905, 17905, 17905, 17905, 17905, 17905, 17905,
17905, 17905, 17906, 17906, 17906, 17906, 17906, 17906, 17906,
17906, 17906, 17906, 17906, 17906, 17906, 17906, 17906, 17906,
17906, 17906, 17906, 17906, 17906, 17906, 17906, 17906, 17907,
17907, 17907, 17907, 17907, 17907, 17907, 17907, 17907, 17907,
17907, 17907, 17907, 17907, 17907, 17907, 17907, 17907, 17907,
17907, 17907, 17907, 17907, 17907, 17908, 17908, 17908, 17908,
17908, 17908, 17908, 17908, 17908, 17908, 17908, 17908, 17908,
17908, 17908, 17908, 17908, 17908, 17908, 17908, 17908, 17908,
17908, 17908, 17909, 17909, 17909, 17909, 17909, 17909, 17909,
17909, 17909, 17909, 17909, 17909, 17909, 17909, 17909, 17909,
17909, 17909, 17909, 17909, 17909, 17909, 17909, 17909, 17910,
17910, 17910, 17910, 17910, 17910, 17910, 17910, 17910, 17910,
17910, 17910, 17910, 17910, 17910, 17910, 17910, 17910, 17910,
17910, 17910, 17910, 17910, 17910, 17911, 17911, 17911, 17911,
17911, 17911, 17911, 17911, 17911, 17911, 17911, 17911, 17911,
17911, 17911, 17911, 17911, 17911, 17911, 17911, 17911, 17911,
17911, 17911, 17912, 17912, 17912, 17912, 17912, 17912, 17912,
17912, 17912, 17912, 17912, 17912, 17912, 17912, 17912, 17912,
17912, 17912, 17912, 17912, 17912, 17912, 17912, 17912, 17913,
17913, 17913, 17913, 17913, 17913, 17913, 17913, 17913, 17913,
17913, 17913, 17913, 17913, 17913, 17913, 17913, 17913, 17913,
17913, 17913, 17913, 17913, 17913, 17914, 17914, 17914, 17914,
17914, 17914, 17914, 17914, 17914, 17914, 17914, 17914, 17914,
17914, 17914, 17914, 17914, 17914, 17914, 17914, 17914, 17914,
17914, 17914, 17915, 17915, 17915, 17915, 17915, 17915, 17915,
17915, 17915, 17915, 17915, 17915, 17915, 17915, 17915, 17915,
17915, 17915, 17915, 17915, 17915, 17915, 17915, 17915, 17916,
17916, 17916, 17916, 17916, 17916, 17916, 17916, 17916, 17916,
17916, 17916, 17916, 17916, 17916, 17916, 17916, 17916, 17916,
17916, 17916, 17916, 17916, 17916, 17917, 17917, 17917, 17917,
17917, 17917, 17917, 17917, 17917, 17917, 17917, 17917, 17917,
17917, 17917, 17917, 17917, 17917, 17917, 17917, 17917, 17917,
17917, 17917, 17918, 17918, 17918, 17918, 17918, 17918, 17918,
17918, 17918, 17918, 17918, 17918, 17918, 17918, 17918, 17918,
17918, 17918, 17918, 17918, 17918, 17918, 17918, 17918, 17919,
17919, 17919, 17919, 17919, 17919, 17919, 17919, 17919, 17919,
17919, 17919, 17919, 17919, 17919, 17919, 17919, 17919, 17919,
17919, 17919, 17919, 17919, 17919, 17920, 17920, 17920, 17920,
17920, 17920, 17920, 17920, 17920, 17920, 17920, 17920, 17920,
17920, 17920, 17920, 17920, 17920, 17920, 17920, 17920, 17920,
17920, 17920, 17921, 17921, 17921, 17921, 17921, 17921, 17921,
17921, 17921, 17921, 17921, 17921, 17921, 17921, 17921, 17921,
17921, 17921, 17921, 17921, 17921, 17921, 17921, 17921, 17922,
17922, 

Re: [R] how to install npsm package

2021-09-02 Thread caghpm
Thank you, Eric. Very useful. 

 

From: Eric Berger  
Sent: Wednesday, September 1, 2021 12:31 PM
To: cag...@gmail.com
Cc: R mailing list 
Subject: Re: [R] how to install npsm package

 

Instructions can be found at https://github.com/kloke/npsm

 

 

On Wed, Sep 1, 2021 at 6:27 PM <cag...@gmail.com> wrote:

I need to install the package "npsm" to follow Kloke & McKean book. However,
npsm is no longer on CRAN. So, please let me know in detail how to proceed
to install it.



Thanks.



Carlos Gonzalez




[R] combining geom_boxplot and geom_point with jitter

2021-09-02 Thread Ivan Calandra

Dear useRs,

I'm having trouble combining geom_boxplot and geom_point with jitter. 
It is difficult to explain, but the code and result should make it clear 
(the example dataset is long, so I have copied it at the end of the email):


p <- ggplot(my_data, aes(x = Diet, y = value, color = Software))
p <- p + geom_boxplot(outlier.shape = NA)
p <- p + geom_point(mapping = aes(shape = NMP_cat), position = 
position_jitterdodge())

print(p)

As you can see in the resulting plot, the points with different shapes 
are dodged across the boxplot categories (colors). I'd like the three 
shapes for each color to stay within the boxplot of that color, with jitter 
of course to better visualize the points.


Does that make sense?

I have played with the arguments of position_jitterdodge(), but it seems 
to me that the problem is that the shape aesthetic is not in the 
geom_boxplot() call (but I don't want it there, see below).


For background information, the column used for shape gives some sort of 
"quality" to the points; that's why I want to show the points 
differently, so that it can easily be seen whether "good" points plot in 
the same area as the "bad" points.
Because I'm doing facet plots with other variables, I do not want to 
separate these categories in the boxplots - the resulting plots would be 
overcrowded.


Thank you for the help.
Ivan

---

my_data <- structure(list(Diet = c("Dry lucerne", "Dry lucerne", "Dry 
lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", 
"Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry 
lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", 
"Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry 
lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", 
"Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry 
lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", 
"Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne",
"Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry 
lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", "Dry lucerne", 
"Dry lucerne", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry 
grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", 
"Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry 
grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", 
"Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry 
grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", 
"Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry 
grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", "Dry grass", 
"Dry grass", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", "Dry bamboo", 
"Dry bamboo", "Dry bamboo",
"Dry bamboo", "Dry bamboo"), Software = c("ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax",
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax",
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", 
"Toothfrax", "ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 
"ConfoMap", "Toothfrax", "ConfoMap", "Toothfrax", 

Re: [R] ISO Code for Namibia ('NA')

2021-09-02 Thread Dr Eberhard Lisse

Thank you.

el


On 02/09/2021 00:41, Bill Dunlap wrote:

z <- tibble(Code=c("NA","NZ",NA), Name=c("Namibia","New Zealand","?"))
z

# A tibble: 3 x 2
  Code  Name
  <chr> <chr>
1 NA    Namibia
2 NZ    New Zealand
3 <NA>  ?

subset(z, Code=="NA")

# A tibble: 1 x 2
  Code  Name
  <chr> <chr>
1 NA    Namibia

subset(z, is.na(Code))

# A tibble: 1 x 2
  Code  Name
  <chr> <chr>
1 <NA>  ?

subset(z, Code==NA_character_)

# A tibble: 0 x 2
# ... with 2 variables: Code <chr>, Name <chr>
On Wed, Sep 1, 2021 at 3:33 PM Dr Eberhard Lisse  wrote:


Hi,

how can I look for the ISO code for Namibia 'NA' in a list of ISO codes
which looks something like

 # A tibble: 10 × 1
    location_code
    <chr>
  1 NC
[...]
 10 NZ

but should look like

 # A tibble: 10 × 1
    location_code
    <chr>
  1 NA
  2 NC
[...]
 11 NZ

In other words 'NA' is taken for the missing value NA.

greetings, el





--
To email me replace 'nospam' with 'el'
