Re: [R] How to speed up a double loop?

2015-03-03 Thread jeff6868
I tried another faster way which seems to do the trick right now:

myts <- data.frame(x=c(10,2,50,40,NA,NA,0,50,1,2,0,0,NA,50,0,15,3,5,4,20,0,0,25,22,0,1,100),z=NA)

test <- function(x){
  st1 <- numeric(length(x))
  temp <- st1[1]
  for (i in 2:length(x)){
    if (!is.na(x[i]) && !is.na(x[i-1]) && abs(x[i] - temp) >= 15){
      st1[i] <- 1
    }
  }
  return(st1)
}

myts[,2] <- apply(as.data.frame(myts[,1]),2,test)
myts[,2] <- as.numeric(myts[,2])

Thanks anyway for your help!



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-speed-up-a-double-loop-tp4704054p4704112.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to speed up a double loop?

2015-03-02 Thread jeff6868
Dear R-users,

I would like to speed up a double loop I wrote to detect and remove
outliers in a whole data.frame. The idea is to remove any value that differs
too much from the previous value. When such a value is detected, the test
must then be applied to at most the next 10 values following the last correct
one (and a flag is put in another column).

It works well on a small data frame, but it is far too slow for my real DF
with 500 000 rows.
Here's a fake data example and the double loop:

myts <- data.frame(x=c(1,2,50,40,30,40,100,1,50,1,2,3,3,5,4),y=NA)

for (jj in 1:(nrow(myts)-10)) {
  for (nn in (jj+1):(jj+10)) {
    if (!is.na(myts[jj,1]) && !is.na(myts[nn,1]) &&
        abs(myts[nn,1] - myts[jj,1]) > 15) {
      myts[nn,2] <- 1
      myts[nn,1] <- NA
    }
  }
}

Can somebody explain to me how I can speed this up easily? I have heard about
vectorization but I don't really understand how it works.
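
As a rough illustration of what "vectorization" means, here is a minimal sketch on the fake data above. It only compares each value with the one immediately before it, so it is not equivalent to the double loop (which compares with the last correct value); the names jump_loop and jump_vec are made up for this example.

```r
# Fake data from the post above.
x <- c(1, 2, 50, 40, 30, 40, 100, 1, 50, 1, 2, 3, 3, 5, 4)

# Loop version: compare each value with the one just before it.
jump_loop <- logical(length(x))
for (i in 2:length(x)) {
  jump_loop[i] <- abs(x[i] - x[i - 1]) > 15
}

# Vectorized version: diff() computes all consecutive differences at once,
# and the comparison is applied to the whole vector in one call.
jump_vec <- c(FALSE, abs(diff(x)) > 15)

identical(jump_loop, jump_vec)
```

The vectorized line does the same work as the loop, but the iteration happens inside compiled code instead of in R, which is usually where the speed-up comes from.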




--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-speed-up-a-double-loop-tp4704054.html


Re: [R] Sorting data frame by prepared order

2015-03-02 Thread jeff6868
Hi,

Maybe this is the beginning of a solution?

test <-
data.frame(x=c(1,1,1,1,1,1,2,2,2,2,2,2),y=c("a","a","a","b","b","b","a","a","b","b","b","a"))
test[order(test$x),]

out <- split(test,test$x)

for (i in 1:length(out)) {
  foo <- unique(out[[i]][,2])
  out[[i]][,2] <- rep(foo, nrow(out[[i]])/length(foo))
}

It seems to work when each value of your first column has an even number of
rows, but there is still a problem for odd lengths. This might be solved by
adding fake rows that you can remove afterwards (with a specific index, for
example).



--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-data-frame-by-prepared-order-tp4704038p4704058.html


Re: [R] How to speed up a double loop?

2015-03-02 Thread jeff6868
Hi Petr,

Thanks for your reply,

Actually it's not what I'm looking for. The aim is not simply to remove each
value > 15.

In my loop, I consider the first numeric value of my column as correct.
Then, I want to test the second value. If the absolute difference with the
previous correct one is < 15, it's a new correct one, but if it's > 15, then
it's a wrong one.
If it's a wrong one, the third one has to be tested to check if it's still
> 15 away from the last correct value (the first one).
A value becomes correct again when its difference with the last correct
one goes under 15 (and so this value is the new correct one, and so on
for the rest of the column).

My loop is already doing the trick, but I just want to speed it up (or maybe
find another, faster way to do the job).
I hope it's more understandable now!
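
The rule described above can be sketched as a single pass that remembers the last correct value. This is only a sketch under assumptions: the function name flag_outliers and the argument limit are hypothetical, NAs are simply skipped, and the "at most the next 10 values" window from the original double loop is left out.

```r
# Flag values that differ by more than `limit` from the last correct value.
# `flag_outliers` and `limit` are hypothetical names for this sketch.
flag_outliers <- function(x, limit = 15) {
  flag <- numeric(length(x))
  ok <- which(!is.na(x))
  if (length(ok) == 0) return(flag)
  last_ok <- x[ok[1]]              # first numeric value is taken as correct
  for (i in ok[-1]) {
    if (abs(x[i] - last_ok) > limit) {
      flag[i] <- 1                 # wrong value: keep comparing with last_ok
    } else {
      last_ok <- x[i]              # correct value: becomes the new reference
    }
  }
  flag
}

x <- c(1, 2, 50, 40, 30, 40, 100, 1, 50, 1, 2, 3, 3, 5, 4)
flag <- flag_outliers(x)
x_clean <- ifelse(flag == 1, NA, x)
```

Each row is visited once, so the cost grows linearly with the number of rows instead of with 10 comparisons per row on a data.frame.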





--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-speed-up-a-double-loop-tp4704054p4704061.html


Re: [R] transpose a data frame according to a specific variable

2015-02-10 Thread jeff6868
Both ways do the job well. Nice!
Thanks again!



--
View this message in context: 
http://r.789695.n4.nabble.com/transpose-a-data-frame-according-to-a-specific-variable-tp4702971p4703007.html


[R] transpose a data frame according to a specific variable

2015-02-09 Thread jeff6868
Dear R-users,

I would like to transpose a large data.frame according to a specific column.
Here's a reproducible example, which should make it clearer.

At the moment, my data.frame looks like this example:

DF <- data.frame(id=c("A","A","A","B","B","B","C","C","C"),
Year=c(2001,2002,2003,2002,2003,2004,2000,2001,2002),
Day=c(120,90,54,18,217,68,164,99,48))

I would like it to be transformed into this (fake example again, just to
make it clear):

finalDF <-
data.frame(id=c("A","B","C"),"2000"=c(NA,NA,164),"2001"=c(120,NA,99),
"2002"=c(90,18,48),"2003"=c(54,217,NA),"2004"=c(NA,68,NA))

Any ideas for doing this easily? I haven't found any good answer on the web.

Thanks for the help!
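
One possible base R sketch of this long-to-wide step uses tapply(), assuming each (id, Year) pair occurs at most once. The intermediate name wide is made up for the example; this is not necessarily the approach suggested in the replies.

```r
DF <- data.frame(id = c("A","A","A","B","B","B","C","C","C"),
                 Year = c(2001,2002,2003,2002,2003,2004,2000,2001,2002),
                 Day = c(120,90,54,18,217,68,164,99,48))

# One row per id, one column per Year; combinations that are absent become NA.
wide <- with(DF, tapply(Day, list(id, Year), function(v) v[1]))

# Turn the matrix back into a data.frame with an explicit id column.
finalDF <- data.frame(id = rownames(wide), wide, check.names = FALSE)
rownames(finalDF) <- NULL
```

check.names = FALSE keeps the year columns named "2000", "2001", ... instead of letting data.frame() rename them to X2000, X2001, ...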




--
View this message in context: 
http://r.789695.n4.nabble.com/transpose-a-data-frame-according-to-a-specific-variable-tp4702971.html


[R] keep only the first value of a numeric sequence

2014-12-15 Thread jeff6868
Hello dear R-helpers,

I have a small problem in my algorithm. I have sequences of 0 and 1
values in a column of a huge data frame, and I would just like to keep the
first value of each sequence of 1 values, as in this example:

data <-
data.frame(mydata=c(0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1),final_wished_data=c(0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0))

Any easy way to do this?

Thanks everybody!
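
A minimal sketch of one way to do this with diff(), assuming the column holds only 0 and 1 with no NA. The name first_of_run is made up for the example.

```r
mydata <- c(0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1)

# A run of 1s starts wherever the value steps up from 0 to 1.
# Prepending 0 lets a run that begins on the first row be detected too.
first_of_run <- as.numeric(diff(c(0, mydata)) == 1)
```

Because diff() works on the whole vector at once, this should stay fast on a huge data frame.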




--
View this message in context: 
http://r.789695.n4.nabble.com/keep-only-the-first-value-of-a-numeric-sequence-tp4700774.html


Re: [R] keep only the first value of a numeric sequence

2014-12-15 Thread jeff6868
Great! Both ways work well on my whole data!

Thanks guys! 



--
View this message in context: 
http://r.789695.n4.nabble.com/keep-only-the-first-value-of-a-numeric-sequence-tp4700774p4700783.html


[R] find remove sequences of at least N values for a specific value

2014-07-10 Thread jeff6868
Hi everybody,

I have a small problem in a function, about removing short sequences of
identical numeric values.

For example, consider this data, containing only 0s and 1s:

test <- data.frame(x=c(0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1))

My aim here is simply to remove each sequence of 1s with a length shorter
than 5, and to keep the sequences of 1s which are longer.
So my final data should look like this:

final <- data.frame(x=c(0,0,NA,NA,NA,0,0,0,0,1,1,1,1,1,1,1,1))

For the moment, I have this function:

foo <- function(X,N){
  tab <- table(X[X==1])
  under.n <- as.numeric(names(tab)[tab<N])
  ind <- X %in% under.n
  Ind.sup <- which(ind)
  X <- ifelse(ind,NA,X)
}

test$x <- apply(as.data.frame(test$x),2,function(x) foo(x,5))

The problem is that the function doesn't consider each sequence separately,
but treats all the 1s as a single sequence. I think that using rle() instead
of table() in my function should do the trick, but it doesn't work yet.
Does someone have an idea about fixing this problem?
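
A minimal sketch of the rle() idea mentioned above, where each run is judged on its own length. The helper name remove_short_runs and its value argument are hypothetical.

```r
x <- c(0,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1)

# Replace every run of `value` shorter than N by NA, run by run.
# `remove_short_runs` is a hypothetical helper name.
remove_short_runs <- function(x, N, value = 1) {
  r <- rle(x)                                   # run values and run lengths
  short <- r$values %in% value & r$lengths < N  # one TRUE/FALSE per run
  x[rep(short, times = r$lengths)] <- NA        # expand back to one per row
  x
}

final <- remove_short_runs(x, 5)
```

rep(short, times = r$lengths) is the step that maps the per-run decision back onto the original rows.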





--
View this message in context: 
http://r.789695.n4.nabble.com/find-remove-sequences-of-at-least-N-values-for-a-specific-value-tp4693810.html


Re: [R] remove duplicated row according to NA condition

2014-05-29 Thread jeff6868
Yes, this is the right one, Arun! Thank you very much.
I tried each solution but yours was the best. It works well.
Thanks anyway to everyone for your replies!






--
View this message in context: 
http://r.789695.n4.nabble.com/remove-duplicated-row-according-to-NA-condition-tp4691362p4691422.html


[R] remove duplicated row according to NA condition

2014-05-28 Thread jeff6868
Hi everybody,

I have a little problem in my R code which seems easy to solve, but I
haven't been able to find the solution by myself so far.

Here's an example of the form of my data:

data <-
data.frame(col1=c("a","a","b","b"),col2=c(1,1,2,2),col3=c(NA,"ST001","ST002",NA))

I would like to remove duplicated data based on the first two columns
(col1,col2), but in both cases here, I would like to remove the duplicated
row which is equal to NA in col3.

Here's the data.frame I would like to obtain:

data2 <- data.frame(col1=c("a","b"),col2=c(1,2),col3=c("ST001","ST002"))

I've been trying to mix duplicated() with is.na() but it doesn't work yet.

Can someone tell me the best and easiest way to do this?

Thanks a lot!
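
One way to combine duplicated() with is.na(), sketched under the assumption that each (col1, col2) pair has at least one row with a non-NA col3. The intermediate name ord is made up, and this is not necessarily the solution given in the replies.

```r
data <- data.frame(col1 = c("a","a","b","b"),
                   col2 = c(1,1,2,2),
                   col3 = c(NA,"ST001","ST002",NA))

# Sort so that, within each (col1, col2) pair, rows with NA in col3 come last;
# duplicated() then keeps the first (non-NA) row of each pair.
ord <- data[order(data$col1, data$col2, is.na(data$col3)), ]
data2 <- ord[!duplicated(ord[, c("col1", "col2")]), ]
rownames(data2) <- NULL
```

If a pair only has NA rows, this sketch keeps one of them rather than dropping the pair.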







--
View this message in context: 
http://r.789695.n4.nabble.com/remove-duplicated-row-according-to-NA-condition-tp4691362.html


Re: [R] create a color palette with custom ranges between colors

2012-12-13 Thread jeff6868
Thank you Nicole!

I did it with the color.palette function in the link you gave me.
I then added a sequence to my levelplot call with "at":

at=seq(-40,40,1)

And it works quite well.

Thanks again Nicole.

Thanks to you too, Pascal, and long live the CRC as well as the great C. C.!
;)





--
View this message in context: 
http://r.789695.n4.nabble.com/create-a-color-palette-with-custom-ranges-between-colors-tp4652875p4652969.html


[R] create a color palette with custom ranges between colors

2012-12-12 Thread jeff6868
Hello everybody,

I'm trying to create my own color palette in R, in order to display
interpolated temperature data on different maps (daily means, seasonal
means, ...).

I would like to create a color palette which works for every map, so I need a
color palette between -40 and +40°C. Sometimes the data for one map range
from -10 to +20, sometimes from 10 to 30, etc., but always between -40 and
+40°C.

I would like a smooth color gradient between my extremes (-40 and +40),
with different colors between custom values.

For example, if the temperature is under -20°C I would like the color
darkblue, then if the temperature is between -20 and 0°C I would like the
color lightblue, then between 0 and 20°C the color yellow and finally
over 20°C the color red.

Is it possible to create a smooth gradient color palette with custom
colors (not just one color for each part, but something smooth based on the
chosen colors)?
Something like this: http://i.stack.imgur.com/5NoJh.jpg

I would then like to use this custom color palette in all my levelplot
or image.plot calls, in order to create all my maps.

Any help for doing this would be very much appreciated!

Thanks a lot!
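
A minimal sketch of a fixed palette with base R's colorRampPalette(), assuming 1°C steps are fine enough. The names breaks, temp_palette and cols are made up. Note that colorRampPalette() spaces the four colors evenly over the range, so the color changes do not fall exactly at -20, 0 and 20.

```r
# Fixed breaks every 1°C over the full range, shared by all maps.
breaks <- seq(-40, 40, by = 1)

# Interpolate smoothly between the four colors named in the post.
temp_palette <- colorRampPalette(c("darkblue", "lightblue", "yellow", "red"))
cols <- temp_palette(length(breaks) - 1)   # one color per interval
```

Because the breaks are fixed, a map whose data only span -10 to +20 would still use the same colors for the same temperatures. With lattice's levelplot, these would typically be passed as at = breaks and col.regions = cols.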







--
View this message in context: 
http://r.789695.n4.nabble.com/create-a-color-palette-with-custom-ranges-between-colors-tp4652875.html


[R] filling NA gaps according to previous data mean and following data mean

2012-10-18 Thread jeff6868
Hi everybody,

I have a little problem with filling some gaps of NAs in my data.

These gaps lie between nearly constant data (temperature under snow). Here's
a fake example to illustrate roughly what it looks like:

DF <-
data.frame(data=c(-0.51,-0.51,-0.48,-0.6,-0.54,-0.38,-0.6,-0.42,NA,NA,NA,NA,NA,NA,NA,
-0.25,-0.41,-0.5,-0.5,-0.35,-0.7,-1,-0.87))

I would like to replace my NAs with 0 under this condition:
Fill the gap with 0 if the mean of the 5 values before the gap
(NA) is under 0°C, AND if the mean of the 5 values after the gap
(NA) is also under 0°C (actually, in my real data it's not the 5 previous and
following values but the 500 previous and following values, but let's
just take 5 in my example).

I think that the closest function for doing this is the na.locf function
of the package zoo, but it does not really do what I want.

Can somebody help me to solve this?
Thanks a lot guys!
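
The condition above can be sketched in base R with rle() to locate the gaps. This is only a sketch under assumptions: the names fill_gaps and k are hypothetical, gaps with fewer than k values on either side are left untouched, and it is not necessarily the solution posted in the replies.

```r
DF <- data.frame(data = c(-0.51,-0.51,-0.48,-0.6,-0.54,-0.38,-0.6,-0.42,
                          NA,NA,NA,NA,NA,NA,NA,
                          -0.25,-0.41,-0.5,-0.5,-0.35,-0.7,-1,-0.87))

# Fill each NA gap with 0 when the mean of the k values before it and the
# mean of the k values after it are both below 0.
# `fill_gaps` and `k` are hypothetical names for this sketch.
fill_gaps <- function(x, k = 5) {
  out <- x
  r <- rle(is.na(x))
  ends <- cumsum(r$lengths)          # last index of each run
  starts <- ends - r$lengths + 1     # first index of each run
  for (g in which(r$values)) {       # loop over the NA runs only
    if (starts[g] - 1 < k || length(x) - ends[g] < k) next
    before <- x[(starts[g] - k):(starts[g] - 1)]
    after <- x[(ends[g] + 1):(ends[g] + k)]
    if (isTRUE(mean(before, na.rm = TRUE) < 0) &&
        isTRUE(mean(after, na.rm = TRUE) < 0)) {
      out[starts[g]:ends[g]] <- 0
    }
  }
  out
}

DF$filled <- fill_gaps(DF$data, k = 5)
```

For the real data, k = 500 would be passed instead of 5.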






--
View this message in context: 
http://r.789695.n4.nabble.com/filling-NA-gaps-according-to-previous-data-mean-and-following-data-mean-tp4646613.html


Re: [R] converting a string to an integer vector

2012-10-18 Thread jeff6868
Hello,

Try this, maybe it will help you:

a <- "1,2"
b <- strsplit(a, ",")   # split your data on ","
b <- unlist(b)  # strsplit returns a list, so we unlist the result to obtain
                # a character vector like c("1","2")
b <- as.integer(b)  # convert the strings to an integer vector





--
View this message in context: 
http://r.789695.n4.nabble.com/converting-a-string-to-an-integer-vector-tp4646610p4646619.html


Re: [R] filling NA gaps according to previous data mean and following data mean

2012-10-18 Thread jeff6868
Perfect as always, Rui! It's a bit more complicated than I thought, but
it's what I want!

Thank you Rui!



--
View this message in context: 
http://r.789695.n4.nabble.com/filling-NA-gaps-according-to-previous-data-mean-and-following-data-mean-tp4646613p4646620.html


[R] create new column in a DF according to values from another column

2012-09-26 Thread jeff6868
Hi everyone,

I have a small problem in my R-code.

Imagine this DF for example:

DF <- data.frame(number=c(1,4,7,3,11,16,14,17,20,19),data=c(1:10))

I would like to add a new column "Station" to this DF. This new column must
be automatically filled with "V1", "V2" or "V3".
The choice must be based on the numbers (1st column).

For example, I would like to have "V1" in the column "Station" in the rows
where the numbers of the 1st column are 1, 7, 11, 16; then I would like to
have "V2" in the rows where the numbers are 4, 14, 20; and finally "V3" in
the rows where the numbers are 3, 17, 19.

I'm trying with "if" and something like this, but it's not working yet:
# For V1:
if(DF$number %in% c(1,7,11,16)) {test$Station=="V1"}
# For V2:
...

So my final DF should look like this:

FINALDF <- data.frame(number=c(1,4,7,3,11,16,14,17,20,19),data=c(1:10),
Station=c("V1","V2","V1","V3","V1","V1","V2","V3","V2","V3"))

Could someone help me to finish this?

Thank you very much!!!
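
A minimal sketch of one way to fill the column with logical indexing instead of if(), which only looks at a single condition. The lookup list groups is a made-up name, and this is not necessarily the answer given in the replies.

```r
DF <- data.frame(number = c(1,4,7,3,11,16,14,17,20,19), data = 1:10)

# Lookup table: which numbers belong to which station.
groups <- list(V1 = c(1, 7, 11, 16),
               V2 = c(4, 14, 20),
               V3 = c(3, 17, 19))

# %in% is vectorized, so each assignment handles all matching rows at once;
# numbers listed in no group keep NA.
DF$Station <- NA_character_
for (st in names(groups)) {
  DF$Station[DF$number %in% groups[[st]]] <- st
}
```

The loop here runs once per station (three times), not once per row, so it stays cheap on a large DF.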





--
View this message in context: 
http://r.789695.n4.nabble.com/create-new-column-in-a-DF-according-to-values-from-another-column-tp4644217.html


Re: [R] create new column in a DF according to values from another column

2012-09-26 Thread jeff6868
Yes this is it!

Thank you for your help Berend!



--
View this message in context: 
http://r.789695.n4.nabble.com/create-new-column-in-a-DF-according-to-values-from-another-column-tp4644217p4644225.html


Re: [R] Get variable data Reading from the list

2012-08-28 Thread jeff6868
Hello,

What about this:
get(MyList[[1]]), with [[ ]] instead of [ ]?

If you do for example:
MyList <- list()
MyList[length(MyList)+1] <- "MyVar"
MyVar <- c(1:10)
get(MyList[[1]])

It seems to do what you want.



--
View this message in context: 
http://r.789695.n4.nabble.com/Get-variable-data-Reading-from-the-list-tp4641559p4641563.html


Re: [R] apply a function separately on each element of a list

2012-08-28 Thread jeff6868
Yes, this is it (as Michael would say)! Thank you guys!

One last question about another function on this list: imagine this list is
my data after your function for the regression model:

mydf <- data.frame(x=c(1:5), y=c(21:25),z=rnorm(1:5))
mylist <- rep(list(mydf),5)

Don't mind this fake data, it's just for the example. I have my results
in column z (from the regression) for each DF of the list, and 2 other
columns x and y representing some spatial coordinates.

I have another independent DF also containing x and y columns,
representing some specific regions (imagine 10 regions):
region <- data.frame(x=c(1:10),y=c(21:30),region=c(1:10))

The final aim is to get, for each of the 10 regions, the value z (from my
regression) at the nearest point of each DF of my list.
That means 10 z results from DF1 of my list, then 10 other z results from
DF2, ...

I already have a small function to look for the nearest value:

min.dist <- function(p, coord){
 which.min( colSums((t(coord) - p)^2) )
}

Then, I'm trying to write a loop to get what I want, but I have difficulties
with the list. I would need to put 2 variables in the loop, but it doesn't
work.

This works approximately if I just take 1 DF of my list:

for (j in 1:nrow(region))
{

imin[j] <- min.dist(c(plante[j,1],plante[j,2]),mylist[[j]][,1:2])
final[j] <- mylist[[j]][imin[j], "z"]
final <- as.data.frame(final)
}

But if I select my whole list (in order to have one column of results for
each DF of the list in the object "final"), I get errors.

I think the first problem is that the length of "region" is different from
the length of my list, and the second is maybe about adding a second
variable for the length of my list.

Is there any solution?
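
One possible way to handle the two different lengths is to nest two sapply() calls, one over the list and one over the regions. This is a sketch on the fake data, assuming the coordinates to look up are the x and y columns of region; set.seed() is only there to make the fake z values reproducible.

```r
set.seed(1)  # only so that the fake z values are reproducible
mydf <- data.frame(x = 1:5, y = 21:25, z = rnorm(5))
mylist <- rep(list(mydf), 5)
region <- data.frame(x = 1:10, y = 21:30, region = 1:10)

# Index of the row of `coord` closest to point `p` (function from the post).
min.dist <- function(p, coord) {
  which.min(colSums((t(coord) - p)^2))
}

# Outer sapply: one column per data frame of the list.
# Inner sapply: one row per region.
final <- sapply(mylist, function(df) {
  sapply(seq_len(nrow(region)), function(j) {
    p <- c(region$x[j], region$y[j])
    df$z[min.dist(p, df[, c("x", "y")])]
  })
})
dim(final)
```

The result is a matrix with one row per region and one column per DF of the list, so the two lengths no longer need to match.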




--
View this message in context: 
http://r.789695.n4.nabble.com/apply-a-function-separately-on-each-element-of-a-list-tp4641186p4641530.html


Re: [R] create a normal distribution table

2012-08-28 Thread jeff6868
Hello,

You can try this:

x=seq(-3,0,length=30)
y=1/sqrt(2*pi)*exp(-x^2/2)
plot(x,y,type="l",lwd=2,col="red")

with:
x: your vector between -3 and 0 (you can choose the length of your vector)
y: the probability density of the standard normal distribution, computed
from its formula
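
For the table itself, the built-in dnorm() and pnorm() functions can be used instead of typing the formula. A minimal sketch, where the data frame name norm_table and its column names are made up:

```r
x <- seq(-3, 0, length.out = 30)

# dnorm() is the built-in standard normal density; pnorm() gives the
# cumulative probability P(Z <= x), which is what a printed table lists.
norm_table <- data.frame(x = x, density = dnorm(x), cumulative = pnorm(x))

# The hand-written formula and dnorm() agree.
all.equal(1 / sqrt(2 * pi) * exp(-x^2 / 2), dnorm(x))
```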




--
View this message in context: 
http://r.789695.n4.nabble.com/create-a-normal-distribution-table-tp4641558p4641561.html


[R] apply a function separately on each element of a list

2012-08-24 Thread jeff6868
Hi everybody,

I have a question about applying a specific function (with the calculations
I want to do) to a list of elements.

Each element is a data.frame (with rows and columns), and they all have the
same structure.
At first, I had a big data.frame that I split into the elements of my
list. They have been split by day.
For example, the name of the first element of my list is "2011-01-01", and
it is a data.frame corresponding to all my data from this specific date. Then
my second element is "2011-01-02", etc.

My question is: how can I apply a function to each element separately (a bit
like a loop)?

For example, if my data from the first element "2011-01-01" is:
element1 <- data.frame(x=rnorm(1:10),data=c(1:10))

I would like to do a regression between x and data (so lm(data ~ x)), to
get the predicted values of the regression, and then to keep the results in
a new object.

And then, do the same with the second element (regression between x and
data of the second element), and keep the predicted values as well.

... and so on with 200 elements.

Is there any way to do this?

Thanks a lot!
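
A minimal sketch with lapply() on a two-element fake list. The names mylist and predicted are made up, and set.seed() is only there for reproducibility.

```r
set.seed(42)  # fake data, only for reproducibility
mylist <- list("2011-01-01" = data.frame(x = rnorm(10), data = 1:10),
               "2011-01-02" = data.frame(x = rnorm(10), data = 1:10))

# lapply() calls the function once per element and returns a list of the
# same length (and with the same names) holding the results.
predicted <- lapply(mylist, function(el) {
  fit <- lm(data ~ x, data = el)
  fitted(fit)
})
```

predicted[["2011-01-01"]] then holds the fitted values for the first day, and so on for each of the 200 elements.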



--
View this message in context: 
http://r.789695.n4.nabble.com/apply-a-function-separately-on-each-element-of-a-list-tp4641186.html


Re: [R] How to find data in a map according to coordinates?

2012-08-10 Thread jeff6868
Yes this is it!

It also works this way with your code, without prompting the user directly
in the console.

Thank you very much Rui. You're as helpful as ever!



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-find-data-in-a-map-according-to-coordinates-tp4639724p4639876.html


[R] How to find data in a map according to coordinates?

2012-08-09 Thread jeff6868
Hello,

I have created a spatial map of temperature over an area by
interpolation. So I have temperature data everywhere on my map.
My question is: how can I find the temperature on my map for a specific
location given its coordinates?

For this, I have a data frame containing 4 columns: "x" for longitude, "y"
for latitude, "z" for altitude and "temperature" for my data, for each
pixel of my map. My real data has more than 9 million rows (because I
have a temperature value for each 0.0008° of longitude or latitude).

Let's take a smaller example with 10 rows and so just 1° of LAT and LON for
each pixel (just with x, y and my data):

test <- data.frame(x=c(1:10),y=c(41:50),temperature=rnorm(1:10))

In this example, I would like to ask the user of the algorithm to type in R
the coordinates (x and y) for which the user wants the temperature.

For example:
cat("choose your coordinates:\n")
x: HERE THE USER SHOULD TYPE A NUMERIC VALUE
y: HERE THE USER SHOULD TYPE A NUMERIC VALUE

and then R gives the value for temperature (in my third column).

In this example, if the user types 6 for x and 46 for y, R should give
as a result:   0.9713693

And if the coordinates typed by the user fall between the coordinates in my
data.frame, it should return the temperature value of the nearest pixel.

For example, if the user types 3.89 for x and 43.62 for y, R should give
as a result:   0.6871172 (value of the nearest pixel: 4 for x and 44 for
y).

I absolutely don't know how to do this, or whether it's easy to do or not. I
didn't find anything about it on the web for R.
Do you have any idea or suggestion about a package, a function or a way to do
this?
I hope you've understood!

thanks everybody!
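
A minimal sketch of the nearest-pixel lookup, assuming plain squared distance in degrees is good enough. The helper name nearest_temperature is hypothetical, and the fake temperatures differ from the values quoted above because they are random.

```r
set.seed(123)  # fake temperatures, only for reproducibility
test <- data.frame(x = 1:10, y = 41:50, temperature = rnorm(10))

# Temperature of the pixel closest to (x, y), by squared Euclidean distance
# in degrees. `nearest_temperature` is a hypothetical helper name.
nearest_temperature <- function(x, y, grid) {
  i <- which.min((grid$x - x)^2 + (grid$y - y)^2)
  grid$temperature[i]
}

nearest_temperature(6, 46, test)        # exact match: row 6
nearest_temperature(3.89, 43.62, test)  # nearest pixel: x = 4, y = 44
```

In an interactive session, the two coordinates could be read with something like as.numeric(readline("x: ")) before calling the function.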










--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-find-data-in-a-map-according-to-coordinates-tp4639724.html


[R] how to calculate seasonal mean for temperatures

2012-08-01 Thread jeff6868
Hello everybody,

I need to calculate seasonal means from temperature data for my work.
I have 70 files coming from weather stations, which look like this, for
example:

startdate <- as.POSIXct("01/01/2006", format = "%d/%m/%Y")
enddate <- as.POSIXct("05/01/2006", format = "%d/%m/%Y")
date <- seq(from = startdate, to = enddate, by = "days")

DF <- data.frame(data=c(2.5,1.4,3.6,0.5,-1.2),date=date)

With this daily data, I need to calculate seasonal means.
By season I mean: Winter (January, February, March); Spring (April, May,
June); Summer (July, August, September) and Autumn (October, November,
December).

My main problem is that my files do not all start and end in the same year
(some of them start on 1st January 2006 and end on 31st December 2008, some
of them start on 1st January 2007 and end on 31st December 2011, ...).

So the years differ, but all of them start on a 1st January and end on a
31st December.

I'd first like to delete (or ignore) the first 2 months (January and
February) and the last month (December) of all my files, because I cannot
calculate a seasonal mean for them (not all 3 months are there).
But the problem with the first 2 months is leap years (with 29th
February). For example, if my file starts in 2008, the first 2 months will
not have the same length as in files starting in 2007 or 2006. So I cannot
just delete the first lines of my files, because that would cause a problem
for these leap years.
Then, I'd like to calculate my seasonal means over each 3 months (as I
showed you before).
For example, my object "seasonal means" should look like this: Spring 2006:
xx ; Summer 2006: xx, ... (with xx being my seasonal means).

Do you have any idea how to do this? I found functions such as xts() but it
needs a year to be specified, so it can't work in my case. I need to automate
this for all my files, so it shouldn't depend on the start year.
Thanks a lot! 







--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-calculate-seasonal-mean-for-temperatures-tp4638639.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to calculate seasonal mean for temperatures

2012-08-01 Thread jeff6868
Thank you both for your answers.

I found a better way to delete the first 2 months (Jan + Feb) and the last
month (Dec), which should work every time:

DF$year <- as.numeric(format(DF$Day, format = "%Y"))
DF$month <- as.numeric(format(DF$Day, format = "%m"))

# delete first 2 months
# DF[1,3] is the year of the first row (column year)
if (DF[1,3] == 2008) DF <- DF[-(1:60),] else DF <- DF[-(1:59),]
I delete the first 60 days if my file starts with a leap
year (I'll just need to add the other leap years afterwards), and the first
59 days if it doesn't.

# delete last month
DF <- DF[-((nrow(DF)-30):nrow(DF)),]

I then made a mistake with the seasons:
- winter should be months 12,1,2 (so month 12 of the previous year, that's
why I deleted the first 2 months)
- spring: months 3,4,5
- summer: months 6,7,8
- autumn: months 9,10,11

Now my file starts with month 3 of the first year (as I deleted the first 2
months).
So my first year has months 3,4,5,6,7,8,9,10,11,12 and the next year months
1,2,3,4,5,6,7,8,9,10,11 (imagine a file with just 2 years).
I tried to modify your proposal but it doesn't work yet.

Could you help me again with this new version of the seasons?
Thank you very much Ricardogg
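
A sketch of the revised seasons (winter = months 12, 1, 2) in base R, filtering by month and year instead of by row count so that leap years need no special case. It runs on two years of fake data; the names season_of_month, season_year, keep and seasonal_means are made up, and this is not necessarily the solution from the replies.

```r
# Two years of fake daily data; real files would be read from disk instead.
date <- seq(as.Date("2006-01-01"), as.Date("2007-12-31"), by = "day")
DF <- data.frame(date = date, data = seq_along(date) / 100)

month <- as.numeric(format(DF$date, "%m"))
year <- as.numeric(format(DF$date, "%Y"))

# Season lookup by month number: Dec-Feb winter, Mar-May spring, ...
season_of_month <- c("winter", "winter", "spring", "spring", "spring",
                     "summer", "summer", "summer",
                     "autumn", "autumn", "autumn", "winter")
DF$season <- season_of_month[month]

# December counts towards the winter of the following year.
DF$season_year <- ifelse(month == 12, year + 1, year)

# Drop the incomplete winters at both ends (Jan-Feb of the first year and
# December of the last year).
keep <- !(DF$season == "winter" &
          DF$season_year %in% c(min(year), max(year) + 1))
seasonal_means <- aggregate(data ~ season + season_year, data = DF[keep, ],
                            FUN = mean)
```

Since nothing depends on a hard-coded start year, the same lines could be wrapped in a function and applied to each of the 70 files.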







--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-calculate-seasonal-mean-for-temperatures-tp4638639p4638651.html


Re: [R] how to calculate seasonal mean for temperatures

2012-08-01 Thread jeff6868
It's working now!

The problem was not with winter, but with the with() you had in your object
DF$season. I got an error: invalid 'envir' argument.
I removed it and now it seems to be OK.
Thank you very much for your help ricardo.



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-calculate-seasonal-mean-for-temperatures-tp4638639p4638670.html


[R] duplicate data between two data frames according to row names

2012-07-18 Thread jeff6868
Hi everybody.

I'll first explain my problem and what I'm trying to do. 
Consider this example:
I'm working on 5 different weather stations.
First, I have one file containing the data of 3 of these 5 weather stations.
Here's an example of this file:

DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))

And my two other stations are in this other data.frame:

DF2 <- data.frame(station=c("ST002","ST003"),data=c(3,7))

I would like to add the geographical coordinates of these weather stations
to these two data.frames, according to the ID of the weather
station.

The geographical coordinates of all 5 weather stations are
in another data frame:

DF3 <- data.frame(station=c("ST001","ST002","ST003","ST004","ST005"),
                  lat=c(40,41,42,43,44),lon=c(1,2,3,4,5))

My question is: how can I automatically put these geographical coordinates
into my first 2 data frames, according to the ID of the weather
station?

For this example, the first two data frames DF1 and DF2 should become:

DF1 <- data.frame(station=c("ST001","ST004","ST005"),lat=c(40,43,44),lon=c(1,4,5),data=c(5,2,8))
and
DF2 <- data.frame(station=c("ST002","ST003"),lat=c(41,42),lon=c(2,3),data=c(3,7))

I need to automate this because my real dataset contains 70 weather
stations, and each file contains different (or sometimes the same) stations, but
each station can be found in the coordinates file (DF3).

Is there any function able to do this kind of thing?

Thank you very much!




--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845.html


Re: [R] duplicate data between two data frames according to row names

2012-07-18 Thread jeff6868
merge is enough for me, thanks!
I was thinking about a loop, or a function like grep, or maybe another
function.
I'll have to think simpler next time!
Thanks again! 
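
For reference, a minimal sketch of the merge() approach on the example data from the original post. The names `DF1_geo` and `with_coords` are only illustrative, and the column reordering is an assumption about the desired layout.

```r
DF1 <- data.frame(station = c("ST001", "ST004", "ST005"), data = c(5, 2, 8))
DF2 <- data.frame(station = c("ST002", "ST003"), data = c(3, 7))
DF3 <- data.frame(station = c("ST001", "ST002", "ST003", "ST004", "ST005"),
                  lat = c(40, 41, 42, 43, 44), lon = c(1, 2, 3, 4, 5))

# Join on the station ID; only stations present in DF1 are kept
DF1_geo <- merge(DF1, DF3, by = "station")
# Reorder the columns to station, lat, lon, data
DF1_geo <- DF1_geo[, c("station", "lat", "lon", "data")]

# The same join applied to a whole list of data frames
with_coords <- lapply(list(DF1 = DF1, DF2 = DF2), merge, y = DF3, by = "station")
```

By default merge() sorts the result by the key column, so the row order may differ from the input when the stations are not already sorted.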

--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845p4636859.html


[R] using na.locf from package zoo to fill NA gaps

2012-07-02 Thread jeff6868
Hi everybody,

I have a small question about the function "na.locf" from the package "zoo".
I saw in the help that this function is able to fill NA gaps with the last
value before the NA gap (or with the next value).
But is it possible to fill my NA gaps according to the last AND the next
value at the same time?
Actually, I want R to fill my gaps with the method of na.locf only if the
last value before the gap and the next value after the gap are identical.
Here's an example: imagine this small DF:

df <- data.frame(x1=c(1:3,NA,NA,NA,6:9))

In this case, the last value before the NAs (3) and the next value after them
(6) are different, so I don't want R to fill this gap.

But if I have a DF like this:

df2 <- data.frame(x2=c(1:3,NA,NA,NA,3:6))

The last and next values (3) are identical, so in this case I want R to
fill my gap with 3, as the na.locf function would do: 
na.locf(df2)

But as you've understood, I want to do this only if the last and next values are
identical. If they're not, I want to keep my NA gap.

Do you have any idea how I can do this (maybe something to add to na.locf, or
maybe another function better suited to this)?
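
The rule described above (fill a gap only when the value before it and the value after it agree) can be sketched as follows. To keep the example self-contained it uses a small base-R helper, `locf`, as a stand-in for zoo's na.locf with na.rm = FALSE; both `locf` and `fill_if_equal` are hypothetical names, and this is not necessarily the solution that was posted in reply.

```r
# Carry the last non-NA value forward (leading NAs stay NA)
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # 0 seen so far -> NA
}

# Fill a gap only if forward-fill and backward-fill agree
fill_if_equal <- function(x) {
  fwd <- locf(x)                  # last value before each position
  bwd <- rev(locf(rev(x)))        # next value after each position
  ifelse(!is.na(fwd) & !is.na(bwd) & fwd == bwd, fwd, NA)
}

df  <- data.frame(x1 = c(1:3, NA, NA, NA, 6:9))
df2 <- data.frame(x2 = c(1:3, NA, NA, NA, 3:6))

fill_if_equal(df$x1)   # gap kept: 3 and 6 differ
fill_if_equal(df2$x2)  # gap filled with 3
```

With zoo loaded, `fwd` and `bwd` could presumably be computed with na.locf(x, na.rm = FALSE) and na.locf(x, fromLast = TRUE, na.rm = FALSE) instead of the helper.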

Thank you very much!


--
View this message in context: 
http://r.789695.n4.nabble.com/using-na-locf-from-package-zoo-to-fill-NA-gaps-tp4635150.html


Re: [R] using na.locf from package zoo to fill NA gaps

2012-07-02 Thread jeff6868
Seems to work very well!
Thank you very much Gabor!

--
View this message in context: 
http://r.789695.n4.nabble.com/using-na-locf-from-package-zoo-to-fill-NA-gaps-tp4635150p4635160.html


Re: [R] how to ignore NA with NA or NULL

2012-06-06 Thread jeff6868
Hello,

I added your flags to my code but there are still errors.
Actually I tried a few things:

- in function "na.fill", I changed: 
if(all(!is.na(y[1:8700,1])))  return(NA)  to
if(all(!is.finite(y[1:8700,1])))  return(y) 
in order to leave this file unchanged.

This removed my dimension problem. I no longer get errors in:
 refill <- process.all(lst, corhiver2008capt1)
only some warning messages, readable with warnings()

Then I noticed in "refill" (the object which should be filled by my code)
that files containing only NAs are turned into NULL in this object. So I have
0 rows for these objects instead of having them unchanged (35000 rows).
So when I convert it to a data.frame, it doesn't work because of a new
dimension problem due to these NULL files.

But I don't understand where these files are turned into NULL in my
code. Could you maybe tell me how I can get my all-NA files in the output
not as NULL but unchanged, as they were at the beginning?
Thanks again.



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632506.html


Re: [R] how to ignore NA with NA or NULL

2012-06-06 Thread jeff6868
Ok Jeff, but then it'll be a big one. I'm working on a list of files and my
problem depends on different functions used previously. So it's very hard
for me to reduce it to something that reproduces my error. But here is the
reproducible example, with the error at the last line of the code (just copy
and paste it).
You'll notice that the data.frame with only NAs is set to NULL in "refill",
and I just want to have it unchanged in the output (so the same as the input).
The aim of the function is to fill the NAs of my data.frames. It will not work
in this example because there are only big NA gaps, which are my problem for
the moment. But maybe now you can get an idea of where the problem is (changing
NULL for an all-NA DF in the output to the same DF as in the input).
For the example, we are just testing x1.
Hope you understand my problem now :)
Thanks Jeff, Rui or anyone else!

# my data for example
DF1 <- data.frame(x1=rnorm(1:20),x2=c(31:50))
write.table(DF1,"ST001_2008.csv",sep=";")
DF2 <- data.frame(x1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,rnorm(1:10)),x2=c(1:20))
write.table(DF2,"ST002_2008.csv",sep=";")
DF3 <- data.frame(x1=rnorm(81:100),x2=NA)
write.table(DF3,"ST003_2008.csv",sep=";")
DF4 <- data.frame(x1=c(21:40),x2=rnorm(1:20))
write.table(DF4,"ST004_2008.csv",sep=";")

#list my data
filenames <- list.files(pattern="\\_2008.csv$")

Sensors <- paste("x", 1:2,sep="")

Stations <- substr(filenames,1,5)

nsensors <- length(Sensors)
nstations <- length(Stations)

nobs <- nrow(read.table(filenames[1], header=TRUE))

yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))

for(i in seq_len(nstations)){
tmp <- read.table(filenames[i], header=TRUE, sep=";")
yr2008[ , , i] <- as.matrix(tmp[, Sensors])
}

dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)

yr2008capt1hiver <- yr2008[1:10,1,]
yr2008capt1hiver <- as.data.frame(yr2008capt1hiver)

#correlation between my data for x1 (for the example)
corhiver2008capt1 <- cor(yr2008capt1hiver,use="pairwise.complete.obs")

capt1hiver <- c(1:length(yr2008capt1hiver))

for(i in 1:length(capt1hiver))
{
if(sum(!is.na(yr2008capt1hiver[,capt1hiver[i]])) < (length(yr2008capt1hiver[[capt1hiver[i]]])/2))
{
 corhiver2008capt1[i,]=NA
 corhiver2008capt1[,i]=NA
  }
}


lst <- lapply(list.files(pattern="\\_2008.csv$"), read.table,sep=";",
header=TRUE, stringsAsFactors=FALSE)
names(lst) <- Stations

# searching the highest correlation for each data.frame
get.max.cor <- function(station, mat){
 mat[row(mat) == col(mat)] <- -Inf
 m <- max(mat[station, ],na.rm=TRUE)
 if (is.finite(m)) {return(which( mat[station, ] == m ))}
 else {return(NA)}
}

# fill the data.frame with the data.frame which has the highest
# correlation coefficient
na.fill <- function(x, y){
 if(all(!is.finite(y[1:10,1])))  return(y)
 i <- is.na(x[1:10,1])
 xx <- y[1:10,1]
 new <- data.frame(xx=xx)
 x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),new)[i]
 x
}

process.all <- function(df.list, mat){

f <- function(station)
 na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

g <- function(station){
x <- df.list[[station]]
if(any(!is.finite(x[1:10,1]))){
mat[row(mat) == col(mat)] <- -Inf
nas <- which(is.na(x[1:10,1]))
ord <- order(mat[station, ], decreasing = TRUE)[-c(1,
ncol(mat))]
for(y in ord){
if(all(!is.na(df.list[[y]][1:10,1][nas]))){
xx <- df.list[[y]][1:10,1]
new <- data.frame(xx=xx)
x[1:10,1][nas] <- predict(lm(x[1:10,1]~xx,
na.action=na.exclude), new)[nas]
break
}
}
}
x
}

n <- length(df.list)
nms <- names(df.list)
max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
df.list <- lapply(seq.int(n), f)
df.list <- lapply(seq.int(n), g)
names(df.list) <- nms
df.list
}

refill <- process.all(lst, corhiver2008capt1)
refill <- as.data.frame(refill) 
 
## HERE IS THE PROBLEM ##
head(refill)

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632527.html


Re: [R] how to ignore NA with NA or NULL

2012-06-06 Thread jeff6868
Thanks again for your help, Jeff.
Sorry if I'm not very clear. It's hard to explain in programming terms,
and even harder in English as I'm French.
But I'll try again.

Well, your proposal removes the error, but it's not the result I'm
expecting. You've removed the NULL data.frames, but I need to keep them, or
rather to turn them into something non-NULL.

I'll try to show you with a very small, fake example what I want the results
to be.
Imagine these are my 3 input data frames (10 rows each):
ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,NA,NA,8:10))
ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

The aim of my code is to fill all the NAs of my data.frames with data,
according to the correlation coefficients of my data.frames (for example, if
there are NAs in ST1, ST1 must be filled with data from the file best
correlated with ST1 (ST2 or ST3 in this example)).

As ST3 has no data, I cannot get any correlation coefficient for it. So the
NAs in ST3 cannot be filled, and ST3 cannot be used to fill another file
either. So ST3 is of no use, if you like. Nevertheless I want to keep ST3
unchanged throughout my code.
For the moment my code gives this for "refill" (with the NAs in my
data.frames filled):

ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,6,7,8:10))
ST3 <- NULL

But actually, I want this as the result in "refill": 

ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,6,7,8:10))
ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

So for data.frames with only NAs, I don't want them to be NULL in "refill";
I want them to be identical to the input. I need this so that the data.frames
have the same dimensions in the inputs and the outputs.
If they are set to NULL (as they are for the moment, though I don't understand
why, and I want to change this), there will be 0 rows in this data.frame
instead of 10 rows like the other data.frames. 

So I think there's something wrong in my code, in the function "process.all" or
"na.fill", or maybe in "lst".
We don't seem to be far from the solution but I still haven't found it.
For information, in the functions "process.all" and "na.fill": "x" is the
data.frame I want to fill, and "y" is the file which will be used to fill x
(so the best correlated file with x).
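
As an illustration of the desired behaviour (an all-NA data.frame comes back unchanged instead of NULL), here is a minimal sketch of a guarded fill function. `na.fill.safe` is a hypothetical name, not the function from this thread, and the explanation of where the NULL comes from is an assumption: if the index of the best correlated station is NA, indexing a list with `[[NA]]` yields NULL, and returning `y` in that case returns that NULL.

```r
na.fill.safe <- function(x, y) {
  # Hypothetical guard: with no donor, or no data on either side,
  # there is nothing to fit, so hand x back untouched (not y, not NULL)
  if (is.null(y) || all(is.na(x[, 1])) || all(is.na(y[, 1]))) return(x)
  i  <- is.na(x[, 1])                    # positions to fill
  xx <- y[, 1]                           # donor series
  fit <- lm(x[, 1] ~ xx, na.action = na.exclude)
  x[i, 1] <- predict(fit, data.frame(xx = xx))[i]
  x
}

ST1 <- data.frame(x1 = 1:10)
ST2 <- data.frame(x2 = c(1:5, NA, NA, 8:10))
ST3 <- data.frame(x3 = rep(NA_real_, 10))

filled <- na.fill.safe(ST2, ST1)   # gaps in ST2 predicted from ST1
kept   <- na.fill.safe(ST3, NULL)  # all-NA input, missing donor: unchanged
```

The important detail is that every early exit returns `x`, so the output list keeps the same number of rows per station as the input list.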

I really hope I've been clear and understandable enough this time.
Thank you!



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632546.html


Re: [R] how to ignore NA with NA or NULL

2012-06-05 Thread jeff6868
Thanks again, but my errors are still here. Maybe they are coming from the next
function (I combine these 2 functions, but I thought the error was coming from
the first one):

process.all <- function(df.list, mat){

f <- function(station)
 na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
 
g <- function(station){
x <- df.list[[station]]
if(any(is.na(x[1:8700,1]))){
mat[row(mat) == col(mat)] <- -Inf
nas <- which(is.na(x[1:8700,1]))
ord <- order(mat[station, ], decreasing = TRUE)[-c(1,
ncol(mat))]
for(y in ord){   
if(all(!is.na(df.list[[y]][1:8700,1][nas]))){
xx <- df.list[[y]][1:8700,1]
new <- data.frame(xx=xx)
x[1:8700,1][nas] <- predict(lm(x[1:8700,1]~xx,
na.action=na.exclude), new)[nas]
break
}
}
}
x
} 

n <- length(df.list)
nms <- names(df.list)
max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
df.list <- lapply(seq.int(n), f)
df.list <- lapply(seq.int(n), g)
names(df.list) <- nms
df.list
}

refill <- process.all(lst, corhiver2008capt1)
refill <- as.data.frame(refill)

The error occurs when "refill" is created. It applies "process.all", in which
"na.fill" is also used. Do you perhaps see any error or missing code which
could create this NA problem when I introduce all-NA files?

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632388.html


[R] how to ignore NA with NA or NULL

2012-06-04 Thread jeff6868
Hello dear R-users,

I have a problem in my code with ignoring NA values without removing them.
I'm working on a list of files. The aim is to fill one file from another
according to the highest correlation (correlation coefficient between all my
files, i.e. using the file most similar to the one I want to fill).
When I have just small gaps of NA, my function works well.
The problem is when I have only NAs in some files. As a consequence, it
cannot calculate any correlation coefficients (in the case of only NAs in the
file, my previous function returns NA for the correlation coefficient),
and so it cannot fill the file or make any calculation with it.

Nevertheless, in my work I need to keep these NA files in my list (and so to
keep their dimensions). Otherwise it creates dimension problems, and
my function needs to be automatic for every file.

So my question in this post is: how can I ignore (or do nothing with, if
you prefer) NA files with NA correlation coefficients?
The function for filling files (where the problem is) is:

na.fill <- function(x, y){
i <- is.na(x[1:8700,1])
xx <- y[1:8700,1] 
new <- data.frame(xx=xx)   
x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude),
new)[i]
x
}

My error message is: Error in model.frame.default(formula = x[1:8700, 1] ~
xx, na.action = na.exclude,  :  invalid type (NULL) for variable 'xx'

I tried adding to the function:  
ifelse( all(is.null(xx))==TRUE,return(NA),xx)  or
ifelse( all(is.null(xx))==TRUE,return(NULL),xx)

but it still doesn't work.
How can I write that in my function? With NA, NULL or in another way?
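
A small sketch of why the ifelse() attempts above are awkward, and of a plain test that could be used instead. `has_no_data` is a hypothetical helper name. The point is that ifelse() returns a value shaped like its test (here a single element), whereas if() is the usual construct for an early return, and that NULL and "all NA" are two different situations that both need to be caught.

```r
xx_null <- NULL               # e.g. the donor file was not found
xx_na   <- rep(NA_real_, 5)   # donor exists but holds no data

is.null(xx_null)   # TRUE
is.null(xx_na)     # FALSE: a vector of NAs is not NULL
all(is.na(xx_na))  # TRUE

# ifelse() is vectorised over its test, so a length-1 test gives a
# length-1 result: only the first element of the "yes" value is kept
shape_demo <- ifelse(TRUE, c(10, 20, 30), 0)

# Hypothetical helper covering both cases, meant for use as
#   if (has_no_data(xx)) return(x)
has_no_data <- function(v) is.null(v) || all(is.na(v))
```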
Thank you very much for your answers


--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287.html


Re: [R] how to ignore NA with NA or NULL

2012-06-04 Thread jeff6868
Thanks for answering, Jeff.
Yes, sorry, it's not easy to explain my problem. I'll try to give you a
reproducible example (even if it's not exactly like my files), and I'll
try to explain my function and what I want to do more precisely.

Imagine for the example that df1, df2 and df3 are my files:
df1 <- data.frame(x1=c(rnorm(1:5),NA,NA,rnorm(8:10)))
df2 <- data.frame(x2=rnorm(1:10))
df3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df <- list(df1,df2,df3)

I want to fill all the NA gaps in my files. If I have only df1 and df2 in my
list, it works. If I introduce df3 (a file with only NAs), R doesn't
know what to do.

In my function:

na.fill <- function(x, y){
i <- is.na(x[1:10,1])
xx <- y[1:10,1]
new <- data.frame(xx=xx)
x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),
new)[i]
x
}

"x" is the file I want to fill. So "i" marks all the NA gaps in the file.
"xx" is the file that will be used to fill x (actually the file best correlated
with x among all my files).
Then I fit a linear regression between my 2 files, "x" and "xx", to
take predicted values from xx and put them in the gaps of x.

Before I got files containing only NAs, it was working well. But since I
introduced some files with no data, and so only NAs, I have my problem.
I got different NA problems when I tried a few solutions:
Error in model.frame.default(formula = x[1:8700,1] ~xx, na.action =
na.exclude,  :  invalid type (NULL) for variable 'xx' OR
0 (non-NA) cases OR
is.na() applied to non-(list or vector) of type 'NULL'

Actually I'm looking for a solution in na.fill to avoid these problems, in
order to exclude these all-NA files from the calculation (maybe something
like na.pass) while keeping them in the list. So the aim would be
to keep them unchanged (if I have for example an ST1 file with 30 NAs only
as input, I want an ST1 file with 30 NAs only as output), but the calculation
should still run with these kinds of files in my list, even if the code does
nothing with them.

Hope you've understood. Thanks again for your help.

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632314.html


Re: [R] how to ignore NA with NA or NULL

2012-06-04 Thread jeff6868
Hello Rui,

Sorry, I read your post after answering Jeff.

It does indeed seem to be better than ifelse, thanks. But I still have some
errors:
Error in x[1:8700, 1] : incorrect number of dimensions AND
In is.na(xx) : is.na() applied to non-(list or vector) of type 'NULL'

It seems to have modified the length of my data, due to these NAs

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632315.html


[R] ignore NA column in a DF (for calculation) without removing them

2012-05-31 Thread jeff6868
Dear users,

For the moment I have a function which looks for the best correlation for
each file in my correlation matrix. I'm working on a list.files.
Here's the function:

get.max.cor <- function(station, mat){
mat[row(mat) == col(mat)] <- -Inf
which( mat[station, ] == max(mat[station, ],na.rm=TRUE) )
 }

If I have a correlation matrix like this (no NA value):

cor1 <- read.table(text="
ST208 ST209 ST210 ST211 ST212
ST208 1.000 0.8646358 0.8104837 0.8899451 0.7486417
ST209 0.8646358 1.000 0.9335584 0.8392696 0.8676857
ST210 0.8104837 0.9335584 1.000 0.8304132 0.9141465
ST211 0.8899451 0.8392696 0.8304132 1.000 0.8064669
ST212 0.7486417 0.8676857 0.9141465 0.8064669 1.000
", header=TRUE)

It works perfectly. If I have a correlation matrix with some NAs (but not
only NAs) like this:

cor2 <- read.table(text="
ST208 ST209 ST210 ST211 ST212
ST208 1.000 NA 0.9666491 0.9573701 0.9233598
ST209 NA 1.000 0.9744054 0.9577192 0.9346706
ST210 0.9666491 0.9744054 1.000 0.9460145 0.9582683
ST211 0.9573701 0.9577192 0.9460145 1.000 NA
ST212 0.9233598 0.9346706 0.9582683 NA 1.000
", header=TRUE)

It still works thanks to na.rm=TRUE, but when I have one file with no data,
and so only NAs in the column like this:
cor3 <- read.table(text="
ST208 ST209 ST210 ST211 ST212
ST208 1.000 NA 0.8104837 0.8899451 0.7486417
ST209 NA NA NA NA NA
ST210 0.8104837 NA 1.000 0.8304132 0.9141465
ST211 0.8899451 NA 0.8304132 1.000 0.8064669
ST212 0.7486417 NA 0.9141465 0.8064669 1.000
", header=TRUE)

It doesn't work of course, because there's no non-NA value and so no max
correlation for this file.
That's why I get this error: 0 (non-NA) cases.
I tried to remove the NA columns, but as I'm working on a list.files, the
number of files in the list and in the matrix would no longer be the same. I
searched the web but I only found topics about removing NA columns.
In my case, I would like to ignore these NA columns without removing them.

I would like to say to R: when you are looking for the highest correlation
for each file in the correlation matrix, if you see a file with no
correlation coefficient (an all-NA column), don't do anything with it, keep
it as it is and go to the next file (next column or row).
I also tried to put else {NA} or else {NULL} to avoid this problem but it
still doesn't work.
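
The "skip the station if its row holds no coefficient" rule described above can be sketched like this. `get.max.cor.safe` is a hypothetical variant of the function in this post, and the matrix is a rounded copy of cor3; returning NA for an all-NA station (and only the first index in case of ties) is an assumption about what the rest of the script expects.

```r
st <- c("ST208", "ST209", "ST210", "ST211", "ST212")
cor3 <- matrix(c(1,    NA, 0.81, 0.89, 0.75,
                 NA,   NA, NA,   NA,   NA,
                 0.81, NA, 1,    0.83, 0.91,
                 0.89, NA, 0.83, 1,    0.81,
                 0.75, NA, 0.91, 0.81, 1),
               nrow = 5, byrow = TRUE, dimnames = list(st, st))

get.max.cor.safe <- function(station, mat) {
  mat[row(mat) == col(mat)] <- -Inf         # never pick the station itself
  m <- max(mat[station, ], na.rm = TRUE)
  if (!is.finite(m)) return(NA_integer_)    # no usable coefficient: skip
  unname(which(mat[station, ] == m)[1])     # index of the best partner
}

best <- sapply(seq_len(nrow(cor3)), get.max.cor.safe, mat = cor3)
best
```

Because every call returns exactly one value, sapply() gives back a plain vector with one entry per station, NA included, so the result stays aligned with the list of files.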

Does somebody have an idea how to solve this problem?
Thank you very much.

Best regards
Geoffrey




--
View this message in context: 
http://r.789695.n4.nabble.com/ignore-NA-column-in-a-DF-for-calculation-without-removing-them-tp4631912.html


[R] correlation matrix only if enough non-NA values

2012-05-29 Thread jeff6868
Hi everybody.

I'm trying to compute a correlation matrix from a list of files. Each file
contains 2 columns: "capt1" and "capt2". For the example, I merged them all
in one data.frame. My data also contains many missing values. The aim is to
compute a correlation matrix for the same kind of data, of course (one
correlation matrix for capt1 and another for capt2).
For the moment, I have a correlation matrix which works (for capt1 or
capt2). But the correlation coefficients of this matrix are calculated
whatever the number of missing values per column.
What I want is exactly the same correlation matrix, but only
with coefficients calculated when at least half of the data in the
column is non-missing (in the example, at least 5 non-NA values out of 10).

table <- data.frame(ST1_capt1=rnorm(1:10),ST1_capt2=c(1,2,3,4,NA,NA,7:9,NA),
  ST2_capt1=c(NA,NA,NA,NA,NA,6:10),ST2_capt2=c(21,NA,NA,NA,25:30),
  ST3_capt1=c(1,NA,NA,4:10),ST3_capt2=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

cormatrix <- cor(table[,c(1,3,5)],use="pairwise.complete.obs")

To solve this problem, I think it would be useful to use code like this
before calculating the correlation matrix:

if(sum(!is.na(table[1:10,])) >= 5) then calculate the correlation
coefficient, and else (if fewer than 5 non-NA values) put NA in the
correlation matrix.
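
One way to express the rule above is to compute the full matrix first and then blank out the rows and columns of stations with too few values. This is only a sketch: the data frame `tab` and the threshold name `min_obs` are hypothetical, and ST2 is given just 4 values here so that the masking is visible.

```r
set.seed(1)
tab <- data.frame(ST1_capt1 = rnorm(10),
                  ST2_capt1 = c(NA, NA, NA, NA, NA, NA, 7:10),  # 4 values
                  ST3_capt1 = c(1, NA, NA, 4:10))               # 8 values

min_obs <- 5
enough <- colSums(!is.na(tab)) >= min_obs   # TRUE for usable columns

cormatrix <- cor(tab, use = "pairwise.complete.obs")
cormatrix[!enough, ] <- NA                  # blank the station's row ...
cormatrix[, !enough] <- NA                  # ... and its column
cormatrix
```

The matrix keeps its full dimensions, so it still lines up with the list of files; only the coefficients based on too little data are replaced by NA.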

I'm trying to combine all this but it doesn't work. Could somebody
help me to do this?
Many thanks!



--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-only-if-enough-non-NA-values-tp4631666.html


Re: [R] select part of files from a list.files

2012-05-24 Thread jeff6868
Hi again Joshua.

I tried your function. I think it's what I need. It works well on the small
example from my first post. But I have difficulty adapting it to my data.
I'll try to give you another fake example with my real script and kind of
data (you can just copy and paste it to try):

ST1 <- data.frame(sensor1=rnorm(1:10),sensor2=c(NA,NA,NA,NA,NA,rnorm(6:10)),sensor3=c(1,NA,NA,4:10),sensor4=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),date_time=(date()))
write.table(ST1,"ST1_2012.csv",sep=";",quote=F, row.names = TRUE)
ST2 <- data.frame(sensor1=c(NA,NA,NA,NA,NA,6:10),sensor2=rnorm(1:10),sensor3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),sensor4=c(1,NA,NA,4:10),date_time=(date()))
write.table(ST2,"ST2_2012.csv",sep=";",quote=F, row.names = TRUE)
ST3 <- data.frame(sensor1=c(1,NA,NA,4:10),sensor2=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),sensor3=rnorm(1:10),sensor4=c(NA,NA,NA,NA,NA,6:10),date_time=(date()))
write.table(ST3,"ST3_2012.csv",sep=";",quote=F, row.names = TRUE)
ST4 <- data.frame(sensor1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),sensor2=c(1,NA,NA,4:10),sensor3=c(NA,NA,NA,NA,NA,6:10),sensor4=rnorm(1:10),date_time=(date()))
write.table(ST4,"ST4_2012.csv",sep=";",quote=F, row.names = TRUE)

filenames <- list.files(pattern="\\_2012.csv$")

Sensors <- paste("sensor", 1:4,sep="")

Stations <- substr(filenames,1,3)

nsensors <- length(Sensors)
nstations <- length(Stations)

nobs <- nrow(read.table(filenames[1], header=TRUE,sep=";"))

yr2008 <- array(NA,dim=c(nobs, nsensors, nstations))

for(i in seq_len(nstations)){
tmp <- read.table(filenames[i], header=TRUE, sep=";")
yr2008[ , , i] <- as.matrix(tmp[, Sensors])
}

dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)

cor1_5 <- lapply(Sensors, function(s) cor(yr2008[1:5, s,
],use="pairwise.complete.obs"))
names(cor1_5) <- Sensors
cor1_5

For the moment, it computes correlations between the same sensors of each file
(only for a part of my data), whatever the number of NAs or numeric values.
I want it to do the same, but with your function: 
if (sum(!is.na(data[rows, ])) >= minpresent){
data
  } else {NULL}
} 

I want it to give me the same correlation matrices for each sensor between
my files, but I want it to calculate the correlation coefficient only if I
have at least 3 numeric values (out of 5 in the example), and not whatever
the number of these numeric values (just 1 or 2 for example). If there are
fewer than 3 numeric values (1 or 2), give NA for the correlation in the matrix.
And if there are only NAs in the sensor data, do nothing with it (keep it and
go to the next sensor).
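
A possible reading of this requirement is a threshold on the number of rows where both stations have a value. The sketch below counts those pairs with crossprod() on the non-NA indicator matrix and masks coefficients based on fewer than 3 pairs. The matrix `m` (5 rows of one sensor across 4 stations) and the name `min_pairs` are made up for illustration; this is not the function from the reply being discussed.

```r
# Hypothetical slice: rows 1:5 of one sensor, one column per station
m <- cbind(ST1 = c(1.0, 2.5, 2.0, 4.0, 3.5),
           ST2 = c(NA,  NA,  NA,  1.0, 2.0),   # only 2 values
           ST3 = c(2.0, NA,  1.0, 3.0, 5.0),   # 4 values
           ST4 = rep(NA_real_, 5))             # no data at all

min_pairs <- 3

# Entry [i, j] = number of rows where station i and station j both have data
n_pairs <- crossprod(!is.na(m))

cc <- cor(m, use = "pairwise.complete.obs")
cc[n_pairs < min_pairs] <- NA   # too few common values: no coefficient
cc
```

The all-NA station simply ends up with an NA row and column, so nothing has to be removed and the matrix keeps one row per file.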

I tried to combine your function with mine but it doesn't work. Hope
you've understood. Thanks for your help!




--
View this message in context: 
http://r.789695.n4.nabble.com/select-part-of-files-from-a-list-files-tp4630769p4631185.html


[R] Prevent calculation when only NA

2012-05-21 Thread jeff6868
Hi everybody,

I have a small question about R.
I'm computing some correlation matrices between my files. Each of these files
contains 4 columns of data.
These data files contain missing data too. It can sometimes happen that
in one file, one of the 4 columns contains only missing data (NA). As I'm
computing correlations between the same columns of each file, I get a
correlation matrix with a column containing only NAs, like this:

        file1  file2  file3
file1     1     NA     0.8
file2    NA      1     NA
file3   0.8     NA      1

For file2, I have no correlation coefficient. 
My function looks for the highest correlation coefficient for each
file. But I get an error message because of this.
My question is: how can I tell the function not to do any calculation if
it sees only NAs for the file it is working on? The aim of this function is
to automate this calculation for 300 files.
I tried adding na.rm=TRUE, but it still wants to do the calculation for
the file containing only NAs (error: 0 (non-NA) cases).
Could you tell me what I should add to my function? Thanks a lot!

get.max.cor <- function(station, mat){
mat[row(mat) == col(mat)] <- -Inf
which( mat[station, ] == max(mat[station, ], na.rm=TRUE) )
 }





--
View this message in context: 
http://r.789695.n4.nabble.com/Prevent-calculation-when-only-NA-tp4630716.html


Re: [R] Prevent calculation when only NA

2012-05-21 Thread jeff6868
Hi Jim,

Thanks for your answer.
I tried your suggestion. The idea seems good but I still get my
error.
Actually, the error is in the next function, which uses the function
"get.max.cor" I mentioned before.
I also tried these 2 functions with data containing no missing values, and it
works well.
But I think that the next function does the calculation by column (it
seems to read each column). 
Do you think it's possible to add something to the function "get.max.cor"
which stops the calculation for a file if there are only NAs in the
correlation matrix for this file, instead of removing the NAs?
For example: if there are only NAs in file2, don't try to do any calculation
with file2 and go to file3 (and so on)?
I think that this is the problem, because even if I remove the NAs, it still
wants to do a calculation. But as there are no numeric values, it gives an
error.


--
View this message in context: 
http://r.789695.n4.nabble.com/Prevent-calculation-when-only-NA-tp4630716p4630722.html


Re: [R] Prevent calculation when only NA

2012-05-21 Thread jeff6868
Hello Rui,

Thanks for your answer too.
I tried your suggestion too, but when I give the value 0 for this file, it
still wants to do a calculation with it. As it is looking for the best
correlation, and then the 2nd best correlation, giving only 0 seems to be a
problem for the 2nd best correlation at least.
Maybe the best way to solve the problem would be to add to the
get.max.cor function a line which deletes all the columns containing
only NAs in my correlation matrix?
For example, if my calculated correlation matrix is (imagine that the numeric
values are correlation coefficients for the example):

x <- data.frame(a = 1:10, b = c(1:5,NA,7:9, NA), c = 21:30, d = NA)

Maybe it is possible in my function to delete only the columns containing
100% NAs, in order to get a matrix like this:

x <- data.frame(a = 1:10, b = c(1:5,NA,7:9, NA), c = 21:30)

and to keep the other columns even if they contain some NAs (the calculation
is still possible as there are numeric coefficients in the column).
Actually, it cannot look for the best or the second best correlation
coefficient in a column if it contains only NAs.
I think that a correlation matrix like this would allow the calculation for
the next function and the rest of my script.
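
A minimal sketch of the idea above, in base R. It is not part of get.max.cor (which is not shown in this thread); the names keep, x_clean, m, ok and m_clean are made up for the illustration. It drops the columns that contain only NAs and, for a square correlation matrix, drops the matching rows as well so that the matrix stays square.

```r
x <- data.frame(a = 1:10, b = c(1:5, NA, 7:9, NA), c = 21:30, d = NA)

# Keep only the columns that contain at least one non-NA value
keep <- colSums(!is.na(x)) > 0
x_clean <- x[, keep, drop = FALSE]
names(x_clean)

# For a square correlation matrix, drop the same rows and columns
# (made-up coefficients; file f3 has only NAs)
m <- matrix(c(1,   0.9, NA,
              0.9, 1,   NA,
              NA,  NA,  NA), nrow = 3, byrow = TRUE,
            dimnames = list(c("f1", "f2", "f3"), c("f1", "f2", "f3")))
ok <- colSums(!is.na(m)) > 0
m_clean <- m[ok, ok, drop = FALSE]
m_clean
```

Note that any list of files used together with the matrix would have to be subset with the same ok vector, otherwise the dimensions no longer match.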

--
View this message in context: 
http://r.789695.n4.nabble.com/Prevent-calculation-when-only-NA-tp4630716p4630731.html


Re: [R] Prevent calculation when only NA

2012-05-21 Thread jeff6868
I tried your function. It works great, thanks. I then used diag() in order to
set the value 1 on the whole diagonal of my matrix. But it still doesn't
work, it's crazy.
When I delete the columns and rows (and so some files) containing only NAs in
the correlation matrix, applying the function fails, because I'm
working on a list of files.
After deleting files from the correlation matrix, the function cannot be
applied to the list.files (the dimensions differ if I delete some files
from the correlation matrix). And as I don't know before the calculation which
files are going to contain these NA columns and rows, I have to do it
another way.
I think I should first select for my list (and for the
correlation) the files which contain at least, for example, 1000 numeric
values in a certain range of rows in order to calculate my correlations. But
I'll post it in another topic.


--
View this message in context: 
http://r.789695.n4.nabble.com/Prevent-calculation-when-only-NA-tp4630716p4630752.html


[R] select part of files from a list.files

2012-05-21 Thread jeff6868
Hi everyone.

I'm working on a list of files (about 50 files). I've listed them with
the list.files function.
Each of my files contains 35000 lines of data. These files may also contain
some missing values (NA), sometimes up to 10 000 NAs in a row.
The aim is to compute correlation matrices between these files (I already
have the script). But as I often have missing values, the script doesn't
work yet for all my files.

In this topic, I would like to select a part of the data of these files
before the correlation.
From the list of files I've created, I would like to select only the first
9000 lines of each file: myfiles[1:9000,1], and then keep in my list only
the files which contain at least 1000 non-NA lines (so numeric data) within
these 9000 lines.

I would then like to apply my script to this list of files which contain at
least 1000 numeric values in the first 9000 lines of the data.

I've created simple data.frames for the example, if someone could explain to
me how I can do this easily (at least 2 non-NA values in the first 5 lines,
for example, for these fake data.frames).
Thank you very much!

ST1 <- data.frame(a=1:10)
ST2 <- data.frame(b=c(NA,NA,NA,NA,NA,6:10))
ST3 <- data.frame(c=c(1,NA,NA,4:10))
ST4 <- data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
ST5 <- data.frame(e=c(1,2,3,4,NA,NA,7:9,NA))

(In this example, the aim is to keep in the list only ST1, ST3 and
ST5, because they all contain at least 2 non-NA values in the first 5 lines,
and to remove ST2 and ST4 from the list because they both contain
too many NAs in the first 5 lines.) I hope that's clear! Thanks again!
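
One possible way to do the selection described above, as a sketch only. It assumes the files have already been read into a named list of data.frames with the data in the first column; the names stations, n_first, min_valid, enough and selected are made up for the example.

```r
ST1 <- data.frame(a = 1:10)
ST2 <- data.frame(b = c(NA, NA, NA, NA, NA, 6:10))
ST3 <- data.frame(c = c(1, NA, NA, 4:10))
ST4 <- data.frame(d = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))
ST5 <- data.frame(e = c(1, 2, 3, 4, NA, NA, 7:9, NA))

stations <- list(ST1 = ST1, ST2 = ST2, ST3 = ST3, ST4 = ST4, ST5 = ST5)

n_first   <- 5   # how many leading rows to inspect (9000 for the real data)
min_valid <- 2   # minimum number of non-NA values required (1000 for the real data)

# TRUE for each data.frame with enough non-NA values in its first rows
enough <- sapply(stations, function(df) {
  sum(!is.na(head(df[[1]], n_first))) >= min_valid
})

selected <- stations[enough]
names(selected)
```

The logical vector enough can also be used to subset the vector of file names returned by list.files, so that the list and the file names stay aligned.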




--
View this message in context: 
http://r.789695.n4.nabble.com/select-part-of-files-from-a-list-files-tp4630769.html


Re: [R] select part of files from a list.files

2012-05-21 Thread jeff6868
Hi Joshua,

Thank you for your answer. I have to leave work now, but I'll try your
suggestion tomorrow and I'll tell you if it works for me.
Good evening

--
View this message in context: 
http://r.789695.n4.nabble.com/select-part-of-files-from-a-list-files-tp4630769p4630777.html


[R] stop calculation in a function

2012-05-10 Thread jeff6868
Hi dear R-users,

I have a question about a function I'm trying to improve.
How can I stop the calculation in the function at the last numeric value of
my data?
The problem is that the end of my data contains missing values (NAs), and
the aim of my function is to compare the first numeric value with the next
one (until the end). For the moment, it works well when my data doesn't
contain any NAs at the end of the file. I think that the problem is that, as I
have NAs at the end of my data, R tries to compare my last numeric value
with a next numeric value which doesn't exist, and so tries to modify the
length of my data (the error message says that the output does not have the
same length as the input).
Could somebody tell me what I should modify or add in my function in order
to fix this problem?
Here's the function. Thanks for your advice!

out2NA <- function(x,seuil){
    st1 = NULL
    # Temporary variable storing the last correct numeric value
    temp <- st1[1] <- x[1]
    ind_temp <- 1
    # Max time gap between two comparisons
    ecart_temps <- 10
    tps <- time(x)

    for (i in 2:length(x)){
        if((!is.na(x[i]))){
            if((tps[i]-tps[ind_temp] < ecart_temps) && (abs(x[i]-temp) > seuil)){
                #(abs(x[i+1]-x[i])>1)){
                st1[i] <- NA
            }
            else {
                temp <- st1[i] <- x[i]
                ind_temp <- i
            }
        }
    }
    return(st1)
}

dat1 <- myts[,2]
myts[,2] <- apply(dat1,2,function(x) out2NA(x,2))
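
A hedged sketch of one way around the length problem, not necessarily what was proposed in the replies: the result vector is preallocated with NA, so trailing NAs in the input simply stay NA and the output always has the same length as the input. The function name out2NA_sketch, the tps argument (defaulting to the row index) and the handling of a leading NA are assumptions made for this illustration.

```r
# Hypothetical variant of out2NA with a preallocated result vector
out2NA_sketch <- function(x, seuil, ecart_temps = 10, tps = seq_along(x)) {
  st1 <- rep(NA_real_, length(x))          # same length as x from the start
  first <- which(!is.na(x))[1]             # first numeric value
  if (is.na(first)) return(st1)            # only NAs: nothing to compare
  temp <- st1[first] <- x[first]
  ind_temp <- first
  if (first < length(x)) {
    for (i in (first + 1):length(x)) {
      if (is.na(x[i])) next                # gaps are left as NA
      close_in_time <- (tps[i] - tps[ind_temp]) < ecart_temps
      if (close_in_time && abs(x[i] - temp) > seuil) {
        st1[i] <- NA                       # jump too large: flag as outlier
      } else {
        temp <- st1[i] <- x[i]             # accept value as new reference
        ind_temp <- i
      }
    }
  }
  st1
}

x <- c(1, 2, 50, 3, NA, NA, 4, NA, NA)
res <- out2NA_sketch(x, seuil = 2)
res
```

Because the vector is filled with NA rather than 0, there is no need to convert zeros back to NA afterwards, which would also erase genuine 0 values.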

--
View this message in context: 
http://r.789695.n4.nabble.com/stop-calculation-in-a-function-tp4622964.html


Re: [R] stop calculation in a function

2012-05-10 Thread jeff6868
Thank you for your reply, Sarah.
Well, actually I'm not trying to access x[i+1]. The line where you saw it
starts with #. It was just an attempt I wanted to keep (sorry, I should have
removed it before posting).

But I do ask it to move on to the next value if the conditions in the loop
are not met (restart the comparison from the next value). It works well as
long as I have numeric values in my data. But if my data ends with NAs, I have
this problem.
That's why I'm trying to make it stop the calculation in the loop at the
last numeric value to avoid this error (I don't know if it's the best way to
solve it, but it's the main idea, I think).
Do you have any other ideas about this?
Thanks a lot!




--
View this message in context: 
http://r.789695.n4.nabble.com/stop-calculation-in-a-function-tp4622964p4623259.html


Re: [R] stop calculation in a function

2012-05-10 Thread jeff6868
Thanks for your answer too, Berend.
Yes, you're right about x[i+1]. You answered just before me.
Your idea of declaring everything as numeric is great. It avoids my problem.
But actually I also have small gaps of missing data in the rest of my data
(in the middle of the numeric values).
And one of the aims of my function is to avoid comparing 2 numeric
values which are separated by a long period of time (with NAs in between), in
order, for example, not to compare a value from 1 January with the next
numeric value from 1 April.
I'm trying to combine both. For the moment, it works only for data which
doesn't end with NAs, as you've understood. With numeric() for st1, the
problem of NAs at the end is solved, but it creates a new problem with the
other NAs (which were OK before). Do you understand better what I'm trying to
do?
If you have another idea, it will be welcome.
Thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/stop-calculation-in-a-function-tp4622964p4623391.html


Re: [R] stop calculation in a function

2012-05-10 Thread jeff6868
I tried your suggestion, Sarah (I was replying to Berend when you posted
your answer).
Well, it seems to work!
I just had to add a line afterwards to get my NAs back.
I converted the values equal to 0 to NA (numeric() in the function did the
opposite for the calculation):

mydata[mydata==0] <- NA

At first it was working for this kind of data, with NAs just in the middle of
my data:
test <-
data.frame(c(1,2,3,4,NA,NA,7,8,9,10),c(11,12,NA,14,15,16,17,NA,19,20))
colnames(test) <- c("data1","data2")

but not for data with NAs at the beginning, in the middle and at the end:

test2 <-
data.frame(c(NA,2,3,4,NA,NA,NA,NA,NA,NA),c(NA,12,13,NA,15,16,17,NA,NA,NA))
colnames(test2) <- c("data3","data4")

But thanks to your suggestion, it seems to work in both cases now!
Thanks a lot, Sarah!

--
View this message in context: 
http://r.789695.n4.nabble.com/stop-calculation-in-a-function-tp4622964p4623584.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-05-03 Thread jeff6868
Hi Rui, it's me again.
I have another question about the process.all function you explained
to me. But as you have already helped me a lot, and as I promised I wouldn't
disturb you again, I want to ask you first if you would agree to help me one
more time before I describe my problem more precisely (it is about adding an
automatic linear regression in order to have more realistic fill data in the
gaps).
I wrote you a personal message (I don't know if you got it), because I would
like to send you a present from the Alps to thank you for all the help you
gave me, and maybe for the new help (so I would need your home or work postal
address).
If you agree, let me know and send me your address by mail. I'll explain in
a new post what my boss now wants me to add to your function (this function
is so tricky for me to understand with my limited knowledge + Google + R
help).

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4605385.html


[R] add an automatized linear regression in a function

2012-05-03 Thread jeff6868
Dear R users,

For the moment, I have a script and a function which calculate correlation
matrices between all my data files. Then, for each file, it chooses the best
correlated file and uses it to fill the missing data in the analysed file
(the data from the best correlated file is put automatically into the
missing data gaps of the first file, because my files contain missing
values (NAs)). If the best correlated file doesn't contain data, it takes
the data from the second best correlated file.
The problem is that, for the moment, it takes raw data from the best
correlated file.

So I need to adapt this raw data to the file that is going to be filled. As
a consequence, I'd like to automate the calculation of a linear regression
(after the selection of the best or the second best correlated data file)
between the two files.
Instead of taking the raw data from the best correlated file to fill the
first one, it should take the estimated data from the regression to fill it
(in order to have more accurate fill data).
So the idea is to do an lm() between these two files, extract the
coefficients of the regression line, calculate the
estimated data for my whole file (NAs included), and finally fill the gaps
with this estimated data. I hope you've understood my problem.
Here's the function:

process.all <- function(df.list, mat){
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x$data))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(i in nas){
                for(y in ord){
                    if(!is.na(df.list[[y]]$data[i])){
                        x$data[i] <- df.list[[y]]$data[i]
                        break
                    }
                }
            }
        }
        x
    }

    n <- length(df.list)
    nms <- names(df.list)
    max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
}

I succeeded with a small data.frame I created, but I don't know how to do
it in this particular case.
Thanks a lot for your help!
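
The regression idea described above can be sketched on two plain vectors, independently of process.all. This is only an illustration under assumptions: fill_by_regression is a hypothetical helper, target and donor are made-up series on the same time steps, and the fit is a simple lm(target ~ donor).

```r
# Hypothetical helper: fill the NAs of `target` with values predicted from
# `donor` by a simple linear regression fitted on the rows where both exist
fill_by_regression <- function(target, donor) {
  d <- data.frame(target = target, donor = donor)
  fit <- lm(target ~ donor, data = d)   # rows containing NA are dropped by default
  est <- predict(fit, newdata = d)      # estimate for every row (NA if donor is NA)
  gaps <- is.na(d$target) & !is.na(est)
  d$target[gaps] <- est[gaps]           # only the gaps are replaced
  d$target
}

donor  <- c(1, 2, 3, 4, 5, 6)
target <- c(3, 5, NA, 9, NA, 13)        # equals 2 * donor + 1 where known
filled <- fill_by_regression(target, donor)
filled
```

Inside g(), a call of this kind could replace the direct copy of df.list[[y]]$data[i], but lm() fails when the two series share no complete rows, so that case would need a check first.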


--
View this message in context: 
http://r.789695.n4.nabble.com/add-an-automatized-linear-regression-in-a-function-tp4606047.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-26 Thread jeff6868
Hello Rui,

The write.table works fine!
And the second one (for the 2nd best correlation) seems to work great!
You're too good ^^
I have to check a bit more to be sure, but it seems to do the job!

If you come to the Alps, it will rather be liqueurs such as Chartreuse or
Génépi (made from mountain plants), if you know them. I'll offer you a bottle
if you come one day. I could even send it to you in Portugal if you want.
Thanks a lot again for everything.

Geoffrey

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4590193.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-25 Thread jeff6868
Seems to work great! 

I have one last question (or 2) for you about it, and I will leave you alone
afterwards, I promise :)

I tested your process.all function for the automation. It seems to be
OK.
The only issue comes when I'd like to save the filled data files.
If I assign the result of process.all, for example:  test <- process.all(lst, corr2008)
and I save it: write.table(test, ...)
and then check the test file, my data has been filled, but all the files from
lst are in one file (the columns are: ST001, ST001_time, ST002,
ST002_time, ... (with ST001 for station 1, for example)).
How can I split this output and save the files automatically (one file for
ST001, another for ST002, ...) according to these column names?
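
A sketch of one way to get one file per station, assuming the result is a named list of data.frames. Here filled is a made-up stand-in for that list, and the output directory, the separator and the "_filled.csv" suffix are arbitrary choices for the example.

```r
# Stand-in for the list of filled data.frames (one element per station)
filled <- list(
  ST001 = data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                     data = c(1, 2)),
  ST002 = data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                     data = c(8, 9))
)

out_dir   <- tempdir()   # replace with the real output folder
out_files <- file.path(out_dir, paste0(names(filled), "_filled.csv"))

# Write each element of the list to its own file instead of the whole list
for (k in seq_along(filled)) {
  write.table(filled[[k]], file = out_files[k], sep = ";",
              row.names = FALSE, quote = FALSE)
}
```

Calling write.table on the list itself is what produces the single wide file: the list is first coerced to one data.frame, with all stations side by side.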

And is it possible in your script to take the data from the second best
correlated station instead of the best one, if the best correlated
station has NAs on the same lines as the NA gaps of the station to fill?

Thanks again for all your help. If you come one day to France near the Alps
or Chamonix (where I work), just tell me. I'll buy you some beers or a
restaurant meal! You deserve it ^^
By the way, where does my rescuer come from? Are you a statistician?

Geoffrey

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4586079.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-24 Thread jeff6868
Hi again Rui,

I tested your script as you wrote it with my examples, and it works perfectly!
It seems to be exactly what I'm trying to do.
I just have a question about your na.fill function.
When I try to apply your script to my data, it doesn't work. I think
it's because in your example, the data.frames are already loaded in your list.
But in my case, these data.frames are in different files (as I have 70
files). I'm trying to apply your na.fill function to the result of list.files.
That's why I think it tells me: Error in x$data : $ operator is invalid
for atomic vectors
I tried x[,2] instead, but that doesn't work either: incorrect number of
dimensions.
How can I do exactly the same with na.fill, but by calling a file (by
its file name) and not a data.frame directly like you do (s1,s2,s3)?
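
The error above is what R reports when $ is applied to a character vector: list.files only returns file names, not the data. A minimal sketch of reading the files into a list of data.frames first; the demo directory, the file names and the ";" separator are assumptions made for this example.

```r
# Create two small demo files so that the example is self-contained
demo_dir <- file.path(tempdir(), "stations_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(time = 1:3, data = c(1, NA, 3)),
            file.path(demo_dir, "ST001_2008.csv"), sep = ";", row.names = FALSE)
write.table(data.frame(time = 1:3, data = c(8, 9, 10)),
            file.path(demo_dir, "ST002_2008.csv"), sep = ";", row.names = FALSE)

# list.files() gives a character vector of file names ...
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# ... so each file is read into a data.frame, giving a list of data.frames
df.list <- lapply(filenames, read.table, header = TRUE, sep = ";")
names(df.list) <- sub("\\.csv$", "", basename(filenames))

df.list$ST001_2008$data
```

Functions written for a list of data.frames can then be applied to df.list rather than to filenames.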

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4583404.html


[R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi everyone.

I have a question about some work in R that I have to do for my job.
I have temperature data coming from 70 weather stations. One data file
corresponds to one station for one year (so 70 files for one year). Each
file looks like this (important: each file contains NAs):

time  data
01/01/2008 00:00 -0.25 
01/01/2008 00:15 -0.18 
01/01/2008 00:30 -0.25 
01/01/2008 00:45 -0.25 

(one column with date + time every 15 min for the whole year, and one column
with data).

I have already computed correlation matrices between my weather stations (in
order to find the closest ones). For example:

          Station1  Station2  Station3  [...]
Station1  1         0.9       0.8
Station2  0.9       1         0.7
Station3  0.8       0.7       1
[...]

Now, I would like to fill the NA gaps of a station with data from
another station according to their correlation coefficient.
Let's take an example for Station 1: if the station most correlated with
Station 1 is Station 2, the script has to take data from Station 2 to fill
the NA gaps of Station 1, for the same date and hour of course (or the same
lines, as I'm computing correlations for the same year).
So for the year 2008 (for example), if the correlation is highest between
Stations 1 and 2 (among all the stations), and if the data are:

time               data
01/01/2008 00:00   1
01/01/2008 00:15   2      FOR STATION 1
01/01/2008 00:30   *NA*
01/01/2008 00:45   4

and

time               data
01/01/2008 00:00   8
01/01/2008 00:15   9      FOR STATION 2 (same year and same time)
01/01/2008 00:30   *10*
01/01/2008 00:45   11

The Station 1 file should become:

time               data
01/01/2008 00:00   1
01/01/2008 00:15   2      STATION 1
01/01/2008 00:30   *10*
01/01/2008 00:45   4

I hope you've understood what I would like to do :)
Thanks a lot for your ideas and your replies!
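
The filling rule described in this post can be sketched on made-up data. The three stations, their values and the variable names below are assumptions for the illustration; the real data would come from the 70 files.

```r
# Made-up series on the same time steps; Station1 has one gap
stations <- list(
  Station1 = c(1, 2, NA, 4, 5, 6),
  Station2 = c(8, 9, 10, 11, 12, 13),
  Station3 = c(6, 1, 7, 2, 9, 3)
)

# Correlation matrix computed on the pairwise complete observations
cor_mat <- cor(do.call(cbind, stations), use = "pairwise.complete.obs")
diag(cor_mat) <- NA                  # a station must not select itself

# For each station, the name of the best correlated other station
best <- colnames(cor_mat)[apply(cor_mat, 1, which.max)]
names(best) <- rownames(cor_mat)

# Fill the gaps of Station1 with the values of its best match, same rows
target <- stations$Station1
donor  <- stations[[best["Station1"]]]
gaps   <- is.na(target)
target[gaps] <- donor[gaps]
target
```

If the donor is also NA on a gap row, the gap stays NA with this sketch; falling back to the second best station would need an extra loop over the ordered correlations.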






--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4580054.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi Sarah,

Thank you for your answer.
Yes, I know that my proposal is not necessarily the best way to do it. But
my problem concerns only big gaps, of course (from more than half a day up to
several months of missing data).
I've already filled the small gaps with the interpolation you mentioned
in your message (with the na.approx function from the zoo package).
For the study, it's not important to have perfectly identical values
between the 2 correlated stations, because after the reconstruction I'll
calculate the daily mean of each station. For my boss, it's enough to
work on daily means. But before that, I need to rebuild the big missing data
gaps of my stations (in the way I explained in the first message of this
topic).
Do you have any idea how to do it in R according to my first post?
I forgot to mention that my examples are completely fake! I chose these
numbers so that you could understand what I want to do (I chose easy and
readable numbers). I tested it in Excel with 2 stations, and the result was
not too bad when I filled the gaps (with the data of the 2 well correlated
stations).


--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4580296.html


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi Rui,

Yes, you're right. It's me again ^^
This post is the last part (I hope) of my job. You helped me a lot last time
with the correlation matrices.
I have to leave work now, so I'll check and test your suggestion
tomorrow. But there is no doubt that it will help me a lot again.
I'll tell you tomorrow. Thanks Rui!

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4580898.html


Re: [R] correlation matrix between data from different files

2012-04-18 Thread jeff6868
Yesterday I improved your script a bit (mostly using the station numbers
for the automation). Here's the final version. Thanks again!

# Files to read: one reconstructed CSV file per station for 2008
filenames <- list.files(pattern="_2008_reconstruit\\.csv$")

Sensors <- paste("capteur_", 1:4, sep="")

# The station code is given by the first 5 characters of the file name
Stations <- substr(filenames,1,5)

nsensors <- length(Sensors)
nstations <- length(Stations)

nobs <- nrow(read.table(filenames[1], header=TRUE))

# 3-d array: observations x sensors x stations
yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))

for(i in seq_len(nstations)){
    tmp <- read.table(filenames[i], header=TRUE, sep=";")
    yr2008[ , , i] <- as.matrix(tmp[, Sensors])
}

dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)
# One correlation matrix between stations for each sensor
cor2008 <- lapply(Sensors, function(s) cor(yr2008[ , s, ],use="complete.obs"))
names(cor2008) <- Sensors
cor2008$capteur_1
cor2008$capteur_2
cor2008$capteur_3
cor2008$capteur_4

--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-between-data-from-different-files-tp4552226p4567031.html


Re: [R] correlation matrix between data from different files

2012-04-17 Thread jeff6868
Hello Rui,

Thanks a lot for your answer.

You hoped that your script would help me?
My answer: it is WON-DER-FUL!
It works very well! I had some difficulties at first adapting it to my data,
but I succeeded afterwards when I ran a test between 2 stations.
It's not perfect yet (I still have to modify my data a bit because it
doesn't recognize the time column, and I have some problems with the
automation based on the names of the data from each station), but
the main problem (the correlation matrix) seems to be solved thanks to you!

Thanks a lot again!

--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-between-data-from-different-files-tp4552226p4564610.html


[R] correlation matrix between data from different files

2012-04-12 Thread jeff6868
Dear users,

I'm quite a new French R user, and I have a problem computing a
correlation matrix.
I have temperature data for each weather station of my study area and for
each year (for example, a data file for weather station no. 1 for the year
2009, a data file for no. 2 for the year 2010, ...). So I have 70
weather stations with one data file per year since 2005. Each station has 4
temperature sensors.
Each data file has exactly the same structure: date and hour, sensor1,
sensor2, sensor3, sensor4. Here's an example:

time              sensor1  sensor2  sensor3  sensor4
01/01/2008 00:00  -0.25    -2.43    -3.25    -2.37
01/01/2008 00:15  -0.18    -2.37    -3.18    -2.25
01/01/2008 00:30  -0.25    -2.5     -3.37    -2.56
01/01/2008 00:45  -0.25    -2.37    -3.31    -2.37

I need to compute a correlation matrix between the same sensors of the
different stations (one correlation matrix between all the sensors 1 of the 70
stations, another one for sensor 2, ...).
For each year and each station, I have to find the best correlation. For
example, which one of the 70 weather stations is best correlated
with station 1 for sensor 1? And with station 2? ... and so on for each
sensor and each station.

Example:

Sensor 1 for the year 2009

           Station 1  Station 2  Station 3  [...]
Station 1  1          0.910      0.748
Station 2  0.910      1          0.6
Station 3  0.748      0.6        1
[...]

And the same for the years 2005, 2006, 2007, 2008, 2009, 2010 and 2011, for
each of the 4 sensors.

Do you have any idea how I can do this in R?
Should I first merge all the sensors into one file, or can I do it with the
data in separate files (as I have for the moment)?
Thank you very much for all your answers!
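
One way to approach this without merging the files by hand, sketched on simulated data. The helper make_station, the station names and the use of only two sensors are assumptions for the example; in practice each data.frame would be read from one station file.

```r
# Simulated stations: same rows, same sensor columns, different offsets
set.seed(1)
make_station <- function(shift) {
  base <- sin(seq(0, 3, length.out = 20))
  data.frame(sensor1 = base + shift + rnorm(20, sd = 0.1),
             sensor2 = base - shift + rnorm(20, sd = 0.1))
}
stations <- list(ST001 = make_station(0),
                 ST002 = make_station(1),
                 ST003 = make_station(2))
sensors <- c("sensor1", "sensor2")

# One correlation matrix per sensor; each column of m is one station
cor_by_sensor <- lapply(sensors, function(s) {
  m <- sapply(stations, function(df) df[[s]])
  cor(m, use = "pairwise.complete.obs")
})
names(cor_by_sensor) <- sensors

cor_by_sensor$sensor1
```

The argument use = "pairwise.complete.obs" lets each pair of stations use all the rows where both have data, which matters when the NAs fall at different times in each file.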


--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-matrix-between-data-from-different-files-tp4552226p4552226.html


Re: [R] filling small gaps of N/A

2012-04-04 Thread jeff6868
Wow, thank you for all your answers.

You were completely right, Michael. Well, it's my fault. I didn't understand
your 2nd reply, when you were talking about arguments for larger gaps. I
thought it was for deleting big gaps too. I apologize.
It was too easy, in fact. I also didn't notice the maxgap argument of the
function.
Finally, it works perfectly with just this:

require(zoo)

imputation <- function(x){
    # Interpolate only the runs of at most 4 consecutive NAs
    met <- na.approx(x, maxgap = 4)
    return(met)
}

data <- myts[,2:5]
myts[,2:5] <- apply(data,2,imputation)

Sorry for my stupidity. I'll try to be more careful next time with such
small problems (when I was thinking it would be a big one) ;).
Well, thank you very much, Michael and the other repliers, and thank you for
having spared a bit of your time for me!

--
View this message in context: 
http://r.789695.n4.nabble.com/filling-small-gaps-of-N-A-tp4528184p4531224.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] filling small gaps of N/A

2012-04-03 Thread jeff6868
Hi everybody,

I'm a new French R user. Sorry if my English is not perfect. I hope you'll
understand my problem ;)

I have to work on temperature data (35000 lines in one file) containing some
missing data (N/A). Sometimes I have only 2 or 3 N/A in a row,
but sometimes I also have 100 or 200 N/A in a row. Here's an
example of my data, where I have only a small gap of missing data (2 or 3
N/A):

09/01/2008 12:00   2   1.93     2.93   4.56   5.43
09/01/2008 12:15   2   *3.93*   3.25   4.93   5.56
09/01/2008 12:30   2   NA       3.5    5.06   5.56
09/01/2008 12:45   2   NA       3.68   5.25   5.68
09/01/2008 13:00   2   *4.93*   3.87   5.56   5.93
09/01/2008 13:15   2   5.93     4.25   5.75   6.06
09/01/2008 13:30   2   3.93     4.56   5.93   6.18

My question is: how can I replace these small gaps of N/A with numeric values?
I would like a function which only fills the small gaps (2 or 3 N/A) in my
data, but not the big gaps (more than 5 N/A in a row).

For the moment, I'm trying to do it by working with the time gap between the
2 numeric values surrounding the N/A, as follows:

imputation <- function(x){
    met = NULL

    temp <- met[1] <- x[1]

    ind_temp <- 1

    tps <- time(x)

    for (i in 2:(length(x)) ){
        if((tps[i]-tps[ind_temp] > 1) && (tps[i]-tps[ind_temp] <= 4) && (is.na(x[i]))){
            met[i] <- na.approx(x)
        }
        else {
            temp <- met[i] <- x[i]
            ind_temp <- i
        }
    }

    return(met)
}

In this example, I would like to apply the function na.approx(x) to my N/A,
but only when I have at most 4 N/A in a row.
There's no error, but it doesn't work (it worked the other way round, when
I had to detect aberrant data and replace it with N/A, but not now). It is
maybe not the right way to solve this problem. I don't have a lot of
experience in R. Maybe there is an easier way to do it...
Does somebody have an idea to help me?
Thanks a lot!


--
View this message in context: 
http://r.789695.n4.nabble.com/filling-small-gaps-of-N-A-tp4528184p4528184.html


Re: [R] filling small gaps of N/A

2012-04-03 Thread jeff6868
Michael,

First of all, thank you very much for your answer.
I've read your 2 answers, but I'm not really sure that they correspond to
my problem with NAs.
I'll try to describe it in a bit more detail.

This problem concerns the second part of my program. In the first part, I
already created a time series object with the library (timeseries). I first
had to delete all the wrong values in my data and replace them with NAs.
So my data already contains missing values (NAs), because I cleaned it before.

The thing is that sometimes I have small gaps of missing data (only 2 or 3
NAs in a row), as in example 1 below:

example 1:

09/01/2008 12:00   1.93
09/01/2008 12:15   3.93
09/01/2008 12:30   NA     (here you have a small gap with only 2 NAs)
09/01/2008 12:45   NA
09/01/2008 13:00   4.93
09/01/2008 13:15   5.93

But sometimes, still in the same file, I have big gaps of 10 or more NAs in
a row, as in example 2 below:

example 2:

09/01/2008 16:15   2.93
09/01/2008 16:30   2.93
09/01/2008 16:45   NA
09/01/2008 17:00   NA
09/01/2008 17:15   NA
09/01/2008 17:30   NA
09/01/2008 17:45   NA
09/01/2008 18:00   NA     (here you have a big gap with more than 10 NAs
09/01/2008 18:15   NA      following each other)
09/01/2008 18:30   NA
09/01/2008 18:45   NA
09/01/2008 19:00   NA
09/01/2008 19:15   NA
09/01/2008 19:30   NA
09/01/2008 19:45   NA
09/01/2008 20:00   NA
09/01/2008 20:15   7.93
09/01/2008 20:30   7.93

So within the same file, I sometimes have small gaps (2 or 3 NAs) and
sometimes big or very big gaps (10 or 100 NAs in a row).

My aim is to apply the function na.approx(x) from the library (zoo) to fill
NAs ONLY in small gaps.

If I just do: apply(na.approx(x)), it will fill all the NAs in my data (big
gaps + small gaps). That is exactly what I DON'T WANT.

What I need to tell R is: apply the function (na.approx) to fill NAs ONLY
where there are at most 4 NAs in a row (small gaps, as in example 1). Where
there are more than 4 NAs in a row (big gaps, as in example 2), keep these
NAs and DON'T fill the gap.

My question is: how can I tell R to do this? I don't know how to do it.
I hope I've explained it clearly this time ^^
Thanks a lot again for all your answers!
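
[Editor's sketch] na.approx() in zoo has a maxgap argument: runs of
consecutive NAs longer than maxgap are left unchanged, which matches the rule
described above. The example below is a minimal illustration on a plain
numeric vector with made-up values (not the poster's data), and it assumes
the zoo package is installed. na.rm = FALSE is set so the result keeps the
same length as the input.

```r
library(zoo)

# Made-up series: one gap of 2 NAs and one gap of 6 NAs
x <- c(2.93, NA, NA, 4.93, NA, NA, NA, NA, NA, NA, 7.93)

# Fill gaps of at most 4 consecutive NAs; longer gaps stay NA
res <- na.approx(x, maxgap = 4, na.rm = FALSE)
print(res)
```

The same call should work on a zoo series, in which case the interpolation
uses the series index (the timestamps) rather than the position.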



--
View this message in context: 
http://r.789695.n4.nabble.com/filling-small-gaps-of-N-A-tp4528184p4528907.html