Re: [R] [External] finding nearest zip codes

2020-08-04 Thread Jeff Newmiller
If you fail to force the zip code data to be read in as character data then you 
will have problems. Your code does not use the colClasses argument or the 
stringsAsFactors=FALSE argument (needed if you are using a version of R earlier 
than 4.x). Richard was suggesting that you use the str function to examine your 
data frames to verify correct data types were present.
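
A minimal sketch of what this looks like, using an in-memory CSV rather than the original files (the column name zip and the sample values are assumptions made for illustration):

```r
# Hypothetical CSV content standing in for ZIP1.csv / ZIP2.csv.
csv_text <- "zip\n02138\n30094\n07030\n"

# Default behaviour: the column is guessed to be integer, so leading zeros are lost.
as_number <- read.csv(text = csv_text)

# Forcing character keeps all five digits.
as_text <- read.csv(text = csv_text, colClasses = "character")

str(as_number)  # zip is int
str(as_text)    # zip is chr
```

Running str() on both data frames before the merge makes the difference in column types visible.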

Please read the Posting Guide... using HTML formatted email and attaching 
disallowed file types as you have done are good ways to prevent useful answers 
from being offered.

On August 4, 2020 11:01:01 PM PDT, Debasmita Sur  wrote:
>Hi Richard,
>
>I have not considered the 4 digit zip codes, I have taken only 5 digits. I
>have attached two folders, in the 'air' folder I have some specific zip
>codes and in output I got proper results, whereas in the 'par' folder I got
>'NA's in the minimum distance column. Actually, the problem was to find the
>nearest store for a specific brand.
>
>Thanks,
>*Debasmita*
>
>On Wed, Aug 5, 2020 at 7:08 AM Richard M. Heiberger wrote:
>
>> verify that you actually have five-digit zip codes stored as
>> characters. New Jersey and Massachusetts have zero as
>> the first digit.  When these codes are saved as numbers, they become
>> four-digit codes and will probably cause errors.
>> For example Cambridge, Mass is '02138', and would be reported as 2138
>> when interpreted as a number..
>>
>> On Tue, Aug 4, 2020 at 9:29 PM Debasmita Sur wrote:
>> >
>> > Dear R-experts,
>> > I have two lists of US zip codes and want to pick the nearest zip code from
>> > second list against my first list. e.g. 30043 (from second list) is closest
>> > to the zip code 30094 (from first list). So, it should come against 30094. The
>> > code should compare the distance from each zip and pick the nearest one.
>> > I have written the following code. It is giving proper results for many,
>> > but in mindist, it is showing 'NAs'. But for some of the zip codes, it is
>> > giving proper minimum distance. Please note it will be effective for 5
>> > digit zip codes. Any help will be highly appreciated.
>> >
>> > df1<-read.csv("C:/Users/dxsur/Desktop/ZIP1.csv")
>> > df2<-read.csv("C:/Users/dxsur/Desktop/ZIP2.csv")
>> >
>> > results<-merge(x=df1,y=zipcode,all.x=TRUE)
>> > results1<-merge(x=df2,y=zipcode,all.x=TRUE)
>> >
>> > distance<-distm(subset(results,select=c(longitude,latitude)),subset(results1,select=c(longitude,latitude)))
>> >
>> > rnum=apply(distance, 1, which.min)
>> > mindist=apply(distance, 1, min)
>> >
>> > final<-cbind(results,results1$zip[unlist(rnum)],mindist)
>> >
>> >
>> > Thanks & Regards,
>> > *Debasmita *
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>

-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [External] finding nearest zip codes

2020-08-04 Thread Debasmita Sur
Hi Richard,

I have not considered the 4 digit zip codes, I have taken only 5 digits. I
have attached two folders, in the 'air' folder I have some specific zip
codes and in output I got proper results, whereas in the 'par' folder I got
'NA's in the minimum distance column. Actually, the problem was to find the
nearest store for a specific brand.

Thanks,
*Debasmita*

On Wed, Aug 5, 2020 at 7:08 AM Richard M. Heiberger  wrote:

> verify that you actually have five-digit zip codes stored as
> characters. New Jersey and Massachusetts have zero as
> the first digit.  When these codes are saved as numbers, they become
> four-digit codes and will probably cause errors.
> For example Cambridge, Mass is '02138', and would be reported as 2138
> when interpreted as a number..
>
> On Tue, Aug 4, 2020 at 9:29 PM Debasmita Sur  wrote:
> >
> > Dear R-experts,
> > I have two lists of US zip codes and want to pick the nearest zip code from
> > second list against my first list. e.g. 30043 (from second list) is closest
> > to the zip code 30094 (from first list). So, it should come against 30094. The
> > code should compare the distance from each zip and pick the nearest one.
> > I have written the following code. It is giving proper results for many,
> > but in mindist, it is showing 'NAs'. But for some of the zip codes, it is
> > giving proper minimum distance. Please note it will be effective for 5
> > digit zip codes. Any help will be highly appreciated.
> >
> > df1<-read.csv("C:/Users/dxsur/Desktop/ZIP1.csv")
> > df2<-read.csv("C:/Users/dxsur/Desktop/ZIP2.csv")
> >
> > results<-merge(x=df1,y=zipcode,all.x=TRUE)
> > results1<-merge(x=df2,y=zipcode,all.x=TRUE)
> >
> > distance<-distm(subset(results,select=c(longitude,latitude)),subset(results1,select=c(longitude,latitude)))
> >
> > rnum=apply(distance, 1, which.min)
> > mindist=apply(distance, 1, min)
> >
> > final<-cbind(results,results1$zip[unlist(rnum)],mindist)
> >
> >
> > Thanks & Regards,
> > *Debasmita *
> >
>


[R] Fwd: defining group colours in a call to rda

2020-08-04 Thread Andrew Halford
-- Forwarded message -
From: Abby Spurdle 
Date: Wed, Aug 5, 2020 at 3:07 PM
Subject: Re: [R] defining group colours in a call to rda
To: Andrew Halford 


Hi Andrew,

Perhaps you want this:

cols <- rep_len (c ("red", "green", "blue", "aquamarine", "magenta"), 9)
cols

Or this:

cols = rep ("", 9)
cols [unique (MI_fish_all.mrt$where)] = c ("red", "green", "blue",
"aquamarine", "magenta")
cols

Then you can substitute either into your original example:

plotcolor <- cols [MI_fish_all.mrt$where]


On Wed, Aug 5, 2020 at 1:02 PM Andrew Halford wrote:
>
> Hi Abby,
>
> Apologies for not providing more info but you have worked out what I was
> on about anyways.
>
> I thought it would scroll through and allocate the colours to each unique
> number sequentially. I will add more colours to my vector but I would like
> to know if it is possible to do what I originally hoped for.
>
> cheers
>
> Andy
>
> On Wed, Aug 5, 2020 at 8:40 AM Abby Spurdle  wrote:
>>
>> Hi,
>>
>> Your example is not reproducible.
>> However, I suspect that the following is the problem:
>>
>> c("red","green","blue","aquamarine","magenta")[MI_fish_all.mrt$where]
>>
>> Here's my version:
>>
>> where = c (3, 3, 8, 6, 6, 9, 5, 5, 9, 3, 8, 6, 9, 6, 5, 9, 5, 3, 8, 6,
>> 9, 6, 5, 9, 5, 3, 3, 8, 6, 6, 9, 5, 5, 9, 6, 9, 5, 9)
>> unique (where)
>>
>> c("red", "green", "blue", "aquamarine", "magenta")[where]
>>
>> There's five colors.
>> But only two of the indices are within one to five.
>> So, the resulting color vector contains missing values.
>>
>> In the base graphics system, if you set colors to NA, it usually means
>> no color.
>>
>> I'm not sure exactly what you want to do, but I'm assuming you can fix
>> it from here.
>>
>> On Tue, Aug 4, 2020 at 9:49 PM Andrew Halford wrote:
>> >
>> > Hi,
>> >
>> > I've been trying to use the output on group membership of the final leaves
>> > in a MRT analysis to define my own colours, however I am not getting the
>> > result I'm after.
>> >
>> > Here is the code
>> > fish.pca <-rda(fish_all.hel,scale=TRUE)
>> > fish.site <- scores(fish.pca,display="sites",scaling=3)
>> > fish.spp <-
>> > scores(fish.pca,display="species",scaling=3)[fish.MRT.indval$pval<=0.05,]
>> > plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
>> > points(fish.site,pch=21,bg=MI_fish_all.mrt$where,cex=1.2)
>> > plotcolor <-
>> > c("red","green","blue","aquamarine","magenta")[MI_fish_all.mrt$where]
>> >  fish.pca <-rda(fish_all.hel,scale=TRUE)
>> > plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
>> > points(fish.site,pch=21,bg=plotcolor,cex=1.2)
>> > MI_fish_all.mrt$where
>> >
>> > If I run the points command and insert the group membership direct from the
>> > MRT analysis e.g.  bg=MI_fish_all.mrt$where , then the subsequent points
>> > plot up correctly with a different colour for each group.However if I try
>> > to impose my own colour combo with plotcolor.It prints colours for 2
>> > groups and leaves the rest uncoloured.
>> >
>> > The call to  MI_fish_all.mrt$where gives...
>> >  [1] 3 3 8 6 6 9 5 5 9 3 8 6 9 6 5 9 5 3 8 6 9 6 5 9 5 3 3 8 6 6 9 5 5 9 6
>> > 9 5 9.
>> >
>> > These are the split groupings for all 39 sites in the analysis and there
>> > are 5 numbers corresponding to 5 final leaves in the tree.
>> >
>> > I cant see why my colour scheme isnt being recognised.
>> >
>> > All help accepted.
>> >
>> > Andy
>> >
>> >
>> > --
>> > Andrew Halford Ph.D
>> > Senior Coastal Fisheries Scientist
>> > Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848
Noumea,
>> > New Caledonia | Nouméa, Nouvelle-Calédonie
>> >
>
>
>
> --
> Andrew Halford Ph.D
> Senior Coastal Fisheries Scientist
> Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848 Noumea,
> New Caledonia | Nouméa, Nouvelle-Calédonie


-- 
Andrew Halford Ph.D
Senior Coastal Fisheries Scientist
Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848 Noumea,
New Caledonia | Nouméa, Nouvelle-Calédonie



Re: [R] finding nearest zip codes

2020-08-04 Thread Bert Gunter
In addition to Rich's advice...
as always, have you searched?!
e.g. on "zip code distances" or similar at rseek.org.

This appears to have been asked before and there are tools available.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 4, 2020 at 6:29 PM Debasmita Sur  wrote:

> Dear R-experts,
> I have two lists of US zip codes and want to pick the nearest zip code from
> second list against my first list.e.g.30043 (from second list) is closest
> to the zip code 30094 (from first list).So,it should come against 30094.The
> code should compare the distance from each zip and pick the nearest one.
> I have written the following code. It is giving proper results for many,
> but in mindist, it is showing 'NAs'. But for some of the zip codes, it is
> giving proper minimum distance. Please note it will be effective for 5
> digit zip codes. Any help will be highly appreciated.
>
> df1<-read.csv("C:/Users/dxsur/Desktop/ZIP1.csv")
> df2<-read.csv("C:/Users/dxsur/Desktop/ZIP2.csv")
>
> results<-merge(x=df1,y=zipcode,all.x=TRUE)
> results1<-merge(x=df2,y=zipcode,all.x=TRUE)
>
> distance<-distm(subset(results,select=c(longitude,latitude)),subset(results1,select=c(longitude,latitude)))
>
> rnum=apply(distance, 1, which.min)
> mindist=apply(distance, 1, min)
>
> final<-cbind(results,results1$zip[unlist(rnum)],mindist)
>
>
> Thanks & Regards,
> *Debasmita *
>
>



Re: [R] [External] finding nearest zip codes

2020-08-04 Thread Richard M. Heiberger
Verify that you actually have five-digit zip codes stored as
characters. New Jersey and Massachusetts have zero as
the first digit.  When these codes are saved as numbers, they become
four-digit codes and will probably cause errors.
For example, Cambridge, Mass. is '02138', and would be reported as 2138
when interpreted as a number.
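
If the codes have already been read as numbers, the leading zeros can usually be restored by zero-padding to five characters. A small sketch (the vector below is made up for illustration):

```r
# What "02138", "30094" and "07030" become when read as numbers.
zip_num <- c(2138, 30094, 7030)

# Pad with zeros on the left to a width of five characters.
zip_chr <- sprintf("%05d", as.integer(zip_num))

zip_chr
nchar(zip_chr)  # all 5 if the padding worked
```

Reading the column as character in the first place is still the safer route, since padding cannot tell a genuine four-digit typo from a lost zero.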

On Tue, Aug 4, 2020 at 9:29 PM Debasmita Sur  wrote:
>
> Dear R-experts,
> I have two lists of US zip codes and want to pick the nearest zip code from
> second list against my first list.e.g.30043 (from second list) is closest
> to the zip code 30094 (from first list).So,it should come against 30094.The
> code should compare the distance from each zip and pick the nearest one.
> I have written the following code. It is giving proper results for many,
> but in mindist, it is showing 'NAs'. But for some of the zip codes, it is
> giving proper minimum distance. Please note it will be effective for 5
> digit zip codes. Any help will be highly appreciated.
>
> df1<-read.csv("C:/Users/dxsur/Desktop/ZIP1.csv")
> df2<-read.csv("C:/Users/dxsur/Desktop/ZIP2.csv")
>
> results<-merge(x=df1,y=zipcode,all.x=TRUE)
> results1<-merge(x=df2,y=zipcode,all.x=TRUE)
> distance<-distm(subset(results,select=c(longitude,latitude)),subset(results1,select=c(longitude,latitude)))
>
> rnum=apply(distance, 1, which.min)
> mindist=apply(distance, 1, min)
>
> final<-cbind(results,results1$zip[unlist(rnum)],mindist)
>
>
> Thanks & Regards,
> *Debasmita *
>



[R] finding nearest zip codes

2020-08-04 Thread Debasmita Sur
Dear R-experts,
I have two lists of US zip codes and want to pick, for each zip code in my
first list, the nearest zip code from the second list. E.g., 30043 (from the
second list) is closest to the zip code 30094 (from the first list), so it
should come against 30094. The code should compare the distance from each
zip and pick the nearest one.
I have written the following code. It gives a proper minimum distance for
many of the zip codes, but for others mindist shows 'NA's. Please note it
will be effective for 5 digit zip codes. Any help will be highly appreciated.

df1<-read.csv("C:/Users/dxsur/Desktop/ZIP1.csv")
df2<-read.csv("C:/Users/dxsur/Desktop/ZIP2.csv")

results<-merge(x=df1,y=zipcode,all.x=TRUE)
results1<-merge(x=df2,y=zipcode,all.x=TRUE)
distance<-distm(subset(results,select=c(longitude,latitude)),subset(results1,select=c(longitude,latitude)))

rnum=apply(distance, 1, which.min)
mindist=apply(distance, 1, min)

final<-cbind(results,results1$zip[unlist(rnum)],mindist)
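
One possible reason for the NAs, offered as a guess since the data are not shown: merge(..., all.x = TRUE) keeps zip codes that have no match in zipcode, so their longitude and latitude are NA and every distance involving them is NA too. For such rows which.min() returns integer(0), so rnum becomes a list and unlist(rnum) no longer lines up with the rows of results. The sketch below uses made-up coordinates and a hand-written haversine helper (hav_km, a hypothetical stand-in for distm(), which presumably comes from the geosphere package) and drops unmatched rows before looking for the nearest zip:

```r
# Great-circle distance in km between (lon1, lat1) and (lon2, lat2), in degrees.
# hav_km is a hypothetical helper, not part of any package.
hav_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up stand-ins for 'results' and 'results1' after the merge;
# "99999" plays the role of a zip code with no match.
results <- data.frame(zip = c("30094", "02138", "99999"),
                      longitude = c(-84.0, -71.1, NA),
                      latitude = c(33.6, 42.4, NA),
                      stringsAsFactors = FALSE)
results1 <- data.frame(zip = c("30043", "02139", "10001"),
                       longitude = c(-84.0, -71.1, -74.0),
                       latitude = c(34.0, 42.36, 40.75),
                       stringsAsFactors = FALSE)

# Keep only rows whose coordinates were found.
ok1 <- results[complete.cases(results[, c("longitude", "latitude")]), ]
ok2 <- results1[complete.cases(results1[, c("longitude", "latitude")]), ]

# Distance matrix: one row per zip in ok1, one column per zip in ok2.
distance <- outer(seq_len(nrow(ok1)), seq_len(nrow(ok2)),
                  function(i, j) hav_km(ok1$longitude[i], ok1$latitude[i],
                                        ok2$longitude[j], ok2$latitude[j]))

rnum <- apply(distance, 1, which.min)
final <- data.frame(zip = ok1$zip,
                    nearest = ok2$zip[rnum],
                    mindist_km = distance[cbind(seq_along(rnum), rnum)],
                    stringsAsFactors = FALSE)
final
```

The rows dropped by complete.cases() are worth inspecting separately, since they are the zip codes that failed to match.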


Thanks & Regards,
*Debasmita *



Re: [R] Arrange data

2020-08-04 Thread Rui Barradas

Hello,

Please keep cc-ing the list. R-help is threaded, and questions and answers 
might be of help to others in the future.


As for the question, see if the following code does what you want.
First, create a logical index i of the months between 7 and 3 and use 
that index to subset the original data.frame. Then, a cumsum trick gives 
a vector M defining the data grouping. Group and compute the Value means 
with aggregate. Finally, since each group spans a year border, create a 
more meaningful Years column and put everything together.


df1 <- read.csv("mddat.csv")

i <- with(df1, (Month >= 7 & Month <= 12) | (Month >= 1 & Month <= 3))
df2 <- df1[i, ]
M <- cumsum(c(FALSE, diff(as.integer(row.names(df2))) > 1))

agg <- aggregate(Value ~ M, df2, mean)
Years <- sapply(split(df2$Year, M), function(x){paste(x[1], 
x[length(x)], sep = "-")})

final <- cbind.data.frame(Years, Value = agg[["Value"]])

head(final)
#      Years Value
#0 1975-1975 87.0
#1 1975-1976 89.4
#2 1976-1977 85.8
#3 1977-1978 81.6
#4 1978-1979 71.6
#5 1979-1980 75.8
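
Another way to express the same grouping, sketched here on made-up data rather than mddat.csv: label each observation with the year in which its July-to-March season starts (January to March belong to the previous year's season), then aggregate on that label. The column name Season and the object season_means are assumptions introduced for this example.

```r
# Made-up monthly data with the same column names as in the question.
set.seed(2020)
df1 <- expand.grid(Month = 1:12, Year = 1975:1977)
df1$Value <- sample(60:100, nrow(df1), TRUE)

# Keep July-December and January-March. Note the '|': no month is
# both >= 7 and <= 3, so '&' would select nothing.
keep <- subset(df1, Month >= 7 | Month <= 3)

# Jul-Dec keep their own year; Jan-Mar are assigned to the previous year.
keep$Season <- keep$Year - (keep$Month <= 3)

season_means <- aggregate(Value ~ Season, data = keep, FUN = mean)
season_means
```

The first and last seasons are incomplete (only January to March, or only July to December), so they may need to be dropped depending on the purpose.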


Hope this helps,

Rui Barradas



On 04/08/20 at 20:44, Md. Moyazzem Hossain wrote:

Dear Rui,

Thanks a lot for your help.

It is working. Now I am also trying to find the average of values for 
*July 1975 to March 1976* and record as the value of the year 1975. 
Moreover, I want to continue it up to the year 2017. You may check the 
attached file for data (mddat.csv).


I used the following function but got an error:
aggregate(Value ~ Year, data = subset(df1, Month >= 7 & Month <= 3), FUN 
= mean)


Please help me again. Thanks in advance.

Best Regards,
Md

On Mon, Aug 3, 2020 at 11:28 PM Rui Barradas wrote:


Hello,

And here is another way, with aggregate.

Make up test data.

set.seed(2020)
df1 <- expand.grid(Year = 2000:2018, Month = 1:12)
df1 <- df1[order(df1$Year),]
df1$Value <- sample(20:30, nrow(df1), TRUE)
head(df1)


#Use subset to keep only the relevant months
aggregate(Value ~ Year, data = subset(df1, Month <= 7), FUN = mean)


Hope this helps,

Rui Barradas

On 03/08/2020 at 12:33, Rasmus Liland wrote:
 > On 2020-08-03 21:11 +1000, Jim Lemon wrote:
 >> On Mon, Aug 3, 2020 at 8:52 PM Md. Moyazzem Hossain wrote:
 >>> Hi,
 >>>
 >>> I have a dataset having monthly
 >>> observations (from January to
 >>> December) over a period of time like
 >>> (2000 to 2018). Now, I am trying to
 >>> take an average the value from
 >>> January to July of each year.
 >>>
 >>> The data looks like
 >>> Year    Month  Value
 >>> 2000    1         25
 >>> 2000    2         28
 >>> 2000    3         22
 >>>     ..      .
 >>> 2000    12       26
 >>> 2001     1       27
 >>> ...         
 >>> 2018    11       30
 >>> 2018    12      29
 >>>
 >>> Can someone help me in this regard?
 >>>
 >>> Many thanks in advance.
 >> Hi Md,
 >> One way is to form a subset of your
 >> data, then calculate the means by
 >> year:
 >>
 >> # assume your data is named mddat
 >> mddat2<-mddat[mddat$Month <= 7,]  # January to July; column names as in the data
 >> jan2jul<-by(mddat2$Value,mddat2$Year,mean)
 >>
 >> Jim
 > Hi Md,
 >
 > you can also define the period in a new
 > column, and use aggregate like this:
 >
 >       Md <- structure(list(
 >       Year = c(2000L, 2000L, 2000L,
 >       2000L, 2001L, 2018L, 2018L),
 >       Month = c(1L, 2L, 3L, 12L, 1L,
 >       11L, 12L),
 >       Value = c(25L, 28L, 22L, 26L,
 >       27L, 30L, 29L)),
 >       class = "data.frame",
 >       row.names = c(NA, -7L))
 >
 >       Md[Md$Month %in% 1:6,"Period"] <- "first six months of the year"
 >       Md[Md$Month %in% 7:12,"Period"] <- "last six months of the year"
 >
 >       aggregate(
 >         formula=Value~Year+Period,
 >         data=Md,
 >         FUN=mean)
 >
 > Rasmus
 >



Re: [R] defining group colours in a call to rda

2020-08-04 Thread Abby Spurdle
Hi,

Your example is not reproducible.
However, I suspect that the following is the problem:

c("red","green","blue","aquamarine","magenta")[MI_fish_all.mrt$where]

Here's my version:

where = c (3, 3, 8, 6, 6, 9, 5, 5, 9, 3, 8, 6, 9, 6, 5, 9, 5, 3, 8, 6,
9, 6, 5, 9, 5, 3, 3, 8, 6, 6, 9, 5, 5, 9, 6, 9, 5, 9)
unique (where)

c("red", "green", "blue", "aquamarine", "magenta")[where]

There's five colors.
But only two of the indices are within one to five.
So, the resulting color vector contains missing values.

In the base graphics system, if you set colors to NA, it usually means no color.

I'm not sure exactly what you want to do, but I'm assuming you can fix
it from here.
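
One way to get a sequential allocation of colours, sketched with the where vector from above: convert the leaf numbers to consecutive positions 1 to 5 and index the colour vector with those. The names palette5, grp and plotcolor_fixed are introduced here for illustration only.

```r
where <- c(3, 3, 8, 6, 6, 9, 5, 5, 9, 3, 8, 6, 9, 6, 5, 9, 5, 3, 8, 6,
           9, 6, 5, 9, 5, 3, 3, 8, 6, 6, 9, 5, 5, 9, 6, 9, 5, 9)
palette5 <- c("red", "green", "blue", "aquamarine", "magenta")

# Map the leaf numbers 3, 5, 6, 8, 9 to positions 1..5 (sorted order).
grp <- match(where, sort(unique(where)))
plotcolor_fixed <- palette5[grp]

table(where, plotcolor_fixed)  # each leaf number gets exactly one colour
anyNA(plotcolor_fixed)
```

The resulting vector could then be passed as bg= in the points() call in place of the original plotcolor.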

On Tue, Aug 4, 2020 at 9:49 PM Andrew Halford  wrote:
>
> Hi,
>
> I've been trying to use the output on group membership of the final leaves
> in a MRT analysis to define my own colours, however I am not getting the
> result I'm after.
>
> Here is the code
> fish.pca <-rda(fish_all.hel,scale=TRUE)
> fish.site <- scores(fish.pca,display="sites",scaling=3)
> fish.spp <-
> scores(fish.pca,display="species",scaling=3)[fish.MRT.indval$pval<=0.05,]
> plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
> points(fish.site,pch=21,bg=MI_fish_all.mrt$where,cex=1.2)
> plotcolor <-
> c("red","green","blue","aquamarine","magenta")[MI_fish_all.mrt$where]
>  fish.pca <-rda(fish_all.hel,scale=TRUE)
> plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
> points(fish.site,pch=21,bg=plotcolor,cex=1.2)
> MI_fish_all.mrt$where
>
> If I run the points command and insert the group membership direct from the
> MRT analysis, e.g. bg=MI_fish_all.mrt$where, then the subsequent points
> plot up correctly with a different colour for each group. However, if I try
> to impose my own colour combo with plotcolor, it prints colours for 2
> groups and leaves the rest uncoloured.
>
> The call to  MI_fish_all.mrt$where gives...
>  [1] 3 3 8 6 6 9 5 5 9 3 8 6 9 6 5 9 5 3 8 6 9 6 5 9 5 3 3 8 6 6 9 5 5 9 6
> 9 5 9.
>
> These are the split groupings for all 39 sites in the analysis and there
> are 5 numbers corresponding to 5 final leaves in the tree.
>
> I can't see why my colour scheme isn't being recognised.
>
> All help accepted.
>
> Andy
>
>
> --
> Andrew Halford Ph.D
> Senior Coastal Fisheries Scientist
> Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848 Noumea,
> New Caledonia | Nouméa, Nouvelle-Calédonie
>



Re: [R] Mathematical working procedure of duplicated() function in r

2020-08-04 Thread Greg Snow
Rui pointed out that you can examine the source yourself.  FAQ 7.40
has a link to an article with detail on finding and examining the
source code.

A general algorithm for checking for duplicates follows (I have not
examined the R source code to see if it uses something more clever).

Create an empty object (I will call it seen).  This could be a simple
vector, but for efficiency it is better to use an object type that has
fast lookup, e.g. binary tree, associative array/hash/dictionary, etc.

Create an empty vector of logicals the same length as x (I will call it result).

Loop from 1 to the length of x (or from the length to 1 if
fromLast=TRUE). On each iteration,
 check to see if the value of x[i] is in seen
   If it is: set result[i] to TRUE
   If it is not: add the current value to seen and set result[i] to FALSE

After the loop finishes, throw away seen and reclaim the memory, then
return result.
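
The steps above can be sketched in R itself. This is only an illustration of the general algorithm, not the code R uses internally; dup_sketch is a made-up name, an environment plays the role of the hash table, and values are turned into character keys, so NA and the string "NA" are treated alike and numbers are compared by their printed form.

```r
# dup_sketch is a hypothetical name; this is not R's internal implementation.
dup_sketch <- function(x, fromLast = FALSE) {
  seen <- new.env(hash = TRUE)          # environment used as a hash table
  result <- logical(length(x))          # all FALSE to start with
  idx <- if (fromLast) rev(seq_along(x)) else seq_along(x)
  for (i in idx) {
    key <- paste0("k", x[i])            # character key; prefix avoids empty names
    if (exists(key, envir = seen, inherits = FALSE)) {
      result[i] <- TRUE                 # value already seen: a duplicate
    } else {
      assign(key, TRUE, envir = seen)   # first time: remember it
    }
  }
  result
}

x <- c("a", "b", "a", "c", "b")
dup_sketch(x)
dup_sketch(x, fromLast = TRUE)

# For rows of a data frame, one key per row can be built by pasting the columns.
df <- data.frame(id = c(1, 2, 1, 3), g = c("u", "v", "u", "w"))
row_key <- do.call(paste, c(df, sep = "\r"))
df[dup_sketch(row_key) | dup_sketch(row_key, fromLast = TRUE), ]
```

The last line mirrors the expression in the question: a row is kept if it is a duplicate when scanning from either end, which selects every member of each duplicated group.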

Since it looks like you are using this on a matrix or data frame,
there is probably a preprocessing step that combines all the values on
each row into a single character string.

On Tue, Aug 4, 2020 at 6:45 AM K Purna Prakash  wrote:
>
> Dear Sir(s),
> I request you to provide the detailed* internal mathematical working
> mechanism of the following function *for better understanding.
> *x[duplicated(x) | duplicated(x, fromLast=TRUE), ]*
> I am having some confusion in understanding how duplicates are being
> identified when thousands of records are there.
> I will look for a positive response.
> Thank you,
> K.Purna Prakash.
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Mathematical working procedure of duplicated() function in r

2020-08-04 Thread Rui Barradas

Hello,

R is open source, so you can see exactly how any function works 
internally. You can view the code by typing the function's name without 
parentheses at an R command line.


> duplicated
function (x, incomparables = FALSE, ...)
UseMethod("duplicated")



Now, this tells users that duplicated is a generic function, and that 
there are methods written to handle the different S3 classes of objects x.

When this happens, there is always a default method, duplicated.default

> duplicated.default
function (x, incomparables = FALSE, fromLast = FALSE, nmax = NA, ...)
.Internal(duplicated(x, incomparables, fromLast, if (is.factor(x)) min(length(x),
    nlevels(x) + 1L) else nmax))
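
The same inspection can be done from code. A short sketch using functions from the utils package, which is attached by default (the object name dup_default is just for this example):

```r
# List the S3 methods available for the generic in the current session.
methods("duplicated")

# Retrieve one method explicitly, even if it is not exported.
dup_default <- getS3method("duplicated", "default")
dup_default

# Printing the data frame method shows how whole rows are handled.
getS3method("duplicated", "data.frame")
```

This only shows the R-level code; the part behind .Internal() still has to be read in the C sources.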




The default method calls .Internal(duplicated(...)), which is implemented 
in C. So you'll have to download the R sources, if you haven't done so yet, 
and search for the file where that function is defined. The file is


src/main/unique.c


Good reading.
Also, like the posting guide asks R-Help users to do, please post in 
plain text, not in HTML.


Hope this helps,

Rui Barradas

On 04/08/20 at 12:54, K Purna Prakash wrote:

Dear Sir(s),
I request you to provide the detailed* internal mathematical working
mechanism of the following function *for better understanding.
*x[duplicated(x) | duplicated(x, fromLast=TRUE), ]*
I am having some confusion in understanding how duplicates are being
identified when thousands of records are there.
I will look for a positive response.
Thank you,
K.Purna Prakash.






Re: [R] confidence intervals for the difference between group means

2020-08-04 Thread Prof. Dr. Matthias Kohl

you could try:

library(MKinfer)
meanDiffCI(a, b, boot = TRUE)
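
For comparison, a percentile bootstrap of the difference in means can be sketched in base R. This is a hand-rolled illustration, not what meanDiffCI() or bootES() compute internally; the number of resamples (10000) and the seed are arbitrary choices.

```r
a <- c(523, 435, 478, 567, 654)
b <- c(423, 523, 421, 467, 501)

# Point estimate of the difference in means (64.4, as in the question).
obs_diff <- mean(a) - mean(b)

# Resample each group with replacement and recompute the difference.
set.seed(123)
boot_diff <- replicate(10000, {
  mean(sample(a, replace = TRUE)) - mean(sample(b, replace = TRUE))
})

# Percentile 95% interval from the bootstrap distribution.
ci <- quantile(boot_diff, c(0.025, 0.975))
obs_diff
ci
```

With only five observations per group the interval will be wide and sensitive to the bootstrap variant used, so the packaged functions are probably the better choice for real work.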

Best
Matthias

On 04.08.20 at 16:08, varin sacha via R-help wrote:

Dear R-experts,

Using the bootES package I can easily calculate the bootstrap confidence 
intervals of the means like in the toy example here below. Now, I am looking 
for the confidence intervals for the difference between group means. In my 
case, the point estimate of the mean difference is 64.4. I am looking at the 
95% confidence intervals around this point estimate (64.4).

Many thanks for your response.


library(bootES)
a<-c(523,435,478,567,654)
b<-c(423,523,421,467,501)
bootES(a)
bootES(b)





--
Prof. Dr. Matthias Kohl
www.stamats.de



[R] confidence intervals for the difference between group means

2020-08-04 Thread varin sacha via R-help
Dear R-experts,

Using the bootES package I can easily calculate the bootstrap confidence 
intervals of the means like in the toy example here below. Now, I am looking 
for the confidence intervals for the difference between group means. In my 
case, the point estimate of the mean difference is 64.4. I am looking at the 
95% confidence intervals around this point estimate (64.4).

Many thanks for your response.


library(bootES)
a<-c(523,435,478,567,654) 
b<-c(423,523,421,467,501)
bootES(a)
bootES(b)




Re: [R] Double MAD with R

2020-08-04 Thread varin sacha via R-help
Dear Rui,

Many thanks for your response. 

Best,

SV







On Monday, 3 August 2020 at 16:54:35 UTC+2, Rui Barradas wrote:





Hello,

No, there isn't a built-in that I know of.
Here is one:


double.mad <- function(x, include.right = FALSE, na.rm = FALSE){
  if(na.rm) x <- x[!is.na(x)]
  m <- median(x)
  odd <- (length(x) %% 2L) == 1L
  out <- if(odd){
    if(include.right) {
      c(lo = mad(x[x < m]), hi = mad(x[x >= m]))
    } else {
      c(lo = mad(x[x <= m]), hi = mad(x[x > m]))
    }
  } else {
    c(lo = mad(x[x < m]), hi = mad(x[x > m]))
  }
  out
}

double.mad(x)
#     lo      hi
#0.81543 0.44478

double.mad(c(x, 1))
#     lo      hi
#2.29803 0.44478

double.mad(c(x, 1), include.right = TRUE)
#     lo      hi
#1.03782 1.63086
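
A variant that is sometimes used, sketched here as an assumption about what "double MAD" should mean rather than as a correction: keep the overall median as the centre and take the median absolute deviation separately for the values at or below it and at or above it. The function name double_mad_center is made up for this example; the constant 1.4826 is the default scaling used by mad().

```r
# Hypothetical variant: deviations are measured from the overall median,
# not from the median of each half.
double_mad_center <- function(x, constant = 1.4826, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  m <- median(x)
  dev <- abs(x - m)                       # deviations from the overall median
  c(lo = constant * median(dev[x <= m]),  # left side, median included
    hi = constant * median(dev[x >= m]))  # right side, median included
}

x <- c(2.5, 4.4, 3.2, 2.1, 1.3, 2.6, 5, 6.6, 5, 5, 6.1, 7.2, 9.4, 6.9)
double_mad_center(x)
```

The two versions give different numbers because double.mad() recentres each half on its own median, so it is worth deciding which definition the analysis calls for.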


Hope this helps,

Rui Barradas

On 03/08/2020 at 15:22, varin sacha via R-help wrote:
> Dear R-Experts,
>
> Is there a ready-made function to calculate the Double MAD (median absolute 
> deviation), as there is an easy function to calculate the MAD (the mad 
> function)? Or do I have to write my own function for the Double MAD?
>
> To calculate the double MAD, the idea is the following: for the obtained 
> median value, we should calculate two median absolute deviations. One 
> deviation should be calculated for the numbers below the median and one for 
> the numbers above the median.
>
> Here is the very easy reproducible example :
>
> x<-c(2.5,4.4,3.2,2.1,1.3,2.6,5,6.6,5,5,6.1,7.2,9.4,6.9)
> mad(x)
>





[R] Mathematical working procedure of duplicated() function in r

2020-08-04 Thread K Purna Prakash
Dear Sir(s),
Could you please explain in detail the internal working mechanism of the
following expression, for better understanding:
x[duplicated(x) | duplicated(x, fromLast=TRUE), ]
I am confused about how duplicates are identified when there are
thousands of records.
I look forward to a positive response.
Thank you,
K.Purna Prakash.
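
A small illustration of what the expression does, as a sketch on made-up data (the data frame x and the names fwd, bwd and all_dups are hypothetical). duplicated(x) flags a row as TRUE when an identical row occurs earlier; with fromLast = TRUE it flags a row when an identical row occurs later. Combining the two with | therefore selects every member of each duplicated group, including its first occurrence. No arithmetic is involved, only tests for exact equality.

```r
# Made-up data: rows 1 and 3 are identical, and so are rows 2 and 5
x <- data.frame(id  = c(1, 2, 1, 3, 2),
                val = c("a", "b", "a", "c", "b"))

fwd <- duplicated(x)                   # TRUE if an identical row appears earlier
bwd <- duplicated(x, fromLast = TRUE)  # TRUE if an identical row appears later

fwd         # FALSE FALSE  TRUE FALSE  TRUE
bwd         #  TRUE  TRUE FALSE FALSE FALSE

# Every row that has at least one identical partner anywhere in the data
all_dups <- x[fwd | bwd, ]
all_dups    # rows 1, 2, 3 and 5
```

The result does not depend on the number of records; only the running time does.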

[[alternative HTML version deleted]]



Re: [R] hist from a list

2020-08-04 Thread Pedro páramo
Hi Rasmus, Josh and Rui,

First of all, many thanks for your help.

First, you sometimes say "you are posting in HTML and that makes the
post unreadable as this is a plain text list". How can I send the code in
the correct way, not HTML (by attaching it as a txt file?)

The second about the code:

I have used this:

bwc <- cbind(bwfinal2,bwfinal)
colnames(bwc)=c("Accion","reval")
df <- matrix(unlist(bwc), nrow=nrow(bwc), byrow=F)
colnames(bwchist)=c("Accion","reval")
bwchist <-as.data.frame(bwc[order(df[,2]), ])

bwchist holds the ordered cumulative stock returns for the year, but because
it is a list it is not possible to plot a histogram with the names of the
stocks on the x axis and the value of the cumulative returns (reval) on the
y axis.

When I run dput(bwchist) the console says:

dput(bwchist)
structure(list(Accion = list("REE", "Enagas", "Grifols", "Ferrovial",
"Acerinox", "Naturgy", "Inditex", "Bankia", "ENCE", "Aena",
"Bankinter", "Mapfre", "CaixaBank", "CIE", "Colonial", "Almirall",
"Indra", "ArcelorMittal", "ACS", "Telefonica", "Amadeus",
"BBVA", "Merlin", "Santander", "Repsol", "Melia", "Sabadell",
"IAG", "Acciona", "Endesa", "MasMovil", "Iberdrola", "SGamesa",
"Viscofan", "Cellnex"), reval = list(-0.0200827282700085,
-0.0590294115600855, -0.214126598790964, -0.220773677809979,
-0.229653300324357, -0.257944379583984, -0.283942789063822,
-0.285159347392533, -0.303814713896458, -0.30734460425763,
-0.309408155539818, -0.319912221435868, -0.322790949659181,
-0.344047579452905, -0.347919538415482, -0.356898907103825,
-0.374263261296661, -0.40147247119078, -0.405150043834815,
-0.406022775042175, -0.413786100987797, -0.440679109311707,
-0.442603156492871, -0.491634140733524, -0.499254932434042,
-0.6, -0.709737357505148, -0.724461258850966, 0.0220528711420083,
0.0462767672643172, 0.115044247787611, 0.238734548714937,
0.274578114644054, 0.343422896082666, 0.387826126094928)), class =
"data.frame", row.names = c(NA,
-35L))

I have tried to make a hist or barplot, but because it is a list there is no
way to obtain the plot.
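
A hedged sketch of one possible way around this: the dput() output shows that both columns are lists with one element per row, so they can be flattened with unlist() into ordinary vectors before plotting. The three-row bwchist below is a stand-in copied from the first two and the last row of the output above, and the name flat is hypothetical.

```r
# Stand-in with the same structure as the posted dput() output (3 rows only)
bwchist <- structure(
  list(Accion = list("REE", "Enagas", "Cellnex"),
       reval  = list(-0.0200827282700085, -0.0590294115600855,
                     0.387826126094928)),
  class = "data.frame", row.names = c(NA, -3L))

# Flatten the list columns into atomic vectors
flat <- data.frame(Accion = unlist(bwchist$Accion),
                   reval  = unlist(bwchist$reval),
                   stringsAsFactors = FALSE)

# Sort numerically by return and draw one bar per stock
flat <- flat[order(flat$reval), ]
barplot(flat$reval, names.arg = flat$Accion, las = 2,
        ylab = "reval")
```

Sorting after the conversion orders the returns as numbers; in the posted output the negative values are followed by the positive ones, which suggests they were ordered as text.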

Many thanks again for your help.

I have printed two manuals to improve my level, but if you can help me, I
would be very grateful.



On Fri, 31 Jul 2020 at 18:28, Rasmus Liland ()
wrote:

> On 2020-07-31 10:07 -0500, Joshua Ulrich wrote:
> | On Fri, Jul 31, 2020 at 9:55 AM Rui Barradas wrote:
> | | At 15:44 on 31/07/2020, Michael Dewey wrote:
> | | | Dear Pedro
> | | |
> | | | Some comments in-line
> | | |
> | | | On 30/07/2020 21:16, Pedro páramo wrote:
> | | | | Hi all,
> | | | |
> | | | | I attach my code. The thing is, I
> | | | | want to make a bar plot of the last
> | | | | variable, called "bwchist", so that
> | | | | the X axis is "Accion" and the y
> | | | | axis is the "reval" values.
> | | | |
> | | | | I have tried class(bwchist) and it
> | | | | says data frame, but it is still a
> | | | | list; I have tried to unlist it,
> | | | | but it doesn't
> | | | | work
> | | | |
> | | | | hist(bwchist)
> | | | | Error in hist.default(bwchist) : 'x' must be numeric
> | | |
> | | | So bwchist is not a numeric
> | | | variable as hist needs. Above you
> | | | said it is a data frame but data
> | | | frames are not numeric.
> | | |
> | | | For future reference your example
> | | | is way too long for anyone to go
> | | | through and try to help you. Try
> | | | next time to reduce it to the
> | | | absolute minimum by removing
> | | | sections while you still get the
> | | | error.  It is also easier to get
> | | | help if you can remove unnecessary
> | | | packages.
> | | |
> | | | It is also unreadable because you
> | | | are posting in HTML and that makes
> | | | the post unreadable as this is a
> | | | plain text list.
> | |
> | | Hello,
> | |
> | | I second Michael's opinion. When the
> | | post's code is very long, there is a
> | | tendency to get fewer answers.
> | |
> | | Please post the output of
> | |
> | | dput(head(bwchist, 30))
> | |
> | | It's much shorter code and it
> | | recreates the data so we will be
> | | able to see what's wrong and try to
> | | find a solution.
> |
> | Hi Pedro,
> |
> | Another 'best practice' and polite
> | thing to do is link to other places
> | you may have cross-posted.  That will
> | give people the opportunity to see if
> | your question has been answered in
> | another forum.
> |
> | I saw your post on R-SIG-Finance
> | (https://stat.ethz.ch/pipermail/r-sig-finance/2020q3/014979.html),
> | and started to work on a solution.
> |
> | I don't know how to do this in
> | tidyquant, but here's how you can do
> | it with quantmod:
> |
> | # all tickers
> | tk <- c("ANA.MC", "ACS.MC", "AENA.MC", "AMS.MC", "MTS.MC", "BBVA.MC",
> |   "SAB.MC", "SAN.MC", "BKT.MC", "CABK.MC", "CLNX.MC", "ENG.MC", "ENC.MC",
> |   "ELE.MC", "FER.MC", "GRF.MC", "IBE.MC", "ITX.MC", "COL.MC", "IAG.MC",
> |   "MAP.MC", "MEL.MC", "MRL.MC", "NTGY.MC", "REE.MC", "REP.MC", "SGRE.MC",
> |   "TEF.MC",
> |   "VIS.MC", "ACX.MC", "BKIA.MC", "C

[R] defining group colours in a call to rda

2020-08-04 Thread Andrew Halford
Hi,

I've been trying to use the output on group membership of the final leaves
in a MRT analysis to define my own colours, however I am not getting the
result I'm after.

Here is the code
fish.pca <- rda(fish_all.hel,scale=TRUE)
fish.site <- scores(fish.pca,display="sites",scaling=3)
fish.spp <- scores(fish.pca,display="species",scaling=3)[fish.MRT.indval$pval<=0.05,]
plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
points(fish.site,pch=21,bg=MI_fish_all.mrt$where,cex=1.2)
plotcolor <- c("red","green","blue","aquamarine","magenta")[MI_fish_all.mrt$where]
fish.pca <- rda(fish_all.hel,scale=TRUE)
plot(fish.pca,display=c("sites","species"),type="n",scaling=3)
points(fish.site,pch=21,bg=plotcolor,cex=1.2)
MI_fish_all.mrt$where

If I run the points command with the group membership taken directly from the
MRT analysis, e.g. bg=MI_fish_all.mrt$where, the points plot correctly with a
different colour for each group. However, if I try to impose my own colour
combination with plotcolor, it prints colours for 2 groups and leaves the
rest uncoloured.

The call to  MI_fish_all.mrt$where gives...
 [1] 3 3 8 6 6 9 5 5 9 3 8 6 9 6 5 9 5 3 8 6 9 6 5 9 5 3 3 8 6 6 9 5 5 9 6
9 5 9.

These are the split groupings for all 39 sites in the analysis and there
are 5 numbers corresponding to 5 final leaves in the tree.

I can't see why my colour scheme isn't being recognised.
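
A possible explanation, sketched on a shortened copy of the values printed above: the leaf numbers 3, 5, 6, 8 and 9 are used as positions in a colour vector of length 5, so only 3 and 5 find a colour and the others give NA. Recoding the leaf numbers to 1..5 with factor() before indexing avoids this. The names where, palette5, bad and leaf are hypothetical, and the mapping of colours to leaves follows sorted leaf order, which is an assumption.

```r
# First nine leaf numbers from the output above (contains all five leaves)
where <- c(3, 3, 8, 6, 6, 9, 5, 5, 9)
palette5 <- c("red", "green", "blue", "aquamarine", "magenta")

# Indexing by leaf number: positions 6, 8 and 9 do not exist, so they give NA
bad <- palette5[where]
bad

# Recode leaf numbers 3, 5, 6, 8, 9 to positions 1, 2, 3, 4, 5
leaf <- as.integer(factor(where))
plotcolor <- palette5[leaf]
plotcolor
```

The recoded plotcolor can then be passed as bg= in the points() call in place of the original vector.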

All help accepted.

Andy


-- 
Andrew Halford Ph.D
Senior Coastal Fisheries Scientist
Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848 Noumea,
New Caledonia | Nouméa, Nouvelle-Calédonie

[[alternative HTML version deleted]]



Re: [R] Arrange data

2020-08-04 Thread Jim Lemon
Your problem is in the subset operation. You have asked for a value of
month greater or equal to 7 and less than or equal to 6. You probably
got an error message that told you that the data were of length zero
or something similar. If you check the result of that statement:

> mddat$month >= 7 & mddat$month <= 6
logical(0)

In other words, the two conditions can never both be TRUE when ANDed: a
number cannot be greater than or equal to 7 AND less than or equal to 6,
so the subset is empty. What you want is:

mddat2<-mddat[(mddat$Year == 1975 & mddat$Month >= 7) |
              (mddat$Year == 1976 & mddat$Month <= 6),]
mean(mddat2$Value)
[1] 88.91667
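
If the same July to June average is needed for every year rather than just 1975, one option is to label each row with the year in which its July to June period starts and aggregate on that label. This is only a sketch on made-up data: the Value column, the FiscalYear name and fy_means are assumptions, not the poster's file.

```r
# Made-up monthly data for two calendar years
mddat <- data.frame(Year  = rep(1975:1976, each = 12),
                    Month = rep(1:12, times = 2),
                    Value = 1:24)

# July-December belong to the period starting that year;
# January-June belong to the period that started the year before
mddat$FiscalYear <- ifelse(mddat$Month >= 7, mddat$Year, mddat$Year - 1L)

# Mean of Value for each July-to-June period
fy_means <- aggregate(Value ~ FiscalYear, data = mddat, FUN = mean)
fy_means
```

Periods at either end of the data cover only six months here, so their means would need to be dropped or flagged.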

Apart from that, your email client is inserting EOL characters that
cause an error when pasted into R.

Error: unexpected input in "�"

Probably due to MS Outlook, this has been happening quite a bit lately.

Jim

On Mon, Aug 3, 2020 at 11:30 PM Md. Moyazzem Hossain
 wrote:
>
> Dear Jim,
>
> Thank you very much. It is working now.
>
> However, I am also trying to find the average of the value from July 1975 to 
> June 1976 and recorded as the value for the year 1975 but got an error 
> message. I am attaching the data file here. Please check the attachment.
>
> mddat=read.csv("F:/mddat.csv", header=TRUE)
> mddat2<-mddat[mddat$Month >=7 & mddat$Month <= 6,]
> jan2jun<-by(mddat2$Value,mddat2$Year,mean)
> jan2jun
>
> Please help me again and many thanks in advance.
>
> Md
>
>
> On Mon, Aug 3, 2020 at 12:33 PM Rasmus Liland  wrote:
>>
>> On 2020-08-03 21:11 +1000, Jim Lemon wrote:
>> > On Mon, Aug 3, 2020 at 8:52 PM Md. Moyazzem Hossain  
>> > wrote:
>> > >
>> > > Hi,
>> > >
>> > > I have a dataset having monthly
>> > > observations (from January to
>> > > December) over a period of time like
>> > > (2000 to 2018). Now, I am trying to
>> > > take an average the value from
>> > > January to July of each year.
>> > >
>> > > The data looks like
>> > > Year  Month  Value
>> > > 2000      1     25
>> > > 2000      2     28
>> > > 2000      3     22
>> > > ..       ..      .
>> > > 2000     12     26
>> > > 2001      1     27
>> > > ...
>> > > 2018     11     30
>> > > 2018     12     29
>> > >
>> > > Can someone help me in this regard?
>> > >
>> > > Many thanks in advance.
>> >
>> > Hi Md,
>> > One way is to form a subset of your
>> > data, then calculate the means by
>> > year:
>> >
>> > # assume your data is named mddat
>> > mddat2<-mddat[mddat$month < 7,]
>> > jan2jun<-by(mddat2$value,mddat2$year,mean)
>> >
>> > Jim
>>
>> Hi Md,
>>
>> you can also define the period in a new
>> column, and use aggregate like this:
>>
>> Md <- structure(list(
>> Year = c(2000L, 2000L, 2000L,
>> 2000L, 2001L, 2018L, 2018L),
>> Month = c(1L, 2L, 3L, 12L, 1L,
>> 11L, 12L),
>> Value = c(25L, 28L, 22L, 26L,
>> 27L, 30L, 29L)),
>> class = "data.frame",
>> row.names = c(NA, -7L))
>>
>> Md[Md$Month %in% 1:6,"Period"] <- "first six months of the year"
>> Md[Md$Month %in% 7:12,"Period"] <- "last six months of the year"
>>
>> aggregate(
>>   formula=Value~Year+Period,
>>   data=Md,
>>   FUN=mean)
>>
>> Rasmus
>
>
>
