Re: [R] using match-type function to return correctly ordered data from a dataframe

2012-10-27 Thread Markus Weisner
Hi Jeff. I believe my Function #1 actually does use %in% to select the
data.  I use %in% all the time but, as far as I can tell, it can only
return a vector of logical values.  As a result, it keeps the order of
the dataframe from which you are selecting data.  It does not, however,
appear to be able to return the data in the order of the values that you
passed in.

To try and clarify my order assertion, take for example a dataframe that
has a column LETTER with a record for each letter of the alphabet.  The
dataframe is ordered so that "A" is record 1 and "Z" is record 26.  Say
that I want to pull records from this dataframe based on a list of letters
and I want it to return those records in the order of the letters I passed
it.  I could use something like the following code to pull records ...

myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]

If I pass it the list myPassedListOfLetters <- c("C", "B", "A"), I will
receive the data back in the order "A", "B", "C".  What I am trying to
figure out is how to get the data back in the order of the list that I
specified I want the data in ("C", "B", "A").
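
A minimal sketch of the difference described above, assuming a toy
dataframe built from R's built-in LETTERS constant (the POSITION column
is invented for illustration).  Base R's match() returns the row index of
each requested value, in the order requested, which is one way to get
"C", "B", "A" back rather than "A", "B", "C":

```r
# Toy dataframe: one record per letter, stored in A..Z order.
myDataFrame <- data.frame(LETTER = LETTERS, POSITION = 1:26,
                          stringsAsFactors = FALSE)
myPassedListOfLetters <- c("C", "B", "A")

# %in% gives a logical mask over the rows, so rows come back in table order.
by_mask <- myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]

# match() gives the row number of each requested letter, in request order.
idx <- match(myPassedListOfLetters, myDataFrame$LETTER)
by_match <- myDataFrame[idx, ]

by_mask$LETTER   # "A" "B" "C"
by_match$LETTER  # "C" "B" "A"
```

A requested value that is absent from the table gives NA in idx, so the
corresponding row comes back filled with NA rather than being dropped.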

Hope that clarifies what I am trying to figure out a bit.  Thanks for your
help!
Best,
Markus




On Fri, Oct 26, 2012 at 11:00 PM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:

 Have you actually read

 ?"%in%"

 ?

 Although a valuable tool, not all answers are most effectively obtained by
 Googling.

 Also, your repeated assertions that the answers are not maintained in
 order are poorly framed. They DO stay in order according to the zipcode
 database order. That said, your desire for numeric indexes is only as far
 away as your help file.
 ---
 Jeff Newmiller
 DCN: jdnew...@dcn.davis.ca.us
 Research Engineer (Solar/Batteries/Software/Embedded Controllers)
 ---
 Sent from my phone. Please excuse my brevity.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] using match-type function to return correctly ordered data from a dataframe

2012-10-26 Thread Markus Weisner
I am regularly running into a problem where I can't seem to figure out how
to maintain correct data order when selecting data out of a dataframe.  The
below code shows an example of trying to pull data from a dataframe using
ordered zip codes.  My problem is returning the pulled data in the correct
order.  This is a very simple example, but it illustrates a regular problem
that I am running into.

In the past, I have used fairly complicated solutions to pull this off.
 There has got to be a more simple and straightforward method ... probably
some function that I missed in all my googling.

Thanks in advance for anybody's help figuring this out.
~Markus


### Function Definitions ###

# FUNCTION #1 (returns wrong order)
getLatitude1 = function(myzips) {

  # load libraries and data
  library(zipcode)
  data(zipcode)

  # get latitude values
  # problem is that this code does not maintain order
  mylats = zipcode[zipcode$zip %in% myzips, "latitude"]

  # return data
  return(mylats)
}

# FUNCTION #2 (also returns wrong order)
getLatitude2 = function(myzips) {

  # load libraries and data
  library(zipcode)
  data(zipcode)

  # convert myzips to DF
  myzips = as.data.frame(as.character(myzips))

  # merge in zipcode data based on zip
  results = merge(myzips, zipcode[,c("zip", "latitude")],
                  by.x="as.character(myzips)", by.y="zip", all.x=TRUE)

  # return data
  return(results$latitude)
}


### Code ###

# specify a set of zip codes
myzips = c("74432", "72537", "06026", "01085", "65793")

# create a DF
myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)

# look at data to determine what should be returned and in what order
library(zipcode)
data(zipcode)
zipcode[zipcode$zip %in% myzips,]

# test function #1 (function definition above)
myzips.df$latitude = getLatitude1(myzips.df$zip) #returns wrong order

# test function #2 (function definition above)
myzips.df$latitude = getLatitude2(myzips.df$zip) #also returns wrong order



# need myzips %in% zipcode$zip to return array/df indices rather than logical
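
A hedged sketch of the kind of index-returning lookup asked for above,
using base R's match().  The table zip_table and the function name
getLatitude3 are hypothetical, and the latitude numbers are placeholders
rather than real coordinates, so that the example runs without the
zipcode package:

```r
# Stand-in for the zipcode data; latitude values are placeholders only.
zip_table <- data.frame(
  zip      = c("01085", "06026", "65793", "72537", "74432"),
  latitude = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# match() returns, for each requested zip, its row number in the table,
# so the result follows the order of myzips (NA where a zip is not found).
getLatitude3 <- function(myzips, table = zip_table) {
  table$latitude[match(as.character(myzips), table$zip)]
}

myzips <- c("74432", "72537", "06026", "01085", "65793")
getLatitude3(myzips)  # 50 40 20 10 30
```

The zips are kept as character strings here because a numeric 06026 would
lose its leading zero.  merge() sorts its result on the key columns by
default, which would explain why Function #2 also loses the input order.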



[R] need advice on using excel to check data for import into R

2012-04-22 Thread Markus Weisner
I have created an S4 object type for conducting fire department data
analysis.  The object includes a validity check that ensures certain fields
are present and that duplicate records don't exist for certain combinations
of columns (e.g. no duplicate incident number / incident data / unit ID
ensures that the data does not show the same fire engine responding twice
on the same call).
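
A minimal sketch of what such a validity check might look like, not the
actual class from this post.  The class name Responses and the column
names incident_num, incident_date, and unit_id are hypothetical:

```r
library(methods)

# Hypothetical S4 wrapper around a data frame of unit responses.
setClass("Responses", representation(data = "data.frame"))

setValidity("Responses", function(object) {
  required <- c("incident_num", "incident_date", "unit_id")

  # Check 1: every required field is present.
  missing_cols <- setdiff(required, names(object@data))
  if (length(missing_cols) > 0)
    return(paste("missing column(s):", paste(missing_cols, collapse = ", ")))

  # Check 2: no duplicated combination of the key columns.
  if (anyDuplicated(object@data[, required]) > 0)
    return("duplicate incident number / incident date / unit ID combination")

  TRUE
})

ok <- data.frame(incident_num = c(1, 1), incident_date = "2012-04-22",
                 unit_id = c("E1", "E2"))
r <- new("Responses", data = ok)  # passes both checks
```

Returning a character string from the validity function makes new() fail
with that message, which could be passed back to the client as-is.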

I am finding that I spend a lot of time taking client data, converting it
to my S4 object, and then sending it back to the client to correct data
validity issues.

I am trying to figure out a clever way to have excel (typically the program
used by my clients) check client data prior to them submitting it to me.  I
have been working with somebody on trying to develop an excel toolbar
add-in with limited success.

My question is whether anybody can think of clever alternatives for clients
to validate their data … for example, is there an R Excel plugin (that would
be easily installed by a client) where I might be able to write some lines of
R to check the data and output messages … or maybe some sort of server
where they could upload their data and I could have some lines of R code
that would check the data and send back potential error messages?

I realize this is a fairly open ended question … just looking for some
general ideas and directions to go. Getting a little frustrated with
spending most of my work time dealing with data cleaning issues … guessing
this is a problem shared by many of us that use R!

Thanks,
Markus



Re: [R] need advice on using excel to check data for import into R

2012-04-22 Thread Markus Weisner
If I go to wiki -> how to install, it looks like a rather complicated
installation that involves installing R followed by several command line
prompts.

It looks like it might be too much of an installation process to make sense
for a client to conduct a one-time data check.

Looks like a great tool though.  Is there a simpler way of deploying Rexcel
that I am not seeing?

Thanks,
Markus


On Sun, Apr 22, 2012 at 3:43 PM, Richard M. Heiberger r...@temple.edu wrote:

 This looks like a perfect case for an RExcel solution.
 RExcel is an addin that allows you, among other things, to place an
 arbitrary R function inside the
 Excel automatic recalculation mode.  For details see
 rcom.univie.ac.at
 There are many reference items listed on the wiki page in the left panel.
 For further followup, please sign up for the rcom mailing list, again with
 the details on the web site.

 Rich







[R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Markus Weisner
trying to switch out addresses that have double directions, such as the
following example:

a = "S S Main St  Interstate 95"

a = gsub(pattern="S S ", replacement="S ", a)


… the problem is that I don't want to affect instances where this might be
a correct address such as the following:


3421 BIGS St


what I want to say is switch out only if this is either of the following
situations


[beginning of char]S S

 S S 

S S[end of char]


Is there any way of making gsub or a similar function make the replacements
I want?  Thanks in advance for your help.


~Markus



Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Markus Weisner
Thanks Justin and Bill.  That did the trick!!
~Markus


On Wed, Mar 28, 2012 at 4:45 PM, Justin Haynes jto...@gmail.com wrote:

 wow!  and here I thought I was starting to know most things about
 regexes...

 On Wed, Mar 28, 2012 at 1:34 PM, William Dunlap wdun...@tibco.com wrote:
  You can use the \< and \> patterns (backslashing the backslashes) to
  mean start and end of word, respectively.  E.g.,
 
     > addresses <- c("S S Main St  Interstate 95", "3421 BIGS St")
     > gsub("\\<S S\\>", "S", addresses)
     [1] "S Main St  Interstate 95" "3421 BIGS St"
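
The word-boundary idea above can be sketched a little more generally.
The following is an illustration only, assuming Perl-style regular
expressions (perl = TRUE), where \b marks a word boundary.  Handling all
four direction letters with a backreference is an extension that was not
part of the original question, and the test addresses are made up:

```r
addresses <- c("S S Main St", "3421 BIGS St", "Main St S S", "100 S S Main St")

# ([NSEW]) captures one direction letter; " \\1" requires the same letter
# again after a single space; the \\b anchors keep the match from starting
# or ending inside a longer word such as "BIGS" or "St".
fixed <- gsub("\\b([NSEW]) \\1\\b", "\\1", addresses, perl = TRUE)
fixed
# "S Main St" "3421 BIGS St" "Main St S" "100 S Main St"
```

One pattern covers the start-of-string, middle, and end-of-string cases,
so no second gsub pass should be needed for these inputs.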
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf
  Of Justin Haynes
  Sent: Wednesday, March 28, 2012 1:24 PM
  To: Markus Weisner
  Cc: r-help@r-project.org
  Subject: Re: [R] how to match exact phrase using gsub (or similar
 function)
 
  In most regexes the caret ( ^ ) signifies the start of a line and the
  dollar sign ( $ ) signifies the end.
 
  gsub('^S S', 'S', a)
 
  gsub('^S S', 'S', '3421 BIGS St')
 
  you can use logical or inside your pattern too:
 
  gsub('^S S|S S$| S S ', 'S', a)
 
  the  S S  condition is difficult.
 
  gsub('^S S|S S$| S S ', 'S', 'foo S S bar')
 
  gives the wrong output. as does:
 
  gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
  gsub('^S S | S S$| S S ', ' S ', a)
 
 
  so you might have to catch that with a second gsub.
 
  gsub(' S S ', ' S ', 'foo S S bar')
 
 




[R] stumped on how to reorder factors

2011-08-30 Thread Markus Weisner
I am trying to reorder a factor data type so that when I plot stats
associated with the factor, the ordering makes sense.

For instance, if I have a factor entered as follows ...

A = as.factor(c(1, 10, 3, 3, 10, 10))

levels(A)


... the ordering does not really make sense (assuming I want the factor
ordered by integer value), but I understand that this mis-ordering  is
because the ordering is based on a character string data type and not on an
integer data type.  Because I run into this problem frequently, I wrote a
small function to fix this:


reorder_factor = function(x, x_sum, decreasing=FALSE){
  factor(as.character(x), levels=levels(x)[order(x_sum, decreasing=decreasing)])
}


I can then run the following code to fix the problem:


A = reorder_factor(x=A, x_sum=as.numeric(levels(A)), decreasing=FALSE)

levels(A)


... and now I have correctly ordered integers.  Perhaps not the most elegant
solution, but it worked for my purposes.  Now I have a more complicated
problem and I need help.  Assuming the following factor:


B = as.factor(c("Engine 1", "Engine 10", "Ladder 3", "Engine 3", "Ladder 10",
"Engine 10"))

levels(B)


I would like the factor ordered first by the preceding unit type and then
ordered by the following integer.  In this case, I would like to see this
order:  "Engine 1", "Engine 3", "Engine 10", "Ladder 3", "Ladder 10".  I have tried
many different ways of separating out the unit type from the number, but am
having trouble figuring out a good way of achieving this factor order.  For
such a small example, I could obviously manually change the order, but I am
dealing with much larger datasets with many unit types and up to 20
different numbers for each unit type.  Having an automated way of ordering
these units would be a huge help.  Thanks in advance for any help you can
provide.

--Markus Weisner
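
One possible approach to the ordering asked about above, sketched under
the assumption that every level has the form "<unit type> <integer>" with
a single space before the number.  The helper names unit_type and
unit_num are invented for this example:

```r
B <- as.factor(c("Engine 1", "Engine 10", "Ladder 3", "Engine 3",
                 "Ladder 10", "Engine 10"))

lev <- levels(B)

# Split each level into its text part and its trailing integer.
unit_type <- sub(" [0-9]+$", "", lev)         # "Engine", "Ladder", ...
unit_num  <- as.numeric(sub("^.* ", "", lev)) # 1, 10, 3, ...

# order() sorts by unit type first, then numerically within each type.
B <- factor(B, levels = lev[order(unit_type, unit_num)])
levels(B)
# "Engine 1" "Engine 3" "Engine 10" "Ladder 3" "Ladder 10"
```

Levels that do not end in a number would give NA in unit_num and would
need separate handling.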



[R] dataframe selection using a multi-value key

2010-09-07 Thread Markus Weisner
I am merging two dataframes using a relational key (incident number and
incident year), but not all the records match up.  I want to be able to
review only the records that cannot be merged for each individual dataframe
(essentially trying to select records from one dataframe using a multi-value
relational key from the other dataframe).  The following code shows what I
am trying to do.  The final two lines of code do not work, but if somebody
could figure out a workable solution, that would be great.  Thanks.
--Markus

incidents = data.frame(
    INC_NO = c(1,2,3,4,5,6,7,8,9,10),
    INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009,
                 2010),
    INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS",
                 "FIRE", "EMS", "EMS"))

responses = data.frame(
    INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10),
    INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008,
                 2018, 2018, 2018, 2009, 2010),
    UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1",
                  "E3", "E7", "T1", "E7", "E5"))

merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"))

relational_key = c("INC_NO", "INC_YEAR")

## following does not work, but I want DF of incidents that did not merge up
## with responses
incidents[incidents[,relational_key] %in% responses[,relational_key],]

## following does not work, but I want DF of responses that did not merge up
## with incidents
responses[responses[,relational_key] %in% incidents[,relational_key],]
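
A sketch of one way to make the last two lines work: %in% compares
vectors, not rows of a dataframe, so the key columns can first be pasted
into a single string per row.  The helper make_key is hypothetical, and
the "_" separator is assumed not to occur inside the key values:

```r
incidents <- data.frame(
  INC_NO   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, 2010),
  INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS",
               "FIRE", "EMS", "EMS"))
responses <- data.frame(
  INC_NO    = c(1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10),
  INC_YEAR  = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008,
                2018, 2018, 2018, 2009, 2010),
  UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1",
                "E3", "E7", "T1", "E7", "E5"))
relational_key <- c("INC_NO", "INC_YEAR")

# Collapse the key columns of each row into one string, e.g. "8_2008".
make_key <- function(df, cols) do.call(paste, c(df[cols], sep = "_"))

inc_key  <- make_key(incidents, relational_key)
resp_key <- make_key(responses, relational_key)

# Records whose key has no counterpart in the other dataframe.
unmatched_incidents <- incidents[!(inc_key %in% resp_key), ]
unmatched_responses <- responses[!(resp_key %in% inc_key), ]
```

With this data, unmatched_incidents holds the single 2008 record for
incident 8, and unmatched_responses holds the three 2018 records.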



Re: [R] dataframe selection using a multi-value key

2010-09-07 Thread Markus Weisner
Hi Erik and Jim.  Both solutions did the trick.  Thank you!!
--Markus

On Tue, Sep 7, 2010 at 9:05 PM, jim holtman jholt...@gmail.com wrote:

 try this:

  > merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"), all=TRUE)
  > # responses that don't match
  > subset(merged_data, is.na(INC_TYPE), select=c(INC_NO, INC_YEAR, UNIT_TYPE))
     INC_NO INC_YEAR UNIT_TYPE
  11      8     2018        E3
  12      8     2018        E7
  13      8     2018        T1
  > # incidents that don't match
  > subset(merged_data, is.na(UNIT_TYPE), select=c(INC_NO, INC_YEAR, INC_TYPE))
     INC_NO INC_YEAR INC_TYPE
  10      8     2008     FIRE
 


 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?




[R] to extend data.frame or not ... that is the question

2010-07-19 Thread Markus Weisner
("EMS", "FIRE",
"FIRE", "EMS", "FIRE", "FIRE"), unit=c("E1", "E5", "T1", "E3", "E1", "T1"),
response_time=c(300,400,350,250,500,200))
data = as.CAD(DF)

### test methods on CAD example
head(data)
tail(data)
subset(data, data$unit %in% c("E5", "T1"))
as.data.frame(data[2, c("incident_num", "unit")])

###

Markus Weisner, Firefighter Medic and GIS Analyst
Charlottesville Fire Department
203 Ridge Street
Charlottesville, Virginia 22901
(434) 970-3240



Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $

2010-02-09 Thread Markus Weisner
Thanks so much for your help.  I am realizing that I may be
over-complicating things for myself.  I have learned a ton about creating
methods, but I feel like I am trying to reinvent the data.frame class.
Basically, I am trying to create a data.frame type object where I can
enforce the header names and column data types.  I am trying to force the
user to setup the following fields:

   - event_number (character)
   - agency (factor)
   - unit_num (factor)
   - alarm (POSIXct)
   - priority (factor)

A user might use the following code:

event_number = c(1:5)
agency = c("CFD", rep("ACFR", 3), "CFD")
unit_num = c("E1", "T10", "E3", "E2", "BC1")
temp =  c("00:52:35", "06:58:18", "13:42:18", "20:59:45", "21:19:00")
alarm = as.POSIXct(strptime(temp, format="%H:%M:%S"))
priority = c("A", "E", "A", "C", "C")
data = data.frame(event_number=event_number, agency=agency,
unit_number=unit_num, alarm=alarm, priority=priority)

I have all sorts of functions that I am trying to incorporate into a package
for analyzing fire department data, but keep having problems with small
deviations in data format causing errors.  In this example, the following
might cause issues in my functions:

   - event_number should be of type character
   - agency, unit_number, and priority, should be of type factor
   - unit_number should actually have name unit_num

Ideally, I would be able to extend either the actual data.frame class or
something similar, so that users are forced to create correctly formatted
data.frames ... something that would create error messages until the user
uses the following code:

data = data.frame(event_number=as.character(event_number),
agency=as.factor(agency), unit_num=as.factor(unit_num), alarm=alarm,
priority=as.factor(priority))

After a user has created a correctly formatted object, the user may need to
manipulate the data prior to applying the analysis functions.  For instance,
a user might just want to analyze data for Engine #1 (unit_num == "E1").
Because of the need to manipulate data, I am trying to maintain all the same
functionality as a data frame ... subset(), head(), [i,j], et cetera.

Just wondering if you think creating a new S4 class is the way to go.  So
far I got the head(), tail(), and subset() methods working for my new S4
class, but the "[" method seems like a pretty big undertaking.  Is there something
easier you might recommend?  Would it be possible to extend the data.frame
class to include some data verifications?  If so, do you have some basic
pointers for setting something like that up?
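
One lighter-weight possibility, sketched here only as an illustration: a
checking constructor that returns an ordinary data.frame with an extra S3
class tag.  The function name as_cad and the class name "cad" are
hypothetical; the required columns and types are the ones listed above:

```r
as_cad <- function(df) {
  # Required column names and the class each column must inherit from.
  required <- c(event_number = "character", agency = "factor",
                unit_num = "factor", alarm = "POSIXct", priority = "factor")

  missing_cols <- setdiff(names(required), names(df))
  if (length(missing_cols) > 0)
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))

  for (col in names(required)) {
    if (!inherits(df[[col]], required[[col]]))
      stop("column '", col, "' must be of class ", required[[col]])
  }

  # Still a data.frame underneath, so data.frame methods remain available.
  class(df) <- c("cad", "data.frame")
  df
}

# Illustrative input; the date part of the alarm times is made up.
good <- data.frame(
  event_number = as.character(1:2),
  agency       = as.factor(c("CFD", "ACFR")),
  unit_num     = as.factor(c("E1", "T10")),
  alarm        = as.POSIXct(c("2010-02-09 00:52:35", "2010-02-09 06:58:18"),
                            tz = "UTC"),
  priority     = as.factor(c("A", "E")),
  stringsAsFactors = FALSE)
cad <- as_cad(good)
```

Because the object remains a data.frame, functions such as head() and
subset() can be used without new methods.  The trade-off is that the
checks run only when as_cad() is called, not after later modification.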

Really appreciate all your help thus far.  Hopefully, one last advice email
will do the trick.  Thanks.
--Markus



On Mon, Feb 8, 2010 at 6:43 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 02/08/2010 02:54 PM, Markus Weisner wrote:
  Thanks.  Used getGeneric("[") to figure out the general format for the
  setMethod, but am having some problem with how to set up the actual
  function:
 
  > getGeneric("[")
  standardGeneric for "[" defined from package "base"
 
  function (x, i, j, ..., drop = TRUE)
  standardGeneric("[", .Primitive("["))
  <environment: 0x116513c30>
  Methods may be defined for arguments: x, i, j, drop
  Use  showMethods("[")  for currently available ones.
 
  Based on this, I set up the following code:
 
  setClass("A", representation(a="numeric", b="numeric"))
  data = new("A", a=1:10, b=1:10)
  setMethod("[", "A",
  function(x, i, j, ..., drop) {
  slotnames <- slotNames(x)[j]
  new_data = new("A")
  for(slot in slotnames) new_data@slot = x@slot[i]
  new_data
  })
  data[5,c("a")]

 probably there are several issues and covering them in an email response
 won't do them justice.

 instead of new_data@slot, use slot(new_data, slot)

 "[" dispatches on four arguments, and likely the cases need to be
 handled differently (e.g., data[,"a"] vs. data[,TRUE] vs data[1,]). So
 you'll end up with methods

 setMethod("[", c("A", "missing", "character", "ANY"), ...
 setMethod("[", c("A", "missing", "logical", "ANY"), ...
 setMethod("[", c("A", "numeric", "missing", "ANY"), ...

 plus others, or you'll write something like

 setMethod("[", c("A", "missing", "ANY", "ANY"),
   function(x, i, j, ..., drop=TRUE)
 {
   if (is.character(j))
       j <- match(j, slotNames(x))
   j <- slotNames(x)[j]
   ...
 })

 ("ANY" is implicit, it's unlikely you'll ever dispatch on 'drop', so a
 signature for "[" often omits the fourth signature element). You'll aim
 for re-use, so likely the methods are all wrappers around some simple
 function .subset_A(x, i, j, drop) where i, j are the types that'll work.

 x@slot <- value and slot(x, slot) <- value make (at least) one copy of x
 each time they're invoked, so your code above is making multiple copies
 of the data. One strategy is not to define an  'initialize' method and
 gain the benefit of the default method as a kind of copy constructor,
 along the lines of

   initialize(x, a=slot(x, "a")[j], b=slot(x, "b")[j])

 if the subset were to be of slots a and b.
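
 The pieces above might be combined roughly as follows.  This is only a
 sketch under assumptions, not the code from this thread: it uses a single
 method with missing() checks instead of several signatures, treats the
 slots as equal-length columns, and returns a plain vector when one column
 is requested.

```r
library(methods)

setClass("A", representation(a = "numeric", b = "numeric"))

setMethod("[", "A", function(x, i, j, ..., drop = TRUE) {
  # No row index given: keep every row.
  if (missing(i)) i <- seq_along(slot(x, "a"))

  # No column index given: return a new A holding only the selected rows,
  # using the default initialize() as a kind of copy constructor.
  if (missing(j))
    return(initialize(x, a = slot(x, "a")[i], b = slot(x, "b")[i]))

  # Numeric or logical column indices are translated to slot names.
  if (is.numeric(j) || is.logical(j)) j <- slotNames(x)[j]

  # One column: a plain vector; several columns: a named list of vectors.
  if (length(j) == 1) return(slot(x, j)[i])
  lapply(setNames(j, j), function(nm) slot(x, nm)[i])
})

d <- new("A", a = 1:10, b = 11:20)
d[5, "a"]        # 5
rows <- d[2:3, ] # an object of class A with two rows
```

 Note the use of slot(x, nm) rather than x@nm when the slot name is held
 in a variable.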

 You said your objective was to write a kind of enhanced

[R] using setMethod or setGeneric to change S4 accessor symbol from @ to $

2010-02-08 Thread Markus Weisner
I created some S4 objects that are essentially data frame objects.  The S4
object definitions were necessary to verify data integrity and force a
standardized data format.  I am, however, finding myself redefining all the
typical generic functions so that I can still manipulate my S4 objects as if
they were data frames ... I have used setMethod to set methods for "subset",
"head", and "tail".  I would like to use setMethod or setGeneric to enable
me to use object$slotname to access object@slotname for my S4 objects.  Any
advice is appreciated.  Thanks.
--Markus



Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $

2010-02-08 Thread Markus Weisner
Worked like a charm!!  Thank you so much.  I just plugged the following into
my code ...

setMethod("$", "CADresponses", function(x, name) slot(x, name))

... and it worked perfectly.  If you don't mind, I have a quick follow-up
question, using your example

setClass("A", representation(a="numeric", b="numeric"))
setMethod("$", "A", function(x, name) slot(x, name))
data = new("A", a=1:10, b=1:10)
data$a[5] #now works thanks to your code
data$a[5] <- 200 #assignments do not work -- any ideas?
data[5,c("a")] = 200 #would also like this to work -- any ideas?

Do you have any suggestions for getting assignments and brackets to work as
they would for data frames?  Thanks so much for your help.
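
For the assignment case, one possibility is a replacement method for
"$<-", which R calls when evaluating data$a[5] <- 200.  The following is a
sketch under that assumption and not an answer taken from the thread; the
object is named d here to avoid masking data():

```r
library(methods)

setClass("A", representation(a = "numeric", b = "numeric"))

# Read access: d$a returns the slot named "a".
setMethod("$", "A", function(x, name) slot(x, name))

# Write access: d$a <- value replaces the slot and returns the object.
setMethod("$<-", "A", function(x, name, value) {
  slot(x, name) <- value
  x
})

d <- new("A", a = 1:10, b = 1:10)
d$a[5] <- 200  # evaluated roughly as d <- `$<-`(d, "a", `[<-`(d$a, 5, 200))
d$a[5]         # 200
```

Bracket assignment such as d[5, "a"] <- 200 would need a separate "[<-"
method along the same lines.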
Best,
Markus



On Mon, Feb 8, 2010 at 2:44 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 Hi Markus --

 > setClass("A", representation(a="numeric"))
 [1] "A"
 > new("A")$a
 Error in new("A")$a : $ operator not defined for this S4 class
 > getGeneric("$")
 standardGeneric for "$" defined from package "base"

 function (x, name)
 standardGeneric("$", .Primitive("$"))
 <environment: 0xa62028>
 Methods may be defined for arguments: x
 Use  showMethods("$")  for currently available ones.
 > setMethod("$", "A", function(x, name) slot(x, name))
 [1] "$"
 > new("A", a=1:10)$a
  [1]  1  2  3  4  5  6  7  8  9 10
 > new("A", a=1:10)$b
 Error in slot(x, name) : no slot of name "b" for this object of class "A"

 does that help?

 Martin


 --
 Martin Morgan
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793




Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $

2010-02-08 Thread Markus Weisner
Thanks.  Used getGeneric("[") to figure out the general format for the
setMethod, but am having some problem with how to set up the actual
function:

> getGeneric("[")
standardGeneric for "[" defined from package "base"

function (x, i, j, ..., drop = TRUE)
standardGeneric("[", .Primitive("["))
<environment: 0x116513c30>
Methods may be defined for arguments: x, i, j, drop
Use  showMethods("[")  for currently available ones.

Based on this, I set up the following code:

setClass("A", representation(a="numeric", b="numeric"))
data = new("A", a=1:10, b=1:10)
setMethod("[", "A",
    function(x, i, j, ..., drop) {
        slotnames <- slotNames(x)[j]
        new_data = new("A")
        for(slot in slotnames) new_data@slot = x@slot[i]
        new_data
    })
data[5,c("a")]

The problem is that I cannot access S4 object slots using @ and a character
variable.  I also cannot access a slot using the typical brackets since that
is what I am trying to define here.  Kind of stuck.  Thanks for any advice
you might have.
Best,
Markus
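
One possible way around the character-variable problem is a minimal sketch like the one below, which borrows slot() from the "$" method earlier in this thread. slot(x, nm) takes the slot name as a string, so it can stand in for @ inside the loop. The handling of a missing i or j and of numeric j is an assumption about the desired behaviour, not something settled in the thread.

```r
library(methods)

# Toy class from the thread: two numeric slots.
setClass("A", representation(a = "numeric", b = "numeric"))

# "[" method: i selects elements within each slot, j names the slots to keep.
# slot(x, nm) accepts the slot name as a character string, which `@` does not.
setMethod("[", "A", function(x, i, j, ..., drop = TRUE) {
  if (missing(j)) j <- slotNames(x)          # assumed default: all slots
  if (is.numeric(j)) j <- slotNames(x)[j]    # allow positions as well as names
  out <- new("A")
  for (nm in j) {
    vals <- slot(x, nm)
    slot(out, nm) <- if (missing(i)) vals else vals[i]
  }
  out
})

data <- new("A", a = 1:10, b = 11:20)
res <- data[5, "a"]
res@a   # 5
res@b   # numeric(0): slot b was not requested
```

Slots that are not requested keep the empty prototype value, which is why res@b comes back with length zero.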

On Mon, Feb 8, 2010 at 4:54 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 02/08/2010 01:22 PM, Markus Weisner wrote:
  Worked like a charm!!  Thank you so much.  I just plugged the following into
  my code ...

  setMethod("$", "CADresponses", function(x, name) slot(x, name))

  ... and it worked perfect.  If you don't mind, I have a quick follow up
  question, using your example

  setClass("A", representation(a="numeric", b="numeric"))
  setMethod("$", "A", function(x, name) slot(x, name))
  data = new("A", a=1:10, b=1:10)
  data$a[5] #now works thanks to your code
  data$a[5] <- 200 #assignments do not work -- any ideas?

 same idea, but for "$<-"

 > setClass("A", representation(a="numeric"))
 [1] "A"
 > getGeneric("$<-")
 standardGeneric for "$<-" defined from package "base"

 function (x, name, value)
 standardGeneric("$<-", .Primitive("$<-"))
 <environment: 0x14c33a8>
 Methods may be defined for arguments: x, value
 Use  showMethods("$<-")  for currently available ones.
 > setReplaceMethod("$", "A", function(x, name, value) {
 +     slot(x, name) <- value
 +     x
 + })
 [1] "$<-"
 > a <- new("A", a=1:10)
 > a$a <- 10:1
 > a
 An object of class "A"
 Slot "a":
  [1] 10  9  8  7  6  5  4  3  2  1

  data[5,c("a")] = 200 #would also like this to work -- any ideas?

  Do you have any suggestions for getting assignments and brackets to work as
  they would for data frames?  Thanks so much for your help.

 same approach, but using getGeneric("[") and getGeneric("[<-") to guide
 you.

 Martin
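
For the bracket assignment, a sketch along the same lines might look like the following. It assumes the formals reported by getGeneric("[<-"), namely (x, i, j, ..., value), and assumes j is given as slot names. The validObject() call is an added safeguard, not something from the thread.

```r
library(methods)

setClass("A", representation(a = "numeric", b = "numeric"))

# Replacement method for data[i, j] <- value; j holds the slot name(s).
setReplaceMethod("[", "A", function(x, i, j, ..., value) {
  if (missing(j)) j <- slotNames(x)   # assumed default: all slots
  for (nm in j) {
    vals <- slot(x, nm)
    if (missing(i)) vals[] <- value else vals[i] <- value
    slot(x, nm) <- vals
  }
  validObject(x)   # re-check the object after modification
  x
})

data <- new("A", a = 1:10, b = 1:10)
data[5, "a"] <- 200
data@a[5]   # 200
data@b[5]   # 5, untouched
```

As with "$<-", the method has to return the modified object x, otherwise the assignment has no visible effect.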

  Best,
  Markus
 
 
 
 
 


 --
 Martin Morgan
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793




[R] documenting methods for S4 objects

2010-01-06 Thread Markus Weisner
I put out an email last night with output from my package check.  Many
thanks to Uwe and Liviu who got back to me quickly about how to fix some of
the errors.  I realize though, that I had some additional questions /
confusion points in terms of documenting packages.

1)  I seem to understand how to document a straightforward function, but am
having trouble figuring out how to document methods functions when creating
new S4 objects.  For instance, I am trying to document the following
function:

setMethod("tail", "NFIRS", function(x, n=5) tail(as.data.frame(x), n=n))

So far, I have the following for my .Rd file (built automatically by
package.skeleton)

\name{tail-methods}
\docType{methods}
\alias{tail-methods}
\alias{tail,ANY-method}
\alias{tail,NFIRS-method}
\title{ ~~ Methods for Function tail  ~~}
\description{
 ~~ Methods for function \code{tail}  ~~
}
\section{Methods}{
\describe{

\item{x = ANY}{ ~~describe this method here }

\item{x = NFIRS}{ ~~describe this method here }
}}
\keyword{methods}
\keyword{ ~~ other possible keyword(s)}

Since the tail method is already defined in R and my function works
exactly the same way, do I still need to make a man file for this function?
Do I just copy the information from the R tail function (i.e. Title: "Return
the First or Last Part of an Object")?  What do I put under describe?  The
items don't really make sense to me, since my tail function requires two
things: 1) an object of type NFIRS and 2) n, the number of rows to display.
I don't understand what x = ANY and x = NFIRS mean.  Do I need to add any
keywords, or can I just erase the "other possible keywords" line?

2)  Do all functions in a package need to be documented even if the user
will never use them?  I have lots of little support functions that the user
should never really use.  Just wondering if all packages submitted to CRAN
are required to have every last function documented.
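
On the x = ANY and x = NFIRS entries in question 1: as far as I can tell, each item names a method signature, that is, the class of the argument x for which that method is used. The sketch below illustrates this with a hypothetical stand-in for the NFIRS class (a single data.frame slot called data, which is an assumption and not the real class definition).

```r
library(methods)
library(utils)

# Hypothetical stand-in for the NFIRS class: one data.frame slot.
setClass("NFIRS", representation(data = "data.frame"))

# The signature "NFIRS" says: use this method when argument x is an NFIRS object.
setMethod("tail", "NFIRS", function(x, ...) tail(x@data, ...))

obj <- new("NFIRS", data = data.frame(id = 1:10))

existsMethod("tail", "NFIRS")  # TRUE: a method is registered for x = "NFIRS"
last3 <- tail(obj, n = 3)      # dispatches on the class of x, here "NFIRS"
tail(1:10, n = 3)              # other classes fall through to the default (x = "ANY")
```

So the two items in the generated Rd file would describe, respectively, the default behaviour for any object and the behaviour for NFIRS objects. The n argument is not part of the signature, which is why it does not appear there.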


Thanks for your help.
--Markus



[R] debugging package

2010-01-05 Thread Markus Weisner
I am trying to debug a package to submit it to CRAN and am getting a bunch
of error messages.  Most of the errors are because of the Rd files which
were automatically populated by the package.skeleton function.  I find the
section on documentation in the Writing R Extensions manual pretty confusing.
Any help on getting these errors fixed would be hugely appreciated.  Thanks.
--Markus

* checking for working pdflatex ... OK
* using log directory '/Users/markus/Dropbox/NFIRS_S4/NFIRS.Rcheck'
* using R version 2.9.2 Patched (2009-09-24 r50179)
* using session charset: UTF-8
* checking for file 'NFIRS/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'NFIRS' version '1.0'
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking for executable files ... OK
* checking whether package 'NFIRS' can be installed ... WARNING
Found the following significant warnings:
   missing link(s):  ~~fun~~ CLASSNAME-class
See '/Users/markus/Dropbox/NFIRS_S4/NFIRS.Rcheck/00install.out' for details.
* checking package directory ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... ERROR
Rd files with likely Rd problems:
Unaccounted top-level text in file 'NFIRS-class.Rd':
Following section 'note':
\n\n ~Make other sections like Warning with \\section{Warning }{}
~\n\n

Unaccounted top-level text in file 'NFIRS-package.Rd':
Following section 'references':
\n~~ Optionally other standard keywords, one per line, from file KEYWORDS
in the R documentation directory ~~\n

Rd files with missing or empty '\title':
  NFIRS.summary.Rd
  read.NFIRS.Rd

Rd files without 'description':
  NFIRS.summary.Rd
  read.NFIRS.Rd
Rd files without 'title':
  NFIRS.summary.Rd
  read.NFIRS.Rd
These entries are required in an Rd file.

Rd files with non-standard keywords:
  as.data.frame-methods.Rd: ~~ other possible keyword(s)
  head-methods.Rd: ~~ other possible keyword(s)
  NFIRS.summary.Rd: ~kwd1 ~kwd2
  read.NFIRS.Rd: ~kwd1 ~kwd2
  summary-methods.Rd: ~~ other possible keyword(s)
  tail-methods.Rd: ~~ other possible keyword(s)
Each '\keyword' entry should specify one of the standard keywords (as
listed in file 'KEYWORDS' in the R documentation directory).

Rd files with duplicated alias 'as.data.frame,NFIRS-method':
  as.data.frame-methods.Rd NFIRS-class.Rd
Rd files with duplicated alias 'head,NFIRS-method':
  head-methods.Rd NFIRS-class.Rd
Rd files with duplicated alias 'summary,NFIRS-method':
  NFIRS-class.Rd summary-methods.Rd
Rd files with duplicated alias 'tail,NFIRS-method':
  NFIRS-class.Rd tail-methods.Rd

See the chapter 'Writing R documentation files' in manual 'Writing R
Extensions'.
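
Most of the messages above point at leftover package.skeleton placeholders (the ~~ ... ~~ text and the ~kwd1 ~kwd2 keywords) and at missing \title and \description entries. As a rough sketch, a single Rd file can be checked on its own with checkRd() from the tools package in current versions of R. The title, description, usage and argument text below are hypothetical placeholders, not the package's real documentation.

```r
library(tools)

# Hypothetical minimal content for read.NFIRS.Rd; every descriptive string
# here is a placeholder chosen for illustration.
rd_lines <- c(
  "\\name{read.NFIRS}",
  "\\alias{read.NFIRS}",
  "\\title{Read an NFIRS Data File}",
  "\\description{Reads an NFIRS data file into an NFIRS object.}",
  "\\usage{read.NFIRS(file)}",
  "\\arguments{",
  "  \\item{file}{path to the input file}",
  "}",
  "\\keyword{IO}"
)

rd_file <- file.path(tempdir(), "read.NFIRS.Rd")
writeLines(rd_lines, rd_file)

# checkRd() parses the file and collects any problems it finds.
problems <- checkRd(rd_file)
print(problems)
```

The file has a non-empty \title and \description, uses a keyword from the standard KEYWORDS list (IO), and contains no stray top-level text, which are the three kinds of complaint in the log above.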



[R] package license questions

2010-01-03 Thread Markus Weisner
I am looking for some advice on licenses.  Here is my situation:

Over the last couple years, I have developed a rather large number of fire
department analysis functions.  I am in the process of trying to publish
some packages to make these functions available to the public.  I am trying
to release two packages that essentially define S4 classes for common types
of fire department data.  Then, I would like to publish a package that
essentially reads in these fire department data types and returns analysis
results.  My concern is that I may eventually want to build and sell some
proprietary functions and I am trying not to box myself out of this future
option.  It is my understanding that if I use the GPL license, all work
based on my packages would have to take on the GPL license (effectively
making it impossible to sell software).  It looks like the Lesser General
Public License (LGPL) may suit my needs by allowing me to make public my
current work without eliminating the possibility of future proprietary
work.  I have a couple questions I am hoping somebody can answer.


   - The LGPL says that libraries licensed under it can be used by
   proprietary software.  What is meant by libraries?  Are class definitions
   and functions considered libraries?
   - If I use the LGPL for all my packages, do I maintain the right to build
   and sell software that is based on these LGPL packages?  For instance, could
   I use the class definitions from an LGPL package as inputs for analysis in a
   piece of proprietary software?
   - Other than potentially allowing competitors to also use my LGPL
   packages in their proprietary software, are there any big disadvantages to
   using the LGPL?
   - If somebody improves on my LGPL S4 class definitions, can I still then
   use them in a proprietary package despite their being modified?

I am a big supporter of the open source community and have personally
benefited greatly from open source software.  My intention is to release
my work as open source, but I don't want to be boxed out of future
proprietary developments.  These licenses can be pretty confusing, so I
appreciate any information that can help me figure this out.

Thanks,
Markus



[R] projecting GIS coordinates for analysis with spatstat package

2009-03-01 Thread Markus Weisner
I am working on creating an R package for doing fire department analysis and
am trying to create a function that can display emergency incident
densities.  The following code sort of does the trick, but I need a display
that shows the number of incidents per square mile.  I believe the code
below shows incidents per square unit (in this case, degrees lat/long).

To solve this problem, I believe that I need to convert the coordinates
(currently WGS84) to some projection that is based on miles rather than
degrees lat/long.  Does anybody know the code for projecting coordinates so
that my density plot will show incidents per sq-mile?

If there is a simpler way of displaying incident densities than using the
spatstat package, please let me know.

Thanks,
Markus


#create data
data = data.frame(xcoord=c(-123.1231, -123.0245, -123.1042, -123.1555,
-123.1243, -123.0984, -123.1050, -123.0909, -123.1292, -123.0973, -123.0987,
-123.1016, -123.2355, -123.1005, -123.1130, -123.1308, -123.1281, -123.1281,
-123.1275, -123.1269, -123.1595, -123.1202, -123.1756, -123.0791, -123.0791,
-123.0969, -123.0969, -123.0905, -123.0718, -123.0969, -123.1337, -123.1531,
-123.1362, -123.1550, -123.0725, -123.1249, -123.1249, -123.1249, -123.1249,
-123.1249, -123.1777, -123.1237, -123.1912, -123.0256, -123.1347, -123.1246,
-123.1931, -123.0971, -123.0281, -123.0928), ycoord=c(49.27919, 49.23780,
49.24881, 49.27259, 49.26057, 49.25654, 49.25000, 49.28119, 49.27908,
49.28442, 49.28318, 49.27293, 49.25805, 49.28137, 49.22528, 49.26066,
49.27841, 49.27841, 49.28019, 49.27414, 49.24220, 49.27744, 49.23474,
49.28229, 49.28229, 49.27671, 49.27671, 49.25974, 49.26510, 49.27671,
49.29036, 49.26100, 49.27989, 49.26103, 49.27216, 49.27548, 49.27548,
49.27548, 49.27548, 49.27548, 49.23475, 49.27759, 49.24524, 49.26271,
49.20531, 49.26337, 49.23862, 49.28447, 49.20871, 49.28306),
itype=c("Emergency Medical Service", "Rescue", "Service Call",
"Alarm Activation", "Hazardous Condition", "Motor Vehicle Accident",
"Emergency Medical Service", "Emergency Medical Service", "Fire",
"Alarm Activation", "Emergency Medical Service", "Motor Vehicle Accident",
"Emergency Medical Service", "Emergency Medical Service",
"Emergency Medical Service", "Alarm Activation", "Alarm Activation",
"Alarm Activation", "Emergency Medical Service", "Emergency Medical Service",
"Emergency Medical Service", "Alarm Activation", "Emergency Medical Service",
"Hazardous Condition", "Hazardous Condition", "Motor Vehicle Accident",
"Motor Vehicle Accident", "Motor Vehicle Accident", "Alarm Activation",
"Motor Vehicle Accident", "Emergency Medical Service",
"Motor Vehicle Accident", "Alarm Activation", "Emergency Medical Service",
"Emergency Medical Service", "Fire", "Fire", "Fire", "Fire", "Fire",
"Motor Vehicle Accident", "Emergency Medical Service",
"Emergency Medical Service", "Motor Vehicle Accident", "Alarm Activation",
"Emergency Medical Service", "Alarm Activation", "Fire",
"Emergency Medical Service", "Emergency Medical Service"))

#add necessary libraries
library(sp)
library(maptools)
library(spatstat)
library(RColorBrewer)

#add coordinates to data
coordinates(data) = c("xcoord", "ycoord")

#convert coordinates to spatstat point pattern dataset
ppp_data = as(data["itype"], "ppp")

#determine density of point pattern
density_data = density.ppp(ppp_data)

#plot density
plot(density_data, col=brewer.pal(9, "Reds"))
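
For the degrees-to-miles conversion asked about above, one rough, base-R-only sketch is a local equirectangular approximation. It is not a true map projection: it assumes a city-sized study area and a spherical Earth, and the helper name lonlat_to_miles and the radius constant are choices made for this illustration.

```r
# Approximate local conversion: degrees -> miles relative to the data's centre.
# Assumes a small study area (city scale) and a spherical Earth.
lonlat_to_miles <- function(lon, lat,
                            lon0 = mean(lon), lat0 = mean(lat)) {
  earth_radius_miles <- 3958.8            # mean Earth radius, approximate
  rad <- pi / 180
  x <- earth_radius_miles * (lon - lon0) * rad * cos(lat0 * rad)  # east-west
  y <- earth_radius_miles * (lat - lat0) * rad                    # north-south
  data.frame(x = x, y = y)
}

# First three incidents from the sample data above.
pts <- lonlat_to_miles(lon = c(-123.1231, -123.0245, -123.1042),
                       lat = c(49.27919, 49.23780, 49.24881))
pts
```

With x and y in miles, a point pattern built from them (for example with spatstat's ppp()) should give densities in incidents per square mile. A proper projection to a planar coordinate system, for instance via sp::spTransform or sf::st_transform, would be more accurate over larger areas.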

-- 
Markus Weisner, Firefighter
Charlottesville Fire Department
203 Ridge Street
Charlottesville, Virginia 22901
(434) 970-3240
