Re: [R] using match-type function to return correctly ordered data from a dataframe
Hi Jeff.

I believe my Function #1 actually does use %in% to select the data. I use %in% all the time but, as far as I can tell, it can only return a vector of logical values. As a result, it keeps the order of the dataframe from which you are selecting data. It does not, however, appear that you can return the data in the order of the values you specified.

To clarify my order assertion, take for example a dataframe that has a column LETTER with a record for each letter of the alphabet. The dataframe is ordered so that A is record 1 and Z is record 26. Say that I want to pull records from this dataframe based on a list of letters, and I want the records returned in the order of the letters I passed. I could use something like the following code to pull records ...

    myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]

If I pass it the list myPassedListOfLetters <- c("C", "B", "A"), I will receive the data back in the order A, B, C. What I am trying to figure out is how to get the data back in the order of the list that I specified (C, B, A).

Hope that clarifies what I am trying to figure out a bit. Thanks for your help!

Best,
Markus

On Fri, Oct 26, 2012 at 11:00 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:
> Have you actually read ?%in% ? Although a valuable tool, not all answers are most effectively obtained by Googling.
>
> Also, your repeated assertions that the answers are not maintained in order are poorly framed. They DO stay in order according to the zipcode database order. That said, your desire for numeric indexes is only as far away as your help file.
>
> Jeff Newmiller
> Sent from my phone. Please excuse my brevity.
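For reference, the difference described above can be sketched with base R's match(), which returns row positions rather than a logical mask. This is a minimal illustration on a toy data frame (myDataFrame and its POSITION column are made up here), not the zipcode data from the thread:

```r
# Toy data frame: one record per letter, stored in A..Z order.
myDataFrame <- data.frame(LETTER = LETTERS, POSITION = 1:26,
                          stringsAsFactors = FALSE)
myPassedListOfLetters <- c("C", "B", "A")

# %in% gives a logical mask over the data frame, so rows come back in table order.
by_table_order <- myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]

# match() gives, for each requested letter, the index of its first matching row,
# so indexing with it returns rows in the order of the request.
idx <- match(myPassedListOfLetters, myDataFrame$LETTER)
by_request_order <- myDataFrame[idx, ]
```

Note that match() only returns the first matching row per requested value, so this approach assumes the lookup column has no duplicates.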
[R] using match-type function to return correctly ordered data from a dataframe
I am regularly running into a problem where I can't seem to figure out how to maintain correct data order when selecting data out of a dataframe. The code below shows an example of trying to pull data from a dataframe using ordered zip codes. My problem is returning the pulled data in the correct order. This is a very simple example, but it illustrates a regular problem that I am running into.

In the past, I have used fairly complicated solutions to pull this off. There has got to be a simpler and more straightforward method ... probably some function that I missed in all my googling.

Thanks in advance for anybody's help figuring this out.

~Markus

    ### Function Definitions ###

    # FUNCTION #1 (returns wrong order)
    getLatitude1 = function(myzips) {
      # load libraries and data
      library(zipcode)
      data(zipcode)
      # get latitude values
      mylats = zipcode[zipcode$zip %in% myzips, "latitude"]  # problem is that this code does not maintain order
      # return data
      return(mylats)
    }

    # FUNCTION #2 (also returns wrong order)
    getLatitude2 = function(myzips) {
      # load libraries and data
      library(zipcode)
      data(zipcode)
      # convert myzips to DF
      myzips = as.data.frame(as.character(myzips))
      # merge in zipcode data based on zip
      results = merge(myzips, zipcode[, c("zip", "latitude")],
                      by.x = "as.character(myzips)", by.y = "zip", all.x = TRUE)
      # return data
      return(results$latitude)
    }

    ### Code ###

    # specify a set of zip codes
    myzips = c("74432", "72537", "06026", "01085", "65793")

    # create a DF
    myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)

    # look at data to determine what should be returned and in what order
    library(zipcode)
    data(zipcode)
    zipcode[zipcode$zip %in% myzips,]

    # test function #1 (function definition above)
    myzips.df$latitude = getLatitude1(myzips.df$zip)  # returns wrong order

    # test function #2 (function definition above)
    myzips.df$latitude = getLatitude2(myzips.df$zip)  # also returns wrong order

    # need myzips %in% zipcode$zip to return array/df indices rather than logical
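One way to get indices rather than a logical vector is match(). The sketch below is only an illustration: zip_table is a hypothetical stand-in for the zipcode data (the latitudes are made-up values), getLatitude is a name introduced here, and it assumes zip codes are stored as character strings so that leading zeros survive.

```r
# Hypothetical stand-in for the zipcode table; latitudes are made-up values.
zip_table <- data.frame(
  zip      = c("01085", "06026", "65793", "72537", "74432"),
  latitude = c(42.1, 41.9, 36.9, 36.4, 35.3),
  stringsAsFactors = FALSE
)

getLatitude <- function(myzips, lookup = zip_table) {
  # match() returns one lookup row index per requested zip, in request order;
  # zips that are not in the lookup table yield NA.
  lookup$latitude[match(as.character(myzips), lookup$zip)]
}

myzips <- c("74432", "72537", "06026", "01085", "65793", "99999")
lats <- getLatitude(myzips)
```

Because the result has exactly one element per requested zip, it can be assigned straight into a column such as myzips.df$latitude without any reordering step.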
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] need advice on using excel to check data for import into R
I have created an S4 object type for conducting fire department data analysis. The object includes a validity check that ensures certain fields are present and that duplicate records don't exist for certain combinations of columns (e.g. no duplicate incident number / incident data / unit ID, which ensures that the data does not show the same fire engine responding twice on the same call).

I am finding that I spend a lot of time taking client data, converting it to my S4 object, and then sending it back to the client to correct data validity issues. I am trying to figure out a clever way to have Excel (typically the program used by my clients) check client data before they submit it to me. I have been working with somebody on trying to develop an Excel toolbar add-in, with limited success.

My question is whether anybody can think of clever alternatives for clients to validate their data. For example, is there an R Excel plugin (that would be easily installed by a client) where I might be able to write some lines of R to check the data and output messages? Or maybe some sort of server where they could upload their data, and I could have some lines of R code that would check the data and send back potential error messages?

I realize this is a fairly open-ended question; I am just looking for some general ideas and directions to go. I am getting a little frustrated with spending most of my work time dealing with data cleaning issues, and I am guessing this is a problem shared by many of us who use R!

Thanks,
Markus
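Whatever the delivery mechanism ends up being (add-in, upload server, or a script), the checks themselves can be kept in a small standalone function that returns readable messages instead of stopping at the first error. The sketch below is only an illustration: check_client_data and the column names incident_number, incident_date and unit_id are hypothetical and would need to be replaced by the real schema.

```r
# Hypothetical required columns and duplicate key; adjust to the real schema.
required_cols <- c("incident_number", "incident_date", "unit_id")

check_client_data <- function(df, required = required_cols, key = required_cols) {
  problems <- character(0)

  # Report every required column that is absent.
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("Missing column:", missing_cols))
  }

  # Report rows that repeat an earlier key combination
  # (only checked when all key columns are present).
  if (all(key %in% names(df))) {
    dup <- duplicated(df[, key, drop = FALSE])
    if (any(dup)) {
      problems <- c(problems, paste("Duplicate key in row", which(dup)))
    }
  }
  problems
}

# Toy client data: row 2 repeats the key of row 1.
clients <- data.frame(
  incident_number = c("1", "1", "2"),
  incident_date   = c("2012-04-01", "2012-04-01", "2012-04-02"),
  unit_id         = c("E1", "E1", "E2"),
  stringsAsFactors = FALSE
)
messages <- check_client_data(clients)
```

An empty result means the data passed these particular checks; the messages could then be written back to the client in whatever form is convenient.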
Re: [R] need advice on using excel to check data for import into R
If I go to the wiki page on how to install, it looks like a rather complicated installation that involves installing R followed by several command line prompts. It looks like it might be too much of an installation process to make sense for a client conducting a one-time data check. Looks like a great tool though. Is there a simpler way of deploying RExcel that I am not seeing?

Thanks,
Markus

On Sun, Apr 22, 2012 at 3:43 PM, Richard M. Heiberger r...@temple.edu wrote:
> This looks like a perfect case for an RExcel solution. RExcel is an add-in that allows you, among other things, to place an arbitrary R function inside the Excel automatic recalculation mode.
>
> For details see rcom.univie.ac.at
> There are many reference items listed on the wiki page in the left panel.
>
> For further followup, please sign up for the rcom mailing list, again with the details on the web site.
>
> Rich
[R] how to match exact phrase using gsub (or similar function)
I am trying to switch out addresses that have double directions, such as in the following example:

    a = "S S Main St Interstate 95"
    a = gsub(pattern="S S ", replacement="S ", a)

The problem is that I don't want to affect instances where this might be a correct address, such as the following:

    3421 BIGS St

What I want to say is: switch out only in one of the following situations:

    "[beginning of char]S S "
    " S S "
    " S S[end of char]"

Is there any way of making gsub or a similar function make the replacements I want?

Thanks in advance for your help.

~Markus
Re: [R] how to match exact phrase using gsub (or similar function)
Thanks Justin and Bill. That did the trick!!

~Markus

On Wed, Mar 28, 2012 at 4:45 PM, Justin Haynes jto...@gmail.com wrote:
> wow! and here I thought I was starting to know most things about regexes...

On Wed, Mar 28, 2012 at 1:34 PM, William Dunlap wdun...@tibco.com wrote:
> You can use the \< and \> patterns (backslashing the backslashes) to mean start and end of word, respectively. E.g.,
>
>     > addresses <- c("S S Main St Interstate 95", "3421 BIGS St")
>     > gsub("\\<S S\\>", "S", addresses)
>     [1] "S Main St Interstate 95" "3421 BIGS St"
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com

On Wednesday, March 28, 2012 1:24 PM, Justin Haynes wrote:
> In most regexes the caret ( ^ ) signifies the start of a line and the dollar sign ( $ ) signifies the end.
>
>     gsub('^S S', 'S', a)
>     gsub('^S S', 'S', '3421 BIGS St')
>
> You can use logical "or" inside your pattern too:
>
>     gsub('^S S|S S$| S S ', 'S', a)
>
> The ' S S ' condition is difficult.
>
>     gsub('^S S|S S$| S S ', 'S', 'foo S S bar')
>
> gives the wrong output, as does:
>
>     gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
>     gsub('^S S | S S$| S S ', ' S ', a)
>
> so you might have to catch that with a second gsub:
>
>     gsub(' S S ', ' S ', 'foo S S bar')
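The word-boundary idea from the replies can also be written with \b, which matches at either edge of a word. The snippet below is a small sketch on made-up addresses; the second pattern, which collapses any doubled N/S/E/W, is an assumed generalisation that was not part of the thread.

```r
addresses <- c("S S Main St Interstate 95", "3421 BIGS St",
               "foo S S bar", "Main St S S", "N N Elm St")

# \\b matches at a word boundary, so the "S St" inside "BIGS St" is left alone.
fixed_s <- gsub("\\bS S\\b", "S", addresses)

# Assumed generalisation: collapse any doubled single-letter direction,
# using a backreference (\\1) to require the same letter twice.
fixed_any <- gsub("\\b([NSEW]) \\1\\b", "\\1", addresses, perl = TRUE)
```

A single pattern covers the start-of-string, middle, and end-of-string cases, so no second gsub pass is needed for these inputs.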
[R] stumped on how to reorder factors
I am trying to reorder a factor data type so that when I plot stats associated with the factor, the ordering makes sense. For instance, if I have a factor entered as follows ...

    A = as.factor(c("1", "10", "3", "3", "10", "10"))
    levels(A)

... the ordering does not really make sense (assuming I want the factor ordered by integer value), but I understand that this mis-ordering is because the ordering is based on a character string data type and not on an integer data type. Because I run into this problem frequently, I wrote a small function to fix this:

    reorder_factor = function(x, x_sum, decreasing=FALSE){
      factor(as.character(x), levels=levels(x)[order(x_sum, decreasing=decreasing)])
    }

I can then run the following code to fix the problem:

    A = reorder_factor(x=A, x_sum=as.numeric(levels(A)), decreasing=FALSE)
    levels(A)

... and now I have correctly ordered integers. Perhaps not the most elegant solution, but it worked for my purposes.

Now I have a more complicated problem and I need help. Assuming the following factor:

    B = as.factor(c("Engine 1", "Engine 10", "Ladder 3", "Engine 3", "Ladder 10", "Engine 10"))
    levels(B)

I would like the factor ordered first by the preceding unit type and then by the integer that follows. In this case, I would like to see this order: Engine 1, Engine 3, Engine 10, Ladder 3, Ladder 10.

I have tried many different ways of separating out the unit type from the number, but am having trouble figuring out a good way of achieving this factor order. For such a small example, I could obviously change the order manually, but I am dealing with much larger datasets with many unit types and up to 20 different numbers for each unit type. Having an automated way of ordering these units would be a huge help.

Thanks in advance for any help you can provide.

--Markus Weisner
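One possible approach, sketched here under the assumption that every level has the form "<type> <integer>" with a single trailing number: split each level into its text part and its numeric part with sub(), then pass both to order() so that the type sorts first and the number breaks ties.

```r
units <- c("Engine 1", "Engine 10", "Ladder 3", "Engine 3", "Ladder 10", "Engine 10")
B <- factor(units)

lev <- levels(B)
# Text before the trailing number, e.g. "Engine".
unit_type <- sub(" [0-9]+$", "", lev)
# Trailing number as an integer, e.g. 10.
unit_num <- as.integer(sub("^.* ", "", lev))

# Re-level: sort by type first, then numerically within each type.
B <- factor(B, levels = lev[order(unit_type, unit_num)])
levels(B)
```

Only the level order changes; the values stored in the factor stay the same. Levels that do not follow the assumed pattern would produce NA for unit_num and would need separate handling.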
[R] dataframe selection using a multi-value key
I am merging two dataframes using a relational key (incident number and incident year), but not all the records match up. I want to be able to review only the records that cannot be merged for each individual dataframe (essentially trying to select records from one dataframe using a multi-value relational key from the other dataframe). The following code shows what I am trying to do. The final two lines of code do not work, but if somebody could figure out a workable solution, that would be great.

Thanks.

--Markus

    incidents = data.frame(
      INC_NO = c(1,2,3,4,5,6,7,8,9,10),
      INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, 2010),
      INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS", "FIRE", "EMS", "EMS"))

    responses = data.frame(
      INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10),
      INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008, 2018, 2018, 2018, 2009, 2010),
      UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1", "E3", "E7", "T1", "E7", "E5"))

    merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"))

    relational_key = c("INC_NO", "INC_YEAR")

    ## following does not work, but I want DF of incidents that did not merge up with responses
    incidents[incidents[,relational_key] %in% responses[,relational_key],]

    ## following does not work, but I want DF of responses that did not merge up with incidents
    responses[responses[,relational_key] %in% incidents[,relational_key],]
Re: [R] dataframe selection using a multi-value key
Hi Erik and Jim. Both solutions did the trick. Thank you!!

--Markus

On Tue, Sep 7, 2010 at 9:05 PM, jim holtman jholt...@gmail.com wrote:
> try this:
>
>     > merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"), all=TRUE)
>     > # responses that don't match
>     > subset(merged_data, is.na(INC_TYPE), select=c(INC_NO, INC_YEAR, UNIT_TYPE))
>        INC_NO INC_YEAR UNIT_TYPE
>     11      8     2018        E3
>     12      8     2018        E7
>     13      8     2018        T1
>     > # incidents that don't match
>     > subset(merged_data, is.na(UNIT_TYPE), select=c(INC_NO, INC_YEAR, INC_TYPE))
>        INC_NO INC_YEAR INC_TYPE
>     10      8     2008     FIRE
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
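A different route to the same result, shown here only as a sketch, is to collapse the key columns of each row into a single string and then use a negated %in%. The helper row_key is a name introduced for this illustration, and the "|" separator assumes that character never occurs in the key values.

```r
incidents <- data.frame(
  INC_NO   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, 2010),
  INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS", "FIRE", "EMS", "EMS"))

responses <- data.frame(
  INC_NO    = c(1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10),
  INC_YEAR  = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008,
                2018, 2018, 2018, 2009, 2010),
  UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1",
                "E3", "E7", "T1", "E7", "E5"))

relational_key <- c("INC_NO", "INC_YEAR")

# Hypothetical helper: paste the key columns of each row into one string.
row_key <- function(df, cols) {
  do.call(paste, c(unname(as.list(df[cols])), sep = "|"))
}

# Rows whose composite key has no counterpart in the other data frame.
unmatched_incidents <- incidents[
  !(row_key(incidents, relational_key) %in% row_key(responses, relational_key)), ]
unmatched_responses <- responses[
  !(row_key(responses, relational_key) %in% row_key(incidents, relational_key)), ]
```

This keeps each result in the shape of its original data frame, which can be convenient when the unmatched rows need to be sent back for correction.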
[R] to extend data.frame or not ... that is the question
    ("EMS", "FIRE", "FIRE", "EMS", "FIRE", "FIRE"),
    unit=c("E1", "E5", "T1", "E3", "E1", "T1"),
    response_time=c(300,400,350,250,500,200))

    data = as.CAD(DF)

    ### test methods on CAD example
    head(data)
    tail(data)
    subset(data, data$unit %in% c("E5", "T1"))
    as.data.frame(data[2, c("incident_num", "unit")])
    ###

Markus Weisner, Firefighter Medic and GIS Analyst
Charlottesville Fire Department
203 Ridge Street
Charlottesville, Virginia 22901
(434) 970-3240
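As one possible answer to the "extend or not" question, a lightweight option is an S3 subclass of data.frame with a validating constructor. The sketch below is an assumption-laden illustration, not the CAD class from this post: new_cad, cad_schema, the class name cad_df, the column name incident_type, and the incident_num values are all hypothetical.

```r
# Assumed schema: column name -> predicate that the column must satisfy.
cad_schema <- list(
  incident_num  = is.numeric,
  incident_type = is.factor,
  unit          = is.factor,
  response_time = is.numeric
)

# Hypothetical constructor: validate, then tag the data frame with an extra class.
new_cad <- function(df, schema = cad_schema) {
  stopifnot(is.data.frame(df))
  missing_cols <- setdiff(names(schema), names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  for (col in names(schema)) {
    if (!schema[[col]](df[[col]])) {
      stop("column '", col, "' has the wrong type")
    }
  }
  class(df) <- c("cad_df", class(df))
  df
}

DF <- data.frame(
  incident_num  = c(1, 1, 2, 3, 3, 4),   # made-up values
  incident_type = factor(c("EMS", "FIRE", "FIRE", "EMS", "FIRE", "FIRE")),
  unit          = factor(c("E1", "E5", "T1", "E3", "E1", "T1")),
  response_time = c(300, 400, 350, 250, 500, 200)
)
cad <- new_cad(DF)

# A character 'unit' column is rejected by the constructor.
bad <- DF
bad$unit <- as.character(bad$unit)
res <- try(new_cad(bad), silent = TRUE)
```

Because the object is still a data.frame, head(), tail(), subset() and bracket indexing dispatch to the existing data.frame methods, so none of them has to be rewritten. The trade-off is that validation runs only in the constructor, so later modifications are not re-checked unless the constructor is called again.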
Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $
Thanks so much for your help. I am realizing that I may be over-complicating things for myself. I have learned a ton about creating methods, but I feel like I am trying to reinvent the data.frame class.

Basically, I am trying to create a data.frame type object where I can enforce the header names and column data types. I am trying to force the user to set up the following fields:

- event_number (character)
- agency (factor)
- unit_num (factor)
- alarm (POSIXct)
- priority (factor)

A user might use the following code:

    event_number = c(1:5)
    agency = c("CFD", rep("ACFR", 3), "CFD")
    unit_num = c("E1", "T10", "E3", "E2", "BC1")
    temp = c("00:52:35", "06:58:18", "13:42:18", "20:59:45", "21:19:00")
    alarm = as.POSIXct(strptime(temp, format="%H:%M:%S"))
    priority = c("A", "E", "A", "C", "C")
    data = data.frame(event_number=event_number, agency=agency, unit_number=unit_num, alarm=alarm, priority=priority)

I have all sorts of functions that I am trying to incorporate into a package for analyzing fire department data, but I keep having problems with small deviations in data format causing errors. In this example, the following might cause issues in my functions:

- event_number should be of type character
- agency, unit_number, and priority should be of type factor
- unit_number should actually have the name unit_num

Ideally, I would be able to extend either the actual data.frame class or something similar, so that users are forced to create correctly formatted data.frames ... something that would produce error messages until the user uses the following code:

    data = data.frame(event_number=as.character(event_number), agency=as.factor(agency), unit_num=as.factor(unit_num), alarm=alarm, priority=as.factor(priority))

After a user has created a correctly formatted object, the user may need to manipulate the data prior to applying the analysis functions. For instance, a user might just want to analyze data for Engine #1 (unit_num == "E1"). Because of the need to manipulate data, I am trying to maintain all the same functionality as a data frame ... subset(), head(), [i,j], et cetera.

I am just wondering if you think creating a new S4 class is the way to go. So far I have the head(), tail(), and subset() methods working for my new S4 class, but the [ method seems like a pretty big undertaking. Is there something easier you might recommend? Would it be possible to extend the data.frame class to include some data verifications? If so, do you have some basic pointers for setting something like that up?

Really appreciate all your help thus far. Hopefully, one last advice email will do the trick. Thanks.

--Markus

On Mon, Feb 8, 2010 at 6:43 PM, Martin Morgan mtmor...@fhcrc.org wrote:
> On 02/08/2010 02:54 PM, Markus Weisner wrote:
>> Based on this, I set up the following code:
>>
>>     setClass("A", representation(a="numeric", b="numeric"))
>>     data = new("A", a=1:10, b=1:10)
>>     setMethod("[", "A", function(x, i, j, ..., drop) {
>>       slotnames <- slotNames(x)[j]
>>       new_data = new("A")
>>       for(slot in slotnames) new_data@slot = x@slot[i]
>>       new_data
>>     })
>>     data[5,c("a")]
>
> probably there are several issues and covering them in an email response won't do them justice.
>
> instead of new_data@slot, use slot(new_data, slot)
>
> [ dispatches on four arguments, and likely the cases need to be handled differently (e.g., data[,"a"] vs. data[,TRUE] vs data[1,]). So you'll end up with methods
>
>     setMethod("[", c("A", "missing", "character", "ANY"), ...
>     setMethod("[", c("A", "missing", "logical", "ANY"), ...
>     setMethod("[", c("A", "numeric", "missing", "ANY"), ...
>
> plus others, or you'll write something like
>
>     setMethod("[", c("A", "missing", "ANY", "ANY"),
>               function(x, i, j, ..., drop=TRUE) {
>       if (is.character(j))
>         j <- match(j, slotNames(x))
>       j <- slotNames(x)[j]
>       ...
>     })
>
> ("ANY" is implicit, it's unlikely you'll ever dispatch on 'drop', so a signature for [ often omits the fourth signature element).
>
> You'll aim for re-use, so likely the methods are all wrappers around some simple function .subset_A(x, i, j, drop) where i, j are the types that'll work.
>
> x@slot <- value and slot(x, "slot") <- value make (at least) one copy of x each time they're invoked, so your code above is making multiple copies of the data. One strategy is not to define an 'initialize' method and gain the benefit of the default method as a kind of copy constructor, along the lines of
>
>     initialize(x, a=slot(x, "a")[j], b=slot(x, "b")[j])
>
> if the subset were to be of slots a and b.
>
> You said your objective was to write a kind of enhanced
[R] using setMethod or setGeneric to change S4 accessor symbol from @ to $
I created some S4 objects that are essentially data frame objects. The S4 object definitions were necessary to verify data integrity and force a standardized data format. I am, however, finding myself redefining all the typical generic functions so that I can still manipulate my S4 objects as if they were data frames ... I have used setMethod to set methods for subset, head, and tail.

I would like to use setMethod or setGeneric to enable me to use object$slotname to access object@slotname for my S4 objects. Any advice is appreciated.

Thanks.

--Markus
Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $
Worked like a charm!! Thank you so much. I just plugged the following into my code ...

    setMethod("$", "CADresponses", function(x, name) slot(x, name))

... and it worked perfectly.

If you don't mind, I have a quick follow-up question, using your example:

    setClass("A", representation(a="numeric", b="numeric"))
    setMethod("$", "A", function(x, name) slot(x, name))
    data = new("A", a=1:10, b=1:10)
    data$a[5]            # now works thanks to your code
    data$a[5] <- 200     # assignments do not work -- any ideas?
    data[5,c("a")] = 200 # would also like this to work -- any ideas?

Do you have any suggestions for getting assignments and brackets to work as they would for data frames?

Thanks so much for your help.

Best,
Markus

On Mon, Feb 8, 2010 at 2:44 PM, Martin Morgan mtmor...@fhcrc.org wrote:
> Hi Markus --
>
>     > setClass("A", representation(a="numeric"))
>     [1] "A"
>     > new("A")$a
>     Error in new("A")$a : $ operator not defined for this S4 class
>     > getGeneric("$")
>     standardGeneric for "$" defined from package "base"
>
>     function (x, name)
>     standardGeneric("$", .Primitive("$"))
>     <environment: 0xa62028>
>     Methods may be defined for arguments: x
>     Use showMethods("$") for currently available ones.
>
>     > setMethod("$", "A", function(x, name) slot(x, name))
>     [1] "$"
>     > new("A", a=1:10)$a
>      [1]  1  2  3  4  5  6  7  8  9 10
>     > new("A", a=1:10)$b
>     Error in slot(x, name) :
>       no slot of name "b" for this object of class "A"
>
> does that help?
>
> Martin
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
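For the assignment half of the follow-up question, a matching replacement method for $ appears to be what is needed. The sketch below pairs the $ accessor from this exchange with a $<- method defined through setReplaceMethod; it is a minimal illustration on the toy class A, not the CADresponses class.

```r
library(methods)

setClass("A", representation(a = "numeric", b = "numeric"))

# Accessor: x$name returns the slot called name.
setMethod("$", "A", function(x, name) slot(x, name))

# Replacement: x$name <- value stores value in the slot and returns the object.
setReplaceMethod("$", "A", function(x, name, value) {
  slot(x, name) <- value   # slot<- checks value against the slot's declared class
  x
})

data <- new("A", a = 1:10, b = 1:10)
data$a[5] <- 200   # evaluated roughly as data <- `$<-`(data, "a", `[<-`(data$a, 5, 200))
```

With both methods in place, nested assignments such as data$a[5] <- 200 work because R first reads the slot through $, modifies the vector, and then writes it back through $<-.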
Re: [R] using setMethod or setGeneric to change S4 accessor symbol from @ to $
Thanks. Used getGeneric("[") to figure out the general format for the setMethod, but am having some problem with how to set up the actual function:

    > getGeneric("[")
    standardGeneric for "[" defined from package "base"
    function (x, i, j, ..., drop = TRUE)
    standardGeneric("[", .Primitive("["))
    <environment: 0x116513c30>
    Methods may be defined for arguments: x, i, j, drop
    Use showMethods("[") for currently available ones.

Based on this, I set up the following code:

    setClass("A", representation(a="numeric", b="numeric"))
    data = new("A", a=1:10, b=1:10)
    setMethod("[", "A", function(x, i, j, ..., drop) {
        slotnames <- slotNames(x)[j]
        new_data = new("A")
        for(slot in slotnames) new_data@slot = x@slot[i]
        new_data
    })
    data[5,c("a")]

The problem is that I cannot access S4 object slots using @ and a character variable. I also cannot access a slot using the typical brackets since that is what I am trying to define here. Kind of stuck. Thanks for any advice you might have.

Best, Markus

On Mon, Feb 8, 2010 at 4:54 PM, Martin Morgan mtmor...@fhcrc.org wrote:

On 02/08/2010 01:22 PM, Markus Weisner wrote:

Worked like a charm!! Thank you so much. I just plugged the following into my code ...

    setMethod("$", "CADresponses", function(x, name) slot(x, name))

... and it worked perfect. If you don't mind, I have a quick follow up question, using your example

    setClass("A", representation(a="numeric", b="numeric"))
    setMethod("$", "A", function(x, name) slot(x, name))
    data = new("A", a=1:10, b=1:10)
    data$a[5] #now works thanks to your code
    data$a[5] <- 200 #assignments do not work -- any ideas?

same idea, but for $<-

    > setClass("A", representation(a="numeric"))
    [1] "A"
    > getGeneric("$<-")
    standardGeneric for "$<-" defined from package "base"
    function (x, name, value)
    standardGeneric("$<-", .Primitive("$<-"))
    <environment: 0x14c33a8>
    Methods may be defined for arguments: x, value
    Use showMethods("$<-") for currently available ones.
    > setReplaceMethod("$", "A", function(x, name, value) {
    +     slot(x, name) <- value
    +     x
    + })
    [1] "$<-"
    > a <- new("A", a=1:10)
    > a$a <- 10:1
    > a
    An object of class "A"
    Slot "a":
     [1] 10  9  8  7  6  5  4  3  2  1

    data[5,c("a")] = 200 #would also like this to work -- any ideas?

Do you have any suggestions for getting assignments and brackets to work as they would for data frames? Thanks so much for your help.

same approach, but using getGeneric("[") and getGeneric("[<-") to guide you.

Martin

Best, Markus

On Mon, Feb 8, 2010 at 2:44 PM, Martin Morgan mtmor...@fhcrc.org wrote:

On 02/07/2010 08:31 PM, Markus Weisner wrote:

I created some S4 objects that are essentially data frame objects. The S4 object definitions were necessary to verify data integrity and force a standardized data format. I am, however, finding myself redefining all the typical generic functions so that I can still manipulate my S4 objects as if they were data frames ... I have used setMethod to set methods for subset, head, and tail. I would like to use setMethod or setGeneric to enable me to use object$slotname to access object@slotname for my S4 objects. Any advice is appreciated. Thanks.

Hi Markus --

    > setClass("A", representation(a="numeric"))
    [1] "A"
    > new("A")$a
    Error in new("A")$a : $ operator not defined for this S4 class
    > getGeneric("$")
    standardGeneric for "$" defined from package "base"
    function (x, name)
    standardGeneric("$", .Primitive("$"))
    <environment: 0xa62028>
    Methods may be defined for arguments: x
    Use showMethods("$") for currently available ones.
    > setMethod("$", "A", function(x, name) slot(x, name))
    [1] "$"
    > new("A", a=1:10)$a
     [1]  1  2  3  4  5  6  7  8  9 10
    > new("A", a=1:10)$b
    Error in slot(x, name) : no slot of name "b" for this object of class "A"

does that help?

Martin

--Markus
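One possible way past the sticking point in the latest message, offered as a sketch rather than as anything posted in the thread: slot() and its replacement form, both already used above for $ and $<-, take the slot name as a character string, so they can stand in for @ inside "[" and "[<-" methods. The class "A" is the toy class from the thread. Treating i as an index within each slot and j as a vector of slot names is an assumption about the intended data-frame-like behaviour, and the object name d is illustrative.

```r
library(methods)

# Toy class from the thread: two numeric slots of equal length.
setClass("A", representation(a = "numeric", b = "numeric"))

# "[" method: i indexes within each slot, j names the slots to keep.
# slot(x, nm) accepts the slot name as a string, which x@nm cannot do.
setMethod("[", "A", function(x, i, j, ..., drop = TRUE) {
    if (missing(j)) j <- slotNames(x)   # default: all slots
    out <- new("A")                     # unselected slots stay empty
    for (nm in j) {
        values <- slot(x, nm)
        slot(out, nm) <- if (missing(i)) values else values[i]
    }
    out
})

# "[<-" method: same indexing convention, assigning into the chosen slots.
setReplaceMethod("[", "A", function(x, i, j, ..., value) {
    if (missing(j)) j <- slotNames(x)
    for (nm in j) {
        values <- slot(x, nm)
        if (missing(i)) values[] <- value else values[i] <- value
        slot(x, nm) <- values
    }
    x
})

d <- new("A", a = 1:10, b = 1:10)
fifth <- d[5, "a"]     # new "A" object holding only the fifth element of slot a
d[5, "a"] <- 200       # replaces the fifth element of slot a in place
```

Returning a new "A" object with the unselected slots left empty mirrors the approach attempted in the question; returning a plain vector or a data frame instead would be an equally reasonable design.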
[R] documenting methods for S4 objects
I put out an email last night with output from my package check. Many thanks to Uwe and Liviu who got back to me quickly about how to fix some of the errors. I realize, though, that I had some additional questions / confusion points in terms of documenting packages.

1) I seem to understand how to document a straightforward function, but am having trouble figuring out how to document method functions when creating new S4 objects. For instance, I am trying to document the following function:

    setMethod("tail", "NFIRS", function(x, n=5) tail(as.data.frame(x), n=n))

So far, I have the following for my .Rd file (built automatically by package.skeleton):

    \name{tail-methods}
    \docType{methods}
    \alias{tail-methods}
    \alias{tail,ANY-method}
    \alias{tail,NFIRS-method}
    \title{ ~~ Methods for Function tail ~~}
    \description{
     ~~ Methods for function \code{tail} ~~
    }
    \section{Methods}{
    \describe{
    \item{x = "ANY"}{ ~~describe this method here }
    \item{x = "NFIRS"}{ ~~describe this method here }
    }}
    \keyword{methods}
    \keyword{ ~~ other possible keyword(s)}

Since the tail method is already defined for R and my function works exactly the same, do I still need to make a man file for this function? Do I just copy the information from the R tail function (i.e. Title: Return the First or Last Part of an Object)? What do I put down under describe? (The items don't really make sense to me since my tail function requires two things: 1) an object of type NFIRS and 2) n, the number of rows to display. I don't understand what x=ANY and x=NFIRS mean.) Do I need to add any keywords, or can I just erase the other possible keywords line?

2) Do all functions in a package need to be documented even if the user will never use them? I have lots of little support functions that the user should never really use. Just wondering if all packages submitted to CRAN are required to have every last function documented.

Thanks for your help.
--Markus
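On the x=ANY and x=NFIRS items: these name method signatures, that is, the class of the argument x for which each method is selected. A small sketch may make that concrete. Since the NFIRS class definition is not shown in the message, the class "Incidents" below is a hypothetical stand-in that simply wraps a data frame, and the object names are illustrative.

```r
library(methods)

# Hypothetical stand-in for NFIRS: an S4 class wrapping a data frame.
setClass("Incidents", representation(data = "data.frame"))

# Method for signature x = "Incidents". Defining it also turns utils::tail
# into an S4 generic whose default (x = "ANY") is the ordinary tail().
setMethod("tail", "Incidents", function(x, n = 5L, ...) {
    tail(x@data, n = n, ...)
})

inc <- new("Incidents",
           data = data.frame(id = 1:20, type = rep(c("Fire", "EMS"), 10)))

has_method <- existsMethod("tail", "Incidents")  # is a method registered?
last3 <- tail(inc, 3)    # x is an "Incidents" object: the new method runs
plain <- tail(1:10, 3)   # any other class: falls through to x = "ANY"
```

Under that reading, the x = "NFIRS" item in the skeleton would describe what tail does for NFIRS objects, while the x = "ANY" item refers to the existing default behaviour.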
[R] debugging package
I am trying to debug a package to submit it to CRAN and am getting a bunch of error messages. Most of the errors are because of the Rd files which were automatically populated by the package.skeleton function. I find the section on documentation to be pretty confusing in the R Extensions manual. Any help on getting these errors fixed would be hugely appreciated. Thanks.

--Markus

* checking for working pdflatex ... OK
* using log directory '/Users/markus/Dropbox/NFIRS_S4/NFIRS.Rcheck'
* using R version 2.9.2 Patched (2009-09-24 r50179)
* using session charset: UTF-8
* checking for file 'NFIRS/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'NFIRS' version '1.0'
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking for executable files ... OK
* checking whether package 'NFIRS' can be installed ... WARNING
Found the following significant warnings:
  missing link(s): ~~fun~~ CLASSNAME-class
See '/Users/markus/Dropbox/NFIRS_S4/NFIRS.Rcheck/00install.out' for details.
* checking package directory ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... ERROR
Rd files with likely Rd problems:
Unaccounted top-level text in file 'NFIRS-class.Rd':
Following section 'note':
\n\n ~Make other sections like Warning with \\section{Warning }{} ~\n\n

Unaccounted top-level text in file 'NFIRS-package.Rd':
Following section 'references':
\n~~ Optionally other standard keywords, one per line, from file KEYWORDS in the R documentation directory ~~\n

Rd files with missing or empty '\title':
  NFIRS.summary.Rd
  read.NFIRS.Rd
Rd files without 'description':
  NFIRS.summary.Rd
  read.NFIRS.Rd
Rd files without 'title':
  NFIRS.summary.Rd
  read.NFIRS.Rd
These entries are required in an Rd file.

Rd files with non-standard keywords:
  as.data.frame-methods.Rd: ~~ other possible keyword(s)
  head-methods.Rd: ~~ other possible keyword(s)
  NFIRS.summary.Rd: ~kwd1 ~kwd2
  read.NFIRS.Rd: ~kwd1 ~kwd2
  summary-methods.Rd: ~~ other possible keyword(s)
  tail-methods.Rd: ~~ other possible keyword(s)
Each '\keyword' entry should specify one of the standard keywords (as listed in file 'KEYWORDS' in the R documentation directory).

Rd files with duplicated alias 'as.data.frame,NFIRS-method':
  as.data.frame-methods.Rd NFIRS-class.Rd
Rd files with duplicated alias 'head,NFIRS-method':
  head-methods.Rd NFIRS-class.Rd
Rd files with duplicated alias 'summary,NFIRS-method':
  NFIRS-class.Rd summary-methods.Rd
Rd files with duplicated alias 'tail,NFIRS-method':
  NFIRS-class.Rd tail-methods.Rd
See the chapter 'Writing R documentation files' in manual 'Writing R Extensions'.
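Most of the complaints in this log trace back to skeleton placeholders (the "~~ ... ~~" text, "~kwd1", empty titles) that were never filled in. As a rough sketch of how a single file might be cleaned up and re-checked without rerunning the full package check, the snippet below writes a filled-in Rd file to a temporary directory and passes it to checkRd() from the tools package. The title and description wording are invented for illustration, and the file is a minimal example rather than the author's actual documentation.

```r
library(tools)

# A filled-in version of a skeleton file: placeholders replaced, and only
# the standard keyword "methods" kept. The wording is illustrative.
rd_lines <- c(
    "\\name{tail-methods}",
    "\\docType{methods}",
    "\\alias{tail-methods}",
    "\\alias{tail,NFIRS-method}",
    "\\title{Return the Last Rows of an NFIRS Object}",
    "\\description{",
    "  Method for function \\code{tail} applied to objects of class NFIRS.",
    "}",
    "\\keyword{methods}"
)
rd_file <- file.path(tempdir(), "tail-methods.Rd")
writeLines(rd_lines, rd_file)

# checkRd() parses the file and collects any problems it finds as messages.
problems <- checkRd(rd_file)
print(problems)
```

The duplicated-alias errors are a separate matter: each alias such as 'tail,NFIRS-method' appears in two files in the log, so it would need to be removed from one of them.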
[R] package license questions
I am looking for some advice on licenses. Here is my situation:

Over the last couple of years, I have developed a rather large number of fire department analysis functions. I am in the process of trying to publish some packages to make these functions available to the public. I am trying to release two packages that essentially define S4 classes for common types of fire department data. Then, I would like to publish a package that essentially reads in these fire department data types and returns analysis results.

My concern is that I may eventually want to build and sell some proprietary functions and I am trying not to box myself out of this future option. It is my understanding that if I use the GPL license, all work based on my packages would have to take on the GPL license (effectively making it impossible to sell software). It looks like the Lesser General Public License (LGPL) may suit my needs by allowing me to make public my current work without eliminating the possibility of future proprietary work. I have a couple of questions I am hoping somebody can answer.

- It says that libraries licensed under an LGPL can be used by proprietary software. What is meant by libraries? Are class definitions and functions considered libraries?
- If I use the LGPL for all my packages, do I maintain the right to build and sell software that is based on these LGPL packages? For instance, could I use the class definitions from an LGPL package as inputs for analysis in a piece of proprietary software?
- Other than potentially allowing competitors to also use my LGPL packages in their proprietary software, are there any big disadvantages to using an LGPL?
- If somebody improves on my LGPL S4 class definitions, can I still then use them in a proprietary package despite their being modified?

I am a big supporter of the open source community and have personally benefited greatly from open source software. My intention is to release my work as open source, but I just don't want to be boxed out of future proprietary developments. These licenses can be pretty confusing, so I appreciate any information that can help me figure this out.

Thanks, Markus
[R] projecting GIS coordinates for analysis with spatstat package
I am working on creating an R package for doing fire department analysis and am trying to create a function that can display emergency incident densities. The following code sort of does the trick, but I need a display that shows the number of incidents per square mile. I believe the code below shows incidents per square unit (in this case, degrees lat/long). To solve this problem, I believe that I need to convert the coordinates (currently WGS84) to some projection that is based on miles rather than degrees lat/long.

Does anybody know the code for projecting coordinates so that my density plot will show incidents per sq-mile? If there is a simpler way of displaying incident densities than using the spatstat package, please let me know.

Thanks, Markus

    #create data
    data = data.frame(
      xcoord=c(-123.1231, -123.0245, -123.1042, -123.1555, -123.1243,
               -123.0984, -123.1050, -123.0909, -123.1292, -123.0973,
               -123.0987, -123.1016, -123.2355, -123.1005, -123.1130,
               -123.1308, -123.1281, -123.1281, -123.1275, -123.1269,
               -123.1595, -123.1202, -123.1756, -123.0791, -123.0791,
               -123.0969, -123.0969, -123.0905, -123.0718, -123.0969,
               -123.1337, -123.1531, -123.1362, -123.1550, -123.0725,
               -123.1249, -123.1249, -123.1249, -123.1249, -123.1249,
               -123.1777, -123.1237, -123.1912, -123.0256, -123.1347,
               -123.1246, -123.1931, -123.0971, -123.0281, -123.0928),
      ycoord=c(49.27919, 49.23780, 49.24881, 49.27259, 49.26057,
               49.25654, 49.25000, 49.28119, 49.27908, 49.28442,
               49.28318, 49.27293, 49.25805, 49.28137, 49.22528,
               49.26066, 49.27841, 49.27841, 49.28019, 49.27414,
               49.24220, 49.27744, 49.23474, 49.28229, 49.28229,
               49.27671, 49.27671, 49.25974, 49.26510, 49.27671,
               49.29036, 49.26100, 49.27989, 49.26103, 49.27216,
               49.27548, 49.27548, 49.27548, 49.27548, 49.27548,
               49.23475, 49.27759, 49.24524, 49.26271, 49.20531,
               49.26337, 49.23862, 49.28447, 49.20871, 49.28306),
      itype=c("Emergency Medical Service", "Rescue", "Service Call",
              "Alarm Activation", "Hazardous Condition",
              "Motor Vehicle Accident", "Emergency Medical Service",
              "Emergency Medical Service", "Fire", "Alarm Activation",
              "Emergency Medical Service", "Motor Vehicle Accident",
              "Emergency Medical Service", "Emergency Medical Service",
              "Emergency Medical Service", "Alarm Activation",
              "Alarm Activation", "Alarm Activation",
              "Emergency Medical Service", "Emergency Medical Service",
              "Emergency Medical Service", "Alarm Activation",
              "Emergency Medical Service", "Hazardous Condition",
              "Hazardous Condition", "Motor Vehicle Accident",
              "Motor Vehicle Accident", "Motor Vehicle Accident",
              "Alarm Activation", "Motor Vehicle Accident",
              "Emergency Medical Service", "Motor Vehicle Accident",
              "Alarm Activation", "Emergency Medical Service",
              "Emergency Medical Service", "Fire", "Fire", "Fire", "Fire",
              "Fire", "Motor Vehicle Accident", "Emergency Medical Service",
              "Emergency Medical Service", "Motor Vehicle Accident",
              "Alarm Activation", "Emergency Medical Service",
              "Alarm Activation", "Fire", "Emergency Medical Service",
              "Emergency Medical Service"))

    #add necessary libraries
    library(sp)
    library(maptools)
    library(spatstat)
    library(RColorBrewer)

    #add coordinates to data
    coordinates(data) = c("xcoord", "ycoord")

    #convert coordinates to spatstat point pattern dataset
    ppp_data = as(data["itype"], "ppp")

    #determine density of point pattern
    density_data = density.ppp(ppp_data)

    #plot density
    plot(density_data, col=brewer.pal(9, "Reds"))

-- Markus Weisner, Firefighter Charlottesville Fire Department 203 Ridge Street Charlottesville, Virginia 22901 (434) 970-3240
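As a rough illustration of the unit conversion the question is after, the base-R sketch below maps longitude/latitude to x/y offsets in miles using a local equirectangular approximation. This is an assumption-laden shortcut, not a true map projection: it is only reasonable over a city-sized area, and reprojecting to a projected coordinate system (for example with sp's spTransform to the UTM zone covering these longitudes) would be the more rigorous route. The function name lonlat_to_miles and the Earth-radius constant are illustrative choices.

```r
# Mean Earth radius in miles (approximate).
earth_radius_mi <- 3958.8

# Convert lon/lat in degrees to x/y offsets in miles from a reference point,
# using a local equirectangular approximation (fine for small areas only).
lonlat_to_miles <- function(lon, lat, lon0 = mean(lon), lat0 = mean(lat)) {
    rad <- pi / 180
    data.frame(
        x = (lon - lon0) * rad * earth_radius_mi * cos(lat0 * rad),
        y = (lat - lat0) * rad * earth_radius_mi
    )
}

# The first few coordinates from the message above.
lon <- c(-123.1231, -123.0245, -123.1042, -123.1555, -123.1243)
lat <- c(49.27919, 49.23780, 49.24881, 49.27259, 49.26057)

xy <- lonlat_to_miles(lon, lat)

# Crude overall density: incidents per square mile of the bounding box.
area_sq_mi <- diff(range(xy$x)) * diff(range(xy$y))
incidents_per_sq_mi <- nrow(xy) / area_sq_mi
```

Because a kernel density estimate is expressed per unit area of the coordinates it is given, building the point pattern from x and y in miles instead of degrees should yield a surface in incidents per square mile.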