Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error
Hi Christian, thanks for responding! I wrote a reply to you when I first saw your post, but it looks like it didn't get to the list somehow. I'm still on Windows XP, though I imagine I'll have to switch over to 7 or something soon. I think you are right - the problem is that I have not been able to successfully start the server. -- View this message in context: http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534p4685414.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error
On Feb 03, 2014; 7:24am, hadley wickham wrote:

> Or you're not running a postgres db on your local machine that accepts a connection with username Administrator and no password? I doubt that's the error you would see if RPostgreSQL hadn't found libpq.

I have learned enough since this posting to be aware that my question was based on assumptions so false to fact that they verge on the nonsensical, but not enough to reframe it more sensibly. So I am just going to say what I believed then and what I believe now, and perhaps you can head off the more extreme excursions from reality in my current beliefs.

Because I knew that libpq depended on the version of PostgreSQL that was installed, and that it used to be the case that you could not just install the package, but it is now possible because of the addition of libpq, I concluded that libpq contained an installer for PostgreSQL. I now think that this is false, and that libpq is some sort of tailored interface that varies with the OS and the PostgreSQL version. Tailored where and by what, I still don't know, but maybe I do not need to know.

Since I thought I had to use the version of PostgreSQL installed by libpq, I did not try to independently install PostgreSQL. Now I think that PostgreSQL has to be installed and working before you start to install RPostgreSQL, in order, among other things, to get the right libpq contents or settings.

I spent most of a day trying to figure out how to do what I was thinking of as a strictly local installation of PostgreSQL, meaning one that did not connect to the internet via a port and TCP/IP. I now believe that there is no such thing, at least on a Windows machine, or that doing so would be complex and difficult, compared to a connection via a machine-internal TCP/IP connection -- something I did not know existed until the day before yesterday.
For the last few days I have been trying, so far unsuccessfully, to install PostgreSQL from the (misnamed) one click installer from enterprisedb.com. I have figured out that the installation and error logs are written to stderr, and where my OS puts it. I've gone through the 40+ pages of log. Although the log contains more than 2 dozen warnings and errors, I now believe that all of the ones that matter derive in some way from this one:

    Executing batch file 'rad3BBD8.bat'...
    'icacls' is not recognized as an internal or external command, operable program or batch file.
    . . .

and that this means I am having some sort of problem with the Windows XP/NT access control/permissions system. Unfortunately I know nothing whatsoever about the Windows permission system. Learning about it is the next thing on my list.

I am really quite good at framing feasible research agendas to address important policy questions on a shoestring. I'm good with regression, with displaying complicated argument in sensible graphics. I am new to using databases on my own behalf (rather than having someone to do it for me), but I am finding the query side of databases elegant and intuitive. But I am not a competent IT/sysadmin/tech support person, and I am not going to be one soon, or maybe ever. So I find it somewhat disheartening that I am spending so much time being just that.

Still, I am very grateful for the help and support of this list and the various Stack Exchange lists, which have made possible such learning as I have been able to accomplish. Like the glaciers, my progress is slow but inexorable. Or so I like to believe.

Warmest regards, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534p4684775.html
[R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error
In the description section of the RPostgreSQL package documentation, it states: "In order to build and install this package from source, PostgreSQL itself must be present your system to provide PostgreSQL functionality via its libraries and header files. . . . On Microsoft Windows system the attached libpq library source will be used." I am not sure what "attached" means in this setting.

After successfully running:

    install.packages('RPostgreSQL')

When I tried to run:

    require('RPostgreSQL')
    drv <- dbDriver("PostgreSQL", fetch.default.rec=1)
    con <- dbConnect(drv, dbname="postgres")

I get the error message:

    Error in postgresqlNewConnection(drv, ...) :
      RS-DBI driver: (could not connect Administrator@local on dbname "postgres")

I am guessing that this error is because PostgreSQL either does not exist or has been improperly installed. Since RPostgreSQL is supposed to work with the particular version of PostgreSQL that comes with it in libpq, I assume that either install.packages('RPostgreSQL') does this installation or I have to do it. I have not done it, because I can not find the attached libpq library.

There is no file or directory called either libpq or PostgreSQL in my \Program Files\ directory, my \R-3.0.2 directory, my \R-3.0.2\library directory, or my R-3.0.2\library\RPostgreSQL directory, nor is such a file or folder in my working directory. There is no library called libpq on CRAN. There is no discussion that I could find of how to install it, where it comes from, or where installation puts it, in either the package documentation or on the Google site for the package, http://code.google.com/p/rpostgresql/ At that site under the Source tab I did find some files at RPostgreSQL/src/libpq, but no information on what to do with them.
I tried:

    require('libpq')

and got:

    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
      there is no package called ‘libpq’

I thought maybe I was misunderstanding what or where libpq is, and that I should be going to the PostgreSQL site and following their installation instructions without reference to R. But I have carefully gone over the documentation for dbDriver and dbConnect several times, and neither one gives me any way (except a name or an IP address for a remote server) to tell R where the PostgreSQL directory, program, or file are located. That seems to imply that the local database RPostgreSQL connects to must be in a location where R puts it. So I am missing something, but I do not know what.

I am running R 3.0.2 through RStudio on a Windows XP 32-bit machine. Any help anyone could offer would be greatly appreciated.

Warmest regards, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534.html
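As the later replies in this thread conclude, RPostgreSQL is only a client: a PostgreSQL server has to be installed and running separately, and R reaches it over a host and port rather than through a file location. A minimal sketch of such a connection attempt, assuming a server listening on localhost at PostgreSQL's default port 5432 and a role named postgres. The helper name try_pg_connect and the user and password values are illustrative assumptions, not taken from the posts.

```r
# Hypothetical helper: try to reach a PostgreSQL server and report a failure
# as a message instead of stopping with an error.
try_pg_connect <- function(dbname = "postgres", host = "localhost",
                           port = 5432, user = "postgres", password = "") {
  if (!requireNamespace("RPostgreSQL", quietly = TRUE)) {
    message("RPostgreSQL is not installed")
    return(NULL)
  }
  drv <- RPostgreSQL::PostgreSQL()
  tryCatch(
    DBI::dbConnect(drv, dbname = dbname, host = host, port = port,
                   user = user, password = password),
    error = function(e) {
      # Typical when no server is running or the credentials are rejected
      message("Could not connect: ", conditionMessage(e))
      NULL
    }
  )
}

con <- try_pg_connect()
if (!is.null(con)) {
  print(DBI::dbListTables(con))
  DBI::dbDisconnect(con)
}
```

If this returns NULL with a "could not connect" message, the first thing to check is probably whether the server process is actually running, rather than anything inside the R library directories.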
[R] Diagnostic and helper functions for defective hard-to-import files
Hi Folks! I have been writing a small set of utilities for dealing with files that are hard to open correctly for one reason or another, especially because they are too big for memory, non-rectangular, or contain odd characters or unexpected codings, or all of these things together. Today it suddenly hit me that this has probably been done, done better, and upgraded to package form a dozen times already. There were pointers to a couple functions useful in this regard in the Core Import/Export document. But my effort to come up with search terms that were productive of such packages was unsuccessful. I would be grateful if someone would point me toward such a package or packages if they exist. Warmest regards, andrewH
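For the non-rectangular case, one diagnostic in base R is count.fields(), which reports how many fields each line has without building a data frame. A small sketch on a made-up five-line file (the file contents are invented for illustration), showing how the counts shift when quote handling is switched off:

```r
# Build a small, deliberately ragged CSV in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alice,10",
             "2,bob",                 # one field short
             "3,\"carol, jr\",30",    # comma inside a quoted field
             "4,dave,40,extra"),      # one field too many
           tmp)

# Fields per line, honouring double quotes
n_quoted <- count.fields(tmp, sep = ",", quote = "\"")
# Fields per line, treating quote characters as ordinary text
n_unquoted <- count.fields(tmp, sep = ",", quote = "")

# Compact summary: how many lines have each field count
print(table(n_quoted))
print(table(n_unquoted))

# Line numbers that do not match the header's field count
bad_lines <- which(n_quoted != n_quoted[1])
print(bad_lines)
```

Comparing the two tables is a quick way to tell whether ragged rows come from missing fields or from quoting.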
Re: [R] Diagnostic and helper functions for defective hard-to-import files
On Jan 28, 2014 at 8:56pm, David Winsemius wrote:

> On Jan 28, 2014, at 8:43 PM, andrewH wrote:
>
>> Hi Folks! I have been writing a small set of utilities for dealing with files that are hard to open correctly for one reason or another, especially because they are too big for memory, non-rectangular, or contain odd characters or unexpected codings, or all of these things together. Today it suddenly hit me that this has probably been done, done better, and upgraded to package form a dozen times already. There were pointers to a couple functions useful in this regard in the Core Import/Export document. But my effort to come up with search terms that were productive of such packages was unsuccessful.
>
> I don't know of a package to do that. You know the quote from that Russian author whose name I am forgetting (in Anna Karenina perhaps) about happy families being all the same but unhappy families being impossible to classify. I think it applies to datasets as well. There are too many different dataset pathologies to allow a neat packaging approach.
>
> My approach has been to study the options in read.table very carefully and if that is insufficient look at either readLines or scan as options. It is very useful to be able to use `count.fields` with different parameter settings of quotes and comment.char. Wrapping it in table() can deliver a very compact, useful result. And don't forget to search the Archives if you have a regular but non-rectangular arrangement.
>
> David Winsemius
> Alameda, CA, USA

Thanks, David! You have quickly summarized a set of techniques that it took me a long time to learn (much of it spent disentangling the truth from various misconceptions about the data-reading process). I don't think I have very much to add to your list, but as always, the effectiveness depends on correct implementation, and I have made a _lot_ of mistakes in trying to implement these in the past.
Moreover, all these things become much more complicated if the file is too big to just read into a data frame. I am working with Census records right now, and my primary data file is a 14 gig csv that had me tearing my hair out trying to read it and pull out the variables I have needed at any given moment. I finally did get it read and the right subset extracted, but it was a pretty empirical process - I would just keep trying things that didn't work until I found something that did, often not quite understanding why my previous efforts had failed. I know that if I have to do this again six months from now I will have no idea how I did it.

So I wanted to reduce the things that worked to functions and set up a sort of decision tree that I could work through to find and correct at least the more common problems. But I was hoping -- am still hoping, actually -- to find that someone else has already done this so I could get back to my real work. It seems like the sort of thing that could easily be buried in the 100+ pages of documentation of one of the big utility packages like Hmisc, MASS or car.

I have often wished there was a data manipulation and import/export task view, with a purview to cover things like what I am talking about here, the contents of Phil Spector's book, and packages like Hadley Wickham's plyr.

Warmest regards, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Diagnostic-and-helper-functions-for-defective-hard-to-import-files-tp4684357p4684364.html
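One way to pull a few variables out of a csv that is too big for memory is to read it through an open connection a block of lines at a time and keep only the wanted columns. The sketch below is one possible approach, not the method used in the post: the function name read_columns_chunked and the demo column names are hypothetical, and it assumes a comma-separated file with a header line and no line breaks inside quoted fields.

```r
# Hypothetical helper: read selected columns of a large CSV in chunks.
read_columns_chunked <- function(path, cols, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Parse the header line; scan() strips double quotes around the names.
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)
  stopifnot(all(cols %in% header))
  # "NULL" makes read.csv skip a column; NA lets it guess the type.
  col_classes <- rep("NULL", length(header))
  col_classes[header %in% cols] <- NA
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break          # end of file reached
    pieces[[length(pieces) + 1]] <- read.csv(
      text = lines, header = FALSE, col.names = header,
      colClasses = col_classes, stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)
}

# Small demonstration on an invented file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(YEAR = 2001:2005,
                     STATE = c("CA", "NY", "TX", "WA", "OR"),
                     INCOME = c(10, 20, 30, 40, 50)),
          tmp, row.names = FALSE)
res <- read_columns_chunked(tmp, c("YEAR", "INCOME"), chunk_lines = 2L)
print(res)
```

Because column types are guessed per chunk, passing explicit classes for the kept columns would be safer on real data where a column looks numeric in one block and not in another.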
Re: [R] Assigning default function arguments to themselves: Why?
Dear Bill--

I have seen it most often in functions that are defined or used inside of other functions, and need an argument from the calling function. So I have, purely as a matter of imitation, taken to doing it when I am writing a function that wants an argument of the calling function passed to it unchanged, because that is how I saw it used.

So for instance, in read.table(), scan() is called several times with reflexive argument assignments that include:

    file = file
    what = what
    sep = sep
    quote = quote
    comment.char = comment.char
    allowEscapes = allowEscapes
    encoding = encoding

Is this what you mean by an example that 'works'? I am sort of foggy on the shade of meaning conveyed by the single quotes. If not, let me know what kind of example you want, and I'll try and find it.

-- View this message in context: http://r.789695.n4.nabble.com/Assigning-default-function-arguments-to-themselves-Why-tp4682294p4682817.html
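The pattern quoted from read.table() is a wrapper forwarding its own arguments, by name, to a lower-level function. A toy sketch with two hypothetical functions (low_level and high_level are stand-ins invented here, not the real scan() and read.table()) shows why the names matter:

```r
# Hypothetical stand-in for scan(): just reports what it received.
low_level <- function(file, what = "character", sep = "", quote = "\"'") {
  sprintf("file=%s what=%s sep=[%s] quote=[%s]", file, what, sep, quote)
}

# Hypothetical stand-in for read.table(): forwards its arguments by name.
# In `sep = sep`, the left side names low_level's formal argument and the
# right side is high_level's own variable.
high_level <- function(file, sep = ",", quote = "\"") {
  low_level(file = file, sep = sep, quote = quote)
}

by_name <- high_level("data.csv")
print(by_name)

# Positional forwarding would silently put the separator into `what`.
by_position <- low_level("data.csv", ",")
print(by_position)
```

Naming each forwarded argument keeps the call correct even if the lower-level function's argument order differs from the wrapper's.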
Re: [R] What purpose is served by reflexive function assignments?
Dear Duncan -- I am terribly sorry. I had a browser crash, and when I reopened it I found a tab with a Nabble composition box containing an unposted version of my question. So I thought I had never hit the send button, so I edited it a bit and sent it off. I should have checked first. My apologies. andrewH -- View this message in context: http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682818.html
Re: [R] What purpose is served by reflexive function assignments?
Dear David-- Thanks so much for your helpful reply!

David Winsemius wrote:

> The LHS X becomes a name, the RHS X will be looked up in the calling environment and fails if no value is positionally matched and then no X is found (at the time of the function definition).

Does X really have to exist when the function is defined? I thought it was enough if it existed in the environment of the calling function, or somewhere up the environment chain of the calling function. If this is not true, then that means it matters a lot whether you write a function inside another function or just call it in that function.

Suppose a function with a reflexive assignment X=X is defined in the global environment but called inside another function, and X has a different value in those two places. Will it look first in the global environment and only then in the calling environment? And is this different from the behavior without the reflexive assignment? I should not bother you with those questions. I should just run it both ways and see what happens.

> If you use `X <- value` in the argument list, then what is returned is only the value and the name `X` may be lost. Or in the case of data.frame morphed into a strange name: [example omitted]

I am not sure that I am understanding you correctly here. Are you saying that assignment using the = retains the name (and other attributes? which ones?) of the RHS, while <- does not?

-- View this message in context: http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682819.html
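The "run it both ways" experiment can be sketched as follows. The function names are invented for illustration. The three cases separate where a free variable, a supplied argument, and a default expression are each looked up:

```r
X <- "global"

# Free variable: X is looked up starting from where show_x was defined,
# not from the function that calls it.
show_x <- function() X
caller <- function() {
  X <- "caller-local"
  show_x()
}
free_variable <- caller()

# Supplied argument: in echo(X = X) the right-hand X is evaluated in the
# caller's own frame before the value reaches echo().
echo <- function(X) X
caller2 <- function() {
  X <- "caller-local"
  echo(X = X)
}
supplied_argument <- caller2()

# Default expression: evaluated lazily inside the function's own frame,
# so it sees variables created in the function body before first use.
with_default <- function(Y = X) {
  X <- "function-local"
  Y
}
default_value <- with_default()

print(c(free_variable, supplied_argument, default_value))
```

So the defining environment matters for free variables and defaults, while the calling environment matters only for expressions written in the call itself.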
Re: [R] What purpose is served by reflexive function assignments?
Dear Peter-- This is a truly wonderful explanation. It makes many things clear that were completely mysterious to me. For one thing, I realize that, for functions called inside the definitions of other functions, I have been confusing function definitions with function calls -- as if the called function were also being defined. So I have seen a lot of what I was calling reflexive assignments _inside_ of function definitions, but not as a _part of_ function definitions -- rather they are a part of the calls to other, already-defined functions that just happen to take place inside of function definitions.

Let me sum up a few things I think I have learned, to make sure I am not merely hallucinating an improved understanding.

1. Outside of function definitions and calls, = and <- are pretty similar in their effect.

2. Inside of the parentheses of a function call, <- assigns the RHS to the variable on the LHS in the enclosing environment, so the value is picked up when the call is executed -- but is also a permanent change in the variable of that name in the enclosing environment. (This seems like an exception to the general no-side-effects rule, yes?)

3. Inside parentheses of a function call, = assigns the RHS to the LHS, but only in the environment of that function -- more like it was in the function body.

4. Inside the parentheses of a function definition, you can not do a <- assignment at all, and = has a pretty different secondary meaning, a sort of conditional assignment, along the lines of "if you can not match the argument before this positionally, use the value of the RHS."

5. The names of formals in the function definition do not matter outside the function. Only their position (or an = assignment) matters. You can not get a function to recognize a variable in its surrounding environment because it has the same name as the name of a formal in the function definition. Conversely, inside the function, only the formal name matters. If f <- function(Y){X}, f(X) still gets you an "argument X is missing" error.

6. In a call, f(x=x) is different from f(x) if and only if x has a default value different from the value of x in the calling environment.

7. I have learned by experiment that a reflexive assignment with = in a function definition does not assign the value of x in the calling environment as the default, though I do not know why. In fact, I have not been able to make X=X do anything useful (and different from plain X) inside the parentheses of a function definition, unless I am trying to generate strange "recursive default" error warnings. So the simple rule I wanted with respect to function definitions is "just say no."

Is that more or less right? Many thanks! --andrewH

plangfelder wrote:

> On Sat, Dec 28, 2013 at 7:27 PM, Andrew Hoerner <ahoerner@...> wrote:
>
>> Let us suppose that we have a function foo(X) which is called inside another function, bar(). Suppose, moreover, that the name X has been assigned a value when foo is called:
>>
>>     X <- 2
>>     bar(X=X){ foo(X) }
>>
>> I have noticed that many functions contain arguments with defaults of the form X=X. Call this reflexive assignment of arguments.
> Your example code makes little sense, it throws an error even before reaching foo():
>
>     X <- 2
>     bar(X=X){
>     Error: unexpected '{' in "bar(X=X){"
>     foo(X)
>     Error: could not find function "foo"
>     }
>     Error: unexpected '}' in "}"
>
> What you may have in mind is something like
>
>     bar = function(X) { foo(X) }
>     X <- 2
>     bar(X=X)
>
> Note that bar(X=4) is different from bar(X<-4), as seen here:
>
>     # Define a trivial function
>     bar = function(X) {X+2}
>     X = 0
>     bar(X=2)
>     [1] 4
>     # Here only the formal argument X of function bar was set to 2; the global variable X was left untouched:
>     X
>     [1] 0
>     # This assigns the value 4 to the global variable X and uses that value as the value for the first formal argument of bar():
>     bar(X<-4)
>     [1] 6
>     # Note that X changed in the global environment
>     X
>     [1] 4
>
> What you call reflexive assignment X=X is not really: the left hand side is the formal argument of bar(), the right hand side is the variable X in the calling environment of bar() (in this case global environment).
>
> Oh yes, and it has absolutely nothing to do with defaults. If you use my example above, the default for the argument X is 2, but doing
>
>     X=0
>     bar(X=X)
>
> will call the function with argument X=0, not X=2.
>
> When there is only one argument, saying X=X does not make much sense, but when there are many arguments, say
>
>     bar = function(X=0, Y=0, Z=0)
>
> and you only want to set the argument Z to a value you call Z in the calling function, saying bar(Z=Z) makes perfect sense and is very different from saying bar(Z) which would set the argument X to value Z, and leave argument Z at the default.
>
> Hope this helps.
>
> Peter
Re: [R] What purpose is served by reflexive function assignments?
Dear Ista-- Peter's post has already persuaded me that my original question was based on several misunderstandings and so difficult if not impossible to follow -- though he did a remarkable job of figuring out where I was going astray and what examples might set me right. But I will post the results of two of my experiments that I still find puzzling.

This generates a "recursive default" error in the cat function. I do not see why it does not print 5:

    X <- 2
    gg <- function(X=X){cat("gg: ", X)}
    ss <- function(X){
      X <- 5
      gg()
    }
    ss()

And this generates an "'x' is missing" error in x-y. I expected it to return the number -1:

    x <- 1
    y <- 2
    foo <- function(x=x,y=y){x-y}
    foo()

Thanks so much for your time and attention! andrewH

Ista Zahn wrote:

> On Sat, Dec 28, 2013 at 10:27 PM, Andrew Hoerner <ahoerner@...> wrote:
>
>> Let us suppose that we have a function foo(X) which is called inside another function, bar(). Suppose, moreover, that the name X has been assigned a value when foo is called:
>>
>>     X <- 2
>>     bar(X=X){ foo(X) }
>>
>> I have noticed that many functions contain arguments with defaults of the form X=X.
>
> An example would be really helpful here.
>
>> Call this reflexive assignment of arguments.
>
> Why call this anything special? All this does is set the default value of the X argument. I'm not sure what makes this reflexive, or why it needs a special descriptive term.
>
>> How is foo(X=X) different from foo(X)?
>
> foo(X) is hardcoded, foo(X = X) just sets a default.
>
>> Isn't the environment from which X is located the parent environment of foo() in either case? Or if it looks first in the environment inside of foo, will it not immediately pop up to the parent environment if it is not found in foo? Are reflexive assignments just to keep X from being positionally assigned accidentally, or are they doing something deeper?
>>
>> Moreover, this is the only place I have seen people consistently using an equals sign in place of the usual <-, and I am confident that there is some subtle difference in how the two assignment operators work, perhaps beyond the ken of lesser mortals like myself, that explains why the = is preferred in this particular application.
>
> Again, some examples would really help here.
>
>> Actually, although I would like to hear the deep answer, which I am sure has something to do with scoping, as everything really confusing in R does, my real question is, is there some rule of thumb by which one could decide whether or not to do a reflexive assignment in a function definition and be right most of the time?
>
> I'm still not even sure what reflexive assignment means. Can you clarify, preferably with some examples.
>
>> Lately I have gotten several "Error: Promise is already under evaluation" messages, and my current rule of thumb for dealing with this is to add reflexive assignment to the variable if it is missing and take it out if it is present. This seems to work, but it makes me feel unintelligent. Is there a better rule? I would be most grateful for anyone who could shed light on the subject.
>
> Perhaps someone can, but you will certainly make their job easier if you provide a concrete example that produces this error.
>
> Best,
> Ista
>
>> Sincerely, andrewH
>> --
>> J. Andrew Hoerner
>> Director, Sustainable Economics Program
>> Redefining Progress
>> (510) 507-4820
-- View this message in context: http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682827.html
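The two puzzling experiments have the same cause: a default such as X = X is evaluated inside the function's own frame, where the only X in sight is the argument that is still waiting for its default, so the lookup refers back to itself. A minimal sketch of the failure and two common ways around it. The renamed functions and the variables x0 and y0 are invented here for illustration, and the exact wording of the error may vary between R versions.

```r
# The self-referential default: forcing X asks for X again.
gg <- function(X = X) X
err <- tryCatch(gg(), error = function(e) conditionMessage(e))
print(err)

# Option 1: no default; the caller passes its own X explicitly by name.
gg1 <- function(X) X
ss1 <- function() {
  X <- 5
  gg1(X = X)
}
print(ss1())

# Option 2: keep a default, but give it a different name from the formal,
# so the lookup continues outward from the function instead of looping.
x0 <- 1
y0 <- 2
foo <- function(x = x0, y = y0) x - y
print(foo())
```

Option 1 matches the calls seen inside read.table(), where the X = X form appears in a call rather than in a definition.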
Re: [R] Assigning default function arguments to themselves: Why?
Dear Bill-- I have figured out that my original question and my most recent response to you were largely nonsensical bits of idiocy. Please do not trouble yourself with them further. But I do thank you most sincerely for your time and attention. andrewH -- View this message in context: http://r.789695.n4.nabble.com/Assigning-default-function-arguments-to-themselves-Why-tp4682294p4682828.html
Re: [R] The Stoppa distribution
Thanks enormously, Bill! I'll run with this for a while, and let you know how it works for me. Yours, andrewH -- View this message in context: http://r.789695.n4.nabble.com/The-Stoppa-distribution-tp4682171p4682254.html
Re: [R] How can I find nonstandard or control characters in a large file?
Thanks, Earl. Your utility ran like a charm, and confirmed that my effort to adapt Enrico's code to this purpose had not gone astray, which is to say, I found no funky characters. Your help is greatly appreciated.

Sincerely, andrewH

On Tue, Dec 10, 2013 at 7:35 AM, Earl F Glynn [via R] wrote:

> andrewH wrote:
>
>> However, my suspicion is that there are some funky characters, either control characters or characters with some non-standard encoding, somewhere in this 14 gig file. Moreover, I am concerned that these characters may cause me trouble down the road even if I use a different approach to getting columns out of the file.
>
> This is not an R solution, but here's a Windows utility I wrote to produce a table of frequency counts for all hex characters x00 to xFF in a file.
>
> http://www.efg2.com/Lab/OtherProjects/CharCount.ZIP
>
> Normally, you'll want to scrutinize anything below x20 or above x7F, since ASCII printable characters are in the range x20 to x7E. You can see how many tab (x09) characters are in the file, and whether the line endings are from Linux (x0A) or Windows (paired x0A and x0D).
>
> The ZIP includes Delphi source code, but provides a Windows executable. I made a change several months ago to allow drag-and-drop, so you can just drop the file on the application to have the characters counted. Just run the EXE after unzipping. No installation is needed.
>
> Once you find problem characters in the file, you can read the file as character data and use sub/gsub or other tools to remove or alter problem characters.
>
> efg
>
> Earl F Glynn
> UMKC School of Medicine
> Center for Health Insights
-- J. Andrew Hoerner Director, Sustainable Economics Program Redefining Progress (510) 507-4820 -- View this message in context: http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896p4682257.html
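The same kind of byte-frequency table can be sketched in R by reading the file in binary mode one block at a time, so the whole file never has to fit in memory. This is an illustrative analogue, not a port of the utility described above: the function name byte_counts and the chunk size are assumptions.

```r
# Hypothetical helper: count how often each byte value x00..xFF occurs.
byte_counts <- function(path, chunk_bytes = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  counts <- numeric(256)   # doubles, so very large files cannot overflow
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(bytes) == 0) break
    counts <- counts + tabulate(as.integer(bytes) + 1L, nbins = 256)
  }
  names(counts) <- sprintf("x%02X", 0:255)
  counts
}

# Demonstration on a tiny invented file containing a NUL and a byte xE9
tmp <- tempfile()
writeBin(c(charToRaw("a,b\n1,2\n"), as.raw(c(0x00, 0xE9))), tmp)
counts <- byte_counts(tmp)

# Flag bytes outside printable ASCII, allowing tab, LF and CR
codes <- 0:255
suspicious <- (codes < 0x20 & !(codes %in% c(0x09, 0x0A, 0x0D))) | codes > 0x7E
flagged <- counts[suspicious & counts > 0]
print(flagged)
```

Comparing the counts for x0A and x0D also shows at a glance whether the line endings are consistent.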
[R] How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into RAM. I have been trying to extract individual columns from this file using the colbycol package. This works for certain subsets of the columns, but not for others. I have not yet been able to precisely identify the problem columns, as there are 731 columns and running colbycol on the file on my old slow machine takes about 6 hours.

However, my suspicion is that there are some funky characters, either control characters or characters with some non-standard encoding, somewhere in this 14 gig file. Moreover, I am concerned that these characters may cause me trouble down the road even if I use a different approach to getting columns out of the file.

Is there an R utility that will search through my file, without trying to read it all into memory at one time, and find non-standard characters or misplaced (non-end-of-line) control characters? Or some R code to the same end? Even if the real problem ultimately proves to be different, it would be helpful to eliminate this possibility. And this is also something I would routinely run on files from external sources if I had it.

I am working in a Windows XP environment, in case that makes a difference. Any help anyone could offer would be greatly appreciated.

Sincerely, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896.html
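One way to do this kind of search in plain R is to read the file a block of lines at a time and test each line, byte by byte, against the printable ASCII range. The sketch below is an assumption-laden illustration: the function name find_odd_lines and the chunk size are invented here, and tab and carriage return are treated as acceptable.

```r
# Hypothetical helper: return the line numbers that contain any byte other
# than tab, carriage return, or printable ASCII (x20 to x7E).
find_odd_lines <- function(path, chunk_lines = 50000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  hits <- integer(0)
  offset <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0) break
    # useBytes = TRUE compares raw bytes, so invalid encodings do not error.
    bad <- grepl("[^\\x09\\x0D\\x20-\\x7E]", lines,
                 perl = TRUE, useBytes = TRUE)
    hits <- c(hits, offset + which(bad))
    offset <- offset + length(lines)
  }
  hits
}

# Demonstration: line 2 holds a control byte (x01), line 3 a byte xE9.
tmp <- tempfile()
writeBin(c(charToRaw("plain line\n"),
           as.raw(c(0x62, 0x01, 0x64, 0x0A)),
           charToRaw("caf"), as.raw(c(0xE9, 0x0A)),
           charToRaw("last\n")),
         tmp)
odd <- find_odd_lines(tmp, chunk_lines = 2L)
print(odd)
```

Embedded NUL bytes are not reliably visible through readLines(), so a raw-byte scan with readBin() would be needed to rule those out.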
[R] filehash error in the colbycol method for as.data.frame from a large object
Dear Folks--

I have a 14 gig .csv file with 731 columns. I have read it into a colbycol object (which took overnight, about 16 hours) using the code below, which produced no warnings or error messages. The object, CPS62_12, is 49 gig. After the reading, summary() produced the output below and colnames() successfully returned the names of all 731 columns.

    > CPS62_12 <- cbc.read.table(
    +   "C:\\R_PROJ\\INEQ_TRENDS\\TESTS\\monofile_ALLVARS\\cps_00078.csv",
    +   header = T, sep = ",")
    > summary(CPS62_12)
    Object of class colbycol with 8093281 rows and 731 columns.
    Data for the object is stored at C:\DOCUME~1\ADMINI~2\LOCALS~1\Temp\RtmpMj3LRP\dir1a6d82a1df37.
    > nrow(CPS62_12)
    [1] 8093281
    > ncol(CPS62_12)
    [1] 731
    > colnames(CPS62_12)
    [1] "RECTYPE" "YEAR" "SERIAL" "MISH" etc.

I then ran as.data.frame() (code below) and got the following error and warning message:

    > income_HH_CPS <- as.data.frame(CPS62_12,
    +   c("YEAR", "STATEFIP", "RECTYPE", "SERIAL", "HWTSUPP", "HHINCOME", "NUMPREC"))
    Error in readSingleKey(con, map, key) :
      unable to obtain value for key 'RECTYPE'
    In addition: Warning message:
    In readKeyMap(filecon) : NAs introduced by coercion

I tried the command on a number of column name combinations and subsequently always got the error message without the warning. The error is always on RECTYPE, which is the name of the first column in the csv file.

I am as yet not able to reproduce this error on a smaller object. I copied the first 10 lines of my file into an object by using a connection with readLines. I evaluated the object in the console, pasted the result into Notepad, and saved it. Then I manually sliced off all but the first 15 variables. The resulting file sailed through the code above and produced a data frame faultlessly. This undercut my leading theory, which was that the slash double-quotes (\") that bracketed the column names were causing the problem.

I tried running cbc.get.col on the second variable in the file, YEAR. These two commands:

    > yearCPS <- cbc.get.col(CPS62_12, "YEAR")
    > yearCPS <- cbc.get.col(CPS62_12, 2)

both resulted in the following error message:

    Error in readSingleKey(con, map, key) :
      unable to obtain value for key 'YEAR'

Note that numerical indexing still returned an error on the variable name, YEAR. I got the same result for several other variables, each returning its own name.

I tracked the error message back to the following function in the filehash package:

    readSingleKey <- function(con, map, key) {
        start <- map[[key]]
        if(is.null(start))
            stop(gettextf("unable to obtain value for key '%s'", key))
        seek(con, start, rw = "read")
        unserialize(con)
    }

Now I am at a loss. I see that the element "key" of the list "map" has the value NULL, that any call to as.data.frame uses RECTYPE as the key, and that any call to cbc.get.col() uses the passed variable name as a key, even those that only pass a number. But I don't know much of anything about file hashing, and I have run out of ideas.

Can anyone tell me what I am doing wrong, or whether there is a particular problem with my file that is likely to be causing this problem, or what my next diagnostic step should be? Please be aware that I can only do things I can run on 3 gig of RAM. I am running R under RStudio 0.97.551, on a Windows XP machine with Service Pack 3.

Sincerely, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/filehash-error-in-the-colbycol-method-for-as-data-frame-from-a-large-object-tp4681052.html
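To make the failing step concrete, here is a toy illustration in base R of the lookup that readSingleKey performs. The map below is a hypothetical stand-in with made-up offsets. It is not how filehash actually stores its key map, and lookup_offset is an illustrative name rather than a filehash function.

```r
# Hypothetical key map: each key points at a byte offset in the data file.
map <- list(YEAR = 128, SERIAL = 512)

# Same NULL check as readSingleKey, without the file access.
lookup_offset <- function(map, key) {
  start <- map[[key]]
  if (is.null(start))
    stop(gettextf("unable to obtain value for key '%s'", key))
  start
}

ok <- lookup_offset(map, "YEAR")          # key present: returns its offset
msg <- tryCatch(lookup_offset(map, "RECTYPE"),
                error = conditionMessage) # key absent: same error text as above
print(ok)
print(msg)
```

In this toy version the error appears whenever the key is simply absent from the map. That suggests comparing the keys actually stored on disk with colnames() as a next diagnostic step, though whether this applies to the real object is an assumption.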
[R] Learning the R way – A Wish
There is something that I wish I had that I think would help me a lot to be a better R programmer, that I think would probably help many others as well. I put the wish out there in the hopes that someone might think it was worth doing at some point. I wish I had the code of some substantial, widely used package – lm, say – heavily annotated and explained at roughly the level of R knowledge of someone who has completed an intro statistics course using R and picked up some R along the way. The idea is that you would say what the various blocks of code are doing, why the authors chose to do it this way rather than some other way, point out coding techniques that save time or memory or prevent errors relative to alternatives, and generally, to explain what it does and point out and explain as many of the smarter features as possible. Ideally, this would include a description at least at the conceptual level if not at the code level of the major C functions that the package calls, so that you understand at least what is happening at that level, if not the nitty-gritty details of coding. I imagine this as a piece of annotated code, but maybe it could be a video of someone, or some couple of people, scrolling through the code and talking about it. Or maybe something more like a wiki page, with various people contributing explanations for different lines, sections, and practices. I am learning R on my own from books and the internet, and I think I would learn a lot from a chatty line-by-line description of some substantial block of code by someone who really knows what he or she is doing – perhaps with a little feedback from some people who are new about where they get lost in the description. There are a couple of particular things that I personally would hope to get out of this. 
First, there are lots of instances of good coding practice that I think most people pick up from other programmers, or by having individual bits of code explained to them, that are pretty hard to get from books and help files. I think this might be a good way to get at them.

Second, there are a whole bunch of functions in R that I call meta-programming functions; I don't know if they have a more proper name. These are things that are intended primarily to act on R language objects or to control how R objects are evaluated. They include functions like call, match.call, parse and deparse, deparen, get, envir, substitute, eval, etc. Although I have read the individual documentation for many of these commands, and even used most of them, I don't think I have any fluency with them, or understand well how and when to code with them. I think reading a good-sized hunk of code that uses these functions to do a lot of things that packages often need to do in the best-practice or standard R way, together with comments that describe and explain them, would help a lot with that. (There is a good smaller-scale example of this in Friedrich Leisch's tutorial on creating R packages.)

These are things I think I probably share with many others. I actually have an ulterior motive for suggesting lm in particular that is more peculiar to me, though not unique I am sure. I would like to understand how formulas work well enough to use them in my own functions. I do not think there is any way to get that from the help documentation. I have been working on a piece of code that I suspect is reinventing, but in an awkward and kludgey way, a piece of the functionality of formulas. So far as I have been able to gather, the only place they are really explained in detail is in chapters 2 and 3 of the White Book, "Statistical Models in S". Unfortunately, I do not have ready access to a major research library and I have way, way outspent my book budget. Someday I'll probably buy a copy, but for the time being, I am stuck without it. So it would be great to have a piece of code that uses them explained in detail.

Warmest regards to all, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Learning-the-R-way-A-Wish-tp4660287.html
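As a small illustration of the kind of annotated code wished for here, the sketch below shows one common pattern for accepting a formula in a user-written function, using only base R (match.call, model.frame, model.response, terms). The function group_means is a made-up example. It is not how lm itself is written, only a stripped-down instance of the same idiom.

```r
# Sketch: a function that takes a formula such as y ~ g and returns
# the mean of y within each level of g.
group_means <- function(formula, data) {
  cl <- match.call()                        # the call exactly as the user typed it
  mf <- model.frame(formula, data = data)   # look up the formula's variables in data
  response <- model.response(mf)            # left-hand side of the formula
  groups <- mf[-1L]                         # right-hand side variables
  means <- tapply(response, groups, mean)   # one mean per group
  list(call = cl,
       terms = attr(terms(mf), "term.labels"),
       means = means)
}

fit <- group_means(mpg ~ cyl, data = mtcars)
print(fit$call)    # shows the call as typed, much as lm objects do
print(fit$means)
```

The point of the pattern is that model.frame does the work of finding the variables named in the formula, so the function body never needs to refer to column names directly.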
[R] Sparse dataframes?
Dear Folks--

Is there a data frame analog to sparse matrices? I am working with a panel data set that has a large number of variables that are redefined repeatedly or exist for only a few years (out of 48). In my current structure, where variables are columns and rows are years, more than 90 percent of the cells and more than 3/4 of the total size of my file are NAs.

I am wondering if there is an alternate file specification currently available, other than just using a database, that still allows numeric, character and factor data to be stored. A pointer in the right direction (or a solid no if that is the truth) would be greatly appreciated.

Sincerely, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Sparse-dataframes-tp4655614.html
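One possible workaround, sketched here under the assumption that a "long" layout is acceptable, is to store only the non-missing cells as (id, variable, value) triples, with one long table per storage type so that numeric and character values keep their types. The helper to_long and the toy data are illustrative only. This is not a sparse data frame class.

```r
# Toy wide table: most cells are NA because variables exist only in some years.
wide <- data.frame(year = 2001:2004,
                   inc_old = c(10, 12, NA, NA),
                   inc_new = c(NA, NA, 15, 16),
                   region = c(NA, "W", NA, NA),
                   stringsAsFactors = FALSE)

# Keep only the non-missing cells of the chosen columns, stacked as triples.
to_long <- function(df, id, cols) {
  pieces <- lapply(cols, function(v) {
    keep <- !is.na(df[[v]])
    out <- data.frame(id = df[[id]][keep],
                      variable = rep(v, sum(keep)),
                      value = df[[v]][keep],
                      stringsAsFactors = FALSE)
    names(out)[1] <- id
    out
  })
  do.call(rbind, pieces)
}

num_long <- to_long(wide, "year", c("inc_old", "inc_new"))  # numeric values only
chr_long <- to_long(wide, "year", "region")                 # character values only
print(num_long)
print(chr_long)
```

The saving comes from dropping the NA cells. The cost is that each wide extract has to be rebuilt (for example with reshape() or merge()) when it is needed.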
[R] Using factor variables with overlapping categories
Dear folks --

I have a question, though it is more of a logic or good-practices question than a programming question per se. I am working with data from the American Community Survey summary file. It is mainly categorical count data. Currently I am working with about 40 tables covering about 35 variables, mainly in two-way tables, with some 3-way and a handful of four-way tables. I am going to be doing a lot of analysis on these tables, and hope to make them available in zipped format to other R users. Right now I am keeping this data in single-state data frames, but I will probably have to shift over to a database if I add many more variables.

Here is my problem: of my 35 variables, five of them are different versions of age. Different tables cover different age ranges, and have different levels of disaggregation for the age ranges they cover. Currently I just have a factor for each with the cut-points in the labels. But I feel uncomfortable with this. It seems to throw away a lot of information. There is a "natural" mapping from the different age ranges to one another, at least within universes (e.g. individuals vs. heads of household), and my current approach does not encode that mapping in any way that R can notice (unless I write special functions that read the labels).

One of the first things I am doing with this data is using all the cross-tabs to produce some basic estimates of higher-dimensional tabulations: some 10-way tables covering age, race, sex, rent/own, income, etc. that are consistent with all the lower-dimensional margins, using a multi-dimensional analogue of the RAS balancing (biproportional matrix balancing) algorithm often used to update Leontief input-output tables. Right now the approach I am using is to sum the age variables into four categories that let me use four of my five age variables, and throw the fifth (which has inconsistent breakpoints and is used in only one table) away. But this seems wasteful to me, not only of one table, but of a lot of information on finer age sub-structure which is shared by two or more tables.

I am guessing that this is a fairly common problem in dealing with large data sets of count objects. Is there a "standard" approach to it, or a set of commonly used approaches, that anyone could suggest or point me to? I'd be happy with either coding suggestions or pointers to the methodology literature if there is one.

Any help or suggestions would be greatly appreciated. Thanks! andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Using-factor-variables-with-overlapping-categories-tp4651054.html
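One way to make such a mapping visible to R, offered only as a sketch, is to store the relation between fine and coarse age groups as a named list and apply it with the list form of levels<-, which is documented in ?levels. The age bands, counts and the helper name collapse_age below are invented for illustration.

```r
# Fine age bands as they might appear in one table (made-up labels).
fine_levels <- c("0-4", "5-9", "10-14", "15-17", "18-24", "25-34")
age_fine <- factor(fine_levels, levels = fine_levels)

# The mapping itself: each coarse band lists the fine bands it contains.
age_map <- list("0-17" = c("0-4", "5-9", "10-14", "15-17"),
                "18-34" = c("18-24", "25-34"))

# Recode a fine factor to the coarse bands by merging levels.
collapse_age <- function(f, map) {
  levels(f) <- map
  f
}

counts_fine <- c(5, 7, 6, 4, 9, 12)   # made-up counts per fine band
counts_coarse <- tapply(counts_fine, collapse_age(age_fine, age_map), sum)
print(counts_coarse)
```

Because the mapping lives in an ordinary list, the same object can be reused to aggregate any table that shares the fine bands, and several such lists can describe how each of the five age variables nests inside the others.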
Re: [R] Can you turn a string into a (working) symbol?
Dear Michael --

This is _very_ interesting and I want to play around with the functions you suggest. I had no idea it was so easy to define assignment operators.

However, one question: even after reading the "get" documentation and doing a bunch of mousing around for the expressions "pos" and "the search path", I am not sure what function the numeral 1 in these expressions serves. Why do I want to look in the global environment rather than the current environment? I also cannot find anything that explains what the default "pos = -1" does.

Thanks for responding! andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4651069.html
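A short experiment may help with the pos question. In base R, get() called with its default pos = -1 starts looking in the environment it was called from, while pos = 1 starts at the first entry on the search path, which is the global environment. The variable and function names below are arbitrary.

```r
# Put a value named x in the global environment explicitly.
assign("x", "global value", envir = globalenv())

f <- function() {
  x <- "local value"                 # a different x, local to f
  c(default = get("x"),              # pos = -1: found in f's own environment
    pos1 = get("x", pos = 1))        # pos = 1: found in the global environment
}

res <- f()
print(res)
```

So pos = 1 only matters when the lookup happens inside a function and the global copy is wanted on purpose. At the console the two forms find the same object.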
Re: [R] Can you turn a string into a (working) symbol?
Dear Greg --

You mean FAQ 7.21, not 7.22, correct? Though 7.12 also seems relevant. Though I would say I was asking about turning a string into an expression rather than a variable. At any rate, thanks for the pointer. I am sure I would benefit from rereading the FAQ on a monthly basis, until I actually know most of what is in it.

As to your question about my question, I've wanted to do this exact thing several times in different contexts. However, you are quite correct that I am struggling with this problem in a particular context.

I have a large, multi-dimensional object containing count data. Currently this object is implemented as a 26 dimensional (and growing) array with two to thirteen dimnames per dimension, though I am thinking of switching it to a data frame with dimensions as factors and dimname-equivalent factor levels. I need to take a lot of complicated partitions of this object, mainly, though not always, summing to the entire object. Most of the partitions are subsets of --

-- OK, now I have to digress to address a terminological uncertainty. Think of a 4X4X4 cube. It has three dimensions, and each dimension has four of what? I'm going to call them levels right now, though I don't think that is right; it would be confusing if there were factors in the picture. Also, the dimnames do not name the dimensions, but the things I am calling levels, which is also confusing. --

Anyway, most of the partitions consist of two to four dimensions out of 24, but sometimes with some levels omitted or summed, and occasionally the partitions are much more complicated (to deal with censored data, mainly). I have to use each partition multiple times, doing a very different thing each time (and then repeat the whole set many times).

The next 4 paragraphs describe what I am actually doing with the partitions, but you can skip over them and cut to the chase if you are not so interested.
I am summing over the dimensions in each partition, dividing a table of "forcing totals" for that partition by those sums (element by element), and then taking the resulting ratios and multiplying each of the terms in the original, non-summed object by the corresponding ratio.

This is easiest to understand by analogy to the two-dimensional case. You take a vector of pre-determined row "forcing totals" and divide it, element by element, by the row sums, to get a vector of forcing ratios. Then you multiply each row by the corresponding forcing ratio, so that the row sum will then match the forcing total. Then you do the same thing with the columns. Repeat, alternating rows and columns, to convergence. Each column has a corresponding column forcing total, and each row has a corresponding row forcing total. The elements of the matrix have two partitions that we use, one into rows, and the other into columns. This is sometimes called RAS balancing, or biproportional matrix adjustment. It is an algorithm that is used a lot to update big matrices in national income accounting and input-output analysis.

What I am doing is the same, but I have forcing totals in two to four dimensional tables instead of one-dimensional vectors. Each partition divides the array into groups of elements that I want to sum to my forcing totals. Again, you go around in a circle, doing forcing with each of the (currently 18) tables, to convergence. On count data it should always converge.

The thing is, I need to keep track of all these partitions, and then multiply the forcing ratios by the exact same elements of the array as I previously summed. I got up to five dimensions, coding by hand, and then realized that 1. the amount of work in going from, e.g., 19 dimensions to 20 was going to be very great, and 2. the likelihood that I would get all the nesting and partition-matching right was vanishingly small.
So I am looking for a way to encode the partitions that I use, that would allow me to use the same encoding to represent both the subsets of the array to sum over, crunching the array down to a set of totals corresponding to my forcing totals, and also the subsets of the array that should be multiplied by each forcing ratio. And I thought, maybe I could do it with strings of indexing commands, one per table of forcing totals. But this will only work if I can sum the array over the subdivisions that the partition defines, multiply all the elements in partition subdivisions by the corresponding constants, and then assign the results back to the array, or to a new array. Hence my question.

I'm afraid that this explanation is too long for people to read, but hope springs eternal. I'd be remarkably pleased and eternally grateful if I got a solution to the problem of keeping track of partitions that can be used in the three ways described in the previous paragraph, even if it has nothing to do with executing strings.

Warmest regards, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Can-you-turn-a-string
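For the simple case where each partition is just a set of whole dimensions, the bookkeeping described above can be sketched without executing strings. Each partition is encoded as an integer vector of dimension numbers, and that same vector is passed to apply() for the summing and to sweep() for the multiplying. The function name ipf, its arguments and the toy targets are assumptions for illustration. Partitions with omitted or merged levels would need more than this.

```r
# Sketch of iterative proportional fitting (RAS-style balancing) on an array.
# margins: list of integer vectors naming the dimensions of each partition.
# targets: list of forcing totals, one per partition, shaped like
#          apply(x, margins[[k]], sum).
ipf <- function(x, targets, margins, tol = 1e-8, max_iter = 1000) {
  for (iter in seq_len(max_iter)) {
    max_dev <- 0
    for (k in seq_along(margins)) {
      dims <- margins[[k]]
      current <- apply(x, dims, sum)                  # sum over the partition
      max_dev <- max(max_dev, abs(targets[[k]] - current))
      ratio <- ifelse(current > 0, targets[[k]] / current, 0)
      x <- sweep(x, dims, ratio, "*")                 # scale the same elements
    }
    if (max_dev < tol) break                          # all margins already match
  }
  x
}

# Two-dimensional case: row and column forcing totals.
m <- ipf(matrix(1, 2, 3),
         targets = list(c(30, 70), c(20, 30, 50)),
         margins = list(1, 2))

# Three-dimensional case: a two-way table over dims 1 and 2, a vector over dim 3.
t12 <- matrix(c(10, 20, 30, 40), 2, 2)
a <- ipf(array(1, c(2, 2, 2)),
         targets = list(t12, c(25, 75)),
         margins = list(c(1, 2), 3))
print(m)
```

The design choice is that one object, the margins list, drives both the summing step and the scaling step, so the two cannot drift out of step as dimensions are added.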
[R] Making part of a data frame into a time series
Dear folks --

I have a bunch of data frames where columns 1:(n-1) contain information about a county, and columns n and higher contain a time series of monthly observations on that county. I wanted to get the data in columns n and higher to be recognized as a bunch of time series. So I wrote a function that was supposed to turn all the columns from a given column number on into a time series:

    # Convert the final cols of a data frame into a time series
    MakeTS <- function(data.df, firstColNo, firstYear, firstSubNo = NULL, freq = 1){
      data.df[,firstColNo:ncol(data.df)] <- ts(data = data.df[,firstColNo:ncol(data.df)],
        start = c(firstYear, firstSubNo), frequency = freq)
      data.df
    }

However it does not appear to work. The is.ts function will not let me test a subset of the data frame:

    # Simplified example for check.
    > AA <- data.frame(rbind(c("X", 1:12), c("Y", 1:12)))
    > AA
      X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
    1  X  1  2  3  4  5  6  7  8   9  10  11  12
    2  Y  1  2  3  4  5  6  7  8   9  10  11  12
    > BB <- MakeTS(AA, 2, 2010, 1, 12)
    > is.ts(window(BB[,2], start = c(2010, 1), [1,2:13])
    Error: unexpected '[' in "is.ts(window(BB[,2], start = c(2010, 1), ["

In addition, and to my great confusion, the values in columns 3 and higher have all been replaced by ones:

    > BB
      X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
    1  X  1  1  1  1  1  1  1  1   1   1   1   1
    2  Y  1  1  1  1  1  1  1  1   1   1   1   1

I am guessing that you are not allowed to define part of a data frame as a time series, and that I will just have to give up on that idea. Is that right? And why is everything a one? Is ts using the default frequency instead of the one I handed to it? And if so, why?

Offers of help or insight greatly appreciated.

Sincerely, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Making-part-of-a-data-frame-into-a-time-series-tp4650392.html
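A possible reading of the all-ones result, offered as an assumption about this example rather than a confirmed diagnosis: rbind(c("X", 1:12), ...) produces a character matrix, so every column of AA is text, and ts() converts a data frame with data.matrix(), which turns text columns into factor codes. Each column here holds a single distinct value, so every code is 1. The sketch below shows that effect, then one alternative layout that keeps the county labels apart from a numeric monthly series matrix, transposed so that time runs down the rows.

```r
# Text columns become factor codes under data.matrix(): "5","5" -> 1, 1.
codes <- data.matrix(data.frame(a = c("5", "5")))

# Alternative layout: labels in their own column, numbers kept numeric.
AA <- data.frame(county = c("X", "Y"), rbind(1:12, 1:12),
                 stringsAsFactors = FALSE)

# One column per county, one row per month, starting January 2010.
monthly <- ts(t(as.matrix(AA[, -1])), start = c(2010, 1), frequency = 12)
colnames(monthly) <- as.character(AA$county)

is.ts(monthly)
spring <- window(monthly, start = c(2010, 3), end = c(2010, 5))
print(spring)
```

Keeping the series in a separate ts matrix avoids storing ts attributes inside data frame columns, and window() then works on the whole set of counties at once.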
Re: [R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Dear Anthony --

On closer examination, what I am talking about is not factor levels, but something different (but analogous). The data that is categorical all has integer codes, so the file is entirely numeric. The SAS proc format then gives text strings for each code for each categorical variable. Like this:

    value REGION_f
      11 = "New England Division"
      12 = "Middle Atlantic Division"
      21 = "East North Central Division"
      22 = "West North Central Division"
      31 = "South Atlantic Division"
      32 = "East South Central Division"
      33 = "West South Central Division"
      41 = "Mountain Division"
      42 = "Pacific Division"
      97 = "State not identified"

So it would make sense to have a lookup table of these codes linked to the variables. I'm not sure if it makes more sense to have that table live in R or in the database. For R purposes, I imagine it would make sense to convert these integer-valued variables into factors.

What I do not understand is how SAS knows where the variables begin and end. I managed to break off a little hunk of the beginning of my file and look at it in an editor, and it is numbers without any obvious delimiters. Is the delimiter a particular numeric string? I thought the SAS command file would contain the starting location for each of the fixed-length fields, but I do not see anything in the file that could be interpreted that way, just a little wraparound code and then a long list of variable names followed by triplets of a code, an equals sign, and a text string, terminating with a semicolon.

I'm sorry if I am being obtuse. When I said before that I had saved the SAS files as flat files, what I really meant was that I had an intern do it. When I was doing my own analysis, I mainly used TSP, before I switched to R about a year ago. I've never used SAS.

I find your data project very interesting. Very. It is not actually necessary to wait for BLS to release the older CEX files, if you can lay your hands on the CDs. I spoke to the BLS data products office about 2 years ago, and they have no problem with people republishing purchased data in any format they like, including simple duplication. In fact, they seemed to like the idea. I think the sale of data was forced on them by some kind of mandate from above.

I'll be playing with your code (which is a model of readability, and a lesson to me on same, BTW) and keep you posted on my progress.

Warmly, Andrew

-- View this message in context: http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353p4649541.html
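The lookup-table idea can be sketched in base R as follows. The code/label pairs are the REGION values quoted above, while the sample vector of codes and the object names are made up for illustration.

```r
# Lookup table transcribed from the REGION_f format shown above.
region_codes <- data.frame(
  code = c(11, 12, 21, 22, 31, 32, 33, 41, 42, 97),
  label = c("New England Division", "Middle Atlantic Division",
            "East North Central Division", "West North Central Division",
            "South Atlantic Division", "East South Central Division",
            "West South Central Division", "Mountain Division",
            "Pacific Division", "State not identified"),
  stringsAsFactors = FALSE)

# Made-up sample of integer codes as they would appear in the data file.
REGION <- c(11, 42, 97, 21, 11)

# Attach the labels: levels are the stored codes, labels are the text strings.
REGION_f <- factor(REGION, levels = region_codes$code,
                   labels = region_codes$label)
print(table(REGION_f))
```

Keeping region_codes as its own small table means the same pairs can be stored in a database alongside the integer-coded data and joined on demand.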
Re: [R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Wow! After reading Jan's post, I said "Great, I'll do that," because it was the closest to what I originally had in mind. Then I read Ista's post, and said "I think I'll try that first," because it got me back on the track of following directions in the R Data Import/Export manual. Then I read Anthony's post. Now, I am not so thrilled to go the database route, because frankly I have hardly ever used them before, and this would make an already complex project take longer. But I know that I will need to use the sample survey package for what I am trying to do. So I think I am going to try to get the data into SQLite format, and just hope the effort builds character. Anthony, I have not used your packages yet, but they look great!

It will probably be more than a week before I get all this worked out and implemented. Given how much work this will be, I do not want to do it twice, so I think I will go back to IPUMS and get the rest of the variables, and break the file up into smaller chunks at the same time, both so I really have the whole thing, and also so that it is easier to work with. The IPUMS version of the file is rectangular (it duplicates the household data in each individual), and IPUMS has done a lot of valuable work in cleaning the data and harmonizing variable names and definitions that have changed over the history of the CPS. (Annoyingly, however, they have not connected the cross-sections between years. All the CPS samples consist of two sets of four consecutive months, eight months apart, so the March Supplement always consists half of people who were interviewed in the last year and half of people who will be interviewed in the next year (barring turnover).)

Anyway, when I have figured out my route to import I will report back here. In the meantime, I have three more questions that one of you may be able to answer:

1. Anthony, does the read.SAScii.sqlite function preserve the label names for factors in a data frame it imports into SQLite, when those labels are coded in the command file?

2. If I want to make the resulting SQLite database available to the R community, is there a good place for me to put it? Assume it is 10-20 gigs in size. Ideally, it would be set up so that it could be queried remotely and extracts downloaded. Setting this up is beyond my competence today, but maybe not in a couple of months. (I'd like to do the same thing with the 30 years of Consumer Expenditure Survey data I have. I don't have access to SAS any more, but I converted it all to flat files while I still did. Currently the BLS only makes 2011 microdata available free. Earlier years on CD are $200/year. But they have told me that they have no objection to my making them available.)

3. I have not yet been able to determine whether CPS micro data from the period 1940-1961 exists. Does anyone know? It is not on http://thedataweb.rm.census.gov/ftp/cps_ftp.html, and IPUMS and NBER (http://www.nber.org/data/current-population-survey-data.html) both only give data back to 1962. I wrote to Census a week ago, but I have not heard back from them, and in the past they have not been very helpful about historical micro data.

Thanks to all! Andrew

-- View this message in context: http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353p4649466.html
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Dear folks --

I have a large (26 gig) ASCII flat file in fixed-width format with about 10 million observations of roughly 400 variables. (It is 51 years of Current Population Survey micro data from IPUMS, roughly half the fields for each record.) The file was produced by an automatic process in response to a data request of mine.

The file is not accompanied by a human-readable file giving the field names and starting positions for each field. Instead it comes with three command files that describe the file, one each for SAS, SPSS, and Stata. I do not have ready access to any of these programs. I understand that these files also include the equivalent of the levels attribute for the coded data. I might be able to hand-extract the information I need from the command files, but this would involve days of tedious work that I am hoping to avoid.

I have read through the R Data Import/Export manual and the foreign package documentation and I do not see anything that would allow me to extract the necessary information from these command files. Does anyone know of any R package or other non-proprietary tools that would allow me to get this data set from its current form into any of the following formats:

    SAS, SPSS or Stata binary files read by R.
    A MySQL database.
    An ffdf object readable using the ff package.

My ultimate goal is to get the data into an ffdf object so that I can manipulate it in R, perhaps by way of a database. In allocation I will probably be using no more than 20 variables at a time, probably a bit under a gig. I am working on a machine with three gig of RAM.

(I have seen some suggestions that data.table also provides a memory-efficient way of providing database-like functions, but I am unsure whether it would let me cope with an object of this size.)

Any help or suggestions anyone could offer would be very much appreciated.

Warmest regards, andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353.html
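Once field names and column positions are known, a fixed-width file can be read in chunks with base R alone. The sketch below assumes a hypothetical three-field layout. The field names, positions and sample records are invented, and the helper names parse_fixed and read_fixed_chunks are illustrative. Extracting the real positions from the SAS, SPSS or Stata command file is the part this sketch does not cover.

```r
# Hypothetical layout: field name plus first and last character position.
layout <- data.frame(name = c("YEAR", "SERIAL", "AGE"),
                     start = c(1, 5, 10),
                     end = c(4, 9, 12),
                     stringsAsFactors = FALSE)

# Cut each requested field out of a batch of lines and convert to numeric.
parse_fixed <- function(lines, layout) {
  cols <- lapply(seq_len(nrow(layout)), function(i)
    as.numeric(substring(lines, layout$start[i], layout$end[i])))
  names(cols) <- layout$name
  as.data.frame(cols)
}

# Read from an open connection a chunk at a time, keeping only the layout fields.
read_fixed_chunks <- function(con, layout, chunk_lines = 100000L) {
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    pieces[[length(pieces) + 1L]] <- parse_fixed(lines, layout)
  }
  do.call(rbind, pieces)
}

# Demonstration on three made-up records instead of a real file.
con <- textConnection(c("201000001 35", "201100002  7", "201200003 61"))
extract <- read_fixed_chunks(con, layout, chunk_lines = 2L)
close(con)
print(extract)
```

Because only the fields listed in layout are kept, a 20-variable extract stays small even when the source file is far larger than memory. Base R's read.fwf() offers a similar facility driven by field widths.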
[R] Can you turn a string into a (working) symbol?
Dear folks--

Suppose I have an expression that evaluates to a string, and that that string, were it not a character vector, would be a symbol. I would like a function, call it doppel(), that will take that expression as an argument and produce something that functions exactly like the symbol would have if I typed it in the place of the function of the expression. It should go as far along the path to evaluation as the symbol would have, and then stop, and be available for subsequent manipulation. For example, if

    aa <- 3.1416
    bb <- function(x) {x^2}
    r <- 2
    xx <- c("aa", "bb")
    out <- doppel(xx[1])*doppel(xx[2])(r)

Then out should be 12.5664.

Or similarly, after

    doppel(paste("a", "a", sep='')) <- 3

typing aa should return 3.

Is there such a function? Can there be? I thought as.symbol would do this, but it does not.

    > as.symbol(xx[1])*as.symbol(xx[2])(r)
    Error: attempt to apply non-function

Looking forward to hearing from y'all.--andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343.html
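For the read side of this question, base R's get() already behaves much like the hoped-for doppel(): it looks up the object whose name is held in a string. The sketch below replays the example from the post with get() and uses assign() for the simple whole-object assignment. It does not cover assignment into part of an object.

```r
aa <- 3.1416
bb <- function(x) x^2
r <- 2
xx <- c("aa", "bb")

# get() returns the object named by the string, so the result can be
# multiplied or called exactly as if the name had been typed.
out <- get(xx[1]) * get(xx[2])(r)   # 3.1416 * 2^2
print(out)

# assign() creates or overwrites the object whose name is built at run time.
assign(paste("a", "a", sep = ""), 3)
print(aa)
```

as.symbol() fails in the original example because it only builds the name as a language object. Nothing evaluates it, so there is no function to apply.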
Re: [R] Can you turn a string into a (working) symbol?
Ah! Excellent! That will be most useful. And sorry about the typo.

I found another function in a different discussion that also seems to work, at least in most cases I have tried. I do not at all understand the difference between the two.

    doppel <- function(x) {eval(parse(text=x))}

However, neither one seems to work on the left hand side of a <-, a <<-, or an =.

Again, my thanks.--andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648365.html
Re: [R] Can you turn a string into a (working) symbol?
Yes, the assign command goes a little way toward what I was hoping for. But it requires a different syntax, and it does not in general let you use quoted expressions that you could use with other assignment operators. For instance,

    > DD <- 1:3
    > assign("DD[2]", 5)
    > DD
    [1] 1 2 3

So I am still looking for a function that produces an output that is fully equivalent to the string without quotation marks. Or for a definite statement that no such function can exist.

Thanks so much for your attention to this problem. andrewH

-- View this message in context: http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648366.html
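The behaviour shown above follows from assign() treating its first argument as a literal name: it creates a separate object called DD[2] and leaves DD alone. One way to get the effect of typing the string as code, sketched here in a scratch environment so nothing in the workspace is touched, is to parse and evaluate the whole assignment. The environment name env is arbitrary.

```r
env <- new.env()
assign("DD", 1:3, envir = env)

# assign() makes a new object literally named "DD[2]"; DD itself is unchanged.
assign("DD[2]", 5, envir = env)
odd_name_exists <- exists("DD[2]", envir = env, inherits = FALSE)
unchanged <- get("DD", envir = env)

# Parsing the complete statement treats the string as code, so the
# second element of DD really is replaced.
target <- "DD[2]"
eval(parse(text = paste(target, "<- 5")), envir = env)
changed <- get("DD", envir = env)
print(changed)
```

This works because the string becomes the left-hand side of a full assignment expression before evaluation, which a function call on the left of <- cannot do by itself.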
Re: [R] replacing ugly for loops
Dear Bert--
I tried your function on the data that I provided (data.df) and it worked beautifully (after I added a missing final parenthesis), producing exactly the same output as my function. This is an excellent example of what I was looking for, because it is (a) 50% shorter than mine, (b) fully vectorized, and (c) uses three functions that I have never used before: with, unique, and do.call. I am going to spend a happy afternoon working through this command by command, and at the end I am confident that I will have learned some valuable new (to me) tricks. Thanks!
Warmest Regards, AndrewH
-- View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821p4645914.html
[R] replacing ugly for loops
I have a couple of hundred American Community Survey Summary File files containing rectangular arrays of data, mainly though not exclusively numeric. Each file is referred to as a sequence (henceforth seq). From these files I am trying to extract particular subsets (tables), each consisting of a set of columns. These tables are defined by three numbers (now in columns in a data frame):
1. a file identifier (seq)
2. the first column position number (startNo)
3. the length of the table (len)
so the columns to select for one triple would consist of startNo:(startNo+len-1). I am trying to create for each sequence a vector of all the column numbers for tables in that sequence. Obviously I could do this with nested for loops, e.g.

    seq <- c(1,1,2,2)
    startNo <- c(3, 10, 3, 15)
    len <- c(4, 2, 5, 3)
    data.df <- data.frame(seq, startNo, len)
    seq.f <- factor(data.df$seq)
    data.l <- split(data.df, seq.f)
    selectColsList <- vector("list", length(levels(seq.f)))
    for (i in seq_along(levels(seq.f))){
      selectCols <- numeric()
      for (j in seq_along(data.l[[i]]$startNo)){
        selectCols <- c(selectCols, data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] + data.l[[i]]$len[j]-1))
      }
      selectColsList[[i]] <- selectCols
    }
    selectColsList
    [[1]]
    [1]  3  4  5  6 10 11

    [[2]]
    [1]  3  4  5  6  7 15 16 17

But this code strikes me as inelegant and verbose. It seems to me that there ought to be a way to make the outer loop (indexed with i) into a tapply function (which is why I started with a split()), and the inner loop (indexed with j) into some cute recursive function, but I was not able to do so. If anyone could suggest some nicer (e.g. shorter, or faster, or just more sophisticated) way to do this instead, I would be most grateful.
Sincerely, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
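One possible loop-free version of the code above, offered as a sketch and not as the solution that was posted in reply. It builds each table's column range with Map() and then glues the ranges together per sequence with split(). The name colRanges is introduced for the illustration.

```r
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# One integer range per table: startNo:(startNo + len - 1)
colRanges <- Map(function(start, n) start:(start + n - 1),
                 data.df$startNo, data.df$len)

# Group the ranges by sequence and flatten each group into one vector
selectColsList <- lapply(split(colRanges, data.df$seq), unlist)
print(selectColsList)
```

Unlike the loop version, the result is named by sequence ("1", "2"), which can be convenient when looking up the columns for a given file.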
[R] Importing a complex XML file (SDMX format)
Hi folks!
I am trying to read a large XML file from the Fed that contains quarterly Flow of Funds data since the 1950s. It contains lots of individual tables in something called the Statistical Data and Metadata eXchange format (SDMX format). I am not sure if I need something specific to the SDMX format to read the file, or just to use the XML package correctly. The XML package includes over 70 documented functions and frankly I have not been able to figure where to start. This is the first time I have ever needed to open up an XML file of any kind, so I am starting from scratch. I would be very grateful for advice on either reading an arbitrary but complex XML file or from anyone who has succeeded in opening an XML file in SDMX format.
Warmest regards, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Importing-a-complex-XML-file-SDMX-format-tp4188411p4188411.html
[R] Consistant test for NAs in a factor when exclude = NULL?
Dear folks--
Is there a function to correctly find (and count) the NAs in a factor when exclude=NULL, regardless of whether their origin is in the original data or by subsequent assignment?

In example number 1 below, where NAs are assigned by is.na()<-, testing the factor with is.na() finds the correct number of NAs. In example number 2, where the NAs are from the data, neither is.na(), ==NA, nor =="NA" correctly identifies the NAs. In example number 3, which mixes NAs from assignment with NAs from data, is.na does not even find the NAs created by assignment, as it did in example 1.

I'm running R 2.13.2 on Windows XP with Service Pack 3. Any assistance would be greatly appreciated.
Appreciatively, andrewH

Example #1
# Origin: is.na()<-   Exclude: NULL

    > KK <- factor(c("A","A","B","B","C","C"), exclude=NULL)
    > KK[KK=="C"]
    [1] C C
    Levels: A B C
    > is.na(KK[KK=="C"]) <- TRUE
    > KK
    [1] A    A    B    B    <NA> <NA>
    Levels: A B C
    > levels(KK)
    [1] "A" "B" "C"
    > levels(KK)[KK]
    [1] "A" "A" "B" "B" NA  NA
    > KK==NA
    [1] NA NA NA NA NA NA
    > sum(KK==NA)
    [1] NA
    > KK=="NA"
    [1] FALSE FALSE FALSE FALSE    NA    NA
    > sum(KK=="NA")
    [1] NA
    > is.na(KK)
    [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
    > sum(is.na(KK))
    [1] 2

Example #2
# Origin: data   Exclude: NULL

    > GG <- factor(c("A","A","B","B", NA, NA), exclude=NULL)
    > GG
    [1] A    A    B    B    <NA> <NA>
    Levels: A B <NA>
    > levels(GG)
    [1] "A" "B" NA
    > levels(GG)[GG]
    [1] "A" "A" "B" "B" NA  NA
    > GG==NA
    [1] NA NA NA NA NA NA
    > sum(GG==NA)
    [1] NA
    > GG=="NA"
    [1] FALSE FALSE FALSE FALSE FALSE FALSE
    > sum(GG=="NA")
    [1] 0
    > is.na(GG)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE
    > sum(is.na(GG))

Example #3

    > MM <- factor(c("A","A","B","B","C","C", NA), exclude=NULL)
    > is.na(MM[MM=="C"]) <- TRUE
    > MM
    [1] A    A    B    B    <NA> <NA> <NA>
    Levels: A B C <NA>
    > levels(MM)
    [1] "A" "B" "C" NA
    > levels(MM)[MM]
    [1] "A" "A" "B" "B" NA  NA  NA
    > MM==NA
    [1] NA NA NA NA NA NA NA
    > sum(MM==NA)
    [1] NA
    > MM=="NA"
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > sum(MM=="NA")
    [1] 0
    > is.na(MM)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > sum(is.na(MM))
    [1] 0

-- View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3942755.html
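The examples above suggest two different kinds of missingness: an NA in the factor's integer codes (which is.na() sees) and an NA that is itself a level (which is.na() does not see). A hedged sketch of a test that covers both is to convert to character first. countNA is a made-up helper name, and the NA codes here are created with plain subassignment rather than with is.na()<-.

```r
# Hypothetical helper: counts NAs whether they live in the codes or in the levels.
countNA <- function(f) sum(is.na(as.character(f)))

# NA stored in the codes: no NA level exists, the entries themselves are missing
KK <- factor(c("A", "A", "B", "B", "C", "C"), exclude = NULL)
KK[KK == "C"] <- NA

# NA stored as a level: every code is valid, one level is NA
GG <- factor(c("A", "A", "B", "B", NA, NA), exclude = NULL)

cat("is.na() counts:", sum(is.na(KK)), sum(is.na(GG)), "\n")
cat("countNA counts:", countNA(KK), countNA(GG), "\n")
```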
Re: [R] Consistant test for NAs in a factor when exclude = NULL?
Thanks Jeff! I appreciate you sharing your experience. My data set is survey data, 13,209 records over nine years, collected by someone else, converted from SPSS format. It includes missing values, identified however SPSS does so, and translated to NAs by the import process. It also includes values along the lines of "none of your business" or "beats me" that are missing so far as I am concerned. I have assigned NAs to these values.

Now I am trying to figure out some things about where these missing values are -- whether they are disproportionately located in any period or group. I have been trying to get counts for subsets, but I have not been able to make the subset counts add up to the total counts that I get from, e.g., summary. So I wrote these simplified versions, and even for the simplest examples, I could not find a function that correctly identified the NAs that I knew were there because I put them there myself. That is why I am looking for help. Does this make sense?
Warmest regards, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3943157.html
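A toy sketch of the kind of per-group NA count described above, where the subgroup counts should add up to the overall count. The data frame and its column names are invented for the illustration and have nothing to do with the real survey; the answer column is kept as character so that is.na() sees every missing value.

```r
# Made-up miniature survey: 'year' is the grouping variable.
survey <- data.frame(year   = c(2001, 2001, 2002, 2002, 2002),
                     answer = c("yes", NA, NA, "no", NA),
                     stringsAsFactors = FALSE)

# Number of missing answers within each year
naByYear <- tapply(is.na(survey$answer), survey$year, sum)
print(naByYear)

# The group counts and the overall count should agree
cat("total:", sum(naByYear), "overall:", sum(is.na(survey$answer)), "\n")
```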
[R] re coercing data frame rows to character: Am I right that this is a bug?
Dear Folks--
All this seems to me to behave the way you expect, recognising that column b is a factor:

    > AA <- data.frame(a=3:4, b=c('x', 'y'))
    > AA[1,]
      a b
    1 3 x
    > as.numeric(AA[1,])
    [1] 3 1
    > AA[,2]
    [1] x y
    Levels: x y
    > as.numeric(AA[,2])
    [1] 1 2
    > as.character(AA[,2])
    [1] "x" "y"

But this seems to me to be wrong:

    > as.character(AA[1,])
    [1] "3" "1"

Shouldn't it be:

    [1] "3" "x"

to be consistent with the normal pattern of coercing factors to character values? If it is a bug, is this the right place to post it?
sincerely, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/re-coercing-data-frame-rows-to-character-Am-I-right-that-this-is-a-bug-tp3924449p3924449.html
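A sketch of one way to get the expected "3" "x" from a row. As far as I understand it, a data frame row is still a list of columns, and as.character() applied to that list as a whole does not use the factor method on each element. Converting column by column does. stringsAsFactors = TRUE is set explicitly here because newer R versions no longer create factors by default.

```r
AA <- data.frame(a = 3:4, b = c("x", "y"), stringsAsFactors = TRUE)

# Apply as.character() to each column of the one-row data frame separately,
# so the factor column is converted through its labels.
rowAsChar <- vapply(AA[1, ], as.character, character(1))
print(rowAsChar)
```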
Re: [R] reporting multiple objects out of a function
Thanks, Gabor! When a beginner (like myself) asks a question, it seems that the thing that we believe we are confused about, or want to learn, may not be the thing that would actually help us the most if it were clearly understood. Your response is what I consider ideal: answer my question, then tell me the answer to the question that I would ask if I were smart enough.

I had dismissed the idea of env<- on grounds that I did not know what I might be overwriting. The env$x <- trick is a very nice one that I had not considered because I have so little understanding of what an environment is. It is described in the environment documentation as a collection of named objects, and a pointer to an enclosing environment, which is fairly opaque by the standard of R documentation. R has lots of different kinds of boxes in which to collect objects. If you have a favorite introduction to how R environments work and/or best practice in programming with them, I'd be pleased to read it.

Again, many thanks. And on to classes.
Warmly, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3881118.html
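The documentation phrase quoted above ("a collection of named objects, and a pointer to an enclosing environment") can be made a little more concrete with a small sketch. The environment names enclosing and child are invented for the illustration.

```r
enclosing <- new.env()
child     <- new.env(parent = enclosing)   # child points at 'enclosing'

assign("x", 1, envir = enclosing)

# Lookup from 'child' follows the pointer to the enclosing environment...
foundByLookup <- exists("x", envir = child)
# ...but 'child' does not hold x itself.
foundLocally  <- exists("x", envir = child, inherits = FALSE)

# The env$x <- form creates x in 'child' only; the enclosing copy is untouched.
child$x <- 2
cat("child x:", get("x", envir = child),
    " enclosing x:", get("x", envir = enclosing), "\n")
```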
[R] Unexpected behavior of extract (`[`) or sapply functions
Dear folks--
The function below is a snippet of a larger function that is not doing what it is supposed to do, and I do not understand its behavior. The larger function is supposed to produce an array containing the results of a user-specified function applied to groups of data defined by the intersection of one or more factors, and return them in an array with a dimension for each factor and a dimension level for each factor level.

This snippet is supposed to take a data frame, a vector of column numbers containing factors, and a column number for the data, and return (in the test function below, just print) a list of character vectors of the level names (one vector per dimension) and the length of those vectors. It works fine so long as I give it more than one factor column, but if I give it a vector of factor columns of length 1, it behaves differently, and when I try to assign the names from test.levels to the dimnames of the array, I end up with an error message:

    Error in dimnames(data) <- dimnames :
      length of 'dimnames' [1] not equal to array extent

The example below shows the function output for a test data frame (“test.df”) when run first on a vector of two column numbers for factors and then on just one. You can see how the structure of the output shifts. I cannot understand what is happening. What I want it to do when given just factor.cols = c(1) is to give me back exactly what it gives me back for factor column 1 in factor.cols = c(1,2). Any help or suggestions would be greatly appreciated.
Sincerely, andrewH

    # Test Data
    test.df <- data.frame(AA=rep(LETTERS[1:2], c(6,6)), BB=rep(LETTERS[3:5], c(4,4,4)),
                          CC=rep(LETTERS[6:9], c(3,3,3,3)), DD=c(1:12))

    # The function
    getLevels <- function(data.df, factor.cols, data.col){
      test.levels <- sapply(data.df[,factor.cols, drop=F], levels)
      cat("test.levels:\n"); print(test.levels)
      no.levels <- sapply(sapply(data.df[,factor.cols, drop=F], levels), length)
      cat("no.levels:\n"); print(no.levels)
    }

    # Run it with two factors and again with 1, Output below
    cat("\nTest 2 factors:\n")
    getLevels(test.df, c(1,2), 4)
    cat("\nTest 1 factor:\n")
    getLevels(test.df, c(1), 4)

    Test 2 factors:
    test.levels=
    $AA
    [1] "A" "B"

    $BB
    [1] "C" "D" "E"

    no.levels=AA BB
     2  3

    Test 1 factor:
    test.levels=
         AA
    [1,] "A"
    [2,] "B"
    no.levels=A B
    1 1

-- View this message in context: http://r.789695.n4.nabble.com/Unexpected-behavior-of-extract-or-sapply-functions-tp3881176p3881176.html
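The shift in structure shown above is consistent with sapply() simplifying its result: when every element has the same length (always true with a single factor) it returns a matrix instead of a list. A minimal sketch that avoids this by using lapply(), which never simplifies. The unused data.col argument is dropped and the return value is a list, both choices made for the illustration only.

```r
test.df <- data.frame(AA = rep(LETTERS[1:2], c(6, 6)),
                      BB = rep(LETTERS[3:5], c(4, 4, 4)),
                      DD = 1:12,
                      stringsAsFactors = TRUE)

getLevels <- function(data.df, factor.cols) {
  # lapply() always returns a list, one element per factor column
  lev <- lapply(data.df[, factor.cols, drop = FALSE], levels)
  list(levels = lev, no.levels = vapply(lev, length, integer(1)))
}

one <- getLevels(test.df, 1)
two <- getLevels(test.df, c(1, 2))
str(one)
str(two)
```

sapply(..., simplify = FALSE) would be another way to keep the list structure.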
Re: [R] reporting multiple objects out of a function
Thanks for the response, Paul! But I thought these dumped the variables into the global environment. Is that not correct? I want to make them available in the calling environment, without making them available in the global environment, unless that is where the function is called. This is my bow to the fact that what I want this function to do is not good programming practice in general. The whole purpose of this function is to save me time, typing and wear on my limited short-term memory capacity, by having standard objects with standard names quickly available.

I wonder if eval.parent would do the job. Like:

    fun1 <- function(x, y, z) eval.parent({obj1 <- x; obj2 <- y; obj3 <- z})

Or does that just use the parent environment for the inputs, not the output? Part of my problem is that I am not sure how to tell if I have succeeded. Otherwise I would just test it myself.
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3875586.html
Re: [R] reporting multiple objects out of a function
Thanks, Sina! This is very helpful and informative, but still not quite what I want.

So, here is the thing: When a function returns an object, that object is available in the calling environment. If it is returned inside a function, it is available in the function, but not outside of the function. What I want to do is simply to return more than one object in the usual sense in which functions return objects. Here is a test to see if a function fun does this, at least to the depth of 1.

    obj1 <- 1
    obj2 <- 2
    cat("obj1 in global=", obj1)
    cat("obj2 in global=", obj2)
    wrapFun <- function(fun) {
      obj1 <- 3
      obj2 <- 4
      cat("obj1 in calling=", obj1)
      cat("obj2 in calling=", obj2)
      fun()
      cat("obj1 in calling=", obj1)
      cat("obj2 in calling=", obj2)
    }
    wrapFun(fun)
    cat("obj1 in global=", obj1)
    cat("obj2 in global=", obj2)

Suppose the function fun assigns the values 5 and 6 to obj1 and obj2. If the function does what I want, this code should print:

    obj1 in global= 1
    obj2 in global= 2
    obj1 in calling= 3
    obj2 in calling= 4
    obj1 in calling= 5
    obj2 in calling= 6
    obj1 in global= 1
    obj2 in global= 2

I turned Paul’s and Sina’s code into functions as follows:

    paulFun <- function() {
      obj1 <<- 5; obj2 <<- 6;
    }
    sinaFun <- function() {
      attach(what = NULL, name = "my_env")
      assign("obj1", 5, envir = as.environment("my_env"))
      assign("obj2", 6, envir = as.environment("my_env"))
    }

Running these two functions in the code above yields:

paulFun:

    obj1 in global= 1
    obj2 in global= 2
    obj1 in calling= 3
    obj2 in calling= 4
    obj1 in calling= 3
    obj2 in calling= 4
    obj1 in global= 5
    obj2 in global= 6

So paulFun puts the objects in the global environment but not in the calling environment. Let’s try sinaFun:

sinaFun:

    obj1 in global= 1
    obj2 in global= 2
    obj1 in calling= 3
    obj2 in calling= 4
    obj1 in calling= 3
    obj2 in calling= 4
    obj1 in global= 1
    obj2 in global= 2

sinaFun puts the objects in the new environment it defines, but they are available in neither the calling nor the global environment.

However, I was immediately convinced that Sina had given me the tool I was missing: the assign function. (Thanks, Sina!) But I was wrong (or used it wrong), and now I am even more deeply confused. Here is a function that I thought would do what I want:

    andrewFun <- function() {
      assign("obj1", 5, pos = sys.parent(n = 1))
      assign("obj2", 6, pos = sys.parent(n = 1))
      NULL
    }

However, when I tried it, my results were the same as paulFun: assigned in the global environment, but not in the calling environment. Setting n = 0 seemed to limit the assignment to the interior of andrewFun: none of the printed obj values were affected.

Help?
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3876201.html
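A sketch of a variant that appears to pass the test described above, using parent.frame() to get the calling environment itself and passing it to assign() through envir. My understanding, offered with some caution, is that a numeric pos is read as a position on the search path (where 1 is the global environment) rather than as a frame number, which would explain why pos = sys.parent() wrote to the global environment. The function name setInCaller is made up for the illustration.

```r
# Hypothetical replacement for andrewFun
setInCaller <- function() {
  caller <- parent.frame()            # environment of whoever called us
  assign("obj1", 5, envir = caller)
  assign("obj2", 6, envir = caller)
  invisible(NULL)
}

wrapFun <- function(fun) {
  obj1 <- 3
  obj2 <- 4
  fun()
  c(obj1 = obj1, obj2 = obj2)         # values as seen in the calling function
}

obj1 <- 1
obj2 <- 2
inCalling <- wrapFun(setInCaller)
inGlobal  <- c(obj1 = obj1, obj2 = obj2)
print(inCalling)
print(inGlobal)
```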
[R] reporting multiple objects out of a function
Dear folks,
I’m trying to build a function to create and make available some variables I frequently use for testing purposes. Suppose I have a function that takes some inputs and creates (internally) several named objects. Say,

    fun1 <- function(x, y, z) {obj1 <- x; obj2 <- y; obj3 <- z; <missing stuff> }

Here is the challenge: After I run it, I want the objects to be available in the calling environment, but not necessarily in the global environment. I want them to be individually available, not as part of a list or some larger object. I cannot figure out how to do this.

If I understand the situation correctly, I am trying to move several separate objects from the environment of the function to the environment in which the function was invoked (the “calling environment,” yes?). I’m pretty sure there is a command to do this, but I’m not sure how to find it.

Any help would be greatly appreciated – either on the necessary code, or on how to search for it, or a reference to a good discussion of this family of problems.
Sincerely, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3873380.html
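One possible way to fill in the missing step, sketched with base R's list2env(), which copies the elements of a named list into an environment. Here the target is parent.frame(), i.e. the environment fun1 was called from. The wrapper function is invented purely to show the effect.

```r
fun1 <- function(x, y, z) {
  # Copy the named objects into the environment of the caller
  list2env(list(obj1 = x, obj2 = y, obj3 = z), envir = parent.frame())
  invisible(NULL)
}

# Hypothetical caller: after fun1() runs, obj1..obj3 exist as ordinary
# local variables inside wrapper().
wrapper <- function() {
  fun1(10, "b", TRUE)
  list(obj1, obj2, obj3)
}

res <- wrapper()
str(res)
```

Any existing objects with the same names in the calling environment are overwritten without warning, which is the usual objection to this pattern.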
Re: [R] Referring to an object by a variable containing its name: 6 failures
Thanks Josh and Duncan! That was very clear and helpful. After going back and reviewing the documentation for { and $, I am realizing that the pattern in R documentation is simply to tell you the truth, and not to give much effort to distinguishing confusable choices. Once again, things that seemed crazy to me become perfectly sensible once understood. I think I need to read function documentation more the way one reads concept definitions in a math book.

Josh, one question: Your reasons to avoid attach() seem cogent. However, Venables, Smith et al. say in “An Introduction to R”:

"A useful convention that allows you to work with many different problems comfortably together in the same working directory is: gather together all variables for any well defined and separate problem in a data frame under a suitably informative name; when working with a problem attach the appropriate data frame at position 2, and use the working directory at level 1 for operational quantities and temporary variables; before leaving a problem, add any variables you wish to keep for future reference to the data frame using the $ form of assignment, and then detach(); finally remove all unwanted variables from the working directory."

I'm still at the point that I am doing things just because Authority says so, but unfortunately, everyone is Authority, relative to me. Still, I wonder if you have any thoughts about why such a venerable authority as Venables et al. would recommend a programming practice if that practice should always be avoided. For cognitive dissonance from authority conflicts, that's up there with the Google R stylesheet saying to avoid using S4 classes.

Again, my thanks.
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3822436.html
Re: [R] Referring to an object by a variable containing its name: 6 failures
Dear Folks --
The anonymous poster (rmailbox) is perfectly correct. I had forgotten you could use names in this way. When referring to rows or columns by name rather than by number, I usually use either attach() or the $ operator, neither of which works here. If anyone understands why data.df[,colName] works in this setting but data.df$colName and the use of as.symbol(colName) after attach(data.df) do not work, I would love an explanation, because I sure don't.

Thanks, Timothy, for helping to clarify what I was trying to do. You are exactly right, and your analogy to the $$ command in PHP – a command that works -- was thereby more perfect than my analogy to things in R which do not work.

Elk's suggestion to use the get() function was very welcome, as I had never really understood what get() was for, and this is a great use that often arises. However, for this purpose, get() is somewhat capricious in its effectiveness. “get(colName)” works as the operand of class(), length(), mode(), and summary(), but it does not work for typeof(), where it returns this error:

    Error in eval(substitute(expr), data, enclos = parent.frame()) :
      numeric 'envir' arg not of length one

And it does not work for str(), where it treats the variable name as a character string rather than a symbol. Again, I do not understand what distinguishes the functions for which Elk's solution works from those for which it does not. Does anybody know? Ideas welcome.

--and thanks again for all the help.
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3819813.html
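On the first question above, a short sketch of the difference as I understand it: `$` takes the name after it literally and does not evaluate it, so data.df$colName looks for a column actually called "colName". The `[[` and `[` operators evaluate their index, so a variable holding the name works. The toy data frame is invented for the illustration.

```r
data.df <- data.frame(height = c(1.5, 1.6, 1.7),
                      group  = c("a", "b", "a"))
colName <- "height"

viaDollar   <- data.df$colName      # looks for a column named "colName": NULL
viaBrackets <- data.df[[colName]]   # evaluates colName first, then extracts
viaMatrix   <- data.df[, colName]   # same column via matrix-style indexing

print(viaDollar)
print(viaBrackets)
```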
[R] Name the dots! (...)
Dear Folk--
Suppose I have some objects A, B, C, and a function

    getDots <- function(...) {args <- list(...) etc.}

If I do a call to getDots(A, B, C) then the variable args will be assigned to a list which contains the objects to which A, B, C refer, but which will not (except by happenstance) contain the names A, B, or C. I would like getDots to return a named list, with the object names being assigned as the element names in the list. Is there any way to do this?

As an aside, I do not understand why the list command does not do this by default, like the data.frame command does. In fact, you can use data.frame instead of list to get a named argument list, but only if all your objects are of the same length.

Thanks in advance for any help you can offer.
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Name-the-dots-tp3819947p3819947.html
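A sketch of one way to build such a named list, assuming the names wanted are the expressions as typed in the call. substitute(list(...)) captures the unevaluated arguments, and deparse() turns each one into text. Explicitly supplied names (as in getDots(z = B)) are kept. The variable argNames is introduced for the illustration.

```r
getDots <- function(...) {
  args  <- list(...)
  # Unevaluated argument expressions, without the leading `list`
  exprs <- as.list(substitute(list(...)))[-1L]
  argNames <- vapply(exprs,
                     function(e) paste(deparse(e), collapse = " "),
                     character(1), USE.NAMES = FALSE)
  # Keep any names the caller supplied explicitly
  given <- names(args)
  if (!is.null(given)) {
    keep <- nzchar(given)
    argNames[keep] <- given[keep]
  }
  names(args) <- argNames
  args
}

A <- 1
B <- "b"
C <- 1:3
str(getDots(A, B, C))
```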
Re: [R] Returning the name of an object passed directly or from a list by lapply
Dear Bill--
Wow. This is very clever and I learned a lot from it. I've never seen the ...() trick before, and on a Google code search, I could not find anyone else who had used it. And I've never used the ... feature, which, BTW, though mentioned in every intro to R text, has no help page I can find. grrr

Your function is still not doing what I am trying to do, doubtless because I was not clear enough in the question I posed. At the bottom of this message I have posted a copy of my testing function, and a few objects to test it on. Its details are unimportant and not very interesting, but note that all of its important outputs are in the form of side effects.

What I would like to be able to do is this:

    f(fun, n variable names)

and get back this:

    fun(variable#1)
    fun(variable#2)
    ...
    fun(variable#n)

Attempting to copy some of your techniques, I came up with this:

    evaluate <- function(fun, ...){
      unevaluatedArgs <- substitute(...)
      for (i in 1:length(deparse(unevaluatedArgs))) fun(deparse(unevaluatedArgs)[i])
      invisible(TRUE)
    }

As applied to my test data, it works on the first variable (but gets the variable name wrong) and ignores the remainder of the list, e.g.:

    > evaluate(testX, H.char, H.vec, H.df, H.mat)
    ###
    testX( deparse(unevaluatedArgs)[i] ):  Class= character Type= character Mode= character
    Summary:
       Length     Class      Mode
            1 character character
    Structure:
     chr "H.char"

On the other hand, this almost works:

    evaluate <- function(fun, ...){
      evaluatedArgs <- list(...)
      for (i in 1:length(evaluatedArgs)) fun(evaluatedArgs[i])
      invisible(TRUE)
    }

The only thing it does not do is get the name of the passed object right. That seems like it ought to be a small problem, but as you pointed out, the names are not in the list. (BTW, I don't understand dropping the names as a design choice for the list() function. If you use list() to make a list out of four symbols for objects, wouldn't it be better to make the text of the symbols the default names for those objects? That would solve this problem nicely.)

[s]ubstitute seems to drop all but the first variable passed by

Thanks so much for your thoughtful help.
andrewH

    testX <- function(objectX, bar=TRUE) {  # A useful diagnostic function
      object.name <- deparse(substitute(objectX))
      if(bar) cat("##\n")
      cat("testX(", object.name, "): ")
      cat(" Class=", class(objectX))
      cat(" Type=", typeof(objectX))
      cat(" Mode=", mode(objectX), "\n")
      cat("Summary:\n"); print(summary(objectX))
      cat("Structure:\n"); str(objectX)
      if (is.factor(objectX)) {
        cat("Levels: ", levels(objectX), "\n")
        cat("Length: ", length(objectX), "\n")
      }
      invisible(object.name)
    }

    ## Define 4 test variables: H.char, H.vec, H.df, H.mat
    H.char <- letters[1:10]
    H.vec <- c(1:10)
    H.df <- {
      # Makes a test data set A.df with 2-, 3-, 4-factor sorting variables, making 24
      # combinations, a 4th variable with a unique data value for each combination.
      # No random component.
      year.2 <- factor( rep(1:2, each=12) )
      cohort.3 <- factor( rep(rep(1:3,each=4),2) )
      race.4 <- factor( rep(1:4, 6) )
      D1 <- as.numeric(as.character(year.2))*1.1 +
        as.numeric(as.character(cohort.3))*1.01 +
        as.numeric(as.character(race.4))*1.001
      data.frame(year.2,cohort.3,race.4,D1)
    }
    H.mat <- matrix(1:16, 4, 4)
    ## End of test variables

-- View this message in context: http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3819378.html
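A sketch of one way to get fun(variable#1) ... fun(variable#n) with the right names, assuming fun recovers the name through deparse(substitute()) as testX does. It captures every argument expression with substitute(list(...)), then builds and evaluates a real call fun(<symbol>) for each one in the caller's environment. The small recording function showName stands in for testX and is invented for the illustration.

```r
evaluate <- function(fun, ...) {
  exprs  <- as.list(substitute(list(...)))[-1L]   # all unevaluated arguments
  caller <- parent.frame()
  for (expr in exprs) {
    # Build the call fun(<expr>) so that substitute() inside fun sees <expr>
    eval(as.call(list(fun, expr)), caller)
  }
  invisible(TRUE)
}

# Stand-in for testX: prints and records the name it was handed.
seen <- character()
showName <- function(obj) {
  nm <- deparse(substitute(obj))
  seen <<- c(seen, nm)
  cat(nm, ": class =", class(obj), "\n")
  invisible(nm)
}

H.char <- letters[1:10]
H.vec  <- 1:10
H.mat  <- matrix(1:16, 4, 4)
evaluate(showName, H.char, H.vec, H.mat)
```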
[R] Returning the name of an object passed directly or from a list by lapply
Dear folks:
Let’s suppose I want a function to return the name of the object passed to it.

    myname <- function(object) {out <- deparse(substitute(object)); out}

This works fine on a single object:

    > O1 <- c(1:4)
    > myname(O1)
    [1] "O1"

However it does not work if you use lapply to pass it the same object from a list:

    > O2 <- c(1:4)
    > object.list <- list(O1,O2)
    > lapply(object.list, myname)
    [[1]]
    [1] "X[[1L]]"

    [[2]]
    [1] "X[[2L]]"

Is there any way to write myname() so that it returns the same object's name regardless of whether it is handed the name directly or by lapply as an element of a list? Any help you can offer would be greatly appreciated.
Warmly, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3816798.html
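Since an unnamed list carries only values, one workaround (a sketch, not a fix to myname() itself) is to store the names in the list and hand both the object and its name to the function, for example with Map(). The function describe is made up for the illustration.

```r
O1 <- 1:4
O2 <- c("a", "b")

# Keep the names with the objects when building the list
object.list <- list(O1 = O1, O2 = O2)

# Hypothetical reporting function that receives the name explicitly
describe <- function(object, name) {
  sprintf("%s: %s of length %d", name, class(object)[1], length(object))
}

descriptions <- unlist(Map(describe, object.list, names(object.list)))
print(descriptions)
```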
Re: [R] Returning the name of an object passed directly or from a list by lapply
Thanks Bill! You are correct. I did not understand what was in my list. I posted a simplified example in the hope of focusing on the essentials, but I see I have edited out the motivation.

When my programs go awry, and sometimes when they don't, I find I need to understand what is in some variable or variables. To help with debugging, I have built a little testing function that takes the names of one or more variables and returns a variety of information about each one: summary(), str(), class(), typeof(), etc., starting with the name. (The name is unimportant when I hand it one variable, but for a longer list, I want to print it to help keep track of what outcome goes with what variable.) It also gives me some extra information about certain data types that I seem to have more trouble with, notably factors. These days I’m devoting myself nearly full-time to trying to learn R, and I probably run this function between 50 and 200 times a day.

Now I am trying to figure out some way of running my testing function on more than one variable at a time. Should be easy on a computer, right? I don't care if I cluster my variables in a list, vector, or what -- I just want to be able to evaluate a bunch of them at one time. And I'd rather not have to type quotation marks around each variable name. I've timed myself, and it increases the time it takes me to type a list by 250%.

Shortly I'll be posting a different question with regard to my failure to get this function to work in a loop. But I also very much want to be able to use one of the apply-family functions to run on multiple variables. If, as you have persuaded me, I cannot use a list of variable names, this larger problem still has to have a straightforward solution, I think. But I sure don't know what it is. Any suggestions from any quarter would be deeply appreciated.
andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3817116.html
[R] Referring to an object by a variable containing its name: 6 failures
Dear Folks--

I'm trying to make a function that takes the columns I select from a data frame and then uses a for loop to print some information about each one, starting with the column name. I succeed in returning the column name, but nothing else I have tried works: every attempt to use the variable colName, which contains the name of the column, to refer to the column itself has failed.

Below I show my non-working function, and a data frame to test it on. I try six distinct ways of turning the variable containing the name back into a name that is recognized by other R functions, mainly functions that display the properties of the object to which the name refers. These are also numbered in the code below in comments:

1. evaluate colName with eval()
2. convert it back into a symbol with as.symbol()
3. treat the data frame as the calling environment using with()
4. use substitute() to plug in any information bound in the environment
5. attach the data frame from which the column name is drawn
6. access the column using the $ operator

I have actually made this function work using numeric indexing. But I do not understand why none of these ways of accessing the column using its name works. They all give me the properties of the name as a character vector (except (2), which gives me its properties as a symbol) rather than the properties of the vector to which the name refers. What am I doing wrong? How do I use a variable containing an object's name to refer to the object itself?

Although I'm hoping others will find the bald look caused by tearing my hair out to be attractive, I would appreciate any assistance you can offer in understanding this question.
Warmly, andrewH

testDFcols <- function(data.df, select=c(1:ncol(data.df)), bar=TRUE) {
  # A useful summary function
  if(bar) cat("##\n")
  attach(data.df)
  for (column in select) {
    colName <- names(data.df)[column]
    cat("Column Name(", colName, "): ")
    # six failures
    cat(" Class=", class(eval(colName)))              #1
    cat(" Type=", typeof(as.symbol(colName)))         #2
    cat(" Length=", length(with(data.df, colName)))   #3
    cat(" Mode=", mode(substitute(colName)), "\n")    #4
    cat("Summary:\n")
    print(summary(colName))
    cat("Structure:\n")
    str(colName)
    if (is.factor(data.df$colName)) {
      cat("Factor Levels: ", levels(data.df$colName), "\n")
    } else cat("\n")                                  #6
  }
  detach(data.df)
  invisible(deparse(substitute(data.df)))
}

A1.df <- {
  # Makes a test data set A.df with 2-, 3-, 4-factor sorting variables, making 24
  # combinations, a 4th variable with a unique data value for each combination.
  # No random component.
  year.2   <- factor( rep(1:2, each=12) )
  cohort.3 <- factor( rep(rep(1:3,each=4),2) )
  race.4   <- factor( rep(1:4, 6) )
  D1 <- as.numeric(as.character(year.2))*1.1 +
        as.numeric(as.character(cohort.3))*1.01 +
        as.numeric(as.character(race.4))*1.001
  data.frame(year.2,cohort.3,race.4,D1)
}

testDFcols(A1.df)

-- View this message in context: http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3817129.html
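For comparison, here is a minimal sketch of two ways that do go from a string holding a column name to the column itself: indexing with [[ ]], and evaluating the symbol (rather than the string) with the data frame as the evaluation environment. The names describe_cols() and test.df are hypothetical and only illustrate the idea; this is not the answer given on the list.

```r
# Hypothetical helper: looks each column up by the string held in colName.
describe_cols <- function(data.df, select = seq_len(ncol(data.df))) {
  for (column in select) {
    colName <- names(data.df)[column]  # a character string, e.g. "f"
    x <- data.df[[colName]]            # [[ ]] treats the string as a column name
    cat("Column Name(", colName, "): Class=", class(x),
        " Type=", typeof(x), " Length=", length(x), "\n")
    if (is.factor(x)) cat("Factor Levels: ", levels(x), "\n")
  }
  invisible(names(data.df)[select])
}

test.df <- data.frame(f = factor(c("a", "b", "a")), n = c(1.5, 2.5, 3.5))
shown <- describe_cols(test.df)

# eval() of a string just returns the string; eval() of the symbol built
# from it, with the data frame as the environment, returns the column.
colName <- "n"
via_string <- eval(colName)                     # still the string "n"
via_symbol <- eval(as.symbol(colName), test.df) # the numeric column
```

By contrast, data.df$colName looks for a column literally called colName, because $ does not evaluate its right-hand side.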
Re: [R] Reading R Code aloud
Dear Clarence-- LOL! R really is good for an amazing range of things! How old are your kids? If no one else points to an intensional sample of vocal R, maybe you could record one of your more inspired readings. It could be a service to the community of R learners -- and if it catches on, perhaps to parents everywhere. Peace, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Reading-R-Code-aloud-tp3811142p3813540.html
[R] Reading R Code aloud
Dear folks-- I have been told by an experienced R programmer and teacher whom I trust that it is easier to understand R code if you read it aloud, as the language that it is. However, she was clear that reading it aloud was not simply reading the marks on the screen: you read A.df[5,] as "the fifth row of A.df" (or "the fifth row of data frame A"), not as "A dot df left square bracket five comma right square bracket", which is not helpful at all. So you have to be able to read it to read it aloud. I have observed this of poetry as well, and that, if you hear a poem read well once, you have a deeper understanding of it (and often other work by the same poet) forever after, even when reading it silently. So I was wondering if there are any significant examples of people reading R code out loud available on the web, on YouTube or something? I did not find any in ten minutes' search, but perhaps I do not know how best to look. Does anyone know of any? Warmly, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Reading-R-Code-aloud-tp3811142p3811142.html
Re: [R] Searching the console
Thanks, Josh! I'm using TINN-R now, but I have been thinking of switching to ESS. Though perhaps TINN-R has a similar function -- I had been looking for console functions, rather than editor functions. andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3797996.html
[R] Consistently printing the name of an object passed to a function; a data-auditing question
Dear folks--

I always seem to find that I spend more than half my time making sure my input data is in the right form, properly aligned, with no bizarre features. You know the drill: five kinds of missing values, three of them documented. An alpha mistype in one numeric field turns 30,000 numbers into factor levels. SPSS conversion turns 250 factors nicely into R factors, except 3 have levels instead of labels. A few columns in some years of a survey have undocumented differences in units. Halfway through a 20-year annual survey, they add two more allowable answers to a question. Etc.

I'm looking for things to make my data auditing go faster. One of them is a dopey little function, testX(), bundling together a variety of R tools to tell me what is in an object. Here it is:

testX <- function(objectX, bar=TRUE) {
  # A useful diagnostic function
  object.name <- deparse(substitute(objectX))
  if(bar) cat("\n")  # visual separation between consecutive objects.
  cat("testX(", object.name, "): ")
  cat("Class=", class(objectX))
  cat(" Mode=", mode(objectX), "\n")
  cat("Summary:\n"); print(summary(objectX))
  cat("Structure:\n"); str(objectX)
  if (is.factor(objectX)) {
    cat("Levels: ", levels(objectX), "\n")
    cat("Length: ", length(objectX), "\n")
  }
  invisible(object.name)
}

This works well when I give it the name of a single object. My problem is when I try to produce descriptions of a bunch of variables in a row, such as the variables in a list of variables, or all the variables that I have clomped together in a data frame. The output is all side effects. Some ways of passing multiple variables get the name wrong, but the rest right. For example, if I have a list of variables, and do:

lapply(varList, testX)

I get an output like this:

## testX( X[[1L]] ): Class= factor Mode= numeric
Summary:
1994 1997 1999 2002 2003 2007 2009
1009 1165  985 2502 2528 2007 3013
Structure:
 Factor w/ 7 levels "1994","1997",..: 1 1 1 1 1 1 1 1 1 1 ...
Levels: 1994 1997 1999 2002 2003 2007 2009
Length: 13209

If instead I do it with a loop through the variable names in a data.frame, I get the name wrong _and_ it does not evaluate all the way to an object:

names(var.df)
[1] "year" "YEAR" "AGE" "COHORT.5" "COHORT.10" "ETHNIC" "EDUC" "INCOME" "INTERNET" "PARTY" "IDEOL"

for (sel in 1:length(names(var.df))) testX(names(var.df)[sel])

Gives an output like this:

## testX( names(var.df)[sel] ): Class= character Mode= character
Summary:
   Length     Class      Mode
        1 character character
Structure:
 chr "year"

Or I can select the column instead of the name of the column. This gives me the right answer on the object description, but not the name, thus:

for (sel in 1:length(names(var.df))) testX(var.df[[sel]])

## testX( var.df[[sel]] ): Class= integer Mode= numeric
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1994    2002    2003    2003    2007    2009
Structure:
 int [1:13209] 1994 1994 1994 1994 1994 1994 1994 1994 1994 1994 ...

I've tried doing various things to names(var.df)[sel] to get it closer to the object -- as.symbol(), eval(substitute()), several others -- but I just get variations on the output above.

So there are actually two questions here:

1. How can I write this function so that it works when I just give it an object, but I can also use it with an apply-family function and a list (or vector, or whatever) of objects, and still have it both treat the object as an object and print its name correctly?
2. How can I write the function, or write a loop, or use an apply-family function, to use this function to go through the columns of a data.frame, correctly naming and correctly describing each?

Another way of asking this same question is this: I want to be able to give testX the name of an object, or a reference to a named object, via an apply-family function, indexing, or whatever. (A) How can I get the name I print, object.name, to be the name of the object in both cases?
And, (B), how can I make sure that objectX is the actual object that the name refers to, and not the name or the reference, in both cases?

Finally, and this should maybe be another post, I'd love to hear if others have thought through the whole question of efficient data auditing. Is there a suite of tools, or a standard set of recommendations, that you use and like? I'd love to hear any useful advice about how to accelerate this stage of a project, and get more quickly to its statistical heart.

Most sincerely, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Consistently-printing-the-name-of-an-object-passed-to-a-function-a-data-auditing-question-tp3798005p3798005.html
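One way to approach questions (A) and (B) together is to let the function take the name as an optional second argument, defaulting to the deparsed call expression, and to iterate over the objects and their names in parallel with Map(). This is only a sketch under that assumption; describe() and its name argument are hypothetical, and the small var.df below is made-up illustration data, not the survey.

```r
# Hypothetical variant of testX(): the label defaults to the expression used
# in the call, but a caller (or Map) can supply the real name explicitly.
describe <- function(objectX, name = deparse(substitute(objectX))) {
  force(name)  # settle the label before anything else happens
  cat("describe(", name, "): Class=", class(objectX),
      " Mode=", mode(objectX), "\n")
  invisible(name)
}

# Made-up stand-in for the survey data frame
var.df <- data.frame(year = c(1994L, 1997L), PARTY = factor(c("D", "R")))

single <- describe(var.df$year)                    # label comes from the call
by_col <- Map(describe, var.df, names(var.df))     # label comes from names()
```

Map() passes each column as objectX and the matching column name as name, so the object is described as an object while the printed label is the name the column actually has.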
Re: [R] Searching the console
Dear Sarah-- I am thinking mainly in terms of long programs run by cut-and-paste or some other batch-like submission, where you can get back a lot of code, some program outputs, and some error messages, all in a big lump. I want to look through that lump to locate all the error or warning messages, or all the occurrences of some variable or function that seems to be causing a problem. In both cases I want to find the result in context. Is that clearer? Thanks for your attention. andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3799611.html
Re: [R] Searching the console
Thanks Eik! I did not know about or remember history. I agree that it solves part of my problem, but I really want to be able to search my code and the things R has printed in response as a single block of text. I can cut-and-paste it into a text editor, but I was hoping that there was a way to do it from the console itself, or otherwise cut out manual steps. Warmly, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3799630.html
[R] Searching the console
Is there any way to search the console during an interactive session? I've looked and looked, and cannot find one. In some add-on package, maybe? Sorry to be so basic, but help would be greatly appreciated. andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3797884.html
[R] Can you send side effect text into a variable?
Dear folks -- There are a number of functions -- I am thinking of str() as an example -- that produce text as a side effect, rather than returning it. Is there any way to send the text produced by such functions into a character variable? Any suggestions would be greatly appreciated. andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Can-you-send-side-effect-text-into-a-variable-tp3746025p3746025.html
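One standard tool for this is capture.output() from the utils package, which evaluates an expression and returns whatever it printed as a character vector, one element per line. A minimal sketch follows; the variable names x and str_text are just for illustration.

```r
x <- c(1, 2, 3)

# str() prints to the console and returns NULL; capture.output() collects
# the printed lines into a character vector instead of displaying them.
str_text <- capture.output(str(x))

is.character(str_text)  # TRUE
cat(str_text, sep = "\n")  # re-display the captured text when wanted
```

The captured text can then be pasted into labels, written to a file, or searched with grepl() like any other string.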
Re: [R] Package or procedure recommendations for analysis of repeated cross-sections?
OK, I've done more research, and I think that what I am looking for is repeated cross-section or pseudo-panel estimators. Does anyone know if these have been implemented in any R package?
-- View this message in context: http://r.789695.n4.nabble.com/Package-or-procedure-recommendations-for-analysis-of-repeated-cross-sections-tp3694587p3696832.html
[R] Package or procedure recommendations for analysis of repeated cross-sections?
I have a survey data set of 6 years and about 1500 persons surveyed per year, with roughly 200 questions per survey. The samples are drawn independently without replacement and are intended to represent the nation (USA).

I would like to create something like a synthetic panel, dividing the respondents up into groups and then seeing if year-to-year changes in the mean value of my dependent variable for each group vary with the level or the change in the group mean of my explanatory variable. The grouping would be based on several factors, the levels of which denote demographic variables such as income, race, and birth cohort. Each group would consist of all those respondents that are identical in their level of all the selected factors, i.e., it would consist of all the respondents in the sample who share an identical race, income level, birth cohort, etc. After being imported from an SPSS data set, these variables are implemented as R factors.

My dependent variables are measures of ideology and party affiliation; the variables that identify the groups are factors known to be correlated with political ideology, for which I wish to control; and my independent variables focus on sources of news and information. My hypothesis is that the change in ideology we have observed over the period for which I have data can be explained in part by changes in how these groups get their information. I’m not sure if the ideology change should respond to the level or to the change in level of my independent variable. I intend to test both.

I was about to try to write this from scratch, but it occurred to me that this is a variety of problem for which a nice package probably already exists, and I could probably find it if I knew the right terminology. I am not enough of a statistician to know the conventional name for the procedure of using subgroupings of cross-sections repeated over time as if they were panels.
Moreover, I suspect my procedure of dividing a population into groups based on each combination of the classifying variables has a conventional name, and that looking at differences or ratios of the means of a dependent variable over those groups and how they respond to the mean level of an independent variable by group has a name, and that each has one or more good implementations in R.

Finally, I was thinking of simply regressing changes in the group means of my dependent variable on the group means or changes in the group means of my independent variable. But this throws away information that I know is relevant, though I am not sure how best to use it, e.g. that the groups are of different sizes, so the mean differences or ratios will differ in their variances. I could assume they are normal and do a correction for heteroskedasticity, but if there is a better approach, I’d rather use it.

My apologies if this question is unduly basic. I did two semesters of graduate econometrics once, but that was more than a decade ago, and I fear that, like many with a superficial knowledge of econometrics, I tend to see every research question in terms of OLS or GLM, even if that is not the right model for the problem. Any help or suggestions would be greatly appreciated.

Sincerely, andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Package-or-procedure-recommendations-for-analysis-of-repeated-cross-sections-tp3694587p3694587.html
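The grouping-and-regressing idea described above can be sketched in base R: collapse respondents to cell means by year and demographic cell, keep the cell sizes, and use them as weights so that larger cells count for more. Everything below is an assumption for illustration only: the data are simulated noise, and the names survey, cells, ideol, and news are hypothetical. It is one simple cell-means regression, not a recommendation of a particular pseudo-panel estimator.

```r
set.seed(1)

# Simulated stand-in for six independent yearly cross-sections (pure noise)
survey <- data.frame(
  year   = rep(2001:2006, each = 200),
  race   = factor(sample(1:4, 1200, replace = TRUE)),
  cohort = factor(sample(1:3, 1200, replace = TRUE)),
  news   = rnorm(1200),   # hypothetical information-source measure
  ideol  = rnorm(1200)    # hypothetical ideology measure
)

# One row per year-by-race-by-cohort cell, holding the cell means
cells <- aggregate(cbind(ideol, news) ~ year + race + cohort,
                   data = survey, FUN = mean)

# Cell sizes, computed with the same grouping so the rows line up
cells$n <- aggregate(ideol ~ year + race + cohort,
                     data = survey, FUN = length)$ideol

# Weighted regression of cell-mean ideology on cell-mean news use,
# with year effects and cell effects; weights reflect cell size
fit <- lm(ideol ~ news + factor(year) + interaction(race, cohort),
          data = cells, weights = n)
```

Weighting by cell size is one common way to address the unequal variances of means from cells of different sizes; robust standard errors would be an alternative.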
Re: [R] Using str() in a function.
Thanks, everybody, this has been very edifying. One last question: It seems that sometimes when a function returns something and you don't assign it, it prints to the console, and sometimes it doesn't. I'm not sure I understand which is which. My best current theory is that, if the function returns NULL, by itself and not as part of some larger object, it does not print it, but non-null values are printed. Is that correct? Thanks! Andrew
-- View this message in context: http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3670513.html
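The theory above can be checked with withVisible(), which reports whether a value would be auto-printed at top level. As the sketch below suggests, what matters is a visibility flag set by invisible() (or by assignment), not whether the value is NULL. The function names are hypothetical.

```r
returns_null   <- function() NULL           # a plain, visible NULL
returns_hidden <- function() invisible(42)  # a non-NULL value marked invisible

null_vis   <- withVisible(returns_null())$visible    # would this print?
hidden_vis <- withVisible(returns_hidden())$visible
assign_vis <- withVisible(y <- 5)$visible            # assignment is invisible too
```

So a function that returns a bare NULL does print NULL at the console, while invisible(42) prints nothing unless wrapped in print() or parentheses.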
Re: [R] Using str() in a function.
Thanks, David and Dennis, that is very helpful. Let me state what I think I have learned, to see if I understand it correctly. Then I have two remaining questions.

If a function contains more than one expression in its {}, it always returns the value of the last evaluated expression in its definition, and only the last object -- unless you previously use the return() function on an object before the last expression, in which case the value of that expression is returned instead. And in either case, explicit or implicit return(), the returned expression is evaluated, and returned, first -- before any other expressions are evaluated, and any side effects also occur before any other expressions are evaluated. (Though I am unsure in what order expressions are evaluated if objects in the returned expression are defined by other expressions before it in the function. The chain of evaluation -- and of any side effects of that evaluation -- propagates backwards, maybe?)

The print() command inside a function sends the object it contains to the currently-defined printer, as a side effect, without returning it. The difference between return() and print() is that if something is returned, R checks to see if the value of the function is assigned or otherwise nested in a larger evaluated expression. If so, a copy is moved to the assigned object and the original is deleted. If not, it is printed to the current device and then deleted. If you print() it, it does not check for assignment or use before sending it to the printer and deleting it.

A lot of functions, e.g. str(), have as their explicit or implicit return an expression which does not create an object. In this case, the function returns a NULL. If you do not want to print the NULL or other returned object, you make the returned argument invisible().

But there are still things here I do not understand.
The function that Dennis Murphy provided does print the str() output last instead of first, because its final expression is invisible() rather than str(). But it still prints out (and returns - I checked) a NULL, e.g.

GG <- c(1:5)
testXa <- function(X) {
  print(summary(X))
  print(str(X))
  invisible() # returns nothing
}
testXa(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       3       3       4       5
 int [1:5] 1 2 3 4 5
NULL

# Here is my latest version of the function, which does exactly what I want:
testXf <- function(X) {
  print("Summary:"); print(summary(X))
  print("Structure:"); invisible(str(X))
}
testXf(GG)
[1] "Summary:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       3       3       4       5
[1] "Structure:"
 int [1:5] 1 2 3 4 5

So, two questions:

1. In Dennis's function, the str() results are printed last because they are no longer returned, as invisible() is now the last expression. But why does his function still print a visible NULL?
2. My function, above, makes the NULL value returned by str() invisible. But invisible(str(X)) is the last expression evaluated, so why does the side-effect printing of str() results happen last instead of first?

And thanks again! andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3666339.html
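On the ordering question, a small variant of testXf() may help show that the expressions in a function body run top to bottom, each side effect appearing as its expression is reached. This is only a sketch; testXg() is a hypothetical name, and it uses cat() for the labels (to avoid the [1] prefix) and returns the object invisibly.

```r
# Hypothetical variant: labels via cat(), object returned invisibly
testXg <- function(X) {
  cat("Summary:\n")
  print(summary(X))   # printed explicitly, so it appears here, second
  cat("Structure:\n")
  str(X)              # str() prints as a side effect, so it appears last
  invisible(X)        # nothing further is auto-printed at the console
}

GG <- c(1:5)
printed <- capture.output(result <- testXg(GG))  # collect the lines in order
```

The lines in printed come out in the same order as the statements in the body, which is why str() output lands last whenever the str() call is the last statement to run.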
Re: [R] Using str() in a function.
David -- Ah! Excellent. OK, that explains Dennis's function's output. print(str(X)) evaluates str(X), sending the usual str() output to the console as a side effect, and then prints what str() returns, which is NULL. And invisible() returns NULL again, but we don't see NULL NULL, because the second one is invisible. Still puzzled by the order of my output, though. andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3666543.html
Re: [R] Using str() in a function.
Dear Peter-- You write:

> Andrew is being seriously confused. The return(ans) is of course executed when you get to it, returning the value of `ans` and terminating the function. Anything after that is _ignored_. There is no such thing as a previous return() affecting what str() does -- that would be like asking whether it is legal to marry your widow's sister...

Right. By "previous", I was contrasting an explicit return somewhere other than the last expression in the {} to the implicit return of the last expression. I understand that executing a return() is the last thing a function does. str() prints last because the side effect of the preceding print()s causes them to print before str() is ever called.

So, what about this one:

GG <- c(1:4)
testX3 <- function(X) {summary(X); return(str(X))}
testX3(GG)
 int [1:4] 1 2 3 4

I thought this was ignoring the summary() because it evaluates the return() first. If it does the return(str(X)) when it encounters it, (1) why doesn't it send the summary() to the console (I'm guessing that it is because its output is local to the function), and (2) why doesn't it return the NULL that str() returns to the console?

Again, thanks. --andrewH
-- View this message in context: http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p316.html
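A short experiment may help with both points, under the assumption that auto-printing applies only to the value of a top-level expression: inside a function body, summary(X) is evaluated but its value is simply discarded, and str() returns its NULL invisibly, so nothing further is shown. The name testX3b() is hypothetical.

```r
# Same shape as testX3(): summary() is computed, then thrown away
testX3b <- function(X) {
  summary(X)        # evaluated, but nothing prints it, so it is never shown
  return(str(X))    # str() prints as a side effect; its value is NULL
}

GG <- c(1:4)
shown <- capture.output(value <- testX3b(GG))  # what reached the console

length(shown)   # only the single str() line was printed
is.null(value)  # the returned value is NULL
```

Wrapping the first statement as print(summary(X)) is what makes the summary appear, since print() writes it out explicitly rather than relying on auto-printing.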
[R] Using str() in a function.
I am in the early phase of learning R, and I find I spend a lot of time trying to figure out what is actually in objects I have created or read in from a file. I'm trying to make a simple little function to display a couple of things about an object, let's say the summary() and the str(), sequentially, preferably without a bunch of surplus lines between them. I have tried a large number of things; none do what I want.

GG <- c(1,2,3)

# This one ignores the str().
testX <- function(X) {return(summary(X)); str(X)}
testX(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     1.5     2.0     2.0     2.5     3.0

# So does this one.
testX2 <- function(X) {return(summary(X)); return(str(X))}
testX2(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     1.5     2.0     2.0     2.5     3.0

# On the other hand, this one ignores the summary()
testX3 <- function(X) {summary(X); return(str(X))}
testX3(GG)
 num [1:3] 1 2 3

# This one displays both, in reverse order, with a superfluous (to my intentions) [[NULL]].
testX4 <- function(X) {list(summary(X), (str(X)))}
testX4(GG)
 num [1:3] 1 2 3
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     1.5     2.0     2.0     2.5     3.0

[[2]]
NULL

# Now we are back to ignoring the str().
testX5 <- function(X) {list(return(summary(X)), (str(X)))}
testX5(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     1.5     2.0     2.0     2.5     3.0

# This does the same as testX4().
testX6 <- function(X) {return(list(summary(X), (str(X))))}
testX6(GG)
 num [1:3] 1 2 3
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     1.5     2.0     2.0     2.5     3.0

[[2]]
NULL

I tried a bunch more, using the print command, etc., but nothing I tried resulted in the output of summary() followed by the output of str(). And is there really no way to assign the output of str() -- that is to say, the output str() normally prints to the console -- to an object? I would be very grateful for any guidance you could offer.
Sincerely, Andrew
-- View this message in context: http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3655785.html