Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error

2014-02-16 Thread andrewH
Hi Christian, thanks for responding!

I wrote a reply to you when I first saw your post, but it looks like it
didn't get to the list somehow. I'm still on Windows XP, though I imagine
I'll have to switch over to 7 or something soon. I think you are right:
the problem is that I have not been able to successfully start the server.



--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534p4685414.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error

2014-02-05 Thread andrewH

On Feb 03, 2014; 7:24am, hadley wickham wrote: 
 Or you're not running a postgres db on your local machine that
 accepts a connection with username Administrator and no password?  I
 doubt that's the error you would see if RPostgreSQL hadn't found
 libpq.

I have learned enough since this posting to be aware that my question was
based on assumptions so false to fact that they verge on the nonsensical,
but not enough to reframe it more sensibly. So I am just going to say what I
believed then and what I believe now, and perhaps you can head off the more
extreme excursions from reality in my current beliefs.

Because I knew that libpq depended on the version of PostgreSQL that was
installed, and that it used to be the case that you could not just install
the package, but it is now possible because of the addition of libpq, I
concluded that libpq contained an installer for PostgreSQL. I now think that
this is false, and that libpq is some sort of tailored interface that varies
with the OS and the PostgreSQL version. Tailored where and by what, I still
don't know, but maybe I do not need to know.

Since I thought I had to use the version of PostgreSQL installed by libpq, I
did not try to independently install PostgreSQL. Now I think that PostgreSQL
has to be installed and working before you start to install RPostgreSQL, in
order, among other things, to get the right libpq contents or settings.

I spent most of a day trying to figure out how to do what I was thinking of
as a strictly local installation of PostgreSQL, meaning one that did not
connect to the internet via a port and TCP/IP. I now believe that there is
no such thing, at least on a Windows machine, or that doing so would be
complex and difficult, compared to a connection via a machine-internal
TCP/IP connection -- something I did not know existed until the day before
yesterday. 
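A minimal sketch of what such a machine-internal TCP/IP connection might look like from R, assuming a PostgreSQL server is already installed and running on the same machine. The host, port, user, and password below are placeholders rather than values from this thread (5432 is PostgreSQL's default port), and the attempt is wrapped so that a missing package or a stopped server produces a message instead of an error.

```r
# Connection settings for a server on the same machine. All values are
# placeholders (assumptions); adjust them to the local installation.
pg_args <- list(
  dbname   = "postgres",
  host     = "localhost",  # loopback address: traffic stays on this machine
  port     = 5432,         # PostgreSQL's default port
  user     = "postgres",
  password = "change-me"   # hypothetical password
)

# Attempt the connection only if RPostgreSQL is installed, and report a
# failure as a message, since no server may be running.
con <- NULL
if (requireNamespace("RPostgreSQL", quietly = TRUE)) {
  drv <- RPostgreSQL::PostgreSQL()
  con <- tryCatch(
    do.call(DBI::dbConnect, c(list(drv), pg_args)),
    error = function(e) {
      message("Could not connect: ", conditionMessage(e))
      NULL
    }
  )
}

# Clean up if the connection succeeded.
if (!is.null(con)) DBI::dbDisconnect(con)
```

The point of the sketch is that R is never told where PostgreSQL lives on disk; it only needs a host name and a port on which a running server is listening.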

For the last few days I have been trying, so far unsuccessfully, to install
PostgreSQL from the (misnamed) one click installer from enterprisedb.com.
I have figured out that the installation and error logs are written to
stderr, and where my OS puts it. I've gone through the 40+ pages of log.
Although the log contains more than 2 dozen warnings and errors, I now
believe that all of the ones that matter derive in some way from this one:

  Executing batch file 'rad3BBD8.bat'...
  'icacls' is not recognized as an internal or external command, operable
  program or batch file.

. . . and that this means I am having some sort of problem with the Windows
XP/NT access control/permissions system. Unfortunately I know nothing
whatsoever about the Windows permission system. Learning about it is the
next thing on my list. 

I am really quite good at framing feasible research agendas to address
important policy questions on a shoestring. I'm good with regression, with
displaying complicated arguments in sensible graphics. I am new to using
databases on my own behalf (rather than having someone to do it for me), but
I am finding the query side of databases elegant and intuitive. But I am not
a competent IT/sysadmin/tech support person, and I am not going to be one
soon, or maybe ever. So I find it somewhat disheartening that I am spending
so much time being just that. 

Still, I am very grateful for the help and support of this list and the
various Stack Exchange lists, which have made possible such learning as I
have been able to accomplish.  Like the glaciers, my progress is slow but
inexorable. Or so I like to believe. 
 
Warmest regards, andrewH








--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534p4684775.html


[R] Problem connecting to database via RPostgreSQL/RS-DBI: could not connect error

2014-01-31 Thread andrewH

In the description section of the RPostgreSQL package documentation, it
states:

"In order to build and install this package from source, PostgreSQL itself
must be present your system to provide PostgreSQL functionality via its
libraries and header files. . . . On Microsoft Windows system the attached
libpq library source will be used."

I am not sure what "attached" means in this setting. 

After successfully running:
  install.packages('RPostgreSQL')

When I tried to run:
   require('RPostgreSQL')
   drv <- dbDriver("PostgreSQL", fetch.default.rec=1)
   con <- dbConnect(drv, dbname="postgres")

I get the error message:
  Error in postgresqlNewConnection(drv, ...) : 
  RS-DBI driver: (could not connect Administrator@local on dbname "postgres")

I am guessing that this error is because PostgreSQL either does not exist or
has been improperly installed.

Since RPostgreSQL is supposed to work with the particular version of
PostgreSQL that comes with it in libpq, I assume that either
install.packages('RPostgreSQL') does this installation or I have to do it. 
I have not done it, because I cannot find the "attached libpq library". There
is no file or directory called either libpq or PostgreSQL in my \Program
Files\ directory, my \R-3.0.2 directory, my \R-3.0.2\library directory, or
my \R-3.0.2\library\RPostgreSQL directory, nor is such a file or folder in
my working directory. There is no library called libpq on CRAN. There is no
discussion that I could find of how to install it, where it comes from, or
where installation puts it in either the package documentation or on the
Google site for the package, http://code.google.com/p/rpostgresql/   At that
site under the Source tab I did find some files at RPostgreSQL/src/libpq,
but no information on what to do with them.

I tried:
require('libpq')

and got:
Warning message:  In library(package, lib.loc = lib.loc, character.only =
TRUE, logical.return = TRUE,  :
  there is no package called ‘libpq’

I thought maybe I was misunderstanding what or where libpq is, and that I
should be going to the PostgreSQL site and following their installation
instructions without reference to R. But I have carefully gone over the
documentation for dbDriver and dbConnect several times, and neither one
gives me any way (except a name or an IP address for a remote server) to
tell R where the PostgreSQL directory, program, or file are located. That
seems to imply that the local database RPostgreSQL connects to must be in a
location where R puts it. So I am missing something, but I do not know what. 

I am running R 3.0.2 through RStudio on a Windows XP 32-bit machine.

Any help anyone could offer would be greatly appreciated.

Warmest regards, andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-connecting-to-database-via-RPostgreSQL-RS-DBI-could-not-connect-error-tp4684534.html


[R] Diagnostic and helper functions for defective hard-to-import files

2014-01-28 Thread andrewH
Hi Folks!
I have been writing a small set of utilities for dealing with files that are
hard to open correctly for one reason or another, especially because they
are too big for memory, non-rectangular, or contain odd characters or
unexpected codings, or all of these things together. Today it suddenly hit
me that this has probably been done, done better, and upgraded to package
form a dozen times already. There were pointers to a couple functions useful
in this regard in the Core Import/Export document.  But my effort to come up
with search terms that were productive of such packages was unsuccessful. 

I would be grateful if someone would point me toward such a package or
packages if they exist.

Warmest regards, andrewH



Re: [R] Diagnostic and helper functions for defective hard-to-import files

2014-01-28 Thread andrewH
On Jan 28, 2014 at 8:56pm, David Winsemius wrote:

On Jan 28, 2014, at 8:43 PM, andrewH wrote:

 Hi Folks!
 I have been writing a small set of utilities for dealing with files that
 are
 hard to open correctly for one reason or another, especially because they
 are too big for memory, non-rectangular, or contain odd characters or
 unexpected codings, or all of these things together. Today it suddenly hit
 me that this has probably been done, done better, and upgraded to package
 form a dozen times already. There were pointers to a couple functions
 useful
 in this regard in the Core Import/Export document.  But my effort to come
 up
 with search terms that were productive of such packages was unsuccessful.

I don't know of a package to do that. You know the quote from that Russian
author whose name I am forgetting (in Anna Karenina perhaps) about happy
families being all the same but unhappy families being impossible to
classify. I think it applies to datasets as well. There are too many
different dataset pathologies to allow a neat packaging approach.

My approach has been to study the options in read.table very carefully and
if that is insufficient look at either readLines or scan as options. It is
very useful to be able to use `count.fields` with different parameter
settings of quotes and comment.char. Wrapping it in table() can deliver a
very compact, useful result.
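The `count.fields` diagnostic described above can be sketched as follows. The file contents are invented for illustration; the idea is that tabulating the number of fields per line under different `quote` settings quickly shows whether a file is rectangular and which lines are not.

```r
# Write a small, deliberately ragged CSV to a temporary file (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,score",
  "1,alice,90",
  "2,bob",                 # a row with a missing field
  "3,\"smith, jr\",75",    # a quoted field that contains the separator
  "4,o'neil,60"            # an apostrophe, harmless if ' is not a quote char
), tmp)

# Fields per line when only double quotes count as quote characters.
n_dq <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")

# Fields per line with quoting switched off entirely.
n_none <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

# Compact summaries: how many lines have how many fields.
print(table(n_dq))
print(table(n_none))

# Line numbers that do not have the expected three fields.
bad_lines <- which(n_dq != 3)
print(bad_lines)
```

Comparing the two tables shows how much the choice of `quote` matters: the line with the quoted comma has three fields under one setting and four under the other.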

And don't forget to search the Archives if you have a regular but
non-rectangular arrangement.

David Winsemius
Alameda, CA, USA 

Thanks, David! 

You have quickly summarized a set of techniques that it took me a long time
to learn (much of it spent disentangling the truth from various
misconceptions about the data-reading process). I don't think I have very
much to add to your list, but as always, the effectiveness depends on
correct implementation, and I have made a _lot_ of mistakes in trying to
implement these in the past. Moreover, all these things become much more
complicated if the file is too big to just read into a data frame. I am
working with Census records right now, and my primary data file is a 14 gig
csv that had me tearing my hair out trying to read it and pull out the
variables I have needed at any given moment. 

I finally did get it read and the right subset extracted, but it was a
pretty empirical process - I would just keep trying things that didn't work
until I found something that did, often not quite understanding why my
previous efforts had failed.  I know that if I have to do this again six
months from now I will have no idea how I did it. So I wanted to reduce the
things that worked to functions and set up a sort of decision tree that I
could work through to find and correct at least the more common problems.
But I was hoping -- am still hoping, actually -- to find that someone else
has already done this so I could get back to my real work. It seems like the
sort of thing that could easily be buried in the 100+ pages of documentation
of one of the big utility packages like Hmisc, MASS or car. 
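One way to reduce "read a too-big csv and pull out a few variables" to a reusable function is to read through an open connection in fixed-size chunks. This is only a sketch in base R: the function name `read_columns_chunked` and its arguments are invented here, it assumes no quoted field contains an embedded newline, and column types are guessed per chunk, so a column that looks numeric in one chunk and character in another would be coerced when the pieces are combined.

```r
# Read a delimited file chunk by chunk, keeping only the requested columns.
# `keep` is a character vector of column names (hypothetical helper).
read_columns_chunked <- function(path, keep, chunk_rows = 10000, sep = ",") {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line supplies the column names.
  header <- scan(text = readLines(con, n = 1), what = "", sep = sep,
                 quiet = TRUE)

  pieces <- list()
  repeat {
    # readLines continues from the connection's current position.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    chunk <- read.table(text = lines, header = FALSE, sep = sep,
                        col.names = header, quote = "\"",
                        comment.char = "", stringsAsFactors = FALSE)
    # Only the wanted columns are retained, so memory use stays small.
    pieces[[length(pieces) + 1]] <- chunk[, keep, drop = FALSE]
  }
  do.call(rbind, pieces)
}

# Usage on a small made-up file, read two rows at a time.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,x,10", "2,y,20", "3,z,30", "4,w,40", "5,v,50"), tmp)
res <- read_columns_chunked(tmp, keep = c("a", "c"), chunk_rows = 2)
print(res)
```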

I have often wished there was a data manipulation and import/export task
view, with a purview to cover things like what I am talking about here, the
contents of Phil Spector's book, and packages like Hadley Wickham's plyr. 

Warmest regards, andrewH





--
View this message in context: 
http://r.789695.n4.nabble.com/Diagnostic-and-helper-functions-for-defective-hard-to-import-files-tp4684357p4684364.html


Re: [R] Assigning default function arguments to themselves: Why?

2013-12-29 Thread andrewH
Dear Bill--

I have seen it most often in functions that are defined or used inside of
other functions, and need an argument from the calling function.  So I have,
purely as a matter of imitation, taken to doing it when I am writing a
function that wants an argument of the calling function passed to it
unchanged, because that is how I saw it used. So for instance, in
read.table(), scan() is called several times with reflexive argument
assignments that include:
file = file
what = what
sep = sep
quote = quote
comment.char = comment.char
allowEscapes = allowEscapes
encoding = encoding

Is this what you mean by an example that 'works'? I am sort of foggy on the
shade of meaning conveyed by the single quotes. If not, let me know what
kind of example you want, and I'll try and find it.
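The pattern in the list above can be sketched with two toy functions. `inner` and `outer_fn` are invented stand-ins for scan() and read.table(), not their real definitions. In a call such as `sep = sep`, the left-hand name selects a formal argument of the function being called, and the right-hand name is an ordinary variable in the function making the call.

```r
# Stand-in for scan(): several arguments with their own defaults.
inner <- function(file, sep = "", quote = "\"'") {
  paste0("file=", file, " sep=[", sep, "] quote=[", quote, "]")
}

# Stand-in for read.table(): forwards its own arguments by name.
# Left of `=` is inner()'s formal; right of `=` is outer_fn()'s variable.
outer_fn <- function(file, sep = ",", quote = "\"") {
  inner(file = file, sep = sep, quote = quote)
}

# Without forwarding, inner() would fall back on its own defaults.
forwarded <- outer_fn("data.csv")
defaults  <- inner("data.csv")
print(forwarded)
print(defaults)
```

Naming each argument also protects against positional mix-ups if the called function's argument order differs from the caller's.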



--
View this message in context: 
http://r.789695.n4.nabble.com/Assigning-default-function-arguments-to-themselves-Why-tp4682294p4682817.html


Re: [R] What purpose is served by reflexive function assignments?

2013-12-29 Thread andrewH
Dear Duncan --
I am terribly sorry. I had a browser crash, and when I reopened it I found a
tab with a Nabble composition box containing an unposted version of my
question. So I thought I had never hit the send button, so I edited it a
bit and sent it off. I should have checked first. My apologies.

andrewH





--
View this message in context: 
http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682818.html


Re: [R] What purpose is served by reflexive function assignments?

2013-12-29 Thread andrewH
Dear David--

Thanks so much for your helpful reply!

 David Winsemius wrote:
The LHS X becomes a name, the RHS X will be looked up in the calling
environment and fails if no value is positionally matched and then no X is
found (at the time of the function definition). 

Does X really have to exist when the function is defined? I thought it was
enough if it existed in the environment of the calling function, or
somewhere up the environment chain of the calling function. If this is not
true, then that means it matters a lot whether you write a function inside
another function or just call it in that function.  Suppose a function with
a reflexive assignment X=X is defined in the global environment but called
inside another function, and X has a different value in those two places.
Will it look first in the global environment and only then in the calling
environment? And is this different from the behavior without the reflexive
assignment?

I should not bother you with those questions. I should just run it both ways
and see what happens.
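The experiment of running it both ways can be sketched like this. All names are invented for illustration. The sketch relies on R's lexical scoping: a free variable in a function body, or in a default expression, is looked up starting from the environment where the function was defined, not from the function that calls it.

```r
X <- "global"

# Defined at this level, so the free variable X is found here,
# regardless of which function calls show_free().
show_free <- function() X

# A default expression is evaluated in the function's own frame, whose
# parent is the defining environment, so it also sees this level's X.
show_default <- function(Y = X) Y

caller <- function() {
  X <- "caller"
  c(free    = show_free(),
    default = show_default(),
    passed  = show_default(Y = X),  # supplied argument: evaluated in caller
    nested  = (function() X)())     # defined inside caller, sees caller's X
}

result <- caller()
print(result)
```

So it does matter whether a function is written inside another function or merely called there: only the nested definition and the explicitly supplied argument pick up the caller's value.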

If you use `X <- value` in the argument list, then what is returned is only
the value and the name `X` may be lost. Or in the case of data.frame morphed
into a strange name: 

[example omitted]
I am not sure that I am understanding you correctly here. Are you saying
that assignment using the = retains the name (and other attributes? which
ones?) of the RHS, while <- does not? 




--
View this message in context: 
http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682819.html


Re: [R] What purpose is served by reflexive function assignments?

2013-12-29 Thread andrewH

Dear Peter--

This is a truly wonderful explanation. It makes many things clear that were
completely mysterious to me. For one thing, I realize that, for functions
called inside the definitions of other functions, I have been confusing
function definitions with function calls -- as if the called function were
also being defined. So I have seen a lot of what I was calling reflexive
assignments _inside_ of function definitions, but not as a _part of_
function definitions -- rather they are a part of the calls to other,
already-defined functions that just happen to take place inside of function
definitions.  

Let me sum up a few things I think I have learned, to make sure I am not
merely hallucinating an improved understanding.

1. Outside of function definitions and calls, = and <- are pretty similar in
their effect.
2. Inside of the parentheses of a function call, <- assigns the RHS to the
variable on the LHS in the enclosing environment, so the value is picked up
when the call is executed -- but is also a permanent change in the variable
of that name in the enclosing environment. (This seems like an exception to
the general no-side-effects rule, yes?)
3. Inside parentheses of a function call, = assigns the RHS to the LHS, but
only in the environment of that function -- more like it was in the function
body.
4. Inside the parentheses of a function definition, you can not do a <-
assignment at all, and = has a pretty different secondary meaning, a sort
of conditional assignment, along the lines of "if you can not match the
argument before this positionally, use the value of the RHS."
5. The names of formals in the function definition do not matter outside the
function. Only their position (or an = assignment) matters. You can not get
a function to recognize a variable in its surrounding environment because it
has the same name as the name of a formal in the function definition. 
Conversely, inside the function, only the formal name matters. If
f <- function(Y){X}, f(X) still gets you an "argument X is missing" error.
6. In a call, f(x=x) is different from f(x) if and only if x has a default
value different from the value of x in the calling environment.
7. I have learned by experiment that a reflexive assignment with = in a
function definition does not assign the value of x in the calling
environment as the default, though I do not know why. In fact, I have not
been able to make X=X do anything useful (and different from plain X) inside
the parentheses of a function definition, unless I am trying to generate
strange recursive default error warnings.

So the simple rule I wanted with respect to function definitions is "just
say no."

Is that more or less right?

Many thanks!   --andrewH




plangfelder wrote
 On Sat, Dec 28, 2013 at 7:27 PM, Andrew Hoerner <ahoerner@...> wrote:
 Let us suppose that we have a function foo(X) which is called inside
 another function, bar(). Suppose, moreover, that the name X has been
 assigned a value when foo is called:

 X <- 2
 bar(X=X){
 foo(X)
 }

 I have noticed that many functions contain arguments with defaults of the
 form X=X. Call this reflexive assignment of arguments.
 
 Your example code makes little sense, it throws an error even before
 reaching foo():
 
 X <- 2
 bar(X=X){
 Error: unexpected '{' in bar(X=X){
 foo(X)
 Error: could not find function foo
 }
 Error: unexpected '}' in }
 
 
 What you may have in mind is something like
 
 bar = function(X)
 {
   foo(X)
 }
 
 X <- 2
 bar(X=X)
 
 Note that bar(X=4) is different from bar(X <- 4), as seen here:
 
 # Define a trivial function
 bar = function(X) {X+2}

 X = 0
 bar(X=2)
 [1] 4
 # Here only the formal argument X of function bar was set to 2; the
 global variable X was left untouched:
 X
 [1] 0
 # This assigns the value 4 to the global variable X and uses that
 value as the value for the first formal argument of bar():
 bar(X <- 4)
 [1] 6
 # Note that X changed in the global environment
 X
 [1] 4
 
 What you call reflexive assignment X=X is not really: the left hand
 side is the formal argument of bar(), the right hand side is the
 variable X in the calling environment of bar() (in this case global
 environment).
 
 Oh yes, and it has absolutely nothing to do with defaults. If you use
 my example above, the default for the argument X is 2, but doing
 X=0
 bar(X=X)
 
 will call the function with argument X=0, not X=2.
 
 When there is only one argument, saying X=X does not make much sense,
 but when there are many arguments, say
 
 bar = function(X=0, Y=0, Z=0)
 
 and you only want to set the argument Z to a value you call Z in the
 calling function, saying
 
 bar(Z=Z)
 
 makes perfect sense and is very different from saying
 
 bar(Z)
 
 which would set the argument X to value Z, and leave argument Z at the
 default.
 
 Hope this helps.
 
 Peter
 

Re: [R] What purpose is served by reflexive function assignments?

2013-12-29 Thread andrewH
Dear Ista--
Peter's post has already persuaded me that my original question was based on
several misunderstandings and so difficult if not impossible to follow --
though he did a remarkable job of figuring out where I was going astray and
what examples might set me right. 

But I will post the results of two of my experiments that I still find
puzzling.

This generates a recursive default error in the cat function. I do not see
why it does not print 5:
X <- 2
gg <- function(X=X){cat("gg: ", X)}
ss <- function(X){
X <- 5
gg()
}
ss()

And this generates an "'x' is missing" error in x-y. I expected it to
return the number -1:
x <- 1
y <- 2
foo <- function(x=x,y=y){x-y}
foo() 
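A sketch of why both experiments fail, and of two possible ways around it. The function names other than `gg` and `ss` are invented. A default such as `X = X` is evaluated inside the function's own frame, where the only X in sight is the argument whose value is being computed, so the lookup refers back to itself. Giving the formal a different name lets ordinary lexical lookup find the outer X; reading the caller's X is also possible, but it has to be requested explicitly.

```r
X <- 2

# Self-referential default: forcing X evaluates the default expression `X`
# in gg's own frame, which finds the promise currently being evaluated.
gg <- function(X = X) X
res_recursive <- tryCatch(gg(), error = function(e) conditionMessage(e))
print(res_recursive)

# A differently named formal avoids the self-reference; the default is
# found by lexical scoping in the environment where the function was defined.
gg_lexical <- function(value = X) value

# Looking in the caller's frame must be asked for explicitly.
gg_caller <- function(value = get("X", envir = parent.frame())) value

ss <- function() {
  X <- 5
  c(lexical = gg_lexical(), caller = gg_caller())
}
res_ss <- ss()
print(res_ss)
```

Neither variant is offered as the recommended style; the simpler habit is to pass the value in the call, as in gg(X = X) written inside ss().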

Thanks so much for your time and attention!
andrewH


Ista Zahn wrote
 On Sat, Dec 28, 2013 at 10:27 PM, Andrew Hoerner <ahoerner@...> wrote:
 Let us suppose that we have a function foo(X) which is called inside
 another function, bar(). Suppose, moreover, that the name X has been
 assigned a value when foo is called:

 X <- 2
 bar(X=X){
 foo(X)
 }

 I have noticed that many functions contain arguments with defaults of the
 form X=X.
 
 An example would be really helpful here.
 
 Call this reflexive assignment of arguments.
 
 Why call this anything special? All this does is set the default value
 of the X argument. I'm not sure what makes this reflexive, or why it
 needs a special descriptive term.
 
 How is foo(X=X)
 different from foo(X)? Isn't the environment from which X is located the
 
 foo(X) is hardcoded, foo(X = X) just sets a default.
 
 parent environment of foo() in either case? Or if it looks first in the
 environment inside of foo, will it not immediately pop up to the parent
 environment if it is not found in foo? Are reflexive assignments just to
 keep X from being positionally assigned accidentally, or are they doing
 something deeper? Moreover, this is the only place I have seen people
 consistently using an equals sign in place of the usual <-, and I am
 confident that there is some subtle difference in how the two assignment
 operators work, perhaps beyond the ken of lesser mortals like myself,
 that
 explains why the = is preferred in this particular application.
 
 Again, some examples would really help here.
 

 Actually, although I would like to hear the deep answer, which I am sure
 has something to do with scoping, as everything really confusing in R
 does,
 my real question is, is there some rule of thumb by which one could
 decide
 whether or not to do a reflexive assignment in a function definition and
 be
 right most of the time?
 
 I'm still not even sure what reflexive assignment means. Can you
 clarify, preferably with some examples.
 

 Lately I have gotten several "Error: Promise is already under evaluation"
 messages, and my current rule of thumb for dealing with this is to add
 reflexive assignment to the variable if it is missing and take it out if
 it
 is present. This seems to work, but it makes me feel unintelligent. Is
 there a better rule? I would be most grateful for anyone who could shed
 light on the subject.
 
 Perhaps someone can, but you will certainly make their job easier if
 you provide a concrete example that produces this error.
 
 Best,
 Ista
 

 Sincerely, andrewH

 --
 J. Andrew Hoerner
 Director, Sustainable Economics Program
 Redefining Progress
 (510) 507-4820






--
View this message in context: 
http://r.789695.n4.nabble.com/What-purpose-is-served-by-reflexive-function-assignments-tp4682794p4682827.html


Re: [R] Assigning default function arguments to themselves: Why?

2013-12-29 Thread andrewH
Dear Bill--

I have figured out  that my original question and my most recent response to
you were largely nonsensical bits of idiocy.  Please do not trouble yourself
with them further. But I do thank you most sincerely for your time and
attention.

andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Assigning-default-function-arguments-to-themselves-Why-tp4682294p4682828.html


Re: [R] The Stoppa distribution

2013-12-15 Thread andrewH
Thanks enormously, Bill!  I'll run with this for a while, and let you know
how it works for me.

 Yours,  andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/The-Stoppa-distribution-tp4682171p4682254.html


Re: [R] How can I find nonstandard or control characters in a large file?

2013-12-15 Thread andrewH
Thanks, Earl. Your utility ran like a charm, and confirmed that my effort
to adapt Enrico's code to this purpose had not gone astray, which is to
say, I found no funky characters. Your help is greatly appreciated.
Sincerely, andrewH


On Tue, Dec 10, 2013 at 7:35 AM, Earl F Glynn [via R] 
ml-node+s789695n4681952...@n4.nabble.com wrote:

 andrewH wrote:

  However, my suspicion is that there are some funky characters, either
  control characters or characters with some non-standard encoding,
 somewhere
  in this 14 gig file. Moreover, I am concerned that these characters may
  cause me trouble down the road even if I use a different approach to
 getting
  columns out of the file.

 This is not an R solution, but here's a Windows utility I wrote to
 produce a table of frequency counts for all hex characters x00 to xFF in
 a file.

 http://www.efg2.com/Lab/OtherProjects/CharCount.ZIP

 Normally, you'll want to scrutinize anything below x20 or above x7F,
 since ASCII printable characters are in the range x20 to x7E. You can
 see how many tab (x09) characters are in the file, and whether the line
 endings are from Linux (x0A) or Windows (paired x0A and x0D).


 The ZIP includes Delphi source code, but provides a Windows executable.
   I made a change several months ago to allow drag-and-drop, so you can
 just drop the file on the application to have the characters counted.
 Just run the EXE after unzipping.  No installation is needed.

 Once you find problem characters in the file, you can read the file as
 character data and use sub/gsub or other tools to remove or alter
 problem characters.

 efg
 Earl F Glynn
 UMKC School of Medicine
 Center for Health Insights





-- 
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820




--
View this message in context: 
http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896p4682257.html


[R] How can I find nonstandard or control characters in a large file?

2013-12-09 Thread andrewH
I have a humongous csv file containing census data, far too big to read into
RAM. I have been trying to extract individual columns from this file using
the colbycol package. This works for certain subsets of the columns, but not
for others. I have not yet been able to precisely identify the problem
columns, as there are 731 columns and running colbycol on the file on my old
slow machine takes about 6 hours. 

However, my suspicion is that there are some funky characters, either
control characters or characters with some non-standard encoding, somewhere
in this 14 gig file. Moreover, I am concerned that these characters may
cause me trouble down the road even if I use a different approach to getting
columns out of the file.

Is there an R utility that will search through my file without trying to read it
all into memory at one time and find non-standard characters or misplaced
(non-end-of-line) control characters? Or some R code to the same end?  Even
if the real problem ultimately proves to be different, it would be helpful
to eliminate this possibility. And this is also something I would routinely
run on files from external sources if I had it. 
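
One possible approach, offered only as a sketch: read the file through a connection a chunk of lines at a time, and flag any line containing a byte that is neither a tab nor printable ASCII. The function name find_odd_lines, the chunk size, and the tiny test file are all made up for illustration.

```r
# Hypothetical helper (not from any package): report the line numbers that
# contain control characters other than tab, or any non-ASCII byte.
find_odd_lines <- function(path, chunk_lines = 10000L) {
  con <- file(path, open = "rb")        # binary mode: no re-encoding on read
  on.exit(close(con))
  hits <- numeric(0)
  offset <- 0                           # number of lines consumed so far
  repeat {
    chunk <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(chunk) == 0L) break
    # Anything that is not a tab or in the range space..tilde is suspect.
    bad <- grepl("[^\t -~]", chunk, perl = TRUE, useBytes = TRUE)
    hits <- c(hits, offset + which(bad))
    offset <- offset + length(chunk)
  }
  hits
}

# Tiny made-up file: line 2 holds a control byte, line 3 a Latin-1 byte.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b,c\n"),
           as.raw(c(0x64, 0x01, 0x65, 0x0a)),
           as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)),
           charToRaw("ok\tfine\n")), tmp)
odd <- find_odd_lines(tmp, chunk_lines = 2L)
print(odd)
```

Only one chunk is held in memory at a time, so the size of the file should not matter, though a 14 gig file will still take a while to scan.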

I am working in a Windows XP environment, in case that makes a difference.

Any help anyone could offer would be greatly appreciated.

Sincerely, andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896.html
Sent from the R help mailing list archive at Nabble.com.



[R] filehash error in the colbycol method for as.data.frame from a large object

2013-11-23 Thread andrewH
Dear Folks-- 
I have a 14 gig .csv file with 731 columns. I have read it into a colbycol
object (which took overnight – about 16 hours) using the code below, which
produced no warnings or error messages. The object, CPS62_12, is 49 gig.
After the reading, summary() produced the output below and colnames()
successfully returned the names of all 731 columns.


> CPS62_12 <- cbc.read.table(
+ "C:\\R_PROJ\\INEQ_TRENDS\\TESTS\\monofile_ALLVARS\\cps_00078.csv",
+ header = T, sep = ",")
> summary(CPS62_12)
Object of class colbycol with 8093281 rows and 731 columns.
Data for the object is stored at
C:\DOCUME~1\ADMINI~2\LOCALS~1\Temp\RtmpMj3LRP\dir1a6d82a1df37.
> nrow(CPS62_12)
[1] 8093281
> ncol(CPS62_12)
[1] 731
> colnames(CPS62_12)
  [1] "RECTYPE" "YEAR"    "SERIAL"  "MISH"  etc.

I then ran as.data.frame() (code below) and got the following error and
warning message:

> income_HH_CPS <- as.data.frame(CPS62_12,
+ c("YEAR", "STATEFIP", "RECTYPE", "SERIAL", "HWTSUPP", "HHINCOME", "NUMPREC"))
Error in readSingleKey(con, map, key) : 
  unable to obtain value for key 'RECTYPE'
In addition: Warning message:
In readKeyMap(filecon) : NAs introduced by coercion

I tried the command on a number of column name combinations and subsequently
always got the error message without the warning. The error is always on
RECTYPE, which is the name of the first column in the csv file. 

I am as yet not able to reproduce this error on a smaller object. I copied
the first 10 lines of my file into an object by using a connection with
readLines. I evaluated the object in the console, and passed the result into
notepad, and saved it. Then I manually sliced off all but the first 15
variables. The resulting file sailed through the code above and produced a
data frame faultlessly. This undercut my leading theory, which was that the
backslash-escaped double quotes (\") that bracketed the column names were
causing the problem. 

I tried running cbc.get.col on the second variable in the file, YEAR. These
two commands:

yearCPS <- cbc.get.col(CPS62_12, "YEAR")
yearCPS <- cbc.get.col(CPS62_12, 2)

both resulted in the following error message:

Error in readSingleKey(con, map, key) : 
  unable to obtain value for key 'YEAR'

Note that numerical indexing still returned an error on the variable name,
YEAR. I got the same result for several other variables, each returning its
own name.

I tracked the error message back to the following function in the filehash
package:

readSingleKey <- function(con, map, key) {
    start <- map[[key]]
    if(is.null(start))
        stop(gettextf("unable to obtain value for key '%s'", key))
    seek(con, start, rw = "read")
    unserialize(con)
}

Now I am at a loss. I see that the element “key” of the list “map” has the
value NULL,  that any call to as.data.frame uses RECTYPE as the key, and
that any call to cbc.get.col() uses the passed variable name as a key, even
those that only pass a number.  But I don’t know much of anything about file
hashing, and I have run out of ideas.
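
The failure mode can at least be reproduced in miniature. This is a sketch with a toy map, not the real filehash structure: the list, its byte offsets, and the helper name lookup are all made up. It only shows that a key absent from the map produces exactly this message.

```r
# Toy stand-in for the key map; the real map is built by filehash from the
# database file. The byte offsets here are invented.
map <- list(YEAR = 128, SERIAL = 4096)

# Same lookup logic as the first three lines of readSingleKey().
lookup <- function(map, key) {
  start <- map[[key]]
  if (is.null(start))
    stop(gettextf("unable to obtain value for key '%s'", key))
  start
}

found <- lookup(map, "YEAR")            # key present: returns its offset
res <- tryCatch(lookup(map, "RECTYPE"), # key absent: same error as above
                error = function(e) conditionMessage(e))
print(res)
```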

Can anyone tell me what I am doing wrong, or whether there is a particular
problem with my file that is likely to be causing this problem, or what my
next diagnostic step should be? Please be aware that I can only do things I
can run on 3 gig of RAM.

I am running R under RStudio 0.97.551, on a Windows XP machine with Service
Pack 3.

Sincerely, andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/filehash-error-in-the-colbycol-method-for-as-data-frame-from-a-large-object-tp4681052.html
Sent from the R help mailing list archive at Nabble.com.



[R] Learning the R way – A Wish

2013-03-04 Thread andrewH
There is something that I wish I had that I think would help me a lot to be a
better R programmer, that I think would probably help many others as well.  
I put the wish out there in the hopes that someone might think it was worth
doing at some point.

I wish I had the code of some substantial, widely used package – lm, say –
heavily annotated and explained at roughly the level of R knowledge of
someone who has completed an intro statistics course using R and picked up
some R along the way.  The idea is that you would say what the various
blocks of code are doing, why the authors chose to do it this way rather
than some other way, point out coding techniques that save time or memory or
prevent errors relative to alternatives, and generally, to explain what it
does and point out and explain as many of the smarter features as possible. 
Ideally, this would include a description at least at the conceptual level
if not at the code level of the major C functions that the package calls, so
that you understand at least what is happening at that level, if not the
nitty-gritty details of coding.

I imagine this as a piece of annotated code, but maybe it could be a video
of someone, or some couple of people, scrolling through the code and talking
about it. Or maybe something more like a wiki page, with various people
contributing explanations for different lines, sections, and practices.

I am learning R on my own from books and the internet, and I think I would
learn a lot from a chatty line-by-line description of some substantial block
of code by someone who really knows what he or she is doing – perhaps with a
little feedback from some people who are new about where they get lost in
the description.

There are a couple of particular things that I personally would hope to get
out of this.  First, there are lots of instances of good coding practice
that I think most people pick up from other programmers or by having
individual bits of code explained to them that are pretty hard to get from
books and help files.  I think this might be a good way to get at them.

Second, there are a whole bunch of functions in R that I call
meta-programming functions – don’t know if they have a more proper name.
These are things that are intended primarily to act on R language objects or
to control how R objects are evaluated. They include functions like call,
match.call, parse and deparse, deparen, get, envir, substitute, eval, etc.
Although I have read the individual documentation for many of these commands,
and even used most of them, I don’t think I have any fluency with them, or
understand well how and when to code with them.  I think reading a
good-sized hunk of code that uses these functions to do a lot of things that
packages often need to do in the best-practice or standard R way, together
with comments that describe and explain them would help a lot with that.
(There is a good smaller-scale example of this in Friedrich Leisch’s
tutorial on creating R packages).
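
To make the kind of thing I mean concrete, here is a small sketch that uses a few of these functions together. The function describe_call is made up for illustration and is not taken from any package.

```r
# Hypothetical function showing a few language-manipulating tools at once.
describe_call <- function(x, ...) {
  list(
    call     = match.call(),            # the call as the user wrote it
    arg_text = deparse(substitute(x)),  # the unevaluated argument, as text
    value    = x                        # the evaluated argument
  )
}

info <- describe_call(1 + 2)
info$arg_text                           # the text of the argument
info$value                              # its value

# Build a language object from text, inspect it, then evaluate it.
e <- parse(text = "sqrt(16) + 1")[[1]]
class(e)
eval(e)
```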

These are things I think I probably share with many others. I actually have
an ulterior motive for suggesting lm in particular that is more peculiar to
me, though not unique I am sure. I would like to understand how formulas
work well enough to use them in my own functions. I do not think there is
any way to get that from the help documentation. I have been working on a
piece of code that I suspect is reinventing, but in an awkward and kludgey
way, a piece of the functionality of formulas. So far as I have been able to
gather, the only place they are really explained in detail is in chapters 2
& 3 of the White Book, “Statistical Models in S”. Unfortunately, I do not
have ready access to a major research library and I have way, way outspent
my book budget. Someday I’ll probably buy a copy, but for the time being, I
am stuck without it. So it would be great to have a piece of code that uses
them explained in detail.
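
For what it is worth, here is a minimal sketch of a user-written function that accepts a formula, using only base and stats functions. The name group_means is made up, and this is only my guess at the general pattern, not how lm itself is written.

```r
# Hypothetical function that accepts a formula the way modelling functions do.
group_means <- function(formula, data) {
  mf <- model.frame(formula, data)      # evaluates the variables in `data`
  response <- model.response(mf)        # left-hand side
  groups <- mf[[2]]                     # first right-hand-side variable
  tapply(response, groups, mean)
}

f <- mpg ~ cyl
all.vars(f)                             # variable names used in the formula
attr(terms(f), "term.labels")           # right-hand-side terms
gm <- group_means(f, mtcars)
gm
```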

Warmest regards to all,  andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Learning-the-R-way-A-Wish-tp4660287.html
Sent from the R help mailing list archive at Nabble.com.



[R] Sparse dataframes?

2013-01-15 Thread andrewH
Dear Folks--
Is there a data frame analog to sparse matrices? I am working with a panel
data set that has a large number of variables that are redefined repeatedly
or exist for only a few years (out of 48).  In my current structure, where
variables are columns and rows are years, more than 90 percent of the cells
and more than 3/4 of the total size of my file are NAs.  

I am wondering if there is an alternate file specification currently
available that still allows numeric, character and factor data to be stored. 
Besides just using a database. 

A pointer in the right direction (or a solid no if that is the truth)
would be greatly appreciated.
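
The nearest thing I can sketch myself is a "long" layout that keeps one row per non-missing cell. This is a workaround rather than a true sparse data frame; the data below are made up, and storing every value as character is a simplification so that numeric and text variables can share one column.

```r
# Made-up wide panel: rows are years, most cells are NA.
wide <- data.frame(year    = 2001:2004,
                   inc_old = c(10, 11, NA, NA),
                   inc_new = c(NA, NA, 12, 13),
                   region  = c(NA, "west", NA, NA),
                   stringsAsFactors = FALSE)

# Long ("triplet") form: one row per non-missing cell.
vars <- setdiff(names(wide), "year")
long <- do.call(rbind, lapply(vars, function(v) {
  keep <- !is.na(wide[[v]])
  data.frame(year = wide$year[keep], variable = v,
             value = as.character(wide[[v]][keep]),
             stringsAsFactors = FALSE)
}))
long
nrow(long)
```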

Sincerely, andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Sparse-dataframes-tp4655614.html
Sent from the R help mailing list archive at Nabble.com.



[R] Using factor variables with overlapping categories

2012-11-27 Thread andrewH
Dear folks –

I have a question, though it is more of a logic- or a good
practices-question than a programming question per se. I am working with
data from the American Community Survey summary file.  It is mainly
categorical count data. Currently I am working with about 40 tables covering
about 35 variables, mainly in two-way tables, with some 3-way and a handful
of four-way tables. I am going to be doing a lot of analysis on these
tables, and hope to make them available in zipped format to other R users. 
Right now I am keeping this data in single-state data frames, but I will
probably have to shift over to a database if I add many more variables.

Here is my problem: of my 35 variables, five of them are different versions
of age. Different tables cover different age ranges, and have different
levels of disaggregation for the age ranges they cover.

Currently I just have a factor for each with the cut-points in the labels.
But I feel uncomfortable with this. It seems to throw away a lot of
information. There is a “natural” mapping from the different age ranges to
one another, at least within universes (e.g. individuals vs. heads of
household), and my current approach does not encode that mapping in any way
that R can notice (unless I write special functions that read the labels) 
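
As a sketch of what encoding the mapping might look like (the labels, breakpoints, and counts here are made up, not the actual ACS categories), the cut-points can be pulled out of the labels once and then used to derive a coarser grouping:

```r
# Made-up labels for a fine age factor; the lower bound of each bin is
# parsed out once and kept as a number.
fine_labels <- c("0-4", "5-17", "18-24", "25-34", "35-64", "65+")
lower <- as.numeric(sub("[-+].*$", "", fine_labels))

# A coarser scheme whose breakpoints are a subset of the fine ones.
coarse_breaks <- c(0, 18, 65, Inf)
coarse <- cut(lower, breaks = coarse_breaks, right = FALSE,
              labels = c("0-17", "18-64", "65+"))
mapping <- data.frame(fine = fine_labels, coarse = coarse)
mapping

# Collapse counts recorded on the fine scale to the coarse scale.
counts <- c(5, 12, 8, 10, 30, 9)
agg <- tapply(counts, mapping$coarse, sum)
agg
```

This only works when the coarse breakpoints are a subset of the fine ones, which is exactly what fails for my fifth age variable.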

One of the first things I am doing with this data is using all the
cross-tabs to produce some basic estimates of higher-dimensional tabulations
– some 10-way tables covering age, race, sex, rent/own, income, etc.
that are consistent with all the lower-dimensional margins, using a
multi-dimensional analogue of the RAS balancing (biproportional matrix
balancing) algorithm often used to update Leontief input-output tables. 
Right now the approach I am using is to sum the age variables into four
categories that let me use four of my five age variables, and throw the fifth
(which has inconsistent breakpoints and is used in only one table) away. But
this seems wasteful to me – not only of one table, but of a lot of
information on finer age sub-structure which is shared by two or more
tables. 

I am guessing that this is a fairly common problem in dealing with large
data sets of count objects. Is there a “standard” approach to it, or a set
of commonly used approaches, that anyone could suggest or point me to? I’d
be happy with either coding suggestions or pointers to the methodology
literature if there is one.

Any help or suggestions would be greatly appreciated. Thanks! 

andrewH







--
View this message in context: 
http://r.789695.n4.nabble.com/Using-factor-variables-with-overlapping-categories-tp4651054.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can you turn a string into a (working) symbol?

2012-11-27 Thread andrewH
Dear Michael – 
This is _very_ interesting and I want to play around with the functions you
suggest. I had no idea it was so easy to define assignment operators.

However, one question: even after reading the “get” documentation and doing
a bunch of mousing around for the expressions “pos” and “the search path”, I
am not sure what function the numeral 1 in these expressions serves. Why do I
want to look in the global environment rather than the current environment?
I also can not find anything that explains what the default “pos = -1” does. 
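
Here is a small experiment that seems to show the difference, assuming I have read the documentation correctly: the default pos = -1 starts the search in the environment get() is called from, while pos = 1 starts in the global environment. The names x_demo and f are made up.

```r
# Put a value in the global environment explicitly (position 1 on the
# search path).
assign("x_demo", "global value", envir = globalenv())

f <- function() {
  x_demo <- "local value"
  c(default = get("x_demo"),            # pos = -1: starts in f's own frame
    global  = get("x_demo", pos = 1))   # starts in the global environment
}
res <- f()
res
```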

Thanks for responding!

andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4651069.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can you turn a string into a (working) symbol?

2012-11-27 Thread andrewH
Dear Greg—

You mean FAQ 7.21, not 7.22, correct? Though 7.12 also seems relevant.
Though I would say I was asking about turning a string into an expression
rather than a variable. At any rate, thanks for the pointer. I am sure I would
benefit from rereading the FAQ on a monthly basis, until I actually know
most of what is in it.

As to your question about my question, I’ve wanted to do this exact thing
several times in different contexts. However, you are quite correct that I
am struggling with this problem in a particular context.  I have a large,
multi-dimensional object containing count data. Currently this object is
implemented as a 26 dimensional (and growing) array with two to thirteen
dimnames per dimension, though I am thinking of switching it to a data frame
with dimensions as factors and dimname-equivalent factor levels.

I need to take a lot of complicated partitions of this object, mainly,
though not always, summing to the entire object. Most of the partitions are
subsets of --

– OK, now I have to digress to address a terminological uncertainty. Think
of a 4X4X4 cube. It has three dimensions, and each dimension has four of
what?  I’m going to call them levels right now, though I don’t think that is
right -- it would be confusing if there were factors in the picture. Also,
the dimnames do not name the dimensions, but the thing I am calling levels,
which is also confusing. --

Anyway, most of the partitions consist of two to four dimensions out of 24,
but sometimes with some levels omitted or summed, and occasionally the
partitions are much more complicated (to deal with censored data,
mainly). I have to use each partition multiple times, doing a very different
thing each time (and then repeat the whole set many times). The next 4
paragraphs describe what I am actually doing with the partitions, but you
can skip over them and cut to the chase if you are not so interested.

I am summing over the dimensions in each partition, dividing a table of
“forcing totals” for that partition by those sums (element by element), and
then taking the resulting ratios and multiplying each of the terms in the
original, non-summed object by the corresponding ratio. 

This is easiest to understand by analogy to the two-dimensional case. You
take a vector of pre-determined row “forcing totals” and divide it element by
element by the row sums, to get a vector of forcing ratios. Then
you multiply each row by the corresponding forcing ratio, so that the row
sum will then match the forcing total. Then you do the same thing with the
columns. Repeat, alternating row and columns, to convergence. Each column
has a corresponding column forcing total, and each row has a corresponding
row forcing total. The elements of the matrix have two partitions that we
use, one into rows, and the other into columns.  This is sometimes called
RAS balancing, or biproportional matrix adjustment. It is an algorithm that
is used a lot to update big matrices in national income accounting and
input-output analysis.  
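
The two-dimensional case can be sketched in a few lines. This is a minimal illustration with made-up data, not my production code; the function name ras, the tolerance, and the iteration cap are arbitrary choices.

```r
# Minimal two-dimensional RAS sketch; names and data are made up.
ras <- function(m, row_tot, col_tot, tol = 1e-10, max_iter = 1000L) {
  for (i in seq_len(max_iter)) {
    m <- m * (row_tot / rowSums(m))              # scale each row
    m <- sweep(m, 2, col_tot / colSums(m), "*")  # scale each column
    if (max(abs(rowSums(m) - row_tot)) < tol) break
  }
  m
}

m0 <- matrix(c(1, 2, 3, 4), nrow = 2)
balanced <- ras(m0, row_tot = c(40, 60), col_tot = c(30, 70))
rowSums(balanced)
colSums(balanced)
```

The row and column forcing totals have to add up to the same grand total (100 here), otherwise there is nothing for the iteration to converge to.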

What I am doing is the same, but I have forcing totals in two to four
dimensional tables instead of one-dimensional vectors.  Each partition
divides the array into groups of elements that I want to sum to my forcing
totals. Again, you go around in a circle, doing forcing with each of the
(currently 18) tables, to convergence. On count data it should always
converge.

The thing is, I need to keep track of all these partitions, and then
multiply the exact same elements of the array as I previously summed by the
forcing ratios.  I got up to five dimensions, coding by hand, and then
realized that 1. the amount of work in going from, e.g., 19 dimensions to 20
was going to be very great, and 2. the likelihood that I would get all the
nesting and partition-matching right was vanishingly small. 

So I am looking for a way to encode the partitions that I use, that would
allow me to use the same encoding to represent both the subsets of the array
to sum over, crunching the array down to a set of totals corresponding to my
forcing totals, and also defining the subsets of the array that should be
multiplied by each forcing ratio.  And I thought, maybe I could do it with
strings of indexing commands, one per table of forcing totals. But this will
only work if I can sum the array over the subdivisions that the partition
defines, multiply all the elements in partition subdivisions by the
corresponding constants, and then assign the results back to the array, or
to a new array. Hence my question. 
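
For the simple case where a partition is just a set of whole dimensions, one encoding that seems to serve both purposes is the integer vector of dimensions the partition keeps: apply() sums over everything else, and sweep() multiplies the matching elements back. The sketch below uses made-up data and a hypothetical helper, force_margin; it does not handle partitions with omitted or merged levels.

```r
# Each partition is encoded as the integer vector of dimensions it keeps.
# The same vector drives both the summing step and the scaling step.
force_margin <- function(arr, margin, target) {
  current <- apply(arr, margin, sum)    # collapse the array to the partition
  ratio <- target / current             # forcing ratios
  ratio[!is.finite(ratio)] <- 0         # cells that sum to zero stay zero
  sweep(arr, margin, ratio, "*")        # scale the matching elements
}

set.seed(1)
arr <- array(runif(24) + 0.1, dim = c(2, 3, 4))
partitions <- list(c(1, 2), c(2, 3), 3)  # made-up example partitions
# Made-up forcing totals, chosen to be mutually consistent.
targets <- lapply(partitions, function(mg) apply(arr, mg, sum) * 2)

for (iter in 1:50)
  for (k in seq_along(partitions))
    arr <- force_margin(arr, partitions[[k]], targets[[k]])

# Largest remaining discrepancy for each partition.
gaps <- sapply(seq_along(partitions), function(k)
  max(abs(apply(arr, partitions[[k]], sum) - targets[[k]])))
gaps
```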

I’m afraid that this explanation is too long for people to read, but hope
springs eternal.  I’d be remarkably pleased and eternally grateful if I got
a solution to the problem of keeping track of partitions that can be used in
the three ways described in the previous paragraph, even if it has nothing
to do with executing strings.

Warmest regards, 
andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string

[R] Making part of a data frame into a time series

2012-11-21 Thread andrewH
Dear folks –
I have a bunch of data frames where columns 1:(n-1) contain information
about a county, and columns n and higher contain a time series of monthly
observations on that county.  I wanted to get the data in columns n and
higher to be recognized as a bunch of time series. So I wrote a function
that was supposed to turn all the columns from a given column number on into
a time series:

# Convert the final cols of a data frame into a time series
MakeTS <- function(data.df, firstColNo, firstYear, firstSubNo = NULL, freq = 1){
  data.df[,firstColNo:ncol(data.df)] <- ts(data = data.df[,firstColNo:ncol(data.df)],
                                           start = c(firstYear, firstSubNo), frequency = freq)
  data.df
}

However it does not appear to work. The is.ts function will not let me test
a subset of the data frame:

> # Simplified example for check.
> AA <- data.frame(rbind(c("X", 1:12), c("Y", 1:12)))
> AA
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
1  X  1  2  3  4  5  6  7  8   9  10  11  12
2  Y  1  2  3  4  5  6  7  8   9  10  11  12

> BB <- MakeTS(AA, 2, 2010, 1, 12)
> is.ts(window(BB[,2], start = c(2010, 1), [1,2:13])
Error: unexpected '[' in "is.ts(window(BB[,2], start = c(2010, 1), ["

In addition, and to my great confusion, the values in columns 3 and higher
have all been replaced by ones:
> BB
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
1  X  1  1  1  1  1  1  1  1   1   1   1   1
2  Y  1  1  1  1  1  1  1  1   1   1   1   1

I am guessing that you are not allowed to define part of a data frame as a
time series, and that I will just have to give up on that idea. Is that
right?

And why is everything a one? Is ts using the default frequency instead of
the one I handed to it? And if so, why?
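
For comparison, here is a sketch of an arrangement that does give a time series, under the assumption that the monthly values are kept in a separate numeric matrix rather than inside the data frame. All names and values are made up. The last line shows one likely source of the ones: c("X", 1:12) is a character vector, so every column of AA holds text rather than numbers, and the integer code of a one-level factor is always 1.

```r
# Made-up example: one row per county, twelve monthly columns.
counties <- data.frame(name = c("X", "Y"), stringsAsFactors = FALSE)
monthly <- matrix(c(1:12, 21:32), nrow = 2, byrow = TRUE,
                  dimnames = list(counties$name, NULL))

# ts() expects time to run down the rows, so transpose first.
series <- ts(t(monthly), start = c(2010, 1), frequency = 12)
is.ts(series)
last_two <- window(series[, "Y"], start = c(2010, 11))
last_two

# A text value turned into a factor has code 1, whatever the text says.
as.integer(factor("7"))
```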

Offers of help or insight greatly appreciated.

Sincerely, andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Making-part-of-a-data-frame-into-a-time-series-tp4650392.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Getting information encoded in a SAS, SPSS or Stata command file into R.

2012-11-14 Thread andrewH
Dear Anthony – 

On closer examination, what I am talking about is not factor levels, but
something different (but analogous). The data that is categorical all has
integer codes, so the file is entirely numeric. The SAS proc format then
gives text strings for each code for each categorical variable. Like this:

value REGION_f
  11 = "New England Division"
  12 = "Middle Atlantic Division"
  21 = "East North Central Division"
  22 = "West North Central Division"
  31 = "South Atlantic Division"
  32 = "East South Central Division"
  33 = "West South Central Division"
  41 = "Mountain Division"
  42 = "Pacific Division"
  97 = "State not identified"

So it would make sense to have a lookup table of these codes linked to the
variables. I’m not sure if it makes more sense to have that table live in R
or in the database. For R purposes, I imagine it would make sense to convert
these integer-valued variables into factors. 
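
As a sketch of the R side of that idea, a lookup table for the REGION_f block quoted above can be turned into factor labels directly. The vector region_code is made-up data.

```r
# Code-to-label lookup taken from the REGION_f block quoted above.
region_lookup <- data.frame(
  code  = c(11, 12, 21, 22, 31, 32, 33, 41, 42, 97),
  label = c("New England Division", "Middle Atlantic Division",
            "East North Central Division", "West North Central Division",
            "South Atlantic Division", "East South Central Division",
            "West South Central Division", "Mountain Division",
            "Pacific Division", "State not identified"),
  stringsAsFactors = FALSE)

region_code <- c(42, 11, 97, 31)        # made-up data values
region <- factor(region_code,
                 levels = region_lookup$code,
                 labels = region_lookup$label)
region
```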

What I do not understand is how SAS knows where the variables begin and end.
I managed to break off a little hunk of the beginning of my file and look at
it in an editor, and it is numbers without any obvious delimiters. Is the
delimiter a particular numeric string? I thought the SAS command file would
contain the starting location for each of the fixed-length fields, but I do
not see anything in the file that could be interpreted that way – just a
little wraparound code and then a long list of variable names followed by
triplets of a code, an equals sign, and a text string, terminating with a
semicolon. 
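
If the field widths can be recovered from somewhere, reading the file on the R side would presumably look something like the sketch below. The layout, the field widths, and the two data lines are all made up; they are not the real positions in my file.

```r
# Made-up layout: the real names and widths would have to come from the
# documentation or command files that accompany the data.
layout <- data.frame(name  = c("YEAR", "SERIAL", "REGION"),
                     width = c(4, 5, 2),
                     stringsAsFactors = FALSE)

# Two made-up records with no delimiters between fields.
tmp <- tempfile()
writeLines(c("20100000142", "20100000211"), tmp)

dat <- read.fwf(tmp, widths = layout$width, col.names = layout$name)
dat
```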

I’m sorry if I am being obtuse. When I said before that I had saved the SAS
files as flat files, what I really meant was that I had an intern do it.
When I was doing my own analysis, I mainly used TSP, before I switched to R
about a year ago. I’ve never used SAS. 

I find your data project very interesting.  Very.   It is not actually
necessary to wait for BLS to release the older CEX files, if you can lay
your hands on the CDs. I spoke to the BLS data products office about  2
years ago, and they have no problem with people republishing purchased data
in any format they like, including simple duplication.  In fact, they seemed
to like the idea.  I think the sale of data was forced on them by some kind
of mandate from above. 

I'll be playing with your code (which is a model of readability, and a
lesson to me on same, BTW) and keep you posted on my progress. 

Warmly, Andrew




--
View this message in context: 
http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353p4649541.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Getting information encoded in a SAS, SPSS or Stata command file into R.

2012-11-13 Thread andrewH
Wow!  After reading Jan's post, I said "Great, I'll do that," because it was
the closest to what I originally had in mind. Then I read Ista's post, and
said "I think I'll try that first," because it got me back on the track of
following directions in the R Data Import/Export manual. Then I read
Anthony's post. Now, I am not so thrilled to go the database route, because
frankly I have hardly ever used them before, and this would make an already
complex project take longer. 

But, I know that I will need to use the sample survey package for what I am
trying to do. So I think I am going to try to get the data into SQLite
format, and just hope the effort builds character.  Anthony, I have not used
your packages yet, but they look great!

It will probably be more than a week before I get all this worked out and
implemented. Given how much work this will be, I do not want to do it twice,
so I think I will go back to IPUMS and get the rest of the variables, and
break the file up into smaller chunks at the same time, both so I really
have the whole thing, and also so that it is easier to work with.   The
IPUMS version of the file is rectangular (it duplicates the household data
in each individual), and IPUMS has done a lot of valuable work in cleaning
the data and harmonizing variable names and definitions that have changed
over the history of the CPS. (Annoyingly, however, they have not connected
the cross-sections between years. All the CPS samples consist of two sets of
four consecutive months, eight months apart, so the March Supplement always
consists half of people who were interviewed in the last year and half of
people who will be interviewed in the next year (barring turnover)). 

Anyway, when I have figured out my route to import I will report back here.
In the meantime, I have three more questions that one of you may be able to
answer:
1.   Anthony, does the read.SAScii.sqlite function  preserve the label names
for factors in a data frame it imports into SQLite, when those labels are
coded in the command file? 
2.   If I want to make the resulting SQLite database available to the R
community, is there a good place for me to put it? Assume it is 10-20 gigs
in size.  Ideally, it would be set up so that it could be queried remotely
and extracts downloaded. Setting this up is beyond my competence today, but
maybe not in a couple of months.  (I'd like to do the same thing with the 30
years of Consumer Expenditure Survey data I have. I don't have access to SAS
any more, but I converted it all to flat files while I still did. Currently
the BLS only makes 2011 microdata available free. Earlier years on CD are
$200/year. But they have told me that they have no objection to my making
them available). 
3. I have not yet been able to determine whether CPS micro data from the
period 1940-1961 exists. Does anyone know? It is not on
http://thedataweb.rm.census.gov/ftp/cps_ftp.html, and  IPUMS and NBER
(http://www.nber.org/data/current-population-survey-data.html)  both only
give data back to 1962. I wrote to Census a week ago, but I have not heard
back from them, and in the past they have not been very helpful about
historical micro data.

Thanks to all! Andrew




--
View this message in context: 
http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353p4649466.html
Sent from the R help mailing list archive at Nabble.com.



[R] Getting information encoded in a SAS, SPSS or Stata command file into R.

2012-11-12 Thread andrewH
Dear folks –
I have a large (26 gig) ASCII flat file in fixed-width format with about 10
million observations of roughly 400 variables.  (It is 51 years of Current
Population Survey micro data from IPUMS, roughly half the fields for each
record).  The file was produced by an automatic process in response to a data
request of mine. 

The file is not accompanied by a human-readable file giving the fieldnames
and starting positions for each field.  Instead it comes with three command
files that describe the file, one each for SAS, SPSS, and Stata. I do not
have ready access to any of these programs.  I understand that these files
also include the equivalent of the levels attribute for the coded data.  I
might be able to hand-extract the information I need from the command files,
but this would involve days of tedious work that I am hoping to avoid.

I have read through the R Data Import/Export manual 2 and the foreign
package documentation and I do not see anything that would allow me to
extract the necessary information from these command files. Does anyone know
of any R package or other non-proprietary tools that would allow me to get
this data set from its current form into any of the following formats:
SAS, SPSS or Stata binary files read by R.
A MySQL database
An ffdf object readable using the ff package.

My ultimate goal is to get the data into an ffdf object so that I can
manipulate it in R, perhaps by way of a database. In allocation I will
probably be using no more than 20 variables at a time, probably a bit under
a gig. I am working on a machine with three gig of RAM. 

(I have seen some suggestions that data.table also provides a
memory-efficient way of providing database-like functions, but I am unsure
whether it would let me cope with an object of this size).

Any help or suggestions anyone could offer would be very much appreciated.

Warmest regards, andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Getting-information-encoded-in-a-SAS-SPSS-or-Stata-command-file-into-R-tp4649353.html
Sent from the R help mailing list archive at Nabble.com.



[R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Dear folks--

Suppose I have an expression that evaluates to a string, and that that
string, were it not a character vector, would be a symbol.  I would like a
function, call it doppel(), that will take that expression as an argument
and produce something that functions exactly like the symbol would have if I
typed it in the place of the function of the expression.  It should go as
far along the path to evaluation as the symbol would have, and then stop,
and be available for subsequent manipulation.  For example, if 

aa <- 3.1416
bb <- function(x) {x^2}
r <- 2
xx <- c("aa", "bb")

out <- doppel(xx[1])*doppel(xx[2])(r)

Then out should be 12.5664

Or similarly, after 
doppel(paste(a,  a,  sep=''))  -  3
aa

typing aa should return 3.

Is there such a function? Can there be? 

I thought as.symbol would do this, but it does not.
> as.symbol(xx[1])*as.symbol(xx[2])(r)
Error: attempt to apply non-function
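
One possible reading of doppel(), sketched here on the assumption that the string names an object visible from the place where doppel() is called, is a thin wrapper around base R's get(). The envir argument is an addition made for this illustration, and the sketch covers only the right-hand-side use, not assignment:

```r
aa <- 3.1416
bb <- function(x) x^2
r  <- 2
xx <- c("aa", "bb")

# Hypothetical helper: look the name up in the caller's environment.
doppel <- function(name, envir = parent.frame()) get(name, envir = envir)

# doppel(xx[2]) returns the function bb, which is then applied to r.
out <- doppel(xx[1]) * doppel(xx[2])(r)
```

Unlike as.symbol(), which only builds an unevaluated name, get() returns the object the name refers to, so the result can be multiplied or called directly.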

Looking forward to hearing from y'all.--andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Ah!  Excellent! That will be most useful.  And sorry about the typo. 

I found another function in a different discussion that also seems to work,
at least in most cases I have tried.  I do not at all understand the
difference between the two.
doppel <- function(x) {eval(parse(text=x))}

However, neither one seems to work on the left hand side of a <-, a <<-,
or an =.
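
For the assignment side, a minimal sketch using base R's assign(), or alternatively a call to `<-` built by hand. The variable name target is hypothetical and chosen only for this illustration:

```r
target <- paste("a", "a", sep = "")   # the string "aa"

# Option 1: assign() takes the name as a string.
assign(target, 3)
first <- aa

# Option 2: build the call `aa <- 4` from the string and evaluate it.
eval(call("<-", as.name(target), 4))
second <- aa
```

Both forms create or overwrite the variable in the environment where they are run; neither makes doppel(...) <- value itself legal syntax.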

Again, my thanks.--andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648365.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Yes, the assign command goes a little way toward what I was hoping for. 
But it requires a different syntax, and it does not in general let you use
quoted expressions that you could use with other assignment operators. For
instance, 

> DD <- 1:3
> assign("DD[2]", 5)
> DD
[1] 1 2 3

So I am still looking for a function that produces an output that is fully
equivalent to the string without quotation marks.  Or for a definite
statement that no such function can exist.
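
A sketch of what seems to happen in the example above, and of one workaround. It assumes the target expression arrives as a string (held here in a hypothetical variable named target); assign() treats the whole string as a variable name, whereas parsing the string yields a real subassignment:

```r
DD <- 1:3

# assign() does not parse its first argument: this creates a separate
# variable whose name is literally "DD[2]" and leaves DD untouched.
assign("DD[2]", 5)
odd_name_exists <- exists("DD[2]")

# Parsing the text instead produces the call DD[2] <- 5.
target <- "DD[2]"
eval(parse(text = paste(target, "<- 5")))
```

After the last line DD holds 1, 5, 3. Evaluating parsed text runs whatever the string contains, so this is best kept to strings the script itself constructs.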

Thanks so much for your attention to this problem.
andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648366.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] replacing ugly for loops

2012-10-11 Thread andrewH
Dear Bert--
I tried your function on the data that I provided (data.df) and it worked
beautifully (after I added a missing final parenthesis), producing exactly
the same output as my function.  This is an excellent example of what I was
looking for, because it is 
   (a) 50% shorter than mine, 
   (b) fully vectorized, and 
   (c) uses three functions that I have never used before: with, unique, and
do.call

I am going to spend a happy afternoon working through this command by
command, and at the end I am confident that I will have learned some valuable
new (to me) tricks. 
Thanks!
Warmest Regards, AndrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821p4645914.html
Sent from the R help mailing list archive at Nabble.com.



[R] replacing ugly for loops

2012-10-10 Thread andrewH
I have a couple of hundred American Community Survey Summary Files
containing rectangular arrays of data, mainly though not exclusively
numeric.  Each file is referred to as a sequence (henceforth seq).  From
these files I am trying to extract particular subsets (tables), each
consisting of a set of columns.  These tables are defined by three numbers
(now in columns in a data frame):
1.  a file identifier (seq)
2.  the first column position number (startNo) 
3.  the length of the table (len)
so the columns to select for one triple would consist of
startNo:(startNo+len-1).   I am trying to create for each sequence a
vector of all the column numbers for tables in that sequence.

Obviously I could do this with nested for loops, e.g.:

> seq <- c(1,1,2,2)
> startNo <- c(3, 10, 3, 15)
> len <- c(4, 2, 5, 3)
> data.df <- data.frame(seq, startNo, len)
> 
> seq.f <- factor(data.df$seq)
> data.l <- split(data.df, seq.f)
> selectColsList <- vector("list", length(levels(seq.f)))
> for (i in seq_along(levels(seq.f))){
+   selectCols <- numeric()
+   for (j in seq_along(data.l[[i]]$startNo)){
+     selectCols <- c(selectCols, 
+       data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] +
+       data.l[[i]]$len[j]-1))
+   }
+   selectColsList[[i]] <- selectCols
+ }
> selectColsList
[[1]]
[1]  3  4  5  6 10 11

[[2]]
[1]  3  4  5  6  7 15 16 17

But this code strikes me as inelegant and verbose. It seems to me that there
ought to be a way to make the outer loop, (indexed with i) into a tapply
function (which is why I started with a split()), and the inner loop
(indexed with j) into some cute recursive function, but I was not able to do
so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
more sophisticated) way to do this instead, I would be most grateful.
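
One loop-free possibility, offered as a sketch rather than as the answer given on the list: split the data frame by seq, then expand each (startNo, len) pair with Map(). The helper name expandRanges is hypothetical:

```r
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# Hypothetical helper: turn each (start, length) pair into its run of
# column numbers and concatenate the runs.
expandRanges <- function(d) {
  unlist(Map(function(s, n) s + seq_len(n) - 1, d$startNo, d$len))
}

# One vector of column numbers per sequence, named by the seq value.
selectColsList <- lapply(split(data.df, data.df$seq), expandRanges)
```

The result is a list with elements named "1" and "2" holding the same column numbers as the nested loops.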

Sincerely, andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
Sent from the R help mailing list archive at Nabble.com.



[R] Importing a complex XML file (SDMX format)

2011-12-12 Thread andrewH
Hi folks!

I am trying to read a large XML file from the Fed that contains quarterly
Flow of Funds data since the 1950s.  It contains lots of individual tables
in something called the Statistical Data and Metadata eXchange format
(SDMX format). 

I am not sure if I need something specific to the SDMX format to read the
file, or just to use the XML  package correctly. The XML package includes
over 70 documented functions and frankly I have not been able to figure
where to start. This is the first time I have ever needed to open up an XML
file of any kind, so I am starting from scratch.

I would be very grateful for advice on either reading an arbitrary but
complex XML file or from anyone who has succeeded in opening an XML file in
SDMX format. 

Warmest regards, andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Importing-a-complex-XML-file-SDMX-format-tp4188411p4188411.html
Sent from the R help mailing list archive at Nabble.com.



[R] Consistant test for NAs in a factor when exclude = NULL?

2011-10-26 Thread andrewH
Dear folks--

Is there a function to correctly find (and count) the NAs in a factor when
exclude=NULL, regardless of whether their origin is in the original data or
by subsequent assignment?

In example number 1 below, where NAs are assigned by is.na()<-, testing the
factor with is.na() finds the correct number of NAs.  In example number 2,
where the NAs are from the data, neither is.na(), ==NA, nor =="NA" correctly
identifies the NAs.  In example number 3, which mixes NAs from assignment
with NAs from data, is.na() does not even find the NAs created by assignment,
as it did in example 1.

I'm running R 2.13.2 on Windows XP with ServicePack 3

Any assistance would be greatly appreciated.

Appreciatively, andrewH


Example #1

> # Origin: is.na()<-  Exclude: NULL
> KK <- factor(c("A","A","B","B","C","C"), exclude=NULL)
> KK[KK=="C"]
[1] C C
Levels: A B C
> is.na(KK[KK=="C"]) <- TRUE
> KK
[1] A    A    B    B    <NA> <NA>
Levels: A B C
> levels(KK)
[1] "A" "B" "C"
> levels(KK)[KK]
[1] "A" "A" "B" "B" NA  NA 
> KK==NA
[1] NA NA NA NA NA NA
> sum(KK==NA)
[1] NA
> KK=="NA"
[1] FALSE FALSE FALSE FALSE    NA    NA
> sum(KK=="NA")
[1] NA
> is.na(KK)
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE
> sum(is.na(KK))
[1] 2

Example #2

> # Origin: data  Exclude: NULL
> GG <- factor(c("A","A","B","B", NA, NA), exclude=NULL)
> GG
[1] A    A    B    B    <NA> <NA>
Levels: A B <NA>
> levels(GG)
[1] "A" "B" NA 
> levels(GG)[GG]
[1] "A" "A" "B" "B" NA  NA 
> GG==NA
[1] NA NA NA NA NA NA
> sum(GG==NA)
[1] NA
> GG=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(GG=="NA")
[1] 0
> is.na(GG)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(GG))
[1] 0

Example #3.

> MM <- factor(c("A","A","B","B","C","C", NA), exclude=NULL)
> is.na(MM[MM=="C"]) <- TRUE
> MM
[1] A    A    B    B    <NA> <NA> <NA>
Levels: A B C <NA>
> levels(MM)
[1] "A" "B" "C" NA 
> levels(MM)[MM]
[1] "A" "A" "B" "B" NA  NA  NA 
> MM==NA
[1] NA NA NA NA NA NA NA
> sum(MM==NA)
[1] NA
> MM=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(MM=="NA")
[1] 0
> is.na(MM)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(MM))
[1] 0
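
In all three examples levels(x)[x] shows NA in the right positions, which suggests a test that works on the labels rather than on the integer codes. A minimal sketch under that assumption; countNA is a hypothetical helper name, and the first factor is given its NAs by plain subassignment rather than by is.na()<-:

```r
# Hypothetical helper: count entries whose label is NA, whether the NA is
# a missing code or a level that is itself NA.
countNA <- function(f) sum(is.na(as.character(f)))

# NAs introduced by assignment (no NA level).
KK <- factor(c("A", "A", "B", "B", "C", "C"), exclude = NULL)
KK[KK == "C"] <- NA

# NAs present in the data (NA becomes a level because exclude = NULL).
GG <- factor(c("A", "A", "B", "B", NA, NA), exclude = NULL)

n_KK <- countNA(KK)
n_GG <- countNA(GG)
```

Both counts come out as 2, because as.character() maps a missing code and an NA level to the same missing string.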

--
View this message in context: 
http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3942755.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Consistant test for NAs in a factor when exclude = NULL?

2011-10-26 Thread andrewH
Thanks Jeff! I appreciate you sharing your experience.

My data set is survey data, 13,209 records over nine years, collected by
someone else, converted from SPSS format. It includes missing values,
identified however SPSS does so, and translated to NAs by the import
process. It also includes values along the lines of "none of your business"
or "beats me" that are missing so far as I am concerned. I have assigned NAs
to these values.  Now I am trying to figure out some things about where
these missing values are -- whether they are disproportionately located in
any period or group.  I have been trying to get counts for subsets, but I
have not been able to make the subset counts add up to the total counts that
I get from, e.g. summary.  

So I wrote these simplified versions, and even for the simplest examples, I
could not find a function that correctly identified the NAs that I knew were
there because I put them there myself. That is why I am looking for help.
Does this make sense?

Warmest regards, andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3943157.html
Sent from the R help mailing list archive at Nabble.com.



[R] re coercing data frame rows to character: Am I right that this is a bug?

2011-10-21 Thread andrewH
Dear Folks--
All this seems to me to behave the way you expect, recognising that column b
is a factor:
> AA <- data.frame(a=3:4, b=c('x', 'y'))
> AA[1,]
  a b
1 3 x
> as.numeric(AA[1,])
[1] 3 1
> AA[,2]
[1] x y
Levels: x y
> as.numeric(AA[,2])
[1] 1 2
> as.character(AA[,2])
[1] "x" "y"

But this seems to me to be wrong:
> as.character(AA[1,])
[1] "3" "1"

Shouldn't it be:
[1] "3" "x"
to be consistent with the normal pattern of coercing factors to character
values?
If it is a bug, is this the right place to post it?
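
A data frame row is still a list of columns, so one way to get the labels is to convert column by column. The sketch below assumes that is the desired result; stringsAsFactors = TRUE is set explicitly because recent versions of R no longer create factors by default:

```r
# Column b is a factor, as in the example above.
AA <- data.frame(a = 3:4, b = c("x", "y"), stringsAsFactors = TRUE)

# Convert each one-element column of the row separately, so the factor
# column goes through as.character() for factors and yields its label.
row_chr <- vapply(AA[1, ], as.character, character(1))
```

Here row_chr is a named character vector with elements "3" and "x".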

sincerely, andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/re-coercing-data-frame-rows-to-character-Am-I-right-that-this-is-a-bug-tp3924449p3924449.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] reporting multiple objects out of a function

2011-10-07 Thread andrewH
Thanks, Gabor!  

When a beginner (like myself) asks a question, it seems that the thing that
we believe we are confused about, or want to learn, may not be the thing
that would actually help us the most if it were clearly understood.  Your
response is what I consider ideal: Answer my question, then tell me the
answer to the question that I would ask if I were smart enough.

I had dismissed the idea of env<- on grounds that I did not know what I
might be overwriting. The env$x<- trick is a very nice one that I had not
considered because I have so little understanding of what an environment is.
It is described in the environment documentation as "a collection of named
objects, and a pointer to an enclosing environment", which is fairly opaque
by the standard of R documentation. R has lots of different kinds of boxes in
which to collect objects. 
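
A small sketch of what that description means in practice, using only base R. The names e and bump are made up for the illustration:

```r
# An environment is a box of named objects...
e <- new.env()
e$x <- 1

# ...that is passed by reference: a function can change its contents
# without returning anything.
bump <- function(env) {
  env$x <- env$x + 1
  invisible(NULL)
}
bump(e)
value_after <- e$x

# ...and that points to an enclosing environment, which is where lookup
# continues when a name is not found in the box itself.
enclosing <- parent.env(e)
```

After bump(e), value_after is 2, whereas doing the same with a list would have left the original unchanged.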

If you have a favorite introduction to how R environments work and/or best
practice in programming with them, I'd be pleased to read it.

Again, many thanks. and on to classes.

Warmly, andrewH 

--
View this message in context: 
http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3881118.html
Sent from the R help mailing list archive at Nabble.com.



[R] Unexpected behavior of extract (`[`) or sapply functions

2011-10-07 Thread andrewH
Dear folks--
The function below is a snippet of a larger function that is not doing what
it is supposed to do, and I do not understand its behavior.  The larger
function is supposed to produce an array containing the results of a
user-specified function applied to groups of data defined by the
intersection of one or more factors, and return them in an array with a
dimension for each factor and a dimension level for each factor level.  This
snippet is supposed to take a data frame, a vector of column numbers
containing factors, and a column number for the data, and return (in the
test function below, just print) a list of character vectors of the level
names (one vector per dimension) and the length of those vectors.

It works fine so long as I give it more than one factor column, but if I
give it a vector of factor columns of length 1, it behaves differently, and
when I try to assign the names from test.levels to the dimnames of the
array, I end up with an error message:

Error in dimnames(data) <- dimnames : 
  length of 'dimnames' [1] not equal to array extent

The example below shows the function output for a test data frame
(“test.df”) when run first on a vector of two column numbers for factors and
then on just one. You can see how the structure of the output shifts.  I
cannot understand what is happening. What I want it to do when given just
factor.cols = c(1) is to give me back exactly what it gives me back for
factor column 1 in factor.cols = c(1,2).

Any help or suggestions would be greatly appreciated.

Sincerely, 
   andrewH

# Test Data
test.df <- data.frame(AA=rep(LETTERS[1:2], c(6,6)), BB=rep(LETTERS[3:5], c(4,4,4)),
  CC=rep(LETTERS[6:9],c(3,3,3,3)), DD=c(1:12))

# The function
getLevels <- function(data.df, factor.cols, data.col){
test.levels <- sapply(data.df[,factor.cols, drop=F], levels)
cat("test.levels:\n"); print(test.levels)
no.levels <- sapply(sapply(data.df[,factor.cols, drop=F], levels), length)
cat("no.levels:\n"); print(no.levels)
} 

# Run it with two factors and again with 1, Output below
cat("\nTest 2 factors:\n")
getLevels(test.df, c(1,2), 4)
cat("\nTest 1 factor:\n")
getLevels(test.df, c(1), 4)

Test 2 factors:
> getLevels(test.df, c(1,2), 4)
test.levels=
$AA
[1] "A" "B"

$BB
[1] "C" "D" "E"

no.levels=AA BB 
 2  3 
> cat("\nTest 1 factor:\n")

Test 1 factor:
> getLevels(test.df, c(1), 4)
test.levels=     AA 
[1,] "A"
[2,] "B"
no.levels=A B 
1 1 
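
The shift in structure looks like sapply() simplifying its result: with one factor column every element has the same length, so the list is collapsed into a matrix. A sketch of one way around it, assuming a list is wanted in both cases, is to use lapply(), which never simplifies. The return value (a list with components levels and n) is a choice made for this illustration:

```r
# stringsAsFactors = TRUE so the columns are factors on recent R versions.
test.df <- data.frame(AA = rep(LETTERS[1:2], c(6, 6)),
                      BB = rep(LETTERS[3:5], c(4, 4, 4)),
                      DD = 1:12,
                      stringsAsFactors = TRUE)

getLevels <- function(data.df, factor.cols) {
  # lapply() always returns a list, one element per factor column.
  lev <- lapply(data.df[, factor.cols, drop = FALSE], levels)
  # vapply() returns a named integer vector whatever the number of columns.
  list(levels = lev, n = vapply(lev, length, integer(1)))
}

two <- getLevels(test.df, c(1, 2))
one <- getLevels(test.df, 1)
```

With this version one$levels$AA is the same character vector as two$levels$AA, and one$levels can be used directly as a dimnames list.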


--
View this message in context: 
http://r.789695.n4.nabble.com/Unexpected-behavior-of-extract-or-sapply-functions-tp3881176p3881176.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] reporting multiple objects out of a function

2011-10-05 Thread andrewH
Thanks for the response, Paul!  But I thought these dumped the variables into
the global environment. Is that not correct? I want to make them available
in the calling environment, without making them available in the global
environment, unless that is where the function is called. This is my bow to
the fact that what I want this function to do is not good programming
practice in general.

The whole purpose of this function is to save me time, typing and wear on my
limited short-term memory capacity, by having standard objects with standard
names quickly available.

I wonder if eval.parent would do the job. Like:

fun1 <- function(x, y, z) eval.parent({obj1 <- x; obj2 <- y; obj3 <- z })

Or does that just use the parent environment for the inputs, not the output?
Part of my problem is that I am not sure how to tell if I have succeeded.
Otherwise I would just test it myself.
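
A sketch of one way to check this. It assumes the braces need to reach eval.parent() as an unevaluated expression, which substitute() provides; the wrapper function caller is hypothetical and exists only to stand in for the calling environment:

```r
fun1 <- function(x, y, z) {
  # substitute() replaces x, y and z with the argument expressions and
  # returns the block unevaluated; eval.parent() then runs it in the caller.
  eval.parent(substitute({obj1 <- x; obj2 <- y; obj3 <- z}))
  invisible(NULL)
}

# Stand-in for a calling environment that is not the global one.
caller <- function() {
  fun1(5, 6, 7)
  c(obj1, obj2, obj3)   # found only if fun1 created them here
}
res <- caller()
```

If the objects had not been created in the calling environment, the last line of caller() would fail with an "object not found" error, which gives a direct test of success.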

andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3875586.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] reporting multiple objects out of a function

2011-10-05 Thread andrewH
Thanks, Sina! This is very helpful and informative, but still not quite what
I want.

So, here is the thing: When a function returns an object, that object is
available in the calling environment.  If it is returned inside a function,
it is available in the function, but not outside of the function.  What I
want to do is simply to return more than one object in the usual sense in
which functions return objects.

Here is a test to see if a function fun does this, at least to the depth of
1.

obj1 <- 1
obj2 <- 2

cat("obj1 in global=", obj1, "\n")
cat("obj2 in global=", obj2, "\n")

wrapFun <- function(fun) {
   obj1 <- 3
   obj2 <- 4
   cat("obj1 in calling=", obj1, "\n")
   cat("obj2 in calling=", obj2, "\n")
   fun()
   cat("obj1 in calling=", obj1, "\n")
   cat("obj2 in calling=", obj2, "\n")
}

wrapFun(fun)   # fun is the function being tested

cat("obj1 in global=", obj1, "\n")
cat("obj2 in global=", obj2, "\n")


Suppose the function fun assigns the values 5 and 6 to obj1 and obj2.  If
the function does what I want, this code should print:
obj1 in global=  1
obj2 in global=  2
obj1 in calling= 3
obj2 in calling= 4
obj1 in calling= 5
obj2 in calling= 6
obj1 in global=  1
obj2 in global=  2

I turned Paul’s and Sina’s code into functions as follows:
paulFun <- function() {
obj1 <<- 5; 
obj2 <<- 6; 
}

sinaFun <- function() {
attach(what = NULL, name = "my_env")
assign("obj1", 5, envir = as.environment("my_env"))
assign("obj2", 6, envir = as.environment("my_env"))
}

Running these two functions in the code above yields:

paulFun:
obj1 in global= 1
obj2 in global= 2
obj1 in calling= 3
obj2 in calling= 4
obj1 in calling= 3
obj2 in calling= 4
obj1 in global= 5
obj2 in global= 6

So paulFun puts the objects in the global environment but not in the calling
environment. Let’s try sinaFun:

sinaFun:
obj1 in global= 1
obj2 in global= 2 
obj1 in calling= 3
obj2 in calling= 4
obj1 in calling= 3
obj2 in calling= 4 
obj1 in global= 1
obj2 in global= 2

sinaFun puts the objects in the new environment it defines, but they are
available in neither the calling nor the global environment.  However, I was
immediately convinced that Sina had given me the tool I was missing: the
assign function. (Thanks, Sina!)  But I was wrong (or used it wrong), and
now I am even more deeply confused.  Here is a function that I thought would
do what I want:

andrewFun <- function() {
assign("obj1", 5, pos = sys.parent(n = 1))
assign("obj2", 6, pos = sys.parent(n = 1))
NULL
}

However, when I tried it, my results were the same as paulFun: assigned in
the global environment, but not in the calling environment.  Setting n = 0
seemed to limit the assignment to the interior of andrewFun: none of the
printed obj values were affected.
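
A likely explanation, stated with some caution: sys.parent() returns a frame number, and assign() reads a number given as pos as a position on the search path, where position 1 is the global environment. The sketch below passes the calling environment itself, obtained with parent.frame(), through the envir argument instead:

```r
obj1 <- 1
obj2 <- 2

andrewFun <- function() {
  caller <- parent.frame()            # the environment fun() was called from
  assign("obj1", 5, envir = caller)
  assign("obj2", 6, envir = caller)
  invisible(NULL)
}

# Cut-down version of wrapFun that returns the values instead of printing.
wrapFun <- function(fun) {
  obj1 <- 3
  obj2 <- 4
  fun()
  c(obj1, obj2)
}

in_calling <- wrapFun(andrewFun)
in_global  <- c(obj1, obj2)
```

With this version the values seen inside wrapFun become 5 and 6, while the copies defined outside it stay at 1 and 2.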

Help?

andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3876201.html
Sent from the R help mailing list archive at Nabble.com.



[R] reporting multiple objects out of a function

2011-10-04 Thread andrewH
Dear folks, 

I’m trying to build a function to create and make available some variables I
frequently use for testing purposes.  Suppose I have a function that takes
some inputs and creates (internally) several named objects. Say, 

fun1 <- function(x, y, z) {obj1 <- x; obj2 <- y; obj3 <- z
<missing stuff>
}

Here is the challenge: After I run it, I want the objects to be available in
the calling environment, but not necessarily in the global environment.  I
want them to be individually available, not as part of a list or some larger
object.  I can not figure out how to do this.  If I understand the situation
correctly, I am trying to move several separate objects from the environment
of the function to the environment in which the function was invoked (the
“calling environment,” yes?).  

I’m pretty sure there is a command to do this, but I’m not sure how to find
it. Any help would be greatly appreciated – either on the necessary code, or
on how to search for it, or a reference to a good discussion of this family
of problems.
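
One candidate for the missing piece, sketched under the assumption that copying the values into the caller is acceptable, is base R's list2env() combined with parent.frame(). The wrapper function caller is hypothetical and only stands in for a non-global calling environment:

```r
fun1 <- function(x, y, z) {
  target <- parent.frame()   # the environment fun1 was called from
  # Copy each list element into that environment under its own name.
  list2env(list(obj1 = x, obj2 = y, obj3 = z), envir = target)
  invisible(NULL)
}

caller <- function() {
  fun1(10, 20, 30)
  c(obj1, obj2, obj3)   # individually available here
}
seen_by_caller <- caller()
```

Called from the top level, the same function would place the objects in the global environment, since that is then the calling environment.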

Sincerely, andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3873380.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Referring to an object by a variable containing its name: 6 failures

2011-09-18 Thread andrewH
Thanks Josh & Duncan!  That was very clear and helpful. After going back and
reviewing documentation for { and $ I am realizing that the pattern in
R documentation is simply to tell you the truth, and not to give much effort
to distinguishing confusable choices. Once again, things that seemed crazy
to me become perfectly sensible once understood. I think I need to read
function documentation more the way one reads concept definitions in a math
book. 

Josh, one question: Your reasons to avoid attach() seem cogent. However,
Venables, Smith et al. say in “An Introduction to R”:

  A useful convention that allows you to work with many different problems
  comfortably together in the same working directory is
    * gather together all variables for any well defined and separate
      problem in a data frame under a suitably informative name;
    * when working with a problem attach the appropriate data frame at
      position 2, and use the working directory at level 1 for operational
      quantities and temporary variables;
    * before leaving a problem, add any variables you wish to keep for
      future reference to the data frame using the $ form of assignment,
      and then detach();
    * finally remove all unwanted variables from the working directory

I'm still at the point that I am doing things just because Authority says
so, but unfortunately, everyone is Authority, relative to me. Still, I
wonder if you have any thoughts about why such a venerable authority as
Venables et al. would recommend a programming practice if that practice
should always be avoided. For cognitive dissonance from authority conflicts,
that's up there with the Google R stylesheet saying to avoid using S4
classes.

Again, my thanks.
andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3822436.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Referring to an object by a variable containing its name: 6 failures

2011-09-17 Thread andrewH
Dear Folks -- 
The anonymous poster (rmailbox) is perfectly correct.  I had forgotten you
could use names in this way.  When referring to rows or columns by name
rather than by number, I usually use either attach() or the $ operator,
neither of which works here. If anyone understands why data.df[,colName]
works in this setting but data.df$colName and the use of as.symbol(colName)
after attach(data.df) do not work, I would love an explanation, because I
sure don't.
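
A small sketch of the difference, with a made-up data frame: the bracket operators evaluate their index, so a variable holding a column name works, while $ takes the text after it literally as the name to look for.

```r
# Illustrative data; the column names are assumptions for this example.
data.df <- data.frame(height = c(1.5, 1.8), weight = c(50, 80))
colName <- "weight"

by_bracket <- data.df[, colName]    # index is evaluated: column "weight"
by_double  <- data.df[[colName]]    # same, for a single column
by_dollar  <- data.df$colName       # looks for a column named "colName"
```

Here by_bracket and by_double both hold the weight column, and by_dollar is NULL because no column is called colName.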

Thanks, Timothy, for helping to clarify what I was trying to do. You are
exactly right, and your analogy to the $$ command in PHP – a command that
works -- was thereby more perfect than my analogy to things in R which do
not work. 

Elk's suggestion to use the get() function was very welcome, as I had never
really understood what get() was for, and this is a great use that often
arises. However, for this purpose, get() is somewhat capricious in its
effectiveness. “get(colName)” works as the operand of class(), length(),
mode(), and summary(), but it does not work for typeof(), where it returns
this error:  
Error in eval(substitute(expr), data, enclos = parent.frame()) :  numeric
'envir' arg not of length one
And it does not work for str(), where it treats the variable name as a
character string rather than a symbol.

Again, I do not understand what distinguishes the functions for which Elk's
solution works from those for which it does not. Does anybody know? Ideas
welcome.

--and thanks again for all the help.

andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3819813.html
Sent from the R help mailing list archive at Nabble.com.



[R] Name the dots! (...)

2011-09-17 Thread andrewH
Dear Folks--

Suppose I have some objects A, B & C, and a function
getDots <- function(...) {args <- list(...)  etc.}

If I do a call to getDots(A, B, C) then the variable args will be assigned
to a list which contains the objects to which A, B & C refer, but which will
not (except by happenstance) contain the names A, B, or C.  I would like
getDots to return a named list, with the object names being assigned as the
element names in the list. 

Is there any way to do this? 

As an aside, I do not understand why the list command does not do this by
default, like the data.frame command does.  In fact, you can use data.frame
instead of list to get a named argument list, but only if all your objects
are of the same length.
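
A minimal sketch of one way to do this in base R: capture the unevaluated arguments with substitute(list(...)), deparse them, and use the resulting text as names wherever the caller did not supply one. The internal variable names are arbitrary:

```r
getDots <- function(...) {
  args  <- list(...)
  # The call list(A, B, C) with its head dropped leaves the argument
  # expressions themselves.
  exprs <- as.list(substitute(list(...)))[-1L]
  auto  <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                  character(1))
  given <- names(args)
  if (is.null(given)) given <- rep("", length(args))
  # Keep names the caller supplied; fill the rest from the expressions.
  names(args) <- ifelse(given == "", auto, given)
  args
}

A <- 1:3
B <- letters
C <- list(x = 1)
res <- getDots(A, B, C)
```

The names of res are "A", "B" and "C". Unlike data.frame(), this places no restriction on the lengths of the objects.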

Thanks in advance for any help you can offer.
andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Name-the-dots-tp3819947p3819947.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Returning the name of an object passed directly or from a list by lapply

2011-09-16 Thread andrewH
Dear Bill--
Wow. This is very clever and I learned a lot from it. I've never seen the
...() trick before, and on a Google code search, I could not find anyone
else who had used it. And I've never used the ... feature, which, BTW,
though mentioned in every intro to R text, has no help page I can find.
grrr

Your function is still not doing what I am trying to do, doubtless because I
was not clear enough in the question I posed.  At the bottom of this message I
have posted a copy of my testing function, and a few objects to test it on.
Its details are unimportant and not very interesting, but note that all of
its important outputs are in the form of side effects.   What I would like
to be able to do is this:

f(fun, n variable names)
and get back this:
fun(variable#1)
fun(variable#2)
...
fun(variable#n)

Attempting to copy some of your techniques, I came up with this: 
evaluate <- function(fun, ...){   
unevaluatedArgs <- substitute(...)
for (i in 1:length(deparse(unevaluatedArgs)))
fun(deparse(unevaluatedArgs)[i])
invisible(TRUE)
}
As applied to my test data, it works on the first variable (but gets the
variable name wrong) and ignores the remainder of the list, e.g.:

> evaluate(testX, H.char, H.vec, H.df,  H.mat)
###
testX( deparse(unevaluatedArgs)[i] ):  Class= character  Type= character  Mode= character 
Summary:
   Length     Class      Mode 
        1 character character 
Structure:
 chr "H.char"

On the other hand, this almost works:
evaluate <- function(fun, ...){   
evaluatedArgs <- list(...)
for (i in 1:length(evaluatedArgs))  fun(evaluatedArgs[i])
invisible(TRUE)
}

The only thing it does not do is get the name of the passed object right.
That seems like it ought to be a small problem, but as you pointed out, the
names are not in the list.  (BTW, I don't understand dropping the names as a
design choice for the list() function. If you use list() to make a list out
of four symbols for objects, wouldn't it be better to make the text of the
symbols the default names for those objects? That would solve this problem
nicely.) [s]ubstitute seems to drop all but the first variable passed by


Thanks so much for your thoughtful help.

andrewH



testX <- function(objectX, bar=TRUE) {  # A useful diagnostic function
object.name <- deparse(substitute(objectX))
if(bar) cat("##\n");
cat("testX(", object.name, "): ");  cat(" Class=", class(objectX));
cat("  Type=", typeof(objectX)); cat("  Mode=", mode(objectX), "\n");
cat("Summary:\n"); print(summary(objectX))
cat("Structure:\n");  str(objectX);
if (is.factor(objectX)) {cat("Levels: ", levels(objectX), "\n");
cat("Length: ", length(objectX), "\n")}
invisible(object.name)
}

## Define 4 test variables: H.char, H.vec, H.df, H.mat
H.char <- letters[1:10]
H.vec  <- c(1:10)
H.df   <- {  # Makes a test data set A.df with 2-, 3-, & 4-factor sorting
   # variables, making 24 combinations, & a 4th variable with a unique
   # data value for each combination. No random component. 
   year.2 <- factor( rep(1:2, each=12) )
   cohort.3 <- factor( rep(rep(1:3,each=4),2) )
   race.4 <- factor( rep(1:4, 6) )
   D1 <- as.numeric(as.character(year.2))*1.1 +
         as.numeric(as.character(cohort.3))*1.01 +
         as.numeric(as.character(race.4))*1.001
   data.frame(year.2,cohort.3,race.4,D1)
}
H.mat <- matrix(1:16, 4, 4)
## End of test variables
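
A sketch of one way evaluate() could keep the names, offered as an illustration rather than as the solution reached on the list. Instead of passing values, it rebuilds a call to fun around each unevaluated argument and evaluates that call in the caller's environment, so that deparse(substitute()) inside fun sees the original symbol. The function showName is a hypothetical stand-in for testX():

```r
evaluate <- function(fun, ...) {
  # Unevaluated argument expressions, e.g. the symbols H.char and H.vec.
  exprs  <- as.list(substitute(list(...)))[-1L]
  caller <- parent.frame()
  # For each expression build the call fun(<expression>) and run it where
  # evaluate() itself was called.
  out <- lapply(exprs, function(e) eval(as.call(list(fun, e)), caller))
  invisible(out)
}

# Stand-in for testX(): returns the name of whatever it was handed.
showName <- function(objectX) deparse(substitute(objectX))

H.char <- letters[1:10]
H.vec  <- 1:10
res <- evaluate(showName, H.char, H.vec)
```

Here res holds "H.char" and "H.vec", one per argument, so a diagnostic function called this way would print the right label for each object.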

--
View this message in context: 
http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3819378.html
Sent from the R help mailing list archive at Nabble.com.



[R] Returning the name of an object passed directly or from a list by lapply

2011-09-15 Thread andrewH
Dear folks:

Let’s suppose I want a function to return the name of the object
passed to it.

 myname <- function(object) {out <- deparse(substitute(object)); out}

This works fine on a single object:
 O1 <- c(1:4)
 myname(O1)
[1] "O1"

However, it does not work if you use lapply to pass it the same object from a
list:
 O2 <- c(1:4)
 object.list <- list(O1, O2)
 lapply(object.list, myname)
[[1]]
[1] "X[[1L]]"

[[2]]
[1] "X[[2L]]"

Is there any way to write myname() so that it returns the same object's name
regardless of whether it is handed the name directly or by lapply as an
element of a list?
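
One possible workaround, sketched here under the assumption that the list can be built with names: lapply() hands each element to the function as X[[i]], so the original symbol cannot be recovered from inside myname(). Carrying the names in the list itself, and passing them alongside the values, sidesteps the problem. The helper describe_one() and the second vector's contents are hypothetical and used only for illustration.

```r
# A sketch, not the only approach: keep the names in the list itself.
O1 <- c(1:4)
O2 <- c(5:8)
object.list <- list(O1 = O1, O2 = O2)   # names supplied explicitly

# Hypothetical helper: receives the name and the value as separate arguments.
describe_one <- function(name, value) {
  paste0(name, ": ", class(value), " of length ", length(value))
}

# mapply() walks the names and the values in parallel.
res <- mapply(describe_one, names(object.list), object.list, USE.NAMES = FALSE)
print(res)
```

The cost is typing each name twice when the list is built; the benefit is that the name no longer depends on how the function happens to be called.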

Any help you can offer would be greatly appreciated.

Warmly, andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3816798.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Returning the name of an object passed directly or from a list by lapply

2011-09-15 Thread andrewH
Thanks Bill!

You are correct. I did not understand what was in my list.

I posted a simplified example in the hope of focusing on the essentials, but
I see I have edited out the motivation.  When my programs go awry, and
sometimes when they don't, I find I need to understand what is in some
variable or variables.  To help with debugging, I have built a little 
testing function that takes the names of one or more variables and returns a
variety of information about each one: summary(), str(), class(), typeof(),
etc., starting with the name. (The name is unimportant when I hand it one
variable, but for a longer list, I want to print it to help keep track of
what outcome goes with what variable).  It also gives me some extra
information about certain data types that I seem to have more trouble with,
notably factors. These days I’m devoting myself nearly full-time to trying
to learn R, and I probably run this function between 50 and 200 times a day.

Now I am trying to figure out some way of running my testing function on
more than one variable at a time. Should be easy on a computer, right?  I
don't care if I cluster my variables in a list, vector, or what -- I just
want to be able to evaluate a bunch of them at one time. And I'd rather not
have to type quotation marks around each variable name. I've timed myself,
and it increases the time it takes me to type a list by 250%. Shortly I'll
be posting a different question with regards to my failure to get this
function to work in a loop.  But I also very much want to be able to use one
of the apply-family functions to run on multiple variables.

If, as you have persuaded me, I cannot use a list of variable names, this
larger problem still has to have a straightforward solution, I think. But I
sure don't know what it is.
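
A minimal sketch of one way this could work without quotation marks, assuming the variables are passed as separate arguments rather than as a pre-built list: substitute() can capture the unevaluated argument expressions from `...` before list(...) evaluates them. The function name describe_all() is hypothetical.

```r
# Hypothetical wrapper: accepts any number of unquoted variables.
describe_all <- function(...) {
  # Unevaluated argument expressions, e.g. the symbols H.char and H.vec.
  exprs  <- as.list(substitute(list(...)))[-1]
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                   character(1))
  # The evaluated arguments, in the same order as the labels.
  values <- list(...)
  for (i in seq_along(values)) {
    cat("Object:", labels[i], " Class=", class(values[[i]]), "\n")
    str(values[[i]])
  }
  invisible(labels)
}

H.char <- letters[1:10]
H.vec  <- c(1:10)
shown  <- describe_all(H.char, H.vec)
```

Because the labels are captured at the call, this only works when the variables are typed into the call itself; a list built earlier has already lost the symbols.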

Any suggestions from any quarter would be deeply appreciated.

andrewH  


--
View this message in context: 
http://r.789695.n4.nabble.com/Returning-the-name-of-an-object-passed-directly-or-from-a-list-by-lapply-tp3816798p3817116.html
Sent from the R help mailing list archive at Nabble.com.



[R] Referring to an object by a variable containing its name: 6 failures

2011-09-15 Thread andrewH
Dear Folks--

I'm trying to make a function that takes the columns I select from a data
frame and then uses a for loop to print some information about each one,
starting with the column name. I succeed in returning the column name, but
nothing else I have tried using the variable colName, containing the name of
the column,  to refer to the column itself has worked. 

Below I show my non-working function, and a data frame to test it on. I try
six distinct ways of trying to turn the variable containing the name back
into a name that is recognized by other R functions, mainly functions that
display the properties of the object to which the name refers. These are
also numbered in the code below in comments: 

1. evaluate colName with eval()
2. convert it back into a symbol with as.symbol()
3. treat the data frame as the calling environment using with()
4. use substitute() to plug in any information bound in the
environment
5. attach the data frame from which the column name is drawn
6. access the column using the $ operator

I have actually made this function work using numeric indexing. But I do not
understand why none of these ways of accessing the column using its name
work. They all give me the properties of the name as a character vector,
(except (2), which gives me its properties as a symbol) rather than the
properties of the vector to which the name refers. What am I doing wrong?
How do I use a variable containing an object's name to refer to the object
itself?
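
For comparison, a small sketch of approaches that do reach the column. The data frame df and its values are made up for illustration. Two facts explain most of the failures above: eval() of a character string simply returns the string, and $ does not evaluate its right-hand side, so data.df$colName looks for a column literally called "colName".

```r
# Illustrative data; the column names echo the test data set, the values do not.
df <- data.frame(year.2 = factor(c(1, 1, 2)), D1 = c(2.111, 2.112, 3.211))
colName <- "D1"

# [[ ]] accepts a character string held in a variable.
by_name <- df[[colName]]

# A symbol must be evaluated *inside the data frame* to yield the column.
by_eval <- eval(as.symbol(colName), df)

# $ takes its argument literally, so this looks for a column named "colName".
wrong <- df$colName

print(class(by_name))
print(is.null(wrong))
```

With df[[colName]] there is also no need for attach() and detach() inside the function.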

Although I'm hoping others will find the bald look caused by tearing my hair
out to be attractive, I would appreciate any assistance you can offer in
understanding this question.

Warmly, andrewH

testDFcols <- function(data.df, select=c(1:ncol(data.df)), bar=TRUE) {
  # A useful summary function
  if (bar) cat("##\n")
  attach(data.df)
  for (column in select) {
    colName <- names(data.df)[column]
    cat("Column Name(", colName, "): ")
    # six failures
    cat(" Class=", class(eval(colName)))               #1
    cat("  Type=", typeof(as.symbol(colName)))         #2
    cat("  Length=", length(with(data.df, colName)))   #3
    cat("  Mode=", mode(substitute(colName)), "\n")    #4
    cat("Summary:\n")
    print(summary(colName))
    cat("Structure:\n")
    str(colName)                                       #5
    if (is.factor(data.df$colName)) {
      cat("Factor Levels: ", levels(data.df$colName), "\n")
    } else cat("\n")                                   #6
  }
  detach(data.df)
  invisible(deparse(substitute(data.df)))
}

A1.df <- {
  # Makes a test data set with 2-, 3-, and 4-level factor sorting variables,
  # making 24 combinations, and a 4th variable with a unique data value for
  # each combination. No random component.
  year.2   <- factor(rep(1:2, each=12))
  cohort.3 <- factor(rep(rep(1:3, each=4), 2))
  race.4   <- factor(rep(1:4, 6))
  D1 <- as.numeric(as.character(year.2))*1.1 +
        as.numeric(as.character(cohort.3))*1.01 +
        as.numeric(as.character(race.4))*1.001
  data.frame(year.2, cohort.3, race.4, D1)
}

testDFcols(A1.df)


--
View this message in context: 
http://r.789695.n4.nabble.com/Referring-to-an-object-by-a-variable-containing-its-name-6-failures-tp3817129p3817129.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading R Code aloud

2011-09-14 Thread andrewH
Dear Clarence--

LOL!  R really is good for an amazing range of things! How old are your
kids?

If no one else points to an intentional sample of vocal R, maybe you could
record one of your more inspired readings. It could be a service to the
community of R learners -- and if it catches on, perhaps to parents
everywhere.

Peace, andrewH
 



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-R-Code-aloud-tp3811142p3813540.html
Sent from the R help mailing list archive at Nabble.com.



[R] Reading R Code aloud

2011-09-13 Thread andrewH
Dear folks--
I have been told by an experienced R programmer and teacher whom I trust
that it is easier to understand R code if you read it aloud, as the language
that it is.  However, she was clear that reading it aloud was not simply
reading the marks on the screen: you read A.df[5,] as "the fifth row of
A.df" (or "the fifth row of data frame A"), not as "A dot df left square
bracket five comma right square bracket," which is not helpful at all.  So
you have to be able to read it to read it aloud. I have observed this of
poetry as well, and that, if you hear a poem read well once, you have a
deeper understanding of it (and often other work by the same poet) forever
after, even when reading it silently.

So I was wondering if there are any significant examples of people reading R
code out loud available on the web, on YouTube or something? I did not find
any in a ten-minute search, but perhaps I do not know how best to look. Does
anyone know of any?

Warmly, andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-R-Code-aloud-tp3811142p3811142.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Searching the console

2011-09-08 Thread andrewH
Thanks, Josh! I'm using TINN-R now, but I have been thinking of switching to
ESS.  Though perhaps TINN-R has a similar function -- I had been looking
for console functions, rather than editor functions.

andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3797996.html
Sent from the R help mailing list archive at Nabble.com.



[R] Consistently printing the name of an object passed to a function; a data-auditing question

2011-09-08 Thread andrewH
Dear folks--
I always seem to find that I spend more than half my time making sure my
input data is in the right form, properly aligned, with no bizarre features.
You know the drill: five kinds of missing values, three of them documented.
An alpha mistype in one numeric field turns 30,000 numbers into factor
levels.  SPSS conversion turns 250 factors nicely into R factors, except 3
have levels instead of labels. A few columns in some years of a survey have
undocumented differences in units.  Halfway through a 20-year annual survey,
they add two more allowable answers to a question. etc. 

I'm looking for things to make my data auditing go faster.  One of them is a
dopey little function, testX(), bundling together a variety of R tools to
tell me what is in an object.  Here it is:

testX <- function(objectX, bar=TRUE) {  # A useful diagnostic function
  object.name <- deparse(substitute(objectX))
  if (bar) cat("##\n")  # visual separation between consecutive objects
  cat("testX(", object.name, "): "); cat("Class=", class(objectX))
  cat("  Mode=", mode(objectX), "\n")
  cat("Summary:\n"); print(summary(objectX))
  cat("Structure:\n"); str(objectX)
  if (is.factor(objectX)) {
    cat("Levels: ", levels(objectX), "\n")
    cat("Length: ", length(objectX), "\n")
  }
  invisible(object.name)
}

This works well when I give it the name of a single object. My problem is
when I try to produce descriptions of a bunch of variables in a row, such as
the variables in a list of variables, or all the variables that I have
clomped together in a data frame.  The output is all side effects. Some ways
of passing multiple variables get the name wrong, but the rest right. For
example, if I have a list of variables, and do:

 lapply(varList, testX)

I get an output like this:

##
testX( X[[1L]] ): Class= factor  Mode= numeric 
Summary:
1994 1997 1999 2002 2003 2007 2009 
1009 1165  985 2502 2528 2007 3013 
Structure:
 Factor w/ 7 levels "1994","1997",..: 1 1 1 1 1 1 1 1 1 1 ...
Levels:  1994 1997 1999 2002 2003 2007 2009 
Length:  13209 

If instead I do it with a loop through the variable names in a
data.frame, I get the name wrong _and_ it does not evaluate all the way to
an object:

 names(var.df)
 [1] "year"      "YEAR"      "AGE"       "COHORT.5"  "COHORT.10" "ETHNIC"
 [7] "EDUC"      "INCOME"    "INTERNET"  "PARTY"     "IDEOL"

for (sel in 1:length(names(var.df))) testX(names(var.df)[sel]) 

Gives an output like this:

##
testX( names(var.df)[sel] ): Class= character  Mode= character 
Summary:
   Length     Class      Mode 
        1 character character 
Structure:
 chr "year"

Or I can select the column instead of the name of the column. This gives me
the right answer on the object description, but not the name, thus:
 for (sel in 1:length(names(var.df))) testX(var.df[[sel]])

##
testX( var.df[[sel]] ): Class= integer  Mode= numeric 
Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1994    2002    2003    2003    2007    2009 
Structure:
 int [1:13209] 1994 1994 1994 1994 1994 1994 1994 1994 1994 1994 ...

I've tried doing various things to names(var.df)[sel] to get it closer to
the object -- as.symbol, eval(substitute() ), several others, but I just get
variations on the output above. 

So there are actually two questions here:
1.  How can I write this function so that it works when I just give it an
object, but I can also use it with an apply-family function and a  list (or
vector, or whatever)  of objects, and still have it both treat the object as
an object and print its name correctly?  

2.  How can I write the function, or write a loop, or use an apply-family
function, to use this function to go through the columns of a data.frame,
correctly naming and correctly describing each?

Another way of asking this same question is this: I want to be able to give
testX the name of an object, or a reference to a named object, via
apply-family function, indexing, or whatever.  (A) How can I get the name I
print, object.name, to be the name of the object in both cases? And, (B),
how can I make sure that objectX is the actual object that the name refers
to, and not the name or the reference, in both cases?
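
One way to get both (A) and (B), offered as a sketch rather than a definitive answer: give the function an optional label argument whose default is deparse(substitute(objectX)). Called directly, the label is taken from the call; called from a loop, the caller supplies the column name and passes the column itself with [[ ]]. The function describe_obj() and the two-column var.df below are hypothetical stand-ins for testX() and the real data.

```r
# Hypothetical variant of testX(): the label can be overridden by the caller.
describe_obj <- function(objectX, label = deparse(substitute(objectX))) {
  cat("testX(", label, "): Class=", class(objectX),
      " Mode=", mode(objectX), "\n")
  invisible(label)
}

# Small stand-in for var.df.
var.df <- data.frame(YEAR = c(1994L, 1997L), PARTY = factor(c("D", "R")))

# Direct call: the label comes from the expression in the call.
direct <- describe_obj(var.df)

# Loop over columns: pass the real column and its real name.
labels <- character(0)
for (nm in names(var.df)) {
  labels <- c(labels, describe_obj(var.df[[nm]], label = nm))
}
```

The same pattern works with an apply-family function by iterating over names(var.df) instead of over the columns.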

Finally, and this should maybe be another post, I'd love to hear if others
have thought through the whole question of efficient data auditing.  Is
there a suite of tools, or a standard set of recommendations, that you use
and like? I'd love to hear any useful advice about how to accelerate this
stage of a project, and get more quickly to its statistical heart.

Most sincerely, andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/Consistently-printing-the-name-of-an-object-passed-to-a-function-a-data-auditing-question-tp3798005p3798005.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Searching the console

2011-09-08 Thread andrewH
Dear Sarah--

I am thinking mainly in terms of long programs run by cut-and-paste or some
other batch-like submission, where you can get back a lot of code, some
program outputs, and some error messages, all in a big lump.  I want to look
through that lump to locate all the error or warning messages, or all the
occurrences of some variable or function that seems to be causing a problem.
In both cases I want to find the result in context.
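
A rough sketch of one way to get code and output into a single searchable block, assuming the program lives in a script file: run it with source(echo = TRUE) while sink() diverts console output to a log file, then search the log with grep(). The file names and the three-line script are made up for the example. Note that a plain sink() diverts ordinary output only; warnings and error messages go to a separate message stream and would need additional handling.

```r
# Illustrative script and log file (temporary files, hypothetical contents).
script   <- tempfile(fileext = ".R")
log_file <- tempfile(fileext = ".txt")
writeLines(c("x <- 1:3", "total <- sum(x)", "print(total)"), script)

sink(log_file)                # divert ordinary console output to the log
source(script, echo = TRUE)   # echo = TRUE writes each command before its output
sink()                        # restore normal output

# The transcript now holds commands and results together, in order.
transcript <- readLines(log_file)
hits <- grep("total", transcript, fixed = TRUE)
print(transcript[hits])
```

Using the line numbers in hits, neighbouring lines such as transcript[(hits[1] - 2):(hits[1] + 2)] can be printed to see each match in context.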

Is that clearer?

Thanks for your attention.
  andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3799611.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Searching the console

2011-09-08 Thread andrewH
Thanks Eik!

I did not know about or remember history. I agree that it solves part of my
problem, but I really want to be able to search my code and the things R has
printed in response as a single block of text. I can cut-and-paste it into a
text editor, but I was hoping that there was a way to do it from the console
itself, or otherwise cut out manual steps.

Warmly, andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3799630.html
Sent from the R help mailing list archive at Nabble.com.



[R] Searching the console

2011-09-07 Thread andrewH
Is there any way to search the console during an interactive session? I've
looked and looked, and can not find one.  In some add-on package, maybe?

Sorry to be so basic, but help would be greatly appreciated.

andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Searching-the-console-tp3797884p3797884.html
Sent from the R help mailing list archive at Nabble.com.



[R] Can you send side effect text into a variable?

2011-08-15 Thread andrewH
Dear folks --

There are a number of functions -- I am thinking of str() as an example --
that produce text as a side-effect, rather than returning it. Is there any
way to send the text produced by such functions into a character variable? 
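
For what it is worth, capture.output() in the utils package is designed for this case: it evaluates an expression and returns whatever would have been printed as a character vector, one element per line. A minimal sketch, with an illustrative vector x:

```r
x <- c(1, 2, 3)

# str() prints its description and returns nothing useful;
# capture.output() collects the printed lines instead.
str_text <- capture.output(str(x))

print(str_text)
print(class(str_text))
```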

Any suggestions would be greatly appreciated.

andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-send-side-effect-text-into-a-variable-tp3746025p3746025.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Package or procedure recommendations for analysis of repeated cross-sections?

2011-07-26 Thread andrewH
OK, I've done more research, and I think that what I am looking for is
"repeated cross section" or "pseudo-panel" estimators. Does anyone know if
these have been implemented in any R package?

--
View this message in context: 
http://r.789695.n4.nabble.com/Package-or-procedure-recommendations-for-analysis-of-repeated-cross-sections-tp3694587p3696832.html
Sent from the R help mailing list archive at Nabble.com.



[R] Package or procedure recommendations for analysis of repeated cross-sections?

2011-07-25 Thread andrewH
I have a survey data set of 6 years and about 1500 persons surveyed per year,
with roughly 200 questions per survey. The samples are drawn independently
without replacement and are intended to represent the nation (USA).

I would like to create something like a synthetic panel, dividing the
respondents up into groups and then seeing if year-to-year changes in the
mean value of my dependent variable for each group vary with the level
or the change in the group mean of my explanatory variable.

The grouping would be based on several factors the levels of which denote
demographic variables such as income, race, and birth cohort.  Each group
would consist of all those respondents that are identical in their level of
all the selected factors, i.e., it would consist of all the respondents
in the sample who share an identical race, income level, birth cohort, etc.
After being imported from an SPSS data set, these variables are implemented
as R factors.  My dependent variables are measures of ideology and party
affiliation; the variables that identify the groups are factors known to be
correlated to political ideology for which I wish to control; and my
independent variables focus on sources of news and information.  My
hypothesis is that the change in ideology we have observed over the period
for which I have data can be explained in part by changes in how these groups
get their information.  I’m not sure if the ideology change should respond
to the level or to the change in level of my independent variable. I intend to
test both.
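
The grouping step described above can be sketched in base R with aggregate(), which computes a statistic for every combination of the grouping factors. The data frame survey below is fabricated purely to show the mechanics (two years, one grouping factor, eight respondents); the column names are assumptions, not the real survey variables.

```r
# Fabricated toy data: two survey years, one demographic factor.
survey <- data.frame(
  year     = rep(c(2002, 2003), each = 4),
  race     = factor(rep(c("a", "b"), times = 4)),
  ideology = c(1, 2, 3, 4, 2, 3, 4, 5)
)

# Mean of the dependent variable within each year-by-group cell.
cell_means <- aggregate(ideology ~ year + race, data = survey, FUN = mean)

# Cell sizes, which matter later because the cell means differ in precision.
cell_sizes <- aggregate(ideology ~ year + race, data = survey, FUN = length)
names(cell_sizes)[3] <- "n"

# One row per cell: year, group, mean, and number of respondents.
cells <- merge(cell_means, cell_sizes, by = c("year", "race"))
print(cells)
```

More grouping factors can be added on the right-hand side of the formula, for example ideology ~ year + race + cohort, at the cost of smaller cells.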

I was about to try to write this from scratch, but it occurred to me that
this is a variety of problem for which a nice package probably already
exists, and I could probably find it if I knew the right terminology.  I am
not enough of a statistician to know the conventional name for the procedure
of using subgroupings of cross-sections repeated over time as if they were
panels.  Moreover, I suspect my procedure of dividing a population into
groups based on each combination of the classifying variables has a
conventional name, and that looking at differences or ratios of the means of
a dependent variable over those groups and how they respond to the mean
level of an independent variable by group has a name, and that each has one
or more good implementations in R.

Finally, I was thinking of simply regressing changes in the group means of
my dependent variable on the group means or changes in the group means of
my independent variable.  But this throws away information that I know is
relevant, though I am not sure how best to use it, e.g. that the groups are
of different sizes, so the mean differences or ratios will differ in their
variances. I could assume they are normal and do a correction for
heteroskedasticity, but if there is a better approach, I’d rather use it.

My apologies if this question is unduly basic.  I did two semesters of
graduate econometrics once, but that was more than a decade ago, and I fear
that, like many with a superficial knowledge of econometrics, I tend to see
every research question in terms of OLS or GLM, even if that is not the
right model for the problem. 

Any help or suggestions would be greatly appreciated.

Sincerely, andrewH


--
View this message in context: 
http://r.789695.n4.nabble.com/Package-or-procedure-recommendations-for-analysis-of-repeated-cross-sections-tp3694587p3694587.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using str() in a function.

2011-07-15 Thread andrewH
Thanks, everybody, this has been very edifying. One last question:

It seems that sometimes when a function returns something and you don't
assign it, it prints to the console, and sometimes it doesn't. I'm not sure
I understand which is which. My best current theory is that, if the function
returns NULL, by itself and not as part of some larger object, it does not
print it, but non-null values are printed. Is that correct?
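
A small sketch that may help test this theory. As far as base R is concerned, what decides whether a top-level result is auto-printed is a visibility flag, not whether the value is NULL: a visibly returned NULL prints as NULL, and a value wrapped in invisible() does not print. withVisible() exposes the flag. The two function names are hypothetical.

```r
returns_null      <- function() NULL           # visible NULL
returns_invisible <- function() invisible(42)  # invisible non-NULL value

# withVisible() returns the value together with its visibility flag.
v_null      <- withVisible(returns_null())$visible
v_invisible <- withVisible(returns_invisible())$visible

print(c(null = v_null, invisible = v_invisible))
```

Assignment functions and many functions called for their side effects return their value invisibly, which is why they seem to print nothing.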

Thanks! Andrew


--
View this message in context: 
http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3670513.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using str() in a function.

2011-07-13 Thread andrewH
Thanks, David & Dennis, that is very helpful.

Let me state what I think I have learned, to see if I understand it
correctly. Then I have two remaining questions.

If a function contains more than one expression in its {}, it always returns
the value of the last evaluated expression in its definition, and only the
last object -- unless you previously use the return() function on an object
before the last expression, in which case, the value of that expression is
returned instead. And in either case, explicit or implicit return(), the
returned expression is evaluated, and returned, first -- before any other
expressions are evaluated, and any side effects also occur before any other
expressions are evaluated. (Though I am unsure in what order expressions are
evaluated if  objects in the returned expression are defined by other
expressions before it in the function.  The chain of evaluation -- and of
any side effects of that evaluation -- propagates backwards, maybe?).

The print() command inside a function sends the object it contains to the
currently-defined printer, as a side-effect, without returning it.  The
difference between return() and print() is that if something is returned, R
checks to see if the value of the function is assigned or otherwise nested
in a larger evaluated expression. If so, a copy is moved to the assigned
object and the original is deleted. If not, it is printed to the current
device and then deleted. If you print() it, it does not check for assignment
or use before sending it to the printer and deleting it.

A lot of functions, e.g. str(), have as their explicit or implicit return an
expression which does not create an object. In this case, the function
returns a NULL. If you do not want to print the NULL or other returned
object, you make the returned argument invisible().

But there are still things here I do not understand.  The function that
Dennis Murphy provided does print the str() output last instead of first,
because its final expression is invisible() rather than str(). But, it still
prints out (and returns - I checked) a NULL.  e.g.

 GG <- c(1:5)
 testXa <- function(X) {
   print(summary(X))
   print(str(X))
   invisible() # returns nothing
 }
 testXa(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 
 int [1:5] 1 2 3 4 5
NULL
 
 # Here is my latest version of the function, which does exactly what I
 # want:
 testXf <- function(X) {
   print("Summary:"); print(summary(X))
   print("Structure:"); invisible(str(X))
 }
 testXf(GG)
[1] "Summary:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 
[1] "Structure:"
 int [1:5] 1 2 3 4 5

So, two questions:
1. In Dennis's function, the str() results are printed last because they are
no longer returned, as invisible() is now the last expression. But why does
his function still print a visible NULL?
2. My function, above, makes the NULL value returned by str() invisible. But
invisible(str(X)) is the last expression evaluated, so why does the
side-effect printing of str() results happen last instead of first?
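
On the second question, a sketch that makes the order observable: the expressions in a function body run top to bottom, so each side effect appears at the moment its expression is reached, and the last expression only determines the return value. The function show_both() is a hypothetical variant of testXf() that uses cat() for the headings and returns its argument invisibly.

```r
GG <- c(1:5)

show_both <- function(X) {
  cat("Summary:\n");   print(summary(X))   # runs first, prints first
  cat("Structure:\n"); str(X)              # runs second, prints second
  invisible(X)                             # return value only; prints nothing
}

# capture.output() records the printed lines in the order they were produced.
lines  <- capture.output(result <- show_both(GG))
print(lines)
```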

and thanks again!
andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3666339.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using str() in a function.

2011-07-13 Thread andrewH
David --  Ah! Excellent. OK, that explains Dennis's function's output.
print(str(X)) evaluates str(X), sending the usual str() output to the
console as a side effect, and then prints what str() returns, which is NULL. 
And invisible() prints NULL again, but we don't see NULL NULL, because the
second one is invisible. 

Still puzzled by the order of my output, though.
andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3666543.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using str() in a function.

2011-07-13 Thread andrewH
Dear Peter--
You write:
Andrew is being seriously confused. The return(ans) is of course executed
when you get to it, returning the value of `ans` and terminating the
function. Anything after that is _ignored_. There is no such thing as a
previous return() affecting what str() does -- that would be like asking
whether it is legal to  marry your widow's sister... 

Right. By "previous," I was contrasting an explicit return somewhere other
than the last expression in the {} to the implicit return of the last
expression. I understand that executing a return() is the last thing a
function does.

str() prints last because the side effect of the preceding print()s causes
them to print before str() is ever called. 

So, what about this one:
GG <- c(1:4)
testX3 <- function(X) {summary(X); return(str(X))}
testX3(GG)

 int [1:4] 1 2 3 4

I thought this was ignoring the summary() because it evaluates the return()
first.  If it does the return(str(X)) when it encounters it, (1) why doesn't
it send the summary() to the console (I'm guessing that it is because its
output is local to the function), and (2) why doesn't it return the NULL
that str() returns to the console?
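
A sketch bearing on question (1), under the usual description of R's behaviour: automatic printing happens only for the value of a top-level expression. Inside a function body, a bare summary(X) is computed and then discarded, so nothing is shown unless print() is called explicitly. The function names quiet() and loud() are hypothetical.

```r
# summary(X) is evaluated but its value is simply dropped.
quiet <- function(X) { summary(X); invisible(NULL) }

# print() makes the side effect explicit, so it happens inside the function.
loud  <- function(X) { print(summary(X)); invisible(NULL) }

quiet_out <- capture.output(quiet(1:4))
loud_out  <- capture.output(loud(1:4))

print(length(quiet_out))
print(loud_out)
```

On question (2), the transcript above is consistent with str() returning its NULL invisibly, with that invisibility surviving the return().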

again, thanks.  --andrewH

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p316.html
Sent from the R help mailing list archive at Nabble.com.



[R] Using str() in a function.

2011-07-09 Thread andrewH
Using str() in a function.

I am in the early phase of learning R, and I find I spend a lot of time
trying to figure out what is actually in objects I have created or read in
from a file.  I'm trying to make a simple little function to display a
couple of things about a object, let's say the summary() and the str(),
sequentially, preferably without a bunch of surplus lines between them.  I
have tried a large number of things; none do what I want. 

 GG <- c(1,2,3)
# This one ignores the str().
 testX <- function(X) {return(summary(X)); str(X)}
 testX(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

# So does this one.
 testX2 <- function(X) {return(summary(X)); return(str(X))}
 testX2(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

# On the other hand, this one ignores the summary()
 testX3 <- function(X) {summary(X); return(str(X))}
 testX3(GG)
 num [1:3] 1 2 3

# This one displays both, in reverse order, with a superfluous (to my
# intentions) [[NULL]].
 testX4 <- function(X) {list(summary(X), (str(X)))}
 testX4(GG)
 num [1:3] 1 2 3
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

[[2]]
NULL

# Now we are back to ignoring the str().
 testX5 <- function(X) {list(return(summary(X)), (str(X)))}
 testX5(GG)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

# This does the same as testX4().
 testX6 <- function(X) {return(list(summary(X), (str(X))))}
 testX6(GG)
 num [1:3] 1 2 3
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

[[2]]
NULL

I tried a bunch more, using the print command, etc., but nothing I tried
resulted in the output of summary() followed by the output of str(). And is
there really no way to assign the output of str() -- that is to say, the
output str() normally prints to the console -- to an object?  

I would be very grateful for any guidance you could offer.

Sincerely, Andrew

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-str-in-a-function-tp3655785p3655785.html
Sent from the R help mailing list archive at Nabble.com.
