[R] glm gamma scale parameter

2007-02-06 Thread WILLIE, JILL
I would like the option to specify alternative scale parameters when
using the gamma family, log link glm.  In particular I would like the
option to specify any of the following:

1.  maximum likelihood estimate
2.  moment estimator/Pearson's 
3.  total deviance estimator

Is this easy?  Possible?  

In addition, I would like to know what estimation process (maximum
likelihood?) R is using to estimate the parameter if somebody knows that
off the top of their head or can point me to something to read?  
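A minimal sketch, assuming placeholder data and variable names (dat, y,
x) and the MASS package, of how the three estimates above could be
computed after a gamma/log-link fit:

library(MASS)                               # provides gamma.shape()

fit <- glm(y ~ x, family = Gamma(link = "log"), data = dat)

## 1. maximum likelihood: MASS estimates the shape alpha by ML;
##    the dispersion phi is its reciprocal
phi.ml <- 1 / gamma.shape(fit)$alpha

## 2. moment/Pearson estimator
phi.pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

## 3. total deviance estimator
phi.dev <- deviance(fit) / df.residual(fit)

## plug a chosen value into the summary (affects std. errors and tests)
summary(fit, dispersion = phi.ml)

On the second question: with no dispersion argument, summary.glm for
non-binomial/non-Poisson families reports the Pearson estimate above,
not a maximum likelihood estimate.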

I did read the help & searched the archives, but I'm a bit confused
trying to reconcile the terminology I'm used to with R terminology as
we're transitioning to R, so if I missed an obvious way to do this, or
stated this question in a way that's incomprehensible, my apologies.

Jill Willie 
Open Seas
Safeco Insurance
[EMAIL PROTECTED]



Re: [R] glm gamma scale parameter

2007-02-06 Thread WILLIE, JILL
Thank you.  You are correct, the shape parameter is what I need to
change & I think I see how to use the MASS package to do it...or if not,
at least I have enough now to figure it out.

A question to reconcile terminology, which will speed me up if you have
time to help me a bit more: phi = 'scale parameter' vs. 'dispersion
parameter' vs. 'shape parameter'?  Excerpts below from the R intro
manual defining phi & from the Statistics Complements discussion.

R intro:

distribution of y is of the form 
f_Y(y; mu, phi) =
    exp((A/phi) * (y*lambda(mu) - gamma(lambda(mu))) + tau(y, phi))

where phi is a scale parameter (possibly known), and is constant for all
observations, A represents a prior weight, assumed known but possibly
varying with the observations, and mu is the mean of y.  So it is
assumed that the distribution of y is determined by its mean and
possibly a scale parameter as well.


Statistics Complements to Modern Applied Statistics with S, Fourth
Edition, by W. N. Venables and B. D. Ripley, Springer:

7.6 Gamma models
The role of the dispersion parameter for the Gamma family is rather
different.  This is a parametric family which can be fitted by maximum
likelihood, including its shape parameter ...
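To make the terminology concrete, a small simulated sketch (purely
illustrative; the true shape of 3 is an assumption of the simulation)
showing that for the Gamma family the dispersion phi is just the
reciprocal of the shape alpha:

library(MASS)
set.seed(1)
x  <- runif(200)
mu <- exp(1 + 2 * x)                         # mean on the log-link scale
y  <- rgamma(200, shape = 3, rate = 3 / mu)  # true shape alpha = 3, phi = 1/3

fit <- glm(y ~ x, family = Gamma(link = "log"))
gamma.shape(fit)         # ML estimate of the shape alpha (near 3)
gamma.dispersion(fit)    # ML dispersion, i.e. 1/alpha (near 0.33)
summary(fit)$dispersion  # Pearson-based dispersion used by summary.glm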


Jill Willie 
Open Seas
Safeco Insurance
[EMAIL PROTECTED] 


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 06, 2007 12:25 PM
To: WILLIE, JILL
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] glm gamma scale parameter

On Tue, 6 Feb 2007, Prof Brian Ripley wrote:

 I think you mean 'shape parameter'.  If so, see the MASS package and 
 ?gamma.shape.

Also http://www.stats.ox.ac.uk/pub/MASS4/#Complements
leads to several pages of discussion.


 glm() _is_ providing you with the MLE of the scale parameter, but
 really no estimate of the shape (although summary.glm makes use of
 one).


 On Tue, 6 Feb 2007, WILLIE, JILL wrote:

 [original question quoted in full above]


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] reducing RODBC odbcQuery memory use?

2007-01-25 Thread WILLIE, JILL
Basic Questions:

1.  Can I avoid having RODBC use so much memory (35 times the data size
or more) when making a data.frame & then an .rda file via sqlQuery/save?

2.  If not, is there some more appropriate way from w/in R to pull large
data sets (2-5GB) into .rda files from SQL?  (A sketch follows these
questions.)

3.  I get an unexpectedly high ratio of virtual memory to memory use
(10:1).  Can that be avoided?
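A sketch of what question 2 might look like done in chunks, assuming
sqlGetResults() can be called repeatedly against a pending odbcQuery()
result set (as the RODBC docs describe); the table and DSN names are
from the transcript below and the chunk size is an arbitrary guess:

library(RODBC)

channel <- odbcConnect("psmrd")
odbcQuery(channel, "select * from AUTCombinedWA_BILossCost_1per")

chunks <- list()
repeat {
  piece <- sqlGetResults(channel, max = 50000)   # fetch up to 50,000 rows
  if (!is.data.frame(piece) || nrow(piece) == 0) break
  chunks[[length(chunks) + 1]] <- piece
  if (nrow(piece) < 50000) break                 # short chunk = last one
}
## note: rbind still needs the whole table in memory; the chunks could
## instead be processed or saved individually
df_OnePer <- do.call(rbind, chunks)
save(df_OnePer, file = "df_OnePer.rda")
odbcClose(channel)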


Testing details (R transcript below): 1 CPU, 1GB RAM Windows machine.

1.  tested the bigger input table (AUTCombinedWA_BILossCost_1per): size
is 20MB in sql, 10 rows, 2 numeric columns, 55 integer columns; it
consumes 35kb of memory & 80kb of virtual memory to execute the
sqlQuery command.  Memory is not released after the step finishes or
upon execution of odbcCloseAll() or gc().

2.  tested a small input table: size is 2MB in sql, 1 rows, 2 numeric
columns, 55 integer columns; it consumes 55000kb of memory & 515000kb
(vm seems oddly high to me) of virtual memory to execute the sqlQuery
command.

3.  concluded the high memory use is isolated to the odbcQuery step
w/in the sqlQuery function, as opposed to sqlGetResults or ODBC itself.


Relevant R session transcript:

library(RODBC)
channel <- odbcConnect("psmrd")
df_OnePer <- data.frame(sqlQuery(channel,
    "select * from AUTCombinedWA_BILossCost_1per"))
save(df_OnePer, file = "df_OnePer.rda")


Additional testing details:
I exited R, which released all memory cleanly, then started R again and
loaded the .rda saved in the prior step, as below.  This confirmed that
relatively little of the memory is consumed going from .rda to data
frame, isolating the problem to the RODBC step:

> load("df_OnePer.rda")
> df <- data.frame(df_OnePer)

I closed R, opened MS Access & used the same DSN (psmrd) to import
AUTCombinedWA_BILossCost_1per into MS Access, which required about
3kb of memory & 2kb of virtual.

And finally, I have this excerpt from Prof Brian Ripley that seems
potentially relevant (if it's not just confusion because I called them
'byte-size' when really I should have said they're integers just having
values limited to 1-255).  In any case I'm unable to see from the RODBC
help how to specify this:  "...sqlQuery returns a data frame directly.
I think you need to tell RODBC to translate your 'byte-sized factors' to
numeric, as it will be going through character if these are a type it
does not know about."
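A sketch of the kind of translation that excerpt seems to suggest,
assuming as.is = TRUE is passed through sqlQuery() to sqlGetResults()
and that the tinyint columns sit in known positions (the positions
below are hypothetical):

## pull everything as character, then coerce the tinyint columns to
## integer so they are stored compactly rather than as factors
tinyint.cols <- 3:57                   # hypothetical column positions
df_OnePer <- sqlQuery(channel,
                      "select * from AUTCombinedWA_BILossCost_1per",
                      as.is = TRUE)
df_OnePer[tinyint.cols] <- lapply(df_OnePer[tinyint.cols], as.integer)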

I read all the RODBC help, read all of the data import guide & searched
the help archives...can't find an answer.  I would appreciate advice,
experience, or direction.

Jill Willie 
Open Seas
Safeco Insurance
[EMAIL PROTECTED] 


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 25, 2007 12:05 AM
To: WILLIE, JILL
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Size of data vs. needed memory...rule of thumb?

On Wed, 24 Jan 2007, WILLIE, JILL wrote:

 I have been searching all day & most of last night, but can't find any
 benchmarking or recommendations regarding R system requirements for
 very large (2-5GB) data sets to help guide our hardware configuration.
 If anybody has experience with this they're willing to share, or could
 anybody point me in a direction that might be productive to research,
 it would be much appreciated.  Specifically:  will R simply use as much
 memory as the OS makes available to it, unlimited?

Under most OSes.  Because Windows has no means to limit the amount made
available, R under Windows does have its own limiting mechanism (which
you hit in the examples below).  R under Linux will allow you to run a
4GB process on a machine with 2GB RAM, but you are likely to get around
0.7% usage.  (One of my colleagues did that on a server earlier this
week, hence the very specific answer.)

 Is there a multi-threading version of R, or packages?

Not to run computations in R.  Some parts of R (e.g. GUIs) and some 
libraries (e.g. some BLAS) are multithreaded.  There are multiprocess
packages, e.g. Rmpi, rpvm, snow.
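A minimal illustrative sketch of the multiprocess route using snow
(the two-worker socket cluster here is an arbitrary choice):

library(snow)
cl <- makeCluster(2, type = "SOCK")         # two local worker processes
res <- parLapply(cl, 1:4, function(i) i^2)  # spread the tasks over them
stopCluster(cl)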

 Does the core R package support 64-bit

Yes, and has for many years.

  should I expect to see any difference in how memory's handled under 
 that version?

Yes, because the address space will not get seriously fragmented.  See
the appropriate section in R-admin.html (referenced from INSTALL).

 Is 3 GB of memory to 1GB of data a reasonable ballpark?

I'd say it was a bit low, but it really depends on the analysis you are
doing, how the 1GB of data is made up (many rows?, many cols?, etc.) and
so on.  Had you asked me to suggest a ratio I would have said 5.

 Our testing thus far has been on a Windows 32-bit box w/1GB of RAM & 1
 CPU; it appears to indicate something like 3GB of RAM for every 1GB of
 sql table (ex-indexes, byte-sized factors).  At this point, we're
 planning on setting up a dual-core 64-bit Linux box w/16GB of RAM for
 starters, since we have summed-down sql tables of approx 2-5GB
 generally.

 Here's details, just for context, or in case I'm misinterpreting the
 results, or in case there's some more memory-efficient way to get data
[R] Size of data vs. needed memory...rule of thumb?

2007-01-24 Thread WILLIE, JILL
I have been searching all day & most of last night, but can't find any
benchmarking or recommendations regarding R system requirements for very
large (2-5GB) data sets to help guide our hardware configuration.  If
anybody has experience with this they're willing to share, or could
anybody point me in a direction that might be productive to research, it
would be much appreciated.  Specifically:  will R simply use as much
memory as the OS makes available to it, unlimited?  Is there a
multi-threading version of R, or packages?  Does the core R package
support 64-bit & should I expect to see any difference in how memory's
handled under that version?  Is 3GB of memory to 1GB of data a
reasonable ballpark?

Our testing thus far has been on a Windows 32-bit box w/1GB of RAM & 1
CPU; it appears to indicate something like 3GB of RAM for every 1GB of
sql table (ex-indexes, byte-sized factors).  At this point, we're
planning on setting up a dual-core 64-bit Linux box w/16GB of RAM for
starters, since we have summed-down sql tables of approx 2-5GB
generally.

Here are the details, just for context, in case I'm misinterpreting the
results, or in case there's some more memory-efficient way to get data
into R's binary format than going through a data.frame.
 
R session:
> library(RODBC)
> channel <- odbcConnect("psmrd")
> FivePer <- data.frame(sqlQuery(channel,
+     "select * from AUTCombinedWA_BILossCost_5per"))

Error: cannot allocate vector of size 2000 Kb
In addition: Warning messages:
1: Reached total allocation of 1023Mb: see help(memory.size)
2: Reached total allocation of 1023Mb: see help(memory.size)
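A sketch of the checks the warning points at (Windows-only; the 2047
in the last call is just the approximate 32-bit per-process ceiling,
not a recommendation):

memory.size()              # MB currently allocated by R
memory.size(max = TRUE)    # peak MB obtained from Windows so far
memory.limit()             # current limit in MB (1023 here)
memory.limit(size = 2047)  # raise the limit toward the 32-bit ceiling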


ODBC connection:
Microsoft SQL Server ODBC Driver Version 03.86.1830

Data Source Name: psmrd
Data Source Description: 
Server: psmrdcdw01\modeling
Database: OpenSeas_Work1
Language: (Default)
Translate Character Data: Yes
Log Long Running Queries: No
Log Driver Statistics: No
Use Integrated Security: Yes
Use Regional Settings: No
Prepared Statements Option: Drop temporary procedures on disconnect
Use Failover Server: No
Use ANSI Quoted Identifiers: Yes
Use ANSI Null, Paddings and Warnings: Yes
Data Encryption: No

Please be patient, I'm a new R user (or at least I'm trying to be...at
this point I'm mostly a new R-help reader); I'd appreciate being
pointed in the right direction if this isn't the right help list to send
this question to...or if this question is poorly worded (I did read the
posting guide).

Jill Willie 
Open Seas
Safeco Insurance
[EMAIL PROTECTED] 






[R] Can we do GLM on 2GB data set with R?

2007-01-20 Thread WILLIE, JILL
We are wanting to use R instead of/in addition to our existing stats
package because of its huge assortment of stat functions.  But we
routinely need to fit GLM models to files that are approximately 2-4GB
(as SQL tables, un-indexed, w/tinyint-sized fields except for the
response & weight variables).  Does anybody know whether this is
feasible in R, given sufficient hardware?  It appears to use a great
deal of memory on the small files I've tested.

I've read the data import, memory.limit, memory.size & general
documentation but can't seem to find a way to tell what the boundaries
are & roughly gauge the needed memory...other than trial & error.  I've
started by testing the data.frame & run out of memory on my PC.  I'm new
to R, so please be forgiving if this is a poorly-worded question.
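One memory-conscious starting point (a sketch only; the DSN is the one
used elsewhere in these messages, while the table and column names are
hypothetical, and the gamma/log-link family is carried over from the
other thread) is to pull just the model columns from SQL before calling
glm():

library(RODBC)

channel <- odbcConnect("psmrd")
## select only the response, weight and predictor columns, not the table
dat <- sqlQuery(channel,
                "select response, wt, x1, x2, x3 from BigModelTable")
odbcClose(channel)

object.size(dat) / 1024^2   # rough in-memory footprint in MB

fit <- glm(response ~ x1 + x2 + x3, family = Gamma(link = "log"),
           weights = wt, data = dat)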

Jill Willie 
Open Seas
Safeco Insurance
[EMAIL PROTECTED] 
206-545-5673

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.