[R] integers to POSIXct

2006-10-04 Thread paul sorenson
What is the recommended way to convert/coerce an integer to a POSIXct 
please?

d <- as.POSIXct(Sys.Date())
i <- as.integer(d)

as.POSIXct(i)
Error in as.POSIXct.default(i) : do not know how to convert 'i' to class 
POSIXlt

This appears to be the behaviour in 2.3.1 and 2.4.0 on windows XP.

I have tried searching on this and found as.Date.integer in package zoo 
which performs a similar function but wondered if there was something 
basic I was missing in the base distribution?
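
For what it is worth, a minimal sketch of two ways this can be done in base R. It assumes a recent version of R, where as.POSIXct() accepts an origin argument for numeric input; the versions mentioned above may not. The ISOdatetime() form is plain arithmetic on a known POSIXct value. The names p1 and p2 are only illustrative.

```r
d <- as.POSIXct(Sys.Date())
i <- as.integer(d)

# Seconds since the epoch back to POSIXct; origin and tz are spelled out
# because older versions of R require an explicit origin.
p1 <- as.POSIXct(i, origin = "1970-01-01", tz = "UTC")

# Equivalent arithmetic: add the seconds to the epoch itself.
p2 <- ISOdatetime(1970, 1, 1, 0, 0, 0, tz = "UTC") + i

# Both round-trip to the original integer.
stopifnot(as.integer(p1) == i, as.integer(p2) == i)
```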

cheers

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] more on date conversion differences in 2.2.1 vs 2.3.1

2006-08-11 Thread paul sorenson
With dates I get different results with 2.2.1 and 2.3.1.  From my 
somewhat naive point of view, the 2.2.1 behaviour seems more sensible.

Running the code below in 2.2.1:
V1
2006-08-01 2006-08-01
         1          1

With 2.3.1 I get:
V1
1154354400 1154440800
         1          1

# testdate.R
t <- read.csv2('testdate.csv', header=FALSE)
t$V1 <- as.POSIXct(t$V1)
print(t)
x <- xtabs(V2 ~ V1, data=t)
print(x)

# testdate.csv
2006-8-1;0;1
2006-8-1;1;1
2006-8-2;0;1
2006-8-2;0;0
2006-8-2;1;1
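
A possible workaround, assuming the aim is simply to tabulate by calendar day: convert the column to a character label before calling xtabs(), so the table names do not depend on how a POSIXct value is turned into a factor. This is only a sketch; the data frame name dat and the extra column day are made up for the example.

```r
# Same rows as testdate.csv, supplied inline so the example is self-contained.
csv <- c("2006-8-1;0;1", "2006-8-1;1;1", "2006-8-2;0;1",
         "2006-8-2;0;0", "2006-8-2;1;1")
dat <- read.csv2(text = csv, header = FALSE, stringsAsFactors = FALSE)

# Parse as a Date, then format back to a fixed yyyy-mm-dd label.
dat$day <- format(as.Date(dat$V1, format = "%Y-%m-%d"))

x <- xtabs(V2 ~ day, data = dat)
print(x)
```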



[R] multiple return values and optimization

2006-06-09 Thread paul sorenson
I have a function (masheff) which returns a value which I can optimize 
no problem, eg:

optimize(masheff, c(15,30), maximum=TRUE, m_gd=5.13, v_tot=41, e_c=1.0)

I would like masheff() to return multiple values say as a list with 
named elements like so:

v = masheff(...)
v$eff
v$extract
etc

Is there a simple way to do this in an optimize context or would I need 
to set some global variables and inspect them afterwards?
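
One way that avoids globals, sketched under assumptions: optimize() only needs a scalar, so wrap the function, hand optimize() the element of interest, and call the function once more at the optimum to recover the full list. The masheff() below is a made-up stand-in (the real one is not shown here), and obj is a hypothetical wrapper name.

```r
# Hypothetical stand-in for masheff(); the formula is invented so that the
# efficiency peaks at x = 22.
masheff <- function(x, m_gd, v_tot, e_c) {
  eff <- e_c * (1 - ((x - 22) / v_tot)^2)
  list(eff = eff, extract = eff * m_gd)
}

# Scalar objective: only the element being optimized.
obj <- function(x, ...) masheff(x, ...)$eff

opt <- optimize(obj, c(15, 30), maximum = TRUE,
                m_gd = 5.13, v_tot = 41, e_c = 1.0)

# Re-evaluate at the optimum to get every named value.
best <- masheff(opt$maximum, m_gd = 5.13, v_tot = 41, e_c = 1.0)
best$eff
best$extract
```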



[R] updated r-help traffic plots

2006-01-27 Thread paul sorenson
I have updated some mail list traffic plots I created a while back 
however I wouldn't consider the data verified.  The reply stats are from 
In-Reply-To headers.

http://brewiki.org/tmp/r-help_traffic.png

The data set is http://brewiki.org/tmp/r-help.zip



Re: [R] regular expressions, sub

2006-01-27 Thread paul sorenson
There are some interactive regex tools around.  I use a python one 
sometimes.  You just then have to be careful re escaping and the style 
of regular expressions used in the tool you worked with and the target 
environment.
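
As an illustration of the escaping involved on the R side, here is a rough sketch for the formula quoted below. It assumes every term is either log(x) or I(log(x)^2) with a simple variable name inside, which may not hold for other model specifications. The names s and out are only for the example.

```r
s <- "log(D) ~ log(N)+I(log(N)^2)+log(t)"

# Squared terms first, otherwise the inner log(N) is rewritten too early.
out <- gsub("I\\(log\\(([^()]+)\\)\\^2\\)", "ln^2 \\1", s)

# Remaining plain log(x) terms.
out <- gsub("log\\(([^()]+)\\)", "ln \\1", out)

# Normalise the spacing around "+".
out <- gsub(" *\\+ *", " + ", out)
out
```

Each backslash is doubled because the pattern passes through R's string parser before it reaches the regular expression engine.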

Christian Hoffmann wrote:
 Hi,
 
 I am trying to use sub, regexpr on expressions like
 
 log(D) ~ log(N)+I(log(N)^2)+log(t)
 
 being a model specification.
 
 The aim is to produce:
 
 ln D ~ ln N + ln^2 N + ln t
 
 The variable names N, t may change, the number of terms too.
 
 I succeeded only partially, help on regular expressions is hard to 
 understand for me, examples on my case are rare. The help page on R-help 
 for grep etc. and regular expressions



[R] y axis text truncated

2006-01-18 Thread paul sorenson
I have been trying to find which par settings can help me avoid 
truncated text at the bottom of the y axis in a mosaic plot (created 
when I plot a result of a 2d xtabs) without much success.  Using las=1 
has helped but the text (the 500+ level) is still cropped.

I get the same result on XP/2.2.0 and FC4/2.2.1.

Any tips would be appreciated.

# dput(foo.df)
foo.df = structure(list(vol1 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("100", "101-250", "251-500", "501-750",
"751-1000", "1000+"), class = "factor"), vol2 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("20", "20-50", "50-100",
"100-500", "500+"), class = "factor"), Freq = c(4, 3, 0, 0, 2, 0, 4, 3,
6, 4, 1, 2, 1, 3, 3, 4, 5, 2, 3, 1, 3, 2, 2, 12, 0, 0, 1, 0,
2, 4)), .Names = c("vol1", "vol2", "Freq"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30"), class = "data.frame")

  xtabs(Freq ~ vol1 + vol2, data=foo.df)
          vol2
vol1       20 20-50 50-100 100-500 500+
  100       4     4      1       3    0
  101-250   3     3      3       1    0
  251-500   0     6      3       3    1
  501-750   0     4      4       2    0
  751-1000  2     1      5       2    2
  1000+     0     2      2      12    4

  plot(xtabs(Freq ~ vol1 + vol2, data=foo.df))
  plot(xtabs(Freq ~ vol1 + vol2, data=foo.df), las=1)



[R] Windows ESS for XEmacs - installer

2006-01-17 Thread paul sorenson
I mentioned this on the ESS list a little while ago: I made an installer 
for ESS and XEmacs on Windows.

It worked for me but given my minimal knowledge of ESS and XEmacs, it 
might not be the right way to do it and may or may not work for you.

I hosted it and the source for the Inno Setup script at 
http://brewiki.org/XEmacsESS but would be only too pleased for it to 
move somewhere more appropriate and be modified as deemed necessary.

cheers



Re: [R] Wikis etc.

2006-01-07 Thread paul sorenson
Frank E Harrell Jr wrote:
 paul sorenson wrote:
 
 I am a fan of wikis and I reckon it would really help with making R 
 more accessible.  On one extreme you have this email list and on the 
 other extreme you have RNews and the PDFs on CRAN.  A wiki might hit 
 the spot between them and reduce the traffic on the email list.
 
 
 Thanks Paul.  But as long as the email list is active I fear a wiki 
 won't be.

That would be sad if that were true.  They are different beasts, as 
would be an IRC channel.  I say complementary, not mutually exclusive.

A wiki takes time to reach critical mass (eg my home brew wiki 
http://brewiki.org/ or wikipedia) and you couldn't just pull the plug on 
this list without a serious impact on the uptake of R I would have thought.

Contributions to the wiki from mugs like me with less R/statistics 
experience would hopefully make R more accessible to newbies - pointing 
out the traps for new players.

One way to bootstrap it is to simply add a wiki menu entry into the 
r-project.org menu. This is what the guys over at 
http://wiki.wxpython.org/ have done.  Over time, some of the other items 
there might morph into wiki pages as appropriate.

I have no doubt that if the R-Wiki was supported in the same thoughtful, 
thorough and patient way in which questions on R-Help are answered, it 
would be one of the lowest entropy wikis around.

I have some experience with moinmoin (a python wiki) and would be 
willing to contribute some time and skills if that would help.



Re: [R] Wikis etc.

2006-01-06 Thread paul sorenson
I am a fan of wikis and I reckon it would really help with making R 
more accessible.  On one extreme you have this email list and on the 
other extreme you have RNews and the PDFs on CRAN.  A wiki might hit 
the spot between them and reduce the traffic on the email list.


Frank E Harrell Jr wrote:
 I feel that as long as people continue to provide help on r-help wikis 
 will not be successful.  I think we need to move to a central wiki or 
 discussion board and to move away from e-mail.  People are extremely 
 helpful but e-mail seems to always be memory-less and messages get 
 too long without factorization of old text.  R-help is now too active 
 and too many new users are asking questions asked dozens of times for 
 e-mail to be effective.
 
 The wiki also needs to collect and organize example code, especially for 
 data manipulation.  I think that new users would profit immensely from a 
 compendium of examples.
 
 Just my .02 Euros
 
 Frank



Re: [R] Error in X11(paste("png::", filename, sep = ""), width, height, pointsize unable to start device PNG

2005-12-30 Thread paul sorenson
Marc Schwartz wrote:
 ...
 On a side note, RH9 is a fairly dated and EOL'd distribution, with
 security updates and bug patches only provided by the Fedora Legacy
 folks (http://fedoralegacy.org). You should consider upgrading to Fedora
 in the near future (if you want to stay with RH), since the FL folks can
 drop support for RH9 at any time, leaving you vulnerable.

Yes but also don't do this naively.  You might find that gcc 4 breaks 
some of your source compilations and IIRC this is the default compiler 
if you install FC4 with the "install everything" option.  I haven't 
tried compiling R with gcc 4; this is just a general comment.



Re: [R] ESS and Emacs

2005-12-29 Thread paul sorenson
I tried it with XEmacs on Win XP and I had to install ESS separately.

Mark Leeds wrote:
 I have been using the document written by John Fox titled
 "An Introduction to ESS + XEmacs for Windows
 Users of R".
  
 It's a very nice document and
 I went through it carefully but I got
 an error when I finished it and launched XEmacs.
  
 The error is cannot open load file : ess-site.
  
 So, I did more investigation
 and it seems like there is a folder
  
 Program Files/XEmacs/xemacs-packages/etc/ess
  
 that should have been created when
 I downloaded XEmacs but it doesn't exist. 
 There is a folder called efs ( rather than ess ) but
 I don't think that's it.
  
 I tried installing XEmacs again but the
 same thing happened.
  
 Does anyone know anything about this ?
 I am using Windows NT.
 Thanks.
  
  Mark
 
 
 **
 This email and any files transmitted with it are confidentia...{{dropped}}
 


Re: [R] date handling

2005-12-13 Thread paul sorenson
d = as.POSIXlt(c("2005-07-01", "2005-07-02", "2005-07-03", "2005-07-04", 
"2005-07-05"))
d$mon and d$year will get you part way there.

That is assuming your dates are formatted yyyy-mm-dd.  strptime() might 
also be useful.
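
To spell that out a little, here is a small sketch. It assumes "week number" means the week of the year with Sunday as the first day (format code %U); other conventions such as ISO weeks give different numbers. The tz argument and the variable names are choices made for the example.

```r
d <- as.POSIXlt(c("2005-07-01", "2005-07-02", "2005-07-03",
                  "2005-07-04", "2005-07-05"), tz = "UTC")

month <- d$mon + 1                   # $mon counts from 0 (January)
year  <- d$year + 1900               # $year counts from 1900
week  <- as.integer(format(d, "%U")) # week of year, weeks start on Sunday

data.frame(date = format(d), week, month, year)
```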

Richard van Wingerden wrote:
 Hi,
 
 Given a frame with calendar dates:
 
 2005-07-01, 2005-07-02,2005-07-03,2005-07-04,2005-07-05,etc.
 
 I want to extract the following from these dates:
 
 week number
 month number
 year number
 
 Any ideas how to accomplish this?
 
 Many thanks.
 
 Regards,
 Richard
 


Re: [R] how to draw continent boundary

2005-12-06 Thread paul sorenson
Have you looked in the maps package?

Yogesh K. Tiwari wrote:
 Hi,
 
 If I am plotting a world map like
 
 plot (lon,lat)
 
 then how to draw a continent boundary in that
 plot.
 
 What is the command...
 
 
 Many thanks
 
 Regards,
 Yogesh



Re: [R] R newbie...

2005-12-06 Thread paul sorenson
Return something that can hold more than one value, eg:

calculate <- function(x, y) {
    list(a=x+y, b=x-y)
}
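
A short sketch of how a second function might then pick the two values apart. The function combine() and its arithmetic are invented purely for illustration.

```r
calculate <- function(x, y) {
  list(a = x + y, b = x - y)
}

# Hypothetical second function that uses both results.
combine <- function(x, y) {
  res <- calculate(x, y)
  res$a * res$b
}

combine(5, 3)   # (5 + 3) * (5 - 3)
```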

David Hajage wrote:
 Thank you for your answer.
 
 And what if my first function gives 2 results :
 
 calculate <- function(x,y)
 {
 a <- x + y
 b <- x - y
 }
 
 How can I use both a and b in a new function ?



Re: [R] Plot

2005-12-06 Thread paul sorenson
Apple Ho wrote:
 Hello,
 
 I have a problem about using the command plot. Suppose I have some
 points, and one of them is (0,0), how can I show the figure with this
 point which is at the corner?

How close to the corner do you want it?

  plot(0, 0, xlim=c(0, 1), ylim=c(0,1))
you could also add:
  grid()
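
If the point should sit exactly in the corner, a sketch along these lines may do it. It assumes the usual base graphics defaults, where each axis is padded by about 4% unless xaxs/yaxs are set to "i".

```r
# xaxs/yaxs = "i" drop the default padding, so the plot region runs exactly
# from xlim[1] to xlim[2] and ylim[1] to ylim[2]; xpd = NA lets the symbol
# overlap the margin instead of being clipped at the border.
plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), xaxs = "i", yaxs = "i", xpd = NA)
grid()

usr <- par("usr")   # user coordinates of the plot region
usr
```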



Re: [R] plotting question

2005-12-05 Thread paul sorenson
Ed,

I am no expert at R but if you follow the tips from the previous poster 
(eg type ?lines) and go to the very bottom of the help page there is an 
example that plots a line over some points.
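
Along the same lines, a rough sketch of two series in one plot with different colours and symbols. The series here are simulated; t1 and t2 are made-up names standing in for your own data.

```r
set.seed(1)
t1 <- ts(cumsum(rnorm(24)), start = c(2004, 1), frequency = 12)
t2 <- ts(cumsum(rnorm(24)), start = c(2004, 1), frequency = 12)

# First series sets up the axes; ylim covers both so nothing is cut off.
plot(t1, type = "o", pch = 1, col = "blue",
     ylim = range(t1, t2), ylab = "value")

# Second series is added to the existing plot.
lines(t2, type = "o", pch = 2, col = "red")

legend("topleft", legend = c("series 1", "series 2"),
       col = c("blue", "red"), pch = c(1, 2), lty = 1)
```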

cheers

Ed Wang wrote:
 Yes, I have gone through the manual.  My best reference for plotting has
 been examples either through the list archive or searches on the internet.
 Nothing in the introductory manual could get me to what I have been
 able to do so far, but that is limited to plotting components of the time
 series returned from an STL call.
 
 This is why I am asking for example or references to examples from anyone
 who would be willing to share them.  For some of us not very familiar with
 S+, etc. the documentation with R is not enough.  While I can plot two
 time series one above another using the mfrow() function I'd prefer to
 put two time series in one plot in different colours and using two different
 symbols, which I cannot do using calls to plot().
 
 Thanks.
 
A man is not old until regrets take the place of dreams.
  Actor John Barrymore
 
 
 
 
 From: Berton Gunter [EMAIL PROTECTED]
 To: 'Ed Wang' [EMAIL PROTECTED], r-help@stat.math.ethz.ch
 Subject: RE: [R] plotting question
 Date: Mon, 5 Dec 2005 14:12:47 -0800
 ?lines ?points
 
 An Introduction to R (and numerous other books on R) explains this. Have you
 read it?
 
 
 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA
 
 The business of the statistician is to catalyze the scientific learning
 process.  - George E. P. Box
 


Re: [R] What made us so popular Nov 16-20?

2005-12-03 Thread paul sorenson
I had too much time on my hands on a Sunday morning:

For the hell of it I grepped my r-help date headers for the last month 
or two.  I was looking for some corresponding increase in r-help traffic.

The data is:
http://brewiki.org/tmp/r-help_traffic.txt

The traffic looks something like:
http://brewiki.org/tmp/r-help_traffic.png

I didn't take into account time zones for the plot (but I did plot the 
histogram of them).

cheers


Duncan Murdoch wrote:
 Our main US mirror is cran.mirrors.pair.com, AKA cran.us.r-project.org. 
   Pair.com keeps statistics on traffic on the mirror sites, and I got 
 all excited when I looked at this page:
 
 http://mirrors.pair.com/pair/stats.html
 
 and saw that CRAN was 5th most popular over the last month, getting more 
 visitors than Apache, MySQL, OpenOffice, etc.  Then I looked at this graph:
 
 http://mirrors.pair.com/freebsd/stats/cran-ip.png
 
 and saw that this is likely due to a huge spike in traffic between Nov 
 16 and 20.  Our visitors (not sure of the exact definition) went from 
 the usual  10K/day up to 50-150K/day during that week.
 
 Did we get mentioned somewhere (e.g. Slashdot), or was someone just 
 experimenting with some automated downloading?
 
 Duncan Murdoch
 


Re: [R] how to subset rows using regular expression patterns

2005-12-03 Thread paul sorenson
Something like

A[grep('^ab', as.vector(A$M)),]

might work
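
A self-contained sketch of the same idea. The data frame is made up, with M stored as a factor to show why the as.vector() is there.

```r
A <- data.frame(M = c("ab", "abc", "bcd", "ac", "abcd", "fg"),
                n = 1:6, stringsAsFactors = TRUE)

# grep() is vectorised over the whole column, so no loop is needed;
# "^ab" anchors the pattern at the start of each string.
hits <- A[grep("^ab", as.vector(A$M)), ]
hits
```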

zhihua li wrote:
 hi netters,
 
 i have a dataframe A with several columns(variables). the elements of 
 column M are character strings. so 
 A$M=c(ab,abc,bcd,ac,abcd,fg,.fl).
 
 i wanna extract all the rows where A$M match some regular expression 
 pattern.
 for a simple example, let the pattern be just ab, i wanna subset the 
 rows where A$M=ab or abc or abcd or abXX.
 
 i know i can write a loop,using some regular expression pattern 
 functions like grep row by row. but when A's size is pretty large, it's 
 inefficient. could anyone give me a hint about a faster code?
 
 thanks a lot!


[R] identifying strong clustering

2005-12-02 Thread paul sorenson
I have 130 or so rows of questionnaire data.  It is 67 columns wide 
mostly of categorical nature (and often yes/no).

I have been using daisy, pam and plot to view cluster plots with subsets 
of columns.  With different subsets, there is great variation in the 
degree of overlap of the clusterplot ellipses, which I assume indicates 
some measure of differentiation between clusters.

Can anyone point me at forms of analysis which can help identify column 
subsets which yield high clustering (separation of ellipses?)?

Sorry for the verbose description, there is probably a single word or 
phrase which describes what I would like to do.  I just don't know it.

Ultimately I am trying to identify distinct groups within the 
respondents and find what areas (columns) that define the groups.
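
To make the question concrete, one candidate measure (an assumption on my part, not an established answer) is the average silhouette width that pam() reports, compared across column subsets. The sketch below uses simulated yes/no answers; the data frame q, the helper avg_sil() and the subsets are all invented for illustration.

```r
library(cluster)

set.seed(42)
n <- 130
grp <- rep(c("a", "b"), length.out = n)   # hidden respondent groups

# Helper: yes/no factor where each respondent answers "yes" with probability p.
yn <- function(p) factor(ifelse(runif(n) < p, "yes", "no"),
                         levels = c("no", "yes"))

q <- data.frame(
  q1 = yn(ifelse(grp == "a", 0.90, 0.10)),  # related to the hidden groups
  q2 = yn(ifelse(grp == "a", 0.85, 0.15)),
  q3 = yn(0.5),                             # unrelated noise
  q4 = yn(0.5)
)

# Average silhouette width of a k-medoid clustering on a column subset.
avg_sil <- function(cols, k = 2) {
  d <- daisy(q[, cols, drop = FALSE], metric = "gower")
  pam(d, k = k, diss = TRUE)$silinfo$avg.width
}

subsets <- list(informative = c("q1", "q2"),
                noise       = c("q3", "q4"),
                all         = names(q))
res <- sapply(subsets, avg_sil)
res
```

Larger values would suggest better separated clusters for that subset, though with this many columns an exhaustive search over subsets is not practical.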

cheers



Re: [R] what is best for scripting?

2005-12-02 Thread paul sorenson
As I get more familiar with R I tend to find the need for massaging data 
with other scripts decreasing.


I still use python as a front end to some R tasks and it is a great 
language to have in your personal arsenal (as is R).


The kind of decision making process for me goes something like:

	o If R can read the data directly then use R.  I am pleasantly 
surprised at R in this regard.


	o If Python (or other scripting tool) already has bindings for dataset 
that you want, consider using Python.  Eg I extract software development 
metrics from Perforce with python for plotting in R.


	o If the data set has a complex grammar, choose a tool with support for 
grammar compilers (I use and recommend pyparsing but there are dozens of 
choices).  Actually I didn't check if there is a grammar compiler for R. 
 Someone mentioned BeautifulSoup not long ago for extracting stuff from 
broken HTML.  I have used this also with some success for extracting 
deeply nested tables in poorly written HTML.


I usually dump data from Python to R in CSV format.  I call R scripts 
from Python and about the only trick I use here is to read in an R 
script template and perform string variable expansion (interpolation in 
Perl) before sending it to an R process.  See attached for example.  For 
various reasons I have not used the R-Python bindings.


cheers


Molins, Jordi wrote:

I am using R in Windows. I see that I will have to use batch processes with
R. I will have to read and write text files, and run some R code; probably
some external code too. I have never done scripting. Is there any document
that explains simple steps with examples? I also have heard that Python is a
good scripting language. Is it worth the effort? (I do not have too much
free time, so if I could do without, much better ...).

Has anybody strong opinions on that? Past experiences?

Thank you!

Jordi




The information contained herein is confidential and is inte...{{dropped}}




#!/usr/bin/env python
# Executes an R script
# $Revision: 1.1 $
# paul sorenson oct 2003

import getopt
import logging
import subprocess
import sys

# Command used to start R; the script is fed to it on stdin.
CMD = ['/Program Files/R/rw2001/bin/rterm.exe', '--vanilla', '--slave']

log = logging.getLogger('TT')


def RExec(RFile, params=None):
    '''
    Execute an R script - optionally interpolating dict style params.

    Warning, R also accepts %0d style interpolation, if you want to pass
    this to R then your original script must add an additional %.
    Eg:
    png(filename="c:/sosman/testtrack/web/tt%%02d.png", width=640)

    This will get passed to R as:
    png(filename="c:/sosman/testtrack/web/tt%02d.png", width=640)
    if you pass in non null params to the function.

    If params is not set then no interpolation will take place so raw
    R scripts will pass through unchanged.
    '''
    rsrc = RFile.read()
    if params is not None:
        # Only interpolate when params were supplied.
        rsrc = rsrc % params
    log.debug(rsrc)
    # Send the script to R and echo whatever R wrote to stderr.
    proc = subprocess.run(CMD, input=rsrc, text=True, capture_output=True)
    print(proc.stderr)


def usage():
    print('usage: python R.py --r-script=R_script [--params=params]')
    sys.exit(2)


if __name__ == '__main__':
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'p:r:',
                                   ['params=', 'r-script='])
    except getopt.GetoptError:
        usage()
    scriptname = None
    params = None
    for opt, arg in opts:
        if opt in ('-p', '--params'):
            # params is evaluated as a Python expression, eg a dict literal
            params = eval(arg)
        elif opt in ('-r', '--r-script'):
            scriptname = arg
    if not scriptname:
        usage()
    print('# script: %s  params: %s' % (scriptname, params))
    with open(scriptname, 'r') as r:
        RExec(r, params)


Re: [R] calculating IRR (accounting) in R

2005-12-01 Thread paul sorenson

paul sorenson wrote:
I can't seem to track down R functions to calculate Internal Rate of 
Return and NPV?


Thanks for the answers people.  Comparing the answers with what Excel 
pops out shows just how assumptions can vary.  In particular whether the 
first payment is at T0 or T1.
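
A small sketch of how the two conventions relate, using made-up helper names npv_t0 and npv_t1. As far as I know Excel's NPV() treats the first value as arriving at the end of period 1, which corresponds to npv_t1 here. The two NPVs differ by a factor of (1 + rate), so the IRR, where NPV is zero, comes out the same either way.

```r
# First cash flow at time 0 (undiscounted).
npv_t0 <- function(rate, cf) sum(cf / (1 + rate)^(seq_along(cf) - 1))

# First cash flow at time 1 (discounted one period).
npv_t1 <- function(rate, cf) sum(cf / (1 + rate)^seq_along(cf))

cf <- c(-8000, 100, 100, 100, 2000, 3000, 4000, 5000)
rate <- 0.1

npv_t0(rate, cf)
npv_t1(rate, cf)

# The T1 value is the T0 value discounted by one more period.
same <- isTRUE(all.equal(npv_t1(rate, cf), npv_t0(rate, cf) / (1 + rate)))

# Root of NPV over discount rates in [0, 1]; NPV must change sign there.
irr_t0 <- uniroot(npv_t0, c(0, 1), cf = cf, tol = 1e-10)$root
irr_t1 <- uniroot(npv_t1, c(0, 1), cf = cf, tol = 1e-10)$root
c(irr_t0, irr_t1)
```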


cheers

# calculates NPV and IRR

# First cash flow is discounted one period (time 1), since tt starts at 1
npv1 <- function(i, cf, tt = seq(along = cf)) sum(cf / (1+i)^tt)

# http://www.mathepi.com/comp/discounting.html
# Compute net present value of payments received at the
# beginning of each time interval given the discount.rate.
# Payment 0 happens at time 0 (not time 1).
npv <- function(discount.rate, payments, nn=length(payments)) {
    sum(payments * (1/(1+discount.rate))^(0:(nn-1)))
}

# Payment 0 happens at time 1 (not time 0).
npv1 <- function(discount.rate, payments, nn=length(payments)) {
    sum(payments * (1/(1+discount.rate))^(1:nn))
}

# npv function with optional zero shift
npv.s = function(discount.rate, payments, npv0=0, fn) {
    fn(discount.rate, payments) - npv0
}

# Solve IRR
# npv must cross zero in the search range
irr = function(payments, npv0=0, fn=npv) {
    uniroot(npv.s, c(0.0, 1), payments=payments, npv0=npv0, fn=fn)$root
}

# Plot npv for range of discount rates
# Can be used as diagnostic and to find range where npv crosses
# zero (requirement for uniroot).
p.npv = function(t, r=c(0,1), fn=npv) {
    s = seq(r[1], r[2], (r[2]-r[1])/30)
    plot(s, sapply(s, fn, t, simplify=TRUE), type='l',
        xlab="discount rate", ylab="NPV", main="NPV vs discount rate"
    )
    grid()
    #abline(0, 0, col='blue')
}

# Annotate with IRR (uses the same npv function as the plot)
p.irr = function(t, fn=npv) {
    r = irr(t, fn=fn)
    points(r, 0)
    text(r, 0, sprintf('IRR = %2.2f%%', 100*r), pos=4, col='blue')
}

t = c(-8000, 100, 100, 100, 2000, 3000, 4000, 5000)
r = 0.1
npv(r, t)
#p.npv(t)
#p.npv(t, fn=npv1)
irr(t)

t2 = c(-8000, 9000, 7000, 6000, 4000)


[R] calculating IRR (accounting) in R

2005-11-30 Thread paul sorenson
I am trying to replace a spreadsheet model of a project justification 
with an R script.

I can't seem to track down R functions to calculate Internal Rate of 
Return and NPV?  Am I missing something?  NPV doesn't seem so difficult 
to calculate (at least for a regular series) but I am struggling to 
identify how to solve for IRR in R.

It would be sufficient if it worked for a regular series but really 
useful if there was something that worked with irregular time series.
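
To make the irregular case concrete, here is a sketch of what I have in mind. It assumes each cash flow is discounted by its elapsed time in years (days / 365) from the first payment. The function names, dates and amounts are all made up.

```r
# NPV with each cash flow discounted by its elapsed time in years.
npv_irregular <- function(rate, cf, when) {
  yrs <- as.numeric(when - when[1]) / 365   # Date differences are in days
  sum(cf / (1 + rate)^yrs)
}

# IRR: discount rate where NPV is zero; NPV must change sign on interval.
irr_irregular <- function(cf, when, interval = c(0, 1)) {
  uniroot(npv_irregular, interval, cf = cf, when = when, tol = 1e-10)$root
}

when <- as.Date(c("2005-01-01", "2005-07-01", "2006-03-15", "2007-01-01"))
cf   <- c(-1000, 300, 400, 500)

r <- irr_irregular(cf, when)
r
npv_irregular(r, cf, when)   # should be close to zero
```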

cheers



Re: [R] Fw: Re: Is there anything like a write.fwf() or possibility to print adata.frame without rownames?

2005-11-22 Thread paul sorenson
If you are desperate, save the file as tab delimited and then use vi or 
some stream editor to convert the tabs to spaces.  If you set the tab 
stop wide enough you should be able to guarantee uniform columns.
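
Staying inside R, something like the following sketch might also work: pad every column to a common width with format() and then write with a single space as separator. The column names and values are made up, loosely following the rows quoted below.

```r
tab <- data.frame(id = c(26, 26, 27, 27), flag = 1, n = c(42, 42, 41, 41),
                  species = c("lipa", "lipa", "smreka", "smreka"),
                  name = "Monika", stringsAsFactors = FALSE)

# format() pads each column to the width of its widest entry.
padded <- as.data.frame(lapply(tab, format), stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")
write.table(padded, out, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

lines <- readLines(out)
writeLines(lines)
```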

Gorjanc Gregor wrote:
Petr Pikal wrote:

Hi

did you try something like

write.table(tab, "file.txt", sep="\t", row.names=F)

which writes to tab separated file?


Petr thanks, but I do not want a tab delimited file. I need spaces
between columns.

write.table(tab, "file.txt", sep=" ", row.names=F)
Can it do what you want?
 
 
 Ronggui thanks,
 
 but this does not work either. For example I get something like
 this below
 
 26 1 42 DA DA lipa Monika
 26 1 42 DA DA lipa Monika
 27 1 41 DA DA smreka Monika
 27 1 41 DA DA smreka Monika
 
 and you can see, that there is a problem, when all values
 in a column do not have the same length. I need to get
 
 26 1 42 DA DA lipa   Monika
 26 1 42 DA DA lipa   Monika
 27 1 41 DA DA smreka Monika
 27 1 41 DA DA smreka Monika
 
 i.e. columns should be properly aligned.
 
 Lep pozdrav / With regards,
 Gregor Gorjanc
 
 --
 University of Ljubljana
 Biotechnical FacultyURI: http://www.bfro.uni-lj.si/MR/ggorjan
 Zootechnical Department mail: gregor.gorjanc at bfro.uni-lj.si
 Groblje 3   tel: +386 (0)1 72 17 861
 SI-1230 Domzale fax: +386 (0)1 72 17 888
 Slovenia, Europe
 --
 One must learn by doing the thing; for though you think you know it,
  you have no certainty until you try. Sophocles ~ 450 B.C.
 


Re: [R] PNG-import into R

2005-11-21 Thread paul sorenson
Rajarshi Guha wrote:
 Right now it segfaults, so its not really useful yet.

Can I use that as a tag line?



Re: [R] correlating irregular time series

2005-11-14 Thread paul sorenson
I don't have the texts you mention but I get the general idea.  The 
diagram I posted shows only a small fraction of the events I have.

Thank you

Christophe Pouzat wrote:
 Hi Paul,
 
 Here is how an amateur statistician deals with this problem when 
 analyzing spike trains from simultaneously recorded neurons.
 
 Start by estimating the hazard function h(t) of your several point 
 processes (if you have a copy of MASS, check out chapter 13. If you 
 have a copy of Jim Lindsey, The Statistical Analysis of Stochastic 
 Processes in Time, check out chap 3 & 4; the hazard function is also 
 called the conditional intensity or the stochastic intensity).
 
 In practice if you have a renewal process, meaning that the successive 
 intervals between your events times are independent, you can first 
 estimate the Inter Event Interval pdf, f(t), and its cumulative 
 distribution function F(t). h(t) is then given by:
 
 h(t) = f(t) / (1-F(t)),
 
 where the quantity S(t) = 1-F(t) is often called the survivor function.
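
 A rough sketch of that estimate in R, assuming a renewal process and using a kernel density for f(t) and the empirical CDF for F(t). The intervals here are simulated from an exponential with rate 2, so the true hazard is constant at 2; all variable names are made up.

```r
set.seed(1)
iei <- rexp(500, rate = 2)        # simulated inter-event intervals

f_hat <- density(iei, from = 0)   # kernel estimate of the pdf f(t)
F_hat <- ecdf(iei)                # empirical CDF F(t)

# Evaluate h(t) = f(t) / (1 - F(t)) on a grid well inside the data range,
# where the survivor function is still clearly above zero.
t_grid <- seq(0.05, 1, by = 0.05)
f_at   <- approx(f_hat$x, f_hat$y, xout = t_grid)$y
h_hat  <- f_at / (1 - F_hat(t_grid))

round(h_hat, 2)
```

 The estimate gets noisy in the tail, where few intervals remain and 1 - F(t) is small.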
 
 Fine, now if your processes are well approximated by renewal processes, 
 you can look for the distribution of time to next event (TTN) and 
 time to former event (TTF). By that I mean that for each of the black 
 events of your figure, you must get the interval separating it from the 
 last red event preceding it (the time to former) and the next red event 
 following it (the time to next). Under the null hypothesis of no 
 correlation these two random variables have the same pdf given by:
 
 TTN(i) = S(i) / IEI,
 
 where S(i) in that case is the survivor function of the red (test) 
 process and IEI is its inter event interval expected value.
 Using this approach I typically estimate the TTN and TTF pdfs with 
 histograms and compare these histograms to their expected values under 
 the null hypothesis. A warning though, I have most of the time much more 
 events than you seem to have on your figure.
 
 Let me know if any of this makes sense.
 
 Christophe.
 
 paul sorenson wrote:
 
 I have some time stamped events that are supposed to be unrelated.

 I have plotted them and that assumption does not appear to be valid. 
 http://metrak.com/tmp/sevents.png is a plot showing three sets of 
 events over time.  For the purpose of this exercise, the Y value is 
 irrelevant.  The series are not sampled at the same time and are not 
 equispaced (just events in a log file).

 The plot is already pretty convincing but requires a human-in-the-loop 
 to zoom in on hot areas and then visually interpret the result.  I 
 want to calculate some index of the events' temporal relationship.

 I think the question I am trying to ask is something like: If event B 
 occurs, how likely is it that an event A occurred at almost the same 
 time?.

 Can anyone suggest an established approach that could provide some 
 further insight into this relationship?  I can think of a fairly basic 
 approach where I start out with the ecdf of the time differences but I 
 am guessing I would be reinventing some wheel.

 Any tips would be most appreciated.

 cheers


  

 




[R] correlating irregular time series

2005-11-13 Thread paul sorenson
I have some time stamped events that are supposed to be unrelated.

I have plotted them and that assumption does not appear to be valid. 
http://metrak.com/tmp/sevents.png is a plot showing three sets of events 
over time.  For the purpose of this exercise, the Y value is irrelevant. 
  The series are not sampled at the same time and are not equispaced 
(just events in a log file).

The plot is already pretty convincing but requires a human-in-the-loop 
to zoom in on hot areas and then visually interpret the result.  I 
want to calculate some index of the events' temporal relationship.

I think the question I am trying to ask is something like: If event B 
occurs, how likely is it that an event A occurred at almost the same time?.

Can anyone suggest an established approach that could provide some 
further insight into this relationship?  I can think of a fairly basic 
approach where I start out with the ecdf of the time differences but I 
am guessing I would be reinventing some wheel.
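A basic version of the ecdf idea mentioned above could be sketched as follows. This is only an illustration: the helper name nearest_gap, the made-up event times and the 2 second window are all assumptions, and times are treated as plain numbers (for POSIXct values, as.numeric() would be applied first).

```r
# For each event in b, the absolute time gap to the nearest event in a.
nearest_gap <- function(b, a) {
  a <- sort(a)
  n <- length(a)
  idx <- findInterval(b, a)   # index of the last a <= b (0 if there is none)
  before <- ifelse(idx >= 1, b - a[pmax(idx, 1)], Inf)
  after  <- ifelse(idx < n, a[pmin(idx + 1, n)] - b, Inf)
  pmin(before, after)
}

a_times <- c(0, 10, 20)       # made-up event times, in seconds
b_times <- c(1, 14, 19, 30)

gap <- nearest_gap(b_times, a_times)
window <- 2                   # hypothetical meaning of "almost the same time"
mean(gap <= window)           # share of B events with an A event inside the window
gap_cdf <- ecdf(gap)          # the same share for any window, e.g. gap_cdf(window)
```

To judge whether such a share is unusually high, one option is to recompute it after shifting or shuffling one series and compare the two numbers.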

Any tips would be most appreciated.

cheers



Re: [R] Command line and R

2005-11-10 Thread paul sorenson
Angelo Secchi wrote:
 
 On Wed, 09 Nov 2005 12:25:37 - (GMT)
 (Ted Harding) [EMAIL PROTECTED] wrote:
 
 
On 09-Nov-05 Roger Bivand wrote:

On Wed, 9 Nov 2005, Angelo Secchi wrote:

Hi,
I wrote a small R script (delta.R) using commandArgs(). The script
works from the shell in usual way

R --no-save arg1 < delta2.R

Suppose arg1 is the output of another shell command (e.g. gawk,
sed ...). Is there a way to tell R to read arg1 from the
output of the previous command? Any other workaround?

Use shell variables, possibly also Sys.getenv() within R as well as or 
instead of commandArgs().

If it's a fairly simple shell command (and even if it isn't, though
it could get tricky for complicated ones) you can use the backquote
trick (called, in well-spoken circles, command substitution):

  R --no-save `shellcmd` < delta2.R

As in all shell command lines, wherever you have a command (including
arguments etc.) between backquotes, as exemplified by `shellcmd` above,
the output of the command (as sent to stdout) replaces `shellcmd` in
the command-line. This could be a lot of stuff (depending on what
shellcmd is), or just one value, or whatever.

... and this behaviour is OS (or at least command shell specific) for 
anyone trying this on Windows and wondering why it doesn't work.
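On the R side, the Sys.getenv() and commandArgs() suggestions from this thread might look roughly like the sketch below. The helper read_setting and the variable name DELTA_ARG are made up for illustration; only Sys.getenv() and commandArgs() themselves are base R.

```r
# Return the value of an environment variable, or a default if it is unset.
read_setting <- function(name, default = NA_character_) {
  Sys.getenv(name, unset = default)
}

# Arguments given after --args on the R command line, if any.
extra_args <- commandArgs(trailingOnly = TRUE)

# DELTA_ARG is a hypothetical variable name; a Unix shell could set it with
#   DELTA_ARG=`shellcmd` R --no-save < delta2.R
arg1 <- read_setting("DELTA_ARG", default = "none")
```

Passing the value through the environment keeps the script independent of how the R command line itself is laid out.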



Re: [R] Quickest way to match two vectors besides %in%?

2005-11-08 Thread paul sorenson
Pete Cap wrote:
 Hello list,
 
 I have two data frames, X (48469,2) and Y (79771,5).
 
 X[,1] contains distinct values of Y[,2].
 I want to match values in X[,1] and Y[,2], then take
 the corresponding value in [X,2] and place it in
 Y[,4].
 
 So far I have been doing it like so:
 for(i in 1:48469) {
 y[which(x[i,1]==y[,3]),4]<-x[i,2]
 }

I'm not sure but isn't that a case where merge() can help?
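As a rough sketch of that suggestion, with small made-up data frames standing in for X and Y (the column names key, value and new are assumptions, not taken from the original data), either match() or merge() replaces the loop:

```r
# Small stand-ins for the lookup table X and the target data frame Y.
X <- data.frame(key = c("a", "b", "c"), value = c(10, 20, 30))
Y <- data.frame(id = 1:4, key = c("b", "a", "d", "b"), old = 0, new = NA)

# match() gives, for each Y$key, the row of X holding that key (NA if absent).
Y$new <- X$value[match(Y$key, X$key)]

# merge() performs the same lookup as a join; all.x = TRUE keeps unmatched rows.
joined <- merge(Y[, c("id", "key")], X, by = "key", all.x = TRUE)
```

Note that merge() may reorder the rows, whereas the match() form fills the column in place.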

cheers



[R] brewing stats

2005-10-23 Thread paul sorenson (sosman)
I guess this isn't so much of a help request as a show-and-tell from a 
non-statistician homebrewer who has been fumbling around with R.  If 
nothing else it provides yet another data set.  I hope it is not out of 
line.

Anyway, the plots I have produced are at

http://brewiki.org/BatchSparge#poll

The polling method is somewhat simple, it's just one of those multiple 
choice style polls you can create on various web forums.

The poll was prompted by the ongoing claim from fly spargers that 
their method is more efficient, but I had never seen data to support 
that.  I thought maybe it was a bit of snobbery.

Maybe they are right.  However if I conveniently ignore that annoying 
bump on the left of the batch sparge histogram then the two groups start 
to look very similar.

I was going to go out on a limb and say I learn heaps from reading the 
posts here so please don't ruin my delusion too much if my output 
violates all principles of good statistics. OTOH if you can suggest 
other cool looking graphs please feel free.  The more difficult to 
pronounce the names are, the better :-)

The data set is (efficiency is the low end of its bin):

method  efficiency  count   source
fly 95  0   bb
fly 90  0   bb
fly 85  2   bb
fly 80  8   bb
fly 75  13  bb
fly 70  8   bb
fly 65  3   bb
fly 60  0   bb
fly 55  0   bb
batch   95  0   bb
batch   90  0   bb
batch   85  4   bb
batch   80  3   bb
batch   75  15  bb
batch   70  10  bb
batch   65  6   bb
batch   60  7   bb
batch   55  1   bb

And the R code:

# Crunch some stuff with brewboard (and similar polls).
# Data is already tabulated.

x = read.csv("SpargeEff.csv")

# Shift value to centre of bin
x$efficiency = x$efficiency + 2.5
# Ignore rows with no votes (NA), zeros are ok though
y = x[which(!is.na(x$count)),]
# Repeat each row once per vote so that every row of z is one response
r = rep(row.names(y), y$count)
z = y[r,]
z$count = 1

par(mfrow=c(2,2))
barplot(table(z$method), main="number of responses")
barplot(table(z$method, z$efficiency), beside=T, legend=T,
main="Mash efficiency by method", sub="paul sorenson 2005 brewiki.org")
boxplot(z$efficiency ~ z$method, main="Mash efficiency")
z.h = hist(z$efficiency, prob=T, main="Efficiencies,\n all methods combined",
xlab="efficiency")
z.md = max(z.h$density)
lines(density(z$efficiency, bw=3.0), col='blue')
#qqnorm(x$efficiency)
t.test(efficiency ~ method, data=z)
#by(z, z$method, summary)
zs = split(z, z$method)
summary(zs$batch)
summary(zs$fly)

# fit a normal distribution
require(MASS)
z.fit = fitdistr(z$efficiency, 'normal')
q = 55:95 + 2.5
lines(q, dnorm(q, z.fit$estimate['mean'], z.fit$estimate['sd']), col='red')
legend('topleft', legend=c('density', 'fitted'), col=c('blue', 'red'),
lwd=1, inset=0.05)

# Factor out lowball values.
z.f = z[which(z$efficiency >= 65),]
by(z.f, z.f$method, summary)
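The expansion step in the script (one row per vote) can be tried without the CSV file by embedding the poll table directly. The sketch below uses the counts listed in this post and leaves out the zero-count bins; the names poll and votes are made up here.

```r
# Poll counts from the post (efficiency is the low end of its bin).
poll <- read.table(header = TRUE, text = "
method efficiency count
fly    85  2
fly    80  8
fly    75 13
fly    70  8
fly    65  3
batch  85  4
batch  80  3
batch  75 15
batch  70 10
batch  65  6
batch  60  7
batch  55  1
")

# Shift each value to the centre of its 5-point bin.
poll$efficiency <- poll$efficiency + 2.5

# Repeat every row 'count' times so that each row is a single response.
votes <- poll[rep(seq_len(nrow(poll)), poll$count), c("method", "efficiency")]

table(votes$method)
tapply(votes$efficiency, votes$method, mean)
```

Indexing with rep(seq_len(nrow(poll)), poll$count) does the same job as the row.names() version in the script and does not depend on the row names.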



Re: [R] dyn.load a f90 module.

2005-10-23 Thread paul sorenson (sosman)
Bo Peng wrote:
 Dear list,
 
 Has there been any success in loading modules written in f90? I tried
 
 % ifort -c myfile.f90
 % R CMD SHLIB myfile.o
 % R
 dyn.load('myfile.so')
 .Fortran('myfile')
 
 I used intel (free) fortran compiler under linux. All commands run
 successfully except that function myfile is not loaded. (Is there a
 function/tool to list symbols in a .so file?)

If there are symbols present then unix 'nm' should show them to you.



Re: [R] read data from pdf file

2005-10-22 Thread paul sorenson (sosman)
Marco Venanzi wrote:
 Hi, I'm trying to read data from a PDF file. Is it possible to do it
 with R? Thanks,  Marco

Ghostview has at least one method for extracting the text from a PDF 
document.  In particular Text|Extract allows you to select pages for 
extraction.  This may or may not give the same result as pdftotext 
because I think that is ghostscript based.

Your mileage may vary when extracting tables from a PDF.

cheers



Re: [R] sqlQuery and string selection

2005-10-19 Thread paul sorenson
Jérôme Lemaître wrote:
 Dear alls,
 
 Could someone tell me how to select a subset of string observations (e.g.
 females in a sex column) with sqlQuery in the RODBC library?
 
 Indeed, I'm trying to select a subset of observations on my access database
 with:
 
 female<-sqlQuery(mychannel,"SELECT Micromammiferes.sex
 FROM Micromammiferes
 WHERE (((Micromammiferes.sex)="females"));")
 
 The sql works well in access but in R, I keep getting:
 
 Error: syntax error.

R is likely to have problems with nested quote characters.

Ie:

x = "SELECT Micromammiferes.sex FROM Micromammiferes WHERE 
(((Micromammiferes.sex)="females"));"

also results in a syntax error (my mailer split the line).
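A double-quoted R string ends at the next unescaped double quote, so a double-quoted literal inside the query cuts the string short. Two common ways around this are sketched below. Whether the database accepts single-quoted literals is an assumption here; the simplified WHERE clause is also mine.

```r
# Single quotes inside the double-quoted R string need no escaping.
query <- "SELECT Micromammiferes.sex FROM Micromammiferes WHERE Micromammiferes.sex = 'females'"

# Alternatively, escape the inner double quotes with a backslash.
query_escaped <- "SELECT Micromammiferes.sex FROM Micromammiferes WHERE Micromammiferes.sex = \"females\""

# With an open RODBC channel (mychannel in the original post) the call would be:
# female <- sqlQuery(mychannel, query)
```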



Re: [R] sqlQuery and string selection

2005-10-19 Thread paul sorenson (sosman)
(sorry if a duplicate pops through ...)
Jérôme Lemaître wrote:
 Dear alls,
 
 Could someone tell me how to select a subset of string observations (e.g.
 females in a sex column) with sqlQuery in the RODBC library?
 
 Indeed, I'm trying to select a subset of observations on my access database
 with:
 
 female<-sqlQuery(mychannel,"SELECT Micromammiferes.sex
 FROM Micromammiferes
 WHERE (((Micromammiferes.sex)="females"));")
 
 The sql works well in access but in R, I keep getting:
 
 Error: syntax error.

Most computer software has problems with nested quote characters.

Ie:

  x = "SELECT Micromammiferes.sex FROM Micromammiferes WHERE
(((Micromammiferes.sex)="females"));"

also results in a syntax error (my mailer split the line).



[R] graphics - current filename

2005-02-13 Thread Paul Sorenson
I would like to query R for the current (or last used) filename for a graphics 
device.

Eg after png(filename="plot%02d.png") I would like something like the output of 
dev.cur() but with the %02d expanded to the current name.

Can anyone point me at where I can find this please?
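This sketch does not query the device. It assumes the script keeps its own page counter and that the device numbers its pages from 1, and rebuilds the name with sprintf(). The helper name plot_file is made up.

```r
# Build the file name that a "%02d"-style pattern yields for a given page.
plot_file <- function(pattern, page) {
  sprintf(pattern, page)
}

page <- 3                          # page counter maintained by the script
plot_file("plot%02d.png", page)
```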



[R] aggregation with extra columns

2005-02-01 Thread Paul Sorenson
R People,

Thanks for your help on my recent questions, Excel is never going to disappear 
from my office but with graphics from lattice package and some other stuff in R 
I have been able to add some value.

I have a problem I haven't been able to figure out with aggregation, I 
mentioned it earlier but didn't state it very clearly.

Basically I have many defect events and I want to grab the most recent event 
for each defect number:

eg:
date       defectnum  state
2004-12-1  10         create
2004-12-2  11         create
2004-12-4  10         close
2004-12-7  11         fix

to:
date       defectnum  state
2004-12-4  10         close
2004-12-7  11         fix

Now with aggregate I can get the rows I want but not with the state attached:

aggregate(list(date=ev$date), by=list(defectnum=ev$defectnum), max)

Gives me the rows I want but I have lost the state.  I have tried doing a 
merge afterwards but now I realise why they warned me to avoid using dates as 
database keys.

What would be handy is somehow getting back the index vector from the aggregate 
function.  I realize in the general case this wouldn't work for aggregate but 
in the case of min/max the result is a specific record.

Someone earlier mentioned some tricks with sort but I haven't been able to make 
that get to where I want.
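One way the sort trick could work is sketched below on the example rows from this post. It uses Date values for brevity (POSIXct sorts the same way), and the name latest is made up.

```r
ev <- data.frame(
  date = as.Date(c("2004-12-01", "2004-12-02", "2004-12-04", "2004-12-07")),
  defectnum = c(10, 11, 10, 11),
  state = c("create", "create", "close", "fix")
)

# Sort so that the latest event of each defect comes last within its group,
# then keep only the last row seen for every defectnum.
ev <- ev[order(ev$defectnum, ev$date), ]
latest <- ev[!duplicated(ev$defectnum, fromLast = TRUE), ]
```

Because whole rows are selected, the state column comes along without any merge on dates.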



[R] RE: aggregating dates

2005-01-31 Thread Paul Sorenson
The solution I came up with myself was simply to coerce the integer back to 
POSIXct:

class(ev$date) = "POSIXct"

Can't say it is the right way to do it but it seems to work.
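An alternative to setting the class by hand is to convert the number explicitly with as.POSIXct() and an origin. A small sketch with a made-up value, assuming the numbers are seconds since 1970-01-01 UTC:

```r
secs <- 1107148595   # seconds since 1970-01-01 00:00:00 UTC (made-up value)

# Convert the raw number back to a date-time, stating origin and time zone.
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
```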

A second related problem I haven't been able to solve as yet is how to include 
incidental columns (those not in 'x' or 'by') in an aggregate.

names(ev): date defectnum state
 
aggregate(ev$date, by=list(ev$defectnum), max)

This returns only the date and defectnum, I also need the state.

I tried writing my own aggregator function:
maxevent = function(events) {
events[which.max(events$date),]
}

aggregate(ev, by=list(ev$defectnum), maxevent)

But I get:

Error in [.default(events, which.max(events$date), ) : 
  incorrect number of dimensions

I am trying to retrieve only the rows of ev with the latest date for a given 
defectnum.
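The error is consistent with aggregate() calling the function on one column at a time, so events is a plain vector there rather than a data frame. A sketch that hands whole sub-frames to a maxevent-style function uses split() and lapply() instead. The data below is made up to match the column names in the post, and as.numeric() is added only as a precaution.

```r
# Return the row with the latest date from a data frame of events.
maxevent <- function(events) {
  events[which.max(as.numeric(events$date)), ]
}

ev <- data.frame(
  date = as.POSIXct(c("2004-12-01", "2004-12-02", "2004-12-04", "2004-12-07"),
                    tz = "UTC"),
  defectnum = c(10, 11, 10, 11),
  state = c("create", "create", "close", "fix")
)

# Split into one data frame per defect, pick the latest row of each, and
# stack the results back into a single data frame.
latest <- do.call(rbind, lapply(split(ev, ev$defectnum), maxevent))
```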

cheers

 Message: 29
 Date: Mon, 31 Jan 2005 16:16:35 +1100
 From: Paul Sorenson [EMAIL PROTECTED]
 Subject: [R] aggregating dates
 To: r-help@stat.math.ethz.ch
 Message-ID: [EMAIL PROTECTED]
 Content-Type: text/plain; charset=iso-8859-1
 
 I have a frame which contains 3 columns:
 
 date defectnum state
 
 And I want to get the most recent state change for a given 
 defect number.  date is POSIXct.
 
 I have tried:
   aggregate(ev$date, by=list(ev$defectnum), max)
 
 Which appears to be working except that the dates seem to 
 come back as integers (presumably the internal representation 
 of POSIXct).
 
 When I execute max(ev$date) the result remains POSIXct.
 
 I have been dredging through the help among DateTimeClasses 
 and haven't found a function that converts these integers to 
 some kind of date class.  Or a method for using aggregate 
 which doesn't perform the conversion in the first place.
 
 Any clues?



[R] aggregating dates

2005-01-30 Thread Paul Sorenson
I have a frame which contains 3 columns:

date defectnum state

And I want to get the most recent state change for a given defect number.  date 
is POSIXct.

I have tried:
aggregate(ev$date, by=list(ev$defectnum), max)

Which appears to be working except that the dates seem to come back as integers 
(presumably the internal representation of POSIXct).

When I execute max(ev$date) the result remains POSIXct.

I have been dredging through the help among DateTimeClasses and haven't found a 
function that converts these integers to some kind of date class.  Or a method 
for using aggregate which doesn't perform the conversion in the first place.

Any clues?



[R] lookups and joins

2005-01-24 Thread Paul Sorenson
I have some data coming from SQL sources that I wish to relate in various ways. 
 For reasons only known to our IT people, this can't be done in SQL at present.

I am looking for an R'ish technique for looking up new columns on a data frame. 
 As a simple, hardwired example I have tried the following:

# This gives me two columns, one the lookup value and the second one
# the result column, ie my lookup table.
stcl = read.csv("stockclass.csv")
stockclass = as.vector(stcl$stock_class)
# This gives me what appears to be a dictionary or map
names(stockclass) = as.vector(stcl$stock_group)

getstockclass = function(stock_group) {
try(stockclass[[stock_group]], TRUE)
}
csg$stk_class=factor(sapply(csg$stock_group, getstockclass))

I need the try since if there is a missing value I get an exception.

I also tried something along the lines of (from memory):
getstockclass = function(stock_group) {
stcl[which(stcl$stock_group == stock_group),]$stock_class
}

These work but I just wanted to check if there was an inbuilt way to do this 
kind of thing in R?  I searched on join without much luck.

Really what I would like is a generic function that:
- Takes 2 data frames,
- Some kind of specification on which column(s) to join
- Outputs the joined frames, or perhaps a vector which is an index 
vector that I can use on the second data frame.

I don't really want to reinvent SQL and my data sets are not huge.
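Base R's merge() and match() cover both forms of the wish list above. A sketch with made-up rows (the column names follow the post; the values do not come from it):

```r
stcl <- data.frame(stock_group = c("A1", "B2", "C3"),
                   stock_class = c("raw", "finished", "spares"))
csg <- data.frame(stock_group = c("B2", "A1", "Z9", "B2"),
                  amount = c(5, 7, 1, 3))

# Left join: every row of csg is kept; unknown groups get NA as their class.
joined <- merge(csg, stcl, by = "stock_group", all.x = TRUE)

# Index-vector form: the row of stcl that matches each row of csg.
idx <- match(csg$stock_group, stcl$stock_group)
csg$stk_class <- factor(stcl$stock_class[idx])
```

A missing lookup value simply yields NA in both forms, so no try() is needed.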

cheers



RE: [R] easing out of Excel

2005-01-20 Thread Paul Sorenson
Thanks for the responses to this question, I fully realise it is a rather open 
question and the open pointers are the kind of thing I am looking for.

I will look into the lattice package and layout.

Regarding the HTML output, the current tool chain assets that I have have 
been refactored over time and are almost totally driven by config files so they 
suit my purposes very well.  I will look into other possibilities at a later 
date.

For those looking for a more rigorous specification of the problem, you are 
well justified in this.  I was deliberately fuzzy since managers just want 
stuff and I thought casting a wide net would pay off.  The problem is to 
summarise information which is nothing more than sales data.  The kinds of 
columns I am dealing with look like:

date, customer, invoice_no, product, amount, sales_region, etc etc.

Managers want to know things like:
- which products are doing well
- which regions are doing well
- who are good customers
- etc

To me these are simple aggregates and sorts, with visual presentations to match.

I figure that with a bit of effort, R can extract considerably more useful 
information from the data.

To be honest I am just evolving it as I go, using an existing spreadsheet as a 
basis.  I try something and if it is useful then great, if not, put it down to 
learning.

cheers



[R] easing out of Excel

2005-01-19 Thread Paul Sorenson
I know enough about R to be dangerous and our marketing people have asked me to 
automate some reporting.  Data comes from an SQL source and graphs and 
various summaries are currently created manually in Excel.  The raw information 
is invoicing records and the reporting is basically summaries by customer, 
region, product line etc.

With functions such as aggregate(), hist() and pareto() (which someone on this 
list kindly pointed me at) I can produce something roughly equivalent to the 
current reports.

My question is, are there any neat R lock out features people here like to 
use on this kind of info, particularly when the output is very visual (report 
is intended for marketing people).

Another way of looking at this is, What kind of hidden information can I 
extract with R that the Excel solution hasn't touched?

For example, even the pareto plot mentioned earlier is something the Excel guys 
haven't thought of or can't easily produce.

regards

BTW the tool chain I am using goes something like:
Production (run daily):
DB -> SQL/python -> CSV -> R/python -> images -> network
Presentation:
network -> CGI/python -> browser



Re: [R] Can R be useful to me?

2004-04-01 Thread Paul Sorenson
On Wed, Mar 31, 2004 at 08:32:49AM -0300, Luiz Rodrigo Tozzi wrote:
 
 My question is: can I generate graphics and tables in gif or any graphical 
 format through shell script?? can I call R, run a package in my ascii data and 
 then export the results to a gif, png or whatever?

I do this on my windows machine to generate project metrics 
automatically every morning.  For my situation, the following works well:

o Gather CSV files from various sources (I use python).

o Run R to generate PNG (or whatever) format images.  I am aware
of packages to link python and R but I just open an R process and
stream commands to it.

o Copy the CSV files and images to network directory.  This
network directory is also accessible to the intranet via apache.  I
wrote a simple CGI script in python to present all the diagrams back
to the browser.
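The R step of such a pipeline might look like the sketch below. The function name csv_to_png, the image size and the choice of a bar chart are assumptions for illustration; a script containing it can be run unattended with R --no-save or Rscript.

```r
# Read a CSV file and write a bar chart of one of its columns to a PNG file.
csv_to_png <- function(csv_file, png_file, column) {
  dat <- read.csv(csv_file)
  png(filename = png_file, width = 640, height = 480)
  on.exit(dev.off())              # always close the device, even on error
  barplot(table(dat[[column]]), main = column)
  invisible(png_file)
}

# Hypothetical usage:
# csv_to_png("defects.csv", "defects.png", "severity")
```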



[R] Re: FDA and ICH Compliance of R

2003-11-27 Thread Paul Sorenson
 Antonia Drugica [EMAIL PROTECTED] writes:
 
  Does anybody know if R is FDA or ICH (or EMEA...) 
 compliant? AFAIK S-Plus
  is but that means nothing...
 
 As Thomas pointed out, that does mean nothing -- there was a group of
 folks discussing what might be done to help, earlier this year, but
 then everyone got busy...

FDA has a guidance document for off-the-shelf software:

http://www.fda.gov/cdrh/ode/guidance/585.html

Note that it focuses on OTS used in medical devices.  However you
should read it.  The document:

http://www.fda.gov/cdrh/comp/guidance/938.html

Has a section on applicability of the software guidance (which
encompasses stuff outside the instrument itself).  Since I am no
lawyer, I can't say whether R falls within this scope.

It is fair to say however that the FDA consider safety and
effectiveness very important.  If the effectiveness that you claim is
based on statistics provided by software, or you rely on software for
determining safe levels (eg of a drug) then I would say (as a layman)
it is largely irrelevant whether the vendor claims some sort of FDA
badge because that does not prevent someone from writing dodgy
scripts.

So what you can do (other than soliciting mail list opinions)
includes:

o Think.  What are the implications for end users, patients etc.
  Would you take a pill based on your own stats?
o Read what the FDA have to say.
o Evaluate the risk and safety implications of the statistics you use.
o Manage the risk.  Eg can you independently confirm the key results?
o Your scripts are software - the FDA requires evidence of a credible
  process in the life cycle of software, whether they be spreadsheets,
  real time control systems or whatever.

OTS software that is validated does not remove responsibility for
reducing risk to acceptable levels.

HTH

paul



[R] RE: best editor for .R files

2003-11-23 Thread Paul Sorenson
I use gvim.  It has the syntax highlighting and works on most platforms.

My guess is that if you want more integration then the emacs path is the
way to go.

Note that I am not an R expert.

Date: Thu, 20 Nov 2003 22:27:05 +0100
From: Angel [EMAIL PROTECTED]
Subject: [R] best editor for .R files
To: [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain;  charset=iso-8859-1


Which is the best editor for .R files?

I currently use kate on my linux as it has R highlighting and allows me to
split the window into two: in one I edit the .R file and in the other I have
a shell so I run R and can easily  copy and paste the code. There are some
features that I don't like and I am having a look on some alternatives.
I've heard wonders of emacs with ess but I am a little bit frightened of the
steep learning curve.

What do the R experts use or would recommend using?
Both linux and/or windows alternatives are welcomed.
I guess it would much depend on the particular needs/preferences of each
user but I would like to know which are the most commonly used editors.




[R] model tutorial

2003-11-20 Thread Paul Sorenson
The help and man pages of R discuss the model syntax eg a ~ b.  They kind of just 
appear as if everyone knows why - I must be a bit of a mug.

Are there any tutorial style documents around on this topic?
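For readers meeting the syntax for the first time, a tiny example may help: a ~ b reads as "a modelled by b", and + on the right-hand side adds another explanatory term rather than performing arithmetic. The data below is made up.

```r
# Made-up data: y depends linearly on x; g is a two-level grouping variable.
d <- data.frame(x = 1:10, g = rep(c("a", "b"), 5))
d$y <- 2 * d$x + 1

fit <- lm(y ~ x, data = d)        # y modelled by x, intercept implied
coef(fit)

fit2 <- lm(y ~ x + g, data = d)   # add g as a second explanatory term
```

The same notation appears in plotting and testing functions, for example boxplot(y ~ g, data = d).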

cheers



[R] RE: relationship between two discrete variables

2003-11-17 Thread Paul Sorenson
Further to my queries re relating discrete variables I have had a couple of
tips on things I could try.  This has led me to attempt a marginal
homogeneity test
(http://ourworld.compuserve.com/homepages/jsuebersax/margin.htm).

o  Does anyone have an opinion on whether this approach would be
appropriate?

o Does R have some built in help to do this?  I found a reference to
the McNemar test but not to the Stuart-Maxwell test.
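For the two-category case, the built-in McNemar test is mcnemar.test() in the stats package. A sketch on a made-up table of paired ratings:

```r
# Made-up paired ratings: rows = first rating, columns = second rating.
ratings <- matrix(c(20,  5,
                    12, 23), nrow = 2, byrow = TRUE,
                  dimnames = list(first  = c("low", "high"),
                                  second = c("low", "high")))

res <- mcnemar.test(ratings)
res$p.value
```

As far as I know, for tables larger than 2 x 2 this function tests symmetry of the table, which is related to but not the same as the Stuart-Maxwell test of marginal homogeneity.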

cheers




RE: [R] xlims of barplot

2003-11-13 Thread Paul Sorenson


-Original Message-
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Friday, 14 November 2003 12:49 AM
To: Paul Sorenson
Cc: [EMAIL PROTECTED]
Subject: Re: [R] xlims of barplot


On Wed, 2003-11-12 at 23:54, Paul Sorenson wrote:
 I would like to create a family of barplots with the same xlimits.  Is
 there a way to read the xlimits from the first graph so I can apply it to
 the subsequent ones?
 
 I have tried just taking the min and max of the x data and the plot doesn't
 show.
 
 cheers

A point of clarification:

When you say x axis limits, are you referring to the bar heights in a
horizontal barplot or are you referring to the range of the x axis with
vertical bars?  A critical difference.
...
If you are talking about a vertical barplot, then the x axis range will
be the same for each barplot under the following conditions, without
having to set it:

1. You have the same number of bars in each plot
2. You do not change the values of 'space', 'width' or 'beside' across
the plots.

Also, keep in mind that the bars are NOT centered over integer values on
the respective axis. You can get the bar center positions by using:

Sorry for being vague, it is the latter case, vertical bars.  The data
doesn't satisfy condition 1.  The family of 6 plots is datestamped data,
the first plot showing all defects, then each subsequent plot showing
defects of each severity level we define.  The min and max of each of the
subsequent datasets is in general a subset of the full dataset.  I can
easily plot them but it would be nice to keep the same x limits on each
graph.  The x data is POSIXct although I suspect that is not relevant.




RE: [R] xlims of barplot

2003-11-13 Thread Paul Sorenson


-Original Message-
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Friday, 14 November 2003 9:44 AM
To: Paul Sorenson
Cc: [EMAIL PROTECTED]
Subject: RE: [R] xlims of barplot


On Thu, 2003-11-13 at 15:53, Paul Sorenson wrote:

SNIP

 Sorry for being vague, it is the latter case, vertical bars.  The data
 doesn't satisfy condition 1.  The family of 6 plots is datestamped data,
 the first plot showing all defects, then each subsequent plot showing
 defects of each severity level we define.  The min and max of each of the
 subsequent datasets is in general a subset of the full dataset.  I can
 easily plot them but it would be nice to keep the same x limits on each
 graph.  The x data is POSIXct although I suspect that is not relevant.


OK...I think I understand what you are doing.

You want a series of barplots that have space for the same number of
vertical bars along the x axis, but there may be gaps in the series for
any given barplot. Presumably, those gaps may be anywhere in the time
series along the x axis.

Correct and most problematically at the ends.

Hint: barplot() will leave gaps in the bar series where an NA appears in
the vector or in a matrix column of height values.

...

So, the key is to be sure that the vector or matrix has the same number
of elements or matrix columns in each dataset. For your incomplete
datasets, pad each series with NA's to fill out the missing entries in
the time series.

That sounds like a way forward.  I just need to go back to the basics and
learn how to add rows to data.frames.  I am sure it won't be hard, it's
just my personal learning curve with several new data types (factors,
tables, data.frames vs vectors, lists, arrays which I am more familiar
with).  For example, yesterday I tried max(myFactor) and it gave me an
error (something like must be a vector), even though to my naive way of
thinking myFactor clearly had a numeric max.
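Both points can be sketched briefly. Padding does not need new data frame rows if the counts are held in a named vector, and a factor with numeric-looking labels can be converted back before taking the maximum. The month labels and counts below are made up.

```r
# All months that should appear on the x axis of every plot in the family.
all_months <- c("2003-07", "2003-08", "2003-09", "2003-10")

# Counts for one severity level, with nothing in the first and last month.
sev1 <- c("2003-08" = 4, "2003-09" = 2)

# Indexing by name pads the missing months with NA, so each plot gets the
# same number of bar positions.
padded <- sev1[all_months]
names(padded) <- all_months
# barplot(padded) could then be called once per severity level.

# max() of a factor: convert the labels back to numbers first.
myFactor <- factor(c("3", "10", "7"))
max(as.numeric(as.character(myFactor)))
```

Going through as.character() matters because as.numeric() applied directly to a factor returns the internal level codes, not the labels.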

Thanks





[R] more barplot presentation questions

2003-11-04 Thread Paul Sorenson
Thanks to those who pointed me at the solutions to the legend overprinting the bars.  
I took the easy way of rescaling the y axis, picking the scaling factor for stacked 
bars is somewhat problematic but sufficient for my application.

I have another couple of barplot questions:

- Can I extend the major ticks on the Y axis across the page?  Or both axes to 
form a grid?

- A really neat graph for me would be a combination of side-by-side and 
stacked bars in a single plot to display an additional category.

The background on the second problem is that I am displaying software defect metrics.  
For each month (the bins) the categories of interest are:
- new/fixed/closed
- numeric severity (1 - 5)

I am currently displaying 5 separate graphs (6 when you take the aggregate into 
account) with new/fixed/closed side-by-side.  If within the side-by-side graphs I 
could show the severity stacked that would be very neat.
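For the first question, one possible approach is to draw horizontal lines at the y axis tick positions after the bars are plotted. A sketch with made-up counts:

```r
# Made-up monthly counts: rows are categories, columns are months.
counts <- matrix(c(3, 5, 2, 4, 6, 1), nrow = 3,
                 dimnames = list(c("new", "fixed", "closed"),
                                 c("Sep", "Oct")))

# Side-by-side bars; barplot() returns the bar midpoints.
mids <- barplot(counts, beside = TRUE)

# Extend the major y ticks across the plot as dotted grid lines.
ticks <- axTicks(2)
abline(h = ticks, lty = "dotted", col = "gray")
```

The lines are drawn on top of the bars. Calling barplot() a second time with add = TRUE would repaint the bars over them.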

Cheers



[R] legend over-prints barplot bar

2003-10-29 Thread Paul Sorenson
When I create a bar plot, the legend is obscuring the rightmost bar.

I haven't found a setting that appears to affect the positioning of the legend - any 
tips re moving the legend would be most appreciated.
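One way to keep the legend clear of the bars is to add headroom on the y axis and place the legend with a separate legend() call. In this sketch the counts, the colours and the factor of 1.5 are all made up.

```r
counts <- matrix(c(3, 5, 2, 4, 6, 1), nrow = 3,
                 dimnames = list(c("new", "fixed", "closed"),
                                 c("Sep", "Oct")))
cols <- c("gray30", "gray60", "gray90")

# Extra headroom on the y axis leaves space above the tallest bar.
mids <- barplot(counts, beside = TRUE, col = cols,
                ylim = c(0, 1.5 * max(counts)))

# Put the legend in the top-left corner, inside the headroom.
legend("topleft", legend = rownames(counts), fill = cols, inset = 0.05)
```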

paul sorenson



[R] RPy for windows and python 2.3

2003-10-14 Thread Paul Sorenson
Does anyone have an RPy installer or some way of getting RPy to work with python 2.3 
on windows? I am using R 1.7.1.

I have tried compiling with several different compilers from the source and the best I 
got was parser errors in header files with mingw32.

paul
