Re: [R] Why isn't R recognising integers as numbers?

2008-09-22 Thread Peter Dalgaard

Ted Byers wrote:

Thanks Jim,

Alas, it wasn't this.  Here is the output from both of your suggestions:

  

refdata18 = read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv",
header = TRUE, na.strings = "")
str(refdata18)


'data.frame':   341 obs. of  1 variable:
 $ X0: int  0 0 0 0 0 0 0 0 0 0 ...
  
Ummm, is there a header line or not? If there isn't, read.csv is going 
to eat the first observation thinking it is a name (and since it is 
non-syntactic add an X in front).
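
A minimal sketch of the header effect described above, assuming a small inline stand-in for the real file (the `csv_text` string is hypothetical, not the poster's data):

```r
# Hypothetical stand-in for a one-column file that has no header line.
csv_text <- "0\n0\n1\n2"

# header = TRUE: the first value is consumed as the column name ("0" becomes "X0").
with_header <- read.csv(text = csv_text, header = TRUE)

# header = FALSE: all four values are kept as data in a column named V1.
no_header <- read.csv(text = csv_text, header = FALSE)

# scan() returns a plain integer vector, which is what fitdistr() expects.
x <- scan(text = csv_text, what = 0L, quiet = TRUE)

print(names(with_header))
print(nrow(with_header))
print(nrow(no_header))
print(x)
```

Comparing `nrow(with_header)` with `nrow(no_header)` shows the lost observation directly.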


The scan command looks fine, you just should have assigned it somewhere, 
x <- scan(..) and then fitdistr(x, )



scan("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", what=0L)


Read 342 items
  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [76]  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[101]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[126]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[151]  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
[176]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3
[201]  3  3  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  4
[226]  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6
[251]  6  6  6  6  6  6  6  6  6  6  6  6  6  6  7  7  7  7  7  7  7  7  7  7  7
[276]  7  7  7  8  8  8  8  9  9  9  9  9  9  9  9  9 10 10 10 10 10 10 10 10 10
[301] 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[326] 12 12 12 18 18 18 18 18 18 18 18 18 18 18 18 18 18

  


--
  O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
 c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] adding layers in ggplot2 (data and code included)

2008-09-22 Thread Eric


The way you've attempted to get this result seems to align with the way 
R should work, but it fails in this case.

The fix is to break things up a little bit:

p <- ggplot(mydata, aes(x=Est, y=Tri))
p <- p + geom_point(aes(colour=factor(Group),shape=factor(Group)))
p <- p + geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)

p


Eric



Juliet Hannah wrote:

Here is some sample data:

mydata <- read.table(textConnection("Est Group Tri
   0 0 4.639644
   1 0 4.579189
   2 0 4.590714
   0 1 4.443696
   1 1 4.588243
   2 1 4.650505
   0 2 4.296608
   1 2 4.826036
   2 2 4.765386"),header=TRUE);
  closeAllConnections();

I can form two plots, scatter and  lines, as follows:

p <- ggplot(mydata, aes(x=Est, y=Tri))
p + geom_point(aes(colour=factor(Group),shape=factor(Group)))

and

p + geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)

However, I am unable to have the plots together.

I obtain the following error:

  

p + geom_point(aes(colour=factor(Group),shape=factor(Group))) + geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)


Error in `[.data.frame`(df, , var) : undefined columns selected

Thanks,

Juliet



[R] Time series (ts) questions.

2008-09-22 Thread rkevinburton
I have been working with the base time series object (ts) and I had a couple of 
questions that hopefully this group can help me with:

1) What is the best way to append an observation to an existing time series? 
Suppose I have a time series:

t <- ts(1:12, frequency=5)

This would generate two complete cycles and one remainder. Now I would like to 
append an observation to this time series. I could use 'c' but then I would 
need to rebuild the whole time series and I would need to know the frequency 
etc. I would like some operation like '+' that would simply append the value to 
the end of the time series (incrementing the 'last' time value so that things 
like cycle() still output the correct values) but alas

t + 10

is already taken as an equally useful operation: adding 10 to each element in 
the time series (rather than, in this case, appending ts(10,frequency) with a 
time value of 13 to the time series).

2) What is the best way to get the last time value in a time series? I can do 
something like:

(start(t)[2] - 1) + (end(t)[1]-1) * frequency(t) + end(t)[2]

But there has to be an easier way.

Thank you.

Kevin
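
One possible approach to both questions, offered as a sketch rather than a definitive answer. It rebuilds the series but copies the start and frequency from the existing object, so neither has to be known in advance. The helper name `append_obs` is hypothetical, and the series is called `x` here only to avoid masking the base function `t()`.

```r
# Hypothetical helper: append values to a ts, reusing its own start and frequency.
append_obs <- function(series, value) {
  ts(c(series, value), start = start(series), frequency = frequency(series))
}

x  <- ts(1:12, frequency = 5)
x2 <- append_obs(x, 10)

# Position of the last observation within its cycle.
last_cycle <- as.integer(cycle(x2))[length(x2)]

# Last time value: the second element of tsp() is the end time of the series.
last_time <- tsp(x)[2]

# The index computed by the longer formula in the question is the number
# of observations when the series starts at c(1, 1).
last_index <- length(x)

print(end(x2))
print(last_cycle)
print(last_time)
print(last_index)
```

`end(x)` gives the same end point in (cycle, position) form, and `time(x)[length(x)]` is an equivalent way to get the last time value.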



[R] Matrix balancing on margins

2008-09-22 Thread PALMIER Patrick - CETE NP/INFRA/TRF
Hello,

Is there any package in R for balancing a matrix?

I want to estimate a matrix with

* an initial matrix (1 everywhere, for example)
* Row margin
* Col margin
* distance class vector (each cell of the matrix belongs to a
  distance class); I want the distribution over distance classes
  to be preserved

How can I do such a thing?
Is there an existing function, or should I write an iterative 
script myself?
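
A minimal sketch of what such an iterative script could look like, assuming that "preserving the distance class distribution" means matching a known total per class. It is plain iterative proportional fitting with a third scaling step; the function name `balance_matrix`, the toy `target` matrix, and the class definition are all hypothetical.

```r
# Hypothetical helper: scale rows, columns and distance classes in turn
# until the row and column margins are matched within a tolerance.
balance_matrix <- function(seed, row_tot, col_tot, cls, cls_tot,
                           tol = 1e-9, max_iter = 1000) {
  m <- seed
  key <- as.character(cls)              # class label of each cell, column-major
  for (iter in seq_len(max_iter)) {
    m <- m * (row_tot / rowSums(m))                # match row margins
    m <- sweep(m, 2, col_tot / colSums(m), "*")    # match column margins
    cur <- tapply(m, key, sum)                     # current total per class
    m <- m * as.vector(cls_tot[key] / cur[key])    # match class totals
    err <- max(abs(rowSums(m) - row_tot), abs(colSums(m) - col_tot))
    if (err < tol) break
  }
  m
}

# Toy problem: margins and class totals are taken from a known matrix,
# so the three sets of constraints are guaranteed to be consistent.
target  <- matrix(c(4, 2, 1,
                    2, 5, 2,
                    1, 3, 6), nrow = 3, byrow = TRUE)
cls     <- abs(row(target) - col(target))        # distance class of each cell
cls_tot <- tapply(target, as.character(cls), sum)

fit <- balance_matrix(matrix(1, 3, 3), rowSums(target), colSums(target),
                      cls, cls_tot)
print(round(fit, 3))
```

The constraints must be mutually consistent (equal grand totals) for the loop to converge; with inconsistent inputs it simply stops after `max_iter` passes.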

Thanks
 
-- 

Patrick PALMIER
Centre d'Études Techniques de l'Équipement Nord - Picardie
Département Infrastructures / Trafic -- Socio-économie
2, rue de Bruxelles, BP 275
59019 Lille cedex
FRANCE
Tél: +33 (0) 3 20 49 60 70
Fax: +33 (0) 3 20 49 63 69




Re: [R] Variable Selection for data reduction and discriminant anlaysis

2008-09-22 Thread Mark Difford

Hi Gareth,

 My data is transformed to the clr or alr under Aitchison geometry, so I
 am essentially working in Euclidean space.

Great: glad to hear it.

 Has anyone had experience doing stepwise LDA??  I can't for the life of
 me find any help online about where to start.

A better option might be this: Trevor Hastie and a student of his have
recently put out a paper that does a step-up from penalized discriminant
analysis based, I think, on Trevor's sparse principal component analysis
method (in his elasticnet package).

http://www-stat.stanford.edu/~hastie/Papers/sda_line.pdf

You can get R-code to do the analysis on the first author's website; there's
a link in the paper.

Bye, Mark.


gcam032 wrote:
 
 Thanks Mark,
 
 I failed to mention that i'm working within a compositional framework.  I
 didn't want to confuse things.  My data is transformed to the clr or alr
 under Aitchison geometry, so I am essentially working in Euclidean space. 
 
 Has anyone had experience doing stepwise LDA??  I can't for the life of me
 find any help online about where to start.
 
 Thanks
 
 Gareth
 
 
 Mark Difford wrote:
 Hi Gareth,
 
 If I use the full composition (31 elements or variables), I can get
 reasonable separation of my 6 sources.
 
 A word of advice: You need to be exceptionally careful when analyzing
 compositional data. Taking compositions puts your data values into a
 constrained/bounded space (generally called a simplex) so that most
 standard statistical procedures (i.e. anything that uses a Euclidean
 metric, and most do) deliver erroneous results. Pearson wrote a paper on
 this long ago, but it's generally been ignored (except by Aitchison and
 the Spanish School of mathematical statisticians).
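
 To make the transformation mentioned in this thread concrete, here is a minimal base-R sketch of the centred log-ratio (clr): each part is log-transformed and the row mean of the logs is subtracted. The function name `clr_transform` and the toy matrix are hypothetical; the compositions package mentioned in this thread provides its own implementation.

```r
# Hypothetical helper: centred log-ratio transform, one composition per row.
clr_transform <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)   # subtract each row's mean log from that row
}

# Toy compositions (each row sums to 1).
comp <- matrix(c(0.2, 0.3, 0.5,
                 0.1, 0.6, 0.3), nrow = 2, byrow = TRUE)

z <- clr_transform(comp)

# clr coordinates sum to zero within each row, and rescaling a
# composition (e.g. percent instead of proportions) leaves them unchanged.
print(rowSums(z))
print(all.equal(z, clr_transform(comp * 100)))
```

 All parts must be strictly positive for the logarithm to be defined; zeros need separate treatment.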
 
 The problem is comparatively well known to geologists, who work with
 compositional data much of the time. R has a very good package for analysing
 this data type: see the compositions package (a new release seems
 imminent). You will be able to get most of the main references from it.
 (The authors of the package also have a newly released article in one of
 the Elsevier journals [unfortunately my bibliography files are elsewhere
 so I cannot give details]).
 
 You could start by Wiki'ing your way to compositional data.
 
 HTH, Mark.
 
 
 
 Gareth Campbell wrote:
 
 Hello all,
 
 I'm dealing with geochemical analyses of some rocks.
 
 If I use the full composition (31 elements or variables), I can get
 reasonable separation of my 6 sources.  Then when I go on to do LDA with the
 6 groups, I get excellent separation.
 
 I feel like I should be reducing the variables to those that are providing
 the most discrimination between the groups as this is important information
 for me.  I struggle to interpret the PCA plot in a way that helps me (due to
 the large number of elements).  So I'm trying to do some sort of step-wise
 variable selection.
 
 I would love to hear from someone (possibly a geochemist or similar) who
 does this regularly to determine the best course of action in R to do this.
 
 
 Thanks very much
 
 
 -- 
 Gareth Campbell
 PhD Candidate
 The University of Auckland
 
 P +649 815 3670
 M +6421 256 3511
 E [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 
 
 
 
 



-- 
View this message in context: 
http://www.nabble.com/Variable-Selection-for-data-reduction-and-discriminant-anlaysis-tp19591270p19602702.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Manage huge database

2008-09-22 Thread Yihui Xie
Hi,

You can treat it as a database and use ODBC to fetch data from the CSV
file using SQL. See the package RODBC for details about database
connections. (I have dealt with similar problems before with RODBC)

Regards,
Yihui
--
Yihui Xie [EMAIL PROTECTED]
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Sep 22, 2008 at 2:50 PM, José E. Lozano [EMAIL PROTECTED] wrote:
 Hello,

 Recently I have been trying to open a huge database with no success.

 It's a 4GB csv plain text file with around 2000 rows and over 500,000
 columns/variables.

 I have tried with The SAS System, but it reads only around 5000 columns, no
 more. R hangs up when opening.

 Is there any way to work with parts (a set of columns) of this database,
 since it's impossible to manage it all at once?

 Is there any way to establish a link to the csv file and to state the
 columns you want to fetch every time you make an analysis?

 I've been searching the net, but found little about this topic.

 Best regards,

 Jose Lozano




Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
Hello, Yihui

 You can treat it as a database and use ODBC to fetch data from the CSV
 file using SQL. See the package RODBC for details about database
 connections. (I have dealt with similar problems before with RODBC)

Thanks for your tip. I have used RODBC before to read data from MSAccess and
MSExcel files, but I never imagined it could work for non-database files
such as csv.

I will check the RODBC documentation.

Best Regards,
Jose Lozano

--
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.



Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
 I wouldn't call a 4GB csv text file a 'database'.

Obviously, a csv is not a database in itself. What I meant (though it
seems I was not understood) is that I had a huge database, exported to a csv
file by the people who created it (and I don't have any idea of the original
format of the database).

 Yes, use a database. A real database.

I've used MSAccess and there is a limit of 255 columns, as far as I know, so
there is no way of importing it. Obviously, I won't buy an Oracle license to
read this file, so: what database system allows a table with 500,000 variables?
MySQL? Do I have to split the file into smaller parts, import them as tables,
and relate them all using an index field?

 No, but you can establish a link to a database. You want a database.
 A real relational database.

 Try:
 http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

It didn't help, sorry. I perfectly knew what a relational database is (and I
humbly consider myself an advanced user on working with MSAccess+VBA, only
that I've never faced this problem with variables), you should not suppose
everyone's stupid, though...

Thanks for your help,
Best regards
Jose Lozano



Re: [R] how to keep up with R?

2008-09-22 Thread Robin Hankin

Adaikalavan Ramasamy wrote:
I agree! The best way to learn (and remember for longer) is to teach 
someone else about it.


And there is no reason not to repeat some of the analysis done in SAS 
with R. That way you can verify your outputs or compare the 
presentations. If you consistently find differences in the outputs, 
then trying to figure out the reason may lead you to better understand 
the methods (e.g. different optimization or estimation procedures).




My take on this:

I have repeatedly found that it is surprisingly easy to improve on 
existing (non-R) implementations of statistical and non-statistical 
computation, when working in R.

Something about the structure of the language, something about the 
package mechanism, something about R-help, something about R-core, 
something about open-source, something about JSS or R-news, whatever 
it is, there is SOMETHING ABOUT R which lends itself to straightforward 
production of quality software.  And that something is missing from 
other programming languages, IMO.



rksh




Regards, Adai



Barry Rowlingson wrote:

2008/9/19 Wensui Liu [EMAIL PROTECTED]:

Dear Listers,

I've been a big fan of R since graduate school. After working in the
industry for years, I haven't had many opportunities to use R and am mainly
using SAS. However, I am still forcing myself really hard to stay close to R
by reading R-help and books and writing R code by myself for fun. But by and
by, I have started to realize that I have a hard time keeping up with R and
am afraid that I will totally forget how to program in R.

I really like it and am very unwilling to give it up. Is there any idea how
I might keep in touch with R without using it at work on a daily basis? I
would really appreciate it.



--
Robin K. S. Hankin
Senior Research Associate
Cambridge Centre for Climate Change Mitigation Research (4CMR)
Faculty of Economics
The University of Cambridge
[EMAIL PROTECTED]
01223-764877



Re: [R] Why isn't R recognising integers as numbers?

2008-09-22 Thread Ted Harding
Hi Ted (from Ted),
Just to clarify Marc's comments about dataframes in more basic terms.

If you read in data with read.csv() the result returned by the function
is a dataframe. This is a specialised kind of list, which you can think
of as a list of columns all of the same length. You can think of each
column as a vector of elements, all of which must be of the same type
within the column, though the type can vary (e.g. numeric, factor,
character) between columns. When you display a dataframe, it looks like
a matrix, though in R terms it is not really a matrix; it is a list,
where each component of the list is a column.

Of course a dataframe, like any list, might have only one component.
But it is still a list -- and the actual contents are only available
one layer down, after you have extracted that component by some
means (e.g. by using the $ extractor). Simple example:

  L <- c(1,2,3,4) ## vector
  L
# [1] 1 2 3 4
  L.df <- data.frame(L=L) ## Dataframe with 1 component named L
  L.df
#   L
# 1 1
# 2 2
# 3 3
# 4 4
  L.df$L  ## Extract the component named L
# [1] 1 2 3 4 ## Compare with the result of 'L' above

# Try a regression on L (this works):
  lm(L ~ 1)
# Call:
# lm(formula = L ~ 1)
# Coefficients:
# (Intercept)  
# 2.5  

# Try a regression on L.df (this doesn't work):
  lm(L.df ~ 1)
# Error in model.frame.default(formula = L.df ~ 1,
#   drop.unused.levels = TRUE) : 
#   invalid type (list) for variable 'L.df'

# But it does after you refer to the component L by name:
  lm(L.df$L ~ 1)
# Call:
# lm(formula = L.df$L ~ 1)
# Coefficients:
# (Intercept)  
# 2.5  

# or:
  lm(L ~ 1, data=L.df)
# Call:
# lm(formula = L ~ 1, data = L.df)
# Coefficients:
# (Intercept)  
# 2.5  

# But you can (for a dataframe, not a general list) use an index
# method of extraction *as if* it were a matrix (even though it isn't):

  L.df[,1]
# [1] 1 2 3 4
  L.df[3,1]
# [1] 3

# But compare with:
  L.df[1]
#   L
# 1 1
# 2 2
# 3 3
# 4 4

which is essentially the same as L.df itself (e.g. lm(L.df[1] ~ 1)
will not work in exactly the same way as lm(L.df ~ 1) didn't work).

The dataframe structure exists in R because so much data is typically
in the row by column (case by variables) layout such as you get in
spreadsheets and associated CSV files, and it is very useful to be
able to get into this layout directly (and refer to the variables
by name, as above).

The full generality of a 'list' can also be useful for encapsulating
data of a less strictly structured kind, but that is another (longer)
story!

Hoping this helps.
Ted.


On 22-Sep-08 02:09:29, Ted Byers wrote:
 Thanks Marc,
 That was it. 
 
 For the last 30 years, I'd write my own code, in FORTRAN, C++,
 or even Java, to do whatever statistical analysis I needed.
 When at the office, sometimes I could use SAS, but that hasn't
 been an option for me in years.
 
 This is the first time I have had to load real data into R
 (instead of generating random data to use while playing with
 some of the stats functions, or manually typing dummy data).
 
 I take it, then, that the result of loading data is a data
 frame, and not just a matrix or array. Using something like
 refdata18[, 1] feels rather alien, but I'm sure I'll quickly
 get used to it.  I'd seen it before in the R docs, but it didn't
 register that I had to use it to get the functions of most
 interest to me to recognise my data as a vector of numbers,
 given I'd provided only a vector of integers as input.
 
 Thanks
 
 Ted
 
 
 Marc Schwartz wrote:
 
 on 09/21/2008 08:01 PM Ted Byers wrote:
 I have a number of files containing anywhere from a few dozen to a few
 thousand integers, one per record.
 
 The statement refdata18 =
 read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", header =
 TRUE, na.strings="") works fine, and if I type refdata18, I get the integers
 displayed, one value per record (along with a record number). However, when
 I try fitdistr(refdata18, "negative binomial"), or hist.scott(refdata18,
 prob = TRUE), I get an error:
 
 Error in fitdistr(refdata18, "negative binomial") : 
   'x' must be a non-empty numeric vector
 Or
 Error in hist.default(x, nclass.scott(x), prob = prob, xlab = xlab, ...) : 
   'x' must be numeric
 
 How can it not recognise integers as numbers?
 
 Thanks
 
 Ted
 
 'refdata18' is a data frame and the two functions are expecting a
 numeric vector.
 
 If you use:
 
   fitdistr(refdata18[, 1], "negative binomial")
 
 or
 
   hist(refdata18[, 1])
 
 you should get a suitable result, presuming that the first column in the
 data frame is a numeric vector.
 
 Use:
 
   str(refdata18)
 
 to get a sense for the structure of the data frame, including the column
 names, which you could then use, instead of the above index based syntax.
 
 HTH,
 
 Marc Schwartz
 

Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 José E.  Lozano [EMAIL PROTECTED]:
 I wouldn't call a 4GB csv text file a 'database'.

 It didn't help, sorry. I perfectly knew what a relational database is (and I
 humbly consider myself an advanced user on working with MSAccess+VBA, only
 that I've never faced this problem with variables), you should not suppose
 everyone's stupid, though...

 Maybe you've not lurked on R-help for long enough :) Apologies!

A bit more googling tells me both MySQL and PostgreSQL have limits of
a few thousand on the number of columns in a table, not a few hundred
thousand. An insightful comment on one mailing list is:

Of course, the real bottom line is that if you think you need more than
order-of-a-hundred columns, your database design probably needs revision
anyway ;-)

 So, how much design is in this data? If none, and what you've
basically got is a 2000x500,000 grid of numbers, then maybe a more raw
binary-type format will help - HDF or netCDF? Although I'm not sure
how much R support for reading slices of these formats exists, you may
be able to use an external utility to write slices out on demand.
Random access to parts of these files is pretty fast.

http://cran.r-project.org/web/packages/RNetCDF/index.html
http://cran.r-project.org/web/packages/hdf5/index.html

 Thinking back to your 4GB file with 1,000,000,000 entries, that's
only 3 bytes per entry (+1 for the comma). What is this data? There
may be more efficient ways to handle it.
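
The back-of-envelope arithmetic can be written out as follows. This is only a sketch: it takes "4GB" to mean 4e9 bytes and uses the 2000 rows by 500,000 columns quoted earlier in the thread.

```r
# Dimensions quoted in the original question.
n_rows <- 2000
n_cols <- 500000
n_entries <- n_rows * n_cols          # one billion values

# Assumption: "4GB" is taken as 4e9 bytes.
file_bytes <- 4e9
bytes_per_entry <- file_bytes / n_entries   # includes the separating comma

# Memory needed to hold every value as an 8-byte double in R.
gb_as_double <- n_entries * 8 / 1e9

print(n_entries)
print(bytes_per_entry)
print(gb_as_double)
```

The last figure is for a single in-memory copy; any manipulation that copies the data needs a multiple of it.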

 Hope *that* helps...

Barry




Re: [R] Manage huge database

2008-09-22 Thread jim holtman
What are you going to do with the data once you have read it in?  Are
all the data items numeric?  If they are numeric, you would need at
least 8GB to hold one copy and probably a machine with 32GB if you
wanted to do any manipulation on the data.

You can use a 'connection' and 'scan' to read the data in chunks and
then store it in a more accessible format.  A lot would depend on your
answer to my first question.
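
A minimal sketch of the connection-plus-scan idea, assuming an all-numeric comma-separated file with a known number of columns. The temporary file, the column count and the `keep` indices are hypothetical stand-ins for the real data.

```r
# Build a small stand-in CSV (10 rows, 6 columns) so the sketch runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(sapply(1:10, function(i) paste(i * (1:6), collapse = ",")), path)

n_col      <- 6          # assumed known in advance
keep       <- c(2, 5)    # columns of interest
chunk_rows <- 4          # rows per chunk; tune to the available memory

con <- file(path, open = "r")
pieces <- list()
repeat {
  # Each call continues from where the previous one stopped.
  vals <- scan(con, what = numeric(), sep = ",", nlines = chunk_rows,
               quiet = TRUE)
  if (length(vals) == 0) break                      # end of file
  chunk <- matrix(vals, ncol = n_col, byrow = TRUE)
  pieces[[length(pieces) + 1]] <- chunk[, keep, drop = FALSE]
}
close(con)

subset_cols <- do.call(rbind, pieces)
print(dim(subset_cols))
```

Only one chunk plus the retained columns is in memory at any time; each piece could equally be written out to a smaller file or a database table instead of being kept in a list.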

On Mon, Sep 22, 2008 at 6:26 AM, José E. Lozano [EMAIL PROTECTED] wrote:

  Maybe you've not lurked on R-help for long enough :) Apologies!

 Probably.

  So, how much design is in this data? If none, and what you've
  basically got is a 2000x500,000 grid of numbers, then maybe a more raw

 Exactly, raw data, but a little more complex since all the 500,000 variables
 are in text format, so the width is around 2,500,000.

  http://cran.r-project.org/web/packages/RNetCDF/index.html
  http://cran.r-project.org/web/packages/hdf5/index.html

 Thanks, I will check. Right now I am reading the file line by line. It's
 time consuming, but since I will do it only once, just to rearrange the data
 into smaller tables to query, it's ok.

  Thinking back to your 4GB file with 1,000,000,000 entries, that's
  only 3 bytes per entry (+1 for the comma). What is this data? There
  may be more efficient ways to handle it.

 It is genetic DNA data (individuals genotyped), hence the large number of
 columns to analyze.

 Best Regards,
 Jose Lozano
 --
 Jose E. Lozano Alonso
 Observatorio de Salud Pública.
 Direccion General de Salud Pública e I+D+I.
 Junta de Castilla y León.
 Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 José E.  Lozano [EMAIL PROTECTED]:

 Exactly, raw data, but a little more complex since all the 500,000 variables
 are in text format, so the width is around 2,500,000.

 Thanks, I will check. Right now I am reading the file line by line. It's
 time consuming, but since I will do it only once, just to rearrange the data
 into smaller tables to query, it's ok.

  A language like python, perl, or even awk might be able to help you
slice your data up.

 It is genetic DNA data (individuals genotyped), hence the large number of
 columns to analyze.

 So is each line just ACCGTATAT etc etc?

 If you have fixed width fields in a file, so that every line is the
same length, then you can use random access methods to get to a
particular value - just multiply the line length by the row number you
want and add the column number. In R you can do this with seek() on a
connection. This should be fast because it seeks by bytes, instead of
having to scan all the comma-separated stuff. The only problem comes
when your data doesn't quite conform, and you can end up reading junk.
When doing this, it's a good idea to test your dataset first to make
sure the lines and fields are right.

Example with dummy.dna:

aaaccctttgggaaa
gattacagattacaa
aaacggg
gtgtggg
aac

 each line has 15 bases, and on my OS there's one additional invisible
character to mark the line end. Windows uses 2, but your data might
not be Windows format... So anyway, my multiplier is 16. Hence to get
a slice of the file of four columns from column 7 for some rows:

 dna=file("dummy.dna")
 open(dna,open="rb")
 for(r in 2:4){seek(dna,7+(r-1)*16);print(readChar(dna,4))}
[1] "gatt"
[1] 
[1] 

 The speed of this should be independent of the size of your data file.
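
A self-contained variant of the same idea, as a sketch only: it writes its own small fixed-width file (hypothetical data, 15 bases per line plus a single newline byte) so that it can be run without dummy.dna. Offsets passed to seek() are zero-based byte positions. The R documentation cautions about using seek() on Windows, so this should be tested on the target platform.

```r
# Hypothetical fixed-width data: every line has exactly 15 bases.
lines <- c("aaaccctttgggaaa",
           "gattacagattacaa",
           "ccccggggttttaaa",
           "tgcatgcatgcatgc")

path <- tempfile(fileext = ".dna")
out <- file(path, open = "wb")          # binary mode: one "\n" byte per line end
writeLines(lines, out, sep = "\n")
close(out)

line_bytes <- 15 + 1                    # bases plus the newline byte
offset     <- 7                         # zero-based position within a line
width      <- 4                         # number of characters to read

dna <- file(path, open = "rb")
slice <- sapply(2:4, function(r) {
  seek(dna, where = (r - 1) * line_bytes + offset)   # jump straight to the field
  readChar(dna, width)
})
close(dna)

print(slice)
```

Checking `file.size(path) == length(lines) * line_bytes` before slicing is a cheap way to confirm that every line really has the assumed length.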

Barry



Re: [R] rgl: How to position a window during open3d call

2008-09-22 Thread Koen Stegen
Duncan Murdoch wrote:
 This is fixed now on R-forge; eventually it will make it into the next
 rgl release on CRAN.  You should be able to download a binary of the
 development version from R-forge sooner.  Make sure you get version
 0.81.706 or newer.

The R-forge version 0.81.706 works as advertised, both on Linux and Windows.
Thanks Duncan!


Koen Stegen
Royal Meteorological Institute of Belgium



[R] auto.arima help.

2008-09-22 Thread rkevinburton
Hello,

I am calling the auto.arima method in the forecast package and it returns what 
seems to be valid Arima output. But when I feed this output to 'predict' I get:

Error in predict.Arima(catall.fit[[.index]], n.ahead = 12) : 
  'xreg' and 'newxreg' have different numbers of columns

Is there a way to tell what is being supplied to xreg from the Arima output? 

Any ideas?

Thank you.

Kevin



Re: [R] Help for R

2008-09-22 Thread Uwe Ligges

Please read the posting guide and tell us:

- Which version of R?
- Which OS?
- Which version of the matlab package (I guess you are using that one?)
- If Windows and a binary version of the matlab package: does the binary 
fit your version of R?


Uwe Ligges




Mac wrote:

Dear R users,
   
  I've just started learning R and I'm having a problem with it. I got the following message when I tried to run R: 
   
  Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) : 
in 'matlab' methods specified for export, but none defined: sum, size, padarray, flipud, fliplr

Error: package/namespace load failed for 'matlab'
   
  Then I tried package/load in package/matlab; however, the same message as above was shown.
   
  I would appreciate any help or suggestions. Thanks.
   
  Kai


   


[R] Joint maximum likelihood estimation for ordinal data

2008-09-22 Thread denn

Dear R users

From what I understand, the joint maximum likelihood procedure for Rasch
(available in the package MiscPsycho) in R can only be used on binary data. 
I was wondering if the code is currently being adapted for application to
ordinal data?  I'm trying to replicate results obtained from Winsteps in R. 

Best wishes
denn
-- 
View this message in context: 
http://www.nabble.com/Joint-maximum-likelihood-estimation-for-ordinal-data-tp19606190p19606190.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Vincent Goulet

Matthew,

As per the CRAN Ubuntu README

http://cran.r-project.org/bin/linux/ubuntu/

install the Ubuntu r-base-dev package to compile R packages from  
sources.


Vincent

On Mon, 22 Sept. at 00:08, Matthew Pettis wrote:


Hi,

I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
I tried getting Hmisc from within R by issuing the standard
'install.packages' command, but it said I needed 'gfortran' to
compile.  I thought I could circumvent this by using 'aptitude' to get
the package 'r-cran-hmisc', but when I got it, the package had
critical missing parts (got 404s).  So, I'll be trying to go back and
download 'gfortran', but can anybody tell me if this aptitude ubuntu
package should be kept up to date and is just currently overlooked?

Thanks,
Matt

--
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas



Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Dirk Eddelbuettel
On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
 Matthew,

 As per the CRAN Ubuntu README

   http://cran.r-project.org/bin/linux/ubuntu/

 install the Ubuntu r-base-dev package to compile R packages from  
 sources.

Well there should be a working r-cran-hmisc package.  You simply got a
'404' error indicating that your network access (using http) to the
external Ubuntu mirror was broken.   Fix that, or download the package
by hand.  It may be easier to just install the missing package.

That said, Vincent is of course entirely correct on the need for
r-base-dev.  

Dirk
  

 Vincent

 On Mon, 22 Sept. at 00:08, Matthew Pettis wrote:

 Hi,

 I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
 I tried getting Hmisc from within R by issuing the standard
 'install.packages' command, but it said I needed 'gfortran' to
 compile.  I thought I could circumvent this by using 'aptitude' to get
 the package 'r-cran-hmisc', but when I got it, the package had
 critical missing parts (got 404s).  So, I'll be trying to go back and
 download 'gfortran', but can anybody tell me if this aptitude ubuntu
 package should be kept up to date and is just currently overlooked?

 Thanks,
 Matt

 -- 
 It is from the wellspring of our despair and the places that we are
 broken that we come to repair the world.
 -- Murray Waas


-- 
Three out of two people have difficulties with fractions.



Re: [R] Manage huge database

2008-09-22 Thread Martin Morgan
José E. Lozano [EMAIL PROTECTED] writes:

 Maybe you've not lurked on R-help for long enough :) Apologies!

 Probably.

 So, how much design is in this data? If none, and what you've
 basically got is a 2000x50 grid of numbers, then maybe a more raw

 Exactly, raw data, but a little more complex since all the 50 variables
 are in text format, so the width is around 2,500,000.

 http://cran.r-project.org/web/packages/RNetCDF/index.html
 http://cran.r-project.org/web/packages/hdf5/index.html

 Thanks, I will check. Right now I am reading line by line the file. It's
 time consuming, but since I will do it only once, just to rearrange the data
 into smaller tables to query, it's ok.

 Thinking back to your 4GB file with 1,000,000,000 entries, that's
 only 3 bytes per entry (+1 for the comma). What is this data? There
 may be more efficient ways to handle it.

 It is genetic DNA data (genotyped individuals), hence the large number of
 columns to analyze.

The Bioconductor package snpMatrix is designed for this type of
data. See

http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html

and if that looks promising

 source('http://bioconductor.org/biocLite.R')
 biocLite('snpMatrix')

Likely you'll quickly want a 64 bit (linux or Mac) machine.

Martin

 Best Regards,
 Jose Lozano
 --
 Jose E. Lozano Alonso
 Observatorio de Salud Pública.
 Direccion General de Salud Pública e I+D+I.
 Junta de Castilla y León.
 Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793



[R] use of system() under Linux

2008-09-22 Thread Rainer M Krug
Hi

I want to use the system() command to execute a command and have it
return the result in an R variable, so I am using intern=TRUE.

On the other hand, I want to evaluate the return value of the command,
to determine whether the command was successful.

According to the help, these two objectives are mutually exclusive: it is
either the one or the other. Is this true, or is there another way of
accomplishing this?

My preferred return value would be a list consisting of three entries:
return code of the command
stderr
and the result
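
For readers with the same question, one possible approach is sketched below. It is only a sketch under assumptions: it relies on system2(), which was added to base R after this thread was written, and on a Unix-like shell. The helper name run_command is hypothetical and not part of R. Standard output is captured in the return value, standard error is redirected to a temporary file, and the exit code is read from the "status" attribute that system2() attaches when a command exits with a non-zero status.

```r
# Hypothetical helper: run a command and return exit status, stdout and stderr.
run_command <- function(command, args = character()) {
  err_file <- tempfile()
  on.exit(unlink(err_file))
  # stdout = TRUE captures standard output as a character vector;
  # stderr = <file name> redirects standard error to that file.
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = err_file)
  )
  status <- attr(out, "status")   # NULL when the command exited with 0
  list(
    status = if (is.null(status)) 0L else status,
    stdout = as.character(out),   # drop the "status" attribute
    stderr = readLines(err_file, warn = FALSE)
  )
}

# Example: a shell command that writes to both streams and exits with code 3
res <- run_command("sh", c("-c", shQuote("echo out; echo err >&2; exit 3")))
str(res)
```

An error is still raised if the command cannot be started at all, so callers may want to wrap the call in tryCatch().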

Thanks

Rainer



-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa



Re: [R] Likelihood between observed and predicted response

2008-09-22 Thread Ben Bolker
Christophe LOOTS Christophe.Loots at ifremer.fr writes:

 
 Thank you so much for your help.
 
 The function dbinom seems to work very well.
 
 However, I'm a bit lost with the dnorm function.
 
 Apparently, I have to compute the mean mu and the standard deviation 
 sd but what does it mean exactly? I only have a vector of predicted 
 response and a vector of observed response that I would like to compare!
 
 What are mu and sigma?
 


  mu is the mean (which you might as well set to the
predicted value).  sd is the standard deviation; in order
to calculate the likelihood in this case, you'll need an
*independent* estimate (from somewhere) of the standard
deviation.  Without thinking about it too carefully I think
you could probably get this from sqrt(sum((predicted-observed)^2)/(n-1))
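
The suggestion above can be sketched as follows. The two vectors are made-up illustration data, and sigma_hat uses the rough estimate from the message, so this is an assumption-laden sketch rather than a recommended estimator.

```r
# Made-up observed and predicted responses (illustration only)
observed  <- c(2.1, 3.9, 6.2, 7.8, 10.1)
predicted <- c(2.0, 4.0, 6.0, 8.0, 10.0)
n <- length(observed)

# Rough standard deviation estimate, as suggested in the message
sigma_hat <- sqrt(sum((predicted - observed)^2) / (n - 1))

# Normal log-likelihood: the mean is the predicted value for each observation
loglik <- sum(dnorm(observed, mean = predicted, sd = sigma_hat, log = TRUE))
loglik
```

Summing log densities (log = TRUE) is numerically safer than multiplying the densities themselves.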



 Thanks again.
 Christophe




Re: [R] adding layers in ggplot2 (data and code included)

2008-09-22 Thread hadley wickham
Hi Juliet,

On Sun, Sep 21, 2008 at 11:47 PM, Juliet Hannah [EMAIL PROTECTED] wrote:
 Here is some sample data:

 mydata <- read.table(textConnection("Est Group Tri
   0 0 4.639644
   1 0 4.579189
   2 0 4.590714
   0 1 4.443696
   1 1 4.588243
   2 1 4.650505
   0 2 4.296608
   1 2 4.826036
   2 2 4.765386"),header=TRUE);
  closeAllConnections();

 I can form two plots, scatter and lines, as follows:

 p <- ggplot(mydata, aes(x=Est, y=Tri))
 p + geom_point(aes(colour=factor(Group),shape=factor(Group)))

 and

 p+ geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F).

 However, I am unable to have the plots together.

 I obtain the following error:

 p + geom_point(aes(colour=factor(Group),shape=factor(Group)))+geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)
 Error in `[.data.frame`(df, , var) : undefined columns selected

Are you using R 2.7.2?  Something in R changed between R 2.7.1 and R
2.7.2 that breaks certain ggplot plots (your code works fine for me
without modification).  It's on my to-do list to fix.

You can also simplify your code a little by relying on defaults set in
the ggplot() call:

ggplot(mydata, aes(Est, Tri, colour = factor(Group))) +
 geom_point(aes(shape = factor(Group))) +
 geom_smooth(method = lm, se = F)

(Andpleaseusespacesotherwiseitsveryhardtoreadyourcode)

Hadley


 Thanks,

 Juliet





-- 
http://had.co.nz/



[R] SmoothScatter plot range issue

2008-09-22 Thread Jason Pare
Hello,

I am attempting to use smoothScatter to plot a heatmap of locations of
events in an x-y axis. When I plot the heatmap without passing xlim and ylim
parameters, it fills the plot area but the perspective is a bit skewed. I
would like to standardize these plots to a uniform window size that does not
depend on the range of values in the dataframe. However, when I resize the
plot using xlim or ylim, there is a light blue background that surrounds the
immediate area of the data (corresponding to the range of the points listed
in the dataframe), surrounded by extra white space for the new xlim and ylim
values I have added. Some of the rings around the datapoints are also cut
off at the margins.

I would like to stop the plot from being cut off, and want this light blue
range to extend throughout the entire area of the resized plot. I have
attempted to add NAs, but it has no effect on expanding this light blue plot
area. Code is below.

# xyz is a dataframe containing two columns with corresponding x and y
# values

library(geneplotter)
library(RColorBrewer)

layout(matrix(1:1, ncol=2, byrow=TRUE))

smoothScatter(xyz, nrpoints=0, xlim=c(-3,3),
ylim=c(0,5),colramp=colorRampPalette(c("#f8f8ff", "white",
"#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red",
"#7E2217")))

###END

Thanks very much for any help,

Jason



[R] paste with list

2008-09-22 Thread Antje

Hello,

I guess the solution is rather simple but whatever I tried, I don't manage to 
get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to combine all 
first elements to a single string, all second elements to a single string, ..., 
all n-th elements to a single string.


# Example code (how it should look):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
# ??? what do I have to do to get the same output ???

Can anybody help me?

Antje



[R] zoo: hourly values (local time) not unique

2008-09-22 Thread vwl-mailingliste
Hi!

I've got a time series as a zoo object which contains hourly values. My problem 
is that these values are recorded for every real hour of local time, including 
daylight saving time changes. That is, on the last Sunday in March I'll have 
23 values, whereas the last Sunday in October contains 25 values instead of 24. 
Thus, if I try to aggregate the data using, for example, tapply (e.g. to get a 
monthly mean), I get the error 

some methods for zoo objects do not work if the index entries in 'order.by' 
are not unique

Any idea how I can solve this without having to remove/add an hour each year 
manually? Or, as I'm quite new to R, how could I easily manipulate my data so 
that the missing hour is introduced and the double hour is cut from the 
data (and the index)?

I'd really appreciate your help! Thanks in advance,
Arne
--



Re: [R] paste with list

2008-09-22 Thread Dimitris Rizopoulos

try this:

t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
tl <- list(t1, t2)

do.call(paste, c(tl, sep = "\t"))
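
To see why this works for any number of vectors, here is a small sketch. The third vector t3 is a hypothetical addition for illustration. do.call() passes each list element to paste() as a separate argument, and the named element sep is matched to paste()'s sep argument.

```r
t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
t3 <- c("a", "b", "c")          # hypothetical third vector
tl <- list(t1, t2, t3)

# Explicit call with every vector written out
explicit <- paste(t1, t2, t3, sep = "\t")

# Same call built from the list, whatever its length
via_list <- do.call(paste, c(tl, sep = "\t"))

identical(explicit, via_list)
```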


I hope it helps.

Best,
Dimitris


Antje wrote:

Hello,

I guess the solution is rather simple but whatever I tried, I don't 
manage to get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to combine 
all first elements to a single string, all second elements to a single 
string, ..., all n-th elements to a single string.


# Example code (how it should look):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
# ??? what do I have to do to get the same output ???

Can anybody help me?

Antje




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] paste with list

2008-09-22 Thread Henrique Dallazuanna
Try this:

paste(tl[[1]], tl[[2]], sep="\t")

On Mon, Sep 22, 2008 at 11:08 AM, Antje [EMAIL PROTECTED] wrote:
 Hello,

 I guess the solution is rather simple but whatever I tried, I don't manage
 to get the result as I want to have it:

 I have several vectors of equal length in a list and I'd like to combine all
 first elements to a single string, all second elements to a single string,
 ..., all n-th elements to a single string.

 # Example code (how it should look):
 t1 <- c(1,2,3)
 t2 <- c(3.4,5.5,1.1)
 paste(t1,t2, sep="\t")

 # and now how the data is available
 tl <- list(t1,t2)
 # ??? what do I have to do to get the same output ???

 Can anybody help me?

 Antje





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] paste with list

2008-09-22 Thread Antje

Great! That's exactly what I was looking for.
(I see, I still have to learn a lot...)

Thank you!

Antje



Dimitris Rizopoulos wrote:

try this:

t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
tl <- list(t1, t2)

do.call(paste, c(tl, sep = "\t"))


I hope it helps.

Best,
Dimitris


Antje wrote:

Hello,

I guess the solution is rather simple but whatever I tried, I don't 
manage to get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to 
combine all first elements to a single string, all second elements to 
a single string, ..., all n-th elements to a single string.


# Example code (how it should look):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
# ??? what do I have to do to get the same output ???

Can anybody help me?

Antje








Re: [R] zoo: hourly values (local time) not unique

2008-09-22 Thread Gabor Grothendieck
See question #1 in the zoo faq:

library(zoo)
vignette("zoo-faq")

Also in the upcoming zoo 1.6-0, not yet on CRAN but in the development version
at R-Forge found here:

http://r-forge.r-project.org/projects/zoo/

there are a set of make.unique functions and a make.unique= argument in
read.zoo which will provide additional capabilities for uniquifying series.
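
As a package-independent illustration of what "uniquifying" means, the sketch below collapses duplicated time stamps by averaging their values in base R. It does not use the zoo API; the time stamps and values are made up, and averaging is just one possible way to resolve the doubled hour.

```r
# Made-up hourly readings around the October clock change:
# the 02:00 local-time stamp occurs twice.
stamp <- c("2008-10-26 01:00", "2008-10-26 02:00",
           "2008-10-26 02:00", "2008-10-26 03:00")
value <- c(10, 20, 30, 40)

any(duplicated(stamp))                    # TRUE: the index is not unique

# Collapse duplicates: one (mean) value per distinct time stamp
collapsed <- tapply(value, stamp, mean)
collapsed
```

The collapsed values and their names could then be used to build a series with a unique index.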

On Mon, Sep 22, 2008 at 10:13 AM,  [EMAIL PROTECTED] wrote:
 Hi!

 I've got a time series as a zoo object which contains hourly values. My 
 problem is that these values are recorded for every real hour of local time, 
 including daylight saving time changes. That is, on the last Sunday in March 
 I'll have 23 values, whereas the last Sunday in October contains 25 values 
 instead of 24. Thus, if I try to aggregate the data using, for example, 
 tapply (e.g. to get a monthly mean), I get the error

 some methods for zoo objects do not work if the index entries in 
 'order.by' are not unique

 Any idea how I can solve this without having to remove/add an hour each year 
 manually? Or, as I'm quite new to R, how could I easily manipulate my data so 
 that the missing hour is introduced and the double hour is cut from the 
 data (and the index)?

 I'd really appreciate your help! Thanks in advance,
 Arne
 --



Re: [R] Time series (ts) questions.

2008-09-22 Thread Gabor Grothendieck
Try this to append 100 to the end of the series, say:

tt <- ts(1:12, frequency=5) # sample data
ts(c(tt, 100), start = start(tt), frequency = frequency(tt))
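
The same idea can be wrapped in a small helper so the start and frequency never have to be typed by hand. The name append_ts is hypothetical (it is not a base R function), and the sketch assumes a regular univariate ts object.

```r
# Hypothetical helper: append one or more values to a ts object,
# keeping its start time and frequency.
append_ts <- function(x, value) {
  ts(c(x, value), start = start(x), frequency = frequency(x))
}

tt  <- ts(1:12, frequency = 5)   # sample data
tt2 <- append_ts(tt, 100)

end(tt)       # last (cycle, position) before appending
end(tt2)      # last (cycle, position) after appending
length(tt2)   # number of observations
```

Because c() drops the time series attributes, the series is rebuilt on each call; for repeated appends it may be cheaper to collect the values first and build the ts once.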


On Mon, Sep 22, 2008 at 2:17 AM,  [EMAIL PROTECTED] wrote:
 I have been working with the base time series object (ts) and I have a couple 
 of questions that hopefully this group can help me with:

 1) What is the best way to append an observation to an existing time series? 
 Suppose I have a time series:

 t <- ts(1:12, frequency=5)

 This would generate two complete cycles and one remainder. Now I would like 
 to append an observation to this time series. I could use 'c', but then I 
 would need to rebuild the whole time series and I would need to know the 
 frequency etc. I would like some operation like '+' that would simply append 
 the value to the end of the time series (incrementing the 'last' time value so 
 that things like cycle() still output the correct values), but alas

 t + 10

 is already taken by an equally useful operation: adding 10 to each element 
 in the time series (rather than, in this case, appending ts(10,frequency) with 
 a time value of 13 to the time series).

 2) What is the best way to get the last time value in a time series? I can do 
 something like:

 (start(t)[2] - 1) + (end(t)[1]-1) * frequency(t) + end(t)[2]

 But there has to be an easier way.

 Thank you.

 Kevin



Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
 What are you going to do with the data once you have read it in?  Are
 all the data items numeric?  If they are numeric, you would need at
 least 8GB to hold one copy and probably a machine with 32GB if you
 wanted to do any manipulation on the data.

Well, I will only analyze subsets of the variables; I can't manage the full
50 variables at a time, of course. Each time I run an analysis I will
extract the information I need, which is why I wanted an easy way to
extract parts of the file.

Best regards,
Jose Lozano



Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
 So is each line just ACCGTATAT etc etc?

Exactly, A_G, A_A, G_G and the like.

 If you have fixed width fields in a file, so that every line is the
 same length, then you can use random access methods to get to a
 particular value - just multiply the line length by the row number you

Nice hint! I didn't think of this. But I fear that if the file has missing
values I won't be able to read the right information...

 When doing this, it's a good idea to test your dataset first to make
 sure the lines and fields are right.

Yes, I am trying to figure out whether all the lines have exactly the same
length, so that I can use a random access method to read the file.
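
A minimal sketch of the random access idea in R follows. It assumes every line has exactly the same number of single-byte characters and ends in a single "\n"; the file contents and the helper name read_row are made up for illustration.

```r
# Write a tiny fixed-width example file (made-up genotypes).
path <- tempfile()
rows <- c("A_G A_A G_G", "A_A A_A A_G", "G_G A_G A_A")
con <- file(path, "wb")             # binary mode so line endings stay "\n"
writeLines(rows, con, sep = "\n")
close(con)

# Bytes per line, including the trailing newline
line_len <- nchar(rows[1]) + 1

# Hypothetical helper: jump straight to row i without reading earlier rows.
read_row <- function(path, i, line_len) {
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = (i - 1) * line_len)            # byte offset of row i
  readChar(con, nchars = line_len - 1, useBytes = TRUE)
}

second <- read_row(path, 2, line_len)
second
```

If missing values are written with a different width than ordinary entries, the offsets no longer line up, which is why checking the line lengths first matters.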

Thanks,
Jose Lozano



[R] Combine data frames using column names as key

2008-09-22 Thread jimineep

Hi guys,

Suppose I have 2 data frames, i.e.:
       values
one    0.32
two    0.25
three  0.11

and 
       values
two    0.66
one    0.74
three  0.19

NB: the first column is the row names in both cases.

How can I combine them on the row names column? I.e. to make something like


       values.1 values.2
one    0.32     0.74
two    0.25     0.66
three  0.11     0.19

I guess it's data.frame or cbind, but I keep getting errors when I try to
combine them on row names...

Many many thanks,

Jim
-- 
View this message in context: 
http://www.nabble.com/Combine-data-frames-using-column-names-as-%22key%22-tp19609173p19609173.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Combine data frames using column names as key

2008-09-22 Thread Henrique Dallazuanna
Try:

data.frame(merge(df1, df2, by = "row.names"), row.names = 1)
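
A self-contained sketch of this call, using the values from the question (the object names df1, df2, merged and combined are only illustrative):

```r
df1 <- data.frame(values = c(0.32, 0.25, 0.11),
                  row.names = c("one", "two", "three"))
df2 <- data.frame(values = c(0.66, 0.74, 0.19),
                  row.names = c("two", "one", "three"))

# merge() on row names adds a "Row.names" column and suffixes the clashing
# column names with .x and .y
merged <- merge(df1, df2, by = "row.names")

# Turn the first column (Row.names) back into row names
combined <- data.frame(merged, row.names = 1)

# merge() sorts by the key, so index by name to get the original order
combined[c("one", "two", "three"), ]
```

The columns come out as values.x and values.y; they can be renamed afterwards with names(combined) <- c("values.1", "values.2") if that layout is preferred.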

On Mon, Sep 22, 2008 at 12:34 PM, jimineep [EMAIL PROTECTED] wrote:

 Hi guys,

 Suppose I have 2 data frames, i.e.:
        values
 one    0.32
 two    0.25
 three  0.11

 and
        values
 two    0.66
 one    0.74
 three  0.19

 NB: the first column is the row names in both cases.

 How can I combine them on the row names column? I.e. to make something like


        values.1 values.2
 one    0.32     0.74
 two    0.25     0.66
 three  0.11     0.19

 I guess it's data.frame or cbind, but I keep getting errors when I try to
 combine them on row names...

 Many many thanks,

 Jim
 --
 View this message in context: 
 http://www.nabble.com/Combine-data-frames-using-column-names-as-%22key%22-tp19609173p19609173.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Manage huge database

2008-09-22 Thread jim holtman
Why don't you make one pass through your data and encode your
characters as integers (it would appear that you only have 16
combinations)?  You might also want to consider using the 'raw' object
type, since its elements only take up one byte of storage each, which will
reduce your storage requirements by a factor of 4 compared to integers.
Then store each row in a 'filehash' object so you can quickly retrieve a
row at a time and then index directly to the byte(s) that have the
information that you want.
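
The encoding step might look like the sketch below. The genotype labels and the sample are made up, and only the integer and raw encoding is shown; storing rows in a filehash database is left out because it needs an extra package.

```r
set.seed(42)
# Made-up genotype labels and a made-up sample of 10,000 entries
levels_gt <- c("A_A", "A_G", "G_A", "G_G")
genotypes <- sample(levels_gt, 10000, replace = TRUE)

# Encode each string as a small integer code, then store it as raw (1 byte each)
int_codes <- match(genotypes, levels_gt)
raw_codes <- as.raw(int_codes)

# Decoding is a simple lookup
decoded <- levels_gt[as.integer(raw_codes)]
identical(decoded, genotypes)

# Compare memory use of the two encodings
object.size(int_codes)
object.size(raw_codes)
```

Raw vectors hold values from 0 to 255, which is ample for 16 combinations plus a code reserved for missing values.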

On Mon, Sep 22, 2008 at 7:00 AM, José E. Lozano [EMAIL PROTECTED] wrote:
 So is each line just ACCGTATAT etc etc?

 Exactly, A_G, A_A, G_G and the like.

 If you have fixed width fields in a file, so that every line is the
 same length, then you can use random access methods to get to a
 particular value - just multiply the line length by the row number you

 Nice hint! I didn't think of this. But I fear that if the file has missing
 values I won't be able to read the right information...

 When doing this, it's a good idea to test your dataset first to make
 sure the lines and fields are right.

 Yes, I am trying to figure out whether all the lines have exactly the same
 length, so that I can use a random access method to read the file.

 Thanks,
 Jose Lozano





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 jim holtman [EMAIL PROTECTED]:
 Why don't you make one pass through your data and encode your
 characters as integers (it would appear that you only have 16
 combinations).  You might also want to consider using the 'raw' object
 since these only take up one byte of storage -- will reduce your
 storage requirements by 4.  Then store each row in a 'filehash' object
 so you can quickly retrieve a row at a time and then index directly to
 the byte(s) that have the information that you want.

 My original response of specifying a relational database now seems
somewhat comical :)

Barry



[R] Relative novice: Working with fitdistr(MASS): 3 questions

2008-09-22 Thread Ted Byers

OK, I am now at the point where I can use fitdistr to obtain a fit of one of
the standard distributions to mydata.

It is quite remarkable how different the parameters are for different
samples from the same system.  Clearly the system itself is not
stationary.

Anyway, question 1:  I require a visual perspective of the fit I get.  I can
use hist.scott to get a histogram (and just have to figure out how to get
finer granularity from it - my samples are taken weekly, but the histogram
bars cover two weeks of data and the most interesting changes happen in the
first three to four weeks - after that things slow down tremendously), but
how would I overlay a plot of the best distribution I get from fitdistr over
it?

Second question: I don't see anything in the documentation for fitdistr
about using the fitted distribution to integrate over some range of
values.  I get weekly samples, and for each sample I get a certain number
of events each week for about three months.  I need to be able to use the
distribution to estimate the number of such events next week or the week
after, and how long it will be until the probability of such an event is so
low that no more of them are likely ever to be observed from that sample.
What package or functions should I be looking at to get this done?

Third question: I see nothing in the docs about non-central distributions.
The distribution most likely to fit is cauchy, but we know that there is
skew that depends on the magnitude: large positive deviates are more common
than large negative deviates, but extremely large positive deviates are less
common than extremely large negative deviates.  What we don't know is how
significant such skewness is for the overall distribution.  How can I assess
this, or can I assess this, using fitdistr (or some other function I haven't
found yet)?
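
For the first two questions, one possible pattern is sketched here. Everything about the data is an assumption: the sample is simulated, a gamma distribution is used purely as a stand-in for whichever family fitdistr is asked to fit, and hist() with an explicit breaks argument replaces hist.scott to control bar width. The matching d*, p* and q* functions in base R then give the density, the probability over a range, and quantiles of the fitted distribution.

```r
library(MASS)

set.seed(1)
# Simulated waiting times in weeks (illustration only)
weeks <- rgamma(300, shape = 2, rate = 0.5)

# Fit a gamma distribution by maximum likelihood
fit <- fitdistr(weeks, "gamma")
shape_hat <- fit$estimate[["shape"]]
rate_hat  <- fit$estimate[["rate"]]

# Question 1: histogram on the density scale with the fitted curve on top;
# 'breaks' controls the bar width
hist(weeks, breaks = 30, freq = FALSE)
curve(dgamma(x, shape = shape_hat, rate = rate_hat), add = TRUE)

# Question 2: probability that an event falls between week 2 and week 4 ...
p_2_4 <- pgamma(4, shape_hat, rate_hat) - pgamma(2, shape_hat, rate_hat)
# ... and the time after which only 0.1% of events are still expected
cutoff <- qgamma(0.999, shape_hat, rate_hat)
```

Multiplying p_2_4 by the expected total number of events in a sample gives a rough expected count for that interval.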

Thanks

Ted
-- 
View this message in context: 
http://www.nabble.com/Relative-novice%3A-Working-with-fitdistr%28MASS%29%3A-3-questions-tp19610812p19610812.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Manage huge database

2008-09-22 Thread Ted Harding
On 22-Sep-08 11:00:30, José E. Lozano wrote:
 So is each line just ACCGTATAT etc etc?
 
 Exactly, A_G, A_A, G_G and the like.
 
 If you have fixed width fields in a file, so that every line is the
 same length, then you can use random access methods to get to a
 particular value - just multiply the line length by the row number you
 
 Nice hint! I didn't think of this. But I fear that if the file has
 missing values I won't be able to read the right information...
 
 When doing this, it's a good idea to test your dataset first to make
 sure the lines and fields are right.
 
 Yes, I am trying to figure out whether all the lines have exactly the
 same length, so that I can use a random access method to read the file.

If you were using Linux, I would suggest a command on the lines of

  cat filename | awk '{print(length($0))}'

which would give you the length of each line. But since you have
around 2000 lines, to simply check whether they all have the same
length (in bytes/characters) you can extend the above to

  cat filename | awk '{print(length($0))}' | sort -u

which will present you with all the different line-lengths. If they
are all the same length you will get one number.

I just tested this on a file with lines exceeding 500,000 characters
in length, and it worked perfectly well even for such long lines.
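
For readers without awk, roughly the same check can be sketched in R. This
is only an illustration: the file below is a small temporary one made up for
the example, and readLines() pulls the whole file into memory, which may or
may not be practical for a file of several GB.

```r
# Hypothetical small file standing in for the real data
tmp <- tempfile()
writeLines(c("A_G A_A G_G", "A_A A_G G_G", "G_G A_A A_G"), tmp)

# Length of every line, in characters
line_lengths <- nchar(readLines(tmp))

# Distinct lengths: a single value means all lines have the same length
distinct_lengths <- sort(unique(line_lengths))
all_same <- length(distinct_lengths) == 1
```

As with the sort -u version, one distinct length means the file is a
candidate for fixed-width random access.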

Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-08   Time: 17:03:21
-- XFMail --



[R] how to set rownames / colnames for matrices in a list

2008-09-22 Thread Antje

Hello,

I have another stupid question. I hope you can give me a hint how to solve this:

I have a list and one element is again a list containing matrices, all of the 
same dimensions. Now, I'd like to set the dimnames for all matrices:


example code:

m1 <- matrix(1:25, nrow=5)
m2 <- matrix(26:50, nrow=5)
# ... there can be much more than two matrices

l <- list()
l[[1]] <- list(m1,m2)

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

? how can I apply these names to any number of matrices within this list-list ?

Ciao,
Antje



[R] Using wildcards in subsets

2008-09-22 Thread Daniel Münch
Hi there,

I am looking for a way to use wildcards in a subset, this is not
working:


subset(data, colname-1=="value" & colname2=="value*",
select=colx:coly)


is there a way to use wildcards here?

Thanks for your help,
Daniel

[[alternative HTML version deleted]]



Re: [R] how to set rownames / colnames for matrices in a list

2008-09-22 Thread Alain Guillet

Hi,

If all your matrices have the same size, you should work with an array 
and not with a list. Then you can use dimnames to set the names of the 
rows, columns, and so on..
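
A minimal sketch of both routes, using the objects from the original
question. The lapply() version keeps the list-of-matrices layout; the
array() version follows the suggestion above. The slice names "m1" and "m2"
are an assumption added for illustration only.

```r
m1 <- matrix(1:25, nrow = 5)
m2 <- matrix(26:50, nrow = 5)
l <- list()
l[[1]] <- list(m1, m2)

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

# Route 1: keep the list and set dimnames on every matrix in it
l[[1]] <- lapply(l[[1]], function(m) {
  dimnames(m) <- list(r_names, c_names)
  m
})

# Route 2: stack the matrices into a 3-d array and name all dimensions at once
a <- array(c(m1, m2), dim = c(5, 5, 2),
           dimnames = list(r_names, c_names, c("m1", "m2")))
```

With the array, a single element can then be addressed by name, for example
a["B", "G", "m2"].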


Alain

Antje wrote:

Hello,

I have another stupid question. I hope you can give me a hint how to 
solve this:


I have a list and one element is again a list containing matrices, all 
of the same dimensions. Now, I'd like to set the dimnames for all 
matrices:


example code:

m1 <- matrix(1:25, nrow=5)
m2 <- matrix(26:50, nrow=5)
# ... there can be much more than two matrices

l <- list()
l[[1]] <- list(m1,m2)

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

? how can I apply these names to any number of matrices within this 
list-list ?


Ciao,
Antje




--
Alain Guillet
Statistician and Computer Scientist

Institut de statistique - Université catholique de Louvain
Bureau d.126
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50



Re: [R] Need help creating spatial correlation for MC simulation

2008-09-22 Thread jjh21

Thank you for the input.

Which command in the spatstat package am I looking for? The documentation is
unclear to me.


milton ruser wrote:
 
 Dear J.J.Harden
 
 I think that in spatstat you will find several ways to simulate spatial
 patterns (point or line) that may be what you are looking for. If not,
 please let me know and maybe we can work out some solution.
 
 Best wishes,
 
 miltinho astronauta
 brazil
 
 
 
 On Wed, Sep 17, 2008 at 7:36 PM, jjh21 [EMAIL PROTECTED] wrote:
 

 I want to create a dataset in R with spatial correlation (i.e.
 clustering)
 built in for a linear regression analysis. Any tips on how to do this?
 Thanks.
 --
 View this message in context:
 http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19542145.html
 Sent from the R help mailing list archive at Nabble.com.


 
   [[alternative HTML version deleted]]
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19610885.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] SmoothScatter plot range issue

2008-09-22 Thread Henrik Bengtsson
Hi,

Bioconductor.org is the home of the geneplotter package.  You get a
quicker response if you ask there.

/Henrik

On Mon, Sep 22, 2008 at 7:06 AM, Jason Pare [EMAIL PROTECTED] wrote:
 Hello,

 I am attempting to use smoothScatter to plot a heatmap of locations of
 events in an x-y axis. When I plot the heatmap without passing xlim and ylim
 parameters, it fills the plot area but the perspective is a bit skewed. I
 would like to standardize these plots to a uniform window size that does not
 depend on the range of values in the dataframe. However, when I resize the
 plot using xlim or ylim, there is a light blue background that surrounds the
 immediate area of the data (corresponding to the range of the points listed
 in the dataframe), surrounded by extra white space for the new xlim and ylim
 values I have added. Some of the rings around the datapoints are also cut
 off at the margins.

 I would like to stop the plot from being cut off, and want this light blue
 range to extend throughout the entire area of the resized plot. I have
 attempted to add NAs, but it has no effect on expanding this light blue plot
 area. Code is below.

  xyz is a dataframe containing two columns with corresponding x and y
 values

 library(geneplotter)
 library(RColorBrewer)

 layout(matrix(1:1, ncol=2, byrow=TRUE))

 smoothScatter(xyz, nrpoints=0, xlim=c(-3,3),
 ylim=c(0,5),colramp=colorRampPalette(c("#f8f8ff", "white",
 "#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red",
 "#7E2217")))

 ###END

 Thanks very much for any help,

 Jason

[[alternative HTML version deleted]]





[R] Statistical question re assessing fit of distribution functions.

2008-09-22 Thread Ted Byers

I am in a situation where I have to fit a distribution, such as cauchy or
normal, to an empirical dataset.  Well and good, that is easy.

But I wanted to assess just how good the fit is, using ks.test.

I am concerned about the following note in the docs (about the example
provided):  "Note that the distribution theory is not valid here as we have
estimated the parameters of the normal distribution from the same sample".

This implies I should not use ks.test(x, "pnorm", mean = 1.187, sd = 0.917),
where the numbers shown are estimated from 'x'.  If this is so, how do I get
a correct test?  I know I can not use different samples because of just how
different the parameters are from one sample to the next, so using
parameters estimated from the sample from week one to define the
distribution function for ks.test will give a poor fit for the data from
week two.  And the sample size is small enough that I would not have
confidence in the parameters estimated from a portion of a sample to fit
against the remainder of the sample.

Thanks

Ted

-- 
View this message in context: 
http://www.nabble.com/Statistical-question-re-assessing-fit-of-distribution-functions.-tp19611539p19611539.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Manage huge database

2008-09-22 Thread Gabor Grothendieck
Try this:

read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains the single line (assuming you
want fields 101 through 110 and none other):

{ for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

or just use cut.  I tried the gawk command above on Windows
Vista with an artificial file of 500,000 columns and 2 rows and it seemed
instantaneous.

On Windows the above uses gawk from Rtools available at:
   http://www.murdoch-sutherland.com/Rtools/
or you can separately install gawk.  Rtools also has cut if you
prefer that.
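
If neither gawk nor cut is at hand, a pure-R sketch of the same idea (read
only some columns) is to mark the unwanted columns as "NULL" in colClasses.
This is an illustration under assumptions: the toy file below has 20 columns
rather than 500,000, and how well the approach scales to the real file has
not been tested here.

```r
# Toy stand-in for the big file: 2 rows, 20 columns named X1..X20
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(matrix(1:40, nrow = 2)), tmp, row.names = FALSE)

# "NULL" skips a column; NA lets read.csv pick the type as usual
classes <- rep("NULL", 20)
classes[11:15] <- NA

# Only columns 11 through 15 are kept
sub <- read.csv(tmp, colClasses = classes)
```

The skipped columns are still scanned, so the external gawk or cut filter may
well remain the faster option for very wide files.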

On Mon, Sep 22, 2008 at 2:50 AM, José E. Lozano [EMAIL PROTECTED] wrote:
 Hello,



 Recently I have been trying to open a huge database with no success.



 It's a 4GB csv plain text file with around 2000 rows and over 500,000
 columns/variables.



 I have tried The SAS System, but it reads only around 5000 columns, no
 more. R hangs up when opening.



 Is there any way to work with parts (a set of columns) of this database,
 since it's impossible to manage it all at once?



 Is there any way to establish a link to the csv file and to state the
 columns you want to fetch every time you make an analysis?



 I've been searching the net, but found little about this topic.



 Best regards,

 Jose Lozano


[[alternative HTML version deleted]]







[R] changing the text offset for axis labels

2008-09-22 Thread Arthur Roberts

Hi, all,

I was wondering if there is a way to change the offset of axis labels  
from the axis.  In other words, I need the axis labels closer to the  
axis than the default.  Thanks for the help.


Best wishes,
Art Roberts
University of Washington
Seattle, WA



[R] as.day() Function (zoo question)

2008-09-22 Thread stephen sefick
I was going to look at the as.yearmon function in the zoo package
and write an as.day function to aggregate a time series of 96
observations per day into the mean for each day, but I don't know how
to look at the code so that I can convert it into something I can use.
 On top of that I believe that it is probably an S3 method and I
haven't quite gotten that far in my programming experience.

What I want is the mean for each day.  The real data set has NAs randomly
interspersed.

library(chron)
library(zoo)
t1 <- chron("1/1/2006", "00:00:00")
t2 <- chron("12/31/2006", "23:45:00")
deltat <- times("00:15:00")
tt <- seq(t1, t2, by = times("00:15:00"))
value <- rnorm(35040)
z <- zoo(value, tt)

thanks

-- 
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] as.day() Function (zoo question)

2008-09-22 Thread Gabor Grothendieck
chron values are represented as day + fraction of a day, so
try this:

aggregate(z, floor, mean)
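
Why floor works can be seen in a base-R sketch that needs neither chron nor
zoo. The numeric times below mimic the "day + fraction of a day"
representation; the three-day series, its values and the NA positions are
made up for illustration.

```r
# Times stored as "day + fraction of a day", as chron does:
# three days of 15-minute steps (96 per day)
tt <- (0:(3 * 96 - 1)) / 96
value <- rep(c(1, 2, 3), each = 96)
value[c(5, 100, 250)] <- NA   # a few missing observations

# floor() drops the fraction, leaving the day number as the grouping key;
# na.rm = TRUE is passed through to mean() so NAs do not poison a whole day
daily_mean <- tapply(value, floor(tt), mean, na.rm = TRUE)
```

For the zoo object z, the corresponding call would presumably be
aggregate(z, floor, mean, na.rm = TRUE), since the extra argument is passed
on to mean; that variant is an assumption and was not run here.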

On Mon, Sep 22, 2008 at 12:56 PM, stephen sefick [EMAIL PROTECTED] wrote:
 I was going to look at the as.yearmon function in the zoo package
 and write an as.day function to aggregate a time series of 96
 observations per day into the mean for each day, but I don't know how
 to look at the code so that I can convert it into something I can use.
  On top of that I believe that it is probably an S3 method and I
 haven't quite gotten that far in my programming experience.

 What I want is the mean for each day.  The real data set has NAs randomly
 interspersed.

 library(chron)
 library(zoo)
 t1 <- chron("1/1/2006", "00:00:00")
 t2 <- chron("12/31/2006", "23:45:00")
 deltat <- times("00:15:00")
 tt <- seq(t1, t2, by = times("00:15:00"))
 value <- rnorm(35040)
 z <- zoo(value, tt)

 thanks

 --
 Stephen Sefick
 Research Scientist
 Southeastern Natural Sciences Academy

 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods. We are mammals, and have not exhausted the
 annoying little problems of being mammals.

-K. Mullis





Re: [R] as.day() Function (zoo question)

2008-09-22 Thread stephen sefick
perfect thanks

On Mon, Sep 22, 2008 at 1:07 PM, Gabor Grothendieck
[EMAIL PROTECTED] wrote:
 chron values are represented as day + fraction of a day, so
 try this:

 aggregate(z, floor, mean)

 On Mon, Sep 22, 2008 at 12:56 PM, stephen sefick [EMAIL PROTECTED] wrote:
 I was going to look at the as.yearmon function in the zoo package
 and write an as.day function to aggregate a time series of 96
 observations per day into the mean for each day, but I don't know how
 to look at the code so that I can convert it into something I can use.
  On top of that I believe that it is probably an S3 method and I
 haven't quite gotten that far in my programming experience.

 What I want is the mean for each day.  The real data set has NAs randomly
 interspersed.

 library(chron)
 library(zoo)
 t1 <- chron("1/1/2006", "00:00:00")
 t2 <- chron("12/31/2006", "23:45:00")
 deltat <- times("00:15:00")
 tt <- seq(t1, t2, by = times("00:15:00"))
 value <- rnorm(35040)
 z <- zoo(value, tt)

 thanks

 --
 Stephen Sefick
 Research Scientist
 Southeastern Natural Sciences Academy

 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods. We are mammals, and have not exhausted the
 annoying little problems of being mammals.

-K. Mullis






-- 
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] reading in results from system(). There must be an easier way...

2008-09-22 Thread Michael A. Gilchrist
Sorry, I misunderstood what I was doing and misspoke.  I don't think there's 
a bug.  I had called COMMAND w/in read.delim.


Thanks for all of your help and sorry for the misinformation.

Sincerely,

Mike

-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp
-


On Thu, 18 Sep 2008, Henrik Bengtsson wrote:


On Thu, Sep 18, 2008 at 1:39 PM, Michael A. Gilchrist [EMAIL PROTECTED] wrote:

Wow, that's elegant and simple.  It's also faster than my approach.

NB, you don't need to use close(), read.delim() closes the pipe when it's
done reading.


If read.delim() closes the connection in this case, it's a bug.  It
should only close the connection if it opens it.

/Henrik



Thank you all for your suggestions, they really helped me with this problem
and understand R just a bit better.

Sincerely,

Mike
-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp
-


On Fri, 12 Sep 2008, Prof Brian Ripley wrote:


Why not use

con <- pipe(COMMAND)
foo <- read.delim(con, colClasses="numeric")
close(con)

?  See the 'R Data Input/Output Manual'.
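
A self-contained variant of the same pattern, for anyone who wants to try
it. Assumptions: a Unix-like shell in which printf is available, and a
made-up two-column command standing in for COMMAND. Unlike the snippet
above, the pipe is opened explicitly here, so that read.delim() leaves it
open and the later close() is clearly the caller's job.

```r
# Stand-in for COMMAND: prints a header line and two rows of tab-separated numbers
cmd <- "printf 'a\\tb\\n1\\t2\\n3\\t4\\n'"

# Open the pipe ourselves, read from it, then close it ourselves
con <- pipe(cmd, open = "r")
foo <- read.delim(con, colClasses = "numeric")
close(con)
```

The result is already a numeric data frame, so the strsplit/unlist/matrix
steps from the original post are not needed.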

On Fri, 12 Sep 2008, Michael A. Gilchrist wrote:


Hello,

I am currently using R to run an external program and then read the
results the external program sends to the stdout which are tsv data.

When R reads the results in it converts them to a list of strings which
I then have to manipulate with a whole slew of commands (which, figuring out
how to do was a real challenge for a newbie like myself)--see below.

Here's the code I'm using.  COMMAND runs the external program.

  rawInput = system(COMMAND, intern=TRUE);  ##read in tsv values
  rawInput = strsplit(rawInput, split="\t");  ##split elements w/in the list
                                              ##of character strings by "\t"
  rawInput = unlist(rawInput);  ##unlist, making it one long vector
  mode(rawInput) = "double";  ##convert from strings to double
  finalInput = data.frame(t(matrix(rawInput, nrow=6)));  ##convert

Because I will be doing this 100,000s of times as part of an optimization
problem, I am interested in learning a more efficient way of doing this
conversion.

Any suggestions would be appreciated.


Thanks in advance.

Mike


-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595










Re: [R] Statistical question re assessing fit of distribution functions.

2008-09-22 Thread Timur Shtatland
If one of the goals is the normality test, then there may be better
alternatives to the Kolmogorov-Smirnov test.
See an explanation on:
http://graphpad.com/FAQ/viewfaq.cfm?faq=959

The R implementation:
?shapiro.test
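
A short sketch of how this might look in practice, on simulated data (the
sample, its size and the seed are made up). The second half is a different
technique from shapiro.test: a parametric bootstrap of the KS statistic,
which is one common way around the "parameters estimated from the same
sample" caveat in the ks.test documentation. It is offered as an
illustration, not as the recommended procedure for the data in question.

```r
set.seed(1)
x <- rnorm(50, mean = 1.2, sd = 0.9)   # simulated stand-in for the real sample

# Shapiro-Wilk normality test; no parameters need to be supplied
sw <- shapiro.test(x)

# KS statistic against a normal with parameters estimated from the sample itself
ks_stat <- function(v) {
  unname(ks.test(v, "pnorm", mean = mean(v), sd = sd(v))$statistic)
}
obs <- ks_stat(x)

# Parametric bootstrap: simulate from the fitted normal, re-estimate the
# parameters on each simulated sample, and recompute the statistic
sim <- replicate(499, ks_stat(rnorm(length(x), mean(x), sd(x))))

# Bootstrap p-value: share of simulated statistics at least as large as observed
p_boot <- (1 + sum(sim >= obs)) / (1 + length(sim))
```

The same bootstrap idea carries over to other families such as the Cauchy by
swapping the fitting and simulation steps.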

A casual search also turned this up:
http://tolstoy.newcastle.edu.au/R/help/04/09/3201.html
http://tolstoy.newcastle.edu.au/R/help/04/08/3121.html
http://www.karlin.mff.cuni.cz/~pawlas/2008/MAI061/dagost.R

Best,

Timur
--
Timur Shtatland, Ph.D.
Senior Bioinformatics Scientist
Agencourt Bioscience Corporation - A Beckman Coulter Company
500 Cummings Center, Suite 2450
Beverly, MA 01915
www.agencourt.com

On Mon, Sep 22, 2008 at 12:26 PM, Ted Byers [EMAIL PROTECTED] wrote:

 I am in a situation where I have to fit a distribution, such as cauchy or
 normal, to an empirical dataset.  Well and good, that is easy.

 But I wanted to assess just how good the fit is, using ks.test.

 I am concerned about the following note in the docs (about the example
 provided):  "Note that the distribution theory is not valid here as we have
 estimated the parameters of the normal distribution from the same sample".

 This implies I should not use ks.test(x, "pnorm", mean = 1.187, sd = 0.917),
 where the numbers shown are estimated from 'x'.  If this is so, how do I get
 a correct test?  I know I can not use different samples because of just how
 different the parameters are from one sample to the next, so using
 parameters estimated from the sample from week one to define the
 distribution function for ks.test will give a poor fit for the data from
 week two.  And the sample size is small enough that I would not have
 confidence in the parameters estimated from a portion of a sample to fit
 against the remainder of the sample.

 Thanks

 Ted

 --
 View this message in context: 
 http://www.nabble.com/Statistical-question-re-assessing-fit-of-distribution-functions.-tp19611539p19611539.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Matthew Pettis
Thank You All,

I think all of this may have been due to shared library conflict
headaches.  At one point, I inadvertently upgraded my Perl install to
5.10, and I think that messed up a lot of my libraries.  I have now
started with a clean Ubuntu install, and am going to see if I can work
my way back up to installing R and making that work.  I will recontact
the list if this problem persists through this reimaging of my server.

Thanks again,
Matt

On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote:
 On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
 Matthew,

 As per the CRAN Ubuntu README

   http://cran.r-project.org/bin/linux/ubuntu/

 install the Ubuntu r-base-dev package to compile R packages from
 sources.

 Well there should be a working r-cran-hmisc package.  You simply got a
 '404' error indicating that your network access (using http) to the
 external Ubuntu mirror was broken.   Fix that, or download the package
 by hand.  It may be easier to just install the missing package.

 That said, Vincent is of course entirely correct on the need for
 r-base-dev.

 Dirk


 Vincent

 On Mon, 22 Sep at 00:08, Matthew Pettis wrote:

 Hi,

 I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
 I tried getting Hmisc from within R by issuing the standard
 'install.packages' command, but it said I needed 'gfortran' to
 compile.  I thought I could circumvent this by using 'aptitude' to get
 the package 'r-cran-hmisc', but when I got it, the package had
 critical missing parts (got 404s).  So, I'll be trying to go back and
 download 'gfortran', but can anybody tell me if this aptitude ubuntu
 package should be kept up to date and is just currently overlooked?

 Thanks,
 Matt

 --
 It is from the wellspring of our despair and the places that we are
 broken that we come to repair the world.
 -- Murray Waas


 --
 Three out of two people have difficulties with fractions.




-- 
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas



Re: [R] changing the text offset for axis labels

2008-09-22 Thread Greg Snow
Look at ?par and scroll down to the section on 'mgp'.  Or you can suppress the 
axis when you make the plot, then use the axis function to include it with more 
control (see ?axis).
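
A minimal sketch of both suggestions. The mgp values are arbitrary choices
for illustration, and pdf(NULL) is used only so that the example runs
without opening a window.

```r
pdf(NULL)                                # throwaway device for the example

# mgp = c(axis title line, tick label line, axis line); the default is c(3, 1, 0)
old <- par(mgp = c(1.5, 0.5, 0))
current_mgp <- par("mgp")
plot(1:10, xlab = "x", ylab = "y")       # titles and tick labels sit closer to the axes
par(old)                                 # restore the previous setting
restored_mgp <- par("mgp")

# Alternative: suppress the axes, then draw them by hand with axis()
plot(1:10, axes = FALSE, xlab = "", ylab = "")
axis(1, mgp = c(3, 0.3, 0))              # tick labels pulled in on the x axis only
axis(2)
box()

invisible(dev.off())
```

The second form is handy when only one of the axes needs the tighter
spacing.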

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 project.org] On Behalf Of Arthur Roberts
 Sent: Monday, September 22, 2008 10:23 AM
 To: [EMAIL PROTECTED]
 Subject: [R] changing the text offset for axis labels

 Hi, all,

 I was wondering if there is a way to change the offset of axis labels
 from the axis.  In other words, I need the axis labels closer to the
 axis than the default.  Thanks for the help.

 Best wishes,
 Art Roberts
 University of Washington
 Seattle, WA




[R] How to execute external programs with R?

2008-09-22 Thread Arthur Roberts

Hi, all,

Could anyone give me advice on how to execute external programs with  
R?  It would be greatly appreciated.


Art Roberts
University of Washington.



Re: [R] How to execute external programs with R?

2008-09-22 Thread Duncan Murdoch

On 9/22/2008 2:50 PM, Arthur Roberts wrote:

Hi, all,

Could anyone give me advice on how to execute external programs with  
R?  It would be greatly appreciated.


The system() or shell() functions can do this; Windows also has 
shell.exec().
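
A small sketch of system() in use. To stay reasonably portable the external
program here is R's own Rscript, located via R.home("bin"); that choice, and
the one-line expression it runs, are assumptions made for the example.

```r
# Path to the Rscript executable that ships with the running R
rscript <- file.path(R.home("bin"), "Rscript")

# Build the command line; shQuote() protects the path and the expression
cmd <- paste(shQuote(rscript), "--vanilla", "-e", shQuote("print(1+1)"))

# intern = TRUE captures the program's standard output as a character vector
out <- system(cmd, intern = TRUE)
```

Without intern = TRUE the output goes to the console and system() returns
the exit status instead.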


Duncan Murdoch



[R] Building binary package fails because of missing dependent package

2008-09-22 Thread Hans-Peter Suter
On an (Intel Leopard) Mac I try to build a package (mxFinance) which
depends on another package (mxGraphics). The dependency is 1) a
'Depends:' in DESCRIPTION and 2) an import in NAMESPACE.

- The build fails if the dependent package (mxGraphics) is not
installed in the R.framework

Do I need to have installed all packages which are required by
packages to be built binary (source builds are ok)?

Cheers,
Hans-Peter


---
Macintosh:mxFinance chappi$ R CMD BUILD --binary mxFinance
* checking for file 'mxFinance/DESCRIPTION' ... OK
* preparing 'mxFinance':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* removing junk files
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building binary distribution
* Installing *source* package 'mxFinance' ...
** libs
** arch - i386
gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99
-I/Library/Frameworks/R.framework/Resources/include
-I/Library/Frameworks/R.framework/Resources/include/i386  -msse3
-fPIC  -g -O2 -march=nocona -c init.c -o init.o
gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99 -dynamiclib
-Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined
dynamic_lookup -single_module -multiply_defined suppress
-L/usr/local/lib -o mxFinance.so init.o
-F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
-Wl,CoreFoundation
ld: warning, duplicate dylib
/Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
** arch - ppc
gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99
-I/Library/Frameworks/R.framework/Resources/include
-I/Library/Frameworks/R.framework/Resources/include/ppc
-I/usr/local/include-fPIC  -g -O2 -c init.c -o init.o
gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99 -dynamiclib
-Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined
dynamic_lookup -single_module -multiply_defined suppress
-L/usr/local/lib -o mxFinance.so init.o
-F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
-Wl,CoreFoundation
ld: warning, duplicate dylib
/Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
** R
** data
** preparing package for lazy loading
Loading required package: mxGraphics
Warning in library(pkg, character.only = TRUE, logical.return = TRUE,
lib.loc = lib.loc) :
  there is no package called 'mxGraphics'
Error: package 'mxGraphics' could not be loaded
Execution halted
ERROR: lazy loading failed for package 'mxFinance'
** Removing 
'/var/folders/xr/xr01D7JAEtGe4S5uaDQSgTI/-Tmp-/Rinst881133514/mxFinance'
 ERROR
* installation failed



[R] gbm error

2008-09-22 Thread Darin Brooks
Good afternoon
 
Has anyone tried using Dr. Elith's BRT script?  I cannot seem to run
gbm.step  from the installed gbm package.  Is it something external to gbm?
 
When I run the script itself
 
- gbm.step(data=model.data, 

gbm.x = colx:coly,

gbm.y = colz,

family = "bernoulli",

tree.complexity = 5,

learning.rate = 0.01,

bag.fraction = 0.5)

 
... I keep encountering the same error:
 
ERROR:  
  unexpected ')' in "bag.fraction = 0.5)"
 
I've tried all sorts of variations (such as)
 
sep22BRT.lr01 - gbm{data=sep22BRT, 
gbm.x = sep22BRT[,3:42], 
gbm.y = sep22BRT[,1], 
family = "bernoulli", 
tree.complexity = 5, 
learning.rate = 0.01, 
bag.fraction = 0.5}
 
and cannot find the problem. 
 
Is there a glaring error that I am overlooking? 
 
 
Darin Brooks
Geomatics/GIS/Remote Sensing Coordinator
Kim Forest Management Ltd. Cranbrook Office
Cranbrook, BC
 
 

[[alternative HTML version deleted]]



[R] change the panel name in xyplot

2008-09-22 Thread Ronaldo Reis Junior
Hi,

I am trying to change the panel names in an xyplot, without success.

Consider this example from the xyplot manual:

xyplot(Murder ~ Population | state.region,data=states)

The panel titles are
Northeast, South, North Central, and West, the levels of the factor state.region.

I need to change some of the names and, for example, set some of them in italics. I
cannot find out how to do this.

I looked for this in Deepayan Sarkar's lattice book, but did not find a way.

Any help?

Thanks
Ronaldo
-- 
You can't make a program without broken egos.
--
 Prof. Ronaldo Reis Júnior
|  .''`. UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional
| : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
| `. `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
|   `- Fone: (38) 3229-8192 | [EMAIL PROTECTED] | [EMAIL PROTECTED]
| http://www.ppgcb.unimontes.br/lecc | ICQ#: 5692561 | LinuxUser#: 205366
--
Please DO NOT SEND Word or Powerpoint files
Prefer sending PDF, text, OpenOffice (ODF), HTML, or RTF.



Re: [R] change the panel name in xyplot

2008-09-22 Thread Henrique Dallazuanna
Try this:

xyplot(Murder ~ Population | state.region,
       data = states,
       strip = strip.custom(factor.levels = c(expression(italic("A")),
                                              "B", "C", "D")))
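
A self-contained sketch of the suggestion above. It assumes the `states` data frame is built the way the lattice `xyplot` help example builds it, and the replacement labels are placeholders chosen for illustration (here the original level names, with the first in italic), not names from the original post.

```r
library(lattice)

# Build 'states' as in the xyplot help page example.
states <- data.frame(state.x77,
                     state.name = dimnames(state.x77)[[1]],
                     state.region = state.region)

# One label per level of state.region, in level order
# (Northeast, South, North Central, West). Mixing an expression
# with strings yields an expression vector of length 4.
new.labels <- c(expression(italic("Northeast")),
                "South", "North Central", "West")

# The call returns a "trellis" object; nothing is drawn yet.
p <- xyplot(Murder ~ Population | state.region,
            data = states,
            strip = strip.custom(factor.levels = new.labels))

# At the prompt, typing p (or calling print(p)) draws the plot.
```

The labels are matched to the factor levels by position, so the vector must list them in the order given by levels(states$state.region).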

On Mon, Sep 22, 2008 at 4:33 PM, Ronaldo Reis Junior [EMAIL PROTECTED] wrote:
 Hi,

 I try to change the panel name in a xyplot without success.

 Look this example from xyplot manual:

 xyplot(Murder ~ Population | state.region,data=states)

 The panel title are:
 Northeast, South, North Central, West, that are factor from state.region.

 I need do change some names and, for example, put some of these in italic. I
 dont find how change this.

 I looking for this in Deepayan Sakar lattice book, but I dont find the way.

 Any help?

 Thanks
 Ronaldo

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



[R] findInterval(), binary search, log(N) complexity

2008-09-22 Thread Markus Loecher
Dear R users,
the help for findInterval(x,vec) suggests a logarithmic dependence on N
(=length(vec)), which would imply a binary search type algorithm.
However, when I test this hypothesis, in the following manner:

set.seed(-3645);
l <- vector();
N.seq <- c(5000, 50, 100, 1000, 5000); k <- 1
for (N in N.seq){
  tmp <- sort(round(stats::rt(N, df=2), 2));
  l[k] <- system.time(it3 <- findInterval(-1, tmp))[2]; k <- k + 1;
}
plot(N.seq, l, type="b", xlab="length(vec)", ylab="CPU time");

the resulting plot suggests a linear relationship.
I must be missing something here?

Thanks !

Markus



Re: [R] How to find a shift between two curves or data sets

2008-09-22 Thread Sébastien Durand

Dear Hans,

Thanks for your reply.

I will read that book. 


Cheers!



Re: [R] graphing netCDF files

2008-09-22 Thread Paul Hiemstra

Hi Steve,

If you read your netCDF files into R you end up with sp-classes which 
can be displayed using spplot. But you do not seem to use rgdal.


If you can make a data.frame with the x, y and z coordinates this can 
quite easily be transformed into an sp-class:


library(sp)
# one row per grid cell: expand.grid pairs every x with every y
dat = data.frame(expand.grid(x = UTMx, y = UTMy),
                 z = as.vector(wat.data2001q1[,,i]))
coordinates(dat) = ~x+y   # tell spplot which columns hold the x and y coordinates

gridded(dat) = TRUE   # make clear it is a grid
spplot(dat)

For more details see the documentation for the sp-package, especially 
spplot. These kinds of questions are more suitable for the r-sig-geo 
mailing list and not the general r-help list.
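
The reshaping step is the part that is easy to get wrong, so here is a minimal sketch of it on synthetic data. The coordinate values and the small 4 x 3 matrix are made-up stand-ins for UTMx, UTMy and one time slice of the data cube; the names grid_df and sp_grid are hypothetical. The sp calls only run if the sp package is installed.

```r
# Synthetic stand-ins for the netCDF variables (hypothetical values):
# 4 eastings, 3 northings, and one 4 x 3 slice indexed as z[x, y].
UTMx <- seq(500000, by = 400, length.out = 4)
UTMy <- seq(2800000, by = 400, length.out = 3)
z <- matrix(seq_len(12), nrow = length(UTMx), ncol = length(UTMy))

# expand.grid() varies x fastest, which matches the column-major
# order of as.vector(z), so each value stays with its own (x, y).
grid_df <- data.frame(expand.grid(x = UTMx, y = UTMy), z = as.vector(z))

# Optional: convert a copy to an sp grid and plot it.
if (requireNamespace("sp", quietly = TRUE)) {
  sp_grid <- grid_df
  sp::coordinates(sp_grid) <- ~x + y
  sp::gridded(sp_grid) <- TRUE
  print(sp::spplot(sp_grid))
}
```

If the matrix were stored as z[y, x] instead, it would need to be transposed with t() before as.vector().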


hope this helps,

Paul

[EMAIL PROTECTED] wrote:

Hello

I'm working with a large hydrological data set stored in a netCDF format.
The file stores x and y coordinates in the UTM projected coordinate system,
yet when I use image to graphically display the z variable, the image is
distorted in the sense that it does not plot the map in the correct spatial
organization.

I'm wondering if I need to define the projection of the netCDF file with
rgdal or proj4 routines first before I send it to the graphics device.
  

Defining the projection is not needed

My code is as follows:

 q1_2001 <- open.ncdf("H:\\SKF_DESKTOP FILES\\My Documents\\EDEN\\EDEN\\Surfaces\\2000_q1.nc", readunlimi=FALSE) # opens ncdf file for reading
   wat.data2000q1 <- get.var.ncdf(q1_2001,  verbose=FALSE ) # gets the real information

 # GENERAL EXAMINATION OF HEADER DATA in the wat.data file
   day <- get.var.ncdf(q1_2001, "time")   # length(day) 91 days in quarter
   UTMx <- get.var.ncdf(q1_2001, "x")   # columns (eastings)  # should return 405
   UTMy <- get.var.ncdf(q1_2001, "y")   # rows (northings)   # should return 287

# plot first 91 days (3 months of the year)
for(i in 1:91) {
   !is.na( image(UTMx, UTMy, z = wat.data2001q1[,,i], col=brewer.pal(8, "YlGnBu"),
 axes=T, pty="s", ylab="UTM Northing", xlab="UTM Easting",
 main = "First Quarter 2001")  )
 }

As I indicated above, the map is displayed on the graphics device. However,
the orientation is distorted, pulling the x axis too wide and the y axis too
tall.  How can I set the graphics device to know the orientation and
scaling (if these are the correct terms) in order to display this map
correctly?

All insights will be greatly appreciated.

Thanks
Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

Office (305) 224 - 4282
[EMAIL PROTECTED]

  



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +31302535773
Fax:+31302531145
http://intamap.geo.uu.nl/~paul



[R] Profiling on Multicore and Parallel Systems

2008-09-22 Thread Imanpreet
Hello All,

In general, when we use Rprof for performance evaluation on multicore
systems, the output reports time on the basis of user time, and the
sampling time equals the user time as reported by system.time. This does
not seem to be the right behavior when R is linked to BLAS/LAPACK or other
libraries optimized for parallel or multicore architectures: there, user
time can exceed elapsed time, and one would be more interested in the
elapsed time per routine, as returned by gettimeofday(), than in the user
time returned by getrusage().


Could anyone provide pointers on how best to profile R on parallel and
multicore systems?
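
To make the user-versus-elapsed distinction concrete, here is a small sketch using system.time(). The matrix product is a hypothetical workload; whether user time actually exceeds elapsed time depends on whether the installed BLAS is multithreaded, which this sketch does not assume.

```r
# A hypothetical workload: a dense matrix product, which a
# multithreaded BLAS may spread over several cores.
n <- 300
a <- matrix(runif(n * n), n, n)
timing <- system.time(b <- a %*% a)

# user.self is CPU time charged to the R process; elapsed is wall-clock time.
user_time    <- timing[["user.self"]]
elapsed_time <- timing[["elapsed"]]
cat("user:", user_time, " elapsed:", elapsed_time, "\n")
```

With a single-threaded BLAS the two numbers are usually close; with a threaded one, the user figure can be several times the elapsed figure, which is the effect described above.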

Regards,

-- 
Imanpreet Singh Arora



Re: [R] Manage huge database

2008-09-22 Thread Thomas Lumley

On Mon, 22 Sep 2008, Martin Morgan wrote:


José E. Lozano [EMAIL PROTECTED] writes:


Maybe you've not lurked on R-help for long enough :) Apologies!


Probably.


So, how much design is in this data? If none, and what you've
basically got is a 2000x50 grid of numbers, then maybe a more raw


Exactly, raw data, but a little more complex since all the 50 variables
are in text format, so the width is around 2,500,000.

snip

Is genetic DNA data (individuals genotyped), hence the large amount of
columns to analyze.


The Bioconductor package snpMatrix is designed for this type of
data. See

http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html

and if that looks promising


source('http://bioconductor.org/biocLite.R')
biocLite('snpMatrix')


Likely you'll quickly want a 64 bit (linux or Mac) machine.



netCDF is another useful option -- we have been using the ncdf package for 
large genomic datasets.  We read the data in one person at a time and 
write to netCDF.  For analysis we can then read any subsets.  Since we 
have imputed SNP data  as well as measured this comes to about 2.5 million 
variables on 4000 people for one of our data sets.



-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle


Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Matthew Pettis
Hi All,

After rebuilding my Ubuntu image, I followed the instruction in this
thread, and everything worked out fine -- thank you again.

So, I'll just add: if you use R and perl, and don't have to download
perl5.10, then don't do it, at least not yet.  Or, if you do, then you
will have a lot of shared object tweaking.

Matt

On Mon, Sep 22, 2008 at 1:22 PM, Matthew Pettis
[EMAIL PROTECTED] wrote:
 Thank You All,

 I think all of this may have been due to shared library conflict
 headaches.  At one point, I inadvertently upgraded my Perl install to
 5.10, and I think that messed up a lot of my libraries.  I have now
 started with a clean Ubuntu install, and am going to see if I can work
 my way back up to installing R and making that work.  I will recontact
 the list if this problem persists through this reimaging of my server.

 Thanks again,
 Matt

 On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel [EMAIL PROTECTED] wrote:
 On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
 Matthew,

 As per the CRAN Ubuntu README

   http://cran.r-project.org/bin/linux/ubuntu/

 install the Ubuntu r-base-dev package to compile R packages from
 sources.

 Well there should be a working r-cran-hmisc package.  You simply got a
 '404' error indicating that your network access (using http) to the
 external Ubuntu mirror was broken.   Fix that, or download the package
 by hand.  It may be easier to just install the missing package.

 That said, Vincent is of course entirely correct on the need for
 r-base-dev.

 Dirk


 Vincent

 Le lun. 22 sept. à 00:08, Matthew Pettis a écrit :

 Hi,

 I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
 I tried getting Hmisc from within R by issuing the standard
 'install.packages' command, but it said I needed 'gfortran' to
 compile.  I thought I could circumvent this by using 'aptitude' to get
 the package 'r-cran-hmisc', but when I got it, the package had
 critical missing parts (got 404s).  So, I'll be trying to go back and
 download 'gfortran', but can anybody tell me if this aptitude ubuntu
 package should be kept up to date and is just currently overlooked?

 Thanks,
 Matt

 --
 It is from the wellspring of our despair and the places that we are
 broken that we come to repair the world.
 -- Murray Waas


 --
 Three out of two people have difficulties with fractions.




 --
 It is from the wellspring of our despair and the places that we are
 broken that we come to repair the world.
 -- Murray Waas




-- 
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas



Re: [R] findInterval(), binary search, log(N) complexity

2008-09-22 Thread Duncan Murdoch

On 9/22/2008 1:51 PM, Markus Loecher wrote:

Dear R users,
the help for findInterval(x,vec) suggests a logarithmic dependence on N
(=length(vec)), which would imply a binary search type algorithm.
However, when I test this hypothesis, in the following manner:


R is open source.  Why test things this way, when you can look at the 
source?  You don't even need to go to C code for this:


> findInterval
function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
{
    if (any(is.na(vec)))
        stop("'vec' contains NAs")
    if (is.unsorted(vec))
        stop("'vec' must be sorted non-decreasingly")
    if (has.na <- any(ix <- is.na(x)))
        x <- x[!ix]
    nx <- length(x)
    index <- integer(nx)
    .C("find_interv_vec", xt = as.double(vec), n = as.integer(length(vec)),
        x = as.double(x), nx = as.integer(nx), as.logical(rightmost.closed),
        as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
        PACKAGE = "base")
    if (has.na) {
        ii <- as.integer(ix)
        ii[ix] <- NA
        ii[!ix] <- index
        ii
    }
    else index
}
<environment: namespace:base>

Notice the is.unsorted test.  How could that be anything other than 
linear execution time in N? Similarly for any(ix - is.na(x)).


If you know the answers to those tests (as you do in your simulation), 
you could presumably get O(log(n)) behaviour by writing a new function 
that skipped them.  But you could take a look at the source code (in 
https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to 
check, or if you notice any weird timings.
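
As a rough illustration of such a check-skipping function, here is a sketch in plain R. The name find_interval_sorted is hypothetical, it handles one x at a time, and it is only valid under the stated assumption that vec is sorted and NA-free. It is not the algorithm in interv.c, just an ordinary binary search that returns the same index.

```r
# Hypothetical helper: binary search for the interval index, assuming
# 'vec' is already sorted non-decreasingly and contains no NAs.
# It skips the O(N) is.na() and is.unsorted() checks discussed above.
find_interval_sorted <- function(x, vec) {
  lo <- 0L            # the answer always lies within [lo, hi]
  hi <- length(vec)
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L   # upper midpoint, so the loop always advances
    if (vec[mid] <= x) lo <- mid else hi <- mid - 1L
  }
  lo                  # number of elements of vec that are <= x
}

set.seed(1)
vec <- sort(round(rt(1000, df = 2), 2))
probes <- c(-1, 0, min(vec) - 1, max(vec) + 1, vec[10])
mine <- vapply(probes, find_interval_sorted, integer(1), vec = vec)
```

The loop makes about log2(N) comparisons, but it runs in the interpreter, so for moderate N the compiled findInterval may still be faster in absolute terms despite its linear checks.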


Duncan Murdoch




set.seed(-3645);
l <- vector();
N.seq <- c(5000, 50, 100, 1000, 5000); k <- 1
for (N in N.seq){
  tmp <- sort(round(stats::rt(N, df=2), 2));
  l[k] <- system.time(it3 <- findInterval(-1, tmp))[2]; k <- k + 1;
}
plot(N.seq, l, type="b", xlab="length(vec)", ylab="CPU time");

the resulting plot suggests a linear relationship.
I must be missing sth. here ?

Thanks !

Markus



[R] lme problems

2008-09-22 Thread Tommaso Pizzari
Hi, 
I'm analysing a dataset in which the same 5 subjects (male.pair) were subjected 
to two treatments (treatment) and were measured for 12 successive days within 
each treatment (layingday). Overall 5*2*12=120 observations. 

I want to test the effect of treatment, time (layingday) and their interaction. 
I have done so through the ANOVA below:

> bmc3 <- aov(Mean1 ~ treatment*layingday + Error(male.pair/treatment/layingday))
> summary(bmc3)

Error: male.pair
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  1 0.13850 0.13850

Error: male.pair:treatment
          Df  Sum Sq Mean Sq
treatment  1 0.60525 0.60525

Error: male.pair:treatment:layingday
          Df  Sum Sq Mean Sq
layingday  1 0.64037 0.64037

Error: Within
                     Df  Sum Sq Mean Sq F value    Pr(>F)
treatment             1 0.02015 0.02015  0.7340    0.3934
layingday             1 0.52937 0.52937 19.2878 2.545e-05 ***
treatment:layingday   1 0.02959 0.02959  1.0782    0.3013
Residuals           113 3.10135 0.02745
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

I then wanted to compare this outcome with an lme, and used the model below. 
However, its outcome doesn't make much sense to me. 

> bmc4 <- lme(Mean1 ~ treatment*layingday, random = ~1|male.pair)
> summary(bmc4)
Linear mixed-effects model fit by REML
 Data: NULL 
        AIC       BIC   logLik
  -118.4522 -101.9306 65.22609

Random effects:
 Formula: ~1 | male.pair
        (Intercept)  Residual
StdDev:   0.1313573 0.1185902

Fixed effects: Mean1 ~ treatment * layingday 
                         Value  Std.Error  DF   t-value p-value
(Intercept)          0.5311005 0.09369140 112  5.668615  0.0000
treatment            0.0495373 0.04616116 112  1.073138  0.2855
layingday           -0.0488055 0.00991701 112 -4.921389  0.0000
treatment:layingday  0.0138449 0.00627207 112  2.207388  0.0293
 Correlation: 
                    (Intr) trtmnt lyngdy
treatment           -0.739
layingday           -0.688  0.838
treatment:layingday  0.653 -0.883 -0.949

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.44529424 -0.68505388  0.01663401  0.59009515  3.53354000 

Number of Observations: 120
Number of Groups: 5 

I struggle to understand the discrepancy in df between the anova and lme, and 
the fact that the interaction term is not significant in the anova but 
significant in lme. Any help would be greatly appreciated. 
Best
Tom

-- 
Dr. Tommaso Pizzari
Edward Grey Institute, Dept of Zoology, 
University of Oxford, Oxford OX1 3PS
Tel: (44) 1865 271279, Fax: (44) 1865 271168



Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research

2008-09-22 Thread Marc Schwartz
on 09/22/2008 11:26 AM Bert Chan wrote:
 Warranty on Accuracy, Precision, Legality, ... of R in Research
 
 (These questions may well have been raised.)
 
 What is the implied warranty of using R for research  publications, 
 consulting, etc.?
 
 Alternately, how does one obtain such a warranty?
 
 Your answers will be much appreciated.
 
 Perhaps you can point me to some websites which discussed this subject in the 
 past.
 
 Thanks  regards -
 
 Bert
 
 (Bertram K. C. Chan, PhD)

As per the banner that appears whenever you start up R:

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.


The suitability of R for any particular application is entirely up to
the user. Legally, there is nothing preventing you from using R for such
applications relative to the license under which R is made available.

You did not indicate the specific type of research you have in mind, but
if it might be in the domain of clinical trials, please review:

  http://www.r-project.org/doc/R-FDA.pdf

HTH,

Marc Schwartz



[R] Coefficients, OR and 95% CL

2008-09-22 Thread Luciano La Sala
Dear R-users,

After running a logistic regression, I need to calculate the OR by exponentiating 
the coefficient, and I also need the 95% CL for the OR. For the 
following example (taken from P. Dalgaard's book), what would be the most 
straightforward method of getting what I need? Could anyone enlighten me, please?
 

Thank you!
Lucho 

> summary(glm(menarche~age,binomial))

Call:
glm(formula = menarche ~ age, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-4.68654  -0.13049  -0.01067   0.09608   2.35254  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.9175     1.7074  -10.49   <2e-16 ***
age           1.3549     0.1296   10.45   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 974.31  on 703  degrees of freedom
Residual deviance: 223.95  on 702  degrees of freedom
  (635 observations deleted due to missingness)
AIC: 227.95

Number of Fisher Scoring iterations: 9








Re: [R] Coefficients, OR and 95% CL

2008-09-22 Thread Jorge Ivan Velez
Dear Luciano,
See ?logistic.display in the epicalc package. If glm1 is your model,
something like

logistic.display(glm1)

should do the job.
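
For readers without epicalc, the same quantities can be sketched in base R. This is a swapped-in alternative, not what logistic.display does internally, and it uses Wald limits, exp(estimate ± 1.96 × SE). Applied by hand to the posted output, that gives an OR for age of roughly exp(1.3549) ≈ 3.9, with limits of about 3.0 and 5.0. The data below are synthetic stand-ins, since the original data set is not part of this thread.

```r
# Synthetic data standing in for the menarche/age example (hypothetical values).
set.seed(1)
age <- runif(200, 8, 18)
menarche <- rbinom(200, 1, plogis(-17.9 + 1.35 * age))
fit <- glm(menarche ~ age, family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

# Odds ratios with Wald 95% limits: exp(estimate +/- 1.96 * SE).
or.table <- cbind(OR    = exp(est),
                  lower = exp(est - qnorm(0.975) * se),
                  upper = exp(est + qnorm(0.975) * se))
print(or.table)

# confint.default() gives the same Wald limits on the log-odds scale.
wald <- exp(confint.default(fit))
```

The intercept row is not an odds ratio in the usual sense and is normally dropped from the reported table.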


HTH,


Jorge


On Mon, Sep 22, 2008 at 5:28 PM, Luciano La Sala
[EMAIL PROTECTED]wrote:

 Dear R-users,

 After running a logistic regression, I need to calculate OR by
 exponentiating the coefficient, and then I need the 95% CL for the OR as
 well. For the following example (taken from P. Dalaagard's book), what would
 be the most straightforward method of getting what I need? Could anyone
 enlight me please?

 Thank you!
 Lucho

 > summary(glm(menarche~age,binomial))

 Call:
 glm(formula = menarche ~ age, family = binomial)

 Deviance Residuals:
      Min        1Q    Median        3Q       Max
 -4.68654  -0.13049  -0.01067   0.09608   2.35254

 Coefficients:
             Estimate Std. Error z value Pr(>|z|)
 (Intercept) -17.9175     1.7074  -10.49   <2e-16 ***
 age           1.3549     0.1296   10.45   <2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 (Dispersion parameter for binomial family taken to be 1)

     Null deviance: 974.31  on 703  degrees of freedom
 Residual deviance: 223.95  on 702  degrees of freedom
   (635 observations deleted due to missingness)
 AIC: 227.95

 Number of Fisher Scoring iterations: 9








[R] Deleting multiple variables

2008-09-22 Thread Michael Pearmain
Hi All,
I have searched the web for a simple solution but have been unable to find
one.  Can anyone recommend a neat way of deleting multiple variables?
I see that I need to use dataframe$VAR <- NULL to get rid of one variable;
in my situation I need to delete all variables between two points.

I've used the 'which' function to find these positions and have assigned them to myvars:
> myvars
[1]  2 17

but I can't figure out how I should apply this.

Should I loop through the values? (Pseudo-code below.)

for (x in c(myvars[1]:myvars[2]))
(M_UC$x <- NULL)

Any help gratefully received.

Mike



Re: [R] Deleting multiple variables

2008-09-22 Thread Andrew Robinson
Mike,

how about

M_UC <- M_UC[,-(myvars[1]:myvars[2])]

?
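
A small worked sketch of this on a made-up data frame (the 20 columns V1..V20 are hypothetical). It also shows why the loop in the question cannot work: M_UC$x refers to a column literally named "x", not to the column whose position is stored in x.

```r
# Hypothetical data frame with 20 columns named V1..V20.
M_UC <- as.data.frame(matrix(seq_len(40), nrow = 2))
myvars <- c(2, 17)

# Drop columns 2 through 17 in one step; drop = FALSE keeps a data
# frame even if only a single column were to remain.
drop.idx <- myvars[1]:myvars[2]
M_UC.small <- M_UC[, -drop.idx, drop = FALSE]
names(M_UC.small)

# Equivalent list-style form: assign NULL to several columns at once.
M_UC2 <- M_UC
M_UC2[drop.idx] <- NULL
```

Negative indices remove by position, so compute drop.idx before any other column is added or removed.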

Andrew

On Mon, Sep 22, 2008 at 11:04:34PM +0100, Michael Pearmain wrote:
 Hi All,
 i have searched the web for a simple solution but have been unable to find
 one.  Can anyone recommend a neat way of deleting multiple variable?
 I see, i need to use dataframe$VAR <- NULL to get rid of one variable,
 In my situation i need to delete all vars between two points.
 
 I've used the 'which' function to find these out and have assigned to myvar
 myvars
 [1]  2 17
 
 but i can't figure out how i should apply this?
 
 Should i loop through the values? (Psydo code below?)
 
 for (x in c(myvars[1]:myvars[2]))
 (M_UC$x <- NULL)
 
 Any help gratful
 
 Mike
 

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/



[R] Weights for polr

2008-09-22 Thread Gregory Wawro

Hello,

I'm estimating an ordered logit model on a probability weighted survey 
sample.  polr permits case weights with the weights option, but I cannot 
figure out from existing documentation what it actually does with these 
weights.  I'm concerned about this because I get somewhat different 
results using Stata's ologit command with the pweights option and very 
different results using proc logistic in SAS with its weight option.  So 
my basic question is whether or not it is appropriate to use the weight 
option for polr with my data.


Best,
Greg


.

Gregory Wawro   [EMAIL PROTECTED]
Associate Professor phone:  212-854-8540
Dept. of Political Science  fax:212-222-0598
741 International Affairs   http://www.columbia.edu/~gjw10/
Columbia University
New York, NY 10027

.



[R] Help for SUR model

2008-09-22 Thread Xianchun Liao
I am an R beginner and trying to run a SUR model in R framework.

 

subset(esasp500, Obs <= 449 & Obs >= 197, select = -Date) -> ev13sub
c(Obs >= 397) & c(Obs <= 399) -> d13
c(Obs >= 400) & c(Obs <= 449) -> f13
SP500*f13 -> SP500f13

BBC~SP500+d13+SP500f13  -> sur132
BOW~SP500+d13+SP500f13  -> sur133
CSK~SP500+d13+SP500f13  -> sur134
DTC~SP500+d13+SP500f13  -> sur135
GP~SP500+d13+SP500f13   -> sur136
HAN~SP500+d13+SP500f13  -> sur137
IP~SP500+d13+SP500f13   -> sur138
KMB~SP500+d13+SP500f13  -> sur139
LPX~SP500+d13+SP500f13  -> sur1310
MWV~SP500+d13+SP500f13  -> sur1311
PCH~SP500+d13+SP500f13  -> sur1312
PCL~SP500+d13+SP500f13  -> sur1313
PNR~SP500+d13+SP500f13  -> sur1314
POP~SP500+d13+SP500f13  -> sur1315
SON~SP500+d13+SP500f13  -> sur1316
TIN~SP500+d13+SP500f13  -> sur1317
W~SP500+d13+SP500f13    -> sur1318
WPP~SP500+d13+SP500f13  -> sur1319
WY~SP500+d13+SP500f13   -> sur1320

system13 <- list(sur132, sur133, sur134, sur135, sur136, sur137, sur138,
sur139, sur1310, sur1311, sur1312, sur1313, sur1314, sur1315, sur1316,
sur1317, sur1318, sur1319, sur1320)

labels13 <- list("sur132", "sur133", "sur134", "sur135", "sur136", "sur137",
"sur138", "sur139", "sur1310", "sur1311", "sur1312", "sur1313", "sur1314",
"sur1315", "sur1316", "sur1317", "sur1318", "sur1319", "sur1320")

res13 <- systemfit("SUR", system13, labels13, data=ev13sub)

summary(res13)

 

But R reports: Error: could not find function "systemfit".

How should I write the R code to implement these formulas and get the right
results?

 

 

Thanks,

 

Bill

 

 




[R] Prediction errors from forecast()?

2008-09-22 Thread Laura Pyle
Hello,

I am using forecast() in the forecast package to predict future values of an
ARIMA model fit to a time series.  I have read most of the documentation for
the forecast package, but I can't figure out how to obtain the forecast
variance for the predicted values.  I tried using the argument
se.fit=TRUE, hoping this would work since forecast() calls predict().

Is there an easy way to do this?  Sample code is below.

ar <- Arima(as.matrix(Y), order = c(1,0,0), include.drift=TRUE)
f <- forecast(ar, h=9, se.fit=TRUE)
summary(f)

Thanks,
Laura



Re: [R] Weights for polr

2008-09-22 Thread Thomas Lumley

On Mon, 22 Sep 2008, Gregory Wawro wrote:


Hello,

I'm estimating an ordered logit model on a probability weighted survey 
sample.


You could use svyolr() in the survey package.

polr permits case weights with the weights option, but I cannot 
figure out from existing documentation what it actually does with these 
weights.


They are frequency weights.

I'm concerned about this because I get somewhat different results 
using Stata's ologit command with the pweights option


You should get the same point estimates, but different standard errors.

and very different 
results using proc logistic in SAS with its weight option.


Again, it should be the same point estimates but different standard 
errors.


 So my basic 
question is whether or not it is appropriate to use the weight option for 
polr with my data.


No.
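
The "frequency weights" reading can be checked with a short sketch on the housing data that ships with MASS: fitting with counts as weights should match fitting the same rows repeated that many times. This only illustrates what polr does with weights; it says nothing about survey designs, for which svyolr() in the survey package is the suggestion above.

```r
library(MASS)  # polr() and the 'housing' data ship with MASS

# Fit with the cell counts supplied as weights ...
fit.w <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# ... and with every row repeated Freq times and no weights.
expanded <- housing[rep(seq_len(nrow(housing)), housing$Freq), ]
fit.e <- polr(Sat ~ Infl + Type + Cont, data = expanded)

# Frequency weights: both fits maximize the same likelihood, so the
# estimates should agree up to optimizer tolerance.
max.diff <- max(abs(coef(fit.w) - coef(fit.e)))
max.diff
```

Because the weighted fit treats each weight as a count of identical observations, its standard errors reflect a sample of size sum(weights), which is why they differ from design-based standard errors when the weights are sampling weights.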

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research

2008-09-22 Thread hadley wickham
On Mon, Sep 22, 2008 at 4:07 PM, Marc Schwartz
[EMAIL PROTECTED] wrote:
 on 09/22/2008 11:26 AM Bert Chan wrote:
 Warranty on Accuracy, Precision, Legality, ... of R in Research

 (These questions may well have been raised.)

 What is the implied warranty of using R for research  publications, 
 consulting, etc.?

 Alternately, how does one obtain such a warranty?

 Your answers will be much appreciated.

 Perhaps you can point me to some websites which discussed this subject in 
 the past.

 Thanks  regards -

 Bert

 (Bertram K. C. Chan, PhD)

 As per the banner that appears whenever you start up R:

 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.

And surely this is the most that any software could provide?

SAS has:

EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE IN AN AGREEMENT BETWEEN YOU
AND SAS, ALL INFORMATION, SOFTWARE, PRODUCTS AND SERVICES ARE PROVIDED
AS IS WITHOUT WARRANTY OF ANY KIND INCLUDING WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NON-INFRINGEMENT.

Hadley


-- 
http://had.co.nz/



[R] Prediction errors from forecast()?

2008-09-22 Thread Laura Pyle
Sorry, I am resending in plain text.

Hello,

I am using forecast() in the forecast package to predict future values
of an ARIMA model fit to a time series.  I have read most of the
documentation for the forecast package, but I can't figure out how to
obtain the forecast variance for the predicted values.  I tried using
the argument se.fit=TRUE, hoping this would work since forecast()
calls predict().

Is there an easy way to do this?  Sample code is below.

ar <- Arima(as.matrix(Y), order = c(1,0,0), include.drift = TRUE)
f <- forecast(ar, h = 9, se.fit = TRUE)
summary(f)

Thanks,
Laura
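
One possible route to the forecast variances, sketched with base R only. The simulated series below is an assumption standing in for Y, and stats::arima() is used instead of forecast's Arima(), so the drift term from the original call is not reproduced here. predict() on the fitted model returns standard errors, which can be squared.

```r
set.seed(42)
# Simulated AR(1) series standing in for the poster's Y (an assumption).
y <- arima.sim(model = list(ar = 0.6), n = 200)

# Base-R ARIMA fit of order (1,0,0).
fit <- arima(y, order = c(1, 0, 0))

# predict() returns point forecasts ($pred) and standard errors ($se).
p <- predict(fit, n.ahead = 9, se.fit = TRUE)
fc_var <- as.numeric(p$se)^2   # forecast variances for horizons 1 to 9

# Approximate 95% limits built from the same standard errors.
lower <- p$pred - qnorm(0.975) * p$se
upper <- p$pred + qnorm(0.975) * p$se
print(cbind(h = 1:9, variance = fc_var))
```

These standard errors reflect the innovation variance only; they do not account for uncertainty in the estimated coefficients.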



[R] sort a data matrix by all the values and keep the names

2008-09-22 Thread zhihuali

Dear all,

If I have a data frame  x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):
   x1  x2  x3
   1   4   8
   7   6   2

I want to sort the whole data and get this:
x1  1
x3  2
x2  4
x2  6
x1  7
x3  8

If I do sort(x), R reports:
Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = 
FALSE) : 
  unimplemented type 'list' in 'orderVector1'

The only way I can sort all the data is by converting it to a matrix:
> sort(as.matrix(x))
[1] 1 2 4 6 7 8

But now I lost all the names attributes.

Is it possible to sort a data frame and keep all the names?

Thanks!

Zhihua Li



[R] perl expression question

2008-09-22 Thread markleeds
If I have the string below, does someone know a regular expression to
just get the BLC.NYSE? I bought the O'Reilly
book and read it when I can, and I study the solutions on the list, but
I'm still not self-sufficient with these things. Thanks.



stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"



Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread Moshe Olshansky
One possibility is:

> x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
> names <- t(matrix(rep(names(x),times=nrow(x)),nrow=ncol(x)))
> m <- as.matrix(x)
> ind <- order(m)
> df <- data.frame(name=names[ind],value=m[ind])
> df
  name value
1   x1     1
2   x3     2
3   x2     4
4   x2     6
5   x1     7
6   x3     8





[R] R-2.7.2 infected?

2008-09-22 Thread Dave DeBarr
I tried downloading R-2.7.2 
(http://cran.cnr.berkeley.edu/bin/windows/base/R-2.7.2-win32.exe, both from 
Berkeley and cran) and both times I got a warning from Computer Associates 
eTrust Antivirus (version 7.1.710) that the Win32/Adclicker.JO trojan was 
detected:
The Win32/Adclicker.JO was detected in 
C:\USERS\USER\APPDATA\LOCAL\MICROSOFT\WINDOWS\TEMPORARY INTERNET 
FILES\LOW\CONTENT.IE5\61HAYRTG\R-2.7.2-WIN32[1].EXE.

Has anyone else seen this?

Thanks,
Dave




Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread hadley wickham

Here's one way:

dfm <- melt(x, id = c())
dfm[order(dfm$value), ]

Hadley

-- 
http://had.co.nz/



Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread zhihuali

This is exactly what I wanted!

Thank you so much!

Z



> Date: Mon, 22 Sep 2008 19:21:43 -0500
> From: [EMAIL PROTECTED]
> Subject: RE: [R] sort a data matrix by all the values and keep the names
> To: [EMAIL PROTECTED]
>
> Hi: there might be a quicker way but you can use stack and order. stack
> creates a data frame with 2 columns, values and ind, with ind
> holding the name of the column each value came from.
>
> order(temp$values) creates the indices of the ordered values, so you
> index by that to make it sorted.
>
> temp <- stack(x)
> print(temp)
> print(str(temp))
>
> sortedx <- temp[order(temp$values),]
> print(sortedx)
 
 
 


Re: [R] suppress legend in ggplot(data, aes(y=Y, x=X,fill=Z))?

2008-09-22 Thread hadley wickham
On Sun, Sep 21, 2008 at 5:25 PM, Tom Bonen [EMAIL PROTECTED] wrote:
> hi,
>
> is there any way to suppress the legend in ggplot(data, aes(y=Y,
> x=X,fill=Z)) ? i'd like the values to be displayed in different colors
> as specified by fill= and this works just fine. but i do not want to
> have the legend on the right that is automatically created when fill
> is specified.

Hi Tom,

+ opts(legend.position = "none")

should do the trick.

Hadley

-- 
http://had.co.nz/



Re: [R] perl expression question

2008-09-22 Thread Moshe Olshansky
Hi Mark,

> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
> gsub(".*/([^/]+)$", "\\1", stock)
[1] "BLC.NYSE"





[R] How to view or export values of 'names' in a lm

2008-09-22 Thread Jhunk Emale
Hello,
I have been using:

model <- lm(y~x+I(x^2))

I am namely interested in the values of the residuals. If I use the 'names'
command I get the following:

> names(model)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

I know I can view 'residuals' or 'resid' but how can I view the available
values of 'names' together or, perhaps even better, how can I export them.
If this is a case of read the manual, could someone direct me to where this
is discussed.

Thank you kindly,
JE
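
A minimal sketch of one way to inspect and export these components. The x and y below are simulated stand-ins (an assumption), and the CSV goes to a temporary file chosen only for illustration.

```r
set.seed(1)
# Hypothetical data standing in for the poster's x and y.
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.1 * x^2 + rnorm(50, sd = 0.3)

model <- lm(y ~ x + I(x^2))

# str() gives a compact view of every component listed by names(model).
str(model, max.level = 1)

# Extractor functions are the usual way to pull out individual pieces.
res <- data.frame(x = x, y = y,
                  fitted = fitted(model),
                  residual = resid(model))

# Export the per-observation values to a CSV file (a temp file here).
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, row.names = FALSE)
```

Components such as qr, call and terms are not per-observation values, so they do not fit into a table like this; model$coefficients or coef(model) can be written out separately in the same way.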



Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread Steven McKinney
Is something missing in the melt()?

> x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
> require(reshape)
Loading required package: reshape
> dfm <- melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { : 
  missing value where TRUE/FALSE needed
> dfm[order(dfm$value), ]
Error: object "dfm" not found
> x
  x1 x2 x3
1  1  4  8
2  7  6  2
> melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { : 
  missing value where TRUE/FALSE needed



Steve McKinney




[R] plot implicit function

2008-09-22 Thread Ying-Ying Lee
Hi,

I would like to know how to plot an implicit function, for example
f(x,y)=0.  I'd like to plot the curve in the x-y plane.

Thanks,
Ying
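
One common approach, sketched here under assumptions: evaluate f on a grid and draw only the zero-level contour. The unit circle is used as a stand-in for f, and plot_implicit is a hypothetical helper name.

```r
# Example implicit function (an assumption for illustration): the unit circle.
f <- function(x, y) x^2 + y^2 - 1

# Evaluate f on a grid; outer() needs f to be vectorised, which it is here.
xs <- seq(-1.5, 1.5, length.out = 201)
ys <- seq(-1.5, 1.5, length.out = 201)
z <- outer(xs, ys, f)

# contourLines() returns the coordinates of the curve f(x, y) = 0.
curve0 <- contourLines(xs, ys, z, levels = 0)

# Hypothetical helper that draws the same curve on the current device.
plot_implicit <- function() {
  contour(xs, ys, z, levels = 0, drawlabels = FALSE,
          xlab = "x", ylab = "y")
}
```

Calling plot_implicit() draws the curve. The grid range has to cover the part of the curve that is of interest, and a finer grid gives a smoother line.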



Re: [R] perl expression question

2008-09-22 Thread Andrew Robinson
Hi Mark,

do you mean the regex to get the portion of the address after the
final slash?  Something like

gsub(".*/([^/]*$)", "\\1", stock, fixed=FALSE)

Cheers

Andrew


-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/



Re: [R] perl expression question

2008-09-22 Thread Gabor Grothendieck
Try this:

> sub(".*/", "", stock)
[1] "BLC.NYSE"



Re: [R] perl expression question

2008-09-22 Thread Gabor Grothendieck
By the way, although a regular expression solution was asked for,
if one expands that to any solution then R does have a function
specifically for this case:

> basename(stock)
[1] "BLC.NYSE"
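
The approaches suggested in this thread can be compared side by side. The sketch below uses the string from the original question; the variable names a, b and c3 are illustrative.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# Greedy ".*/" swallows everything up to and including the last slash.
a <- sub(".*/", "", stock)

# Capture the run of non-slash characters anchored at the end of the string.
b <- sub(".*/([^/]+)$", "\\1", stock)

# No regex at all: basename() drops the directory part of a path.
c3 <- basename(stock)

print(c(a, b, c3))
```

The first pattern works because ".*" is greedy, so the match extends to the last "/" rather than stopping at the first one.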



Re: [R] perl expression question

2008-09-22 Thread jim holtman
If this is a path name, then 'basename' will work for you:

> stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
> basename(stock)
[1] "BLC.NYSE"







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread hadley wickham
Hmm, maybe it only works in my development version (to be released v. v. soon)

Hadley





-- 
http://had.co.nz/



[R] Error: subscript out of bounds.

2008-09-22 Thread Rolf Turner


Consider:

> x <- array(1:12,dim=12)
> x[13]
[1] NA
> m <- array(1:12,dim=c(3,4))
> m[3,5]
Error: subscript out of bounds

Can anyone tell me if there is a Good Reason for the difference in
behaviour between 1 dimensional and higher dimensional arrays?  In a bit of
code that I was working on I expected the NA behaviour and didn't get it of
course.  Then I had to take evasive action to avoid the error.

Naive young thing that I am, I would prefer the NA behaviour to be  
universal.

But I expect that, as usual, I'm overlooking something.

cheers,

Rolf Turner
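
The behaviour described above, plus one possible form of "evasive action", can be sketched as follows. safe_get is a hypothetical helper, not a base R function, and the sketch only illustrates the behaviour; it does not explain the reason for it.

```r
x <- array(1:12, dim = 12)
m <- array(1:12, dim = c(3, 4))

x[13]   # a single (vector-style) index past the end gives NA
m[15]   # the same holds when a matrix is indexed as a plain vector

# m[3, 5] stops with "subscript out of bounds"; tryCatch() captures the error.
msg <- tryCatch(m[3, 5], error = function(e) conditionMessage(e))

# Hypothetical helper: matrix lookup that returns NA instead of an error.
safe_get <- function(mat, i, j) {
  if (i >= 1 && i <= nrow(mat) && j >= 1 && j <= ncol(mat)) mat[i, j] else NA
}
safe_get(m, 3, 5)
safe_get(m, 3, 4)
```

The helper handles a single (i, j) pair of positive indices only; vectors of indices or negative indices would need more checks.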



Re: [R] R-2.7.2 infected?

2008-09-22 Thread Duncan Murdoch

You're the first to report it, and 2.7.2 has been out for almost a 
month, so I think it's likely that the CRAN copy is uninfected.  Did you 
check the md5 checksum on it?  It matches on the original, so if it 
doesn't match at your end, you've got a bad download.


If it matches and you still get the virus checker reporting, please let 
me know the details about that infection, and I'll try to do a manual 
inspection for it.


Duncan Murdoch
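
The checksum comparison suggested above can be done from within R using tools::md5sum(). In this sketch verify_md5 is a hypothetical helper, and an empty temporary file stands in for the downloaded installer because its MD5 digest is well known.

```r
# Hypothetical helper: TRUE when the file's MD5 digest equals the published one.
verify_md5 <- function(path, expected) {
  actual <- unname(tools::md5sum(path))
  identical(tolower(actual), tolower(expected))
}

# Demonstration on an empty temp file, whose MD5 digest is well known.
tmp <- tempfile()
file.create(tmp)
ok <- verify_md5(tmp, "d41d8cd98f00b204e9800998ecf8427e")
print(ok)
```

For a real check, path would be the downloaded installer and expected the checksum published alongside it on the mirror.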



[R] Create groups from data to compute lm?

2008-09-22 Thread Michael Just
Hello,
Below are the first two rows from my dataset and the header. This dataset
has 5749 rows and I want to select only certain rows to be used based on
existing grouping values. I am trying to group the data based on the values
under 'ex_bin'. (e.g a group for 250, 251, 252, 500, 501, 502) I would then
like to perform a lm for each grouping.

My data:

 all[1:2,]
year extent scape bi_ca r ex_bin PriNo pri1234 pri_ex sc_ex Sc_ex_pri
sc_ec_p1234 PD LPI ED LSI
13 25 1 1 3251251 1   1 26   125
11251125 21.6565 62.6602  82.0769 15.8792
23 25 1 1 3251251 1   1 26   125
11251125 19.3076 27.6264 111.2014 20.7889
PAFRAC PROX_MN ENN_MN CONTAG pfor purban
1  1.440 319.6529 114.8314 62.0965 69.4891 12.3124
2  1.467 396.1949 105.3712 52.9186 38.1179 15.1906

I tried using:

all.lm <- lm(pfor ~ PD, data = all, subset = (ex_bin == 250))

but this resulted in a bogus analysis filled with 'NAs'. I then tried to use
getGroups.

> all.group <- getGroups(data=all, ex_bin ='250')
Error in getGroups(data = all, ex_bin = "250") :
unused argument(s) (ex_bin = "250")

Again, no success. Am I approaching this correctly?

Thank you kindly,
Regards,
M Just
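
One way to fit a separate lm per ex_bin value, sketched with base R. The data frame below is simulated (an assumption) and carries only the three columns used here. It is named dat rather than all, since all is also the name of a base R function.

```r
set.seed(7)
# Hypothetical stand-in for the poster's data: only the three columns used here.
dat <- data.frame(
  ex_bin = rep(c(250, 251, 500), each = 20),
  PD     = runif(60, 10, 30)
)
dat$pfor <- 80 - 1.5 * dat$PD + rnorm(60, sd = 2)

# One group only: subset= keeps the rows where the condition is TRUE.
fit_250 <- lm(pfor ~ PD, data = dat, subset = ex_bin == 250)

# Every group: split the rows by ex_bin, then fit the same model to each piece.
fits <- lapply(split(dat, dat$ex_bin), function(d) lm(pfor ~ PD, data = d))

# Collect the coefficients into one matrix, one row per group.
coefs <- t(sapply(fits, coef))
print(coefs)
```

If a per-group fit comes back full of NAs, it may be worth checking how many rows table(dat$ex_bin) reports for that value, since a group with fewer than two distinct PD values cannot identify a slope.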



Re: [R] R-2.7.2 infected?

2008-09-22 Thread Ajay ohri
Could this be an intentional attack to compromise a very popular
download and infect thousands of people? What could be the
motivations? I hope it's not some corporate thug here.

What exactly does the Win32/Adclicker.JO trojan do?

Ajay
www.decisionstats.com
www.iwannacrib.com





-- 
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri
