Re: [R] Reading large files

2010-02-06 Thread Saptarshi Guha
Hello,
Do you need /all/ the data in memory at one time? Is your goal to
divide the data (e.g. according to some factor /or/ some function of
the columns of the data set) and then analyze the divisions? And then,
possibly, combine the results?
If so, you might consider using Rhipe. We have analyzed (e.g. computed
regression parameters, applied algorithms) across subsets of data where
the subsets are created according to some condition.
Using this approach (and a cluster of 8 machines, 72 cores) we
successfully analyzed data sets ranging from 14 GB to ~140 GB.
This all assumes that your divisions are suitably small. I notice
you mention that each region is 10-20 GB and you want to compute on
/all/ of it, i.e. you need all of it in memory. If so, Rhipe cannot help you.


Regards
Saptarshi
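[Editorial note: the divide-and-recombine pattern Saptarshi describes can be sketched in plain base R when the data fits in memory (Rhipe distributes the same pattern across a cluster). The data, grouping factor, and model below are invented for illustration only.]

```r
# Toy data: one response, one predictor, and a division factor
set.seed(42)
d <- data.frame(region = rep(c("north", "south", "west"), each = 50),
                x      = rnorm(150),
                y      = rnorm(150))

# Divide by the factor, analyze each division (here: fit a regression),
# then combine the per-division results into one table
fits  <- lapply(split(d, d$region), function(sub) lm(y ~ x, data = sub))
coefs <- t(sapply(fits, coef))   # one row of regression parameters per region
coefs
```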



On Thu, Feb 4, 2010 at 8:27 PM, Vadlamani, Satish {FLNA}
 wrote:
> Folks:
> I am trying to read in a large file. Definition of large is:
> Number of lines: 333,250
> Size: 850 MB
>
> The machine is a dual-core Intel with 4 GB RAM and nothing else running on
> it. I read the previous threads on read.fwf and did not see any conclusive
> statements on how to read fast. An example record and R code are given below.
> I was hoping to purchase a better machine and do analysis with larger
> datasets - but these preliminary results do not look good.
>
> Does anyone have any experience with large files (> 1GB) and using them with 
> Revolution-R?
>
>
> Thanks.
>
> Satish
>
> Example Code
> key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
> key_names <- 
> c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc")
> key_info <- data.frame(key_vec,key_names)
> col_names <- c(key_names,sas_time$week)
> num_buckets <- rep(12,209)
> width_vec = c(key_vec,num_buckets)
> col_classes<-c(rep("factor",18),rep("numeric",209))
> #threewkoutstat <- 
> read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes,n=100)
> threewkoutstat <- 
> read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes)
> names(threewkoutstat) <- col_names
>
> Example record (only one record pasted below)
> A00400100379949254925004A001002002015002015009        0.00        0.00 ...
> [record abridged: the key fields above are followed by 209 twelve-character
> numeric buckets, almost all 0.00, which wrapped across dozens of lines in
> the original post]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
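[Editorial note: a commonly suggested faster alternative to read.fwf (not from this thread): read lines as text and cut fields out with substring, avoiding read.fwf's temporary-file overhead. The widths below are illustrative, not Satish's actual layout.]

```r
# Hypothetical fast fixed-width parse: readLines + substring
widths <- c(1, 3, 4)                      # illustrative field widths
ends   <- cumsum(widths)                  # last character of each field
starts <- ends - widths + 1               # first character of each field

lines  <- c("A001 123",                   # stand-in for readLines(file)
            "B0020456")
fields <- lapply(seq_along(widths),
                 function(i) substring(lines, starts[i], ends[i]))
df <- data.frame(fields, stringsAsFactors = FALSE)
names(df) <- paste0("f", seq_along(widths))
df$f3 <- as.numeric(df$f3)                # convert bucket columns to numeric
```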

Re: [R] Non-linear regression

2010-02-06 Thread kupz

Agreed, it would be simple to propose the relationship; however, the
regression is necessary to model the data properly. Unfortunately a simple
decay based on those two points does not have the proper shape.
This is due to an extreme amount of zero inflation in this fisheries data.

On another note, I have a working solution for the problem: I am excluding a
portion of the zero data based on some other a priori assumptions. Thanks
for your help though.
-- 
View this message in context: 
http://n4.nabble.com/Non-linear-regression-tp1471736p1471749.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Non-linear regression

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 10:33 PM, kupz wrote:



So I have a data set I would like to model using a non-linear method. I
know it should be an exponential decay. However, I know what the first
derivative of the equation should be at two points, x=0 and x=100. Is
there any way to establish this when inputting the model, or how would
one go about this before the nls statement?


Given the rather simple relationship between the exponential function
and its derivative, why would you need the regression if you already
have those two points for dy/dx (as well as the value of the function
at x=0)? Is this homework?


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Posting an 'S4-creating Package Problem'...

2010-02-06 Thread Daniel Kosztyla

Hello R-Team,

Could you help me post an S4 package-creation problem?
Thanks in advance for your support.
The problem is as follows:

Hello R forum,

While building my R package, the following warnings occur:

...
Warnung in matchSignature(signature, fdef, where) :
  in the method signature for function "plot" no definition for class: 
"prediction"

Warnung in matchSignature(signature, fdef, where) :
  in the method signature for function "plot" no definition for class: 
"validation"

** help
*** installing help indices
...

Maybe my NAMESPACE file is wrong. Does anybody have an idea what it
should look like to solve this problem? (I use exportClasses(...) and
exportMethods(...).)

I have 3 classes ('prediction', 'validation', 'nvalidation'), each of
which has a plot method.

There's no warning for class 'nvalidation' but for the other two.
Any suggestions?

Greetings. Dan



[R] Non-linear regression

2010-02-06 Thread kupz

So I have a data set I would like to model using a non-linear method. I know
it should be an exponential decay. However, I know what the first derivative
of the equation should be at two points, x=0 and x=100. Is there any way to
establish this when inputting the model, or how would one go about this
before the nls statement?

-Thanks, 
Matt
-- 
View this message in context: 
http://n4.nabble.com/Non-linear-regression-tp1471736p1471736.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
By the way, if you use the H2 database with sqldf, there is a second
way to read files in with sqldf.

# 1. run your perl program outside of R to create myfile.csv, say.

# 2. install java from http://java.sun.com
# and then install the RH2 package from CRAN
install.packages("RH2")

# 3. load sqldf and RH2
# sqldf automatically uses H2 database if RH2 is loaded
library(RH2)
library(sqldf)

# 4. read file using sqldf making use of the CSVREAD function in H2
DF <- sqldf("select * from CSVREAD('myfile.csv')")


On Sat, Feb 6, 2010 at 8:37 PM, Gabor Grothendieck
 wrote:
> file= is the input data file. filter= is just a command string that
> specifies a program to run (not a data file).
>
> 1. If Filename.tmp is the name of a temporary file (that it creates)
> it runs a batch command similar to this:
>      paste("cmd /c", filter, "<", file, ">", Filename.tmp)
>
> 2. Then it reads Filename.tmp into the database (which it creates for
> you) and does this without involving R and
>
> 3. finally it reads the table in the database that was created into R,
> as an R dataframe, and destroys the database.
>
>
> On Sat, Feb 6, 2010 at 7:53 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> It did suppress the message now and I was able to load the data. Question.
>>
>> 1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
>> parse_3wkout.pl")
>>
>> In the statement above, should the filename in file= and the file name that 
>> the perl script uses through the filter= command be the same? I would think 
>> not.  I would say that if filter= is passed to the statement, then the 
>> filename should be ignored. Is this how it works?
>>
>> Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 4:58 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> I have uploaded another version which suppresses display of the error
>> message but otherwise works the same.  Omitting the redundant
>> arguments we have:
>>
>> library(sqldf)
>> # next line is only needed once per session to read in devel version
>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>>
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
>> parse_3wkout.pl")
>>
>>
>> On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Gabor:
>>> Please see the results below. Sourcing your new R script worked (although 
>>> with the same error message). If I put eol="\n" option, it is adding a "\r" 
>>> to the last column. I took out the eol option below. This is just some more 
>>> feedback to you.
>>>
>>> I am thinking that I will just do an inline edit in Perl (that is create 
>>> the csv file through Perl by overwriting the current file) and then use 
>>> read.csv.sql without the filter= option. This seems to be more tried and 
>>> tested. If you have any suggestions, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> BEFORE SOURCING YOUR NEW R SCRIPT
>>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>>> Error in readRegistry(key, maxdepth = 3) :
>>>  Registry key 'SOFTWARE\R-core' not found
>>>> test_df
>>> Error: object 'test_df' not found
>>>
>>> AFTER SOURCING YOUR NEW R SCRIPT
>>>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>>> Error in readRegistry(key, maxdepth = 3) :
>>>  Registry key 'SOFTWARE\R-core' not found
>>> In addition: Warning messages:
>>> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
>>> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
>>> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>>> test_df
>>>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
>>> 1       A     4    1   37     99 4925  4925     99      99     4     99
>>> 2       A     4    1   37     99 4925  4925     99      99     4     99
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Saturday, February 06, 2010 4:28 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> The software attempts to read the registry and temporarily augment the
>>> path in case you have Rtools installed so that the filter can access
>>> all the tools that Rtools provides.  I am not sure why it's failing on
>>> your system but there are evidently some differences between systems
>>> here and I have added some code to trap and bypass that portion in
>>> case it fails.  I have added the new version to the svn repository so
>>> try this:
>>>
>>> library(sqldf)
>>> # overwrite with development version
>>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>>> # your code to call read.csv.sql

Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
file= is the input data file. filter= is just a command string that
specifies a program to run (not a data file).

1. If Filename.tmp is the name of a temporary file (that it creates)
it runs a batch command similar to this:
  paste("cmd /c", filter, "<", file, ">", Filename.tmp)

2. Then it reads Filename.tmp into the database (which it creates for
you) and does this without involving R and

3. finally it reads the table in the database that was created into R,
as an R dataframe, and destroys the database.
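[Editorial note: the three steps above can be sketched as follows; the file name and perl filter are hypothetical, and the command string is only assembled here, not executed.]

```r
# Sketch of the command read.csv.sql builds for filter= on Windows
file    <- "3wkoutstatfcst_small.dat"
filter  <- "perl parse_3wkout.pl"   # any program reading stdin, writing stdout
tmpfile <- tempfile()

# Step 1: run the filter outside R, redirecting file into tmpfile
cmd <- paste("cmd /c", filter, "<", file, ">", tmpfile)
# Steps 2-3 then bulk-load tmpfile into the database and read the
# resulting table back into R as a data frame.
cmd
```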


On Sat, Feb 6, 2010 at 7:53 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> It did suppress the message now and I was able to load the data. Question.
>
> 1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
> parse_3wkout.pl")
>
> In the statement above, should the filename in file= and the file name that 
> the perl script uses through the filter= command be the same? I would think 
> not.  I would say that if filter= is passed to the statement, then the 
> filename should be ignored. Is this how it works?
>
> Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:58 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> I have uploaded another version which suppresses display of the error
> message but otherwise works the same.  Omitting the redundant
> arguments we have:
>
> library(sqldf)
> # next line is only needed once per session to read in devel version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl")
>
>
> On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Please see the results below. Sourcing your new R script worked (although 
>> with the same error message). If I put eol="\n" option, it is adding a "\r" 
>> to the last column. I took out the eol option below. This is just some more 
>> feedback to you.
>>
>> I am thinking that I will just do an inline edit in Perl (that is create the 
>> csv file through Perl by overwriting the current file) and then use 
>> read.csv.sql without the filter= option. This seems to be more tried and 
>> tested. If you have any suggestions, please let me know. Thanks.
>> Satish
>>
>>
>> BEFORE SOURCING YOUR NEW R SCRIPT
>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>> test_df
>> Error: object 'test_df' not found
>>
>> AFTER SOURCING YOUR NEW R SCRIPT
>>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>> In addition: Warning messages:
>> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
>> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
>> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>> test_df
>>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
>> 1       A     4    1   37     99 4925  4925     99      99     4     99
>> 2       A     4    1   37     99 4925  4925     99      99     4     99
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 4:28 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> The software attempts to read the registry and temporarily augment the
>> path in case you have Rtools installed so that the filter can access
>> all the tools that Rtools provides.  I am not sure why it's failing on
>> your system but there are evidently some differences between systems
>> here and I have added some code to trap and bypass that portion in
>> case it fails.  I have added the new version to the svn repository so
>> try this:
>>
>> library(sqldf)
>> # overwrite with development version
>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>> # your code to call read.csv.sql
>>
>>
>> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>>
>>> Gabor:
>>> Here is the update. As you can see, I got the same error as below in 1.
>>>
>>> 1. Error
>>>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
>>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>>> Error in readRegistry(key, maxdepth = 3) :
>>>  Registry key 'SOFTWARE\R-core' not found
>>>
>>> 2. But the loading of the bigger file was successful as you can see below. 
>>> 857 MB, 333,250 rows, 227 columns. This is good.
>>>
>>> I will have to just do an inline edit in Perl and change the file to csv
>>> from within R and then call the read.csv.sql.

Re: [R] read and process files line by line

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 4:57 PM, Dick Harray wrote:


Hi there,

I want to read large files line by line in order to process each line
and store the information of each line in an object. My problem is
that I do not know how to process each line of the file, because I
want to avoid importing the whole file.

The data file "inputdata.csv" is a CSV file with a header like:
Number, Name, factors1, factors2
"123", "some characters", "a; list; of; factors", "a; second; list; of; factors"


How can I read the file in a manner like:

foreach_line_in("inputdata.csv") {
 currentline <- function_or_command_to_get_the_current_line()


?readLines



 # { here comes the block in which the current line is processed, and
the new object created
 #  no help needed here}

}

Thanks and regards,
d!rk



David Winsemius, MD
Heritage Laboratories
West Hartford, CT
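[Editorial note: following up on the ?readLines pointer, a minimal sketch of the requested loop on an open connection. The temp file stands in for "inputdata.csv", and the per-line processing is a placeholder.]

```r
# Self-contained stand-in for "inputdata.csv"
tmp <- tempfile(fileext = ".csv")
writeLines(c("Number, Name, factors1, factors2",
             '"123", "some characters", "a; list; of; factors", "a; b"'),
           tmp)

con <- file(tmp, open = "r")
header <- readLines(con, n = 1)                    # consume the header once
records <- list()
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- trimws(strsplit(line, ",")[[1]])       # process the current line
  records[[length(records) + 1]] <- fields         # build the object here
}
close(con)
length(records)   # 1 data record read
```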



Re: [R] Sorting

2010-02-06 Thread David Neu
On Sat, Feb 6, 2010 at 6:30 PM, Hans W Borchers
 wrote:
> David Neu  davidneu.com> writes:
>
>> David Neu  davidneu.com> writes:
>>
>> Hi,
>>
>> I have a list of vectors (of varying lengths).  I'd like to sort this
>> list by applying a function to each pair of vectors in the list and
>> returning information to sorting routine that let's it know which one
>> is larger.
>>
>> To solve problems like this in Common Lisp, the sort function accepts
>> a function as an argument.  The arguments to this function are two
>> elements of the list which is being sorted.  The writer of the
>> function returns t (TRUE in R) when the first argument to the function
>> is larger than the second and nil (FALSE in R) otherwise.
>>
>> I'm wondering if there is some way to accomplish this in R.
>
> I don't know whether there is a way to do it with the 'base::sort' function
> -- and I too would very much like to know for an application of my own --,
> but you can always define your own sorting, like here a simple bubble sort:
>
>    bubbleSort.list <- function(L, comp) {
>        stopifnot(is.list(L))
>        n <- length(L)
>        if (n <= 1) return(L)
>        for (i in 1:n) {
>            for (j in 2:n) {
>                b <- L[[j]]
>                if (comp(L[[j]], L[[j-1]])) {
>                    L[[j]] <- L[[j-1]]
>                    L[[j-1]] <- b
>                }
>            }
>        }
>        return(L)
>    }
>
> If your comparison function, for example, compares first by length and then by mean:
>
>    comp <- function(L1, L2) {
>      if (length(L1) < length(L2) ||
>          (length(L1) == length(L2) && mean(L1) < mean(L2)))
>        return(TRUE)
>      else
>        return(FALSE)
>    }
>
> then the following test example will turn out to be correct:
>
>    L <- list()
>    for (i in 1:100) L[[i]] <- runif(sample(1:20, 1))
>
>    Ls <- bubbleSort.list(L, comp)
>    is.unsorted(sapply(Ls, length))  # incl. mean for equal length
>
> If bubblesort is too slow, implement your own heapsort or quicksort.
>
>> Many thanks for any help!
>>
>> Cheers,
>> David
>>
>
>

Hi Hans,

Great minds think alike! :)  I also decided to write my own sorting
routine.  I happened to pick insertion sort.

I do think it would be a very nice feature to have in R. I've found it
useful in other languages, e.g.
http://www.lispworks.com/documentation/lw50/CLHS/Body/f_sort_.htm
http://wiki.python.org/moin/HowTo/Sorting

Thanks for all of your help and for the confirmation that this is the way to go!

Cheers,
David
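[Editorial note: for this particular length-then-mean comparison, a vectorized alternative to a hand-written sort is to build sort keys with sapply and use order(); a sketch, reusing the toy list construction from Hans's example.]

```r
set.seed(1)
L <- lapply(sample(1:20, 100, replace = TRUE), runif)

# order by length, breaking ties by mean -- the same ordering the
# custom comparator in this thread encodes
Ls <- L[order(sapply(L, length), sapply(L, mean))]

is.unsorted(sapply(Ls, length))   # FALSE
```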



Re: [R] The KJV

2010-02-06 Thread Ben Bolker
Jim Lemon  bitwrit.com.au> writes:

> 
> On 02/06/2010 06:57 PM, Charlotte Maia wrote:
> > Hey all,
> >
> > Does anyone know if there are any R packages with a copy of the KJV?
> > I'm guessing the answer is no...
> >
> > So the next question, and the more important one is:
> > Does anyone think it would be useful (e.g. for text-mining purposes)?
> > I know almost nothing about theology,
> > so I'm not sure what kind of questions theologists might have (that R
> > could answer).
> >
> > An alternative, that would achieve a similar result (I think),
> > would be an R interface to another open source system, such as Sword.
> >
> Hi Charlotte,
> Try
> 
> http://www.gutenberg.org/etext/10
> 
> Jim
> 

 I couldn't help it:

x <- url("http://www.gutenberg.org/dirs/etext90/kjv10.txt", open="r")
X <- readLines(x)                       # read the whole text
close(x)
z <- grep("First Book of Moses", X)[1]  # drop the Project Gutenberg header
X <- X[-(1:z)]
X <- X[nchar(X) > 0]
length(X) ## 15058
words <- tolower(unlist(strsplit(X, "[ .,:;()]")))
words2 <- grep("[^0-9]", words, value=TRUE)   # drop verse numbers
tt <- rev(sort(table(words2)))
barplot(rev(tt[1:100]), horiz=TRUE, las=1, cex.names=0.4, log="x")



Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
It did suppress the message now and I was able to load the data. Question.

1. test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl 
parse_3wkout.pl") 

In the statement above, should the filename in file= and the file name that the 
perl script uses through the filter= command be the same? I would think not.  I 
would say that if filter= is passed to the statement, then the filename should 
be ignored. Is this how it works?

Thanks.
Satish


-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 4:58 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

I have uploaded another version which suppresses display of the error
message but otherwise works the same.  Omitting the redundant
arguments we have:

library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl")


On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although 
> with the same error message). If I put eol="\n" option, it is adding a "\r" 
> to the last column. I took out the eol option below. This is just some more 
> feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is create the 
> csv file through Perl by overwriting the current file) and then use 
> read.csv.sql without the filter= option. This seems to be more tried and 
> tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1       A     4    1   37     99 4925  4925     99      99     4     99
> 2       A     4    1   37     99 4925  4925     99      99     4     99
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the
> path in case you have Rtools installed so that the filter can access
> all the tools that Rtools provides.  I am not sure why it's failing on
> your system but there are evidently some differences between systems
> here and I have added some code to trap and bypass that portion in
> case it fails.  I have added the new version to the svn repository so
> try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> # your code to call read.csv.sql
>
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
>  wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful as you can see below. 
>> 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl and change the file to csv 
>> from within R and then call the read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>>  system.time(test_df <- read.csv.sql(file="out.txt"))
>>   user  system elapsed
>>  192.53   15.50  213.68
>> Warning message:
>> closing unused connection 3 (out.txt)
>>
>> Thanks again.
>>
>> Satish
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 3:02 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Note that you can shorten #1 to read.csv.sql("out.txt") since your
>> other arguments are the default values.
>>
>> For the second one, use read.csv.sql, eliminate 

Re: [R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
OK, I studied the source code of melt.data.frame. With na.rm=T, melt
operates normally, except that it deletes rows from the molten
data frame that have NAs in the value column. NAs in the id.vars are
not touched. This could be clearer in the documentation, especially as
it seems that earlier versions of reshape behaved differently.

Best,

  Titus
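[Editorial note: the behavior Titus describes can be illustrated with a hand-built "molten" data frame, so the sketch runs without the reshape package installed.]

```r
# Hand-built molten data frame: an id column plus variable/value columns
molten <- data.frame(id       = c(1, NA, 2, 2),
                     variable = c("x", "x", "y", "y"),
                     value    = c(10, 20, NA, 40))

# What melt(..., na.rm = TRUE) does: drop rows with NA in 'value' only;
# NAs in the id variables survive
molten_na_rm <- molten[!is.na(molten$value), ]

nrow(molten_na_rm)   # 3 rows: the NA-id row is kept, the NA-value row dropped
```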



Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Matthew Keller
Hi Jay and Benilton,

Thank you both for your help. When I do not use the dimnames argument,
everything works fine:

x <- big.matrix(nrow=2e4,ncol=5e5,type='short',init=0) #18 Gb RAM used
rm(x) #18 Gb RAM used
gc() #no RAM used

However, when I use dimnames, I get this problem, reproducibly:

x <- big.matrix(nrow=2e4,ncol=5e5,type='short',init=0,dimnames=list(2e4,5e5))
#18 Gb RAM used
rm(x) #18 Gb RAM used
gc() #18 Gb RAM used

I looked in /dev/shm throughout this and saw nothing there. Might
these files be stored elsewhere?

In any event, I have an easy work-around now, so this isn't a big
deal, but is it a bug in bigmemory that it doesn't return RAM when the
dimnames argument is used? Here's my sessionInfo():

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
[1] C

attached base packages:
[1] graphics  grDevices datasets  utils stats methods   base

other attached packages:
[1] doMC_1.2.0  multicore_0.1-3 foreach_1.3.0   codetools_0.2-2
iterators_1.0.3 pnmath0_0.0-2




On Sat, Feb 6, 2010 at 2:44 PM, Jay Emerson  wrote:
>
 See inline for responses.  But people are always welcome to contact
 us directly.
>
> Hi all,
>
> I'm on a Linux server with 48Gb RAM. I did the following:
>
> x <-
> big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
> #Gets around the 2^31 issue - yeah!
>
 We strongly discourage use of dimnames.
>
> in Unix, when I hit the "top" command, I see R is taking up about 18Gb
> RAM, even though the object x is 0 bytes in R. That's fine: that's how
> bigmemory is supposed to work I guess. My question is how do I return
> that RAM to the system once I don't want to use x any more? E.g.,
>
> rm(x)
>
> then "top" in Unix, I expect that my RAM footprint is back ~0, but it
> remains at 18Gb. How do I return RAM to the system?
>
>>> It can take a while for the OS to free up memory, even after a gc().
>>> But it's available for re-use; if you want to be really sure, have a
>>> look in /dev/shm to make sure the shared memory segments have been
>>> deleted.
>
> Thanks,
>
> Matt
>
> --
> John W. Emerson (Jay)
> Associate Professor of Statistics
> Department of Statistics
> Yale University
> http://www.stat.yale.edu/~jay
>



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting

2010-02-06 Thread Hans W Borchers
David Neu  davidneu.com> writes:

> David Neu  davidneu.com> writes:
>
> Hi,
>
> I have a list of vectors (of varying lengths).  I'd like to sort this
> list by applying a function to each pair of vectors in the list and
> returning information to sorting routine that let's it know which one
> is larger.
>
> To solve problems like this in Common Lisp, the sort function accepts
> a function as an argument.  The arguments to this function are two
> elements of the list which is being sorted.  The writer of the
> function returns t (TRUE in R) when the first argument to the function
> is larger than the second and nil (FALSE in R) otherwise.
>
> I'm wondering if there is some way to accomplish this in R.

I don't know whether there is a way to do it with the 'base::sort' function
-- and I too would very much like to know, for an application of my own --
but you can always define your own sorting routine, for example a simple bubble sort:

bubbleSort.list <- function(L, comp) {
    stopifnot(is.list(L))
    n <- length(L)
    if (n <= 1) return(L)
    for (i in 1:n) {
        for (j in 2:n) {
            b <- L[[j]]
            if (comp(L[[j]], L[[j-1]])) {
                L[[j]] <- L[[j-1]]
                L[[j-1]] <- b
            }
        }
    }
    return(L)
}

If your comparison function, for example, compares first by length and then by mean:

comp <- function(L1, L2) {
    # longer vectors first; ties broken by larger mean
    if (length(L1) != length(L2)) return(length(L1) > length(L2))
    mean(L1) > mean(L2)
}

> Many thanks for any help!
>
> Cheers,
> David
>
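As a quick sanity check, here is the bubble sort together with a comparator of that form (both repeated so the snippet runs standalone; the comparator body is my guess at the truncated original: longer vectors first, ties broken by mean):

```r
# Comparator: TRUE when L1 should come before L2
# (assumed semantics: longer vectors first, ties broken by larger mean).
comp <- function(L1, L2) {
    if (length(L1) != length(L2)) return(length(L1) > length(L2))
    mean(L1) > mean(L2)
}

bubbleSort.list <- function(L, comp) {
    stopifnot(is.list(L))
    n <- length(L)
    if (n <= 1) return(L)
    for (i in 1:n) {
        for (j in 2:n) {
            b <- L[[j]]
            if (comp(L[[j]], L[[j-1]])) {   # bubble "larger" elements toward the front
                L[[j]] <- L[[j-1]]
                L[[j-1]] <- b
            }
        }
    }
    L
}

L <- list(c(1, 2), c(5, 6, 7), 3)
sorted <- bubbleSort.list(L, comp)
lengths(sorted)  # 3 2 1: the longest vector ends up first
```

For long lists an O(n^2) bubble sort will be slow; when the comparison can be expressed as a single numeric score per element, sorting indices with order() on a precomputed key is much faster.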



Re: [R] Plot of odds ratios obtained from a logistic model

2010-02-06 Thread Frank E Harrell Jr
Also take a look at summary.rms and its plot method which produces odds 
ratio dot charts directly.


Frank

David Winsemius wrote:


On Feb 6, 2010, at 4:13 PM, David Freedman wrote:



You might want to look at the plot.Predict function in the rms package - it
allows you to plot the logits or probabilities vs the predictor variable at
specified levels of other covariates (if any) in the model.  There are many
examples in http://cran.r-project.org/web/packages/rms/rms.pdf


But it will not work on a glm( ..., family=binomial)  object. To work 
with such an object you would need to plot exp(fit$linear.predictors) or 
fit$fitted.values


plot.Predict will only work with the functions in the rms package, ... 
in the case of logistic models that would be lrm().




David Freedman
--



David Winsemius, MD
Heritage Laboratories
West Hartford, CT





--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
On Sat, Feb 6, 2010 at 8:23 PM, hadley wickham  wrote:
> The latest version of reshape is 0.8.3 - perhaps upgrading will fix
> your problem.

Thanks for your response, Hadley!

I just did the upgrade on the Linux system.  On OSX I was already at
0.8.3.  Now, I get the same result on both systems.  However, the
result includes the NAs although I said na.rm=T:

library(reshape)

x <- read.table(textConnection("char trial wn
p E10I13D0  4
r E10I13D0  4
a E10I13D0  4
c E10I13D0  4
t E10I13D0  4
i E10I13D0  4
c E10I13D0  4
e E10I13D0  4
d E10I13D0  4
, E10I13D0 NA"), head=T)

melt(x, measure.vars="char", na.rm=T)
  trial wn variable value
1  E10I13D0  4 char p
2  E10I13D0  4 char r
3  E10I13D0  4 char a
4  E10I13D0  4 char c
5  E10I13D0  4 char t
6  E10I13D0  4 char i
7  E10I13D0  4 char c
8  E10I13D0  4 char e
9  E10I13D0  4 char d
10 E10I13D0 NA char ,

The documentation says "na.rm: Should NA values be removed from the
data set?".  Am I getting something wrong?

  Titus



Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
I have uploaded another version which suppresses display of the error
message but otherwise works the same.  Omitting the redundant
arguments we have:

library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl")


On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although 
> with the same error message). If I put eol="\n" option, it is adding a "\r" 
> to the last column. I took out the eol option below. This is just some more 
> feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is create the 
> csv file through Perl by overwriting the current file) and then use 
> read.csv.sql without the filter= option. This seems to be more tried and 
> tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1       A     4    1   37     99 4925  4925     99      99     4     99
> 2       A     4    1   37     99 4925  4925     99      99     4     99
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the
> path in case you have Rtools installed so that the filter can access
> all the tools that Rtools provides.  I am not sure why it's failing on
> your system, but there are evidently some differences between systems
> here, and I have added some code to trap and bypass that portion in
> case it fails.  I have added the new version to the svn repository so
> try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> # your code to call read.csv.sql
>
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
>  wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful as you can see below. 
>> 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl and change the file to csv 
>> from within R and then call the read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>>  system.time(test_df <- read.csv.sql(file="out.txt"))
>>   user  system elapsed
>>  192.53   15.50  213.68
>> Warning message:
>> closing unused connection 3 (out.txt)
>>
>> Thanks again.
>>
>> Satish
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 3:02 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Note that you can shorten #1 to read.csv.sql("out.txt") since your
>> other arguments are the default values.
>>
>> For the second one, use read.csv.sql, eliminate the arguments that are
>> defaults anyways (should not cause a problem but its error prone) and
>> add an explicit eol= argument since SQLite can have problems with end
>> of line in some cases.  Also test out your perl script separately from
>> R first to ensure that it works:
>>
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
>> parse_3wkout.pl", eol = "\n")
>>
>> SQLite has some known problems with end of line so try it with and
>> without the eol= argument just in case.  When I just made up the
>> following gawk example I noticed that I did need to specify the eol=
>> argument.
>>
>> Also I have added a complete example using gawk as Example 13c on the
>

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
Please see the results below. Sourcing your new R script worked (although with 
the same error message). If I put eol="\n" option, it is adding a "\r" to the 
last column. I took out the eol option below. This is just some more feedback 
to you.

I am thinking that I will just do an inline edit in Perl (that is create the 
csv file through Perl by overwriting the current file) and then use 
read.csv.sql without the filter= option. This seems to be more tried and 
tested. If you have any suggestions, please let me know. Thanks.
Satish


BEFORE SOURCING YOUR NEW R SCRIPT
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
> file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
> test_df
Error: object 'test_df' not found

AFTER SOURCING YOUR NEW R SCRIPT
> source("f:/dp_modeling_team/downloads/R/sqldf.R")
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
> file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 5 (3wkoutstatfcst_small.dat) 
2: closing unused connection 4 (3wkoutstatfcst_small.dat) 
3: closing unused connection 3 (3wkoutstatfcst_small.dat) 
> test_df
   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
1       A     4    1   37     99 4925  4925     99      99     4     99
2       A     4    1   37     99 4925  4925     99      99     4     99

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 4:28 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

The software attempts to read the registry and temporarily augment the
path in case you have Rtools installed so that the filter can access
all the tools that Rtools provides.  I am not sure why it's failing on
your system, but there are evidently some differences between systems
here, and I have added some code to trap and bypass that portion in
case it fails.  I have added the new version to the svn repository so
try this:

library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
# your code to call read.csv.sql


On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
 wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful as you can see below. 
> 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from 
> within R and then call the read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
>  system.time(test_df <- read.csv.sql(file="out.txt"))
>   user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
>
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your
> other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are
> defaults anyways (should not cause a problem but its error prone) and
> add an explicit eol= argument since SQLite can have problems with end
> of line in some cases.  Also test out your perl script separately from
> R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line so try it with and
> without the eol= argument just in case.  When I just made up the
> following gawk example I noticed that I did need to specify the eol=
> argument.
>
> Also I have added a complete example using gawk as Example 13c on the
> home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
>
> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>>
>> I had success with the following.
>> 1. I created a csv file with a perl script called "out.txt". Then ran the 
>> following successfully
>> library("sqldf")
>> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
>> TRUE, sep = ",", dbname = tempfile())
>>
>> 2. I did not have success with the following. Could you tell me what I may 
>> be doing wrong? I could paste the perl scri

Re: [R] Random number quality

2010-02-06 Thread Thomas Lumley

On Sat, 6 Feb 2010, Patrick Burns wrote:


A couple comments.

Although pseudo-random numbers were originally
used because of necessity rather than choice,
there is a definite upside to using them.  That
upside is that the computations become reproducible
if you set the seed first (see 'set.seed').

I tend to encourage skepticism at pretty much
every turn.  But I find this piece of skepticism
a bit misplaced.  The application that you describe
does not sound at all demanding, and R Core is
populated by some of the best statistical computing
people in the world.
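The reproducibility upside mentioned above takes only a few lines to see (base R, any platform):

```r
set.seed(42)            # fix the PRNG state
a <- sample(1:1e6, 5)

set.seed(42)            # same seed again
b <- sample(1:1e6, 5)

identical(a, b)  # TRUE: the same seed reproduces exactly the same draws
```

This is why simulation studies conventionally record the seed alongside the results: anyone can rerun the exact same random sequence later.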



It depends on the purpose that the random numbers are needed for.  For 
statistical simulation the default generators are good, and if you want to be 
even more sure you can run a simulation again with a different generator.

There are some purposes for which the generators are inadequate

1) they are not cryptographically secure: it is feasible to work out the random 
seed and hence the future sequence by observing enough of the output.  They 
cannot be used to generate numbers that must be unpredictable to an intelligent 
adversary. For many applications like this you wouldn't want to use numbers 
from random.org either -- they are sent over the public networks, after all.


2) they may not be random enough for some number-theoretic algorithms.  For 
example, there is an efficient algorithm for finding prime numbers based on 
random choices, but no efficient deterministic algorithm is known and it is an 
open question whether an efficient deterministic algorithm even exists.  It is 
possible that simple random number generators could give substantially worse 
performance in random algorithms of this sort, though the limited empirical 
evidence I am aware of is in the other direction.


  -thomas


On 05/02/2010 22:04, b k wrote:

Hello,

I'm running R 2.10.1 on Windows Vista. I'm selecting a random sample of
several hundred items out of a larger population of several thousand. I
realize there is srswor() in package sampling for exactly this purpose, but
as far as I can tell it uses the native PRNG which may or may not be random
enough. Instead I used the random package which pulls random numbers from
random.org, although in my extended reading  [vignette("random-intro",
package="random")] it seems like that may have problems also.

I'm curious what the general consensus is for random number quality for both
the native built-in PRNG and any alternatives including the random package.

Thanks,
Ben K.





--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')




Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
The software attempts to read the registry and temporarily augment the
path in case you have Rtools installed so that the filter can access
all the tools that Rtools provides.  I am not sure why it's failing on
your system, but there are evidently some differences between systems
here, and I have added some code to trap and bypass that portion in
case it fails.  I have added the new version to the svn repository so
try this:

library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
# your code to call read.csv.sql


On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
 wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful as you can see below. 
> 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from 
> within R and then call the read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
>  system.time(test_df <- read.csv.sql(file="out.txt"))
>   user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
>
> Satish
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your
> other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are
> defaults anyways (should not cause a problem but its error prone) and
> add an explicit eol= argument since SQLite can have problems with end
> of line in some cases.  Also test out your perl script separately from
> R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
> parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line so try it with and
> without the eol= argument just in case.  When I just made up the
> following gawk example I noticed that I did need to specify the eol=
> argument.
>
> Also I have added a complete example using gawk as Example 13c on the
> home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
>
> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>>
>> I had success with the following.
>> 1. I created a csv file with a perl script called "out.txt". Then ran the 
>> following successfully
>> library("sqldf")
>> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
>> TRUE, sep = ",", dbname = tempfile())
>>
>> 2. I did not have success with the following. Could you tell me what I may 
>> be doing wrong? I could paste the perl script if necessary. From the perl 
>> script, I am reading the file, creating the csv record and printing each 
>> record one by one and then exiting.
>>
>> Thanks.
>>
>> Not had success with below..
>> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> test_df
>>
>> Error message below:
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>> In addition: Warning messages:
>> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
>> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
>> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
>> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
>> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>>> = tempfile())
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 12:14 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> No.
>>
>> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Gabor:
>>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>>> Satish
>>>
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Saturday, February 06, 2010 9

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}

Gabor:
Here is the update. As you can see, I got the same error as below in 1.

1. Error
 test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found 

2. But the loading of the bigger file was successful as you can see below. 857 
MB, 333,250 rows, 227 columns. This is good.

I will have to just do an inline edit in Perl and change the file to csv from 
within R and then call the read.csv.sql. 

If you have any suggestions to fix 1, I would like to try them.

 system.time(test_df <- read.csv.sql(file="out.txt"))
   user  system elapsed 
 192.53   15.50  213.68 
Warning message:
closing unused connection 3 (out.txt) 

Thanks again.

Satish

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 3:02 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

Note that you can shorten #1 to read.csv.sql("out.txt") since your
other arguments are the default values.

For the second one, use read.csv.sql, eliminate the arguments that are
defaults anyways (should not cause a problem but its error prone) and
add an explicit eol= argument since SQLite can have problems with end
of line in some cases.  Also test out your perl script separately from
R first to ensure that it works:

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl", eol = "\n")

SQLite has some known problems with end of line so try it with and
without the eol= argument just in case.  When I just made up the
following gawk example I noticed that I did need to specify the eol=
argument.

Also I have added a complete example using gawk as Example 13c on the
home page just now:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql


On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
>
> I had success with the following.
> 1. I created a csv file with a perl script called "out.txt". Then ran the 
> following successfully
> library("sqldf")
> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
> TRUE, sep = ",", dbname = tempfile())
>
> 2. I did not have success with the following. Could you tell me what I may be 
> doing wrong? I could paste the perl script if necessary. From the perl 
> script, I am reading the file, creating the csv record and printing each 
> record one by one and then exiting.
>
> Thanks.
>
> Not had success with below..
> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> test_df
>
> Error message below:
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 12:14 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> No.
>
> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 9:41 AM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Its just any Windows batch command string that filters stdin to
>> stdout.  What the command consists of should not be important.   An
>> invocation of perl that runs a perl script that filters stdin to
>> stdout might look like this:
>>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>>
>> For an actual example see the source of read.csv2.sql which defaults
>> to using a Windows vbscript program as a filter.
>>
>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Jim, Gabor:
>>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>>> goi

Re: [R] read and process files line by line

2010-02-06 Thread Gabor Grothendieck
Check this:

https://stat.ethz.ch/pipermail/r-help/2009-March/192942.html
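For reference, the usual base-R pattern for this (a sketch independent of the linked post, with a hypothetical temp file standing in for inputdata.csv) is to open a connection and pull one line at a time with readLines(con, n = 1):

```r
# Write a tiny stand-in for "inputdata.csv" so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c('Number, Name, factors1, factors2',
             '"123", "some characters", "a; list; of; factors", "a; second; list"'),
           tmp)

con <- file(tmp, open = "r")
header <- readLines(con, n = 1)                      # consume the header once
nlines <- 0
while (length(line <- readLines(con, n = 1)) > 0) {  # zero-length result = end of file
    nlines <- nlines + 1
    fields <- strsplit(line, ",")[[1]]               # process the current line here
}
close(con)
unlink(tmp)
nlines  # each data line was visited exactly once
```

Memory use stays constant regardless of file size; for higher throughput, read in chunks (e.g. readLines(con, n = 10000)) and process each chunk with vectorized code.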

On Sat, Feb 6, 2010 at 4:57 PM, Dick Harray  wrote:
> Hi there,
>
> I want to read large files line by line in order to process each line
> and store the information of each line in an object. My problem is
> that I do not know how to process each line of the file, because I
> want to avoid importing the whole file.
>
> The data file "inputdata.csv" is a CSV file with a header like:
> Number, Name, factors1, factors2
> "123", "some characters", "a; list; of; factors", "a; second; list; of; 
> factors"
>
> How can I read the file in a manner like:
>
> foreach_line_in("inputdata.csv") {
>  currentline <- function_or_command_to_get_the_current_line()
>
>  # { here comes the block in which the current line is processed, and
> the new object created
>  #  no help needed here}
>
> }
>
> Thanks and regards,
> d!rk
>
>



Re: [R] RBloomberg on Mac Leopard

2010-02-06 Thread christophe ollier

Lucio Intelligente  gmail.com> writes:

> 
> Hi,
> 
> I'm running R 2.10.1 GUI 1.31 Leopard build 64-bit (5537).
> 
> I cannot install RBloomberg on my Mac. After I type:
> 
> install.packages("RBloomberg", repos="http://R-Forge.R-project.org")
> 
> I get the following message:
> 
> Warning in install.packages("RBloomberg", repos =
"http://R-Forge.R-project.org") :
>   argument 'lib' is missing: using '/Users/Lucio/Library/R/2.10/library'
> Warning message:
> In getDependencies(pkgs, dependencies, available, lib) :
>   package ‘RBloomberg’ is not available
> 
> Could you please help?
> Thanks in advance
> 
> Cheers,
> Lucio
> 
> 
Hi Lucio

I guess you need RDCOMClient to run the RBloomberg package, which means it is
only available on Windows, as far as I know.

Cheers,
Christophe



[R] read and process files line by line

2010-02-06 Thread Dick Harray
Hi there,

I want to read large files line by line in order to process each line
and store the information of each line in an object. My problem is
that I do not know how to process each line of the file, because I
want to avoid importing the whole file.

The data file "inputdata.csv" is a CSV file with a header like:
Number, Name, factors1, factors2
"123", "some characters", "a; list; of; factors", "a; second; list; of; factors"

How can I read the file in a manner like:

foreach_line_in("inputdata.csv") {
  currentline <- function_or_command_to_get_the_current_line()

  # { here comes the block in which the current line is processed, and
the new object created
  #  no help needed here}

}
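A hedged base-R sketch of this pattern (the function name `process_by_line` and the comma-splitting logic are invented for illustration): a file connection plus `readLines(con, n = 1)` keeps only one line in memory at a time.

```r
## Sketch: process a file line by line without loading it all into memory.
process_by_line <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))                 # make sure the connection is closed
  results <- list()
  header <- readLines(con, n = 1)     # read (and here, skip) the header
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break      # end of file reached
    fields <- strsplit(line, ",")[[1]]
    results[[length(results) + 1]] <- fields  # build the new object here
  }
  results
}
```

For big files, reading in chunks (e.g. `readLines(con, n = 1000)`) and looping over each chunk is usually much faster than one line at a time.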

Thanks and regards,
d!rk



Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Jay Emerson
>>> See inline for responses.  But people are always welcome to contact
>>> us directly.

Hi all,

I'm on a Linux server with 48Gb RAM. I did the following:

x <-
big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
#Gets around the 2^31 issue - yeah!

>>> We strongly discourage use of dimnames.

in Unix, when I hit the "top" command, I see R is taking up about 18Gb
RAM, even though the object x is 0 bytes in R. That's fine: that's how
bigmemory is supposed to work I guess. My question is how do I return
that RAM to the system once I don't want to use x any more? E.g.,

rm(x)

then "top" in Unix, I expect that my RAM footprint is back ~0, but it
remains at 18Gb. How do I return RAM to the system?

>>> It can take a while for the OS to free up memory, even after a gc().
>>> But it's available for re-use; if you want to be really sure, have a look
>>> in /dev/shm to make sure the shared memory segments have been deleted.
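The release-and-verify cycle described above can be sketched as follows (assumes the bigmemory package is installed; the small dimensions are illustrative, not the 18 GB matrix from the thread):

```r
## Create a small shared big.matrix, drop it, and let gc() run finalizers.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  x <- bigmemory::big.matrix(nrow = 1000, ncol = 1000,
                             type = "short", init = 0)
  rm(x)                                 # drop the R reference
  gc()                                  # allows the C++ finalizer to run
  if (dir.exists("/dev/shm"))           # Linux shared-memory mount
    print(list.files("/dev/shm"))       # leftover segments would show here
}
```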

Thanks,

Matt

-- 
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay 




Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Benilton Carvalho
Hi Matt,

what's your sessionInfo()? Can you try installing bigmemory as follows:

install.packages("bigmemory", repos="http://R-Forge.R-project.org")

it'll get you the latest version, in which I cannot reproduce the
problem you're reporting (ie, after gc(), I get all the RAM back)

b

On Sat, Feb 6, 2010 at 7:37 PM, Matthew Keller  wrote:
> Hi Steve and other R folks,
>
> Thanks for the suggestion. No - that doesn't work. I meant to put that
> into my original email. To recap
>
> x <- 
> big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
> #Gets around the 2^31 issue - yay!
> #R takes 18 Gb RAM, so says top
>
> rm(x)  #top says R still takes 18Gb RAM
> gc()   #top says R still takes 18Gb RAM
>
> How do I flush the memory? I thought maybe R/bigmemory would give up
> the RAM if it was needed elsewhere, but apparently not:
>
> y <- 
> big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
> #takes *another* 18Gb RAM, and takes it away from several other
> processes I had running - OUCH!
>
> Any help would be appreciated.
>
> As an aside, I just want to say "thank you" to the teams developing
> bigmemory, ff, and other packages meant to allow users of large
> datasets to still use R.
>
> Matt
>
>
>
> On Fri, Feb 5, 2010 at 9:27 PM, Steve Lianoglou
>  wrote:
>> Hi,
>>
>> On Fri, Feb 5, 2010 at 9:24 PM, Matthew Keller  
>> wrote:
>>> Hi all,
>>>
>>> I'm on a Linux server with 48Gb RAM. I did the following:
>>>
>>> x <- 
>>> big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
>>> #Gets around the 2^31 issue - yeah!
>>>
>>> in Unix, when I hit the "top" command, I see R is taking up about 18Gb
>>> RAM, even though the object x is 0 bytes in R. That's fine: that's how
>>> bigmemory is supposed to work I guess. My question is how do I return
>>> that RAM to the system once I don't want to use x any more? E.g.,
>>>
>>> rm(x)
>>>
>>> then "top" in Unix, I expect that my RAM footprint is back ~0, but it
>>> remains at 18Gb. How do I return RAM to the system?
>>
>> Maybe forcing R to do garbage collection might help?
>>
>> Try calling `gc()` after your call to `rm(x)` and see what `top` tells you.
>>
>> Did that do the trick?
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>
>
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>



Re: [R] Plot of odds ratios obtained from a logistic model

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 4:13 PM, David Freedman wrote:



> You might want to look at the plot.Predict function in the rms package - it
> allows you to plot the logits or probabilities vs the predictor variable at
> specified levels of other covariates (if any) in the model.  There are many
> examples in http://cran.r-project.org/web/packages/rms/rms.pdf


But it will not work on a glm(..., family=binomial) object. To work
with such an object you would need to plot exp(fit$linear.predictors)
or fit$fitted.values.

plot.Predict will only work with the fitting functions in the rms
package; in the case of logistic models that would be lrm().
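For a plain glm() fit, the advice above can be sketched on simulated data (the variable names and the simulation are made up, not from the original post):

```r
## Simulate a binary outcome and fit a logistic model with base glm().
set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, plogis(-1 + 3 * x))
fit <- glm(y ~ x, family = binomial)

## Plot the fitted odds against the predictor.
ord <- order(x)
plot(x[ord], exp(fit$linear.predictors)[ord], type = "l",
     xlab = "x", ylab = "odds")
## fit$fitted.values holds the fitted probabilities instead.
```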




David Freedman



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Plot of odds ratios obtained from a logistic model

2010-02-06 Thread David Freedman

You might want to look at the plot.Predict function in the rms package - it
allows you to plot the logits or probabilities vs the predictor variable at
specified levels of other covariates (if any) in the model.  There are many
examples in http://cran.r-project.org/web/packages/rms/rms.pdf

David Freedman



Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
Note that you can shorten #1 to read.csv.sql("out.txt") since your
other arguments are the default values.

For the second one, use read.csv.sql and eliminate the arguments that
are defaults anyway (they should not cause a problem, but they are error
prone), and add an explicit eol= argument, since SQLite can have
problems with end-of-line characters in some cases. Also, test your
perl script separately from R first to ensure that it works:

test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
parse_3wkout.pl", eol = "\n")

SQLite has some known problems with end of line so try it with and
without the eol= argument just in case.  When I just made up the
following gawk example I noticed that I did need to specify the eol=
argument.

Also I have added a complete example using gawk as Example 13c on the
home page just now:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
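If writing an external filter is inconvenient, the fixed-width-to-CSV conversion can also be sketched in base R (the function name `fwf_to_csv` and the widths in the test are hypothetical, not the real 227-column layout from the original post):

```r
## Convert a fixed-width file to CSV using substring() on each line.
fwf_to_csv <- function(infile, outfile, widths) {
  starts <- cumsum(c(1, widths[-length(widths)]))  # first column of each field
  ends   <- cumsum(widths)                         # last column of each field
  lines  <- readLines(infile)
  out <- vapply(lines, function(ln)
    paste(trimws(substring(ln, starts, ends)), collapse = ","),
    character(1), USE.NAMES = FALSE)
  writeLines(out, outfile)
}
```

The resulting CSV can then be fed to read.csv.sql without any filter= argument.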


On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
>
> I had success with the following.
> 1. I created a csv file with a perl script called "out.txt". Then ran the 
> following successfully
> library("sqldf")
> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
> TRUE, sep = ",", dbname = tempfile())
>
> 2. I did not have success with the following. Could you tell me what I may be 
> doing wrong? I could paste the perl script if necessary. From the perl 
> script, I am reading the file, creating the csv record and printing each 
> record one by one and then exiting.
>
> Thanks.
>
> Not had success with below..
> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> test_df
>
> Error message below:
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 12:14 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> No.
>
> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Gabor:
>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 9:41 AM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> It's just any Windows batch command string that filters stdin to
>> stdout.  What the command consists of should not be important.   An
>> invocation of perl that runs a perl script that filters stdin to
>> stdout might look like this:
>>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>>
>> For an actual example see the source of read.csv2.sql which defaults
>> to using a Windows vbscript program as a filter.
>>
>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Jim, Gabor:
>>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>>> going to read the documentation of the filter argument and see if it can take 
>>> a decent sized Perl script and then use its output as input.
>>>
>>> Suppose that I write a Perl script that parses this fwf file and creates a 
>>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>>> a statement or something? If you know the answer, please let me know. 
>>> Otherwise, I will try a few things and report back the results.
>>>
>>> Thanks again.
>>> Satish
>>>
>>>
>>> -Original Message-
>>> From: jim holtman [mailto:jholt...@gmail.com]
>>> Sent: Saturday, February 06, 2010 6:16 AM
>>> To: Gabor Grothendieck
>>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>>
>>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>>  wrote:
 Note that the filter= argument on read.csv.sql can be used to pass the
 input through a filter written in perl, [g]awk or other language.

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:

I had success with the following.
1. I created a csv file with a perl script called "out.txt". Then ran the 
following successfully
library("sqldf")
test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
TRUE, sep = ",", dbname = tempfile())

2. I did not have success with the following. Could you tell me what I may be 
doing wrong? I could paste the perl script if necessary. From the perl script, 
I am reading the file, creating the csv record and printing each record one by 
one and then exiting.

Thanks.

Not had success with below..
#test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
tempfile())
test_df 

Error message below:
test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * from 
file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
tempfile())
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 14 (3wkoutstatfcst_small.dat) 
2: closing unused connection 13 (3wkoutstatfcst_small.dat) 
3: closing unused connection 11 (3wkoutstatfcst_small.dat) 
4: closing unused connection 9 (3wkoutstatfcst_small.dat) 
5: closing unused connection 3 (3wkoutstatfcst_small.dat) 
> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname = 
> tempfile())
Error in readRegistry(key, maxdepth = 3) : 
  Registry key 'SOFTWARE\R-core' not found

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 12:14 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> It's just any Windows batch command string that filters stdin to
> stdout.  What the command consists of should not be important.   An
> invocation of perl that runs a perl script that filters stdin to
> stdout might look like this:
>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example see the source of read.csv2.sql which defaults
> to using a Windows vbscript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>  wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
>> going to read the documentation of the filter argument and see if it can take a 
>> decent sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fwf file and creates a 
>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>> a statement or something? If you know the answer, please let me know. 
>> Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Satish
>>
>>
>> -Original Message-
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>  wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>> input through a filter written in perl, [g]awk or other language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>> width fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> making this very easy but perl or whatever you are most used to would
>>> be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>  wrote:
 Hi Gabor:
 Thanks. My files are all in fixed-width format. There are a lot of them. It 
 would take me some effort to convert them to CSV. I guess this cannot be 
 avoided? I can write some Perl scripts to convert fixed width format to 
 CSV format and then start with your suggestion. Could you let me know your 
 thoughts on the approach?
 Satish


 -Original Message-
 From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
 Sent: Friday, February 05, 2010 5:16 PM
 To: Vadlamani, Satish {FLNA}
 Cc: r-help@r-project.org
 Subject: Re: [R] Reading large files

 If your problem is just how long it takes to load the file into R try

Re: [R] Sorting

2010-02-06 Thread David Neu
On Sat, Feb 6, 2010 at 3:22 PM, Hans W Borchers
 wrote:
> David Neu  davidneu.com> writes:
>
>>
>> Hi,
>>
>> I have a list of vectors (of varying lengths).  I'd like to sort this
>> list by applying a function to each pair of vectors in the list and
>> returning information to the sorting routine that lets it know which one
>> is larger.
>>
>> To solve problems like this in Common Lisp, the sort function accepts
>> a function as an argument.  The arguments to this function are two
>> elements of the list which is being sorted.  The writer of the
>> function returns t (TRUE in R) when the first argument to the function
>> is larger than the second and nil (FALSE in R) otherwise.
>>
>> I'm wondering if there is some way to accomplish this in R.
>
> Would the following function do what you want?
>
>    sortList <- function(L, fun) L[order(sapply(L, fun))]
>
> Here is my test and my understanding of your request;
>
>    L <- list()  # define a list of vectors of varying length
>    for (i in 1:10) { n <- sample(1:10, 1); L[[i]] <- runif(n) }
>
>    Ls <- sortList(L, mean)
>    sapply(Ls, mean)  # increasing mean values
>
> Hans Werner
>
>> Many thanks for any help!
>>
>> Cheers,
>> David
>>
>>
>
>

Hi Hans,

Thanks for your response, but I need the comparison function to have
access to both vectors

Cheers,
David
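Base R's sort()/order() only accept a key function, but a true pairwise comparator can be sketched with a hand-rolled insertion sort (the names `sortListBy` and `cmp` are invented for illustration):

```r
## sortListBy(): sort a list with a comparator cmp(a, b) that returns TRUE
## when 'a' should come AFTER 'b' (i.e. a is "larger"). Both vectors are
## passed to cmp, so the comparison can use anything about the pair.
sortListBy <- function(L, cmp) {
  for (i in seq_along(L)[-1]) {
    x <- L[[i]]
    j <- i - 1
    while (j >= 1 && cmp(L[[j]], x)) {  # shift "larger" elements right
      L[[j + 1]] <- L[[j]]
      j <- j - 1
    }
    L[[j + 1]] <- x
  }
  L
}

## Example comparator: order vectors by mean, breaking ties by length.
cmp <- function(a, b) mean(a) > mean(b) ||
  (mean(a) == mean(b) && length(a) > length(b))
```

Insertion sort is O(n^2) in comparisons, so this is only suitable for modest list lengths; the key-function approach in this thread is faster when it applies.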



Re: [R] R-Help

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 3:29 PM, Ravi Ramaswamy wrote:


Hi - I am not familiar with R.  Could I ask you a quick question?

When I read a file like this, I get an error.  Not sure what I am  
doing
wrong.  I use a MAC.  How do I specify a full path name for a file  
in R?  Or

do files have to reside locally?


KoreaAuto <- read.table(""/Users/


I think the doubled opening and closing quotes meant that you supplied
an empty string to the file argument.



raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt"")
Error: unexpected numeric constant in "KoreaAuto <-
read.table(""/Users/raviramaswamy/Documents/Rutgers/STT 586"






Using single instances of either sort of quote ( " or ' ) on the ends  
of strings should work. If you drag a file from a Finder window to the  
R-console you should get a fully specified file path and name.



Seems like the working directory is

getwd()

[1] "/Users/raviramaswamy"


> rd <- read.table(file="/Users/davidwinsemius/Downloads/meminfo.csv", sep=",", header=TRUE)

> rd
 time  RSS  VSZ  MEM
1   1  3027932  3141808  4.5
2   2  3028572  3141808  4.5
3   3  3030208  3141808  4.5
4   4  302  3150004  4.5
5   5  3035036  3150004  4.5

You can also shorten the Users/ part to "~"
> rd <- read.table(file="~/Downloads/meminfo.csv", sep=",", header=TRUE)







so I said this and still got an error


KoreaAuto <- read.table(/Documents/Rutgers/HW1Data)

Error: unexpected '/' in "KoreaAuto <- read.table(/"


But using no quotes will definitely not work. (And that was not a full  
path name anyway.)





Could someone please help me with the correct syntax?

Thanks

Ravi

   Year   AO  GNP  CP   OP
011974 .0022 183  2322 189
021975 .0024 238  2729 206
031976 .0027 319  3069 206
041977 .0035 408  2763 190
051978 .0050 540  2414 199
061979 .0064 676  2440 233
071980 .0065 785  2430 630
081981 .0069 944  2631 740
091982 .0078 1036 3155 740
101983 .0095 1171 3200 660



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] EC2 - Amazon

2010-02-06 Thread Dirk Eddelbuettel

On 6 February 2010 at 12:04, Brian Wolf wrote:
| I was wondering if anyone had any experience or knew if it's possible to
| run R on EC2?

Yes, and apparently pretty easily. 

Ubuntu has a number of pre-built images which have the standard R packages
and a number of additional packages already installed. This should allow you
to run e.g. Rmpi out of the box over several machines.

Dirk

-- 
  Registration is open for the 2nd International conference R / Finance 2010
  See http://www.RinFinance.com for details, and see you in Chicago in April!



Re: [R] Sorting

2010-02-06 Thread David Neu
Hi Don,

Thanks for your response.

> Which one is larger, mylist$v1 or mylist$v2? Longer, yes, mylist$v2 is
> longer. But larger?
Sure, given any two vectors, the goal is to be able to use any criterion
to determine whether the first vector is "larger than" the second one.  The
key requirement is that, depending on the criterion, I may need both
vectors in hand to perform the test.

Cheers,
David

On Sat, Feb 6, 2010 at 3:00 PM, Don MacQueen  wrote:
> I have trouble making sense of the question, and I wonder if there is a
> terminology issue.
>
> For example, you have a list like this one:
>
>  mylist <- list( v1=1:3, v2=1:4, v3=1:5, v4=1:6)
>
> (That is, a list of vectors of varying lengths.)
>
> You want to apply a function to each pair of vectors:
>  First to v1 and v2,
>  Then to v2 and v3,
>  Then to v3 and v4
> ?
> And also to pairs v1 and v3, v1 and v4, and so on?
>
> Which one is larger, mylist$v1 or mylist$v2? Longer, yes, mylist$v2 is
> longer. But larger?
>
> And ultimately you want to have the list with its elements in some other
> order,
> perhaps v4 comes first, then v3, and so on?
>
> -Don
>
>
> At 1:21 PM -0500 2/6/10, David Neu wrote:
>>
>> Hi,
>>
>> I have a list of vectors (of varying lengths).  I'd like to sort this
>> list by applying a function to each pair of vectors in the list and
>> returning information to the sorting routine that lets it know which one
>> is larger.
>>
>> To solve problems like this in Common Lisp, the sort function accepts
>> a function as an argument.  The arguments to this function are two
>> elements of the list which is being sorted.  The writer of the
>> function returns t (TRUE in R) when the first argument to the function
>> is larger than the second and nil (FALSE in R) otherwise.
>>
>> I'm wondering if there is some way to accomplish this in R.
>>
>> Many thanks for any help!
>>
>> Cheers,
>> David
>>
>
>
> --
> --
> Don MacQueen
> Environmental Protection Department
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
> 925-423-1062
> --
>



Re: [R] R-Help

2010-02-06 Thread Cedrick W. Johnson

Try:

KoreaAuto <- read.table('Documents/Rutgers/HW1Data')

-or-

KoreaAuto <- read.table('/Users/raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt')



[R] R-Help

2010-02-06 Thread Ravi Ramaswamy
Hi - I am not familiar with R.  Could I ask you a quick question?

When I read a file like this, I get an error.  Not sure what I am doing
wrong.  I use a MAC.  How do I specify a full path name for a file in R?  Or
do files have to reside locally?

> KoreaAuto <- read.table(""/Users/
raviramaswamy/Documents/Rutgers/STT 586/HW1 Data.txt"")
Error: unexpected numeric constant in "KoreaAuto <-
read.table(""/Users/raviramaswamy/Documents/Rutgers/STT 586"
>

Seems like the working directory is
> getwd()
[1] "/Users/raviramaswamy"
>

so I said this and still got an error

> KoreaAuto <- read.table(/Documents/Rutgers/HW1Data)
Error: unexpected '/' in "KoreaAuto <- read.table(/"


Could someone please help me with the correct syntax?

Thanks

Ravi

Year   AO  GNP  CP   OP
011974 .0022 183  2322 189
021975 .0024 238  2729 206
031976 .0027 319  3069 206
041977 .0035 408  2763 190
051978 .0050 540  2414 199
061979 .0064 676  2440 233
071980 .0065 785  2430 630
081981 .0069 944  2631 740
091982 .0078 1036 3155 740
101983 .0095 1171 3200 660




Re: [R] Sorting

2010-02-06 Thread Hans W Borchers
David Neu  davidneu.com> writes:

> 
> Hi,
> 
> I have a list of vectors (of varying lengths).  I'd like to sort this
> list by applying a function to each pair of vectors in the list and
> returning information to the sorting routine that lets it know which one
> is larger.
> 
> To solve problems like this in Common Lisp, the sort function accepts
> a function as an argument.  The arguments to this function are two
> elements of the list which is being sorted.  The writer of the
> function returns t (TRUE in R) when the first argument to the function
> is larger than the second and nil (FALSE in R) otherwise.
> 
> I'm wondering if there is some way to accomplish this in R.

Would the following function do what you want?

sortList <- function(L, fun) L[order(sapply(L, fun))]

Here is my test and my understanding of your request:

L <- list()  # define a list of vectors of varying length
for (i in 1:10) { n <- sample(1:10, 1); L[[i]] <- runif(n) }

Ls <- sortList(L, mean)
sapply(Ls, mean)  # increasing mean values

Hans Werner

> Many thanks for any help!
> 
> Cheers,
> David
> 
>



[R] EC2 - Amazon

2010-02-06 Thread Brian Wolf

Hi,

I was wondering if anyone had any experience or knew if it's possible to
run R on EC2?


Thanks,

Brian Wolf



Re: [R] Sorting

2010-02-06 Thread Don MacQueen
I have trouble making sense of the question, and I wonder if there is 
a terminology issue.


For example, you have a list like this one:

  mylist <- list( v1=1:3, v2=1:4, v3=1:5, v4=1:6)

(That is, a list of vectors of varying lengths.)

You want to apply a function to each pair of vectors:
  First to v1 and v2,
  Then to v2 and v3,
  Then to v3 and v4
?
And also to pairs v1 and v3, v1 and v4, and so on?

Which one is larger, mylist$v1 or mylist$v2? Longer, yes, mylist$v2 
is longer. But larger?


And ultimately you want to have the list with its elements in some other order,
perhaps v4 comes first, then v3, and so on?

-Don


At 1:21 PM -0500 2/6/10, David Neu wrote:

Hi,

I have a list of vectors (of varying lengths).  I'd like to sort this
list by applying a function to each pair of vectors in the list and
returning information to the sorting routine that lets it know which one
is larger.

To solve problems like this in Common Lisp, the sort function accepts
a function as an argument.  The arguments to this function are two
elements of the list which is being sorted.  The writer of the
function returns t (TRUE in R) when the first argument to the function
is larger than the second and nil (FALSE in R) otherwise.

I'm wondering if there is some way to accomplish this in R.

Many thanks for any help!

Cheers,
David




--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



Re: [R] Plot of odds ratios obtained from a logistic model

2010-02-06 Thread Steve Lianoglou
Hi,

On Sat, Feb 6, 2010 at 1:28 PM, gepeto  wrote:
> Hi all!
>
> I am trying to develop a plot a figure in which I would like to show
> the odds ratios obtained from a logistic model. I  have tried with the
> dotplot option but no success. Could you help me?

You can help us help you by showing the code you tried with dotplot
that didn't turn out as you expected ... it's easier to help you get
where you're going if you show us the direction you started in.

>  Is there any option when modelling the logistic model in R?

I'm sure there are many.

Take a look at some of the graphics from:

http://addictedtor.free.fr/graphiques/
http://bm2.genes.nig.ac.jp/RGM2/index.php
http://had.co.nz/ggplot2/

And look at the source code/man pages of the graphs that are of the
type you'd like to make, and take it from there.

Including code + a reproducible example (even if it doesn't work) will
get you a long way ...
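As one hedged starting point, a dot plot of odds ratios with 95% intervals can be sketched in base R on a simulated fit (data, names, and the Wald-interval choice are all assumptions, not from the original post):

```r
## Simulate data, fit a logistic model, and dot-plot the odds ratios.
set.seed(2)
d <- data.frame(y = rbinom(100, 1, 0.5), a = rnorm(100), b = rnorm(100))
fit <- glm(y ~ a + b, family = binomial, data = d)

or <- exp(coef(fit))[-1]                             # odds ratios, intercept dropped
ci <- exp(confint.default(fit))[-1, , drop = FALSE]  # Wald 95% intervals

dotchart(or, labels = names(or), xlim = range(ci), xlab = "Odds ratio")
segments(ci[, 1], seq_along(or), ci[, 2], seq_along(or))  # interval bars
abline(v = 1, lty = 2)                               # reference line at OR = 1
```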

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Matthew Keller
Hi Steve and other R folks,

Thanks for the suggestion. No - that doesn't work. I meant to put that
into my original email. To recap:

x <- 
big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
#Gets around the 2^31 issue - yay!
#R takes 18 Gb RAM, so says top

rm(x)  #top says R still takes 18Gb RAM
gc()   #top says R still takes 18Gb RAM

How do I flush the memory? I thought maybe R/bigmemory would give up
the RAM if it was needed elsewhere, but apparently not:

y <- 
big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
#takes *another* 18Gb RAM, and takes it away from several other
processes I had running - OUCH!

Any help would be appreciated.
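
A hedged aside, not from the thread: if the goal is just to keep a huge object from being tied to the R process, bigmemory's file-backed matrices may help. The data live in a memory-mapped file, so the OS can page them out, and the backing can be detached and reattached at will. The dimensions below are small placeholders, not the real ones:

```r
library(bigmemory)

## file-backed: the data live in x.bin, not in the R heap
x <- filebacked.big.matrix(nrow = 1e3, ncol = 1e3, type = "short", init = 0,
                           backingfile = "x.bin", descriptorfile = "x.desc")
x[1, 1] <- 1
rm(x); gc()   # releases the mapping; x.bin keeps the data on disk

## reattach later without re-reading anything:
x <- attach.big.matrix("x.desc")
```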

As an aside, I just want to say "thank you" to the teams developing
bigmemory, ff, and other packages meant to allow users of large
datasets to still use R.

Matt



On Fri, Feb 5, 2010 at 9:27 PM, Steve Lianoglou
 wrote:
> Hi,
>
> On Fri, Feb 5, 2010 at 9:24 PM, Matthew Keller  wrote:
>> Hi all,
>>
>> I'm on a Linux server with 48Gb RAM. I did the following:
>>
>> x <- 
>> big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
>> #Gets around the 2^31 issue - yeah!
>>
>> in Unix, when I hit the "top" command, I see R is taking up about 18Gb
>> RAM, even though the object x is 0 bytes in R. That's fine: that's how
>> bigmemory is supposed to work I guess. My question is how do I return
>> that RAM to the system once I don't want to use x any more? E.g.,
>>
>> rm(x)
>>
>> then "top" in Unix, I expect that my RAM footprint is back ~0, but it
>> remains at 18Gb. How do I return RAM to the system?
>
> Maybe forcing R to do garbage collection might help?
>
> Try calling `gc()` after your call to `rm(x)` and see what `top` tells you.
>
> Did that do the trick?
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



Re: [R] Sorting

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 1:21 PM, David Neu wrote:


Hi,

I have a list of vectors (of varying lengths).  I'd like to sort this
list by applying a function to each pair of vectors in the list and
returning information to sorting routine that let's it know which one
is larger.

To solve problems like this in Common Lisp, the sort function accepts
a function as an argument.  The arguments to this function are two
elements of the list which is being sorted.  The writer of the
function returns t (TRUE in R) when the first argument to the function
is larger than the second and nil (FALSE in R) otherwise.

I'm wondering if there is some way to accomplish this in R.



Here's one way, although there may be options within the netherworld  
of S4 methods that I am not smart enough to navigate:


GT <- function(x,y) x > y
x <- c(8,7,4,2,5,7,5,8,4,5,8,3,0)

> sum(GT(x[1],x))
[1] 10  # so the first element is greater than 10 other elements

x[order(rowSums(sapply(x, GT, y=x)))]
# compare the number of elements each value exceeds and sort in the direction of your choice


# [1] 8 8 8 7 7 5 5 5 4 4 3 2 0

#There's probably a method around the "reversal" of the usual sort order:

> x[order(rowSums(sapply(x, GT, y=x)) ,decreasing=TRUE)]
 [1] 0 2 3 4 4 5 5 5 7 7 8 8 8

Perhaps use instead negation of the logical matrix that the sapply  
creates:

> x[order(rowSums(!sapply(x, GT, y=x)) )]
 [1] 0 2 3 4 4 5 5 5 7 7 8 8 8

> sortFn <- function(x, FUN = GT, ...) x[order(rowSums(!sapply(x, FUN, y = x)), ...)]

> sortFn(x, GT)
 [1] 0 2 3 4 4 5 5 5 7 7 8 8 8
> sortFn(x, GT, decreasing=TRUE)
 [1] 8 8 8 7 7 5 5 5 4 4 3 2 0
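
A hedged follow-up for the original list-of-vectors case: rank each element by how many other elements it beats under a user-supplied "greater than" predicate, then order by that rank. The length-based comparator below is made up for the demo; any consistent pairwise rule would slot in the same way:

```r
## Lisp-style sort of a list, driven by a pairwise predicate gt(a, b)
sortList <- function(lst, gt) {
  ## number of other elements each element beats
  wins <- sapply(lst, function(a) sum(vapply(lst, function(b) gt(a, b), logical(1))))
  lst[order(wins)]
}

GT <- function(a, b) length(a) > length(b)   # demo predicate: longer is larger
lst <- list(1:5, 1:2, 1:9, 1:3)
sapply(sortList(lst, GT), length)   # 2 3 5 9 -- shortest to longest
```

Like the counting approach above, this assumes the predicate behaves like a total order; an inconsistent comparator will give an arbitrary arrangement.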

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] melt on OSX ignores na.rm=T

2010-02-06 Thread hadley wickham
Hi Titus,

The latest version of reshape is 0.8.3 - perhaps upgrading will fix
your problem.

Hadley

On Sat, Feb 6, 2010 at 4:51 AM, Titus von der Malsburg
 wrote:
> Hi list,
>
> I run R on Linux and OSX.  On both systems I use R version 2.9.2 (2009-08-24)
> and reshape version: 0.8.2 (2008-11-04).  When I do a melt with
> na.rm=T on a data frame I get different results on these systems:
>
> library(reshape)
>
> x <- read.table(textConnection("char trial wn
> p E10I13D0  4
> r E10I13D0  4
> a E10I13D0  4
> c E10I13D0  4
> t E10I13D0  4
> i E10I13D0  4
> c E10I13D0  4
> e E10I13D0  4
> d E10I13D0  4
> , E10I13D0 NA"), head=T)
>
> melt(x, measure.vars="char", na.rm=T)
>
> On Linux I get:
>
>  1 E10I13D0  4     char     p
>  2 E10I13D0  4     char     r
>  3 E10I13D0  4     char     a
>  4 E10I13D0  4     char     c
>  5 E10I13D0  4     char     t
>  6 E10I13D0  4     char     i
>  7 E10I13D0  4     char     c
>  8 E10I13D0  4     char     e
>  9 E10I13D0  4     char     d
>
> But on OSX I get:
>
>  1  E10I13D0  4     char     p
>  2  E10I13D0  4     char     r
>  3  E10I13D0  4     char     a
>  4  E10I13D0  4     char     c
>  5  E10I13D0  4     char     t
>  6  E10I13D0  4     char     i
>  7  E10I13D0  4     char     c
>  8  E10I13D0  4     char     e
>  9  E10I13D0  4     char     d
>  10 E10I13D0 NA     char     ,
>
>
> What's causing this glitch?  Is there a simple way to subset lines
> that do not have any NAs?  I'm looking for a line that I can use for
> all data.frames without modification.
>
> As always: thanks a lot!
>
>  Titus
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
http://had.co.nz/
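
Not addressed above, but for the quoted side question about subsetting rows with no NAs: complete.cases() works on any data frame without modification. A small illustration:

```r
x <- data.frame(char = c("p", "r", ","), wn = c(4, 4, NA))
x[complete.cases(x), ]   # keeps only rows with no NA in any column
```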



[R] Plot of odds ratios obtained from a logistic model

2010-02-06 Thread gepeto
Hi all!

I am trying to develop a figure in which I would like to show
the odds ratios obtained from a logistic model. I have tried the
dotplot option but with no success. Could you help me? Is there any option
when modelling the logistic model in R?

Thank you in advance



[R] Sorting

2010-02-06 Thread David Neu
Hi,

I have a list of vectors (of varying lengths).  I'd like to sort this
list by applying a function to each pair of vectors in the list and
returning information to sorting routine that let's it know which one
is larger.

To solve problems like this in Common Lisp, the sort function accepts
a function as an argument.  The arguments to this function are two
elements of the list which is being sorted.  The writer of the
function returns t (TRUE in R) when the first argument to the function
is larger than the second and nil (FALSE in R) otherwise.

I'm wondering if there is some way to accomplish this in R.

Many thanks for any help!

Cheers,
David



[R] optimized R-selection and R-replacement inside a matrix need, strings coerced to factors

2010-02-06 Thread Christine SINOQUET

Hello,

I encounter two problems:

First, I need to modify some huge arrays (2000 individuals x 50 000 
variables).


To format the data, I think I should benefit from optimized R-selection
and R-replacement inside a matrix and avoid a naive use of loops.


Thank you in advance for providing information about the following problem:

file A:
2,000 individuals in rows
50,000 columns corresponding to 50,000 variables: each value belongs to
{0, 1, 2}

file B:
50,000 variables in rows
1st column: character (A, C, G, T) corresponding to code 0
2nd column: character corresponding to code 1

convention:
if A[,j]=0, one wants to replace 0 with the character in B[j,1] twice
if A[,j]=1, one wants to replace 1 with the character in B[j,1] and the character in B[j,2]
if A[,j]=2, one wants to replace 2 with the character in B[j,2] twice


C <- matrix(0,2000,0) # initialize to an empty matrix

for(j in 1:2000){

c <- A[,j]
zeros <- which(c==0);
ones <- which(c==1);
twos <- which(c==2);
rm(c)

c1 <- matrix("Z",2000)
c2 <- matrix("Z",2000)
c1[zeros] <-  B$V1[j]; c2[zeros]  <-B$V1[j]
c1[ones]  <-  B$V1[j]; c2[ones]   <-B$V2[j]
c1[twos]  <-  B$V2[j]; c2[twos]   <-B$V2[j]

C <- cbind(C, cbind(c1,c2))
}

I do think some more elaborate solution might exist.

___
However, testing this naive  implementation restricting to 6 individuals 
and variable number 6 (in B), I encounter the problem of character 
strings coerced to numbers.


coding.txt
allele0 allele1
A C
G T
A G
G C
G T
A T


c <- data.frame(x=1:6,y=c(0,1,2,0,1,2))
A <- c$y
zeros <- which(A==0);
ones <- which(A==1);
twos <- which(A==2);
rm(A)

c1 <- matrix("Z",6)
c2 <- matrix("Z",6)

B <- read.table(file="coding.txt",h=T)

c1[zeros] <-  B$allele0[6]; c2[zeros]  <-B$allele0[6]
c1[ones]  <-  B$allele0[6]; c2[ones]   <-B$allele1[6]
c1[twos]  <-  B$allele1[6]; c2[twos]   <-B$allele1[6]

results obtained for c1 and c2 :
> c1
[,1]
[1,] "1"
[2,] "1"
[3,] "3"
[4,] "1"
[5,] "1"
[6,] "3"
> c2
[,1]
[1,] "1"
[2,] "3"
[3,] "3"
[4,] "1"
[5,] "3"
[6,] "3"

Thanks in advance for your help.
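
A hedged sketch of what is likely happening, plus a loop-free recode: a factor element assigned into a character matrix contributes its internal integer code, which is where the "1" and "3" come from. Reading B with stringsAsFactors = FALSE keeps characters as characters, and a whole column can then be recoded without the inner loop. The toy objects below stand in for files A and B:

```r
B <- data.frame(allele0 = c("A", "G", "A", "G", "G", "A"),
                allele1 = c("C", "T", "G", "C", "T", "T"),
                stringsAsFactors = FALSE)   # the crucial part
A <- c(0, 1, 2, 0, 1, 2)   # one genotype column, values in {0, 1, 2}

j  <- 6                    # variable number 6, as in the example
c1 <- ifelse(A == 2, B$allele1[j], B$allele0[j])   # 0,1 -> allele0; 2 -> allele1
c2 <- ifelse(A == 0, B$allele0[j], B$allele1[j])   # 0 -> allele0; 1,2 -> allele1
cbind(c1, c2)              # letters now, not integer codes
```

With factors already in hand, wrapping the lookups in as.character() achieves the same thing.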



Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
 wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Its just any Windows batch command string that filters stdin to
> stdout.  What the command consists of should not be important.   An
> invocation of perl that runs a perl script that filters stdin to
> stdout might look like this:
>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example see the source of read.csv2.sql which defaults
> to using a Windows vbscript program as a filter.
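
It may help to see what such a filter looks like inline. The sketch below is untested here, with made-up field widths and the file name from the thread; it uses gawk's FIELDWIDTHS to split fixed-width records and re-emit them comma-separated (the `$1 = $1` assignment forces gawk to rebuild the record with OFS):

```r
library(sqldf)

## widths "1 3 3 4" are placeholders for the real column widths
dat <- read.csv.sql("myfile.dat", header = FALSE,
    filter = 'gawk "BEGIN { FIELDWIDTHS = \\"1 3 3 4\\"; OFS = \\",\\" } { $1 = $1; print }"')
```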
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>  wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions where I can use read.csv.sql and embed 
>> Perl (or gawk). I just want to mention that I am running on Windows. I am 
> going to read the documentation of the filter argument and see if it can take a
>> decent sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fwf file and creates a 
>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be 
>> a statement or something? If you know the answer, please let me know. 
>> Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
Satish
>>
>>
>> -Original Message-
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>  wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>> input through a filter written in perl, [g]awk or other language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>> width fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> making this very easy but perl or whatever you are most used to would
>>> be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>  wrote:
 Hi Gabor:
 Thanks. My files are all in fixed width format. They are a lot of them. It 
 would take me some effort to convert them to CSV. I guess this cannot be 
 avoided? I can write some Perl scripts to convert fixed width format to 
 CSV format and then start with your suggestion. Could you let me know your 
 thoughts on the approach?
 Satish


 -Original Message-
 From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
 Sent: Friday, February 05, 2010 5:16 PM
 To: Vadlamani, Satish {FLNA}
 Cc: r-help@r-project.org
 Subject: Re: [R] Reading large files

 If your problem is just how long it takes to load the file into R try
 read.csv.sql in the sqldf package.  A single read.csv.sql call can
 create an SQLite database and table layout for you, read the file into
 the database (without going through R so R can't slow this down),
 extract all or a portion into R based on the sql argument you give it
 and then remove the database.  See the examples on the home page:
 http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

 On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
  wrote:
>
> Matthew:
> If it is going to help, here is the explanation. I have an end state in
> mind. It is given below under "End State" header. In order to get there, I
> need to start somewhere right? I started with a 850 MB file and could not
> load in what I think is reasonable time (I waited for an hour).
>
> There are references to 64 bit. How will that help? It is a 4GB RAM 
> machine
> and there is no paging activity when loading the 850 MB file.
>
> I have seen other threads on the same types of questions. I did not see 
> any
> clear cut answers or errors that I could have been making in the process. 
> If
> I am missing something, please let me know. Thanks.
> Satish
>
>
> End State
>> Satish wrote: "at one time I will need to load say 15GB into R"
>
>
> -
> Satish Vadlamani
> --
> View this message in context: 
> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
>>>

Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Gabor:
Can I pass colClasses as a vector to read.csv.sql? Thanks.
Satish
 

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Saturday, February 06, 2010 9:41 AM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

Its just any Windows batch command string that filters stdin to
stdout.  What the command consists of should not be important.   An
invocation of perl that runs a perl script that filters stdin to
stdout might look like this:
  read.csv.sql("myfile.dat", filter = "perl myprog.pl")

For an actual example see the source of read.csv2.sql which defaults
to using a Windows vbscript program as a filter.

On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
 wrote:
> Jim, Gabor:
> Thanks so much for the suggestions where I can use read.csv.sql and embed 
> Perl (or gawk). I just want to mention that I am running on Windows. I am 
> going to read the documentation of the filter argument and see if it can take a
> decent sized Perl script and then use its output as input.
>
> Suppose that I write a Perl script that parses this fwf file and creates a 
> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be a 
> statement or something? If you know the answer, please let me know. 
> Otherwise, I will try a few things and report back the results.
>
> Thanks again.
> Satish
>
>
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Saturday, February 06, 2010 6:16 AM
> To: Gabor Grothendieck
> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> In perl the 'unpack' command makes it very easy to parse fixed fielded data.
>
> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>  wrote:
>> Note that the filter= argument on read.csv.sql can be used to pass the
>> input through a filter written in perl, [g]awk or other language.
>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>
>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>> width fields, e.g.
>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>> making this very easy but perl or whatever you are most used to would
>> be fine too.
>>
>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Hi Gabor:
>>> Thanks. My files are all in fixed width format. They are a lot of them. It 
>>> would take me some effort to convert them to CSV. I guess this cannot be 
>>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>>> format and then start with your suggestion. Could you let me know your 
>>> thoughts on the approach?
>>> Satish
>>>
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Friday, February 05, 2010 5:16 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> If your problem is just how long it takes to load the file into R try
>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>> create an SQLite database and table layout for you, read the file into
>>> the database (without going through R so R can't slow this down),
>>> extract all or a portion into R based on the sql argument you give it
>>> and then remove the database.  See the examples on the home page:
>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>
>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>  wrote:

 Matthew:
 If it is going to help, here is the explanation. I have an end state in
 mind. It is given below under "End State" header. In order to get there, I
 need to start somewhere right? I started with a 850 MB file and could not
 load in what I think is reasonable time (I waited for an hour).

 There are references to 64 bit. How will that help? It is a 4GB RAM machine
 and there is no paging activity when loading the 850 MB file.

 I have seen other threads on the same types of questions. I did not see any
 clear cut answers or errors that I could have been making in the process. 
 If
 I am missing something, please let me know. Thanks.
 Satish


 End State
> Satish wrote: "at one time I will need to load say 15GB into R"


 -
 Satish Vadlamani
 --
 View this message in context: 
 http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> 

Re: [R] Alt carachters in R

2010-02-06 Thread gepeto
Thank you!

It has been very helpful!!


On Jan 27, 9:33 pm, "Peter Alspach" 
wrote:
> Tena koe Gepeto
>
> Not sure if the following helps since you have not been very specific in
> your question:
>
> plot(1:10)
> text(3,2,'*')
> text(5,2,'*5')
> text(4,2,'5*5')
> text(4,5,'17\n4')
> text(6,5,'17\\n4')
>
> Hei kona ra ...
>
> Peter Alspach
>
>
>
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-boun...@r-project.org] On Behalf Of gepeto
> > Sent: Thursday, 28 January 2010 4:59 a.m.
> > To: r-h...@r-project.org
> > Subject: [R] Alt carachters in R
>
> > Hi all,
>
> > I am trying to use and place in a graph some alt characters
> > such as "*".. Could you help me?
> > Thanks
>
> > __
> > r-h...@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] reading csv files

2010-02-06 Thread analys...@hotmail.com


On Feb 5, 7:16 pm, Jim Lemon  wrote:
> On 02/06/2010 09:05 AM, analys...@hotmail.com wrote:
>
>
>
>
>
> > On Feb 5, 8:57 am, Barry Rowlingson
> > wrote:
> >> On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com
>
> >>   wrote:
> >>> the csv files are downloaded from a database and it looks like some
> >>> character fields contain the CR-LF sequence within them.
>
> >>> This causes R to see a new record/row and the number of rows it sees
> >>> is different (usually higher) from the number of rows actually
> >>> extracted.
>
> >>   Hard to tell without an example, but I just tried this in a file:
>
> >> 1,2,"this
> >> is a test",99
> >> 2,3,"oneliner",45
>
> >> and:
>
> >>> read.table("test.csv",sep=",")
>
> >>    V1 V2              V3 V4
> >> 1  1  2 this\nis a test 99
> >> 2  2  3        oneliner 45
>
> >> seemed to work. But if your strings aren't "quoted" (hard to tell
> >> without an example) then you might have to find another way. Hard to
> >> tell without an example.
>
> >> Barry
>
> >> __
> >> r-h...@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> > Here is a Hex dump (please ignore the '>' at the start of each line) -
> > of the file that results from extracting two rows.
>
> >> EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
> >> 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
> >> 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
> >> 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
> >> 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
> >> 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
> >> 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
> >> 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
> >> 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
> >> 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
> >> 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
> >> 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
> >> 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
> >> 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
>
> > R, Fortran and Excel see five lines, but the database has only two
> > lines.
>
> Okay, you have five CR-LF pairs with two being EORs. It looks like the
> CR-LF is the EOR sequence, so it should be possible to preserve
> those while changing the others to something like "~" or deleting them.
> As I said previously, the regexperts can work out a way to distinguish
> the CR-LF pairs that are _not_ in an EOR sequence.
>
> You might want to think about dumping the control characters as well.
>
> Jim
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

I am sure other sequences cause a false EOR also.  The false EORs are
CRLF sequences that occur between commas - I don't know if R can read a fixed
number of bytes regardless of EOR markers. If it can, it should be
possible to reassemble the true database rows from the bytes read in.
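
For the side question: yes, R can read a fixed number of bytes while ignoring record separators, via a binary-mode connection. A small hedged illustration (not from the thread):

```r
tmp <- tempfile()
writeBin(charToRaw("ab\r\ncd\r\nef"), tmp)   # bytes with an embedded CR-LF
con <- file(tmp, "rb")                       # binary mode: no EOR translation
chunk <- readChar(con, nchars = 6, useBytes = TRUE)
close(con)
chunk   # "ab\r\ncd" -- the CR-LF pair was read as ordinary bytes
```

readBin() with what = "raw" does the same at the byte-vector level, which may be more convenient for reassembling records.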



Re: [R] limit to p-value using t.test()

2010-02-06 Thread Michael Erickson
On Sat, Feb 6, 2010 at 8:53 AM, Pete Shepard  wrote:
> I am using t-test to check if the difference between two populations is
> significant. I have a large N=20,000, 10,000 in each population. I compare a
> few different populations with each other and though I get different t-scores,
> I get the same  p-value of 10^-16 which seems like the limit for this
> function. Is this true and is so, is there a workaround to get a more
> sensitive/accurate p-value?

Three comments --

First, with a given value of t and the df for your test, you can get
p-values smaller than  2.2e-16 by plugging that information into pt().

> pt(500, df=10, lower.tail=FALSE)
[1] 1.259769e-23
> pt(1500, df=10, lower.tail=FALSE)
[1] 2.133778e-28
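
A hedged aside beyond the thread: when even those values underflow double precision, pt() can return the log of the tail probability (log.p = TRUE), which never underflows and can be rescaled to base 10 for reporting:

```r
log_p   <- pt(500, df = 10, lower.tail = FALSE, log.p = TRUE)  # natural log of p
log10_p <- log_p / log(10)
log10_p   # about -22.9, i.e. p is about 1.26e-23
```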

Second, if these are *populations* then a t-test is inappropriate.
Just compute the means, and if they do not equal one another, then the
population means are different.  All the statistical tests that I can
think of try to make and place bounds on inferences about the
population based upon samples drawn from those populations.  If you
have the populations, this makes no sense.  It seems like you need to
decide what kinds of differences are meaningful, and then check to see
if the population differences meet those criteria.

Third, why do you want a more accurate p-value?  The only reason I can
think of is using Rosenthal & Rubin's method to compute effect sizes
from a p-value, but again, if you have the populations, you can
compute effect sizes directly.

Good luck!

Michael



[R] limit to p-value using t.test()

2010-02-06 Thread Pete Shepard
Hello,

I am using a t-test to check if the difference between two populations is
significant. I have a large N=20,000, 10,000 in each population. I compare a
few different populations with each other and though I get different t-scores,
I get the same p-value of 10^-16, which seems like the limit for this
function. Is this true and if so, is there a workaround to get a more
sensitive/accurate p-value?

Thanks

[[alternative HTML version deleted]]



Re: [R] time serie : question about date time class

2010-02-06 Thread Laurent Rhelp

jim holtman a écrit :

Depending on how you are using POSIXct, your accuracy is limited to the
microsecond level.  It is stored as a floating-point number with 53 bits
of accuracy (~16 digits), and currently the number of seconds since
1/1/1970 is 10 digits, so with microseconds adding 6 more you are at
the limit:

x <- Sys.time()
x
[1] "2010-02-06 09:34:58 EST"
unclass(x)
[1] 1265466899
y <- x + .000001
x - y
Time difference of -9.536743e-07 secs
y <- x + .0000001
x - y
Time difference of 0 secs
y <- x + .000001   # 1 us
identical(x, y)
[1] FALSE
y <- x + .0000001  # 0.1 us
identical(x, y)
[1] TRUE

On Sat, Feb 6, 2010 at 9:41 AM, Laurent Rhelp  wrote:

Gabor Grothendieck a écrit :

zoo is independent of time and date class so it does not restrict your
choice of index class.  POSIXct supports sub-microsecond accuracy.
See ?POSIXct .  Simply using the number of microseconds since the
start of the experiment is another possibility.

On Sat, Feb 6, 2010 at 8:25 AM, Laurent Rhelp 
wrote:

Dear R-List,

I have the habit of using R for my data processing and I like to use the
power of the lattice package. Presently, I have to manage time series. So,
in order to work properly, I want to discover the zoo package and the
related methods (since lattice can work with the zoo class). But my
physical experiment lasts only one second and my sampling period is
1 microsecond (the date-time value is given by the IRIG card of my data
acquisition system). I read the R News of June 2004 about the date-time
classes in R and there is no information about allowing for microseconds.
So do you think it is really a good idea to try to use the R time series
framework (the zoo package, for example) with my data? Or would there be
a tip?

Thank you very much

Laurent

Thank you, I will look at the POSIXct class in detail.

This is a good piece of advice! Thank you very much, I will take care
of it in my calculations.




Re: [R] Reading large files

2010-02-06 Thread Gabor Grothendieck
Its just any Windows batch command string that filters stdin to
stdout.  What the command consists of should not be important.   An
invocation of perl that runs a perl script that filters stdin to
stdout might look like this:
  read.csv.sql("myfile.dat", filter = "perl myprog.pl")

For an actual example see the source of read.csv2.sql which defaults
to using a Windows vbscript program as a filter.

On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
 wrote:
> Jim, Gabor:
> Thanks so much for the suggestions where I can use read.csv.sql and embed 
> Perl (or gawk). I just want to mention that I am running on Windows. I am 
> going to read the documentation of the filter argument and see if it can take a
> decent sized Perl script and then use its output as input.
>
> Suppose that I write a Perl script that parses this fwf file and creates a 
> CSV file. Can I embed this within the read.csv.sql call? Or, can it only be a 
> statement or something? If you know the answer, please let me know. 
> Otherwise, I will try a few things and report back the results.
>
> Thanks again.
> Satish
>
>
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Saturday, February 06, 2010 6:16 AM
> To: Gabor Grothendieck
> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> In Perl, the 'unpack' function makes it very easy to parse fixed-field data.
>
> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>  wrote:
>> Note that the filter= argument on read.csv.sql can be used to pass the
>> input through a filter written in perl, [g]awk or other language.
>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>
>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>> width fields, e.g.
>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>> making this very easy but perl or whatever you are most used to would
>> be fine too.
>>
>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>  wrote:
>>> Hi Gabor:
>>> Thanks. My files are all in fixed width format. There are a lot of them. It 
>>> would take me some effort to convert them to CSV. I guess this cannot be 
>>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>>> format and then start with your suggestion. Could you let me know your 
>>> thoughts on the approach?
>>> Satish
>>>
>>>
>>> -Original Message-
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Friday, February 05, 2010 5:16 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> If your problem is just how long it takes to load the file into R try
>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>> create an SQLite database and table layout for you, read the file into
>>> the database (without going through R so R can't slow this down),
>>> extract all or a portion into R based on the sql argument you give it
>>> and then remove the database.  See the examples on the home page:
>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>
>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>  wrote:

 Matthew:
 If it is going to help, here is the explanation. I have an end state in
 mind; it is given below under the "End State" header. In order to get there, I
 need to start somewhere, right? I started with an 850 MB file and could not
 load it in what I think is a reasonable time (I waited for an hour).

 There are references to 64-bit. How will that help? It is a 4 GB RAM machine
 and there is no paging activity when loading the 850 MB file.

 I have seen other threads on the same types of questions. I did not see any
 clear-cut answers or errors that I could have been making in the process.
 If I am missing something, please let me know. Thanks.
 Satish


 End State
> Satish wrote: "at one time I will need to load say 15GB into R"


 -
 Satish Vadlamani
 --
 View this message in context: 
 http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
 Sent from the R help mailing list archive at Nabble.com.


>>>
>>
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>

_

[R] How to open gmail accounts

2010-02-06 Thread Velappan Periasamy
How can I open two Gmail accounts, with different login names and passwords,
from within R?



Re: [R] Reading large files

2010-02-06 Thread Vadlamani, Satish {FLNA}
Jim, Gabor:
Thanks so much for the suggestions where I can use read.csv.sql and embed Perl 
(or gawk). I just want to mention that I am running on Windows. I am going to 
read the documentation on the filter argument and see if it can take a 
decent-sized Perl script and then use its output as input.

Suppose that I write a Perl script that parses this fwf file and creates a CSV 
file. Can I embed this within the read.csv.sql call? Or, can it only be a 
statement or something? If you know the answer, please let me know. Otherwise, 
I will try a few things and report back the results.

Thanks again.
Satish
 

-Original Message-
From: jim holtman [mailto:jholt...@gmail.com] 
Sent: Saturday, February 06, 2010 6:16 AM
To: Gabor Grothendieck
Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
Subject: Re: [R] Reading large files

In Perl, the 'unpack' function makes it very easy to parse fixed-field data.
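For readers who prefer to stay in R for this step, base R's substring() can split fixed-width records without an external filter. A small sketch (not from the thread; the widths are invented for illustration):

```r
# Split a fixed-width record in base R with substring().
# The widths are illustrative, not the poster's real layout.
widths <- c(1, 3, 3, 4)
ends   <- cumsum(widths)        # last character of each field
starts <- ends - widths + 1     # first character of each field

parse_fwf_line <- function(line) substring(line, starts, ends)

fields <- parse_fwf_line("A123XYZ0042")
print(fields)  # "A" "123" "XYZ" "0042"
```

The same starts/ends vectors can then drive sapply() over readLines() output for a whole file.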

On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
 wrote:
> Note that the filter= argument on read.csv.sql can be used to pass the
> input through a filter written in perl, [g]awk or other language.
> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>
> gawk has the FIELDWIDTHS variable for automatically parsing fixed
> width fields, e.g.
> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
> making this very easy but perl or whatever you are most used to would
> be fine too.
>
> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Hi Gabor:
>> Thanks. My files are all in fixed width format. There are a lot of them. It 
>> would take me some effort to convert them to CSV. I guess this cannot be 
>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>> format and then start with your suggestion. Could you let me know your 
>> thoughts on the approach?
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Friday, February 05, 2010 5:16 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> If your problem is just how long it takes to load the file into R try
>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>> create an SQLite database and table layout for you, read the file into
>> the database (without going through R so R can't slow this down),
>> extract all or a portion into R based on the sql argument you give it
>> and then remove the database.  See the examples on the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>  wrote:
>>>
>>> Matthew:
>>> If it is going to help, here is the explanation. I have an end state in
>>> mind; it is given below under the "End State" header. In order to get there, I
>>> need to start somewhere, right? I started with an 850 MB file and could not
>>> load it in what I think is a reasonable time (I waited for an hour).
>>>
>>> There are references to 64-bit. How will that help? It is a 4 GB RAM machine
>>> and there is no paging activity when loading the 850 MB file.
>>>
>>> I have seen other threads on the same types of questions. I did not see any
>>> clear-cut answers or errors that I could have been making in the process. If
>>> I am missing something, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> End State
 Satish wrote: "at one time I will need to load say 15GB into R"
>>>
>>>
>>> -
>>> Satish Vadlamani
>>> --
>>> View this message in context: 
>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>>
>>
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] lmer Error message

2010-02-06 Thread Douglas Bates
On Sat, Feb 6, 2010 at 4:45 AM, Martin Bulla  wrote:
> Does anybody know what this error message means: Error in object$terms : $
> operator not defined for this S4 class

The error message means what it says and it doesn't come from lmer, it
comes from the drop1 function being applied to a model fit by lmer.
You are assuming that you can apply drop1 to an lmer model and you
can't.  You need to do the modifications of the model formula by hand
because an lmer formula contains terms that would not be meaningful
for drop1.
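The by-hand refit that Doug describes can be sketched with update() and anova(). The example below deliberately uses glm() on a built-in data set so it is self-contained; the assumption is that the same pattern carries over to an lmer fit by editing the fixed-effects part of the formula while keeping the random-effects terms:

```r
# Drop a term "by hand": refit without it, then compare the two fits.
full    <- glm(mpg ~ wt + hp, data = mtcars)
reduced <- update(full, . ~ . - hp)   # remove 'hp' from the formula
print(anova(reduced, full, test = "Chisq"))
```

For a mixed model you would write out the reduced formula yourself, e.g. update(fit, . ~ . - x), leaving any (1 | group) terms in place.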

> I have performed the following steps:
>
>
>> library(lattice)
>
>> library(Matrix)
>
>> library(lme4)
>
>> inkm inkm$Gamie glm.incm drop1(glm.incm,test="Ch") Error in object$terms :
>> $ operator not defined for this S4 class
>
>
> Your suggestion would be of great help to me,
> Martin
>
>
>
>



Re: [R] How to suppress vector indexes in printout

2010-02-06 Thread jim holtman
You can always create your own output function.  Here is one way to do it:

> x <- runif(20)
> cat(strwrap(paste(x, collapse=' ')), sep='\n')
0.482080115471035 0.599565825425088 0.493541307048872
0.186217601411045 0.827373318606988
0.668466738192365 0.79423986072652 0.107943625887856 0.723710946040228
0.411274429643527
0.820946294115856 0.647060193819925 0.78293276228942 0.553036311641335
0.529719580197707
0.789356231689453 0.023331202333793 0.477230065036565 0.7323137386702
0.692731556482613
> # to get same size fields
> cat(strwrap(paste(sprintf("%.6f", x), collapse=' ')), sep='\n')
0.482080 0.599566 0.493541 0.186218 0.827373 0.668467 0.794240
0.107944 0.723711 0.411274
0.820946 0.647060 0.782933 0.553036 0.529720 0.789356 0.023331
0.477230 0.732314 0.692732
>
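If equal-width, aligned columns are wanted as well, one further option (a sketch, not from the thread) is to pad with formatC() and let write() decide how many values go on each line:

```r
# Aligned multi-column output without the [1] index prefixes:
# pad every number to the same width, then print 5 per line.
x <- c(0.637575751, -0.358045501, 0.871726468, 0.011082075,
       1.100398735, 0.041503019, -0.733543065, -1.279668884,
       0.327538693, -0.150961598)
write(formatC(x, format = "f", digits = 9, width = 13),
      file = stdout(), ncolumns = 5)
```

formatC() guarantees every value occupies 13 characters, so the columns line up regardless of sign.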


On Sat, Feb 6, 2010 at 9:34 AM, zumar  wrote:
>
> I tried this, but I've got a printout in single line or with 'fill' option in
> multiline like this
> 0.6375758 1.060877 0.2509587 -0.1509616 0.819645 -0.3580455 -0.07430713
> -0.3464005 -2.312149 -0.8428289 0.8717265 -0.7302025 -0.5292043 -0.289512
> -1.231468 0.01108207 -1.811966 0.03652744 0.1809602 1.578322 1.100399
> 0.06806361 -1.062440 -0.1841067 -0.1975336 0.04150302 1.260545 -0.733543
> 0.3275387 -1.279669 -0.2939457 -1.814987 0.6008691 0.207336 1.362387
> 0.5739906
> 1.283922 0.04413182 -1.590986 0.4637798 -0.5791159 -0.1732862 -0.2332275
> It is not a good-looking table, but of course the [1], [7], etc. indexes disappeared.
> How can I get this better-formatted view:
>   0.637575751  1.060876635   0.250958721 -0.150961598  0.819644944
>  -0.358045501 -0.074307126 -0.346400504 -2.312149002 -0.842828896
>   0.871726468 -0.730202499 -0.529204288 -0.289511989 -1.231467991
>   0.011082075 -1.811966480  0.036527445   0.180960181  1.578321545
>   1.100398735  0.068063611 -1.062439530 -0.184106670 -0.197533642
>   0.041503019  1.260544719 -0.733543065   0.327538693 -1.279668884
> ?
>
>
>
> --
> View this message in context: 
> http://n4.nabble.com/How-to-suppress-vector-indexes-in-printout-tp1471295p1471318.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] lmer Error message

2010-02-06 Thread David Winsemius


On Feb 6, 2010, at 5:45 AM, Martin Bulla wrote:

Does anybody know what this error message means: Error in object 
$terms : $ operator not defined for this S4 class


I have performed the following steps:



library(lattice)



library(Matrix)



library(lme4)


inkm inkm$Gamie glm.incm drop1(glm.incm,test="Ch") Error in object 
$terms : $ operator not defined for this S4 class




Is that a complete representation of your console? It doesn't look 
syntactically valid R as it came across to my mail-client, and it 
produces a different error message when pasted into my console. 
Shouldn't there be an operator after "inkm" (perhaps ";" if you 
wanted to see "inkm" before doing work on it)? And what is "inkm"? You 
are asked to offer reproducible examples. Did you want to assign the 
result of drop1() to glm.incm? If so, a "<-" would be needed, and I 
at least don't see one.


So either your mail-client is not transmitting punctuation the same 
way some (most?) of us are used to seeing it, or you have come from a 
programming world that is quite different from R.




Your suggestion would be of great help to me,
Martin



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] time serie : question about date time class

2010-02-06 Thread jim holtman
Depending on how you are using POSIXct, your accuracy is limited to the
microsecond level.  It is stored as a floating-point number with 53
bits of accuracy (~16 digits), and currently the number of seconds since
1/1/1970 is 10 digits, so with microseconds adding 6 more, you are at
the limit:

> x <- Sys.time()
> x
[1] "2010-02-06 09:34:58 EST"
> unclass(x)
[1] 1265466899
> x
[1] "2010-02-06 09:34:58 EST"
> y <- x+.000001
> x-y
Time difference of -9.536743e-07 secs
> y <- x+.0000001
> x-y
Time difference of 0 secs
> y <- x+.000001  # 1 us
> identical(x,y)
[1] FALSE
> y <- x+.0000001  # 0.1 us
> identical(x,y)
[1] TRUE
>
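The limit Jim describes follows directly from double precision: near 1.27e9 seconds (year 2010) the spacing between representable doubles is 2^-22, about 2.4e-7 s. A reproducible check (the epoch value is fixed for determinism, not taken from the thread):

```r
# POSIXct keeps seconds-since-1970 in a double, so near 1.27e9 seconds
# the ulp (spacing of representable values) is 2^-22 ~ 2.4e-7 s.
x <- .POSIXct(1265466899, tz = "UTC")
print(identical(x, x + 1e-6))   # FALSE: one microsecond still registers
print(identical(x, x + 1e-8))   # TRUE: below half an ulp, rounds back to x
```

So microseconds survive, but anything much below 0.1 us is silently lost.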


On Sat, Feb 6, 2010 at 9:41 AM, Laurent Rhelp  wrote:
Gabor Grothendieck wrote:
>
>> zoo is independent of time and date class so it does not restrict your
>> choice of index class.  POSIXct supports sub-microsecond accuracy.
>> See ?POSIXct .  Simply using the number of microseconds since the
>> start of the experiment is another possibility.
>>
>> On Sat, Feb 6, 2010 at 8:25 AM, Laurent Rhelp 
>> wrote:
>>
>>>
>>> Dear R-List,
>>>
>>> I regularly use R for my data processing and I like to use the
>>> power of the lattice package. Presently, I have to manage time series.
>>> So, in order to work properly, I want to learn the zoo package and the
>>> related methods (since lattice can work with the zoo class). But my
>>> physical experiment lasts only one second and my sampling period is
>>> 1 microsecond (the date-time value is given by the IRIG card for my
>>> data acquisition card). I read the June 2004 R News article about
>>> date-time classes in R and there is no information about support for
>>> microseconds. So do you think it is really a good idea to try to use
>>> the R time-series framework (the zoo package, for example) with my
>>> data? Or is there a trick?
>>>
>>> Thank you very much
>>>
>>> Laurent
>>>
>>>
>>>
>>
>>
>
> Thank you, I will see in details the POSIXct class.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How to suppress vector indexes in printout

2010-02-06 Thread zumar

I tried this, but I've got a printout in single line or with 'fill' option in
multiline like this
0.6375758 1.060877 0.2509587 -0.1509616 0.819645 -0.3580455 -0.07430713 
-0.3464005 -2.312149 -0.8428289 0.8717265 -0.7302025 -0.5292043 -0.289512 
-1.231468 0.01108207 -1.811966 0.03652744 0.1809602 1.578322 1.100399 
0.06806361 -1.062440 -0.1841067 -0.1975336 0.04150302 1.260545 -0.733543 
0.3275387 -1.279669 -0.2939457 -1.814987 0.6008691 0.207336 1.362387
0.5739906 
1.283922 0.04413182 -1.590986 0.4637798 -0.5791159 -0.1732862 -0.2332275 
It is not a good-looking table, but of course the [1], [7], etc. indexes disappeared.
How can I get this better-formatted view:
   0.637575751  1.060876635   0.250958721 -0.150961598  0.819644944
  -0.358045501 -0.074307126 -0.346400504 -2.312149002 -0.842828896
   0.871726468 -0.730202499 -0.529204288 -0.289511989 -1.231467991
   0.011082075 -1.811966480  0.036527445   0.180960181  1.578321545
   1.100398735  0.068063611 -1.062439530 -0.184106670 -0.197533642
   0.041503019  1.260544719 -0.733543065   0.327538693 -1.279668884
?



-- 
View this message in context: 
http://n4.nabble.com/How-to-suppress-vector-indexes-in-printout-tp1471295p1471318.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory Problem

2010-02-06 Thread jim holtman
Have you tried gc() to see if any memory is released?  How big was the
file that you read in?  I don't see any large objects that appear in
your workspace.  Is there some type of processing that you did after
reading in the data?  You might want to intersperse the following
command in your script so that you can track where memory utilization
is going up:

print(memory.size())

I would try this with a smaller dataset size to see what happens.
Take a set of metrics and determine what happens as the size of the
data file is increased.  It is hard to tell without the actual script
to see what, and how, the processing is done.

Again, are there other alternatives that you might want to consider:
using a database, reading in only the columns of data you need,
preprocessing the data into smaller files, etc.  Besides reading in
the data, exactly what do you want to do with it and how much of it is
actually required for the processing?  For example, I have scripts
that only read in the data and then write out the object for later
processing since it is usually the reading and initial processing that
takes a lot of time.  This is another way of partitioning the work.
Anytime I have problems with processing data, I always take a smaller
chunk (cutting it in half each time) till I can at least read it in, in a
reasonable time.  One of the skills that you have to learn is how
to debug your programs; not only actual bugs in your script, but
workarounds that may have to be created due to some constraint in the
system(s) that you are using.  This is a good place to practice design
of experiments.
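The suggestion above about reading in only the columns of data you need can be sketched with base R's colClasses argument: marking a column "NULL" skips it at read time (the file and columns below are invented for illustration):

```r
# Skip unneeded columns while reading: colClasses "NULL" drops them
# before they ever occupy memory in the resulting data frame.
tf <- tempfile(fileext = ".csv")
writeLines(c("id,big_text,value",
             "1,aaaa,10",
             "2,bbbb,20"), tf)
d <- read.csv(tf, colClasses = c("integer", "NULL", "numeric"))
print(names(d))  # "id" "value": big_text was never loaded
unlink(tf)
```

The same trick works with read.fwf(), since it passes colClasses through to read.table().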

On Sat, Feb 6, 2010 at 8:09 AM, Meenakshi
 wrote:
>
> This is my objects size:
>
>                                   Size     Mode
> asa_Condition               912     list
> asa_GatedCommunity        9,912     list
> asa_Neighbourhood         2,872     list
> asa_Security                832     list
> asa_Storeys                 800     list
> Condition_adju              560     list
> final_Condition             672     list
> final_GatedCommunity      3,936     list
> final_Neighbourhood       1,376     list
> final_Security              608     list
> final_Storeys               616     list
> GatedCommunity_adju       3,000     list
> model_Condition             648     list
> model_GatedCommunity        648     list
> model_Neighbourhood         648     list
> model_Security              648     list
> model_Storeys               640     list
> modeling1             9,157,856     list
> mult                  3,613,576     list
> my.object.size            6,912 function
> Neighbourhood_adju        1,080     list
> Security_adju               512     list
> Storeys_adju                520     list
> **Total              12,809,784  ---
> Warning message:
> In structure(.Internal(object.size(x)), class = "object_size") :
>  Reached total allocation of 1535Mb: see help(memory.size)
> --
> View this message in context: 
> http://n4.nabble.com/Memory-Problem-tp1459740p1471251.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] time serie : question about date time class

2010-02-06 Thread Laurent Rhelp

Gabor Grothendieck wrote:


zoo is independent of time and date class so it does not restrict your
choice of index class.  POSIXct supports sub-microsecond accuracy.
See ?POSIXct .  Simply using the number of microseconds since the
start of the experiment is another possibility.

On Sat, Feb 6, 2010 at 8:25 AM, Laurent Rhelp  wrote:
 


Dear R-List,

I regularly use R for my data processing and I like to use the
power of the lattice package. Presently, I have to manage time series. So,
in order to work properly, I want to learn the zoo package and the related
methods (since lattice can work with the zoo class). But my physical experiment
lasts only one second and my sampling period is 1 microsecond (the
date-time value is given by the IRIG card for my data acquisition card). I
read the June 2004 R News article about date-time classes in R and there is no
information about support for microseconds. So do you think it is
really a good idea to try to use the R time-series framework (the zoo package,
for example) with my data? Or is there a trick?

Thank you very much

Laurent


Thank you, I will look at the POSIXct class in detail.



Re: [R] How to suppress vector indexes in printout

2010-02-06 Thread Henrique Dallazuanna
Try this:

cat(x, '\n')

On Sat, Feb 6, 2010 at 12:03 PM, zumar  wrote:
>
> I'm a newbie in R and my question is simple.
> When I type something like this:
>> x=rnorm(10)
>> x
>  [1]  0.5804216 -1.1537118 -0.335  0.7117290 -1.0918811  0.3992606
>  [7] -0.1800837  0.4168152 -0.2077298 -0.2595467
>> 1
> [1] 1
>>
> I'm getting indexes in the first column ([1], [7], etc.)
> How to suppress them temporarily to get this:
>> x=rnorm(10)
>> x
>  0.5804216 -1.1537118 -0.335  0.7117290 -1.0918811  0.3992606
>  -0.1800837  0.4168152 -0.2077298 -0.2595467
>> 1
>  1
>>
> Is there any option to do this?
>
> Thanks for your attention.
>
>
>
>
>
>
> --
> View this message in context: 
> http://n4.nabble.com/How-to-suppress-vector-indexes-in-printout-tp1471295p1471295.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] time serie : question about date time class

2010-02-06 Thread Gabor Grothendieck
zoo is independent of time and date class so it does not restrict your
choice of index class.  POSIXct supports sub-microsecond accuracy.
See ?POSIXct .  Simply using the number of microseconds since the
start of the experiment is another possibility.
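The second possibility Gabor mentions — a plain numeric index of microseconds since the start of the experiment — can be sketched in base R; zoo would accept the same numeric vector as its order.by index (the origin epoch value below is invented for illustration):

```r
# Index samples by integer microseconds since the experiment start
# instead of wall-clock times: exact, and immune to the double-precision
# limits of POSIXct at sub-microsecond resolution.
us    <- 0:9                       # 0..9 microseconds since t0 (tiny demo)
value <- sin(us / 3)
series <- data.frame(us, value)

# Absolute times can be recovered later from a single POSIXct origin:
t0 <- .POSIXct(1265466899, tz = "UTC")
series$time <- t0 + series$us * 1e-6
print(head(series, 3))
```

With zoo installed, zoo(value, order.by = us) gives the same series with the full zoo/lattice toolchain behind it.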

On Sat, Feb 6, 2010 at 8:25 AM, Laurent Rhelp  wrote:
> Dear R-List,
>
>  I have the habit of using R for my data processing and I like to use the
> power of the lattice package. Presently, I have to manage time series. So,
> in order to work properly I want to discover the zoo package and the related
> methods (since lattice can work with zoo class). But my physical experiment
> lasts only one second and my sampling period is equal to 1 microsecond (the
> date time value is given by the IRIG Card for my data acquisition card). I
> read the R-News june 2004 about the date time Classes in R and there is no
> information about allowing for the microseconds. So do you think it is
> really a good idea to try to use the R time series framework (zoo package
> for example) with my data ? Or would there be a tip ?
>
> Thank you very much
>
> Laurent
>
>



[R] How to suppress vector indexes in printout

2010-02-06 Thread zumar

I'm a newbie in R and my question is simple.
When I type something like this:
> x=rnorm(10)
> x
 [1]  0.5804216 -1.1537118 -0.335  0.7117290 -1.0918811  0.3992606
 [7] -0.1800837  0.4168152 -0.2077298 -0.2595467
> 1
[1] 1
> 
I'm getting indexes in the first column ([1], [7], etc.)
How to suppress them temporarily to get this:
> x=rnorm(10)
> x
 0.5804216 -1.1537118 -0.335  0.7117290 -1.0918811  0.3992606
 -0.1800837  0.4168152 -0.2077298 -0.2595467
> 1
 1
> 
Is there any option to do this?

Thanks for your attention.



 


-- 
View this message in context: 
http://n4.nabble.com/How-to-suppress-vector-indexes-in-printout-tp1471295p1471295.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory Problem

2010-02-06 Thread Meenakshi

This is my objects size:   

                           Size  Mode
asa_Condition               912  list
asa_GatedCommunity        9,912  list
asa_Neighbourhood         2,872  list
asa_Security                832  list
asa_Storeys                 800  list
Condition_adju              560  list
final_Condition             672  list
final_GatedCommunity      3,936  list
final_Neighbourhood       1,376  list
final_Security              608  list
final_Storeys               616  list
GatedCommunity_adju       3,000  list
model_Condition             648  list
model_GatedCommunity        648  list
model_Neighbourhood         648  list
model_Security              648  list
model_Storeys               640  list
modeling1             9,157,856  list
mult                  3,613,576  list
my.object.size            6,912  function
Neighbourhood_adju        1,080  list
Security_adju               512  list
Storeys_adju                520  list
**Total              12,809,784  ---
Warning message:
In structure(.Internal(object.size(x)), class = "object_size") :
  Reached total allocation of 1535Mb: see help(memory.size)
-- 
View this message in context: 
http://n4.nabble.com/Memory-Problem-tp1459740p1471251.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interactively editing point labels in a graph

2010-02-06 Thread trece por ciento
Thanks, Gavin
I'll have a look at it.
Hug

--- On Thu, 2/4/10, Gavin Simpson  wrote:

> From: Gavin Simpson 
> Subject: Re: [R] Interactively editing point labels in a graph
> To: "trece por ciento" 
> Cc: r-help@r-project.org
> Date: Thursday, February 4, 2010, 1:11 PM
> On Tue, 2010-02-02 at 23:57 -0800, trece por ciento wrote:
> > Dear experts,
> > I would like to be able to interactively (if possible, with mouse and
> > click) edit point labels in graphs, particularly in multivariate
> > graphs, such as the biplots you get after a correspondence analysis
> > (with, for example, package ca), where labels tend to overlap. The
> > graph aspect ratio is relevant (it needs to be maintained). And I'm
> > working with Windows XP.
> > In this kind of graph, points are identified with labels,
> > generally long (see, for example:
> > http://www.white-history.com/Greece_files/hlafreq.jpg), and sometimes
> > -as in the example- it is good to group certain points within
> > ellipses.
> > Do you know if some package exists that is able to do this task?
> > Thanks in advance,
> > Hug
> 
> If you can live with the ecological overtones of the vegan package, try
> cca() to fit your CA model and then orditkplot() to fiddle with the
> ordination labels etc.
> 
> HTH
> 
> G
> 
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> 






Re: [R] Interactively editing point labels in a graph

2010-02-06 Thread trece por ciento
Many thanks, Felix
It worked, simply importing the EMF into PowerPoint!
By the way, as you are the maintainer of playwith, a question: Why is playwith 
unable to cope with it?
I liked very much the playwith option because it is easy to use, and has all 
the basic capabilities that I need.
Best regards,
Hug

--- On Wed, 2/3/10, Felix Andrews  wrote:

> From: Felix Andrews 
> Subject: Re: [R] Interactively editing point labels in a graph
> To: "trece por ciento" 
> Cc: "Liviu Andronic" , r-help@r-project.org
> Date: Wednesday, February 3, 2010, 4:51 PM
> For your situation, perhaps the best
> option is to save the plot in a
> vector format like WMF, PDF or SVG, and open it with an
> external
> editor. Inkscape is a good one.
> 
> 
> On 4 February 2010 06:46, trece por ciento 
> wrote:
> > Thanks, Liviu
> > At first glance it seems OK. Two questions:
> > 1. Playwith directly accepts the plots created by the
> ca package, but it seems unable to identify the point
> labels
> > For example:
> > data(smoke)
> > smoke
> > ca(smoke)
> > plot(ca(smoke))
> > playwith(plot(ca(smoke)))
> > Then, if I try to identify a label playwith gives the
> message "Sorry, can not guess the data point coordinates.
> Please contact the maintainer with suggestions".
> > If I ask to select the label from a table playwith
> sends the following message to RGui: "Error in
> data.frame(..., check.names = FALSE) :
> > arguments imply differing number of rows: 2, 0"
> > 2. Can playwith draw ellipses or any other figure
> around selected points?
> >
> > (For the first question it seems to be my fault, but I don't
> know how to fix it.)
> >
> > Hug
> >
> > --- On Wed, 2/3/10, Liviu Andronic 
> wrote:
> >
> >> From: Liviu Andronic 
> >> Subject: Re: [R] Interactively editing point
> labels in a graph
> >> To: "trece por ciento" 
> >> Cc: r-help@r-project.org
> >> Date: Wednesday, February 3, 2010, 3:49 AM
> >> Hello
> >>
> >> On 2/3/10, trece por ciento 
> >> wrote:
> >> > Dear experts,
> >> >  I would like to be able to interactively
> (if
> >> possible, with mouse and clik) edit point labels
> in graphs,
> >> >
> >> Try playwith.
> >> Liviu
> >>
> >> > particularly in multivariate graphs, such as
> the
> >> biplots you get after a correspondence analysis
> (with, for
> >> example, package ca), where labels tend to
> overlap. The
> >> graph aspect ratio is relevant (it needs to be
> >> maintained).
> >> And I'm working with Windows XP.
> >> >  In this kind of graphs points in the graph
> are
> >> identified with labels, generally long (see, for
> example: http://www.white-history.com/Greece_files/hlafreq.jpg),
> >> and sometimes -as in the example- it is good to
> group
> >> certain points within ellipses.
> >> >  Do you know if there is some package able to
> do
> >> this task?
> >> >  Thanks in advance,
> >> >  Hug
> >> >
> >> > 
> __
> >> >  R-help@r-project.org
> >> mailing list
> >> >  https://stat.ethz.ch/mailman/listinfo/r-help
> >> >  PLEASE do read the posting guide 
> >> >http://www.R-project.org/posting-guide.html
> >> >  and provide commented, minimal,
> self-contained,
> >> reproducible code.
> >> >
> >>
> >>
> >> --
> >> Do you know how to read?
> >> http://www.alienetworks.com/srtest.cfm
> >> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
> >> Do you know how to write?
> >> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
> >>
> >
> >
> >
> >
> > __
> > R-help@r-project.org
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> >
> 
> 
> 
> -- 
> Felix Andrews / 安福立
> Postdoctoral Fellow
> Integrated Catchment Assessment and Management (iCAM)
> Centre
> Fenner School of Environment and Society [Bldg 48a]
> The Australian National University
> Canberra ACT 0200 Australia
> M: +61 410 400 963
> T: + 61 2 6125 4670
> E: felix.andr...@anu.edu.au
> CRICOS Provider No. 00120C
> -- 
> http://www.neurofractal.org/felix/
> 






Re: [R] problems with SPC charts in R

2010-02-06 Thread Bart Joosen

It looks like an NA is created and then used to calculate the standard
deviation. But it happens inside a function, and these all seem to be
wrapped functions, where one function calls another, which calls another,
and so on. I suggest you contact the maintainer of the package and send
him the data and your error. To find his e-mail address, see ?qcc.
(If you solve your problem, please let us know.)


Bart

-- 
View this message in context: 
http://n4.nabble.com/problems-with-SPC-charts-in-R-tp1467901p1471287.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Why does smoothScatter clip when xlim and ylim increased?

2010-02-06 Thread Duncan Murdoch

On 06/02/2010 7:51 AM, Jennifer Lyon wrote:
> Hi:
>
> Is there a way to get smoothScatter to not clip when I increase the
> xlim and ylim parameters?
> Consider the following example:
>
> set.seed(17)
> x1<-rnorm(100)
> x2<-rnorm(100)
> smoothScatter(x1,x2)
>
> # Now if I increase xlim and ylim, notice that the plot seems to be
> # clipped at the former xlim and ylim boundaries:
>
> smoothScatter(x1,x2, xlim=c(-5,5), ylim=c(-5,5))

If you follow the links on the help page, you'll see that smoothScatter 
uses bkde2D, which has a range.x argument to control the range of the 
smoothing.  The smoothScatter function never passes the xlim and ylim 
values to bkde2D, only to the plotting functions, presumably because the 
author expected you to use them to limit the range, not extend it.


You can get the behaviour you want with specified xlim and ylim by 
modifying one line in smoothScatter:


map <- grDevices:::.smoothScatterCalcDensity(x, nbin, bandwidth)

should become

map <- grDevices:::.smoothScatterCalcDensity(x, nbin, bandwidth, 
list(xlim, ylim))


(You can use fix(smoothScatter) to edit your own local copy of 
smoothScatter and make this change.)


However, this messes up the default plot, so a better patch would be 
needed to permanently fix this.


Duncan Murdoch



[R] time serie : question about date time class

2010-02-06 Thread Laurent Rhelp

Dear R-List,

  I am used to doing my data processing in R and I like the power of the
lattice package. Presently, I have to manage time series, so in order to
work properly I want to learn the zoo package and the related methods
(since lattice can work with the zoo class). But my physical experiment
lasts only one second and my sampling period is 1 microsecond (the date
time value is given by the IRIG card for my data acquisition card). I read
the R News of June 2004 about the date time classes in R, and there is no
information about allowing for microseconds. So do you think it is really
a good idea to use the R time series framework (the zoo package, for
example) with my data? Or is there a tip?
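One possible workaround (my assumption, not something from R News): zoo does not require a date-time index at all, so a plain numeric index in seconds (or in sample numbers) sidesteps the sub-millisecond limits of the date-time classes entirely:

```r
library(zoo)

# One second of data sampled every microsecond, indexed by time in seconds.
# (0:999999) * 1e-6 gives exactly 1e6 index values without rounding issues.
t_sec <- (0:999999) * 1e-6
z <- zoo(sin(2 * pi * 5 * t_sec), order.by = t_sec)

# Ordinary zoo operations work on the numeric index, e.g. extract the
# first millisecond of the signal:
z_first_ms <- window(z, start = 0, end = 1e-3)
```

Lattice plotting of such a series then works through zoo's xyplot method, with the index shown in seconds.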


Thank you very much

Laurent



[R] Why does smoothScatter clip when xlim and ylim increased?

2010-02-06 Thread Jennifer Lyon
Hi:

Is there a way to get smoothScatter to not clip when I increase the xlim and
ylim parameters?
Consider the following example:

set.seed(17)
x1<-rnorm(100)
x2<-rnorm(100)
smoothScatter(x1,x2)

# Now if I increase xlim and ylim, notice that the plot seems to be clipped at
# the former xlim and ylim boundaries:

smoothScatter(x1,x2, xlim=c(-5,5), ylim=c(-5,5))

Thanks.

Jen


sessionInfo()
R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  utils datasets  grDevices methods   base

other attached packages:
[1] RColorBrewer_1.0-2

loaded via a namespace (and not attached):
[1] KernSmooth_2.23-3

[[alternative HTML version deleted]]



Re: [R] SimpleR and UsingR

2010-02-06 Thread j verzani
Uwe Dippel  uniten.edu.my> writes:

> 
> Having found the online version of SimpleR, I wanted to download the 
> respective data:
> "The data sets for these notes are available from the CSI math 
> department (http://www.math.csi.cuny.edu/Statistics/R/simpleR)
> and must be installed prior to this."
> 

Sorry for the confusion. You install the UsingR package and then load the same
package. Try instead:

install.packages("UsingR")
library("UsingR")


--John

>



Re: [R] Reading large files

2010-02-06 Thread jim holtman
In Perl, the 'unpack' command makes it very easy to parse fixed-field data.
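Gabor's read.csv.sql/gawk suggestion below might be sketched end to end like this (the file name and the field widths are made up for illustration, and it assumes the sqldf package and gawk are installed and on the PATH):

```r
library(sqldf)

# gawk program: declare the fixed field widths, then force $1 = $1 so the
# record is rebuilt with OFS (a comma) between fields, i.e. as CSV.
awk_prog <- 'BEGIN { FIELDWIDTHS = "1 3 3 4"; OFS = "," } { $1 = $1; print }'

# read.csv.sql streams the file through the filter into SQLite and only
# then into R, so R never touches the raw fixed-width text.
dat <- read.csv.sql("records.dat", header = FALSE,
                    filter = paste("gawk", shQuote(awk_prog)))
```

On Windows the quoting of the gawk program may need adjusting (shQuote quotes for a POSIX shell by default).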

On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
 wrote:
> Note that the filter= argument on read.csv.sql can be used to pass the
> input through a filter written in perl, [g]awk or other language.
> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>
> gawk has the FIELDWIDTHS variable for automatically parsing fixed
> width fields, e.g.
> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
> making this very easy but perl or whatever you are most used to would
> be fine too.
>
> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>  wrote:
>> Hi Gabor:
>> Thanks. My files are all in fixed width format. They are a lot of them. It 
>> would take me some effort to convert them to CSV. I guess this cannot be 
>> avoided? I can write some Perl scripts to convert fixed width format to CSV 
>> format and then start with your suggestion. Could you let me know your 
>> thoughts on the approach?
>> Satish
>>
>>
>> -Original Message-
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Friday, February 05, 2010 5:16 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> If your problem is just how long it takes to load the file into R try
>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>> create an SQLite database and table layout for you, read the file into
>> the database (without going through R so R can't slow this down),
>> extract all or a portion into R based on the sql argument you give it
>> and then remove the database.  See the examples on the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>  wrote:
>>>
>>> Matthew:
>>> If it is going to help, here is the explanation. I have an end state in
>>> mind. It is given below under "End State" header. In order to get there, I
>>> need to start somewhere right? I started with a 850 MB file and could not
>>> load in what I think is reasonable time (I waited for an hour).
>>>
>>> There are references to 64 bit. How will that help? It is a 4GB RAM machine
>>> and there is no paging activity when loading the 850 MB file.
>>>
>>> I have seen other threads on the same types of questions. I did not see any
>>> clear cut answers or errors that I could have been making in the process. If
>>> I am missing something, please let me know. Thanks.
>>> Satish
>>>
>>>
>>> End State
 Satish wrote: "at one time I will need to load say 15GB into R"
>>>
>>>
>>> -
>>> Satish Vadlamani
>>> --
>>> View this message in context: 
>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Memory Problem

2010-02-06 Thread jim holtman
Here is a function I use to get the size of the objects in my
workspace.  Let us know the output of this command

my.object.size <- function (pos = 1, sorted = FALSE)
{
    .result <- sapply(ls(pos = pos, all.names = TRUE), function(..x)
        object.size(eval(as.symbol(..x))))
    if (sorted) {
        .result <- rev(sort(.result))
    }
    .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
    names(.ls) <- "Size"
    .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
        format = "f")
    .ls$Mode <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)], function(x)
        mode(eval(as.symbol(x))))), "---")
    .ls
}


You will get something like this:

> my.object.size()
                  Size        Mode
.my.env             28 environment
.Random.seed     2,528     numeric
.required           72   character
my.object.size   6,712    function
x                6,712   character
**Total         16,052         ---
>




On Sat, Feb 6, 2010 at 4:51 AM, Meenakshi
 wrote:
>
> Hi,
> I am using R version 2.10.1.
>
> Before running any statements/functions, the gc report is:
>         used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 124352  3.4     350000  9.4   350000  9.4
> Vcells  81237  0.7     786432  6.0   310883  2.4
>
> After I run the repeat statement, I got the following error message:
>
> Error: cannot allocate vector of size 100 Kb
> In addition: There were 50 or more warnings (use warnings() to see the first
> 50)
>
> Finally I have 22 objects. All are 3 columns and within 50 rows only. I
> don't know their sizes.
>
> I give the final gc report below (that is, after I got the error message):
>            used   (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells    322451    8.7     597831   16.0    597831   16.0
> Vcells 194014676 1480.3  285240685 2176.3 198652226 1515.6
>
> Please give me a solution.
>
>
>
> --
> View this message in context: 
> http://n4.nabble.com/Memory-Problem-tp1459740p1471138.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] lmer Error message

2010-02-06 Thread Martin Bulla
Does anybody know what this error message means:
Error in object$terms : $ operator not defined for this S4 class


I have performed the following steps:



library(lattice)
library(Matrix)
library(lme4)

inkm inkm$Gamie glm.incm drop1(glm.incm,test="Ch")
Error in object$terms : $ operator not defined for this S4 class



Your suggestion would be of great help to me,
Martin



[R] melt on OSX ignores na.rm=T

2010-02-06 Thread Titus von der Malsburg
Hi list,

I run R on Linux and OSX.  On both systems I use R version 2.9.2 (2009-08-24)
and reshape version: 0.8.2 (2008-11-04).  When I do a melt with
na.rm=T on a data frame I get different results on these systems:

library(reshape)

x <- read.table(textConnection("char trial wn
p E10I13D0  4
r E10I13D0  4
a E10I13D0  4
c E10I13D0  4
t E10I13D0  4
i E10I13D0  4
c E10I13D0  4
e E10I13D0  4
d E10I13D0  4
, E10I13D0 NA"), head=T)

melt(x, measure.vars="char", na.rm=T)

On Linux I get:

  1 E10I13D0  4 char p
  2 E10I13D0  4 char r
  3 E10I13D0  4 char a
  4 E10I13D0  4 char c
  5 E10I13D0  4 char t
  6 E10I13D0  4 char i
  7 E10I13D0  4 char c
  8 E10I13D0  4 char e
  9 E10I13D0  4 char d

But on OSX I get:

  1  E10I13D0  4 char p
  2  E10I13D0  4 char r
  3  E10I13D0  4 char a
  4  E10I13D0  4 char c
  5  E10I13D0  4 char t
  6  E10I13D0  4 char i
  7  E10I13D0  4 char c
  8  E10I13D0  4 char e
  9  E10I13D0  4 char d
  10 E10I13D0 NA char ,


What's causing this glitch?  Is there a simple way to subset rows that
do not have any NAs?  I'm looking for a line of code that I can use for
all data frames without modification.
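For the subsetting question, complete.cases() works unchanged on any data frame. A sketch using a two-row version of the example above:

```r
# Toy data frame: the second row has an NA in the wn column.
x <- data.frame(char = c("p", ","), trial = "E10I13D0", wn = c(4, NA))

x[complete.cases(x), ]   # keeps only rows with no NA in any column
na.omit(x)               # equivalent for this purpose
```

Either line drops every row containing at least one NA, whatever the columns are, so it needs no per-data-frame modification.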

As always: thanks a lot!

  Titus



[R] SimpleR and UsingR

2010-02-06 Thread Uwe Dippel
Having found the online version of SimpleR, I wanted to download the 
respective data:
"The data sets for these notes are available from the CSI math 
department (http://www.math.csi.cuny.edu/Statistics/R/simpleR)
and must be installed prior to this."

There it says:
"The simpleR package is now inside the UsingR package available from 
CRAN. In the Windows GUI, this may be installed from a menubar item. 
Otherwise, the following command > install.packages("UsingR") will work ..."

I did, and got a package of
Content type 'application/x-gzip' length 502288 bytes (490 Kb) 
downloaded and installed. Alas, the command given in the notes:

> library("Simple")
Error in library("Simple") : there is no package called 'Simple'
seems not to work. But when I try
> installed.packages()
UsingR shows up.

Can anyone please enlighten me, what is going on here? Thanks,

Uwe



Re: [R] Memory Problem

2010-02-06 Thread Meenakshi

Hi,
After getting the error message, my main file size is 1.05 MB.
The other objects are within 400 bytes only.
Thanks.

-- 
View this message in context: 
http://n4.nabble.com/Memory-Problem-tp1459740p1471153.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Random number quality

2010-02-06 Thread Patrick Burns

A couple comments.

Although pseudo-random numbers were originally
used because of necessity rather than choice,
there is a definite upside to using them.  That
upside is that the computations become reproducible
if you set the seed first (see 'set.seed').
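That reproducibility can be demonstrated in a few lines (a minimal sketch):

```r
set.seed(42)     # fix the PRNG state
a <- runif(5)

set.seed(42)     # restore the same state
b <- runif(5)

identical(a, b)  # TRUE: the same seed reproduces the same draws
```

This is why simulation studies conventionally call set.seed() once at the top of the script.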

I tend to encourage skepticism at pretty much
every turn.  But I find this piece of skepticism
a bit misplaced.  The application that you describe
does not sound at all demanding, and R Core is
populated by some of the best statistical computing
people in the world.

On 05/02/2010 22:04, b k wrote:

Hello,

I'm running R 2.10.1 on Windows Vista. I'm selecting a random sample of
several hundred items out of a larger population of several thousand. I
realize there is srswor() in the sampling package for exactly this purpose,
but as far as I can tell it uses the native PRNG, which may or may not be
random enough. Instead I used the random package, which pulls random numbers
from random.org, although from my further reading [vignette("random-intro",
package="random")] it seems that may have problems also.

I'm curious what the general consensus is on random-number quality, both for
the native built-in PRNG and for alternatives such as the random package.

Thanks,
Ben K.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] Memory Problem

2010-02-06 Thread Meenakshi

Hi,
I am using R version 2.10.1.

Before running any statements/functions, the gc report is:
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 124352  3.4     350000  9.4   350000  9.4
Vcells  81237  0.7     786432  6.0   310883  2.4

After I run the repeat statement, I got the following error message:

Error: cannot allocate vector of size 100 Kb
In addition: There were 50 or more warnings (use warnings() to see the first
50)

Finally I have 22 objects. All are 3 columns and within 50 rows only. I
don't know their sizes.

I give the final gc report below (that is, after I got the error message):
             used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells     322451    8.7     597831   16.0    597831   16.0
Vcells  194014676 1480.3  285240685 2176.3 198652226 1515.6

Please give me a solution.

 

-- 
View this message in context: 
http://n4.nabble.com/Memory-Problem-tp1459740p1471138.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] The KJV

2010-02-06 Thread Jim Lemon

On 02/06/2010 06:57 PM, Charlotte Maia wrote:

Hey all,

Does anyone know if there are any R packages with a copy of the KJV?
I'm guessing the answer is no...

So the next question, and the more important one is:
Does anyone think it would be useful (e.g. for text-mining purposes)?
I know almost nothing about theology,
so I'm not sure what kind of questions theologians might have (that R
could answer).

An alternative, that would achieve a similar result (I think),
would be an R interface to another open source system, such as Sword.


Hi Charlotte,
Try

http://www.gutenberg.org/etext/10

Jim
