Re: [R] Request for aid in first R script

2018-11-19 Thread S Ellison
Pointers inline below:

> > Since I'm a newbie on R, I was wondering if you could help me to achieve a
> > small project that I think is possible with this project (I can't seem to
> > find a similar tool)
> >
> > I have a data file with about 2000 value lines, organized like this:
> >
> > x;y;z;j;
> > ...
> >
> > I want to find different correlations (linear regression with
> > Levenberg–Marquardt or least squares) between the x values and a y or z
> > pair. For instance, between x and y.
> >
> > So, what I'm trying to do is:
> >
> > 1) Load the file (is there a limit on the load size? If yes, can I load it
> > in sequence by parts?)
See ?read.table and note that you can define a separator. Using read.table() 
with sep=";" should work.
The load limit is set by available memory; I have read 800,000 lines on a 4 GB system.
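
As a minimal sketch of that read.table() call: the sample values, the temporary file and the column name "trailing" below are all made up for illustration, and I am assuming the file has no header row. Each line ends in ";" so read.table() sees a fifth, empty field, which is dropped after reading.

```r
# Hypothetical sample standing in for the real data file (no header row assumed)
tmp <- tempfile(fileext = ".txt")
writeLines(c("1;2.1;0.5;7;",
             "2;3.9;0.7;8;",
             "3;6.2;0.9;9;"), tmp)

# The trailing ';' on each line produces an empty fifth field,
# so give it a throwaway name and remove it afterwards.
mydata <- read.table(tmp, sep = ";", header = FALSE,
                     col.names = c("x", "y", "z", "j", "trailing"))
mydata$trailing <- NULL

str(mydata)
```

If the real file does have a header line, header = TRUE would be needed instead of col.names.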

> > 2) Define 100 sets of 20 values each (also sequence, from x1 to xn: first
> > from x1 to x20, next from x21 to x40, etc.) or process one set at the time
> > in case of file limits in 1)
You can say something like
mydata[i:(i+19), ]
to get a row-wise slice of 20 rows starting at row i, but an R user would perhaps consider 
setting up an ancillary variable using
mydata$chunks <- gl(100,20)
and using a variant of aggregate(), or ddply() from the plyr package, to apply a function to each subset
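
A small sketch of the gl() idea, using simulated data in place of the real file (the data frame contents are invented for illustration). split() is used here as one base-R way of getting at the subsets; aggregate() or ddply() would be alternatives.

```r
# Simulated stand-in for the 2000-row data set
set.seed(1)
mydata <- data.frame(x = 1:2000, y = rnorm(2000))

# gl(100, 20) gives a factor with 100 levels, each repeated 20 times in a row,
# so rows 1-20 get level 1, rows 21-40 get level 2, and so on.
mydata$chunks <- gl(100, 20)

# One data frame per chunk
pieces <- split(mydata, mydata$chunks)

length(pieces)         # number of chunks
nrow(pieces[[1]])      # rows in the first chunk
range(pieces[[2]]$x)   # first and last x in the second chunk
```

For a row count n that is not exactly 2000, gl(ceiling(n/20), 20, length = n) should give the same kind of grouping, with a shorter last chunk.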

> > 3) Define a fitting function
er... anything you can write, either as an expression or a function.

> > 4) Use the same function model to find the best fit for each set
Look at, for example, lm for linear models (including polynomials), nls or nlm 
for non-linear models, and a decent book on R for a much, much, much wider 
range, including splines, generalised additive models, generalised linear 
models, mixed effects models (linear and otherwise) ...
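
To make that concrete, here is a hedged sketch of one linear and one non-linear fit on simulated data. The exponential model and the starting values are my own choices for illustration, not anything implied by the original question. Note that nls() uses a Gauss-Newton algorithm by default; for Levenberg-Marquardt specifically, the add-on package minpack.lm provides nlsLM(), which is not used here.

```r
# Simulated data; the true coefficients are known so the fits can be checked
set.seed(42)
x     <- seq(0, 2, length.out = 20)
y_lin <- 3 + 2 * x + rnorm(20, sd = 0.1)            # straight line plus noise
y_exp <- 2 * exp(0.8 * x) + rnorm(20, sd = 0.05)    # exponential plus noise

# Ordinary least squares for the linear model
fit_lin <- lm(y_lin ~ x)

# Non-linear least squares; nls() needs starting values for the parameters
fit_exp <- nls(y_exp ~ a * exp(b * x), start = list(a = 1.5, b = 1))

coef(fit_lin)
coef(fit_exp)
```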

> > 5) Save in a file, the coefficients of those fits.
Something like sapply or ddply should be able to give you a table of 
coefficients, especially if you write a wrapper function like
mywrap <- function(x) coef(nls(y ~ fitfun, data = x))
to return a vector of coefficients from a chunk x (fitfun here stands for your model expression)
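
A sketch of the wrapper-plus-sapply approach on simulated data. To keep it reproducible I have swapped lm() in for the nls() call, and the data, the output file name and the straight-line model are assumptions made for illustration only.

```r
# Simulated stand-in: 100 chunks of 20 rows, each following y = 3 + 2x + noise
set.seed(1)
mydata <- data.frame(x = rep(1:20, times = 100))
mydata$y <- 3 + 2 * mydata$x + rnorm(2000, sd = 0.5)
mydata$chunks <- gl(100, 20)

# Wrapper returning the coefficient vector for one chunk
# (lm() used here instead of nls() so the example always converges)
mywrap <- function(d) coef(lm(y ~ x, data = d))

# sapply() returns one column per chunk; transpose to get one row per chunk
coefs <- t(sapply(split(mydata, mydata$chunks), mywrap))
head(coefs)

# Save the table of coefficients; a temporary file is used here
outfile <- tempfile(fileext = ".csv")
write.csv(coefs, outfile)
```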

> > Can this be done accurately with R?
Yes; R has well-characterised numerically stable core functions, which is more 
than can be said for most spreadsheets.

> > It would save me a lot of programming. 
You'll still have to do that, but writing it in R will be a lot quicker than writing it in C

> > The files will soon have about 1
> > million lines, which is a lot to process.
If you can’t load it all at once, you can use read.table with its skip and nrows 
arguments to read one block of rows at a time,
or you can push the whole lot to a database and use any of R's database 
packages to read from that; RMySQL and the like.
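
A minimal sketch of reading a block of rows with skip and nrows. The helper read_chunk(), the generated file and the column names are hypothetical, and again no header row is assumed.

```r
# Hypothetical 100-line file in the x;y;z;j; layout
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%d;%d;%d;%d;", 1:100, 2L * (1:100), 3L, 4L), tmp)

# Read n rows beginning at row 'start' (1-based);
# the empty field after the trailing ';' is dropped by keeping columns 1-4
read_chunk <- function(file, start, n) {
  read.table(file, sep = ";", header = FALSE,
             skip = start - 1, nrows = n,
             col.names = c("x", "y", "z", "j", "trailing"))[, 1:4]
}

part <- read_chunk(tmp, start = 21, n = 20)
range(part$x)
```

Each call with skip has to scan past the skipped lines again, so for very large files the database route may turn out to be less painful.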



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Request for aid in first R script

2018-11-19 Thread Thierry Onkelinx via R-help
Dear Kepler,

Yes, R can do all of this. But this list is here to help you when you get stuck, not
to do all the work for you... You are asking basic stuff, so any
introductory book on R should contain sufficient information to get you
going. So please do read one of those first.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be



[R] Request for aid in first R script

2018-11-19 Thread Rui Fernandes
Good morning

My compliments to all.

Since I'm a newbie on R, I was wondering if you could help me to achieve a
small project that I think is possible with this project (I can't seem to
find a similar tool)

I have a data file with about 2000 value lines, organized like this:

x;y;z;j;
...

I want to find different correlations (linear regression with
Levenberg–Marquardt or least squares) between the x values and a y or z
pair. For instance, between x and y.

So, what I'm trying to do is:

1) Load the file (is there a limit on the load size? If yes, can I load it
in sequence by parts?)
2) Define 100 sets of 20 values each (also sequence, from x1 to xn: first
from x1 to x20, next from x21 to x40, etc.) or process one set at the time
in case of file limits in 1)
3) Define a fitting function
4) Use the same function model to find the best fit for each set
5) Save in a file, the coefficients of those fits.

Can this be done accurately with R?

It would save me a lot of programming. The files will soon have about 1
million lines, which is a lot to process.

I would appreciate it very much if someone could help me.

Kind regards

Kepler
