Re: [R] Documenting data

2016-06-30 Thread Ista Zahn
On Thu, Jun 30, 2016 at 2:50 PM,   wrote:
> Hi Bert,
> Hi Readers,
>
> I did not know much about attributes in R and how to use them. If it is that 
> flexible you are right and I have learnt something.

It is that flexible, but there is a big limitation that makes them much less
useful than Bert suggests. Extending Bert's example:

somedata <- runif(10)
str(somedata)
## num [1:10] 0.9393 0.59204 0.04016 0.00273 0.02146 ...

attr(somedata,"doc") <- "Anything you want to say about the data"
str(somedata)
## atomic [1:10] 0.9393 0.59204 0.04016 0.00273 0.02146 ...
## - attr(*, "doc")= chr "Anything you want to say about the data"

Notice that attaching attributes makes the output of str less informative.

The other main limitation is that attributes tend to get lost when you
manipulate the data:

somedata <- somedata[!is.na(somedata)]
attributes(somedata)
## NULL

Since attributes tend to disappear when you manipulate the data I tend
to avoid attaching them to the data directly.

You can work around this of course, and there are several packages
that do it for you, but the combination of these two drawbacks makes
the attributes system in R less useful for documenting data IMO.
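
One possible workaround, as a minimal sketch only: copy the attribute back
by hand after subsetting. The helper name subset_keep_doc is made up for
illustration and does not come from any package.

```r
# Hypothetical helper (not from any package): subset a vector and
# re-attach its "doc" attribute afterwards.
subset_keep_doc <- function(x, i) {
  doc <- attr(x, "doc", exact = TRUE)  # remember the documentation
  out <- x[i]                          # plain subsetting drops it
  attr(out, "doc") <- doc              # put it back
  out
}

somedata <- c(0.2, NA, 0.7, 0.4)
attr(somedata, "doc") <- "Anything you want to say about the data"

cleaned <- subset_keep_doc(somedata, !is.na(somedata))
attr(cleaned, "doc")
```

This only protects the one operation you wrap; any other function that
drops attributes would need the same treatment.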

Best,
Ista

Re: [R] Documenting data

2016-06-30 Thread G . Maubach
Hi Bert,
Hi Readers,

I did not know much about attributes in R and how to use them. If it is that 
flexible you are right and I have learnt something.

Kind regards

Georg

Re: [R] Documenting data

2016-06-30 Thread Bert Gunter
I believe Georg's pronouncements are wrong. See inline below.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


"...

> Within R there are some limitations for storing the information about what a 
> variable or a value within a variable means.

That is FALSE. There are no limitations. For example, just attach a
"doc" attribute to your data that says whatever you wish to about
them. e.g.

> somedata <- runif(10)
> attr(somedata,"doc") <- "Anything you want to say about the data"

> attr(somedata,"doc")
[1] "Anything you want to say about the data"


You can go as crazy as you want to with this, e.g. creating an (S3 or
S4) class "documented", with appropriate methods for printing it, from
classes that inherit from data frames, lists, etc. See also the
roxygen2 package for data documentation and R's ?promptData function
for generating a data documentation file in Rd format.
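
As a rough sketch of what such an S3 class could look like (restricted to
plain atomic vectors; the constructor and method bodies below are
assumptions for illustration, not an existing package API):

```r
# Hypothetical constructor: adds a "doc" attribute and the class "documented".
documented <- function(x, doc) {
  structure(x, doc = doc, class = c("documented", oldClass(x)))
}

# Print the documentation first, then the underlying data.
print.documented <- function(x, ...) {
  cat("Documentation: ", attr(x, "doc"), "\n", sep = "")
  plain <- x
  attr(plain, "doc") <- NULL
  remaining <- setdiff(oldClass(plain), "documented")
  oldClass(plain) <- if (length(remaining)) remaining else NULL
  print(plain, ...)
  invisible(x)
}

# Subsetting method for atomic vectors that keeps class and documentation.
`[.documented` <- function(x, ...) {
  out <- NextMethod()
  attr(out, "doc") <- attr(x, "doc", exact = TRUE)
  oldClass(out) <- oldClass(x)
  out
}

x <- documented(c(1.5, NA, 3), "Test scores, NA = absent")
y <- x[!is.na(x)]
print(y)
```

Methods for data frames or lists would need more care, for example when a
subset drops a data frame to a vector.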

R is Turing complete -- so it can do anything any other programming
language can do. You could program SAS in R if you wanted. The
difference is that SAS has pre-programmed some capabilities that R
leaves for users, including contributed packages -- like Sweave,
knitr, etc.  You may or may not like this extra flexibility (and extra
work, depending on whether someone else has already done the work for
you), and efficiency may or may not be an issue; but to say that R has
"limitations" is a gross misrepresentation, imho.



Re: [R] Documenting data

2016-06-30 Thread G . Maubach
Hi Pito,
Dear Readers,

As others have already mentioned, there are good practices for documenting code 
and data. I would like to summarize them and add a few not mentioned earlier:

1. You should always have two things: your raw data and your R script(s). The 
raw data is immutable, whereas the R scripts produce the results.

2. You might want to distinguish between documenting your CODE and 
documenting your DATA. Documenting code is similar to what you already know 
from your programming experience. Documenting data is somewhat different 
because you store information about the meaning of your data directly in your 
data.

Example
You have a variable with codes ranging from 1 to 5. But what do they mean? 
Perhaps it could be

1 = Strongly agree
2 = Agree
3 = Neither agree/nor disagree
4 = Disagree
5 = Strongly Disagree

But it could also be the other way round:

1 = Strongly Disagree
2 = Disagree
3 = Neither agree/nor disagree
4 = Agree
5 = Strongly Agree

What the codes in your variable mean depends on the systems or processes you 
derived your data from.

Within R there are some limitations for storing the information about what a 
variable or a value within a variable means. Other software packages such as 
SAS or SPSS offer much broader support for storing this information. In R you 
can work with meaningful variable names and with the factor data type/class, 
which can store mappings between values and value descriptions.

Example
-- cut --
var1 <- rep(1:5, 3)
ds_example <- data.frame(var1)

# Value labels, in the order of the codes 1 to 5
var1_labels <- c("1 = Strongly Agree",
                 "2 = Agree",
                 "3 = Neither agree/nor disagree",
                 "4 = Disagree",
                 "5 = Strongly disagree")

# Convert the codes to a factor that carries the labels
ds_example[["var1"]] <- factor(ds_example[["var1"]],
                               levels = c(1, 2, 3, 4, 5),
                               labels = var1_labels)

summary(ds_example["var1"])
-- cut --

In addition, the packages Hmisc and memisc provide methods for working with 
variable labels and value labels. They can also produce a so-called codebook, 
which contains all variable names, variable labels, values, value labels and 
summaries of the distribution of values within the variables.
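
To give an idea of what a codebook contains, here is a very small base R 
sketch. It does not use the Hmisc or memisc functions; the vector var_labels 
and its label text are made up for illustration.

```r
# Example data, mirroring the factor example above
ds_example <- data.frame(var1 = rep(1:5, 3))
ds_example$var1 <- factor(ds_example$var1,
                          levels = 1:5,
                          labels = c("1 = Strongly Agree",
                                     "2 = Agree",
                                     "3 = Neither agree/nor disagree",
                                     "4 = Disagree",
                                     "5 = Strongly disagree"))

# Hypothetical variable labels, kept in a named character vector
var_labels <- c(var1 = "Agreement with statement X")

# One row per variable and value: name, label, value label and frequency
codebook <- do.call(rbind, lapply(names(ds_example), function(v) {
  counts <- table(ds_example[[v]], useNA = "ifany")
  data.frame(variable = v,
             label    = var_labels[[v]],
             value    = names(counts),
             n        = as.vector(counts),
             stringsAsFactors = FALSE)
}))

codebook
```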

3. In addition, you could structure your scripts in a modular way according 
to the analysis process, e.g. importing, cleaning, preparation for analysis, 
analysis, reporting. Another structure may be more suitable in your case. 
These modules could have a number in the file name indicating the sequence in 
which the scripts should be run.
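
A minimal sketch of how numbered scripts could be run in sequence. The 
directory and file names are assumptions, and the three tiny scripts are 
written to a temporary directory only so that the example is self-contained.

```r
# Hypothetical layout: scripts/01_import.R, scripts/02_clean.R, ...
script_dir <- file.path(tempdir(), "scripts")
dir.create(script_dir, showWarnings = FALSE)
writeLines("steps <- 'import'",             file.path(script_dir, "01_import.R"))
writeLines("steps <- c(steps, 'clean')",    file.path(script_dir, "02_clean.R"))
writeLines("steps <- c(steps, 'analysis')", file.path(script_dir, "03_analysis.R"))

# Zero-padded numeric prefixes make alphabetical order equal to run order
scripts <- sort(list.files(script_dir, pattern = "^[0-9]+_.*\\.R$",
                           full.names = TRUE))

# Run all scripts in one shared environment so later steps see earlier results
run_env <- new.env()
for (s in scripts) source(s, local = run_env)

run_env$steps
```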

4. I find it valuable to use a software repository like GitHub, SourceForge or 
others to keep the revisions safe and secure, in case you would like to go 
back to a version containing code you deleted earlier and now find you need 
again. The RStudio IDE has an interface to git if you would like to go with 
that. Good commit messages can help you track what has changed. Commits also 
help you to work in precise steps when developing your scripts.

5. I have no experience with Sweave or knitr, but you could also compile 
simple documentation by copying comments to an Excel sheet using R-to-Excel 
libraries like excel.link or others.

Example
install.packages("excel.link")
library(excel.link)
xlc["A1"] <- "Project Documentation"
xlc["A2"] <- "Step XY"
xlc["A3"] <- "Some explanation about step xy"

This way you have the documentation in your code and in an external source.

Which approach you choose depends on your experience with R and its libraries 
as well as the size of your project and the need for documentation.

6. It can be helpful to store interim results in a format that can be read by 
non-R-users, e. g. Excel.

7. Documenting code can be done using roxygen2.

If you disagree with any of my suggestions, please say so.

Kind regards

Georg


Re: [R] Documenting data

2016-06-30 Thread Bert Gunter
Private, since this is a trivial comment. Also, just my opinion, so
feel free to ignore.

Capture it, yes, but not necessarily as a function; a script might do
just as well, and the tools mentioned can do this. As others have said,
your instincts are good, and you should just choose the methods that
work best for you.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Jun 30, 2016 at 8:46 AM, Pito Salas  wrote:
> Thanks to you both. I think you’re saying/implying that once I “test drive” a 
> particular bit of cleaning I should capture it in a function which does it 
> reproducibly against the raw data, and that becomes the best documentation 
> for it. That makes sense.
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Documenting data

2016-06-30 Thread Pito Salas
Thanks to you both. I think you’re saying/implying that once I “test drive” a 
particular bit of cleaning I should capture it in a function which does it 
reproducibly against the raw data, and that becomes the best documentation for 
it. That makes sense.
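
For illustration, a sketch of what such a cleaning function might look like. 
The column name score, the derived column score_z and the function name are 
made up for this example.

```r
#' Clean the raw scores (hypothetical example).
#'
#' Drops rows with a missing score and adds a standardized score column.
#' The raw data frame passed in is never modified.
clean_scores <- function(raw) {
  stopifnot(is.data.frame(raw), "score" %in% names(raw))
  cleaned <- raw[!is.na(raw$score), , drop = FALSE]
  # Derived column: score centered on the mean, in units of standard deviation
  cleaned$score_z <- (cleaned$score - mean(cleaned$score)) / sd(cleaned$score)
  cleaned
}

raw <- data.frame(id = 1:5, score = c(10, NA, 14, 12, 16))
cleaned <- clean_scores(raw)
cleaned
```

Rerunning clean_scores(raw) always rebuilds the derived data from the raw 
data, so the function body doubles as a record of what was done.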

Pito Salas
Brandeis Computer Science
Feldberg 131

Re: [R] Documenting data

2016-06-30 Thread Bert Gunter
In addition to what others have suggested, see ?history.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: [R] Documenting data

2016-06-30 Thread Ulrik Stervbo
Vince Buffalo covers this nicely in his book "Bioinformatics Data
Skills". The original data should stay as it is, i.e. it is immutable, and
Vince then suggests that you keep a text file in your data directory where
you explain where the data came from, which scripts you used to create a
modified version, when you did this, and so on.
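
A small sketch of how such a text file could be kept up to date from R. The
file names below (README.txt, survey_raw.csv, 02_clean.R, survey_clean.csv)
are placeholders chosen for illustration, and a temporary directory stands
in for the real data directory.

```r
# Stand-in for the project's data directory
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
readme <- file.path(data_dir, "README.txt")

# One provenance entry: when, from which source, with which script
entry <- c(
  paste("Date:", format(Sys.Date())),
  "Source: survey_raw.csv (hypothetical file name)",
  "Script: 02_clean.R (hypothetical script name)",
  "Result: survey_clean.csv",
  ""
)

# Append, so that earlier entries are kept
cat(entry, file = readme, sep = "\n", append = TRUE)

readLines(readme)
```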

I find using roxygen comments and knitr extremely useful for keeping track
of what I intend to do and why because it allows me to export all the
reasoning, summary tables and plots to a format I can share with
collaborators that don't care about the R code for getting there.

HTH
Ulrik


Re: [R] Documenting data

2016-06-30 Thread Robert Baer

You might look at:

http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets

You might also try File | Compile Notebook from within RStudio 
(https://www.rstudio.com/) on your well-documented R scripts to get a 
nice reproducible recording/report of your data analysis workflow. Similar 
functionality is available from basic R, but involves more work. There 
are many other approaches, but the best choice depends on your precise 
needs.


And, as a programmer, you are probably already familiar with things like:
https://google.github.io/styleguide/Rguide.xml



Re: [R] Documenting data

2016-06-30 Thread Christopher W Ryan
Pito--

You describe excellent practices.

The R code itself, saved as a script, provides some documentation of how you 
got from original data to wherever you are.

Use # comments liberally. 

Whenever possible, save your raw data exactly as it was when you got it. Avoid 
changing it; make all the changes on the objects in R. 

Have you looked into various "reproducible research" systems for R, like Sweave 
or knitr?  They allow you to include analysis code and text of a manuscript or 
report all together in one file.

Christopher W. Ryan



[R] Documenting data

2016-06-30 Thread Pito Salas
I am studying statistics and using R in doing it. I come from software 
development where we document everything we do.

As I “massage” my data, adding columns to a frame, computing on other data, 
perhaps cleaning, I feel the need to document in detail what the meaning, or 
background, or calculations, or whatever of the data is. After all it is now 
derived from my raw data (which may have been well documented) but it is “new.” 

Is this a real problem? Is there a “best practice” to address this?

Thanks!

Pito Salas
Brandeis Computer Science
Feldberg 131
