Re: [R] Data Carpentry - Creating a New SQLite Database

2020-01-10 Thread William Michels via R-help
Hi Phillip,

Skipping to the last few lines of your email, did you download a
program to look at SQLite databases (independent of R) as listed
below? Maybe that program ("DB Browser for SQLite") and/or the
instructions below can help you locate your database directory:

https://datacarpentry.org/semester-biology/computer-setup/
https://datacarpentry.org/semester-biology/materials/sql-for-dplyr-users/

If you do have that program, and you're still seeing an error, you
might consider looking for similar issues at the appropriate
'datacarpentry' repository on GitHub (or posting a new issue
yourself):

https://github.com/datacarpentry/R-ecology-lesson/issues

Finally, I really feel you'll benefit from reading over the documents
pertaining to "R Data Import/Export" on the www.r-project.org website.
No disrespect to the people at 'datacarpentry', but you'll find
similar (and possibly easier) R code to follow at section 4.3.1,
'Packages using DBI':

https://cran.r-project.org/doc/manuals/r-release/R-data.html

HTH, Bill.

W. Michels, Ph.D.





Re: [R] Data Carpentry - Creating a New SQLite Database

2020-01-10 Thread Bert Gunter
Please note that tidyverse packages have their own support resources at
RStudio, whence they came; e.g. here:
https://education.rstudio.com/learn/beginner/
You may also do better asking about issues that concern them at their
support site: https://support.rstudio.com/hc/en-us
though, as you already found out, there are folks here who may help also.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Carpentry - Creating a New SQLite Database

2020-01-10 Thread Ivan Krylov
On Fri, 10 Jan 2020 11:31:58 -0700
"Phillip Heinrich"  wrote:

> below is the text from the tutorial.  The black type is from the
> tutorial.  The green and blue is the suggested R code.  My comments
> are in red

R-help is a plain text mailing list, so the markup has been stripped
off (and since HTML-enabled mail clients don't quite care how the plain
text version of the e-mail looks, some paragraph breaks had to go, too).

> etc.surveys <- read_csv("data_raw/surveys.csv")
> plots <- read_csv("data_raw/plots.csv")

> Again no problem.  I’m just creating R data files.

Note that it is not files that you are creating by running read_csv(),
but variables (of type "tibble", which is like "data.frame", either of
which should have been covered in earlier chapters in a good tutorial)
in the R environment. The files you downloaded previously are opened
in read only mode and are never changed.
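
A minimal sketch of that distinction, assuming the readr package is
installed. The CSV contents, the temporary file and the name plots_demo
are made up for illustration and are not part of the tutorial:

```r
library(readr)   # provides read_csv(); part of the tidyverse

# Write a tiny made-up CSV so the example does not depend on the tutorial files.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot_id,plot_type", "1,Control", "2,Rodent Exclosure"), csv_path)

plots_demo <- read_csv(csv_path)   # returns a tibble held in memory
class(plots_demo)                  # includes "tbl_df" and "data.frame"

# Changing the variable in R does not touch the file on disk.
plots_demo$plot_type <- toupper(plots_demo$plot_type)
readLines(csv_path)                # still the original, mixed-case text
```

The file only changes if you explicitly write to it, for example with
write_csv().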

> my_db_file <- "data/portal-database-output.sqlite"

> I’m creating something named my_db_file from another file named
> portal-database-output with an sqlite extension and then creating
> my_db from the My_db_file.

This something is just a text string that happens to contain a *path*
to a file. Just like the variable `greeting` in the following snippet:

greeting <- "Hello world"
print(greeting)

See [1] for more info on character vectors in R.

> Not sure where the sqlite extension file came from.

The authors of the tutorial decided that the file to be created should
be named like this. Feel free to change the extension (or the path) to
anything else: neither R, nor SQLite cares about it much (but the file
manager you use may display a different icon for it or become confused
if you name it .txt or .pdf).

> I can follow the directions to fill in my_db but I have no idea
> how to access the tables.

What exactly do you mean by "access"? At this point my_db should be a
dplyr "src" object, so the tools described in dplyr vignettes [2] should
be applicable. Try calling tbl() on it and passing the name of one of
the tables you have just created. Also try running:

example("src_sqlite")
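
As a rough sketch of what "accessing a table" can look like: the snippet
below uses a plain DBI connection to an in-memory SQLite database
instead of src_sqlite(), which is a substitution on my part and not what
the tutorial does. The table contents are made up. It assumes dplyr,
dbplyr, DBI and RSQLite are installed.

```r
library(dplyr)   # tbl(), filter(), collect(); dbplyr does the SQL translation

# Hypothetical stand-in for the tutorial database: one small made-up table.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "plots",
                  data.frame(plot_id = 1:3,
                             plot_type = c("Control", "Spectab exclosure", "Control")))

DBI::dbListTables(con)            # which tables exist in the database

plots_tbl <- tbl(con, "plots")    # a lazy reference to the table, not the data itself
controls <- plots_tbl %>%
  filter(plot_type == "Control") %>%
  collect()                       # run the query and pull the result into R

print(controls)
DBI::dbDisconnect(con)
```

With the tutorial's object the equivalent call would presumably be
tbl(my_db, "plots").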

> The text from the tutorial below says to check the location of our
> database.  Huh!  Can someone give me some direction.

The variable my_db_file contains the location of the file where the
database is stored. This is the same variable that you passed to the
src_sqlite() function.
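
To see where that file ends up, something along these lines may help. It
is only a sketch: it writes to a temporary directory rather than the
tutorial's "data/" folder, and uses DBI directly, so the path and the
table are assumptions.

```r
# Hypothetical location; the tutorial uses "data/portal-database-output.sqlite".
my_db_file <- file.path(tempdir(), "portal-database-output.sqlite")

con <- DBI::dbConnect(RSQLite::SQLite(), my_db_file)
DBI::dbWriteTable(con, "plots", data.frame(plot_id = 1:2))
DBI::dbDisconnect(con)

getwd()                      # relative paths such as "data/..." are resolved from here
normalizePath(my_db_file)    # the full path of the database file
file.exists(my_db_file)      # the database is an ordinary file on disk
file.size(my_db_file)        # size in bytes
```

The same full path is what you would open in a program such as DB
Browser for SQLite.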

-- 
Best regards,
Ivan

[1]
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Character-vectors

[2]
https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html



[R] Data Carpentry - Creating a New SQLite Database

2020-01-10 Thread Phillip Heinrich
I am working my way through a tutorial named Data Carpentry 
(https://datacarpentry.org/R-ecology-lesson/).  For the most part it is 
excellent but I’m stuck on the very last section 
(https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).

First, below are the packages I have loaded:
 [1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"
[12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"

Second, below is the text of the last section of the last chapter, titled 
“Creating a New SQLite Database”.  The black type is from the tutorial.  The 
green and blue is the suggested R code.  My comments are in red.

Creating a new SQLite database

So far, we have used a previously prepared SQLite database. But we can also use 
R to create a new database, e.g. from existing csv files. Let’s recreate the 
mammals database that we’ve been working with, in R. First let’s download and 
read in the csv files. We’ll import tidyverse to gain access to the read_csv() 
function.

download.file("https://ndownloader.figshare.com/files/3299483",
  "data_raw/species.csv")
download.file("https://ndownloader.figshare.com/files/10717177",
  "data_raw/surveys.csv")
download.file("https://ndownloader.figshare.com/files/3299474",
  "data_raw/plots.csv")
library(tidyverse)
species <- read_csv("data_raw/species.csv")

My comment: No problem here.  I’m pulling three databases from the Web and 
saving them to a folder on my hard drive (...data_raw/species.csv), etc.

surveys <- read_csv("data_raw/surveys.csv")
plots <- read_csv("data_raw/plots.csv")

My comment: Again no problem.  I’m just creating R data files.  But here is 
where I lose it.  I’m creating something named my_db_file from another file 
named portal-database-output with an sqlite extension and then creating my_db 
from the My_db_file.  Not sure where the sqlite extension file came from.

Creating a new SQLite database with dplyr is easy. You can re-use the same 
command we used above to open an existing .sqlite file. The create = TRUE 
argument instructs R to create a new, empty database instead.

Caution: When create = TRUE is added, any existing database at the same 
location is overwritten without warning.

my_db_file <- "data/portal-database-output.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)

Currently, our new database is empty, it doesn’t contain any tables:

my_db
#> src:  sqlite 3.29.0 [data/portal-database-output.sqlite]
#> tbls:

To add tables, we copy the existing data.frames into the database one by one:

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_db

My comment: I can follow the directions to fill in my_db but I have no idea 
how to access the tables.  The text from the tutorial below says to check the 
location of our database.  Huh!  Can someone give me some direction?  Thanks.





If you check the location of our database you’ll see that data is automatically 
being written to disk. R and dplyr not only provide easy ways to query existing 
databases, they also allow you to easily create your own databases from flat 
files!

My comment: Here is where I lose it.




Re: [R-es] Error reading giant HTML pages

2020-01-10 Thread Javier Marcuzzi
Dear Diego Maldonado,

I read somewhere that there is an open bug report for this, but it is dated
2019, which is a long time ago.

Could it be that an update has already fixed the problem?

Personally, I took almost the same route, except that I use neither R nor
containers. I find C# a better fit for extracting the data, and then I do
the analysis in R.
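
One thing that may be worth trying for the error quoted below:
doc_parse_raw() appears to belong to the xml2 package, so the read_html()
being called is probably xml2's (also re-exported by rvest) rather than
something from the XML package. xml2's read_html() accepts an options
argument, and "HUGE" is the option that relaxes the parser's hard-coded
limits. The page below is made up just to reproduce deep nesting. Treat
this as a sketch, not a confirmed fix for your pages.

```r
library(xml2)

# Build a made-up page nested deeper than the default depth limit of 256.
depth <- 300
page <- paste0("<html><body>",
               strrep("<div>", depth), "deep", strrep("</div>", depth),
               "</body></html>")

# Add "HUGE" to the default parser options to relax the built-in limits.
doc <- read_html(page, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
length(xml_find_all(doc, "//div"))   # number of <div> elements that were parsed
```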

Javier Rubén Marcuzzi

On Fri, Jan 10, 2020 at 1:01, Diego Maldonado via R-help-es (<
r-help-es@r-project.org>) wrote:

> Greetings, dear forum. I am running a web scraping process with RSelenium
> through Docker containers, and when I automate loading HTML pages with the
> XML package via the read_html function, I get the following error
> message:
>
>  Error in doc_parse_raw(x, encoding = encoding, base_url = base_url,
> as_html = as_html,  :
>   Excessive depth in document: 256 use XML_PARSE_HUGE option [1]
>
> If anyone can point me to a way around it I would be deeply grateful,
> since I have spent several days trying to solve it without success.
>
> Thank you in advance for your attention.
>
> Regards,
>
> Diego Maldonado
> Chief analytics officer
> Mentalytica

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] R-help Digest, Vol 203, Issue 8

2020-01-10 Thread Rui Barradas

Hello,

And there's also


#
# library(caTools)
# Author(s)
# Jarek Tuszynski 
#
# Original
trapz <- function(x, y){
  idx = 2:length(x)
  return(as.double( (x[idx] - x[idx-1]) %*% (y[idx] + y[idx-1]) ) / 2)
}

# Modified by me, input is x, f(x)
trapzf <- function(x, FUN) trapz(x, FUN(x))
# Call like 'integrate'
trapzf2 <- function(f, lower, upper, subdivisions = 100){
  trapzf(seq(lower, upper, length.out = subdivisions), match.fun(f))
}



So I guess it's not missing, just missing in base R, like the OP said.

Hope this helps,

Rui Barradas

At 11:20 on 09/01/20, Helmut Schütz wrote:

Dear Hans,

r-help-requ...@r-project.org wrote on 2020-01-09 12:00:

Date: Wed, 8 Jan 2020 12:09:55 +0100
From: Hans W Borchers 
To: R help project 
Subject: [R] Which external functions are called in a package?
[Solved]

NB: `trapz`, ie.
the trapezoidal integration formula, seems to be the numerical
function to be missed the most in R base.


In R base indeed. However, it is available in Frank Harrell's Hmisc as the 
function trap.rule(x, y) for sorted values.

In plain R: function(x, y) sum(diff(x) * (y[-1] + y[-length(y)]))/2

Helmut
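
A quick check that the two formulations in this thread agree, sketched
on sin(x) over [0, pi], whose exact integral is 2. The name trapz_diff
and the choice of test function are made up for the example.

```r
# Matrix-product form, as in the caTools-based version
trapz <- function(x, y) {
  idx <- 2:length(x)
  as.double((x[idx] - x[idx - 1]) %*% (y[idx] + y[idx - 1])) / 2
}

# Plain-R form using diff()
trapz_diff <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)])) / 2

x <- seq(0, pi, length.out = 100)   # 100 points, i.e. 99 intervals
y <- sin(x)

a <- trapz(x, y)
b <- trapz_diff(x, y)
all.equal(a, b)                 # the two forms agree up to rounding
abs(a - 2)                      # error against the exact value of 2
integrate(sin, 0, pi)$value     # base R's adaptive quadrature, for comparison
```

Both trapezoid versions assume x is sorted and that x and y have the same
length.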





Re: [R] editing plot

2020-01-10 Thread Rui Barradas

Hello,

There are ways of reducing the white space between the bars but they are 
not obvious. Here are the two ways that I know.


First a data example.

library(ggplot2)
library(gridExtra)

df1 <- data.frame(x = LETTERS[1:5],
                  y = c(40, 15, 30, 15, 20))


1. The examples that follow set argument width in two places. What is 
important is the relative magnitudes of width and position_dodge(width).


If width is bigger than position_dodge(width) then the bar sizes and the 
space between them do not change. They only change if the second width 
is bigger. In this case the first argument makes a difference. Run the 
examples to see what I mean.



g1 <- ggplot(df1, aes(x, y)) +
  geom_bar(stat = "identity",
           width = 0.8,
           position = position_dodge(width = 0.5))

g2 <- ggplot(df1, aes(x, y)) +
  geom_bar(stat = "identity",
           width = 0.8,
           position = position_dodge(width = 0.25))

g3 <- ggplot(df1, aes(x, y)) +
  geom_bar(stat = "identity",
           width = 0.5,
           position = position_dodge(width = 0.8))

g4 <- ggplot(df1, aes(x, y)) +
  geom_bar(stat = "identity",
           width = 0.25,
           position = position_dodge(width = 0.8))

grid.arrange(g1, g2, g3, g4)


2. The other way is to shrink the plot by changing its aspect ratio.

g3 + theme(aspect.ratio = 2/1)

grid.arrange(g3, g3 + theme(aspect.ratio = 2/1), nrow = 1)


Run the examples and try to do something out of this.


Hope this helps,

Rui Barradas


At 16:41 on 09/01/20, Ana Marija wrote:

Hi Rui,

Thank you so much for getting back to me!
I did implement your idea (see attached):

library(ggplot2)
library(ggsignif)  # geom_signif() comes from this package

ax.11.text <- element_text(size = 10)
ay.11.text <- element_text(size = 10)
p <- ggplot(data = toplot, aes(x = cat, y = props)) +
  geom_bar(stat = "identity", width = 0.5, fill = "steelblue") +
  geom_errorbar(aes(ymin = props - 1.96 * ses, ymax = props + 1.96 * ses),
                width = .1, position = position_dodge(.9)) +
  geom_signif(comparisons = list(c("All eQTL", "eQTL from 103 genes"),
                                 c("All SNPs", "eQTL from 103 genes")),
              y_position = c(0.065, 0.07), tip_length = 0,
              annotation = c("p = 0.0012", "p = 0.0023")) +
  scale_y_continuous(breaks = seq(0, .06, by = .01)) +
  xlab("") + ylab("Proportion p-values < 0.05") +
  theme_classic() +
  theme(panel.grid.major.x = element_line(size = 0.1, color = "grey"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = ax.11.text, axis.text.y = ay.11.text)
p

I was wondering if there is any way to decrease the amount of white
space around the bars?

Thanks
Ana

On Wed, Jan 8, 2020 at 2:58 PM Rui Barradas  wrote:


Hello,

Maybe


theme(panel.grid.major.x = element_line(size = 0.1, color = "grey"),
  panel.grid.major.y = element_blank(),
  panel.grid.minor = element_blank()
)



Note that if you remove the y axis grid you must set the x axis grid
explicitly.

Hope this helps,

Rui Barradas

At 18:52 on 08/01/20, Ana Marija wrote:

Hello,

I have attached a plot. I was wondering how I can change my plotting
code to remove the gray horizontal background lines but keep the two
vertical lines? The two vertical lines don't need to be gray, they can
be any other type of line, but they must stay in the same place. Also,
how can I make the two bars narrower?

library("ggplot2")
library("ggsignif")  # geom_signif() comes from this package
p <- ggplot(data = toplot, aes(x = cat, y = props)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_errorbar(aes(ymin = props - 1.96 * ses, ymax = props + 1.96 * ses),
                width = .2, position = position_dodge(.9)) +
  geom_signif(comparisons = list(c("All eQTL", "eQTL from 103 genes"),
                                 c("All SNPs", "eQTL from 103 genes")),
              y_position = c(0.065, 0.07), tip_length = 0,
              annotation = c("p = 0.0012", "p = 0.0023")) +
  scale_y_continuous(breaks = seq(0, .06, by = .01)) +
  xlab("") + ylab("Proportion p-values < 0.05") +
  theme_minimal()
p

