Re: [R] facet_wrap(nrow) ignored

2020-09-10 Thread Ulrik Stervbo via R-help

Dear Ivan,

I don't think it is possible to force a number of rows - but I'm 
honestly just guessing.


What you can do is add an empty plot. Here I use cowplot, but 
gridExtra should work just as well (see the short sketch after the code below).


I add an indication of the intended row number for each plot to the initial 
data.frame, and loop over these groups.


In the first variant, I add an unused factor level to grp, which creates an 
empty facet. I personally think this looks a little confusing, so in the 
second variant I add a number of empty plots instead.


HTH
Ulrik

```
library(ggplot2)

# row_num marks the grid row each group should end up in
mydf <- data.frame(
  grp = rep(letters[1:6], each = 15),
  cat = rep(1:3, 30),
  var = rnorm(90),
  row_num = rep(c(1, 1, 2, 3, 4, 5), each = 15)
)

s_mydf <- split(mydf, mydf$row_num)

plots_mydf <- lapply(s_mydf, function(x){
  # Ensure no unused factor levels
  x$grp <- droplevels(factor(x$grp))
  # A lone group gets an extra, unused level so an empty facet is drawn
  if(nlevels(x$grp) == 1){
    levels(x$grp) <- c(levels(x$grp), "")
  }
  ggplot(data = x, aes(x = cat, y = var)) + geom_point() +
    facet_wrap(~grp, drop = FALSE)
})

cowplot::plot_grid(plotlist = plots_mydf, nrow = 5)

# Maybe more elegant output
plots_mydf <- lapply(s_mydf, function(x, ncol = 2){
  # Ensure no unused factor levels
  x$grp <- droplevels(factor(x$grp))
  x <- split(x, x$grp)

  p <- lapply(x, function(x){
    ggplot(data = x, aes(x = cat, y = var)) + geom_point() +
      facet_wrap(~grp)
  })

  # Pad the row with empty plots so every row has the same number of columns
  if(length(p) < ncol){
    pe <- rep(list(ggplot() + theme_void()), ncol - length(p))
    p <- c(p, pe)
  }
  cowplot::plot_grid(plotlist = p, ncol = ncol)
})

cowplot::plot_grid(plotlist = plots_mydf, ncol = 1)

# Or if you prefer not to split the plots on the same row
plots_mydf <- lapply(s_mydf, function(x, ncol = 2){

  p <- list(ggplot(data = x, aes(x = cat, y = var)) + geom_point() +
              facet_wrap(~grp))

  if(length(unique(x$grp)) < ncol){
    pe <- rep(list(ggplot() + theme_void()), ncol - length(p))
    p <- c(p, pe)
  }else{
    ncol <- 1
  }
  cowplot::plot_grid(plotlist = p, ncol = ncol)
})

cowplot::plot_grid(plotlist = plots_mydf, ncol = 1)

```
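
For the gridExtra alternative mentioned above, a minimal sketch (assuming plots_mydf is the list of per-row plots from the first variant):

```
library(gridExtra)

# grid.arrange() accepts a list of ggplot objects through the grobs argument
grid.arrange(grobs = plots_mydf, nrow = 5)
```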

On 2020-09-09 17:30, Ivan Calandra wrote:

Dear useRs,

I have an issue with the argument nrow of ggplot2::facet_wrap().

Let's consider some sample data:
mydf <- data.frame(grp = rep(letters[1:6], each = 15),
                   cat = rep(1:3, 30), var = rnorm(90))

And let's try to plot with 5 rows:
library(ggplot2)
ggplot(data = mydf, aes(x = cat, y = var)) + geom_point() +
  facet_wrap(~grp, nrow = 5)

It plots 2 rows and 3 columns rather than 5 rows and 2 columns as wanted.


These plots are as expected:
ggplot(data = mydf, aes(x = cat, y = var)) + geom_point() +
  facet_wrap(~grp, nrow = 2)
ggplot(data = mydf, aes(x = cat, y = var)) + geom_point() +
  facet_wrap(~grp, nrow = 6)

My guess is that 5 rows is not ideal for 6 facets (5 facets in the 1st
column and only 1 facet in the 2nd column), so it overrides the value of
nrow. In the case of 2 or 6 rows, the facets are well distributed in the layout.

The reason why I need 5 rows with 6 facets is that this facet plot is
part of a patchwork and I would like to have the same number of rows for
all facet plots of the patchwork (so that they all align well).

Is there a way to force the number of rows in the facet_wrap()?

Thank you in advance.
Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readxl question

2020-08-27 Thread Ulrik Stervbo via R-help
I clearly didn't read well enough. As Petr pointed out, there is also 
the col_names argument.


```
# Solution 4a
# col_names = cur_range names the column directly, so the df helper
# from the earlier solutions is no longer needed

map_dfr(files, function(cur_file, ranges){
  map_dfc(ranges, function(cur_range){
    read_excel(cur_file, sheet = 1, col_names = cur_range, range = cur_range)
  })
}, ranges = ranges, .id = "filename")
```

On 2020-08-27 17:33, Ulrik Stervbo via R-help wrote:

Hi Thomas,

I am not familiar with the use of the range argument, but it seems to
me that the cell value becomes the column name. This might be fine,
but you might get into trouble if you have repeated cell values since
as.data.frame() will fix these.

I am also not sure about what you want, but this seems to capture your
example (reading the same cells in a number of files):

```
library(readxl)

# Create test set
path <- readxl_example("geometry.xls")

read_xls(path) # See the content

example_file1 <- tempfile(fileext = ".xls")
example_file2 <- tempfile(fileext = ".xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)

# Solve the problem using loops
files <- c(example_file1, example_file2)
ranges <- c("B4", "C5", "D6")

fr <- lapply(ranges, function(cur_range, files){
  x <- lapply(files, read_xls, sheet = 1, range = cur_range)
  t(as.data.frame(x))
}, files = files)

# Loop over fr and save content if needed
```

A couple of variations on the theme, where the cell content is
accessed after reading the file. This will not work well if the data
in the excel files does not start at A1, but if you can adjust for
this it should work just fine.

```
# Solution #2

# Read the whole excel file, and access just the column - row
# This will give really unexpected results if the data does not start in
# cell A1, as is the case for geometry.xls. Also, it does not work with
# ranges spanning more than a single cell
files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

# Loop over the files to avoid re-reading
fr <- lapply(files, function(cur_file, ranges){
  df <- read_excel(cur_file, sheet = 1)
  x <- lapply(ranges, function(cur_range, df){
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    df[cr$row, cr$col][[1]]
  }, df = df)
  as.data.frame(setNames(x, ranges))
}, ranges = ranges)

# Solution 3
# Like solution 2, but using purrr

library(purrr)

files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
  map_dfc(ranges, function(cur_range){
    df <- read_excel(cur_file, sheet = 1)
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    setNames(df[cr$row, cr$col], cur_range)
  })
}, ranges = ranges)

# Solution 4
# Like solution 3, but with the addition of the file name, producing a
# single data.frame at the end

library(purrr)

path <- readxl_example("datasets.xls")
example_file1 <- tempfile(fileext = "_1.xls")
example_file2 <- tempfile(fileext = "_2.xls")
example_file3 <- tempfile(fileext = "_3.xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)
file.copy(path, example_file3, overwrite = TRUE)

files <- c(example_file1, example_file2, example_file3)

# Name the file paths with the file names. We can then make use of the
# .id argument to map_dfr()
files <- setNames(files, basename(files))
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
  map_dfc(ranges, function(cur_range){
    df <- read_excel(cur_file, sheet = 1)
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    setNames(df[cr$row, cr$col], cur_range)
  })
}, ranges = ranges, .id = "filename")
```

HTH
Ulrik

On 2020-08-26 15:38, PIKAL Petr wrote:

Hi

As the OP has only about 250 files, and in read_excel you cannot specify
several ranges at once, reading those values separately and concatenating
them together in one step seems to be the most efficient way. One could
probably design such a function, but the time spent on a function that
performs the task only once is probably greater than that of performing
250*3 reads.

I see an inefficiency in writing each column into a separate text file and
copying it back to an Excel file.

Cheers
Petr


-Original Message-
From: Upton, Stephen (Steve) (CIV) 
Sent: Wednesday, August 26, 2020 2:44 PM
To: PIKAL Petr ; Thomas Subia 


Cc: r-help@r-project.org
Subject: RE: [R] readxl question

From your example, it appears you are reading in the same excel file for
each function to get a value. I would look at creating a function that
extracts what you need from each file all at once, rather than separate
reads.

Stephen C. Upton
SEED (Simula

Re: [R] readxl question

2020-08-27 Thread Ulrik Stervbo via R-help

Hi Thomas,

I am not familiar with the use of the range argument, but it seems to me 
that the cell value becomes the column name. This might be fine, but you 
might get into trouble if you have repeated cell values since 
as.data.frame() will fix these.
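
To illustrate (a minimal sketch using the geometry.xls example that ships 
with readxl): reading a single-cell range with the default col_names = TRUE 
turns the cell content into the column name of a zero-row tibble.

```
library(readxl)

path <- readxl_example("geometry.xls")
# The content of cell B4 ends up as the column name of a zero-row tibble
read_xls(path, sheet = 1, range = "B4")
```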


I am also not sure about what you want, but this seems to capture your 
example (reading the same cells in a number of files):


```
library(readxl)

# Create test set
path <- readxl_example("geometry.xls")

read_xls(path) # See the content

example_file1 <- tempfile(fileext = ".xls")
example_file2 <- tempfile(fileext = ".xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)

# Solve the problem using loops
files <- c(example_file1, example_file2)
ranges <- c("B4", "C5", "D6")

fr <- lapply(ranges, function(cur_range, files){
  x <- lapply(files, read_xls, sheet = 1, range = cur_range)
  t(as.data.frame(x))
}, files = files)

# Loop over fr and save content if needed
```

A couple of variations on the theme, where the cell content is accessed 
after reading the file. This will not work well if the data in the excel 
files does not start at A1, but if you can adjust for this it should work 
just fine.


```
# Solution #2

# Read the whole excel file, and access just the column - row
# This will give really unexpected results if the data does not start in
# cell A1, as is the case for geometry.xls. Also, it does not work with
# ranges spanning more than a single cell
files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

# Loop over the files to avoid re-reading
fr <- lapply(files, function(cur_file, ranges){
  df <- read_excel(cur_file, sheet = 1)
  x <- lapply(ranges, function(cur_range, df){
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    df[cr$row, cr$col][[1]]
  }, df = df)
  as.data.frame(setNames(x, ranges))
}, ranges = ranges)

# Solution 3
# Like solution 2, but using purrr

library(purrr)

files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
  map_dfc(ranges, function(cur_range){
    df <- read_excel(cur_file, sheet = 1)
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    setNames(df[cr$row, cr$col], cur_range)
  })
}, ranges = ranges)

# Solution 4
# Like solution 3, but with the addition of the file name, producing a
# single data.frame at the end

library(purrr)

path <- readxl_example("datasets.xls")
example_file1 <- tempfile(fileext = "_1.xls")
example_file2 <- tempfile(fileext = "_2.xls")
example_file3 <- tempfile(fileext = "_3.xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)
file.copy(path, example_file3, overwrite = TRUE)

files <- c(example_file1, example_file2, example_file3)

# Name the file paths with the file names. We can then make use of the
# .id argument to map_dfr()
files <- setNames(files, basename(files))
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
  map_dfc(ranges, function(cur_range){
    df <- read_excel(cur_file, sheet = 1)
    cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
    setNames(df[cr$row, cr$col], cur_range)
  })
}, ranges = ranges, .id = "filename")
```

HTH
Ulrik

On 2020-08-26 15:38, PIKAL Petr wrote:

Hi

As the OP has only about 250 files, and in read_excel you cannot specify
several ranges at once, reading those values separately and concatenating
them together in one step seems to be the most efficient way. One could
probably design such a function, but the time spent on a function that
performs the task only once is probably greater than that of performing
250*3 reads.

I see an inefficiency in writing each column into a separate text file and
copying it back to an Excel file.

Cheers
Petr


-Original Message-
From: Upton, Stephen (Steve) (CIV) 
Sent: Wednesday, August 26, 2020 2:44 PM
To: PIKAL Petr ; Thomas Subia 


Cc: r-help@r-project.org
Subject: RE: [R] readxl question

From your example, it appears you are reading in the same excel file for
each function to get a value. I would look at creating a function that
extracts what you need from each file all at once, rather than separate
reads.

Stephen C. Upton
SEED (Simulation Experiments & Efficient Designs) Center for Data 
Farming

SEED Center website: https://harvest.nps.edu

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL 
Petr

Sent: Wednesday, August 26, 2020 3:50 AM
To: Thomas Subia 
Cc: r-help@r-project.org
Subject: Re: [R] readxl question


Hi


Are you sure that your command read values from respective cells?

I tried it and got empty data frame with names
> WO <- lapply(files, read_excel, sheet=1, range=("B3"))
> as.data.frame(WO)
[1] ano TP303   X96
[4] X0  X3.7518 X26.7
<0 rows> (or 0-l

Re: [R] Binomial PCA Using pcr()

2020-08-19 Thread Ulrik Stervbo via R-help

Hi Prasad,

I think this might be a problem with the package, and you can try to 
contact the package author.


The error seems to arise because pcr() cannot find the 
'negative-binomial' distribution:


```
library(qualityTools)
x <- rnbinom(500, mu = 4, size = 100)
pcr(x, distribution = "negative-binomial")
```

When I look at the code of pcr(), it tests against the string 
'negative binomial' (note the missing hyphen), although the 
documentation clearly lists 'negative-binomial' as a possible 
distribution.
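
For reference, one way to see which strings the function actually tests 
against (a sketch, assuming pcr() is an ordinary exported function):

```
library(qualityTools)

# Print the lines of the deparsed function body that mention "binomial"
grep("binomial", deparse(body(pcr)), value = TRUE)
```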


Unfortunately changing 'negative-binomial' to 'negative binomial' does 
not help, as


```
pcr(x, distribution = "negative binomial")
```

throws the error "object '.confintnbinom' not found" and a lot of 
warnings.


Best,
Ulrik


On 2020-08-12 12:50, Prasad DN wrote:

Hi All,

I am very new to R and need guidance.

Need help in doing process capability analysis for my data set (6 months 
of data) given in the below format:

Date   |   Opportunities  |  Defectives | DefectivesInPercent

I searched and found that pcr() from the QualityTools package can be used 
for this purpose. The USL is 2% defectives.

MyData = read.csv(file.choose())   # select CSV file that has data in the 
above-mentioned format.
x <- MyData$DefectivesInPercent

pcr(x, distribution = "negative-binomial", usl=0.02)

I get error message as:
Error in pcr(x, distribution = "negative-binomial", usl = 0.02) :
  y distribution could not be found!

Please advise, how to proceed?

Regards,
Prasad DN

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with read.csv.sql()

2020-07-29 Thread Ulrik Stervbo via R-help
True, but the question was also how to control for formats and naming columns 
while loading the file.

The only way I know how to do this (sans work on my part) is through the 
functions in readr. So, 50% on topic :-)

Best,
Ulrik


On 29 Jul 2020, 17:59, at 17:59, Rasmus Liland  wrote:
>Dear Ulrik,
>
>On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help wrote:
>> library(readr)
>> read_csv(
>
>This thread was about
>sqldf::read.csv.sql ...
>
>What is the purpose of bringing up
>readr::read_csv?  I am unfamiliar with
>it, so it might be a good one.
>
>Best,
>Rasmus
>
>
>
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with read.csv.sql()

2020-07-29 Thread Ulrik Stervbo via R-help

You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
  lines,
  skip = 1, # Ignore the header row
  col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"),
  col_types = cols(
    myDate = col_date(format = ""),
    myTime = col_time(format = ""),
    myLat = col_number(),
    myLong = col_number(),
    .default = col_character()
  )
)

read_csv(
  lines,
  col_types = cols_only(
    Id = col_character(),
    Date = col_date(format = ""),
    Time = col_time(format = "")
  )
)

read_csv(
  lines,
  skip = 1, # Ignore the header row
  col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"),
  col_types = cols_only(
    myId = col_character(),
    myDate = col_date(format = ""),
    myTime = col_time(format = "")
  )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:

On 07/18/2020 01:38 PM, William Michels wrote:

Do either of the postings/threads below help?

https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html

Otherwise you can try reading through the FAQ on Github:

https://github.com/ggrothendieck/sqldf

HTH, Bill.

W. Michels, Ph.D.



On Sat, Jul 18, 2020 at 9:59 AM H  wrote:

On 07/18/2020 11:54 AM, Rui Barradas wrote:

Hello,

I don't believe that what you are asking for is possible but like 
Bert suggested, you can do it after reading in the data.
You could write a convenience function to read the data, then change 
what you need to change.

Then the function would return this final object.

Rui Barradas

Às 16:43 de 18/07/2020, H escreveu:


On 07/17/2020 09:49 PM, Bert Gunter wrote:
Is there some reason that you can't make the changes to the data 
frame (column names, as.date(), ...) *after* you have read all 
your data in?


Do all your csv files use the same names and date formats?


Bert Gunter

"The trouble with having an open mind is that people keep coming 
along and sticking things into it."

-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 17, 2020 at 6:28 PM H > wrote:


 I have created a dataframe with columns that are characters, 
integers and numeric and with column names assigned by me. I am 
using read.csv.sql() to read portions of a number of large csv 
files into this dataframe, each csv file having a header row with 
column names.


 The problem I am having is that the csv files have header 
rows with column names that are slightly different from the column 
names I have assigned in the dataframe and it seems that when I 
read the csv data into the dataframe, the column names from the 
csv file replace the column names I chose when creating the 
dataframe.


 I have been unable to figure out if it is possible to assign 
column names of my choosing in the read.csv.sql() function? I have 
tried various variations but none seem to work. I tried colClasses 
= c() but that did not work, I tried field.types = c(...) but 
could not get that to work either.


 It seems that the above should be feasible but I am missing 
something? Does anyone know?


 A secondary issue is that the csv files have a column with a 
date in mm/dd/ format that I would like to make into a Date 
type column in my dataframe. Again, I have been unable to find a 
way - if at all possible - to force a conversion into a Date 
format when importing into the dataframe. The best I have so far 
is to import is a character column and then use as.Date() to later 
force the conversion of the dataframe column.


 Is it possible to do this when importing using 
read.csv.sql()?


 __
 R-help@r-project.org  mailing 
list -- To UNSUBSCRIBE and more, see

 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible 
code.


Yes, the files use the same column names and date format (at least 
as far as I know now.) I agree I could do it as you suggest above 
but from a purist perspective I would rather do it when importing 
the data using read.csv.sql(), particularly if column names and/or 
date format might change, or be different between different files. 
I am indeed selecting rows from a large number of c


Re: [R] Dataframe with different lengths

2020-07-29 Thread Ulrik Stervbo via R-help

Hi Pedro,

I see you use dplyr and ggplot2. Are you looking for something like 
this:


```
library(ggplot2)
library(dplyr)

test_data <- data.frame(
  year = c(rep("2018", 10), rep("2019", 8), rep("2020", 6)),
  value = sample(c(1:100), 24)
)

test_data <- test_data %>%
  group_by(year) %>%
  mutate(cumsum_value = cumsum(value),
         x_pos = 1:n())

ggplot(test_data) +
  aes(x = x_pos, y = cumsum_value, colour = year) +
  geom_point()
```
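
Applied to the TSLA example from your question, a sketch along the same 
lines could be (assuming the idea is cumulative daily returns per year, 
indexed by trading day within each year):

```
library(quantmod)
library(dplyr)
library(ggplot2)

getSymbols("TSLA")

# One data.frame per year: trading-day index and cumulative daily return
tsla_returns <- bind_rows(lapply(c("2018", "2019", "2020"), function(y){
  r <- dailyReturn(TSLA, subset = y)
  data.frame(year = y,
             x_pos = seq_along(r),
             cumsum_value = cumsum(as.numeric(r)))
}))

ggplot(tsla_returns) +
  aes(x = x_pos, y = cumsum_value, colour = year) +
  geom_point()
```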

Best,
Ulrik

On 2020-07-22 13:16, Pedro páramo wrote:

Hi all,

I am trying to draw a plot with cumsum values, but each "line" has a 
different length.

library(dplyr)
library(tibble)
library(lubridate)
library(PerformanceAnalytics)
library(quantmod)
library(ggplot2)

getSymbols('TSLA')

I want to create the variables:

a<-cumsum(dailyReturn(TSLA, subset = c('2019')) )
b<-cumsum(dailyReturn(TSLA, subset = c('2020')) )
c<-cumsum(dailyReturn(TSLA, subset = c('2018')) )

Each value, on a,b,c has two columns date, and values.

The thing is I want to plot the three lines in one plot, using the maximum
length of a, b, c (in this case a has 252 values). To plot the other two
lines I could put (x <- 1:252) on the axis, but I have not been able to do
that for the moment.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Arranging ggplot2 objects with ggplotGrob()

2020-07-29 Thread Ulrik Stervbo via R-help

Then this should work:

```
library(ggplot2)
library(cowplot)

p1 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
p2 <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width * 1000)) + geom_point()

plot_grid(p1, p2, ncol = 1, align = "hv", rel_heights = c(2, 1), axis = "t")

p1 <- p1 + theme(
  axis.text.x = element_blank(),
  axis.title.x = element_blank(),
  axis.ticks.x = element_blank()
)

plot_grid(p1, p2, ncol = 1, align = "hv", rel_heights = c(2, 1), axis = "t")

# You can play around with ggplot2 plot.margin to further reduce the space
p1 <- p1 + theme(
  plot.margin = margin(b = -6)
)

p2 <- p2 + theme(
  plot.margin = margin(t = -6)
)

plot_grid(p1, p2, ncol = 1, align = "hv", rel_heights = c(2, 1), axis = "t")

```

Best,
Ulrik

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Arranging ggplot2 objects with ggplotGrob()

2020-07-28 Thread Ulrik Stervbo via R-help

Would this work:

```
library(ggplot2)
library(cowplot)

p1 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
p2 <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width * 1000)) + geom_point()

plot_grid(p1, p2, ncol = 1, align = "hv", rel_heights = c(2, 1))
```

Best,
Ulrik

On 2020-07-24 21:58, Bert Gunter wrote:

?grid.frame, etc. should be straightforward for this I would think.
But of course you have to resort to the underlying grid framework 
rather

than the ggplot2 interface.

Bert Gunter

"The trouble with having an open mind is that people keep coming along 
and

sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 24, 2020 at 12:11 PM H  wrote:


On 07/24/2020 02:50 PM, H wrote:
> On 07/24/2020 02:03 PM, Jeff Newmiller wrote:
>> The set of people interested in helping when you supply a minimal
reproducible example is rather larger than the set of people willing 
to
read the documentation for you (hint) and guess what aspect of 
alignment

you are having trouble with.
>>
>> On July 24, 2020 10:46:57 AM PDT, H  wrote:
>>> On 07/24/2020 01:14 PM, John Kane wrote:
 Well, I am not looking for help debugging my code but for
>>> information to better understand arranging plots vertically. The code
>>> above aligns them horizontally as expected.
 Sigh, we know the code works but we do not know what the plots are
>>> and we cannot play around with them to see if we can help you if we
>>> have nothing to work with.
 On Fri, 24 Jul 2020 at 12:12, H >> > wrote:
 On 07/24/2020 05:29 AM, Erich Subscriptions wrote:
 > Hav a look at the packages cowplot and patchwork
 >
 >> On 24.07.2020, at 02:36, H >> > wrote:
 >>
 >> I am trying to arrange two plots vertically, ie plot 2 below
>>> plot 1, where I want the plots to align columnwise but have a height
>>> ratio of eg 3:1.
 >>
 >> My attempts so far after consulting various webpages is that
>>> the following code aligns them columnwise correctly but I have, so far,
>>> failed in setting the relative heights...
 >>
 >> g2<-ggplotGrob(s)
 >> g3<-ggplotGrob(v)
 >> g<-rbind(g2, g3, size = "first")
 >> g$widths<-unit.pmax(g2$widths, g3$widths)
 >>
 >> what would the appropriate statement for the relative heights
>>> to add here be?
 >>
 >> grid.newpage()
 >> grid.draw(g)
 >>
 >> Thank you!
 >>
 >> __
 >> R-help@r-project.org  mailing
>>> list -- To UNSUBSCRIBE and more, see
 >> https://stat.ethz.ch/mailman/listinfo/r-help
 >> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
 >> and provide commented, minimal, self-contained, reproducible
>>> code.
 So this is not possible without using one of those two packages?
>>> I got the impression I should be able to use grid.arrange to do so but
>>> was not able to get it to work without disturbing the width alignment
>>> above...
 __
 R-help@r-project.org  mailing list
>>> -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible
>>> code.
 --
 John Kane
 Kingston ON Canada
>>> No need to play around with anything. I am simply looking for
>>> assistance on how to use eg arrangeGrob to not only align two plots
>>> columnwise but also adjust their heights relative to each other rather
>>> than 1:1.
>>>
>>> Can arrangeGrob() be used for that?
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> Look at
> https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html
> where there are two mpg charts, one above the other. What would I need
> to add to:
>
> library(gtable)
> g2 <- ggplotGrob(p2)
> g3 <- ggplotGrob(p3)
> g <- rbind(g2, g3, size = "first")
> g$widths <- unit.pmax(g2$widths, g3$widths)
> grid.newpage()
> grid.draw(g)
>
> to make the second chart 1/2 the size of the top one?
>
The following code aligns the two plot areas of the two charts perfectly,
but they are the same height, whereas I want to make the bottom one 1/2 as
tall as the top one:

g2<-ggplotGrob(s)
g3<-ggplotGrob(v)
g<-rbind(g2, g3, size = "first")
g$widths<-unit.pmax(g2$widths, g3$wi

Re: [R] Filtering using multiple rows in dplyr

2018-05-31 Thread Ulrik Stervbo via R-help

Hi Sumitrajit,

dplyr has a function for this - it's called filter.

For each group you can count the number of rows with SNR > 3 (you can use 
sum() on a TRUE/FALSE vector). You can filter on the result directly or add 
a column as you plan; the latter might make your intention clearer.
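
A minimal sketch of that idea, using the column names from your example 
(checking for three consecutive qualifying L2 values would need an extra 
step, e.g. via rle()):

```
library(dplyr)

# Flag groups with at least three rows where SNR > 3
h_flagged <- h %>%
  group_by(subject, freq) %>%
  mutate(clean = ifelse(sum(SNR > 3) >= 3, "Y", "N")) %>%
  ungroup()

# Or keep only the qualifying groups directly
h_clean <- h %>%
  group_by(subject, freq) %>%
  filter(sum(SNR > 3) >= 3) %>%
  ungroup()
```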


HTH
Ulrik

On 2018-05-30 18:18, Sumitrajit Dhar wrote:


Hi Folks,

I have just started using dplyr and could use some help getting
unstuck. It could well be that dplyr is not the package to be using,
but let me just pose the question and seek your advice.

Here is my basic data frame.

head(h)
   subject ageGrp ear hearingGrp sex freq L2       Ldp     Phidp        NF       SNR
1 HALAF032      A   L          A   F    2  0 -23.54459  55.56005 -43.08282 19.538232
2 HALAF032      A   L          A   F    2  2 -32.64881  86.22040 -23.31558 -9.333224
3 HALAF032      A   L          A   F    2  4 -18.91058  42.12168 -35.60250 16.691919
4 HALAF032      A   L          A   F    2  6 -23.85937 297.94499 -20.70452 -3.154846
5 HALAF032      A   L          A   F    2  8 -14.45381 181.75329 -24.17094  9.717128
6 HALAF032      A   L          A   F    2 10 -20.42384  67.12998 -35.77357 15.349728

'subject' and 'freq' together make a set of data and I am interested
in how the last four columns vary as a function of L2. So I grouped by
'subject' and 'freq' and can look at basic summaries.

h_byFunc <- h %>% group_by(subject, freq)


h_byFunc %>% summarize(l = mean(Ldp), s = sd(Ldp) )


# A tibble: 1,175 x 4
# Groups:   subject [?]
   subject  freq      l     s
 1 HALAF032    2 -13.8   8.39
 2 HALAF032    4 -15.8  11.0
 3 HALAF032    8 -23.4   6.51
 4 HALAF033    2 -14.2   9.64
 5 HALAF033    4 -12.3   8.92
 6 HALAF033    8  -6.55 12.3
 7 HALAF036    2 -14.9  12.6
 8 HALAF036    4 -16.7  11.2
 9 HALAF036    8 -21.7   6.56
10 HALAF039    2   0.242 12.4
# ... with 1,165 more rows

What  I would like to do is filter some groups out based on various
criteria. For example, if SNR > 3 in three consecutive L2 within a
group, that group qualifies and I would add a column, say "clean" and
assign it a value "Y." Is there a way to do this in dplyr or should I
be looking at a different way.

Thanks in advance for your help.

Regards,
Sumit

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.