Re: [Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Vincent Carey
Thanks for checking this out. I am leaning towards readr::read_tsv, which is very explicit about untoward content:

Browse[2]> debug: tab = readr::read_tsv(tf)
Browse[2]> Parsed with column specification:
cols(
  .default = col_character(),
  `DATE ADDED TO CATALOG` = col_date(format

Re: [Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Hervé Pagès
Everything works fine for me with quote="":

> system.time(gwas <- read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv", quote=""))
   user  system elapsed
  4.435   0.052   4.487
> dim(gwas)
[1] 179364     38
> sessionInfo()
R version 4.0.0 Patched (2020-04-27 r78316)
Platform:
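For anyone following along without the 100MB download, here is a minimal sketch of the effect of quote="" on a made-up 14-row file. The file contents and the names tf, rows, with_quotes and no_quotes are assumptions for illustration only, not anything from the catalog or the package.

```r
# Toy stand-in for the catalog file: 14 records, one with an unclosed quote.
rows <- sprintf("%d\ttitle %d", 1:14, 1:14)
rows[11] <- "11\t\"title with an unclosed quote"

tf <- tempfile(fileext = ".tsv")
writeLines(c("id\ttitle", rows), tf)

# Default quote = "\"": the stray quote opens a quoted field that runs on
# past the end of its line, so later records are absorbed into it.
# (suppressWarnings hides the warning scan() emits in this situation.)
with_quotes <- suppressWarnings(read.delim(tf))

# quote = "": quote characters are treated as ordinary text.
no_quotes <- read.delim(tf, quote = "")

nrow(with_quotes)
nrow(no_quotes)
```

Comparing the two row counts against the number of lines in the file is a quick way to see whether records were lost.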

Re: [Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Vincent Carey
This file trips up fread around record 170349, inconsistently ... I haven't figured that out yet. readLines plus strsplit may be the ultimate solution.

On Thu, Apr 30, 2020 at 7:15 AM Vincent Carey wrote:
> right, line 35265 of
> http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an
>
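The readLines/strsplit fallback could look roughly like the sketch below. It is only an illustration on a made-up three-column file: read_tsv_by_hand is a hypothetical helper name, every column comes back as character, and records are padded (or truncated) to the header's field count.

```r
# Hypothetical helper: split each line on tabs with no quote handling at all.
read_tsv_by_hand <- function(path) {
  lines <- readLines(path, warn = FALSE)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  n <- length(fields[[1]])  # field count taken from the header line
  # strsplit drops a trailing empty field, so pad short records with NA
  # (assigning a length also truncates any over-long record to n fields).
  fields <- lapply(fields, function(x) { length(x) <- n; x })
  tab <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
  names(tab) <- fields[[1]]
  tab
}

# Toy input; the contents are invented for this example.
tf2 <- tempfile(fileext = ".tsv")
writeLines(c("id\ttitle\tnote",
             "1\t\"unclosed quote\tok",
             "2\tplain\t"), tf2)

tab <- read_tsv_by_hand(tf2)
dim(tab)
```

The price of this approach is that type conversion (dates, numerics) has to be done afterwards, column by column.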

Re: [Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Vincent Carey
right, line 35265 of http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an unclosed quote in a field.

35265 2019-04-10 30804558 Grove J 2019-02-25 Nat Genet www.ncbi.nlm.nih.gov/pubmed/30804558 Identification of common genetic risk variants for autism
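One way to hunt for such records is to count double quotes per line and flag the lines with an odd count. This is a minimal sketch on invented lines; odd_quote_lines is a hypothetical helper, and for a real file the input would come from readLines().

```r
# Hypothetical helper: indices of lines holding an odd number of double quotes.
odd_quote_lines <- function(lines) {
  n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
  which(n_quotes %% 2 == 1)
}

# Toy input; the contents are invented for this example.
toy_lines <- c(
  "id\ttitle",
  "1\tplain title",
  "2\tan \"unclosed quote in a field",
  "3\ta \"balanced\" pair of quotes"
)

bad <- odd_quote_lines(toy_lines)
bad
```

An odd count is only a heuristic: it would miss a record with two separate stray quotes, for example.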

Re: [Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Martin Morgan
I'd look instead at or around line 35264 for use of quotes, e.g., "3' DNA", and change the argument read.delim(quote = "") (though I never get that right so probably wrong again...). A comment character might also be a problem. If you point to the location of the file I could investigate

[Bioc-devel] a day in the life of gwascat

2020-04-30 Thread Vincent Carey
The EBI GWAS catalog is large -- now the download is over 100MB for 179K associations. A "bug" in the package was reported, so I acquired the file by hand.

> nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv", sep="\t")
Warning message:
In scan(file = file, what = what,