[R] Tab Separated File Reading Error
Hello,

I have a seemingly simple problem that a tab-delimited file can't be read in.

annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 5933 did not have 12 elements

However, all lines do have 12 columns.

lines <- readLines("matched.txt")
tabsPosns <- gregexpr("\t", lines)
table(sapply(tabsPosns, length))

    11
367274

system("wc -l matched.txt")
367274 matched.txt

You can obtain the file from https://dl.dropboxusercontent.com/u/37992150/matched.txt

The line does not contain comment or quote characters. What can you suggest?

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_3.0.1

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tab Separated File Reading Error
> annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 5933 did not have 12 elements
>
> However, all lines do have 12 columns.
>
> lines <- readLines("matched.txt")
> ...[many omitted lines]...
> The line does not contain comment or quote characters. What can you suggest?

I suggest looking at the lines preceding the one where the error was found, with both print and cat:

print(lines[5933 - (10:0)])
cat(lines[5933 - (10:0)], sep="\n")

If things are not obvious after looking at them, see if read.table can read just those lines:

read.table(text=lines[5933 - (10:0)], sep="\t", stringsAsFactors=FALSE)

If it can, try backing up more than 10 lines.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dario Strbenac
Sent: Friday, October 04, 2013 5:01 AM
To: r-help@r-project.org
Subject: [R] Tab Separated File Reading Error
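As a complement to inspecting the lines by eye, one way to narrow the search is to compare how many fields R sees on each line with and without quote processing. What follows is only a minimal sketch: it assumes the cause is a quote character inside a field, which nothing in the thread has confirmed at this point. The data is made up rather than taken from matched.txt, and `sample_lines` and `count_with_quote` are hypothetical names. `count.fields` is the function from the utils package.

```r
# Made-up stand-in for a few lines of a tab-separated file (not matched.txt).
# Line 2 has an apostrophe in two different fields.
sample_lines <- c("id1\tplain text\tnote\t10",
                  "id2\t5' end\t3' end\t20",
                  "id3\tmore text\tnote\t30")

# Hypothetical helper: number of fields per line for a given quote set.
count_with_quote <- function(x, quote) {
  con <- textConnection(x)
  on.exit(close(con))
  count.fields(con, sep = "\t", quote = quote, comment.char = "")
}

default_counts <- count_with_quote(sample_lines, "\"'")  # read.table's default quote set
noquote_counts <- count_with_quote(sample_lines, "")     # quoting disabled

# Lines where the two counts disagree are the ones affected by quoting.
suspect <- which(is.na(default_counts) | default_counts != noquote_counts)
print(suspect)
```

On a real file the same comparison could be run on `lines` from `readLines`, and the indices in `suspect` would then be the lines worth printing with `print` and `cat`.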
Re: [R] Tab Separated File Reading Error
Hi,

Try:

annoTranscripts <- read.csv("matched.txt", sep = '\t', stringsAsFactors = FALSE, quote = "", header = FALSE)
str(annoTranscripts)
'data.frame': 367274 obs. of 12 variables:
 $ V1 : chr "comp103529_c0_seq1" "comp129123_c0_seq1" "comp129123_c0_seq1" "comp129124_c0_seq1" ...
 $ V2 : chr "XM_003723822" "XM_778057" "EU116908" "XM_786928" ...
 $ V3 : chr "PREDICTED: Strongylocentrotus purpuratus neuromedin-U receptor 2-like (LOC100888633), mRNA" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L30-like (LOC577852), mRNA" "Barentsia elongata putative ribosomal protein L30 mRNA, complete cds" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L29-1-like (LOC587182), mRNA" ...
 $ V4 : int 91 392 69 149 149 451 399 203 193 185 ...
 $ V5 : int 136 479 203 209 209 541 463 451 456 472 ...
 $ V6 : int 15 16 40 20 20 24 20 71 83 85 ...
 $ V7 : int 0 11 4 0 0 5 1 10 4 9 ...
 $ V8 : num 2e-38 0e+00 6e-26 2e-70 2e-70 ...
 $ V9 : int 1 22 210 135 135 131 189 205 196 185 ...
 $ V10: int 136 499 410 343 343 669 650 650 649 653 ...
 $ V11: int 576 159 27 1 1 1 21 23 140 22 ...
 $ V12: int 441 627 227 209 209 538 483 468 593 487 ...
dim(annoTranscripts)
[1] 367274 12

A.K.

----- Original Message -----
From: Dario Strbenac dstr7...@uni.sydney.edu.au
To: r-help@r-project.org
Cc:
Sent: Friday, October 4, 2013 8:00 AM
Subject: [R] Tab Separated File Reading Error
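Why `quote = ""` helps is not spelled out in the thread. A likely explanation, offered as an assumption rather than something checked against matched.txt: `read.table` treats both `"` and `'` as quote characters by default (and `#` as a comment character), so an apostrophe inside a description field can swallow the tabs that follow it. Counting raw tab characters with `gregexpr` would not reveal this, because the tabs are all present; they are just being read as part of a quoted string. The sketch below uses made-up data, and `toy`, `default_result` and `fixed` are hypothetical names.

```r
# Made-up tab-separated lines (not from matched.txt); line 2 has an
# apostrophe in two different fields.
toy <- c("id1\tplain text\tnote\t10",
         "id2\t5' end\t3' end\t20",
         "id3\tmore text\tnote\t30")

# Default settings: quote = "\"'" and comment.char = "#".
# Any error is captured as a message instead of stopping the script.
default_result <- tryCatch(
  read.table(text = toy, sep = "\t", stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(default_result)

# Quote and comment processing switched off: every tab is a separator.
fixed <- read.table(text = toy, sep = "\t", stringsAsFactors = FALSE,
                    quote = "", comment.char = "")
str(fixed)
```

`read.csv` already defaults to `quote = "\""` and `comment.char = ""`, so the call suggested above only needed to clear the remaining double-quote character. `read.delim("matched.txt", header = FALSE, quote = "")` should be an equivalent spelling, since `read.delim` defaults to `sep = "\t"`.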