Re: [R] "Missing value representation in Excel before extraction to R with RODBC"

Leif Kirschenbaum Tue, 10 Jan 2006 12:51:12 -0800

I reproduce from memory my exhaustive look into this issue.
RODBC uses the ODBC DLLs developed by Microsoft.
  These DLLs automatically determine each column's type from the contents of 
the first N rows of cells in that column, where N is in [0,16]. N may be set in 
the Windows system registry, along with a few other settings that control how 
the DLL parses an Excel spreadsheet. Unfortunately, the Microsoft DLLs do not 
always pay attention to the registry settings and do not always interpret them 
in the same manner.
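
To illustrate the kind of first-N-rows type guessing described above, here is a minimal toy model in Python. It is not the driver's actual logic: the function names, the majority-vote rule, the fallback to text when every sampled cell is empty, and the choice of n_rows=8 (picked only to mirror the 8 leading NA's in the quoted question) are all assumptions made for illustration.

```python
from typing import List, Union

Cell = Union[float, str, None]  # None stands for an empty Excel cell


def guess_column_type(cells: List[Cell], n_rows: int = 8) -> str:
    """Guess a column type from the first n_rows cells only (toy model)."""
    sample = [c for c in cells[:n_rows] if c is not None]
    if not sample:
        # Assumption: with nothing to inspect, fall back to text.
        return "text"
    numeric = sum(isinstance(c, (int, float)) for c in sample)
    # Assumption: majority vote among the sampled, non-empty cells.
    return "numeric" if numeric * 2 >= len(sample) else "text"


def read_column(cells: List[Cell], n_rows: int = 8) -> List[Cell]:
    """Return the column as the toy model would deliver it."""
    col_type = guess_column_type(cells, n_rows)
    wanted = (int, float) if col_type == "numeric" else str
    # Assumption: cells that do not match the guessed type come back missing.
    return [c if isinstance(c, wanted) else None for c in cells]


leading_gaps = [None] * 8 + [1.5, 2.0, 3.25]
leading_values = [1.5, 2.0, 3.25] + [None] * 8

print(read_column(leading_gaps))    # every cell comes back as None
print(read_column(leading_values))  # the three numbers survive
```

Under these assumptions, a column whose sampled rows are all empty loses its later numeric values, while the same data with values at the top is read intact. Sampling more rows (for example n_rows=16) would rescue the first column in this toy model, which is the effect the registry setting is meant to have.
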
  The end result is that no matter what you do with RODBC, and no matter how 
the authors of RODBC rewrite it, some Excel spreadsheets will always be 
unreadable via RODBC, given particular insidious combinations of data in some 
columns of your spreadsheet (until such time as Microsoft fixes their DLL 
bugs, I mean features). I have some faint recollection that the Microsoft DLL 
incorrectly parses a column with non-empty rows because of some formatting 
issue in that particular column, which I was unable to cure by re-formatting 
the source worksheet.
  For the few Excel spreadsheets that exhibit the particular anomalies 
preventing use of RODBC, I have had to resort to the gdata package, which runs 
a Perl script, "xls2csv.pl", that converts an Excel spreadsheet to CSV.


Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED] 

> Message: 21
> Date: Mon, 9 Jan 2006 18:06:49 +0100
> From: "Fredrik Lundgren" <[EMAIL PROTECTED]>
> Subject: Re: [R] "Missing value representation in Excel before
>       extraction to   R with RODBC"
> To: "Prof Brian Ripley" <[EMAIL PROTECTED]>,  "Petr Pikal"
>       <[EMAIL PROTECTED]>
> Cc: R-help <[email protected]>
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>       reply-type=response
> 
> Dear list,
> 
> Well, those columns in Excel that start with NA (actually 8 NA's in my
> case) are imported as all NA in R, but if the columns start with at least
> 3 cells with values (i.e. not NA) they are imported correctly to R. When
> as.is=TRUE is used a similar conversion takes place, but now as all <NA>,
> and dates are represented as date-and-time.
> Is there any way to get this correct even when the Excel columns start
> with several NA's?
> 
> Sincerely
> Fredrik
> 
> 
> ----- Original Message ----- 
> From: "Prof Brian Ripley" <[EMAIL PROTECTED]>
> To: "Petr Pikal" <[EMAIL PROTECTED]>
> Cc: "Fredrik Lundgren" <[EMAIL PROTECTED]>; "R-help" 
> <[email protected]>
> Sent: Monday, January 09, 2006 9:36 AM
> Subject: Re: [R] "Missing value representation in Excel before 
> extraction to R with RODBC"
> 
> 
> > On Mon, 9 Jan 2006, Petr Pikal wrote:
> >
> >> Hi
> >>
> >> I believe it has something to do with the column identification
> >> decision. When R decides what is in a column it uses only some values
> >> from the beginning of a file.
> >
> > Not R, Excel.  Excel tells ODBC what the column types are.
> >

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
