Re: [R] R interpreting numeric field as a boolean field

Duncan Murdoch Tue, 30 Jan 2024 11:41:42 -0800

If you are using the read_excel() function from the readxl package, thenthere's an argument named col_types that lets you specify the types to use.

You could specify col_types = "numeric" to read all columns as numericcolumns. If some columns are different types, you should specify avector of type names, with one entry per column. Allowable names are"skip", "guess", "logical", "numeric", "date", "text" or "list". You'llhave to read the docs to find out what some of those do.


Duncan Murdoch

On 30/01/2024 1:40 p.m., Paul Bernal wrote:

Dear friend Duncan,
Thank you so much for your kind reply. Yes, that is exactly what ishappening, there are a lot of NA values at the start, so R assumes thatthe field is of type boolean. The challenge that I am facing is that Iwant to read into R an Excel file that has many sheets (46 in this case)but I wanted to combine all 46 sheets into a single dataframe (since thecolumns are exactly the same for all 46 sheets). The rio package doesthis nicely, the problem is that, once I have the full dataframe (whichamounts to roughly 2.98 million rows total), I cannot change the datatype from boolean to numeric. I tried doing dataset$my_field =as.numeric(dataset$my_field), I also tried to do dataset <-dataset[complete.cases(dataset), ], didn't work either.
The only thing that worked for me was to take a single sheed and throughthe read_excel function use the guess_max parameter and set it to asufficiently large number (a number >= to the total amount of the fullmerged dataset). I want to automate the merging of the N number of Excelsheets so that I don't have to be manually doing it. Unless there is away to accomplish something similar to what rio's package functionimport_list does, that is able to keep the field's numeric data type nature.
Cheers,
Paul
El mar, 30 ene 2024 a las 12:23, Duncan Murdoch(<murdoch.dun...@gmail.com <mailto:murdoch.dun...@gmail.com>>) escribió:
    On 30/01/2024 11:10 a.m., Paul Bernal wrote:
     > Dear friends,
     >
     > Hope you are doing well. I am currently using R version 4.3.2,
    and I have a
     > .xlsx file that has 46 sheets on it. I basically combined  all 46
    sheets
     > and read them as a single dataframe in R using package rio.
     >
     > I read a solution using package readlx, as suggested in a
    StackOverflow
     > discussion as follows:
     > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
    100000).
     > Now, when you have so many sheets (46 in my case) in an Excel
    file, the rio
     > methodology is more practical.
     >
     > This is what I did:
     > path =
     >
    
"C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
     > (4).xlsx"
     > figidat = import_list(path, rbind = TRUE) #here figidat refers to
    my dataset
     >
     > Now, it successfully imports and merges all records, however,
    some fields
     > (despite being numeric), R interprets as a boolean field.
     >
     > Here is the structure of the field that is causing me problems (I
    apologize
     > for the length):
     > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    ...
     > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
     > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
     > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
     > ), row.names = c(NA, -7033L))
     >
     > As you can see, when I do the dput, it gives me a bunch of TRUE
    and FALSE
     > values, when in reality I have records with value $0, records
    with amounts
     >> $0 and also a bunch of blank records.
     >
     > Any help will be greatly appreciated.

    I don't know how read_excel() determines column types, but some
    functions look only at the first n rows to guess the type.  It appears
    you have a lot of NA values at the start.  That is a logical value, so
    that might be what is going wrong.

    In read.table() and related functions, you can specify the types of
    column explicitly.  It sounds as though that's what you should do if
    read_excel() offers that as a possibility.

    Duncan Murdoch


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R interpreting numeric field as a boolean field

Reply via email to