Re: [julia-users] Question: Forcing readtable to create string type on import
Hello Ralph, this worked.

I added the eltypes option to force the readtable command to read the first column in as a String rather than a destructive Int32:

    df1_readtable_old = readtable("$df1_path")
    df1_readtable_new = readtable("$df1_path", eltypes=[String,String,String])

    julia> eltypes(df1_readtable_old)
    3-element Array{Type,1}:
     Int32
     String
     String

    julia> eltypes(df1_readtable_new)
    3-element Array{Type,1}:
     String
     String
     String

Thanks everyone for the support.
Re: [julia-users] Question: Forcing readtable to create string type on import
Hello Jacob, see below:

    julia> Pkg.installed()
    Dict{String,VersionNumber} with 25 entries:
      "DataFrames"        => v"0.8.4"
      "DataStreams"       => v"0.1.2"
      "Calculus"          => v"0.1.15"
      "Reexport"          => v"0.0.3"
      "BinDeps"           => v"0.4.5"
      "Rmath"             => v"0.1.4"
      "Dates"             => v"0.4.4"
      "NullableArrays"    => v"0.0.10"
      "URIParser"         => v"0.1.6"
      "GZip"              => v"0.2.20"
      "CSV"               => v"0.1.1"
      "RDatasets"         => v"0.2.0"
      "SortingAlgorithms" => v"0.1.0"
      "Compat"            => v"0.9.3"
      "FileIO"            => v"0.2.0"
      "Distributions"     => v"0.11.0"
      "DataArrays"        => v"0.3.9"
      "PDMats"            => v"0.5.0"
      "SHA"               => v"0.2.1"
      "StatsBase"         => v"0.11.1"
      "XGBoost"           => v"0.2.0"
      "RData"             => v"0.0.4"
      "WeakRefStrings"    => v"0.2.0"
      "StatsFuns"         => v"0.3.1"
      "CategoricalArrays" => v"0.1.0"
Re: [julia-users] Question: Forcing readtable to create string type on import
Unless I misunderstand,

    df1 = readtable(file1, eltypes=[String,String,String])

seems to be what you want.

If you're new to Julia, the fact that a "vector of types" really means exactly that may be surprising.

Let us hope that the new versions of DataFrames include a parser that doesn't treat most 10-digit numbers as Int32 on systems like yours.
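For readers following along, here is a minimal sketch of what a "vector of types" looks like in plain Julia. It uses only Base; the names `inferred` and `coltypes` are illustrative assumptions, and nothing here calls readtable itself.

```julia
# Types are ordinary values in Julia, so they can be stored in an array.
inferred = [Int32, String, String]   # the column types reported earlier in the thread

# Copy the guess and override column 1, leaving the other columns alone.
coltypes = copy(inferred)
coltypes[1] = String

println(typeof(coltypes))
println(coltypes)
```

A vector built this way has one entry per column, which is the shape the eltypes option in the reply above is given.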
Re: [julia-users] Question: Forcing readtable to create string type on import
LeAnthony,

I'm wondering if you're on an old version of DataFrames? There haven't been any issues "show"-ing DataFrames with NullableArray columns for quite some time. You can check (and post back here) your current package versions by doing:

    Pkg.installed()

You can also ensure you're on the latest valid release by doing:

    Pkg.update()

-Jacob
Re: [julia-users] Question: Forcing readtable to create string type on import
On Thursday, 3 November 2016 at 13:35 -0700, LeAnthony Mathews wrote:
> Thanks Michael,
> I have been thinking about this all day. Yes, basically I am going to
> have to create a macro CSVreadtable that mimics the readtable
> command, but in the expansion uses CSV.read. The macro will manually
> construct a DataFrame of the same size as the one readtable produces,
> but use the column types I specify or inherit from the original
> readtable command. The macro can use the current CSV.read parameters.
>
> So this would work.
> df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))
>
> so a:
> eltypes(df1_CSVreadtable)
> 3-element Array{Type,1}:
>  Int32
>  String
>  String
>
> Anyway, I was looking for a quick fix, but at least I will learn
> some Julia.

If you don't have missing values and just want a Vector{String}, you can pass nullable=false to CSV.read().

Regards
Re: [julia-users] Question: Forcing readtable to create string type on import
Thanks Michael,

I have been thinking about this all day. Yes, basically I am going to have to create a macro CSVreadtable that mimics the readtable command, but in the expansion uses CSV.read. The macro will manually construct a DataFrame of the same size as the one readtable produces, but use the column types I specify or inherit from the original readtable command. The macro can use the current CSV.read parameters.

So this would work:

    df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))

so a:

    eltypes(df1_CSVreadtable)
    3-element Array{Type,1}:
     Int32
     String
     String

Anyway, I was looking for a quick fix, but at least I will learn some Julia.
Re: [julia-users] Question: Forcing readtable to create string type on import
DataFrames is currently undergoing a very major change. Looks like CSV creates the new type of DataFrames. I hope someone can help you with using that. As a workaround, on the normal DataFrames version, I have generally just replaced the column with a string representation:

```
df[:account_numbers] = ["$account_number" for account_number in df[:account_numbers]]
```

On Thu, Nov 3, 2016 at 3:05 PM, LeAnthony Mathews wrote:
> Sure, so I need col #1 in my CSV to be a string in my data frame.
>
> So as a test I tried to load the file 3 different ways:
>
> df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String)) # forcing the column to stay a string
> df1_readtable = readtable("$df1_path") # Do not know how to force the column to stay a string
> df1_convertDF = convert(DataFrame, df1_CSV)
>
> Here is the output. If they are all dataframes then showcols should work on all three df1:
>
> julia> names(df1_CSV)
> 3-element Array{Symbol,1}:
>  :account_number
>  Symbol("Discharge Date")
>  :site
>
> julia> names(df1_readtable)
> 3-element Array{Symbol,1}:
>  :account_number
>  :Discharge_Date
>  :site
>
> julia> names(df1_convertDF)
> 3-element Array{Symbol,1}:
>  :account_number
>  Symbol("Discharge Date")
>  :site
>
> julia> eltypes(df1_CSV)
> 3-element Array{Type,1}:
>  Nullable{String}
>  Nullable{WeakRefString{UInt8}}
>  Nullable{WeakRefString{UInt8}}
>
> julia> eltypes(df1_readtable)
> 3-element Array{Type,1}:
>  Int32   # Do not know how to force the column to stay a string
>  String
>  String
>
> julia> eltypes(df1_convertDF)
> 3-element Array{Type,1}:
>  Nullable{String}
>  Nullable{WeakRefString{UInt8}}
>  Nullable{WeakRefString{UInt8}}
>
> julia> showcols(df1_convertDF)
> 1565x3 DataFrames.DataFrame
> ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> Closest candidates are:
>   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
>   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
>   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
>  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
>  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
>  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
>
> julia> showcols(df1_readtable)
> 1565x3 DataFrames.DataFrame
> │ Col # │ Name           │ Eltype │ Missing │
> ├───────┼────────────────┼────────┼─────────┤
> │ 1     │ account_number │ Int32  │ 0       │
> │ 2     │ Discharge_Date │ String │ 0       │
> │ 3     │ site           │ String │ 0       │
>
> julia> showcols(df1_CSV)
> 1565x3 DataFrames.DataFrame
> ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> Closest candidates are:
>   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
>   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
>   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
>  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
>  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
>  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
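A small Base-only sketch of the string-conversion workaround, applied to plain vectors rather than a DataFrame column (an assumption made to keep it self-contained; `intact` and `wrapped` are illustrative names). It also suggests why converting after import may not help in this thread: values that were already stored as Int32 stay wrong once turned into text.

```julia
# Account numbers as they should be, held in a 64-bit integer vector.
intact = Int64[8018884596, 8018893530]

# The same two rows as reported in the first post, already stored as Int32.
wrapped = Int32[-571049996, -571041062]

# String interpolation, as in the workaround, converts element by element.
intact_text  = ["$n" for n in intact]
wrapped_text = ["$n" for n in wrapped]

println(intact_text)
println(wrapped_text)
```

So the conversion is only as good as the values it is given, which is why forcing the type at read time matters here.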
Re: [julia-users] Question: Forcing readtable to create string type on import
The result of CSV should be a DataFrame by default. What return type do you get?
Re: [julia-users] Question: Forcing readtable to create string type on import
Spoke too soon.

Again, I simply want the CSV column that is read in to be a string, not an Int32.

Still having issues casting the CSV file back into a DataFrame. It's hard to understand why the Julia system is attempting to determine the type of the columns when I use readtable and I have no control over this.

Why can I not say:

    df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1

Reading the Julia spec, "Advanced Options for Reading CSV Files":

    readtable accepts the following optional keyword arguments:
    eltypes::Vector{DataType} – Specify the types of all columns. Defaults to [].

So I tried:

    df1 = readtable(file1, Int32::Vector(String))

I get

    ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}

Is this even an option? Or how about converting df1_CSV to df1_dataframe?

    df1_dataframe = convert(dataframe, df1_CSV)

since CSV.read seems to give more granular control.
Re: [julia-users] Question: Forcing readtable to create string type on import
Great, that worked for forcing the column into a string type.
Thanks

On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>
> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>
> In this case, you'd do:
>
> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
> df2 = CSV.read(file2; types=Dict(1=>String))
>
> -Jacob
Re: [julia-users] Question: Forcing readtable to create string type on import
You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/

In this case, you'd do:

df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
df2 = CSV.read(file2; types=Dict(1=>String))

-Jacob

On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews wrote:
> Using v0.5.0
> I have two different 10,000-line CSV files that I am reading into two different DataFrame variables using the readtable function.
> Each table has in common a ten-digit account_number that I would like to use as an index and join into one master file.
>
> Here is the account number example in the original CSV from file1:
> 8018884596
> 8018893530
> 8018909633
>
> When I do a readtable of this CSV into file1 and then do a *typeof(file1[:account_number])* I get:
> *DataArrays.DataArray(Int32,1)*
> -571049996
> -571041062
> -571024959
>
> When I do a
> *typeof(file2[:account_number])*
> I get:
> *DataArrays.DataArray(String,1)*
>
> *Question:*
> My CSV files give no guidance on whether account_number should be Int32 or String. How do I force both account_number columns to be of type String?
>
> I would like this join command to work:
> *new_account_join = join(file1, file2, on =:account_number,kind = :left)*
>
> But I am getting this error:
> *ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
> *in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0*
>
> Any help would be appreciated.
[julia-users] Question: Forcing readtable to create string type on import
Using v0.5.0
I have two different 10,000-line CSV files that I am reading into two different DataFrame variables using the readtable function.
Each table has in common a ten-digit account_number that I would like to use as an index and join into one master file.

Here is the account number example in the original CSV from file1:
8018884596
8018893530
8018909633

When I do a readtable of this CSV into file1 and then do a *typeof(file1[:account_number])* I get:
*DataArrays.DataArray(Int32,1)*
-571049996
-571041062
-571024959

When I do a
*typeof(file2[:account_number])*
I get:
*DataArrays.DataArray(String,1)*

*Question:*
My CSV files give no guidance on whether account_number should be Int32 or String. How do I force both account_number columns to be of type String?

I would like this join command to work:
*new_account_join = join(file1, file2, on =:account_number,kind = :left)*

But I am getting this error:
*ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
*in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0*

Any help would be appreciated.
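
For anyone wondering where the negative values in the question come from: they are what you would get if each ten-digit account number were truncated to 32 bits. The sketch below uses only Base Julia and the three account numbers quoted above. The variable names (accounts, wrapped, as_strings) are made up for illustration, and the claim that readtable arrived at its values this way is an inference from the matching numbers, not something confirmed from the DataFrames source.

```julia
# The three account numbers from the question. Each is larger than
# typemax(Int32) == 2147483647, so none of them fits in an Int32.
accounts = [8018884596, 8018893530, 8018909633]
@assert all(a -> a > typemax(Int32), accounts)

# `x % Int32` keeps only the low 32 bits (two's-complement wraparound).
# Assumption: this mimics the lossy conversion seen in the question.
wrapped = [a % Int32 for a in accounts]
println(wrapped)      # Int32[-571049996, -571041062, -571024959]

# Keeping the identifiers as strings preserves every digit, which is
# why forcing the column to String is the safe choice for a join key.
as_strings = [string(a) for a in accounts]
println(as_strings)   # ["8018884596", "8018893530", "8018909633"]
```

The wrapped values match the ones reported in the question exactly, so the column was most likely parsed into a type too narrow to hold it. Account numbers are identifiers rather than quantities, so reading them as String avoids the problem altogether.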