Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-04 Thread LeAnthony Mathews
Hello Ralph,  this worked.
  
I added the eltypes option to force the readtable command to read 
the first column in as a String rather than a destructive Int32. 

df1_readtable_old = readtable("$df1_path")
df1_readtable_new = readtable("$df1_path", eltypes=[String,String,String])

julia> eltypes(df1_readtable_old)
3-element Array{Type,1}:
 Int32
 String
 String

julia> eltypes(df1_readtable_new)
3-element Array{Type,1}:
 String
 String
 String

Thanks everyone for the support.




Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-04 Thread LeAnthony Mathews
Hello Jacob, see below:

julia> Pkg.installed()
Dict{String,VersionNumber} with 25 entries:
  "DataFrames"        => v"0.8.4"
  "DataStreams"       => v"0.1.2"
  "Calculus"          => v"0.1.15"
  "Reexport"          => v"0.0.3"
  "BinDeps"           => v"0.4.5"
  "Rmath"             => v"0.1.4"
  "Dates"             => v"0.4.4"
  "NullableArrays"    => v"0.0.10"
  "URIParser"         => v"0.1.6"
  "GZip"              => v"0.2.20"
  "CSV"               => v"0.1.1"
  "RDatasets"         => v"0.2.0"
  "SortingAlgorithms" => v"0.1.0"
  "Compat"            => v"0.9.3"
  "FileIO"            => v"0.2.0"
  "Distributions"     => v"0.11.0"
  "DataArrays"        => v"0.3.9"
  "PDMats"            => v"0.5.0"
  "SHA"               => v"0.2.1"
  "StatsBase"         => v"0.11.1"
  "XGBoost"           => v"0.2.0"
  "RData"             => v"0.0.4"
  "WeakRefStrings"    => v"0.2.0"
  "StatsFuns"         => v"0.3.1"
  "CategoricalArrays" => v"0.1.0"


Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Ralph Smith
Unless I misunderstand,

df1 = readtable(file1,eltypes=[String,String,String])


seems to be what you want.

If you're new to Julia, the fact that a "vector of types" really means 
exactly that may be surprising. 

Let us hope that the new versions of DataFrames include a parser that 
doesn't treat most 10-digit numbers as Int32 on systems like yours.
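
The negative account numbers reported in this thread are consistent with plain 32-bit wraparound. The following is a minimal sketch of that arithmetic, using only Base Julia and assuming a Julia 1.x session (where tryparse returns nothing on failure). The variable names are illustrative and do not come from DataFrames or CSV.jl.

```julia
# Minimal sketch (Base Julia only) of why a ten-digit account number
# turns negative when it is stored as Int32.
account = 8018884596            # first account number from the thread (an Int64 literal)

# Int32 tops out at 2147483647, so this value cannot be represented.
@show typemax(Int32)

# `x % Int32` keeps only the low 32 bits, i.e. it wraps modulo 2^32.
wrapped = account % Int32
@show wrapped                   # -571049996, the first value reported in the thread

# A checked parse refuses the value instead of wrapping it.
@show tryparse(Int32, "8018884596")

# Keeping the field as text preserves every digit.
@show string(account)
```

Since account numbers are identifiers rather than quantities, reading them as String avoids the question of integer width entirely.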

On Wednesday, November 2, 2016 at 4:15:20 PM UTC-4, LeAnthony Mathews wrote:
>
> Spoke too soon.
> Again, I simply want the CSV column that is read in to be a String, not
> an Int32.
>
> Still having issues casting the CSV file back into a DataFrame.
> It's hard to understand why the Julia system is attempting to determine the
> type of the columns when I use readtable and I have no control over this.
>
> Why can I not say:
> df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>
> Reading the Julia spec, "Advanced Options for Reading CSV Files":
> readtable accepts the following optional keyword arguments:
>
> eltypes::Vector{DataType} – Specify the types of all columns. Defaults to [].
>
> df1 = readtable(file1, Int32::Vector(String))
>
> I get
> ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}
>
> Is this even an option?  Or how about converting df1_CSV to df1_dataframe?
> df1_dataframe = convert(dataframe, df1_CSV)
> since CSV.read seems to give more granular control.
>
>
> On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
>>
>> Great, that worked for forcing the column into a string type.
>> Thanks
>>
>> On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>>>
>>> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>>>
>>> In this case, you'd do:
>>>
>>> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>>> df2 = CSV.read(file2; types=Dict(1=>String))
>>>
>>> -Jacob
>>>
>>>
>>> On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews  
>>> wrote:
>>>
>>>> Using v0.5.0
>>>> I have two different 10,000 line CSV files that I am reading into two
>>>> different dataframe variables using the readtable function.
>>>> Each table has in common a ten digit account_number that I would like
>>>> to use as an index and join into one master file.
>>>>
>>>> Here is the account number example in the original CSV from file1:
>>>> 8018884596
>>>> 8018893530
>>>> 8018909633
>>>>
>>>> When I do a readtable of this CSV into file1 then do a
>>>> typeof(file1[:account_number]) I get:
>>>> DataArrays.DataArray(Int32,1)
>>>>  -571049996
>>>>  -571041062
>>>>  -571024959
>>>>
>>>> when I do a
>>>> typeof(file2[:account_number])
>>>> DataArrays.DataArray(String,1)
>>>>
>>>> Question:
>>>> My CSV files give no guidance that account_number should be Int32 or
>>>> string type.  How do I force it to make both account_number elements type
>>>> String?
>>>>
>>>> I would like this join command to work:
>>>> new_account_join = join(file1, file2, on = :account_number, kind = :left)
>>>>
>>>> But I am getting this error:
>>>> ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}
>>>>  in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0
>>>>
>>>> Any help would be appreciated.
>>>>
>>>

Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Jacob Quinn
LeAnthony,

I'm wondering if you're on an old version of DataFrames? There haven't been
any issues "show"-ing DataFrames with NullableArray columns for quite some
time. You can check (and post back here) your current package versions by
doing:

Pkg.installed()

You can also ensure you're on the latest valid release by doing:

Pkg.update()


-Jacob


Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Milan Bouchet-Valat
On Thursday, 3 November 2016 at 13:35 -0700, LeAnthony Mathews wrote:
> Thanks Michael,
>   I have been thinking about this all day.  Yes, basically I am going to
> have to create a macro CSVreadtable that mimics the readtable
> command, but in the expansion uses CSV.read.  The macro will manually
> construct a similar readtable-sized dataframe array, but use the
> column types I specify or inherit from the original readtable
> command.  The macro can use the current CSV.read parameters.
> 
> So this would work.
> df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))
> 
> so a:
> eltypes(df1_CSVreadtable)
> 3-element Array{Type,1}:
>  Int32
>  String
>  String
> 
> 
>   Anyway, I was looking for a quick fix, but at least I will learn
> some Julia.
If you don't have missing values and just want a Vector{String}, you
can pass nullable=false to CSV.read().


Regards


Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread LeAnthony Mathews
Thanks Michael,
  I have been thinking about this all day.  Yes, basically I am going to have to 
create a macro CSVreadtable that mimics the readtable command, but in 
the expansion uses CSV.read.  The macro will manually construct a 
similar readtable-sized dataframe array, but use the column types I specify 
or inherit from the original readtable command.  The macro can use the 
current CSV.read parameters.

So this would work.
df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))

so a:
eltypes(df1_CSVreadtable)
3-element Array{Type,1}:
 Int32
 String
 String


  Anyway, I was looking for a quick fix, but at least I will learn some 
Julia.




Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Michael Krabbe Borregaard
DataFrames is currently undergoing a very major change. It looks like CSV
creates the new type of DataFrame. I hope someone can help you with using
that. As a workaround, on the normal DataFrames version, I have generally
just replaced the column with a string representation:
```
df[:account_numbers] = ["$account_number" for account_number in df[:account_numbers]]
```
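
One caveat may be worth checking before relying on this workaround for the data in this thread: if the column has already been read as Int32, the digits lost to wraparound are not recovered by converting to strings afterwards. The sketch below uses Base Julia only. The vector raw_fields is a hypothetical stand-in for the text of the CSV column, and the wraparound is simulated with `% Int32` rather than produced by readtable.

```julia
# Sketch, Base Julia only. `raw_fields` stands in for the text of the CSV column.
raw_fields = ["8018884596", "8018893530", "8018909633"]

# Simulate what the thread reports: the values end up wrapped into Int32.
wrapped_col = [parse(Int64, s) % Int32 for s in raw_fields]

# The comprehension from the workaround, applied to a plain vector.
as_strings = ["$x" for x in wrapped_col]
@show as_strings    # strings of the wrapped values, not of the original digits

# The converted strings no longer match the original text.
@show as_strings == raw_fields
```

Under these assumptions the conversion has to happen at read time (for example through a column-type option of the reader) for the original account numbers to survive.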

On Thu, Nov 3, 2016 at 3:05 PM, LeAnthony Mathews 
wrote:

> Sure, so I need col #1 in my CSV to be a string in my data frame.
>
> So as a test I tried to load the file 3 different ways:
>
> df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String))  # forcing the column to stay a string
> df1_readtable = readtable("$df1_path")  # Do not know how to force the column to stay a string
> df1_convertDF = convert(DataFrame, df1_CSV)
>
> Here is the output.  If they are all dataframes then showcols should work
> on all three df1:
>
> julia> names(df1_CSV)
> 3-element Array{Symbol,1}:
>  :account_number
>  Symbol("Discharge Date")
>  :site
>
> julia> names(df1_readtable)
> 3-element Array{Symbol,1}:
>  :account_number
>  :Discharge_Date
>  :site
>
> julia> names(df1_convertDF)
> 3-element Array{Symbol,1}:
>  :account_number
>  Symbol("Discharge Date")
>  :site
>
>
> julia> eltypes(df1_CSV)
> 3-element Array{Type,1}:
>  Nullable{String}
>  Nullable{WeakRefString{UInt8}}
>  Nullable{WeakRefString{UInt8}}
>
> julia> eltypes(df1_readtable)
> 3-element Array{Type,1}:
>  Int32   # Do not know how to force the column to stay a string
>  String
>  String
>
> julia> eltypes(df1_convertDF)
> 3-element Array{Type,1}:
>  Nullable{String}
>  Nullable{WeakRefString{UInt8}}
>  Nullable{WeakRefString{UInt8}}
>
> julia> showcols(df1_convertDF)
> 1565x3 DataFrames.DataFrame
> ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> Closest candidates are:
>   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
>   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
>   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
>  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
>  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
>  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
>
> julia> showcols(df1_readtable)
> 1565x3 DataFrames.DataFrame
> │ Col # │ Name           │ Eltype │ Missing │
> ├───────┼────────────────┼────────┼─────────┤
> │ 1     │ account_number │ Int32  │ 0       │
> │ 2     │ Discharge_Date │ String │ 0       │
> │ 3     │ site           │ String │ 0       │
>
> julia> showcols(df1_CSV)
> 1565x3 DataFrames.DataFrame
> ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> Closest candidates are:
>   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
>   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
>   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
>  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
>  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
>  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
>
>
>
> On Thursday, November 3, 2016 at 8:54:19 AM UTC-4, Michael Borregaard
> wrote:
>>
>> The result of CSV should be a DataFrame by default.  What return type do
>> you get?
>>
>


Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Michael Borregaard
The result of CSV.read should be a DataFrame by default.  What return type do 
you get?


Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-02 Thread LeAnthony Mathews
Spoke too soon.  
Again I simple want the CSV column that is read in to not be an int32, but 
a string.

Still having issues casting the CSV file back into a Dataframe.
Its hard to understand why the Julia system is attempting to determine the 
type of the columns when I use readtable and I have no control over this.

Why can I not say:
df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1

*Reading the Julia spec-Advanced Options for Reading CSV Files*
*readtable accepts the following optional keyword arguments:*

*eltypes::Vector{DataType} – Specify the types of all columns. Defaults to 
[].*


*df1 = readtable(file1, Int32::Vector(String))*

I get 
*ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}*

Is this even an option?  Or how about converting df1_CSV to df1_dataframe?
*df1_dataframe = convert(DataFrame, df1_CSV)*
since CSV.read seems to give more granular control.
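
A note on the `Int32::Vector(String)` attempt above: in Julia, `value::T` is a type assertion, so that expression asks whether the value `Int32` is itself a vector of strings, which is why a TypeError comes back. The `eltypes` option quoted from the docs is a keyword argument that takes an ordinary vector whose elements are types. The following is a minimal sketch using only Base Julia; it writes the type as `Vector{String}` (braces) for illustration, and the `readtable` call is left commented out because it assumes the 2016-era DataFrames package used in this thread.

```julia
# `value::T` asserts that `value` already has type `T`; it does not set an option.
# Here the value is the type `Int32`, which is not a Vector{String}, so it throws.
err = try
    Int32::Vector{String}
catch e
    e
end
println(typeof(err))

# A "vector of types" is an ordinary array literal whose elements are types.
coltypes = [String, String, String]
println(length(coltypes))

# Assumption: with the DataFrames version discussed here, this vector is what the
# `eltypes` keyword expects (commented out because `readtable` needs that package):
# df1 = readtable(file1, eltypes=coltypes)
```

The vector needs one entry per column, since the docs describe `eltypes` as specifying the types of all columns.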


On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
>
> Great, that worked for forcing the column into a string type.
> Thanks
>
> On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>>
>> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>>
>> In this case, you'd do:
>>
>> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>> df2 = CSV.read(file2; types=Dict(1=>String))
>>
>> -Jacob
>>

Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-01 Thread LeAnthony Mathews
Great, that worked for forcing the column into a string type.
Thanks

On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>
> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>
> In this case, you'd do:
>
> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
> df2 = CSV.read(file2; types=Dict(1=>String))
>
> -Jacob
>

Re: [julia-users] Question: Forcing readtable to create string type on import

2016-10-31 Thread Jacob Quinn
You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/

In this case, you'd do:

df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
df2 = CSV.read(file2; types=Dict(1=>String))

-Jacob
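
For readers trying the suggestion above on a current setup, here is a small self-contained sketch. It assumes recent CSV.jl and DataFrames.jl releases, where `CSV.read` takes the sink type (`DataFrame`) as a second argument; the call in the message above is the 2016 form without it. The inline `data` string and the `site` values are made up for illustration and stand in for file1.

```julia
using CSV, DataFrames

# Hypothetical stand-in for file1: a header row plus the three account numbers
# from the thread (the `site` values are invented).
data = """
account_number,site
8018884596,A
8018893530,B
8018909633,C
"""

# Force column 1 to String so the digits are kept exactly as written in the file.
df1 = CSV.read(IOBuffer(data), DataFrame; types=Dict(1 => String))

println(eltype(df1.account_number))
println(df1.account_number[1])
```

Reading both files this way should give the two account_number columns the same element type, which is what the join needs.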


On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews 
wrote:

> Using v0.5.0
> I have two different 10,000 line CSV files that I am reading into two
> different dataframe variables using the readtable function.
> Each table has in common a ten digit account_number that I would like to
> use as an index and join into one master file.
>
> Here is the account number example in the original CSV from file1:
> 8018884596
> 8018893530
> 8018909633
>
> When I do a readtable of this CSV into file1 then do a
> *typeof(file1[:account_number])* I get:
> *DataArrays.DataArray{Int32,1}*
>  -571049996
>  -571041062
>  -571024959
>
> When I do a
> *typeof(file2[:account_number])*
> I get:
> *DataArrays.DataArray{String,1}*
>
>
> *Question:  *
> My CSV files give no guidance that account_number should be Int32 or
> string type.  How do I force it to make both account_number elements type
> String?
>
> I would like this join command to work:
> *new_account_join = join(file1, file2, on =:account_number,kind = :left)*
>
> But I am getting this error:
> *ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
> * in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0*
>
>
> Any help would be appreciated.
>
>
>


[julia-users] Question: Forcing readtable to create string type on import

2016-10-31 Thread LeAnthony Mathews
Using v0.5.0
I have two different 10,000 line CSV files that I am reading into two 
different dataframe variables using the readtable function.
Each table has in common a ten digit account_number that I would like to 
use as an index and join into one master file.

Here is the account number example in the original CSV from file1:
8018884596
8018893530
8018909633

When I do a readtable of this CSV into file1 then do a 
*typeof(file1[:account_number])* I get:
*DataArrays.DataArray{Int32,1}*
 -571049996
 -571041062
 -571024959

When I do a 
*typeof(file2[:account_number])*
I get:
*DataArrays.DataArray{String,1}*


*Question:  *
My CSV files give no guidance that account_number should be Int32 or string 
type.  How do I force it to make both account_number elements type String?

I would like this join command to work:
*new_account_join = join(file1, file2, on =:account_number,kind = :left)*

But I am getting this error:
*ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
* in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0*


Any help would be appreciated.
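
A note on the negative numbers in the post above: they are what results when each ten-digit account number is truncated to 32 bits, because every value is larger than `typemax(Int32)`. The sketch below reproduces that arithmetic in plain Julia. It only illustrates the wrap-around and does not claim to show how readtable parses the column internally.

```julia
# Account numbers as they appear in the CSV (from the post above).
accounts = [8018884596, 8018893530, 8018909633]

# `x % Int32` keeps only the low 32 bits, i.e. the wrap-around that occurs
# when a value above typemax(Int32) is forced into an Int32.
wrapped = [a % Int32 for a in accounts]

println(typemax(Int32))  # 2147483647
println(wrapped)         # same negative values as reported for file1

# Kept as strings, the identifiers survive unchanged.
as_strings = [string(a) for a in accounts]
println(as_strings)
```

Once the wrap-around has happened, the original digits cannot be recovered from the Int32 values, so the column has to be read as a string (or a 64-bit integer) in the first place.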