[julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
i'm trying to do this:

    using DataFrames
    df = DataFrame(a=["hi","there"],x = rand(2))
    df2 = DataFrame(a=["oh","yeah"],x = rand(2))
    for e in eachrow(df)
        append!(df2,e)
    end
    ERROR: `append!` has no method matching append!(::DataFrame, ::DataFrameRow{DataFrame})
     in anonymous at no file:2

or

    julia> for i in 1:nrow(df)
               push!(df2,df[i,:])
           end

but that errors as well. this works:

    julia> for i in 1:nrow(df)
               push!(df2,array(df[i,:]))
           end

but wondering whether that's the best way of achieving this efficiently.
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Have you tried `append!(df2,df)`?

~~~
julia> using DataFrames

julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | hi    | 0.862957 |
| 2     | there | 0.101378 |

julia> df2 = DataFrame(a=["oh","yeah"],x = rand(2))
2x2 DataFrame
|-------|------|------------|
| Row # | a    | x          |
| 1     | oh   | 0.00803615 |
| 2     | yeah | 0.0222873  |

julia> append!(df2,df)
4x2 DataFrame
|-------|-------|------------|
| Row # | a     | x          |
| 1     | oh    | 0.00803615 |
| 2     | yeah  | 0.0222873  |
| 3     | hi    | 0.862957   |
| 4     | there | 0.101378   |
~~~
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
yeah I wasn't very clear in that example. i really need to append one row at a time.
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Oh, I didn't realize that.

So, `eachrow(df)` is giving you `[(:a,"hi"),(:x,0.703943)]` when you need `["hi",0.703943]` to use `push!`.

~~~
julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | hi    | 0.703943 |
| 2     | there | 0.269876 |

julia> df2 = DataFrame(a=["oh","yeah"],x = rand(2))
2x2 DataFrame
|-------|------|----------|
| Row # | a    | x        |
| 1     | oh   | 0.138966 |
| 2     | yeah | 0.856162 |

julia> for e = eachrow(df)
           push!(df2,[v for (_,v) in e])
       end

julia> df2
4x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | oh    | 0.138966 |
| 2     | yeah  | 0.856162 |
| 3     | hi    | 0.703943 |
| 4     | there | 0.269876 |
~~~

Does this work for you?

-- Leah
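For readers on a more recent DataFrames.jl, the same loop can be sketched roughly as below. This is only a sketch and assumes a 1.x release, where `pairs(row)` on a `DataFrameRow` yields `column_name => value` pairs; the fixed numbers replace `rand(2)` purely so the result is reproducible.

~~~julia
using DataFrames

df  = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])
df2 = DataFrame(a = ["oh", "yeah"], x = [0.3, 0.4])

for row in eachrow(df)
    # Each element of `pairs(row)` is `name => value`; keep only the values,
    # in column order, and push them onto df2 as one new row.
    push!(df2, [v for (_, v) in pairs(row)])
end
~~~

After the loop `df2` should hold its own two rows followed by the two rows of `df`, while `df` itself is left unchanged.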
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
We really need to standardize on a single type that reflects a single row of a tabular data structure that gets used both by DBI and by DataFrames. DataFrameRow is really nice because it's a zero-copy operation for DataFrames, but we can't provide zero-copy semantics when pulling rows out of a database.

I tend to think we should have all tabular data systems use an OrderedDict to represent a single row of data. We could then change eachrow(df) to mutate a one-time allocated OrderedDict. This involves non-trivial copying, but it syncs up closer with the idioms you'd use when working with DBI.

-- John
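A rough sketch of the "one-time allocated" row idea, using only Base Julia so it runs without any packages. Everything here is an assumption made for illustration: `table` is a NamedTuple of column vectors standing in for a DataFrame, `fill_row!` is a hypothetical helper, and a Base `Dict` stands in for the `OrderedDict` proposed above (which lives in a separate package); column order is taken from `keys(table)` instead.

~~~julia
# Hypothetical stand-in for a table: a NamedTuple of equal-length columns.
table = (a = ["hi", "there"], x = [0.25, 0.75])

# The row container is allocated exactly once, before the loop.
row = Dict{Symbol,Any}()

# Hypothetical helper: overwrite `row` in place with the values of row `i`.
function fill_row!(row::Dict{Symbol,Any}, table::NamedTuple, i::Integer)
    for name in keys(table)
        row[name] = table[name][i]   # copies the value out of the column
    end
    return row
end

seen = String[]
for i in 1:length(table.a)
    fill_row!(row, table, i)         # same Dict object on every iteration
    push!(seen, string(row[:a], "=", row[:x]))
end
~~~

One consequence of this design: because the same object is reused, code that wants to keep a row beyond the current iteration has to `copy(row)` first, otherwise it ends up holding whatever the last iteration wrote.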
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
It seems like standardizing on convert would be a natural approach when one needs to go from one to the other. I don't know the DBI semantics, but `myrow = convert(Dict, mydataframerow)`, `myrow2 = convert(OrderedDict, mydataframerow)`, etc. is transparent and lets different data storage objects use efficient representations internally (losing zero-copy semantics is a huge sacrifice). It's also easier to enforce in future packages: much simpler to add convert methods than to re-represent rows as OrderedDicts (or whatever datatype).
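To make the convert idea concrete, here is a minimal Base-only sketch. `RowView` is a hypothetical type invented for this example (it is not DataFrames' `DataFrameRow`), and the `convert` method is one possible way such a conversion could be written, not an existing API.

~~~julia
# Hypothetical zero-copy row: a reference to the parent table plus an index.
struct RowView{T<:NamedTuple}
    parent::T
    index::Int
end

# Reading goes straight to the parent's column; nothing is copied.
Base.getindex(r::RowView, name::Symbol) = r.parent[name][r.index]

# Conversion materializes an independent copy, only when asked for.
function Base.convert(::Type{Dict}, r::RowView)
    return Dict{Symbol,Any}(name => r[name] for name in keys(r.parent))
end

table = (a = ["hi", "there"], x = [0.25, 0.75])
r = RowView(table, 2)
d = convert(Dict, r)

# Mutating the parent is visible through the view but not through the copy.
table.x[2] = 1.5
~~~

Here `r[:x]` follows the change made to the parent, while `d[:x]` keeps the value it had when the conversion ran, which is the trade-off between the two representations being discussed.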
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
I'm not sure that losing zero-copy semantics is actually a big performance hit in most pipelines. I think much more important is that you can't write generic code right now because the abstractions aren't linked in any way. The rows you fetch from a database using DBI aren't mutable, whereas the rows you fetch using eachrow(df) are.

-- John
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Probably not in most, you're right. Can't you get generic code as long as a method to convert to OrderedDict is supplied, though? When you don't need anything more specific, convert the dataframe row to an OrderedDict, then either work with that object or convert it into a more appropriate internal format. But if you want to write specific algorithms for different storage types, that's still an option (e.g. either work with immutable DBI rows, or use a custom convert method to a more appropriate format, skipping the OrderedDict intermediate step).
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Doing a convert(OrderedDict, DataFrameRow) seems like it's going to be a much worse performance hit than copying everything into a specific OrderedDict that's reused, because you're going to allocate memory for a new OrderedDict object on every iteration.

-- John
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Oh, I wasn't thinking of that. Good point. A mutating OrderedDict constructor would allow reuse, but isn't as generic.
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
Leah: yeah that works. but i think i almost prefer my previous solution, instead of this

    push!(df2,[v for (_,v) in e])

that:

    push!(df2,array(e))

not sure about the performance implications though.
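As far as I know `array` is not available in current DataFrames.jl releases. A sketch of what seem to be the closest equivalents, assuming a 1.x release: `Vector(row)` turns a `DataFrameRow` into a plain vector, and `push!` also accepts the `DataFrameRow` itself. The fixed numbers are only there to make the result reproducible.

~~~julia
using DataFrames

df  = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])
df2 = DataFrame(a = ["oh", "yeah"], x = [0.3, 0.4])

# Variant 1: materialize each row as a Vector, matched to columns by position.
for row in eachrow(df)
    push!(df2, Vector(row))
end

# Variant 2: push the DataFrameRow directly, matched to columns by name.
for row in eachrow(df)
    push!(df2, row)
end
~~~

Both loops add the rows of `df` to `df2`, so this toy example ends with six rows. No timing claim is made here; which variant is faster would need to be measured.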
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
What does that mean? A DataFrameRow can't be easily created without reference to an existing DataFrame, so this seems like it's either a mechanism for transferring rows from one DataFrame to another very slowly or a mechanism for inserting duplicate rows.

-- John

On Sep 12, 2014, at 3:37 PM, Florian Oswald florian.osw...@gmail.com wrote:

I'll submit a PR for `Base.append!(adf::AbstractDataFrame,dfr::DataFrameRow)` unless you tell me that's useless.
Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?
oh, i didn't know it's slow. yes in my case it's a way of transferring a row from one df to another. what's a better way of doing this?
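The thread does not record an answer to this. One option that seems consistent with the earlier `append!(df2,df)` suggestion is to decide which rows to move first and then transfer them in a single call. The sketch below assumes a recent DataFrames.jl; the selection rule (`x > 0.15`) and the name `wanted` are made up for illustration.

~~~julia
using DataFrames

df  = DataFrame(a = ["hi", "there", "again"], x = [0.1, 0.2, 0.3])
df2 = DataFrame(a = ["oh", "yeah"], x = [0.4, 0.5])

# First pass: collect only the indices of the rows to transfer.
wanted = [i for i in 1:nrow(df) if df.x[i] > 0.15]

# Second step: copy the selected rows across with one append! call
# instead of one push! per row.
append!(df2, df[wanted, :])
~~~

Whether this is actually faster than pushing row by row depends on the data and the version in use, so it is worth benchmarking before relying on it.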