[julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Florian Oswald
i'm trying to do this:

using DataFrames
df = DataFrame(a=["hi","there"],x = rand(2))
df2 = DataFrame(a=["oh","yeah"],x = rand(2))

for e in eachrow(df)
    append!(df2,e)
end

ERROR: `append!` has no method matching append!(::DataFrame, 
::DataFrameRow{DataFrame}) 
in anonymous at no file:2

or 

julia> for i in 1:nrow(df)
           push!(df2,df[i,:])
       end

but that errors as well. 

this works:

julia> for i in 1:nrow(df)
           push!(df2,array(df[i,:]))
       end

but wondering whether that's the best way of achieving this efficiently. 


Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Leah Hanson
Have you tried append!(df2,df)?

~~~
julia> using DataFrames

julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | hi    | 0.862957 |
| 2     | there | 0.101378 |

julia> df2 = DataFrame(a=["oh","yeah"],x = rand(2))
2x2 DataFrame
|-------|------|------------|
| Row # | a    | x          |
| 1     | oh   | 0.00803615 |
| 2     | yeah | 0.0222873  |

julia> append!(df2,df)
4x2 DataFrame
|-------|-------|------------|
| Row # | a     | x          |
| 1     | oh    | 0.00803615 |
| 2     | yeah  | 0.0222873  |
| 3     | hi    | 0.862957   |
| 4     | there | 0.101378   |
~~~






Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Florian Oswald
yeah I wasn't very clear in that example. i really need to append one row
at a time.






Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Leah Hanson
Oh, I didn't realize that. So, `eachrow(df)` is giving you
`[(:a,"hi"),(:x,0.703943)]` when you need `["hi",0.703943]` to use `push!`.

~~~
julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | hi    | 0.703943 |
| 2     | there | 0.269876 |

julia> df2 = DataFrame(a=["oh","yeah"],x = rand(2))
2x2 DataFrame
|-------|------|----------|
| Row # | a    | x        |
| 1     | oh   | 0.138966 |
| 2     | yeah | 0.856162 |

julia> for e = eachrow(df)
           push!(df2,[v for (_,v) in e])
       end

julia> df2
4x2 DataFrame
|-------|-------|----------|
| Row # | a     | x        |
| 1     | oh    | 0.138966 |
| 2     | yeah  | 0.856162 |
| 3     | hi    | 0.703943 |
| 4     | there | 0.269876 |
~~~

Does this work for you?

-- Leah
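
For readers on a current release, the row-at-a-time transfer can be sketched as below. This assumes DataFrames.jl 1.x, where `push!` accepts a `DataFrameRow` directly and `Vector(row)` plays the role of the old `array(...)` call. Neither is the 2014 API used in this thread, so treat it as an illustration rather than what the posters ran.

```julia
using DataFrames

df  = DataFrame(a = ["hi", "there"], x = rand(2))
df2 = DataFrame(a = ["oh", "yeah"],  x = rand(2))

# Assumption: DataFrames.jl 1.x, where push! has a method for DataFrameRow.
for row in eachrow(df)
    push!(df2, row)          # columns are matched by name
end

# Positional alternative: copy the row's values into a plain Vector first.
for row in eachrow(df)
    push!(df2, Vector(row))  # columns are matched by position
end

nrow(df2)  # 6: the 2 original rows plus 2 rows from each loop
```

When all rows are wanted, `append!(df2, df)` remains the simpler call; the loop form only makes sense when rows are selected or produced one at a time.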







Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread John Myles White
We really need to standardize on a single type that reflects a single row of a 
tabular data structure that gets used both by DBI and by DataFrames.

DataFrameRow is really nice because it's a zero-copy operation for DataFrames, 
but we can't provide zero-copy semantics when pulling rows out of a database.

I tend to think we should have all tabular data systems use an OrderedDict to 
represent a single row of data.

We could then change eachrow(df) to mutate a one-time allocated OrderedDict. 
This involves non-trivial copying, but it syncs up closer with the idioms you'd 
use when working with DBI.

 -- John

 
 
 



Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Gray Calhoun
It seems like standardizing on convert would be a natural approach when 
one needs to go from one to the other. I don't know the DBI semantics, but

  myrow = convert(Dict, mydataframerow)
  myrow2 = convert(OrderedDict, mydataframerow)

etc. is transparent and lets different data storage objects use efficient 
representations internally (losing zero-copy semantics is a huge 
sacrifice).

It's also easier to enforce in future packages: much simpler to add convert 
methods than to re-represent rows as OrderedDicts (or whatever datatype).
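
A rough sketch of what such conversions look like on a current release follows. It assumes DataFrames.jl 1.x, where `pairs(row)` yields `Symbol => value` pairs and `NamedTuple(row)` is defined; it does not use the `convert` methods proposed above, which are not known to exist in that form. A plain `Dict` stands in for `OrderedDict` to avoid an extra package.

```julia
using DataFrames

df = DataFrame(a = ["hi", "there"], x = [0.5, 0.25])

row = df[1, :]               # a DataFrameRow: a view into df, no data copied

# Assumption: DataFrames.jl 1.x. Both conversions copy the row's values.
as_dict  = Dict(pairs(row))  # unordered, mutable copy keyed by column name
as_tuple = NamedTuple(row)   # ordered, immutable copy

as_dict[:a]   # "hi"
as_tuple.x    # 0.5
```

Because each conversion copies, later changes to `df` are not reflected in `as_dict` or `as_tuple`, which is the zero-copy trade-off under discussion.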



Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread John Myles White
I'm not sure that losing zero-copy semantics is actually a big performance hit 
in most pipelines.

I think much more important is that you can't write generic code right now 
because the abstractions aren't linked in any way. The rows you fetch from a 
database using DBI aren't mutable, whereas the rows you fetch using eachrow(df) 
are.

 -- John




Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Gray Calhoun
Probably not in most, you're right.

Can't you get generic code as long as a method to convert to OrderedDict is 
supplied, though?

When you don't need anything more specific, convert the dataframe row to an 
OrderedDict, then either work with that object or convert it into a more 
appropriate internal format. But if you want to write specific algorithms 
for different storage types, that's still an option (e.g. either work with 
immutable DBI rows, or use a custom convert method to a more appropriate 
format, skipping the OrderedDict intermediate step).





Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread John Myles White
Doing a convert(OrderedDict, DataFrameRow) seems like it's going to be a much 
worse performance hit than copying everything into a specific OrderedDict 
that's reused, because you're going to allocate memory for a new OrderedDict 
object on every iteration.

 -- John
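
The two strategies being compared can be sketched in plain Julia as follows. Everything here is hypothetical illustration: `table` is a stand-in for a data frame (a NamedTuple of columns), `each_fresh` and `each_reused` are made-up names, and a `Dict` is used where the thread proposes an `OrderedDict`.

```julia
# Hypothetical stand-in for a table: a NamedTuple of equal-length columns.
table = (a = ["hi", "there"], x = [0.862957, 0.101378])

# Strategy 1: allocate a new Dict for every row (what a per-row convert does).
function each_fresh(f, table)
    for i in 1:length(first(table))
        row = Dict{Symbol,Any}(k => table[k][i] for k in keys(table))
        f(row)
    end
end

# Strategy 2: allocate one Dict up front and overwrite it for every row.
function each_reused(f, table)
    row = Dict{Symbol,Any}()
    for i in 1:length(first(table))
        for k in keys(table)
            row[k] = table[k][i]   # values are copied, the Dict is not rebuilt
        end
        f(row)
    end
end

seen = Any[]
each_reused(r -> push!(seen, r[:a]), table)   # seen == ["hi", "there"]

# Caveat of reuse: storing the row itself keeps one object, not one per row.
kept = Dict{Symbol,Any}[]
each_reused(r -> push!(kept, r), table)
kept[1] === kept[2]                           # true; both show the last row
```

The reused row avoids the per-iteration allocation, at the cost that callers must copy the row if they want to keep it past the current iteration.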

 



Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Gray Calhoun
Oh, I wasn't thinking of that. Good point. A mutating OrderedDict constructor 
would allow reuse, but isn't as generic.


Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Florian Oswald
Leah: yeah, that works. But I think I almost prefer my previous solution.
Instead of this:
push!(df2,[v for (_,v) in e])
I'd use this:
push!(df2,array(e))

Not sure about the performance implications, though.







Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread John Myles White
What does that mean? A DataFrameRow can't be easily created without reference 
to an existing DataFrame, so this seems like it's either a mechanism for 
transferring rows from one DataFrame to another very slowly or a mechanism for 
inserting duplicate rows.

 -- John

On Sep 12, 2014, at 3:37 PM, Florian Oswald florian.osw...@gmail.com wrote:

 I'll submit a PR for Base.append!(adf::AbstractDataFrame,dfr::DataFrameRow) 
 unless you tell me that's useless.
 
 



Re: [julia-users] how to push!(::DataFrame,::DataFrameRow) or append!(::DataFrame,::DataFrameRow) ?

2014-09-12 Thread Florian Oswald
Oh, I didn't know it's slow. Yes, in my case it's a way of transferring a
row from one df to another. What's a better way of doing this?
