Re: [julia-users] DataFrames : Apply a function by rows
OK, I hope this exchange contributes some new ideas for improving DataFrames, although there are other ways to do it, such as converting a DataFrame or a row into an array.

Thank you for your help!

On Sunday, 22 November 2015 at 15:48:37 UTC+1, tshort wrote:
> Contributions/pull requests from folks that need that are welcome. I don't
> have that need. For row operations, I can generally get by with loops or
> `@byrow!` in DataFramesMeta.
Re: [julia-users] DataFrames : Apply a function by rows
Contributions/pull requests from folks that need that are welcome. I don't have that need. For row operations, I can generally get by with loops or `@byrow!` in DataFramesMeta.

On Nov 22, 2015 8:23 AM, "Fred" wrote:
> Yes, it is a good solution, but it means that DataFrames cannot be used to
> do some calculations by rows, which is a severe limitation. An equivalent
> of colwise() for rows would be very useful.
Re: [julia-users] DataFrames : Apply a function by rows
Yes, it is a good solution, but it means that DataFrames cannot be used to do some calculations by rows, which is a severe limitation. An equivalent of colwise() for rows would be very useful.

On Sunday, 22 November 2015 at 14:11:21 UTC+1, tshort wrote:
> I'd convert the whole DataFrame to a matrix and use a loop over rows.
Re: [julia-users] DataFrames : Apply a function by rows
I'd convert the whole DataFrame to a matrix and use a loop over rows.

On Nov 22, 2015 2:54 AM, "Fred" wrote:
> In my last example, the function mean() was not well chosen. In fact, what
> I would like to calculate is a statistical test line by line, like TTest
> or Wilcoxon. This is why I need to iterate through 2 DataFrames at the
> same time if I subset the DataFrame first to increase speed :)
>
> Something like:
>
> julia> for r1,r2 in eachrow(df1, df2)
>            println(TTest(r1,r2))
>        end
> ERROR: syntax: invalid iteration specification
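A minimal sketch of the matrix-and-loop approach suggested above. It assumes a recent DataFrames.jl (1.x) and the Statistics standard library, both of which postdate this thread; the name `row_means` is only illustrative.

```julia
using DataFrames, Statistics

df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)

# Convert once; all columns are Int, so this is a plain 5x4 integer matrix.
m = Matrix(df)

# Loop over the row indices and take the mean of each row without copying it.
row_means = Float64[]
for i in axes(m, 1)
    push!(row_means, mean(@view m[i, :]))
end

println(row_means)
```

The conversion is paid once for the whole table rather than once per row, which is the point of doing it before the loop.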
Re: [julia-users] DataFrames : Apply a function by rows
In my last example, the function mean() was not well chosen. In fact, what I would like to calculate is a statistical test line by line, like TTest or Wilcoxon. This is why I need to iterate through 2 DataFrames at the same time if I subset the DataFrame first to increase speed :)

Something like:

julia> for r1,r2 in eachrow(df1, df2)
           println(TTest(r1,r2))
       end
ERROR: syntax: invalid iteration specification

On Saturday, 21 November 2015 at 19:17:27 UTC+1, Fred wrote:
> It is a good idea, but how is it possible to iterate over two dataframes
> at the same time? Something like:
>
> julia> df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)
> 5x4 DataFrames.DataFrame
> | Row | a | b  | c  | d  |
> |-----|---|----|----|----|
> | 1   | 1 | 7  | 10 | 20 |
> | 2   | 2 | 8  | 11 | 21 |
> | 3   | 3 | 9  | 12 | 22 |
> | 4   | 4 | 10 | 13 | 23 |
> | 5   | 5 | 11 | 14 | 24 |
>
> julia> df1 = df[1:2,]
> 5x2 DataFrames.DataFrame
> | Row | a | b  |
> |-----|---|----|
> | 1   | 1 | 7  |
> | 2   | 2 | 8  |
> | 3   | 3 | 9  |
> | 4   | 4 | 10 |
> | 5   | 5 | 11 |
>
> julia> df2 = df[3:4,]
> 5x2 DataFrames.DataFrame
> | Row | c  | d  |
> |-----|----|----|
> | 1   | 10 | 20 |
> | 2   | 11 | 21 |
> | 3   | 12 | 22 |
> | 4   | 13 | 23 |
> | 5   | 14 | 24 |
>
> julia> for r1,r2 in eachrow(df1, df2)
>            println(mean(r1,r2))
>        end
> ERROR: syntax: invalid iteration specification
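The `for r1,r2 in eachrow(df1, df2)` attempt fails for two reasons: destructuring in a `for` loop needs parentheses, and `eachrow` takes a single data frame. One way around both, sketched here under the assumption of a recent DataFrames.jl (1.x), is to `zip` two row iterators. `mean_diff` is a hypothetical stand-in for a real two-sample test such as the TTest mentioned above; it is not part of any package.

```julia
using DataFrames, Statistics

df  = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)
df1 = df[:, 1:2]   # columns a and b
df2 = df[:, 3:4]   # columns c and d

# Hypothetical placeholder for a two-sample statistic (not a real test).
mean_diff(x, y) = mean(x) - mean(y)

# zip pairs up row i of df1 with row i of df2; note the parentheses around (r1, r2).
results = Float64[]
for (r1, r2) in zip(eachrow(df1), eachrow(df2))
    push!(results, mean_diff(Vector(r1), Vector(r2)))
end

println(results)
```

Replacing `mean_diff` with a function from a statistics package should only require that it accept two numeric vectors.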
Re: [julia-users] DataFrames : Apply a function by rows
It is a good idea, but how is it possible to iterate over two dataframes at the same time? Something like:

julia> df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)
5x4 DataFrames.DataFrame
| Row | a | b  | c  | d  |
|-----|---|----|----|----|
| 1   | 1 | 7  | 10 | 20 |
| 2   | 2 | 8  | 11 | 21 |
| 3   | 3 | 9  | 12 | 22 |
| 4   | 4 | 10 | 13 | 23 |
| 5   | 5 | 11 | 14 | 24 |

julia> df1 = df[1:2,]
5x2 DataFrames.DataFrame
| Row | a | b  |
|-----|---|----|
| 1   | 1 | 7  |
| 2   | 2 | 8  |
| 3   | 3 | 9  |
| 4   | 4 | 10 |
| 5   | 5 | 11 |

julia> df2 = df[3:4,]
5x2 DataFrames.DataFrame
| Row | c  | d  |
|-----|----|----|
| 1   | 10 | 20 |
| 2   | 11 | 21 |
| 3   | 12 | 22 |
| 4   | 13 | 23 |
| 5   | 14 | 24 |

julia> for r1,r2 in eachrow(df1, df2)
           println(mean(r1,r2))
       end
ERROR: syntax: invalid iteration specification

On Saturday, 21 November 2015 at 15:08:34 UTC+1, tshort wrote:
> For the subset, do the indexing after the conversion to an array, or
> subset the DataFrame first (probably faster).
Re: [julia-users] DataFrames : Apply a function by rows
For the subset, do the indexing after the conversion to an array, or subset the DataFrame first (probably faster).

On Nov 21, 2015 8:43 AM, "Fred" wrote:
> Thanks for the answer. I tried "eachrow" but I have 2 problems:
>
> 1- I still have to do an array conversion, which I think is slow:
>
> julia> for r in eachrow(df)
>            println(mean(convert(Array,r)))
>        end
> 6.0
> 7.0
> 8.0
> 9.0
> 10.0
>
> 2- I cannot manage to use a subset of the row, for example the first 2
> values:
>
> julia> for r in eachrow(df)
>            println(mean(convert(Array,r)))
>        end
> 6.0
> 7.0
> 8.0
> 9.0
> 10.0
>
> julia> for r in eachrow(df)
>            println(mean(convert(Array,r[1:2])))
>        end
> WARNING: [a] concatenation is deprecated; use collect(a) instead
>  in depwarn at deprecated.jl:73
>  in oldstyle_vcat_warning at ./abstractarray.jl:29
>  [inlined code] from none:2
>  in anonymous at no file:0
> while loading no file, in expression starting on line 0
> 4.0
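As a rough illustration of the "subset the DataFrame first" option, the sketch below selects the first two columns once and then iterates over rows of the smaller table. It assumes a recent DataFrames.jl (1.x), where column subsetting is written `df[:, 1:2]` rather than the `df[1:2]` form used in this thread; `sub` and `first_two_means` are illustrative names.

```julia
using DataFrames, Statistics

df = DataFrame(a=1:5, b=7:11, c=10:14)

# Select columns a and b once, outside the loop.
sub = df[:, 1:2]

# Each row of `sub` is iterable, so mean can consume it directly.
first_two_means = [mean(row) for row in eachrow(sub)]

println(first_two_means)
```

The first value, 4.0, matches the output Fred obtained from `r[1:2]` in the message quoted above.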
Re: [julia-users] DataFrames : Apply a function by rows
Thanks for the answer. I tried "eachrow" but I have 2 problems:

1- I still have to do an array conversion, which I think is slow:

julia> for r in eachrow(df)
           println(mean(convert(Array,r)))
       end
6.0
7.0
8.0
9.0
10.0

2- I cannot manage to use a subset of the row, for example the first 2 values:

julia> for r in eachrow(df)
           println(mean(convert(Array,r)))
       end
6.0
7.0
8.0
9.0
10.0

julia> for r in eachrow(df)
           println(mean(convert(Array,r[1:2])))
       end
WARNING: [a] concatenation is deprecated; use collect(a) instead
 in depwarn at deprecated.jl:73
 in oldstyle_vcat_warning at ./abstractarray.jl:29
 [inlined code] from none:2
 in anonymous at no file:0
while loading no file, in expression starting on line 0
4.0

On Saturday, 21 November 2015 at 14:04:11 UTC+1, tshort wrote:
> You can try `eachrow`. It probably won't be fast, though. Here's an
> example:
>
> https://github.com/JuliaStats/DataFrames.jl/blob/master/test/iteration.jl#L34
Re: [julia-users] DataFrames : Apply a function by rows
You can try `eachrow`. It probably won't be fast, though. Here's an example:

https://github.com/JuliaStats/DataFrames.jl/blob/master/test/iteration.jl#L34

On Nov 21, 2015 7:19 AM, "Fred" wrote:
> Hi,
>
> In DataFrames, it is easy to apply a function by columns using the
> colwise() function. But I find it very difficult and inefficient to apply
> a function by rows.
>
> For example:
>
> julia> df = DataFrame(a=1:5, b=7:11, c=10:14)
> 5x3 DataFrames.DataFrame
> | Row | a | b  | c  |
> |-----|---|----|----|
> | 1   | 1 | 7  | 10 |
> | 2   | 2 | 8  | 11 |
> | 3   | 3 | 9  | 12 |
> | 4   | 4 | 10 | 13 |
> | 5   | 5 | 11 | 14 |
>
> julia> colwise(mean,df)
> 3-element Array{Any,1}:
>  [3.0]
>  [9.0]
>  [12.0]
>
> julia> colwise(mean,df[1,1:2])
> 2-element Array{Any,1}:
>  [1.0]
>  [7.0]
>
> To calculate the mean of a row (or a subset), the only way I found is
> this:
>
> julia> mean(convert(Array,df[1,1:3]))
> 6.0
>
> I think this is inefficient and probably very slow. Is there a better way
> to apply a function by rows?
>
> Thanks!
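For readers who cannot follow the link, here is a small sketch of what iterating with `eachrow` can look like. It is not taken from the linked test file; it assumes a recent DataFrames.jl (1.x), where each row is an iterable `DataFrameRow`, and `row_means` is an illustrative name.

```julia
using DataFrames, Statistics

df = DataFrame(a=1:5, b=7:11, c=10:14)

# eachrow yields one DataFrameRow per row; mean iterates over its values.
row_means = Float64[]
for row in eachrow(df)
    push!(row_means, mean(row))
end

println(row_means)
```

Because the columns of a data frame may have different types, row iteration is generally not type-stable, which is consistent with the warning above that it probably won't be fast.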
[julia-users] DataFrames : Apply a function by rows
Hi,

In DataFrames, it is easy to apply a function by columns using the colwise() function. But I find it very difficult and inefficient to apply a function by rows.

For example:

julia> df = DataFrame(a=1:5, b=7:11, c=10:14)
5x3 DataFrames.DataFrame
| Row | a | b  | c  |
|-----|---|----|----|
| 1   | 1 | 7  | 10 |
| 2   | 2 | 8  | 11 |
| 3   | 3 | 9  | 12 |
| 4   | 4 | 10 | 13 |
| 5   | 5 | 11 | 14 |

julia> colwise(mean,df)
3-element Array{Any,1}:
 [3.0]
 [9.0]
 [12.0]

julia> colwise(mean,df[1,1:2])
2-element Array{Any,1}:
 [1.0]
 [7.0]

To calculate the mean of a row (or a subset), the only way I found is this:

julia> mean(convert(Array,df[1,1:3]))
6.0

I think this is inefficient and probably very slow. Is there a better way to apply a function by rows?

Thanks!