Re: [julia-users] DataFrames : Apply a function by rows

2015-11-22 Thread Fred
OK, I hope this exchange helps bring new ideas for improving DataFrames,
although there are other ways to do it, such as converting a DataFrame or a
row into an array. Thank you for your help!

On Sunday, 22 November 2015 at 15:48:37 UTC+1, tshort wrote:
>
> Contributions/pull requests from folks that need that are welcome. I don't 
> have that need. For row operations, I can generally get by with loops or 
> `@byrow!` in DataFramesMeta.
>


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-22 Thread Tom Short
Contributions/pull requests from folks that need that are welcome. I don't
have that need. For row operations, I can generally get by with loops or
`@byrow!` in DataFramesMeta.
On Nov 22, 2015 8:23 AM, "Fred" wrote:

> Yes, it is a good solution, but it means that DataFrames cannot be used to
> do some calculations by rows, which is a severe limitation. An equivalent
> of colwise() for rows would be very useful.
>
> On Sunday, 22 November 2015 at 14:11:21 UTC+1, tshort wrote:
>>
>> I'd convert the whole DataFrame to a matrix and use a loop over rows.
>>
>


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-22 Thread Fred
Yes, it is a good solution, but it means that DataFrames cannot be used to
do some calculations by rows, which is a severe limitation. An equivalent
of colwise() for rows would be very useful.

On Sunday, 22 November 2015 at 14:11:21 UTC+1, tshort wrote:
>
> I'd convert the whole DataFrame to a matrix and use a loop over rows.
>


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-22 Thread Tom Short
I'd convert the whole DataFrame to a matrix and use a loop over rows.
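
A minimal sketch of that suggestion, assuming a recent Julia (1.x) where mean
lives in the Statistics standard library. The matrix m is built by hand here
as a stand-in for the converted DataFrame (in this thread the conversion is
written convert(Array, df)); the name row_means is only illustrative.

```julia
using Statistics  # provides mean in Julia 1.x

# Stand-in for the converted DataFrame: columns a, b, c, d of the thread's example.
m = hcat(1:5, 7:11, 10:14, 20:24)

# Loop over the rows; view() avoids copying each row before applying the function.
row_means = Float64[]
for i in 1:size(m, 1)
    push!(row_means, mean(view(m, i, :)))
end

println(row_means)
```

Any function that accepts a vector can be used in place of mean inside the
loop.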


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Fred
In my last example, the function mean() was not well chosen. In fact, what
I would like to calculate is a statistical test line by line, like TTest
or Wilcoxon. This is why I need to iterate through 2 DataFrames at the same
time if I subset the DataFrame first to increase speed :)


Something like:

julia> for r1,r2 in eachrow(df1, df2)
           println(TTest(r1,r2))
       end
ERROR: syntax: invalid iteration specification
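
One possible way around this error, sketched under assumptions: the loop
header needs a parenthesized tuple pattern, and two row iterators can be
paired with zip. The sketch uses plain matrices and Base's eachrow (Julia 1.1
or later) so that it runs without any package; the same zip pattern is
assumed to carry over to eachrow on DataFrames. The function mean_diff is a
hypothetical placeholder for a real test such as a t-test, which is not
implemented here.

```julia
using Statistics  # provides mean in Julia 1.x

# Stand-ins for the two column subsets of the thread's example.
m  = hcat(1:5, 7:11, 10:14, 20:24)
m1 = m[:, 1:2]   # columns a, b
m2 = m[:, 3:4]   # columns c, d

# Hypothetical placeholder statistic: difference between the two row means.
mean_diff(x, y) = mean(y) - mean(x)

# zip pairs row i of m1 with row i of m2; note the parentheses around (r1, r2).
diffs = Float64[]
for (r1, r2) in zip(eachrow(m1), eachrow(m2))
    push!(diffs, mean_diff(r1, r2))
end

println(diffs)
```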





Re: [julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Fred
It is a good idea, but how is it possible to iterate over two DataFrames at
the same time? Something like:

julia> df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)
5x4 DataFrames.DataFrame
| Row | a | b  | c  | d  |
|-----|---|----|----|----|
| 1   | 1 | 7  | 10 | 20 |
| 2   | 2 | 8  | 11 | 21 |
| 3   | 3 | 9  | 12 | 22 |
| 4   | 4 | 10 | 13 | 23 |
| 5   | 5 | 11 | 14 | 24 |

julia> df1 = df[:, 1:2]
5x2 DataFrames.DataFrame
| Row | a | b  |
|-----|---|----|
| 1   | 1 | 7  |
| 2   | 2 | 8  |
| 3   | 3 | 9  |
| 4   | 4 | 10 |
| 5   | 5 | 11 |

julia> df2 = df[:, 3:4]
5x2 DataFrames.DataFrame
| Row | c  | d  |
|-----|----|----|
| 1   | 10 | 20 |
| 2   | 11 | 21 |
| 3   | 12 | 22 |
| 4   | 13 | 23 |
| 5   | 14 | 24 |

julia> for r1,r2 in eachrow(df1, df2)
           println(mean(r1,r2))
       end
ERROR: syntax: invalid iteration specification




On Saturday, 21 November 2015 at 15:08:34 UTC+1, tshort wrote:
>
> For the subset, do the indexing after the conversion to an array, or 
> subset the DataFrame first (probably faster).
>


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Tom Short
For the subset, do the indexing after the conversion to an array, or subset
the DataFrame first (probably faster).
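
The two orders can be sketched as follows, assuming a recent Julia (1.x) and
using a hand-built matrix as a stand-in for the converted data. Here the
"subset first" option takes the columns from the matrix rather than from the
DataFrame itself, and the variable names are only illustrative.

```julia
using Statistics  # provides mean in Julia 1.x

# Stand-in for the converted DataFrame: columns a, b, c of the thread's example.
m = hcat(1:5, 7:11, 10:14)

# Option 1: take the column subset once, then loop over the rows of the subset.
sub = m[:, 1:2]
means_subset_first = [mean(view(sub, i, :)) for i in 1:size(sub, 1)]

# Option 2: keep the full matrix and index the wanted columns inside the loop.
means_index_in_loop = [mean(view(m, i, 1:2)) for i in 1:size(m, 1)]

println(means_subset_first)
println(means_index_in_loop)
```

Both options give the same row means; which one is faster is not measured
here.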


Re: [julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Fred
Thanks for the answer. I tried "eachrow" but I have 2 problems:

1- I still have to do an array conversion, which I think is slow:

julia> for r in eachrow(df)
           println(mean(convert(Array,r)))
       end
6.0
7.0
8.0
9.0
10.0

2- I cannot manage to use a subset of the row, for example the first 2
values:

julia> for r in eachrow(df)
           println(mean(convert(Array,r[1:2])))
       end
WARNING: [a] concatenation is deprecated; use collect(a) instead
 in depwarn at deprecated.jl:73
 in oldstyle_vcat_warning at ./abstractarray.jl:29
 [inlined code] from none:2
 in anonymous at no file:0
while loading no file, in expression starting on line 0
4.0

On Saturday, 21 November 2015 at 14:04:11 UTC+1, tshort wrote:
>
> You can try `eachrow`. It probably won't be fast, though. Here's an 
> example:
>
>
> https://github.com/JuliaStats/DataFrames.jl/blob/master/test/iteration.jl#L34
>

Re: [julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Tom Short
You can try `eachrow`. It probably won't be fast, though. Here's an example:

https://github.com/JuliaStats/DataFrames.jl/blob/master/test/iteration.jl#L34


[julia-users] DataFrames : Apply a function by rows

2015-11-21 Thread Fred
Hi,

In DataFrames, it is easy to apply a function by columns using the colwise()
function. But I find it very difficult and inefficient to apply a function by
rows.

For example:

julia> df = DataFrame(a=1:5, b=7:11, c=10:14)
5x3 DataFrames.DataFrame
| Row | a | b  | c  |
|-----|---|----|----|
| 1   | 1 | 7  | 10 |
| 2   | 2 | 8  | 11 |
| 3   | 3 | 9  | 12 |
| 4   | 4 | 10 | 13 |
| 5   | 5 | 11 | 14 |

julia> colwise(mean,df)
3-element Array{Any,1}:
 [3.0]
 [9.0]
 [12.0]

julia> colwise(mean,df[1,1:2])
2-element Array{Any,1}:
 [1.0]
 [7.0]

To calculate the mean of a row (or a subset), the only way I found is this:

julia> mean(convert(Array,df[1,1:3]))
6.0

I think this is inefficient and probably very slow. Is there a better way to
apply a function by rows?

Thanks!