RE: [julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-05 Thread David Anthoff
Query.jl does not aim to make working with Nullables easier. The package 
provides querying capabilities and is specifically designed to pick up 
whatever support for Nullables exists in Julia Base. Right now, as a 
temporary measure, Query.jl defines many methods for Nullables on functions 
such as the arithmetic operators (``+`` etc.). Without those definitions the 
package would be close to unusable (in the same way that DataFrames is close 
to unusable right now). But I really hope to move these methods out of 
Query.jl. They don’t belong in that package; they should be in Base (this is 
what David Gold called the “method extension lifting approach”).

 

I feel strongly that “pushing” the problem of how to deal with Nullables into 
querying packages is not the right strategy. Instead, I would much prefer to 
see better support for Nullables generally, which packages like Query.jl can 
then pick up. There are too many situations where using a query package is 
overkill but where you will still encounter Nullables (especially now that 
DataFrames is based on NullableArrays). If we ask folks to use query packages 
in all of these cases, we will have created a conceptually clean but completely 
impractical system, IMHO. For example, I think the examples from the original 
email simply need to work before the new DataFrames is tagged. Those kinds of 
operations are so common that I would find it completely impractical to ask 
folks to use something like Query.jl in such a situation.

 

I think the path to this is pretty simple: all we have to do is add methods for 
the common arithmetic operators that work on Nullable types. That is the 
approach that C# took, and it works really well. We could perhaps also add some 
methods for Strings. I think once that is covered, most of the common use cases 
are dealt with and the system would work well in practice. Those new methods 
could be added to the Julia master branch now and then be backported to Julia 
0.5. Once that is done, DataFrames could be merged.
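
As a rough illustration of what such lifted operator methods might look like, here is a minimal sketch. It uses a hypothetical stand-in type `Maybe` rather than the real Nullable, so none of these names are Base, Query.jl or NullableArrays API. It only shows the idea of propagating an empty value through `+` and accepting a plain scalar on either side.

```julia
# Hypothetical stand-in for Nullable; `Maybe` is an illustrative name, not a Base type.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)      # empty: `value` is left unset
    Maybe{T}(v) where {T} = new{T}(true, v)   # holds a value
end
Maybe(v::T) where {T} = Maybe{T}(v)

# Lifted addition: the result is empty as soon as either operand is empty.
function Base.:+(x::Maybe{T}, y::Maybe{T}) where {T<:Number}
    (x.hasvalue && y.hasvalue) ? Maybe(x.value + y.value) : Maybe{T}()
end

# Mixing with plain scalars: wrap the scalar, then reuse the lifted method.
Base.:+(x::Maybe{T}, y::T) where {T<:Number} = x + Maybe(y)
Base.:+(x::T, y::Maybe{T}) where {T<:Number} = Maybe(x) + y

a = Maybe(3) + 1        # holds 4
b = Maybe{Int}() + 1    # stays empty
println(a.hasvalue ? a.value : "missing")
println(b.hasvalue ? b.value : "missing")
```

With methods of this kind in one central place, an expression such as `df[3,:A] + 1` from the original email would not need any wrapping by the caller.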

 

Best,

David

From: julia-users@googlegroups.com [mailto:julia-users@googlegroups.com] On 
Behalf Of John Myles White
Sent: Monday, October 3, 2016 5:05 PM
To: julia-users 
Subject: Re: [julia-users] Is there a way to use values in a DataFrame directly 
in computation?

 

I think the core problem is that the combination of the current API and 
Nullables is very cumbersome, but the switch to Nullables will hopefully occur 
nearly simultaneously with the introduction of new APIs that make Nullables 
much easier to deal with. David Gold spent the summer working on one approach 
that is, I think, much better than the current API; David Anthoff also has 
another approach that is substantially more powerful than the current API. The 
time between 0.5 and 0.6 may be a little chaotic in this regard, but I think 
the eventual results will be unequivocally worth the wait.


 -- John


On Monday, October 3, 2016 at 3:45:42 PM UTC-7, Min-Woong Sohn wrote:

Thank you. I fear that Nullables will make the DataFrame very difficult to use 
and turn many people away from Julia. 

 



On Monday, October 3, 2016 at 12:20:32 PM UTC-4, Milan Bouchet-Valat wrote:

On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:
> 
> I am using DataFrames from master branch (with NullableArrays as the 
> default) and was wondering how the following should be done: 
> 
> df = DataFrame() 
> df[:A] = NullableArray([1,2,3]) 
> 
> The following are not allowed or return wrong values: 
> 
> df[1,:A] == 1   # false
> df[1,:A] > 1    # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
> df[3,:A] + 1    # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
> 
> How should I get around these issues? Does anybody know if there is a 
> plan to support these kinds of computations directly? 
These operations currently work (after loading NullableArrays) if you 
rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first 
two return a Nullable{Bool}, so you need to call get() on the result 
if you want to use it, e.g. in an if condition. As an alternative, you 
can use isequal(). 

There are ongoing discussions about whether mixing Nullable and scalars 
should be allowed, as well as whether these operations should be moved 
into Julia Base. See in particular 
https://github.com/JuliaStats/NullableArrays.jl/pull/85 
https://github.com/JuliaLang/julia/pull/16988 

Anyway, the best approach to working with data frames is probably to use 
frameworks like AbstractQuery.jl and Query.jl, which are not yet 
completely ready to handle Nullable, but should make this easier. 


Regards 



Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-05 Thread Michael Borregaard

This is good news, and I am holding my breath for this to be successful! For 
someone like me from a data-rich science (ecology), a really good way of 
interacting directly with data is make-or-break for whether I will be 
able to persuade my colleagues to make the shift to Julia.


Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-03 Thread John Myles White
I think the core problem is that the combination of the current API and 
Nullables is very cumbersome, but the switch to Nullables will hopefully 
occur nearly simultaneously with the introduction of new APIs that make 
Nullables much easier to deal with. David Gold spent the summer working on 
one approach that is, I think, much better than the current API; David 
Anthoff also has another approach that is substantially more powerful than 
the current API. The time between 0.5 and 0.6 may be a little chaotic in 
this regard, but I think the eventual results will be unequivocally worth 
the wait.

 -- John

On Monday, October 3, 2016 at 3:45:42 PM UTC-7, Min-Woong Sohn wrote:
>
> Thank you. I fear that Nullables will make the DataFrame very difficult to 
> use and turn many people away from Julia. 
>
>
>
> On Monday, October 3, 2016 at 12:20:32 PM UTC-4, Milan Bouchet-Valat wrote:
>>
>> On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:
>> > 
>> > I am using DataFrames from master branch (with NullableArrays as the 
>> > default) and was wondering how the following should be done: 
>> > 
>> > df = DataFrame() 
>> > df[:A] = NullableArray([1,2,3]) 
>> > 
>> > The following are not allowed or return wrong values: 
>> > 
>> > df[1,:A] == 1   # false
>> > df[1,:A] > 1    # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
>> > df[3,:A] + 1    # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
>> > 
>> > How should I get around these issues? Does anybody know if there is a 
>> > plan to support these kinds of computations directly? 
>> These operations currently work (after loading NullableArrays) if you 
>> rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first 
>> two return a Nullable{Bool}, so you need to call get() on the result 
>> if you want to use it, e.g. in an if condition. As an alternative, you 
>> can use isequal(). 
>>
>> There are ongoing discussions about whether mixing Nullable and scalars 
>> should be allowed, as well as whether these operations should be moved 
>> into Julia Base. See in particular 
>> https://github.com/JuliaStats/NullableArrays.jl/pull/85 
>> https://github.com/JuliaLang/julia/pull/16988 
>>
>> Anyway, the best approach to working with data frames is probably to use 
>> frameworks like AbstractQuery.jl and Query.jl, which are not yet 
>> completely ready to handle Nullable, but should make this easier. 
>>
>>
>> Regards 
>>
>

Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-03 Thread Min-Woong Sohn
Thank you. I fear that Nullables will make the DataFrame very difficult to 
use and turn many people away from Julia. 



On Monday, October 3, 2016 at 12:20:32 PM UTC-4, Milan Bouchet-Valat wrote:
>
> On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:
> > 
> > I am using DataFrames from master branch (with NullableArrays as the 
> > default) and was wondering how the following should be done: 
> > 
> > df = DataFrame() 
> > df[:A] = NullableArray([1,2,3]) 
> > 
> > The following are not allowed or return wrong values: 
> > 
> > df[1,:A] == 1   # false
> > df[1,:A] > 1    # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
> > df[3,:A] + 1    # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
> > 
> > How should I get around these issues? Does anybody know if there is a 
> > plan to support these kinds of computations directly? 
> These operations currently work (after loading NullableArrays) if you 
> rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first 
> two return a Nullable{Bool}, so you need to call get() on the result 
> if you want to use it, e.g. in an if condition. As an alternative, you 
> can use isequal(). 
>
> There are ongoing discussions about whether mixing Nullable and scalars 
> should be allowed, as well as whether these operations should be moved 
> into Julia Base. See in particular 
> https://github.com/JuliaStats/NullableArrays.jl/pull/85 
> https://github.com/JuliaLang/julia/pull/16988 
>
> Anyway, the best approach to working with data frames is probably to use 
> frameworks like AbstractQuery.jl and Query.jl, which are not yet 
> completely ready to handle Nullable, but should make this easier. 
>
>
> Regards 
>


Re: [julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-03 Thread Milan Bouchet-Valat
On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:
> I am using DataFrames from master branch (with NullableArrays as the default) 
> and was wondering how the following should be done:
> 
> df = DataFrame()
> df[:A] = NullableArray([1,2,3])
> 
> The following are not allowed or return wrong values:
> 
> df[1,:A] == 1   # false
> df[1,:A] > 1    # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
> df[3,:A] + 1    # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
> 
> How should I get around these issues? Does anybody know if there is a
> plan to support these kinds of computations directly?
These operations currently work (after loading NullableArrays) if you
rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first
two return a Nullable{Bool}, so you need to call get() on the result
if you want to use it, e.g. in an if condition. As an alternative, you
can use isequal().
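
To make the wrap-then-unwrap pattern concrete, here is a minimal sketch. It uses a hypothetical stand-in type `Maybe` and a hypothetical `unwrap` helper in place of Nullable and get(), so it does not reproduce the actual NullableArrays methods. It only illustrates why the wrapped Boolean has to be unwrapped before it can drive an if.

```julia
# Hypothetical stand-in for Nullable; `Maybe` and `unwrap` are illustrative names only.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)      # empty: `value` is left unset
    Maybe{T}(v) where {T} = new{T}(true, v)   # holds a value
end
Maybe(v::T) where {T} = Maybe{T}(v)

# Lifted equality: the result is itself wrapped, like the Nullable{Bool} described above.
function Base.:(==)(x::Maybe{T}, y::Maybe{T}) where {T}
    (x.hasvalue && y.hasvalue) ? Maybe(x.value == y.value) : Maybe{Bool}()
end

# Counterpart of get(): extract the value, or fail loudly when it is missing.
unwrap(x::Maybe) = x.hasvalue ? x.value : error("value is missing")

col = [Maybe(1), Maybe(2), Maybe(3)]
result = col[1] == Maybe(1)    # a Maybe{Bool}, not a Bool
if unwrap(result)              # unwrap first; `if` needs a plain Bool
    println("first element equals 1")
end
```

Comparing against an empty value yields an empty result, so `unwrap` would raise an error there; code that may see missing data has to check `hasvalue` first.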

There are ongoing discussions about whether mixing Nullable and scalars
should be allowed, as well as whether these operations should be moved
into Julia Base. See in particular
https://github.com/JuliaStats/NullableArrays.jl/pull/85
https://github.com/JuliaLang/julia/pull/16988

Anyway, the best approach to working with data frames is probably to use
frameworks like AbstractQuery.jl and Query.jl, which are not yet
completely ready to handle Nullable, but should make this easier.


Regards


[julia-users] Is there a way to use values in a DataFrame directly in computation?

2016-10-03 Thread Min-Woong Sohn
I am using DataFrames from master branch (with NullableArrays as the 
default) and was wondering how the following should be done:

df = DataFrame()
df[:A] = NullableArray([1,2,3])

The following are not allowed or return wrong values:

df[1,:A] == 1   # false
df[1,:A] > 1    # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
df[3,:A] + 1    # MethodError: no method matching +(::Nullable{Int64}, ::Int64)

How should I get around these issues? Does anybody know if there is a plan 
to support these kinds of computations directly?