Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
> This does bring up a question about writing methods though. What happens
> if two completely unrelated packages define a function 'foobar' (that isn't
> part of Base or any other Julia standard package) and someone tries to use
> both packages? It seems like this couldn't work. You would only get one
> or the other, but whichever one was loaded last would always shadow the
> other.

You need to fully qualify one of them (or better, both):

    A.foo()
    B.foo()

I think it will also print a warning when using the second package.
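A minimal sketch of the fully qualified calls suggested in this reply. The modules `A` and `B` are toy stand-ins for two unrelated packages, and the code uses current Julia 1.x syntax rather than the 2015 release the thread was written against, so treat it as an illustration only.

```julia
# Two unrelated toy modules (hypothetical names) that both export `foo`.
module A
export foo
foo() = "A.foo"
end

module B
export foo
foo() = "B.foo"
end

using .A
using .B

# Qualified calls are unambiguous, whatever order the modules were loaded in.
from_a = A.foo()
from_b = B.foo()
println(from_a)
println(from_b)
```

As far as I know, an unqualified `foo()` at this point makes recent Julia versions warn that both modules export the name and then refuse to resolve it, rather than silently picking the module that was loaded last.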
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
This does bring up a question about writing methods though. What happens if
two completely unrelated packages define a function 'foobar' (that isn't part
of Base or any other Julia standard package) and someone tries to use both
packages? It seems like this couldn't work. You would only get one or the
other, but whichever one was loaded last would always shadow the other.

On Sunday, March 29, 2015 at 9:45:10 AM UTC-5, Milan Bouchet-Valat wrote:
> But the downside of this is that two modules which define each its own
> version of f with non-conflicting signatures (e.g. f(::MyType1) and
> f(::MyType2)) cannot be used at the same time even though their
> interaction is perfectly OK, unless one imports f from the other, or both
> import it from a third module.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
You pretty much nailed it on the head as to why I thought it was odd. I still
have some work to do to get my head around the best way to work around my
current issue. It still seems like most of the DataFrame methods should be
defined on AbstractDataFrame to make it easier to subtype. That would depend
on the subtypes always using the columns and colindex fields for data storage,
but judging by the definition of AbstractDataFrame, that is assumed already
anyway.

On Sunday, March 29, 2015 at 9:45:10 AM UTC-5, Milan Bouchet-Valat wrote:
> I understand that this behavior can be confusing the first time you
> experience it. Since multiple dispatch is one of Julia's strong points, I
> also expected at first that all methods with the same name would be merged
> together, and that which one to call would be decided based on the
> signature.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
On Sunday, March 29, 2015 at 09:33 -0400, Stefan Karpinski wrote:
> Why is it odd?

I understand that this behavior can be confusing the first time you
experience it. Since multiple dispatch is one of Julia's strong points, I
also expected at first that all methods with the same name would be merged
together, and that which one to call would be decided based on the signature.

This expectation is of course problematic since it would mean that any module
could override a private method used in another module. And I think the fact
that unexported functions remain private to a module is quite natural and not
surprising at all.

The confusion arises when one "sees" (via using) a function (e.g. f) exported
from another module, but writing

    f() = something

does not extend it by default. Again, this is perfectly reasonable, since
otherwise adding a new exported function in a module could suddenly override
one which existed in a module that used it.

But the downside of this is that two modules which each define their own
version of f with non-conflicting signatures (e.g. f(::MyType1) and
f(::MyType2)) cannot be used at the same time even though their interaction
is perfectly OK, unless one imports f from the other, or both import it from
a third module. This is very powerful to encourage coordination between
packages, but in some cases it can be annoying (and certainly surprising when
coming from R -- which is not necessarily bad).

In the present case, it means you have to depend on DataFrames, which is a
big dependency if your package implements a replacement for it and does not
call its code at all; or move function definitions to an AbstractDataFrames
package. Maybe the latter is a good idea, and with explicit definitions of
interfaces/traits it could be a good practice anyway.

I'm not sure anything can be done to make this easier to grasp for newcomers.
Could a warning be printed when a module exports a function with the same
name as a function exported from one of the modules it calls using on? This
would most likely indicate a mistake, or an uncoordinated change in the
reverse dependency.

Regards
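The "both import it from a third module" arrangement described in this message can be sketched roughly as follows. `Interface`, `PkgOne`, and `PkgTwo` are hypothetical stand-ins (the thread's suggested AbstractDataFrames package would play the role of `Interface`), and the syntax is current Julia 1.x rather than what was available in 2015.

```julia
# A small shared module that only declares the generic function.
module Interface
export f
function f end          # generic function with no methods yet
end

# Two otherwise unrelated modules extend the same function.
module PkgOne
import ..Interface: f   # import so that the definition below extends Interface.f
export MyType1
struct MyType1 end
f(::MyType1) = "handled by PkgOne"
end

module PkgTwo
import ..Interface: f
export MyType2
struct MyType2 end
f(::MyType2) = "handled by PkgTwo"
end

using .Interface, .PkgOne, .PkgTwo

# One function, two methods; dispatch picks the method by argument type.
r1 = f(MyType1())
r2 = f(MyType2())
println(r1)
println(r2)
```

Neither package depends on the other here. Both depend only on the small shared module, which is the trade-off the message weighs against depending on all of DataFrames.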
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
I wouldn't say that generic functions remember in which module they were first defined. Rather, a `using` declaration is kind of a weak import that says to only look for a name in the given module if it is not found in the current module. So when defining a method in the current module, no previous definition of the function will be found there and a new one will be created, shadowing the definition imported through `using`. On the other hand, `import` creates bindings for the imported names in the current module, making it look just like they were defined there in the first place. This is my understanding at least; please correct me if I'm wrong.
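To make the difference concrete, here is a small sketch under the assumption of current Julia 1.x semantics. The module and function names (`Lib`, `Shadowing`, `Extending`, `describe`) are invented for illustration and do not come from DataFrames.

```julia
module Lib
export describe
describe(x::Int) = "Lib: integer"
end

module Shadowing
using ..Lib                  # `describe` is visible, but only weakly
struct Thing end
# Neither imported nor qualified: this creates a new function Shadowing.describe.
describe(::Thing) = "Shadowing: thing"
end

module Extending
import ..Lib: describe       # binds `describe` here, so definitions extend it
struct Thing end
describe(::Thing) = "Extending: thing"
end

shadow_is_same = Shadowing.describe === Lib.describe
extend_is_same = Extending.describe === Lib.describe
n_lib_methods = length(methods(Lib.describe))

println(shadow_is_same)
println(extend_is_same)
println(n_lib_methods)
```

In this sketch `Shadowing.describe` is a separate function with a single method, while `Extending.describe` is the very same function object as `Lib.describe`, which therefore ends up with two methods.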
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
Why is it odd?

On Mar 28, 2015, at 10:16 PM, kevin.dale.sm...@gmail.com wrote:
> Aha. This is the piece of information I was looking for. It seems a bit
> odd, but it does clear up some things. I'll have to play around with my
> implementation and this new information to see if it helps.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
On Saturday, March 28, 2015 at 4:19:44 PM UTC-5, Mauro wrote:
> Now, generic functions carry around with them the module in which they
> were first defined. To extend such a function with another method in
> another module you either have to import it or fully qualify it
> (DataFrames.nrow). If you don't do that then you create a new generic
> function with the same name as the other but not sharing any methods.
> Also, this function will shadow the other one in the current module.

Aha. This is the piece of information I was looking for. It seems a bit odd,
but it does clear up some things. I'll have to play around with my
implementation and this new information to see if it helps.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
> That should be index(mydf). I did get the small test case working, but I
> still can't seem to use the same techniques to get my application working.
> I just don't understand how these method overrides are supposed to work.
> I originally thought that you just needed to have methods with the same
> name and Julia would simply look at the name and the argument types to
> determine the correct method to use. But there is apparently more to it
> since a previous suggestion was to do something like:
>
>     DataFrames.nrow(df::MyDataFrame) = ncol(df) > 0 ? length(df.columns[1])::Int : 0
>
> I don't see why DataFrames should be involved at all. I'm using
> AbstractDataFrames as a super-type, but why would the DataFrames type have
> to know about MyDataFrame? They are peers, so I don't see why DataFrames
> would be special.

Note, DataFrames is not a type but the module. The type is DataFrame. It is
customary (when applicable) to name the module (aka package) in the plural
and the type in the singular.

Now, generic functions carry around with them the module in which they were
first defined. To extend such a function with another method in another
module you either have to import it or fully qualify it (DataFrames.nrow).
If you don't do that then you create a new generic function with the same
name as the other but not sharing any methods. Also, this function will
shadow the other one in the current module.

However, as far as I understand, "using DataFrames" + "DataFrames.nrow(...) = ..."
and "import DataFrames: nrow" + "nrow(...) = ..." should be equivalent. If
that is indeed not the case, then it sounds like a bug to me. Do you have a
self-contained test case?

> Actually, I'm kind of surprised that DataFrames' nrow is even implemented
> on DataFrames and not AbstractDataFrames. I would think that most of the
> methods in dataframes.jl should be done on the AbstractDataFrame so that
> anyone creating a subtype like I'm trying to do wouldn't have to
> reimplement them all. But that's another issue altogether.

If nrow is dependent on the implementation details of DataFrame then that is
the only way; otherwise it probably should be defined on AbstractDataFrame.
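The two spellings this reply expects to be equivalent can be compared side by side in a toy setting. `FrameLib` below is a hypothetical stand-in for DataFrames, not its real API, and the code assumes current Julia 1.x syntax.

```julia
module FrameLib
export AbstractFrame, nrow
abstract type AbstractFrame end
function nrow end                 # declared here, extended elsewhere
end

# Style 1: `using` plus a fully qualified method definition.
module QualifiedStyle
using ..FrameLib
struct FrameA <: AbstractFrame
    n::Int
end
FrameLib.nrow(f::FrameA) = f.n
end

# Style 2: explicit `import` plus an unqualified method definition.
module ImportStyle
import ..FrameLib: AbstractFrame, nrow
struct FrameB <: AbstractFrame
    n::Int
end
nrow(f::FrameB) = f.n
end

rows_a = FrameLib.nrow(QualifiedStyle.FrameA(3))
rows_b = FrameLib.nrow(ImportStyle.FrameB(5))
println(rows_a)
println(rows_b)
```

Both definitions land on the same `FrameLib.nrow` function, so code inside `FrameLib` that calls `nrow` would see either type.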
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
That should be index(mydf). I did get the small test case working, but I
still can't seem to use the same techniques to get my application working.
I just don't understand how these method overrides are supposed to work.
I originally thought that you just needed to have methods with the same name
and Julia would simply look at the name and the argument types to determine
the correct method to use. But there is apparently more to it since a
previous suggestion was to do something like:

    DataFrames.nrow(df::MyDataFrame) = ncol(df) > 0 ? length(df.columns[1])::Int : 0

I don't see why DataFrames should be involved at all. I'm using
AbstractDataFrames as a super-type, but why would the DataFrames type have to
know about MyDataFrame? They are peers, so I don't see why DataFrames would
be special.

Actually, I'm kind of surprised that DataFrames' nrow is even implemented on
DataFrames and not AbstractDataFrames. I would think that most of the methods
in dataframes.jl should be done on the AbstractDataFrame so that anyone
creating a subtype like I'm trying to do wouldn't have to reimplement them
all. But that's another issue altogether.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
On Fri, 2015-03-27 at 19:46, kevin.dale.sm...@gmail.com wrote:
> Ok, I narrowed it down to a very small test case. The mymodule.jl file is
> at the bottom of this posting. If you save that to a file then run this
> code, you'll get the same effect as my original problem except with the
> 'index' method.
>
> using mymodule
>
> mydf = MyDataFrame(Any[], Index())
>
> # This line complains that `index` has no method matching
> index(::MyDataFrame)

This line is not valid syntax:

    julia> # This line complains that `index` has no method matching
           index(::MyDataFrame)
    ERROR: syntax: invalid "::" syntax

So, correcting this, everything seems to work, right?

> display(mydf)
>
> # This displays my `index` method
> methods(index)
>
> # This shows that my `index` method works
> index(mydf)
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
I did get it working using “import DataFrames: index, …”, but not with “using DataFrames” + “DataFrames.nrow”, at least in my simple test case. Doing the same thing in my actual application still didn’t work. Is there some place where this concept of writing methods for abstract types is documented? I feel like I’m missing some concept.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
You need to import index from DataFrames. That's one reason I prefer using
`using DataFrames` and defining methods with `DataFrames.index() = ...`.

On Fri, Mar 27, 2015 at 2:46 PM, wrote:
> Ok, I narrowed it down to a very small test case. The mymodule.jl file is
> at the bottom of this posting. If you save that to a file then run this
> code, you'll get the same effect as my original problem except with the
> 'index' method.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
Ok, I narrowed it down to a very small test case. The mymodule.jl file is at
the bottom of this posting. If you save that to a file then run this code,
you'll get the same effect as my original problem except with the 'index'
method.

    using mymodule

    mydf = MyDataFrame(Any[], Index())

    # This line complains that `index` has no method matching index(::MyDataFrame)
    display(mydf)

    # This displays my `index` method
    methods(index)

    # This shows that my `index` method works
    index(mydf)

=== mymodule.jl ===

    module mymodule

    import DataFrames: AbstractDataFrame, DataFrame, Index, nrow, ncol
    import DataArrays: DataArray

    export MyDataFrame, nrow, ncol, Index, index, columns

    type MyDataFrame <: AbstractDataFrame
        columns::Vector{Any}
        colindex::Index

        function MyDataFrame(columns::Vector{Any}, colindex::Index)
            ncols = length(columns)
            if ncols > 1
                nrows = length(columns[1])
                equallengths = true
                for i in 2:ncols
                    equallengths &= length(columns[i]) == nrows
                end
                if !equallengths
                    msg = "All columns in a DataFrame must be the same length"
                    throw(ArgumentError(msg))
                end
            end

            if length(colindex) != ncols
                msg = "Columns and column index must be the same length"
                throw(ArgumentError(msg))
            end

            new(columns, colindex)
        end
    end

    index(df::MyDataFrame) = df.colindex
    columns(df::MyDataFrame) = df.columns

    nrow(df::MyDataFrame) = ncol(df) > 0 ? length(df.columns[1])::Int : 0
    ncol(df::MyDataFrame) = length(index(df))

    end

==
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
Consider putting the code somewhere, so we can take a look.

On Fri, Mar 27, 2015 at 1:55 PM, wrote:
> I did try that as well, but it acted the same way. The size function is
> actually defined in abstractdataframe.jl as Base.size, and that's where the
> error message originates from. I don't know if that has anything to do
> with it though.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
I did try that as well, but it acted the same way. The size function is actually defined in abstractdataframe.jl as Base.size, and that's where the error message originates from. I don't know if that has anything to do with it though.
Re: [julia-users] Extending a DataFrame (or, why aren't my imports working?)
When you define your version of nrow, are you extending the DataFrames version
as in the following?

    DataFrames.nrow(df::MyDataFrame) = ...

If not, you are defining your own nrow that is different than the one in
DataFrames.

On Fri, Mar 27, 2015 at 12:45 PM, wrote:
> When I try to do "size(mydf)", I get the following error:
>
>     `nrow` has no method matching nrow(::MyDataFrame)
>
> However, if I do "methods(nrow)", it displays this:
>
>     nrow(df::MyDataFrame)
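The failure described in the original question can be reproduced without DataFrames. The sketch below is an assumption-laden miniature: `MiniFrames`, `Broken`, and `Fixed` are hypothetical modules, `MiniFrames` defines `Base.size` in terms of its own `nrow` and `ncol` (mirroring what the thread says about abstractdataframe.jl), and the syntax is current Julia 1.x.

```julia
module MiniFrames
export AbstractFrame, nrow, ncol
abstract type AbstractFrame end
function nrow end
function ncol end
# The library computes size through its *own* nrow and ncol.
Base.size(f::AbstractFrame) = (nrow(f), ncol(f))
end

module Broken
using ..MiniFrames
struct BrokenFrame <: AbstractFrame
    columns::Vector{Any}
end
# Not imported, not qualified: these are new functions Broken.nrow and Broken.ncol.
nrow(f::BrokenFrame) = isempty(f.columns) ? 0 : length(f.columns[1])
ncol(f::BrokenFrame) = length(f.columns)
end

module Fixed
using ..MiniFrames
struct FixedFrame <: AbstractFrame
    columns::Vector{Any}
end
# Qualified definitions extend the functions that Base.size actually calls.
MiniFrames.nrow(f::FixedFrame) = isempty(f.columns) ? 0 : length(f.columns[1])
MiniFrames.ncol(f::FixedFrame) = length(f.columns)
end

cols = Any[[1, 2, 3], ["a", "b", "c"]]

fixed_size = size(Fixed.FixedFrame(cols))

# size() on the broken type cannot find MiniFrames.nrow(::BrokenFrame).
broken_failed = try
    size(Broken.BrokenFrame(cols))
    false
catch err
    err isa MethodError
end

println(fixed_size)
println(broken_failed)
```

Note that `methods(Broken.nrow)` does list a method for `BrokenFrame`, which matches the confusing observation in the question: the method exists, but on a different function from the one `size` calls.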
[julia-users] Extending a DataFrame (or, why aren't my imports working?)
I'm trying to extend DataFrames so that I can include metadata on the
dataframe and the columns. Unfortunately, from the way I understand how Julia
works, this is not an easy task. It seems as though I pretty much have to
copy the existing dataframes.jl file and replace all of the "DataFrame"
references with "MyDataFrame", then add in the metadata parts where needed
(or use composition and proxy all of the DataFrame interfaces). This method
seems to work for some things (it works with Gadfly); however, I can't seem
to get ncol and nrow to work properly. When I try to do "size(mydf)", I get
the following error:

    `nrow` has no method matching nrow(::MyDataFrame)

However, if I do "methods(nrow)", it displays this:

    nrow(df::MyDataFrame)

Which is exactly what the previous message said didn't exist. I'm a little
puzzled as to why DataFrames' original nrow doesn't show up in that output as
well since it is exported. When I do a "using DataFrames" in this same
session, I get the following warning:

    Warning: using DataFrames.nrow in module Main conflicts with an existing identifier.

I think it's safe to say that I'm pretty confused at this point. I'd
appreciate it if someone could clarify how extending existing structures is
supposed to work.