Re: [julia-users] problem plotting with Gadfly/Cairo
You haven’t installed Cairo yet, it seems. Or at least Julia isn’t finding Cairo installed where it expects to find it.

— John

On Dec 21, 2013, at 2:26 AM, Laksh Gupta glaks...@gmail.com wrote:

Hi, I am running 64-bit Julia Studio 0.4.3 on Windows 8. I installed Gadfly and Cairo but am still facing problems while trying to plot anything:

    julia> plot
    plot not defined
    julia> Gadfly.plot
    plot (generic function with 6 methods)
    julia> Gadfly.plot(x=collect(1:100), y=sort(rand(100)))
    Plot(...)
    julia> p = Gadfly.plot(x=collect(1:100), y=sort(rand(100)))
    Plot(...)
    julia> draw(PNG("plot.png", 6.5inch, 3inch), p)
    Cairo must be installed to use the PNG backend.
    julia> using Cairo
    julia> draw(PNG("plot.png", 6.5inch, 3inch), p)
    Cairo must be installed to use the PNG backend.

Any idea what I am missing here?

Thanks,
lg
Re: [julia-users] Composite Types With Initialized Fields
Assigning default values to fields of a composite type is not yet supported. Your inner constructor is also a little un-Julian, since `MyType() = new()` doesn’t assign any values to those fields.

— John

On Dec 21, 2013, at 4:37 AM, Marcus Urban math...@gmail.com wrote:

I am a little confused about constructing composite types. Given the definition

    type MyType
        x::Int
        y::Int = 6
        MyType() = new()
    end

an instance of MyType can be created using

    m = MyType()

At that point, m.x acts as expected: I can assign to it, read its value, and so forth. However, attempting to access m.y produces an error that MyType has no field y. Based on another post, I gather that my attempt to provide a value to m.y in this manner is not allowed. If that's the case, what exactly is the effect of

    y::Int = 6

If this part of the code is completely ignored, it would be really nice if the system let me know, since initializing fields in this way is common in many languages.

Also, I gather that a workaround is to use a constructor that takes named arguments. Is that still the recommended way? With just two fields, things are not difficult, but if the type has 20, calling a constructor with 20 positional arguments would be difficult.
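In present-day Julia (1.x, well after this thread), the syntax Marcus tried is accepted when the type definition is wrapped in `Base.@kwdef`, which generates a keyword constructor that fills in the declared defaults. This also covers the 20-field case: callers only name the fields that differ from their defaults. A minimal sketch, assuming Julia 1.x; `@kwdef` did not exist in the 0.2-era Julia discussed here.

```julia
# Julia 1.x sketch: Base.@kwdef turns field defaults into a keyword constructor.
Base.@kwdef mutable struct MyType
    x::Int        # no default: x must be passed as a keyword
    y::Int = 6    # default used when y is omitted
end

m = MyType(x = 1)      # y falls back to 6
m.x = 10               # mutable struct, so fields can be reassigned
println((m.x, m.y))    # (10, 6)

n = MyType(x = 2, y = 3)
println((n.x, n.y))    # (2, 3)
```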
Re: [julia-users] Performance of varargs indexing
Hi Milan,

Have you looked at the many table-like functions already in existence? We have xtabs, xtab and table already. It would be nice to shrink everything down to one high-performance function.

-- John

On Dec 26, 2013, at 6:05 AM, Milan Bouchet-Valat nalimi...@club.fr wrote:

Hi!

I've been trying to implement a table and cross-table function for generic AbstractVectors and a more efficient version for PooledDataVectors (from DataArrays). I have something that seems to work fine for the latter, but the performance is not completely satisfying. See the code here:
https://gist.github.com/nalimilan/8132114

Something like this:

    a = PooledDataArray(rep(1:10, 10))
    table(a)
    @time table(a)

reports about 1 s here, while the same thing in R takes about 0.4 s. My implementation has the advantage that it does not copy the input vectors, which may have a great impact when working with large data under memory pressure. But I think I'm doing many things wrong, since the allocated bytes are much higher than I would expect/like. Ideally there wouldn't be any allocation in the inner loop.

It seems that the main problem comes from the transformation from vector to varargs that happens in a[el...] += 1. In an ideal world the compiler would detect that the length of el is fixed for given input types, and it would be able to make it equivalent to a direct call. But maybe I'm not doing this correctly. Or would I be better off computing the linear index manually by combining the indexes on the different dimensions?

A secondary issue is that += seems to involve a call to getindex() and another to setindex!(), while theoretically it would be possible to do both at the same time once the pointer to the array position has been computed. Is this a planned optimization? (For the general AbstractVector method, I need a similar feature but applied to Dicts, and I've seen that an update() method is apparently planned.)

Thanks for your help (I plan to open a PR to discuss the interface soon)
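Milan's last question, whether to compute the linear index by hand, can be sketched for the two-way case. Assumptions: the inputs are already integer codes in 1:na and 1:nb (roughly what a pooled vector stores internally), `crosstab` is a hypothetical name rather than the DataArrays API, and the syntax is Julia 1.x.

```julia
# Two-way frequency table from integer codes, indexing with a manually
# computed linear index instead of splatting a vector of subscripts.
function crosstab(a::AbstractVector{<:Integer}, b::AbstractVector{<:Integer},
                  na::Integer, nb::Integer)
    length(a) == length(b) || throw(DimensionMismatch("inputs must have equal length"))
    counts = zeros(Int, na, nb)
    for k in eachindex(a, b)
        i, j = a[k], b[k]
        (1 <= i <= na && 1 <= j <= nb) || throw(BoundsError(counts, (i, j)))
        # Column-major layout: element (i, j) lives at i + (j - 1) * na.
        counts[i + (j - 1) * na] += 1
    end
    return counts
end

t = crosstab([1, 2, 2, 1], [1, 1, 2, 1], 2, 2)
println(t)   # [2 0; 1 1]
```

In current Julia, `counts[i, j] += 1` with two scalar subscripts should compile to essentially the same arithmetic; the manual form mainly matters when the subscripts arrive as a runtime-length collection.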
Re: [julia-users] Re: Modules, Closures and Methods
Haven’t had time to read through this in depth, but is your concern that abstract types can’t contain fields? That is likely to get fixed at some point in the future.

— John

On Dec 28, 2013, at 9:31 AM, andrew cooke and...@acooke.org wrote:

thanks. i've done almost exactly the same thing with the Nothing type at https://github.com/andrewcooke/BlockCipherSelfStudy.jl/blob/master/src/GA.jl

i don't think my problem is specific to GA. the problem is how to add extra state to an api. in traditional OO you can use inheritance to create a new class with the extra state. in julia you cannot. nor can you add it inside closures because you can't (as far as i can tell) extend methods in another module with a closure.

so instead you have to spot ahead of time that a user might want to add extra state and provide an additional parameterized field (the context in my code linked to above) where the user can store arbitrary information. which seems ugly and prone to errors (what if you miss somewhere?).

so i still hope there's a better solution.

cheers, andrew

On Saturday, 28 December 2013 04:50:11 UTC-3, Toivo Henningsson wrote:

I don't have enough background in genetic algorithms to understand what you are trying to accomplish, but I think that to answer the question of how to write code so that it can be generally extended by users, the first thing to ask is what the interface to the code that you want to write really is (in abstract terms). Then, one can start to model it with types, generic functions, inheritance, etc.

Also, to create a generic function without actually providing any implementations, I've lately been using things like

    f(::None) = nothing

which seems to work fine.
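A minimal sketch of the two ideas in this exchange, assuming Julia 1.x: Toivo's method-less generic function (spelled `function name end` in 1.x), and andrew's parameterized context field that lets a user attach extra state and add specialised methods without touching the library. `fitness`, `Individual` and `Penalty` are hypothetical names, not taken from GA.jl.

```julia
# Declare a generic function with no methods yet
# (the Julia 1.x equivalent of the f(::None) = nothing trick).
function fitness end

# Library type with a parameterized slot for user-defined extra state.
struct Individual{C}
    genome::Vector{Int}
    context::C
end

# Default behaviour when the user supplies no extra state.
fitness(ind::Individual{Nothing}) = sum(ind.genome)

# A user adds state and a specialised method in their own code.
struct Penalty
    weight::Float64
end
fitness(ind::Individual{Penalty}) =
    sum(ind.genome) - ind.context.weight * length(ind.genome)

println(fitness(Individual([1, 2, 3], nothing)))       # 6
println(fitness(Individual([1, 2, 3], Penalty(0.5))))  # 4.5
```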
Re: [julia-users] Why should map be avoided (for performance)?
My understanding (which may be out of date) is that the current version of map frequently doesn’t get the type of its input correct. That may have been fixed since I developed the habit of not using map.

— John

On Dec 29, 2013, at 11:58 AM, andrew cooke and...@acooke.org wrote:

From the comment at https://gist.github.com/nalimilan/8132114 (am I reading it wrong? does it just mean map with local anon functions?)

Is it the overhead of creating an intermediate Task? Are there any plans to merge nested tasks as an optimisation (I have no idea if something like that is even possible)? Or to replace the collect(map(...)) idiom with something faster?

Thanks,
Andrew
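One way to check on a given Julia version whether map gets the element type right is to inspect `eltype` of its result and compare it with an equivalent comprehension. A sketch assuming Julia 1.x, where both forms below are expected to produce a `Vector{Int}`; it says nothing about the 2013 implementation.

```julia
# Compare the element type produced by map with that of a comprehension.
v = collect(1:5)

squares_map  = map(x -> x^2, v)
squares_comp = [x^2 for x in v]

println(eltype(squares_map))          # Int64 on a 64-bit build
println(eltype(squares_comp))         # Int64 on a 64-bit build
println(squares_map == squares_comp)  # true
```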
[julia-users] ccall confusion
I’m trying to use ccall to access the following function from the SQLite3 API:

    int sqlite3_table_column_metadata(
      sqlite3 *db,                /* Connection handle */
      const char *zDbName,        /* Database name or NULL */
      const char *zTableName,     /* Table name */
      const char *zColumnName,    /* Column name */
      char const **pzDataType,    /* OUTPUT: Declared data type */
      char const **pzCollSeq,     /* OUTPUT: Collation sequence name */
      int *pNotNull,              /* OUTPUT: True if NOT NULL constraint exists */
      int *pPrimaryKey,           /* OUTPUT: True if column part of PK */
      int *pAutoinc               /* OUTPUT: True if column is auto-increment */
    );

My attempt to do so keeps failing, so I suspect that I’m just not using ccall correctly. I keep trying the following lines and getting segfaults:

    using DBI
    using SQLite

    db = connect(SQLite3, "db/tmp.sqlite3")
    dbptr = db.ptr
    table = "users"
    column = "id"
    datatype = Array(Ptr{Uint8}, 1)
    collseq = Array(Ptr{Uint8}, 1)
    notnull = zero(Cint)
    primarykey = zero(Cint)
    autoinc = zero(Cint)

    ccall((:sqlite3_table_column_metadata, sqlite3_lib),
          Cint,
          (Ptr{Void}, Ptr{Uint8}, Ptr{Uint8}, Ptr{Uint8},
           Ptr{Ptr{Uint8}}, Ptr{Ptr{Uint8}},
           Ptr{Cint}, Ptr{Cint}, Ptr{Cint}),
          dbptr, convert(Ptr{Uint8}, C_NULL), table, column,
          datatype, collseq, notnull, primarykey, autoinc)

Any thoughts?
Re: [julia-users] ccall confusion
Thanks for the comments, Isaiah! It never occurred to me that library versions would be an issue here, because I had forgotten that this function isn’t always defined unless certain compiler directives are enabled. I was using the version found by the default search strategy for SQLite. I’ve switched to a custom installation to make sure that the library I’m using has access to the function I want.

This works for me now, but I’m not sure how to get access to the integer outputs from this function, which are all passed as pointers. I tried to change the return signature, but I don’t think I understand how to do it correctly.

— John

On Dec 30, 2013, at 1:44 AM, Isaiah Norton isaiah.nor...@gmail.com wrote:

It's unclear what version of various libraries you are using, because I had to make several changes to get this to run. However, the following works fine (for me...). You might want to try deleting `SQLite/lib/sqlite3.dylib`, in case an incompatible version of the shared library is being picked up. The lib should come from Homebrew if you are on OS X, as SQLite has a BinDeps install rule (so I'm not sure what the deal is with the dylib and .so files).
    using DBI
    using SQLite

    db = SQLite.connect("/tmp/db.sqlite3")
    dbptr = db.handle  # changed
    table = "users"
    column = "id"
    datatype = Array(Ptr{Uint8}, 1)
    collseq = Array(Ptr{Uint8}, 1)
    notnull = zero(Cint)
    primarykey = zero(Cint)
    autoinc = zero(Cint)

    ccall((:sqlite3_table_column_metadata, SQLite.sqlite3_lib),  # changed
          Cint,
          (Ptr{Void}, Ptr{Uint8}, Ptr{Uint8}, Ptr{Uint8},
           Ptr{Ptr{Uint8}}, Ptr{Ptr{Uint8}},
           Ptr{Cint}, Ptr{Cint}, Ptr{Cint}),
          dbptr, convert(Ptr{Uint8}, C_NULL), table, column,
          datatype, collseq, notnull, primarykey, autoinc)
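On the integer outputs: the segfaults are consistent with passing `zero(Cint)`, a plain integer value, where the C function expects an `int *` it can write through. In 0.2-era Julia the usual fix was a one-element array, as the code already does for `datatype` and `collseq`. In Julia 1.x it is a `Ref{Cint}`, read back with `[]`. The sketch below assumes Julia 1.x and does not need SQLite: `fill_flags` is a hypothetical stand-in for the C function, written in Julia and exposed as a C function pointer with `@cfunction` so the pattern runs on its own. The `char const **` outputs would follow the same pattern with `Ref{Ptr{UInt8}}` and `unsafe_string`.

```julia
# Hypothetical stand-in for a C function with `int *` output parameters,
# like pNotNull / pPrimaryKey / pAutoinc in sqlite3_table_column_metadata.
# It writes through the pointers it receives.
function fill_flags(p_notnull::Ptr{Cint}, p_pk::Ptr{Cint}, p_autoinc::Ptr{Cint})::Cint
    unsafe_store!(p_notnull, Cint(1))
    unsafe_store!(p_pk, Cint(1))
    unsafe_store!(p_autoinc, Cint(0))
    return Cint(0)   # mimic SQLITE_OK (0)
end

# Expose it as a C function pointer so it can be called through ccall.
const fill_flags_c = @cfunction(fill_flags, Cint, (Ptr{Cint}, Ptr{Cint}, Ptr{Cint}))

# Output parameters: one Ref per int, not a plain zero(Cint).
notnull    = Ref{Cint}(0)
primarykey = Ref{Cint}(0)
autoinc    = Ref{Cint}(-1)

rc = ccall(fill_flags_c, Cint,
           (Ref{Cint}, Ref{Cint}, Ref{Cint}),
           notnull, primarykey, autoinc)

# Dereference the Refs to read what the "C" side wrote.
println((rc, notnull[], primarykey[], autoinc[]))   # (0, 1, 1, 0)
```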
Re: [julia-users] Random.rand must be explicitly imported to be extended
It’s fine to punt on things. You can either not include those methods at all or include them as

    skewness(d::HarryDist) = error("Not yet implemented")

— John

On Dec 30, 2013, at 2:15 PM, Harry Southworth harry.southwo...@gmail.com wrote:

Thanks for the tip. Another question, and possibly one not best posted here: is there a minimum required functionality for adding a distribution to the package? I ask because the package seems to want me to provide skewness and kurtosis functions, but I've never wanted to know those things, don't know why anyone else would, have never seen them written down, and would rather spend my time doing something else.

Thanks again,
Harry
Re: [julia-users] create a type for points on n-dimensional simplex
I think you’d need a family of types to do that. You might look at https://github.com/twadleigh/ImmutableArrays.jl and try to extend it.

— John

On Dec 31, 2013, at 7:13 AM, Christian Groll groll.christian@gmail.com wrote:

I already know how I could implement this for a given dimensionality. For example, for the two-dimensional case I can define:

    immutable twoDimSimplex
        weight1::Float64
        weight2::Float64
        twoDimSimplex(x::Float64, y::Float64) =
            (abs(x + y - 1) > 1e-10) ? error("entries must sum to one") : new(x, y)
    end

However, I do not see how I could extend this to the n-dimensional case. Here, I thought that I would have to use one field which stores an n-dimensional vector:

    immutable nDimSimplex
        points::Vector{Float64}
        nDimSimplex(x::Vector{Float64}) =
            (abs(sum(x) - 1) > 1e-10) ? error("entries must sum to one") : new(x)
    end

Now, I think that it will not be possible to change the vector that the field points to. However, the entries of the vector itself can still be changed without restrictions. Any recommendations?

On Tuesday, 31 December 2013 12:33:57 UTC+1, Tim Holy wrote:

Use an immutable
http://docs.julialang.org/en/release-0.2/manual/types/#immutable-composite-types
with an inner constructor:
http://docs.julialang.org/en/release-0.2/manual/constructors/#inner-constructor-methods
You may find the example
http://docs.julialang.org/en/release-0.2/manual/constructors/#case-study-rational
helpful.

Best,
--Tim
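One way to get the "family of types" John mentions in present-day Julia (1.x) is to parameterize an immutable struct on the dimension and store the weights in an `NTuple`. That way neither the field nor its entries can be mutated after the inner constructor has checked them. A sketch under those assumptions; `SimplexPoint` is a hypothetical name, and like Christian's code it only checks the sum, not non-negativity.

```julia
# An immutable point with N weights that must sum to one.
struct SimplexPoint{N}
    weights::NTuple{N,Float64}
    function SimplexPoint{N}(w::NTuple{N,Float64}) where {N}
        abs(sum(w) - 1) > 1e-10 && error("entries must sum to one")
        return new{N}(w)
    end
end

# Convenience constructor that infers N from the tuple length.
SimplexPoint(w::NTuple{N,Float64}) where {N} = SimplexPoint{N}(w)

p = SimplexPoint((0.25, 0.75))
println(p.weights)          # (0.25, 0.75)

q = SimplexPoint((0.2, 0.3, 0.5))
println(typeof(q))          # SimplexPoint{3}

# Tuples are immutable too, so p.weights[1] = 0.5 would throw an error.
```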
Re: [julia-users] Re: Style Guideline
(4) Using both tabs and spaces is a huge problem in a shared codebase. This is probably the only rule in my entire list that I’m actually going to enforce in the code I maintain. IIRC, Python completely forbids mixing these kinds of space characters at the language level.

(7) + (8): These rules are part of the official Google style guide for R, which is the language most similar to Julia that’s being used at companies with public-facing style guidelines. I think they’re quite sensible rules, which is why I decided to borrow them from published standards.

(18) + (19): This is clearly an area of big disagreement in our community. I might pull them out into a suggestions section, since I’d really prefer that code submitted to things like DataFrames.jl follow this rule, but I don’t want to include a rule that’s going to cause a big schism in the community.

(22) + (23) + (24): I may take these out as well. I definitely agree that there’s a big difference between performance guidelines and style guidelines, although that line is blurry when you’re trying to keep a codebase written in a consistent style.

(31): Comments aren’t PDFs or HTML or any other language designed for transmitting carefully formatted documents. You don’t get to use images, properly formatted tables, etc. I find diagrams are an essential part of good documentation. I think conflating documents with code leads to documents that are less readable and lots of lines in code that’s not actually worth reading.

(35): I might take this one out as well. It’s somewhere on the boundary between a performance tip and a style habit worth developing.

— John

On Dec 31, 2013, at 11:12 AM, Daniel Carrera dcarr...@gmail.com wrote:

Personally, I do not think that a more thorough style guide is necessarily better. That said, I will give you my comments:

(4): I like tabs and I use them.

(7) + (8): I disagree. Although I generally use comma+space as you say, at times I deviate from that when I feel that doing so will improve the clarity and readability of my code.

(18) + (19): I disagree. Although I could favour rules like this in a particular project, in many cases I think that adding type annotations just creates syntactic noise and can create a needless limitation.

(22) + (23) + (24): I do not think that performance tips belong in a style guide. You could spend a lot of time writing performance tips, and I don't see an obvious reason why the three tips you chose are more important than other performance tips.

(31): I partially disagree. I like writing documentation (e.g. a tutorial or an explanation of an algorithm) at the top of the file. I like having the documentation in the same file as the code that it refers to. I do not know what you mean when you say that English documents are more readable when not constrained by the rules of code comments. What rules are those? Also, I rarely want to have a diagram in my documentation, because that involves starting a WYSIWYG program like LibreOffice or something like that. I haven't really felt a lot of need for diagrams.

(35): This doesn't sound like a style thing either. Advice on the correct way to use a module, or how to maintain precision or avoid round-off errors, does not belong in a style guide. This sort of thing belongs either in the documentation for the module or in a tutorial on numerical computation.

Cheers,
Daniel.

On Tuesday, 31 December 2013 10:01:23 UTC-5, John Myles White wrote:

One of the things that I really like about working with the Facebook codebase is that all of the code was written to comply with a very thorough internal style guideline. This prevents a lot of useless disagreement about code stylistics and discourages the creation of unreadable code before anything reaches the review stage.

In an attempt to emulate that level of thoroughness, I decided to extend the main Julia manual’s style guide by writing my own personal style guideline, which can be found at https://github.com/johnmyleswhite/Style.jl

I’d be really interested to know what others think of these rules and what they think is missing. Right now, my guidelines leave a lot of wiggle room.

— John
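Rule (4) is the kind of rule that a linting program can check reliably. A minimal sketch of such a check, assuming the rule means "a file should not indent some lines with tabs and others with spaces". `indent_styles` and `mixes_tabs_and_spaces` are hypothetical helpers, not part of Style.jl.

```julia
# Collect which indentation characters (tabs, spaces) appear in a source string.
function indent_styles(src::AbstractString)
    styles = Set{Symbol}()
    for line in eachline(IOBuffer(src))
        indent = match(r"^[ \t]*", line).match   # leading whitespace only
        occursin('\t', indent) && push!(styles, :tabs)
        occursin(' ', indent) && push!(styles, :spaces)
    end
    return styles
end

# Flag a file whose indentation uses both tabs and spaces.
mixes_tabs_and_spaces(src::AbstractString) = length(indent_styles(src)) > 1

println(mixes_tabs_and_spaces("function f()\n    1\nend"))          # false
println(mixes_tabs_and_spaces("function f()\n\tx = 1\n    x\nend"))   # true
```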
Re: [julia-users] Style Guideline
You’re totally right that Base Julia has a very different implicit style guide than the one I’ve been using. That’s intentional, since I find some of the Base Julia code a little hard to read at times. I’ve also been bitten by the absence of important type constraints in Base before (think of when show(io) used to have no type information), so I’ve tended towards conservative typing at first, until it’s clear that looser typing is needed.

I’m not sure there’s much benefit in having rules that involve personal judgement, because reasonable people can make different judgements. So I’d rather have no rule at all (and just let things happen as they may) than try to formalize a rule whose application can’t be reliably checked by a linting program.

Coming from R, I’m pretty strongly opposed to Matlab's precedence rule for “:”. I find it hard to read and really wish that it hadn't made it impossible for us to match R’s formula syntax. The “:” operator’s precedence is by far the part of Julia that I most dislike (which, of course, is why I’m such a big fan of Julia, since that’s a minor problem to have as your worst quality).

The for loops thing is one where I don’t have strong feelings, but I tend to prefer consistency. I see the appeal of using “=” in some contexts, but find it easier to avoid using different things to express the same concept.

— John

On Dec 31, 2013, at 10:54 AM, Stefan Karpinski ste...@karpinski.org wrote:

I would mention that the vast majority of Base Julia, although it's fairly internally consistent, does not follow a lot of these rules: in particular, the whitespace rules, some of the type annotation rules, and for x in vs. for x =. I tend to follow rules that require a bit of judgement, but therefore convey some subtle information about the code.

Whitespace. I don't use spaces when calling functions that are mathy: f(x,y). I do, on the other hand, tend to use spaces when calling non-mathy functions: endswith(str, substr). I think that math expressions should be spaced so that they're readable, and I'm not sure that a fixed set of rules does that, although no spaces for tighter operations and spaces for looser operations is the trend. I rely heavily on Matlab precedence of arithmetic versus :.

For loops. When the right-hand side is a range like 1:n, I use =. When the right-hand side is an opaque object that we're iterating over, I use in. Examples:

    for i = 1:n
        # blah, blah
    end

    for obj in collection
        # blah, blah, blah
    end
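The precedence Stefan relies on means arithmetic binds tighter than `:`, so `1:n-1` is `1:(n-1)`. The two loop spellings he describes also iterate over the same values. A quick check, assuming Julia 1.x:

```julia
n = 4

# Arithmetic binds tighter than the range operator.
println(collect(1:n-1))            # [1, 2, 3]
println((1:n-1) == (1:(n-1)))      # true

# "=" and "in" are interchangeable in for loops.
a = Int[]
for i = 1:n
    push!(a, i)
end
b = Int[]
for i in 1:n
    push!(b, i)
end
println(a == b)                    # true
```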
Re: [julia-users] Re: Style Guideline
I could see a couple of nice uses for block-local imports, but I’m not sure that would solve the problems that (33) is meant to address: using importall makes it too easy to accidentally monkey-patch Base, and using import sometimes makes it hard to know the provenance of functions being extended by a module. The latter is way less problematic than the former: single-function import has a lot of good use cases, even if it’s a bit too non-local for my taste. It’s only importall that makes things really hard to keep track of.

— John

On Dec 31, 2013, at 3:54 PM, Brian Rogoff brog...@gmail.com wrote:

IMO the main reason for (33) is that Julia presently lacks any local import feature. At least a few languages with module systems add these; see for instance OCaml (http://caml.inria.fr/pub/docs/manual-ocaml-4.01/extn.html#sec225) and also Ada, which allows with/use inside of blocks. Is there a reason that a similar feature wouldn't work well with Julia too?

-- Brian
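`importall` was later removed from Julia (it is gone as of 1.0). In Julia 1.x, one way to keep the provenance of extended functions visible, as (33) aims for, is to extend them with a qualified name at the definition site instead of importing them. A minimal sketch; the module `Bags` and type `Bag` are hypothetical.

```julia
module Bags

export Bag

struct Bag
    items::Vector{Int}
end

# Qualified definitions: a reader sees at once that these extend Base.
Base.length(b::Bag) = length(b.items)
Base.show(io::IO, b::Bag) = print(io, "Bag(", length(b), " items)")

end # module

using .Bags

bag = Bag([1, 2, 3])
println(length(bag))   # 3
println(bag)           # Bag(3 items)
```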
Re: [julia-users] Style Guideline
Thanks everyone for the feedback. Going to try to synthesize responses this weekend. Been distracted by a major push to add more database support to Julia.

— John

On Jan 2, 2014, at 6:02 AM, Keith Campbell keithcc1...@gmail.com wrote:

+1 for Eric's proposal for a 100-character line length. The old 80-character limit in PEP 8 was the one bit of that guideline I could never abide.

+1 also for Milan's proposal regarding brief comments describing a function's purpose.

And thank you, John, for pushing a standard that can enable tooling support.

On Thursday, January 2, 2014 5:15:19 AM UTC-5, Marcus Urban wrote:

Do people using Julia really like underscores that much? I find them generally unsightly, and I do not plan to use them.
Re: [julia-users] Declaring types that are special cases of already existing composite types
Concrete types can't be subtyped because you need to know exactly how much memory space they occupy.

-- John

On Jan 2, 2014, at 10:01 AM, Mauro mauro...@runbox.com wrote:

Only abstract types can be subtyped (and if I recall correctly this is going to stay that way for some type-theory reason). Further, at the moment abstract types cannot have fields, i.e. cannot be composite types. However, this might change sometime; have a look at the issue https://github.com/JuliaLang/julia/issues/4935 (which is also referenced in the mailing list thread mentioned by tshort).

Now, all of this does not help with your quest, sorry, but it may be of some interest.

On Thu, 2014-01-02 at 16:23, Christian Groll wrote:

My interest lies in the implementation of types that are special cases of already existing composite types. For example, I want to implement a type Portfolio, which is just a DataFrame with two additional requirements:

- all columns are of numeric type
- the sum of the entries in each row must be equal to 1

Or, I want to implement a type TimeSeries, which is just a DataFrame where the first column consists of dates.

I think the way to go here would be to implement some kind of constraint checking in the setindex! methods, although this would not prevent messing with the entries by directly setting the fields of the type. However, once convenient getindex and setindex! methods exist, I hope that basically nobody would mess with the values directly, and complete immutability is not necessarily needed.

What I am trying to achieve is something that I think is called inheritance in other languages, where classes can simply be declared as subclasses of already existing classes. So the question is whether something like this is possible in Julia as well. As far as I can tell, there is no way to declare a type to be a subtype of a composite type.

Or, if this is not possible, is there any way around it, for example declaring a type Portfolio,

    type Portfolio
        weights::DataVector{Float64}
    end

where I can simply relate all setindex! methods to the respective methods of DataVector, adding constraint checks where necessary:

    function Base.setindex!(pf::Portfolio, v::Any, col_ind::ColumnIndex)
        if check_constraints(pf, v, col_ind)
            setindex!(pf.weights, v, col_ind)
        else
            error("constraints not fulfilled.")
        end
    end

However, I would then need some kind of metaprogramming so that I do not have to implement all the numerous setindex! methods of DataVector from scratch.
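A workaround for the "implement all the setindex! methods" problem that present-day Julia (1.x) supports well is to make the wrapper a subtype of the abstract `AbstractVector` interface and implement only the required methods (`size` and scalar `getindex`). Generic functions such as `sum`, `maximum` and iteration then work on it. This sketch assumes a plain `Vector{Float64}` in place of DataVector; `Portfolio` and `rebalance!` are hypothetical names. Because changing one weight at a time cannot preserve a sum-to-one constraint, element-wise `setindex!` is deliberately left undefined and whole-vector updates go through a checked function.

```julia
struct Portfolio <: AbstractVector{Float64}
    weights::Vector{Float64}
    function Portfolio(w::AbstractVector{<:Real})
        abs(sum(w) - 1) > 1e-10 && error("weights must sum to one")
        return new(collect(Float64, w))   # private copy of the data
    end
end

# The minimal AbstractVector interface: size and scalar getindex.
Base.size(p::Portfolio) = size(p.weights)
Base.getindex(p::Portfolio, i::Int) = p.weights[i]

# Checked whole-vector update that keeps the invariant.
function rebalance!(p::Portfolio, w::AbstractVector{<:Real})
    length(w) == length(p) || throw(DimensionMismatch("wrong number of weights"))
    abs(sum(w) - 1) > 1e-10 && error("weights must sum to one")
    copyto!(p.weights, w)
    return p
end

p = Portfolio([0.5, 0.5])
println(sum(p), " ", maximum(p))   # 1.0 0.5
rebalance!(p, [0.25, 0.75])
println(collect(p))                # [0.25, 0.75]
# p[1] = 0.3 throws, since no setindex! method is defined for Portfolio.
```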
Re: [julia-users] Declaring types that are special cases of already existing composite types
Right now, there is no mechanism for doing the delegation I described earlier beyond simple macros like those I wrote up a long time ago. For your use case, you would really want a method that wraps a given type in a new immutable type and then delegates all methods to the contained type unless they are explicitly overridden. Currently, that's not possible without a lot of legwork.

Since it seems like you mostly want to guarantee invariant properties of your data, you can just write functions that don't break those invariants when operating on a standard DataFrame and then call them. The compiler won't give you provable guarantees that those invariants are never broken, but your code will still respect them. If the compiler gets new abilities, you could then easily upgrade your methods to refer to a new type that imposes the desired invariants.

-- John

On Jan 2, 2014, at 2:00 PM, Christian Groll groll.christian@gmail.com wrote:

To recap, it seems that Julia does not support this feature, although I must admit that I did not get all the details in the answers ;-)

Still, I would like to find some reasonable workaround for this problem. In my opinion, the DataFrame type should really cover almost all cases of data storage in statistics / data analysis. Nevertheless, I would very much like to be able to distinguish between different datasets. Hence, ideally I would like to have a type that behaves almost exactly like a DataFrame, while still being able to overload certain methods. For example, if I know that my dataset contains time series data, a visualization plot(df::DataFrame) should look different from a visualization of geographic data on a map. Also, different datasets come with different constraints: portfolio weights must sum to 1, correlations must be between -1 and 1, and so forth. Isn't there any way to implement this reasonably without starting a new type from scratch each time?

I was only calling it a subtype because I stumbled somewhere upon the advice that it could work by subtyping the AbstractDataFrame type, but I didn't get this running. Any tips on whether / how this would work?

Alternatively, I also found elsewhere a code snippet by John Myles White for a redirect or delegate macro:

    macro redirect(t, comp, fname)
        t = esc(t)
        comp = esc(comp)
        fname = esc(fname)
        quote
            ($fname)(a::($t), args...) = ($fname)(a.($comp), args...)
        end
    end

This could at least be a starting point for giving a new type the behavior of a DataFrame. Is there any update on this macro?

Lastly, I still do not understand the memory problem with subtyping composite types in my exact case. The subtypes that I would like to have do NOT have any additional fields compared to their parent. They are only meant to allow function dispatch and the implementation of some constraints. A Portfolio type is still nothing other than a DataFrame, only its values sum to one. You definitely need not explain the memory issues further, because I most likely would not understand them anyway. But are you really sure that such a Portfolio type would have different memory requirements than a DataFrame? In effect, it should be no different, just one special case of all possible DataFrames.

On Thursday, 2 January 2014 16:53:18 UTC+1, Stefan Karpinski wrote:

On Thu, Jan 2, 2014 at 10:01 AM, Mauro maur...@runbox.com wrote:

Only abstract types can be subtyped (and if I recall correctly this is going to stay that way for some type-theory reason).

It's not for a type theory reason; if anything, it's the opposite of a type theory reason. If Float64 can be subtyped, then an Array{Float64} can hold objects of arbitrary size. Thus, you can't represent it as inline data, but rather have to store the array as pointers to boxed, heap-allocated values. Not only is this horribly inefficient (200% storage overhead on 64-bit machines), but it completely destroys interoperability with BLAS, FFTW, etc. Some o.o. languages have allowed declaring types to be final as a way of dealing with this issue (you also need immutability and/or value types to fully solve the array storage problem).

After a few decades of real-world o.o. programming, however, the best practice that's emerged is that you should only subtype intentional supertypes: types that are very carefully designed to be subtypeable. Where a classically o.o. language might do Ac >: Bc, where Ac and Bc are both concrete and Ac is a supertype of Bc, in Julia you would have Aj' >: Aj, Bj, where the abstract aspect of Ac is distilled into the purely abstract type Aj', while the concrete aspect of Ac is implemented by Aj, which is a sibling of Bj instead of its parent. I've found that while this requires a slight shift in thinking, the resulting
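Stefan's advice (an abstract type for the shared interface, with concrete types as siblings) and the @redirect idea quoted above can be combined in present-day Julia (1.x). A minimal sketch under those assumptions: in Julia 1.x the quoted macro's `a.($comp)` is broadcast syntax rather than field access, so this version uses `getfield`. `AbstractWeights`, `PortfolioWeights`, `CorrelationWeights`, `scaled` and `describe` are hypothetical names.

```julia
# Julia 1.x variant of the quoted @redirect macro:
# defines fname(a::T, args...) = fname(a.field, args...).
macro redirect(T, field, fname)
    T = esc(T)
    fname = esc(fname)
    fieldname = QuoteNode(field)
    return quote
        $fname(a::$T, args...) = $fname(getfield(a, $fieldname), args...)
    end
end

# Abstract supertype carries the shared interface; concrete types are siblings.
abstract type AbstractWeights end

struct PortfolioWeights <: AbstractWeights
    weights::Vector{Float64}
end

struct CorrelationWeights <: AbstractWeights
    weights::Vector{Float64}
end

# Hypothetical operation on the underlying data.
scaled(v::Vector{Float64}, k::Real) = v .* k

# Forward once, at the abstract level (both siblings store `weights`).
@redirect AbstractWeights weights scaled

# Sibling-specific behaviour via ordinary dispatch.
describe(::PortfolioWeights) = "portfolio"
describe(::CorrelationWeights) = "correlations"

p = PortfolioWeights([0.25, 0.75])
println(scaled(p, 2))   # [0.5, 1.5]
println(describe(p), " / ", describe(CorrelationWeights([0.1, -0.2])))
```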
Re: [julia-users] A small performance puzzle
Hopefully Jeff will chime in (or someone else with the required expertise), but I’ve heard Jeff warn against splatting tuples lots of times.

— John

On Jan 4, 2014, at 4:44 PM, Milan Bouchet-Valat nalimi...@club.fr wrote:

Hi!

I'd like to propose a small game about performance. In the following gist, I provide three very similar short functions; the first one allocates much more memory and is much slower than the other two. Can someone find an explanation? ;-)
https://gist.github.com/nalimilan/8261056

The real-world scenario is again building a frequency table. I discovered that when doing

    a = zeros(Int, dims)

I really had to make dims a tuple rather than an array, which forces me to keep two versions of the same data, one of each type.

Thanks for the help!
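On the tuple-versus-array point, Julia 1.x offers two options. An array of dimensions can be converted once with `Tuple(...)` before calling `zeros`. When the number of dimensions is fixed, `ntuple` with a `Val` length produces a tuple whose length is part of its type. A sketch assuming Julia 1.x; it does not reproduce the gist's functions.

```julia
dims_vec = [3, 4]                 # dimensions held in an array

# Convert once; zeros accepts a tuple of dimensions.
a = zeros(Int, Tuple(dims_vec))
println(size(a))                  # (3, 4)

# With a known number of dimensions, Val(2) makes the length part of the
# type, so the result is an NTuple{2,Int}.
dims2 = ntuple(i -> dims_vec[i], Val(2))
b = zeros(Int, dims2)
println(dims2 isa NTuple{2,Int}, " ", size(b))   # true (3, 4)
```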
Re: [julia-users] Add packages as root?
I believe there is a Julia environment variable that lets you control where packages will be located, but I can’t seem to recall what it is. If you knew that variable, you could have every user specify in the .juliarc that packages should be loaded from this alternative location.

— John

On Jan 5, 2014, at 6:46 AM, Alasdair McAndrew amc...@gmail.com wrote:

I can install packages as myself; that's fine. I'm just wondering if they can be installed centrally, so as to be available to all users.
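For readers on current Julia (1.x), which postdates this thread: the relevant environment variables are now `JULIA_DEPOT_PATH` (where packages are installed) and `JULIA_LOAD_PATH`, and the per-user startup file is `~/.julia/config/startup.jl` rather than `.juliarc`. Below is a minimal sketch of the startup-file approach John describes. It assumes a hypothetical shared environment directory `/opt/julia/shared-env` set up by an administrator, and that the depot holding those packages is also visible to each user (for example through `JULIA_DEPOT_PATH`).

```julia
# Contents of a user's ~/.julia/config/startup.jl (hypothetical shared path).
const SHARED_ENV = "/opt/julia/shared-env"

# Append the shared environment to the load path if it is not already there,
# so that `using` can also resolve packages listed in that environment.
SHARED_ENV in LOAD_PATH || push!(LOAD_PATH, SHARED_ENV)

println(last(LOAD_PATH))   # /opt/julia/shared-env
```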
[julia-users] Ambiguity warnings re. Diagonal{T}
Anyone have a sense why Diagonal{T} is now ambiguous with DataArray, but only for subtraction?
Re: [julia-users] Interpreting flat format profiler reports
I’m not really sure. My not totally informed sense is that this is likely to be generally slow since you have to check the type every time to determine what the inner field means. But someone with more knowledge of Julia internals would need to confirm that. — John On Jan 6, 2014, at 10:34 AM, Brendan O'Connor breno...@gmail.com wrote: On Sunday, January 5, 2014 8:54:08 PM UTC-5, John Myles White wrote: Looking at this now, what are the types of the variables on the bolded lines? If they’re specific real-valued types, I’m surprised they’re so slow. They're all Float64, or at least, they should be. code_typed() says they are: w0 = /(-(+(getindex(top(getfield)(n0,:counts),wordID::Int64),top(box)(Float64,top(div_float)(betaHere::Float64,top(box)($(Float64),top(sitofp)($(Float64),V::Int64))::Float64))::Float64),on0::Int64),-(+(top(getfield)(top(getfield)(n0,:counts),:total),betaHere::Float64),top(box)($(Int64),top(zext_int)($(Int64),on_cur::Bool))::Int64)) # line 338: w1 = /(-(+(getindex(top(getfield)(n1,:counts),wordID::Int64),top(box)(Float64,top(div_float)(betaHere::Float64,top(box)($(Float64),top(sitofp)($(Float64),V::Int64))::Float64))::Float64),on1::Int64),-(+(top(getfield)(top(getfield)(n1,:counts),:total),betaHere::Float64),top(box)($(Int64),top(zext_int)($(Int64),on_cur::Bool))::Int64)) # line 339: p0 = -(+(top(getfield)(top(getfield)(cur_docnode,:left),:count),top(box)(Float64,top(div_float)(top(getfield)(mm::TreeTM,:gammaConc)::Float64,top(box)($(Float64),top(sitofp)($(Float64),2))::Float64))::Float64),on0::Int64) # line 340: p1 = -(+(top(getfield)(top(getfield)(cur_docnode,:right),:count),top(box)(Float64,top(div_float)(top(getfield)(mm::TreeTM,:gammaConc)::Float64,top(box)($(Float64),top(sitofp)($(Float64),2))::Float64))::Float64),on1::Int64) # line 341: q = /(*(p1,w1),+(*(p1,w1),*(p0,w0))) # line 343: Right now it’s not possible to have an abstract type with fields, so field lookup should be pretty consistently fast. 
What I meant was: abstract A; type B <: A; x; end; type C <: A; y; end ... and you have a data structure typed A that actually contains a mix of B's and C's. You have code that knows it's always accessing type B, and accesses the x field. Under what circumstances is this fast? -Brendan
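The thread leaves Brendan's question open, so here is only a hedged sketch, in current Julia syntax, of two common ways to read fields from a container typed with the abstract `A` that holds a mix of `B`s and `C`s. The helpers `value`, `total`, and `first_x` are hypothetical. No performance numbers are claimed; checking a real case would need `@code_warntype` or a benchmark.

```julia
abstract type A end

struct B <: A
    x::Float64
end

struct C <: A
    y::Float64
end

# Abstract element type: each slot may hold a B or a C, so elements are stored
# as references and their concrete type is only known at run time.
items = A[B(1.0), C(2.0), B(3.0)]

# One method per concrete type; inside each method the field layout is known.
value(b::B) = b.x
value(c::C) = c.y

# Each value(item) call below is resolved by dynamic dispatch per element.
total(v::Vector{A}) = sum(value, v)

# If the code really knows an element is a B, a type assertion performs one
# run-time check and then lets the compiler use B's field layout directly.
first_x(v::Vector{A}) = (v[1]::B).x
```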
Re: [julia-users] Ambiguity warnings re. Diagonal{T}
Agreed: the current ambiguity system has some unfortunate properties. — John On Jan 6, 2014, at 1:41 PM, Dahua Lin linda...@gmail.com wrote: In base/linalg/diagonal.jl: line 24 - 27 it defines the following functions: + (Diagonal, Diagonal) - (Diagonal, Diagonal) - (Diagonal, AbstractMatrix) - (AbstractMatrix, Diagonal) So when you write Diagonal - DataMatrix, the compiler doesn't know which method to use. But for +, there is no such problem ... I don't know why they don't define + (Diagonal, AbstractMatrix) and + (AbstractMatrix, Diagonal) I think these things need a serious cleanup. - Dahua On Sunday, January 5, 2014 11:13:27 AM UTC-6, John Myles White wrote: Anyone have a sense why Diagonal{T} is now ambiguous with DataArray, but only for subtraction?
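A minimal sketch of the pattern Dahua describes, with hypothetical stand-in types rather than the real `Diagonal` and `DataMatrix`. Each package adds a method that is more specific in a different argument, so neither wins for the mixed call. In the 0.2/0.3 era Julia warned about this when the methods were defined; current Julia instead throws a `MethodError` when the ambiguous call is made (and `Test.detect_ambiguities` can search for such cases).

```julia
# Hypothetical stand-ins for Diagonal and DataMatrix.
struct MyDiag end
struct MyData end

# Each "package" adds a method that is more specific in a different argument.
sub(a::MyDiag, b) = "MyDiag - anything"
sub(a, b::MyData) = "anything - MyData"

# Neither method is more specific for (MyDiag, MyData): the call is ambiguous.
ambiguous = try
    sub(MyDiag(), MyData())
    false
catch err
    err isa MethodError
end

# Defining a method for the intersection of the two signatures resolves it.
sub(a::MyDiag, b::MyData) = "MyDiag - MyData"
```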
Re: [julia-users] Announcing AudioIO.jl - Simple Audio I/O for Julia
I don’t have homebrew on my general system, so I’ll just wait a bit. — John On Jan 6, 2014, at 10:13 PM, Spencer Russell s...@mit.edu wrote: Ah, currently I don't have the Homebrew.jl support working properly. I haven't dug into it deeply yet, but it looks like I'll need to put together a custom formula that will download a portaudio binary, and request that it be added to https://github.com/staticfloat/homebrew-juliadeps. For now you can do a brew install portaudio -s On Mon, Jan 6, 2014 at 9:00 PM, John Myles White johnmyleswh...@gmail.com wrote: This sounds really awesome. When I try to install it on OS X, I get the following error: ===[ ERROR: AudioIO ]=== None of the selected providers can install dependency libportaudio while loading /Users/johnmyleswhite/.julia/AudioIO/deps/build.jl, in expression starting on line 20 — John On Jan 5, 2014, at 9:43 PM, Spencer Russell s...@mit.edu wrote: Code and details at: https://github.com/ssfrr/AudioIO.jl Currently supporting OSX and Linux. AudioIO is a Julia library for interfacing to audio streams, which include playing to and recording from sound cards, reading and writing audio files, sending to network audio streams, etc. Currently only playing to the sound card through PortAudio is supported. It is under heavy development, so the API could change, there will be bugs, and there are important missing features. That said, the basic API for playing back vectors of audio should work fine and that API should not change. For instance, to play 1 second of noise through your sound card, it's as easy as: julia> v = rand(44100) * 0.1 julia> play(v) If you have any problems, please open an Issue on the github page. Also don't hesitate to email the list and/or me.
Re: [julia-users] Julia and Python languages
I think part of the appeal of dot-notation OO is that it reads left-to-right, which helps to make the code seem to read in the same order as the sequence of actions taken. — John On Jan 8, 2014, at 7:45 AM, Tobias Knopp tobias.kn...@googlemail.com wrote: It would be interesting to see some use cases where Java-like OO fits better than Julia's OO. In C++ one can use both and usually chooses based on whether the dispatching can be done at runtime or at compile time (i.e. classes with virtual functions for runtime decisions and templates for compile time decisions). There are many situations where I would have liked to use generic programming in C++ but it was not possible as the type was only known at runtime. In Julia this is no issue, which makes it such a joy to use. On Wednesday, 8 January 2014 14:17:20 UTC+1, Stefan Karpinski wrote: It's a bit hard to say whether Julia is object-oriented or not. I suspect that for a lot of people, object-oriented means do you write `x.f(y)` a lot? By that metric, Julia is not very object oriented. On the other hand, everything you can do with single-dispatch o.o. in C++ or Java, you can easily simulate with multiple dispatch, but you'll have to get used to writing `f(x,y)` instead of `x.f(y)`. If your notion of object-orientation has more to do with encapsulation and/or message passing, then we start to look pretty non-o.o. again. On Wed, Jan 8, 2014 at 5:25 AM, Matthias BUSSONNIER bussonnie...@gmail.com wrote: On 7 Jan 2014, at 21:48, Erik Engheim wrote: Thanks for the nice comments all of you. I guess I have to keep writing more about my Julia experiences after this ;-) On Tuesday, January 7, 2014 9:39:05 PM UTC+1, Ivar Nesje wrote: Great post, it sums up very well the things I think are the strengths of Julia.
A few notes: Julia does not look up the method at runtime if the types of the arguments to the function can be deduced from the types of the arguments to the surrounding function (but it behaves that way for the user, unless he redefines the method after the function was compiled #265). That is cool, I didn't know that. I assume this can make quite a big difference in performance for tight inner loops. Some misc comments too: Julia is not object oriented Is that true? From the manual: It is multi-paradigm, combining features of imperative, functional, and object-oriented programming. I consider that Julia can be OO; the code just looks different than in other languages. Typo? Polymorphis lets you Missing m? Liked the blog post too, otherwise thanks. I would also have mentioned code_lowered, code_llvm and code_typed; not everyone is fluent in assembler, and those tools are really useful too, especially in metaprogramming. -- M
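To make Stefan's `x.f(y)` versus `f(x,y)` point concrete, here is a small sketch in current Julia syntax with hypothetical types. Methods on one argument cover what single dispatch does, methods on several arguments go further, and the `|>` operator recovers the left-to-right reading order John mentions.

```julia
abstract type Animal end
struct Dog <: Animal end
struct Cat <: Animal end

# Single dispatch (what x.speak() does in Java or Python): chosen by one argument.
speak(::Dog) = "woof"
speak(::Cat) = "meow"

# Multiple dispatch: chosen by the types of all arguments.
meet(::Dog, ::Cat) = "chase"
meet(::Cat, ::Dog) = "hiss"
meet(::Animal, ::Animal) = "ignore"   # fallback for every other pairing

# The pipe operator gives the left-to-right order of x.f().
sound = Dog() |> speak
```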
Re: [julia-users] API inconsistency in `ismatch` vs `contains`
It depends entirely on how you interpret match. To me, the string is a match for the pattern, rather than the pattern being a match for the string. -- John On Jan 9, 2014, at 4:50 PM, Mike Nolta m...@nolta.net wrote: Conceptually, a regex is a set of strings, so i don't see the inconsistency. -Mike On Thu, Jan 9, 2014 at 7:40 PM, John Myles White johnmyleswh...@gmail.com wrote: It would break a bunch of code, but I also think ismatch(string, regex) would make more sense than the current design. -- John On Jan 9, 2014, at 4:39 PM, Daniel Carrera dcarr...@gmail.com wrote: Hello, The functions `ismatch` and `contains` do similar things. Therefore, I think they should have a consistent API. Currently they receive parameters in reverse order: ismatch(r"foo", haystack) contains(haystack, "foo") I always forget which one goes in which direction so I have to look it up. Since Julia is still a young language, I was wondering if there is any interest in reviewing the API to help ensure consistency between similar functions. Cheers, Daniel.
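For readers on current Julia: this inconsistency was later resolved. Julia 0.7/1.0 deprecated `ismatch` in favour of `occursin(needle, haystack)`, which accepts either a string or a regex, and Julia 1.5 added `contains(haystack, needle)` as the argument-reversed spelling. A short sketch (the example string is arbitrary):

```julia
haystack = "a foo b"

# occursin(needle, haystack) works for both plain strings and regexes.
found_str = occursin("foo", haystack)
found_re  = occursin(r"fo+", haystack)

# Julia 1.5 and later also provide the haystack-first form.
found_str2 = contains(haystack, "foo")
found_re2  = contains(haystack, r"fo+")
```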
[julia-users] Repeating names in inner constructors?
I’ve noticed that a lot of people use argument names that differ from the field names when writing inner constructors, so that you see code like: type Foo a::Int function Foo(alpha::Int) magic(alpha) new(alpha) end end Would this ever be necessary to avoid confusion about names? I’ve started reusing the exact field name and it seems to work fine. Am I going to run into a subtle bug? — John
Re: [julia-users] Repeating names in inner constructors?
Great. That is really nice. — John On Jan 11, 2014, at 5:34 PM, Stefan Karpinski stefan.karpin...@gmail.com wrote: Nope. This is one of the nice things about the design. On Jan 11, 2014, at 8:16 PM, John Myles White johnmyleswh...@gmail.com wrote: I’ve noticed that a lot of people use argument names that differ from the field names when writing inner constructors, so that you see code like: type Foo a::Int function Foo(alpha::Int) magic(alpha) new(alpha) end end Would this ever be necessary to avoid confusion about names? I’ve started reusing the exact field name and it seems to work fine. Am I going to run into a subtle bug? — John
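A sketch of the pattern Stefan confirms, in current Julia syntax (`struct` rather than `type`). The non-negativity check is a hypothetical stand-in for `magic(alpha)`. Inside the constructor, `a` refers to the argument, and `new(a)` stores it in the field of the same name.

```julia
struct Foo
    a::Int
    # The argument has exactly the same name as the field; there is no clash,
    # because fields are only reachable as obj.a, never as a bare `a`.
    function Foo(a::Int)
        # Hypothetical stand-in for magic(alpha): validate the input.
        a >= 0 || throw(ArgumentError("a must be non-negative"))
        return new(a)
    end
end
```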
Re: [julia-users] Incorrect behaviour defining methods with different argument definitions in separate modules? (using method in module Main conflicts with an existing identifier)
Freddy, This is definitely one of the more confusing things about Julia, but it’s the best current solution anyone has proposed. The problem with your example is that methods can only be extended to work on new types if you make their provenance clear. In your example, you would do something like the following: module A export f f(s::String) = "Some operation with a String"; end module B A.f(b::Bool) = "Some operation with a boolean"; end Absent an explicit qualification of the origin of the “f” name in module B, Julia assumes that the f in B is a separate function that merely shares the name, so 'using B' cannot bring it into Main alongside the f from A. — John On Jan 12, 2014, at 9:48 AM, Freddy Snijder fre...@visionscapers.com wrote: Hello Julia Users, I'm new to Julia and came across some behaviour of Julia, related to methods, that I didn't expect. Case A) In the REPL, when I define two methods, I get the behaviour I expect: julia> f(s::String) = "Some operation with a String"; julia> f(b::Bool) = "Some operation with a boolean"; julia> f f (generic function with 2 methods) So far, so good. Case B) Now if I have a file with this code and load it into a fresh REPL session (using 'include'): module A export f f(s::String) = "Some operation with a String"; end module B export f f(b::Bool) = "Some operation with a boolean"; end then, when stating 'using A' and 'using B', I get a warning that there is a conflict with an existing f: julia> using A julia> f f (generic function with 1 method) julia> using B Warning: using B.f in module Main conflicts with an existing identifier. julia> f f (generic function with 1 method) I would have expected that Julia would see this as the same method with two different argument definitions, just like in Case A). I have multiple modules that define the same methods for different composite types, which seems a normal way of working to me. What am I doing wrong? The way Julia currently handles this seems incorrect to me ... 
I'm interested to hear your input!
Kind regards, Freddy PS : I'm on Julia Version 0.3.0-prerelease+584 (2013-12-19 22:26 UTC), Commit 06458fa* (2 days old master), x86_64-apple-darwin13.0.0
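A runnable version of John's suggestion in current Julia syntax. Two details are my additions rather than part of John's snippet: module `B` first makes its sibling `A` visible with `import ..A` (both modules live in `Main`), and the top-level module is brought in with `using .A`. After that, `f` is a single function with both methods.

```julia
module A
export f
f(s::String) = "Some operation with a String"
end

module B
import ..A   # make the sibling module A visible inside B
# Qualifying the name adds a method to A.f instead of creating a new B.f.
A.f(b::Bool) = "Some operation with a boolean"
end

using .A     # brings A.f, now carrying both methods, into Main
```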
Re: [julia-users] ANN: LibGit2 bindings
This seems really awesome. Amazing work, Jake! — John On Jan 11, 2014, at 9:56 PM, Jake Bolewski jakebolew...@gmail.com wrote: Link https://github.com/jakebolewski/LibGit2.jl On Sunday, January 12, 2014 12:55:27 AM UTC-5, Jake Bolewski wrote: Hi everyone, I've been working on LibGit2 bindings for Julia over the past month or so, steadily porting over the test suite from Ruby's rugged library. Almost all of the tests have been rewritten and are now passing. Most of the testing has been done on the development branch of the libgit library and on Linux. Please run the test suite and submit an issue if (when) it breaks on your system. Hopefully once this matures some more it will enable Pkg to be rewritten using libgit. See: https://github.com/JuliaLang/julia/issues/4158, https://github.com/JuliaLang/julia/pull/4866 If you have any spare cycles please help! The API could be refactored quite a bit. Hopefully this is a good base to work from. Best, Jake
Re: [julia-users] New method definition not being picked up
This is one of the main outstanding quirks about Julia that will get resolved at some point in the nearish future. See https://github.com/JuliaLang/julia/issues/265 for more details. — John On Jan 12, 2014, at 4:02 PM, Andrew Burrows burro...@gmail.com wrote: Hi I'm rather new to Julia, but I've come across some rather puzzling behaviour of the language. The following code works fine and the assert passes: a(x) = 12345 b(x) = a(x) a(x::Int64) = 1000 @assert b(1)== 1000 But this near identical code does not, throwing an assertion error: a(x) = 12345 b(x) = a(x) b(1) # --- This line is new a(x::Int64) = 1000 @assert b(1)== 1000 It would seem that the definition of a(x) is being cached but in both cases this assert passes fine: @assert a(1) == 1000 Also this almost identical code works fine: a(x) = 12345 b(x) = a(x) a(1) # --- This line is now calling a not b a(x::Int64) = 1000 @assert b(1)== 1000 @assert a(1)== 1000 Is this behaviour a bug or is it by design? Am I doing something wrong or is there something I can do to disable what ever is caching my method definition or is there any way to work around it? Cheers Andy
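For readers on current Julia: issue #265 was resolved in Julia 0.6 by the "world age" mechanism, so adding a method invalidates previously compiled callers. A sketch of Andrew's second snippet run at top level, using `Int` instead of `Int64` so it also matches on 32-bit builds:

```julia
a(x) = 12345
b(x) = a(x)
b(1)               # compiles b for an Int argument

a(x::Int) = 1000   # a more specific method added afterwards

# Since Julia 0.6, b's compiled code is invalidated, so b sees the new method.
result = b(1)
```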
Re: [julia-users] A few questions I couldn't answer by myself
1. I think this is not possible, but I might be wrong. 2. Tuples have gotten a lot more efficient recently. Others will have to comment more on their relative merits vs. immutable composite types, which I prefer for explicitness and simpler integration with the dispatch system. 3. No idea about this. 90 MB isn’t much of an issue for the kind of work I do. 4. Blah{A} is a family of types, each of which is different for a specific value of A. The untyped version type Blah; a; end has a single type: its `a` field always has type Any, which is never tightened in response to data. 5. I think this is possible, but don’t know for sure. 6. Documentation is a major issue that should move forward in the next few months. Right now it is not possible to integrate your own functions with the help system. Hope that helps. Others will probably expand on my answers. — John On Jan 12, 2014, at 6:39 PM, Andy M 0andrewmart...@gmail.com wrote: I've been following and experimenting with Julia for a little while now, and I have encountered questions that I haven't managed to answer by reading or googling. An answer to any of them would be greatly appreciated. 1. Is there any way to retrieve the location of the definition of a variable or a type? I know I can use @which to find method definitions, but that's all I know how to find. 2. Are tuples less memory efficient than immutable composite types? If so, why is this? I got the impression that they are after reading various different articles and comments, so maybe I have just misunderstood something. 3. Why is Julia's memory usage so high? When I open the interpreter (in linux) it stabilises around 90MB. If I call Pkg.installed(), it jumps to 165MB, and stays there. Calling gc() doesn't reduce it either. Is it an inevitable consequence of the language's design? Or perhaps an issue that is being worked on? Or is it just not that important to the language's target users? 4.
The documentation suggests that type Blah; a; end is less efficient than type Blah{A}; a::A; end. If so, why does the former not default to the behaviour of the latter? Is it to avoid excess code generation? Or perhaps the latter representation has some undesirable behaviour? 5. Is it currently possible to pass a struct to a C function? I found documentation saying that it isn't possible, but there are github issues which suggest the problem has been worked on. 6. Is there a way to document a function, method, type, variable or module, such that the documentation can be retrieved in the interpreter? I mean something like javadocs or python docstrings. If not, is something like this going to be added? Sorry for asking so many questions all at once. I am considering starting quite a big project in Julia, and I think my timezone has made it difficult to find help in the IRC channel.
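A small sketch of John's answer to question 4, in current Julia syntax (`struct` is immutable; the mutable equivalent of the old `type` is `mutable struct`). The untyped field is declared as `Any` no matter what is stored, while each parameter value of the parametric version is a distinct type with a concrete field, which is what lets the compiler store it inline.

```julia
# Untyped field: the declared field type is Any, whatever value is stored.
struct Untyped
    a
end

# Parametric field: each concrete A gives a distinct type with a concrete field.
struct Typed{A}
    a::A
end

u = Untyped(1.0)
t = Typed(1.0)    # the default constructor infers A = Float64
```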
Re: [julia-users] Edit Distance?
Thanks! I'll have to check that out. I was able to translate some of the Wikipedia code fast enough to get something working for my purposes. -- John On Jan 14, 2014, at 3:18 PM, Matthias BUSSONNIER bussonniermatth...@gmail.com wrote: On 14 Jan 2014, at 15:08, John Myles White wrote: Is there a package out there to compute edit distances between strings? I started at some point, never really finished. https://github.com/carreau/Diff.jl -- M -- John
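For reference, a sketch of the standard two-row dynamic-programming Levenshtein distance (unit cost for insertion, deletion, and substitution). The function name `levenshtein` is hypothetical; this is neither Diff.jl's API nor John's translation.

```julia
function levenshtein(s::AbstractString, t::AbstractString)
    a = collect(s)              # work on characters, so multi-byte UTF-8 is handled
    b = collect(t)
    m, n = length(a), length(b)
    prev = collect(0:n)         # distance from "" to each prefix of t
    cur = similar(prev)
    for i in 1:m
        cur[1] = i              # delete all i characters of s's prefix
        for j in 1:n
            cost = a[i] == b[j] ? 0 : 1
            cur[j+1] = min(prev[j+1] + 1,    # deletion
                           cur[j] + 1,       # insertion
                           prev[j] + cost)   # substitution or match
        end
        prev, cur = cur, prev   # the finished row becomes the previous row
    end
    return prev[n+1]
end
```

Keeping only two rows makes memory use proportional to the length of the second string instead of the product of both lengths.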
Re: [julia-users] Can I write a macro that defines a function?
To be honest, I don’t fully understand what goes wrong here, but this way of doing it does work: macro bar(num) ex = Expr(:(=), esc(Expr(:call, :foo, :x)), esc(num)) return ex end @bar 5 foo(1) I suspect that, in your example, there’s an attempt to evaluate the sub-expressions in the wrong scope. For example, this code shouldn’t (and doesn’t) work: macro bar(num) ex = Expr(:(=), Expr(:call, :foo, :x), esc(num)) return ex end @bar 5 foo(1) — John On Jan 14, 2014, at 4:47 PM, Eric Davies iam...@gmail.com wrote: julia> macro bar(num) :(foo(x) = $num) end julia> @bar 5 foo#27 (generic function with 1 method) julia> foo ERROR: foo not defined It appears that variable/function definitions in macros are mangled somehow. Is there any way to define a function or set a variable in a macro (s.t. that definition/assignment occurs in the calling scope)?
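The renaming Eric sees (`foo#27`) comes from macro hygiene, which gensyms names assigned inside a macro's output. A common idiom, sketched here for current Julia, is to escape the whole generated definition rather than its parts; that turns hygiene off for the entire expression, so `foo` and `x` are created in the caller's scope.

```julia
macro bar(num)
    # esc(...) disables hygiene for the whole definition, so `foo` is defined
    # in the calling module instead of being renamed to a gensym like foo#27.
    return esc(:(foo(x) = $num))
end

@bar 5
```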
Re: [julia-users] Re: A few questions I couldn't answer by myself
I think a new Python interpreter session might not be the closest comparison for Julia since Python loads almost nothing by default, whereas Julia imports a ton of functionality by default. R is much more like Julia in this regard. Consistent with that hypothesis, on my machine, R uses 38 MB and Julia uses 41 MB. I suspect that Julia without most of its functionality could take up much less memory. — John On Jan 14, 2014, at 6:56 PM, Andy M 0andrewmart...@gmail.com wrote: Thank you all very much for your answers, they have been extremely helpful! In summary, it seems like there is a fairly clear answer to all but one question, which is the question about Julia's memory usage. I am still puzzled by what it is actually being used for. For comparison, if I start a python interpreter it uses less than 5MB of RAM. I expected Julia's code generation to consume more memory than an interpreter, but I did not expect it to be anywhere near that much. I suppose the issue with Tuple/immutable type performance also isn't completely clear. Andy, would you be willing to collect the responses you found helpful and add them to the FAQ? https://github.com/JuliaLang/julia/blob/master/doc/manual/faq.rst You can just click edit on that page, no need to explicitly deal with git. I've been keeping track of many questions that I have found answers to (not just those answered here), and I've been writing it all up in a desktop wiki. I'm not sure what would be best to add to the FAQ at this point, but I am hoping to work out a good format for sharing my experiences soon. Either by contributing to the FAQ, by writing a blog post, or both.
Re: [julia-users] Re: Julia computational efficiency vs C vs Java vs Python vs Cython
The arguments against changing are pretty strong, but I’d really like it if Julia did a bit less automatic promotion. For example, it would be great if sum(x::T…) returned a value of type T. — John On Jan 15, 2014, at 5:32 AM, Stefan Karpinski ste...@karpinski.org wrote: We already provide all the necessary intrinsics for 32-bit arithmetic, so it's pretty easy to write a module that redefines arithmetic operations on integers to do this, but it's definitely some work that would need to be done. I'd be in favor of having this option. The biggest issue is that it would only apply in lexical scope. I.e. even if Int16 + Int16 = Int16, you'd still have sum(Int16[]) = Int, since sum is defined in Base. So to get the full effect, this would probably need to be a global switch, at which point you're really just talking about running in 32-bit mode on a 64-bit system. On Wed, Jan 15, 2014 at 5:32 AM, Miles Lubin miles.lu...@gmail.com wrote: Just to throw in my two cents, I don't think it's the right approach to brush off a class of performance optimizations that has a valid use case in practice and can lead to a 4x speedup. There should at least be *some* way to access nonpromoting integer operations, even if the default operators do promote.
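A hedged check against current Julia, where the promotion rules have changed since this thread: `Int32 + Int32` now returns `Int32`, while `sum` still widens small integer types to the native `Int`, as Stefan describes. The benchmark code was attached to the original post and is not reproduced here, so `primes_list` below is a hypothetical reconstruction of the Cython tutorial's trial-division algorithm, parameterized over the integer type so 32-bit arithmetic can be requested without wrapping every literal. No timings are claimed.

```julia
# Int32 arithmetic: + on two Int32 values returns an Int32.
x = Int32(3)
y = Int32(4)
s = x + y

# sum widens small integer types to the native Int.
v = Int32[1, 2, 3]
total = sum(v)

# Hypothetical sketch of the tutorial's prime loop with a chosen integer type T:
# primes_list(Int32, k) does the candidate arithmetic in Int32.
function primes_list(::Type{T}, kmax::Integer) where {T<:Integer}
    p = zeros(T, kmax)   # primes found so far
    k = 0                # number of primes found (an ordinary Int index)
    n = T(2)             # current candidate, kept in type T
    while k < kmax
        i = 0
        # Trial division by the primes found so far.
        while i < k && n % p[i+1] != 0
            i += 1
        end
        if i == k        # no divisor found: n is prime
            k += 1
            p[k] = n
        end
        n += one(T)
    end
    return p
end
```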
Re: [julia-users] New install on OSX 10.9 GLM package does not work
We've unfortunately done a bad job of keeping those packages compatible with 0.2. I'll try to fix as much as I can today. -- John On Jan 15, 2014, at 8:48 AM, Corey Sparks corey.sparks.u...@gmail.com wrote: Dear List, I just installed Julia 0.2.0 last night and was trying to get the GLM package going, when I try to load it and the RDatasets packages, I get: julia using RDatasets, GLM Warning: could not import Base.foldl into NumericExtensions Warning: could not import Base.foldr into NumericExtensions Warning: could not import Base.sum! into NumericExtensions Warning: could not import Base.maximum! into NumericExtensions Warning: could not import Base.minimum! into NumericExtensions Warning: could not import Base.PAIRWISE_SUM_BLOCKSIZE into NumericExtensions ERROR: TernaryFunctor not defined in include at boot.jl:238 in include_from_node1 at loading.jl:114 in include at boot.jl:238 in include_from_node1 at loading.jl:114 in reload_path at loading.jl:140 in _require at loading.jl:58 in require at loading.jl:43 at /Users/ozd504/.julia/GLM/src/lm.jl:22 at /Users/ozd504/.julia/GLM/src/GLM.jl:76 It looks like something in GLM is broken, does anyone have advice on this? Thank you Corey
Re: [julia-users] Julia computational efficiency vs C vs Java vs Python vs Cython
+1 for Iain’s point of view. — John On Jan 15, 2014, at 5:16 PM, Iain Dunning iaindunn...@gmail.com wrote: From a philosophical POV alone, I think it's inconsistent that we a) Don't save people from overflows, but b) Silently do Int32 math as Int64 behind the scenes to presumably save themselves from themselves I think the overflow behaviour surprises some people, but only because they've been trained on Python etc. instead of C, but the Int32 behaviour would surprise pretty much everyone given how Julia normally acts (as the manual says, it falls into the more "no automatic conversion" family of languages) On Wednesday, January 15, 2014 4:28:15 PM UTC-5, Földes László wrote: Sorry for the wrong info, I was switching between a 32 bit and a 64 bit machine (SSH terminal), and I just happened to run the script on the 32 bit machine... On Wednesday, January 15, 2014 12:37:07 AM UTC+1, Przemyslaw Szufel wrote: Foldes, I went for your solution and got a time increase from 2.1 seconds (64bit integers) to 17.78 seconds (32 bit down-casting). Seems like casting is not cheap... Any other ideas or possibilities? All best, Przemyslaw P.S. Naturally I realize that this is a toy example and normally in typical production code we would rather use real numbers for computations, not ints. I am asking just out of curiosity ;-) On Wednesday, 15 January 2014 00:25:20 UTC+1, Földes László wrote: You can force the literals by enclosing them in int32(): p = [int32(0) for i=1:2] result = [int32(0) for i=1:2] k = int32(0) n = int32(2) while k < int32(2) i = int32(0) On Wednesday, January 15, 2014 12:04:23 AM UTC+1, Przemyslaw Szufel wrote: Simon, Thanks! I changed in Cython to def primes_list(int kmax): cdef int k, i cdef long long n cdef long long p[2] and now I am getting 2.1 seconds - exactly the same time as Julia and Java with longs...
Since the computational difference between 64bit longs and 32bit ints is so high - is there any way to rewrite my toy example to force Julia to do 32 bit int calculations? All best, Przemyslaw Szufel On Tuesday, 14 January 2014 23:55:12 UTC+1, Simon Kornblith wrote: In C, long is only guaranteed to be at least 32 bits (IIRC it's 64 bits on 64-bit *nix but 32-bit on 64-bit Windows). long long is guaranteed to be at least 64 bits (and is 64 bits on all systems I know of). Simon On Tuesday, January 14, 2014 5:46:04 PM UTC-5, Przemyslaw Szufel wrote: Simon, Thanks for the explanation! In Java int is 32 bit as well. I have just replaced ints with longs in Java and found out that now I get the Java speed also very similar to Julia. However I tried in Cython: def primes_list(int kmax): cdef int k, i cdef long n cdef long p[2] ... and surprisingly the speed did not change...at first I thought that maybe something did not compile or is in cache - but I made sure - it's not the cache. Cython speed remains unchanged regardless of whether I use int or long? I know that now it becomes a question about another language...but maybe someone can explain? All best, Przemyslaw Szufel On Tuesday, 14 January 2014 23:29:40 UTC+1, Simon Kornblith wrote: With a 64-bit build, Julia integers are 64-bit unless otherwise specified. In C, you use ints, which are 32-bit. Changing them to long long makes the C code perform similarly to the Julia code on my system. Unfortunately, it's hard to operate on 32-bit integers in Julia, since + promotes to 64-bit by default (am I missing something)? Simon On Tuesday, January 14, 2014 4:32:16 PM UTC-5, Przemyslaw Szufel wrote: Dear Julia users, I am considering using Julia for computational projects. As a first step to get a feeling for the new language, I tried to benchmark Julia speed against other popular languages. I used an example code from the Cython tutorial: http://docs.cython.org/src/tutorial/cython_tutorial.html [ the code for finding n first prime numbers].
Rewriting the code in different languages and measuring the times on my Windows laptop gave me the following results:
Language | Time in seconds (less = better)
Python | 65.5
Cython (with MinGW) | 0.82
Java | 0.64
Java (with -server option) | 0.64
C (with MinGW) | 0.64
Julia (0.2) | 2.1
Julia (0.3 nightly build) | 2.1
All the code for my experiments is attached to this post (Cython and Python are both run starting from the prim.py file). The thing that worries me is that Julia takes much, much longer than Cython... I am a beginner to Julia and would like to kindly ask what I am doing wrong with my code. I start the Julia console and use the command include("prime.jl") to execute it. This code looks very simple and I think the compiler should be able to optimise it to at least the speed of Cython? Maybe my code has been written in a non-Julia-style way and the compiler has
Re: [julia-users] duplicate a type
I don’t know offhand how to do this, but I’d look at the code for xdump, which shows that the necessary introspection operations exist: Foo::DataType <: Any a::Int64::DataType <: Signed b::Float64::DataType <: FloatingPoint — John On Jan 17, 2014, at 10:16 AM, Simon Byrne simonby...@gmail.com wrote: I want to define a new composite type with exactly the same fields as another type. Is there an easy way to do this? The original type is not parametric. Alternatively, is there a way I can figure out the type of a field of a composite type Foo without constructing an object of type Foo? -Simon
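In current Julia (1.1 and later), `fieldnames`, `fieldtypes`, and `fieldtype` answer Simon's second question directly on the type, without constructing a `Foo`. The sketch below also feeds them to `@eval` to define a hypothetical `Bar` with the same fields; this is one possible metaprogramming approach, not an established API for copying types.

```julia
struct Foo
    a::Int64
    b::Float64
end

# Introspect the type itself; no Foo instance is needed.
names = fieldnames(Foo)    # (:a, :b)
types = fieldtypes(Foo)    # (Int64, Float64), available since Julia 1.1
tb = fieldtype(Foo, :b)    # Float64

# Build `name::Type` declarations and splice them into a new struct definition.
fields = [:($n::$T) for (n, T) in zip(names, types)]
@eval struct Bar
    $(fields...)
end
```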
[julia-users] DataFrames / DataArrays updated
As a consequence of renaming Stats to StatsBase, I’ve had to update DataFrames and DataArrays. This means that everyone working with those libraries is now in sync with master again. That brings with it a lot of changes that may break some code. To help minimize breakage, here are the most obvious changes that might affect you. (1) We now offer @data / @pdata macros to write out literal DataArrays and PooledDataArrays. They need a little bit more refinement to deal with edge cases, but they’re a big improvement over the previous system. Examples of usage below: @data [1, 2, NA, 4] @data [1 2; NA 4] @pdata [1, 2, NA, 4] @pdata [1 2; NA 4] You can also do this with variables (as long as they’re not NAs): a, b, c, d = 1, 2, 3, 4 @data [a, b, c, d] @data [a b; c d] The unfortunate edge case is that the following will fail: a, b, c, d = 1, 2, 3, NA @data [a, b, c, d] @data [a b; c d] (2) To convert other AbstractArrays to DataArrays / PooledDataArrays, please use the data and pdata functions: data([1, 2, 3, 4]) data(1:3) pdata([1, 2, 3, 4]) pdata(1:3) We’ve removed a lot of the constructors for DataArrays and PooledDataArrays that had no parallel to anything in Base, where there are very few valid constructors for Arrays. If you use things like DataArray(1:10), it will be broken now. Please switch to using the data() function. — John
[julia-users] UTF8 byte indexing
I suspect I’m missing something, but this seems odd to me: julia> s = string('ñ') "ñ" julia> s[2] ERROR: invalid UTF-8 character index julia> s[2:2] “ — John
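What is happening: Julia string indices are byte (code unit) offsets, and `'ñ'` (U+00F1) takes two bytes in UTF-8 (0xc3 0xb1), so index 2 falls in the middle of the character. A sketch for current Julia, where the error is a `StringIndexError` rather than the 0.2 message above; `nextind` steps from one character to the next, and `codeunits` gives the raw bytes.

```julia
s = string('ñ')   # U+00F1, encoded in UTF-8 as the two bytes 0xc3 0xb1

nbytes = ncodeunits(s)                 # 2 code units (bytes)
nchars = length(s)                     # 1 character
ok1, ok2 = isvalid(s, 1), isvalid(s, 2)   # only byte 1 starts a character
i_next = nextind(s, 1)                 # 3: the index just past that character
raw = codeunits(s)[2]                  # the raw second byte

# Indexing into the middle of a character is an error.
mid_char_error = try
    s[2]
    false
catch err
    err isa Base.StringIndexError
end
```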
Re: [julia-users] How to install a Package from a github branch
My fork of SQLite is very different from master. It represents most of my work pushing for Julia to have a DBI module that lets us write generic code for database access. I’m hoping to finish my work on writing a DBI package plus drivers for SQLite and MySQL very soon. I would hold off on using my fork until there’s an official release. — John On Jan 20, 2014, at 8:13 AM, Stefan Karpinski ste...@karpinski.org wrote: That did successfully install the package. However, as per the documentation for Pkg.clone, it did so under the package name jmw. Did you mean for the second argument to be a branch name? You can checkout a specific branch after cloning the package using the Pkg.checkout command. Also, SQLite is an official, registered package, so installing it via Pkg.clone is a bit unusual. Do you need John's fork for some particular reason? On Mon, Jan 20, 2014 at 8:10 AM, Sharmila Gopirajan Sivakumar sharmila.gopira...@gmail.com wrote: Hi, I want to install a package from a github branch, specifically, https://github.com/johnmyleswhite/SQLite.jl/tree/jmw . I tried the following command Pkg.clone("https://github.com/johnmyleswhite/SQLite.jl.git", "jmw") INFO: Cloning jmw from https://github.com/johnmyleswhite/SQLite.jl.git INFO: Computing changes... INFO: No packages to install, update or remove. Julia is not able to install the package. Is it possible to locally checkout the code and install from source? Thank you. Regards, Sharmila
Re: [julia-users] Re: Higher order derivatives in Calculus
I would love to see lots of improvements in the Calculus package. The interface is kind of wonky and there’s probably a lot of places where we’re getting less than ideal results. But I currently own far too many of Julia’s packages. If other people want to take some of them over, it will radically improve my life. As things stand, it’s literally impossible for me to keep up with the workload that package maintenance would involve. — John On Jan 20, 2014, at 10:54 AM, Ivar Nesje iva...@gmail.com wrote: The Calculus package could definitely be much better if someone with know-how and time would improve it. Unfortunately it seems like @johnmyleswhite does not maintain this package anymore, and nobody has taken up the ball. On Monday, 20 January 2014 19:40:28 UTC+1, Hans W Borchers wrote: I looked into the Calculus package and its derivative functions. First, I got errors when running examples from the README file: julia> second_derivative(x -> sin(x), pi) ERROR: no method eps(DataType,) in finite_difference at /Users/HwB/.julia/Calculus/src/finite_difference.jl:27 in second_derivative at /Users/HwB/.julia/Calculus/src/derivative.jl:67 Then I was a bit astonished to see rather inaccurate results such as julia> abs(second_derivative(sin, 1.0) + sin(1.0)) 6.647716624952338e-7 while applying the standard central formula for second derivatives, (f(x+h) - 2*f(x) + f(x-h)) / h^2, with the step length eps^0.25 suggested by theory (for second derivatives) results in a much better value: julia> h = eps()^0.25; julia> f = sin; x = 1.0; julia> df = (sin(x+h) - 2*sin(x) + sin(x-h)) / h^2 -0.8414709866046906 julia> abs(df + sin(1.0)) 1.7967940468821553e-9 The functions for numerical differentiation in Calculus look quite involved; maybe it would be preferable to apply known approaches derived from Taylor series. Even the fourth order derivative will in this case lead to an absolute error below 1e-05!
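Hans's central-difference formula, wrapped as a small function for reuse. The name `central_second_derivative` is hypothetical and not part of Calculus.jl; the sketch assumes `f` is smooth near `x` and uses his step `h = eps()^(1/4)` by default. With that step, the truncation error (about h²/12 times the fourth derivative) and the rounding error (about eps/h²) are kept small together.

```julia
# Central second difference (f(x+h) - 2f(x) + f(x-h)) / h^2.
function central_second_derivative(f, x::Float64; h::Float64 = eps()^0.25)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h^2
end

d2 = central_second_derivative(sin, 1.0)   # the exact value is -sin(1.0)
```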
Re: [julia-users] How to install a Package from a github branch
Keyword arguments seem like a much better approach.

— John

On Jan 20, 2014, at 11:23 AM, Stefan Karpinski ste...@karpinski.org wrote:

This makes me wonder if the API should change. Maybe keyword arguments for both the package name and branch?

On Mon, Jan 20, 2014 at 1:25 PM, Ivar Nesje iva...@gmail.com wrote:

Can't you just do

Pkg.clone("https://github.com...", "pkgname")
Pkg.checkout(pkgname, branch)

Ivar

On Monday, January 20, 2014 at 19:18:23 UTC+1, Sharmila Gopirajan Sivakumar wrote:

Hi Stefan,

Thank you for responding. Extending my understanding of Pkg.checkout, I assumed that Pkg.clone(url, name) would clone the branch 'name' of the repo at 'url'. My bad. It still installs only the master branch. Right now there doesn't seem to be support for installing a branch or tag through Pkg.clone(). I want to use John's fork because it is DBI-compliant and supports prepared statements and parameter binding, which the official version doesn't.

I just now read John Myles White's response too. While I accept his idea, would it not be useful to be able to install from a branch or tag of an unregistered package? If you feel that is a valid feature, I would be happy to help add it.

Regards,
Sharmila

On Mon, Jan 20, 2014 at 9:43 PM, Stefan Karpinski ste...@karpinski.org wrote:

That did successfully install the package. However, as per the documentation for Pkg.clone, it did so under the package name jmw. Did you mean for the second argument to be a branch name? You can check out a specific branch after cloning the package using the Pkg.checkout command. Also, SQLite is an official, registered package, so installing it via Pkg.clone is a bit unusual. Do you need John's fork for some particular reason?

On Mon, Jan 20, 2014 at 8:10 AM, Sharmila Gopirajan Sivakumar sharmila@gmail.com wrote:

Hi,

I want to install a package from a GitHub branch, specifically https://github.com/johnmyleswhite/SQLite.jl/tree/jmw . I tried the following command:

Pkg.clone("https://github.com/johnmyleswhite/SQLite.jl.git", "jmw")
INFO: Cloning jmw from https://github.com/johnmyleswhite/SQLite.jl.git
INFO: Computing changes...
INFO: No packages to install, update or remove.

Julia is not able to install the package. Is it possible to check out the code locally and install from source?

Thank you.

Regards,
Sharmila
[julia-users] Ambiguity Warnings
The recent SharedArray change to Base created some new ambiguity warnings for DataFrames. Warning: New definition getindex(AbstractArray{T,1},Indexer) at /Users/johnmyleswhite/.julia/DataFrames/src/indexing.jl:195 is ambiguous with: getindex(SharedArray{T,N},Any...) at sharedarray.jl:156. To fix, define getindex(SharedArray{T,1},Indexer) before the new definition. — John
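The warning describes a classic method ambiguity: one method is more specific in the first argument, the other in the second, and neither covers their intersection. The sketch below reproduces the pattern with hypothetical stand-ins (`Shared` for SharedArray, `Indexer`/`ColumnIndexer` for the DataFrames indexer type, `lookup` for `getindex`) so it runs without either package. In current Julia the ambiguity surfaces as a `MethodError` when the ambiguous call is made, rather than as a warning at definition time; the fix is the same, namely defining a method for the intersection.

```julia
# Hypothetical SharedArray-like wrapper (not Base's SharedArray).
struct Shared{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(s::Shared) = size(s.data)
Base.getindex(s::Shared, i::Int) = s.data[i]

# Hypothetical stand-ins for the package's indexer types.
abstract type Indexer end
struct ColumnIndexer <: Indexer end

# "Package" method: more specific in the second argument.
lookup(v::AbstractVector, i::Indexer) = :vector_indexer
# "Base" method: more specific in the first argument.
lookup(s::Shared, i) = :shared_any

s = Shared([1, 2, 3])

# Neither method is more specific for (Shared, ColumnIndexer), so the call is ambiguous.
was_ambiguous = try
    lookup(s, ColumnIndexer())
    false
catch err
    err isa MethodError
end

# The fix the warning suggests: define the intersection explicitly.
lookup(s::Shared, i::Indexer) = :shared_indexer

println(was_ambiguous)
println(lookup(s, ColumnIndexer()))
```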
[julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
As I said in another thread recently, I am currently the lead maintainer of more packages than I can keep up with. I think it’s been useful for me to start so many different projects, but I can’t keep maintaining most of my packages given my current work schedule. Without Simon Kornblith, Kevin Squire, Sean Garborg and several others doing amazing work to keep DataArrays and DataFrames going, much of our basic data infrastructure would have already become completely unusable. But even with the great work that’s been done on those packages recently, there’s still a lot of additional design work required. I’d like to free up some of my time to do that work.

To keep things moving forward, I’d like to propose a few radical New Year’s resolutions for the packages I work on.

(1) We need to stop adding functionality and focus entirely on improving the quality and documentation of our existing functionality. We have way too much prototype code in DataFrames that I can’t keep up with. I’m about to make a pull request for DataFrames that will remove everything related to column groupings, database-style indexing and Blocks.jl support. I absolutely want to see us push all of those ideas forward in the future, but they need to happen in unmerged forks or separate packages until we have the resources needed to support them. Right now, they make an overwhelming maintenance challenge even more onerous.

(2) We can’t support anything other than the master branch of most JuliaStats packages, except possibly for Distributions. I personally don’t have the time to simultaneously keep stuff working with Julia 0.2 and Julia 0.3. Moreover, many of our basic packages aren’t mature enough to justify supporting older versions. We should do a better job of supporting our master releases and not invest precious time trying to support older releases.

(3) We need to make more of DataArrays and DataFrames reflect the Julian worldview. Lots of our code uses an interface that is incongruous with the interfaces found in Base. Even worse, a large chunk of code has type-stability problems that make it very slow: comparable code that uses normal Arrays is 100x faster. We need to develop new idioms and new strategies for making code that interacts with type-destabilizing NAs faster. More generally, we need to make DataArrays and DataFrames fit in better with Julia when Julia and R disagree. Following R’s lead has often led us astray because R doesn’t share Julia’s strengths or weaknesses.

(4) Going forward, there should be exactly one way to do most things. The worst part of our current codebase is that there are multiple ways to express the same computation, but (a) some of them are unusably slow and (b) some of them don’t ever get tested or maintained properly. This is closely linked to the excess proliferation of functionality described in Resolution 1 above. We need to start removing stuff from our packages and making the parts we keep both reliable and fast.

I think we can push DataArrays and DataFrames to 1.0 status by the end of this year. But I think we need to adopt a new approach if we’re going to get there. Lots of stuff needs to get deprecated, and what remains needs a lot more testing, benchmarking and documentation.

— John
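To make resolution (3) concrete: a function that returns either a number or an NA-like sentinel has a return type the compiler can only describe as a `Union`, so code consuming it may need runtime type checks or dynamic dispatch. One type-stable alternative is to store plain values next to a Boolean mask. `first_or_na`, `MaskedVector` and `sum_present` below are hypothetical illustrations in current Julia, not the DataArrays implementation; `Base.return_types` is used only to display what type inference concludes.

```julia
# Type-unstable: returns either a Float64 or a String sentinel, so the
# inferred return type is a Union of the two.
first_or_na(x::Vector{Float64}) = isempty(x) ? "NA" : x[1]

# Hypothetical value-plus-mask layout: plain Float64 storage and a BitVector
# marking missing entries. Loops over it stay concretely typed.
struct MaskedVector
    data::Vector{Float64}
    na::BitVector          # true marks a missing entry
end

# Sum only the entries that are not marked missing.
function sum_present(v::MaskedVector)
    s = 0.0
    for i in eachindex(v.data)
        if !v.na[i]
            s += v.data[i]
        end
    end
    return s
end

v = MaskedVector([1.0, 2.0, 4.0], BitVector([false, true, false]))
println(sum_present(v))
println(Base.return_types(first_or_na, (Vector{Float64},)))
println(Base.return_types(sum_present, (MaskedVector,)))
```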
Re: [julia-users] Ambiguity Warnings
For the moment this is solved by my having removed indexing.jl, which we didn’t really need. So I don’t think you need to do anything for the moment. But I’d broadly like to know if we have any strategy for avoiding these kinds of conflicts moving forward. It’s such an odd experience to find that my code raises warnings because of changes external to it.

— John

On Jan 20, 2014, at 7:52 PM, Amit Murthy amit.mur...@gmail.com wrote:

What would be the best way to solve this? A SharedArray type has a regular Array backing it, and we should make it usable wherever a regular Array can be used. Would the right thing to do be to:
- get a list of getindex methods that operate on a regular Array
- generate the same definitions for a SharedArray with a pass-through to the backing Array
- this would ensure that any further getindex definitions for an Array are automatically generated for SharedArray too

On Tue, Jan 21, 2014 at 1:04 AM, John Myles White johnmyleswh...@gmail.com wrote:
[...]
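An alternative to mirroring every `getindex` method of `Array` is to implement Julia's minimal `AbstractArray` interface (`size`, scalar `getindex`, and optionally `setindex!`) for the wrapper and let the generic fallbacks handle ranges, colons and logical masks. The sketch below uses a hypothetical `Backed` type in current Julia, not Base's `SharedArray`. Because it only adds methods for `Int` indices, it defines far fewer signatures that could collide with packages' broader methods; it assumes the generic fallbacks are fast enough for the use case.

```julia
# Hypothetical wrapper around a backing Array (not Base's SharedArray).
struct Backed{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

# Minimal AbstractArray interface; everything else comes from generic fallbacks.
Base.size(b::Backed) = size(b.data)
Base.IndexStyle(::Type{<:Backed}) = IndexLinear()
Base.getindex(b::Backed, i::Int) = b.data[i]
Base.setindex!(b::Backed, v, i::Int) = (b.data[i] = v)

b = Backed([1 2; 3 4])

row = b[1, :]            # Cartesian + colon indexing via fallbacks
evens = b[iseven.(b)]    # logical (mask) indexing via fallbacks
b[2, 2] = 40             # Cartesian assignment forwarded to setindex!

println(row)
println(evens)
println(b.data)
```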
Re: [julia-users] Higher order derivatives in Calculus
Just to chime in: the biggest problem with the Calculus package isn’t the absence of usable functionality, it’s that the published interface isn’t a very good one, and the more reliable interface, including things like finite_difference_hessian, isn’t exported. To fix this, we need someone to come in and do some serious design work, rethinking interfaces and removing outdated functionality. As Tim Holy mentioned, the combination of the unpublished finite difference methods and the automatic differentiation methods in DualNumbers should get you very far.

— John

On Jan 21, 2014, at 7:20 AM, Tim Holy tim.h...@gmail.com wrote:

On Tuesday, January 21, 2014 05:32:13 AM Hans W Borchers wrote:
When you say Calculus is not developed much at the moment, maybe it's too early for me to change.

Writing finite-differencing algorithms isn't that hard. That should not be a make-or-break issue for your decision about whether to use Julia. But don't underestimate the automatic differentiation facilities that have recently been added to Julia (https://github.com/scidom/DualNumbers.jl). Basically, AD computes numerical derivatives without the roundoff error of finite differencing, by defining a new numerical type that behaves somewhat like complex numbers but extracts the first derivative exactly. The key point is that it is a _numerical_ approach, so it doesn't rely on anything symbolic.

The one place you can't use AD is when your function relies on calling out to C (because C doesn't know about Julia's Dual type). But any function defined in Julia, including special functions like elliptic integrals, should be fine. For higher-order derivatives, you can do similar things with even fancier numerical types. Perhaps the new PowerSeries already does this? (I haven't looked.)

--Tim
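Tim's description of dual numbers can be sketched in a few lines: each value carries its derivative alongside it, and every arithmetic rule applies the chain rule, so no step size is involved. This is a minimal illustration in current Julia with a hypothetical `Dual` type and `derivative` function; it is not the DualNumbers.jl API and covers only the operations the example uses.

```julia
# Hypothetical minimal dual number: value plus first derivative.
struct Dual <: Number
    val::Float64
    der::Float64
end

# Arithmetic rules carry derivatives along (sum, product and quotient rules).
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:/(a::Dual, b::Dual) = Dual(a.val / b.val, (a.der * b.val - a.val * b.der) / b.val^2)

# Elementary functions apply the chain rule.
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)
Base.cos(a::Dual) = Dual(cos(a.val), -sin(a.val) * a.der)
Base.exp(a::Dual) = Dual(exp(a.val), exp(a.val) * a.der)

# Plain real numbers become constants (zero derivative) when mixed with duals.
Base.convert(::Type{Dual}, x::Real) = Dual(Float64(x), 0.0)
Base.promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual

# Seed the input with derivative 1 and read off the derivative of the output.
derivative(f, x::Real) = f(Dual(Float64(x), 1.0)).der

g(x) = x * sin(x) + 3x / exp(x)
println(derivative(g, 1.0))
```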
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
I agree with everything on this list, including my always-neglected DataStreams project. I think it would be nice to get rid of expression-based indexing + select and focus on getting something like LINQ working. For another interesting perspective, check out the newly created query function in Pandas, which takes strings rather than expressions as inputs.

— John

On Jan 21, 2014, at 4:42 AM, Tom Short tshort.rli...@gmail.com wrote:

I also agree with your approach, John. Based on your criteria, here are some other things to consider for the chopping block:
- expression-based indexing
- NamedArray (you already have an issue on this)
- with, within, based_on and variants
- @transform, @DataFrame
- select, filter
- DataStream

Many of these were attempts to ease syntax via delayed evaluation. We can either do without or try to implement something like LINQ.
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
Can you do something like df["ColA"] = f(df)?

— John

On Jan 21, 2014, at 8:48 AM, Blake Johnson blakejohnso...@gmail.com wrote:

I use within! pretty frequently. What should I be using instead if that is on the chopping block?

--Blake

On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
[...]

On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire kevin@gmail.com wrote:

Hi John,

I agree with pretty much everything you have written here, and really appreciate that you've taken the lead in cleaning things up and getting us on track.

Cheers!
Kevin

On Mon, Jan 20, 2014 at 1:57 PM, John Myles White johnmyl...@gmail.com wrote:
[...]
Re: [julia-users] Higher order derivatives in Calculus
If you’re willing to wait, I’m happy to return to the Calculus package in the spring. I’m focusing on DataFrames/DataArrays (and some database stuff that’s closely related) until then.

— John

On Jan 21, 2014, at 8:42 AM, Hans W Borchers hwborch...@gmail.com wrote:

Thanks for these encouraging words. I have already written an R package with more than a hundred numerical functions (incl. several numerical derivatives), and I would be willing to help build up a numerical package in Julia. But of course, someone from the Julia community will be needed to take the lead. Please let me know when this 'management position'(?) has been taken.

On Tuesday, January 21, 2014 4:44:37 PM UTC+1, John Myles White wrote:
[...]
Re: [julia-users] Higher order derivatives in Calculus
This sounds like a great approach, Tim. (And, for the record, I’m legitimately amazed by the amount of functionality you’re successfully maintaining.)

Since we’re adding feature requests, here’s another one:

(Feature) Implement lower and upper bounds on FD gradient calculations. If the lower or upper bounds would be violated by the chosen forward or central differencing method, change behavior to stay within bounds.

This would make it much easier for us to use finite differencing in constrained optimization problems.

— John

On Jan 22, 2014, at 8:53 AM, Tim Holy tim.h...@gmail.com wrote:

To me this sounds like a case for a fork: Hans doesn't yet feel confident about his Julia, but John wants to ditch maintainership. (Trust me John, I _really_ understand!) We need an organic way of test-driving a new maintainer. Hans, why don't you just fork it to your GitHub account and start making changes, and let's see how it goes?

A couple of tips:
- As you make changes, run the tests to see if they still pass, and you'll have some reason to hope that you haven't broken anything.
- For any API changes, a way to be nice to users is to use the `@deprecate` macro.

Adhering to those guidelines will make it easier for people to migrate to your package. If you get to the point of having something you're proud of, rather than submitting a pull request to John's package, just advertise it to the list. That will begin the process of other people being able to test out your version, with no risk (John's will still be up, too). If all goes well, you'll eventually become the official maintainer.

Hans, I already have a feature request for you: spot-checking particular elements of the gradient. When I have a function of 10^6 variables, often all I want to do is get some indication that I've done my analytic calculation of the gradient correctly. Computing all 10^6 components is horrifically slow, and usually not necessary.

--Tim

On Wednesday, January 22, 2014 08:28:05 AM John Myles White wrote:

Yes, it would. I just don’t know who’s going to do that. But I badly want someone to.

— John

On Jan 22, 2014, at 3:33 AM, Hans W Borchers hwborch...@gmail.com wrote:

John, as I understand it, you are overloaded. And I cannot believe this will change in spring. Wouldn't it be preferable if someone else takes over?

Hans Werner

On Wednesday, January 22, 2014 3:58:18 AM UTC+1, John Myles White wrote:
[...]
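Tim's spot-check request and John's bounds request combine naturally: estimate one gradient component at a time, and switch from a central to a one-sided difference when a step would leave `[lo, hi]`. The helper `partial_fd!` below is hypothetical and written in current Julia. It temporarily overwrites `x[i]` instead of copying the whole vector (which matters when `x` has 10^6 entries) and always restores it. For simplicity it uses one step size in all three branches (`cbrt(eps())` scaled by `|x[i]|`, a common choice for central differences), and it assumes `hi - lo` is larger than `2h`.

```julia
# Hypothetical helper: i-th partial derivative of f at x by finite differences,
# staying inside [lo, hi] for coordinate i.
function partial_fd!(f, x::Vector{Float64}, i::Int;
                     lo::Real = -Inf, hi::Real = Inf,
                     h::Float64 = cbrt(eps(Float64)) * max(1.0, abs(x[i])))
    xi = x[i]
    # Evaluate f with x[i] temporarily set to t; restore x[i] even if f throws.
    function at(t)
        x[i] = t
        try
            return f(x)
        finally
            x[i] = xi
        end
    end
    if xi - h >= lo && xi + h <= hi
        return (at(xi + h) - at(xi - h)) / (2h)   # central difference
    elseif xi + h <= hi
        return (at(xi + h) - f(x)) / h            # forward difference near the lower bound
    else
        return (f(x) - at(xi - h)) / h            # backward difference near the upper bound
    end
end

# Usage: spot-check a few components of a gradient whose exact form is known.
halfsq(x) = sum(abs2, x) / 2                      # its gradient is x itself
pts = collect(range(0.0, 1.0; length = 101))
for i in (1, 51, 101)
    println("component ", i, ": fd = ", partial_fd!(halfsq, pts, i; lo = 0.0, hi = 1.0),
            ", exact = ", pts[i])
end
```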
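Tim's `@deprecate` tip, sketched for a hypothetical rename inside a hypothetical `MyCalculus` module in current Julia: the old name keeps working by forwarding to the new function, and Julia can warn callers that it is deprecated. Whether the warning is actually printed depends on Julia's `--depwarn` setting.

```julia
module MyCalculus   # hypothetical package module

export fd_derivative

# New name: central-difference first derivative.
fd_derivative(f, x::Real; h::Real = cbrt(eps(Float64)) * max(1.0, abs(x))) =
    (f(x + h) - f(x - h)) / (2h)

# Old name stays available: it forwards to fd_derivative and is marked deprecated.
@deprecate derivative_fd(f, x) fd_derivative(f, x)

end # module

using .MyCalculus

println(fd_derivative(sin, 1.0))
println(MyCalculus.derivative_fd(sin, 1.0))   # old name, same result
```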
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
My impression is that Pandas didn't support anything like delayed evaluation. Is that wrong?

I'm aware that the resulting expressions are a lot more verbose. That definitely sucks. I'd love to see strong proposals for how we're going to do a better job of making code shorter going forward. But too much of our current codebase is buggy, unable to handle edge cases, slow and undocumented. I think it's much more important that we have one way of doing things that actually works as advertised for every Julia user than two ways of doing things, each of which is slightly broken and performs worse than R and Pandas.

As I've been saying lately, I'm burning out on maintaining so much Julia code. If someone else wants to take charge of my projects, I'm ok with that. But if I'm going to be doing the work going forward, I need to devote my energies to making a small number of things work really well. Once we get our core functionality solid, I'll be comfortable getting fancier stuff working again.

-- John

On Jan 22, 2014, at 1:06 PM, Kevin Squire kevin.squ...@gmail.com wrote:

I'm also a fan of the expression-based interface (mostly because I'm used to similar things in Pandas). I haven't looked at that code, though, so I can't comment on the complexity.

Kevin

On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson blakejohnso...@gmail.com wrote:

Sure, but the resulting expression is much more verbose. I just noticed that all expression-based indexing was on the chopping block. What is left after all this? I can see how axing these features would make DataFrames.jl easier to maintain, but I found the expression stuff to present a rather nice interface.

--Blake

On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
[...]
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
The idealized expression interface offers things like (up to reordering):

with(df, a + b * x)

where a and b are variables in the caller's scope and x is a column of df. In practice, we've had to hack this sort of thing together to offer things like

with(df, :($a + $b * x))

That's because we need to pass quoted expressions, and we also need to tell the system which variables are in the caller's scope. More generally, I'd refer to any operation that passes expressions around and asks other functions to evaluate them with an ad hoc scope as an expression-based operation. R offers very deep support for this in the language.

-- John

On Jan 22, 2014, at 2:48 PM, Kevin Squire kevin.squ...@gmail.com wrote:

Maybe I misinterpreted the term expression-based interface.

On Wed, Jan 22, 2014 at 2:33 PM, John Myles White johnmyleswh...@gmail.com wrote:
[...]
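The `with(df, :($a + $b * x))` workaround relies on interpolation inside a quoted expression: `$a` and `$b` are spliced in when the expression is built, in the caller's scope, while `x` stays a bare symbol to be resolved against the data later. The sketch below mimics that with a `Dict` of columns instead of a DataFrame; `Columns`, `substitute`, `with_columns` and `scaled` are hypothetical names, and it uses current Julia's dotted operators (`.+`, `.*`) for elementwise arithmetic. It also shows the scoping weakness: the final `eval` runs at global scope, so any unspliced name that is not a column would be looked up as a global.

```julia
const Columns = Dict{Symbol,Vector{Float64}}

# Replace every symbol that names a column with that column's data; leave
# everything else (numbers, operator names, other symbols) untouched.
# Caveat: a column named like a function (e.g. :sin) would also be replaced.
substitute(ex, cols::Columns) = ex
substitute(s::Symbol, cols::Columns) = haskey(cols, s) ? cols[s] : s
substitute(ex::Expr, cols::Columns) =
    Expr(ex.head, (substitute(arg, cols) for arg in ex.args)...)

# eval runs at global scope, so leftover symbols resolve to globals.
with_columns(cols::Columns, ex) = eval(substitute(ex, cols))

cols = Columns(:x => [1.0, 2.0, 3.0])
a, b = 10.0, 2.0

# $a and $b are spliced in now, from the caller's scope; x stays a symbol.
ex = :($a .+ $b .* x)
println(ex)
println(with_columns(cols, ex))

# Inside a function the splice is essential: writing :(k .* x) would make
# eval look for a global k, not this local argument.
scaled(cols::Columns, k::Float64) = with_columns(cols, :($k .* x))
println(scaled(cols, 3.0))
```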
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
That's exactly the kind of indexing I'd like to encourage using until we get our core functionality cleaned up. Nothing special required except Boolean indexing, which is easy to make fast and doesn't have weird scoping issues. -- John On Jan 22, 2014, at 3:18 PM, Kevin Squire kevin.squ...@gmail.com wrote: Got it. I was thinking of the more verbose (but still useful) df[(df[colA] 4) !isna(df[colB]), :] Kevin On Wed, Jan 22, 2014 at 3:10 PM, John Myles White johnmyleswh...@gmail.com wrote: The idealized expression interface offers things like (up to reordering): with(df, a + b * x) where a and b are variables in the caller's scope and x is a column of df. In practice, we've had to hack this sort of thing together to offer things like with(df, :($a + $b * x)) That's because we need to pass quoted strings and we also need to tell the system which variables are in the caller's cope. More generally, I'd refer to any operation that passes expressions around and asks other functions to evaluate them with an ad hoc scope as expression-based operations. R offers very deep support for this in the language. -- John On Jan 22, 2014, at 2:48 PM, Kevin Squire kevin.squ...@gmail.com wrote: Maybe I misinterpreted the term expression-based interface. On Wed, Jan 22, 2014 at 2:33 PM, John Myles White johnmyleswh...@gmail.com wrote: My impression is that Pandas didn't support anything like delayed evaluation. Is that wrong? I'm aware that the resulting expressions are a lot more verbose. That definitely sucks. I'd love to see strong proposals for how we're going to do a better job of making code shorter going forward. But too much of our current codebase is buggy, unable to handle edge cases, slow and undocumented. I think it's much more important that we have one way of doing things that actually works as advertised for every Julia user than two ways of doing things, each of which is slightly broken and performs worse than R and Pandas. 
As I've been saying lately, I'm burning out on maintaing so much Julia code. If someone else wants to take charge of my projects, I'm ok with that. But if I'm going to be doing the work going forward, I need to devote my energies to making a small number of things work really well. Once we get our core functionality solid, I'll be comfortable getting fancier stuff working again. -- John On Jan 22, 2014, at 1:06 PM, Kevin Squire kevin.squ...@gmail.com wrote: I'm also a fan of the expression-based interface (mostly because I'm used to similar things in Pandas). I haven't looked at that code, though, so I can't comment on the complexity. Kevin On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson blakejohnso...@gmail.com wrote: Sure, but the resulting expression is much more verbose. I just noticed that all expression-based indexing was on the chopping block. What is left after all this? I can see how axing these features would make DataFrames.jl easier to maintain, but I found the expression stuff to present a rather nice interface. --Blake On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote: Can you do something like df[“ColA”] = f(df)? — John On Jan 21, 2014, at 8:48 AM, Blake Johnson blakejo...@gmail.com wrote: I use within! pretty frequently. What should I be using instead if that is on the chopping block? --Blake On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote: I also agree with your approach, John. Based on your criteria, here are some other things to consider for the chopping block. - expression-based indexing - NamedArray (you already have an issue on this) - with, within, based_on and variants - @transform, @DataFrame - select, filter - DataStream Many of these were attempts to ease syntax via delayed evaluation. We can either do without or try to implement something like LINQ. 
On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire kevin@gmail.com wrote: Hi John, I agree with pretty much everything you have written here, and really appreciate that you've taken the lead in cleaning things up and getting us on track. Cheers! Kevin On Mon, Jan 20, 2014 at 1:57 PM, John Myles White johnmyl...@gmail.com wrote: As I said in another thread recently, I am currently the lead maintainer of more packages than I can keep up with. I think it’s been useful for me to start so many different projects, but I can’t keep maintaining most of my packages given my current work schedule. Without Simon Kornblith, Kevin Squire, Sean Garborg and several others doing amazing work to keep DataArrays and DataFrames going, much of our basic data infrastructure would have already become completely unusable. But even with the great work that’s been done on those packages recently, there’s still lot
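The plain-function style John suggests in place of `within!` (`df["ColA"] = f(df)`) can be combined with Boolean row masks and sketched without DataFrames at all. This example is written in current Julia and rests on several assumptions. A `Dict` of named columns stands in for a DataFrame, and `missing` stands in for `NA`. The names `double_colA` and `filtered` are hypothetical, and none of this is the DataFrames.jl API.

```julia
# A Dict of named columns stands in for a DataFrame (an assumption for illustration).
df = Dict{String,AbstractVector}(
    "colA" => [1.0, 5.0, 7.0, 2.0],
    "colB" => [10.0, missing, 30.0, 40.0],  # `missing` plays the role of NA
)

# Instead of within!(df, ...), compute a derived column with an ordinary function.
double_colA(d) = d["colA"] .* 2
df["colC"] = double_colA(df)

# Boolean indexing: rows where colA > 4 and colB is not missing.
mask = (df["colA"] .> 4) .& .!ismissing.(df["colB"])

# Apply the same row mask to every column.
filtered = Dict(name => col[mask] for (name, col) in df)
```

The mask is an ordinary `BitVector` built from element-wise (dotted) operations. Nothing here depends on evaluating expressions in a special scope.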
Re: [julia-users] Re: Why isn't typeof(Float64[]) <: typeof(Real[]) true?
I thought this was in the performance tips, but I couldn’t find it in a quick read. Definitely worth putting in there, because this is a really, really subtle point despite being so important. — John On Jan 22, 2014, at 4:01 PM, Kevin Squire kevin.squ...@gmail.com wrote: Thanks a lot for the correction, Tobias. I was confused on this point, but it's easy to check:
julia> c(a::Real, b::Real) = a+b
c (generic function with 1 method)
julia> code_native(c, (Float64,Float64))
.text
Filename: none
Source line: 1
push RBP
mov RBP, RSP
Source line: 1
addsd XMM0, XMM1
pop RBP
ret
julia> code_native(c, (BigFloat,BigFloat))
.text
Filename: none
Source line: 1
push RBP
mov RBP, RSP
push RBX
sub RSP, 40
mov QWORD PTR [RBP - 40], 4
Source line: 1
movabs RBX, 139893810921040
mov RAX, QWORD PTR [RBX]
mov QWORD PTR [RBP - 32], RAX
lea RAX, QWORD PTR [RBP - 40]
mov QWORD PTR [RBX], RAX
xorps XMM0, XMM0
movups XMMWORD PTR [RBP - 24], XMM0
movups XMM0, XMMWORD PTR [RSI]
Source line: 1
movups XMMWORD PTR [RBP - 24], XMM0
Source line: 1
lea RSI, QWORD PTR [RBP - 24]
Source line: 1
movabs RAX, 139893815571504
mov EDI, 64233568
mov EDX, 2
call RAX
mov RCX, QWORD PTR [RBP - 32]
mov QWORD PTR [RBX], RCX
add RSP, 40
pop RBX
pop RBP
ret
Kevin On Wed, Jan 22, 2014 at 3:55 PM, Tobias Knopp tobias.kn...@googlemail.com wrote: No. Giving types in function definitions does not give you any speedup, as functions are always compiled for concrete types. When you define composite types it is, however, important to give concrete types for optimal performance. On Thursday, January 23, 2014 00:41:41 UTC+1, Patrick Foley wrote: Thanks! I've sorted it out and have solved my original problems. Would I get any speedup by defining function foo{TA<:Real, TB<:Real}(a::TA, b::TB) rather than function foo(a::Real, b::Real)? My guess is .. yes? 
Since if I'm defining it the first way, I can compile versions of foo like foo(a::Int8, b::Int8) automatically, which would be much faster than defaulting to a foo(a::Real, b::Real) and reserving space for a possible Float64 each time? On Tuesday, January 21, 2014 7:17:36 PM UTC-5, Patrick Foley wrote: Is there a way to get around this? I have a lot of types (foo1, foo2, ...) all of which are subtypes of an abstract type (bar). I want to be able to define the behavior for arrays of any of the foos just by defining the behavior of an array of 'bar's. Any advice?
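The point behind the subject line can be sketched in current Julia syntax, where `where` clauses and `<:` bounds in type parameters replace the 2014-era `foo{T<:Real}(...)` form. Parametric types are invariant, so `Vector{Float64}` is not a `Vector{Real}`. A method can still accept an array of any subtype of an abstract type if you constrain the element type with `<:`. The names `Bar`, `Foo1`, `Foo2`, `value` and `total` below are hypothetical stand-ins for Patrick's types.

```julia
abstract type Bar end

struct Foo1 <: Bar
    value::Int
end

struct Foo2 <: Bar
    value::Float64   # concrete field type, as Tobias recommends for composite types
end

# Per-element behaviour; dispatches to whichever concrete subtype is passed.
value(b::Bar) = b.value

# Accepts Vector{Foo1}, Vector{Foo2} and Vector{Bar} alike. A signature of
# `v::Vector{Bar}` would reject Vector{Foo1}, because Vector{Foo1} is not a Vector{Bar}.
total(v::AbstractVector{<:Bar}) = sum(value, v)

# Invariance in action:
Vector{Float64} <: Vector{Real}     # false
Vector{Float64} <: Vector{<:Real}   # true
```

Consistent with Tobias's answer, the `<:Bar` annotation on `total` only controls which arguments are accepted. It does not make calls faster, because Julia compiles a specialized method for each concrete argument type either way.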
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
Yeah, at some point in the future I’d like to see if we can imitate the experimental query() and eval() methods from Pandas. It’s the fact that those methods were just recently introduced which made me decide we needed to stop spending time on getting them working right now. We’re way behind Pandas in terms of performance and reliability, so it’s a bad idea for us to try being as feature-complete as Pandas until we catch up. — John On Jan 23, 2014, at 6:37 AM, Jonathan Malmaud malm...@gmail.com wrote: Pandas has a 'query' method (http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-query) which uses the Python numexpr package for delayed evaluation (if I understand what you mean by that in this context).
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
I think that’s probably because you need to do using DataArrays now. — John On Jan 23, 2014, at 2:08 AM, Jon Norberg jon.norb...@ecology.su.se wrote: Is this why I get this on the latest Julia Studio on Mac with recently updated packages: julia> using DataFrames julia> using RDatasets julia> iris = data("datasets", "iris") data not defined ??
Re: [julia-users] New install on OSX 10.9 GLM package does not work
Hi Chris, Unfortunately it’s very difficult for us to support 0.2 anymore because of the badly breaking Stats -> StatsBase renaming. We’d have to rewrite the history of every repo to resolve this name change, so we chose to instead push everything up to our current development branches. That change unfortunately entirely deprecated Julia 0.2 support for DataArrays and DataFrames. I’m hopeful we’ll standardize on a stable set of features for core statistical libraries in the next six months. Once we all agree on core infrastructure issues, it’ll be easier to provide backwards compatibility. Right now we don’t have enough developers working on JuliaStats to both support older releases and develop new ones. — John On Jan 23, 2014, at 2:33 PM, Cgast cmg...@gmail.com wrote: Thanks Ivar. My incomplete description (which you have somewhat offensively labeled as weak) was intentionally so, to avoid hijacking the thread with my own problem. With your encouragement, however, here is the problem I'm having, which I suspect is related to the OP's problem: I installed v0.2.0 (64-bit) this morning (build date appears to be 2013-11-16 23:44 UTC), and have the following Pkg.status(): Required packages: - GLM 0.2.2 Additional packages: - Blocks 0.0.1 - DataArrays 0.1.0 - DataFrames 0.5.0 - Distributions 0.3.0 - GZip 0.2.7 - NumericExtensions 0.3.6 - SortingAlgorithms 0.0.1 - StatsBase 0.3.5 using GLM gives me the following messages, terminating in an error: Warning: could not import Base.foldl into NumericExtensions Warning: could not import Base.foldr into NumericExtensions Warning: could not import Base.sum! into NumericExtensions Warning: could not import Base.maximum! into NumericExtensions Warning: could not import Base.minimum! into NumericExtensions ERROR: Stats not found in require at loading.jl:39 at C:\~\.julia\GLM\src\GLM.jl:8 Does this appear to be related to the previous problem? 
Does anyone have any suggestions on how to fix it, or shall I wait for package authors to do some updating? If a newer Julia version is required (which appears to be the suggestion from the NumericExtensions GitHub issues), are there no newer Windows binaries available than v0.2.0? My corporate environment will make building from source difficult, for a variety of reasons. After starting with a fresh installation and a clean .julia directory, I've tried to install older versions of NumericExtensions (as suggested), with the following results: Pkg.pin("NumericExtensions", v"0.2.20") ERROR: NumericExtensions is not a git repo in pin at pkg/entry.jl:202 and also: Pkg.pin("NumericExtensions", v"0.2.20") INFO: Installing NumericExtensions v0.3.6 # <--- wrong version (latest) INFO: REQUIRE updated. Thanks in advance for your help, Chris On Thursday, January 23, 2014 1:45:38 PM UTC-8, Ivar Nesje wrote: "Similar problem" is quite a weak description. The previous problem was that a new version of a package (NumericExtensions) was incorrectly marked as compatible with 0.2. This does not appear to be fixed, so a bump on Dahua Lin and John Myles White might be what is needed. On Thursday, January 23, 2014 21:32:14 UTC+1, Cgast wrote: Any update on this? Having similar problems on Windows 7 with a fresh install just this morning. Seems to be related to some renaming of Stats vs. StatsBase? I've tried fiddling with this myself within the packages, but haven't been able to resolve it. Thanks in advance for all your help, and all the hard work getting Julia to this point. Chris On Wednesday, January 15, 2014 10:15:55 AM UTC-8, John Myles White wrote: No, we'll fix the packages to indicate which work with 0.2 and which don't. -- John On Jan 15, 2014, at 9:52 AM, Corey Sparks corey.sp...@gmail.com wrote: so, if I just wait for 0.3, things might get worked out? 
Thanks On Wednesday, January 15, 2014 11:26:56 AM UTC-6, John Myles White wrote: We've unfortunately done a bad job of keeping those packages compatible with 0.2. I'll try to fix as much as I can today. -- John On Jan 15, 2014, at 8:48 AM, Corey Sparks corey.sp...@gmail.com wrote: Dear List, I just installed Julia 0.2.0 last night and was trying to get the GLM package going. When I try to load it and the RDatasets package, I get: julia> using RDatasets, GLM Warning: could not import Base.foldl into NumericExtensions Warning: could not import Base.foldr into NumericExtensions Warning: could not import Base.sum! into NumericExtensions Warning: could not import Base.maximum! into NumericExtensions Warning: could not import Base.minimum! into NumericExtensions Warning: could not import Base.PAIRWISE_SUM_BLOCKSIZE
Re: [julia-users] Re: Error: no method display(DataFrame)
A couple of points that expand on Tom’s comments: (1) We need to add Tom’s definition of countna(a::Array) = 0 so that show() works for wide DataFrames that contain any columns that are Vectors. I never use DataFrames like that, so I forgot that others might. It’s also impossible to produce such a DataFrame using our current I/O routines. (2) The constructor you’re using does exist, Jacob, but you should typically pass in a Vector{Any}, each element of which is either a DataVector or PooledDataVector. See Point (3) for why, at the moment, using a Vector as a column is subtly broken. (3) If people are going to put Vectors in DataFrames for performance reasons, all of our setindex!() functions for DataFrames need to add methods that automatically convert Vectors to DataVectors if an NA is inserted in a Vector. Right now that kind of insertion is just going to error out. This check isn’t too hard, but it’s totally missing from our current codebase. Personally, I would prefer that we not allow any of the columns of a DataFrame to be Vectors. It’s a weird edge case that doesn’t actually offer reliable high performance, because the potential performance improvements rely on the unsafe assumption that a DataFrame won’t contain any columns with NAs in it. — John On Jan 23, 2014, at 1:33 PM, Tom Short tshort.rli...@gmail.com wrote: That works, but columns will be Arrays instead of DataArrays. That's the way it's always worked. If you want them to be DataArrays, then convert to DataArrays right at the end. To fix show to support columns that are arrays, we probably need (at least) to define the following: countna(da::Array) = 0 On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn quinn.jac...@gmail.com wrote: Great investigative work. Is DataFrames( array_of_arrays, Index(column_names_array) ) not the right way to hand-construct DataFrames any more? 
I think I can allocate DataArrays instead, but at every step of the way, I was trying to hand-optimize the result fetching process, which resulted in not creating a DataArray or DataFrame until right before we return to the user. -Jacob On Thu, Jan 23, 2014 at 3:27 PM, bp2012 bert.pritch...@gmail.com wrote: To check Jacob's suggestion about a version mismatch, I completely removed the DataFrames and ODBC packages using Pkg.rm and physically deleted the directories from disk. I then added them via Pkg.add and Pkg.update. I am running the Julia nightly build. julia> versioninfo() Julia Version 0.3.0-prerelease+1127 Commit bc73674* (2014-01-22 20:09 UTC) Pkg.status() - DataFrames 0.5.1 - ODBC 0.3.5 Pkg.checkout("ODBC") INFO: Checking out ODBC master... INFO: Pulling ODBC latest master... INFO: No packages to install, update or remove julia> Pkg.checkout("DataFrames") INFO: Checking out DataFrames master... INFO: Pulling DataFrames latest master... INFO: No packages to install, update or remove I did some digging. It looks like there is a mismatch in that countna expects DataFrame columns to be DataArrays. However, the ODBC package returns DataFrames that have array columns (using the first constructor in dataframe.jl). You guys would know better as to whether a change is needed in the constructor or if countna should also accept Array columns. I made some local changes to work around the issue. show.jl: line 42: if isna(col, i) changed to if isna(col[i]) line 322: missing[j] = countna(adf[j]) changed to missing[j] = countna(isa(adf[j], DataArray) ? adf[j] : DataArray(adf[j])) These work great for me.
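Tom's one-line fix relies on multiple dispatch. Plain arrays get their own `countna` method that always returns 0, so `show` can call `countna` on any column without first converting it (the conversion that bp2012's local patch performs). Below is a self-contained sketch of that idea in current Julia. It uses a hypothetical `MaskedVector` type in place of DataArrays' `DataVector`, whose real internals may differ.

```julia
# Hypothetical stand-in for a DataVector: the values plus a mask marking NA entries.
struct MaskedVector{T}
    data::Vector{T}
    na::BitVector      # true where the entry is NA
end

# Columns that can hold NAs count them from the mask...
countna(v::MaskedVector) = count(v.na)

# ...while plain Arrays cannot hold NAs at all, so the answer is always 0.
countna(a::Array) = 0

# A show-like routine can now treat every column uniformly, with no conversion step.
columns = Any[MaskedVector([1.0, 2.0, 3.0], BitVector([false, true, false])), [4, 5, 6]]
missing_counts = [countna(col) for col in columns]
```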
Re: [julia-users] New install on OSX 10.9 GLM package does not work
We should hopefully have nightly binaries sometime soon that will help alleviate some of these issues in the future. I’ve lost track of the work to provide them, but I know it’s being done. — John On Jan 23, 2014, at 6:02 PM, Cgast cmg...@gmail.com wrote: OK, thanks John and Ivar. I'll probably put in some effort towards building it myself, and wait for 0.3 binaries. Thanks for the help, Chris
Re: [julia-users] Re: Error: no method display(DataFrame)
I would be a lot happier with that feature if we followed the lead of traditional databases and constantly reminded users which columns are “NOT NULL”. As it stands, the “types” of a DataFrame don’t tell you whether a column could contain NA’s or not. If we exposed functionality through something like a hypothetical nullable(df, colindex), my resistance to that feature would start to go away — John On Jan 23, 2014, at 6:48 PM, Tom Short tshort.rli...@gmail.com wrote: I think of item #3 as a feature, not a bug. I don't like the idea of auto-conversion. If I choose Vectors, I should not expect them to support missing values. R sometimes irritates me by adding NA's when I don't expect it. I'd rather have the error than have NA's sneak in there. Also, there may be other types of AbstractDataFrames where we don't have the ability to assign missing values. HDF5 tables are one example I can think of. We wouldn't want to try to autoconvert a huge HDF5 column to a DataVector.
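John's hypothetical `nullable(df, colindex)` and Tom's preference for an error over silent conversion fit together naturally. The following is a rough sketch under stated assumptions, not a DataFrames design. The `Table` type, `colindex`, `nullable` and `setcell!` are hypothetical names, and `missing` stands in for `NA`. Each column carries an explicit flag, and writing a missing value into a column declared NOT NULL fails loudly instead of converting the column.

```julia
# Hypothetical column store with an explicit per-column NOT NULL flag.
struct Table
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
    nullable::Vector{Bool}        # false means the column is NOT NULL
end

# Position of a column by name (returns `nothing` if absent; not handled here).
colindex(t::Table, name::Symbol) = findfirst(==(name), t.names)

# Query the flag, in the spirit of the proposed nullable(df, colindex).
nullable(t::Table, name::Symbol) = t.nullable[colindex(t, name)]

# Assign one cell, refusing missing values in NOT NULL columns instead of converting them.
function setcell!(t::Table, name::Symbol, row::Int, value)
    j = colindex(t, name)
    if ismissing(value) && !t.nullable[j]
        throw(ArgumentError("column $name is NOT NULL; cannot store missing"))
    end
    t.columns[j][row] = value
    return t
end

t = Table([:id, :score],
          AbstractVector[[1, 2, 3], Union{Missing,Float64}[1.5, missing, 2.5]],
          [false, true])
setcell!(t, :score, 1, missing)   # allowed: score is declared nullable
```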
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
Just saw that. Seems like a very smart way to get us important functionality while we continue to push things forward. Would be very cool if we could make it possible to switch between the Pandas and native Julia implementations totally seamlessly. — John On Jan 23, 2014, at 7:51 PM, Jonathan Malmaud malm...@gmail.com wrote: Sounds reasonable. As a temporary measure for people who want that functionality immediately, I've taken a stab at wrapping pandas in a Julia package (just as pyplot does for matplotlib), at https://github.com/malmaud/pandas.
Re: [julia-users] Re: Error: no method display(DataFrame)
Ok. I’m coming around to this. How would you do I/O? If we make DataFrames expose a nullable property, we could plausibly produce vectors instead of data vectors when parsing CSV files. — John On Jan 23, 2014, at 7:38 PM, Sean Garborg sean.garb...@gmail.com wrote: I'd think of #3 as a feature, too. Just to throw another use case in the ring, if DataFrames with a mix of Vectors and DataVectors (with NAs) were performant, my co-workers and I would usually pull in data marking all columns as Vectors, these columns would remain Vectors, and derived columns would be mostly DataVectors.
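On the I/O question, one way a CSV reader could use such a flag is to scan each column once. It would produce an NA-capable vector only when an NA actually appears, and return the flag alongside the data. This sketch assumes numeric fields and the literal string `"NA"` as the missing-value marker. `parse_column` is a hypothetical name, not an existing CSV-reader API.

```julia
# Parse one column of text fields; return (column, nullable_flag).
function parse_column(fields::Vector{String}; na::String = "NA")
    if any(==(na), fields)
        # At least one NA: build a column that can hold missing values.
        col = Union{Missing,Float64}[f == na ? missing : parse(Float64, f) for f in fields]
        return col, true
    else
        # No NAs seen: a plain Vector{Float64}, flagged as NOT NULL.
        return parse.(Float64, fields), false
    end
end

col1, flag1 = parse_column(["1.5", "2", "3.25"])
col2, flag2 = parse_column(["4", "NA", "6"])
```

A real reader would also need type inference across integers, strings and dates. This sketch deliberately ignores that and handles only the NA-versus-no-NA decision.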
Re: [julia-users] Re: Error: no method display(DataFrame)
Yeah, that seems totally reasonable to me. If we do this in a more formal way, I’m now onboard. Let’s add the idea of explicit restrictions on columns that can and can’t contain NA’s to the spec: https://github.com/JuliaStats/DataFrames.jl/issues/502 — John On Jan 23, 2014, at 8:21 PM, Sean Garborg sean.garb...@gmail.com wrote: My first thought was a Vector{Bool}.
Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages
I think they’re uncorrelated, but you’d have to ask Wes to know for sure. — John On Jan 24, 2014, at 12:19 AM, Matthias BUSSONNIER bussonniermatth...@gmail.com wrote: On Jan 24, 2014, at 04:51, Jonathan Malmaud wrote: Sounds reasonable. As a temporary measure for people who want that functionality immediately, I've taken a stab at wrapping pandas in a Julia package (just as pyplot does for matplotlib), at https://github.com/malmaud/pandas. Would this explain this tweet from 10 hours ago? Wes McKinney @wesmckinn Friendly reminder that performance-obsessed data hackers (R, Python, Julia) should feel free to drop me a line about working together -- M
Re: [julia-users] Bug or feature? How does = decide whether to do a copy or deepcopy?
Hi Eric, I think you’re being confused by the distinction between variables and the values that can be bound to them. If w is an Array, then an expression like w = [1, 2, 3] binds the variable w to a new value (namely, an array containing 1, 2 and 3). In contrast, an expression like w[1] = 4 does not refer to a variable called w[1]. It refers to a position in memory (the first slot of whatever array w is currently bound to) whose value is being mutated to 4. — John On Jan 24, 2014, at 9:01 PM, Eric Ford ericbf...@gmail.com wrote: Hi Ivar, Thanks for the idea. But replacing w += 10 with w = w + 10 still gives the same problem. (As does w = w .+ 10). Either way, after calling f2, the values of x are modified. Even stranger, the values of x are not modified by the following function function f3(x::Array) w = x + 0. for i in 1:length(w) w[i] = w[i] + 10.0 end return w end So the behavior of future lines of code depends on whether w is initialized as w = x or w = x + 0. It appears that for some reason, Julia is doing different things when the right-hand side is just an array, versus when it is an expression. Maybe somebody considers that a feature, but it's definitely non-intuitive. I can't find any mention of this in the documentation. Amusingly, there appears to be basically no documentation for what = does (despite it being the most common symbol in almost every Julia program). I thought = might be implemented as a generic function with different behaviors for Arrays and expressions, but methods(=) and help(=) turned up nothing. On one hand, it seems like = should be fairly obvious, but evidently there are currently important differences in the behavior of = that need to be documented and/or corrected. Cheers, Eric On Friday, January 24, 2014 2:38:53 PM UTC-5, Ivar Nesje wrote: It becomes more obvious if you write a += 3 as the longer form a = a + 3 On Friday, January 24, 2014 7:57:22 PM UTC+1, Eric Ford wrote: Sorry. 
(Evidently, downloading a notebook uses the last saved version rather than what's currently on your screen.) IJulia notebook attached. Readable version below. function f1(x::Array) w = x w += 10.0 return w end function f2(x::Array) w = x for i in 1:length(x) w[i] += 10.0 end return w end x=randn(10) x_orig = deepcopy(x) f1_of_x = f1(x) println("After f1: ",sum((x.-x_orig).^2)); x = deepcopy(x_orig) f2_of_x = f2(x) println("After f2: ",sum((x.-x_orig).^2)); After f1: 0.0 After f2: 1000.0 Thanks, Eric On Friday, January 24, 2014 1:10:59 PM UTC-5, Kevin Squire wrote: In what you posted, `f1` and `f2` are identical (except for the name). Can you share the output of a Julia or IJulia session showing the problem? Cheers, Kevin On Fri, Jan 24, 2014 at 9:58 AM, Eric Ford eric...@gmail.com wrote: I don't understand why the first function doesn't change x, but the second function does. Is the = calling deepcopy in f1, but copy in f2? If so, why? function f1(x::Array) w = x for i in 1:length(x) w[i] += 10.0 end w += 10.0 return w end function f2(x::Array) w = x for i in 1:length(x) w[i] += 10.0 end w += 10.0 return w end x=randn(10) x_orig = deepcopy(x) f_of_x = f(x) sum((x.-x_orig).^2) After f1: 0.0 After f2: 1000.0 Thanks, Eric (on behalf of an Astro 585 student)
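To make John's point concrete, here is a minimal sketch written for current Julia (1.x) rather than the 0.2-era syntax in the thread. One assumption to flag: adding a scalar to an array now requires the dotted `.+`, so `w += 10.0` is spelled `w = w .+ 10.0`. The names `f1` and `f2` mirror Eric's functions, and `f3`, which makes an explicit `copy`, is added only for comparison.

```julia
# `w = x` never copies anything: it gives the same array a second name.

function f1(x::Vector{Float64})
    w = x            # w and x name the same array
    w = w .+ 10.0    # builds a NEW array and rebinds w to it; x is untouched
    return w
end

function f2(x::Vector{Float64})
    w = x            # same array again
    for i in eachindex(w)
        w[i] += 10.0 # mutates the shared array, so the caller sees the change
    end
    return w
end

function f3(x::Vector{Float64})
    w = copy(x)      # explicit copy: mutating w leaves x alone
    for i in eachindex(w)
        w[i] += 10.0
    end
    return w
end

x = randn(10)
x_orig = copy(x)

f1(x); println("After f1: ", sum((x .- x_orig) .^ 2))   # 0.0
f3(x); println("After f3: ", sum((x .- x_orig) .^ 2))   # 0.0
f2(x); println("After f2: ", sum((x .- x_orig) .^ 2))   # about 1000.0
```

In-place broadcasting (`w .+= 10.0`) would behave like `f2`, not `f1`, because `.+=` writes into the array that `w` is already bound to.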
Re: [julia-users] pretty printing
You need to override Base.show(io::IO, foo::T). show()’s definition provides the basis for most other printing methods. — John On Jan 25, 2014, at 3:22 AM, Shoibal Chakravarty shoib...@gmail.com wrote: Suppose I define a composite type T. type T xx::Int yy::Int end Entering T at the julia> prompt prints T (xx,yy). I want to change what the REPL prints when I do T[enter] on the command line. Which function should I change to do this (the equivalent of T.__repr__() in Python)? Thanks, Shoibal.
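Here is a minimal sketch of John's suggestion in current Julia (1.x), where `type` is now spelled `mutable struct`. The output format inside `show` is only an example. Defining the two-argument `Base.show` for instances is enough for `print`, `string`, `repr`, and the REPL's default plain-text display to use it. Note that this changes how instances of `T` print, not how the type object `T` itself is shown.

```julia
mutable struct T
    xx::Int
    yy::Int
end

# Two-argument show: print, string, repr and the REPL's plain-text display fall back to it.
function Base.show(io::IO, t::T)
    print(io, "T(xx = ", t.xx, ", yy = ", t.yy, ")")
end

t = T(1, 2)
println(t)   # prints T(xx = 1, yy = 2)
```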
Re: [julia-users] Implementing a special Array{Float64, 2}
There are a lot of built-in functions for showing and displaying AbstractArrays. Are you extending them? Right now AbstractArray implies a slightly underdocumented interface, which you have to implement before inheriting from AbstractArray will work right. I’m hopeful this interface will get documented after Julia stabilizes, but for now I’ve used trial and error to figure out what needs to be implemented. — John On Jan 26, 2014, at 9:50 AM, Jesse van den Kieboom jesse...@gmail.com wrote: On Sunday, January 26, 2014 5:59:03 PM UTC+1, John Myles White wrote: Right now this is a little tricky. It’s come up before and will probably have some kind of solution in the future. For now, you might find http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ useful. Thanks, that was an interesting read which addresses exactly what I was doing. I have a related question that maybe you (or someone else) can answer. I have the following type: type MotionVector <: AbstractArray{Float64} v::Array{Float64} MotionVector() = (x = new(); x.v = zeros(6, 1); x) MotionVector(v) = (x = new(); x.v = v; x) end This seems to work, but when I do this, display(MotionVector()) does not work anymore, telling me: ERROR: no method display(MotionVector) in display at multimedia.jl:158 Without inheriting from AbstractArray{Float64}, this doesn't happen. — John On Jan 26, 2014, at 8:53 AM, Jesse van den Kieboom jess...@gmail.com wrote: Hi all, I'm new to Julia, so forgive me for maybe asking something obvious. What I would like to do is to create a new type which is basically an Array{Float64, 2}, but has some special operations defined as part of the vector space that it belongs to. What I currently do is to create a new composite type with one field containing the underlying array. 
This kind of works, but I need to proxy a lot of operators (*, -, +, etc) and methods (getindex, setindex!, convert, display, ndims, size), which do not need special behavior, to the underlying array. Initially, I tried to use a typealias instead of a composite type, but it seems the typealias type information is not retained and so I can't define new operations on it. Does anyone have a better way to implement this?
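Since these posts, the AbstractArray interface has been documented in the "Interfaces" chapter of the Julia manual. The following is a minimal sketch for current Julia. It assumes `MotionVector` can be a plain 6-element vector rather than the 6x1 matrix above. Implementing `size` and `getindex`, plus `setindex!` for mutation, is enough for display, iteration, `sum`, and broadcast arithmetic to work without proxying each operator by hand.

```julia
struct MotionVector <: AbstractVector{Float64}
    v::Vector{Float64}
end

MotionVector() = MotionVector(zeros(6))

# The required methods of the AbstractArray interface for a mutable vector:
Base.size(m::MotionVector) = size(m.v)
Base.getindex(m::MotionVector, i::Int) = m.v[i]
Base.setindex!(m::MotionVector, x, i::Int) = (m.v[i] = x)
# Optional: tell generic code that linear indexing is the fast path.
Base.IndexStyle(::Type{<:MotionVector}) = IndexLinear()

m = MotionVector()
m[2] = 3.0
display(m)           # shown like any other 6-element vector
println(sum(m))      # 3.0
println(2 .* m .+ 1) # broadcasting returns an ordinary Vector{Float64}
```

Operations that need genuinely different behavior, such as the vector-space operations Jesse mentions, can then be added as ordinary methods on `MotionVector`. Anything not overridden falls back to the generic array code and returns a plain `Array`.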
Re: [julia-users] Merging dataframes
This is quite close to being possible, but we’re missing a few things. Daniel Jones recently added an append! method to DataArrays, which would let you do this column-by-column. To help you out, we need to add an append! method to DataFrames as well. I’ve wanted that badly myself lately. I will try to get to this today, but am already pretty overwhelmed with work for the day. — John On Jan 26, 2014, at 11:02 AM, Joosep Pata joosep.p...@gmail.com wrote: Is there a way to avoid copying when doing vcat(df1::DataFrame, df2::DataFrame, …)? I’m trying to open hundreds of files with DataFrames, merge all of them and save a single ~150M row x 100 col DataFrame using HDF5 and JLD (to be opened later using mmap), and it seems to work marvelously, apart from the vcat. Does a no-copy option exist? I’m aware of DataStreams as a concept, but as I understand, they’re not fully fleshed out yet.
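The column-by-column approach John describes can be illustrated with Base Julia alone. This sketch assumes a table is just a `Dict` of equal-length `Float64` column vectors; it is a stand-in, not the DataFrames API, and `append_table!` and `reserve_rows!` are hypothetical names. `append!` grows each destination column in place, so the accumulated table is never rebuilt as a fresh combined copy for each file. The incoming rows are still copied, though, and a column's existing data may occasionally be moved when it grows. When the final row count is known, `sizehint!` can reduce those moves.

```julia
# Append the rows of `src` to `dest`, one column at a time, in place.
function append_table!(dest::Dict{Symbol,Vector{Float64}}, src::Dict{Symbol,Vector{Float64}})
    keys(dest) == keys(src) || error("tables have different columns")
    for (name, col) in src
        append!(dest[name], col)    # grows dest's column; src's data is copied in
    end
    return dest
end

# Optionally reserve capacity up front when the final row count is known.
function reserve_rows!(t::Dict{Symbol,Vector{Float64}}, nrows::Integer)
    for col in values(t)
        sizehint!(col, nrows)
    end
    return t
end

total = Dict(:a => [1.0, 2.0], :b => [10.0, 20.0])
reserve_rows!(total, 1_000)
for chunk in (Dict(:a => [3.0], :b => [30.0]), Dict(:a => [4.0, 5.0], :b => [40.0, 50.0]))
    append_table!(total, chunk)
end
println(total[:a])   # [1.0, 2.0, 3.0, 4.0, 5.0]
```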
Re: [julia-users] General Licensing Question
Hi Hans, (1) The GPL makes it impossible for users of Julia to embed Julia as part of a closed source product. We’d prefer not to impose that restriction. The BSD and MIT licenses are largely identical: the major difference is that the BSD license comes in several flavors, not all of which are equivalent to the MIT license. The BSD license with two clauses is effectively the same license as the MIT license. (2) All of the code written for Julia by Julia developers is licensed under the MIT license. Only some dependencies like FFTW are licensed under the GPL, but those dependencies are sufficient to make the aggregate of Julia + dependencies fall under the GPL. (3) Either the removal or the recreation of the GPL components of the current Julia distribution would be sufficient to remove the GPL restriction on the Julia distribution. Some parts, like Rmath, are easily replaceable. Other parts, like SuiteSparse, are much harder to replace and would likely have to be removed to provide a non-GPL release. I hope that helps. — John On Jan 26, 2014, at 2:18 PM, Hans W Borchers hwborch...@gmail.com wrote: In the file DISTRIBUTING.md I read the following lines: Note that while the code for Julia is [MIT-licensed](https://github.com/JuliaLang/julia/blob/master/LICENSE.md), the distribution created by the techniques described herein will be GPL licensed, as various dependent libraries such as `FFTW`, `Rmath`, `SuiteSparse`, and `git` are GPL licensed. We do hope to have a non-GPL distribution of Julia in the future. For me this triggers the question: (1) Why is the MIT license so much better for Julia than any GPL license? What is the main difference to consider? I think, Python is under BSD license, would that be an alternative? (2) What does it mean that Julia (which part?) is under MIT license while the distribution is GPL-licensed. Are there legal consequences for this kind of construction? 
(3) To have a non-GPLed version in the future: Does that mean, certain parts have to be removed, or will they have to be rewritten in C and Julia? Hans Werner
Re: [julia-users] Natural language processing in Julia
JuliaText would be great. TextAnalysis.jl really needs a lot of love to move forward. For now, I’d strongly push people towards NLTK. — John On Jan 27, 2014, at 8:29 AM, Jonathan Malmaud malm...@gmail.com wrote: I was thinking of starting up a Julia NLP meta-project on GitHub if there's enough interest. It could host projects like TextAnalysis.jl, a Julia interface to NLTK, a Julia interface to some of Stanford's NLP tools, and whatever more native solutions people put together. On Friday, October 25, 2013 9:32:10 AM UTC-4, Dahua Lin wrote: I wish there were something comparable to NLTK in Julia. In a recent project that involves text parsing, I had to implement the text handling module in Python, simply for the purpose of using NLTK and Jinja2. If we can get the attention of the NLP community, I believe some NLP people will build such things very soon. - Dahua On Tuesday, October 22, 2013 7:35:57 PM UTC-5, John Myles White wrote: There's a package called TextAnalysis.jl that has stemming and very basic tokenization. Patches to do POS tagging would be very welcome. -- John On Oct 22, 2013, at 5:29 PM, Jonathan Malmaud mal...@gmail.com wrote: Is anyone working on, or does anyone know of, a package to do NLP tasks with Julia, like part-of-speech tagging and stemming? PyCall works fine with Python's NLTK, so that would be my default choice if there isn't anything more native at the moment.
[julia-users] DBI: Generic database access in Julia
I've been intentionally holding off on announcing this work (because it's not even close to being ready for practical use yet), but I've been working with Eric Davies on a generic database access module in Julia called DBI: https://github.com/johnmyleswhite/DBI.jl The goal of DBI is to provide a consistent interface that specific database drivers can implement. Between Eric and me, some work's been done on implementing this for SQLite, MySQL and Postgres: https://github.com/johnmyleswhite/SQLite.jl https://github.com/johnmyleswhite/MySQL.jl https://github.com/iamed2/PostgreSQL.jl I've unfortunately slowed down so that I can fix up DataFrames, but I've seen a bunch of people working on database support recently and wanted to encourage collaboration early on. Would be great to get everyone interested in database support to work together. I can't be in charge of this for another few weeks, but wanted to start a discussion so that everyone can collaborate effectively. -- John
Re: [julia-users] General Licensing Question
Yes, the main LICENSE file for Julia should contain more details about the legal status of subsets of the code and also about the distribution as an entirety. -- John On Jan 27, 2014, at 9:52 AM, Hans W Borchers hwborch...@gmail.com wrote: Yes, but this is not downloaded with the source. At least in my source-master directory there is no COPYING file. And if the whole Julia distribution is GPLed, I would expect a copy of the license at the top level. On Monday, January 27, 2014 11:10:37 AM UTC+1, Shaun Walbridge wrote: The components which use the GPL license do already include copies of the license -- e.g. https://github.com/JuliaLang/Rmath/blob/master/COPYING. I believe this is true for the other GPL components as well (readline, FFTW, patchelf).
Re: [julia-users] General Licensing Question
You’re right, the LICENSE.md file is pretty explicit. — John On Jan 28, 2014, at 1:08 AM, Tobias Knopp tobias.kn...@googlemail.com wrote: Isn't the LICENSE.md file in Julia pretty clear? Julia is MIT licensed and repl-readline.c is GPL. I don't see the problem. If I were using libjulia, I could use it in a commercial program. One is of course not allowed to ship FFTW though. Still, libjulia and all the .jl files in Base are MIT licensed. I eventually plan to integrate Julia into a commercial product, and I have made some contributions to Julia and Gtk.jl. If Julia were GPL, I would not have done this. On Monday, January 27, 2014 10:21:31 PM UTC+1, John Myles White wrote: Yes, the main LICENSE file for Julia should contain more details about the legal status of subsets of the code and also about the distribution as an entirety. -- John On Jan 27, 2014, at 9:52 AM, Hans W Borchers hwbor...@gmail.com wrote: Yes, but this is not downloaded with the source. At least in my source-master directory there is no COPYING file. And if the whole Julia distribution is GPLed, I would expect a copy of the license at the top level. On Monday, January 27, 2014 11:10:37 AM UTC+1, Shaun Walbridge wrote: The components which use the GPL license do already include copies of the license -- e.g. https://github.com/JuliaLang/Rmath/blob/master/COPYING. I believe this is true for the other GPL components as well (readline, FFTW, patchelf).
Re: [julia-users] Can't manage packages
Try doing Pkg.rm("Stats"). — John On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlosles...@gmail.com wrote: Hi, I'm on Mac OS X 10.7 with Julia 0.2.0. Today I updated but found this: julia> Pkg.update() INFO: Updating METADATA... INFO: Updating cache of Stats... INFO: Updating cache of StatsBase... INFO: Updating cache of Distance... INFO: Updating cache of JSON... INFO: Updating cache of PyPlot... INFO: Updating cache of NumericExtensions... ERROR: failed process: Process(`git --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 0efba512a2bf8faa21e61c9568222ae1ae96acbb 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1] in pipeline_error at process.jl:476 in readbytes at process.jl:430 in readall at process.jl:437 in readchomp at git.jl:26 in installed_version at pkg/read.jl:70 in installed at pkg/read.jl:121 in update at pkg/entry.jl:231 in anonymous at pkg/dir.jl:25 in cd at file.jl:22 in cd at pkg/dir.jl:25 in update at pkg.jl:40 Does anybody know what's wrong? Please help.
Re: [julia-users] Can't manage packages
Good to know. — John On Jan 28, 2014, at 7:53 PM, Shaun Walbridge shaun.walbri...@gmail.com wrote: I had the same issue today, and blowing away Stats was insufficient, but deleting and recreating ~/.julia did fix it. On Tue, Jan 28, 2014 at 9:56 PM, John Myles White johnmyleswh...@gmail.com wrote: Try doing Pkg.rm("Stats"). — John On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlosles...@gmail.com wrote: Hi, I'm on Mac OS X 10.7 with Julia 0.2.0. Today I updated but found this: julia> Pkg.update() INFO: Updating METADATA... INFO: Updating cache of Stats... INFO: Updating cache of StatsBase... INFO: Updating cache of Distance... INFO: Updating cache of JSON... INFO: Updating cache of PyPlot... INFO: Updating cache of NumericExtensions... ERROR: failed process: Process(`git --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 0efba512a2bf8faa21e61c9568222ae1ae96acbb 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1] in pipeline_error at process.jl:476 in readbytes at process.jl:430 in readall at process.jl:437 in readchomp at git.jl:26 in installed_version at pkg/read.jl:70 in installed at pkg/read.jl:121 in update at pkg/entry.jl:231 in anonymous at pkg/dir.jl:25 in cd at file.jl:22 in cd at pkg/dir.jl:25 in update at pkg.jl:40 Does anybody know what's wrong? Please help.
Re: [julia-users] Type stability of eig
How much worse would performance be if we “upgraded” all results to complex matrices? — John On Jan 28, 2014, at 8:38 PM, Jiahao Chen jia...@mit.edu wrote: The reason is primarily performance and secondarily numerical stability. eig() on a Matrix implements a polyalgorithm depending on the symmetries of the input matrix. Problems with certain symmetries, e.g. real symmetric or Hermitian matrices, can be solved significantly more efficiently than the general case, and so eig() attempts to detect these symmetries at runtime and, if they are found, dispatches to different LAPACK routines that are able to take advantage of faster and more stable algorithms. Several other generic linear algebraic functions are written in this fashion, notably \. (This was recently discussed in the context of issue #4006 with particular focus on sqrtm, whose code is somewhat easier to read than eigfact!. https://github.com/JuliaLang/julia/issues/4006) Thanks, Jiahao Chen, PhD Staff Research Scientist MIT Computer Science and Artificial Intelligence Laboratory
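Here is a small illustration of the type instability under discussion, using the `LinearAlgebra` standard library of current Julia. `eig` itself was later renamed `eigen`, and `eigvals` returns just the eigenvalues. All three inputs have the same type, `Matrix{Float64}`, but the element type of the result depends on the values. The wrapper `eigvals_complex` is a hypothetical name that sketches the "always upgrade to complex" option John asks about. It restores type stability at the cost of carrying an imaginary part even when it is zero.

```julia
using LinearAlgebra

A_sym = [2.0 1.0; 1.0 2.0]    # real symmetric: detected at runtime, real eigenvalues
A_tri = [2.0 1.0; 0.0 3.0]    # not symmetric, but its eigenvalues happen to be real
A_rot = [0.0 -1.0; 1.0 0.0]   # 90-degree rotation: eigenvalues are +i and -i

println(eltype(eigvals(A_sym)))  # Float64
println(eltype(eigvals(A_tri)))  # Float64
println(eltype(eigvals(A_rot)))  # a complex type (ComplexF64)

# Type-stable alternative: always return complex eigenvalues.
eigvals_complex(A::Matrix{Float64}) = complex.(eigvals(A))
```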
[julia-users] DataFrames changes
As we continue trying to prune DataFrames down to the essentials that we can reasonably commit to maintaining for the long-term future, we've decided to start using only symbols for the names of columns and remove all uses of strings. This change will go live on master today, so please don't pull from master until you're ready to update your code. -- John
Re: [julia-users] Matlab versus Julia for loop timing
Can you show the call to @time / @elapsed so we know exactly what's being timed? -- John On Jan 29, 2014, at 9:28 AM, Rajn rjngrj2...@gmail.com wrote: Now it takes even longer, i.e., ~1 minute. Does this make sense? Also, I am running this loop only once. I do not understand why writing it in function form would help. I read the manual, but it recommends the function form for code that is run many times. I=runave(S,A,f) showim(I); function runave(S,A,f) imsz=size(A); p1=f+1; for n=(f+1):(imsz[2]-f-1) for m=(f+1):(imsz[1]-f) S[m,n+1]=S[m,n]+sum(sum(A[m-f:m+f,n+p1],2))-sum(sum(A[m-f:m+f,n-f],2)); end end S; end Do I have to declare the types of the function parameters to speed it up?
Re: [julia-users] How to reload?
I don't think it's possible to redo the importing of names that `using` performs: julia> module Foo export a a = 1 end julia> using Foo julia> a 1 julia> module Foo export a a = 2 end Warning: replacing module Foo julia> a 1 julia> using Foo Warning: using Foo.a in module Main conflicts with an existing identifier. julia> a 1 julia> Foo.a 2 On Jan 29, 2014, at 1:47 PM, Robert DJ math.rob...@gmail.com wrote: I am starting to work on a package, but I've run into a very mundane problem: I can't figure out how to reload functions after editing. The first time I load the package with `using package`, I discover a bug, fix it, and run reload("package"), but I still get the same error. If I exit Julia, start it again and load the package, the error is (of course) gone. What am I missing? Thanks, Robert
Re: [julia-users] DBI: Generic database access in Julia
Yeah, most of the work needed to push forward is building C wrappers. -- John On Jan 29, 2014, at 11:56 AM, Randy Zwitch randy.zwi...@fuqua.duke.edu wrote: What types of skills are needed to get this off the ground? I know ODBC.jl is a bunch of wrappers around C functions; is that what's required here as well? On Monday, January 27, 2014 12:30:22 PM UTC-5, John Myles White wrote: I've been intentionally holding off on announcing this work (because it's not even close to being ready for practical use yet), but I've been working with Eric Davies on a generic database access module in Julia called DBI: https://github.com/johnmyleswhite/DBI.jl The goal of DBI is to provide a consistent interface that specific database drivers can implement. Between Eric and me, some work's been done on implementing this for SQLite, MySQL and Postgres: https://github.com/johnmyleswhite/SQLite.jl https://github.com/johnmyleswhite/MySQL.jl https://github.com/iamed2/PostgreSQL.jl I've unfortunately slowed down so that I can fix up DataFrames, but I've seen a bunch of people working on database support recently and wanted to encourage collaboration early on. Would be great to get everyone interested in database support to work together. I can't be in charge of this for another few weeks, but wanted to start a discussion so that everyone can collaborate effectively. -- John
Re: [julia-users] DBI: Generic database access in Julia
That would be great. -- John On Jan 29, 2014, at 12:19 PM, Stephen Pope stephen.p...@predict.com wrote: I cannot commit to anything at this moment, but surely if no one else implements Oracle.jl my hand will be forced to do it :-)
[julia-users] Re: DBI: Generic database access in Julia
Inspired by Jonathan Malmaud's creation of a JuliaText organization, I created a JuliaDB GitHub organization so that we can have a consistent place to discuss these issues: https://github.com/JuliaDB/Roadmap.jl/issues/1 Looking at Jonathan's approach, I realized that a lot of the Julia SIGs that are forming might benefit from having a Roadmap.jl repo to centralize discussion and point people towards canonical implementations of functionality. It's been quite useful to have one for JuliaStats and I hope JuliaDB will benefit in the same way. -- John On Jan 27, 2014, at 9:30 AM, John Myles White johnmyleswh...@gmail.com wrote: I've been intentionally holding off on announcing this work (because it's not even close to being ready for practical use yet), but I've been working with Eric Davies on a generic database access module in Julia called DBI: https://github.com/johnmyleswhite/DBI.jl The goal of DBI is to provide a consistent interface that specific database drivers can implement. Between Eric and me, some work's been done on implementing this for SQLite, MySQL and Postgres: https://github.com/johnmyleswhite/SQLite.jl https://github.com/johnmyleswhite/MySQL.jl https://github.com/iamed2/PostgreSQL.jl I've unfortunately slowed down so that I can fix up DataFrames, but I've seen a bunch of people working on database support recently and wanted to encourage collaboration early on. Would be great to get everyone interested in database support to work together. I can't be in charge of this for another few weeks, but wanted to start a discussion so that everyone can collaborate effectively. -- John
Re: [julia-users] Can't manage packages
Ok. You'll unfortunately have to either (1) delete your ~/.julia folder or (2) manually rename the Stats package to StatsBase and then edit its .git/config file. -- John On Jan 29, 2014, at 6:21 PM, Carlos Lesmes carlosles...@gmail.com wrote: I got: Pkg.rm("Stats") ERROR: failed process: Process(`git --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 0efba512a2bf8faa21e61c9568222ae1ae96acbb 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1] in pipeline_error at process.jl:476 in readbytes at process.jl:430 in readall at process.jl:437 in readchomp at git.jl:26 in installed_version at pkg/read.jl:70 in installed at pkg/read.jl:121 in resolve at pkg/entry.jl:316 in edit at pkg/entry.jl:24 in rm at pkg/entry.jl:51 in anonymous at pkg/dir.jl:25 in cd at file.jl:22 in cd at pkg/dir.jl:25 in rm at pkg.jl:18 On Tuesday, January 28, 2014 9:56:16 PM UTC-5, John Myles White wrote: Try doing Pkg.rm("Stats"). — John On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlos...@gmail.com wrote: Hi, I'm on Mac OS X 10.7 with Julia 0.2.0. Today I updated but found this: julia> Pkg.update() INFO: Updating METADATA... INFO: Updating cache of Stats... INFO: Updating cache of StatsBase... INFO: Updating cache of Distance... INFO: Updating cache of JSON... INFO: Updating cache of PyPlot... INFO: Updating cache of NumericExtensions... ERROR: failed process: Process(`git --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 0efba512a2bf8faa21e61c9568222ae1ae96acbb 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1] in pipeline_error at process.jl:476 in readbytes at process.jl:430 in readall at process.jl:437 in readchomp at git.jl:26 in installed_version at pkg/read.jl:70 in installed at pkg/read.jl:121 in update at pkg/entry.jl:231 in anonymous at pkg/dir.jl:25 in cd at file.jl:22 in cd at pkg/dir.jl:25 in update at pkg.jl:40 Does anybody know what's wrong? Please help.
Re: [julia-users] Re: DataFrames changes
We mostly did this to prepare for the time when Julia will let us overload the dot-operator to access columns like df.col1. Symbols also encourage people to use valid Julia identifiers as column names, which makes it easier to work with column names in some contexts. — John On Jan 29, 2014, at 5:46 PM, Cristóvão Duarte Sousa cris...@gmail.com wrote: BTW, is there some documentation about the choice of symbols vs strings for this kind of stuff (dictionary keys, optional function args, etc.)? Are symbols more efficient for this? On Wednesday, January 29, 2014 5:11:20 PM UTC, John Myles White wrote: As we continue trying to prune DataFrames down to the essentials that we can reasonably commit to maintaining for the long-term future, we've decided to start using only symbols for the names of columns and remove all uses of strings. This change will go live on master today, so please don't pull from master until you're ready to update your code. -- John
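The dot overloading John anticipates did arrive. Since Julia 0.7, `x.name` lowers to `Base.getproperty(x, :name)`, which can be overloaded. The minimal sketch below uses a hypothetical `ColumnStore` type (not DataFrames' implementation) to show why symbol column names fit this design naturally: the property name already arrives as a `Symbol`.

```julia
struct ColumnStore
    columns::Dict{Symbol,Vector{Float64}}
end

# `cs.col1` becomes getproperty(cs, :col1); look the symbol up as a column name.
# (This overload also shadows the real field, so `cs.columns` only works via getfield.)
function Base.getproperty(cs::ColumnStore, name::Symbol)
    cols = getfield(cs, :columns)    # getfield bypasses this overload
    haskey(cols, name) || error("no column named $name")
    return cols[name]
end

# Report the column names as properties (used by REPL tab completion).
Base.propertynames(cs::ColumnStore, private::Bool = false) = collect(keys(getfield(cs, :columns)))

cs = ColumnStore(Dict(:col1 => [1.0, 2.0], :col2 => [3.0, 4.0]))
println(cs.col1)   # [1.0, 2.0]
```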
Re: [julia-users] Matlab versus Julia for loop timing
This is pretty standard fare for Julia. Things like sum are really wasteful with memory, whereas the nuclear option is very conservative when implemented right. — John On Jan 30, 2014, at 7:30 AM, Rajn rjngrj2...@gmail.com wrote: Stefan, You wanted to know how the nuclear option worked in comparison to using sum(sub(A,...)) for my problem. This is just AMAZING! @time for your 2nd suggestion, i.e., sum and sub, gave a time of 12.9 seconds. @time for your 3rd suggestion, i.e., the nuclear option, gave a time of 0.36 seconds. This is unbelievable!! Am I doing this right? I just took both of your versions, inserted them into my code, and only timed this specific section. Does this mean that sum and sub together take about 36 times as long to run through 1440*1782 loops, or an extra ~5 microseconds per loop? WOW! On Wednesday, January 29, 2014 12:59:27 PM UTC-5, Stefan Karpinski wrote: This sum(sum(foo,2)) business is really wasteful. Just do sum(foo) to take the sum of foo. It's also better to extract the dimensions into individual variables. Something like this: function runave1(S,A,f) s1, s2 = size(A) p1 = f+1 for n = f+1:s2-f-1, m = f+1:s1-f S[m,n+1] = S[m,n] + sum(A[m-f:m+f,n+p1]) - sum(A[m-f:m+f,n-f]) end S end I suspect that since Matlab forces this sum(sum(X,2)) idiom on you, it probably detects it and automatically does the efficient thing. It's unclear to me why you need two sum operations when the slices you're taking are just single columns, but maybe I'm missing something here. Currently, taking array slices in Julia makes a copy, which is unfortunate, but in the future they will be views. 
In the meantime, you might get better performance by explicitly using views: function runave2(S,A,f) s1, s2 = size(A) p1 = f+1 for n = f+1:s2-f-1, m = f+1:s1-f S[m,n+1] = S[m,n] + sum(sub(A,m-f:m+f,n+p1)) - sum(sub(A,m-f:m+f,n-f)) end S end And, of course, there's always the nuclear option for really performance critical code, which is to write out the summation manually: function runave3(S,A,f) s1, s2 = size(A) p1 = f+1 for n = f+1:s2-f-1, m = f+1:s1-f t = S[m,n] for k = m-f:m+f; t += A[k,n+p1] - A[k,n-f]; end S[m,n+1] = t end S end Not so elegant, but probably the fastest possible version. Ideally, once array slices are views, the simpler version of the code will be essentially equivalent to this. It will take some compiler cleverness, but it's certainly doable. It would be interesting to hear how each of these versions performs on your data. On Wed, Jan 29, 2014 at 12:30 PM, John Myles White johnmyl...@gmail.com wrote: Can you show the call to @time / @elapsed so we know exactly what's being timed? -- John On Jan 29, 2014, at 9:28 AM, Rajn rjngr...@gmail.com wrote: Now it takes even longer i.e., ~1 minute Does this make sense. Also I am running this loop only once. I do not understand why writing in the function form would help. I read the manual but they suggest writing function form for something which is used many times. I=runave(S,A,f) showim(I); function runave(S,A,f) imsz=size(A); p1=f+1; for n=(f+1):(imsz[2]-f-1) for m=(f+1):(imsz[1]-f) S[m,n+1]=S[m,n]+sum(sum(A[m-f:m+f,n+p1],2))-sum(sum(A[m-f:m+f,n-f],2)); end end S; end Do I have to declare function parameters to speed it up.
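For readers on current Julia: `sub` has since been replaced by `view` (see also the `@views` macro). Below are Stefan's `runave2` and `runave3` rewritten that way; the `!`-suffixed names are ours. A quick check on random data of an arbitrary size confirms that the two agree up to floating-point rounding. They add the same numbers in a different order, so exact equality is not expected.

```julia
# Stefan's runave2, with `sub` replaced by `view` (no copy of each column slice).
function runave_views!(S, A, f)
    s1, s2 = size(A)
    p1 = f + 1
    for n = f+1:s2-f-1, m = f+1:s1-f
        S[m, n+1] = S[m, n] + sum(view(A, m-f:m+f, n+p1)) - sum(view(A, m-f:m+f, n-f))
    end
    return S
end

# Stefan's runave3: the hand-written inner loop, which allocates nothing.
function runave_loop!(S, A, f)
    s1, s2 = size(A)
    p1 = f + 1
    for n = f+1:s2-f-1, m = f+1:s1-f
        t = S[m, n]
        for k = m-f:m+f
            t += A[k, n+p1] - A[k, n-f]
        end
        S[m, n+1] = t
    end
    return S
end

A = rand(200, 300)
f = 5
S_views = runave_views!(zeros(size(A)), A, f)
S_loop  = runave_loop!(zeros(size(A)), A, f)
println(S_views ≈ S_loop)   # true
```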
Re: [julia-users] Matlab versus Julia for loop timing
That's true. Sorry for misstating the core issue, which is memory allocation related to the current definition of array indexing. -- John On Jan 30, 2014, at 8:55 AM, Tim Holy tim.h...@gmail.com wrote: On Thursday, January 30, 2014 07:32:18 AM John Myles White wrote: This is pretty standard fare for Julia. Things like sum are really wasteful with memory, whereas the nuclear option is very conservative when implemented right. To be fair, it's not sum() that's to blame, the problem is allocating a new array with A[m-f:m+f, indx]. --Tim
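Tim's point is easy to check directly. In current Julia, indexing with a range (`A[1:2, 1]`) allocates a new array, while `view` returns a lightweight window onto the same memory. The minimal sketch below shows the difference through mutation.

```julia
A = [1.0 2.0; 3.0 4.0]

col_copy = A[1:2, 1]        # a new Vector: allocates and copies
col_copy[1] = 99.0
println(A[1, 1])            # 1.0, A is unchanged

col_view = view(A, 1:2, 1)  # a SubArray sharing A's memory
col_view[1] = 99.0
println(A[1, 1])            # 99.0, writing through the view changed A
```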
Re: [julia-users] DataFrames changes
We will automatically convert them to valid identifiers. I fear we are probably not doing that yet, but will get it done before we release a new version. -- John On Jan 30, 2014, at 10:50 AM, Jonathan Malmaud malm...@gmail.com wrote: What's the plan for reading in files that have a header row with non-valid Julia identifiers? On Wednesday, January 29, 2014 10:03:39 PM UTC-5, John Myles White wrote: Please go ahead and add deprecation warnings. — John On Jan 29, 2014, at 6:51 PM, Simon Kornblith si...@simonster.com wrote: I believe two identical symbols are the same object, which implies that Dict lookup shouldn't require hashing. I haven't benchmarked this, though. Since this is a huge change (although one that I am in favor of) that presumably affects a lot of existing code, any objection if I add some deprecation warnings? Simon On Wednesday, January 29, 2014 8:46:19 PM UTC-5, Cristóvão Duarte Sousa wrote: BTW, is there some documentation about the choice of symbols vs strings for this kind of stuff (dictionary keys, optional function args, etc.)? Are symbols more efficient for this? On Wednesday, January 29, 2014 5:11:20 PM UTC, John Myles White wrote: As we continue trying to prune DataFrames down to the essentials that we can reasonably commit to maintaining for the long-term future, we've decided to start using only symbols for the names of columns and remove all uses of strings. This change will go live on master today, so please don't pull from master until you're ready to update your code. -- John
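John doesn't say how the conversion will work. One possible sketch, whose name `make_identifier` and whose rules are our assumptions rather than DataFrames' actual behavior, is to replace every character outside `[A-Za-z0-9_]` with an underscore. If the result is empty or starts with a digit, it also prefixes an `x`. This sketch deliberately ignores Unicode identifiers and reserved words such as `end`.

```julia
function make_identifier(name::AbstractString)
    s = replace(name, r"[^A-Za-z0-9_]" => "_")  # spaces, dashes, etc. become '_'
    if isempty(s) || isdigit(first(s))
        s = "x" * s                             # identifiers cannot start with a digit
    end
    return Symbol(s)
end

header = ["First Name", "2014 sales", "price-usd"]
println(make_identifier.(header))   # [:First_Name, :x2014_sales, :price_usd]
```

A real reader would also need to de-duplicate the results, since distinct headers such as "a b" and "a-b" map to the same symbol under these rules.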
Re: [julia-users] Re: How to write a macro that can substitute variable values into an expression
If you want to do this, the easiest way is to define your own implementation of the @~ macro that the latest version of Julia uses to parse expressions that look like R’s formulas. That will give you access to the quoted expressions you’d need to manipulate to do your analysis. Given those quoted expressions, you’ll need to define a symbolic differentiation tool that’s rich enough to handle the inputs you want to process. The Calculus package handles symbolic differentiation for a good chunk of functions, but you may need to extend it to your use case. It may be worth noting that your example makes very heavy use of R’s non-standard evaluation functionality, which is something that the Julia community has not invested much time into developing yet. Most Julia programmers tend to avoid operating on symbolic expressions. — John On Feb 1, 2014, at 3:42 PM, Walking Sparrow hq...@gopivotal.com wrote: You are right that I have an R background. What I am trying to do is to evaluate a function given by the user. For example, I want to write a function that can compute the marginal effects of a linear or logistic model. For simplicity, let's just use linear regression. If the user did a linear regression using the following model (I am using the formula syntax from R) y ~ x + z + sin(x) * sin(z) for the data set my_data, which has three columns x, y, and z, then the marginal effects at the mean are computed like this: First, compute the first derivative of 1 + x + z + sin(x) * sin(z). This can be done in R using the function deriv to get the expression of the first derivative. In the second step, I need to substitute the mean values of x and z into the result of the first step. An example of this would be the margins function in the R package PivotalR (http://cran.r-project.org/web/packages/PivotalR/ and https://github.com/gopivotal/PivotalR). Right now, I have no idea how to do the first step in Julia. But that is OK, because I just started learning Julia. 
Now my question is in the second step. The user can use any complex expressions in the linear regression like y ~ x + x*z + log(sin(x) + 2) * log(cos(z) + 2), and the data set my_data and formula can have any number of variables like x1, x2, ..., x1000. So when you write the code for the value substitution in the second step, you cannot know which function and what variables you will have. So in Julia or R, I need a function or macro F(f, []) that does this: given a function f, whose format is the input from the user, and a set of variable values [...], whose number and names are also the input from the user, F(f, [...]) returns the value of f evaluated at the values [...]. For example, if the user inputs f = 1 + z + cos(x)*log(2+cos(z))/(2+sin(x)) and [x = 2.3, z = 1.4], F should return the value of f evaluated at x = 2.3 and z = 1.4. This can be done in R; see the margins function in PivotalR, which actually does big data computation in-database. The question is how to do the same thing in Julia. Hope my explanation makes my question clearer. On Saturday, February 1, 2014 2:35:38 PM UTC-8, Jameson wrote: You need to provide more detail on what you are trying to do with this. You seem to be confusing several concepts involving the usage of expressions, macros, and functions. I can't tell if you are trying to write special syntax, or are just unaware of anonymous functions: Mostly, why is :(sin(x) + cos(y) * sin(z)) an expression, and not a function? It seems like you perhaps have an R background? f(x,y,z) = (sin(x) + cos(y) * sin(z)) f(1,2,3) On Sat, Feb 1, 2014 at 12:04 PM, Walking Sparrow hq...@gopivotal.com wrote: So the real question is how to generate a code block like this quote x = 2 y = 3 . x + y + end Need to embed a for loop inside the macro definition? On Saturday, February 1, 2014 8:52:30 AM UTC-8, Walking Sparrow wrote: Please forgive me if this is a stupid question. Suppose I have an expression :(sin(x) + cos(y) * sin(z)) and the values of x, y, z.
How can I write a macro that can substitute the values of x, y, z into the above expression? The number of values that I want to substitute depends on the actual use cases and thus is unknown. I wrote a function that can do this function substitute(expr::Expr, vals::Array{Expr,1}) for i = 1:length(vals) @eval $(vals[i]) end @eval $expr end x = 10 y = 23 substitute(:(x+y), [:(x = 2), :(y = 3)]) x y But if you run the above code, you will see that the values of global x and y are changed, which is not what I intend to do. This is because eval does the evaluation in the global scope. Besides, I think it is a bad coding pattern to use eval and it is slow. It would be better if this can
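One way to avoid changing the global x and y is to substitute the values into the expression tree itself instead of evaluating assignments. Below is a minimal sketch of that idea in current Julia (1.x) syntax. The names subst_values and evaluate_at are hypothetical, and the values are passed as a Dict from symbols to values. It still calls eval once, and any symbol with an entry in the Dict is replaced, including one that happens to share a function's name.

```julia
# Hypothetical helper `subst_values`: walk the expression tree and replace each
# symbol that has an entry in `vals` by its value. Nothing is assigned, so any
# global variables called x, y, ... are left untouched.
subst_values(ex, vals::AbstractDict) = ex                      # numbers, strings, ...
subst_values(ex::Symbol, vals::AbstractDict) = get(vals, ex, ex)
subst_values(ex::Expr, vals::AbstractDict) =
    Expr(ex.head, (subst_values(a, vals) for a in ex.args)...)

# Evaluate the expression with the given values plugged in. This still calls
# eval once, but the values are inlined instead of bound as globals.
evaluate_at(ex, vals::AbstractDict) = eval(subst_values(ex, vals))

x, y = 10, 23
evaluate_at(:(x + y), Dict(:x => 2, :y => 3))   # 5
(x, y)                                          # still (10, 23)

f = :(1 + z + cos(x) * log(2 + cos(z)) / (2 + sin(x)))
evaluate_at(f, Dict(:x => 2.3, :z => 1.4))
```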
Re: [julia-users] Re: Gadfly installation problem... ERROR: DenseArray not defined
I definitely agree that changing the version of Julia a package depends upon should trigger a 0.x -> 0.(x + 1) bump. — John On Feb 1, 2014, at 6:37 PM, Kevin Squire kevin.squ...@gmail.com wrote: One related thought: it would be nice if versions which target a new version of Julia got a larger version bump, to make it easier to backport fixes to previous versions of julia. Something like: 0.2.1 # targets Julia v0.2 0.2.2 0.2.3 # last real version which targets v0.2 0.2.4 # simply add julia -0.2 to REQUIRES 0.3.0 # first version which targets v0.3; use julia 0.3- in REQUIRES 0.3.1 0.3.2 # bug fix 0.2.5 # port of bug fix back to 0.2 series There's no reason, of course, that the 0.2.x has to work with Julia v0.2, and 0.3.x has to work with Julia v0.3--It could just as easily be 0.1.x and 0.2.x, or 1.0.x and 2.0.x. Thoughts? Kevin On Sat, Feb 1, 2014 at 3:20 PM, John Myles White johnmyleswh...@gmail.com wrote: I went into METADATA and updated the requires files, then submitted a new commit. I actually did this for one release of NumericExtensions which would reliably crash when loading on the 0.2 release. — John On Feb 1, 2014, at 3:19 PM, Dahua Lin linda...@gmail.com wrote: John, Could you elaborate a little bit on how you did this? Recent changes in NumericExtensions that rely on some new features have caused headaches for users of the 0.2 release. I would like to do something to fix it sometime next week. — Dahua On February 1, 2014 at 5:08:34 PM, John Myles White (johnmyleswh...@gmail.com) wrote: I think so. I’ve done it recently and fixed some errors by doing it. — John On Feb 1, 2014, at 3:07 PM, Dahua Lin linda...@gmail.com wrote: Is it possible to update the requirement of previously tagged versions?
On Friday, January 31, 2014 5:13:21 PM UTC-6, Ivar Nesje wrote: It seems like you are using the 0.2.0 version of Julia, and some package authors have not correctly marked new versions of their packages to require 0.3.0-prerelease when they decided to use features that were introduced after the release of 0.2.0. The consequence is that Pkg.add and Pkg.update install versions of some packages that are incompatible with your version of Julia. I think this is a very unfortunate situation for new people evaluating Julia, and the easiest way to solve this is to compile from source or download a nightly release.
Re: [julia-users] printing from IJulia notebook
Is asking them to print PDFs using the notebook export tools too onerous? — John On Feb 2, 2014, at 8:33 AM, j verzani jverz...@gmail.com wrote: Is there an easy way to print an IJulia notebook? I'm using Julia in a lab setting and am providing notebooks for students to fill out and turn in. I'd prefer they print them. Unfortunately, I don't see a print menu item, and the browser's print feature only prints the visible parts of the page. For the tech-savvy, I've recommended exporting as an ipynb file, uploading it to a public site on Dropbox, viewing that through nbviewer, and then printing that web page. Definitely tedious. Am I missing something obvious?
Re: [julia-users] Julia Parallel Computing Optimization
One potential performance issue here is that the array indexing steps like S[:,i][my] currently produce copies, not references, which would slow things down. Someone with more expertise in parallel programming might have better suggestions than that. Have you tried profiling your code? http://docs.julialang.org/en/latest/stdlib/profile/ — John On Feb 3, 2014, at 6:32 AM, Alex C alex@gmail.com wrote: Hi, I am trying to port some Matlab code into Julia in order to improve performance. The Julia parallel code currently takes about 2-3x as long as my Matlab implementation. I am at wit's end as to how to improve the performance. Any suggestions? I tried using pmap but couldn't figure out how to implement it in this case. FYI, I am using Julia on Windows 7 with nprocs() = 5. Thanks, Alex function expensive_hat(S::Array{Complex{Float64},2},mx::Array{Int64,2},my::Array{Int64,2}) samples = 64 A = @parallel (+) for i = 1:samples abs2(S[:,i][my].*S[:,i][mx]); end B = @parallel (+) for i = 1:samples abs2(sqrt(conj(S[:,i][mx+my]).*S[:,i][mx+my])); end C = @parallel (+) for i = 1:samples conj(S[:,i][mx+my]).*S[:,i][my].*S[:,i][mx]; end return (A.*B./samples./samples, C./samples); end data = rand(24000,64); limit = 2000; ix = int64([1:limit/2]); iy = ix[1:end/2]; mg = zeros(Int64,length(iy),length(ix)); mx = broadcast(+,ix',mg); my = broadcast(+,iy,mg); S = rfft(data,1)./24000; @elapsed (AB, C) = expensive_hat(S,mx,my)
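To illustrate the copying point, here is a serial sketch of the A term in current Julia (1.x) syntax. It drops @parallel, and the function name a_term and the small synthetic inputs are hypothetical. S[:, i][my] first copies all of column i and then indexes that copy, while S[my, i] selects the same elements directly from S, which skips the intermediate column copy.

```julia
# Serial sketch of the A term from the code above. S[:, i][my] first copies
# the whole column i and then indexes that copy; S[my, i] picks the same
# elements straight from S, skipping the intermediate column copy.
function a_term(S::AbstractMatrix{<:Complex}, mx::AbstractMatrix{Int}, my::AbstractMatrix{Int})
    A = zeros(real(eltype(S)), size(mx))
    for i in axes(S, 2)
        A .+= abs2.(S[my, i] .* S[mx, i])    # fused, in-place accumulation
    end
    return A
end

# Small synthetic inputs with the same structure as mx and my in the question
S  = rand(ComplexF64, 50, 4)
mx = [j for i in 1:3, j in 1:5]    # mx[i, j] == j
my = [i for i in 1:3, j in 1:5]    # my[i, j] == i
A  = a_term(S, mx, my)
```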
Re: [julia-users] Julia Parallel Computing Optimization
Just to be clear: in the future, Julia will not make copies during array slicing. But it does now, which can be costly. — John On Feb 3, 2014, at 7:01 AM, David Salamon d...@lithp.org wrote: I agree with John about the insane amount of copying going on. However, I added some @times to your code and it looks like most of the time is spent in conj. You probably want to precompute that for both the B and C calculations. function expensive_hat(S::Array{Complex{Float64},2}, mx::Array{Int64,2}, my::Array{Int64,2}) samples = 64 @time A = @parallel (+) for i = 1:samples abs2(S[:,i][my] .* S[:,i][mx]); end #@time B = @parallel (+) for i = 1:samples # abs2( sqrt( conj(S[:,i][mx+my]) .* S[:,i][mx+my] ) ) @time b0 = conj(S[:,1][mx+my]) @time b1 = b0 .* S[:,1][mx+my] @time b2 = sqrt(b1) @time B = abs2(b2) #end @time C = @parallel (+) for i = 1:samples conj(S[:,i][mx+my]) .* S[:,i][my].*S[:,i][mx]; end @time ans = (A .* B ./ samples ./ samples, C./samples) return ans end data = rand(24000,64); limit = 2000; ix = int64([1:limit/2]); iy = ix[1:end/2]; mg = zeros(Int64,length(iy),length(ix)); mx = broadcast(+,ix',mg); my = broadcast(+,iy,mg); S = rfft(data,1)./24000; @time (AB, C) = expensive_hat(S,mx,my)
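The "precompute conj" suggestion can be sketched as a serial loop in current Julia (1.x) syntax. The name bc_terms and the small synthetic inputs are hypothetical. The loop follows Alex's original B and C definitions and sums over all columns. The index matrix mx .+ my is built once, and conj of the selected elements is computed once per column and then shared by both sums.

```julia
# Serial sketch of the B and C terms: the index matrix mx .+ my is built once,
# and conj of the selected elements is computed once per column and reused.
function bc_terms(S::AbstractMatrix{<:Complex}, mx::AbstractMatrix{Int}, my::AbstractMatrix{Int})
    idx = mx .+ my
    B = zeros(real(eltype(S)), size(mx))
    C = zeros(eltype(S), size(mx))
    for i in axes(S, 2)
        s  = S[idx, i]
        cs = conj.(s)                       # shared by B and C
        B .+= abs2.(sqrt.(cs .* s))
        C .+= cs .* S[my, i] .* S[mx, i]
    end
    return B, C
end

S  = rand(ComplexF64, 50, 4)
mx = [j for i in 1:3, j in 1:5]
my = [i for i in 1:3, j in 1:5]
B, C = bc_terms(S, mx, my)
```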
Re: [julia-users] How to write a macro that can substitute variable values into an expression
To make sure everyone’s on the same page, Walking Sparrow’s approach is completely standard for R. The way that R treats certain DataFrames as an additional scope in which to search for variable bindings is something R users have been taught to expect, even though it is an extremely un-Julian way of coding. All that said, in DataFrames, our current solution is to completely avoid this kind of scoping until we’re confident that we can make it work efficiently. We may come back to it in the future, but there are other priorities to work on now. — John On Feb 2, 2014, at 12:35 PM, Mauro mauro...@runbox.com wrote: On Sun, 2014-02-02 at 17:35, hq...@gopivotal.com wrote: User inputs a function, and a data.frame which contains all the variables that appear in the function. I will need to substitute the mean values of the variables into the function. (Actually, for computing the marginal effects, one also needs to compute the average of the function values evaluated at all rows of the data.frame). I think you're making hacking-life more complicated than it already is! You'll only need macros if you insist that the naming of the function arguments is automatically matched against the column names of the dataframe. But I don't think that that is a good idea: names of function arguments are there to refer to values inside the function and not outside of it. Nor is it, I think, a particularly Julian way of coding. I suggest instead something like this: user supplies - a function: f(a,b,c,d) = ... - a DataFrame: df - a tuple/list of column names to be used in the order they need to be inserted into f; i.e. this is a mapping from column-names to function argument position. E.g.: ("height", :width, 'd', :x) you provide a function like so: function F(userfn, datafr, fields) # take mean of dataframe columns colmeans = [mean(datafr[fl]) for fl in fields] # maybe do some more stuff: # call user function return userfn(colmeans...)
# (the three dots are the syntax used here) end Then the user can call it like so: F(f, df, ("height", :width, 'd', :x)) I reckon you ought to give this kind of user interface a try and see whether that works for you. So in order to use apply, I will need to extract the variable order and names from the user-defined function. This is because the function might be func(x,y,z), but the data.frame has the columns z, x, y, a, b, c, d (it has more columns than are needed by the function, which is the usual case. And the order is different). So John Myles White's opinion is that this is very hard to do in the current Julia (see his post above). On Sunday, February 2, 2014 9:03:21 AM UTC-8, Johan Sigfrids wrote: Thinking about this, the whole let or macro might be overkill. If the user provides both the function and the arguments, the user should be able to provide the arguments in the correct form for the function, in which case you need neither let nor macros. You could just call apply directly on those two: user_function(x,y,z) = x + y + z^2 user_arguments = (3, 4, 5) apply(user_function, user_arguments...) On Sunday, February 2, 2014 6:49:52 PM UTC+2, Walking Sparrow wrote: I guess apply and let can do some work here. But I do not know the names and number of the variables that the user would use. So now I need a macro that can construct the let-apply block with the number of variables undetermined. The macro should be able to accept any number of variables. Suppose that the user inputs func(x, y) = x+2y and x = 1, y = 2 @my_macro func (x=1, y=2) would be expanded to let x = 1, y = 2 apply(func,1, 2) end And if the user inputs func(a, b, c, d) = a + b + c + d, and a = 1, b = 2, c = 3, d=4 @my_macro func (a = 1, b = 2, c = 3, d = 4) would expand to let a = 1, b = 2, c = 3, d = 4 apply(func, 1,2,3,4) end How do I write a macro like this?
If I knew the function and the variables, of course I could directly call the function or use let-apply, but the problem is that these are user inputs, which I cannot know beforehand. On Sunday, February 2, 2014 8:11:34 AM UTC-8, Keno Fischer wrote: Or you could just call the function directly: f = (x,y,z)->x+y+z^2 let x=3, y=4, z=5 f(x,y,z) end or f((x,y,z)...) or f((1,2,3)...) On Sun, Feb 2, 2014 at 10:59 AM, Johan Sigfrids johan.s...@gmail.com wrote: Can't you just do this with apply? Something like this: f = (x, y, z) -> x + y + z^2 let x=3, y=4, z=5 apply(f, x, y, z) end On Sunday, February 2, 2014 5:47:36 PM UTC+2, Walking Sparrow wrote: Let me clarify a little bit. My question is actually the following: In R, one can do something like f <- function(x1, x2, x3, x4, x5, x6) { some expressions that you like to use } evaluate.at <- list(x1 = 2, x2 = 2.3, x3 = 2, x4 = 1.2, x5 = 3.4, x6
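Mauro's suggested interface is short to write down. Here is a sketch in current Julia (1.x) syntax. To keep it self-contained, a Dict mapping column names to vectors stands in for a DataFrame, which is an assumption. The column data and the user function g are made up for illustration.

```julia
using Statistics    # provides mean

# Sketch of the F interface suggested above. A Dict mapping column names to
# vectors stands in for a DataFrame here (an assumption, to keep it
# self-contained); `fields` gives the columns in the order `userfn` expects.
function F(userfn, data::AbstractDict, fields)
    colmeans = [mean(data[fl]) for fl in fields]
    return userfn(colmeans...)        # splat the means as positional arguments
end

# More columns than the function needs, in a different order
data = Dict(:z => [0.0, 2.0], :x => [1.0, 3.0], :y => [4.0, 6.0], :a => [9.0, 9.0])
g(x, y, z) = x + 2y + z^2
F(g, data, (:x, :y, :z))    # 2.0 + 2 * 5.0 + 1.0^2 == 13.0
```

In current Julia the apply(f, args...) spelling used in this thread is no longer available. Splatting with f(args...) does the same job.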
Re: [julia-users] Sorting Index
I think you want sortperm. — John On Feb 4, 2014, at 6:24 AM, RecentConvert giz...@gmail.com wrote: Is there an easier method to obtain the sorting index given a column of data? In Matlab you can add a second output and it'll give you an index which you can apply to other related arrays. using Datetime using DataFrames D = # load your data, DataFrame time = # Parse time from your loaded data y = [int64(time) [1:length(time)]] I = sortrows(y,by=x->x[1]) # Sort index (by time) I = I[1:end,2] # Remove unnecessary time column time = time[I] # Sort time by time
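sortperm returns the permutation that would sort its argument, which plays the same role as the second output of sort in Matlab. Indexing related arrays with that permutation reorders them consistently. Here is a small sketch with made-up data in current Julia syntax:

```julia
times  = [3.5, 1.0, 2.25]
labels = ["c", "a", "b"]      # a related array to reorder the same way

p = sortperm(times)           # [2, 3, 1]: the indices that sort `times`
times[p]                      # [1.0, 2.25, 3.5]
labels[p]                     # ["a", "b", "c"]
```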
Re: [julia-users] Gadfly installation problem... ERROR: DenseArray not defined
Yes, assuming we can get the builds working smoothly, it would be really great to offer stable and unstable binaries right on the main downloads page. — John On Feb 4, 2014, at 11:52 AM, Eric Davies iam...@gmail.com wrote: On Tuesday, 4 February 2014 12:35:19 UTC-6, Sung Soo Kim wrote: I think it would be a good idea to provide the pre-release version as a binary. A simple overnight automatic build system can be used to upload the most recent pre-release version to the website easily, on a weekly (or even daily) basis (though only after the automatic tests of the core and packages pass, of course), so that newcomers don't have to get into 'compiling' from the source code. Compiling IS a major barrier. This exists at http://status.julialang.org/, but it really should be visible from the http://julialang.org/downloads/ page (and perhaps have matching styling).
Re: [julia-users] Gadfly installation problem... ERROR: DenseArray not defined
I’m sure you are. :) — John On Feb 4, 2014, at 6:37 PM, Elliot Saba staticfl...@gmail.com wrote: We're working on it, I promise. :)
Re: [julia-users] Re: operators and basic mathematical functions for DataFrames
This is definitely on purpose. Quick summary: * DataMatrix is a mathematical object. * DataFrame is a database. We're going to encourage use of colwise for some of these use cases. But for many of them we're going to encourage the use of DataMatrix instead. -- John On Feb 5, 2014, at 5:07 AM, Johan Sigfrids johan.sigfr...@gmail.com wrote: Issue #484 seems to indicate it is on purpose. On Wednesday, February 5, 2014 3:00:39 PM UTC+2, Christian Groll wrote: Since updating DataFrames and DataArrays recently, operators and basic functions are not working on DataFrames anymore. Is this a new design decision, or only temporary due to restructuring the code base? julia> Pkg.status() - DataFrames 0.5.1 - DataArrays 0.1.1 julia> df = DataFrame(rand(4, 2)) 4x2 DataFrame |---|--|--| | Row # | x1 | x2 | | 1 | 0.698851 | 0.353054 | | 2 | 0.427287 | 0.76353 | | 3 | 0.872991 | 0.182744 | | 4 | 0.779048 | 0.554823 | julia> df + 1 ERROR: no method +(DataFrame, Int64) julia> mean(df) ERROR: no method +((ASCIIString,DataArray{Float64,1}), (ASCIIString,DataArray{Float64,1})) in mean at statistics.jl:11 julia> df + df ERROR: no method +(DataFrame, DataFrame)
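DataMatrix and colwise belong to the 2014-era DataArrays/DataFrames packages, and the sketch below does not use them. It only shows, in base Julia 1.x, the distinction John draws. A plain numeric matrix (standing in here, as an assumption, for the "mathematical object") supports arithmetic directly, and column-by-column summaries are requested explicitly.

```julia
using Statistics    # provides mean

# A plain numeric matrix: arithmetic works directly on it, and column-wise
# summaries are asked for explicitly.
M = [1.0 2.0; 3.0 4.0; 5.0 6.0]

M .+ 1                    # elementwise arithmetic on every entry
mean(M; dims = 1)         # 1×2 matrix of column means: [3.0 4.0]
map(mean, eachcol(M))     # the same column means as a vector: [3.0, 4.0]
```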
Re: [julia-users] Move Clustering.jl to JuliaStats
That's true. I find the mechanism a little opaque, which makes me uncomfortable. But hopefully it will all work out. -- John On Feb 5, 2014, at 2:04 AM, Ivar Nesje iva...@gmail.com wrote: I think GitHub will set up redirects if you use the move functionality. On Wednesday, February 5, 2014 2:54:36 AM UTC+1, John Myles White wrote: Hi all, Over the coming weekend, I am going to move Clustering.jl to JuliaStats. I hope the move will go smoothly, but am always wary about changing repo URLs. — John
Re: [julia-users] Re: If (in my system) Int is an alias for Int32, then why there is no Float alias for Float32/64?
FYI, this claim about the safety of symbols is actually not true. You can reassign the bindings of sym just as easily as you can reassign the bindings of a variable bound to a. -- John On Feb 7, 2014, at 8:00 AM, Felix dotfel...@gmail.com wrote: Ismeal VC, check the Julia docs (http://docs.julialang.org/en/latest/). For the rest, keep asking in this group; someone will surely help you out. Look at a symbol as a safe string: when you do something like sym = :hello, you will always know sym is :hello. You can also use it as an expression, like http://docs.julialang.org/en/latest/manual/metaprogramming/#expressions-and-eval
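John's point can be checked directly. What is fixed is the symbol value :hello itself, not the variable sym that holds it. Here is a tiny sketch in current Julia, which also shows a quoted expression being inspected and evaluated:

```julia
sym = :hello
sym = :world              # rebinding a variable that holds a Symbol is allowed
str = "hello"
str = "world"             # exactly like rebinding one that holds a String

# A quoted expression is data that can be inspected and then evaluated
ex = :(1 + 2)
ex.head                   # :call
eval(ex)                  # 3
```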
Re: [julia-users] Re: Multiple plots in one (pdf) file?
Isn't the behavior Daniel described how ggplot2 works? Certainly it's how ggsave works. -- John On Feb 7, 2014, at 9:41 AM, G. Patrick Mauroy gpmau...@gmail.com wrote: Ouch! In my opinion, this may be a major stumbling block for Julia adoption. I, and I am sure many, find it typical routine to load data, crunch, make a variety of graphical views (sometimes very many), and export them to files in an organized way for analysis and sharing a story line. With many such plots, one file per plot could quickly become messy and harder to manage. I suppose then a workaround would be to organize plots in sub-directories, as PNG pictures for ease of scrolling through them. Perhaps not that bad after all, thinking about it. I suppose I can live with that. I still believe it would be a good idea to add support for multiple plots in one pdf somehow; very handy! Thanks for the info, it saves me some search time. On Friday, February 7, 2014 12:02:28 PM UTC-5, Daniel Jones wrote: There's not a way to put them on separate pdf pages, but you can stack them and output them to the same pdf like: using Gadfly x = [1,2,3] plot1 = plot(x = x, y = x + 3) plot2 = plot(x = x, y = 2 * x + 1) draw(PDF("plotJ.pdf", 6inch, 6inch), vstack(plot1, plot2)) On Friday, February 7, 2014 8:35:57 AM UTC-8, G. Patrick Mauroy wrote: Just started taking a look at Julia. I have seen examples of how to send a plot to a file. But I have not stumbled upon one example as yet to export multiple plots to the same file, say pdf. Can someone please point me in the right direction? # R example of what I would like to do. x = 1:3 pdf(file = "plotR.pdf") plot(x = x, y = x + 3) plot(x = x, y = 2 * x + 1) dev.off() # My first Julia attempt. using Gadfly x = [1,2,3] plot1 = plot(x = x, y = x + 3) plot2 = plot(x = x, y = 2 * x + 1) draw(PDF("plotJ.pdf", 6inch, 3inch), plot1) draw(PDF("plotJ.pdf", 6inch, 3inch), plot2) Problem: plot2 overwrites plot1, so only plot2 ends up in plotJ.pdf.
To be clear, in this example, I want plot1 and plot2 in two distinct plots/pages -- as opposed to merging both graphs into one plot. Thanks.
[julia-users] DBI / DBDSQLite
I’ve just moved DBI.jl to JuliaDB, the organization that I’m hoping will house Julia’s emerging database packages. In the interest of getting some eyes on the DBI library without breaking Jacob Quinn’s substantially more stable SQLite.jl package, I’ve created a new DBDSQLite.jl package that provides a basic implementation of DBI’s interface. Right now you need to make a custom binary of SQLite3 to use DBDSQLite.jl. I’m hoping to automatically build/provide custom binaries in the future to work around this. Links for the curious: https://github.com/JuliaDB/DBI.jl https://github.com/JuliaDB/DBDSQLite.jl — John