Re: [julia-users] problem plotting with Gadfly/Cairo

2013-12-21 Thread John Myles White
You haven’t installed Cairo yet it seems. Or at least Julia isn’t finding Cairo 
installed where it expects to find it.

 — John

On Dec 21, 2013, at 2:26 AM, Laksh Gupta glaks...@gmail.com wrote:

 Hi
 
 I am running 64-bit Julia Studio 0.4.3 on Windows 8. I installed Gadfly and 
 Cairo but am still facing problems while trying to plot anything:
 
 julia> plot
 plot not defined
 
 julia> Gadfly.plot
 plot (generic function with 6 methods)
 
 julia> Gadfly.plot(x=collect(1:100), y=sort(rand(100)))
 Plot(...)
 
 julia> p = Gadfly.plot(x=collect(1:100), y=sort(rand(100)))
 Plot(...)
 
 julia> draw(PNG("plot.png", 6.5inch, 3inch), p)
 Cairo must be installed to use the PNG backend.
 
 julia> using Cairo
 
 julia> draw(PNG("plot.png", 6.5inch, 3inch), p)
 Cairo must be installed to use the PNG backend.
 
 
 Any idea what I am missing here?
 
 Thanks,
 lg



Re: [julia-users] Composite Types With Initialized Fields

2013-12-21 Thread John Myles White
Assigning default values to fields of a composite type is not yet supported.

Your inner constructor is also a little un-Julian, since `MyType() = new()` 
doesn’t assign any values to those fields.

 — John

On Dec 21, 2013, at 4:37 AM, Marcus Urban math...@gmail.com wrote:

 I am a little confused about constructing composite types. Given the 
 definition
 
 type MyType
   x::Int
   y::Int = 6
   MyType() = new()
 end
 
 an instance of MyType can be created using
 
   m = MyType()
 
 At that point, m.x acts as expected --- I can assign to it, read its value, 
 and so forth. However, attempting to access m.y produces an error that MyType 
 has no field y. Based on another post, I gather that my attempt to provide a 
 value to m.y in this manner is not allowed. If that's the case, what exactly 
 is the effect of y::Int = 6? If this part of the code is completely ignored, 
 it would be really nice if the system let me know since initializing fields 
 in this way is common in many languages.
 
 Also, I gather that a workaround is to use a constructor that takes named 
 arguments. Is that still the recommended way? With just two fields, things 
 are not difficult, but if the type has 20, calling a constructor with 20 
 positional arguments would be difficult.
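
The keyword-argument workaround mentioned above can be sketched as follows. This is a minimal illustration in current Julia syntax (`mutable struct` replaces the `type` keyword used in this thread); the default of 0 for `x` is an assumption made for the example, while 6 for `y` comes from the question.

```julia
mutable struct MyType
    x::Int
    y::Int
    # Keyword constructor: supplies the defaults that `y::Int = 6`
    # in the field list was meant to provide.
    MyType(; x::Int = 0, y::Int = 6) = new(x, y)
end

m = MyType()          # both fields take their defaults
m2 = MyType(x = 3)    # override only the field that differs
```

With many fields, callers name only the ones they want to change, which avoids the long positional argument list the question worries about.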
 
 



Re: [julia-users] Performance of varargs indexing

2013-12-26 Thread John Myles White
Hi Milan,

Have you looked at the many table-like functions already in existence? We have 
xtabs, xtab and table already.

Would be nice to shrink everything down to one high-performance function.

 -- John

On Dec 26, 2013, at 6:05 AM, Milan Bouchet-Valat nalimi...@club.fr wrote:

 Hi!
 
 I've been trying to implement a table and cross-table function for generic 
 AbstractVectors and a more efficient version for PooledDataVectors (from 
 DataArrays). I have something that seems to work fine for the latter, but the 
 performance is not completely satisfying. See the code here: 
 https://gist.github.com/nalimilan/8132114
 
 Something like this:
 a = PooledDataArray(rep(1:10, 10))
 table(a)
 @time table(a)
 
 This reports about 1s here, while the same thing in R takes about 0.4s. My 
 implementation has the advantage that it does not copy the input vectors, 
 which may have a great impact when working with large data under memory 
 pressure.
 
 But I think I'm doing many things wrong, since the allocated bytes are much 
 higher than I would expect/like. Ideally there wouldn't be any allocation in 
 the inner loop. It seems that the main problem comes from the transformation 
 from vector to varargs that happens in a[el...] += 1. In an ideal world the 
 compiler would detect that the length of el is fixed for given input types, 
 and it would be able to make it equivalent to a direct call. But maybe I'm 
 not doing this correctly. Or would I be better off computing the linear index 
 manually by combining the indexes on the different dimensions?
 
 A secondary issue is that += seems to involve a call to getindex() and 
 another to setindex!(), while theoretically it would be possible to do both 
 at the same time once the pointer to the array position has been computed. Is 
 this a planned optimization? (For the general AbstractVector method, I need a 
 similar feature but applied to Dicts, and I've seen that an update() method 
 is apparently planned.)
 
 Thanks for your help (I plan to open a PR to discuss the interface soon)
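
The alternative raised in the question, computing the linear index by hand instead of splatting `el...`, might look roughly like this. It is a sketch in current Julia syntax, not the code from the gist; `linear_index` is a hypothetical helper name.

```julia
# Combine per-dimension indices into a single column-major linear index,
# so the inner loop can use a[lin] += 1 rather than a[el...] += 1.
function linear_index(dims::NTuple{N,Int}, idx::NTuple{N,Int}) where {N}
    lin = 1
    stride = 1
    for d in 1:N
        lin += (idx[d] - 1) * stride
        stride *= dims[d]
    end
    return lin
end

counts = zeros(Int, 3, 4)
counts[linear_index(size(counts), (2, 3))] += 1   # same cell as counts[2, 3]
```

Because both arguments are tuples of known length `N`, the loop has a fixed trip count for a given input type.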



Re: [julia-users] Re: Modules, Closures and Methods

2013-12-28 Thread John Myles White
Haven’t had time to read through this in depth, but is your concern that 
abstract types can’t contain fields? That is likely to get fixed at some point 
in the future.

 — John

On Dec 28, 2013, at 9:31 AM, andrew cooke and...@acooke.org wrote:

 thanks.  i've done almost exactly the same thing with the Nothing type at 
 https://github.com/andrewcooke/BlockCipherSelfStudy.jl/blob/master/src/GA.jl
 
 i don't think my problem is specific to GA.  the problem is how to add extra 
 state to an api.
 
 in traditional OO you can use inheritance to create a new class with the 
 extra state.
 
 in julia you cannot.  nor can you add it inside closures because you can't 
 (as far as i can tell) extend methods in another module with a closure.
 
 so instead you have to spot ahead of time that a user might want to add extra 
 state and provide an additional parameterized field (the context in my code 
 linked to above) where the user can store arbitrary information.
 
 which seems ugly and prone to errors (what if you miss somewhere)?
 
 so i still hope there's a better solution.
 
 cheers, andrew
 
 On Saturday, 28 December 2013 04:50:11 UTC-3, Toivo Henningsson wrote:
 I don't have enough background in genetic algorithms to understand what you 
 are trying to accomplish, but I think that to answer the question of how to 
 write code so that it can be generally extended by users, the first thing to 
 ask is what the interface to the code that you want to write really is (in 
 abstract terms). Then, one can start to model it with types, generic 
 functions, inheritance, etc.
 
 Also, to create a generic function without actually providing any 
 implementations, I've lately been using things like
 
 f(::None) = nothing
 
 which seems to work fine.
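
Current Julia versions offer a direct way to declare a generic function with no methods, `function name end`, which makes a placeholder method like the one above unnecessary. A minimal sketch; `fitness` and `Candidate` are hypothetical names used only for illustration.

```julia
# Declare the generic function without providing any implementation.
function fitness end

struct Candidate
    score::Float64
end

# Users (or other modules) add methods later for their own types.
fitness(c::Candidate) = c.score

result = fitness(Candidate(2.5))
```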
 
 



Re: [julia-users] Why should map be avoided (for performance)?

2013-12-29 Thread John Myles White
My understanding (which may be out-of-date) is that the current version of map 
frequently doesn’t get the type of its input correct. That may have been fixed 
since I developed the habit of not using map.

 — John

On Dec 29, 2013, at 11:58 AM, andrew cooke and...@acooke.org wrote:

 From the comment at https://gist.github.com/nalimilan/8132114
 (am I reading it wrong, does it just mean map with local anon functions?)
 
 Is it the overhead of creating an intermediate Task?  Are there any plans to 
 merge nested tasks as an optimisation (I have no idea if something like that 
 is even possible)?  Or to replace the collect(map(...)) idiom with 
 something faster?
 
 Thanks,
 Andrew



[julia-users] ccall confusion

2013-12-29 Thread John Myles White
I’m trying to use ccall to access the following function from the SQLite3 API:

int sqlite3_table_column_metadata(
  sqlite3 *db,                /* Connection handle */
  const char *zDbName,        /* Database name or NULL */
  const char *zTableName,     /* Table name */
  const char *zColumnName,    /* Column name */
  char const **pzDataType,    /* OUTPUT: Declared data type */
  char const **pzCollSeq,     /* OUTPUT: Collation sequence name */
  int *pNotNull,              /* OUTPUT: True if NOT NULL constraint exists */
  int *pPrimaryKey,           /* OUTPUT: True if column part of PK */
  int *pAutoinc               /* OUTPUT: True if column is auto-increment */
);

My attempt to do so keeps failing, so I suspect that I’m just not using ccall 
correctly. I keep trying the following lines and getting segfaults:

using DBI
using SQLite

db = connect(SQLite3, "db/tmp.sqlite3")

dbptr = db.ptr
table = "users"
column = "id"
datatype = Array(Ptr{Uint8}, 1)
collseq = Array(Ptr{Uint8}, 1)
notnull = zero(Cint)
primarykey = zero(Cint)
autoinc = zero(Cint)

ccall((:sqlite3_table_column_metadata, sqlite3_lib),
  Cint,
  (Ptr{Void},
   Ptr{Uint8}, Ptr{Uint8}, Ptr{Uint8},
   Ptr{Ptr{Uint8}}, Ptr{Ptr{Uint8}},
   Ptr{Cint}, Ptr{Cint}, Ptr{Cint}),
  dbptr,
  convert(Ptr{Uint8}, C_NULL), table, column,
  datatype, collseq,
  notnull, primarykey, autoinc)

Any thoughts?



Re: [julia-users] ccall confusion

2013-12-30 Thread John Myles White
Thanks for the comments, Isaiah! It never occurred to me that library versions 
would be an issue here because I had forgotten that this function isn’t always 
defined unless certain compiler directives are enabled.

I was using the version being found by the default search strategy for SQLite. 
I’ve changed to using a custom installation to make sure that the library I’m 
using has access to the function I want.

This works for me now, but I’m not sure how to get access to the integer 
outputs from this function, which are all passed as pointers. I tried to change 
the return signature, but I don’t think I understand how to do it correctly.
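
For `int *` output parameters, the usual pattern is to leave the return type alone and pass a mutable container that C can write through, then read it back after the call. The sketch below uses current Julia's `Ref{Cint}`; on the 2013-era Julia in this thread a one-element array such as `Cint[0]` would play the same role. To stay runnable without SQLite, libc's `memcpy` stands in for a C function that writes an `int` through a pointer; that substitution is an assumption made purely for illustration.

```julia
# Stand-in for a C function with an `int *` output: memcpy writes
# sizeof(Cint) bytes through its first pointer argument.
src = Ref{Cint}(42)
out = Ref{Cint}(0)     # plays the role of notnull / primarykey / autoinc

ccall(:memcpy, Ptr{Cvoid},
      (Ref{Cint}, Ref{Cint}, Csize_t),
      out, src, sizeof(Cint))

value = out[]          # dereference after the call to get the integer
```

Passing a plain `zero(Cint)` where a pointer is expected hands C a value rather than an address, so there is nothing for it to write into.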

 — John

On Dec 30, 2013, at 1:44 AM, Isaiah Norton isaiah.nor...@gmail.com wrote:

 It's unclear what version of various libraries you are using, because I had 
 to make several changes to get this to run. However, the following works fine 
 (for me..). You might want to try deleting `SQLite/lib/sqlite3.dylib`, in 
 case an incompatible version of the shared library is being picked up. The 
 lib should come from HomeBrew if you are on OS X, as SQLite has a BinDeps 
 install rule (so I'm not sure what the deal is with the dylib and .so files).
 
 using DBI
 using SQLite
 
 db = SQLite.connect("/tmp/db.sqlite3")
 
 dbptr = db.handle   # changed
 table = "users"
 column = "id"
 datatype = Array(Ptr{Uint8}, 1)
 collseq = Array(Ptr{Uint8}, 1)
 notnull = zero(Cint)
 primarykey = zero(Cint)
 autoinc = zero(Cint)
 
 ccall((:sqlite3_table_column_metadata, SQLite.sqlite3_lib),   # changed
   Cint,
   (Ptr{Void},
    Ptr{Uint8}, Ptr{Uint8}, Ptr{Uint8},
    Ptr{Ptr{Uint8}}, Ptr{Ptr{Uint8}},
    Ptr{Cint}, Ptr{Cint}, Ptr{Cint}),
   dbptr,
   convert(Ptr{Uint8}, C_NULL), table, column,
   datatype, collseq,
   notnull, primarykey, autoinc)
 
 
 
 
 



Re: [julia-users] Random.rand must be explicitly imported to be extended

2013-12-30 Thread John Myles White
It’s fine to punt on things. You can either not include those methods at all or 
include them as

skewness(d::HarryDist) = error("Not yet implemented")

 — John

On Dec 30, 2013, at 2:15 PM, Harry Southworth harry.southwo...@gmail.com 
wrote:

 Thanks for the tip.
 
 Another question, and possibly one not best posted here:
 
 Is there a minimum required functionality for adding a distribution to the 
 package? I ask because the package seems to want me to provide skewness and 
 kurtosis functions, but I've never wanted to know those things, don't know 
 why anyone else would, have never seen them written down, and would rather 
 spend my time doing something else.
 
 Thanks again,
 Harry



Re: [julia-users] create a type for points on n-dimensional simplex

2013-12-31 Thread John Myles White
I think you’d need a family of types to do that. You might look at 
https://github.com/twadleigh/ImmutableArrays.jl and try to extend it.

 — John

On Dec 31, 2013, at 7:13 AM, Christian Groll groll.christian@gmail.com 
wrote:

 I already know how I could implement this for a given dimensionality. For 
 example, for the two-dimensional case I can define:
 
 immutable twoDimSimplex
     weight1::Float64
     weight2::Float64
 
     twoDimSimplex(x::Float64, y::Float64) = (abs(x+y - 1) > 1e-10) ? 
         error("entries must sum to one") : new(x,y)
 end
 
 However, I do not see how I could extend this to the n-dimensional case. 
 Here, I thought that I would have to use one field which stores an 
 n-dimensional vector:
 
 immutable nDimSimplex
     points::Vector{Float64}
 
     nDimSimplex(x::Vector{Float64}) = (abs(sum(x) - 1) > 1e-10) ? 
         error("entries must sum to one") : new(x)
 end
 
 Now, I think that it will not be possible to change the vector that the field 
 points to. However, the entries of the vector itself still can be changed 
 without restrictions. Any recommendations?
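
One way to close the gap described above is to store the weights in a tuple rather than a `Vector`, since a tuple's entries cannot be reassigned. The sketch below is in current Julia syntax (`struct` replaces `immutable`) and is only one possible design; `Simplex` is a hypothetical name, and the 1e-10 tolerance is taken from the question.

```julia
struct Simplex{N}
    weights::NTuple{N,Float64}

    # Validate once at construction; afterwards neither the field nor
    # the individual entries can be modified.
    function Simplex(w::Vararg{Float64,N}) where {N}
        abs(sum(w) - 1) > 1e-10 && error("entries must sum to one")
        new{N}(w)
    end
end

s = Simplex(0.2, 0.3, 0.5)

# A set of weights that does not sum to one is rejected.
rejected = try
    Simplex(0.5, 0.6)
    false
catch
    true
end
```

The dimension `N` becomes a type parameter, which gives the "family of types" for every dimensionality without writing each one by hand.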
 
 On Tuesday, 31 December 2013 12:33:57 UTC+1, Tim Holy wrote:
 Use an immutable 
 http://docs.julialang.org/en/release-0.2/manual/types/#immutable-composite-types 
 with an inner constructor: 
 http://docs.julialang.org/en/release-0.2/manual/constructors/#inner-constructor-methods 
 
 You may find the example 
 http://docs.julialang.org/en/release-0.2/manual/constructors/#case-study-rational 
 helpful. 
 
 Best, 
 --Tim 
 



Re: [julia-users] Re: Style Guideline

2013-12-31 Thread John Myles White
(4) Using both tabs and spaces is a huge problem in a shared codebase. This is 
probably the only rule in my entire list that I’m actually going to enforce in 
the code I maintain. IIRC, Python completely forbids mixing these kinds of 
space characters at the language level.

(7) + (8) These rules are part of the official Google style guides for R, which 
is the language with the most similarity to Julia that’s being used at 
companies with public facing style guidelines. I think they’re quite sensible 
rules, which is why I decided to borrow them from published standards.

(18) + (19): This is clearly an area of big disagreement in our community. I 
might pull them out into a suggestions section since I’d really prefer that 
code submitted to things like DataFrames.jl follow this rule, but don’t want to 
include a rule that’s going to be a big schism in the community.

(22) + (23) + (24): I may take these out as well. I definitely agree that 
there’s a big difference between performance guidelines and style guidelines, 
although that line is blurry when you’re trying to keep a codebase written in a 
consistent style.

(31): Comments aren’t PDFs or HTML or any other format designed for 
transmitting carefully formatted documents. You don’t get to use images, 
properly formatted tables, etc. I find diagrams are an essential part of good 
documentation. I think conflating documents with code leads to documents that 
are less readable and lots of lines in code that’s not actually worth reading.

(35): I might take this one out as well. It’s somewhere on the boundary between 
a performance tip and a style habit worth developing.

 — John

On Dec 31, 2013, at 11:12 AM, Daniel Carrera dcarr...@gmail.com wrote:

 Personally, I do not think that a more thorough style guide is necessarily 
 better. That said, I will give you my comments:
 
 (4):  I like tabs and I use them.
 
 (7) + (8):  I disagree. Although I generally use comma+space as you say, at 
 times I deviate from that when I feel that doing so will improve the clarity 
 and readability of my code.
 
 (18)+(19):  I disagree. Although I could favour rules like this in a 
 particular project, in many cases I think that adding type annotations just 
 creates syntactic noise and can create a needless limitation.
 
 (22)+(23)+(24): I do not think that performance tips belong in a style guide. 
 You could spend a lot of time writing performance tips and I don't see an 
 obvious reason why the three tips you chose are more important than other 
 performance tips.
 
 (31): I partially disagree. I like writing documentation (e.g. tutorial or 
 explaining an algorithm) at the top of the file. I like having the 
 documentation in the same file as the code that it refers to. I do not know 
 what you mean when you say that English documents are more readable when not 
 constrained by the rule of code comments. What rules are those?
 
 Also, I rarely want to have a diagram in my documentation because that 
 involves starting a WYSIWYG program like LibreOffice or something like that. 
 I haven't really felt a lot of need for diagrams.
 
 
 (35): This doesn't sound like a style thing either. Advice on the correct way 
 to use a module, or how to maintain precision or avoid round-off errors, do 
 not belong in a style guide. This sort of thing belongs in either the 
 documentation for the module, or on some tutorial about numerical computation.
 
 Cheers,
 Daniel.
 
 
 
 On Tuesday, 31 December 2013 10:01:23 UTC-5, John Myles White wrote:
 One of the things that I really like about working with the Facebook codebase 
 is that all of the code was written to comply with a very thorough internal 
 style guideline. This prevents a lot of useless disagreement about code 
 stylistics and discourages the creation of unreadable code before anything 
 reaches the review stage. 
 
 In an attempt to emulate that level of thoroughness, I decided to extend the 
 main Julia manual’s style guide by writing my own personal style guideline, 
 which can be found at https://github.com/johnmyleswhite/Style.jl 
 
 I’d be really interested to know what others think of these rules and what 
 they think is missing. Right now, my guidelines leave a lot of wiggle room. 
 
  — John 
 



Re: [julia-users] Style Guideline

2013-12-31 Thread John Myles White
You’re totally right that Base Julia has a very different implicit style guide 
than I’ve been using. That’s intentional since I find that some of the Base 
Julia code is a little hard to read at times. I’ve also been bitten by the 
absence of important type constraints in Base before (think of when show(io) 
used to have no type information), so I’ve tended towards initially 
conservative typing until it’s clear that looser typing is needed.

I’m not sure there’s much benefit in having rules that involve personal 
judgement because reasonable people can make different judgements. So I’d 
rather have no rule at all (and just let things happen as they may) than try to 
formalize a rule whose application can’t be reliably checked by a linting 
program.

Coming from R, I’m pretty strongly opposed to Matlab's precedence rule for “:”. 
I find it hard to read and really wish that it hadn't made it impossible for us 
to match R’s formula syntax. The “:” operator’s precedence is by far the part 
of Julia that I most dislike (which, of course, is why I’m such a big fan of 
Julia, since that’s a minor problem to have as your worst quality.)

The for loops thing is one where I don’t have strong feelings, but tend to 
prefer consistency. I see the appeal of using “=” in some contexts, but find it 
easier to avoid using different things to express the same concept.

 — John

On Dec 31, 2013, at 10:54 AM, Stefan Karpinski ste...@karpinski.org wrote:

 I would mention that the vast majority of Base Julia, although it's fairly 
 internally consistent, does not follow a lot of these rules. In particular, 
 the whitespace rules and some of the type annotation rules, and for x in vs 
 for x =. I tend to follow rules that require a bit of judgement, but 
 therefore convey some subtle information about the code.
 
 Whitespace. I don't use spaces when calling functions that are mathy: f(x,y). 
 I do, on the other hand, tend to use spaces when calling non-mathy functions: 
 endswith(str, substr). I think that math expressions should be spaced so that 
 they're readable and I'm not sure that a fixed set of rules does that, 
 although no spaces for tighter operations and spaces for looser operations is 
 the trend. I rely heavily on Matlab precedence of arithmetic versus :.
 
 For loops. When the right-hand-side is a range like 1:n then I use =. When 
 the r-h-s is an opaque object that we're iterating over, then I use in. 
 Examples:
 
 for i = 1:n
   # blah, blah
 end
 
 for obj in collection
   # blah, blah, blah
 end
 
 
 



Re: [julia-users] Re: Style Guideline

2013-12-31 Thread John Myles White
I could see a couple of nice uses for having the ability to do block-local 
imports, but I’m not sure if that would solve the problems that (33) is meant 
to address, which is that using importall makes it too easy to accidentally 
monkey-patch Base and that using import sometimes makes it hard to know the 
provenance of functions being extended by a module. The latter is way less 
problematic than the former: single function import has a lot of good use 
cases, even if it’s a bit too non-local for my taste. It’s only importall that 
makes things really hard to keep track of.

 — John

On Dec 31, 2013, at 3:54 PM, Brian Rogoff brog...@gmail.com wrote:

 IMO the main reason for (33) is that Julia presently lacks any local import 
 feature. At least a few languages with module systems add these; see for 
 instance OCaml 
 http://caml.inria.fr/pub/docs/manual-ocaml-4.01/extn.html#sec225 and also 
 Ada, which allows with/use inside of blocks.
 
 Is there a reason that a similar feature wouldn't work well with Julia too?
 
 -- Brian
 


Re: [julia-users] Style Guideline

2014-01-02 Thread John Myles White
Thanks everyone for the feedback. Going to try to synthesize responses this 
weekend. Been distracted by a major push to add more database support to Julia.

 — John

On Jan 2, 2014, at 6:02 AM, Keith Campbell keithcc1...@gmail.com wrote:

 +1 for Eric's proposal for a 100 character line length.  The old 80 character 
 limit in PEP 8 was the one bit of that guideline I could never abide.
 
 +1 also for Milan's proposal regarding brief comments describing a function's 
 purpose.  
 
 And thank you John for pushing a standard that can enable tooling support.  
 
 On Thursday, January 2, 2014 5:15:19 AM UTC-5, Marcus Urban wrote:
 Do people using Julia really like underscores that much? I find them 
 generally unsightly, and I do not plan to use them.



Re: [julia-users] Declaring types that are special cases of already existing composite types

2014-01-02 Thread John Myles White
Concrete types can't be subtyped because you need to know exactly how much 
memory space they occupy.

 -- John

On Jan 2, 2014, at 10:01 AM, Mauro mauro...@runbox.com wrote:

 Only abstract types can be subtyped (and if I recall correctly this is
 going to stay that way for some type-theory-reason).
 
 Further, at the moment abstract types cannot have fields, i.e. cannot be
 composite types.  However, this might change sometime, have a look at the
 issue:
 
 https://github.com/JuliaLang/julia/issues/4935
 
 (which is also referenced in the mailing list thread mentioned by tshort)
 
 Now, all of this does not help with your quest, sorry, but it may be of
 some interest.
 
 On Thu, 2014-01-02 at 16:23, Christian Groll wrote:
 My interest lies in the implementation of types that are special cases of 
 already existing composite types.
 
 For example, I want to implement a type Portfolio, which is just a 
 DataFrame with two additional requirements:
 - all columns are of numeric type
 - the sum of the entries in each row must be equal to 1
 
 Or, I want to implement a type TimeSeries, which is just a DataFrame where 
 the first column consists of dates. 
 
 I think the way to go here would be to implement some type of constraint 
 checking in the setindex! methods, although this would not prevent messing 
 with the entries by way of directly setting the fields of the type. 
 However, once there exist convenient getindex and setindex! methods, I hope 
 that basically nobody would mess with the values directly, and complete 
 immutability is not necessarily needed.
 
 What I am trying to achieve is something that I think is called inheritance 
 in other languages, where classes can simply be declared as subclasses to 
 already existing classes. 
 
 So the question is, whether something like this is possible in julia as 
 well? As far as I get it, there is no way to declare a type to be a subtype 
 of a composite type.
 Or, if this is not possible, is there any way around, like for example 
 declaring a type portfolio,
 
 type Portfolio
   weights::DataVector{Float64}
 end
 
 where I can simply relate all setindex! methods to the respective methods 
 of DataVector, adding constraint checks where necessary.
 
 function Base.setindex!(pf::Portfolio,
                         v::Any,
                         col_ind::ColumnIndex)
     if check_constraints(pf, v, col_ind)
         setindex!(pf.weights, v, col_ind)
     else
         error("constraints not fulfilled.")
     end
 end
 
 However, I then would need some type of metaprogramming such that I do not 
 have to implement all the numerous setindex! methods of DataVector from 
 scratch up.
 



Re: [julia-users] Declaring types that are special cases of already existing composite types

2014-01-02 Thread John Myles White
Right now, there is no mechanism for doing the delegation I described earlier 
beyond simple macros like those I wrote up a long time ago.

For your use case, you would really want a method that wraps a given type in a 
new immutable type and then delegates all methods to the contained type unless 
they are explicitly overriden. Currently, that's not possible without a lot of 
legwork.

Since it seems like you mostly want to guarantee invariant properties of your 
data, you can just write functions that don't break those invariants when 
operating a standard DataFrame and then call them. The compiler won't give you 
provable guarantees that those invariants are never broken, but your code still 
will respect them. If the compiler gets new abilities, you could then easily 
upgrade your methods to refer to a new type that imposes the desired invariants.
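
The wrap-and-delegate approach described above can be approximated by hand with a loop over `@eval`. This is a minimal sketch in current Julia syntax, wrapping a plain `Vector{Float64}` instead of a DataFrame so that it runs without extra packages; `Portfolio` and the list of forwarded functions are illustrative assumptions, and every function to be forwarded still has to be listed explicitly.

```julia
struct Portfolio
    weights::Vector{Float64}

    # Enforce the invariant once, and copy so that the caller's vector
    # cannot be used to change the weights afterwards.
    function Portfolio(w::Vector{Float64})
        abs(sum(w) - 1) > 1e-10 && error("weights must sum to one")
        new(copy(w))
    end
end

# Forward a hand-picked set of read-only functions to the wrapped vector.
for f in (:length, :getindex, :first, :last)
    @eval Base.$f(p::Portfolio, args...) = Base.$f(p.weights, args...)
end

p = Portfolio([0.25, 0.75])
```

Anything not in the forwarded list, including `setindex!`, simply has no method for `Portfolio`, so mutation has to go through functions that re-check the invariant.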

 -- John

On Jan 2, 2014, at 2:00 PM, Christian Groll groll.christian@gmail.com 
wrote:

 To recap, it seems that Julia does not support this feature - 
 although I must admit that I did not get all the details in the answers ;-) 
 
 Still, however, I would like to find some reasonable workaround to this 
 problem. In my opinion, the dataframe type should probably really cover 
 almost all cases of data storage in statistics / data analysis. Nevertheless, 
 I would very much like to be able to allow for some distinction between 
 different datasets. Hence, ideally I would like to have a type that behaves 
 almost exactly like a dataframe, while I am still able to overload certain 
 methods. For example, if I know that my dataset contains time series data, a 
 visualization plot(df::dataframe) should look different than a visualization 
 for geographic data on a map. Also, different datasets come with different 
 constraints: portfolio weights must sum to 1, correlations must be between -1 
 and 1, and so forth. Isn't there any way to reasonably implement this without 
 each time starting a new type from scratch?
 
 I was only calling it subtype because I somewhere stumbled upon the advice 
 that it could work with subtyping the AbstractDataFrame type, but I didn't 
 get this running. Any tips on whether / how this would work?
 
 Alternatively, I also found somewhere else a code snippet of John Myles White 
 about a redirect or delegate macro:
 macro redirect(t, comp, fname)
     t = esc(t)
     comp = esc(comp)
     fname = esc(fname)
     quote
         ($fname)(a::($t), args...) = ($fname)(a.($comp), args...)
     end
 end
 
 This at least could be a starting point to give a new type the behavior of a 
 dataframe. Is there any update on this macro?
 
 Lastly, I still do not get the memory problem with subtyping composite types 
 for my exact case. The subtypes that I would like to have do NOT have any 
 additional fields compared to their parent. They only shall help to allow 
 function dispatch and implementation of some constraints. A Portfolio type 
 still is nothing else than a dataframe, only that its values sum up to one. 
 You definitely need not further explain the memory issues here to me, because 
 I most likely do not understand them anyways. But are you really sure that 
 such a Portfolio type would have different memory requirements than a 
 dataframe? In effect, it should be nothing different, but only one special 
 case of all possible dataframes.
 
 On Thursday, 2 January 2014 16:53:18 UTC+1, Stefan Karpinski wrote:
 On Thu, Jan 2, 2014 at 10:01 AM, Mauro maur...@runbox.com wrote:
 Only abstract types can be subtyped (and if I recall correctly this is going 
 to stay that way for some type-theory-reason).
 
 It's not for a type theory reason – if anything, it's the opposite of a type 
 theory reason. If Float64 can be subtyped, then an Array{Float64} can 
 hold objects of arbitrary size. Thus, you can't represent it as inline data, 
 but rather have to store the array as pointers to boxed, heap-allocated 
 values. Not only is this horribly inefficient (200% storage overhead on 
 64-bit machines), but it completely destroys interoperability with BLAS, 
 FFTW, etc.
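
The storage argument above can be checked directly. A minimal sketch in current Julia syntax; `MyFloat` is a hypothetical name used only to show that the definition is rejected.

```julia
xs = Float64[1.0, 2.0, 3.0]

# Float64 is a concrete, fixed-size type, so the array stores the values
# inline: three elements occupy exactly 3 * 8 bytes, with no boxing.
inline_bytes = sizeof(xs)

# Attempting to subtype a concrete type fails when the definition is evaluated.
subtyping_failed = try
    @eval struct MyFloat <: Float64 end
    false
catch
    true
end
```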
 
 Some o.o. languages have allowed declaring types to be final as a way of 
 dealing with this issue (you also need immutability and/or value types to 
 fully solve the array storage problem). After a few decades of real-world 
 o.o. programming, however, the best practice that's emerged is that you 
 should only subtype intentional supertypes – types that are very carefully 
 designed to be subtypeable. Where a classically o.o. language might do Ac :> 
 Bc, where Ac and Bc are both concrete and Ac is a supertype of Bc, in Julia 
 you would have Aj' :> Aj, Bj where the abstract aspect of Ac is distilled 
 into the purely abstract type, Aj', while the concrete aspect of Ac is 
 implemented by Aj, which is a sibling of Bj instead of its parent. I've found 
 that while this requires a slight shift in thinking, the resulting

Re: [julia-users] A small performance puzzle

2014-01-04 Thread John Myles White
Hopefully Jeff will chime in (or someone else with the required expertise), but 
I’ve heard Jeff warn against splatting tuples lots of times.

 — John

On Jan 4, 2014, at 4:44 PM, Milan Bouchet-Valat nalimi...@club.fr wrote:

 Hi!
 
 I'd like to propose a small game about performance. In the following gist, I 
 provide three very similar short functions; the first one allocates much more 
 memory and is much slower than the two others. Can someone find an 
 explanation? ;-)
 
 https://gist.github.com/nalimilan/8261056
 
 The real-world scenario is again building a frequency table. I discovered 
 that when doing a = zeros(Int, dims) I really had to make dims a tuple rather 
 than an array, which forces me to use two versions of the same data, one in 
 each type
 
 Thanks for the help!
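
The tuple-versus-array point above can be illustrated with a minimal sketch. It is written in current Julia (1.x) syntax rather than the 0.2/0.3 syntax of this thread, it is not the code from the gist, and the names dims_vec, dims, a and b are made up for illustration.

```julia
# Dimensions stored in an array, e.g. computed at run time.
dims_vec = [2, 3, 4]

# Convert once to a tuple; zeros accepts a tuple of dimensions directly.
dims = Tuple(dims_vec)
a = zeros(Int, dims)

# Splatting the array also works, but the number of arguments is only
# known at run time, which is the kind of call the reply warns about.
b = zeros(Int, dims_vec...)
```

Both calls produce an array of the same size; the difference is only in how much the compiler knows about the number of dimensions ahead of time.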



Re: [julia-users] Add packages as root?

2014-01-05 Thread John Myles White
I believe there is a Julia environment variable that lets you control where 
packages will be located, but I can’t seem to recall what it is. If you knew 
that variable, you could have every user specify in the .juliarc that packages 
should be loaded from this alternative location.

 — John

On Jan 5, 2014, at 6:46 AM, Alasdair McAndrew amc...@gmail.com wrote:

 I can install packages as myself; that's fine.  I'm just wondering if they 
 can be installed centrally, so as to be available to all users.



[julia-users] Ambiguity warnings re. Diagonal{T}

2014-01-05 Thread John Myles White
Anyone have a sense why Diagonal{T} is now ambiguous with DataArray, but only 
for subtraction?



Re: [julia-users] Interpreting flat format profiler reports

2014-01-06 Thread John Myles White
I’m not really sure. My not totally informed sense is that this is likely to be 
generally slow since you have to check the type every time to determine what 
the inner field means.

But someone with more knowledge of Julia internals would need to confirm that.

 — John

On Jan 6, 2014, at 10:34 AM, Brendan O'Connor breno...@gmail.com wrote:

 On Sunday, January 5, 2014 8:54:08 PM UTC-5, John Myles White wrote:
 Looking at this now, what are the types of the variables on the bolded lines? 
 If they’re specific real-valued types, I’m surprised they’re so slow. 
 
 They're all Float64, or at least, they should be.  code_typed() says they are:
 
 w0 = 
 /(-(+(getindex(top(getfield)(n0,:counts),wordID::Int64),top(box)(Float64,top(div_float)(betaHere::Float64,top(box)($(Float64),top(sitofp)($(Float64),V::Int64))::Float64))::Float64),on0::Int64),-(+(top(getfield)(top(getfield)(n0,:counts),:total),betaHere::Float64),top(box)($(Int64),top(zext_int)($(Int64),on_cur::Bool))::Int64))
  # line 338:
 w1 = 
 /(-(+(getindex(top(getfield)(n1,:counts),wordID::Int64),top(box)(Float64,top(div_float)(betaHere::Float64,top(box)($(Float64),top(sitofp)($(Float64),V::Int64))::Float64))::Float64),on1::Int64),-(+(top(getfield)(top(getfield)(n1,:counts),:total),betaHere::Float64),top(box)($(Int64),top(zext_int)($(Int64),on_cur::Bool))::Int64))
  # line 339:
 p0 = 
 -(+(top(getfield)(top(getfield)(cur_docnode,:left),:count),top(box)(Float64,top(div_float)(top(getfield)(mm::TreeTM,:gammaConc)::Float64,top(box)($(Float64),top(sitofp)($(Float64),2))::Float64))::Float64),on0::Int64)
  # line 340:
 p1 = 
 -(+(top(getfield)(top(getfield)(cur_docnode,:right),:count),top(box)(Float64,top(div_float)(top(getfield)(mm::TreeTM,:gammaConc)::Float64,top(box)($(Float64),top(sitofp)($(Float64),2))::Float64))::Float64),on1::Int64)
  # line 341:
 q = /(*(p1,w1),+(*(p1,w1),*(p0,w0))) # line 343:
 
  
 Right now it’s not possible to have an abstract type with fields, so field 
 lookup should be pretty consistently fast. 
 
 What I meant was:
 
   abstract A
   type B <: A
     x
   end
   type C <: A
     y
   end
 
 ... and you have a datastructure typed A, but actually contains a mix of B's 
 and C's.  You have code that knows it's always accessing type B, and accesses 
 the x field.
 Under what circumstances is this fast?
 
 -Brendan
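
A minimal sketch of the situation Brendan describes, in current Julia (1.x) syntax (abstract type / struct instead of abstract / type). The field types, the items vector and the sum_x function are assumptions added for illustration; nothing here is taken from his actual code.

```julia
abstract type A end

struct B <: A
    x::Float64
end

struct C <: A
    y::Float64
end

# A container typed on the abstract parent, holding a mix of B's and C's.
items = A[B(1.0), C(2.0), B(3.0)]

# Inside the `isa` branch the compiler knows `item` is a B, so the field
# access `item.x` is resolved against a concrete type.
function sum_x(items::Vector{A})
    total = 0.0
    for item in items
        if item isa B
            total += item.x
        end
    end
    return total
end
```

Outside such a branch, each element of a Vector{A} has to have its concrete type checked at run time before a field can be read, which matches John's guess about where the cost comes from.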
 
 
 
 



Re: [julia-users] Ambiguity warnings re. Diagonal{T}

2014-01-06 Thread John Myles White
Agreed: the current ambiguity system has some unfortunate properties.

 — John

On Jan 6, 2014, at 1:41 PM, Dahua Lin linda...@gmail.com wrote:

 In base/linalg/diagonal.jl: line 24 - 27
 
 it defines the following functions:
 
 + (Diagonal, Diagonal)
 - (Diagonal, Diagonal)
 - (Diagonal, AbstractMatrix)
 - (AbstractMatrix, Diagonal)
 
 So when you write Diagonal - DataMatrix, the compiler doesn't know which 
 method to use. 
 
 But for +, there is no such problem ...
 
 I don't know why they don't define + (Diagonal, AbstractMatrix) and + 
 (AbstractMatrix, Diagonal) 
 
 I think these things need a serious cleanup.
 
 - Dahua
 
 
 
 On Sunday, January 5, 2014 11:13:27 AM UTC-6, John Myles White wrote:
 Anyone have a sense why Diagonal{T} is now ambiguous with DataArray, but only 
 for subtraction? 
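
The kind of ambiguity Dahua describes can be reproduced with stand-in types. This is a sketch in current Julia (1.x) syntax; Mat, Diag, DataMat and sub are hypothetical names, not the definitions in base/linalg/diagonal.jl or DataArrays, and current Julia reports the ambiguity as a MethodError at call time rather than as a warning at definition time.

```julia
abstract type Mat end
struct Diag <: Mat end
struct DataMat <: Mat end

# One package specialises on the first argument...
sub(a::Diag, b::Mat) = "Diag method"
# ...another specialises on the second.
sub(a::Mat, b::DataMat) = "DataMat method"

# Neither method is more specific for (Diag, DataMat), so the call is ambiguous.
was_ambiguous = try
    sub(Diag(), DataMat())
    false
catch err
    err isa MethodError
end

# Defining the intersection explicitly resolves the ambiguity.
sub(a::Diag, b::DataMat) = "resolved"
```

This also suggests why only subtraction was affected in the thread: a method pair like the one above existed for - but not for +.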
 



Re: [julia-users] Announcing AudioIO.jl - Simple Audio I/O for Julia

2014-01-07 Thread John Myles White
I don’t have homebrew on my general system, so I’ll just wait a bit.

 — John

On Jan 6, 2014, at 10:13 PM, Spencer Russell s...@mit.edu wrote:

 Ah, currently I don't have the Homebrew.jl support working properly. I 
 haven't dug into it deeply yet, but it looks like I'll need to put together a 
 custom formula that will download a portaudio binary, and request that it be 
 added to https://github.com/staticfloat/homebrew-juliadeps.
 
 For now you can do a brew install portaudio
 
 -s
 
 
 On Mon, Jan 6, 2014 at 9:00 PM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 This sounds really awesome. When I try to install it on OS X, I get the 
 following error:
 
 ===[ ERROR: AudioIO 
 ]===
 
 None of the selected providers can install dependency libportaudio
 while loading /Users/johnmyleswhite/.julia/AudioIO/deps/build.jl, in 
 expression starting on line 20
 
 
 
  — John
 
 On Jan 5, 2014, at 9:43 PM, Spencer Russell s...@mit.edu wrote:
 
  Code and details at:
  https://github.com/ssfrr/AudioIO.jl
 
  Currently supporting OSX and Linux.
 
  AudioIO is a Julia library for interfacing to audio streams, which include 
  playing to and recording from sound cards, reading and writing audio files, 
  sending to network audio streams, etc. Currently only playing to the sound 
  card through PortAudio is supported. It is under heavy development, so the 
  API could change, there will be bugs, there are important missing features. 
  That said, the basic API for playing back vectors of audio should work fine 
  and that API should not change.
 
  For instance, to play 1 second of noise through your sound card, it's as 
  easy as:
  julia> v = rand(44100) * 0.1
  julia> play(v)
 
 
  If you have any problems, please open an Issue on the github page. Also 
  don't hesitate to email the list and/or me.
 
 



Re: [julia-users] Julia and Python languages

2014-01-08 Thread John Myles White
I think part of the appeal of dot-notation OO is that it reads left-to-right, 
which helps to make the code seem to read in the same order as the sequence of 
actions taken.

 — John

On Jan 8, 2014, at 7:45 AM, Tobias Knopp tobias.kn...@googlemail.com wrote:

 It would be interesting to see some use cases where Java-like OO fits better 
 than Julia's OO. In C++ one can use both and usually chooses based on whether 
 the dispatching can be done at runtime or at compile time (i.e. classes with 
 virtual function for runtime decisions and templates for compile time 
 decisions).
 There are many situations where I would have liked to use generic programming 
 in C++ but it was not possible as the type was only known at runtime. In 
 Julia this is no issue which makes it such a joy to use.
 
 
 On Wednesday, 8 January 2014 14:17:20 UTC+1, Stefan Karpinski wrote:
 It's a bit hard to say whether Julia is object-oriented or not. I suspect 
 that for a lot of people, "object-oriented" means "do you write `x.f(y)` a 
 lot?" By that metric, Julia is not very object oriented. On the other hand, 
 everything you can do with single-dispatch o.o. in C++ or Java, you can 
 easily simulate with multiple dispatch, but you'll have to get used to 
 writing `f(x,y)` instead of `x.f(y)`. If your notion of object-orientation 
 has more to do with encapsulation and/or message passing, then we start to 
 look pretty non-o.o. again.
 
 
 On Wed, Jan 8, 2014 at 5:25 AM, Matthias BUSSONNIER bussonnie...@gmail.com 
 wrote:
 
 On 7 Jan 2014, at 21:48, Erik Engheim wrote:
 
 Thanks for the nice comments all of you. I guess I have to keep writing more 
 about my Julia experiences after this ;-)
 
 On Tuesday, January 7, 2014 9:39:05 PM UTC+1, Ivar Nesje wrote:
 Great post, it sums up very well the things I think are the strengths of 
 Julia.
 
 A few notes:
 Julia does not look up the method at runtime if the types of the arguments 
 to the function can be deduced from the types of the arguments to the 
 surrounding function (but it behaves that way for the user, unless he 
 redefines the method after the function was compiled #265).
 
 
 That is cool I didn't know that. I assume this can make quite a big 
 difference in performance for tight inner loops. 
 
 
 Some misc comment too :
 
  Julia is not object oriented
 
 Is that True ? From the manual :
 
   It is multi-paradigm, combining features of imperative, functional, and 
  object-oriented programming.
 
 I consider that Julia can be OO, the code just looks different than in other 
 languages.
 
 
 Typo ?
  Polymorphis lets you
 Missing m ?
 
 Liked the blog post too otherwise, thanks. I would also have mentioned 
 code_lowered, code_llvm and code_typed; 
 not everyone is fluent in assembler, and those tools are really useful too, 
 especially in metaprogramming.
 
 -- 
 M
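
Stefan's point about writing `f(x,y)` instead of `x.f(y)` can be sketched as follows, in current Julia (1.x) syntax. The Shape hierarchy and the functions area and overlap_kind are hypothetical examples, not code from the blog post under discussion.

```julia
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    side::Float64
end

# Single dispatch in the OO sense: what would be shape.area() elsewhere.
area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2

# Multiple dispatch: the method is chosen from the types of both arguments,
# which `x.f(y)` style single dispatch cannot express directly.
overlap_kind(a::Shape, b::Shape) = "generic"
overlap_kind(a::Circle, b::Square) = "circle-square"
```

Calling overlap_kind(Circle(1.0), Square(1.0)) picks the specialised method, while swapping the arguments falls back to the generic one.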
 



Re: [julia-users] API inconsistency in `ismatch` vs `contains`

2014-01-09 Thread John Myles White
It depends entirely on how you interpret match. To me, the string is a match 
for the pattern, rather than the pattern being a match for the string.

 -- John

On Jan 9, 2014, at 4:50 PM, Mike Nolta m...@nolta.net wrote:

 Conceptually, a regex is a set of strings, so i don't see the inconsistency.
 
 -Mike
 
 On Thu, Jan 9, 2014 at 7:40 PM, John Myles White
 johnmyleswh...@gmail.com wrote:
 It would break a bunch of code, but I also think ismatch(string, regex) 
 would make more sense than the current design.
 
 -- John
 
 On Jan 9, 2014, at 4:39 PM, Daniel Carrera dcarr...@gmail.com wrote:
 
 Hello,
 
 The functions `ismatch` and `contains` do similar things. Therefore, I 
 think they should have a consistent API. Currently they receive parameters 
 in reverse order:
 
 ismatch( r"foo" , haystack )
 contains( haystack, "foo" )
 
 I always forget which one goes in which direction so I have to look it up. 
 Since Julia is still a young language, I was wondering if there is any 
 interest in reviewing the API to help ensure consistency between similar 
 functions.
 
 Cheers,
 Daniel.
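
For readers on current Julia: `ismatch` no longer exists under that name; its replacement is `occursin`, which keeps the pattern-first order, while `contains` (Julia 1.5 and later) keeps the haystack-first order. The sketch below only illustrates the two argument orders; the haystack string and the variable names are made up.

```julia
haystack = "a needle in a haystack"

# Pattern first, string second (the order ismatch used).
m1 = occursin(r"need.e", haystack)

# String first, pattern second (the order contains uses).
m2 = contains(haystack, "needle")
m3 = contains(haystack, r"need.e")
```

So the asymmetry Daniel points out is still visible, although each function now accepts both plain strings and regexes.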
 



[julia-users] Repeating names in inner constructors?

2014-01-11 Thread John Myles White
I’ve noticed that a lot of people use argument names that differ from the field 
names when writing inner constructors, so that you see code like:

type Foo
    a::Int

    function Foo(alpha::Int)
        magic(alpha)
        new(alpha)
    end
end

Would this ever be necessary to avoid confusion about names? I’ve started 
reusing the exact field name and it seems to work fine. Am I going to run into 
a subtle bug?

 — John



Re: [julia-users] Repeating names in inner constructors?

2014-01-11 Thread John Myles White
Great. That is really nice.

 — John

On Jan 11, 2014, at 5:34 PM, Stefan Karpinski stefan.karpin...@gmail.com 
wrote:

 Nope. This is one of the nice things about the design.
 
 On Jan 11, 2014, at 8:16 PM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 
 I’ve noticed that a lot of people use argument names that differ from the 
 field names when writing inner constructors, so that you see code like:
 
 type Foo
   a::Int
 
   function Foo(alpha::Int)
   magic(alpha)
   new(alpha)
   end
 end
 
 Would this ever be necessary to avoid confusion about names? I’ve started 
 reusing the exact field name and it seems to work fine. Am I going to run 
 into a subtle bug?
 
 — John
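
A minimal sketch of the pattern John says he has started using, with the constructor argument named exactly like the field. It uses current Julia (1.x) syntax (struct instead of type), and the non-negativity check is a hypothetical stand-in for the magic(alpha) call in the original.

```julia
struct Foo
    a::Int

    # The argument `a` is an ordinary local variable; it does not clash
    # with the field `a`, which only exists on constructed instances.
    function Foo(a::Int)
        a >= 0 || throw(ArgumentError("a must be non-negative"))
        return new(a)
    end
end

foo = Foo(3)
```

Inside the constructor there is no implicit `this` or `self`, so the field name is never in scope on its own, which is why reusing the name is safe.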
 



Re: [julia-users] Incorrect behaviour defining methods with different argument definitions in separate modules? (using method in module Main conflicts with an existing identifier)

2014-01-12 Thread John Myles White
Freddy,

This is definitely one of the more confusing things about Julia, but it’s the 
best current solution anyone has proposed.

The problem with your example is that methods can only be extended to work on 
new types if you make their provenance clear. In your example, you would do 
something like the following:

module A
export f
f(s::String) = "Some operation with a String";
end 
module B
A.f(b::Bool) = "Some operation with a boolean";
end

Absent an explicit qualification of the origin of the “f” name in module B, 
Julia assumes that the f in B is a totally unrelated function, so it conflicts 
with the f exported by A instead of extending it.

 — John

On Jan 12, 2014, at 9:48 AM, Freddy Snijder fre...@visionscapers.com wrote:

 Hello Julia Users,
 
 I'm new to Julia and came across some behaviour of Julia, related to methods, 
 I didn't expect.
 
 Case A) In the REPL, when I define two methods, I get the behaviour I expect:
 
 julia> f(s::String) = "Some operation with a String";
 julia> f(b::Bool) = "Some operation with a boolean";
 julia> f
 f (generic function with 2 methods)
 So far, so good.
 
 Case B) Now if I have a file with this code and load it into a fresh REPL 
 session (using 'include'):
 
 module A
 export f
 f(s::String) = "Some operation with a String";
 end 
 module B
 export f
 f(b::Bool) = "Some operation with a boolean";
 end
 Then, when stating 'using A' and 'using B', I get a warning that there is a 
 conflict with an existing f:
 
 julia> using A
 julia> f
 f (generic function with 1 method)
 julia> using B
 Warning: using B.f in module Main conflicts with an existing identifier.
 julia> f
 f (generic function with 1 method) 
 I would have expected that Julia would see this as the same method with two 
 different argument definitions, just like in Case A).
 I have multiple modules that define the same methods for different composite 
 types, which seems a normal way of working to me.
 
 What am I doing wrong? The way Julia currently handles this seems incorrect 
 to me ...
 
 I'm interested to hear your input!
 
 Kind regards,
 
 Freddy
 
 PS : I'm on Julia Version 0.3.0-prerelease+584 (2013-12-19 22:26 UTC), Commit 
 06458fa* (2 days old master), x86_64-apple-darwin13.0.0
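
The fix John suggests can be sketched in current Julia (1.x) syntax as below. The `import ..A` line and the `using .A` form are requirements of current module syntax and are assumptions relative to the 0.3 prerelease used in this thread; the module and function names are the ones from the original question.

```julia
module A
export f
f(s::String) = "Some operation with a String"
end

module B
import ..A   # bring the sibling module A into scope
# Qualifying the name makes this a new method of A's f,
# not a separate function that happens to share the name.
A.f(b::Bool) = "Some operation with a boolean"
end

using .A
```

After this, f in Main is a single generic function with two methods, which is the behaviour Freddy expected from Case A.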
 
 



Re: [julia-users] ANN: LibGit2 bindings

2014-01-12 Thread John Myles White
This seems really awesome. Amazing work, Jake!

 — John

On Jan 11, 2014, at 9:56 PM, Jake Bolewski jakebolew...@gmail.com wrote:

 Link https://github.com/jakebolewski/LibGit2.jl
 
 On Sunday, January 12, 2014 12:55:27 AM UTC-5, Jake Bolewski wrote:
 Hi everyone,
 
 I've been working on LibGit2 bindings for Julia over the past month or so, 
 steadily porting over the test suite from Ruby's rugged library.
 Almost all of the tests have been rewritten and are now passing.  Most of 
 the testing has been done on the development branch of the libgit library and 
 on Linux.
 Please run the test suite and submit an issue if (when) it breaks on your 
 system.
 
 Hopefully once this matures some more it will enable Pkg to be rewritten 
 using libgit. 
 See: https://github.com/JuliaLang/julia/issues/4158, 
 https://github.com/JuliaLang/julia/pull/4866
 
 If you have any spare cycles please help!  The API could be refactored quite 
 a bit.  Hopefully this is a good base to work from.
 
 Best,
 Jake 
 



Re: [julia-users] New method definition not being picked up

2014-01-12 Thread John Myles White
This is one of the main outstanding quirks about Julia that will get resolved 
at some point in the nearish future.

See https://github.com/JuliaLang/julia/issues/265 for more details.

 — John

On Jan 12, 2014, at 4:02 PM, Andrew Burrows burro...@gmail.com wrote:

 Hi
 
 I'm rather new to Julia, but I've come across some rather puzzling behaviour 
 of the language.
 
 The following code works fine and the assert passes:
 
 a(x) = 12345
 b(x) = a(x)
 a(x::Int64) = 1000
 @assert b(1)== 1000
 
 But this near identical code does not, throwing an assertion error:
 
 a(x) = 12345
 b(x) = a(x)
 b(1) # <--- This line is new
 a(x::Int64) = 1000
 @assert b(1)== 1000
 
 It would seem that the definition of a(x) is being cached but in both cases 
 this assert passes fine:
 
 @assert a(1) == 1000
 
 Also this almost identical code works fine:
 
 a(x) = 12345
 b(x) = a(x)
 a(1) # <--- This line is now calling a not b
 a(x::Int64) = 1000
 @assert b(1)== 1000
 @assert a(1)== 1000
 
 Is this behaviour a bug or is it by design? Am I doing something wrong, or is 
 there something I can do to disable whatever is caching my method definition, 
 or is there any way to work around it?
 
 Cheers
 Andy
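
For comparison, here is Andy's failing case as it behaves on later Julia releases, where issue #265 has since been addressed (from 0.6 onward, to the best of this editor's knowledge). The variable names result_before and result_after are added for illustration, and Int is used instead of Int64 so the sketch also holds on 32-bit builds.

```julia
a(x) = 12345
b(x) = a(x)

# Calling b here forces it to be compiled against the current a.
result_before = b(1)

# Adding a more specific method of a invalidates the compiled b.
a(x::Int) = 1000

# At top level, b now sees the new method.
result_after = b(1)
```

On the 0.3 prerelease discussed in the thread, the second call would still have returned 12345.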



Re: [julia-users] A few questions I couldn't answer by myself

2014-01-12 Thread John Myles White
1. I think this is not possible, but I might be wrong.

2. Tuples have gotten a lot more efficient recently. Others will have to 
comment more on their relative merits vs. immutable composite types, which I 
prefer for explicitness and simpler integration with the dispatch system.

3. No idea about this. 90 MB isn’t much of an issue for the kind of work I do.

4. Blah{A} is a family of types, each of which is different for a specific 
value of A. The untyped version type Blah; a; end has a single type: its `a` 
field always has type Any, which is never tightened in response to data.

5. I think this is possible, but don’t know for sure.

6. Documentation is a major issue that should move forward in the next few 
months. Right now it is not possible to integrate your own functions with the 
help system.

Hope that helps. Others will probably expand on my answers.

 — John

On Jan 12, 2014, at 6:39 PM, Andy M 0andrewmart...@gmail.com wrote:

 I've been following and experimenting with Julia for a little while now, and 
 I have encountered questions that I haven't managed to answer by reading or 
 googling. An answer to any of them would be greatly appreciated.
 
 1. Is there any way to retrieve the location of the definition of a variable 
 or a type? I know I can use @which to find method definitions, but that's all 
 I know how to find.
 
 2. Are tuples less memory efficient than immutable composite types? If 
 so, why is this? I got the impression that they are after reading various 
 different articles and comments, so maybe I have just misunderstood something.
 
 3. Why is Julia's memory usage so high? When I open the interpreter (in 
 linux) it stabilises around 90MB. If I call Pkg.installed(), it jumps to 
 165MB, and stays there. Calling gc() doesn't reduce it either. Is it an 
 inevitable consequence of the language's design? Or perhaps an issue that is 
 being worked on? Or is it just not that important to the language's target 
 users?
 
 4. The documentation suggests that type Blah; a; end is less efficient than 
 type Blah{A}; a::A; end. If so, why does the former not default to the 
 behaviour of the latter? Is it to avoid excess code generation? Or perhaps 
 the latter representation has some undesirable behaviour?
 
 5. Is it currently possible to pass a struct to a C function? I found 
 documentation saying that it isn't possible, but there are github issues 
 which suggest the problem has been worked on.
 
 6. Is there a way to document a function, method, type, variable or module, 
 such that the documentation can be retrieved in the interpreter? I mean 
 something like javadocs or python docstrings. If not, is something like this 
 going to be added?
 
 Sorry for asking so many questions all at once. I am considering starting 
 quite a big project in Julia, and I think my timezone has made it difficult 
 to find help in the IRC channel.
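
John's answer to question 4 can be checked directly with reflection. This is a sketch in current Julia (1.x) syntax, where `type` has become `mutable struct`; BlahAny is a hypothetical name for the untyped variant so that both definitions can coexist.

```julia
# Untyped field: a single type whose field is always declared as Any.
mutable struct BlahAny
    a
end

# Parametric version: a family of types, one per value of A.
mutable struct Blah{A}
    a::A
end

untyped = BlahAny(1)
typed = Blah(1)          # A is inferred from the argument
```

Because Blah{Int} and Blah{Float64} are distinct concrete types, the compiler knows the field type for each one, whereas BlahAny's field has to be treated as Any everywhere.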



Re: [julia-users] Edit Distance?

2014-01-14 Thread John Myles White
Thanks!

I'll have to check that out. I was able to translate some of the Wikipedia code 
fast enough to get something working for my purposes.

 -- John

On Jan 14, 2014, at 3:18 PM, Matthias BUSSONNIER bussonniermatth...@gmail.com 
wrote:

 
 On 14 Jan 2014, at 15:08, John Myles White wrote:
 
 Is there a package out there to compute edit distances between strings?
 
 I started at some point, never really finished. 
 
 https://github.com/carreau/Diff.jl
 
 -- 
 M
 
 
 -- John
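
For readers looking for the same thing, a minimal sketch of the standard two-row dynamic-programming Levenshtein distance follows. It is not John's translation of the Wikipedia code and not the code in Diff.jl; the function name levenshtein and its signature are assumptions, and it is written in current Julia (1.x) syntax.

```julia
function levenshtein(s::AbstractString, t::AbstractString)
    a, b = collect(s), collect(t)      # work on characters, not bytes
    m, n = length(a), length(b)
    prev = collect(0:n)                # distances from the empty prefix of a
    curr = similar(prev)
    for i in 1:m
        curr[1] = i                    # deleting i characters reaches ""
        for j in 1:n
            cost = a[i] == b[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution
        end
        prev, curr = curr, prev        # the finished row becomes `prev`
    end
    return prev[n + 1]
end
```

Collecting the strings into character vectors avoids indexing a UTF-8 string by byte offsets, at the cost of two temporary arrays.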
 
 



Re: [julia-users] Can I write a macro that defines a function?

2014-01-14 Thread John Myles White
To be honest, I don’t fully understand what goes wrong here, but this way of 
doing it does work:

macro bar(num)
    ex = Expr(:(=), esc(Expr(:call, :foo, :x)), esc(num))
    return ex
end

@bar 5

foo(1)

I suspect that, in your example, there’s an attempt to evaluate the 
sub-expressions in the wrong scope. For example, this code shouldn’t (and 
doesn’t) work:

macro bar(num)
    ex = Expr(:(=), Expr(:call, :foo, :x), esc(num))
    return ex
end

@bar 5

foo(1)

 — John

On Jan 14, 2014, at 4:47 PM, Eric Davies iam...@gmail.com wrote:

 julia> macro bar(num)
            :(foo(x) = $num)
        end
 
 julia> @bar 5
 foo#27 (generic function with 1 method)
 
 julia> foo
 ERROR: foo not defined
 
 It appears that variable/function definitions in macros are mangled somehow. 
 Is there any way to define a function or set a variable in a macro (s.t. that 
 definition/assignment occurs in the calling scope)?
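
The mangled name foo#27 in Eric's output comes from macro hygiene: names introduced by a macro are renamed unless they are escaped. A minimal sketch of a shorter variant of John's fix, escaping the whole quoted definition, is shown below; it assumes current Julia (1.x) and has not been checked against the 0.3 prerelease discussed here.

```julia
macro bar(num)
    # esc() turns off hygiene for the whole expression, so `foo` and `x`
    # are resolved in the scope where the macro is called.
    return esc(:(foo(x) = $num))
end

@bar 5
```

After the macro call, foo is an ordinary function in the calling module and foo(1) returns 5.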



Re: [julia-users] Re: A few questions I couldn't answer by myself

2014-01-14 Thread John Myles White
I think a new Python interpreter session might not be the closest comparison 
for Julia since Python loads almost nothing by default, whereas Julia imports a 
ton of functionality by default. R is much more like Julia in this regard. 
Consistent with that hypothesis, on my machine, R uses 38 MB and Julia uses 41 
MB. I suspect that Julia without most of its functionality could take up much 
less memory.

 — John

On Jan 14, 2014, at 6:56 PM, Andy M 0andrewmart...@gmail.com wrote:

 Thank you all very much for your answers, they have been extremely helpful!
 
 In summary, it seems like there is a fairly clear answer to all but one 
 question, which is the question about Julia's memory usage. I am still 
 puzzled by what it is actually being used for. For comparison, if I start a 
 python interpreter it uses less than 5MB of RAM. I expected Julia's code 
 generation to consume more memory than an interpreter, but I did not expect 
 it to be anywhere near that much.
 
 I suppose the issue with Tuple/immutable type performance also isn't 
 completely clear.
 
 Andy, would you be willing to collect the responses you found helpful and add 
 them to the FAQ? 
 https://github.com/JuliaLang/julia/blob/master/doc/manual/faq.rst 
 You can just click edit on that page, no need to explicitly deal with git.
 I've been keeping track of many questions that I have found answers to (not 
 just those answered here), and I've been writing it all up in a desktop wiki. 
 I'm not sure what would be best to add to the FAQ at this point, but I am 
 hoping to work out a good format for sharing my experiences soon. Either by 
 contributing to the FAQ, by writing a blog post, or both.



Re: [julia-users] Re: Julia computational efficiency vs C vs Java vs Python vs Cython

2014-01-15 Thread John Myles White
The arguments against changing are pretty strong, but I’d really like it if 
Julia did a bit less automatic promotion. For example, it would be great if 
sum(x::T…) returned a value of type T.

 — John

On Jan 15, 2014, at 5:32 AM, Stefan Karpinski ste...@karpinski.org wrote:

 We already provide all the necessary intrinsics for 32-bit arithmetic, so 
 it's pretty easy to write a module that redefines arithmetic operations on 
 integers to do this, but it's definitely some work that would need to be 
 done. I'd be in favor of having this option. The biggest issue is that it 
 would only apply in lexical scope. I.e. even if Int16 + Int16 = Int16, you'd 
 still have sum(Int16[]) = Int, since sum is defined in Base. So to get the 
 full effect, this would probably need to be a global switch, at which point 
 you're really just talking about running in 32-bit mode on a 64-bit system.
 
 
 On Wed, Jan 15, 2014 at 5:32 AM, Miles Lubin miles.lu...@gmail.com wrote:
 Just to throw in my two cents, I don't think it's the right approach to brush 
 off a class of performance optimizations that has a valid use case in 
 practice and can lead to a 4x speedup. There should at least be *some* way to 
 access nonpromoting integer operations, even if the default operators do 
 promote.
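
For reference, the two behaviours discussed in this message can be observed separately on current Julia releases, where arithmetic on two Int32 values stays Int32 but sum over small integers still widens, as Stefan describes for sum. This reflects later releases, not the 0.2 behaviour the thread is about; the variable names are made up.

```julia
# Same-type arithmetic keeps the narrow type on current Julia.
x = Int32(1) + Int32(2)

# sum over small signed integers accumulates in the machine-word Int.
s = sum(Int16[1, 2, 3])
```

So the non-promoting arithmetic Miles asks for is now the default for the operators, while John's wish for sum to return the element type did not become the default.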
 



Re: [julia-users] New install on OSX 10.9 GLM package does not work

2014-01-15 Thread John Myles White
We've unfortunately done a bad job of keeping those packages compatible with 
0.2. I'll try to fix as much as I can today.

 -- John

 On Jan 15, 2014, at 8:48 AM, Corey Sparks corey.sparks.u...@gmail.com wrote:
 
 Dear List,
 I just installed Julia 0.2.0 last night and was trying to get the GLM package 
 going. When I try to load it and the RDatasets package, I get:
 
 julia> using RDatasets, GLM
 Warning: could not import Base.foldl into NumericExtensions
 Warning: could not import Base.foldr into NumericExtensions
 Warning: could not import Base.sum! into NumericExtensions
 Warning: could not import Base.maximum! into NumericExtensions
 Warning: could not import Base.minimum! into NumericExtensions
 Warning: could not import Base.PAIRWISE_SUM_BLOCKSIZE into NumericExtensions
 ERROR: TernaryFunctor not defined
  in include at boot.jl:238
  in include_from_node1 at loading.jl:114
  in include at boot.jl:238
  in include_from_node1 at loading.jl:114
  in reload_path at loading.jl:140
  in _require at loading.jl:58
  in require at loading.jl:43
 at /Users/ozd504/.julia/GLM/src/lm.jl:22
 at /Users/ozd504/.julia/GLM/src/GLM.jl:76
 
 
 
 It looks like something in GLM is broken, does anyone have advice on this?
 
 Thank you
 
 Corey


Re: [julia-users] Julia computational efficiency vs C vs Java vs Python vs Cython

2014-01-15 Thread John Myles White
+1 for Iain’s point of view.

 — John

On Jan 15, 2014, at 5:16 PM, Iain Dunning iaindunn...@gmail.com wrote:

 From a philosophical POV alone, I think it's inconsistent that we
 a) Don't save people from overflows, but
 b) Silently do Int32 math as Int64 behind the scenes, presumably to save 
 people from themselves
 
 I think the overflow behaviour surprises some people, but only because they've 
 been trained on Python etc. instead of C, but the Int32 behaviour would 
 surprise pretty much everyone given how Julia normally acts (as the manual 
 says, it falls into the "no automatic conversion" family of languages)
 
 On Wednesday, January 15, 2014 4:28:15 PM UTC-5, Földes László wrote:
 Sorry for the wrong info, I was switching between a 32 bit and a 64 bit 
 machine (SSH terminal), and I just happened to run the script on the 32 bit 
 machine...
 
 On Wednesday, January 15, 2014 12:37:07 AM UTC+1, Przemyslaw Szufel wrote:
 Foldes,
 
 I went for your solution and got a time increase from 
 2.1 seconds (64-bit integers) to 17.78 seconds (32-bit down-casting). 
 Seems like casting is not cheap...
 
 Any other ideas or possibilities?
 
 All best,
 Przemyslaw 
 
 P.S. 
 Naturally I realize that this is a toy example and normally in a typical 
 production code we would rather use real numbers for computations not ints.
 I am asking just out of curiosity ;-)
 
 
 On Wednesday, 15 January 2014 00:25:20 UTC+1, Földes László wrote:
 You can force the literals by enclosing them in int32():
 
 p = [int32(0) for i=1:2]
 result = [int32(0) for i=1:2]   
 k = int32(0)
 n = int32(2)
 while k < int32(2)
 i = int32(0)
 
 
 
 On Wednesday, January 15, 2014 12:04:23 AM UTC+1, Przemyslaw Szufel wrote:
 Simon,
 Thanks!
 I changed in Cython to 
 def primes_list(int kmax):
 cdef int k, i
 cdef long long n
 cdef long long p[2]
 and now I am getting 2.1 seconds - exactly the same time as Julia and Java 
 with longs...
 
 Since the computational difference between 64bit longs and 32bit ints is so 
 high - is there any way to rewrite my toy example to force Julia to do 32 bit 
 int calculations?
 
 All best,
 Przemyslaw Szufel
 
 
 On Tuesday, 14 January 2014 23:55:12 UTC+1, Simon Kornblith wrote:
 In C long is only guaranteed to be at least 32 bits (IIRC it's 64 bits on 
 64-bit *nix but 32-bit on 64-bit Windows). long long is guaranteed to be at 
 least 64 bits (and is 64 bits on all systems I know of).
 
 Simon
 
 On Tuesday, January 14, 2014 5:46:04 PM UTC-5, Przemyslaw Szufel wrote:
 Simon,
 Thanks for the explanation!
 In Java int is 32 bit as well. 
 I have just replaced ints with longs in Java and found out that now I get the 
 Java speed also very similar to Julia. 
 
 However I tried in Cython:
 def primes_list(int kmax):
 cdef int k, i
 cdef long n
 cdef long p[2]
 ...
 
 and surprisingly the speed did not change...at first I thought that maybe 
 something did not compile or is in cache - but I made sure - it's not the 
 cache. 
 Cython speed remains unchanged regardless of whether I use int or long? 
 I know that this is now becoming a question about another language, but maybe 
 someone can explain?
 
 All best,
 Przemyslaw Szufel
 
 
 On Tuesday, 14 January 2014 23:29:40 UTC+1, Simon Kornblith wrote:
 With a 64-bit build, Julia integers are 64-bit unless otherwise specified. In 
 C, you use ints, which are 32-bit. Changing them to long long makes the C 
 code perform similarly to the Julia code on my system. Unfortunately, it's 
 hard to operate on 32-bit integers in Julia, since + promotes to 64-bit by 
 default (am I missing something)?
 
 Simon
 
 On Tuesday, January 14, 2014 4:32:16 PM UTC-5, Przemyslaw Szufel wrote:
 Dear Julia users,
 
 I am considering using Julia for computational projects. 
 As a first step to get a feeling for the new language, I tried to benchmark 
 Julia's speed against other popular languages.
 I used an example code from the Cython tutorial: 
 http://docs.cython.org/src/tutorial/cython_tutorial.html [ the code for 
 finding the first n prime numbers]. 
 
 Rewriting the code in different languages and measuring the times on my 
 Windows laptop gave me the following results:
 
 Language | Time in seconds (less=better)
 
 Python: 65.5
 Cython (with MinGW): 0.82
 Java : 0.64
 Java (with -server option) : 0.64
 C (with MinGW): 0.64
 Julia (0.2): 2.1
 Julia (0.3 nightly build): 2.1
 
 All the codes for my experiments are attached to this post (Cython and Python 
 are both being run starting from the prim.py file)
 
 The thing that worries me is that Julia takes much, much longer than Cython...
 I am a beginner to Julia and would like to kindly ask what I am doing wrong 
 with my code. 
 I start the Julia console and use the command include("prime.jl") to execute it.
 
 This code looks very simple and I think the compiler should be able to 
 optimise it to at least the speed of Cython?
 Maybe my code has been written in a non-Julia style and the compiler has 
 

Re: [julia-users] duplicate a type

2014-01-17 Thread John Myles White
I don’t know offhand how to do this, but I’d look at the code for xdump, which 
shows that the necessary introspection operations exist:

Foo::DataType  : Any
  a::Int64::DataType  : Signed
  b::Float64::DataType  : FloatingPoint

 — John

On Jan 17, 2014, at 10:16 AM, Simon Byrne simonby...@gmail.com wrote:

 I want to define a new composite type with exactly the same fields as another 
 type. Is there an easy way to do this? The original type is not parametric. 
 
 Alternatively, is there a way I can figure out the type of a field of a 
 composite type Foo without constructing an object of type Foo?
 
 -Simon



[julia-users] DataFrames / DataArrays updated

2014-01-18 Thread John Myles White
As a consequence of renaming Stats to StatsBase, I’ve had to update DataFrames 
and DataArrays.

This means that everyone working with those libraries is now in sync with 
master again. That brings with it a lot of changes that may break some code.

To help minimize breakage, here are the most obvious changes that might affect 
you.

(1) We now offer @data / @pdata macros to write out literal DataArrays and 
PooledDataArrays. They need a little bit more refinement to deal with edge 
cases, but they’re a big improvement over the previous system. Examples of 
usage below:

@data [1, 2, NA, 4]
@data [1 2; NA 4]

@pdata [1, 2, NA, 4]
@pdata [1 2; NA 4]

You can also do this with variables (as long as they’re not NA’s):

a, b, c, d = 1, 2, 3, 4
@data [a, b, c, d]
@data [a b; c d]

The unfortunate edge case is that the following will fail:

a, b, c, d = 1, 2, 3, NA
@data [a, b, c, d]
@data [a b; c d]

(2) To convert other AbstractArrays to DataArrays / DataFrames, please use the 
data and pdata functions:

data([1, 2, 3, 4])
data(1:3)

pdata([1, 2, 3, 4])
pdata(1:3)

We’ve removed a lot of the constructors for DataArrays and PooledDataArray’s 
that had no parallel to anything in Base, where there are very few valid 
constructors for Array’s. If you use things like DataArray(1:10), it will be 
broken now. Please switch to using the data() function.

 — John



[julia-users] UTF8 byte indexing

2014-01-18 Thread John Myles White
I suspect I’m missing something, but this seems odd to me:

julia> s = string('ñ')
"ñ"

julia> s[2]
ERROR: invalid UTF-8 character index

julia> s[2:2]
""

 — John




Re: [julia-users] How to install a Package from a github branch

2014-01-20 Thread John Myles White
My fork of SQLite is very different from master. It represents most of my work 
pushing for Julia to have a DBI module that lets us write generic code for 
database access.

I’m hoping to finish my work on writing a DBI package plus drivers for SQLite 
and MySQL very soon. I would hold off on using my fork until there’s an 
official release.

 — John

On Jan 20, 2014, at 8:13 AM, Stefan Karpinski ste...@karpinski.org wrote:

 That did successfully install the package. However, as per the documentation 
 for Pkg.clone, it did so under the package name jmw. Did you mean for the 
 second argument to be a branch name? You can checkout a specific branch after 
 cloning the package using the Pkg.checkout command. Also, SQLite is an 
 official, registered package, so installing it via Pkg.clone is a bit 
 unusual. Do you need John's fork for some particular reason?
 
 
 On Mon, Jan 20, 2014 at 8:10 AM, Sharmila Gopirajan Sivakumar 
 sharmila.gopira...@gmail.com wrote:
 Hi,
 I want to install a package from a github branch, specifically, 
 https://github.com/johnmyleswhite/SQLite.jl/tree/jmw .  I tried the following 
 command
 
 Pkg.clone("https://github.com/johnmyleswhite/SQLite.jl.git", "jmw")
 INFO: Cloning jmw from https://github.com/johnmyleswhite/SQLite.jl.git
 INFO: Computing changes...
 INFO: No packages to install, update or remove.
 
 Julia is not able to install the package.  Is it possible to locally checkout 
 the code and install from source?  
 
 Thank you.
 
 Regards,
 Sharmila
 



Re: [julia-users] Re: Higher order derivatives in Calculus

2014-01-20 Thread John Myles White
I would love to see lots of improvements in the Calculus package. The interface 
is kind of wonky and there’s probably a lot of places where we’re getting less 
than ideal results.

But I currently own far too many of Julia’s packages at the moment. If other 
people want to take some of them over, it will radically improve my life. As 
things stand, it’s literally impossible for me to keep up with the workload 
that package maintenance would involve.

 — John

On Jan 20, 2014, at 10:54 AM, Ivar Nesje iva...@gmail.com wrote:

 The calculus package could definitely be much better if someone with knowhow 
 and time would improve it. Unfortunately it seems like @johnmyleswhite does 
 not maintain this package anymore, and nobody has taken up the ball. 
 
 
 kl. 19:40:28 UTC+1 mandag 20. januar 2014 skrev Hans W Borchers følgende:
 I looked into the Calculus package and its derivative functions. First, I got 
 errors when running examples from the README file:
 
 julia> second_derivative(x -> sin(x), pi)
 ERROR: no method eps(DataType,)
  in finite_difference at 
 /Users/HwB/.julia/Calculus/src/finite_difference.jl:27
  in second_derivative at /Users/HwB/.julia/Calculus/src/derivative.jl:67
 
 Then I was a bit astonished to see not too accurate results such as
 
 julia> abs(second_derivative(sin, 1.0) + sin(1.0))
 6.647716624952338e-7
 
 while, when applying the standard central formula for second derivatives,
 (f(x+h) - 2*f(x) + f(x-h)) / h^2 with the (by theory) suggested step length 
 eps^0.25 (for second derivatives) will result in a much better value:
 
 julia> h = eps()^0.25;
 
 julia> f = sin; x = 1.0;
 
 julia> df = (sin(x+h) - 2*sin(x) + sin(x-h)) / h^2
 -0.8414709866046906
 
 julia> abs(df + sin(1.0))
 1.7967940468821553e-9
 
 The functions for numerical differentiation in Calculus look quite involved, 
 maybe it would be preferable to apply known approaches derived from Taylor 
 series. Even the fourth order derivative will in this case lead to an 
 absolute error below 1e-05!
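
 The central formula quoted above is easy to try on its own. Below is a minimal sketch in current Julia syntax; it is not code from the Calculus package. The name `second_diff` and its keyword `h` are made up for illustration, and the default step is the eps^0.25 suggested in the message.

```julia
# Central second difference: (f(x+h) - 2f(x) + f(x-h)) / h^2.
# `second_diff` is a hypothetical helper, not part of any package.
second_diff(f, x; h = eps(Float64)^0.25) =
    (f(x + h) - 2 * f(x) + f(x - h)) / h^2

approx = second_diff(sin, 1.0)

# The exact second derivative of sin is -sin, so this is the absolute error.
err = abs(approx + sin(1.0))
println(err)
```

 The printed error should be of the same order as the 1.8e-9 reported above, though the exact digits can differ between machines.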



Re: [julia-users] How to install a Package from a github branch

2014-01-20 Thread John Myles White
Keyword arguments seem like a much better approach.

 — John

On Jan 20, 2014, at 11:23 AM, Stefan Karpinski ste...@karpinski.org wrote:

 This makes me wonder if the API should change. Maybe keyword arguments for 
 both the package name and branch?
 
 
 On Mon, Jan 20, 2014 at 1:25 PM, Ivar Nesje iva...@gmail.com wrote:
 Can't you just do
 
 Pkg.clone("https://github.com/...", "pkgname")
 Pkg.checkout("pkgname", "branch")
 
 Ivar
 
 kl. 19:18:23 UTC+1 mandag 20. januar 2014 skrev Sharmila Gopirajan Sivakumar 
 følgende:
 Hi Stefan,
 Thank you for responding.  As an extension of my understanding of 
 Pkg.checkout, I assumed that Pkg.clone(url, name) would clone the branch 
 'name' for the repo at 'url'.  My bad. It still installs only the master 
 branch. Right now there doesn't seem to be support to install a branch or tag 
 through Pkg.clone(). 
 
 I want to use John's fork because it is DBI compliant and supports prepared 
 statements and parameter binding, which the official version doesn't.  
 
  I just now read John Myles White's response too.  While I accept his idea, 
 would it not be useful to have the ability to install from the branch or tag 
 of an unregistered Package?  If you feel that is a valid feature, I would be 
 happy to help add it.
 
 Regards,
 Sharmila
 
 
 On Mon, Jan 20, 2014 at 9:43 PM, Stefan Karpinski ste...@karpinski.org 
 wrote:
 That did successfully install the package. However, as per the documentation 
 for Pkg.clone, it did so under the package name jmw. Did you mean for the 
 second argument to be a branch name? You can checkout a specific branch after 
 cloning the package using the Pkg.checkout command. Also, SQLite is an 
 official, registered package, so installing it via Pkg.clone is a bit 
 unusual. Do you need John's fork for some particular reason?
 
 
 On Mon, Jan 20, 2014 at 8:10 AM, Sharmila Gopirajan Sivakumar 
 sharmila@gmail.com wrote:
 Hi,
 I want to install a package from a github branch, specifically, 
 https://github.com/johnmyleswhite/SQLite.jl/tree/jmw .  I tried the following 
 command
 
 Pkg.clone("https://github.com/johnmyleswhite/SQLite.jl.git", "jmw")
 INFO: Cloning jmw from https://github.com/johnmyleswhite/SQLite.jl.git
 INFO: Computing changes...
 INFO: No packages to install, update or remove.
 
 Julia is not able to install the package.  Is it possible to locally checkout 
 the code and install from source?  
 
 Thank you.
 
 Regards,
 Sharmila
 
 
 



[julia-users] Ambiguity Warnings

2014-01-20 Thread John Myles White
The recent SharedArray change to Base created some new ambiguity warnings for 
DataFrames.

Warning: New definition 
getindex(AbstractArray{T,1},Indexer) at 
/Users/johnmyleswhite/.julia/DataFrames/src/indexing.jl:195
is ambiguous with: 
getindex(SharedArray{T,N},Any...) at sharedarray.jl:156.
To fix, define 
getindex(SharedArray{T,1},Indexer)
before the new definition.

 — John



[julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-20 Thread John Myles White
As I said in another thread recently, I am currently the lead maintainer of 
more packages than I can keep up with. I think it’s been useful for me to start 
so many different projects, but I can’t keep maintaining most of my packages 
given my current work schedule.

Without Simon Kornblith, Kevin Squire, Sean Garborg and several others doing 
amazing work to keep DataArrays and DataFrames going, much of our basic data 
infrastructure would have already become completely unusable. But even with the 
great work that’s been done on those packages recently, there’s still a lot of 
additional design work required. I’d like to free up some of my time to do that 
work.

To keep things moving forward, I’d like to propose a couple of radical New 
Year’s resolutions for the packages I work on.

(1) We need to stop adding functionality and focus entirely on improving the 
quality and documentation of our existing functionality. We have way too much 
prototype code in DataFrames that I can’t keep up with. I’m about to make a 
pull request for DataFrames that will remove everything related to column 
groupings, database-style indexing and Blocks.jl support. I absolutely want to 
see us push all of those ideas forward in the future, but they need to happen 
in unmerged forks or separate packages until we have the resources needed to 
support them. Right now, they make an overwhelming maintenance challenge even 
more onerous.

(2) We can’t support anything other than the master branch of most JuliaStats 
packages except possibly for Distributions. I personally don’t have the time to 
simultaneously keep stuff working with Julia 0.2 and Julia 0.3. Moreover, many 
of our basic packages aren’t mature enough to justify supporting older 
versions. We should do a better job of supporting our master releases and not 
invest precious time trying to support older releases.

(3) We need to make more of DataArrays and DataFrames reflect the Julian 
worldview. Lots of our code uses an interface that is incongruous with the 
interfaces found in Base. Even worse, a large chunk of code has type-stability 
problems that make it very slow, when comparable code that uses normal Arrays 
is 100x faster. We need to develop new idioms and new strategies for making 
code that interacts with type-destabilizing NA’s faster. More generally, we 
need to make DataArrays and DataFrames fit in better with Julia when Julia and 
R disagree. Following R’s lead has often led us astray because R doesn’t share 
Julia’s strengths or weaknesses.

(4) Going forward, there should be exactly one way to do most things. The worst 
part of our current codebase is that there are multiple ways to express the 
same computation, but (a) some of them are unusably slow and (b) some of them 
don’t ever get tested or maintained properly. This is closely linked to the 
excess proliferation of functionality described in Resolution 1 above. We need 
to start removing stuff from our packages and making the parts we keep both 
reliable and fast.

I think we can push DataArrays and DataFrames to 1.0 status by the end of this 
year. But I think we need to adopt a new approach if we’re going to get there. 
Lots of stuff needs to get deprecated and what remains needs a lot more 
testing, benchmarking and documentation.

 — John



Re: [julia-users] Ambiguity Warnings

2014-01-20 Thread John Myles White
For the moment this is solved by me having removed indexing.jl, which we didn’t 
really need. So I don’t think you need to do anything for the moment.

But I’d broadly like to know if we have any strategy for avoiding these kinds 
of conflicts moving forward. It’s such an odd experience to find my code raises 
warnings because of changes external to it.

 — John

On Jan 20, 2014, at 7:52 PM, Amit Murthy amit.mur...@gmail.com wrote:

 What would be the best way to solve this? 
 
 A SharedArray type has a regular Array backing it and we should make it 
 usable wherever a regular Array can be used.
 
 Would the right thing to do be 
 
 - get a list of getindex methods that operate on a regular Array 
 - generate the same definitions for a SharedArray with a pass through to the 
 backing Array
 - this would ensure that any further getindex definitions for an Array are 
 automatically generated for SharedArray too
 
 
 
 
 
 On Tue, Jan 21, 2014 at 1:04 AM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 The recent SharedArray change to Base created some new ambiguity warnings for 
 DataFrames.
 
 Warning: New definition
 getindex(AbstractArray{T,1},Indexer) at 
 /Users/johnmyleswhite/.julia/DataFrames/src/indexing.jl:195
 is ambiguous with:
 getindex(SharedArray{T,N},Any...) at sharedarray.jl:156.
 To fix, define
 getindex(SharedArray{T,1},Indexer)
 before the new definition.
 
  — John
 
 



Re: [julia-users] Higher order derivatives in Calculus

2014-01-21 Thread John Myles White
Just to chime in: the biggest problem with the Calculus package isn’t the absence 
of usable functionality, it’s that the published interface isn’t a very good one 
and the more reliable interface, including things like 
finite_difference_hessian, isn’t exported.

To fix this, we need someone to come in and do some serious design work, where 
they'll rethink interfaces and remove out-dated functionality. As Tim Holy 
mentioned, the combination of the unpublished finite difference methods and 
automatic differentiation methods in DualNumbers should get you very far.

 — John

On Jan 21, 2014, at 7:20 AM, Tim Holy tim.h...@gmail.com wrote:

 On Tuesday, January 21, 2014 05:32:13 AM Hans W Borchers wrote:
 When you say, Calculus is not developed much at the moment,
 maybe it's too early for me to change.
 
 Writing finite-differencing algorithms isn't that hard. That should not be a 
 make-or-break issue for your decision about whether to use Julia.
 
 But don't underestimate the automatic differentiation facilities that have 
 recently been added to Julia (https://github.com/scidom/DualNumbers.jl). 
 Basically, AD computes numerical derivatives without the roundoff error, by 
 defining a new numerical type that behaves somewhat similarly to complex 
 numbers but extracts the first derivative exactly. The key point is that it 
 is a _numerical_ approach, so it doesn't rely on anything symbolic. The one 
 place you can't use AD is when your function relies on calling out to C 
 (because C doesn't know about Julia's Dual type). But any function defined 
 in Julia, including special functions like elliptic integrals, etc, should 
 be fine.
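
 To make the idea concrete, here is a minimal sketch of forward-mode AD with a hand-rolled dual number type, written in current Julia syntax. It is not the DualNumbers.jl API: the `Dual` struct, its fields `val` and `der`, and the helper `derivative` are hypothetical names, and only `+`, `*` and `sin` are covered.

```julia
# A toy dual number: `val` carries the value, `der` carries the derivative.
# All names here are illustrative and not taken from DualNumbers.jl.
struct Dual
    val::Float64
    der::Float64
end

# Sum rule and product rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Chain rule for sin.
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Seed the derivative slot with 1 and read it back after evaluating f.
derivative(f, x) = f(Dual(x, 1.0)).der

g(x) = x * x + sin(x)
d = derivative(g, 1.0)   # analytically 2x + cos(x) at x = 1
println(d)
```

 No step size appears anywhere, which is why the result agrees with 2 + cos(1) up to floating-point rounding rather than up to a truncation error.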
 
 For higher-order derivatives, you can do similar things with even more fancy 
 numerical types. Perhaps the new PowerSeries already does this? (I haven't 
 looked.)
 
 --Tim
 



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-21 Thread John Myles White
I agree with everything on this list, including my always neglected DataStreams 
project.

I think it would be nice to get rid of expression-based indexing + select and 
focus on getting something like LINQ working. For another interesting 
perspective, check out the newly created query function in Pandas, which takes 
in strings rather than expressions as inputs.

 — John

On Jan 21, 2014, at 4:42 AM, Tom Short tshort.rli...@gmail.com wrote:

 I also agree with your approach, John. Based on your criteria, here
 are some other things to consider for the chopping block.
 
 - expression-based indexing
 - NamedArray (you already have an issue on this)
 - with, within, based_on and variants
 - @transform, @DataFrame
 - select, filter
 - DataStream
 
 Many of these were attempts to ease syntax via delayed evaluation. We
 can either do without or try to implement something like LINQ.



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-21 Thread John Myles White
Can you do something like df["ColA"] = f(df)?

 — John

On Jan 21, 2014, at 8:48 AM, Blake Johnson blakejohnso...@gmail.com wrote:

 I use within! pretty frequently. What should I be using instead if that is on 
 the chopping block?
 
 --Blake
 
 On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
 I also agree with your approach, John. Based on your criteria, here 
 are some other things to consider for the chopping block. 
 
 - expression-based indexing 
 - NamedArray (you already have an issue on this) 
 - with, within, based_on and variants 
 - @transform, @DataFrame 
 - select, filter 
 - DataStream 
 
 Many of these were attempts to ease syntax via delayed evaluation. We 
 can either do without or try to implement something like LINQ. 
 
 
 
 On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire kevin@gmail.com wrote: 
  Hi John, 
  
  I agree with pretty much everything you have written here, and really 
  appreciate that you've taken the lead in cleaning things up and getting us 
  on track. 
  
  Cheers! 
 Kevin 
  
  
  On Mon, Jan 20, 2014 at 1:57 PM, John Myles White johnmyl...@gmail.com 
  wrote: 
  
  As I said in another thread recently, I am currently the lead maintainer 
  of more packages than I can keep up with. I think it’s been useful for me 
  to 
  start so many different projects, but I can’t keep maintaining most of my 
  packages given my current work schedule. 
  
  Without Simon Kornblith, Kevin Squire, Sean Garborg and several others 
  doing amazing work to keep DataArrays and DataFrames going, much of our 
  basic data infrastructure would have already become completely unusable. 
  But even with the great work that’s been done on those packages recently, 
  there’s still a lot of additional design work required. I’d like to free 
  up some of my time to do that work. 
  
  To keep things moving forward, I’d like to propose a couple of radical New 
  Year’s resolutions for the packages I work on. 
  
  (1) We need to stop adding functionality and focus entirely on improving 
  the quality and documentation of our existing functionality. We have way 
  too 
  much prototype code in DataFrames that I can’t keep up with. I’m about to 
  make a pull request for DataFrames that will remove everything related to 
  column groupings, database-style indexing and Blocks.jl support. I 
  absolutely want to see us push all of those ideas forward in the future, 
  but 
  they need to happen in unmerged forks or separate packages until we have 
  the 
  resources needed to support them. Right now, they make an overwhelming 
  maintenance challenge even more onerous. 
  
  (2) We can’t support anything other than the master branch of most 
  JuliaStats packages except possibly for Distributions. I personally don’t 
  have the time to simultaneously keep stuff working with Julia 0.2 and 
  Julia 
  0.3. Moreover, many of our basic packages aren’t mature enough to justify 
  supporting older versions. We should do a better job of supporting our 
  master releases and not invest precious time trying to support older 
  releases. 
  
  (3) We need to make more of DataArrays and DataFrames reflect the Julian 
  worldview. Lots of our code uses an interface that is incongruous with the 
  interfaces found in Base. Even worse, a large chunk of code has 
  type-stability problems that make it very slow, when comparable code that 
  uses normal Arrays is 100x faster. We need to develop new idioms and new 
  strategies for making code that interacts with type-destabilizing NA’s 
  faster. More generally, we need to make DataArrays and DataFrames fit in 
  better with Julia when Julia and R disagree. Following R’s lead has often 
  led us astray because R doesn’t share Julia’s strengths or weaknesses. 
  
  (4) Going forward, there should be exactly one way to do most things. The 
  worst part of our current codebase is that there are multiple ways to 
  express the same computation, but (a) some of them are unusably slow and 
  (b) 
  some of them don’t ever get tested or maintained properly. This is closely 
  linked to the excess proliferation of functionality described in 
  Resolution 
  1 above. We need to start removing stuff from our packages and making the 
  parts we keep both reliable and fast. 
  
  I think we can push DataArrays and DataFrames to 1.0 status by the end of 
  this year. But I think we need to adopt a new approach if we’re going to 
  get 
  there. Lots of stuff needs to get deprecated and what remains needs a lot 
  more testing, benchmarking and documentation. 
  
   — John 
  
 



Re: [julia-users] Higher order derivatives in Calculus

2014-01-21 Thread John Myles White
If you’re willing to wait, I’m happy to return to the Calculus package in the 
spring. I’m focusing on DataFrames/DataArrays (and some database stuff that’s 
closely related) until then.

 — John

On Jan 21, 2014, at 8:42 AM, Hans W Borchers hwborch...@gmail.com wrote:

 Thanks for these encouraging words. I have already written an R package with 
 more than a hundred numerical functions (incl. several numerical 
 derivatives), and I would be willing to help build up a numerical package in 
 Julia. But of course, someone from the Julia community will be needed to take 
 the lead. Please let me know when this 'management position'(?) has been 
 taken.
 
 On Tuesday, January 21, 2014 4:44:37 PM UTC+1, John Myles White wrote:
 Just to chime in: the biggest problem with the Calculus package isn’t the absence of 
 usable functionality, it’s that the published interface isn’t a very good one 
 and the more reliable interface, including things like 
 finite_difference_hessian, isn’t exported. 
 
 To fix this, we need someone to come in and do some serious design work, 
 where they'll rethink interfaces and remove out-dated functionality. As Tim 
 Holy mentioned, the combination of the unpublished finite difference methods 
 and automatic differentiation methods in DualNumbers should get you very far. 
 
  — John 
 



Re: [julia-users] Higher order derivatives in Calculus

2014-01-22 Thread John Myles White
This sounds like a great approach, Tim. (And, for the record, I’m legitimately 
amazed by the amount of functionality you’re successfully maintaining.)

Since we’re adding feature requests, here’s another one:

(Feature) Implement lower and upper bounds on FD gradient calculations. If the 
lower or upper bound would be violated by the chosen forward or central 
differencing method, change behavior to stay within bounds.

This would make it much easier for us to use finite differencing in constrained 
optimization problems.
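
One way the requested behavior could look for a scalar function is sketched below, in current Julia syntax. This is only an illustration of the idea, not an existing Calculus function: the name `bounded_fd`, its arguments `lo` and `hi`, and the default step are all assumptions.

```julia
# Finite-difference derivative that never evaluates f outside [lo, hi].
# `bounded_fd` is a hypothetical name; nothing here comes from Calculus.jl.
function bounded_fd(f, x, lo, hi; h = sqrt(eps(Float64)))
    if lo <= x - h && x + h <= hi
        return (f(x + h) - f(x - h)) / (2h)   # central difference
    elseif x + h <= hi
        return (f(x + h) - f(x)) / h          # forward difference at the lower bound
    elseif lo <= x - h
        return (f(x) - f(x - h)) / h          # backward difference at the upper bound
    else
        error("interval [lo, hi] is narrower than the step h")
    end
end

sq(x) = x^2
at_lower = bounded_fd(sq, 0.0, 0.0, 1.0)   # falls back to a forward difference
at_upper = bounded_fd(sq, 1.0, 0.0, 1.0)   # falls back to a backward difference
inside   = bounded_fd(sq, 0.5, 0.0, 1.0)   # uses the central difference
println((at_lower, at_upper, inside))
```

The one-sided fallbacks are only first-order accurate, so a real implementation would probably want a smaller error near the bounds than this sketch gives.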

 — John

On Jan 22, 2014, at 8:53 AM, Tim Holy tim.h...@gmail.com wrote:

 To me this sounds like a case for a fork: Hans doesn't yet feel confident 
 about his Julia, but John wants to ditch maintainership. (Trust me John, I 
 _really_ understand!) We need an organic way of test-driving a new 
 maintainer. Hans, why don't you just fork it to your github account and 
 start making changes, and let's see how it goes?
 
 A couple of tips:
 - As you make changes, run the tests to see if they still pass, and you'll 
 have some reason to hope that you may not have broken anything.
 - For any API changes, a way to be nice to users is to use the `@deprecate` 
 macro.
 
 Adhering to those guidelines will make it easier for people to migrate to 
 your package. If you get to the point of having something you're proud of, 
 rather than submitting a pull request to John's package, just advertise it 
 to the list. That will begin the process of other people being able to test 
 out your version, with no risk (John's will still be up, too). If all goes 
 well, you'll eventually become the official maintainer.
 
 Hans, I already have a feature-request for you: spot checking particular 
 elements of the gradient. When I have a function of 10^6 variables, often all 
 I want to do is get some indication that I've done my analytic calculation of 
 the gradient correctly. Computing all 10^6 components is horrifically slow, 
 and usually not necessary.
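
 A minimal sketch of what such a spot check might look like, in current Julia syntax, follows. It is an assumption about the feature, not code from any package: `spot_check_gradient`, its argument order, and the default step are hypothetical. Only the requested indices are differenced, so the cost is two function evaluations per checked component.

```julia
# Compare selected components of an analytic gradient `g` against central
# differences of `f` at `x`. Hypothetical helper, not from any package.
function spot_check_gradient(f, g, x, idxs; h = cbrt(eps(Float64)))
    errs = Float64[]
    xp = copy(x)                 # perturb a copy so the caller's x is untouched
    for i in idxs
        xi = x[i]
        xp[i] = xi + h
        fp = f(xp)
        xp[i] = xi - h
        fm = f(xp)
        xp[i] = xi               # restore before moving to the next index
        push!(errs, abs((fp - fm) / (2h) - g[i]))
    end
    return errs
end

f(x) = sum(abs2, x)              # test function with known gradient 2x
x = collect(1.0:10.0)
g = 2 .* x
errs = spot_check_gradient(f, g, x, [1, 5, 10])
println(errs)
```

 Small entries in `errs` suggest the analytic gradient is right at the sampled indices; a large entry points at the component to recheck.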
 
 --Tim
 
 On Wednesday, January 22, 2014 08:28:05 AM John Myles White wrote:
 Yes, it would. I just don’t know who’s going to do that. But I badly want
 someone to.
 
 — John
 
 On Jan 22, 2014, at 3:33 AM, Hans W Borchers hwborch...@gmail.com wrote:
 John, as I understand it, you are overloaded.
 And I cannot believe this will change in spring.
 Wouldn't it be preferable if someone else takes over?
 
 Hans Werner
 
 
 On Wednesday, January 22, 2014 3:58:18 AM UTC+1, John Myles White wrote:
 If you’re willing to wait, I’m happy to return to the Calculus package in
 the spring. I’m focusing on DataFrames/DataArrays (and some database
 stuff that’s closely related) until then. 
 — John
 
 On Jan 21, 2014, at 8:42 AM, Hans W Borchers hwbor...@gmail.com wrote:
 Thanks for these encouraging words. I have already written an R package
 with more than a hundred numerical functions (incl. several numerical
 derivatives), and I would be willing to help build up a numerical
 package in Julia. But of course, someone from the Julia community will
 be needed to take the lead. Please let me know when this 'management
 position'(?) has been taken.
 
 On Tuesday, January 21, 2014 4:44:37 PM UTC+1, John Myles White wrote:
 Just to chime in: the biggest problem with the Calculus package isn’t the absence
 of usable functionality, it’s that the published interface isn’t a very
 good one and the more reliable interface, including things like
 finite_difference_hessian, isn’t exported.
 
 To fix this, we need someone to come in and do some serious design work,
 where they'll rethink interfaces and remove out-dated functionality. As
 Tim Holy mentioned, the combination of the unpublished finite difference
 methods and automatic differentiation methods in DualNumbers should get
 you very far. 
 — John



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-22 Thread John Myles White
My impression is that Pandas didn't support anything like delayed evaluation. 
Is that wrong?

I'm aware that the resulting expressions are a lot more verbose. That 
definitely sucks.

I'd love to see strong proposals for how we're going to do a better job of 
making code shorter going forward. But too much of our current codebase is 
buggy, unable to handle edge cases, slow and undocumented. I think it's much 
more important that we have one way of doing things that actually works as 
advertised for every Julia user than two ways of doing things, each of which is 
slightly broken and performs worse than R and Pandas.

As I've been saying lately, I'm burning out on maintaining so much Julia code. If 
someone else wants to take charge of my projects, I'm ok with that. But if I'm 
going to be doing the work going forward, I need to devote my energies to 
making a small number of things work really well. Once we get our core 
functionality solid, I'll be comfortable getting fancier stuff working again.

 -- John

On Jan 22, 2014, at 1:06 PM, Kevin Squire kevin.squ...@gmail.com wrote:

 I'm also a fan of the expression-based interface (mostly because I'm used to 
 similar things in Pandas).  I haven't looked at that code, though, so I can't 
 comment on the complexity.
 
 Kevin
 
 
 On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson blakejohnso...@gmail.com 
 wrote:
 Sure, but the resulting expression is much more verbose. I just noticed that 
 all expression-based indexing was on the chopping block. What is left after 
 all this?
 
 I can see how axing these features would make DataFrames.jl easier to 
 maintain, but I found the expression stuff to present a rather nice interface.
 
 --Blake
 
 
 On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
 Can you do something like df["ColA"] = f(df)?
 
  — John
 
 
 On Jan 21, 2014, at 8:48 AM, Blake Johnson blakejo...@gmail.com wrote:
 
 I use within! pretty frequently. What should I be using instead if that is 
 on the chopping block?
 
 --Blake
 
 On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
 I also agree with your approach, John. Based on your criteria, here 
 are some other things to consider for the chopping block. 
 
 - expression-based indexing 
 - NamedArray (you already have an issue on this) 
 - with, within, based_on and variants 
 - @transform, @DataFrame 
 - select, filter 
 - DataStream 
 
 Many of these were attempts to ease syntax via delayed evaluation. We 
 can either do without or try to implement something like LINQ. 
 
 
 
 On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire kevin@gmail.com wrote: 
  Hi John, 
  
  I agree with pretty much everything you have written here, and really 
  appreciate that you've taken the lead in cleaning things up and getting us 
  on track. 
  
  Cheers! 
 Kevin 
  
  
  On Mon, Jan 20, 2014 at 1:57 PM, John Myles White johnmyl...@gmail.com 
  wrote: 
  
  As I said in another thread recently, I am currently the lead maintainer 
  of more packages than I can keep up with. I think it’s been useful for me 
  to 
  start so many different projects, but I can’t keep maintaining most of my 
  packages given my current work schedule. 
  
  Without Simon Kornblith, Kevin Squire, Sean Garborg and several others 
  doing amazing work to keep DataArrays and DataFrames going, much of our 
  basic data infrastructure would have already become completely unusable. 
  But even with the great work that’s been done on those packages recently, 
  there’s still a lot of additional design work required. I’d like to free 
  up some of my time to do that work. 
  
  To keep things moving forward, I’d like to propose a couple of radical 
  New 
  Year’s resolutions for the packages I work on. 
  
  (1) We need to stop adding functionality and focus entirely on improving 
  the quality and documentation of our existing functionality. We have way 
  too 
  much prototype code in DataFrames that I can’t keep up with. I’m about to 
  make a pull request for DataFrames that will remove everything related to 
  column groupings, database-style indexing and Blocks.jl support. I 
  absolutely want to see us push all of those ideas forward in the future, 
  but 
  they need to happen in unmerged forks or separate packages until we have 
  the 
  resources needed to support them. Right now, they make an overwhelming 
  maintenance challenge even more onerous. 
  
  (2) We can’t support anything other than the master branch of most 
  JuliaStats packages except possibly for Distributions. I personally don’t 
  have the time to simultaneously keep stuff working with Julia 0.2 and 
  Julia 
  0.3. Moreover, many of our basic packages aren’t mature enough to justify 
  supporting older versions. We should do a better job of supporting our 
  master releases and not invest precious time trying to support older 
  releases. 
  
  (3) We need to make more of DataArrays and DataFrames reflect the Julian

Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-22 Thread John Myles White
The idealized expression interface offers things like (up to reordering):

with(df, a + b * x)

where a and b are variables in the caller's scope and x is a column of df.

In practice, we've had to hack this sort of thing together to offer things like

with(df, :($a + $b * x))

That's because we need to pass quoted expressions and we also need to tell the 
system which variables are in the caller's scope.

More generally, I'd refer to any operation that passes expressions around and 
asks other functions to evaluate them with an ad hoc scope as expression-based 
operations.

R offers very deep support for this in the language.
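
For what it's worth, the hack can be sketched in a few lines of present-day Julia. This is only an illustration under stated assumptions, not the DataFrames implementation: with here takes a plain Dict of columns rather than a DataFrame, replace_cols is a made-up helper, and the operators are dotted because scalar + vector is not defined in current Julia.

```julia
# Hypothetical helper: walk a quoted expression and swap any symbol that
# names a column for the column's data. Everything else is left untouched.
replace_cols(x, cols) = x
replace_cols(s::Symbol, cols) = haskey(cols, s) ? cols[s] : s
replace_cols(ex::Expr, cols) =
    Expr(ex.head, map(arg -> replace_cols(arg, cols), ex.args)...)

# Toy stand-in for with(df, expr): evaluate the expression with an ad hoc
# scope in which column names refer to the stored vectors.
with(cols::AbstractDict, ex) = eval(replace_cols(ex, cols))

cols = Dict(:x => [1.0, 2.0, 3.0])   # stand-in for a DataFrame with column x
a, b = 10.0, 2.0                     # variables in the caller's scope

# $a and $b are interpolated by the caller; x is resolved against cols.
result = with(cols, :($a .+ $b .* x))
```

The caller still has to mark which names come from its own scope with $, which is exactly the wart described above.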

 -- John

On Jan 22, 2014, at 2:48 PM, Kevin Squire kevin.squ...@gmail.com wrote:

 Maybe I misinterpreted the term expression-based interface.
 
 
 On Wed, Jan 22, 2014 at 2:33 PM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 My impression is that Pandas didn't support anything like delayed evaluation. 
 Is that wrong?
 
 I'm aware that the resulting expressions are a lot more verbose. That 
 definitely sucks.
 
 I'd love to see strong proposals for how we're going to do a better job of 
 making code shorter going forward. But too much of our current codebase is 
 buggy, unable to handle edge cases, slow and undocumented. I think it's much 
 more important that we have one way of doing things that actually works as 
 advertised for every Julia user than two ways of doing things, each of which 
 is slightly broken and performs worse than R and Pandas.
 
 As I've been saying lately, I'm burning out on maintaining so much Julia code. 
 If someone else wants to take charge of my projects, I'm ok with that. But if 
 I'm going to be doing the work going forward, I need to devote my energies to 
 making a small number of things work really well. Once we get our core 
 functionality solid, I'll be comfortable getting fancier stuff working again.
 
  -- John
 
 On Jan 22, 2014, at 1:06 PM, Kevin Squire kevin.squ...@gmail.com wrote:
 
 I'm also a fan of the expression-based interface (mostly because I'm used to 
 similar things in Pandas).  I haven't looked at that code, though, so I 
 can't comment on the complexity.
 
 Kevin
 
 
 On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson blakejohnso...@gmail.com 
 wrote:
 Sure, but the resulting expression is much more verbose. I just noticed that 
 all expression-based indexing was on the chopping block. What is left after 
 all this?
 
 I can see how axing these features would make DataFrames.jl easier to 
 maintain, but I found the expression stuff to present a rather nice 
 interface.
 
 --Blake
 
 
 On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
 Can you do something like df["ColA"] = f(df)?
 
  — John
 
 
 On Jan 21, 2014, at 8:48 AM, Blake Johnson blakejo...@gmail.com wrote:
 
 I use within! pretty frequently. What should I be using instead if that is 
 on the chopping block?
 
 --Blake
 
 On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
 I also agree with your approach, John. Based on your criteria, here 
 are some other things to consider for the chopping block. 
 
 - expression-based indexing 
 - NamedArray (you already have an issue on this) 
 - with, within, based_on and variants 
 - @transform, @DataFrame 
 - select, filter 
 - DataStream 
 
 Many of these were attempts to ease syntax via delayed evaluation. We 
 can either do without or try to implement something like LINQ. 
 
 
 
 On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire kevin@gmail.com wrote: 
  Hi John, 
  
  I agree with pretty much everything you have written here, and really 
  appreciate that you've taken the lead in cleaning things up and getting 
  us on track. 
  
  Cheers! 
 Kevin 
  
  
  On Mon, Jan 20, 2014 at 1:57 PM, John Myles White johnmyl...@gmail.com 
  wrote: 
  
  As I said in another thread recently, I am currently the lead maintainer 
  of more packages than I can keep up with. I think it’s been useful for 
  me to start so many different projects, but I can’t keep maintaining most 
  of my packages given my current work schedule. 
  
  Without Simon Kornblith, Kevin Squire, Sean Garborg and several others 
  doing amazing work to keep DataArrays and DataFrames going, much of our 
  basic data infrastructure would have already become completely unusable. 
  But even with the great work that’s been done on those packages recently, 
  there’s still a lot of additional design work required. I’d like to free 
  up some of my time to do that work. 
  
  To keep things moving forward, I’d like to propose a couple of radical 
  New Year’s resolutions for the packages I work on. 
  
  (1) We need to stop adding functionality and focus entirely on improving 
  the quality and documentation of our existing functionality. We have way 
  too much prototype code in DataFrames that I can’t keep up with. I’m about 
  to make a pull request for DataFrames that will remove everything related

Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-22 Thread John Myles White
That's exactly the kind of indexing I'd like to encourage using until we get 
our core functionality cleaned up. Nothing special required except Boolean 
indexing, which is easy to make fast and doesn't have weird scoping issues.
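
A minimal sketch of that style of filtering, written against plain Base vectors in current Julia rather than the DataFrames API. The column contents, the comparison and the threshold are all made up for illustration, and missing stands in for NA.

```julia
# Two plain columns standing in for df["colA"] and df["colB"].
colA = [1, 5, 7, 3, 9]
colB = Union{Missing, Float64}[0.1, missing, 0.3, 0.4, 0.5]

# Build the Boolean mask elementwise: colA above a threshold and colB present.
mask = (colA .> 4) .& .!ismissing.(colB)

# Index every column with the same mask to keep the matching rows.
kept_A = colA[mask]
kept_B = colB[mask]
```

Everything here is ordinary array code evaluated in the caller's scope, so there is nothing to quote and no question about where a name gets looked up.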

 -- John

On Jan 22, 2014, at 3:18 PM, Kevin Squire kevin.squ...@gmail.com wrote:

 Got it.  I was thinking of the more verbose (but still useful)
 
 df[(df[colA]  4)  !isna(df[colB]), :]
 
 Kevin
 
 

Re: [julia-users] Re: Why isn't typeof(Float64[]) <: typeof(Real[]) true?

2014-01-22 Thread John Myles White
I thought this was in the performance tips, but I couldn’t find it in a quick 
read. Definitely worth putting in there, because this is a really, really 
subtle point despite being so important.
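
Here is a rough sketch of the distinction in current Julia syntax (struct rather than the type keyword used at the time). LooseField, TightField and total are made-up names for illustration only.

```julia
# An abstract annotation on an argument only restricts what is accepted.
# Each call still runs code compiled for the concrete argument types.
c(a::Real, b::Real) = a + b

int_result = c(1, 2)        # integer addition
flt_result = c(1.0, 2.0)    # floating-point addition

# Fields are different: an abstractly typed field has no fixed layout,
# while a type parameter pins the field type down per instance.
struct LooseField
    x::Real
end

struct TightField{T<:Real}
    x::T
end

loose_is_concrete = isconcretetype(fieldtype(LooseField, :x))
tight_is_concrete = isconcretetype(fieldtype(TightField{Float64}, :x))

# The subject line of this thread: parametric types are invariant, so a
# Vector{Float64} is not a Vector{Real}. Constrain the parameter instead.
invariant = Vector{Float64} <: Vector{Real}
bounded   = Vector{Float64} <: Vector{<:Real}

# One definition that covers vectors of any Real subtype.
total(v::Vector{<:Real}) = sum(v)
```

The last definition is also one answer to the original question: define the behavior once for Vector{<:bar} instead of once per concrete subtype.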

 — John

On Jan 22, 2014, at 4:01 PM, Kevin Squire kevin.squ...@gmail.com wrote:

 Thanks a lot for the correction, Tobias.  I was confused on this point, but 
 it's easy to check:
 
 julia> c(a::Real, b::Real) = a+b
 c (generic function with 1 method)
 
 julia> code_native(c, (Float64,Float64))
     .text
 Filename: none
 Source line: 1
     push    RBP
     mov     RBP, RSP
 Source line: 1
     addsd   XMM0, XMM1
     pop     RBP
     ret
 
 julia> code_native(c, (BigFloat,BigFloat))
     .text
 Filename: none
 Source line: 1
     push    RBP
     mov     RBP, RSP
     push    RBX
     sub     RSP, 40
     mov     QWORD PTR [RBP - 40], 4
 Source line: 1
     movabs  RBX, 139893810921040
     mov     RAX, QWORD PTR [RBX]
     mov     QWORD PTR [RBP - 32], RAX
     lea     RAX, QWORD PTR [RBP - 40]
     mov     QWORD PTR [RBX], RAX
     xorps   XMM0, XMM0
     movups  XMMWORD PTR [RBP - 24], XMM0
     movups  XMM0, XMMWORD PTR [RSI]
 Source line: 1
     movups  XMMWORD PTR [RBP - 24], XMM0
 Source line: 1
     lea     RSI, QWORD PTR [RBP - 24]
 Source line: 1
     movabs  RAX, 139893815571504
     mov     EDI, 64233568
     mov     EDX, 2
     call    RAX
     mov     RCX, QWORD PTR [RBP - 32]
     mov     QWORD PTR [RBX], RCX
     add     RSP, 40
     pop     RBX
     pop     RBP
     ret
 
 Kevin
 
 
 On Wed, Jan 22, 2014 at 3:55 PM, Tobias Knopp tobias.kn...@googlemail.com 
 wrote:
 No. Giving types in function definitions does not give you any speedup, as 
 functions are always compiled for concrete types.
 
 When you define composite types it is however important to give concrete 
 types for optimal performance.
 
 Am Donnerstag, 23. Januar 2014 00:41:41 UTC+1 schrieb Patrick Foley:
 Thanks!
 
 I've sorted it out and have solved my original problems.
 
 Would I get any speedup by defining 
 
 function foo{TA<:Real, TB<:Real}(a::TA, b::TB)
 
 rather than
 
 function foo(a::Real, b::Real)
 
 ?
 
 My guess is .. yes? Since if I'm defining it the first way, I can compile 
 versions of foo like foo(a::Int8, b::Int8) automatically, which would be much 
 faster than defaulting to a foo(a::Real, b::Real) and reserving space for a 
 possible Float64 each time?
 
 
 On Tuesday, January 21, 2014 7:17:36 PM UTC-5, Patrick Foley wrote:
 Is there a way to get around this?  I have a lot of types (foo1, foo2, ) 
 all of which are subtypes of an abstract (bar).  I want to be able to define 
 the behavior for arrays of any of the foos just by defining the behavior of 
 an array of 'bar's.  Any advice?
 



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-23 Thread John Myles White
Yeah, at some point in the future I’d like to see if we can imitate the 
experimental query() and eval() methods from Pandas.

It’s the fact that those methods were just recently introduced which made me 
decide we needed to stop spending time on getting them working right now. We’re 
way behind Pandas in terms of performance and reliability, so it’s a bad idea 
for us to try being as feature complete until we catch up.

 — John

On Jan 23, 2014, at 6:37 AM, Jonathan Malmaud malm...@gmail.com wrote:

 Pandas has a 'query' method 
 (http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-query) which 
 uses the Python numexpr package for delayed evaluation (if I understand what 
 you mean by that in this context). 



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-23 Thread John Myles White
I think that’s probably because you need to do using DataArrays now.

 — John

On Jan 23, 2014, at 2:08 AM, Jon Norberg jon.norb...@ecology.su.se wrote:

 is this why I get this on latest julia studio on mac with recently updated 
 packages:
 
 julia> using DataFrames
 julia> using RDatasets
 julia> iris = data("datasets", "iris")
 data not defined
 
 ??



Re: [julia-users] New install on OSX 10.9 GLM package does not work

2014-01-23 Thread John Myles White
Hi Chris,

Unfortunately it’s very difficult for us to support 0.2 anymore because of the 
badly breaking Stats -> StatsBase renaming. We’d have to rewrite the history of 
every repo to resolve this name change, so we chose to instead push everything 
up to our current development branches. That change unfortunately entirely 
deprecated Julia 0.2 support for DataArrays and DataFrames.

I’m hopeful we’ll standardize on a stable set of features for core statistical 
libraries in the next six months. Once we all agree on core infrastructure 
issues, it’ll be easier to provide backwards compatibility. Right now we don’t 
have enough developers working on JuliaStats to both support older releases and 
develop new ones.

 — John

On Jan 23, 2014, at 2:33 PM, Cgast cmg...@gmail.com wrote:

 Thanks Ivar. My incomplete description (which you have somewhat offensively 
 labeled as weak) was intentionally so, to avoid hijacking the thread with 
 my own problem. With your encouragement, however, here is the problem I'm 
 having, which I suspect is related to the OP's problem: I installed v0.2.0 
 (64-bit) this morning (build date appears to be 2013-11-16 23:44 UTC), and 
 have the following Pkg.status():
 
 Required packages:
  - GLM                0.2.2
 Additional packages:
  - Blocks             0.0.1
  - DataArrays         0.1.0
  - DataFrames         0.5.0
  - Distributions      0.3.0
  - GZip               0.2.7
  - NumericExtensions  0.3.6
  - SortingAlgorithms  0.0.1
  - StatsBase          0.3.5
 
 using GLM gives me the following messages, terminating in an error:
 
 Warning: could not import Base.foldl into NumericExtensions
 Warning: could not import Base.foldr into NumericExtensions
 Warning: could not import Base.sum! into NumericExtensions
 Warning: could not import Base.maximum! into NumericExtensions
 Warning: could not import Base.minimum! into NumericExtensions
 ERROR: Stats not found
   in require at loading.jl:39
 
 at C:\~\.julia\GLM\src\GLM.jl:8
 
 Does this appear to be related to the previous problem? Does anyone have any 
 suggestions on how to fix it, or shall I wait for package authors to do some 
 updating?
 
 If a newer Julia version is required (which appears to be the suggestion from 
 the NumericExtensions github issues), are there no newer Windows binaries 
 available than v0.2.0? My corporate environment will make building from 
 source difficult, for a variety of reasons.
 
 After starting with a fresh installation and a clean .julia directory, I've 
 tried to install older versions of NumericExtensions (as suggested), with the 
 following results:
 
 Pkg.pin(NumericExtensions,v0.2.20)
 ERROR: NumericExtensions is not a git repo
   in pin at pkg/entry.jl:202
 
 and also:
 
 Pkg.pin(NumericExtensions,v0.2.20)
 INFO: Installing NumericExtensions v0.3.6  # ---wrong version (latest)
 INFO: REQUIRE updated.
 
 Thanks in advance for your help,
 
 Chris
 
 
 
 
 
 
 On Thursday, January 23, 2014 1:45:38 PM UTC-8, Ivar Nesje wrote:
 Similar problem is a quite weak description.
 
 The previous problem was that a new version of a package (NumericExtensions) 
 was incorrectly marked as compatible with 0.2. This does not appear to be 
 fixed, so a bump on Dahua Lin and John Myles White might be what is needed.
 
 kl. 21:32:14 UTC+1 torsdag 23. januar 2014 skrev Cgast følgende:
 Any update on this? Having similar problems on Windows 7 with a fresh install 
 just this morning.
 
 Seems to be related to some renaming of Stats vs. StatsBase? I've tried 
 fiddling with this myself within the packages, but haven't been able to 
 resolve it.
 
 Thanks in advance for all your help, and all the hard work getting Julia to 
 this point. 
 
 
 Chris
 
 
 
 On Wednesday, January 15, 2014 10:15:55 AM UTC-8, John Myles White wrote:
 No, we'll fix the packages to indicate which work with 0.2 and which don't.
 
  -- John
 
 On Jan 15, 2014, at 9:52 AM, Corey Sparks corey.sp...@gmail.com wrote:
 
 so, if i just wait for 0.3 things might get worked out?
 Thanks
 
 On Wednesday, January 15, 2014 11:26:56 AM UTC-6, John Myles White wrote:
 We've unfortunately done a bad job of keeping those packages compatible with 
 0.2. I'll try to fix as much as I can today.
 
  -- John
 
 On Jan 15, 2014, at 8:48 AM, Corey Sparks corey.sp...@gmail.com wrote:
 
 Dear List,
 I just installed Julia 0.2.0 last night and was trying to get the GLM 
 package going, when I try to load it and the RDatasets packages, I get:
 
 julia using RDatasets, GLM
 
 Warning: could not import Base.foldl into NumericExtensions
 
 Warning: could not import Base.foldr into NumericExtensions
 
 Warning: could not import Base.sum! into NumericExtensions
 
 Warning: could not import Base.maximum! into NumericExtensions
 
 Warning: could not import Base.minimum! into NumericExtensions
 
 Warning: could not import Base.PAIRWISE_SUM_BLOCKSIZE

Re: [julia-users] Re: Error: no method display(DataFrame)

2014-01-23 Thread John Myles White
A couple of points that expand on Tom’s comments:

(1) We need to add Tom’s definition of countna(a::Array) = 0 in order to show() 
wide DataFrames that contain any columns that are Vectors. I never use 
DataFrames like that, so I forgot that others might. It’s also impossible to 
produce such a DataFrame using our current I/O routines.

(2) The constructor you’re using does exist, Jacob, but you should typically 
pass in a Vector{Any}, each element of which is either a DataVector or 
PooledDataVector. See Point (3) for why, at the moment, using a Vector as a 
column is subtly broken.

(3) If people are going to put Vectors in DataFrames for performance reasons, 
all of our setindex!() functions for DataFrames need to add methods that 
automatically convert Vectors to DataVectors if an NA is inserted in a 
Vector. Right now that kind of insertion is just going to error out. This check 
isn’t too hard, but it’s totally missing from our current codebase.

Personally, I would prefer that we not allow any of the columns of a DataFrame 
to be Vectors. It’s a weird edge case that doesn’t actually offer reliable 
high performance, because the potential performance improvements rely on the 
unsafe assumption that a DataFrame won’t contain any columns with NAs in it.
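
To make points (1) and (3) concrete, here is a rough sketch in current Base Julia, where missing stands in for NA and a Vector{Union{Missing, T}} stands in for a DataVector. None of this is DataFrames code: countna is redefined locally as a stand-in and set_or_widen is a made-up helper.

```julia
# Stand-in for countna with a fallback that works for any array. A plain
# Vector{Float64} can never hold a missing value, so its count is always 0.
countna(a::AbstractArray) = count(ismissing, a)

# Hypothetical helper for the check described in point (3): store val at
# index i, widening the column's element type only when val does not fit.
function set_or_widen(col::AbstractVector, i::Integer, val)
    if val isa eltype(col)
        col[i] = val              # fits as-is, mutate in place
        return col
    end
    # Copy into a vector whose element type also admits typeof(val).
    widened = convert(Vector{Union{eltype(col), typeof(val)}}, col)
    widened[i] = val
    return widened
end

v = [1.0, 2.0, 3.0]
w = set_or_widen(v, 2, missing)   # v itself cannot store missing
```

Note that the widened column is a new object, so whatever holds the column has to swap it in. That replacement step is the part our setindex!() methods don't do today.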

 — John

On Jan 23, 2014, at 1:33 PM, Tom Short tshort.rli...@gmail.com wrote:

 That works, but columns will be Arrays instead of DataArrays. That's
 the way it's always worked. If you want them to be DataArrays, then
 convert to DataArrays right at the end.
 
 To fix show to support columns that are arrays, we probably need (at
 least) to define the following:
 
 countna(da::Array) = 0
 
 
 
 On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn quinn.jac...@gmail.com wrote:
 Great investigative work. Is
 DataFrames( array_of_arrays, Index(column_names_array) )
 not the right way to hand construct DataFrames any more? I think I can
 allocate DataArrays instead, but at every step of the way, I was trying to
 hand-optimize the result fetching process, which resulted in not creating a
 DataArray or DataFrame until right before we return to the user.
 
 -Jacob
 
 
 On Thu, Jan 23, 2014 at 3:27 PM, bp2012 bert.pritch...@gmail.com wrote:
 
 To check Jacob's suggestion about versions mismatch I completely removed
 the DataFrames and ODBC packages using Pkg.rm and physically deleted the
 directories from disk. I then added them via Pkg.add and Pkg.update.
 
 I am running the julia nightlies build.
 julia> versioninfo()
 Julia Version 0.3.0-prerelease+1127
 Commit bc73674* (2014-01-22 20:09 UTC)
 
 Pkg.status()
 - DataFrames  0.5.1
 - ODBC  0.3.5
 
 Pkg.checkout("ODBC")
 INFO: Checking out ODBC master...
 INFO: Pulling ODBC latest master...
 INFO: No packages to install, update or remove
 
 julia> Pkg.checkout("DataFrames")
 INFO: Checking out DataFrames master...
 INFO: Pulling DataFrames latest master...
 INFO: No packages to install, update or remove
 
 I did some digging. It looks like there is a mismatch in that countna
 expects DataFrame columns to be DataArrays. However the ODBC package returns
 DataFrames that have array columns (using the first constructor in
 dataframe.jl). You guys would know better as to whether a change is needed
 in the constructor or if countna should also accept Array columns.
 
 
 I made some local changes to work around the issue.
 
 show.jl:
 line 42:  if isna(col, i) changed to  if isna(col[i])
 line 322:  missing[j] = countna(adf[j]) changed to  missing[j] =
 countna(isa(adf[j], DataArray) ? adf[j] : DataArray(adf[j]))
 
 These work great for me.
 
 



Re: [julia-users] New install on OSX 10.9 GLM package does not work

2014-01-23 Thread John Myles White
We should hopefully have nightly binaries sometime soon that will help 
alleviate some of these issues in the future. I’ve lost track of the work to 
provide them, but I know it’s being done.

 — John

On Jan 23, 2014, at 6:02 PM, Cgast cmg...@gmail.com wrote:

 OK, thanks John and Ivar. I'll probably put in some effort towards building 
 it myself, and wait for 0.3 binaries.
 
 Thanks for the help,
 
 Chris
 
 
 

Re: [julia-users] Re: Error: no method display(DataFrame)

2014-01-23 Thread John Myles White
I would be a lot happier with that feature if we followed the lead of 
traditional databases and constantly reminded users which columns are “NOT 
NULL”. As it stands, the “types” of a DataFrame don’t tell you whether a column 
could contain NA’s or not. If we exposed functionality through something like a 
hypothetical nullable(df, colindex), my resistance to that feature would start 
to go away.
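
Something like the following is what I have in mind, sketched in current Base Julia with missing standing in for NA. The nullable function is hypothetical, and the columns are held in a plain Vector{Any} rather than a DataFrame.

```julia
# Hypothetical nullable(columns, colindex): report whether a column's
# element type admits missing values, i.e. whether it is not "NOT NULL".
nullable(columns::AbstractVector, colindex::Integer) =
    Missing <: eltype(columns[colindex])

# Column 1 is a plain vector; column 2 is allowed to hold missing values.
columns = Any[[1, 2, 3], Union{Missing, Float64}[1.0, missing, 3.0]]

first_nullable  = nullable(columns, 1)
second_nullable = nullable(columns, 2)
```

The answer depends only on the column's type, not on whether it currently contains any missing values, which is the same guarantee a NOT NULL constraint gives you.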

 — John

On Jan 23, 2014, at 6:48 PM, Tom Short tshort.rli...@gmail.com wrote:

 I think of item #3 as a feature, not a bug. I don't like the idea of
 auto-conversion. If I choose Vectors, I should not expect them to
 support missing values. R sometimes irritates me by adding NA's when I
 don't expect it. I'd rather have the error than have NA's sneak in
 there. Also, there may be other types of AbstractDataFrames where we
 don't have the ability to assign missing values. HDF5 tables are one
 example I can think of. We wouldn't want to try to autoconvert a huge
 HDF5 column to a DataVector.
 
 
 
 On Thu, Jan 23, 2014 at 8:58 PM, John Myles White
 johnmyleswh...@gmail.com wrote:
 A couple of points that expand on Tom’s comments:
 
 (1) We need to add Tom’s definition of countna(a::Array) = 0 to show() wide 
 DataFrame’s that contain any columns that are Vector’s. I never use 
 DataFrame’s like that, so I forgot that others might. It’s also impossible 
 to produce such a DataFrame using our current I/O routines.
 
 (2) The constructor you’re using does exist, Jacob, but you should typically 
 pass in a Vector{Any}, each element of which is either a DataVector or 
 PooledDataVector. See Point (3) for why, at the moment, using a Vector as a 
 column is subtly broken.
 
 (3) If people are going to put Vector’s in DataFrames for performance 
 reasons, all of our setindex!() functions for DataFrames need to add methods 
 that automatically convert Vector’s to DataVector’s if an NA is inserted in 
 a Vector. Right now that kind of insertion is just going to error out. Ths 
 check isn’t too hard, but it’s totally missing from our current codebase.
 
 Personally, I would prefer that we not allow any of the columns of a 
 DataFrame to be Vector's. It’s a weird edge case that doesn’t actually offer 
 reliable high performance, because the potential performance improvements 
 relies on the unsafe assumption that a DataFame won’t contain any columns 
 with NA’s in it.
 
 — John
 
 On Jan 23, 2014, at 1:33 PM, Tom Short tshort.rli...@gmail.com wrote:
 
 That works, but columns will be Arrays instead of DataArrays. That's
 the way it's always worked. If you want them to be DataArrays, then
 convert to DataArrays right at the end.
 
 To fix show to support columns that are arrays, we probably need (at
 least) to define the following:
 
 countna(da::Array) = 0
 
 
 
 On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn quinn.jac...@gmail.com wrote:
 Great investigative work. Is
 DataFrames( array_of_arrays, Index(column_names_array) )
 not the right way to hand construct DataFrames any more? I think I can
 allocate DataArrays instead, but at every step of the way, I was trying to
 hand-optimize the result fetching process, which resulted in not creating a
 DataArray or DataFrame until right before we return to the user.
 
 -Jacob
 
 
 On Thu, Jan 23, 2014 at 3:27 PM, bp2012 bert.pritch...@gmail.com wrote:
 
 To check Jacob's suggestion about versions mismatch I completely removed
 the DataFrames and ODBC packages using Pkg.rm and physically deleted the
 directories from disk. I then added them via Pkg.add and Pkg,update.
 
 I am running the julia nightlies build.
 julia versioninfo()
 Julia Version 0.3.0-prerelease+1127
 Commit bc73674* (2014-01-22 20:09 UTC)
 
 Pkg.status()
 - DataFrames  0.5.1
 - ODBC  0.3.5
 
 Pkg.checkout(ODBC)
 INFO: Checking out ODBC master...
 INFO: Pulling ODBC latest master...
 INFO: No packages to install, update or remove
 
 julia Pkg.checkout(DataFrames)
 INFO: Checking out DataFrames master...
 INFO: Pulling DataFrames latest master...
 INFO: No packages to install, update or remove
 
 I did some digging. It looks like there is a mismatch in that countna
 expects DataFrame columns to be DataArrays. However the ODBC package 
 returns
 DataFrames that have array columns (using the first constructor in
 dataframe.jl). You guys would know better as to whether a change is needed
 in the constructor or if countna should also accept Array columns.
 
 
 I made some local changes to work around the issue.
 
 show.jl:
 line 42:  if isna(col, i)  changed to  if isna(col[i])
 line 322:  missing[j] = countna(adf[j])  changed to
   missing[j] = countna(isa(adf[j], DataArray) ? adf[j] : DataArray(adf[j]))
 
 These work great for me.
 
 
 



Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-23 Thread John Myles White
Just saw that. Seems like a very smart way to get us important functionality 
while we continue to push things forward. Would be very cool if we could make 
it possible to switch between the Pandas and native Julia implementations 
totally seamlessly.

 — John

On Jan 23, 2014, at 7:51 PM, Jonathan Malmaud malm...@gmail.com wrote:

 Sounds reasonable. As a temporary measure for people who want that 
 functionality immediately, I've taken a stab at wrapping pandas in a Julia 
 package (just as pyplot does for matplotlib), at 
 https://github.com/malmaud/pandas. 
 
 On Thursday, January 23, 2014 10:17:40 AM UTC-5, John Myles White wrote:
 Yeah, at some point in the future I’d like to see if we can imitate the 
 experimental query() and eval() methods from Pandas. 
 
 It’s the fact that those methods were just recently introduced which made me 
 decide we needed to stop spending time on getting them working right now. 
 We’re way behind Pandas in terms of performance and reliability, so it’s a 
 bad idea for us to try being as feature complete until we catch up. 
 
  — John 
 
 On Jan 23, 2014, at 6:37 AM, Jonathan Malmaud mal...@gmail.com wrote: 
 
  Pandas has a 'query' method 
  (http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-query) 
  which uses the Python numexpr package for delayed evaluation (if i 
  understand what you mean by that in this context). 
 



Re: [julia-users] Re: Error: no method display(DataFrame)

2014-01-23 Thread John Myles White
Ok. I’m coming around to this.

How would you do I/O? If we make DataFrames expose a nullable property, we 
could plausibly produce vectors instead of data vectors when parsing CSV files.

 — John

On Jan 23, 2014, at 7:38 PM, Sean Garborg sean.garb...@gmail.com wrote:

 I'd think of #3 as a feature, too.
 
 Just to throw another use case in the ring, if DataFrames with a mix of 
 Vectors and DataVectors (with NAs) were performant, my co-workers and I would 
 usually pull in data marking all columns as Vectors, these columns would 
 remain Vectors, and derived columns would be mostly DataVectors.
 
 
 On Thursday, January 23, 2014 8:48:42 PM UTC-6, tshort wrote:
 I think of item #3 as a feature, not a bug. I don't like the idea of 
 auto-conversion. If I choose Vectors, I should not expect them to 
 support missing values. R sometimes irritates me by adding NA's when I 
 don't expect it. I'd rather have the error than have NA's sneak in 
 there. Also, there may be other types of AbstractDataFrames where we 
 don't have the ability to assign missing values. HDF5 tables are one 
 example I can think of. We wouldn't want to try to autoconvert a huge 
 HDF5 column to a DataVector. 
 
 
 
 On Thu, Jan 23, 2014 at 8:58 PM, John Myles White 
 johnmyl...@gmail.com wrote: 
  A couple of points that expand on Tom’s comments: 
  
  (1) We need to add Tom’s definition of countna(a::Array) = 0 so that show() 
  works on wide DataFrames that contain any columns that are Vectors. I never 
  use DataFrames like that, so I forgot that others might. It’s also impossible 
  to produce such a DataFrame using our current I/O routines. 
  
  (2) The constructor you’re using does exist, Jacob, but you should 
  typically pass in a Vector{Any}, each element of which is either a 
  DataVector or PooledDataVector. See Point (3) for why, at the moment, using 
  a Vector as a column is subtly broken. 
  
  (3) If people are going to put Vectors in DataFrames for performance 
  reasons, all of our setindex!() functions for DataFrames need to add 
  methods that automatically convert Vectors to DataVectors if an NA is 
  inserted in a Vector. Right now that kind of insertion is just going to 
  error out. This check isn’t too hard, but it’s totally missing from our 
  current codebase. 
  
  Personally, I would prefer that we not allow any of the columns of a 
  DataFrame to be Vectors. It’s a weird edge case that doesn’t actually 
  offer reliable high performance, because the potential performance 
  improvement relies on the unsafe assumption that a DataFrame won’t contain 
  any columns with NAs in it. 
  
   — John 
  
  On Jan 23, 2014, at 1:33 PM, Tom Short tshort...@gmail.com wrote: 
  
  That works, but columns will be Arrays instead of DataArrays. That's 
  the way it's always worked. If you want them to be DataArrays, then 
  convert to DataArrays right at the end. 
  
  To fix show to support columns that are arrays, we probably need (at 
  least) to define the following: 
  
  countna(da::Array) = 0 
  
  
  
  On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn quinn@gmail.com wrote: 
  Great investigative work. Is 
  DataFrames( array_of_arrays, Index(column_names_array) ) 
  not the right way to hand construct DataFrames any more? I think I can 
  allocate DataArrays instead, but at every step of the way, I was trying 
  to 
  hand-optimize the result fetching process, which resulted in not creating 
  a 
  DataArray or DataFrame until right before we return to the user. 
  
  -Jacob 
  
  
  On Thu, Jan 23, 2014 at 3:27 PM, bp2012 bert.pr...@gmail.com wrote: 
  
  To check Jacob's suggestion about versions mismatch I completely removed 
  the DataFrames and ODBC packages using Pkg.rm and physically deleted the 
  directories from disk. I then added them via Pkg.add and Pkg.update. 
  
  I am running the julia nightlies build. 
  julia versioninfo() 
  Julia Version 0.3.0-prerelease+1127 
  Commit bc73674* (2014-01-22 20:09 UTC) 
  
  Pkg.status() 
  - DataFrames  0.5.1 
  - ODBC  0.3.5 
  
  Pkg.checkout(ODBC) 
  INFO: Checking out ODBC master... 
  INFO: Pulling ODBC latest master... 
  INFO: No packages to install, update or remove 
  
  julia Pkg.checkout(DataFrames) 
  INFO: Checking out DataFrames master... 
  INFO: Pulling DataFrames latest master... 
  INFO: No packages to install, update or remove 
  
  I did some digging. It looks like there is a mismatch in that countna 
  expects DataFrame columns to be DataArrays. However the ODBC package 
  returns 
  DataFrames that have array columns (using the first constructor in 
  dataframe.jl). You guys would know better as to whether a change is 
  needed 
  in the constructor or if countna should also accept Array columns. 
  
  
  I made some local changes to work around the issue. 
  
  show.jl: 
  line 42:  if isna(col, i) changed to  if isna(col[i]) 
  line 322:  missing[j] = countna(adf[j]) changed tomissing[j

Re: [julia-users] Re: Error: no method display(DataFrame)

2014-01-23 Thread John Myles White
Yeah, that seems totally reasonable to me. If we do this in a more formal way, 
I’m now onboard.

Let’s add the idea of explicit restrictions on columns that can and can’t 
contain NA’s to the spec: https://github.com/JuliaStats/DataFrames.jl/issues/502

 — John

On Jan 23, 2014, at 8:21 PM, Sean Garborg sean.garb...@gmail.com wrote:

 My first thought was a Vector{Bool}.
 
 On Thursday, January 23, 2014 10:05:25 PM UTC-6, John Myles White wrote:
 Ok. I’m coming around to this.
 
 How would you do I/O? If we make DataFrames expose a nullable property, we 
 could plausibly produce vectors instead of data vectors when parsing CSV 
 files.
 
  — John
 
 On Jan 23, 2014, at 7:38 PM, Sean Garborg sean.g...@gmail.com wrote:
 
 I'd think of #3 as a feature, too.
 
 Just to throw another use case in the ring, if DataFrames with a mix of 
 Vectors and DataVectors (with NAs) were performant, my co-workers and I 
 would usually pull in data marking all columns as Vectors, these columns 
 would remain Vectors, and derived columns would be mostly DataVectors.
 
 
 On Thursday, January 23, 2014 8:48:42 PM UTC-6, tshort wrote:
 I think of item #3 as a feature, not a bug. I don't like the idea of 
 auto-conversion. If I choose Vectors, I should not expect them to 
 support missing values. R sometimes irritates me by adding NA's when I 
 don't expect it. I'd rather have the error than have NA's sneak in 
 there. Also, there may be other types of AbstractDataFrames where we 
 don't have the ability to assign missing values. HDF5 tables are one 
 example I can think of. We wouldn't want to try to autoconvert a huge 
 HDF5 column to a DataVector. 
 
 
 
 On Thu, Jan 23, 2014 at 8:58 PM, John Myles White 
 johnmyl...@gmail.com wrote: 
  A couple of points that expand on Tom’s comments: 
  
  (1) We need to add Tom’s definition of countna(a::Array) = 0 so that show() 
  works on wide DataFrames that contain any columns that are Vectors. I never 
  use DataFrames like that, so I forgot that others might. It’s also impossible 
  to produce such a DataFrame using our current I/O routines. 
  
  (2) The constructor you’re using does exist, Jacob, but you should 
  typically pass in a Vector{Any}, each element of which is either a 
  DataVector or PooledDataVector. See Point (3) for why, at the moment, 
  using a Vector as a column is subtly broken. 
  
  (3) If people are going to put Vectors in DataFrames for performance 
  reasons, all of our setindex!() functions for DataFrames need to add 
  methods that automatically convert Vectors to DataVectors if an NA is 
  inserted in a Vector. Right now that kind of insertion is just going to 
  error out. This check isn’t too hard, but it’s totally missing from our 
  current codebase. 
  
  Personally, I would prefer that we not allow any of the columns of a 
  DataFrame to be Vectors. It’s a weird edge case that doesn’t actually 
  offer reliable high performance, because the potential performance 
  improvement relies on the unsafe assumption that a DataFrame won’t contain 
  any columns with NAs in it. 
  
   — John 
  
  On Jan 23, 2014, at 1:33 PM, Tom Short tshort...@gmail.com wrote: 
  
  That works, but columns will be Arrays instead of DataArrays. That's 
  the way it's always worked. If you want them to be DataArrays, then 
  convert to DataArrays right at the end. 
  
  To fix show to support columns that are arrays, we probably need (at 
  least) to define the following: 
  
  countna(da::Array) = 0 
  
  
  
  On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn quinn@gmail.com wrote: 
  Great investigative work. Is 
  DataFrames( array_of_arrays, Index(column_names_array) ) 
  not the right way to hand construct DataFrames any more? I think I can 
  allocate DataArrays instead, but at every step of the way, I was trying 
  to 
  hand-optimize the result fetching process, which resulted in not 
  creating a 
  DataArray or DataFrame until right before we return to the user. 
  
  -Jacob 
  
  
  On Thu, Jan 23, 2014 at 3:27 PM, bp2012 bert.pr...@gmail.com wrote: 
  
  To check Jacob's suggestion about versions mismatch I completely 
  removed 
  the DataFrames and ODBC packages using Pkg.rm and physically deleted 
  the 
  directories from disk. I then added them via Pkg.add and Pkg.update. 
  
  I am running the julia nightlies build. 
  julia versioninfo() 
  Julia Version 0.3.0-prerelease+1127 
  Commit bc73674* (2014-01-22 20:09 UTC) 
  
  Pkg.status() 
  - DataFrames  0.5.1 
  - ODBC  0.3.5 
  
  Pkg.checkout(ODBC) 
  INFO: Checking out ODBC master... 
  INFO: Pulling ODBC latest master... 
  INFO: No packages to install, update or remove 
  
  julia Pkg.checkout(DataFrames) 
  INFO: Checking out DataFrames master... 
  INFO: Pulling DataFrames latest master... 
  INFO: No packages to install, update or remove 
  
  I did some digging. It looks like there is a mismatch in that countna 
  expects DataFrame columns to be DataArrays

Re: [julia-users] New Year's resolutions for DataArrays, DataFrames and other packages

2014-01-24 Thread John Myles White
I think they’re uncorrelated, but you’d have to ask Wes to know for sure.

 — John

On Jan 24, 2014, at 12:19 AM, Matthias BUSSONNIER 
bussonniermatth...@gmail.com wrote:

 
 On 24 Jan 2014, at 04:51, Jonathan Malmaud wrote:
 
 Sounds reasonable. As a temporary measure for people who want that 
 functionality immediately, I've taken a stab at wrapping pandas in a Julia 
 package (just as pyplot does for matplotlib), at 
 https://github.com/malmaud/pandas. 
 
 
 Would this explain this Tweet from 10h Ago ?
 
 Wes McKinney @wesmckinn
 Friendly reminder that performance-obsessed data hackers (R, Python, Julia) 
 should feel free to drop me a line about working together
 
 -- 
 M



Re: [julia-users] Bug or feature? How does = decide whether to do a copy or deepcopy?

2014-01-24 Thread John Myles White
Hi Eric,

I think you’re being confused by the distinction between variable bindings 
and the values that those variables are bound to.

If w is an Array, then an expression like

w = [1, 2, 3]

assigns a value (namely the value of an array containing 1, 2 and 3) to the 
variable w.

In contrast, an expression like

w[1] = 4

does not refer to a variable called w[1]. It refers to a position in memory 
whose value is being mutated to 4.
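
A minimal sketch of the difference, written in current Julia syntax rather than the 0.3-era syntax used elsewhere in this thread. The names x, w and y are only for illustration:

```julia
x = [1.0, 2.0, 3.0]

w = x            # binds the name w to the same array that x refers to
w[1] = 4.0       # mutates that shared array in place, so x sees it too
shared = (w === x)      # still one array with two names

w = w .+ 10.0    # allocates a new array and rebinds w to it
rebound = (w === x)     # w now names a different array; x is untouched

y = copy(x)      # an explicit copy, if independent storage is wanted
y[2] = -1.0      # does not touch x

println(x, " ", w, " ", y)
```

Neither copy nor deepcopy is involved in a plain `=`; whether x changes depends only on whether the later code mutates the shared array or rebinds the name.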

 — John

On Jan 24, 2014, at 9:01 PM, Eric Ford ericbf...@gmail.com wrote:

 Hi Ivar,
 
 Thanks for the idea.  But replacing w += 10 with w = w + 10 still gives the 
 same problem.   (As does w = w .+ 10).
 Either way, after calling f2, the values of x are modified.  
 
 Even stranger, the values of x are not modified by the following function
 function f3(x::Array)
   w = x + 0.
   for i in 1:length(w)
     w[i] = w[i] + 10.0
   end
   return w
 end
 So the behavior of future lines of code depends on whether w is initialized 
 as w = x or w = x + 0.
 
 It appears that for some reason, julia is doing different things when the 
 right hand side is just an array, versus when it is an expression.  
 Maybe somebody considers that a feature, but it's definitely non-intuitive.  
 
 I can't find any mention of this in the documentation.  Amusingly, there 
 appears to be basically no documentation for what = does (despite it being 
 the most common symbol in most every julia program).  
 I thought = might be implemented as a generic function with different 
 behaviors for Arrays and expressions, but methods(=) and help(=) turned up 
 nothing.  
 On one hand, it seems, like = should be fairly obvious, but evidently there 
 are currently important differences in the behavior of = that need to be 
 documented and/or corrected.
 
 Cheers,
 Eric
 
 
 On Friday, January 24, 2014 2:38:53 PM UTC-5, Ivar Nesje wrote:
 It becomes more obvious if you write a += 3 as the longer form a = a + 3
 
 On Friday, January 24, 2014 at 19:57:22 UTC+1, Eric Ford wrote:
 Sorry.  (Evidently, downloading a notebook is based on the last saved version, 
 not what's currently on your screen.)  IJulia notebook attached.  Readable 
 version below.
 
 function f1(x::Array)
   w = x
   w += 10.0
   return w
 end
 
 function f2(x::Array)
   w = x
   for i in 1:length(x)
     w[i] += 10.0
   end
   return w
 end
 
 x=randn(10)
 x_orig = deepcopy(x)
 f1_of_x = f1(x)
 println("After f1: ",sum((x.-x_orig).^2));
 x = deepcopy(x_orig)
 f2_of_x = f2(x)
 println("After f2: ",sum((x.-x_orig).^2));
 
 After f1: 0.0
 After f2: 1000.0
 
 Thanks,
 Eric
 
 
 On Friday, January 24, 2014 1:10:59 PM UTC-5, Kevin Squire wrote:
 In what you posted,  `f1` and `f2` are identical (except for the name).  Can 
 you share the output of a Julia or IJulia session showing the problem?
 
 Cheers,
Kevin
 
 
 On Fri, Jan 24, 2014 at 9:58 AM, Eric Ford eric...@gmail.com wrote:
 I don't understand why the first function doesn't change x, but the second 
 function does.  
 Is the = calling deepcopy in f1, but copy in f2?  
 If so, why?
 
 function f1(x::Array)
  w = x  
  for i in 1:length(x)
w[i] += 10.0
  end
  w += 10.0
  return w
 end
 
 function f2(x::Array)
  w = x  
  for i in 1:length(x)
w[i] += 10.0
  end
  w += 10.0
  return w
 end
 
 x=randn(10)
 x_orig = deepcopy(x)
 f_of_x = f(x)
 sum((x.-x_orig).^2)
 
 After f1: 0.0
 After f2: 1000.0
 
 Thanks,
 Eric (on behalf of an Astro 585 student)
 
 



Re: [julia-users] pretty printing

2014-01-25 Thread John Myles White
You need to override Base.show(io::IO, foo::T)

show()’s definition provides the basis for most other printing methods.
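
A minimal sketch of what that looks like, in current Julia syntax (`struct` instead of the older `type`). The type name Point2 and the output format are hypothetical stand-ins for the T in the question:

```julia
struct Point2      # hypothetical stand-in for the type T in the question
    xx::Int
    yy::Int
end

# A single method on Base.show is enough; print, string and the REPL's
# default display fall back to it.
Base.show(io::IO, p::Point2) = print(io, "Point2(xx = ", p.xx, ", yy = ", p.yy, ")")

p = Point2(1, 2)
s = sprint(show, p)   # capture what show writes into a String
println(s)
```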

 — John

On Jan 25, 2014, at 3:22 AM, Shoibal Chakravarty shoib...@gmail.com wrote:

 Suppose I define a composite type T.
 
 type T
 xx::Int
 yy::Int
 end
 
 julia> T
 (xx,yy)
 
 I want to change what the REPL prints when I do T[enter] on the command line. 
 Which function should I change to do this (the equivalent of T.__repr__() 
 in Python)?
 
 Thanks,
 Shoibal.
 
 



Re: [julia-users] Implementing a special Array{Float64, 2}

2014-01-26 Thread John Myles White
There are a lot of built-in functions for showing and displaying AbstractArrays. 
Are you extending them?

Right now AbstractArray implies a slightly underdocumented interface, which you 
have to implement before inheriting from AbstractArray will work right. I’m 
hopeful this interface will get documented after Julia stabilizes, but for now 
I’ve used trial-and-error to figure out what needs to be implemented.
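
As a rough sketch of the kind of methods involved, here is a wrapper in current Julia syntax (`struct` and `<:`), modelled on the MotionVector from your message. Which methods are strictly required is my reading of how current Julia behaves, not a documented guarantee for the version discussed in this thread:

```julia
# Hypothetical wrapper type, modelled on the MotionVector from the question.
struct MotionVector <: AbstractArray{Float64,2}
    v::Matrix{Float64}
end

MotionVector() = MotionVector(zeros(6, 1))

# The methods that the generic AbstractArray code builds on:
Base.size(m::MotionVector) = size(m.v)
Base.getindex(m::MotionVector, i::Int, j::Int) = m.v[i, j]
Base.setindex!(m::MotionVector, x, i::Int, j::Int) = (m.v[i, j] = x)

m = MotionVector()
m[2, 1] = 3.5
total = sum(m)            # generic reductions work through getindex
shown = sprint(show, m)   # printing works through size and getindex
println(shown)
```

With these in place, arithmetic such as m + m also goes through the generic fallbacks, although the result is a plain Array unless `similar` is extended as well.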

 — John

On Jan 26, 2014, at 9:50 AM, Jesse van den Kieboom jesse...@gmail.com wrote:

 On Sunday, January 26, 2014 5:59:03 PM UTC+1, John Myles White wrote:
 Right now this is a little tricky. It’s come up before and will probably have 
 some kind of solution in the future. 
 
 For now, you might find 
 http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ 
 useful. 
 
 Thanks, that was an interesting read which addresses exactly what I was 
 doing. I have a related question, that maybe you (or someone else) can 
 answer. I have the following type:
 
 type MotionVector <: AbstractArray{Float64}
   v::Array{Float64}
 
   MotionVector() = (x = new(); x.v = zeros(6, 1); x)
   MotionVector(v) = (x = new(); x.v = v; x)
 end
 
 This seems to work, but when I do this, the display(MotionVector()) does not 
 work anymore, telling me:
 
 
 ERROR: no method display(MotionVector)
 
  in display at multimedia.jl:158
 
 
 
 Without inheriting from AbstractArray{Float64}, this doesn't happen.
 
  
 
  — John 
 
 On Jan 26, 2014, at 8:53 AM, Jesse van den Kieboom jess...@gmail.com wrote: 
 
  Hi all, 
  
  I'm new to julia, so forgive me for maybe asking something obvious. What I 
  would like to do is to create a new type which is basically an 
  Array{Float64, 2}, but has some special operations defined as part of the 
  vector space that it belongs to. 
  
  What I currently do is to create a new composite type with one field 
  containing the underlying array. This kind of works, but I need to proxy a 
  lot of operators (*, -, +, etc) and methods (getindex, setindex!, convert, 
  display, ndims, size), which do not need special behavior, to the 
  underlying array. 
  
  Initially, I tried to use a typealias instead of a composite type, but it 
  seems the typealias type information is not retained and so I can't define 
  new operations on it. Does anyone have a better way to implement this? 
 



Re: [julia-users] Merging dataframes

2014-01-26 Thread John Myles White
This is quite close to being possible, but we’re missing a few things.

Daniel Jones recently added an append! method to DataArrays, which would let 
you do this column-by-column.

To help you out, we need to add an append! method to DataFrames as well. I’ve 
wanted that badly myself lately.

I will try to get to this today, but am already pretty overwhelmed with work 
for the day.

 — John

On Jan 26, 2014, at 11:02 AM, Joosep Pata joosep.p...@gmail.com wrote:

 Is there a way to avoid copying when doing vcat(df1::DataFrame, 
 df2::DataFrame, …)? I’m trying to open hundreds of files with DataFrames, 
 merge all of them and save a single ~150M row x 100 col DataFrame using HDF5 
 and JLD (to be opened later using mmap), and it seems to work marvelously, 
 apart from the vcat.
 Does a no-copy option exist? I’m aware of DataStreams as a concept, but as I 
 understand, they’re not fully fleshed out yet.



Re: [julia-users] General Licensing Question

2014-01-26 Thread John Myles White
Hi Hans,

(1) The GPL makes it impossible for users of Julia to embed Julia as part of a 
closed source product. We’d prefer not to impose that restriction. The BSD and 
MIT licenses are largely identical: the major difference is that the BSD 
license comes in several flavors, not all of which are equivalent to the MIT 
license. The BSD license with two clauses is effectively the same license as 
the MIT license.

(2) All of the code written for Julia by Julia developers is licensed under the 
MIT license. Only some dependencies like FFTW are licensed under the GPL, but 
those dependencies are sufficient to make the aggregate of Julia + dependencies 
fall under the GPL.

(3) Either the removal or the recreation of the GPL components of the current 
Julia distribution would be sufficient to remove the GPL restriction on the 
Julia distribution. Some parts, like Rmath, are easily replaceable. Other 
parts, like SuiteSparse, are much harder to replace and would likely have to be 
removed to provide a non-GPL release.

I hope that helps.

 — John

On Jan 26, 2014, at 2:18 PM, Hans W Borchers hwborch...@gmail.com wrote:

 In the file DISTRIBUTING.md I read the following lines:
 
 Note that while the code for Julia is
 [MIT-licensed](https://github.com/JuliaLang/julia/blob/master/LICENSE.md),
 the distribution created by the techniques described herein will be
 GPL licensed, as various dependent libraries such as `FFTW`, `Rmath`,
 `SuiteSparse`, and `git` are GPL licensed. We do hope to have a
 non-GPL distribution of Julia in the future.
 
 For me this triggers the question: 
 
 (1) Why is the MIT license so much better for Julia than any GPL license?
  What is the main difference to consider? I think, Python is under BSD 
  license, would that be an alternative?
 
 (2) What does it mean that Julia (which part?) is under MIT license while the
  distribution is GPL-licensed. Are there legal consequences for this kind 
  of construction?
 
 (3) To have a non-GPLed version in the future: Does that mean, certain parts
  have to be removed, or will they have to be rewritten in C and Julia?
 
 Hans Werner
 



Re: [julia-users] Natural language processing in Julia

2014-01-27 Thread John Myles White
JuliaText would be great.

TextAnalysis.jl really needs a lot of love to move forward. For now, I’d 
strongly push people towards NLTK.

 — John

On Jan 27, 2014, at 8:29 AM, Jonathan Malmaud malm...@gmail.com wrote:

 I was thinking of starting up a Julia NLP meta-project on github if there's 
 enough interest. It could host projects like textanalysis.jl, a Julia 
 interface to NLTK, a Julia interface to some of Stanford's NLP tools, and 
 whatever more native solutions people put together.
 
 On Friday, October 25, 2013 9:32:10 AM UTC-4, Dahua Lin wrote:
 I wish there were something comparable to NLTK in Julia. In a recent project 
 that involved text parsing, I had to implement the text handling module in 
 Python, simply for the purpose of using NLTK and Jinja2. 
 
 If we can get the attention of the NLP community, I believe some NLP people 
 will build such things very soon.
 
 - Dahua
 
 
 On Tuesday, October 22, 2013 7:35:57 PM UTC-5, John Myles White wrote:
 There's a package called TextAnalysis.jl that has stemming and very basic 
 tokenization. Patches to do POS tagging would be very welcome. 
 
  -- John 
 
 On Oct 22, 2013, at 5:29 PM, Jonathan Malmaud mal...@gmail.com wrote: 
 
  Is anyone working on or know of a package to do NLP tasks with Julia, like 
  part-of-speech tagging and stemming? PyCall works fine with Python's NLTK, 
  so that would be my default choice if there isn't anything more native at 
  the moment. 
 



[julia-users] DBI: Generic database access in Julia

2014-01-27 Thread John Myles White
I've been intentionally holding off on announcing this work (because it's not 
even close to being ready for practical use yet), but I've been working with 
Eric Davies on a generic database access module in Julia called DBI:

https://github.com/johnmyleswhite/DBI.jl

The goal of DBI is to provide a consistent interface that specific database 
drivers can implement. Between Eric and me, some work's been done on 
implementing this for SQLite, MySQL and Postgres:

https://github.com/johnmyleswhite/SQLite.jl
https://github.com/johnmyleswhite/MySQL.jl
https://github.com/iamed2/PostgreSQL.jl

I've unfortunately slowed down so that I can fix up DataFrames, but I've seen a 
bunch of people working on database support recently and wanted to encourage 
collaboration early on.

Would be great to get everyone interested in database support to work together. 
I can't be in charge of this for another few weeks, but wanted to start a 
discussion so that everyone can collaborate effectively.

 -- John



Re: [julia-users] General Licensing Question

2014-01-27 Thread John Myles White
Yes, the main LICENSE file for Julia should contain more details about the 
legal status of subsets of the code and also about the distribution as an 
entirety.

 -- John

On Jan 27, 2014, at 9:52 AM, Hans W Borchers hwborch...@gmail.com wrote:

 Yes, but this is not downloaded with the source.
 At least in my source-master directory there is no COPYING file.
 And if the whole Julia distribution is GPLed, I would expect a version of the 
 license on highest level.
 
 
 On Monday, January 27, 2014 11:10:37 AM UTC+1, Shaun Walbridge wrote:
 The components which use the GPL license do already include copies of the 
 license -- e.g. https://github.com/JuliaLang/Rmath/blob/master/COPYING. I 
 believe this is true for the other GPL components as well (readline, FFTW, 
 patchelf).
 



Re: [julia-users] General Licensing Question

2014-01-28 Thread John Myles White
You’re right, the LICENSE.md file is pretty explicit.

 — John

On Jan 28, 2014, at 1:08 AM, Tobias Knopp tobias.kn...@googlemail.com wrote:

 Isn't the LICENSE.md file in Julia pretty clear? Julia is MIT licensed and 
 repl-readline.c is GPL. I don't see the problem. If I were using libjulia, I 
 could use it in a commercial program. One is of course not allowed to ship 
 FFTW though. Still, libjulia and all the .jl files in Base are MIT licensed.
 
 I eventually plan to integrate Julia into a commercial product, and I have 
 made some contributions to Julia and Gtk.jl. If Julia were GPL, I would 
 not have done this.
 
 On Monday, January 27, 2014 at 22:21:31 UTC+1, John Myles White wrote:
 Yes, the main LICENSE file for Julia should contain more details about the 
 legal status of subsets of the code and also about the distribution as an 
 entirety.
 
  -- John
 
 On Jan 27, 2014, at 9:52 AM, Hans W Borchers hwbor...@gmail.com wrote:
 
 Yes, but this is not downloaded with the source.
 At least in my source-master directory there is no COPYING file.
 And if the whole Julia distribution is GPLed, I would expect a version of 
 the license on highest level.
 
 
 On Monday, January 27, 2014 11:10:37 AM UTC+1, Shaun Walbridge wrote:
 The components which use the GPL license do already include copies of the 
 license -- e.g. https://github.com/JuliaLang/Rmath/blob/master/COPYING. I 
 believe this is true for the other GPL components as well (readline, FFTW, 
 patchelf).
 
 



Re: [julia-users] Can't manage packages

2014-01-28 Thread John Myles White
Try doing Pkg.rm("Stats").

 — John

On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlosles...@gmail.com wrote:

 Hi,
 I'm on mac 10.7 Julia 0.2.0, today I updated but found this:
 julia> Pkg.update()
 INFO: Updating METADATA...
 INFO: Updating cache of Stats...
 INFO: Updating cache of StatsBase...
 INFO: Updating cache of Distance...
 INFO: Updating cache of JSON...
 INFO: Updating cache of PyPlot...
 INFO: Updating cache of NumericExtensions...
 ERROR: failed process: Process(`git 
 --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 
 0efba512a2bf8faa21e61c9568222ae1ae96acbb 
 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1]
  in pipeline_error at process.jl:476
  in readbytes at process.jl:430
  in readall at process.jl:437
  in readchomp at git.jl:26
  in installed_version at pkg/read.jl:70
  in installed at pkg/read.jl:121
  in update at pkg/entry.jl:231
  in anonymous at pkg/dir.jl:25
  in cd at file.jl:22
  in cd at pkg/dir.jl:25
  in update at pkg.jl:40
 anybody knows what's wrong? Please help.
 



Re: [julia-users] Can't manage packages

2014-01-28 Thread John Myles White
Good to know.

 — John

On Jan 28, 2014, at 7:53 PM, Shaun Walbridge shaun.walbri...@gmail.com wrote:

 I had the same issue today, and blowing away Stats was insufficient, but 
 deleting and recreating ~/.julia did fix it.
 
 
 On Tue, Jan 28, 2014 at 9:56 PM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 Try doing Pkg.rm("Stats").
 
  — John
 
 On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlosles...@gmail.com wrote:
 
 Hi,
 I'm on mac 10.7 Julia 0.2.0, today I updated but found this:
 julia> Pkg.update()
 INFO: Updating METADATA...
 INFO: Updating cache of Stats...
 INFO: Updating cache of StatsBase...
 INFO: Updating cache of Distance...
 INFO: Updating cache of JSON...
 INFO: Updating cache of PyPlot...
 INFO: Updating cache of NumericExtensions...
 ERROR: failed process: Process(`git 
 --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 
 0efba512a2bf8faa21e61c9568222ae1ae96acbb 
 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1]
  in pipeline_error at process.jl:476
  in readbytes at process.jl:430
  in readall at process.jl:437
  in readchomp at git.jl:26
  in installed_version at pkg/read.jl:70
  in installed at pkg/read.jl:121
  in update at pkg/entry.jl:231
  in anonymous at pkg/dir.jl:25
  in cd at file.jl:22
  in cd at pkg/dir.jl:25
  in update at pkg.jl:40
 anybody knows what's wrong? Please help.
 
 
 



Re: [julia-users] Type stability of eig

2014-01-28 Thread John Myles White
How much worse would performance be if we “upgraded” all results to complex 
matrices?
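
To make the question concrete, this is the behaviour as I understand it in current Julia, where `eig` has since been replaced by `eigen` and `eigvals` in the LinearAlgebra standard library. The matrices A and B are just illustrative inputs:

```julia
using LinearAlgebra

A = [2.0 1.0; 0.0 3.0]    # real, non-symmetric, all eigenvalues real
B = [0.0 -1.0; 1.0 0.0]   # real, non-symmetric, complex eigenvalue pair

ta = eltype(eigvals(A))   # element type chosen from the values found
tb = eltype(eigvals(B))   # same input type, different output type
println(ta, " vs ", tb)

# Promoting by hand gives one element type regardless of the input values.
vals = complex.(eigvals(A))
println(eltype(vals))
```

The cost of always promoting would be the extra storage and complex arithmetic downstream for inputs whose spectrum is actually real.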

 — John

On Jan 28, 2014, at 8:38 PM, Jiahao Chen jia...@mit.edu wrote:

 The reason is primarily for performance and secondarily for numerical
 stability. eig() on a Matrix implements a polyalgorithm depending on
 the symmetries of the input matrix. Certain symmetries, e.g. real
 symmetric or Hermitian, can be solved significantly more efficiently
 than the general case, and so eig() attempts to detect these
 symmetries at runtime and if found, dispatch to different LAPACK
 routines that are able to take advantage of faster and more stable
 algorithms. Several other generic linear algebraic functions are
 written in this fashion, notably \.
 
 (This was recently discussed in the context of issue #4006 with
 particular focus on sqrtm, whose code is somewhat easier to read than
 eigfact!. https://github.com/JuliaLang/julia/issues/4006)
 
 Thanks,
 
 Jiahao Chen, PhD
 Staff Research Scientist
 MIT Computer Science and Artificial Intelligence Laboratory



[julia-users] DataFrames changes

2014-01-29 Thread John Myles White
As we continue trying to prune DataFrames down to the essentials that we can 
reasonably commit to maintaining for the long-term future, we've decided to 
start using only symbols for the names of columns and remove all uses of 
strings.

This change will go live on master today, so please don't pull from master 
until you're ready to update your code.

 -- John



Re: [julia-users] Matlab versus Julia for loop timing

2014-01-29 Thread John Myles White
Can you show the call to @time / @elapsed so we know exactly what's being timed?
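
Something along these lines is what I have in mind. This is a sketch in current Julia syntax, and colsums is a hypothetical workload rather than your runave:

```julia
# Hypothetical workload; any function would do.
function colsums(A)
    s = zeros(size(A, 2))
    for j in axes(A, 2), i in axes(A, 1)
        s[j] += A[i, j]
    end
    return s
end

A = rand(200, 200)
colsums(A)                  # the first call includes JIT compilation
t = @elapsed colsums(A)     # the second call measures the run itself
println("elapsed seconds: ", t)
```

Timing the first call, or timing code that runs at global scope rather than inside a function, tends to measure compilation and untyped globals more than the loop.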

 -- John

On Jan 29, 2014, at 9:28 AM, Rajn rjngrj2...@gmail.com wrote:

 Now it takes even longer, i.e., ~1 minute.
 
 Does this make sense? Also, I am running this loop only once. I do not 
 understand why writing it in function form would help. I read the manual, but 
 it suggests using function form for something which is used many times.
 I=runave(S,A,f)
 showim(I);
 
 function runave(S,A,f)
   imsz=size(A);
   p1=f+1;
   for n=(f+1):(imsz[2]-f-1)
     for m=(f+1):(imsz[1]-f)
       S[m,n+1]=S[m,n]+sum(sum(A[m-f:m+f,n+p1],2))-sum(sum(A[m-f:m+f,n-f],2));
     end
   end
   S;
 end
 
 Do I have to declare function parameters to speed it up?



Re: [julia-users] How to reload?

2014-01-29 Thread John Myles White
I don't think it's possible to redo the importing of names that `using` 
performs:

julia> module Foo
           export a
           a = 1
       end

julia> using Foo

julia> a
1

julia> module Foo
           export a
           a = 2
       end
Warning: replacing module Foo

julia> a
1

julia> using Foo
Warning: using Foo.a in module Main conflicts with an existing identifier.

julia> a
1

julia> Foo.a
2

On Jan 29, 2014, at 1:47 PM, Robert DJ math.rob...@gmail.com wrote:

 I am starting to work on a package, but I've run into a very mundane problem: 
 I can't figure out how to reload functions after editing.
 
 The first time I load the package with
 using package
 I discover a bug, fix it and run 
 reload(package)
 But I still get the same error. 
 
 If I exit Julia, start it again and load the package the error is (of course) 
 gone.
 
 What am I missing?
 
 Thanks,
 
 Robert



Re: [julia-users] DBI: Generic database access in Julia

2014-01-29 Thread John Myles White
Yeah, most of the work needed to push forward is building C wrappers.

 -- John

On Jan 29, 2014, at 11:56 AM, Randy Zwitch randy.zwi...@fuqua.duke.edu wrote:

 What are the types of skills needed to get this off the ground? I know 
 ODBC.jl is a bunch of wrapping of C functions, is that what's required here 
 as well?
 
 
 On Monday, January 27, 2014 12:30:22 PM UTC-5, John Myles White wrote:
 I've been intentionally holding off on announcing this work (because it's not 
 even close to being ready for practical use yet), but I've been working with 
 Eric Davies on a generic database access module in Julia called DBI:
 
 https://github.com/johnmyleswhite/DBI.jl
 
 The goal of DBI is to provide a consistent interface that specific database 
 drivers can implement. Between Eric and me, some work's been done on 
 implementing this for SQLite, MySQL and Postgres:
 
 https://github.com/johnmyleswhite/SQLite.jl
 https://github.com/johnmyleswhite/MySQL.jl
 https://github.com/iamed2/PostgreSQL.jl
 
 I've unfortunately slowed down so that I can fix up DataFrames, but I've seen 
 a bunch of people working on database support recently and wanted to 
 encourage collaboration early on.
 
 Would be great to get everyone interested in database support to work 
 together. I can't be in charge of this for another few weeks, but wanted to 
 start a discussion so that everyone can collaborate effectively.
 
  -- John
 



Re: [julia-users] DBI: Generic database access in Julia

2014-01-29 Thread John Myles White
That would be great.

 -- John

On Jan 29, 2014, at 12:19 PM, Stephen Pope stephen.p...@predict.com wrote:

 I cannot commit to anything at this moment, but surely if no one else 
 implements Oracle.jl my hand will be forced to do it :-)



[julia-users] Re: DBI: Generic database access in Julia

2014-01-29 Thread John Myles White
Inspired by Jonathan Malmaud's creation of a JuliaText organization, I created 
a JuliaDB GitHub organization so that we can have a consistent place to discuss 
these issues:

https://github.com/JuliaDB/Roadmap.jl/issues/1

Looking at Jonathan's approach, I realized that a lot of the Julia SIGs that 
are forming might benefit from having a Roadmap.jl repo to centralize 
discussion and point people towards canonical implementations of functionality. 
It's been quite useful to have one for JuliaStats and I hope JuliaDB will 
benefit in the same way.

 -- John

On Jan 27, 2014, at 9:30 AM, John Myles White johnmyleswh...@gmail.com wrote:

 I've been intentionally holding off on announcing this work (because it's not 
 even close to being ready for practical use yet), but I've been working with 
 Eric Davies on a generic database access module in Julia called DBI:
 
 https://github.com/johnmyleswhite/DBI.jl
 
 The goal of DBI is to provide a consistent interface that specific database 
 drivers can implement. Between Eric and me, some work's been done on 
 implementing this for SQLite, MySQL and Postgres:
 
 https://github.com/johnmyleswhite/SQLite.jl
 https://github.com/johnmyleswhite/MySQL.jl
 https://github.com/iamed2/PostgreSQL.jl
 
 I've unfortunately slowed down so that I can fix up DataFrames, but I've seen 
 a bunch of people working on database support recently and wanted to 
 encourage collaboration early on.
 
 Would be great to get everyone interested in database support to work 
 together. I can't be in charge of this for another few weeks, but wanted to 
 start a discussion so that everyone can collaborate effectively.
 
  -- John
 



Re: [julia-users] Can't manage packages

2014-01-29 Thread John Myles White
Ok. You'll unfortunately have to either (1) delete your ~/.julia folder or (2) 
manually rename the Stats package to StatsBase and then edit its .git/config 
file.

 -- John

On Jan 29, 2014, at 6:21 PM, Carlos Lesmes carlosles...@gmail.com wrote:

 
 I got
 
 Pkg.rm("Stats")
 
 ERROR: failed process: Process(`git 
 --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 
 0efba512a2bf8faa21e61c9568222ae1ae96acbb 
 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1]
  in pipeline_error at process.jl:476
  in readbytes at process.jl:430
  in readall at process.jl:437
  in readchomp at git.jl:26
  in installed_version at pkg/read.jl:70
  in installed at pkg/read.jl:121
  in resolve at pkg/entry.jl:316
  in edit at pkg/entry.jl:24
  in rm at pkg/entry.jl:51
  in anonymous at pkg/dir.jl:25
  in cd at file.jl:22
  in cd at pkg/dir.jl:25
  in rm at pkg.jl:18
 
 
 On Tuesday, January 28, 2014 9:56:16 PM UTC-5, John Myles White wrote:
 Try doing Pkg.rm("Stats").
 
  — John
 
 On Jan 28, 2014, at 6:47 PM, Carlos Lesmes carlos...@gmail.com wrote:
 
 Hi,
 I'm on mac 10.7 Julia 0.2.0, today I updated but found this:
 julia> Pkg.update()
 INFO: Updating METADATA...
 INFO: Updating cache of Stats...
 INFO: Updating cache of StatsBase...
 INFO: Updating cache of Distance...
 INFO: Updating cache of JSON...
 INFO: Updating cache of PyPlot...
 INFO: Updating cache of NumericExtensions...
 ERROR: failed process: Process(`git 
 --git-dir=/Users/carloslesmes/.julia/.cache/Stats merge-base 
 0efba512a2bf8faa21e61c9568222ae1ae96acbb 
 5113ce6044fc554b350ea16f92502f8d6e077a62`, ProcessExited(1)) [1]
  in pipeline_error at process.jl:476
  in readbytes at process.jl:430
  in readall at process.jl:437
  in readchomp at git.jl:26
  in installed_version at pkg/read.jl:70
  in installed at pkg/read.jl:121
  in update at pkg/entry.jl:231
  in anonymous at pkg/dir.jl:25
  in cd at file.jl:22
  in cd at pkg/dir.jl:25
  in update at pkg.jl:40
 anybody knows what's wrong? Please help.
 
 



Re: [julia-users] Re: DataFrames changes

2014-01-29 Thread John Myles White
We mostly did this to prepare for the time when Julia will let us overload the 
dot-operator to access columns like df.col1. Symbols also encourage people to 
use valid Julia identifiers as column names, which makes it easier to work with 
column names in some contexts.

 — John
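
As a rough illustration of the identifier point, the sketch below uses present-day Julia and a plain `Dict` as a stand-in for a table; it is not the DataFrames API. `Base.isidentifier` is an unexported helper in Base, and the column names are invented for the example.

```julia
# Column names as symbols: literal syntax works for valid identifiers,
# the Symbol constructor is needed for anything else.
good = :col1
awkward = Symbol("col 1")

# Base.isidentifier reports whether a name could be written as a plain
# Julia variable name (and so, eventually, as df.col1).
println(Base.isidentifier(good))
println(Base.isidentifier(awkward))

# A Dict keyed by symbols stands in for a table with named columns.
columns = Dict(good => [1, 2, 3], awkward => [4, 5, 6])
println(columns[:col1])
```

Only the first name could be reached through dot syntax; the second can still be used as a key, but only through the `Symbol("...")` form.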

On Jan 29, 2014, at 5:46 PM, Cristóvão Duarte Sousa cris...@gmail.com wrote:

 BTW, is there some documentation about the choice of symbols vs strings for 
 this kind of stuff (dictionary keys, optional function args, etc.)? Are 
 symbols more efficient for this?
 
 
 
 On Wednesday, January 29, 2014 5:11:20 PM UTC, John Myles White wrote:
 As we continue trying to prune DataFrames down to the essentials that we can 
 reasonably commit to maintaining for the long-term future, we've decided to 
 start using only symbols for the names of columns and remove all uses of 
 strings. 
 
 This change will go live on master today, so please don't pull from master 
 until you're ready to update your code. 
 
  -- John 
 



Re: [julia-users] Matlab versus Julia for loop timing

2014-01-30 Thread John Myles White
This is pretty standard fare for Julia. Things like sum are really wasteful 
with memory, whereas the nuclear option is very conservative when implemented 
right.

 — John

On Jan 30, 2014, at 7:30 AM, Rajn rjngrj2...@gmail.com wrote:

 Stefan,
 You wanted to know how the nuclear option worked in comparison to usage of 
 sum(sub(A,...) for my problem.
 This is just AMAZING!
 
 @time for your 2nd suggestion i.e., sum,sub gave a time of 12.9 seconds
 @time for your 3rd suggestion i.e., -nuclear suggestion gave a time of 0.36. 
 This is unbelievable!! Am I doing this right? I just took both your code, 
 inserted into my code and only timed this specific section.
 Does this mean that sum and sub together take nearly 35 times long to run 
 through 1440*1782 loops or a delay of ~40 microsecond per loop?   WOW!
 
 
 
 On Wednesday, January 29, 2014 12:59:27 PM UTC-5, Stefan Karpinski wrote:
 This sum(sum(foo,2)) business is really wasteful. Just do sum(foo) to take 
 the sum of foo. It's also better to extract the dimensions into individual 
 variables. Something like this:
 
 function runave1(S,A,f)
   s1, s2 = size(A)
   p1 = f+1
   for n = f+1:s2-f-1, m = f+1:s1-f
     S[m,n+1] = S[m,n] + sum(A[m-f:m+f,n+p1]) - sum(A[m-f:m+f,n-f])
   end
   S
 end
 I suspect that since Matlab forces this sum(sum(X,2)) idiom on you, it 
 probably detects it and automatically does the efficient thing. It's unclear 
 to me why you need two sum operations when the slices you're taking are just 
 single columns, but maybe I'm missing something here.
 
 Currently, taking array slices in Julia makes a copy, which is unfortunate, 
 but in the future they will be views. In the meantime, you might get better 
 performance by explicitly using views:
 
 function runave2(S,A,f)
   s1, s2 = size(A)
   p1 = f+1
   for n = f+1:s2-f-1, m = f+1:s1-f
     S[m,n+1] = S[m,n] + sum(sub(A,m-f:m+f,n+p1)) - sum(sub(A,m-f:m+f,n-f))
   end
   S
 end
 
 And, of course, there's always the nuclear option for really performance 
 critical code, which is to write out the summation manually:
 
 function runave3(S,A,f)
   s1, s2 = size(A)
   p1 = f+1
   for n = f+1:s2-f-1, m = f+1:s1-f
     t = S[m,n]
     for k = m-f:m+f; t += A[k,n+p1] - A[k,n-f]; end
     S[m,n+1] = t
   end
   S
 end
 
 Not so elegant, but probably the fastest possible version. Ideally, once 
 array slices are views, the simpler version of the code will be essentially 
 equivalent to this. It will take some compiler cleverness, but it's certainly 
 doable. It would be interesting to hear how each of these versions performs 
 on your data.
 
 On Wed, Jan 29, 2014 at 12:30 PM, John Myles White johnmyl...@gmail.com 
 wrote:
 Can you show the call to @time / @elapsed so we know exactly what's being 
 timed?
 
  -- John
 
 On Jan 29, 2014, at 9:28 AM, Rajn rjngr...@gmail.com wrote:
 
  Now it takes even longer i.e., ~1 minute
 
  Does this make sense? Also I am running this loop only once. I do not 
  understand why writing in the function form would help. I read the manual 
  but they suggest writing function form for something which is used many 
  times.
  I=runave(S,A,f)
  showim(I);
 
  function runave(S,A,f)
    imsz=size(A);
    p1=f+1;
    for n=(f+1):(imsz[2]-f-1)
      for m=(f+1):(imsz[1]-f)
        S[m,n+1]=S[m,n]+sum(sum(A[m-f:m+f,n+p1],2))-sum(sum(A[m-f:m+f,n-f],2));
      end
    end
    S;
  end
 
  Do I have to declare function parameters to speed it up?
 
 



Re: [julia-users] Matlab versus Julia for loop timing

2014-01-30 Thread John Myles White
That's true. Sorry for misstating the core issue, which is memory allocation 
related to the current definition of array indexing.

 -- John
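
The allocation point can be sketched in present-day Julia, where `view` plays the role that `sub` played in this thread. The three function names and the small test matrix are made up for illustration; all three compute the same sum and differ only in whether a temporary array is created along the way.

```julia
# Three ways to sum rows lo:hi of column j of a matrix.
slice_sum(A, lo, hi, j) = sum(A[lo:hi, j])          # indexing copies the slice first
view_sum(A, lo, hi, j)  = sum(view(A, lo:hi, j))    # a view does not copy the data

function loop_sum(A, lo, hi, j)                     # explicit loop, no temporary array
    t = zero(eltype(A))
    for k in lo:hi
        t += A[k, j]
    end
    return t
end

A = reshape(collect(1.0:20.0), 4, 5)
println(slice_sum(A, 2, 3, 4), " ", view_sum(A, 2, 3, 4), " ", loop_sum(A, 2, 3, 4))
```

Wrapping each call in `@allocated`, after one warm-up call to exclude compilation, is one way to compare the memory behaviour on a given Julia version; no figures are quoted here because they vary between versions.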

On Jan 30, 2014, at 8:55 AM, Tim Holy tim.h...@gmail.com wrote:

 On Thursday, January 30, 2014 07:32:18 AM John Myles White wrote:
 This is pretty standard fare for Julia. Things like sum are really wasteful
 with memory, whereas the nuclear option is very conservative when
 implemented right.
 
 To be fair, it's not sum() that's to blame, the problem is allocating a new 
 array with A[m-f:m+f, indx].
 
 --Tim
 



Re: [julia-users] DataFrames changes

2014-01-30 Thread John Myles White
We will automatically convert them to valid identifiers. I fear we are probably 
not doing that yet, but will get it done before we release a new version.

 -- John
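
What such a conversion could look like is sketched below in present-day Julia. `make_identifier` is a hypothetical helper written for this example, not the DataFrames implementation, and it makes simplifying assumptions: it only keeps ASCII letters, digits and underscores, and it does not handle reserved words such as `end`.

```julia
# Hypothetical helper: turn an arbitrary header string into a Symbol that
# is a valid Julia identifier.
function make_identifier(header::AbstractString)
    # Replace every character that is not an ASCII letter, digit, or underscore.
    cleaned = replace(strip(header), r"[^A-Za-z0-9_]" => "_")
    # Identifiers cannot be empty or start with a digit, so add a prefix.
    if isempty(cleaned) || isdigit(first(cleaned))
        cleaned = "x" * cleaned
    end
    return Symbol(cleaned)
end

for h in ["height", "body mass (kg)", "2013 total"]
    println(repr(h), " => ", repr(make_identifier(h)))
end
```

Two different headers can collapse to the same symbol under this scheme, so a real reader would also need some rule for de-duplicating names.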

On Jan 30, 2014, at 10:50 AM, Jonathan Malmaud malm...@gmail.com wrote:

 What's the plan for reading in files that have a header row with non-valid 
 Julia identifiers?
 
 On Wednesday, January 29, 2014 10:03:39 PM UTC-5, John Myles White wrote:
 Please go ahead and add deprecation warnings.
 
  — John
 
 On Jan 29, 2014, at 6:51 PM, Simon Kornblith si...@simonster.com wrote:
 
 I believe two identical symbols are the same object, which implies that Dict 
 lookup shouldn't require hashing. I haven't benchmarked this, though.
 
 Since this is a huge change (although one that I am in favor of) that 
 presumably affects a lot of existing code, any objection if I add some 
 deprecation warnings?
 
 Simon
 
 On Wednesday, January 29, 2014 8:46:19 PM UTC-5, Cristóvão Duarte Sousa 
 wrote:
 BTW, is there some documentation about the choice of symbols vs strings for 
 this kind of stuff (dictionary keys, optional function args, etc.)? Are 
 symbols more efficient for this?
 
 
 
 On Wednesday, January 29, 2014 5:11:20 PM UTC, John Myles White wrote:
 As we continue trying to prune DataFrames down to the essentials that we can 
 reasonably commit to maintaining for the long-term future, we've decided to 
 start using only symbols for the names of columns and remove all uses of 
 strings. 
 
 This change will go live on master today, so please don't pull from master 
 until you're ready to update your code. 
 
  -- John 
 
 



Re: [julia-users] Re: How to write a macro that can substitute variable values into an expression

2014-02-01 Thread John Myles White
If you want to do this, the easiest way is to define your own implementation of 
the @~ macro that the latest version of Julia uses to parse expressions that look 
like R’s formulas.

That will give you access to the quoted expressions you’d need to manipulate to 
do your analysis.

Given those quoted expressions, you’ll need to define a symbolic 
differentiation tool that’s rich enough to handle the inputs you want to 
process. The Calculus package handles symbolic differentiation for a good chunk 
of functions, but you may need to extend it to your use case.

It may be worth noting that your example makes very heavy usage of R’s 
non-standard evaluation functionality, which is something that the Julia 
community has not invested much time into developing yet. Most Julia 
programmers tend to avoid operating on symbolic expressions.

 — John
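
For the narrower question of plugging values into a quoted expression, one possible approach is sketched here in present-day Julia. `substitute` is a hypothetical helper written for this example (it is not part of Base or of the Calculus package): it walks the expression tree and replaces known symbols with their values, so nothing is assigned at global scope.

```julia
# Hypothetical helper: replace symbols in an expression tree with values.
substitute(x, vals) = x                          # numbers and other literals pass through
substitute(s::Symbol, vals) = get(vals, s, s)    # known variable -> its value
substitute(ex::Expr, vals) =
    Expr(ex.head, (substitute(a, vals) for a in ex.args)...)

formula = :(sin(x) + cos(y) * sin(z))
vals = Dict(:x => 1.0, :y => 2.0, :z => 3.0)

filled = substitute(formula, vals)   # same tree with x, y, z replaced by numbers
result = eval(filled)                # contains no assignments, so no globals change
println(result)
```

Function names such as `sin` are symbols too and are left alone only because they are absent from `vals`. The final `eval` still runs at global scope, so for repeated evaluation it is probably better to build a function from the expression once and call that.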

On Feb 1, 2014, at 3:42 PM, Walking Sparrow hq...@gopivotal.com wrote:

 You are right that I have an R background. What I am trying to do is to 
 evaluate a function given by the user. For example,
 
 I want to write a function that can compute the marginal effects of a linear 
 or logistic model. For simplicity, let's just use linear regression. If the 
 user did a linear regression using the following model (I am using the 
 formula syntax from R)
 
 y ~ x + z + sin(x) * sin(z) for the data set my_data, which has three columns 
 x, y, and z
 
 Then the marginal effects at the mean are computed like this: First, compute 
 the first derivative of 1+ x + z + sin(x) * sin(z). This can be done in R 
 using the function deriv to get the expression of the first derivative. In 
 the second step, I need to substitute the mean values of  x and z into the 
 result of the first step. An example of this would be the margins function 
 in the R package PivotalR (http://cran.r-project.org/web/packages/PivotalR/ 
 and https://github.com/gopivotal/PivotalR)
 
 Right now, I have no idea how to do the first step in Julia. But that is OK, 
 because I just started learning Julia.
 
 Now my question is in the second step. The user can use any complex 
 expressions in the linear regression like y ~ x + x*z + log(sin(x) + 2) * 
 log(cos(z) + 2), and the data set my_data and formula can have any number of 
 variables like x1, x2, ..., x1000. So when you write the code for the value 
 substitution in the second step, you cannot know which function and what 
 variables you will have.
 
 So in Julia or R, I need a function or macro F(f, [...]) that does this: 
 given a function f, whose format is the input from the user, and a set of 
 variable values [...], whose number and names are also the input from the 
 user, F(f, [...]) returns the value of f evaluated at the values [...]. For 
 example, the user inputs
 
 f = 1 + z + cos(x)*log(2+cos(z))/(2+sin(x))
 
 and [x = 2.3, z = 1.4],
 
 F should return the value of f evaluated at x = 2.3 and z = 1.4.
 
 This can be done in R, see margins function in PivotalR, which actually 
 does big data computation in-database. The problem is how to do the same 
 thing in Julia?
 
 Hope my explanation makes my question clearer.
 
 On Saturday, February 1, 2014 2:35:38 PM UTC-8, Jameson wrote:
 You need to provide more detail on what you are trying to do with this. You 
 seem 
 to be confusing several concepts involving the usage of expressions, 
 macros, and functions. I can't tell if you are trying to write special 
 syntax, or are just unaware of anonymous functions: 
 
 Mostly, why is :(sin(x) + cos(y) * sin(z)) an expression, and not a 
 function? It seems like you perhaps have an R background? 
 
 f(x,y,z) = (sin(x) + cos(y) * sin(z)) 
 f(1,2,3) 
 
 On Sat, Feb 1, 2014 at 12:04 PM, Walking Sparrow hq...@gopivotal.com wrote: 
  So the real question is how to generate a code block like this 
  
  quote 
  x = 2 
  y = 3 
  . 
  x + y +  
  end 
  
  Need to embed a for loop inside the macro definition? 
  
  
  
  On Saturday, February 1, 2014 8:52:30 AM UTC-8, Walking Sparrow wrote: 
  
  Please forgive me if this is a stupid question. Suppose I have an 
  expression 
  
  :(sin(x) + cos(y) * sin(z)) 
  
  and the values of x, y, z. 
  
  How can I write a macro that can substitute the values of x, y, z into the 
  above expression? The number of values that I want to substitute depends 
  on 
  the actual use cases and thus is unknown. 
  
  I wrote a function that can do this 
  
  function substitute(expr::Expr, vals::Array{Expr,1}) 
  for i = 1:length(vals) 
  @eval $(vals[i]) 
  end 
  @eval $expr 
  end 
  
  x = 10 
  y = 23 
  
  substitute(:(x+y), [:(x = 2), :(y = 3)]) 
  
  x 
  y 
  
  But if you run the above code, you will see that the values of global x 
  and y are changed, which is not what I intend to do. This is because 
  eval 
  does the evaluation in the global scope. Besides, I think it is a bad 
  coding 
  pattern to use eval and it is slow. 
  
  It would be better if this can 

Re: [julia-users] Re: Gadfly installation problem... ERROR: DenseArray not defined

2014-02-01 Thread John Myles White
I definitely agree that changing the version of Julia a package depends upon 
should trigger a 0.x -> 0.(x + 1) bump.

 — John

On Feb 1, 2014, at 6:37 PM, Kevin Squire kevin.squ...@gmail.com wrote:

 One related thought: it would be nice if versions which target a new version 
 of Julia got a larger version bump, to make it easier to backport fixes to 
 previous versions of julia.  Something like:
 
 0.2.1  # targets Julia v0.2
 0.2.2
 0.2.3  # last real version which targets v0.2
 0.2.4  # simply add julia -0.2 to REQUIRES
 0.3.0  # first version which targets v0.3; use julia 0.3- in REQUIRES
 0.3.1
 0.3.2  # bug fix
 0.2.5  # port of bug fix back to 0.2 series
 
 There's no reason, of course, that the 0.2.x has to work with Julia v0.2, and 
 0.3.x has to work with Julia v0.3--It could just as easily be 0.1.x and 
 0.2.x, or 1.0.x and 2.0.x.
 
 Thoughts?
 
Kevin
 
 
 On Sat, Feb 1, 2014 at 3:20 PM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 I went into METADATA and updated the requires files, then submitted a new 
 commit. I actually did this for one release of NumericExtensions which would 
 reliably crash when loading on the 0.2 release.
 
  — John
 
 On Feb 1, 2014, at 3:19 PM, Dahua Lin linda...@gmail.com wrote:
 
 John,
 
 Could you elaborate a little bit about how you did this?
 
 Recent changes in NumericExtensions that rely on some new features have 
 caused headaches to users who use 0.2 release. I would like to do something 
 to fix it sometime next week.
 
 — Dahua
 
 On February 1, 2014 at 5:08:34 PM, John Myles White 
 (johnmyleswh...@gmail.com) wrote:
 
 I think so. I’ve done it recently and fixed some errors by doing it.
 
  — John
 
 On Feb 1, 2014, at 3:07 PM, Dahua Lin linda...@gmail.com wrote:
 
 Is it possible to update the requirement of previously tagged versions?
 
 On Friday, January 31, 2014 5:13:21 PM UTC-6, Ivar Nesje wrote:
 It seems like you are using the 0.2.0 version of Julia, and some package 
 authors have not correctly marked new versions of their package to require 
 0.3.0-prerelease when they decided to use features that have been 
 introduced after the release of 0.2.0. The consequence is that Pkg.add and 
 Pkg.update install versions of some packages that are incompatible with 
 your version of Julia. I think this is a very unfortunate situation for 
 new people evaluating Julia, and the easiest way to solve this is to 
 compile from source or download a nightly release.
 
 



Re: [julia-users] printing from IJulia notebook

2014-02-02 Thread John Myles White
Is asking them to print PDFs using the notebook export tools too onerous?

 — John

On Feb 2, 2014, at 8:33 AM, j verzani jverz...@gmail.com wrote:

 Is there an easy way to print an IJulia notebook? I'm using julia in a lab 
 setting and am providing notebooks for students to fill out and turn in. I'd 
 prefer they print them. Unfortunately, I don't see a print menu item and the 
 browser's print feature only prints the visible parts of the page. For the 
 tech savvy I've recommended exporting as an ipynb file, uploading to a public 
 site on dropbox, viewing that through nbviewer and then printing that web 
 page. Definitely tedious. Am I missing something obvious?



Re: [julia-users] Julia Parallel Computing Optimization

2014-02-03 Thread John Myles White
One potential performance issue here is that the array indexing steps like 
S[:,i][my] currently produce copies, not references, which would slow things 
down. Someone with more expertise in parallel programming might have better 
suggestions than that.

Have you tried profiling your code? 
http://docs.julialang.org/en/latest/stdlib/profile/

 — John

On Feb 3, 2014, at 6:32 AM, Alex C alex@gmail.com wrote:

 Hi,
 I am trying to port some Matlab code into Julia in order to improve 
 performance. The Julia parallel code currently takes about 2-3x as long as my 
 Matlab implementation. I am at wit's end as to how to improve the 
 performance. Any suggestions?  I tried using pmap but couldn't figure out how 
 to implement it in this case. FYI, I am using Julia on Windows 7 with 
 nprocs() = 5. 
 
 Thanks,
 Alex
 
 
 function 
 expensive_hat(S::Array{Complex{Float64},2},mx::Array{Int64,2},my::Array{Int64,2})
 
 samples = 64
 A = @parallel (+) for i = 1:samples
 abs2(S[:,i][my].*S[:,i][mx]);
 end
 B = @parallel (+) for i = 1:samples
 abs2(sqrt(conj(S[:,i][mx+my]).*S[:,i][mx+my]));
 end
 C = @parallel (+) for i = 1:samples
 conj(S[:,i][mx+my]).*S[:,i][my].*S[:,i][mx];
 end
 return (A.*B./samples./samples, C./samples);
 
 end
 
 data = rand(24000,64);
 limit = 2000;
 
 ix = int64([1:limit/2]);
 iy = ix[1:end/2];
 mg = zeros(Int64,length(iy),length(ix));
 mx = broadcast(+,ix',mg);
 my = broadcast(+,iy,mg);
 S = rfft(data,1)./24000;
 
 @elapsed (AB, C) = expensive_hat(S,mx,my)
 



Re: [julia-users] Julia Parallel Computing Optimization

2014-02-03 Thread John Myles White
Just to be clear: in the future, Julia will not make copies during array 
slicing. But it does now, which can be costly.

 — John
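
The difference between a copy and a reference can be seen directly in a small sketch, written in present-day Julia, where slicing with a colon or range copies by default and a non-copying view is requested explicitly with `view` (or the `@views` macro). The matrix is made up for illustration.

```julia
A = [1 2; 3 4]

col_copy = A[:, 1]        # slicing allocates a new array
col_copy[1] = 100         # so the original matrix is unaffected

col_view = view(A, :, 2)  # a view shares memory with A
col_view[1] = 200         # this writes through to A[1, 2]

println(A)
```

In an expression such as `S[:,i][my]`, each of the two indexing steps produces its own temporary array in this copying model, which is where the cost mentioned above comes from.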

On Feb 3, 2014, at 7:01 AM, David Salamon d...@lithp.org wrote:

 I agree with John about the insane amount of copying going on. However, I 
 added some @times to your code and it looks like most of the time is spent in 
 conj. You probably want to precompute that for both B and C's calculation.
 
 function expensive_hat(S::Array{Complex{Float64},2}, mx::Array{Int64,2}, 
 my::Array{Int64,2})
 samples = 64
 
 @time A = @parallel (+) for i = 1:samples
 abs2(S[:,i][my] .* S[:,i][mx]);
 end
 
 #@time B = @parallel (+) for i = 1:samples
 # abs2( sqrt( conj(S[:,i][mx+my]) .* S[:,i][mx+my] ) )
 @time b0 = conj(S[:,1][mx+my])
 @time b1 = b0 .* S[:,1][mx+my]
 @time b2 = sqrt(b1)
 @time B = abs2(b2)
 #end
 
 @time C = @parallel (+) for i = 1:samples
 conj(S[:,i][mx+my]) .* S[:,i][my].*S[:,i][mx];
 end
 
 @time ans = (A .* B ./ samples ./ samples, C./samples)
 
 return ans
 end
 
 data = rand(24000,64);
 limit = 2000;
 
 ix = int64([1:limit/2]);
 iy = ix[1:end/2];
 mg = zeros(Int64,length(iy),length(ix));
 mx = broadcast(+,ix',mg);
 my = broadcast(+,iy,mg);
 S = rfft(data,1)./24000;
 
 @time (AB, C) = expensive_hat(S,mx,my)
 
 
 On Mon, Feb 3, 2014 at 6:59 AM, John Myles White johnmyleswh...@gmail.com 
 wrote:
 One potential performance issue here is that the array indexing steps like 
 S[:,i][my] currently produce copies, not references, which would slow things 
 down. Someone with more expertise in parallel programming might have better 
 suggestions than that.
 
 Have you tried profiling your code? 
 http://docs.julialang.org/en/latest/stdlib/profile/
 
  — John
 
 On Feb 3, 2014, at 6:32 AM, Alex C alex@gmail.com wrote:
 
 Hi,
 I am trying to port some Matlab code into Julia in order to improve 
 performance. The Julia parallel code currently takes about 2-3x as long as 
 my Matlab implementation. I am at wit's end as to how to improve the 
 performance. Any suggestions?  I tried using pmap but couldn't figure out 
 how to implement it in this case. FYI, I am using Julia on Windows 7 with 
 nprocs() = 5. 
 
 Thanks,
 Alex
 
 
 function 
 expensive_hat(S::Array{Complex{Float64},2},mx::Array{Int64,2},my::Array{Int64,2})
 
 samples = 64
 A = @parallel (+) for i = 1:samples
 abs2(S[:,i][my].*S[:,i][mx]);
 end
 B = @parallel (+) for i = 1:samples
 abs2(sqrt(conj(S[:,i][mx+my]).*S[:,i][mx+my]));
 end
 C = @parallel (+) for i = 1:samples
 conj(S[:,i][mx+my]).*S[:,i][my].*S[:,i][mx];
 end
 return (A.*B./samples./samples, C./samples);
 
 end
 
 data = rand(24000,64);
 limit = 2000;
 
 ix = int64([1:limit/2]);
 iy = ix[1:end/2];
 mg = zeros(Int64,length(iy),length(ix));
 mx = broadcast(+,ix',mg);
 my = broadcast(+,iy,mg);
 S = rfft(data,1)./24000;
 
 @elapsed (AB, C) = expensive_hat(S,mx,my)
 
 
 



Re: [julia-users] How to write a macro that can substitute variable values into an expression

2014-02-03 Thread John Myles White
To make sure everyone’s on the same page, Walking Sparrow’s approach is 
completely standard for R. The way that R treats certain DataFrames as an 
additional scope in which to search for variable bindings is something R users 
have been taught to expect, even though it is an extremely un-Julian way of 
coding.

All that said, in DataFrames, our current solution is to completely avoid this 
kind of scoping until we’re confident that we can make it work efficiently. We 
may come back to it in the future, but there are other priorities to work on 
now.

 — John
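
The explicit-mapping interface suggested in the quoted message below can be sketched without any DataFrames scoping. This is an illustration in present-day Julia under stated assumptions: a `Dict` from column name to vector stands in for the DataFrame, and `eval_at_means` is a hypothetical name for the function called `F` in the quoted text.

```julia
using Statistics

# `table` maps column names to vectors of values. `fields` lists the columns
# in the order in which the user's function expects its arguments.
function eval_at_means(userfn, table, fields)
    colmeans = [mean(table[f]) for f in fields]
    return userfn(colmeans...)
end

# The table has more columns than the function needs, in a different order.
table = Dict(:z => [4.0, 6.0], :x => [1.0, 2.0, 3.0], :unused => [0.0])
f(x, z) = x + 2z

println(eval_at_means(f, table, (:x, :z)))
```

Because the caller names the columns and their order, no macro is needed and the argument names inside `f` never have to match the column names.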

On Feb 2, 2014, at 12:35 PM, Mauro mauro...@runbox.com wrote:

 On Sun, 2014-02-02 at 17:35, hq...@gopivotal.com wrote:
 User inputs a function, and a data.frame which contains all the variables 
 that appear in the function. I will need to substitute the mean values of 
 the variables into the function. (Actually for computing the marginal 
 effects, one also needs to compute the average of the function values 
 evaluated at all rows of the data.frame). 
 
 I think you're making hacking-life more complicated than it already is!
 You'll only need macros if you insist that the naming of the function
 arguments is automatically matched against the column names of the
 dataframe.  But I don't think that that is good idea: names of function
 arguments are here to refer to values inside the function and not
 outside of it.  Nor is it, I think, a particular Julian way of coding.
 
 I suggest instead something like this:
 
 user supplies
 - a function:
f(a,b,c,d) = ...
 - a DataFrame:
df
 - a tuple/list of column names to be used in the order they need to be
  inserted into f; i.e. this is a mapping from column-names to function
  argument position.  E.g.:
(height, :width, 'd', :x)
 
 you provide a function like so:
function F(userfn, datafr, fields)
# take mean of dataframe columns
colmeans = [mean(datafr[fl]) for fl in fields]
# maybe do some more stuff:
 
# call user function
return userfn(colmeans...) # (the three dots are the syntax used here)
end
 
 Then the user can call it like so:
F(f, df, (height, :width, 'd', :x))
 
 
 I reckon you ought to give this kind of user interface a try and see
 whether that works for you.
 
 So in order to use apply, I will need to extract the variable order and 
 names from the user-defined function. This is because the function might be 
 func(x,y,z), but the data.frame has the columns z, x, y, a, b, c, d (it has 
 more columns than what are needed by the function, which is the usual case. 
 And the order is different). 
 
 So John Myles White's opinion is that this is very hard to do in the 
 current Julia (see his post above).
 
 On Sunday, February 2, 2014 9:03:21 AM UTC-8, Johan Sigfrids wrote:
 
 Thinking about this, the whole let or macro might be overkill. If the 
 user provides both the function and the arguments, the user should be able 
 to provide the arguments in the correct form for the function, in which 
 case you need neither let nor macros. You could just call apply directly 
 on those two:
 
 user_function(x,y,z) = x + y + z^2
 user_arguments = (3, 4, 5)
 
 apply(user_function, user_arguments...)
 
 
 On Sunday, February 2, 2014 6:49:52 PM UTC+2, Walking Sparrow wrote:
 
 I guess apply and let can do some work here. But I do not know the 
 variable names and number that the user would use.
 
 So now I need a macro that can construct the let-apply block with the 
 variable number undetermined. The macro should be able to accept any 
 number 
 of variables.
 
 Suppose that the user inputs 
 
 func(x, y) = x+2y and x = 1, y = 2
 
 @my_macro func (x=1, y=2) would be expanded to
 
 let x = 1, y = 2
apply(func,1, 2)
 end
 
 And if the user inputs
 
 func(a, b, c, d) = a + b + c + d, and a = 1, b = 2, c = 3, d=4
 
 @my_macro func (a = 1, b = 2, c = 3, d = 4) would expand to
 
 let a = 1, b = 2, c = 3, d = 4
apply(func, 1,2,3,4)
 end
 
 How to write a macro like this? 
 
 If I knew the function and the variables, of course I could directly call 
 the function or use let-apply, but the problem is that these are the 
 inputs 
 of the user, which I cannot know beforehand.
 
 
 On Sunday, February 2, 2014 8:11:34 AM UTC-8, Keno Fischer wrote:
 
 Or you could just call the function directly:
 
 f = (x,y,z)->x+y+z^2
 let x=3, y=4, z=5
   f(x,y,z)
 end
 
 or 
 
 f((x,y,z)...)
 
 or 
 
 f((1,2,3)...)
 
 
 
 On Sun, Feb 2, 2014 at 10:59 AM, Johan Sigfrids 
 johan.s...@gmail.com wrote:
 
 Can't you just do this with apply? Something like this:
 
 f = (x, y, z) -> x + y + z^2
 let x=3, y=4, z=5
apply(f, x, y, z)
 end
 
 
 On Sunday, February 2, 2014 5:47:36 PM UTC+2, Walking Sparrow wrote:
 
 Let me clarify a little bit. My question is actually the following:
 
 In R, one can do something like
 
 f <- function(x1, x2, x3, x4, x5, x6) { some expressions that you 
 like to use }
 evaluate.at <- list(x1 = 2, x2 = 2.3, x3 = 2, x4 = 1.2, x5 = 3.4, 
 x6 

Re: [julia-users] Sorting Index

2014-02-04 Thread John Myles White
I think you want sortperm.

 — John

On Feb 4, 2014, at 6:24 AM, RecentConvert giz...@gmail.com wrote:

 Is there an easier method to obtain the sorting index given a column of data? 
 In Matlab you can add a second output and it'll give you an index which you 
 can apply to other related arrays.
 
 using Datetime
 using DataFrames
 
 D = # load your data, DataFrame
 time = # Parse time from your loaded data
 
 y = [int64(time) [1:length(time)]]
 I = sortrows(y, by=x->x[1]) # Sort index (by time)
 I = I[1:end,2] # Remove unnecessary time column
 
 time = time[I] # Sort time by time
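
The `sortperm` suggestion at the top of this message can be sketched as follows. The data are made up for illustration (`timestamps` and `readings` are hypothetical names, not from the original post); only Base functions are used.

```julia
# Made-up sample data: timestamps and a related array in matching order.
timestamps = [30, 10, 20]
readings = ["c", "a", "b"]

# sortperm returns the permutation that sorts its argument,
# playing the role of the second output of Matlab's sort.
idx = sortperm(timestamps)

# Apply the same index to every related array.
sorted_timestamps = timestamps[idx]
sorted_readings = readings[idx]
```

This replaces the three-step workaround above (build a two-column matrix, sort its rows, extract the index column) with a single call.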
 



Re: [julia-users] Gadfly installation problem... ERROR: DenseArray not defined

2014-02-04 Thread John Myles White
Yes, assuming we can get the builds working smoothly, it would be really great 
to offer stable and unstable binaries right on the main downloads page.

 — John

On Feb 4, 2014, at 11:52 AM, Eric Davies iam...@gmail.com wrote:

 On Tuesday, 4 February 2014 12:35:19 UTC-6, Sung Soo Kim wrote:
 I think it would be a good idea to provide pre-release version as a binary. A 
 simple overnight automatic build system can be used to upload the most recent 
 pre-release version to the website easily, weekly (or even daily) basis 
 (though must be after success of automatic testing of the core and packages, 
 of course), so that new comers don't have to get into 'compiling' from the 
 source codes. Compiling IS a major barrier.
 
 This exists at http://status.julialang.org/, but it really should be visible 
 from the http://julialang.org/downloads/ page (and perhaps have matching 
 styling). 



Re: [julia-users] Gadfly installation problem... ERROR: DenseArray not defined

2014-02-04 Thread John Myles White
I’m sure you are.  :)

 — John

On Feb 4, 2014, at 6:37 PM, Elliot Saba staticfl...@gmail.com wrote:

 We're working on it, I promise. :)
 
 On Feb 4, 2014 6:01 PM, John Myles White johnmyleswh...@gmail.com wrote:
 Yes, assuming we can get the builds working smoothly, it would be really 
 great to offer stable and unstable binaries right on the main downloads page.
 
  — John
 
 On Feb 4, 2014, at 11:52 AM, Eric Davies iam...@gmail.com wrote:
 
 On Tuesday, 4 February 2014 12:35:19 UTC-6, Sung Soo Kim wrote:
 I think it would be a good idea to provide pre-release version as a binary. 
 A simple overnight automatic build system can be used to upload the most 
 recent pre-release version to the website easily, weekly (or even daily) 
 basis (though must be after success of automatic testing of the core and 
 packages, of course), so that new comers don't have to get into 'compiling' 
 from the source codes. Compiling IS a major barrier.
 
 This exists at http://status.julialang.org/, but it really should be visible 
 from the http://julialang.org/downloads/ page (and perhaps have matching 
 styling). 
 



Re: [julia-users] Re: operators and basic mathematical functions for DataFrames

2014-02-05 Thread John Myles White
This is definitely on purpose.

Quick summary:

* DataMatrix is a mathematical object
* DataFrame is a database

We're going to encourage use of colwise for some of these use cases. But for 
many of them we're going to encourage the use of DataMatrix instead.

 -- John

On Feb 5, 2014, at 5:07 AM, Johan Sigfrids johan.sigfr...@gmail.com wrote:

 Issue #484 seems to indicate it is on purpose. 
 
 On Wednesday, February 5, 2014 3:00:39 PM UTC+2, Christian Groll wrote:
 Since updating DataFrames and DataArrays recently, operators and basic 
 functions are not working on DataFrames anymore. Is this a new design 
 decision, or only temporary due to restructuring the code base?
 
 
 julia> Pkg.status()
  - DataFrames 0.5.1
  - DataArrays 0.1.1
 
 julia> df = DataFrame(rand(4, 2))
 4x2 DataFrame
 |---|--|--|
 | Row # | x1   | x2   |
 | 1 | 0.698851 | 0.353054 |
 | 2 | 0.427287 | 0.76353  |
 | 3 | 0.872991 | 0.182744 |
 | 4 | 0.779048 | 0.554823 |
 
 julia> df + 1
 ERROR: no method +(DataFrame, Int64)
 
 julia> mean(df)
 ERROR: no method +((ASCIIString,DataArray{Float64,1}), 
 (ASCIIString,DataArray{Float64,1}))
  in mean at statistics.jl:11
 
 julia> df + df
 ERROR: no method +(DataFrame, DataFrame)
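
The column-wise versus matrix distinction described at the top of this message might look like the sketch below on a recent DataFrames.jl (1.x is assumed here; the API has changed a good deal since this thread). `mapcols`, `eachcol` and `Matrix` stand in for the `colwise` and `DataMatrix` names used in the discussion, and the sample values are made up.

```julia
using DataFrames
using Statistics

df = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0], x2 = [10.0, 20.0, 30.0, 40.0])

# Column-by-column transformation: add 1 to every column.
shifted = mapcols(col -> col .+ 1, df)

# Column-by-column reduction: one mean per column.
col_means = [mean(col) for col in eachcol(df)]

# For matrix-style arithmetic, convert to a plain Matrix first.
m = Matrix(df)
doubled = m + m
```

The table stays a container of named columns, and arithmetic is applied either per column or after an explicit conversion to a numeric array.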
 



Re: [julia-users] Move Clustering.jl to JuliaStats

2014-02-05 Thread John Myles White
That's true. I find the mechanism a little opaque, so it makes it 
uncomfortable. But hopefully it will all work out.

 -- John

On Feb 5, 2014, at 2:04 AM, Ivar Nesje iva...@gmail.com wrote:

 I think Github will set up redirects if you use the move functionality.
 
 On Wednesday, February 5, 2014 2:54:36 AM UTC+1, John Myles White wrote:
 Hi all, 
 
 Over the coming weekend, I am going to move Clustering.jl to JuliaStats. I 
 hope the move will go smoothly, but am always wary about changing repo URL’s. 
 
  — John 
 



Re: [julia-users] Re: If (in my system) Int is an alias for Int32, then why there is no Float alias for Float32/64?

2014-02-07 Thread John Myles White
FYI, this claim about the safety of symbols is actually not true. You can 
reassign the bindings of sym just as easily as you can reassign the bindings of 
a variable bound to a.

 -- John

On Feb 7, 2014, at 8:00 AM, Felix dotfel...@gmail.com wrote:

 Ismeal VC
 check the julia docs at http://docs.julialang.org/en/latest/
 for the rest keep asking in this group someone will surely
 help you out.
 
 like look at a symbol as a safe string
 when you do something like 
 
 sym = :hello
 you will always know sym is hello
 
 you can also use it as an expression like
 http://docs.julialang.org/en/latest/manual/metaprogramming/#expressions-and-eval
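
The correction at the top of this message, that a name bound to a symbol can be rebound like any other variable, can be checked with a few lines of plain Julia. The variable names `str` and `same` are illustrative only.

```julia
sym = :hello
@assert sym == :hello

# Nothing protects the binding: `sym` can be pointed at another value,
# exactly as a variable holding a string can.
sym = :goodbye
str = "hello"
str = "goodbye"

# What is fixed is the value itself: a Symbol is immutable and interned,
# so two occurrences of :hello refer to the same object.
same = :hello === Symbol("hello")
```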
 
 



Re: [julia-users] Re: Multiple plots in one (pdf) file?

2014-02-07 Thread John Myles White
Isn't the behavior Daniel described how ggplot2 works? Certainly it's how 
ggsave works.

 -- John

On Feb 7, 2014, at 9:41 AM, G. Patrick Mauroy gpmau...@gmail.com wrote:

 Ouch!
 In my opinion, this may be a major stumbling block for Julia adoption.
 I, and I am sure many, find it typical routine to load data, crunch, make a 
 variety of graphical views (sometimes very many), export them to files in an 
 organized way for analysis and sharing a story line.
 With many such plots, one file per plot could become quickly messy, harder to 
 manage.
 
 I suppose then a workaround would be to organize plots in sub-directories, as 
 PNG pictures for ease of scrolling through them.  Perhaps not that bad after 
 all thinking about it.  I suppose I can live with that.
 
 I still believe it would be a good idea if support to have multiple plots in 
 one pdf would be added somehow, very handy!
 
 Thanks for the info, it saves me some search time.
 
 On Friday, February 7, 2014 12:02:28 PM UTC-5, Daniel Jones wrote:
 There's not a way to put them on separate pdf pages, but you can stack them 
 and output them to the same pdf like:
 
 using Gadfly
 x = [1,2,3]
 plot1 = plot(x = x, y = x + 3)
 plot2 = plot(x = x, y = 2 * x + 1)
 draw(PDF("plotJ.pdf", 6inch, 6inch), vstack(plot1, plot2))
 
 
 On Friday, February 7, 2014 8:35:57 AM UTC-8, G. Patrick Mauroy wrote:
 Just starting taking a look at Julia.  I have seen examples on how to send a 
 plot to a file.  But I have not stumbled upon one example as yet to export 
 multiple plots to the same file, say pdf.
 Can someone please point me in the right direction?
 
 # R example of what I would like to do.
 x = 1:3
 pdf(file = "plotR.pdf")
 plot(x = x, y = x + 3)
 plot(x = x, y = 2 * x + 1)
 dev.off()
 
 # My first Julia attempt.
 using Gadfly
 x = [1,2,3]
 plot1 = plot(x = x, y = x + 3)
 plot2 = plot(x = x, y = 2 * x + 1)
 draw(PDF("plotJ.pdf", 6inch, 3inch), plot1)
 draw(PDF("plotJ.pdf", 6inch, 3inch), plot2)
 
 Pb: plot2 overrides plot1, so only plot2 in plotJ.pdf.
 
 To be clear, in this example, I want plot1 and plot2 in two distinct 
 plots/pages -- as opposed to merge both graphs into one plot.
 
 Thanks.



[julia-users] DBI / DBDSQLite

2014-02-09 Thread John Myles White
I’ve just moved DBI.jl to JuliaDB, the organization that I’m hoping will house 
Julia’s emerging database packages.

In the interest of getting some eyes on the DBI library without breaking Jacob 
Quinn’s substantially more stable SQLite.jl package, I’ve created a new 
DBDSQLite.jl package that provides a basic implementation of DBI’s interface.

Right now you need to make a custom binary of SQLite3 to use DBDSQLite.jl. I’m 
hoping to automatically build/provide custom binaries in the future to work 
around this.

Links for the curious:

https://github.com/JuliaDB/DBI.jl
https://github.com/JuliaDB/DBDSQLite.jl

— John


