[julia-users] Re: Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Todd Leo
I tested it again with @time test2(dtm'[:,1], dtm'[:,2]) and it took only 
0.013 seconds. I also checked @time test2(v, w) and it took a similar time. 
I changed nothing; it was odd.

On Monday, November 10, 2014 3:28:10 PM UTC+8, Daniel Høegh wrote:

 I have made a minimum test case:

 a = rand(1, 2)
 function newsum(a)
     for i in 1:100
         sum(a[:, 1]) + sum(a[:, 2])
     end
 end
 function newsum(a1, a2)
     for i in 1:100
         sum(a1) + sum(a2)
     end
 end
 @time newsum(a)
 @time newsum(a[:, 1], a[:, 2])

 elapsed time: 0.073095574 seconds (17709844 bytes allocated, 23.23% gc time)
 elapsed time: 0.006946504 seconds (244796 bytes allocated)

 I suspect that a[:,1] makes a copy of the data in the matrix a. In the 
 first function this copy is made on every iteration of the loop, but in 
 the second function it is made only once, when the function is called as 
 newsum(a[:,1], a[:,2]).
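To make the copy cost concrete, here is a hedged sketch contrasting the two approaches; view(a, :, 1) creates a no-copy view of a column (in the Julia 0.3 era of this thread the equivalent was sub(a, :, 1)):

```julia
# a[:, 1] allocates a fresh vector on every use; a view only references
# the parent array's memory, so the loop allocates nothing per iteration.
function newsum_copy(a)
    s = 0.0
    for i in 1:100
        s += sum(a[:, 1]) + sum(a[:, 2])   # two copies per iteration
    end
    s
end

function newsum_view(a)
    v1 = view(a, :, 1)   # created once, no copy of the data
    v2 = view(a, :, 2)
    s = 0.0
    for i in 1:100
        s += sum(v1) + sum(v2)
    end
    s
end
```

Both return the same result; only the allocation behavior differs.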



[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread David van Leeuwen
Hello, 

I didn't realize NamedArrays was broken on release-0.3, because of my lack 
of Travis skills.  I had a different 0.4 incompatibility: 
(Dict{K,V})(ks::AbstractArray{K},vs::AbstractArray{V}) 
is deprecated, use (Dict{K,V})(zip(ks,vs)) instead.  Foolishly I replaced 
my construct

Dict(keys, values)

by 

@Compat.dict(zip(keys, values))

but that breaks on release-0.3. 

Is there a recommended way to solve this incompatibility?

Cheers, 

---david



On Saturday, October 11, 2014 8:17:38 PM UTC+2, Stefan Karpinski wrote:

 This announcement is primarily for Julia package developers. Since there 
 is already some syntax breakage between Julia v0.3 and v0.4, and there will 
 be more, it's increasingly tricky to make packages work on both 
 versions. The Compat package https://github.com/JuliaLang/Compat.jl was 
 just created to help: it provides compatibility constructs that will work 
 in both versions without warnings.

 For example, in v0.3 you could create a dictionary like this:

 julia> [ :foo => 1, :bar => 2 ]
 Dict{Symbol,Int64} with 2 entries:
   :bar => 2
   :foo => 1


 This still works in v0.4 but it produces a warning. The new syntax is this:

 julia> Dict(:foo => 1, :bar => 2)
 Dict{Symbol,Int64} with 2 entries:
   :bar => 2
   :foo => 1


 However, this newer syntax won't work in v0.3, so you're a bit stuck if 
 you want to write a dictionary literal in a way that will work in both v0.3 
 and v0.4 without producing a warning. Compat to the rescue!:

 julia> using Compat

 julia> @Compat.Dict(:foo => 2, :bar => 2)
 Dict{Symbol,Int64} with 2 entries:
   :bar => 2
   :foo => 2


 This works with no warning on both v0.3 and v0.4. We've intentionally not 
 exported the Dict macro so that the usage needs to be prefixed with 
 Compat., which will make usages of the compatibility workarounds easier 
 to find and remove later when they're no longer necessary.

 Currently, there are only a couple of definitions in the Compat package, but 
 if you have your own hacks that have helped make it easier to write 
 cross-version package code, please contribute them and we can build up a 
 nice little collection.
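For package authors who prefer not to depend on Compat at all, another workaround is a small helper that avoids the version-specific literal syntax entirely. The sketch below uses makedict, a hypothetical name (not Compat API), merely to illustrate the kind of shim the package collects:

```julia
# Build a Dict from parallel arrays of keys and values without using
# either the 0.3 or the 0.4 dictionary literal syntax, so the same
# code parses and runs on both versions (and on modern Julia).
function makedict(ks::AbstractArray, vs::AbstractArray)
    d = Dict{eltype(ks), eltype(vs)}()
    for (k, v) in zip(ks, vs)
        d[k] = v
    end
    d
end
```

For example, makedict([:a, :b], [1, 2]) gives a Dict mapping :a to 1 and :b to 2.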



Re: [julia-users] no zero() for DateTime?

2014-11-10 Thread Ivar Nesje
Basically this is an issue of DataFrames using a function in Base for a 
different purpose than its documented intent. zero() has been documented to 
mean the additive identity 
http://docs.julialang.org/en/latest/stdlib/base/#Base.zero, and Date and 
DateTime don't have an additive identity (apart from the period types, but 
it is unclear which one to return).

Looking at dataframes, I discovered that they already monkey patch 
Base.zeros() to make it work for strings 
https://github.com/JuliaStats/DataFrames.jl/blob/211cd659cb7f9035980697f7effa081e29b9bf3e/src/dataframe/dataframe.jl#L805
.

I think this is a bigger issue, to be discussed in the context of the use 
case in DataFrames. My two obvious suggestions would be to:

   1. Change the documentation for zero() to say that it is the additive 
   identity unless it doesn't make sense, in which case any default value is 
   good.
   2. Create a new function in Base for this specific need of a default 
   value.
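For context on why generic code cares about zero() being the additive identity: reductions typically seed their accumulator with it. A minimal sketch:

```julia
# A generic sum needs the additive identity of the element type to seed
# the accumulator (and to handle empty input). This is exactly what
# fails for Date/DateTime: there is no meaningful zero(Date).
function mysum(xs::AbstractVector{T}) where T
    acc = zero(T)
    for x in xs
        acc += x
    end
    acc
end
```

mysum(Float64[]) returns 0.0 only because zero(Float64) exists; a "default value" function (Ivar's option 2) would not have to satisfy acc + x == x.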

Ivar

kl. 03:53:43 UTC+1 mandag 10. november 2014 skrev Jacob Quinn følgende:

 Hmmm... I guess we could add zero and one definitions if they'd be generally 
 useful (i.e. Date/DateTimes are ordinals with numeric-like properties, so 
 being able to define zero/one would let them work with generic functions).

 It still just seems a little weird because there's not a real solid 
 reasoning/meaning. I think one reason a lot of other languages define a 
 zero(::DateTime) is because values can be truthy or falsey, so you 
 would compare a date with zero(::DateTime) to check for falseness. In 
 Julia, you have to use explicit Booleans, so that's not as important a 
 reason.

 Happy to hear other opinions/use cases from people though.

 -Jacob

 On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert thom@gmail.com wrote:

 To your first question, I'm sure there are good reasons for not having 
 zeros in the Date and Time types, but in other languages (i.e., stata), 
 dates and times are stored as integers or floats with respect to some 
 reference time.  So, I *think* the 0-date in stata refers to January 1, 
 1960.  Obviously this is fairly arbitrary, but there is some precedent for 
 it in other languages.

 On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote:

 What Date would represent zero(::Date)? Or one(::Date), for that matter? 
 Doesn't seem like a particularly useful definition. What's the use case?

 On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert thom@gmail.com 
 wrote:

 I'm using Dates.jl on 0.3 and have discovered that there is no zero 
 defined for the Date or DateTime types.  Is this intentional?






Re: [julia-users] no zero() for DateTime?

2014-11-10 Thread John Myles White
Yes, the use of zero is an anachronism from a design in which zero was used to 
have a default value for arbitrary types.

 -- John

On Nov 10, 2014, at 8:22 AM, Ivar Nesje iva...@gmail.com wrote:

 Basically this is an issue with DataFrames using a function in base for a 
 different purpose than its documented intent. zero() has been documented to 
 mean additive identity, and Date and DateTime don't have an additive 
 identity. (apart from the period types, but it is unclear which one to return)
 
 Looking at dataframes, I discovered that they already monkey patch 
 Base.zeros() to make it work for strings.
 
 I think this is a bigger issue to be discussed in the context of the use case 
 in DataFrames. My two obvious suggestions would be to:
 Change the documentation for zero() to say that it is the additive identity 
 unless it doesn't make sense, in which case any default value is good.
 Create a new function in Base for this specific need of a default value.
 Ivar
 
 kl. 03:53:43 UTC+1 mandag 10. november 2014 skrev Jacob Quinn følgende:
 Hmmm... I guess we could add 0 and 1 definitions if it'll be generally 
 useful (i.e. Date/DateTime s are ordinals with numeric-like properties, so 
 being able to define zero/one and have them work with generic functions).
 
 It still just seems a little weird because there's not a real solid 
 reasoning/meaning. I think one reason a lot of other languages define a 
 zero(::DateTime) is because values can be truthy or falsey, so you would 
 compare a date with zero(::DateTime) to check for falseness. In Julia, you 
 have to use explicit Booleans, so that's not as important a reason.
 
 Happy to hear other opinions/use cases from people though.
 
 -Jacob
 
 On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert thom@gmail.com wrote:
 To your first question, I'm sure there are good reasons for not having zeros 
 in the Date and Time types, but in other languages (i.e., stata), dates and 
 times are stored as integers or floats with respect to some reference time.  
 So, I *think* the 0-date in stata refers to January 1, 1960.  Obviously this 
 is fairly arbitrary, but there is some precedent for it in other languages.
 
 On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote:
 What Date would represent zero(::Date)? Or one(::Date), for that matter? 
 Doesn't seem like a particularly useful definition. What's the use case?
 
 On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert thom@gmail.com wrote:
 I'm using Dates.jl on 0.3 and have discovered that there is no zero defined 
 for the Date or DateTime types.  Is this intentional?
 
 
 
 



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Robert Feldt
Has there been any progress on a (stand-alone) Blosc package for Julia? If 
not, I might have time to contribute, since I need a fast compressor for a 
project. If there is any existing code or starting point, though, I'd 
appreciate a pointer to it.

Cheers,

Robert Feldt

Den tisdagen den 2:e september 2014 kl. 21:47:33 UTC+2 skrev Douglas Bates:

 Would it be reasonable to create a Blosc package or it is best to 
 incorporate it directly into the HDF5 package?  If a separate package is 
 reasonable I could start on it, as I was the one who suggested this in the 
 first place.

 On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote:

 All these testimonials do make it sound promising. Even three-fold 
 compression 
 is a pretty big deal. 

 One disadvantage to compression is that it makes mmap impossible. But, 
 since 
 HDF5 supports hyperslabs, that's not as big a deal as it would have been. 

 --Tim 

 On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote: 
  I've used Blosc in the past with great success.  Oftentimes it is faster 
  than the uncompressed version if IO is the bottleneck.  The compression 
  ratios are not great but that is really not the point. 
  
  On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote: 
   That looks pretty sweet. It seems to avoid a lot of the pitfalls of 
   naively compressing data files while still getting the benefits. It would 
   be great to support that in JLD, maybe even turned on by default. 
   
   
    On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire kevin@gmail.com wrote: 
    Just to hype blosc a little more, see 
    
    http://www.blosc.org/blosc-in-depth.html 
    
    The main feature is that data is chunked so that the compressed chunk 
    size fits into L1 cache, and is then decompressed and used there.  There 
    are a few more buzzwords (multithreading, simd) in the link above.  Worth 
    exploring where this might be useful in Julia. 
    
    Cheers, 
      Kevin 
   
    On Tuesday, September 2, 2014, Tim Holy tim@gmail.com wrote: 
    HDF5/JLD does support compression: 
    
    https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data 
    
    But it's not turned on by default. Matlab uses compression by default, and 
    I've found it's a huge bottleneck in terms of performance 
    (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly). 
    But perhaps there's a good middle ground. It would take someone doing a 
    little experimentation to see what the compromises are. 
    
    --Tim 
   
    On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote: 
     Now that the JLD format can handle DataFrame objects I would like to 
     switch from storing data sets in .RData format to .jld format.  Datasets 
     stored in .RData format are compressed after they are written.  The 
     default compression is gzip.  Bzip2 and xz compression are also 
     available.  The compression can make a substantial difference in the 
     file size because the data values are often highly repetitive. 
     
     JLD is different in scope in that .jld files can be queried using 
     external programs like h5ls and the files can have new data added or 
     existing data edited or removed.  The .RData format is an archival 
     format.  Once the file is written it cannot be modified in place. 
     
     Given these differences I can appreciate that JLD files are not 
     compressed.  Nevertheless I think it would be useful to adopt a 
     convention in the JLD module for accessing data from files with a 
     .jld.xz or .jld.7z extension.  It could be as simple as uncompressing 
     the files in a temporary directory, reading then removing, or it could 
     be more sophisticated.  I notice that my versions of libjulia.so on an 
     Ubuntu 64-bit system are linked against both libz.so and liblzma.so: 
     
     $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so 
         linux-vdso.so.1 =>  (0x7fff5214f000) 
         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f62932ee000) 
         libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f62930d5000) 
         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f6292dce000) 
         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f6292bc6000) 
         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f62929a8000) 
         libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x7f629278c000) 
         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x7f6292488000) 
         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x7f6292272000) 
         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f6291eab000) 
         /lib64/ld-linux-x86-64.so.2 (0x7f62944b3000) 
         liblzma.so.5 

[julia-users] Available packages for compression?

2014-11-10 Thread Robert Feldt
For a project I need fast string compression accessible from Julia. I have 
found:

* Gzip.jl, file-based access to gzip compression
  https://github.com/JuliaLang/GZip.jl

* Zlib.jl, in-memory access to gzip compression
  https://github.com/dcjones/Zlib.jl

* There has been talk about doing a Julia package for Blosc (blosc.org); 
I found this, but I'm not sure it's working:
  https://github.com/jakebolewski/Blosc.jl
  https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k

If anyone knows of more/other compression packages usable from Julia, 
please share in this thread, so people can get a more up-to-date view. 
Compression is a basic building block for a lot of different things, so it 
would be good to have many options in Julia. It would be very nice to have 
access to liblzma, xz, paq etc. in the long term.

If one just needs to estimate the LZ76 complexity there is a pure Julia 
implementation here:
https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
but it has bad performance for long strings compared to Zlib, so it is 
probably not very useful.
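For readers unfamiliar with the measure: LZ76 complexity counts the phrases produced by a Lempel-Ziv (1976) parsing of the sequence. A naive pure-Julia sketch in the Kaspar-Schuster style (not the linked implementation, and O(n^2) in the worst case) might look like:

```julia
# Naive LZ76 phrase counting over the byte representation of a string.
# c is the number of phrases; l marks the start of the current phrase,
# i scans the history, k is the current match length.
function lz76_complexity(s::AbstractString)
    b = codeunits(s)
    n = length(b)
    n == 0 && return 0
    n == 1 && return 1
    c = 1                  # the first symbol always starts a phrase
    l = 1; i = 0; k = 1; kmax = 1
    while true
        if b[i + k] == b[l + k]
            k += 1
            if l + k > n   # phrase extends to the end of the string
                c += 1
                break
            end
        else
            kmax = max(kmax, k)
            i += 1
            if i == l      # no earlier match: close the phrase here
                c += 1
                l += kmax
                l + 1 > n && break
                i = 0; k = 1; kmax = 1
            else
                k = 1
            end
        end
    end
    return c
end
```

For example, "aaaa" parses into the phrases "a" and "aaa" (complexity 2), while "abab" parses into "a", "b", "ab" (complexity 3).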

Thanks,

Robert Feldt


[julia-users] Re: Available packages for compression?

2014-11-10 Thread Robert Feldt
If people want to try Blosc please see this issue for how to build it on 
Julia 0.3.0 (at least on my Mac OS X 10.9):

https://github.com/jakebolewski/Blosc.jl/issues/1

but then one can compare Zlib and Blosc compressors:

using Zlib
zliblength(str) = length(Zlib.compress(str, 9, false, true))
using Blosc
lz4length(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:lz4))
lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:lz4hc))
bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, 
cname=:zlib))

function report(name, func, input)
  tic()
  len = func(input)
  t = toq()
  @printf("%s, time = %.3e seconds, compression ratio = %.3f\n", name, t, 
length(input)/len)
end

for exponent in 1:7
  n = 10^exponent
  input = Uint8[1:n];
  strinput = string(input);
  println("\nInput of length 10^$exponent")
  report("zlib ", (input) -> zliblength(input), input)
  report("zlib in blosc", (input) -> lz4hclength(input), input)
  report("lz4hc", (input) -> bzliblength(input), input)
  report("lz4  ", (input) -> lz4length(input), input)
end

which gives output:

Input of length 10^1
zlib , time = 4.789e-02 seconds, compression ratio = 0.833
zlib in blosc, time = 3.256e-02 seconds, compression ratio = 0.385
lz4hc, time = 3.939e-03 seconds, compression ratio = 0.385
lz4  , time = 3.482e-03 seconds, compression ratio = 0.385

Input of length 10^2
zlib , time = 1.211e-04 seconds, compression ratio = 0.980
zlib in blosc, time = 1.448e-05 seconds, compression ratio = 0.862
lz4hc, time = 3.801e-06 seconds, compression ratio = 0.862
lz4  , time = 3.403e-06 seconds, compression ratio = 0.862

Input of length 10^3
zlib , time = 8.187e-05 seconds, compression ratio = 3.571
zlib in blosc, time = 1.400e-04 seconds, compression ratio = 3.413
lz4hc, time = 5.589e-05 seconds, compression ratio = 3.226
lz4  , time = 1.119e-05 seconds, compression ratio = 3.413

Input of length 10^4
zlib , time = 1.158e-04 seconds, compression ratio = 27.473
zlib in blosc, time = 4.732e-05 seconds, compression ratio = 30.395
lz4hc, time = 1.107e-04 seconds, compression ratio = 25.381
lz4  , time = 6.572e-06 seconds, compression ratio = 30.395

Input of length 10^5
zlib , time = 7.319e-04 seconds, compression ratio = 140.252
zlib in blosc, time = 2.058e-04 seconds, compression ratio = 146.628
lz4hc, time = 6.519e-04 seconds, compression ratio = 134.590
lz4  , time = 2.368e-05 seconds, compression ratio = 146.628

Input of length 10^6
zlib , time = 4.517e-03 seconds, compression ratio = 238.095
zlib in blosc, time = 2.291e-04 seconds, compression ratio = 237.473
lz4hc, time = 4.493e-03 seconds, compression ratio = 236.407
lz4  , time = 6.989e-04 seconds, compression ratio = 198.807

Input of length 10^7
zlib , time = 4.499e-02 seconds, compression ratio = 255.669
zlib in blosc, time = 3.146e-02 seconds, compression ratio = 246.299
lz4hc, time = 1.749e-02 seconds, compression ratio = 247.078
lz4  , time = 5.670e-03 seconds, compression ratio = 200.489

It seems that lz4hc compression in Blosc is sometimes quite a bit faster, 
but not always, and its compression ratio is good. 
lz4 is always faster than the others but sometimes compresses a bit less.
For strings shorter than ~350 characters there is not always any 
compression of the input.
Note that the input being compressed here is very regular, though, so this 
evaluation is rough and might be misleading about the compression ratios to 
expect in practice. This is just a very rough indication.

Cheers,

Robert



Den måndagen den 10:e november 2014 kl. 09:49:54 UTC+1 skrev Robert Feldt:

 For a project I need fast string compression accessible from Julia. I have 
 found:

 * Gzip.jl, file-based access to gzip compression
   https://github.com/JuliaLang/GZip.jl

 * Zlib.jl, in-memory access to gzip compression
   https://github.com/dcjones/Zlib.jl

 * There has been talk about doing a Julia package for Blosc (blosc.org) 
 and I found this, but I'm not sure it's working:
   https://github.com/jakebolewski/Blosc.jl
   https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k

 If anyone knows of more/other compression packages usable from Julia, 
 please share in this thread. This way people can get a more up-to-date 
 view. 
 Compression is a basic building block for a lot of different things so 
 good if we have many options in Julia. Would be very nice to have access to 
 liblzma, xz, paq etc, long-term.

 If one just needs to estimate the LZ76 complexity there is a pure Julia 
 implementation here:

 https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
 but it has bad performance for long strings compared to Zlib, so it is 
 probably not very useful.

 Thanks,

 Robert Feldt



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
Le dimanche 09 novembre 2014 à 23:50 +, John Myles White a écrit :
 FWIW, I think the best way to move forward with NamedArrays is to
 replace NamedArrays with a parametric type Named{T} that wraps around
 other AbstractArray types. That gives you both named Array and named
 DataArray objects for the same cost.
Yeah, looks like a good idea. Duplicating the code for each array type
would be a waste.
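A rough sketch of what such a Named{A} wrapper could look like (field and type names here are illustrative, not NamedArrays' actual API):

```julia
# Wrap any AbstractArray and carry one vector of names per dimension,
# so the same wrapper works for Array, DataArray, SparseMatrixCSC, etc.
struct Named{A<:AbstractArray}
    data::A
    names::Vector{Vector{String}}   # names[d] labels dimension d
end

# Look up string keys dimension by dimension, then index the wrapped
# array with the resulting integer positions.
function Base.getindex(nm::Named, keys::AbstractString...)
    idx = ntuple(d -> findfirst(==(keys[d]), nm.names[d]), length(keys))
    nm.data[idx...]
end
```

A fuller design would also forward size, setindex!, and integer indexing to the wrapped array; this only shows the wrapping idea.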


Regards


 On Nov 9, 2014, at 5:49 PM, Tim Holy tim.h...@gmail.com wrote:
 
  Indeed, better to use a Dict if you're naming each row/column. I'd 
  forgotten 
  that was part of NamedArrays.
  
  --Tim
  
  On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
  Le dimanche 09 novembre 2014 à 10:54 -0600, Tim Holy a écrit :
  With regards to arrays with named dimensions, I suspect that with the
  arrival of stagedfunctions, something like NamedAxesArrays
  (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
  stagedfunctions still have some show-stopper bugs, and we need to fix
  those
  first.
  
  Interesting package!
  
  But when I said named dimensions, I actually meant that dimensions had
  names, but that elements on each dimension (rows, columns...) had names
  too. I'm not sure it also makes sense to use staged functions to
  specialize code on element names, since they can vary much more than
  dimension names. This could generate quite a lot of methods which would
  use memory even if only used once.
  
  
  Regards
  
  On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
  Le dimanche 09 novembre 2014 à 07:52 -0800, David van Leeuwen a écrit :
  I would vote for calling such a function `table()`, to get even closer
  to R's table().
  
  Well, that's the debate at
  https://github.com/JuliaStats/StatsBase.jl/issues/32
  
  At first I was in favor of table() too, but now I prefer freqtable(),
  because table could mean any kind of cross-tabulation. I think
  NamedArray could even be called Table.
  
  And I can't wait for such functionality to be included in METADATA...
  
  Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
  when I first worked on the package. Now I see the tests are still
  failing. Do you know what is needed to make them work?
  
  Another point is that I think this deserves going into StatsBase, but
  before that we need everybody to agree on a design for NamedArrays.
  
  Regards
  
  On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat
  
  wrote:
 Le jeudi 06 novembre 2014 à 11:17 -0800, Conrad Stack a
  
 écrit :
  I was also looking for a function like this, but could not
  find one in docs.julialang.org.  I was doing this
  (v0.4.0-dev), for anyone who is interested:
  
  
  example = rand(1:10,100)
  uexample = sort(unique(example))
   counts = map(x->count(y->x==y,example),uexample)
  
  
  It's pretty ugly, so thanks, Johan, for pointing out the
  StatsBase-countmap
  
 I've also put together a small package precisely aimed at
 offering an equivalent of R's table():
 https://github.com/nalimilan/Tables.jl
  
 But there's a more general issue about how to handle arrays
 with dimension names in Julia. NamedArrays.jl (which is used
 in my package) attempts to tackle this issue, but I don't
 think we've reached a consensus yet about the best solution.
  
  
 Regards
  
  On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids
  
  wrote:
 I think countmap comes closest to giving you what
 you want:
  
 using StatsBase
  data = sample(["a", "b", "c"], 20)
  countmap(data)
   
  Dict{ASCIIString,Int64} with 3 entries:
    "c" => 3
    "b" => 10
    "a" => 7
  
 On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian
  
 Oswald wrote:
 Hi
  
  
 I'm looking for the best way to count how
 many times a certain value x_i appears in
 vector x, where x could be integers, floats,
 strings. In R I would do table(x). I found
 StatsBase.counts(x,k) but I'm a bit confused
 by k (where k goes into 1:k, i.e. the vector
 is scanned to find how many elements locate
 at each point of 1:k). most of the times I
 don't know k, and in fact I would do
 table(x) just to find out what k was. Apart
 from that, I don't think I could use this
 with strings, as I can't construct a range
 object from strings.
  
  
 I'm wondering whether a method
 StatsBase.counts(x::Vector) just returning
 the frequency of each element appearing
 would be useful.
  
  
 The same applies to Base.hist if I
 understand correctly. I just don't want to
 have to 

Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
Le dimanche 09 novembre 2014 à 23:48 -0800, David van Leeuwen a écrit :
 Hello, 
 
 On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote:
 NamedArrays.jl generally goes along this way. However, it
 remains limited in two aspects:
 
 
 1. Some fields in NamedArrays are not declared of specific
 types. In particular, the field `dicts` is of the type
 `Vector{Dict}`, and the use of this field is on the critical
 path when looping over the table, e.g. when counting. This
 would potentially lead to substantial impact on performance.
 
 
 In the beginning I have been experimenting with indexing speed, mainly
 to sort out the various forms of getindex(), and although I don't
 remember the exact result, I do remember that I found the drop in
 performance w.r.t. integer indexing surprisingly small. 
 
 
 I suppose the problem you indicate can be alleviated by making
 NamedArray parameterized by the type of the key in the dict as well.  
Right. Sounds reasonable.

 2. Currently, it only accepts a limited set of types for
 indices, e.g. Real and String. But in some cases, people may
 go beyond this. I don't think we have to impose this limit. 
 
 
 Ah---I now see what you mean.  I thought I had built in support for
 all types as index, but there obviously is no catch-all rule in
 getindex.  I suppose NamedArray needs an update there. 
I think the last time I looked into this, it was a problem even for
efficiently indexing AbstractArrays:
https://github.com/JuliaLang/julia/pull/4892#issuecomment-31087910

Slow catch-all methods are good, but if we want specialized versions it
will probably need more work. If you want to accept combinations of
Int/String/Complement{T}/anything, the number of specialized methods to
generate explodes. I think the conclusion was that we needed to wait for
staged functions. Since they are implemented now, it may be a good time
to look into this issue for both AbstractArrays and NamedArrays.


Regards

 On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin
 wrote:
 I have been observing some interesting differences
 between people coming from stats and machine learning.
 
 
 Stats people tend to favor the approach that allows
 one to directly use the category names to index the
  table, e.g. A["apple"]. This tendency is clearly
 reflected in the design of R, where one can attach a
 name to everything.
 
 
 While in machine learning practice, it is a common
 convention to just encode categories into integers,
 and simply use an ordinary array to represent a
 counting table. Whereas it makes it a little bit
 inconvenient in an interactive environment, this way
 is generally more efficient when you have to deal with
 these categories over a large number of samples.
 
 
 These differences aside, I believe, however, that
  there exists a very generic approach to this problem --
 a multi-dimensional associative map, which allows one
 to write A[i1, i2, ...] where the indices can be
  arbitrary hashable and equality-comparable instances,
 including integers, strings, symbols, among many other
 things.
 
 
 A multi-dimensional associative map can be considered
 as a multi-dimensional generalization of dictionaries,
 which can be easily implemented via an
 multidimensional array and several dictionaries, each
 for one dimension, to map user-side indexes to integer
 indexes. 
 
 
 - Dahua
 
 
 
 
 
 
 
 On Monday, November 10, 2014 8:12:54 AM UTC+8, David
 van Leeuwen wrote:
 Hi, 
 
 On Sunday, November 9, 2014 5:10:19 PM UTC+1,
 Milan Bouchet-Valat wrot
 Actually I didn't do it because
 NamedArrays.jl didn't work well on 0.3
 when I first worked on the package.
 Now I see the tests are still failing.
 Do you know what is needed to make
 them work?
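Dahua's multi-dimensional associative map can be sketched for the two-dimensional case like this (type and field names are illustrative only):

```julia
# One Dict per dimension maps user-facing keys to integer indices into
# an ordinary dense array, giving A[k1, k2]-style access with arbitrary
# hashable keys while keeping the counts in a plain Matrix.
struct AssocMap2{K1,K2,V}
    d1::Dict{K1,Int}
    d2::Dict{K2,Int}
    data::Matrix{V}
end

AssocMap2(ks1::Vector{K1}, ks2::Vector{K2}, ::Type{V}) where {K1,K2,V} =
    AssocMap2(Dict(k => i for (i, k) in enumerate(ks1)),
              Dict(k => i for (i, k) in enumerate(ks2)),
              zeros(V, length(ks1), length(ks2)))

Base.getindex(m::AssocMap2, k1, k2) = m.data[m.d1[k1], m.d2[k2]]
Base.setindex!(m::AssocMap2, v, k1, k2) = (m.data[m.d1[k1], m.d2[k2]] = v)
```

For example, a string-by-symbol counting table: t = AssocMap2(["apple", "pear"], [:red, :green], Int), then t["apple", :red] += 1.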
 
 

Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Tim Holy
All good plans. (I'm not sure about using 65536 bins for 16-bit images, 
though, because that would be more bins than there are pixels in some images. 
Still, it's not all that much memory, really, so maybe that would be OK.)

It would be great to add native support. Presumably you've found the docs on 
adding support for new formats.

For formats that encode large datasets in a single block (like NRRD), you can 
work with GB-sized datasets on a laptop because you can use mmap (I do it 
routinely). But the love of TIFF does demand an alternative solution. 
Presumably we should add a lower-level routine that returns a structure that 
facilitates later access, e.g.,
imds = imdataset(my_image_file)
img = imds[z, 14, t, 7]
or somesuch.

--Tim

On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote:
 Tim,
 i would like the imhist to be idiot proof. (i've been teaching matlab and
 nothing puts new people off more than things not being idiot proof):
 things like using 256 bins by default and returning a plot if no outputs
 are specified (basically make it like matlab's imhist())
 
 Btw, on matlab using bioformats is actually the slowest part of my
 algorithm, so unless it can be faster in julia native support might be
 nicer. Bioformats also fails in that it reads the whole sequence at once...
 so running things on laptops with even GB-level datasets is impossible. I
 wrote my own version of bfopen to only open the required XYZCT for
 specified series, but that only solves the memory usage.
 
 the source format for my image was .mvd2 (perkin elmer spinning disk).
 
 i know about JavaCall.jl, just haven't had the time to play with it...
 
 i was thinking it might be fun to attempt native support for a few formats.
 I can also generate test data in a few vendor formats for a few
 microscopes.
 perhaps even make it a julia-box based project. ;)
 
 On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote:
  On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote:
   Yes, Images does read it okay but only if i cut out the substack. If i
   don't, then it interprets the three channels as a time dimension, which
   isn't a pain at the moment but will be if i start using it for work.
  
  Hmm, that sounds like an annotation problem.
  
   I realized that both the convert and the g[:] would slow me down but the
   hist function just wouldn't work without that kind of dance. Also,
   graythresh (http://www.mathworks.com/help/images/ref/graythresh.html)
   uses reshape to make it all one image which might also add to speed.
   
   The pull request is well and good but personally i would rather have a
   dedicated image histogram function like
   imhist: http://www.mathworks.com/help/images/ref/imhist.html
   which would give histograms based on input images. To me that's the only
   way to make life easier. maybe i'll write one :)
  
  imhist is necessary in matlab largely because hist works columnwise; in a
  sense, Julia's `hist` is like imhist. Is there some specific functionality
  you're interested in? There's no reason Images can't provide a custom
  version
  of `hist`.
  
   Something about Images: do you think it possible to use the bio formats'
   .jar file to import images from a microscope format to Images?
   Opening a microscope format image file in the relevant software and then
   exporting it as tiff takes too long and i'd rather be able to access the
   images directly.
  
  Yes, expansion of Images' I/O capabilities would be great. I've wondered
  about
  Bio-Formats myself, but not had a direct need, nor do I know Java (but see
  JavaCall.jl, if you haven't already).
  
  The other way to go, of course, is Julia native support. Our support for
  NRRD
  is a reasonable model of this approach. However, the reason we use
  ImageMagick
  is because the reality is that there are a lot of formats out there; Bio-
  Formats would fill a similar need for vendor-specific file formats. Out of
  curiosity, what's the original format you're using?
  
  --Tim



[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread Nils Gudat
Hi David,

shouldn't it be @Compat Dict(zip(keys, values)) instead of 
@Compat.dict(zip(keys, values)), i.e. a space between Compat and Dict 
rather than a dot method call?

Best,
Nils


Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Milan Bouchet-Valat
On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote:
 Hi fellows, 
 
 
 
 I'm currently working on sparse matrices and cosine similarity
 computation, but my routines are running very slowly, at least not
 meeting my expectations. So I wrote some test functions to dig out the
 reason for the ineffectiveness. To my surprise, the execution times of
 passing two vectors to the test function and of passing the whole sparse
 matrix differ greatly; the latter is 80x faster. I am wondering why
 extracting two vectors from the matrix in each loop is dramatically
 faster, and how to avoid the multi-GB memory allocation.
 Thanks guys.
 
 
 --
 BEST REGARDS,
 Todd Leo
 
 
 # The sparse matrix
 mat # 2000x15037 SparseMatrixCSC{Float64, Int64}
 
 
 # The two vectors, prepared in advance
 v = mat'[:,1]
 w = mat'[:,2]
 
 
 # Cosine similarity function
 function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
                            j::SparseMatrixCSC{Float64, Int64})
     return sum(i .* j) / sqrt(sum(i.*i) * sum(j.*j))
 end
I think you'll experience a dramatic speed gain if you write the sums in
explicit loops, accessing elements one by one, taking their product and
adding it immediately to a counter. In your current version, the
element-wise products allocate new vectors before computing the sums,
which is very costly.

This will also get rid of the difference you report between passing
arrays and vectors.
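For illustration, the loop-based version might look like this (a sketch of
mine, not from the thread, shown for plain dense vectors; for the sparse
case you would iterate only the stored entries via rowvals/nonzeros):

```julia
# Accumulate the three sums in one pass, with no temporary arrays.
function cosine_loop(v::AbstractVector{Float64}, w::AbstractVector{Float64})
    svw = svv = sww = 0.0
    @inbounds for k in 1:length(v)
        svw += v[k] * w[k]
        svv += v[k] * v[k]
        sww += w[k] * w[k]
    end
    return svw / sqrt(svv * sww)
end
```

Since nothing is allocated per call, calling this in a tight loop no longer
triggers the GC pauses the vectorized version shows.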


Regards

 function test1(d)
 res = 0.
 for i in 1:1
 res = cosine_vectorized(d[:,1], d[:,2])
 end
 end
 
 
 function test2(_v,_w)
 res = 0.
 for i in 1:1
 res = cosine_vectorized(_v, _w)
 end
 end
 
 
 test1(dtm)
 test2(v,w)
 gc()
 @time test1(dtm)
 gc()
 @time test2(v,w)
 
 
 #elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07%
 gc time)
 
 #elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51%
 gc time)
 



[julia-users] Silhouette width

2014-11-10 Thread Francesco Brundu
Hi all,
I am new to Julia. I searched a bit but I did not find anything related to 
the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) ..
Do you know if there is something about it?

Thanks,
Francesco


[julia-users] Input arguments to gemm!

2014-11-10 Thread Kapil Agarwal
Hi

I am unable to figure out what I should pass as input parameters to the 
gemm! function. The function declaration asks for BlasChar, 
StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array?

--
Kapil


Re: [julia-users] Reinterpreting parts of a byte array

2014-11-10 Thread Sebastian Good
Thanks for the responses. As usual, I discover myself making assumptions
that may not have been stated well.

1. I'll be reading small bits (32 bit ints, mostly) at fairly random
addresses and was worried about the overhead of creating array views for
such small objects. Perhaps they are optimized away. I should check :-)
2. I've been taught by other languages that touching raw pointers is
dangerous without also holding some promise that they won't be relocated,
e.g. by a copying collector, etc. I suppose if it's a memory mapped array,
I can roughly cheat and know that the OS won't move it, so Julia can't
either. But it worried me.
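For what it's worth, here is the kind of thing I mean (my own sketch; Uint8
was the 0.3-era spelling of UInt8, and little-endian byte order is assumed):

```julia
# Reinterpret a raw byte buffer as 32-bit integers without copying.
bytes = UInt8[0x2a, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00]
ints  = reinterpret(Int32, bytes)   # shares memory with `bytes`
x = ints[1]                         # 42 on a little-endian machine
```

A SubArray/view over `ints` would then give windowed access at arbitrary
word offsets, still without copying.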

*Sebastian Good*


On Sun, Nov 9, 2014 at 11:36 PM, Jameson Nash vtjn...@gmail.com wrote:

 It rather depends upon what you know about the data. If you want a
 file-like abstraction, it may be possible to wrap it in an IOBuffer type
 (if not, it should be parameterized to allow it). If you want an array-like
 abstraction, then I think reinterpreting to different array types may be
 the most direct approach. If the array is coming from C, then you can use
 unsafe_load/unsafe_store directly. As Ivar points out, this is not more nor
 less dangerous than the same operation in C. Although, if you wrap the data
 buffer in a Julia object (or got it from a Julia call), you can gain some
 element of protection against memory corruption bugs by minimizing the
 amount of julia code that is directly interfacing with the raw memory
 pointer.


 On Sun Nov 09 2014 at 5:42:42 PM Ivar Nesje iva...@gmail.com wrote:

 Is there any problem with reinterpreting the array and then use a
 SubArray or ArrayView to do the index transformation?

 Pointer arithmetic is not more or less dangerous in Julia, than what it
 is in C. The only thing you need to ensure is that the object you have a
 pointer to is referenced by something the GC traverses, and that it isn't
 moved in memory (Eg. vector resize).




Re: [julia-users] Silhouette width

2014-11-10 Thread Jacob Quinn
Check out the Clustering.jl package which has an interface for silhouette.
Specifically, see this file:
https://github.com/JuliaStats/Clustering.jl/blob/master/src/silhouette.jl

-Jacob

On Mon, Nov 10, 2014 at 5:53 AM, Francesco Brundu 
francesco.bru...@gmail.com wrote:

 Hi all,
 I am new to Julia. I searched a bit but I did not find anything related to
 the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) ..
 Do you know if there is something about it?

 Thanks,
 Francesco



[julia-users] Re: Input arguments to gemm!

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote:

 I am unable to figure out what should I pass as input parameters to the 
 gemm! function. The function declaration asks for function BlasChar, 
 StridedVecOrMat. StridedMatrix. Are they same as a normal Char and Array?


Yes.  (Or rather, the StridedFoo types are a superset, including various 
1d/2d array types.)
 


Re: [julia-users] Re: Input arguments to gemm!

2014-11-10 Thread Andreas Noack
E.g.

julia> A = randn(3,4); B = randn(4,3); C = Array(Float64,3,3);


julia> BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)

3x3 Array{Float64,2}:

 -1.39617  4.02968   -1.2171

 -2.35074  2.609030.216789

  1.63807  0.102948  -0.41358



2014-11-10 9:09 GMT-05:00 Steven G. Johnson stevenj@gmail.com:



 On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote:

 I am unable to figure out what should I pass as input parameters to the
 gemm! function. The function declaration asks for function BlasChar,
 StridedVecOrMat. StridedMatrix. Are they same as a normal Char and Array?


 Yes.  (Or rather, the StridedFoo types are a superset, including various
 1d/2d array types.)




[julia-users] Re: Great new expository article about Julia by the core developers

2014-11-10 Thread David Higgins
So how does one go about getting an invitation to JuliaBox? It's referenced 
in the article, but you need an invitation to log in.

Dave.

On Saturday, 8 November 2014 22:58:31 UTC, Peter Simon wrote:

 Just found this great new highly accessible exposition about the Julia 
 language: http://arxiv.org/pdf/1411.1607v1.pdf, by Jeff et al.  It's the 
  perfect intro to share with many of my not-yet-Julian colleagues.

 --Peter



Re: [julia-users] Re: strange speed reduction when using external function in inner loop

2014-11-10 Thread Rob J Goedman
David,

Not sure this is correct or helps, but on my Yosemite 10.10.1 MacBook Pro I 
get below results.

Regards,
Rob

*julia **@time prof(true)*

 Count FileFunction Line

47 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 15

   165 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   502 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

98 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

64 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 29

 5 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

20 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

45 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   883 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   45

   884 REPL.jl eval_user_input54

   502 array.jl+ 719

   165 random.jl   rand! 130

   884 task.jl anonymous  96

elapsed time: 1.51332406 seconds (488212276 bytes allocated, 53.00% gc time)


*julia **@time prof(true)*

 Count FileFunction Line

   156 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   577 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 21

   116 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 26

53 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

10 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

43 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 3 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   910 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   910 REPL.jl eval_user_input54

   577 array.jl+ 719

   156 random.jl   rand! 130

   910 task.jl anonymous  96

elapsed time: 1.488157718 seconds (488208960 bytes allocated, 50.96% gc 
time)


*julia **@time prof(true)*

 Count FileFunction Line

   174 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   545 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

   115 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

 2 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 26

46 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 1 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 29

 8 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

18 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

28 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

 3 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   9

   894 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   894 REPL.jl eval_user_input54

   545 array.jl+ 719

   174 random.jl   rand! 130

   894 task.jl anonymous  96

elapsed time: 1.448621207 seconds (488206436 bytes allocated, 49.75% gc 
time)


*julia **@time prof(true)*

 Count FileFunction Line

   165 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 19

   584 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 20

   117 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 23

51 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 27

 5 /Users/rob/Projects/Julia/Rob/innnercall.jl f! 31

16 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   6

34 /Users/rob/Projects/Julia/Rob/innnercall.jl mydot   7

   922 /Users/rob/Projects/Julia/Rob/innnercall.jl prof   14

   922 REPL.jl eval_user_input54

   584 array.jl+ 719

   165 random.jl   rand! 130

   922 

[julia-users] travis for os x packages

2014-11-10 Thread Simon Byrne
I would like to set up travis for an OS X-only package: does anyone have 
suggestions for how I should set up travis (or has anyone already done 
this)?

simon


Re: [julia-users] Translating Class-Based OO Apps to Julia

2014-11-10 Thread Greg Trzeciak


On Thursday, January 17, 2013 2:56:52 AM UTC+1, Stefan Karpinski wrote:

 ... This definitely should go in an object-oriented programming in Julia 
 document.


Does a document like this exist? It would definitely be useful. 


[julia-users] parallel for loop in Julia

2014-11-10 Thread DrKey
I'm a beginner at using Julia and I have written a simple molecular dynamics 
simulation, which works quite well and fast.

Now I'm trying to parallelize my core loop which calculates the forces 
between each pair of particles.

My loop is:

for partA = 1:nParts-1
    for partB = (partA+1):nParts

        # Calculate particle-particle distance
        dr = coords[:,partA] - coords[:,partB];

        dr2 = dot(dr,dr)

        invDr2 = 1.0/dr2;
        invDr6 = invDr2^3;
        tforce = invDr2^4 * (invDr6 - 0.5);

        forces[:,partA] = forces[:,partA] + dr*tforce;
        forces[:,partB] = forces[:,partB] - dr*tforce;
    end
end

coords is an array holding the 3-dimensional coordinates of each particle,
nParts is the number of particles, and forces has the same size as coords 
and holds the forces on each particle.

I tried @parallel for with different reduction operators (I found + and 
vcat, of course after changing my loop a little bit), but they are not 
documented very well. At least I only found examples for (+) in the help.
What is the best way to parallelize this? 
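One pattern worth trying (a sketch of mine, not a tested answer from the
thread) is to let each outer iteration build its own local force matrix and
reduce with (+), so no two tasks ever write to the same shared array. Shown
in current syntax, where @parallel has since been renamed @distributed:

```julia
using Distributed, LinearAlgebra

nParts = 4
coords = Float64[0 1 0 0;
                 0 0 1 0;
                 0 0 0 1]   # one particle per column (illustrative data)

# Each iteration returns a full local force matrix; the (+) reducer
# merges the partial results from all workers.
forces = @distributed (+) for partA = 1:nParts-1
    f = zeros(3, nParts)
    for partB = (partA+1):nParts
        dr = coords[:, partA] - coords[:, partB]
        dr2 = dot(dr, dr)
        invDr2 = 1.0 / dr2
        invDr6 = invDr2^3
        tforce = invDr2^4 * (invDr6 - 0.5)
        f[:, partA] += dr * tforce
        f[:, partB] -= dr * tforce
    end
    f   # value of the loop body, reduced with (+)
end
```

By Newton's third law the force columns sum to the zero vector, which makes
a convenient correctness check after a change like this.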



[julia-users] Error in PyPlot; cm_get_cmap not defined

2014-11-10 Thread Nils Gudat
I'm using PyPlot to make 3D plots, which I color by getting color maps 
through ColorMap(::String). After running a Pkg.update() today, I am now 
getting an error message when trying to construct a 3D plot, saying 
cm_get_cmap not defined (...) at Plots.jl:141. 
Indeed, when checking colormaps.jl 
https://github.com/stevengj/PyPlot.jl/blob/master/src/colormaps.jl, I 
find that ColorMaps should lead to a call to get_cmap, not cm_get_cmap. Why 
is my PyPlot trying to get the color maps through a different function?


[julia-users] Absolute value of big(-0.0)

2014-11-10 Thread Samuel S. Watson
I'm getting (notice the negative sign):

abs(big(-0.0)) = -0e+00 with 256 bits of precision

I think it would be better to have abs(big(-0.0)) return 0e+00 (for 
example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an 
abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is 
ifelse(x < 0, -x, x), and -0 is not less than 0. 
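A minimal sketch of such a method (mine, not from any issue) branches on the
sign bit instead of `x < 0`, so negative zero is negated too; it is written
as a standalone helper here rather than overwriting Base.abs:

```julia
# signbit(big(-0.0)) is true, so -0.0 is mapped to +0.0.
bigabs(x::BigFloat) = signbit(x) ? -x : x

bigabs(big(-0.0))   # positive zero
bigabs(big(-2.5))   # 2.5
```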


Re: [julia-users] Absolute value of big(-0.0)

2014-11-10 Thread Samuel S. Watson
Done: https://github.com/JuliaLang/julia/issues/8968

On Monday, November 10, 2014 12:06:31 PM UTC-5, Stefan Karpinski wrote:

 This is indeed a bug – could you open an issue? 
 https://github.com/JuliaLang/julia/issues

 On Mon, Nov 10, 2014 at 5:55 PM, Samuel S. Watson samuel@gmail.com wrote:

 I'm getting (notice the negative sign):

 abs(big(-0.0)) = -0e+00 with 256 bits of precision

 I think it would be better to have abs(big(-0.0)) return 0e+00 (for 
 example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an 
 abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is 
  ifelse(x < 0, -x, x), and -0 is not less than 0. 




[julia-users] Re: Error in PyPlot; cm_get_cmap not defined

2014-11-10 Thread Steven G. Johnson
Should be fixed now, sorry.


[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread John Drummond
Thank you, that's helpful. 
I reentered it all in a fresh session and found it working as well - I'll 
try and find the difference which caused it not to work and come back.
Kind Regards, John.

On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:

 This code works everywhere I'm able to try it. 

 On Sunday, November 9, 2014 at 03:18:13 UTC+1, John Drummond wrote:

 I was originally julia 0.3.1 on windows 7
 this is on Macosx 10 julia 0.3.2
 I loaded the file LogParse.jl below and then in the repl ran

 reload(LogParse.jl)

 methods(isless)


 ary1 = LogParse.DayPriceText[]
 push!(ary1,LogParse.DayPriceText(4,a1,1))
 push!(ary1,LogParse.DayPriceText(2,a1,1))
 push!(ary1,LogParse.DayPriceText(6,a1,1))


 sort(ary1)

 sort(ary1,lt=LogParse.isless)
 I get the same messages - methods(isless) shows that it's loaded
 but the sort can't find it, even when I try to specify the function


 #in file LogParse.jl ###
 module LogParse
 export DayPriceText
 import Base.isless

 type DayPriceText
   a1::Uint32
   b1::ASCIIString
   a2::Uint32
 end

 function isless(a::DayPriceText, b::DayPriceText)
  if (a.a1 < b.a1)
 return true
   else
 return false
   end
 end


 end
 ##

 Many thanks.
 Kind regards, John


 On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:

 In this case it would be really great if you had a minimal reproducible 
 example. It looks to me as you are doing everything right, so I would start 
 looking for typos and scoping issues. It's hard to find them without 
 looking at the code.

 Ideally the example should be small and possible to paste into a REPL 
 session, but if you can publish your code and don't want to extract only 
 the relevant part, that might be fine too.

 Julia version and operating system is also nice to include, so that we 
 have it available in case we have problems reproducing your results.

 Regards Ivar

 On Friday, November 7, 2014 at 20:14:48 UTC+1, John Drummond wrote:

 Hi,
 I suspect I'm doing something stupid but no idea what I'm missing.

 I create a module .
 I create a type in it, DayPriceText
 I import Base.isless
 I define isless for the type

 now in the repl I get

 methods(isless)
 =
 # 25 methods for generic function isless:
 ..
 isless(x::DayPriceText,y::DayPriceText) at 
 c:\works\juliaplay\LogParse.jl:16

 but

 julia> typeof(a1p)
 Array{DayPriceText,1}

 julia> sort(a1p, lt=CILogParse.isless)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 julia> sort(a1p)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 I'm sure there's some obvious answer, but I've not idea what.
 Thanks for any help
 kind regards, John.



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread David van Leeuwen
Hello, 

On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote:

 On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: 
  Hello, 
  
  On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: 
  NamedArrays.jl generally goes along this way. However, it 
  remains limited in two aspects: 
  
  
  1. Some fields in NamedArrays are not declared of specific 
  types. In particular, the field `dicts` is of the type 
  `Vector{Dict}`, and the use of this field is on the critical 
  path when looping over the table, e.g. when counting. This 
  would potentially lead to substantial impact on performance.  
  
  I suppose the problem you indicate can be alleviated by making 
  NamedArray parameterized by the type of the key in the dict as well.   
 Right. Sounds reasonable. 


I've been pondering how this could be done. NamedArray has a type 
parameter N, and it would then further need N type parameters indicating 
the dictionary type along each of the N dimensions.  So I figure this is 
going to be a challenging type definition.  
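One way to avoid N separate parameters is a single tuple-typed parameter
covering all dimensions (a sketch of mine in current `struct` syntax, which
postdates this thread; the name is illustrative):

```julia
# The concrete tuple type DT records the dict type of every dimension,
# so index lookups on the critical path are fully typed.
struct MyNamedArray{T,N,AT<:AbstractArray{T,N},DT<:Tuple}
    array::AT
    dicts::DT
end

na = MyNamedArray(rand(2, 3),
                  (Dict("a" => 1, "b" => 2),
                   Dict(:x => 1, :y => 2, :z => 3)))
```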

---david




[julia-users] Re: parallel for loop in Julia

2014-11-10 Thread DrKey
Here is what i tried:
variant1:

forcp = zeros(3,1);

forcp = @parallel (hcat) for partA = 1:nPart
for partB = (partA+1):nPart
...
end
forcp = forces[:,partA];
end

variant2:
function calcforces(coords,L,np,i) # with np... number of processes i... 
current process
for partA = i+1:np:nPart-1
for partB = (partA+1):nPart
...
return forces
end

np = nprocs();
parad = Array(RemoteRef,np);

and then calling function calcforces with: 
for i=1:np parad[i] = @spawn LJ_Force_MT(coords,L,np,i); end
for i=1:np forces = fetch(parad[i]); end

Both ways are giving me wrong results after more than one timestep.



[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread David van Leeuwen
Hi Nils, 

My current work around is

## temporary compatibility hack
if VERSION < v"0.4.0-dev"
Base.Dict(z::Base.Zip2) = Dict(z.a, z.b)
end

On Monday, November 10, 2014 12:04:14 PM UTC+1, Nils Gudat wrote:

 Hi David,

 shouldnt it be @Compat Dict(zip(keys, values)) instead of 
 @Compat.Dict(zip(keys, values)), i.e. a space between compat and dict 
 rather than a dot method call?

 I was just following Stefan's syntax.  The dots on my screen are about as 
big as the stuck pieces of dust, but I really believe there is a period 
there. 

julia> @Compat.Dict(:foo => 2, :bar => 2)
 Dict{Symbol,Int64} with 2 entries:
   :bar => 2
   :foo => 2

  Macro programming is beyond the scope of my brain, anyway...

---david

 Best,
 Nils



[julia-users] Re: ANN: Compat.jl

2014-11-10 Thread Steven G. Johnson
On Monday, November 10, 2014 1:15:40 PM UTC-5, David van Leeuwen wrote:
 

 I was just following Stefan's syntax.  The dots on my screen are about as 
 big as the stuck pieces of dust, but I really believe there is a period 
 there. 
  


The syntax in Compat.jl changed shortly after its release.  The new syntax 
is to use:

 @compat ...Julia 0.4 syntax

and have it be automatically translated into older syntax as needed.  If 
there is a case where this does not work, please file an issue. 
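For the Dict case above, that looks like the following (a minimal
illustration; on 0.4 the macro simply passes the expression through):

```julia
using Compat

# Write the 0.4 form once; @compat rewrites it for 0.3 where needed.
d = @compat Dict(:foo => 2, :bar => 4)
```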


[julia-users] JuliaBox

2014-11-10 Thread David Higgins
Hi,

Does anyone know if JuliaBox http://www.juliabox.org is open to applications 
to use it these days? I came across it in the ArXiV paper about Julia 
mentioned here 
https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm 
a current Julia user but I have a number of colleagues who would be 
interested in a sandboxed, non-install version to play with before making 
the jump to installation. I made the mistake of suggesting JuliaBox before 
verifying that it was possible to create an account, it seems it's invite 
only for now.

Thanks,
Dave.


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote:

 It would be best to incorporate it into the HDF5 package.  A julia package 
 would be useful if you wanted to do the same sort of compression on Julia 
 binary blobs, such as serialized julia values in an IOBuffer.


Wouldn't it be better to have a separate Blosc.jl package that is used by 
HDF5.jl?   After all, there are presumably many other applications of this.

Note that HDF5 has a Blosc filter 
(http://www.hdfgroup.org/services/filters.html#blosc and 
https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can 
use Blosc internally in the HDF5 file while still allowing HDF5 tools to 
work with the file. 


Re: [julia-users] Contributing to a Julia Package

2014-11-10 Thread João Felipe Santos
Hi Tim,

you have to create a fork on Github and then push your new branch to your
personal fork. Then, on Github, switch to that fork and the interface will
show a Pull request button if your personal fork is ahead of the upstream
repository.

Best

--
João Felipe Santos

On Mon, Nov 10, 2014 at 2:17 PM, Tim Wheeler timwheeleronl...@gmail.com
wrote:

 Hello Julia Users,

 I wrote some code that I would like to submit via pull request to a Julia
 package. The thing is, I am new to this and do not understand the pull
 request process.

 What I have done:

- used Pkg.add to obtain a local version of said package
- ran `git branch mybranch` to create a local git branch
- created my code additions and used `git add` to include them. Ran
`git commit -m`

 I am confused over how to continue. The instructions on git for issuing a
 pull request require that I use their UI interface, but my local branch is
 not going to show up when I select new pull request because it is, well,
 local to my machine. Do I need to fork the repository first? When I try
 creating a branch through the UI I do not get an option to create one like
 they indicate in the tutorial
 https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch,
 perhaps because I am not a repo owner.

 Thank you.



Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
Hello David,

Sorry about that. You can use the invite code G01014. How many others do
you want to invite? A handful should be fine. Just do not publish it online.

Thank you

On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithiohuig...@gmail.com
wrote:

 Hi,

 Does anyone if JuliaBox http://www.juliabox.org is open to applications
 to use it these days? I came across it in the ArXiV paper about Julia
 mentioned here
 https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ.
 I'm a current Julia user but I have a number of colleagues who would be
 interested in a sandboxed, non-install version to play with before making
 the jump to installation. I made the mistake of suggesting JuliaBox before
 verifying that it was possible to create an account, it seems it's invite
 only for now.

 Thanks,
 Dave.



Re: [julia-users] JuliaBox

2014-11-10 Thread David Higgins
Thanks Ivar.

5 people Shashi, all academics so I'd like to get them interested.

Dave.

On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote:

 Hello David,

 Sorry about that. You can use the invite code G01014. How many others do 
 you want to invite? A handful should be fine. Just do not publish it online.

 Thank you

 On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com wrote:

 Hi,

 Does anyone if JuliaBox http://www.juliabox.org is open to 
 applications to use it these days? I came across it in the ArXiV paper 
 about Julia mentioned here 
 https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. 
 I'm a current Julia user but I have a number of colleagues who would be 
 interested in a sandboxed, non-install version to play with before making 
 the jump to installation. I made the mistake of suggesting JuliaBox before 
 verifying that it was possible to create an account, it seems it's invite 
 only for now.

 Thanks,
 Dave.




Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashigowd...@gmail.com
wrote:


 Just do not publish it online.


Oops I meant to send it to David directly. If anyone else wants a code,
please let me know.


Re: [julia-users] JuliaBox

2014-11-10 Thread Shashi Gowda
Sure :) Happy to let them in.

On Tue, Nov 11, 2014 at 1:02 AM, David Higgins daithiohuig...@gmail.com
wrote:

 Thanks Ivar.

 5 people Shashi, all academics so I'd like to get them interested.

 Dave.

 On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote:

 Hello David,

 Sorry about that. You can use the invite code G01014. How many others do
 you want to invite? A handful should be fine. Just do not publish it online.

 Thank you

 On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com
 wrote:

 Hi,

 Does anyone if JuliaBox http://www.juliabox.org is open to
 applications to use it these days? I came across it in the ArXiV paper
 about Julia mentioned here
 https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ.
 I'm a current Julia user but I have a number of colleagues who would be
 interested in a sandboxed, non-install version to play with before making
 the jump to installation. I made the mistake of suggesting JuliaBox before
 verifying that it was possible to create an account, it seems it's invite
 only for now.

 Thanks,
 Dave.





[julia-users] Re: JuliaBox

2014-11-10 Thread Pablo Zubieta
Hi Shashi, I would like a code too.

Thanks in advance,
Pablo


[julia-users] Re: Contributing to a Julia Package

2014-11-10 Thread Tim Wheeler
Thank you! It seems to have worked.
Per João's suggestions, I had to:


   - Create a fork on Github of the target package repository
   - Clone my fork locally
   - Create a branch on my local repository
   - Add, commit,  push my changes to said branch
   - On github I could then submit the pull request from my forked repo to 
   the upstream master






On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote:

 Hello Julia Users,

 I wrote some code that I would like to submit via pull request to a Julia 
 package. The thing is, I am new to this and do not understand the pull 
 request process.

 What I have done:

- used Pkg.add to obtain a local version of said package
- ran `git branch mybranch` to create a local git branch 
- created my code additions and used `git add` to include them. Ran 
`git commit -m`

 I am confused over how to continue. The instructions on git for issuing a 
 pull request require that I use their UI interface, but my local branch is 
 not going to show up when I select new pull request because it is, well, 
 local to my machine. Do I need to fork the repository first? When I try 
 creating a branch through the UI I do not get an option to create one like 
 they indicate in the tutorial 
 https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch,
  
 perhaps because I am not a repo owner.

 Thank you.



Re: [julia-users] JuliaBox

2014-11-10 Thread David Higgins


On Monday, 10 November 2014 19:33:09 UTC, Shashi Gowda wrote:


 On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashi...@gmail.com wrote:


 Just do not publish it online.


 Oops I meant to send it to David directly. If anyone else wants a code, 
 please let me know.


I did wonder about this bit :P

Thank you very much in any case.

Dave 


Re: [julia-users] travis for os x packages

2014-11-10 Thread Elliot Saba
Yep.  Essentially, you'll need to enable the osx build environment
(http://docs.travis-ci.com/user/osx-ci-environment/).  It looks like Travis
is not accepting more multi-os requests at the moment
(http://docs.travis-ci.com/user/multi-os/), so the typical approach (used on,
for instance, the main julia repository:
https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't
work.

You may not be able to get it to run on multiple OS'es, but you should be
able to get it to run on OSX only by setting the language to
objective-c.  This will get it to run on OSX only, then you can use
the default
.travis.yml file
https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155
that is generated by Pkg.

In short, you should be able to take that default file, change the language
to objective-c, remove the os block, and call it good.  Save that as
.travis.yml in your repo, enable Travis in your repository's services
section, and test away!
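Putting that together, the whole file might look something like this (a
sketch under the assumptions above, not the actual Pkg-generated file;
MyPkg is a placeholder for your package name):

```yaml
# OS X-only: the objective-c language keys the build to Travis's OSX workers.
language: objective-c
notifications:
  email: false
script:
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPkg")'
```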
-E

On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simonby...@gmail.com wrote:

 I would like to set up travis for an OS X-only package: does anyone have
 suggestions for how I should set up travis (or has anyone already done
 this)?

 simon



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Douglas Bates
On Monday, November 10, 2014 12:55:24 PM UTC-6, Steven G. Johnson wrote:



 On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote:

 It would be best to incorporate it into the HDF5 package.  A julia 
 package would be useful if you wanted to do the same sort of compression on 
 Julia binary blobs, such as serialized julia values in an IOBuffer.


 Wouldn't it be better to have a separate Blosc.jl package that is used by 
 HDF5.jl?   After all, there are presumably many other applications of this.


That seems to be the most reasonable approach but I couldn't work out how 
to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
package aware of that location when building libhdf5.  Are there examples 
of how to do that?
 


 Note that HDF5 has a Blosc filter (
 http://www.hdfgroup.org/services/filters.html#blosc and 
 https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you 
 can use Blosc internally in the HDF5 file while still allowing HDF5 tools 
 to work with the file. 



[julia-users] Re: Contributing to a Julia Package

2014-11-10 Thread Ivar Nesje
Another important point (for actively developed packages) is that Pkg.add() 
checks out the commit of the latest released version registered in 
METADATA.jl. Most packages do development on the master branch, so you 
should likely base your changes on master, rather than the latest released 
version.

To do this, you can use `Pkg.checkout()`, but `git checkout master` will 
also work.

Ivar

On Monday, November 10, 2014 at 21:07:49 UTC+1, Tim Wheeler wrote:

 Thank you! It seems to have worked.
 Per João's suggestions, I had to:


- Create a fork on Github of the target package repository
- Clone my fork locally
- Create a branch on my local repository
- Add, commit,  push my changes to said branch
- On github I could then submit the pull request from my forked repo 
to the upstream master






 On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote:

 Hello Julia Users,

 I wrote some code that I would like to submit via pull request to a Julia 
 package. The thing is, I am new to this and do not understand the pull 
 request process.

 What I have done:

- used Pkg.add to obtain a local version of said package
- ran `git branch mybranch` to create a local git branch 
- created my code additions and used `git add` to include them. Ran 
`git commit -m`

 I am confused over how to continue. The instructions on GitHub for issuing a 
 pull request require that I use their web UI, but my local branch is 
 not going to show up when I select new pull request because it is, well, 
 local to my machine. Do I need to fork the repository first? When I try 
 creating a branch through the UI I do not get an option to create one like 
 they indicate in the tutorial 
 https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch,
  
 perhaps because I am not a repo owner.

 Thank you.



Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

2014-11-10 Thread Milan Bouchet-Valat
On Monday, 10 November 2014 at 10:07 -0800, David van Leeuwen wrote:
 Hello, 
 
 On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote:
 On Sunday, 9 November 2014 at 23:48 -0800, David van Leeuwen wrote: 
  Hello, 
  
  On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: 
  NamedArrays.jl generally goes along this way. However, it 
  remains limited in two aspects: 
  
  1. Some fields in NamedArrays are not declared of specific 
  types. In particular, the field `dicts` is of the type 
  `Vector{Dict}`, and the use of this field is on the critical 
  path when looping over the table, e.g. when counting. This 
  would potentially lead to substantial impact on performance.  
  
 I suppose the problem you indicate can be alleviated by making 
 NamedArray parameterized by the type of the key in the dict as well.   
 Right. Sounds reasonable. 
 
 
 
 I've been pondering over how this could be done. NamedArray has a type
 parameter N, and it should then further have N type parameters
 indicating the dictionary type along each of the N dimensions.  So I
 figure this is going to be a challenging type definition.  
A tuple type could be used to give the type of the dimension names.

But there's another issue: `dicts::Vector{Dict}` cannot be defined more
precisely than that if heterogeneous types are allowed for different
dimensions. Is this a case where staged functions could be used to
generate efficient functions to access dictionaries?
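
For what it's worth, a minimal sketch of such a parameterization (hypothetical type and field names, written in 0.3-era syntax; not NamedArrays' actual definition) might look like:

```julia
# Hypothetical sketch: carry the per-dimension name dictionaries in a
# tuple-typed field, so indexing into each dictionary stays concretely typed.
immutable Named{T,N,DT<:Tuple}
    array::Array{T,N}
    dicts::DT   # e.g. (Dict{ASCIIString,Int}, Dict{Symbol,Int}) for N == 2
end

# For a literal dimension d, n.dicts[d] then has a concrete Dict type known
# at compile time, avoiding the Vector{Dict} abstraction penalty.
```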


Regards


[julia-users] Help optimizing sparse matrix code

2014-11-10 Thread Joshua Tokle
Hello! I'm trying to replace an existing matlab code with julia and I'm 
having trouble matching the performance of the original code. The matlab 
code is here:
https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m

The program clusters inventors from a database of patent applications. The 
input data is a sparse boolean matrix (named XX in the script), where each 
row defines an inventor and each column defines a feature. For example, the 
jth column might correspond to a feature "first name is John". If there is 
a 1 in XX[i, j], this means that inventor i's first name is John. Given 
an inventor i, we find similar inventors by identifying rows in the matrix 
that agree with XX[i, :] on a given column and then applying element-wise 
boolean operations to the rows. In the code, for a given value of `index`, 
C_lastname holds the unique column in XX corresponding to a last name 
feature such that XX[index, :] equals 1. C_firstname holds the unique 
column in XX corresponding to a first name feature such that XX[index, :] 
equals 1. And so on. The following code snippet finds all rows in the 
matrix that agree with XX[index, :] on full name and one of patent assignee 
name, inventor city, or patent class:

lump_index_2 = step & ((C_assignee | C_city | C_class))

The `step` variable is an indicator that's used to prevent the same 
inventors from being considered multiple times. My attempt at a literal 
translation of this code to julia is here:
https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl

The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean operations 
aren't supported for sparse matrices in julia, so I fake it with integer 
arithmetic.  The line that corresponds to the matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

The reason I grouped it this way is that initially `step` will be a 
sparse vector of all 1's, and I thought it might help to do the truly 
sparse arithmetic first.

I've been testing this code on a Windows 2008 Server. The test data 
contains 45,763 inventors and 274,578 possible features (in other words, XX 
is a 45,763 x 274,578 sparse matrix). The matlab program consistently takes 
about 70 seconds to run on this data. The julia version shows a lot of 
variation: it's taken as little as 60 seconds and as much as 10 minutes. 
However, most runs take around 3.5 to 4 minutes. I pasted one output from 
the sampling profiler here [1]. If I'm reading this correctly, it looks 
like the program is spending most of its time performing element-wise 
multiplication of the indicator vectors I described above.

I would be grateful for any suggestions that would bring the performance of 
the julia program in line with the matlab version. I've heard that the last 
time the matlab code was run on the full data set it took a couple days, so 
a slow-down of 3-4x is a significant burden. I did attempt to write a more 
idiomatic julia version using Dicts and Sets, but it's slower than the 
version that uses sparse matrix operations:
https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl

Thank you!
Josh


[1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


 Wouldn't it be better to have a separate Blosc.jl package that is used by 
 HDF5.jl?   After all, there are presumably many other applications of this.


 That seems to be the most reasonable approach but I couldn't work out how 
 to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
 package aware of that location when building libhdf5.  Are there examples 
 of how to do that?


I've just created a Blosc.jl package and registered it.   Do Pkg.update() 
and Pkg.add("Blosc") to get it.

To get the library location in the HDF5 package, just:

1) Add Blosc to the REQUIRE file
2) import Blosc
3) Blosc.libblosc is the path to the shared library.
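
For use outside HDF5, the package's compression wrappers can also be called directly; the following is a sketch assuming a `Blosc.compress`/`Blosc.decompress` API (check the package README for the actual signatures):

```julia
using Blosc  # assumes Pkg.add("Blosc") has already been run

data = rand(Float64, 10^6)
comp = Blosc.compress(data)             # compressed bytes
back = Blosc.decompress(Float64, comp)  # round-trip back to Float64
@assert back == data
```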


Re: [julia-users] Help optimizing sparse matrix code

2014-11-10 Thread Milan Bouchet-Valat
On Monday, 10 November 2014 at 13:03 -0800, Joshua Tokle wrote:
 Hello! I'm trying to replace an existing matlab code with julia and
 I'm having trouble matching the performance of the original code. The
 matlab code is here:
 
 https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m
 
 The program clusters inventors from a database of patent applications.
 The input data is a sparse boolean matrix (named XX in the script),
 where each row defines an inventor and each column defines a feature.
 For example, the jth column might correspond to a feature "first name
 is John". If there is a 1 in XX[i, j], this means that inventor
 i's first name is John. Given an inventor i, we find similar inventors
 by identifying rows in the matrix that agree with XX[i, :] on a given
 column and then applying element-wise boolean operations to the rows.
 In the code, for a given value of `index`, C_lastname holds the unique
 column in XX corresponding to a last name feature such that
 XX[index, :] equals 1. C_firstname holds the unique column in XX
 corresponding to a first name feature such that XX[index, :] equals
 1. And so on. The following code snippet finds all rows in the matrix
 that agree with XX[index, :] on full name and one of patent assignee
 name, inventor city, or patent class:
 
 lump_index_2 = step & ((C_assignee | C_city | C_class))
 
 The `step` variable is an indicator that's used to prevent the same
 inventors from being considered multiple times. My attempt at a
 literal translation of this code to julia is here:
 
 https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl
 
 The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean
 operations aren't supported for sparse matrices in julia, so I fake it
 with integer arithmetic.  The line that corresponds to the matlab code
 above is
 
 lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))
You should be able to get a speedup by replacing this line with an
explicit `for` loop. First, you'll avoid memory allocation (one for each
+ or .* operation). Second, you'll be able to return as soon as the
index is found, instead of computing the value for all elements (IIUC
you're only looking for one index, right?).
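
For instance, a sketch of such a loop, assuming the indicator vectors can be treated as 0/1 integer vectors of equal length (the names follow this thread, not the actual repository code):

```julia
# Collect every index where step, C_name, and at least one of the other
# indicators are all nonzero, without allocating temporary vectors.
function lump_indices(step, C_name, C_assignee, C_city, C_class)
    out = Int[]
    for i in 1:length(step)
        if step[i] != 0 && C_name[i] != 0 &&
           (C_assignee[i] != 0 || C_city[i] != 0 || C_class[i] != 0)
            push!(out, i)
        end
    end
    return out
end
```

If only the first match is needed, replacing `push!(out, i)` with `return i` stops the scan as soon as a hit is found.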


My two cents

 The reason I grouped it this way is that initially `step` will be a
 sparse vector of all 1's, and I thought it might help to do the
 truly sparse arithmetic first.
 
 I've been testing this code on a Windows 2008 Server. The test data
 contains 45,763 inventors and 274,578 possible features (in other
 words, XX is a 45,763 x 274,578 sparse matrix). The matlab program
 consistently takes about 70 seconds to run on this data. The julia
 version shows a lot of variation: it's taken as little as 60 seconds
 and as much as 10 minutes. However, most runs take around 3.5 to 4
 minutes. I pasted one output from the sampling profiler here [1]. If
 I'm reading this correctly, it looks like the program is spending most
 of its time performing element-wise multiplication of the indicator
 vectors I described above.
 
 I would be grateful for any suggestions that would bring the
 performance of the julia program in line with the matlab version. I've
 heard that the last time the matlab code was run on the full data set
 it took a couple days, so a slow-down of 3-4x is a significant burden.
 I did attempt to write a more idiomatic julia version using Dicts and
 Sets, but it's slower than the version that uses sparse matrix
 operations:
 
 https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl
 
 Thank you!
 Josh
 
 
 [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5
 
 



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 5:02:03 PM UTC-5, Steven G. Johnson wrote:

 I've just created a Blosc.jl package and registered it.   Do Pkg.update() 
 and Pkg.add("Blosc") to get it.


Oh, darn it, I just realized I am duplicating some work by jakebolewski... 


[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread John Drummond
Got it - I don't know whether it's a bug or not.
if I comment out 
#import Base.isless
in the LogParse.jl file and initially reload that in the repl and then 
reload the correct version with
import Base.isless
methods(isless) shows the method but sort says it's not defined, even when 
I specify it directly.
Apologies for not checking the initial input in a fresh session, I thought 
that reloading a module would completely reload the functions, but 
presumably not when appending to those in Base.

Kind regards, John.




On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote:

 Thank you, that's helpful. 
 I reentered it all in a fresh session and found it working as well - I'll 
 try and find the difference which caused it not to work and come back.
 Kind Regards, John.

 On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:

 This code works everywhere I'm able to try it. 

 On Sunday, 9 November 2014 at 03:18:13 UTC+1, John Drummond wrote:

 I was originally julia 0.3.1 on windows 7
 this is on Macosx 10 julia 0.3.2
 I loaded the file LogParse.jl below and then in the repl ran

 reload("LogParse.jl")

 methods(isless)


 ary1 = LogParse.DayPriceText[]
 push!(ary1,LogParse.DayPriceText(4,"a1",1))
 push!(ary1,LogParse.DayPriceText(2,"a1",1))
 push!(ary1,LogParse.DayPriceText(6,"a1",1))


 sort(ary1)

 sort(ary1,lt=LogParse.isless)
 I get the same messages - methods(isless) shows that it's loaded
 but the sort can't find it, even when I try to specify the function


 #in file LogParse.jl ###
 module LogParse
 export DayPriceText
 import Base.isless

 type DayPriceText
   a1::Uint32
   b1::ASCIIString
   a2::Uint32
 end

 function isless(a::DayPriceText, b::DayPriceText)
   if (a.a1 < b.a1)
 return true
   else
 return false
   end
 end


 end
 ##

 Many thanks.
 Kind regards, John


 On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:

 In this case it would be really great if you had a minimal reproducible 
 example. It looks to me as you are doing everything right, so I would 
 start 
 looking for typos and scoping issues. It's hard to find them without 
 looking at the code.

 Ideally the example should be small and possible to paste into a REPL 
 session, but if you can publish your code and don't want to extract only 
 the relevant part, that might be fine too.

 Julia version and operating system is also nice to include, so that we 
 have it available in case we have problems reproducing your results.

 Regards Ivar

 On Friday, 7 November 2014 at 20:14:48 UTC+1, John Drummond wrote:

 Hi,
 I suspect I'm doing something stupid but no idea what I'm missing.

 I create a module .
 I create a type in it, DayPriceText
 I import Base.isless
 I define isless for the type

 now in the repl I get

 methods(isless)
 =
 # 25 methods for generic function isless:
 ..
 isless(x::DayPriceText,y::DayPriceText) at 
 c:\works\juliaplay\LogParse.jl:16

 but

 julia> typeof(a1p)
 Array{DayPriceText,1}

 julia> sort(a1p, lt=CILogParse.isless)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 julia> sort(a1p)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 I'm sure there's some obvious answer, but I've no idea what.
 Thanks for any help
 kind regards, John.



Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


 That seems to be the most reasonable approach but I couldn't work out how 
 to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
 package aware of that location when building libhdf5.  Are there examples 
 of how to do that?


Note that the dependencies in some sense run in the opposite direction.  
You don't technically need to make HDF5 aware of Blosc when building 
libhdf5.  Instead, you need to build a Blosc filter for HDF5 (included 
with c-blosc) and register it with HDF5.

The Blosc.jl package can't build the HDF5 filter, because that would 
introduce an unnecessary dependency on HDF5 for other things using Blosc.   
So, at least this component needs to be built in/after the HDF5 package.


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Jake Bolewski
The 64-bit issue is a killer, and is why I didn't go further with integrating 
blosc with hdf5.  I guess I should have been more vocal about this.  Take 
what you may from my nascent package :-)

On Monday, November 10, 2014 6:05:40 PM UTC-5, Steven G. Johnson wrote:


 That seems to be the most reasonable approach but I couldn't work out how 
 to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 
 package aware of that location when building libhdf5.  Are there examples 
 of how to do that?


 Note that the dependencies in some sense run in the opposite direction.  
 You don't technically need to make HDF5 aware of Blosc when building 
 libhdf5.  Instead, you need to build a Blosc filter for HDF5 (included 
 with c-blosc) and register it with HDF5.

 The Blosc.jl package can't build the HDF5 filter, because that would 
 introduce an unnecessary dependency on HDF5 for other things using Blosc.   
 So, at least this component needs to be built in/after the HDF5 package.



[julia-users] Re: Great new expository article about Julia by the core developers

2014-11-10 Thread cdm
see this ...

https://groups.google.com/d/msg/julia-box/hw81as3GPWA/E1QJm1shnV4J



On Monday, November 10, 2014 7:37:08 AM UTC-8, David Higgins wrote:

 So how does one go about getting an invitation to JuliaBox? It's 
 referenced in the article but you need an invitation to login

 Dave.



[julia-users] Re: defining function for lt for use in sort - simple question

2014-11-10 Thread Ivar Nesje
That seems like a tricky edge case, indeed. Not sure if this is a bug 
either, or if there are any existing issues on github that covers this.

On Monday, 10 November 2014 at 23:26:49 UTC+1, John Drummond wrote:

 Got it - I don't know whether it's a bug or not.
 if I comment out 
 #import Base.isless
 in the LogParse.jl file and initially reload that in the repl and then 
 reload the correct version with
 import Base.isless
 methods(isless) shows the method but sort says it's not defined, even when 
 I specify it directly.
 Apologies for not checking the initial input in a fresh session, I thought 
 that reloading a module would completely reload the functions, but 
 presumably not when appending to those in Base.

 Kind regards, John.




 On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote:

 Thank you, that's helpful. 
 I reentered it all in a fresh session and found it working as well - I'll 
 try and find the difference which caused it not to work and come back.
 Kind Regards, John.

 On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote:

 This code works everywhere I'm able to try it. 

 On Sunday, 9 November 2014 at 03:18:13 UTC+1, John Drummond wrote:

 I was originally julia 0.3.1 on windows 7
 this is on Macosx 10 julia 0.3.2
 I loaded the file LogParse.jl below and then in the repl ran

 reload("LogParse.jl")

 methods(isless)


 ary1 = LogParse.DayPriceText[]
 push!(ary1,LogParse.DayPriceText(4,"a1",1))
 push!(ary1,LogParse.DayPriceText(2,"a1",1))
 push!(ary1,LogParse.DayPriceText(6,"a1",1))


 sort(ary1)

 sort(ary1,lt=LogParse.isless)
 I get the same messages - methods(isless) shows that it's loaded
 but the sort can't find it, even when I try to specify the function


 #in file LogParse.jl ###
 module LogParse
 export DayPriceText
 import Base.isless

 type DayPriceText
   a1::Uint32
   b1::ASCIIString
   a2::Uint32
 end

 function isless(a::DayPriceText, b::DayPriceText)
   if (a.a1 < b.a1)
 return true
   else
 return false
   end
 end


 end
 ##

 Many thanks.
 Kind regards, John


 On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote:

 In this case it would be really great if you had a minimal 
 reproducible example. It looks to me as you are doing everything right, 
 so 
 I would start looking for typos and scoping issues. It's hard to find 
 them 
 without looking at the code.

 Ideally the example should be small and possible to paste into a REPL 
 session, but if you can publish your code and don't want to extract only 
 the relevant part, that might be fine too.

 Julia version and operating system is also nice to include, so that we 
 have it available in case we have problems reproducing your results.

 Regards Ivar

 On Friday, 7 November 2014 at 20:14:48 UTC+1, John Drummond wrote:

 Hi,
 I suspect I'm doing something stupid but no idea what I'm missing.

 I create a module .
 I create a type in it, DayPriceText
 I import Base.isless
 I define isless for the type

 now in the repl I get

 methods(isless)
 =
 # 25 methods for generic function isless:
 ..
 isless(x::DayPriceText,y::DayPriceText) at 
 c:\works\juliaplay\LogParse.jl:16

 but

 julia> typeof(a1p)
 Array{DayPriceText,1}

 julia> sort(a1p, lt=CILogParse.isless)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 julia> sort(a1p)
 ERROR: `isless` has no method matching isless(::DayPriceText, 
 ::DayPriceText)
  in sort! at sort.jl:246

 I'm sure there's some obvious answer, but I've no idea what.
 Thanks for any help
 kind regards, John.



[julia-users] Re: JuliaBox

2014-11-10 Thread cdm

the Sagemath Cloud google chrome app also gets users to a rich environment 
for Julia ...

  
 
https://chrome.google.com/webstore/detail/the-sagemath-cloud/eocdndagganmilahaiclppjigemcinmb


users can run Julia inside a terminal ... OR ... via IJulia notebooks ... 
OR ... via Sagemath worksheets.


also available for running Julia within a terminal, the VMs served at

   https://koding.com (there is also a google chrome app for this ...)


best,

cdm


On Monday, November 10, 2014 11:04:13 AM UTC-8, Ivar Nesje wrote:

 Yesterday someone suggested 
 https://groups.google.com/forum/#!searchin/julia-users/monster/julia-users/zEp8pKkEYHk/Oqb7NYdxFcwJ
  

  https://tmpnb.org/




[julia-users] Available packages for compression?

2014-11-10 Thread Steven G. Johnson
Pkg.add("Blosc") should now add a working Blosc package. 


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 6:09:50 PM UTC-5, Jake Bolewski wrote:

 The 64-bit issue is a killer, and is why I didn't go further with integrating 
 blosc with hdf5.  I guess I should have been more vocal about this.  Take 
 what you may from my nascent package :-) 


Google's Snappy library has a 64-bit API, but seems to also be limited to 
32-bit sizes internally, as is the LZ4 library.  Kind of surprising that so 
many people would independently limit themselves to 32-bit buffers nowadays.


Re: [julia-users] Compressing .jld files

2014-11-10 Thread Steven G. Johnson
On Monday, November 10, 2014 8:39:41 PM UTC-5, Steven G. Johnson wrote:

 Google's Snappy library has a 64-bit API, but seems to also be limited to 
 32-bit sizes internally, as is the LZ4 library.  Kind of surprising that so 
 many people would independently limit themselves to 32-bit buffers nowadays.


Snappy's only excuse was backwards compatibility: 
https://code.google.com/p/snappy/issues/detail?id=76 


Re: [julia-users] travis for os x packages

2014-11-10 Thread Tony Kelman
I don't want to steal Pontus Stenetorp's thunder since he did all the work, 
but there's a PR open 
here https://github.com/travis-ci/travis-build/pull/318 that will sooner or 
later add community maintained support for Julia directly in Travis as 
`language: julia`. The default .travis.yml for Julia packages can be 
simplified even further once that gets rolled out.

That doesn't fix the capacity issues at Travis where they aren't accepting 
new repos, so for now the `language: objective-c` version, and using the 
install-julia.sh script, is the best way to temporarily test things out on 
Mac workers.


On Monday, November 10, 2014 12:32:34 PM UTC-8, Elliot Saba wrote:

 Yep.  Essentially, you'll need to enable the osx build environment 
 http://docs.travis-ci.com/user/osx-ci-environment/.  It looks like 
 Travis is not accepting http://docs.travis-ci.com/user/multi-os/ more 
 multi-os requests at the moment, so the typical approach, (used on, for 
 instance, the main julia repository 
 https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't 
 work.

 You may not be able to get it to run on multiple OS'es, but you should be 
 able to get it to run on OSX only by setting the language to 
 "objective-c".  This will get it to run on OSX only, then you can use the 
 default 
 .travis.yml file 
 https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155
  
 that is generated by Pkg.

 In short, you should be able to take that default file, change the 
 language to "objective-c", remove the `os` block, and call it good.  Save 
 that as .travis.yml in your repo, enable Travis in your repository's 
 services section, and test away!
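
Put together, a hypothetical minimal .travis.yml along those lines might read (MyPackage and the install step are placeholders; the Pkg-generated default file is more complete):

```yaml
# Hypothetical OS X-only Travis config for a Julia package.
language: objective-c      # forces the OS X workers
before_install:
  - brew update
  - brew install julia     # placeholder: however Julia gets installed
script:
  - julia -e 'Pkg.clone(pwd()); Pkg.test("MyPackage")'
```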
 -E

 On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simon...@gmail.com wrote:

 I would like to set up travis for an OS X-only package: does anyone have 
 suggestions for how I should set up travis (or has anyone already done 
 this)?

 simon




[julia-users] Displaying a polygon mesh

2014-11-10 Thread Simon Kornblith
Is there an easy way to display a polygon mesh in Julia, i.e., vertices and 
faces loaded from an STL file or created by marching tetrahedra using 
Meshes.jl? So far, I see:

   - PyPlot/matplotlib, which seems to be surprisingly difficult to 
   convince to do this.
   - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried 
   very hard yet.)
   - ihnorton's VTK bindings, which aren't registered in METADATA.jl. 

Is there another option I'm missing? If not, can I convince one of these 
packages to show my mesh with minimal time investment, or should I use a 
separate volume viewer (or maybe a Python package via PyPlot)?

Thanks,
Simon


[julia-users] Julia Tech Talk at the University of Pennsylvania

2014-11-10 Thread Ted Fujimoto
Hi all,

Feel free to come by if you're around Philly!

Julia Tech Talk on Thursday, November 13 at 6:00pm at Wu and Chen Auditorium
When: Thursday, November 13 
https://www.facebook.com/events/calendar/2014/November/13 at 6:00pm
Where: Wu and Chen Auditorium 
https://www.facebook.com/pages/Wu-and-Chen-Auditorium/145368958832977 
Philadelphia, 
Pennsylvania 19104

 

On Thursday, November 13th @ 6pm the Dining Philosophers will be hosting a 
talk on the Julia programming language in Wu & Chen Auditorium. Julia has 
the elegance and familiarity of Python and Matlab, with speed close to C, 
and is completely open source. This is a great opportunity for anyone 
interested in scientific and parallel computation, machine learning, data 
analysis, and visualization. There will be a giveaway of online JuliaBox 
codes for the Julia language for all attendees!

Speakers: Ted Fujimoto (CIT Masters student) and Randy Zwitch (Senior Data 
Scientist at Comcast)

 

Randy Zwitch is Senior Data Scientist at Comcast, researching how to 
improve the overall customer viewing experience using petabyte-scale tools 
and datasets. Randy also contributes to the R and Julia open-source 
communities, creating and maintaining packages primarily related to the web 
(HTTP requests/APIs, Server Log Parsing, Geo-Location, etc.) and database 
access. 


Abstract: Using publicly available datasets, Randy will provide an intro to 
machine learning using ad-hoc Julia code and via add-on packages.


[julia-users] Questions relating to packages and using/creating them

2014-11-10 Thread Dom Luna
I have some general questions about using packages.

1. Is there a way to create a workspace separate from $HOME/.julia? This 
would still have the same functionality when calling `using` in the REPL.
2. What's the best practice for packages with the same name? I don't have a 
problem related to this but I'm just curious how this is handled. I think 
via Pkg.add(...) there's only one definition of any package name, but with 
Pkg.clone(...) I could see package name collisions. Having all the packages 
under one directory doesn't seem scalable to me.

thanks


Re: [julia-users] Questions relating to packages and using/creating them

2014-11-10 Thread Isaiah Norton
1. see LOAD_PATH (http://julia.readthedocs.org/en/latest/manual/modules/)
2. this is not specifically supported, as far as I know. We could be fancy
and add a UUID to the package spec, or something like that, but I don't
think it is a very pressing concern right now. The simple options right now
are to manipulate LOAD_PATH to put the preferred package path(s) first (I
think this should work) or to manually `require` a specific path (which
won't work with `using`).
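
For the LOAD_PATH route, something like the following works (the path is illustrative):

```julia
# Make `using SomeModule` also search a custom workspace directory
# in addition to ~/.julia.
push!(LOAD_PATH, joinpath(homedir(), "julia-workspace"))
# Putting this line in ~/.juliarc.jl applies it to every session.
```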

On Mon, Nov 10, 2014 at 9:25 PM, Dom Luna dluna...@gmail.com wrote:

 I have some general questions about using packages.

 1. Is there a way to create a workspace separate from $HOME/.julia? This
 would still have the same functionality when calling `using` in the REPL.
 2. What's the best practice for packages with the same name? I don't have
 a problem related to this but I'm just curious how this is handled. I think
 via Pkg.add(...) there's only one definition of any package name, but with
 Pkg.clone(...) I could see package name collisions. Having all the packages
 under one directory doesn't seem scalable to me.

 thanks



Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Aneesh Sathe
Unless I understood wrong (which is very possible), the 65536 bins were to 
cover all possible values a 16-bit pixel can take. Though, in the actual 
graythresh function I will probably use 256 bins by default.

I did find the docs for adding custom formats 
(https://github.com/timholy/Images.jl/blob/master/doc/extendingIO.md) 

But perhaps making bio formats .jar file will be better in the long run for 
few reasons:

1) A lot more formats are covered, so implementing that would allow coverage 
of more formats faster. 
2) I understand your reasons for making all images in the Gray range, but I 
prefer having real pixel values. That way it's easier to correlate test 
data with something like Fiji or Matlab. And I don't fully understand Julia's 
float handling, but there might be a gain in speed from using non-float 
values. 
3) Bio-Formats already allows the reading of individual images based on 
XYZCT, so that doesn't need to be rebuilt. 

Of course, the above is the ideal thing to do. I'm still trying to figure out 
how to use the .jar file, so I might just end up adding the custom format 
first. 

Let's see...
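
In the meantime, here's a sketch of the histogram side of Otsu's method in plain Julia (not the Images.jl API; `counts` is a precomputed vector of bin counts, e.g. 256 of them):

```julia
# Otsu's method: pick the bin t that maximizes the between-class variance
# of the two classes formed by splitting the histogram at t.
function otsu_threshold(counts::Vector{Int})
    total = sum(counts)
    sum_all = 0.0
    for t in 1:length(counts)
        sum_all += t * counts[t]
    end
    wB = 0          # pixel count of the background class (bins 1:t)
    sumB = 0.0      # weighted sum of background bins
    best = -1.0
    thresh = 1
    for t in 1:length(counts)
        wB += counts[t]
        wF = total - wB
        if wB == 0 || wF == 0
            continue   # one class is empty; variance undefined
        end
        sumB += t * counts[t]
        mB = sumB / wB                 # background mean
        mF = (sum_all - sumB) / wF     # foreground mean
        between = wB * wF * (mB - mF)^2
        if between > best
            best = between
            thresh = t
        end
    end
    return thresh
end
```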

-Aneesh

On Monday, November 10, 2014 6:55:08 PM UTC+8, Tim Holy wrote:

 All good plans. (I'm not sure about using 65536 bins for 16-bit images, 
 though, because that would be more bins than there are pixels in some 
 images. 
 Still, it's not all that much memory, really, so maybe that would be OK.) 

 It would be great to add native support. Presumably you've found the docs 
 on 
 adding support for new formats. 

 For formats that encode large datasets in a single block (like NRRD), you 
 can 
 work with GB-sized datasets on a laptop because you can use mmap (I do it 
 routinely). But the love of TIFF does demand an alternative solution. 
 Presumably we should add a lower-level routine that returns a structure 
 that 
 facilitates later access, e.g., 
  imds = imdataset("my_image_file") 
 img = imds[z, 14, t, 7] 
 or somesuch. 

 --Tim 

 On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: 
  Tim, 
  i would like the imhist to be idiot proof. (i've been teaching matlab 
 and 
  nothing puts new people off more than things not being idiot proof). 
  things like using 256 bins by default returning a plot  if no 
 outputs 
  are specified (basically make it like matlab's imthresh() ) 
  
  Btw, on matlab using bioformats is actually the slowest part of my 
  algorithm, so unless it can be faster in julia native support might be 
  nicer. Bioformats also fails in that it reads the whole sequence at 
 once... 
  so running things on laptops with even GB-level datasets is impossible. 
 I 
  wrote my own version of bfopen to only open the required XYZCT for 
  specified series, but that only solves the memory usage. 
  
  the source format for my image was .mvd2 (perkin elmer spinning disk). 
  
  i know about JavaCall.jl just havent had the time to play with it... 
  
  i was thinking it might be fun to attempt native support for a few 
 formats. 
  I can also generate test data in a few vendor formats for a few 
  microscopes. 
  perhaps even make it a julia-box based project. ;) 
  
  On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: 
   On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: 
    Yes, Images does read it okay but only if i cut out the substack. If i 
    don't, then it interprets the three channels as a time dimension, which 
    isnt a pain at the moment but will be if i start using it for work. 
   
   Hmm, that sounds like an annotation problem. 
   
    I realized that both the convert and the g[:] would slow me down but the 
    hist function just wouldn't work without that kind of dance. Also, 
    graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) uses 
    reshape to make it all one image which might also add to speed. 
   
    The pull request is well and good but personally i would rather have a 
    dedicated image histogram function like 
    imhist: http://www.mathworks.com/help/images/ref/imhist.html 
    which would give histograms based on input images. To me that's the only 
    way to make life easier. maybe i'll write one :) 
   
   imhist is necessary in matlab largely because hist works columnwise; in a 
   sense, Julia's `hist` is like imhist. Is there some specific functionality 
   you're interested in? There's no reason Images can't provide a custom 
   version of `hist`. 
   
    Something about Images: do you think it possible to use the bio formats' 
    .jar file to import images from a microscope format to Images? 
    Opening a microscope format image file in the relevant software and then 
    exporting it as tiff takes too long and i'd rather be able to access the 
    images directly. 
   
   Yes, expansion of Images' I/O capabilities would be great. I've wondered 
   about Bio-Formats myself, but not had a direct need, nor do I know Java (but 
   see 
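Tim's point above, that base `hist` already plays the role of `imhist`, can
be sketched like this (0.3-era API; the variable `img` and the choice of 256
bins over [0,1] gray values are assumptions for illustration):

```julia
# Flatten the image to a vector and bin its gray values into 256 bins.
edges = 0.0:1/256:1.0
e, counts = hist(vec(float64(img)), edges)  # img: assumed 2-D gray array
# counts[k] is the number of pixels falling in bin k, imhist-style.
```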

Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations

2014-11-10 Thread Todd Leo
I did, actually, try expanding vectorized operations into explicit for 
loops, and computing vector multiplication / vector norm via BLAS 
interfaces. The explicit loops did allocate less memory, but took much 
more time. Meanwhile, the vectorized version I've gotten used to writing 
runs incredibly fast, as the following tests indicate:

# Explicit for loop, slightly modified from SimilarityMetrics.jl by johnmyleswhite
# (https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl)
function cosine(a::SparseMatrixCSC{Float64, Int64},
                b::SparseMatrixCSC{Float64, Int64})
    sA, sB, sI = 0.0, 0.0, 0.0
    for i in 1:length(a)       # note: scalar indexing into a sparse matrix
        sA += a[i]^2
        sI += a[i] * b[i]
    end
    for i in 1:length(b)
        sB += b[i]^2
    end
    return sI / sqrt(sA * sB)
end

# BLAS version
function cosine_blas(i::SparseMatrixCSC{Float64, Int64},
                     j::SparseMatrixCSC{Float64, Int64})
    i = full(i)   # densify, since BLAS works on dense arrays
    j = full(j)
    numerator = BLAS.dot(i, j)
    denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
    return numerator / denominator
end

# the vectorized version remains the same, as the 1st post shows.

# Test functions
function test_explicit_loop(d)
    for n in 1:1
        v = d[:,1]
        cosine(v,v)
    end
end

function test_blas(d)
    for n in 1:1
        v = d[:,1]
        cosine_blas(v,v)
    end
end

function test_vectorized(d)
    for n in 1:1
        v = d[:,1]
        cosine_vectorized(v,v)
    end
end

test_explicit_loop(mat)
test_blas(mat)
test_vectorized(mat)
gc()
@time test_explicit_loop(mat)
gc()
@time test_blas(mat)
gc()
@time test_vectorized(mat)

# Results
elapsed time: 3.772606858 seconds (6240080 bytes allocated)
elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
elapsed time: 0.011236068 seconds (34560080 bytes allocated)
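For sparse inputs, Milan's loop advice maps most naturally onto a walk over
the stored entries only: scalar `a[i]` on a SparseMatrixCSC performs a search
each time, which is why the element-by-element loop above is so slow. A
hedged sketch (the function name and argument convention are mine; `ra`/`va`
would be a column's stored row indices and values, e.g. `v.rowval`/`v.nzval`
on an n-by-1 sparse column in 0.3-era Julia):

```julia
# Cosine similarity over two sparse columns given as (indices, values)
# pairs. Walking the two sorted index lists touches only stored entries,
# avoiding per-element searches entirely.
function cosine_sparse(ra, va, rb, vb)
    sA = 0.0; sB = 0.0; sI = 0.0
    for x in va; sA += x * x; end   # ||a||^2 from stored entries only
    for x in vb; sB += x * x; end   # ||b||^2
    i, j = 1, 1
    while i <= length(ra) && j <= length(rb)
        if ra[i] == rb[j]           # indices match: contributes to the dot product
            sI += va[i] * vb[j]
            i += 1; j += 1
        elseif ra[i] < rb[j]
            i += 1
        else
            j += 1
        end
    end
    return sI / sqrt(sA * sB)
end
```

This allocates nothing per call, so it should beat both the dense BLAS
version (which pays for `full`) and the vectorized version (which pays for
the `.*` temporaries).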


On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan Bouchet-Valat wrote:

  On Sunday, 9 November 2014 at 21:17 -0800, Todd Leo wrote: 

  Hi fellows, 

   I'm currently working on sparse matrices and cosine similarity 
  computation, but my routines are running very slow, at least not meeting my 
  expectations. So I wrote some test functions to dig out the reason for the 
  ineffectiveness. To my surprise, the execution times of passing two vectors 
  to the test function and of passing the whole sparse matrix differ greatly; 
  the latter is 80x faster. I am wondering why extracting two vectors from the 
  matrix in each loop is dramatically faster, and how to avoid the 
  multi-GB memory allocation. Thanks guys. 

   -- 
   BEST REGARDS, 
   Todd Leo 

   # The sparse matrix 
   mat # 2000x15037 SparseMatrixCSC{Float64, Int64} 

   # The two vectors, prepared in advance 
   v = mat'[:,1] 
   w = mat'[:,2] 

   # Cosine similarity function 
   function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64}, 
                              j::SparseMatrixCSC{Float64, Int64}) 
       return sum(i .* j)/sqrt(sum(i.*i)*sum(j.*j)) 
   end 

 I think you'll experience a dramatic speed gain if you write the sums as 
 explicit loops, accessing elements one by one, taking their product and 
 adding it immediately to a counter. In your current version, the 
 element-wise products allocate new vectors before computing the sums, which 
 is very costly.

 This will also get rid of the difference you report between passing arrays 
 and vectors.


 Regards

   function test1(d) 
       res = 0. 
       for i in 1:1 
           res = cosine_vectorized(d[:,1], d[:,2]) 
       end 
   end 

   function test2(_v,_w) 
       res = 0. 
       for i in 1:1 
           res = cosine_vectorized(_v, _w) 
       end 
   end 

  

  test1(dtm) 

  test2(v,w) 

  gc() 

  @time test1(dtm) 

  gc() 

  @time test2(v,w) 

  

   #elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)

   #elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)

  
 

Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Tim Holy
On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote:
 2) I understand your reasons for making all images in the Gray range, but I 
 prefer having real pixel values. That way it's easier to correlate test 
 data with something like Fiji or Matlab. And I don't understand Julia's float 
 handling fully, but there might be a gain in speed from using non-float 
 values.

They're not really float values, underneath they are integers. You can just say 
`reinterpret(Uint16, x)`.

--Tim
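Concretely, the zero-copy view Tim describes might look like the following
(a sketch; it assumes an `Ufixed16`-backed gray array from
FixedPointNumbers, as used by 0.3-era Images — not necessarily how any
particular image was loaded):

```julia
using FixedPointNumbers       # assumed available alongside Images

a = Ufixed16[0.0, 0.5, 1.0]   # "gray" values displayed in [0,1]
raw = reinterpret(Uint16, a)  # the underlying integer counts, no copy made
```

Since `reinterpret` reuses the same memory, the raw values stay in sync with
the display-friendly ones; nothing is converted.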


[julia-users] Re: Displaying a polygon mesh

2014-11-10 Thread Steven G. Johnson


On Monday, November 10, 2014 9:09:29 PM UTC-5, Simon Kornblith wrote:

 Is there an easy way to display a polygon mesh in Julia, i.e., vertices and 
 faces loaded from an STL file or created by marching tetrahedra using 
 Meshes.jl? So far, I see:


Mayavi via PyCall? 


Re: [julia-users] Displaying a polygon mesh

2014-11-10 Thread Erik Schnetter
I'm using Compose (and Color), on which Gadfly is built. I tried
Gadfly itself, but there were some inefficiencies -- I tried to
compose an image consisting of many different edges, and this many
independent graphs (I'm using the wrong terminology here) was not
handled well.

I've copy-and-pasted my plot routines at
https://gist.github.com/eschnett/a9e7f70e4910e4ba2768 to give you an
example.

circle draws a filled circle (a vertex), and line draws a line (an
edge). I'm choosing colours depending on the z coordinate. The code
isn't self-contained, but should serve as example to see how
easy/complex this approach is.

-erik


On Mon, Nov 10, 2014 at 9:09 PM, Simon Kornblith si...@simonster.com wrote:
 Is there an easy way to display a polygon mesh in Julia, i.e., vertices and
 faces loaded from an STL file or created by marching tetrahedra using
 Meshes.jl? So far, I see:

 PyPlot/matplotlib, which seems to be surprisingly difficult to convince to
 do this.
 GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very
 hard yet.)
 ihnorton's VTK bindings, which aren't registered in METADATA.jl.

 Is there another option I'm missing? If not, can I convince one of these
 packages to show my mesh with minimal time investment, or should I use a
 separate volume viewer (or maybe a Python package via PyPlot)?

 Thanks,
 Simon



-- 
Erik Schnetter schnet...@cct.lsu.edu
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm

2014-11-10 Thread Aneesh Sathe
Ah! I had misunderstood that. Thank you! :)

On Tuesday, November 11, 2014 11:19:29 AM UTC+8, Tim Holy wrote:

 On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote: 
  2) I understand your reasons for making all images in the Gray range, 
 but i 
  prefer having real pixel values. That way its easier to correlate test 
  data with something like Fiji or Matlab. And I don't understand Julia 
 float 
  handling fully but there might be a gain in speed if using non-float 
  values. 

 They're not really float values, underneath they are integers. You can 
 just say 
 `reinterpret(Uint16, x)`. 

 --Tim 



Re: [julia-users] travis for os x packages

2014-11-10 Thread Pontus Stenetorp
On 11 November 2014 10:49, Tony Kelman t...@kelman.net wrote:
 I don't want to steal Pontus Stenetorp's thunder since he did all the work,
 but there's a PR open here
 https://github.com/travis-ci/travis-build/pull/318 that will sooner or later
 add community maintained support for Julia directly in Travis as
 `language: julia`. The default .travis.yml for Julia packages can be
 simplified even further once that gets rolled out.

No worries about the thunder, let's hope they merge it soon enough and
I can make a public announcement.  Also, thank you for poking them the
other day.

Pontus


[julia-users] Elementwise operator

2014-11-10 Thread Michael Louwrens
I was looking at the Devectorize package and was wondering: why not have an 
operator that applies a whole expression elementwise?

While syntax is not something I have considered, using something basic like 
the example I see 

r = a .* b + c .* d + a


could be expressed as

r = .(a * b + c * d + a)


which would then apply the expression

a * b + c * d + a


to each element in the array.

.= could possibly be used in place of surrounding the expression with 
.(Expr).

I am not too familiar with Devectorize, but the advantage of this (from 
what I can tell from a limited read-through of the readme) is that it 
could include user functions as well, since user functions would then be 
applied element by element.

r = .(a * b + c * d + foo(a) * bar(c,d))

or

r .= a * b + c * d + foo(a) * bar(c,d)


Should theoretically be possible then.

The obvious advantage would be that memory only needs to be allocated once 
for the new array, instead of once per broadcasted operator.

Just a thought which may be stepping on Devectorize's toes, but reading 
through some of the vectorized-code issues I thought this might be a simple 
solution that could provide a performance benefit.
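What such a fused form would buy is sketched below: the whole scalar
expression runs in one loop with a single output allocation, instead of one
temporary array per `.*` and `+` (the names `a`-`d` and the function name
`fused` are just placeholders matching the example above):

```julia
# Hand-fused equivalent of  r = a .* b + c .* d + a :
# one output allocation, one pass, no intermediate arrays.
function fused(a, b, c, d)
    r = similar(a)
    for i in 1:length(a)
        r[i] = a[i]*b[i] + c[i]*d[i] + a[i]
    end
    return r
end
```

User functions drop in naturally, since each iteration is ordinary scalar
code: `r[i] = a[i]*b[i] + foo(a[i])` works without any broadcast machinery.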


[julia-users] Re: Displaying a polygon mesh

2014-11-10 Thread Alex
Winston has an experimental/undocumented function surf + some stuff around it 
(https://github.com/nolta/Winston.jl/blob/master/src/canvas3d.jl), which might 
be sufficient if you just want to have a look at your meshes.

Best,

Alex.

On Tuesday, 11 November 2014 03:09:29 UTC+1, Simon Kornblith  wrote:
 Is there an easy way to display a polygon mesh in Julia, i.e., vertices and 
 faces loaded from an STL file or created by marching tetrahedra using 
 Meshes.jl? So far, I see:
 PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do 
 this.GLPlot, which doesn't currently work for me on 0.4. (I haven't tried 
 very hard yet.)
 ihnorton's VTK bindings, which aren't registered in METADATA.jl. 
 Is there another option I'm missing? If not, can I convince one of these 
 packages to show my mesh with minimal time investment, or should I use a 
 separate volume viewer (or maybe a Python package via PyPlot)?
 
 Thanks,
 Simon


[julia-users] Re: Initialize dict of dicts with = syntax

2014-11-10 Thread Todd Leo
How to initialize an array of dicts? Is there any suggested way to do it?

julia> (Int64=>Int64)[]
Dict{Int64,Int64} with 0 entries

# And since brackets create Arrays:
julia> Any[]
0-element Array{Any,1}

# So I supposed this would generate an array of dicts, but it fails:
julia> ((Int64=>Int64)[])[]
ERROR: `getindex` has no method matching getindex(::Dict{Int64,Int64})
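For what it's worth, a sketch of two ways that should work on 0.3: the
brackets go around the element *type*, not around an already-constructed
Dict:

```julia
ds = Dict{Int64,Int64}[]                  # empty Array of Dicts
push!(ds, Dict{Int64,Int64}())            # grow it one fresh dict at a time

ds2 = [Dict{Int64,Int64}() for k in 1:3]  # array of three independent dicts
```

The comprehension form matters when you want several dicts: repeating one
constructed dict (e.g. with `fill`) would alias the same dict at every
position.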




On Sunday, May 4, 2014 5:02:14 AM UTC+8, thom lake wrote:

 One thing that I like about {} for initializing Array{Any,1} is the 
 consistency with comprehension syntax. Namely, braces for Any, brackets for 
 specific types:

 julia> typeof({i=>2i for i = 1:10})
 Dict{Any,Any}

 julia> typeof([i=>2i for i = 1:10])
 Dict{Int64,Int64}

 julia> typeof({2i for i = 1:10})
 Array{Any,1}

 julia> typeof([2i for i = 1:10])
 Array{Int64,1}




[julia-users] Re: parallel for loop in Julia

2014-11-10 Thread DrKey
Thank you for your answer.
Do you have any suggestions how to deal with that?

Am Montag, 10. November 2014 23:25:23 UTC+1 schrieb ele...@gmail.com:



 On Tuesday, November 11, 2014 5:10:30 AM UTC+11, DrKey wrote:

 Here is what i tried:
 variant1:

  forcp = zeros(3,1);

  forcp = @parallel (hcat) for partA = 1:nPart
      for partB = (partA+1):nPart
          ...
      end
      forcp = forces[:,partA];
  end

 variant2:
 function calcforces(coords,L,np,i) # with np... number of processes i... 
 current process
 for partA = i+1:np:nPart-1
 for partB = (partA+1):nPart
 ...
 return forces
 end

 np = nprocs();
 parad = Array(RemoteRef,np);

 and then calling function calcforces with: 
 for i=1:np parad[i] = @spawn LJ_Force_MT(coords,L,np,i); end
 for i=1:np forces = fetch(parad[i]); end

 both ways are giving me wrong results over more than 1 timestep


 You have multiple parallel loops modifying the forces array.  They will be 
 generating races for sure.

 Cheers
 Lex
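One common fix for exactly this kind of race (a sketch only, under 0.3-era
APIs; `nPart` and the pair-force computation are placeholders from the post
above, and the inner `...` is left unfilled on purpose) is to give each
iteration a private accumulator and let `@parallel` combine them with a
reduction, so no worker ever writes into a shared `forces` array:

```julia
forces = @parallel (+) for partA = 1:nPart
    f = zeros(3, nPart)          # private accumulator for this iteration
    for partB = (partA+1):nPart
        # ... compute the pair force fp here, then accumulate locally:
        # f[:, partA] += fp
        # f[:, partB] -= fp
    end
    f                            # last expression is what (+) reduces over
end
```

Each iteration returns its own `f`, and the `(+)` reduction sums the partial
force arrays deterministically, which is race-free by construction (at the
cost of one temporary array per iteration).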