[julia-users] Re: Performance confusions on matrix extractions in loops, and memory allocations
I tested it again with @time test2(dtm'[:,1], dtm'[:,2]) and it took only 0.013 seconds. I also checked @time test2(v,w) and it took a similar time. I changed nothing; it was odd. On Monday, November 10, 2014 3:28:10 PM UTC+8, Daniel Høegh wrote: I have made a minimal test case:

    a = rand(1,2)
    function newsum(a)
        for i in 1:100
            sum(a[:,1]) + sum(a[:,2])
        end
    end
    function newsum(a1, a2)
        for i in 1:100
            sum(a1) + sum(a2)
        end
    end
    @time newsum(a)
    @time newsum(a[:,1], a[:,2])

    elapsed time: 0.073095574 seconds (17709844 bytes allocated, 23.23% gc time)
    elapsed time: 0.006946504 seconds (244796 bytes allocated)

I suggest that a[:,1] makes a copy of the data in the a matrix. This copy is made in each iteration of the first function, but in the second function it is made only once, when the function is called as newsum(a[:,1], a[:,2]).
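To make the copy-per-iteration effect concrete, here is a small self-contained sketch of the three options: extracting inside the loop, hoisting the extraction out, and (on 0.3) using sub() to avoid copies entirely. The function names are made up for illustration.

```julia
a = rand(1000, 2)

function sum_copies(a)           # allocates two fresh column copies per iteration
    s = 0.0
    for i in 1:100
        s += sum(a[:,1]) + sum(a[:,2])
    end
    s
end

function sum_hoisted(a)          # copies each column once, outside the loop
    a1, a2 = a[:,1], a[:,2]
    s = 0.0
    for i in 1:100
        s += sum(a1) + sum(a2)
    end
    s
end

function sum_views(a)            # sub() creates views, so no column copies at all
    v1 = sub(a, 1:size(a,1), 1)
    v2 = sub(a, 1:size(a,1), 2)
    s = 0.0
    for i in 1:100
        s += sum(v1) + sum(v2)
    end
    s
end

@time sum_copies(a)
@time sum_hoisted(a)
@time sum_views(a)
```

Timing these three should reproduce the allocation gap reported above: only the first variant allocates inside the loop.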
[julia-users] Re: ANN: Compat.jl
Hello, I didn't realize NamedArrays was broken on release-0.3, because of my lack of Travis skills. I had a different 0.4 incompatibility: (Dict{K,V})(ks::AbstractArray{K},vs::AbstractArray{V}) is deprecated, use (Dict{K,V})(zip(ks,vs)) instead. Foolishly I replaced my construct Dict(keys, values) with @Compat.dict(zip(keys, values)), but that breaks on release-0.3. Is there a recommended way to solve this incompatibility? Cheers, ---david

On Saturday, October 11, 2014 8:17:38 PM UTC+2, Stefan Karpinski wrote: This announcement is primarily for Julia package developers. Since there is already some syntax breakage between Julia v0.3 and v0.4, and there will be more, it's increasingly tricky to make packages work on both versions. The Compat package https://github.com/JuliaLang/Compat.jl was just created to help: it provides compatibility constructs that will work in both versions without warnings. For example, in v0.3 you could create a dictionary like this:

    julia> [ :foo => 1, :bar => 2 ]
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 1

This still works in v0.4 but it produces a warning. The new syntax is this:

    julia> Dict(:foo => 1, :bar => 2)
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 1

However, this newer syntax won't work in v0.3, so you're a bit stuck if you want to write a dictionary literal in a way that works in both v0.3 and v0.4 without producing a warning. Compat to the rescue!:

    julia> using Compat
    julia> @Compat.Dict(:foo => 2, :bar => 2)
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 2

This works with no warning on both v0.3 and v0.4. We've intentionally not exported the Dict macro, so that usage needs to be prefixed with Compat., which will make uses of the compatibility workarounds easier to find and remove later when they're no longer necessary. 
Currently, there are only a couple of definitions in the Compat package, but if you have your own hacks that have helped make it easier to write cross-version package code, please contribute them and we can build up a nice little collection.
Re: [julia-users] no zero() for DateTime?
Basically this is an issue with DataFrames using a function in Base for a different purpose than its documented intent. zero() has been documented to mean the additive identity (http://docs.julialang.org/en/latest/stdlib/base/#Base.zero), and Date and DateTime don't have an additive identity (apart from the period types, but it is unclear which one to return). Looking at DataFrames, I discovered that they already monkey-patch Base.zeros() to make it work for strings (https://github.com/JuliaStats/DataFrames.jl/blob/211cd659cb7f9035980697f7effa081e29b9bf3e/src/dataframe/dataframe.jl#L805). I think this is a bigger issue to be discussed in the context of the use case in DataFrames. My two obvious suggestions would be to: 1. Change the documentation for zero() to say that it is the additive identity unless that doesn't make sense, in which case any default value is fine. 2. Create a new function in Base for this specific need of a default value. Ivar

On Monday, November 10, 2014 at 03:53:43 UTC+1, Jacob Quinn wrote: Hmm... I guess we could add 0 and 1 definitions if it'll be generally useful (i.e. Date/DateTimes are ordinals with numeric-like properties, so being able to define zero/one and have them work with generic functions). It still just seems a little weird because there's no real solid reasoning/meaning behind it. I think one reason a lot of other languages define a zero(::DateTime) is that values can be truthy or falsey, so you would compare a date with zero(::DateTime) to check for falseness. In Julia, you have to use explicit Booleans, so that's not as important a reason. Happy to hear other opinions/use cases from people though. -Jacob On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert thom@gmail.com wrote: To your first question, I'm sure there are good reasons for not having zeros in the Date and Time types, but in other languages (e.g., Stata), dates and times are stored as integers or floats with respect to some reference time. 
So, I *think* the 0-date in Stata refers to January 1, 1960. Obviously this is fairly arbitrary, but there is some precedent for it in other languages. On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote: What Date would represent zero(::Date)? Or one(::Date), for that matter? It doesn't seem like a particularly useful definition. What's the use case? On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert thom@gmail.com wrote: I'm using Dates.jl on 0.3 and have discovered that there is no zero defined for the Date or DateTime types. Is this intentional?
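To make the Stata analogy concrete: if one did want a Stata-style epoch, the definitions might look like the following. This is purely illustrative, not something Base or the Dates package actually provides, and (as the thread notes) the choice of epoch is arbitrary.

```julia
using Dates  # the Dates package on 0.3; part of Base from 0.4 onward

# Hypothetical definitions only: adopt Stata's convention that
# "day zero" is January 1, 1960.
Base.zero(::Type{Date}) = Date(1960, 1, 1)
Base.zero(d::Date) = zero(Date)

zero(Date)  # 1960-01-01
```

Whether such a definition belongs in a library is exactly the open question in this thread, since it is a default value rather than a true additive identity.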
Re: [julia-users] no zero() for DateTime?
Yes, the use of zero is an anachronism from a design in which zero was used to provide a default value for arbitrary types. -- John On Nov 10, 2014, at 8:22 AM, Ivar Nesje iva...@gmail.com wrote: Basically this is an issue with DataFrames using a function in base for a different purpose than its documented intent. ...
Re: [julia-users] Compressing .jld files
Has there been any progress on a (stand-alone) Blosc package for Julia? If not, I might have time to contribute, since I need a fast compressor for a project. If there is any code/start for it I'd appreciate it, though. Cheers, Robert Feldt On Tuesday, September 2, 2014 at 21:47:33 UTC+2, Douglas Bates wrote: Would it be reasonable to create a Blosc package, or is it best to incorporate it directly into the HDF5 package? If a separate package is reasonable I could start on it, as I was the one who suggested this in the first place. On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote: All these testimonials do make it sound promising. Even three-fold compression is a pretty big deal. One disadvantage of compression is that it makes mmap impossible. But since HDF5 supports hyperslabs, that's not as big a deal as it would have been. --Tim On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote: I've used Blosc in the past with great success. Oftentimes it is faster than the uncompressed version if IO is the bottleneck. The compression ratios are not great, but that is really not the point. On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote: That looks pretty sweet. It seems to avoid a lot of the pitfalls of naively compressing data files while still getting the benefits. It would be great to support that in JLD, maybe even turned on by default. On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire kevin@gmail.com wrote: Just to hype Blosc a little more, see http://www.blosc.org/blosc-in-depth.html The main feature is that data is chunked so that the compressed chunk size fits into L1 cache, and is then decompressed and used there. There are a few more buzzwords (multithreading, SIMD) in the link above. Worth exploring where this might be useful in Julia. 
Cheers, Kevin On Tuesday, September 2, 2014, Tim Holy tim@gmail.com wrote: HDF5/JLD does support compression: https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data But it's not turned on by default. Matlab uses compression by default, and I've found it's a huge bottleneck in terms of performance (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly). But perhaps there's a good middle ground. It would take someone doing a little experimentation to see what the compromises are. --Tim On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote: Now that the JLD format can handle DataFrame objects I would like to switch from storing data sets in .RData format to .jld format. Datasets stored in .RData format are compressed after they are written. The default compression is gzip; bzip2 and xz compression are also available. The compression can make a substantial difference in the file size because the data values are often highly repetitive. JLD is different in scope in that .jld files can be queried using external programs like h5ls, and the files can have new data added or existing data edited or removed. The .RData format is an archival format: once the file is written it cannot be modified in place. Given these differences I can appreciate that JLD files are not compressed. Nevertheless I think it would be useful to adopt a convention in the JLD module for accessing data from files with a .jld.xz or .jld.7z extension. It could be as simple as uncompressing the files in a temporary directory, reading, then removing them, or it could be more sophisticated. 
I notice that my versions of libjulia.so on an Ubuntu 64-bit system are linked against both libz.so and liblzma.so:

    $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
        linux-vdso.so.1 => (0x7fff5214f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f62932ee000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f62930d5000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f6292dce000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f6292bc6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f62929a8000)
        libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x7f629278c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x7f6292488000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x7f6292272000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f6291eab000)
        /lib64/ld-linux-x86-64.so.2 (0x7f62944b3000)
        liblzma.so.5
[julia-users] Available packages for compression?
For a project I need fast string compression accessible from Julia. I have found: * GZip.jl, file-based access to gzip compression: https://github.com/JuliaLang/GZip.jl * Zlib.jl, in-memory access to gzip compression: https://github.com/dcjones/Zlib.jl * There has been talk of doing a Julia package for Blosc (blosc.org) and I found this, but I'm not sure it's working: https://github.com/jakebolewski/Blosc.jl https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k If anyone knows of more/other compression packages usable from Julia, please share in this thread. This way people can get a more up-to-date view. Compression is a basic building block for a lot of different things, so it's good if we have many options in Julia. It would be very nice to have access to liblzma, xz, paq etc. long-term. If one just needs to estimate the LZ76 complexity, there is a pure Julia implementation here: https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl but it has bad performance for long strings compared to Zlib, so it is probably not very useful. Thanks, Robert Feldt
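For anyone trying the Zlib.jl option above, a minimal in-memory round trip might look like this. The two-argument compress(data, level) signature is an assumption based on how the package is used elsewhere in this thread; check the package README for the exact API.

```julia
using Zlib

# Compress a repetitive string in memory and verify the round trip.
s = "the quick brown fox jumps over the lazy dog "^100
compressed = Zlib.compress(s, 9)            # level 9 = best compression
original = bytestring(Zlib.decompress(compressed))

@assert original == s
println("compression ratio: ", length(s) / length(compressed))
```

Highly repetitive input like this compresses very well; real-world ratios will be lower.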
[julia-users] Re: Available packages for compression?
If people want to try Blosc please see this issue for how to build it on Julia 0.3.0 (at least on my Mac OS X 10.9): https://github.com/jakebolewski/Blosc.jl/issues/1 but then one can compare the Zlib and Blosc compressors:

    using Zlib
    zliblength(str) = length(Zlib.compress(str, 9, false, true))

    using Blosc
    lz4length(s)   = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4))
    lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4hc))
    bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:zlib))

    function report(name, func, input)
        tic()
        len = func(input)
        t = toq()
        @printf("%s, time = %.3e seconds, compression ratio = %.3f\n", name, t, length(input)/len)
    end

    for exponent in 1:7
        n = 10^exponent
        input = Uint8[1:n]
        strinput = string(input)
        println("\nInput of length 10^$exponent")
        # note: the "zlib in blosc" and "lz4hc" labels below appear to be
        # paired with the wrong helper functions (lz4hclength vs bzliblength)
        report("zlib ", input -> zliblength(input), input)
        report("zlib in blosc", input -> lz4hclength(input), input)
        report("lz4hc", input -> bzliblength(input), input)
        report("lz4 ", input -> lz4length(input), input)
    end

which gives output:

    Input of length 10^1
    zlib , time = 4.789e-02 seconds, compression ratio = 0.833
    zlib in blosc, time = 3.256e-02 seconds, compression ratio = 0.385
    lz4hc, time = 3.939e-03 seconds, compression ratio = 0.385
    lz4 , time = 3.482e-03 seconds, compression ratio = 0.385

    Input of length 10^2
    zlib , time = 1.211e-04 seconds, compression ratio = 0.980
    zlib in blosc, time = 1.448e-05 seconds, compression ratio = 0.862
    lz4hc, time = 3.801e-06 seconds, compression ratio = 0.862
    lz4 , time = 3.403e-06 seconds, compression ratio = 0.862

    Input of length 10^3
    zlib , time = 8.187e-05 seconds, compression ratio = 3.571
    zlib in blosc, time = 1.400e-04 seconds, compression ratio = 3.413
    lz4hc, time = 5.589e-05 seconds, compression ratio = 3.226
    lz4 , time = 1.119e-05 seconds, compression ratio = 3.413

    Input of length 10^4
    zlib , time = 1.158e-04 seconds, compression ratio = 27.473
    zlib in blosc, time = 4.732e-05 seconds, compression ratio = 30.395
    lz4hc, time = 1.107e-04 seconds, compression ratio = 25.381
    lz4 , time = 6.572e-06 seconds, compression ratio = 30.395

    Input of length 10^5
    zlib , time = 7.319e-04 seconds, compression ratio = 140.252
    zlib in blosc, time = 2.058e-04 seconds, compression ratio = 146.628
    lz4hc, time = 6.519e-04 seconds, compression ratio = 134.590
    lz4 , time = 2.368e-05 seconds, compression ratio = 146.628

    Input of length 10^6
    zlib , time = 4.517e-03 seconds, compression ratio = 238.095
    zlib in blosc, time = 2.291e-04 seconds, compression ratio = 237.473
    lz4hc, time = 4.493e-03 seconds, compression ratio = 236.407
    lz4 , time = 6.989e-04 seconds, compression ratio = 198.807

    Input of length 10^7
    zlib , time = 4.499e-02 seconds, compression ratio = 255.669
    zlib in blosc, time = 3.146e-02 seconds, compression ratio = 246.299
    lz4hc, time = 1.749e-02 seconds, compression ratio = 247.078
    lz4 , time = 5.670e-03 seconds, compression ratio = 200.489

It seems that lz4hc compression in Blosc is sometimes quite a bit faster, but not always. The compression ratio is good. lz4 is always faster than the others but sometimes compresses a bit less. For strings shorter than ~350 characters there is not always any compression of the input. Note that the string being compressed here is very regular, so this evaluation is not very good and may be misleading about the compression ratios to expect. This is just a very rough indication. Cheers, Robert On Monday, November 10, 2014 at 09:49:54 UTC+1, Robert Feldt wrote: For a project I need fast string compression accessible from Julia. 
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Sunday, November 9, 2014 at 23:50 +0000, John Myles White wrote: FWIW, I think the best way to move forward with NamedArrays is to replace NamedArrays with a parametric type Named{T} that wraps around other AbstractArray types. That gives you both named Array and named DataArray objects for the same cost. Yeah, looks like a good idea. Duplicating the code for each array type would be a waste. Regards On Nov 9, 2014, at 5:49 PM, Tim Holy tim.h...@gmail.com wrote: Indeed, better to use a Dict if you're naming each row/column. I'd forgotten that was part of NamedArrays. --Tim On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 10:54 -0600, Tim Holy wrote: With regards to arrays with named dimensions, I suspect that with the arrival of stagedfunctions, something like NamedAxesArrays (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But stagedfunctions still have some show-stopper bugs, and we need to fix those first. Interesting package! But when I said named dimensions, I actually meant that dimensions have names, but that elements on each dimension (rows, columns...) have names too. I'm not sure it also makes sense to use staged functions to specialize code on element names, since they can vary much more than dimension names. This could generate quite a lot of methods which would use memory even if only used once. Regards On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 07:52 -0800, David van Leeuwen wrote: I would vote for calling such a function `table()`, to get even closer to R's table(). Well, that's the debate at https://github.com/JuliaStats/StatsBase.jl/issues/32 At first I was in favor of table() too, but now I prefer freqtable(), because table could mean any kind of cross-tabulation. I think NamedArray could even be called Table. And I can't wait for such functionality to be included in METADATA... 
Actually I didn't do it because NamedArrays.jl didn't work well on 0.3 when I first worked on the package. Now I see the tests are still failing. Do you know what is needed to make them work? Another point is that I think this deserves going into StatsBase, but before that we need everybody to agree on a design for NamedArrays. Regards On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote: On Thursday, November 6, 2014 at 11:17 -0800, Conrad Stack wrote: I was also looking for a function like this, but could not find one in docs.julialang.org. I was doing this (v0.4.0-dev), for anyone who is interested:

    example = rand(1:10, 100)
    uexample = sort(unique(example))
    counts = map(x -> count(y -> x == y, example), uexample)

It's pretty ugly, so thanks, Johan, for pointing out StatsBase's countmap. I've also put together a small package precisely aimed at offering an equivalent of R's table(): https://github.com/nalimilan/Tables.jl But there's a more general issue about how to handle arrays with dimension names in Julia. NamedArrays.jl (which is used in my package) attempts to tackle this issue, but I don't think we've reached a consensus yet about the best solution. Regards On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote: I think countmap comes closest to giving you what you want:

    using StatsBase
    data = sample(["a", "b", "c"], 20)
    countmap(data)
    Dict{ASCIIString,Int64} with 3 entries:
      "c" => 3
      "b" => 10
      "a" => 7

On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote: Hi, I'm looking for the best way to count how many times a certain value x_i appears in a vector x, where x could contain integers, floats, or strings. In R I would do table(x). I found StatsBase.counts(x,k) but I'm a bit confused by k (where k goes into 1:k, i.e. the vector is scanned to find how many elements fall at each point of 1:k). Most of the time I don't know k, and in fact I would do table(x) just to find out what k was. 
Apart from that, I don't think I could use this with strings, as I can't construct a range object from strings. I'm wondering whether a method StatsBase.counts(x::Vector) that just returns the frequency of each element would be useful. The same applies to Base.hist, if I understand correctly. I just don't want to have to
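The counts-without-k behavior Florian asks for takes only a few lines with a Dict, which also handles strings for free. The name freqs is made up for illustration; countmap in StatsBase does essentially this.

```julia
# Count occurrences of each distinct element, with no need to know k
# (the number of distinct values) in advance.
function freqs{T}(x::AbstractVector{T})
    d = Dict{T,Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    d
end

freqs(["a", "b", "a", "c", "a"])   # Dict with "a" => 3, "b" => 1, "c" => 1
```

Because the result is keyed by the values themselves, the distinct values (R's table() "names") come out as the Dict's keys.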
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This could have a substantial impact on performance. In the beginning I experimented with indexing speed, mainly to sort out the various forms of getindex(), and although I don't remember the exact result, I do remember that I found the drop in performance w.r.t. integer indexing surprisingly small. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. 2. Currently, it only accepts a limited set of types for indices, e.g. Real and String. But in some cases, people may go beyond this. I don't think we have to impose this limit. Ah---I now see what you mean. I thought I had built in support for all types as indices, but there obviously is no catch-all rule in getindex. I suppose NamedArray needs an update there. I think the last time I looked into this, it was a problem even for efficiently indexing AbstractArrays: https://github.com/JuliaLang/julia/pull/4892#issuecomment-31087910 Slow catch-all methods are good, but if we want specialized versions it will probably need more work. If you want to accept combinations of Int/String/Complement{T}/anything, the number of specialized methods to generate explodes. I think the conclusion was that we needed to wait for staged functions. Since they are implemented now, it may be a good time to look into this issue for both AbstractArrays and NamedArrays. 
Regards On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin wrote: I have been observing an interesting difference between people coming from stats and machine learning. Stats people tend to favor the approach that allows one to directly use the category names to index the table, e.g. A["apple"]. This tendency is clearly reflected in the design of R, where one can attach a name to everything. In machine learning practice, by contrast, it is a common convention to just encode categories as integers and simply use an ordinary array to represent a counting table. Whereas this makes things a little inconvenient in an interactive environment, it is generally more efficient when you have to deal with these categories over a large number of samples. These differences aside, I believe, however, that there exists a very generic approach to this problem -- a multi-dimensional associative map, which allows one to write A[i1, i2, ...] where the indices can be arbitrary hashable, equality-comparable instances, including integers, strings, and symbols, among many other things. A multi-dimensional associative map can be considered a multi-dimensional generalization of dictionaries, and can be easily implemented via a multidimensional array and several dictionaries, one per dimension, to map user-side indexes to integer indexes. - Dahua On Monday, November 10, 2014 8:12:54 AM UTC+8, David van Leeuwen wrote: Hi, On Sunday, November 9, 2014 5:10:19 PM UTC+1, Milan Bouchet-Valat wrote: Actually I didn't do it because NamedArrays.jl didn't work well on 0.3 when I first worked on the package. Now I see the tests are still failing. Do you know what is needed to make them work?
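Dahua's "array plus one Dict per dimension" idea can be sketched in a few lines for the 2-D case. All names here are made up for illustration; this is not NamedArrays' actual design.

```julia
# A toy 2-D associative map: an ordinary matrix for storage plus one
# Dict per dimension translating user-side keys into integer indexes.
type AssocMap2D{K1,K2,V}
    data::Matrix{V}
    d1::Dict{K1,Int}
    d2::Dict{K2,Int}
end

# Map each key to its position, 1..length(ks).
function index_dict{K}(ks::Vector{K})
    d = Dict{K,Int}()
    for (i, k) in enumerate(ks)
        d[k] = i
    end
    d
end

AssocMap2D{K1,K2,V}(ks1::Vector{K1}, ks2::Vector{K2}, ::Type{V}) =
    AssocMap2D(zeros(V, length(ks1), length(ks2)),
               index_dict(ks1), index_dict(ks2))

Base.getindex(m::AssocMap2D, k1, k2) = m.data[m.d1[k1], m.d2[k2]]
Base.setindex!(m::AssocMap2D, v, k1, k2) = (m.data[m.d1[k1], m.d2[k2]] = v)

# usage: count pairs by name rather than by integer index
counts = AssocMap2D(["apple", "pear"], ["red", "green"], Int)
counts["apple", "red"] += 1
```

The key lookups cost one Dict access per dimension, while the bulk storage stays a plain array, which is the efficiency point Dahua raises.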
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
All good plans. (I'm not sure about using 65536 bins for 16-bit images, though, because that would be more bins than there are pixels in some images. Still, it's not all that much memory, really, so maybe that would be OK.) It would be great to add native support. Presumably you've found the docs on adding support for new formats. For formats that encode large datasets in a single block (like NRRD), you can work with GB-sized datasets on a laptop because you can use mmap (I do it routinely). But the love of TIFF does demand an alternative solution. Presumably we should add a lower-level routine that returns a structure that facilitates later access, e.g.,

    imds = imdataset(my_image_file)
    img = imds["z", 14, "t", 7]

or somesuch. --Tim On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: Tim, I would like imhist to be idiot-proof. (I've been teaching Matlab and nothing puts new people off more than things not being idiot-proof.) Things like using 256 bins by default, and returning a plot if no outputs are specified (basically make it like Matlab's imthresh()). Btw, in Matlab using Bio-Formats is actually the slowest part of my algorithm, so unless it can be faster in Julia, native support might be nicer. Bio-Formats also fails in that it reads the whole sequence at once... so running things on laptops with even GB-level datasets is impossible. I wrote my own version of bfopen to only open the required XYZCT for a specified series, but that only solves the memory usage. The source format for my image was .mvd2 (Perkin Elmer spinning disk). I know about JavaCall.jl, I just haven't had the time to play with it... I was thinking it might be fun to attempt native support for a few formats. I can also generate test data in a few vendor formats for a few microscopes. Perhaps even make it a julia-box based project. 
;) On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: Yes, Images does read it okay, but only if I cut out the substack. If I don't, then it interprets the three channels as a time dimension, which isn't a pain at the moment but will be if I start using it for work. Hmm, that sounds like an annotation problem. I realized that both the convert and the g[:] would slow me down, but the hist function just wouldn't work without that kind of dance. Also, graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) uses reshape to make it all one image, which might also add to speed. The pull request is well and good, but personally I would rather have a dedicated image histogram function like imhist (http://www.mathworks.com/help/images/ref/imhist.html), which would give histograms based on input images. To me that's the only way to make life easier. Maybe I'll write one :) imhist is necessary in Matlab largely because hist works columnwise; in a sense, Julia's `hist` is like imhist. Is there some specific functionality you're interested in? There's no reason Images can't provide a custom version of `hist`. Something about Images: do you think it is possible to use the Bio-Formats .jar file to import images from a microscope format into Images? Opening a microscope-format image file in the relevant software and then exporting it as TIFF takes too long, and I'd rather be able to access the images directly. Yes, expansion of Images' I/O capabilities would be great. I've wondered about Bio-Formats myself, but have not had a direct need, nor do I know Java (but see JavaCall.jl, if you haven't already). The other way to go, of course, is Julia native support. Our support for NRRD is a reasonable model of this approach. However, the reason we use ImageMagick is that there are a lot of formats out there; Bio-Formats would fill a similar need for vendor-specific file formats. 
Out of curiosity, what's the original format you're using? --Tim
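Since the thread title is about Otsu's method itself, here is a rough self-contained sketch of the thresholding step over a 256-bin histogram, separate from the I/O questions above. The function name, the binning of [0,1] intensities into 256 bins, and the mapping back to the intensity scale are all assumptions for illustration.

```julia
# Otsu's method: pick the threshold that maximizes the between-class
# variance w_b * w_f * (mu_b - mu_f)^2 over all histogram bins.
function otsu_threshold(img::AbstractArray{Float64})
    nbins = 256
    counts = zeros(Int, nbins)
    for v in img                       # histogram of intensities in [0,1]
        b = min(nbins, int(floor(v * nbins)) + 1)
        counts[b] += 1
    end
    total = length(img)
    sumall = 0.0
    for b in 1:nbins
        sumall += b * counts[b]
    end
    wb = 0; sumb = 0.0; best = 0.0; thresh = 1
    for b in 1:nbins
        wb += counts[b];   wb == 0 && continue   # background weight
        wf = total - wb;   wf == 0 && break      # foreground weight
        sumb += b * counts[b]
        mb = sumb / wb                           # background mean (in bins)
        mf = (sumall - sumb) / wf                # foreground mean (in bins)
        between = wb * wf * (mb - mf)^2
        if between > best
            best = between; thresh = b
        end
    end
    (thresh - 0.5) / nbins             # map the bin back to [0,1] intensity
end
```

A fixed 256-bin histogram sidesteps the convert/g[:] dance discussed above, since the image is traversed exactly once.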
[julia-users] Re: ANN: Compat.jl
Hi David, shouldn't it be @Compat Dict(zip(keys, values)) instead of @Compat.dict(zip(keys, values)), i.e. a space between Compat and Dict rather than a dot method call? Best, Nils
Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations
On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote: Hi fellows, I'm currently working on sparse matrices and cosine similarity computation, but my routines are running very slowly, at least slower than I expected. So I wrote some test functions to dig out the reason for the inefficiency. To my surprise, the execution times of passing two vectors to the test function and passing the whole sparse matrix differ greatly; the latter is 80x faster. I am wondering why extracting two vectors of the matrix in each loop is dramatically faster, and how to avoid the multi-GB memory allocation. Thanks guys. -- BEST REGARDS, Todd Leo

    # The sparse matrix mat
    # 2000x15037 SparseMatrixCSC{Float64, Int64}

    # The two vectors, prepared in advance
    v = mat'[:,1]
    w = mat'[:,2]

    # Cosine similarity function
    function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
                               j::SparseMatrixCSC{Float64, Int64})
        return sum(i .* j) / sqrt(sum(i .* i) * sum(j .* j))
    end

I think you'll experience a dramatic speed gain if you write the sums as explicit loops, accessing elements one by one, taking their product and adding it immediately to an accumulator. In your current version, the element-wise products allocate new vectors before computing the sums, which is very costly. This will also get rid of the difference you report between passing arrays and vectors. Regards

    function test1(d)
        res = 0.
        for i in 1:1
            res = cosine_vectorized(d[:,1], d[:,2])
        end
    end

    function test2(_v, _w)
        res = 0.
        for i in 1:1
            res = cosine_vectorized(_v, _w)
        end
    end

    test1(dtm)
    test2(v,w)
    gc()
    @time test1(dtm)
    gc()
    @time test2(v,w)
    # elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
    # elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
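Milan's explicit-loop suggestion might look like the following for two columns of a SparseMatrixCSC, walking the stored entries of both columns in a single merge pass so that no temporary vectors are allocated. The direct use of the colptr/rowval/nzval fields follows the 0.3-era internal layout, which is an implementation detail.

```julia
# Cosine similarity of columns j1 and j2 of a sparse matrix, computed
# with explicit loops over the stored (nonzero) entries only.
function cosine_cols(m::SparseMatrixCSC{Float64,Int}, j1::Int, j2::Int)
    rv, nz, cp = m.rowval, m.nzval, m.colptr
    i1, e1 = cp[j1], cp[j1+1] - 1      # entry range of column j1
    i2, e2 = cp[j2], cp[j2+1] - 1      # entry range of column j2
    dotp = 0.0; n1 = 0.0; n2 = 0.0
    while i1 <= e1 && i2 <= e2         # merge the two sorted row lists
        r1, r2 = rv[i1], rv[i2]
        if r1 == r2
            dotp += nz[i1] * nz[i2]
            n1 += nz[i1]^2; n2 += nz[i2]^2
            i1 += 1; i2 += 1
        elseif r1 < r2
            n1 += nz[i1]^2; i1 += 1
        else
            n2 += nz[i2]^2; i2 += 1
        end
    end
    while i1 <= e1; n1 += nz[i1]^2; i1 += 1; end   # leftover entries
    while i2 <= e2; n2 += nz[i2]^2; i2 += 1; end
    dotp / sqrt(n1 * n2)
end
```

Besides avoiding the temporaries from `i .* j`, this only touches the stored entries, so the cost scales with the number of nonzeros rather than the full column length.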
[julia-users] Silhouette width
Hi all, I am new to Julia. I searched a bit but I did not find anything related to the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) .. Do you know if there is something about it? Thanks, Francesco
[julia-users] Input arguments to gemm!
Hi, I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? -- Kapil
Re: [julia-users] Reinterpreting parts of a byte array
Thanks for the responses. As usual, I discover myself making assumptions that may not have been stated well. 1. I'll be reading small bits (32 bit ints, mostly) at fairly random addresses and was worried about the overhead of creating array views for such small objects. Perhaps they are optimized away. I should check :-) 2. I've been taught by other languages that touching raw pointers is dangerous without also holding some promise that they won't be relocated, e.g. by a copying collector, etc. I suppose if it's a memory mapped array, I can roughly cheat and know that the OS won't move it, so Julia can't either. But it worried me. *Sebastian Good* On Sun, Nov 9, 2014 at 11:36 PM, Jameson Nash vtjn...@gmail.com wrote: It rather depends upon what you know about the data. If you want a file-like abstraction, it may be possible to wrap it in an IOBuffer type (if not, it should be parameterized to allow it). If you want an array-like abstraction, then I think reinterpreting to different array types may be the most direct approach. If the array is coming from C, then you can use unsafe_load/unsafe_store directly. As Ivar points out, this is not more nor less dangerous than the same operation in C. Although, if you wrap the data buffer in a Julia object (or got it from a Julia call), you can gain some element of protection against memory corruption bugs by minimizing the amount of julia code that is directly interfacing with the raw memory pointer. On Sun Nov 09 2014 at 5:42:42 PM Ivar Nesje iva...@gmail.com wrote: Is there any problem with reinterpreting the array and then use a SubArray or ArrayView to do the index transformation? Pointer arithmetic is not more or less dangerous in Julia, than what it is in C. The only thing you need to ensure is that the object you have a pointer to is referenced by something the GC traverses, and that it isn't moved in memory (Eg. vector resize).
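The reinterpret route discussed above can be sketched like this (modern spelling `UInt8` — the Julia 0.3 of this thread wrote `Uint8` — and little-endian byte order is assumed, as on x86):

```julia
# A byte buffer as it might come from a memory-mapped file.
bytes = UInt8[0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00]

# reinterpret gives a typed view of the same memory, no copy involved;
# element k starts at byte offset 4*(k-1), so reads must be element-aligned.
ints = reinterpret(Int32, bytes)

ints[1], ints[2]   # (1, 2) on a little-endian machine
```

A SubArray/view over `ints` then handles the index arithmetic for random access, as Ivar suggests.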
Re: [julia-users] Silhouette width
Check out the Clustering.jl package which has an interface for silhouette. Specifically, see this file: https://github.com/JuliaStats/Clustering.jl/blob/master/src/silhouette.jl -Jacob On Mon, Nov 10, 2014 at 5:53 AM, Francesco Brundu francesco.bru...@gmail.com wrote: Hi all, I am new to Julia. I searched a bit but I did not find anything related to the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) .. Do you know if there is something about it? Thanks, Francesco
[julia-users] Re: Input arguments to gemm!
On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote: I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? Yes. (Or rather, the StridedFoo types are a superset, including various 1d/2d array types.)
Re: [julia-users] Re: Input arguments to gemm!
E.g.

julia> A = randn(3,4); B = randn(4,3); C = Array(Float64,3,3);

julia> BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)
3x3 Array{Float64,2}:
 -1.39617  4.02968   -1.2171
 -2.35074  2.60903    0.216789
  1.63807  0.102948  -0.41358

2014-11-10 9:09 GMT-05:00 Steven G. Johnson stevenj@gmail.com: On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote: I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? Yes. (Or rather, the StridedFoo types are a superset, including various 1d/2d array types.)
[julia-users] Re: Great new expository article about Julia by the core developers
So how does one go about getting an invitation to JuliaBox? It's referenced in the article but you need an invitation to log in. Dave. On Saturday, 8 November 2014 22:58:31 UTC, Peter Simon wrote: Just found this great new highly accessible exposition of the Julia language: http://arxiv.org/pdf/1411.1607v1.pdf, by Jeff et al. It's the perfect intro to share with many of my not-yet-Julian colleagues. --Peter
Re: [julia-users] Re: strange speed reduction when using external function in inner loop
David, Not sure this is correct or helps, but on my Yosemite 10.10.1 MacBook Pro I get the results below. Regards, Rob

julia> @time prof(true)
Count  File                                         Function         Line
  47   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 15
 165   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 502   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
  98   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
  64   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 29
   5   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  20   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  45   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 883   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               45
 884   REPL.jl                                      eval_user_input    54
 502   array.jl                                     +                 719
 165   random.jl                                    rand!             130
 884   task.jl                                      anonymous          96
elapsed time: 1.51332406 seconds (488212276 bytes allocated, 53.00% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 156   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 577   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 21
 116   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 26
  53   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  10   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  43   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   3   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 910   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 910   REPL.jl                                      eval_user_input    54
 577   array.jl                                     +                 719
 156   random.jl                                    rand!             130
 910   task.jl                                      anonymous          96
elapsed time: 1.488157718 seconds (488208960 bytes allocated, 50.96% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 174   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 545   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
 115   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 26
  46   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 29
   8   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  18   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  28   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   3   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 894   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 894   REPL.jl                                      eval_user_input    54
 545   array.jl                                     +                 719
 174   random.jl                                    rand!             130
 894   task.jl                                      anonymous          96
elapsed time: 1.448621207 seconds (488206436 bytes allocated, 49.75% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 165   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 584   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
 117   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
  51   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   5   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  16   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  34   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
 922   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 922   REPL.jl                                      eval_user_input    54
 584   array.jl                                     +                 719
 165   random.jl                                    rand!             130
 922
[julia-users] travis for os x packages
I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
Re: [julia-users] Translating Class-Based OO Apps to Julia
On Thursday, January 17, 2013 2:56:52 AM UTC+1, Stefan Karpinski wrote: ... This definitely should go in an object-oriented programming in Julia document. Does a document like this exist? It would definitely be useful.
[julia-users] parallel for loop in Julia
I'm a beginner at using Julia and I have written a simple molecular dynamics simulation, which works quite well and fast. Now I'm trying to parallelize my core loop, which calculates the forces between each pair of particles. My loop is:

for partA = 1:nParts-1
    for partB = (partA+1):nParts
        # Calculate particle-particle distance
        dr = coords[:,partA] - coords[:,partB];
        dr2 = dot(dr,dr)
        invDr2 = 1.0/dr2;
        invDr6 = invDr2^3;
        tforce = invDr2^4 * (invDr6 - 0.5);
        forces[:,partA] = forces[:,partA] + dr*tforce;
        forces[:,partB] = forces[:,partB] - dr*tforce;
    end
end

coords is an array holding the 3-dimensional coordinates for each particle, nParts is the number of particles, and forces has the same size as coords and holds the forces for each particle. I tried @parallel for with different reduction operators (I found + and vcat, of course with changing my loop a little bit), which are not documented very well. At least I only found examples for (+) in the help. What is the best way to parallelize this?
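A side note on the loop above (a sketch, not advice from the thread): before parallelizing, the serial version can be made much cheaper by working on scalar components, since `coords[:,partA] - coords[:,partB]` allocates a fresh 3-vector for every pair. `forces!` is a name invented for this sketch:

```julia
# Allocation-free serial force loop; same Lennard-Jones-style kernel as in
# the post, but with dr unrolled into dx, dy, dz scalars.
function forces!(forces::Matrix{Float64}, coords::Matrix{Float64})
    nParts = size(coords, 2)
    fill!(forces, 0.0)
    for partA in 1:nParts-1, partB in partA+1:nParts
        dx = coords[1,partA] - coords[1,partB]
        dy = coords[2,partA] - coords[2,partB]
        dz = coords[3,partA] - coords[3,partB]
        dr2 = dx*dx + dy*dy + dz*dz
        invDr2 = 1.0 / dr2
        invDr6 = invDr2^3
        tforce = invDr2^4 * (invDr6 - 0.5)
        forces[1,partA] += dx*tforce; forces[1,partB] -= dx*tforce
        forces[2,partA] += dy*tforce; forces[2,partB] -= dy*tforce
        forces[3,partA] += dz*tforce; forces[3,partB] -= dz*tforce
    end
    return forces
end
```

For two particles a unit distance apart on the x axis, this fills in equal and opposite x-forces of magnitude 0.5 and zero y/z components.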
[julia-users] Error in PyPlot; cm_get_cmap not defined
I'm using PyPlot to make 3D plots, which I color by getting color maps through ColorMap(::String). After running a Pkg.update() today, I am now getting an error message when trying to construct a 3D plot, saying cm_get_cmap not defined (...) at Plots.jl:141. Indeed, when checking colormaps.jl https://github.com/stevengj/PyPlot.jl/blob/master/src/colormaps.jl, I find that ColorMap should lead to a call to get_cmap, not cm_get_cmap. Why is my PyPlot trying to get the color maps through a different function?
[julia-users] Absolute value of big(-0.0)
I'm getting (notice the negative sign): abs(big(-0.0)) = -0e+00 with 256 bits of precision. I think it would be better to have abs(big(-0.0)) return 0e+00 (for example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is ifelse(x < 0, -x, x), and -0 is not less than 0.
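A minimal sketch of the abs(::BigFloat) method suggested above (not the actual patch that went into Julia; `my_abs` is a made-up name to avoid redefining Base.abs):

```julia
# signbit, unlike `x < 0`, is true for -0.0, so negating when the sign
# bit is set clears the sign of a negative zero as well.
my_abs(x::BigFloat) = signbit(x) ? -x : x

my_abs(big(-0.0))   # a positive zero, printed without the minus sign
```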
Re: [julia-users] Absolute value of big(-0.0)
Done: https://github.com/JuliaLang/julia/issues/8968 On Monday, November 10, 2014 12:06:31 PM UTC-5, Stefan Karpinski wrote: This is indeed a bug – could you open an issue? https://github.com/JuliaLang/julia/issues On Mon, Nov 10, 2014 at 5:55 PM, Samuel S. Watson samuel@gmail.com wrote: I'm getting (notice the negative sign): abs(big(-0.0)) = -0e+00 with 256 bits of precision. I think it would be better to have abs(big(-0.0)) return 0e+00 (for example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is ifelse(x < 0, -x, x), and -0 is not less than 0.
[julia-users] Re: Error in PyPlot; cm_get_cmap not defined
Should be fixed now, sorry.
[julia-users] Re: defining function for lt for use in sort - simple question
Thank you, that's helpful. I re-entered it all in a fresh session and found it working as well - I'll try to find the difference which caused it not to work and come back. Kind Regards, John. On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote: This code works everywhere I'm able to try it. On Sunday, November 9, 2014 at 03:18:13 UTC+1, John Drummond wrote: I was originally on julia 0.3.1 on Windows 7; this is on Mac OS X 10, julia 0.3.2. I loaded the file LogParse.jl below and then in the REPL ran:

reload("LogParse.jl")
methods(isless)
ary1 = LogParse.DayPriceText[]
push!(ary1, LogParse.DayPriceText(4, "a1", 1))
push!(ary1, LogParse.DayPriceText(2, "a1", 1))
push!(ary1, LogParse.DayPriceText(6, "a1", 1))
sort(ary1)
sort(ary1, lt=LogParse.isless)

I get the same messages - methods(isless) shows that it's loaded but the sort can't find it, even when I try to specify the function.

# in file LogParse.jl
module LogParse

export DayPriceText
import Base.isless

type DayPriceText
    a1::Uint32
    b1::ASCIIString
    a2::Uint32
end

function isless(a::DayPriceText, b::DayPriceText)
    if a.a1 < b.a1
        return true
    else
        return false
    end
end

end

Many thanks. Kind regards, John On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote: In this case it would be really great if you had a minimal reproducible example. It looks to me as if you are doing everything right, so I would start looking for typos and scoping issues. It's hard to find them without looking at the code. Ideally the example should be small and possible to paste into a REPL session, but if you can publish your code and don't want to extract only the relevant part, that might be fine too. Julia version and operating system are also nice to include, so that we have them available in case we have problems reproducing your results. Regards Ivar On Friday, November 7, 2014 at 20:14:48 UTC+1, John Drummond wrote: Hi, I suspect I'm doing something stupid but I have no idea what I'm missing. I create a module. I create a type in it, DayPriceText. I import Base.isless. I define isless for the type. Now in the REPL I get:

methods(isless)
# 25 methods for generic function isless:
..
isless(x::DayPriceText,y::DayPriceText) at c:\works\juliaplay\LogParse.jl:16

but

julia> typeof(a1p)
Array{DayPriceText,1}

julia> sort(a1p, lt=CILogParse.isless)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

julia> sort(a1p)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

I'm sure there's some obvious answer, but I have no idea what it is. Thanks for any help. Kind regards, John.
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
Hello, On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This would potentially have a substantial impact on performance. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. I've been pondering how this could be done. NamedArray has a type parameter N, and it should then further have N type parameters indicating the dictionary type along each of the N dimensions. So I figure this is going to be a challenging type definition. ---david
[julia-users] Re: parallel for loop in Julia
Here is what I tried. Variant 1:

forcp = zeros(3,1);
forcp = @parallel (hcat) for partA = 1:nPart
    for partB = (partA+1):nPart
        ...
    end
    forcp = forces[:,partA];
end

Variant 2:

function calcforces(coords, L, np, i)
    # np ... number of processes, i ... current process
    for partA = i+1:np:nPart-1
        for partB = (partA+1):nPart
            ...
    return forces
end

np = nprocs();
parad = Array(RemoteRef, np);

and then calling function calcforces with:

for i = 1:np
    parad[i] = @spawn LJ_Force_MT(coords, L, np, i);
end
for i = 1:np
    forces = fetch(parad[i]);
end

Both ways are giving me wrong results over more than 1 timestep.
[julia-users] Re: ANN: Compat.jl
Hi Nils, My current workaround is:

## temporary compatibility hack
if VERSION < v"0.4.0-dev"
    Base.Dict(z::Base.Zip2) = Dict(z.a, z.b)
end

On Monday, November 10, 2014 12:04:14 PM UTC+1, Nils Gudat wrote: Hi David, shouldn't it be @Compat Dict(zip(keys, values)) instead of @Compat.Dict(zip(keys, values)), i.e. a space between Compat and Dict rather than a dot method call? I was just following Stefan's syntax. The dots on my screen are about as big as the stuck pieces of dust, but I really believe there is a period there.

julia> @Compat.Dict(:foo => 2, :bar => 2)
Dict{Symbol,Int64} with 2 entries:
  :bar => 2
  :foo => 2

Macro programming is beyond the scope of my brain, anyway... ---david Best, Nils
[julia-users] Re: ANN: Compat.jl
On Monday, November 10, 2014 1:15:40 PM UTC-5, David van Leeuwen wrote: I was just following Stefan's syntax. The dots on my screen are about as big as the stuck pieces of dust, but I really believe there is a period there. The syntax in Compat.jl changed shortly after its release. The new syntax is to use: @compat ...Julia 0.4 syntax and have it be automatically translated into older syntax as needed. If there is a case where this does not work, please file an issue.
[julia-users] JuliaBox
Hi, Does anyone know if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the arXiv paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account; it seems it's invite-only for now. Thanks, Dave.
Re: [julia-users] Compressing .jld files
On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote: It would be best to incorporate it into the HDF5 package. A julia package would be useful if you wanted to do the same sort of compression on Julia binary blobs, such as serialized julia values in an IOBuffer. Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. Note that HDF5 has a Blosc filter (http://www.hdfgroup.org/services/filters.html#blosc and https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can use Blosc internally in the HDF5 file while still allowing HDF5 tools to work with the file.
Re: [julia-users] Contributing to a Julia Package
Hi Tim, you have to create a fork on Github and then push your new branch to your personal fork. Then, on Github, switch to that fork and the interface will show a Pull request button if your personal fork is ahead of the upstream repository. Best -- João Felipe Santos On Mon, Nov 10, 2014 at 2:17 PM, Tim Wheeler timwheeleronl...@gmail.com wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] JuliaBox
Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithiohuig...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
Re: [julia-users] JuliaBox
Thanks Ivar. 5 people Shashi, all academics so I'd like to get them interested. Dave. On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote: Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
Re: [julia-users] JuliaBox
On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashigowd...@gmail.com wrote: Just do not publish it online. Oops I meant to send it to David directly. If anyone else wants a code, please let me know.
Re: [julia-users] JuliaBox
Sure :) Happy to let them in. On Tue, Nov 11, 2014 at 1:02 AM, David Higgins daithiohuig...@gmail.com wrote: Thanks Ivar. 5 people Shashi, all academics so I'd like to get them interested. Dave. On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote: Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
[julia-users] Re: JuliaBox
Hi Shashi, I would like a code too. Thanks in advance, Pablo
[julia-users] Re: Contributing to a Julia Package
Thank you! It seems to have worked. Per João's suggestions, I had to: - Create a fork on Github of the target package repository - Clone my fork locally - Create a branch on my local repository - Add, commit, push my changes to said branch - On github I could then submit the pull request from my forked repo to the upstream master On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] JuliaBox
On Monday, 10 November 2014 19:33:09 UTC, Shashi Gowda wrote: On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashi...@gmail.com wrote: Just do not publish it online. Oops I meant to send it to David directly. If anyone else wants a code, please let me know. I did wonder about this bit :P Thank you very much in any case. Dave
Re: [julia-users] travis for os x packages
Yep. Essentially, you'll need to enable the osx build environment http://docs.travis-ci.com/user/osx-ci-environment/. It looks like Travis is not accepting http://docs.travis-ci.com/user/multi-os/ more multi-os requests at the moment, so the typical approach, (used on, for instance, the main julia repository https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't work. You may not be able to get it to run on multiple OS'es, but you should be able to get it to run on OSX only by setting the language to objective-c. This will get it to run on OSX only, then you can use the default .travis.yml file https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155 that is generated by Pkg. In short, you should be able to take that default file, change the language to objective-c, remove the os block, and call it good. Save that as .travis.yml in your repo, enable Travis in your repository's services section, and test away! -E On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simonby...@gmail.com wrote: I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
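Putting the recipe above together, the resulting .travis.yml would look roughly like this — a minimal sketch, not a tested configuration; the Julia install step and the package name are placeholders, since the thread does not spell them out:

```yaml
# Declaring objective-c as the language forces Travis onto OS X workers,
# so no os: block (multi-os) is needed.
language: objective-c
install:
  - echo "install Julia here, e.g. from the official OS X binaries (step not specified in the thread)"
script:
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPackage")'  # "MyPackage" is a placeholder name
```

Save this as .travis.yml at the root of the repository and enable Travis in the repository's services section, as described above.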
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 12:55:24 PM UTC-6, Steven G. Johnson wrote: On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote: It would be best to incorporate it into the HDF5 package. A julia package would be useful if you wanted to do the same sort of compression on Julia binary blobs, such as serialized julia values in an IOBuffer. Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? Note that HDF5 has a Blosc filter ( http://www.hdfgroup.org/services/filters.html#blosc and https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can use Blosc internally in the HDF5 file while still allowing HDF5 tools to work with the file.
[julia-users] Re: Contributing to a Julia Package
Another important point (for actively developed packages) is that Pkg.add() checks out the commit of the latest released version registered in METADATA.jl. Most packages do development on the master branch, so you should likely base your changes on master, rather than the latest released version. To do this, you can use `Pkg.checkout()`, but `git checkout master` will also work. Ivar kl. 21:07:49 UTC+1 mandag 10. november 2014 skrev Tim Wheeler følgende: Thank you! It seems to have worked. Per João's suggestions, I had to: - Create a fork on Github of the target package repository - Clone my fork locally - Create a branch on my local repository - Add, commit, push my changes to said branch - On github I could then submit the pull request from my forked repo to the upstream master On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Monday, November 10, 2014 at 10:07 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This would potentially have a substantial impact on performance. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. I've been pondering how this could be done. NamedArray has a type parameter N, and it should then further have N type parameters indicating the dictionary type along each of the N dimensions. So I figure this is going to be a challenging type definition. A tuple type could be used to give the type of the dimension names. But there's another issue: `dicts::Vector{Dict}` cannot be defined more precisely than that if heterogeneous types are allowed for different dimensions. Is this a case where staged functions could be used to generate efficient functions to access the dictionaries? Regards
[julia-users] Help optimizing sparse matrix code
Hello! I'm trying to replace an existing Matlab code with Julia and I'm having trouble matching the performance of the original code. The Matlab code is here: https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m The program clusters inventors from a database of patent applications. The input data is a sparse boolean matrix (named XX in the script), where each row defines an inventor and each column defines a feature. For example, the jth column might correspond to a feature "first name is John". If there is a 1 in XX[i, j], this means that inventor i's first name is John. Given an inventor i, we find similar inventors by identifying rows in the matrix that agree with XX[i, :] on a given column and then applying element-wise boolean operations to the rows. In the code, for a given value of `index`, C_lastname holds the unique column in XX corresponding to a last-name feature such that XX[index, :] equals 1. C_firstname holds the unique column in XX corresponding to a first-name feature such that XX[index, :] equals 1. And so on. The following code snippet finds all rows in the matrix that agree with XX[index, :] on full name and on one of patent assignee name, inventor city, or patent class:

lump_index_2 = step & (C_name & (C_assignee | C_city | C_class))

The `step` variable is an indicator that's used to prevent the same inventors from being considered multiple times. My attempt at a literal translation of this code to Julia is here: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean operations aren't supported for sparse matrices in Julia, so I fake it with integer arithmetic. The line that corresponds to the Matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

The reason I grouped it this way is that initially `step` will be a sparse vector of all 1's, and I thought it might help to do the truly sparse arithmetic first.
I've been testing this code on a Windows 2008 Server. The test data contains 45,763 inventors and 274,578 possible features (in other words, XX is a 45,763 x 274,578 sparse matrix). The matlab program consistently takes about 70 seconds to run on this data. The julia version shows a lot of variation: it's taken as little as 60 seconds and as much as 10 minutes. However, most runs take around 3.5 to 4 minutes. I pasted one output from the sampling profiler here [1]. If I'm reading this correctly, it looks like the program is spending most of its time performing element-wise multiplication of the indicator vectors I described above. I would be grateful for any suggestions that would bring the performance of the julia program in line with the matlab version. I've heard that the last time the matlab code was run on the full data set it took a couple of days, so a slow-down of 3-4x is a significant burden. I did attempt to write a more idiomatic julia version using Dicts and Sets, but it's slower than the version that uses sparse matrix operations: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl Thank you! Josh [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5
Re: [julia-users] Compressing .jld files
Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which builds a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? I've just created a Blosc.jl package and registered it. Do Pkg.update() and Pkg.add("Blosc") to get it. To get the library location in the HDF5 package, just: 1) Add Blosc to the REQUIRE file 2) import Blosc 3) Blosc.libblosc is the path to the shared library.
Re: [julia-users] Help optimizing sparse matrix code
On Monday, November 10, 2014 at 13:03 -0800, Joshua Tokle wrote: [...] My attempt at a literal translation of this code to julia is here: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl The matrix XX is of type SparseMatrixCSC{Int64, Int64}. Boolean operations aren't supported for sparse matrices in julia, so I fake it with integer arithmetic. The line that corresponds to the matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

You should be able to get a speedup by replacing this line with an explicit `for` loop.
First, you'll avoid the memory allocations (one for each + or .* operation). Second, you'll be able to return as soon as the index is found, instead of computing the value for all elements (IIUC you're only looking for one index, right?). My two cents. [...]
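A minimal sketch of the explicit-loop approach being suggested, assuming `step`, `C_name`, `C_assignee`, `C_city`, and `C_class` are 0/1 indicator vectors of equal length (names mirror the post; plain dense vectors are used here for illustration, though the same loop works on sparse columns):

```julia
# One pass, no temporary arrays: each row short-circuits as soon as a
# factor is zero, unlike find(step .* (C_name .* (C_assignee + ...))).
function lump_indices(step, C_name, C_assignee, C_city, C_class)
    out = Int[]
    for i in eachindex(step)
        if step[i] != 0 && C_name[i] != 0 &&
           (C_assignee[i] != 0 || C_city[i] != 0 || C_class[i] != 0)
            push!(out, i)
        end
    end
    return out
end

# Toy data: rows 1 and 3 match on name plus at least one other feature.
step       = [1, 1, 1, 1]
C_name     = [1, 0, 1, 1]
C_assignee = [0, 0, 1, 0]
C_city     = [1, 0, 0, 0]
C_class    = [0, 0, 0, 0]
lump_indices(step, C_name, C_assignee, C_city, C_class)  # → [1, 3]
```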
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 5:02:03 PM UTC-5, Steven G. Johnson wrote: I've just created a Blosc.jl package and registered it. Do Pkg.update() and Pkg.add(Blosc) to get it. Oh, darn it, I just realized I am duplicating some work by jakebolewski...
[julia-users] Re: defining function for lt for use in sort - simple question
Got it - I don't know whether it's a bug or not. If I comment out #import Base.isless in the LogParse.jl file and initially reload that in the repl, and then reload the correct version with import Base.isless, methods(isless) shows the method but sort says it's not defined, even when I specify it directly. Apologies for not checking the initial input in a fresh session; I thought that reloading a module would completely reload the functions, but presumably not when appending to those in Base. Kind regards, John. On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote: Thank you, that's helpful. I reentered it all in a fresh session and found it working as well - I'll try and find the difference which caused it not to work and come back. Kind Regards, John. On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote: This code works everywhere I'm able to try it. At 03:18:13 UTC+1 on Sunday, November 9, 2014, John Drummond wrote: I was originally on julia 0.3.1 on windows 7; this is on Mac OS X 10, julia 0.3.2. I loaded the file LogParse.jl below and then in the repl ran:

reload("LogParse.jl")
methods(isless)
ary1 = LogParse.DayPriceText[]
push!(ary1, LogParse.DayPriceText(4, "a1", 1))
push!(ary1, LogParse.DayPriceText(2, "a1", 1))
push!(ary1, LogParse.DayPriceText(6, "a1", 1))
sort(ary1)
sort(ary1, lt=LogParse.isless)

I get the same messages - methods(isless) shows that it's loaded but the sort can't find it, even when I try to specify the function.

# in file LogParse.jl ###
module LogParse
export DayPriceText
import Base.isless
type DayPriceText
    a1::Uint32
    b1::ASCIIString
    a2::Uint32
end
function isless(a::DayPriceText, b::DayPriceText)
    if a.a1 < b.a1
        return true
    else
        return false
    end
end
end ##

Many thanks. Kind regards, John On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote: In this case it would be really great if you had a minimal reproducible example. It looks to me as if you are doing everything right, so I would start looking for typos and scoping issues.
It's hard to find them without looking at the code. Ideally the example should be small and possible to paste into a REPL session, but if you can publish your code and don't want to extract only the relevant part, that might be fine too. Julia version and operating system are also nice to include, so that we have them available in case we have problems reproducing your results. Regards Ivar At 20:14:48 UTC+1 on Friday, November 7, 2014, John Drummond wrote: Hi, I suspect I'm doing something stupid but no idea what I'm missing. I create a module. I create a type in it, DayPriceText. I import Base.isless. I define isless for the type. Now in the repl I get:

methods(isless)
# 25 methods for generic function isless:
..
isless(x::DayPriceText,y::DayPriceText) at c:\works\juliaplay\LogParse.jl:16

but

julia> typeof(a1p)
Array{DayPriceText,1}

julia> sort(a1p, lt=CILogParse.isless)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

julia> sort(a1p)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

I'm sure there's some obvious answer, but I've no idea what. Thanks for any help. Kind regards, John.
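For reference, a minimal self-contained version of the pattern under discussion, written in current Julia syntax (`struct`/UInt32 rather than 0.3's `type`/Uint32) with an illustrative single field, showing `import Base.isless` making `sort` work:

```julia
# Extending Base.isless for a custom type so sort() can order it.
module LogParseDemo

export DayPrice

import Base.isless   # import first, then add a method to the generic function

struct DayPrice
    day::UInt32      # illustrative field, not the poster's actual type
end

isless(a::DayPrice, b::DayPrice) = a.day < b.day

end # module

using .LogParseDemo

ary = [DayPrice(4), DayPrice(2), DayPrice(6)]
sorted = sort(ary)          # dispatch finds the new isless method
[p.day for p in sorted]     # days come back ordered 2, 4, 6
```

Note that the method must be added to Base's `isless` (via the `import`) rather than defined as a new function shadowing it, which is exactly the distinction the thread turned on.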
Re: [julia-users] Compressing .jld files
That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? Note that the dependencies in some sense run in the opposite direction. You don't technically need to make HDF5 aware of Blosc when building libhdf5. Instead, you need to build a Blosc filter for HDF5 (included with c-blosc) and register it with HDF5. The Blosc.jl package can't build the HDF5 filter, because that would introduce an unnecessary dependency on HDF5 for other things using Blosc. So, at least this component needs to be built in/after the HDF5 package.
Re: [julia-users] Compressing .jld files
The 64 bit issue is a killer and why I didn't go farther with integrating blosc with hdf5. I guess I should have been more vocal about this. Take what you may from my nascent package :-) On Monday, November 10, 2014 6:05:40 PM UTC-5, Steven G. Johnson wrote: [...]
[julia-users] Re: Great new expository article about Julia by the core developers
see this ... https://groups.google.com/d/msg/julia-box/hw81as3GPWA/E1QJm1shnV4J On Monday, November 10, 2014 7:37:08 AM UTC-8, David Higgins wrote: So how does one go about getting an invitation to JuliaBox? It's referenced in the article but you need an invitation to login Dave.
[julia-users] Re: defining function for lt for use in sort - simple question
That seems like a tricky edge case, indeed. Not sure if this is a bug either, or if there are any existing issues on github that cover this. At 23:26:49 UTC+1 on Monday, November 10, 2014, John Drummond wrote: [...]
[julia-users] Re: JuliaBox
the Sagemath Cloud google chrome app also gets users to a rich environment for Julia ... https://chrome.google.com/webstore/detail/the-sagemath-cloud/eocdndagganmilahaiclppjigemcinmb users can run Julia inside a terminal ... OR ... via iJulia notebooks ... OR ... via Sagemath worksheets. also available for running Julia within a terminal, the VMs served at https://koding.com (there is also a google chrome app for this ...) best, cdm On Monday, November 10, 2014 11:04:13 AM UTC-8, Ivar Nesje wrote: Yesterday someone suggested https://groups.google.com/forum/#!searchin/julia-users/monster/julia-users/zEp8pKkEYHk/Oqb7NYdxFcwJ https://tmpnb.org/
[julia-users] Available packages for compression?
Pkg.add("Blosc") should now add a working Blosc package.
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 6:09:50 PM UTC-5, Jake Bolewski wrote: The 64 bit issue is killer and why I didn't go farther with integrating blosc with hdf5. I guess I should had been more vocal about this. Take what you may from my nascent package :-) Google's Snappy library has a 64-bit API, but seems to also be limited to 32-bit sizes internally, as is the LZ4 library. Kind of surprising that so many people would independently limit themselves to 32-bit buffers nowadays.
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 8:39:41 PM UTC-5, Steven G. Johnson wrote: Google's Snappy library has a 64-bit API, but seems to also be limited to 32-bit sizes internally, as is the LZ4 library. Kind of surprising that so many people would independently limit themselves to 32-bit buffers nowadays. Snappy's only excuse was backwards compatibility: https://code.google.com/p/snappy/issues/detail?id=76
Re: [julia-users] travis for os x packages
I don't want to steal Pontus Stenetorp's thunder since he did all the work, but there's a PR open here https://github.com/travis-ci/travis-build/pull/318 that will sooner or later add community-maintained support for Julia directly in Travis as `language: julia`. The default .travis.yml for Julia packages can be simplified even further once that gets rolled out. That doesn't fix the capacity issues at Travis where they aren't accepting new repos, so for now the `language: objective-c` version, using the install-julia.sh script, is the best way to temporarily test things out on Mac workers. On Monday, November 10, 2014 12:32:34 PM UTC-8, Elliot Saba wrote: Yep. Essentially, you'll need to enable the osx build environment (http://docs.travis-ci.com/user/osx-ci-environment/). It looks like Travis is not accepting more multi-os requests at the moment (http://docs.travis-ci.com/user/multi-os/), so the typical approach (used on, for instance, the main julia repository: https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't work. You may not be able to get it to run on multiple OSes, but you should be able to get it to run on OSX only by setting the language to objective-c. This will get it to run on OSX only; then you can use the default .travis.yml file that is generated by Pkg (https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155). In short, you should be able to take that default file, change the language to objective-c, remove the os block, and call it good. Save that as .travis.yml in your repo, enable Travis in your repository's services section, and test away! -E On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simon...@gmail.com wrote: I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
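A hedged sketch of what the resulting .travis.yml might look like under the workaround described above; the install and script lines are placeholders (the actual Pkg-generated file and install-julia.sh details may differ):

```yaml
language: objective-c   # runs the build on Travis's OS X workers
# no `os:` block, since the multi-os feature isn't available
install:
  # assumption: provision Julia however install-julia.sh does it;
  # exact steps elided here
  - ./install-julia.sh
script:
  # MyPkg is a placeholder for the package under test
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPkg")'
```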
[julia-users] Displaying a polygon mesh
Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: - PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this. - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.) - ihnorton's VTK bindings, which aren't registered in METADATA.jl. Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon
[julia-users] Julia Tech Talk at the University of Pennsylvania
Hi all, Feel free to come by if you're around Philly! Julia Tech Talk on Thursday, November 13 at 6:00pm at Wu and Chen Auditorium When: Thursday, November 13 https://www.facebook.com/events/calendar/2014/November/13 at 6:00pm Where: Wu and Chen Auditorium https://www.facebook.com/pages/Wu-and-Chen-Auditorium/145368958832977 Philadelphia, Pennsylvania 19104 On Thursday, November 13th @ 6pm the Dining Philosophers will be hosting a talk on the Julia Programming language in Wu Chen Auditorium. Julia has the elegance and familiarity of Python and Matlab, with speed close to C, and is completely open source. This is a great opportunity for anyone interested in scientific and parallel computation, machine learning, data analysis, and visualization. There will be a giveaway of online JuliaBox codes for the Julia language for all attendees! Speakers: Ted Fujimoto (CIT Masters student) and Randy Zwitch (Senior Data Scientist at Comcast) Randy Zwitch is Senior Data Scientist at Comcast, researching how to improve the overall customer viewing experience using petabyte-scale tools and datasets. Randy also contributes to the R and Julia open-source communities, creating and maintaining packages primarily related to the web (HTTP requests/APIs, Server Log Parsing, Geo-Location, etc.) and database access. Abstract: Using publicly available datasets, Randy will provide an intro to machine learning using ad-hoc Julia code and via add-on packages.
[julia-users] Questions relating to packages and using/creating them
I have some general questions about using packages. 1. Is there a way to create a workspace separate of $HOME/.julia? This would still have the same functionality when calling using in the REPL. 2. What's the best practice for packages with the same name? I don't have a problem related to this but I'm just curious how this is handled. I think via Pkg.add(...) there's only one definition of any package name, but with Pkg.clone(...) I could see package name collisions. Having all the packages under one directory doesn't seem scalable to me. thanks
Re: [julia-users] Questions relating to packages and using/creating them
1. see LOAD_PATH (http://julia.readthedocs.org/en/latest/manual/modules/) 2. this is not specifically supported, as far as I know. We could be fancy and add a UUID to the package spec, or something like that, but I don't think it is a very pressing concern right now. The simple options right now are to manipulate LOAD_PATH to put the preferred package path(s) first (I think this should work) or to manually `require` a specific path (which won't work with `using`). On Mon, Nov 10, 2014 at 9:25 PM, Dom Luna dluna...@gmail.com wrote: I have some general questions about using packages. 1. Is there a way to create a workspace separate of $HOME/.julia? This would still have the same functionality when calling using in the REPL. 2. What's the best practice for packages with the same name? I don't have a problem related to this but I'm just curious how this is handled. I think via Pkg.add(...) there's only one definition of any package name, but with Pkg.clone(...) I could see package name collisions. Having all the packages under one directory doesn't seem scalable to me. thanks
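A quick sketch of point 1, with an illustrative directory name: any directory pushed onto LOAD_PATH is searched by `using` in addition to the default package directory.

```julia
# Directories on LOAD_PATH are searched by `using`/`require` alongside
# the default package directory. The path below is illustrative.
push!(LOAD_PATH, "/home/me/mycode")

# `using MyModule` would now also look for /home/me/mycode/MyModule.jl.
LOAD_PATH[end]   # → "/home/me/mycode"
```

On 0.3 this line can go in ~/.juliarc.jl so it applies to every session.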
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
Unless I understood wrong (which is very possible) the 65536 bins were to cover all possible values a 16-bit pixel can take. Though, in the actual graythresh function I will probably use 256 bins by default. I did find the docs for adding custom formats (https://github.com/timholy/Images.jl/blob/master/doc/extendingIO.md). But perhaps making the bio-formats .jar file work will be better in the long run for a few reasons: 1) A lot more formats are covered, so implementing that would allow coverage of more formats faster. 2) I understand your reasons for making all images in the Gray range, but I prefer having real pixel values. That way it's easier to correlate test data with something like Fiji or Matlab. And I don't understand Julia float handling fully, but there might be a gain in speed if using non-float values. 3) Bio-Formats already allows the reading of individual images based on XYZCT so that doesn't need to be rebuilt. Of course, the above is the ideal thing to do. I'm still trying to figure out how to use the .jar file, so I might just end up adding the custom format first. Let's see... -Aneesh On Monday, November 10, 2014 6:55:08 PM UTC+8, Tim Holy wrote: All good plans. (I'm not sure about using 65536 bins for 16-bit images, though, because that would be more bins than there are pixels in some images. Still, it's not all that much memory, really, so maybe that would be OK.) It would be great to add native support. Presumably you've found the docs on adding support for new formats. For formats that encode large datasets in a single block (like NRRD), you can work with GB-sized datasets on a laptop because you can use mmap (I do it routinely). But the love of TIFF does demand an alternative solution. Presumably we should add a lower-level routine that returns a structure that facilitates later access, e.g.,

imds = imdataset(my_image_file)
img = imds[z, 14, t, 7]

or somesuch.
--Tim On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: Tim, i would like the imhist to be idiot proof. (i've been teaching matlab and nothing puts new people off more than things not being idiot proof). things like using 256 bins by default returning a plot if no outputs are specified (basically make it like matlab's imthresh() ) Btw, on matlab using bioformats is actually the slowest part of my algorithm, so unless it can be faster in julia native support might be nicer. Bioformats also fails in that it reads the whole sequence at once... so running things on laptops with even GB-level datasets is impossible. I wrote my own version of bfopen to only open the required XYZCT for specified series, but that only solves the memory usage. the source format for my image was .mvd2 (perkin elmer spinning disk). i know about JavaCall.jl just havent had the time to play with it... i was thinking it might be fun to attempt native support for a few formats. I can also generate test data in a few vendor formats for a few microscopes. perhaps even make it a julia-box based project. ;) On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: Yes, Images does read it okay but only if i cut out the substack. If i don't, then it interprets the three channels as a time dimension, which isnt a pain at the moment but will be if i start using it for work. Hmm, that sounds like an annotation problem. I realized that both the convert and the g[:] would slow me down but the hist function just wouldn't work without that kind of dance. Also, graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) uses reshape to make it all one image which might also add to speed. The pull request is well and good but personally i would rather have a dedicated image histogram function like imhist: http://www.mathworks.com/help/images/ref/imhist.html which would give histograms based on input images. 
To me that's the only way to make life easier. maybe i'll write one :) imhist is necessary in matlab largely because hist works columnwise; in a sense, Julia's `hist` is like imhist. Is there some specific functionality you're interested in? There's no reason Images can't provide a custom version of `hist`. Something about Images: do you think it possible to use the bio formats' .jar file to import images from a microscope format to Images? Opening a microscope format image file in the relevant software and then exporting it as tiff takes too long and i'd rather be able to access the images directly. Yes, expansion of Images' I/O capabilities would be great. I've wondered about Bio-Formats myself, but not had a direct need, nor do I know Java (but see
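For concreteness, a rough sketch of the kind of `imhist` helper being discussed, assuming the image arrives as a plain array of values already scaled to [0, 1] (the Gray convention mentioned above); the name and the 256-bin default are illustrative, not an actual Images.jl API:

```julia
# Histogram an image into nbins equal-width bins over [0, 1].
function imhist(img::AbstractArray, nbins::Integer = 256)
    counts = zeros(Int, nbins)
    for v in img
        # map [0, 1] onto bin indices 1:nbins, clamping the endpoints
        b = clamp(floor(Int, v * nbins) + 1, 1, nbins)
        counts[b] += 1
    end
    return counts
end

img = [0.0 0.5; 0.999 1.0]   # a toy 2x2 "image"
h = imhist(img, 4)           # → [1, 0, 1, 2]
sum(h) == length(img)        # every pixel lands in exactly one bin
```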
Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations
I did, actually, try expanding vectorized operations into explicit for loops, and computing vector multiplication / vector norm through BLAS interfaces. For the explicit loops, it did allocate less memory, but took much more time. Meanwhile, the vectorized version which I've gotten used to writing runs incredibly fast, as the following tests indicate:

# Explicit for loop, slightly modified from SimilarityMetrics.jl by johnmyleswhite
# (https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl)
function cosine(a::SparseMatrixCSC{Float64, Int64}, b::SparseMatrixCSC{Float64, Int64})
    sA, sB, sI = 0.0, 0.0, 0.0
    for i in 1:length(a)
        sA += a[i]^2
        sI += a[i] * b[i]
    end
    for i in 1:length(b)
        sB += b[i]^2
    end
    return sI / sqrt(sA * sB)
end

# BLAS version
function cosine_blas(i::SparseMatrixCSC{Float64, Int64}, j::SparseMatrixCSC{Float64, Int64})
    i = full(i)
    j = full(j)
    numerator = BLAS.dot(i, j)
    denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
    return numerator / denominator
end

# the vectorized version remains the same, as the 1st post shows.

# Test functions
function test_explicit_loop(d)
    for n in 1:1
        v = d[:,1]
        cosine(v,v)
    end
end

function test_blas(d)
    for n in 1:1
        v = d[:,1]
        cosine_blas(v,v)
    end
end

function test_vectorized(d)
    for n in 1:1
        v = d[:,1]
        cosine_vectorized(v,v)
    end
end

test_explicit_loop(mat)
test_blas(mat)
test_vectorized(mat)
gc(); @time test_explicit_loop(mat)
gc(); @time test_blas(mat)
gc(); @time test_vectorized(mat)

# Results
elapsed time: 3.772606858 seconds (6240080 bytes allocated)
elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
elapsed time: 0.011236068 seconds (34560080 bytes allocated)

On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote: Hi fellows, I'm currently working on sparse matrices and cosine similarity computation, but my routine is running very slowly, at least not reaching my expectations.
So I wrote some test functions, to dig out the reason for the ineffectiveness. To my surprise, the execution time of passing two vectors to the test function and passing the whole sparse matrix differs greatly; the latter is 80x faster. I am wondering why extracting two vectors of the matrix in each loop is dramatically faster, and how to avoid the multi-GB memory allocation. Thanks guys. -- BEST REGARDS, Todd Leo

# The sparse matrix
mat # 2000x15037 SparseMatrixCSC{Float64, Int64}

# The two vectors, prepared in advance
v = mat'[:,1]
w = mat'[:,2]

# Cosine similarity function
function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64}, j::SparseMatrixCSC{Float64, Int64})
    return sum(i .* j)/sqrt(sum(i.*i)*sum(j.*j))
end

I think you'll experience a dramatic speed gain if you write the sums as explicit loops, accessing elements one by one, taking their product and adding it immediately to a counter. In your current version, the element-wise products allocate new vectors before computing the sums, which is very costly. This will also get rid of the difference you report between passing arrays and vectors. Regards

function test1(d)
    res = 0.
    for i in 1:1
        res = cosine_vectorized(d[:,1], d[:,2])
    end
end

function test2(_v,_w)
    res = 0.
    for i in 1:1
        res = cosine_vectorized(_v, _w)
    end
end

test1(dtm)
test2(v,w)
gc(); @time test1(dtm)
gc(); @time test2(v,w)

#elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
#elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
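To make the explicit-loop suggestion concrete: the posted `cosine` loop was slow because `a[i]` on a sparse container performs a search on every access, so iterating `1:length(a)` costs far more than the number of stored entries. A sketch that touches only the nonzeros, written with current SparseArrays names (SparseVector, findnz):

```julia
using SparseArrays   # stdlib in current Julia; sparse support was in Base on 0.3

# Merge the two sorted nonzero-index lists: O(nnz) work and no
# temporary vectors, unlike sum(i .* j) which allocates a full product.
function cosine_nz(a::SparseVector{Float64}, b::SparseVector{Float64})
    Ia, Va = findnz(a)
    Ib, Vb = findnz(b)
    sA = sum(abs2, Va)
    sB = sum(abs2, Vb)
    sI = 0.0
    i, j = 1, 1
    while i <= length(Ia) && j <= length(Ib)
        if Ia[i] == Ib[j]
            sI += Va[i] * Vb[j]
            i += 1; j += 1
        elseif Ia[i] < Ib[j]
            i += 1
        else
            j += 1
        end
    end
    return sI / sqrt(sA * sB)
end

v = sparsevec([1, 3, 5], [1.0, 2.0, 3.0], 6)
cosine_nz(v, v)   # → 1.0 (a vector is perfectly similar to itself)
```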
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote: 2) I understand your reasons for making all images in the Gray range, but i prefer having real pixel values. That way its easier to correlate test data with something like Fiji or Matlab. And I don't understand Julia float handling fully but there might be a gain in speed if using non-float values. They're not really float values, underneath they are integers. You can just say `reinterpret(Uint16, x)`. --Tim
[julia-users] Re: Displaying a polygon mesh
On Monday, November 10, 2014 9:09:29 PM UTC-5, Simon Kornblith wrote: Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: Mayavi via PyCall?
Re: [julia-users] Displaying a polygon mesh
I'm using Compose (and Color), on which Gadfly is built. I tried Gadfly itself, but there were some inefficiencies -- I tried to compose an image consisting of many different edges, and this many independent graphs (I'm using the wrong terminology here) was not handled well. I've copy-and-pasted my plot routines at https://gist.github.com/eschnett/a9e7f70e4910e4ba2768 to give you an example. circle draws a filled circle (a vertex), and line draws a line (an edge). I'm choosing colours depending on the z coordinate. The code isn't self-contained, but should serve as example to see how easy/complex this approach is. -erik On Mon, Nov 10, 2014 at 9:09 PM, Simon Kornblith si...@simonster.com wrote: Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this. GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.) ihnorton's VTK bindings, which aren't registered in METADATA.jl. Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon -- Erik Schnetter schnet...@cct.lsu.edu http://www.perimeterinstitute.ca/personal/eschnetter/
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
Ah! I had misunderstood that. Thank you! :)

On Tuesday, November 11, 2014 11:19:29 AM UTC+8, Tim Holy wrote:
> They're not really float values; underneath they are integers. You can just say `reinterpret(Uint16, x)`.
> --Tim
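Tim's point can be illustrated with plain Base Julia: `reinterpret` views the same bytes under a different element type without copying, which is how "float-looking" pixel values can expose their underlying integer samples. (The thread uses the 0.3-era spelling `Uint16`; the sketch below uses the modern `UInt32` spelling so it runs on current Julia, and uses `Float32` just to show the idea.)

```julia
# reinterpret views the same bits as a different type; no data is copied.
x = Float32[0.0f0, 1.0f0, 2.0f0]
bits = reinterpret(UInt32, x)
# IEEE-754 single precision: 1.0f0 is stored as 0x3f800000
@assert bits[2] == 0x3f800000
# The same mechanism lets an image whose gray values print like floats
# hand you its raw integer samples, e.g. reinterpret(Uint16, img) on 0.3.
```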
Re: [julia-users] travis for os x packages
On 11 November 2014 10:49, Tony Kelman t...@kelman.net wrote:
> I don't want to steal Pontus Stenetorp's thunder since he did all the work, but there's a PR open here https://github.com/travis-ci/travis-build/pull/318 that will sooner or later add community-maintained support for Julia directly in Travis as `language: julia`. The default .travis.yml for Julia packages can be simplified even further once that gets rolled out.

No worries about the thunder; let's hope they merge it soon enough that I can make a public announcement. Also, thank you for poking them the other day.

Pontus
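Once `language: julia` support lands, the simplified `.travis.yml` Tony alludes to could look roughly like this (a sketch only; the `julia:` version keys are an assumption about the eventual format, not taken from the PR):

```yaml
language: julia
julia:
  - release
  - nightly
```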
[julia-users] Elementwise operator
I was looking at the Devectorize package and was wondering: why not have an operator that applies an expression elementwise? Syntax is not something I have thought hard about, but taking a basic example,

r = a .* b + c .* d + a

could be expressed as

r = .(a * b + c * d + a)

which would then apply the expression a * b + c * d + a to each element of the arrays. .= could possibly be used in place of surrounding the expression with .( ). I am not too familiar with Devectorize, but the advantage of this (from what I can tell from a limited read-through of the readme) is that user functions would be applied elementwise as well, so

r = .(a * b + c * d + foo(a) * bar(c,d))

or

r .= a * b + c * d + foo(a) * bar(c,d)

should theoretically be possible. The obvious advantage is that memory only needs to be allocated once for the new array, instead of once per broadcasted operator. Just a thought which may be stepping on Devectorize's toes, but reading through some of the vectorized-code issues, I thought this might be a simple solution that provides a performance benefit.
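A hand-devectorized version of the example shows what such an operator would expand to: one loop writing into a single output array, instead of a temporary array per `.*` and `+`. This is an illustrative sketch only; `fused!` is a made-up name, not part of any package.

```julia
# What `r .= a * b + c * d + a` would expand to: one pass, no temporaries.
function fused!(r, a, b, c, d)
    for i = 1:length(r)
        r[i] = a[i]*b[i] + c[i]*d[i] + a[i]
    end
    return r
end

a = [1.0, 2.0]; b = [3.0, 4.0]; c = [5.0, 6.0]; d = [7.0, 8.0]
r = similar(a)
fused!(r, a, b, c, d)   # same result as a .* b + c .* d + a
```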
[julia-users] Re: Displaying a polygon mesh
Winston has an experimental/undocumented function surf plus some stuff around it (https://github.com/nolta/Winston.jl/blob/master/src/canvas3d.jl), which might be sufficient if you just want to have a look at your meshes.

Best, Alex.

On Tuesday, 11 November 2014 03:09:29 UTC+1, Simon Kornblith wrote:
> Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see:
> - PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this.
> - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.)
> - ihnorton's VTK bindings, which aren't registered in METADATA.jl.
> Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon
[julia-users] Re: Initialize dict of dicts with => syntax
How to initialize an array of dicts? Is there any suggested way to do it?

julia> (Int64=>Int64)[]
Dict{Int64,Int64} with 0 entries

# And since brackets create Arrays:
julia> Any[]
0-element Array{Any,1}

# So I supposed this would generate an array of dicts, until it failed:
julia> ((Int64=>Int64)[])[]
ERROR: `getindex` has no method matching getindex(::Dict{Int64,Int64})

On Sunday, May 4, 2014 5:02:14 AM UTC+8, thom lake wrote:
> One thing that I like about {} for initializing Array{Any,1} is the consistency with comprehension syntax. Namely, braces for Any, brackets for specific types:
>
> julia> typeof({i=>2i for i = 1:10})
> Dict{Any,Any}
> julia> typeof([i=>2i for i = 1:10])
> Dict{Int64,Int64}
> julia> typeof({2i for i = 1:10})
> Array{Any,1}
> julia> typeof([2i for i = 1:10])
> Array{Int64,1}
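For the array-of-dicts question at the top, prefixing the element type to `[]` gives an empty typed vector that individual dicts can be pushed onto. A minimal sketch (one way, not the only way; this syntax works on both 0.3 and 0.4):

```julia
# An empty vector whose elements are Dict{Int64,Int64}
ds = Dict{Int64,Int64}[]
push!(ds, Dict{Int64,Int64}())   # append one empty dict
ds[1][1] = 2                     # mutate the first dict in place
```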
[julia-users] Re: parallel for loop in Julia
Thank you for your answer. Do you have any suggestions how to deal with that?

On Monday, November 10, 2014 23:25:23 UTC+1, ele...@gmail.com wrote:
> On Tuesday, November 11, 2014 5:10:30 AM UTC+11, DrKey wrote:
>> Here is what I tried.
>>
>> Variant 1:
>>
>> forcp = zeros(3,1);
>> forcp = @parallel (hcat) for partA = 1:nPart
>>     for partB = (partA+1):nPart
>>         ...
>>     end
>>     forcp = forces[:,partA];
>> end
>>
>> Variant 2:
>>
>> # np ... number of processes, i ... current process
>> function calcforces(coords,L,np,i)
>>     for partA = i+1:np:nPart-1
>>         for partB = (partA+1):nPart
>>             ...
>>     return forces
>> end
>>
>> and then calling calcforces with:
>>
>> np = nprocs();
>> parad = Array(RemoteRef,np);
>> for i=1:np
>>     parad[i] = @spawn LJ_Force_MT(coords,L,np,i);
>> end
>> for i=1:np
>>     forces = fetch(parad[i]);
>> end
>>
>> Both ways give me wrong results over more than one timestep.
>
> You have multiple parallel loops modifying the forces array. They will be generating races for sure.
>
> Cheers, Lex
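One standard way around the race Lex points out is for each chunk of work to build its own local `forces` array and to combine the per-chunk arrays with `(+)` afterwards — in the 0.3-era API, that is what `@parallel (+) for ...` does. A serial sketch of the pattern (the `ones(3)` pair force is a placeholder, not the thread's actual force law, and `pairforces` is a made-up name):

```julia
nPart = 4

# Each call builds its own local forces array; no shared state is mutated.
function pairforces(range, nPart)
    forces = zeros(3, nPart)
    for partA in range, partB in (partA+1):nPart
        f = ones(3)               # placeholder for the real pair force
        forces[:, partA] += f     # action ...
        forces[:, partB] -= f     # ... and reaction
    end
    return forces
end

# Combining per-chunk results with (+) is race-free; split the outer
# particle range however you like across workers.
total = pairforces(1:2, nPart) + pairforces(3:nPart-1, nPart)
```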