[julia-users] Re: Performance confusions on matrix extractions in loops, and memory allocations
I tested it again with @time test2(dtm'[:,1], dtm'[:,2]) and it took only 0.013 seconds. I also checked @time test2(v,w) and it took a similar time. I changed nothing; it was odd. On Monday, November 10, 2014 3:28:10 PM UTC+8, Daniel Høegh wrote: I have made a minimal test case:

    a = rand(1,2)
    function newsum(a)
        for i in 1:100
            sum(a[:,1]) + sum(a[:,2])
        end
    end
    function newsum(a1, a2)
        for i in 1:100
            sum(a1) + sum(a2)
        end
    end
    @time newsum(a)
    @time newsum(a[:,1], a[:,2])

    elapsed time: 0.073095574 seconds (17709844 bytes allocated, 23.23% gc time)
    elapsed time: 0.006946504 seconds (244796 bytes allocated)

I suggest that a[:,1] makes a copy of the data in the a matrix. This copy is made in each iteration of the first function, but in the second function it is made only once, when the function is called as newsum(a[:,1], a[:,2]).
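To make the copy-per-iteration effect concrete, here is a small self-contained sketch of the three options: extracting inside the loop, hoisting the extraction out, and (on 0.3) using sub() to avoid copies entirely. The function names are made up for illustration.

```julia
a = rand(1000, 2)

function sum_copies(a)           # allocates two fresh column copies per iteration
    s = 0.0
    for i in 1:100
        s += sum(a[:,1]) + sum(a[:,2])
    end
    s
end

function sum_hoisted(a)          # copies each column once, outside the loop
    a1, a2 = a[:,1], a[:,2]
    s = 0.0
    for i in 1:100
        s += sum(a1) + sum(a2)
    end
    s
end

function sum_views(a)            # sub() creates views, so no column copies at all
    v1 = sub(a, 1:size(a,1), 1)
    v2 = sub(a, 1:size(a,1), 2)
    s = 0.0
    for i in 1:100
        s += sum(v1) + sum(v2)
    end
    s
end

@time sum_copies(a)
@time sum_hoisted(a)
@time sum_views(a)
```

Timing these three should reproduce the allocation gap reported above: only the first variant allocates inside the loop.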
[julia-users] Re: ANN: Compat.jl
Hello, I didn't realize NamedArrays was broken on release-0.3, because of my lack of Travis skills. I had a different 0.4 incompatibility: (Dict{K,V})(ks::AbstractArray{K},vs::AbstractArray{V}) is deprecated, use (Dict{K,V})(zip(ks,vs)) instead. Foolishly I replaced my construct Dict(keys, values) with @Compat.dict(zip(keys, values)), but that breaks on release-0.3. Is there a recommended way to solve this incompatibility? Cheers, ---david

On Saturday, October 11, 2014 8:17:38 PM UTC+2, Stefan Karpinski wrote: This announcement is primarily for Julia package developers. Since there is already some syntax breakage between Julia v0.3 and v0.4, and there will be more, it's increasingly tricky to make packages work on both versions. The Compat package https://github.com/JuliaLang/Compat.jl was just created to help: it provides compatibility constructs that will work in both versions without warnings. For example, in v0.3 you could create a dictionary like this:

    julia> [ :foo => 1, :bar => 2 ]
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 1

This still works in v0.4 but it produces a warning. The new syntax is this:

    julia> Dict(:foo => 1, :bar => 2)
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 1

However, this newer syntax won't work in v0.3, so you're a bit stuck if you want to write a dictionary literal in a way that works in both v0.3 and v0.4 without producing a warning. Compat to the rescue!:

    julia> using Compat
    julia> @Compat.Dict(:foo => 2, :bar => 2)
    Dict{Symbol,Int64} with 2 entries:
      :bar => 2
      :foo => 2

This works with no warning on both v0.3 and v0.4. We've intentionally not exported the Dict macro, so that usage needs to be prefixed with Compat., which will make uses of the compatibility workarounds easier to find and remove later when they're no longer necessary. 
Currently, there are only a couple of definitions in the Compat package, but if you have your own hacks that have helped make it easier to write cross-version package code, please contribute them and we can build up a nice little collection.
Re: [julia-users] no zero() for DateTime?
Basically this is an issue with DataFrames using a function in Base for a different purpose than its documented intent. zero() has been documented to mean the additive identity (http://docs.julialang.org/en/latest/stdlib/base/#Base.zero), and Date and DateTime don't have an additive identity (apart from the period types, but it is unclear which one to return). Looking at DataFrames, I discovered that they already monkey-patch Base.zeros() to make it work for strings (https://github.com/JuliaStats/DataFrames.jl/blob/211cd659cb7f9035980697f7effa081e29b9bf3e/src/dataframe/dataframe.jl#L805). I think this is a bigger issue to be discussed in the context of the use case in DataFrames. My two obvious suggestions would be to: 1. Change the documentation for zero() to say that it is the additive identity unless that doesn't make sense, in which case any default value is fine. 2. Create a new function in Base for this specific need of a default value. Ivar

On Monday, November 10, 2014 at 03:53:43 UTC+1, Jacob Quinn wrote: Hmm... I guess we could add 0 and 1 definitions if it'll be generally useful (i.e. Date/DateTimes are ordinals with numeric-like properties, so being able to define zero/one and have them work with generic functions). It still just seems a little weird because there's no real solid reasoning/meaning behind it. I think one reason a lot of other languages define a zero(::DateTime) is that values can be truthy or falsey, so you would compare a date with zero(::DateTime) to check for falseness. In Julia, you have to use explicit Booleans, so that's not as important a reason. Happy to hear other opinions/use cases from people though. -Jacob On Sun, Nov 9, 2014 at 9:23 PM, Thomas Covert thom@gmail.com wrote: To your first question, I'm sure there are good reasons for not having zeros in the Date and Time types, but in other languages (e.g., Stata), dates and times are stored as integers or floats with respect to some reference time. 
So, I *think* the 0-date in Stata refers to January 1, 1960. Obviously this is fairly arbitrary, but there is some precedent for it in other languages. On Sunday, November 9, 2014 8:17:04 PM UTC-6, Jacob Quinn wrote: What Date would represent zero(::Date)? Or one(::Date), for that matter? It doesn't seem like a particularly useful definition. What's the use case? On Sun, Nov 9, 2014 at 9:14 PM, Thomas Covert thom@gmail.com wrote: I'm using Dates.jl on 0.3 and have discovered that there is no zero defined for the Date or DateTime types. Is this intentional?
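To make the Stata analogy concrete: if one did want a Stata-style epoch, the definitions might look like the following. This is purely illustrative, not something Base or the Dates package actually provides, and (as the thread notes) the choice of epoch is arbitrary.

```julia
using Dates  # the Dates package on 0.3; part of Base from 0.4 onward

# Hypothetical definitions only: adopt Stata's convention that
# "day zero" is January 1, 1960.
Base.zero(::Type{Date}) = Date(1960, 1, 1)
Base.zero(d::Date) = zero(Date)

zero(Date)  # 1960-01-01
```

Whether such a definition belongs in a library is exactly the open question in this thread, since it is a default value rather than a true additive identity.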
Re: [julia-users] no zero() for DateTime?
Yes, the use of zero is an anachronism from a design in which zero was used to provide a default value for arbitrary types. -- John On Nov 10, 2014, at 8:22 AM, Ivar Nesje iva...@gmail.com wrote: Basically this is an issue with DataFrames using a function in base for a different purpose than its documented intent. ...
Re: [julia-users] Compressing .jld files
Has there been any progress on a (stand-alone) Blosc package for Julia? If not, I might have time to contribute, since I need a fast compressor for a project. If there is any code/start for it I'd appreciate it, though. Cheers, Robert Feldt On Tuesday, September 2, 2014 at 21:47:33 UTC+2, Douglas Bates wrote: Would it be reasonable to create a Blosc package, or is it best to incorporate it directly into the HDF5 package? If a separate package is reasonable I could start on it, as I was the one who suggested this in the first place. On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote: All these testimonials do make it sound promising. Even three-fold compression is a pretty big deal. One disadvantage of compression is that it makes mmap impossible. But since HDF5 supports hyperslabs, that's not as big a deal as it would have been. --Tim On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote: I've used Blosc in the past with great success. Oftentimes it is faster than the uncompressed version if IO is the bottleneck. The compression ratios are not great, but that is really not the point. On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote: That looks pretty sweet. It seems to avoid a lot of the pitfalls of naively compressing data files while still getting the benefits. It would be great to support that in JLD, maybe even turned on by default. On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire kevin@gmail.com wrote: Just to hype Blosc a little more, see http://www.blosc.org/blosc-in-depth.html The main feature is that data is chunked so that the compressed chunk size fits into L1 cache, and is then decompressed and used there. There are a few more buzzwords (multithreading, SIMD) in the link above. Worth exploring where this might be useful in Julia. 
Cheers, Kevin On Tuesday, September 2, 2014, Tim Holy tim@gmail.com wrote: HDF5/JLD does support compression: https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data But it's not turned on by default. Matlab uses compression by default, and I've found it's a huge bottleneck in terms of performance (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly). But perhaps there's a good middle ground. It would take someone doing a little experimentation to see what the compromises are. --Tim On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote: Now that the JLD format can handle DataFrame objects I would like to switch from storing data sets in .RData format to .jld format. Datasets stored in .RData format are compressed after they are written. The default compression is gzip; bzip2 and xz compression are also available. The compression can make a substantial difference in the file size because the data values are often highly repetitive. JLD is different in scope in that .jld files can be queried using external programs like h5ls, and the files can have new data added or existing data edited or removed. The .RData format is an archival format: once the file is written it cannot be modified in place. Given these differences I can appreciate that JLD files are not compressed. Nevertheless I think it would be useful to adopt a convention in the JLD module for accessing data from files with a .jld.xz or .jld.7z extension. It could be as simple as uncompressing the files in a temporary directory, reading, then removing them, or it could be more sophisticated. 
I notice that my versions of libjulia.so on an Ubuntu 64-bit system are linked against both libz.so and liblzma.so:

    $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
        linux-vdso.so.1 => (0x7fff5214f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f62932ee000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f62930d5000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f6292dce000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f6292bc6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f62929a8000)
        libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x7f629278c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x7f6292488000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x7f6292272000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f6291eab000)
        /lib64/ld-linux-x86-64.so.2 (0x7f62944b3000)
        liblzma.so.5
[julia-users] Available packages for compression?
For a project I need fast string compression accessible from Julia. I have found: * GZip.jl, file-based access to gzip compression: https://github.com/JuliaLang/GZip.jl * Zlib.jl, in-memory access to gzip compression: https://github.com/dcjones/Zlib.jl * There has been talk of doing a Julia package for Blosc (blosc.org) and I found this, but I'm not sure it's working: https://github.com/jakebolewski/Blosc.jl https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k If anyone knows of more/other compression packages usable from Julia, please share in this thread. This way people can get a more up-to-date view. Compression is a basic building block for a lot of different things, so it's good if we have many options in Julia. It would be very nice to have access to liblzma, xz, paq etc. long-term. If one just needs to estimate the LZ76 complexity, there is a pure Julia implementation here: https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl but it has bad performance for long strings compared to Zlib, so it is probably not very useful. Thanks, Robert Feldt
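For anyone trying the Zlib.jl option above, a minimal in-memory round trip might look like this. The two-argument compress(data, level) signature is an assumption based on how the package is used elsewhere in this thread; check the package README for the exact API.

```julia
using Zlib

# Compress a repetitive string in memory and verify the round trip.
s = "the quick brown fox jumps over the lazy dog "^100
compressed = Zlib.compress(s, 9)            # level 9 = best compression
original = bytestring(Zlib.decompress(compressed))

@assert original == s
println("compression ratio: ", length(s) / length(compressed))
```

Highly repetitive input like this compresses very well; real-world ratios will be lower.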
[julia-users] Re: Available packages for compression?
If people want to try Blosc please see this issue for how to build it on Julia 0.3.0 (at least on my Mac OS X 10.9): https://github.com/jakebolewski/Blosc.jl/issues/1 but then one can compare the Zlib and Blosc compressors:

    using Zlib
    zliblength(str) = length(Zlib.compress(str, 9, false, true))

    using Blosc
    lz4length(s)   = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4))
    lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4hc))
    bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:zlib))

    function report(name, func, input)
        tic()
        len = func(input)
        t = toq()
        @printf("%s, time = %.3e seconds, compression ratio = %.3f\n", name, t, length(input)/len)
    end

    for exponent in 1:7
        n = 10^exponent
        input = Uint8[1:n]
        strinput = string(input)
        println("\nInput of length 10^$exponent")
        # note: the "zlib in blosc" and "lz4hc" labels below appear to be
        # paired with the wrong helper functions (lz4hclength vs bzliblength)
        report("zlib ", input -> zliblength(input), input)
        report("zlib in blosc", input -> lz4hclength(input), input)
        report("lz4hc", input -> bzliblength(input), input)
        report("lz4 ", input -> lz4length(input), input)
    end

which gives output:

    Input of length 10^1
    zlib , time = 4.789e-02 seconds, compression ratio = 0.833
    zlib in blosc, time = 3.256e-02 seconds, compression ratio = 0.385
    lz4hc, time = 3.939e-03 seconds, compression ratio = 0.385
    lz4 , time = 3.482e-03 seconds, compression ratio = 0.385

    Input of length 10^2
    zlib , time = 1.211e-04 seconds, compression ratio = 0.980
    zlib in blosc, time = 1.448e-05 seconds, compression ratio = 0.862
    lz4hc, time = 3.801e-06 seconds, compression ratio = 0.862
    lz4 , time = 3.403e-06 seconds, compression ratio = 0.862

    Input of length 10^3
    zlib , time = 8.187e-05 seconds, compression ratio = 3.571
    zlib in blosc, time = 1.400e-04 seconds, compression ratio = 3.413
    lz4hc, time = 5.589e-05 seconds, compression ratio = 3.226
    lz4 , time = 1.119e-05 seconds, compression ratio = 3.413

    Input of length 10^4
    zlib , time = 1.158e-04 seconds, compression ratio = 27.473
    zlib in blosc, time = 4.732e-05 seconds, compression ratio = 30.395
    lz4hc, time = 1.107e-04 seconds, compression ratio = 25.381
    lz4 , time = 6.572e-06 seconds, compression ratio = 30.395

    Input of length 10^5
    zlib , time = 7.319e-04 seconds, compression ratio = 140.252
    zlib in blosc, time = 2.058e-04 seconds, compression ratio = 146.628
    lz4hc, time = 6.519e-04 seconds, compression ratio = 134.590
    lz4 , time = 2.368e-05 seconds, compression ratio = 146.628

    Input of length 10^6
    zlib , time = 4.517e-03 seconds, compression ratio = 238.095
    zlib in blosc, time = 2.291e-04 seconds, compression ratio = 237.473
    lz4hc, time = 4.493e-03 seconds, compression ratio = 236.407
    lz4 , time = 6.989e-04 seconds, compression ratio = 198.807

    Input of length 10^7
    zlib , time = 4.499e-02 seconds, compression ratio = 255.669
    zlib in blosc, time = 3.146e-02 seconds, compression ratio = 246.299
    lz4hc, time = 1.749e-02 seconds, compression ratio = 247.078
    lz4 , time = 5.670e-03 seconds, compression ratio = 200.489

It seems that lz4hc compression in Blosc is sometimes quite a bit faster, but not always. The compression ratio is good. lz4 is always faster than the others but sometimes compresses a bit less. For strings shorter than ~350 characters there is not always any compression of the input. Note that the string being compressed here is very regular, so this evaluation is not very good and may be misleading about the compression ratios to expect. This is just a very rough indication. Cheers, Robert On Monday, November 10, 2014 at 09:49:54 UTC+1, Robert Feldt wrote: For a project I need fast string compression accessible from Julia. 
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Sunday, November 9, 2014 at 23:50 +0000, John Myles White wrote: FWIW, I think the best way to move forward with NamedArrays is to replace NamedArrays with a parametric type Named{T} that wraps around other AbstractArray types. That gives you both named Array and named DataArray objects for the same cost. Yeah, looks like a good idea. Duplicating the code for each array type would be a waste. Regards On Nov 9, 2014, at 5:49 PM, Tim Holy tim.h...@gmail.com wrote: Indeed, better to use a Dict if you're naming each row/column. I'd forgotten that was part of NamedArrays. --Tim On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 10:54 -0600, Tim Holy wrote: With regards to arrays with named dimensions, I suspect that with the arrival of stagedfunctions, something like NamedAxesArrays (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But stagedfunctions still have some show-stopper bugs, and we need to fix those first. Interesting package! But when I said named dimensions, I actually meant that dimensions have names, but that elements on each dimension (rows, columns...) have names too. I'm not sure it also makes sense to use staged functions to specialize code on element names, since they can vary much more than dimension names. This could generate quite a lot of methods which would use memory even if only used once. Regards On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 07:52 -0800, David van Leeuwen wrote: I would vote for calling such a function `table()`, to get even closer to R's table(). Well, that's the debate at https://github.com/JuliaStats/StatsBase.jl/issues/32 At first I was in favor of table() too, but now I prefer freqtable(), because table could mean any kind of cross-tabulation. I think NamedArray could even be called Table. And I can't wait for such functionality to be included in METADATA... 
Actually I didn't do it because NamedArrays.jl didn't work well on 0.3 when I first worked on the package. Now I see the tests are still failing. Do you know what is needed to make them work? Another point is that I think this deserves going into StatsBase, but before that we need everybody to agree on a design for NamedArrays. Regards On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote: On Thursday, November 6, 2014 at 11:17 -0800, Conrad Stack wrote: I was also looking for a function like this, but could not find one in docs.julialang.org. I was doing this (v0.4.0-dev), for anyone who is interested:

    example = rand(1:10, 100)
    uexample = sort(unique(example))
    counts = map(x -> count(y -> x == y, example), uexample)

It's pretty ugly, so thanks, Johan, for pointing out StatsBase's countmap. I've also put together a small package precisely aimed at offering an equivalent of R's table(): https://github.com/nalimilan/Tables.jl But there's a more general issue about how to handle arrays with dimension names in Julia. NamedArrays.jl (which is used in my package) attempts to tackle this issue, but I don't think we've reached a consensus yet about the best solution. Regards On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote: I think countmap comes closest to giving you what you want:

    using StatsBase
    data = sample(["a", "b", "c"], 20)
    countmap(data)
    Dict{ASCIIString,Int64} with 3 entries:
      "c" => 3
      "b" => 10
      "a" => 7

On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote: Hi, I'm looking for the best way to count how many times a certain value x_i appears in a vector x, where x could contain integers, floats, or strings. In R I would do table(x). I found StatsBase.counts(x,k) but I'm a bit confused by k (where k goes into 1:k, i.e. the vector is scanned to find how many elements fall at each point of 1:k). Most of the time I don't know k, and in fact I would do table(x) just to find out what k was. 
Apart from that, I don't think I could use this with strings, as I can't construct a range object from strings. I'm wondering whether a method StatsBase.counts(x::Vector) that just returns the frequency of each element would be useful. The same applies to Base.hist, if I understand correctly. I just don't want to have to
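The counts-without-k behavior Florian asks for takes only a few lines with a Dict, which also handles strings for free. The name freqs is made up for illustration; countmap in StatsBase does essentially this.

```julia
# Count occurrences of each distinct element, with no need to know k
# (the number of distinct values) in advance.
function freqs{T}(x::AbstractVector{T})
    d = Dict{T,Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    d
end

freqs(["a", "b", "a", "c", "a"])   # Dict with "a" => 3, "b" => 1, "c" => 1
```

Because the result is keyed by the values themselves, the distinct values (R's table() "names") come out as the Dict's keys.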
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This could have a substantial impact on performance. In the beginning I experimented with indexing speed, mainly to sort out the various forms of getindex(), and although I don't remember the exact result, I do remember that I found the drop in performance w.r.t. integer indexing surprisingly small. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. 2. Currently, it only accepts a limited set of types for indices, e.g. Real and String. But in some cases, people may go beyond this. I don't think we have to impose this limit. Ah---I now see what you mean. I thought I had built in support for all types as indices, but there obviously is no catch-all rule in getindex. I suppose NamedArray needs an update there. I think the last time I looked into this, it was a problem even for efficiently indexing AbstractArrays: https://github.com/JuliaLang/julia/pull/4892#issuecomment-31087910 Slow catch-all methods are good, but if we want specialized versions it will probably need more work. If you want to accept combinations of Int/String/Complement{T}/anything, the number of specialized methods to generate explodes. I think the conclusion was that we needed to wait for staged functions. Since they are implemented now, it may be a good time to look into this issue for both AbstractArrays and NamedArrays. 
Regards On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin wrote: I have been observing an interesting difference between people coming from stats and machine learning. Stats people tend to favor the approach that allows one to directly use the category names to index the table, e.g. A["apple"]. This tendency is clearly reflected in the design of R, where one can attach a name to everything. In machine learning practice, by contrast, it is a common convention to just encode categories as integers and simply use an ordinary array to represent a counting table. Whereas this makes things a little inconvenient in an interactive environment, it is generally more efficient when you have to deal with these categories over a large number of samples. These differences aside, I believe, however, that there exists a very generic approach to this problem -- a multi-dimensional associative map, which allows one to write A[i1, i2, ...] where the indices can be arbitrary hashable, equality-comparable instances, including integers, strings, and symbols, among many other things. A multi-dimensional associative map can be considered a multi-dimensional generalization of dictionaries, and can be easily implemented via a multidimensional array and several dictionaries, one per dimension, to map user-side indexes to integer indexes. - Dahua On Monday, November 10, 2014 8:12:54 AM UTC+8, David van Leeuwen wrote: Hi, On Sunday, November 9, 2014 5:10:19 PM UTC+1, Milan Bouchet-Valat wrote: Actually I didn't do it because NamedArrays.jl didn't work well on 0.3 when I first worked on the package. Now I see the tests are still failing. Do you know what is needed to make them work?
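Dahua's "array plus one Dict per dimension" idea can be sketched in a few lines for the 2-D case. All names here are made up for illustration; this is not NamedArrays' actual design.

```julia
# A toy 2-D associative map: an ordinary matrix for storage plus one
# Dict per dimension translating user-side keys into integer indexes.
type AssocMap2D{K1,K2,V}
    data::Matrix{V}
    d1::Dict{K1,Int}
    d2::Dict{K2,Int}
end

# Map each key to its position, 1..length(ks).
function index_dict{K}(ks::Vector{K})
    d = Dict{K,Int}()
    for (i, k) in enumerate(ks)
        d[k] = i
    end
    d
end

AssocMap2D{K1,K2,V}(ks1::Vector{K1}, ks2::Vector{K2}, ::Type{V}) =
    AssocMap2D(zeros(V, length(ks1), length(ks2)),
               index_dict(ks1), index_dict(ks2))

Base.getindex(m::AssocMap2D, k1, k2) = m.data[m.d1[k1], m.d2[k2]]
Base.setindex!(m::AssocMap2D, v, k1, k2) = (m.data[m.d1[k1], m.d2[k2]] = v)

# usage: count pairs by name rather than by integer index
counts = AssocMap2D(["apple", "pear"], ["red", "green"], Int)
counts["apple", "red"] += 1
```

The key lookups cost one Dict access per dimension, while the bulk storage stays a plain array, which is the efficiency point Dahua raises.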
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
All good plans. (I'm not sure about using 65536 bins for 16-bit images, though, because that would be more bins than there are pixels in some images. Still, it's not all that much memory, really, so maybe that would be OK.) It would be great to add native support. Presumably you've found the docs on adding support for new formats. For formats that encode large datasets in a single block (like NRRD), you can work with GB-sized datasets on a laptop because you can use mmap (I do it routinely). But the love of TIFF does demand an alternative solution. Presumably we should add a lower-level routine that returns a structure that facilitates later access, e.g.,

    imds = imdataset(my_image_file)
    img = imds["z", 14, "t", 7]

or somesuch. --Tim On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: Tim, I would like imhist to be idiot-proof. (I've been teaching Matlab and nothing puts new people off more than things not being idiot-proof.) Things like using 256 bins by default, and returning a plot if no outputs are specified (basically make it like Matlab's imthresh()). Btw, in Matlab using Bio-Formats is actually the slowest part of my algorithm, so unless it can be faster in Julia, native support might be nicer. Bio-Formats also fails in that it reads the whole sequence at once... so running things on laptops with even GB-level datasets is impossible. I wrote my own version of bfopen to only open the required XYZCT for a specified series, but that only solves the memory usage. The source format for my image was .mvd2 (Perkin Elmer spinning disk). I know about JavaCall.jl, I just haven't had the time to play with it... I was thinking it might be fun to attempt native support for a few formats. I can also generate test data in a few vendor formats for a few microscopes. Perhaps even make it a julia-box based project. 
;) On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: Yes, Images does read it okay, but only if I cut out the substack. If I don't, then it interprets the three channels as a time dimension, which isn't a pain at the moment but will be if I start using it for work. Hmm, that sounds like an annotation problem. I realized that both the convert and the g[:] would slow me down, but the hist function just wouldn't work without that kind of dance. Also, graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) uses reshape to make it all one image, which might also add to speed. The pull request is well and good, but personally I would rather have a dedicated image histogram function like imhist (http://www.mathworks.com/help/images/ref/imhist.html), which would give histograms based on input images. To me that's the only way to make life easier. Maybe I'll write one :) imhist is necessary in Matlab largely because hist works columnwise; in a sense, Julia's `hist` is like imhist. Is there some specific functionality you're interested in? There's no reason Images can't provide a custom version of `hist`. Something about Images: do you think it is possible to use the Bio-Formats .jar file to import images from a microscope format into Images? Opening a microscope-format image file in the relevant software and then exporting it as TIFF takes too long, and I'd rather be able to access the images directly. Yes, expansion of Images' I/O capabilities would be great. I've wondered about Bio-Formats myself, but have not had a direct need, nor do I know Java (but see JavaCall.jl, if you haven't already). The other way to go, of course, is Julia native support. Our support for NRRD is a reasonable model of this approach. However, the reason we use ImageMagick is that there are a lot of formats out there; Bio-Formats would fill a similar need for vendor-specific file formats. 
Out of curiosity, what's the original format you're using? --Tim
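Since the thread title is about Otsu's method itself, here is a rough self-contained sketch of the thresholding step over a 256-bin histogram, separate from the I/O questions above. The function name, the binning of [0,1] intensities into 256 bins, and the mapping back to the intensity scale are all assumptions for illustration.

```julia
# Otsu's method: pick the threshold that maximizes the between-class
# variance w_b * w_f * (mu_b - mu_f)^2 over all histogram bins.
function otsu_threshold(img::AbstractArray{Float64})
    nbins = 256
    counts = zeros(Int, nbins)
    for v in img                       # histogram of intensities in [0,1]
        b = min(nbins, int(floor(v * nbins)) + 1)
        counts[b] += 1
    end
    total = length(img)
    sumall = 0.0
    for b in 1:nbins
        sumall += b * counts[b]
    end
    wb = 0; sumb = 0.0; best = 0.0; thresh = 1
    for b in 1:nbins
        wb += counts[b];   wb == 0 && continue   # background weight
        wf = total - wb;   wf == 0 && break      # foreground weight
        sumb += b * counts[b]
        mb = sumb / wb                           # background mean (in bins)
        mf = (sumall - sumb) / wf                # foreground mean (in bins)
        between = wb * wf * (mb - mf)^2
        if between > best
            best = between; thresh = b
        end
    end
    (thresh - 0.5) / nbins             # map the bin back to [0,1] intensity
end
```

A fixed 256-bin histogram sidesteps the convert/g[:] dance discussed above, since the image is traversed exactly once.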
[julia-users] Re: ANN: Compat.jl
Hi David, shouldn't it be @Compat Dict(zip(keys, values)) instead of @Compat.dict(zip(keys, values)), i.e. a space between Compat and Dict rather than a dot method call? Best, Nils
Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations
On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote: Hi fellows, I'm currently working on sparse matrices and cosine similarity computation, but my routines are running very slowly, at least slower than I expected. So I wrote some test functions to dig out the reason for the inefficiency. To my surprise, the execution times of passing two vectors to the test function and passing the whole sparse matrix differ greatly; the latter is 80x faster. I am wondering why extracting two vectors of the matrix in each loop is dramatically faster, and how to avoid the multi-GB memory allocation. Thanks guys. -- BEST REGARDS, Todd Leo

    # The sparse matrix mat
    # 2000x15037 SparseMatrixCSC{Float64, Int64}

    # The two vectors, prepared in advance
    v = mat'[:,1]
    w = mat'[:,2]

    # Cosine similarity function
    function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
                               j::SparseMatrixCSC{Float64, Int64})
        return sum(i .* j) / sqrt(sum(i .* i) * sum(j .* j))
    end

I think you'll experience a dramatic speed gain if you write the sums as explicit loops, accessing elements one by one, taking their product and adding it immediately to an accumulator. In your current version, the element-wise products allocate new vectors before computing the sums, which is very costly. This will also get rid of the difference you report between passing arrays and vectors. Regards

    function test1(d)
        res = 0.
        for i in 1:1
            res = cosine_vectorized(d[:,1], d[:,2])
        end
    end

    function test2(_v, _w)
        res = 0.
        for i in 1:1
            res = cosine_vectorized(_v, _w)
        end
    end

    test1(dtm)
    test2(v,w)
    gc()
    @time test1(dtm)
    gc()
    @time test2(v,w)
    # elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
    # elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
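Milan's explicit-loop suggestion might look like the following for two columns of a SparseMatrixCSC, walking the stored entries of both columns in a single merge pass so that no temporary vectors are allocated. The direct use of the colptr/rowval/nzval fields follows the 0.3-era internal layout, which is an implementation detail.

```julia
# Cosine similarity of columns j1 and j2 of a sparse matrix, computed
# with explicit loops over the stored (nonzero) entries only.
function cosine_cols(m::SparseMatrixCSC{Float64,Int}, j1::Int, j2::Int)
    rv, nz, cp = m.rowval, m.nzval, m.colptr
    i1, e1 = cp[j1], cp[j1+1] - 1      # entry range of column j1
    i2, e2 = cp[j2], cp[j2+1] - 1      # entry range of column j2
    dotp = 0.0; n1 = 0.0; n2 = 0.0
    while i1 <= e1 && i2 <= e2         # merge the two sorted row lists
        r1, r2 = rv[i1], rv[i2]
        if r1 == r2
            dotp += nz[i1] * nz[i2]
            n1 += nz[i1]^2; n2 += nz[i2]^2
            i1 += 1; i2 += 1
        elseif r1 < r2
            n1 += nz[i1]^2; i1 += 1
        else
            n2 += nz[i2]^2; i2 += 1
        end
    end
    while i1 <= e1; n1 += nz[i1]^2; i1 += 1; end   # leftover entries
    while i2 <= e2; n2 += nz[i2]^2; i2 += 1; end
    dotp / sqrt(n1 * n2)
end
```

Besides avoiding the temporaries from `i .* j`, this only touches the stored entries, so the cost scales with the number of nonzeros rather than the full column length.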
[julia-users] Silhouette width
Hi all, I am new to Julia. I searched a bit but I did not find anything related to the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) .. Do you know if there is something about it? Thanks, Francesco
[julia-users] Input arguments to gemm!
Hi, I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? -- Kapil
Re: [julia-users] Reinterpreting parts of a byte array
Thanks for the responses. As usual, I discover myself making assumptions that may not have been stated well. 1. I'll be reading small bits (32 bit ints, mostly) at fairly random addresses and was worried about the overhead of creating array views for such small objects. Perhaps they are optimized away. I should check :-) 2. I've been taught by other languages that touching raw pointers is dangerous without also holding some promise that they won't be relocated, e.g. by a copying collector, etc. I suppose if it's a memory mapped array, I can roughly cheat and know that the OS won't move it, so Julia can't either. But it worried me. *Sebastian Good* On Sun, Nov 9, 2014 at 11:36 PM, Jameson Nash vtjn...@gmail.com wrote: It rather depends upon what you know about the data. If you want a file-like abstraction, it may be possible to wrap it in an IOBuffer type (if not, it should be parameterized to allow it). If you want an array-like abstraction, then I think reinterpreting to different array types may be the most direct approach. If the array is coming from C, then you can use unsafe_load/unsafe_store directly. As Ivar points out, this is not more nor less dangerous than the same operation in C. Although, if you wrap the data buffer in a Julia object (or got it from a Julia call), you can gain some element of protection against memory corruption bugs by minimizing the amount of julia code that is directly interfacing with the raw memory pointer. On Sun Nov 09 2014 at 5:42:42 PM Ivar Nesje iva...@gmail.com wrote: Is there any problem with reinterpreting the array and then use a SubArray or ArrayView to do the index transformation? Pointer arithmetic is not more or less dangerous in Julia, than what it is in C. The only thing you need to ensure is that the object you have a pointer to is referenced by something the GC traverses, and that it isn't moved in memory (Eg. vector resize).
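The reinterpret route discussed above can be sketched like this (modern spelling `UInt8` — the Julia 0.3 of this thread wrote `Uint8` — and little-endian byte order is assumed, as on x86):

```julia
# A byte buffer as it might come from a memory-mapped file.
bytes = UInt8[0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00]

# reinterpret gives a typed view of the same memory, no copy involved;
# element k starts at byte offset 4*(k-1), so reads must be element-aligned.
ints = reinterpret(Int32, bytes)

ints[1], ints[2]   # (1, 2) on a little-endian machine
```

A SubArray/view over `ints` then handles the index arithmetic for random access, as Ivar suggests.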
Re: [julia-users] Silhouette width
Check out the Clustering.jl package which has an interface for silhouette. Specifically, see this file: https://github.com/JuliaStats/Clustering.jl/blob/master/src/silhouette.jl -Jacob On Mon, Nov 10, 2014 at 5:53 AM, Francesco Brundu francesco.bru...@gmail.com wrote: Hi all, I am new to Julia. I searched a bit but I did not find anything related to the silhouette (http://en.wikipedia.org/wiki/Silhouette_(clustering)) .. Do you know if there is something about it? Thanks, Francesco
[julia-users] Re: Input arguments to gemm!
On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote: I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? Yes. (Or rather, the StridedFoo types are a superset, including various 1d/2d array types.)
Re: [julia-users] Re: Input arguments to gemm!
E.g.

julia> A = randn(3,4); B = randn(4,3); C = Array(Float64,3,3);

julia> BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)
3x3 Array{Float64,2}:
 -1.39617  4.02968   -1.2171
 -2.35074  2.60903    0.216789
  1.63807  0.102948  -0.41358

2014-11-10 9:09 GMT-05:00 Steven G. Johnson stevenj@gmail.com: On Monday, November 10, 2014 8:39:00 AM UTC-5, Kapil Agarwal wrote: I am unable to figure out what I should pass as input parameters to the gemm! function. The function declaration asks for BlasChar, StridedVecOrMat, and StridedMatrix. Are they the same as a normal Char and Array? Yes. (Or rather, the StridedFoo types are a superset, including various 1d/2d array types.)
[julia-users] Re: Great new expository article about Julia by the core developers
So how does one go about getting an invitation to JuliaBox? It's referenced in the article but you need an invitation to log in. Dave. On Saturday, 8 November 2014 22:58:31 UTC, Peter Simon wrote: Just found this great new highly accessible exposition of the Julia language: http://arxiv.org/pdf/1411.1607v1.pdf, by Jeff et al. It's the perfect intro to share with many of my not-yet-Julian colleagues. --Peter
Re: [julia-users] Re: strange speed reduction when using external function in inner loop
David, Not sure this is correct or helps, but on my Yosemite 10.10.1 MacBook Pro I get the results below. Regards, Rob

julia> @time prof(true)
Count  File                                         Function         Line
  47   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 15
 165   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 502   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
  98   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
  64   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 29
   5   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  20   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  45   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 883   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               45
 884   REPL.jl                                      eval_user_input    54
 502   array.jl                                     +                 719
 165   random.jl                                    rand!             130
 884   task.jl                                      anonymous          96
elapsed time: 1.51332406 seconds (488212276 bytes allocated, 53.00% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 156   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 577   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 21
 116   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 26
  53   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  10   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  43   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   3   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 910   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 910   REPL.jl                                      eval_user_input    54
 577   array.jl                                     +                 719
 156   random.jl                                    rand!             130
 910   task.jl                                      anonymous          96
elapsed time: 1.488157718 seconds (488208960 bytes allocated, 50.96% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 174   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 545   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
 115   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
   2   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 26
  46   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   1   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 29
   8   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  18   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  28   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
   3   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               9
 894   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 894   REPL.jl                                      eval_user_input    54
 545   array.jl                                     +                 719
 174   random.jl                                    rand!             130
 894   task.jl                                      anonymous          96
elapsed time: 1.448621207 seconds (488206436 bytes allocated, 49.75% gc time)

julia> @time prof(true)
Count  File                                         Function         Line
 165   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 19
 584   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 20
 117   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 23
  51   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 27
   5   /Users/rob/Projects/Julia/Rob/innnercall.jl  f!                 31
  16   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               6
  34   /Users/rob/Projects/Julia/Rob/innnercall.jl  mydot               7
 922   /Users/rob/Projects/Julia/Rob/innnercall.jl  prof               14
 922   REPL.jl                                      eval_user_input    54
 584   array.jl                                     +                 719
 165   random.jl                                    rand!             130
 922
[julia-users] travis for os x packages
I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
Re: [julia-users] Translating Class-Based OO Apps to Julia
On Thursday, January 17, 2013 2:56:52 AM UTC+1, Stefan Karpinski wrote: ... This definitely should go in an object-oriented programming in Julia document. Does a document like this exist? It would definitely be useful.
[julia-users] parallel for loop in Julia
I'm a beginner at using Julia and I have written a simple molecular dynamics simulation, which works quite well and fast. Now I'm trying to parallelize my core loop, which calculates the forces between each pair of particles. My loop is:

for partA = 1:nParts-1
    for partB = (partA+1):nParts
        # Calculate particle-particle distance
        dr = coords[:,partA] - coords[:,partB];
        dr2 = dot(dr,dr)
        invDr2 = 1.0/dr2;
        invDr6 = invDr2^3;
        tforce = invDr2^4 * (invDr6 - 0.5);
        forces[:,partA] = forces[:,partA] + dr*tforce;
        forces[:,partB] = forces[:,partB] - dr*tforce;
    end
end

coords is an array holding the 3-dimensional coordinates for each particle, nParts is the number of particles, and forces has the same size as coords and holds the forces for each particle. I tried @parallel for with different reduction operators (I found + and vcat, of course with changing my loop a little bit), which are not documented very well. At least I only found examples for (+) in the help. What is the best way to parallelize this?
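A side note on the loop above (a sketch, not advice from the thread): before parallelizing, the serial version can be made much cheaper by working on scalar components, since `coords[:,partA] - coords[:,partB]` allocates a fresh 3-vector for every pair. `forces!` is a name invented for this sketch:

```julia
# Allocation-free serial force loop; same Lennard-Jones-style kernel as in
# the post, but with dr unrolled into dx, dy, dz scalars.
function forces!(forces::Matrix{Float64}, coords::Matrix{Float64})
    nParts = size(coords, 2)
    fill!(forces, 0.0)
    for partA in 1:nParts-1, partB in partA+1:nParts
        dx = coords[1,partA] - coords[1,partB]
        dy = coords[2,partA] - coords[2,partB]
        dz = coords[3,partA] - coords[3,partB]
        dr2 = dx*dx + dy*dy + dz*dz
        invDr2 = 1.0 / dr2
        invDr6 = invDr2^3
        tforce = invDr2^4 * (invDr6 - 0.5)
        forces[1,partA] += dx*tforce; forces[1,partB] -= dx*tforce
        forces[2,partA] += dy*tforce; forces[2,partB] -= dy*tforce
        forces[3,partA] += dz*tforce; forces[3,partB] -= dz*tforce
    end
    return forces
end
```

For two particles a unit distance apart on the x axis, this fills in equal and opposite x-forces of magnitude 0.5 and zero y/z components.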
[julia-users] Error in PyPlot; cm_get_cmap not defined
I'm using PyPlot to make 3D plots, which I color by getting color maps through ColorMap(::String). After running a Pkg.update() today, I am now getting an error message when trying to construct a 3D plot, saying cm_get_cmap not defined (...) at Plots.jl:141. Indeed, when checking colormaps.jl https://github.com/stevengj/PyPlot.jl/blob/master/src/colormaps.jl, I find that ColorMap should lead to a call to get_cmap, not cm_get_cmap. Why is my PyPlot trying to get the color maps through a different function?
[julia-users] Absolute value of big(-0.0)
I'm getting (notice the negative sign): abs(big(-0.0)) = -0e+00 with 256 bits of precision. I think it would be better to have abs(big(-0.0)) return 0e+00 (for example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is ifelse(x < 0, -x, x), and -0 is not less than 0.
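A minimal sketch of the abs(::BigFloat) method suggested above (not the actual patch that went into Julia; `my_abs` is a made-up name to avoid redefining Base.abs):

```julia
# signbit, unlike `x < 0`, is true for -0.0, so negating when the sign
# bit is set clears the sign of a negative zero as well.
my_abs(x::BigFloat) = signbit(x) ? -x : x

my_abs(big(-0.0))   # a positive zero, printed without the minus sign
```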
Re: [julia-users] Absolute value of big(-0.0)
Done: https://github.com/JuliaLang/julia/issues/8968 On Monday, November 10, 2014 12:06:31 PM UTC-5, Stefan Karpinski wrote: This is indeed a bug – could you open an issue? https://github.com/JuliaLang/julia/issues On Mon, Nov 10, 2014 at 5:55 PM, Samuel S. Watson samuel@gmail.com wrote: I'm getting (notice the negative sign): abs(big(-0.0)) = -0e+00 with 256 bits of precision. I think it would be better to have abs(big(-0.0)) return 0e+00 (for example, abs(-0.0) returns 0.0). Perhaps this could be fixed with an abs(::BigFloat) method. It seems that the problem is that abs(x::Real) is ifelse(x < 0, -x, x), and -0 is not less than 0.
[julia-users] Re: Error in PyPlot; cm_get_cmap not defined
Should be fixed now, sorry.
[julia-users] Re: defining function for lt for use in sort - simple question
Thank you, that's helpful. I re-entered it all in a fresh session and found it working as well - I'll try to find the difference which caused it not to work and come back. Kind Regards, John. On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote: This code works everywhere I'm able to try it. On Sunday, November 9, 2014 at 03:18:13 UTC+1, John Drummond wrote: I was originally on julia 0.3.1 on Windows 7; this is on Mac OS X 10, julia 0.3.2. I loaded the file LogParse.jl below and then in the REPL ran:

reload("LogParse.jl")
methods(isless)
ary1 = LogParse.DayPriceText[]
push!(ary1, LogParse.DayPriceText(4, "a1", 1))
push!(ary1, LogParse.DayPriceText(2, "a1", 1))
push!(ary1, LogParse.DayPriceText(6, "a1", 1))
sort(ary1)
sort(ary1, lt=LogParse.isless)

I get the same messages - methods(isless) shows that it's loaded but the sort can't find it, even when I try to specify the function.

# in file LogParse.jl
module LogParse

export DayPriceText
import Base.isless

type DayPriceText
    a1::Uint32
    b1::ASCIIString
    a2::Uint32
end

function isless(a::DayPriceText, b::DayPriceText)
    if a.a1 < b.a1
        return true
    else
        return false
    end
end

end

Many thanks. Kind regards, John On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote: In this case it would be really great if you had a minimal reproducible example. It looks to me as if you are doing everything right, so I would start looking for typos and scoping issues. It's hard to find them without looking at the code. Ideally the example should be small and possible to paste into a REPL session, but if you can publish your code and don't want to extract only the relevant part, that might be fine too. Julia version and operating system are also nice to include, so that we have them available in case we have problems reproducing your results. Regards Ivar On Friday, November 7, 2014 at 20:14:48 UTC+1, John Drummond wrote: Hi, I suspect I'm doing something stupid but I have no idea what I'm missing. I create a module. I create a type in it, DayPriceText. I import Base.isless. I define isless for the type. Now in the REPL I get:

methods(isless)
# 25 methods for generic function isless:
..
isless(x::DayPriceText,y::DayPriceText) at c:\works\juliaplay\LogParse.jl:16

but

julia> typeof(a1p)
Array{DayPriceText,1}

julia> sort(a1p, lt=CILogParse.isless)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

julia> sort(a1p)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

I'm sure there's some obvious answer, but I have no idea what it is. Thanks for any help. Kind regards, John.
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
Hello, On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This would potentially have a substantial impact on performance. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. I've been pondering how this could be done. NamedArray has a type parameter N, and it should then further have N type parameters indicating the dictionary type along each of the N dimensions. So I figure this is going to be a challenging type definition. ---david
[julia-users] Re: parallel for loop in Julia
Here is what I tried. Variant 1:

forcp = zeros(3,1);
forcp = @parallel (hcat) for partA = 1:nPart
    for partB = (partA+1):nPart
        ...
    end
    forcp = forces[:,partA];
end

Variant 2:

function calcforces(coords, L, np, i)
    # np ... number of processes, i ... current process
    for partA = i+1:np:nPart-1
        for partB = (partA+1):nPart
            ...
    return forces
end

np = nprocs();
parad = Array(RemoteRef, np);

and then calling function calcforces with:

for i = 1:np
    parad[i] = @spawn LJ_Force_MT(coords, L, np, i);
end
for i = 1:np
    forces = fetch(parad[i]);
end

Both ways are giving me wrong results over more than 1 timestep.
[julia-users] Re: ANN: Compat.jl
Hi Nils, My current workaround is:

## temporary compatibility hack
if VERSION < v"0.4.0-dev"
    Base.Dict(z::Base.Zip2) = Dict(z.a, z.b)
end

On Monday, November 10, 2014 12:04:14 PM UTC+1, Nils Gudat wrote: Hi David, shouldn't it be @Compat Dict(zip(keys, values)) instead of @Compat.Dict(zip(keys, values)), i.e. a space between Compat and Dict rather than a dot method call? I was just following Stefan's syntax. The dots on my screen are about as big as the stuck pieces of dust, but I really believe there is a period there.

julia> @Compat.Dict(:foo => 2, :bar => 2)
Dict{Symbol,Int64} with 2 entries:
  :bar => 2
  :foo => 2

Macro programming is beyond the scope of my brain, anyway... ---david Best, Nils
[julia-users] Re: ANN: Compat.jl
On Monday, November 10, 2014 1:15:40 PM UTC-5, David van Leeuwen wrote: I was just following Stefan's syntax. The dots on my screen are about as big as the stuck pieces of dust, but I really believe there is a period there. The syntax in Compat.jl changed shortly after its release. The new syntax is to use: @compat ...Julia 0.4 syntax and have it be automatically translated into older syntax as needed. If there is a case where this does not work, please file an issue.
[julia-users] JuliaBox
Hi, Does anyone know if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the arXiv paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account; it seems it's invite-only for now. Thanks, Dave.
Re: [julia-users] Compressing .jld files
On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote: It would be best to incorporate it into the HDF5 package. A julia package would be useful if you wanted to do the same sort of compression on Julia binary blobs, such as serialized julia values in an IOBuffer. Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. Note that HDF5 has a Blosc filter (http://www.hdfgroup.org/services/filters.html#blosc and https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can use Blosc internally in the HDF5 file while still allowing HDF5 tools to work with the file.
Re: [julia-users] Contributing to a Julia Package
Hi Tim, you have to create a fork on Github and then push your new branch to your personal fork. Then, on Github, switch to that fork and the interface will show a Pull request button if your personal fork is ahead of the upstream repository. Best -- João Felipe Santos On Mon, Nov 10, 2014 at 2:17 PM, Tim Wheeler timwheeleronl...@gmail.com wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] JuliaBox
Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithiohuig...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
Re: [julia-users] JuliaBox
Thanks Ivar. 5 people Shashi, all academics so I'd like to get them interested. Dave. On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote: Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
Re: [julia-users] JuliaBox
On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashigowd...@gmail.com wrote: Just do not publish it online. Oops I meant to send it to David directly. If anyone else wants a code, please let me know.
Re: [julia-users] JuliaBox
Sure :) Happy to let them in. On Tue, Nov 11, 2014 at 1:02 AM, David Higgins daithiohuig...@gmail.com wrote: Thanks Ivar. 5 people Shashi, all academics so I'd like to get them interested. Dave. On Monday, 10 November 2014 19:31:17 UTC, Shashi Gowda wrote: Hello David, Sorry about that. You can use the invite code G01014. How many others do you want to invite? A handful should be fine. Just do not publish it online. Thank you On Tue, Nov 11, 2014 at 12:15 AM, David Higgins daithio...@gmail.com wrote: Hi, Does anyone if JuliaBox http://www.juliabox.org is open to applications to use it these days? I came across it in the ArXiV paper about Julia mentioned here https://groups.google.com/d/msg/julia-users/DtjfcslGcMw/s-QBbFnelugJ. I'm a current Julia user but I have a number of colleagues who would be interested in a sandboxed, non-install version to play with before making the jump to installation. I made the mistake of suggesting JuliaBox before verifying that it was possible to create an account, it seems it's invite only for now. Thanks, Dave.
[julia-users] Re: JuliaBox
Hi Shashi, I would like a code too. Thanks in advance, Pablo
[julia-users] Re: Contributing to a Julia Package
Thank you! It seems to have worked. Per João's suggestions, I had to: - Create a fork on Github of the target package repository - Clone my fork locally - Create a branch on my local repository - Add, commit, push my changes to said branch - On github I could then submit the pull request from my forked repo to the upstream master On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] JuliaBox
On Monday, 10 November 2014 19:33:09 UTC, Shashi Gowda wrote: On Tue, Nov 11, 2014 at 1:01 AM, Shashi Gowda shashi...@gmail.com wrote: Just do not publish it online. Oops I meant to send it to David directly. If anyone else wants a code, please let me know. I did wonder about this bit :P Thank you very much in any case. Dave
Re: [julia-users] travis for os x packages
Yep. Essentially, you'll need to enable the osx build environment http://docs.travis-ci.com/user/osx-ci-environment/. It looks like Travis is not accepting http://docs.travis-ci.com/user/multi-os/ more multi-os requests at the moment, so the typical approach, (used on, for instance, the main julia repository https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't work. You may not be able to get it to run on multiple OS'es, but you should be able to get it to run on OSX only by setting the language to objective-c. This will get it to run on OSX only, then you can use the default .travis.yml file https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155 that is generated by Pkg. In short, you should be able to take that default file, change the language to objective-c, remove the os block, and call it good. Save that as .travis.yml in your repo, enable Travis in your repository's services section, and test away! -E On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simonby...@gmail.com wrote: I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
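Putting the recipe above together, the resulting .travis.yml would look roughly like this — a minimal sketch, not a tested configuration; the Julia install step and the package name are placeholders, since the thread does not spell them out:

```yaml
# Declaring objective-c as the language forces Travis onto OS X workers,
# so no os: block (multi-os) is needed.
language: objective-c
install:
  - echo "install Julia here, e.g. from the official OS X binaries (step not specified in the thread)"
script:
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPackage")'  # "MyPackage" is a placeholder name
```

Save this as .travis.yml at the root of the repository and enable Travis in the repository's services section, as described above.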
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 12:55:24 PM UTC-6, Steven G. Johnson wrote: On Tuesday, September 2, 2014 3:58:25 PM UTC-4, Jake Bolewski wrote: It would be best to incorporate it into the HDF5 package. A julia package would be useful if you wanted to do the same sort of compression on Julia binary blobs, such as serialized julia values in an IOBuffer. Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? Note that HDF5 has a Blosc filter ( http://www.hdfgroup.org/services/filters.html#blosc and https://github.com/Blosc/c-blosc/tree/master/hdf5), so that I guess you can use Blosc internally in the HDF5 file while still allowing HDF5 tools to work with the file.
[julia-users] Re: Contributing to a Julia Package
Another important point (for actively developed packages) is that Pkg.add() checks out the commit of the latest released version registered in METADATA.jl. Most packages do development on the master branch, so you should likely base your changes on master, rather than the latest released version. To do this, you can use `Pkg.checkout()`, but `git checkout master` will also work. Ivar kl. 21:07:49 UTC+1 mandag 10. november 2014 skrev Tim Wheeler følgende: Thank you! It seems to have worked. Per João's suggestions, I had to: - Create a fork on Github of the target package repository - Clone my fork locally - Create a branch on my local repository - Add, commit, push my changes to said branch - On github I could then submit the pull request from my forked repo to the upstream master On Monday, November 10, 2014 11:17:55 AM UTC-8, Tim Wheeler wrote: Hello Julia Users, I wrote some code that I would like to submit via pull request to a Julia package. The thing is, I am new to this and do not understand the pull request process. What I have done: - used Pkg.add to obtain a local version of said package - ran `git branch mybranch` to create a local git branch - created my code additions and used `git add` to include them. Ran `git commit -m` I am confused over how to continue. The instructions on git for issuing a pull request require that I use their UI interface, but my local branch is not going to show up when I select new pull request because it is, well, local to my machine. Do I need to fork the repository first? When I try creating a branch through the UI I do not get an option to create one like they indicate in the tutorial https://help.github.com/articles/creating-and-deleting-branches-within-your-repository/#creating-a-branch, perhaps because I am not a repo owner. Thank you.
Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)
On Monday, November 10, 2014 at 10:07 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 11:01:59 AM UTC+1, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 23:48 -0800, David van Leeuwen wrote: Hello, On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: NamedArrays.jl generally goes along this way. However, it remains limited in two aspects: 1. Some fields in NamedArrays are not declared with specific types. In particular, the field `dicts` is of the type `Vector{Dict}`, and the use of this field is on the critical path when looping over the table, e.g. when counting. This would potentially have a substantial impact on performance. I suppose the problem you indicate can be alleviated by making NamedArray parameterized by the type of the key in the dict as well. Right. Sounds reasonable. I've been pondering how this could be done. NamedArray has a type parameter N, and it should then further have N type parameters indicating the dictionary type along each of the N dimensions. So I figure this is going to be a challenging type definition. A tuple type could be used to give the type of the dimension names. But there's another issue: `dicts::Vector{Dict}` cannot be defined more precisely than that if heterogeneous types are allowed for different dimensions. Is this a case where staged functions could be used to generate efficient functions to access the dictionaries? Regards
[julia-users] Help optimizing sparse matrix code
Hello! I'm trying to replace an existing Matlab code with Julia and I'm having trouble matching the performance of the original code. The Matlab code is here: https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m The program clusters inventors from a database of patent applications. The input data is a sparse boolean matrix (named XX in the script), where each row defines an inventor and each column defines a feature. For example, the jth column might correspond to a feature "first name is John". If there is a 1 in XX[i, j], this means that inventor i's first name is John. Given an inventor i, we find similar inventors by identifying rows in the matrix that agree with XX[i, :] on a given column and then applying element-wise boolean operations to the rows. In the code, for a given value of `index`, C_lastname holds the unique column in XX corresponding to a last-name feature such that XX[index, :] equals 1. C_firstname holds the unique column in XX corresponding to a first-name feature such that XX[index, :] equals 1. And so on. The following code snippet finds all rows in the matrix that agree with XX[index, :] on full name and on one of patent assignee name, inventor city, or patent class:

lump_index_2 = step & (C_name & (C_assignee | C_city | C_class))

The `step` variable is an indicator that's used to prevent the same inventors from being considered multiple times. My attempt at a literal translation of this code to Julia is here: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean operations aren't supported for sparse matrices in Julia, so I fake it with integer arithmetic. The line that corresponds to the Matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

The reason I grouped it this way is that initially `step` will be a sparse vector of all 1's, and I thought it might help to do the truly sparse arithmetic first.
I've been testing this code on a Windows 2008 Server. The test data contains 45,763 inventors and 274,578 possible features (in other words, XX is a 45,763 x 274,578 sparse matrix). The matlab program consistently takes about 70 seconds to run on this data. The julia version shows a lot of variation: it's taken as little as 60 seconds and as much as 10 minutes. However, most runs take around 3.5 to 4 minutes. I pasted one output from the sampling profiler here [1]. If I'm reading this correctly, it looks like the program is spending most of its time performing element-wise multiplication of the indicator vectors I described above. I would be grateful for any suggestions that would bring the performance of the julia program in line with the matlab version. I've heard that the last time the matlab code was run on the full data set it took a couple of days, so a slow-down of 3-4x is a significant burden. I did attempt to write a more idiomatic julia version using Dicts and Sets, but it's slower than the version that uses sparse matrix operations: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl Thank you! Josh [1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5
Re: [julia-users] Compressing .jld files
Wouldn't it be better to have a separate Blosc.jl package that is used by HDF5.jl? After all, there are presumably many other applications of this. That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which builds a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? I've just created a Blosc.jl package and registered it. Do Pkg.update() and Pkg.add("Blosc") to get it. To get the library location in the HDF5 package, just: 1) Add Blosc to the REQUIRE file 2) import Blosc 3) Blosc.libblosc is the path to the shared library.
Re: [julia-users] Help optimizing sparse matrix code
On Monday, November 10, 2014 at 13:03 -0800, Joshua Tokle wrote: [...] My attempt at a literal translation of this code to julia is here: https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl The matrix XX is of type SparseMatrixCSC{Int64, Int64}. Boolean operations aren't supported for sparse matrices in julia, so I fake it with integer arithmetic. The line that corresponds to the matlab code above is

lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

You should be able to get a speedup by replacing this line with an explicit `for` loop.
First, you'll avoid the memory allocations (one for each + or .* operation). Second, you'll be able to return as soon as the index is found, instead of computing the value for all elements (IIUC you're only looking for one index, right?). My two cents. [...]
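A minimal sketch of the explicit-loop approach being suggested, assuming `step`, `C_name`, `C_assignee`, `C_city`, and `C_class` are 0/1 indicator vectors of equal length (names mirror the post; plain dense vectors are used here for illustration, though the same loop works on sparse columns):

```julia
# One pass, no temporary arrays: each row short-circuits as soon as a
# factor is zero, unlike find(step .* (C_name .* (C_assignee + ...))).
function lump_indices(step, C_name, C_assignee, C_city, C_class)
    out = Int[]
    for i in eachindex(step)
        if step[i] != 0 && C_name[i] != 0 &&
           (C_assignee[i] != 0 || C_city[i] != 0 || C_class[i] != 0)
            push!(out, i)
        end
    end
    return out
end

# Toy data: rows 1 and 3 match on name plus at least one other feature.
step       = [1, 1, 1, 1]
C_name     = [1, 0, 1, 1]
C_assignee = [0, 0, 1, 0]
C_city     = [1, 0, 0, 0]
C_class    = [0, 0, 0, 0]
lump_indices(step, C_name, C_assignee, C_city, C_class)  # → [1, 3]
```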
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 5:02:03 PM UTC-5, Steven G. Johnson wrote: I've just created a Blosc.jl package and registered it. Do Pkg.update() and Pkg.add(Blosc) to get it. Oh, darn it, I just realized I am duplicating some work by jakebolewski...
[julia-users] Re: defining function for lt for use in sort - simple question
Got it - I don't know whether it's a bug or not. If I comment out #import Base.isless in the LogParse.jl file and initially reload that in the repl, and then reload the correct version with import Base.isless, methods(isless) shows the method but sort says it's not defined, even when I specify it directly. Apologies for not checking the initial input in a fresh session; I thought that reloading a module would completely reload the functions, but presumably not when appending to those in Base. Kind regards, John. On Monday, November 10, 2014 6:04:29 PM UTC, John Drummond wrote: Thank you, that's helpful. I reentered it all in a fresh session and found it working as well - I'll try and find the difference which caused it not to work and come back. Kind Regards, John. On Sunday, November 9, 2014 8:22:44 AM UTC, Ivar Nesje wrote: This code works everywhere I'm able to try it. At 03:18:13 UTC+1 on Sunday, November 9, 2014, John Drummond wrote: I was originally on julia 0.3.1 on windows 7; this is on Mac OS X 10, julia 0.3.2. I loaded the file LogParse.jl below and then in the repl ran:

reload("LogParse.jl")
methods(isless)
ary1 = LogParse.DayPriceText[]
push!(ary1, LogParse.DayPriceText(4, "a1", 1))
push!(ary1, LogParse.DayPriceText(2, "a1", 1))
push!(ary1, LogParse.DayPriceText(6, "a1", 1))
sort(ary1)
sort(ary1, lt=LogParse.isless)

I get the same messages - methods(isless) shows that it's loaded but the sort can't find it, even when I try to specify the function.

# in file LogParse.jl ###
module LogParse
export DayPriceText
import Base.isless
type DayPriceText
    a1::Uint32
    b1::ASCIIString
    a2::Uint32
end
function isless(a::DayPriceText, b::DayPriceText)
    if a.a1 < b.a1
        return true
    else
        return false
    end
end
end ##

Many thanks. Kind regards, John On Friday, November 7, 2014 7:34:40 PM UTC, Ivar Nesje wrote: In this case it would be really great if you had a minimal reproducible example. It looks to me as if you are doing everything right, so I would start looking for typos and scoping issues.
It's hard to find them without looking at the code. Ideally the example should be small and possible to paste into a REPL session, but if you can publish your code and don't want to extract only the relevant part, that might be fine too. Julia version and operating system are also nice to include, so that we have them available in case we have problems reproducing your results. Regards Ivar At 20:14:48 UTC+1 on Friday, November 7, 2014, John Drummond wrote: Hi, I suspect I'm doing something stupid but no idea what I'm missing. I create a module. I create a type in it, DayPriceText. I import Base.isless. I define isless for the type. Now in the repl I get:

methods(isless)
# 25 methods for generic function isless:
..
isless(x::DayPriceText,y::DayPriceText) at c:\works\juliaplay\LogParse.jl:16

but

julia> typeof(a1p)
Array{DayPriceText,1}

julia> sort(a1p, lt=CILogParse.isless)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

julia> sort(a1p)
ERROR: `isless` has no method matching isless(::DayPriceText, ::DayPriceText)
 in sort! at sort.jl:246

I'm sure there's some obvious answer, but I've no idea what. Thanks for any help. Kind regards, John.
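For reference, a minimal self-contained version of the pattern under discussion, written in current Julia syntax (`struct`/UInt32 rather than 0.3's `type`/Uint32) with an illustrative single field, showing `import Base.isless` making `sort` work:

```julia
# Extending Base.isless for a custom type so sort() can order it.
module LogParseDemo

export DayPrice

import Base.isless   # import first, then add a method to the generic function

struct DayPrice
    day::UInt32      # illustrative field, not the poster's actual type
end

isless(a::DayPrice, b::DayPrice) = a.day < b.day

end # module

using .LogParseDemo

ary = [DayPrice(4), DayPrice(2), DayPrice(6)]
sorted = sort(ary)          # dispatch finds the new isless method
[p.day for p in sorted]     # days come back ordered 2, 4, 6
```

Note that the method must be added to Base's `isless` (via the `import`) rather than defined as a new function shadowing it, which is exactly the distinction the thread turned on.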
Re: [julia-users] Compressing .jld files
That seems to be the most reasonable approach but I couldn't work out how to create a Blosc.jl package which creates a libblosc DLL and make the HDF5 package aware of that location when building libhdf5. Are there examples of how to do that? Note that the dependencies in some sense run in the opposite direction. You don't technically need to make HDF5 aware of Blosc when building libhdf5. Instead, you need to build a Blosc filter for HDF5 (included with c-blosc) and register it with HDF5. The Blosc.jl package can't build the HDF5 filter, because that would introduce an unnecessary dependency on HDF5 for other things using Blosc. So, at least this component needs to be built in/after the HDF5 package.
Re: [julia-users] Compressing .jld files
The 64 bit issue is a killer and why I didn't go farther with integrating blosc with hdf5. I guess I should have been more vocal about this. Take what you may from my nascent package :-) On Monday, November 10, 2014 6:05:40 PM UTC-5, Steven G. Johnson wrote: [...]
[julia-users] Re: Great new expository article about Julia by the core developers
see this ... https://groups.google.com/d/msg/julia-box/hw81as3GPWA/E1QJm1shnV4J On Monday, November 10, 2014 7:37:08 AM UTC-8, David Higgins wrote: So how does one go about getting an invitation to JuliaBox? It's referenced in the article but you need an invitation to login Dave.
[julia-users] Re: defining function for lt for use in sort - simple question
That seems like a tricky edge case, indeed. Not sure if this is a bug either, or if there are any existing issues on github that cover this. At 23:26:49 UTC+1 on Monday, November 10, 2014, John Drummond wrote: [...]
[julia-users] Re: JuliaBox
the Sagemath Cloud google chrome app also gets users to a rich environment for Julia ... https://chrome.google.com/webstore/detail/the-sagemath-cloud/eocdndagganmilahaiclppjigemcinmb users can run Julia inside a terminal ... OR ... via iJulia notebooks ... OR ... via Sagemath worksheets. also available for running Julia within a terminal, the VMs served at https://koding.com (there is also a google chrome app for this ...) best, cdm On Monday, November 10, 2014 11:04:13 AM UTC-8, Ivar Nesje wrote: Yesterday someone suggested https://groups.google.com/forum/#!searchin/julia-users/monster/julia-users/zEp8pKkEYHk/Oqb7NYdxFcwJ https://tmpnb.org/
[julia-users] Available packages for compression?
Pkg.add("Blosc") should now add a working Blosc package.
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 6:09:50 PM UTC-5, Jake Bolewski wrote: The 64 bit issue is killer and why I didn't go farther with integrating blosc with hdf5. I guess I should had been more vocal about this. Take what you may from my nascent package :-) Google's Snappy library has a 64-bit API, but seems to also be limited to 32-bit sizes internally, as is the LZ4 library. Kind of surprising that so many people would independently limit themselves to 32-bit buffers nowadays.
Re: [julia-users] Compressing .jld files
On Monday, November 10, 2014 8:39:41 PM UTC-5, Steven G. Johnson wrote: Google's Snappy library has a 64-bit API, but seems to also be limited to 32-bit sizes internally, as is the LZ4 library. Kind of surprising that so many people would independently limit themselves to 32-bit buffers nowadays. Snappy's only excuse was backwards compatibility: https://code.google.com/p/snappy/issues/detail?id=76
Re: [julia-users] travis for os x packages
I don't want to steal Pontus Stenetorp's thunder since he did all the work, but there's a PR open here https://github.com/travis-ci/travis-build/pull/318 that will sooner or later add community-maintained support for Julia directly in Travis as `language: julia`. The default .travis.yml for Julia packages can be simplified even further once that gets rolled out. That doesn't fix the capacity issues at Travis where they aren't accepting new repos, so for now the `language: objective-c` version, using the install-julia.sh script, is the best way to temporarily test things out on Mac workers. On Monday, November 10, 2014 12:32:34 PM UTC-8, Elliot Saba wrote: Yep. Essentially, you'll need to enable the osx build environment (http://docs.travis-ci.com/user/osx-ci-environment/). It looks like Travis is not accepting more multi-os requests at the moment (http://docs.travis-ci.com/user/multi-os/), so the typical approach (used on, for instance, the main julia repository: https://github.com/JuliaLang/julia/blob/master/.travis.yml#L2-L4) won't work. You may not be able to get it to run on multiple OSes, but you should be able to get it to run on OSX only by setting the language to objective-c. This will get it to run on OSX only; then you can use the default .travis.yml file that is generated by Pkg (https://github.com/JuliaLang/julia/blob/tk/default-travis-multi-os/base/pkg/generate.jl#L139-L155). In short, you should be able to take that default file, change the language to objective-c, remove the os block, and call it good. Save that as .travis.yml in your repo, enable Travis in your repository's services section, and test away! -E On Mon, Nov 10, 2014 at 7:50 AM, Simon Byrne simon...@gmail.com wrote: I would like to set up travis for an OS X-only package: does anyone have suggestions for how I should set up travis (or has anyone already done this)? simon
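A hedged sketch of what the resulting .travis.yml might look like under the workaround described above; the install and script lines are placeholders (the actual Pkg-generated file and install-julia.sh details may differ):

```yaml
language: objective-c   # runs the build on Travis's OS X workers
# no `os:` block, since the multi-os feature isn't available
install:
  # assumption: provision Julia however install-julia.sh does it;
  # exact steps elided here
  - ./install-julia.sh
script:
  # MyPkg is a placeholder for the package under test
  - julia -e 'Pkg.init(); Pkg.clone(pwd()); Pkg.test("MyPkg")'
```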
[julia-users] Displaying a polygon mesh
Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: - PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this. - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.) - ihnorton's VTK bindings, which aren't registered in METADATA.jl. Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon
[julia-users] Julia Tech Talk at the University of Pennsylvania
Hi all, Feel free to come by if you're around Philly! Julia Tech Talk on Thursday, November 13 at 6:00pm at Wu and Chen Auditorium When: Thursday, November 13 https://www.facebook.com/events/calendar/2014/November/13 at 6:00pm Where: Wu and Chen Auditorium https://www.facebook.com/pages/Wu-and-Chen-Auditorium/145368958832977 Philadelphia, Pennsylvania 19104 On Thursday, November 13th @ 6pm the Dining Philosophers will be hosting a talk on the Julia Programming language in Wu Chen Auditorium. Julia has the elegance and familiarity of Python and Matlab, with speed close to C, and is completely open source. This is a great opportunity for anyone interested in scientific and parallel computation, machine learning, data analysis, and visualization. There will be a giveaway of online JuliaBox codes for the Julia language for all attendees! Speakers: Ted Fujimoto (CIT Masters student) and Randy Zwitch (Senior Data Scientist at Comcast) Randy Zwitch is Senior Data Scientist at Comcast, researching how to improve the overall customer viewing experience using petabyte-scale tools and datasets. Randy also contributes to the R and Julia open-source communities, creating and maintaining packages primarily related to the web (HTTP requests/APIs, Server Log Parsing, Geo-Location, etc.) and database access. Abstract: Using publicly available datasets, Randy will provide an intro to machine learning using ad-hoc Julia code and via add-on packages.
[julia-users] Questions relating to packages and using/creating them
I have some general questions about using packages. 1. Is there a way to create a workspace separate of $HOME/.julia? This would still have the same functionality when calling using in the REPL. 2. What's the best practice for packages with the same name? I don't have a problem related to this but I'm just curious how this is handled. I think via Pkg.add(...) there's only one definition of any package name, but with Pkg.clone(...) I could see package name collisions. Having all the packages under one directory doesn't seem scalable to me. thanks
Re: [julia-users] Questions relating to packages and using/creating them
1. see LOAD_PATH (http://julia.readthedocs.org/en/latest/manual/modules/) 2. this is not specifically supported, as far as I know. We could be fancy and add a UUID to the package spec, or something like that, but I don't think it is a very pressing concern right now. The simple options right now are to manipulate LOAD_PATH to put the preferred package path(s) first (I think this should work) or to manually `require` a specific path (which won't work with `using`). On Mon, Nov 10, 2014 at 9:25 PM, Dom Luna dluna...@gmail.com wrote: I have some general questions about using packages. 1. Is there a way to create a workspace separate of $HOME/.julia? This would still have the same functionality when calling using in the REPL. 2. What's the best practice for packages with the same name? I don't have a problem related to this but I'm just curious how this is handled. I think via Pkg.add(...) there's only one definition of any package name, but with Pkg.clone(...) I could see package name collisions. Having all the packages under one directory doesn't seem scalable to me. thanks
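A quick sketch of point 1, with an illustrative directory name: any directory pushed onto LOAD_PATH is searched by `using` in addition to the default package directory.

```julia
# Directories on LOAD_PATH are searched by `using`/`require` alongside
# the default package directory. The path below is illustrative.
push!(LOAD_PATH, "/home/me/mycode")

# `using MyModule` would now also look for /home/me/mycode/MyModule.jl.
LOAD_PATH[end]   # → "/home/me/mycode"
```

On 0.3 this line can go in ~/.juliarc.jl so it applies to every session.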
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
Unless I understood wrong (which is very possible) the 65536 bins were to cover all possible values a 16-bit pixel can take. Though, in the actual graythresh function I will probably use 256 bins by default. I did find the docs for adding custom formats (https://github.com/timholy/Images.jl/blob/master/doc/extendingIO.md). But perhaps making the bio-formats .jar file work will be better in the long run for a few reasons: 1) A lot more formats are covered, so implementing that would allow coverage of more formats faster. 2) I understand your reasons for making all images in the Gray range, but I prefer having real pixel values. That way it's easier to correlate test data with something like Fiji or Matlab. And I don't understand Julia float handling fully, but there might be a gain in speed if using non-float values. 3) Bio-Formats already allows the reading of individual images based on XYZCT so that doesn't need to be rebuilt. Of course, the above is the ideal thing to do. I'm still trying to figure out how to use the .jar file, so I might just end up adding the custom format first. Let's see... -Aneesh On Monday, November 10, 2014 6:55:08 PM UTC+8, Tim Holy wrote: All good plans. (I'm not sure about using 65536 bins for 16-bit images, though, because that would be more bins than there are pixels in some images. Still, it's not all that much memory, really, so maybe that would be OK.) It would be great to add native support. Presumably you've found the docs on adding support for new formats. For formats that encode large datasets in a single block (like NRRD), you can work with GB-sized datasets on a laptop because you can use mmap (I do it routinely). But the love of TIFF does demand an alternative solution. Presumably we should add a lower-level routine that returns a structure that facilitates later access, e.g.,

imds = imdataset(my_image_file)
img = imds[z, 14, t, 7]

or somesuch.
--Tim On Sunday, November 09, 2014 07:38:27 PM Aneesh Sathe wrote: Tim, i would like the imhist to be idiot proof. (i've been teaching matlab and nothing puts new people off more than things not being idiot proof). things like using 256 bins by default returning a plot if no outputs are specified (basically make it like matlab's imthresh() ) Btw, on matlab using bioformats is actually the slowest part of my algorithm, so unless it can be faster in julia native support might be nicer. Bioformats also fails in that it reads the whole sequence at once... so running things on laptops with even GB-level datasets is impossible. I wrote my own version of bfopen to only open the required XYZCT for specified series, but that only solves the memory usage. the source format for my image was .mvd2 (perkin elmer spinning disk). i know about JavaCall.jl just havent had the time to play with it... i was thinking it might be fun to attempt native support for a few formats. I can also generate test data in a few vendor formats for a few microscopes. perhaps even make it a julia-box based project. ;) On Monday, November 10, 2014 4:49:22 AM UTC+8, Tim Holy wrote: On Sunday, November 09, 2014 11:39:53 AM Aneesh Sathe wrote: Yes, Images does read it okay but only if i cut out the substack. If i don't, then it interprets the three channels as a time dimension, which isnt a pain at the moment but will be if i start using it for work. Hmm, that sounds like an annotation problem. I realized that both the convert and the g[:] would slow me down but the hist function just wouldn't work without that kind of dance. Also, graythresh (http://www.mathworks.com/help/images/ref/graythresh.html) uses reshape to make it all one image which might also add to speed. The pull request is well and good but personally i would rather have a dedicated image histogram function like imhist: http://www.mathworks.com/help/images/ref/imhist.html which would give histograms based on input images. 
To me that's the only way to make life easier. maybe i'll write one :) imhist is necessary in matlab largely because hist works columnwise; in a sense, Julia's `hist` is like imhist. Is there some specific functionality you're interested in? There's no reason Images can't provide a custom version of `hist`. Something about Images: do you think it possible to use the bio formats' .jar file to import images from a microscope format to Images? Opening a microscope format image file in the relevant software and then exporting it as tiff takes too long and i'd rather be able to access the images directly. Yes, expansion of Images' I/O capabilities would be great. I've wondered about Bio-Formats myself, but not had a direct need, nor do I know Java (but see
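For concreteness, a rough sketch of the kind of `imhist` helper being discussed, assuming the image arrives as a plain array of values already scaled to [0, 1] (the Gray convention mentioned above); the name and the 256-bin default are illustrative, not an actual Images.jl API:

```julia
# Histogram an image into nbins equal-width bins over [0, 1].
function imhist(img::AbstractArray, nbins::Integer = 256)
    counts = zeros(Int, nbins)
    for v in img
        # map [0, 1] onto bin indices 1:nbins, clamping the endpoints
        b = clamp(floor(Int, v * nbins) + 1, 1, nbins)
        counts[b] += 1
    end
    return counts
end

img = [0.0 0.5; 0.999 1.0]   # a toy 2x2 "image"
h = imhist(img, 4)           # → [1, 0, 1, 2]
sum(h) == length(img)        # every pixel lands in exactly one bin
```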
Re: [julia-users] Performance confusions on matrix extractions in loops, and memory allocations
I did, actually, try expanding vectorized operations into explicit for loops, and computing vector multiplication / vector norm through BLAS interfaces. For the explicit loops, it did allocate less memory, but took much more time. Meanwhile, the vectorized version which I've gotten used to writing runs incredibly fast, as the following tests indicate:

# Explicit for loop, slightly modified from SimilarityMetrics.jl by johnmyleswhite
# (https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl)
function cosine(a::SparseMatrixCSC{Float64, Int64}, b::SparseMatrixCSC{Float64, Int64})
    sA, sB, sI = 0.0, 0.0, 0.0
    for i in 1:length(a)
        sA += a[i]^2
        sI += a[i] * b[i]
    end
    for i in 1:length(b)
        sB += b[i]^2
    end
    return sI / sqrt(sA * sB)
end

# BLAS version
function cosine_blas(i::SparseMatrixCSC{Float64, Int64}, j::SparseMatrixCSC{Float64, Int64})
    i = full(i)
    j = full(j)
    numerator = BLAS.dot(i, j)
    denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
    return numerator / denominator
end

# the vectorized version remains the same, as the 1st post shows.

# Test functions
function test_explicit_loop(d)
    for n in 1:1
        v = d[:,1]
        cosine(v,v)
    end
end

function test_blas(d)
    for n in 1:1
        v = d[:,1]
        cosine_blas(v,v)
    end
end

function test_vectorized(d)
    for n in 1:1
        v = d[:,1]
        cosine_vectorized(v,v)
    end
end

test_explicit_loop(mat)
test_blas(mat)
test_vectorized(mat)
gc(); @time test_explicit_loop(mat)
gc(); @time test_blas(mat)
gc(); @time test_vectorized(mat)

# Results
elapsed time: 3.772606858 seconds (6240080 bytes allocated)
elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
elapsed time: 0.011236068 seconds (34560080 bytes allocated)

On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan Bouchet-Valat wrote: On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote: Hi fellows, I'm currently working on sparse matrices and cosine similarity computation, but my routine is running very slowly, at least not reaching my expectations.
So I wrote some test functions, to dig out the reason for the ineffectiveness. To my surprise, the execution time of passing two vectors to the test function and passing the whole sparse matrix differs greatly; the latter is 80x faster. I am wondering why extracting two vectors of the matrix in each loop is dramatically faster, and how to avoid the multi-GB memory allocation. Thanks guys. -- BEST REGARDS, Todd Leo

# The sparse matrix
mat # 2000x15037 SparseMatrixCSC{Float64, Int64}

# The two vectors, prepared in advance
v = mat'[:,1]
w = mat'[:,2]

# Cosine similarity function
function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64}, j::SparseMatrixCSC{Float64, Int64})
    return sum(i .* j)/sqrt(sum(i.*i)*sum(j.*j))
end

I think you'll experience a dramatic speed gain if you write the sums as explicit loops, accessing elements one by one, taking their product and adding it immediately to a counter. In your current version, the element-wise products allocate new vectors before computing the sums, which is very costly. This will also get rid of the difference you report between passing arrays and vectors. Regards

function test1(d)
    res = 0.
    for i in 1:1
        res = cosine_vectorized(d[:,1], d[:,2])
    end
end

function test2(_v,_w)
    res = 0.
    for i in 1:1
        res = cosine_vectorized(_v, _w)
    end
end

test1(dtm)
test2(v,w)
gc(); @time test1(dtm)
gc(); @time test2(v,w)

#elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
#elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
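To make the explicit-loop suggestion concrete: the posted `cosine` loop was slow because `a[i]` on a sparse container performs a search on every access, so iterating `1:length(a)` costs far more than the number of stored entries. A sketch that touches only the nonzeros, written with current SparseArrays names (SparseVector, findnz):

```julia
using SparseArrays   # stdlib in current Julia; sparse support was in Base on 0.3

# Merge the two sorted nonzero-index lists: O(nnz) work and no
# temporary vectors, unlike sum(i .* j) which allocates a full product.
function cosine_nz(a::SparseVector{Float64}, b::SparseVector{Float64})
    Ia, Va = findnz(a)
    Ib, Vb = findnz(b)
    sA = sum(abs2, Va)
    sB = sum(abs2, Vb)
    sI = 0.0
    i, j = 1, 1
    while i <= length(Ia) && j <= length(Ib)
        if Ia[i] == Ib[j]
            sI += Va[i] * Vb[j]
            i += 1; j += 1
        elseif Ia[i] < Ib[j]
            i += 1
        else
            j += 1
        end
    end
    return sI / sqrt(sA * sB)
end

v = sparsevec([1, 3, 5], [1.0, 2.0, 3.0], 6)
cosine_nz(v, v)   # → 1.0 (a vector is perfectly similar to itself)
```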
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
On Monday, November 10, 2014 06:49:17 PM Aneesh Sathe wrote: 2) I understand your reasons for making all images in the Gray range, but i prefer having real pixel values. That way its easier to correlate test data with something like Fiji or Matlab. And I don't understand Julia float handling fully but there might be a gain in speed if using non-float values. They're not really float values, underneath they are integers. You can just say `reinterpret(Uint16, x)`. --Tim
[julia-users] Re: Displaying a polygon mesh
On Monday, November 10, 2014 9:09:29 PM UTC-5, Simon Kornblith wrote: Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: Mayavi via PyCall?
Re: [julia-users] Displaying a polygon mesh
I'm using Compose (and Color), on which Gadfly is built. I tried Gadfly itself, but there were some inefficiencies -- I tried to compose an image consisting of many different edges, and this many independent graphs (I'm using the wrong terminology here) was not handled well. I've copy-and-pasted my plot routines at https://gist.github.com/eschnett/a9e7f70e4910e4ba2768 to give you an example. circle draws a filled circle (a vertex), and line draws a line (an edge). I'm choosing colours depending on the z coordinate. The code isn't self-contained, but should serve as example to see how easy/complex this approach is. -erik On Mon, Nov 10, 2014 at 9:09 PM, Simon Kornblith si...@simonster.com wrote: Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see: PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this. GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.) ihnorton's VTK bindings, which aren't registered in METADATA.jl. Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon -- Erik Schnetter schnet...@cct.lsu.edu http://www.perimeterinstitute.ca/personal/eschnetter/
Re: [julia-users] Image processing: Otsu's method thresholding. Help with optimizing code/algorithm
Ah! I had misunderstood that. Thank you! :)

On Tuesday, November 11, 2014 11:19:29 AM UTC+8, Tim Holy wrote:
> They're not really float values; underneath they are integers. You can just say `reinterpret(Uint16, x)`.
> --Tim
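Tim's point can be illustrated with plain Base Julia: `reinterpret` views the same bytes under a different element type without copying, which is how "float-looking" pixel values can expose their underlying integer samples. (The thread uses the 0.3-era spelling `Uint16`; the sketch below uses the modern `UInt32` spelling so it runs on current Julia, and uses `Float32` just to show the idea.)

```julia
# reinterpret views the same bits as a different type; no data is copied.
x = Float32[0.0f0, 1.0f0, 2.0f0]
bits = reinterpret(UInt32, x)
# IEEE-754 single precision: 1.0f0 is stored as 0x3f800000
@assert bits[2] == 0x3f800000
# The same mechanism lets an image whose gray values print like floats
# hand you its raw integer samples, e.g. reinterpret(Uint16, img) on 0.3.
```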
Re: [julia-users] travis for os x packages
On 11 November 2014 10:49, Tony Kelman t...@kelman.net wrote:
> I don't want to steal Pontus Stenetorp's thunder since he did all the work, but there's a PR open here https://github.com/travis-ci/travis-build/pull/318 that will sooner or later add community-maintained support for Julia directly in Travis as `language: julia`. The default .travis.yml for Julia packages can be simplified even further once that gets rolled out.

No worries about the thunder; let's hope they merge it soon enough that I can make a public announcement. Also, thank you for poking them the other day.

Pontus
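Once `language: julia` support lands, the simplified `.travis.yml` Tony alludes to could look roughly like this (a sketch only; the `julia:` version keys are an assumption about the eventual format, not taken from the PR):

```yaml
language: julia
julia:
  - release
  - nightly
```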
[julia-users] Elementwise operator
I was looking at the Devectorize package and was wondering: why not have an operator that applies an expression elementwise? Syntax is not something I have thought hard about, but taking a basic example,

r = a .* b + c .* d + a

could be expressed as

r = .(a * b + c * d + a)

which would then apply the expression a * b + c * d + a to each element of the arrays. .= could possibly be used in place of surrounding the expression with .( ). I am not too familiar with Devectorize, but the advantage of this (from what I can tell from a limited read-through of the readme) is that user functions would be applied elementwise as well, so

r = .(a * b + c * d + foo(a) * bar(c,d))

or

r .= a * b + c * d + foo(a) * bar(c,d)

should theoretically be possible. The obvious advantage is that memory only needs to be allocated once for the new array, instead of once per broadcasted operator. Just a thought which may be stepping on Devectorize's toes, but reading through some of the vectorized-code issues, I thought this might be a simple solution that provides a performance benefit.
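A hand-devectorized version of the example shows what such an operator would expand to: one loop writing into a single output array, instead of a temporary array per `.*` and `+`. This is an illustrative sketch only; `fused!` is a made-up name, not part of any package.

```julia
# What `r .= a * b + c * d + a` would expand to: one pass, no temporaries.
function fused!(r, a, b, c, d)
    for i = 1:length(r)
        r[i] = a[i]*b[i] + c[i]*d[i] + a[i]
    end
    return r
end

a = [1.0, 2.0]; b = [3.0, 4.0]; c = [5.0, 6.0]; d = [7.0, 8.0]
r = similar(a)
fused!(r, a, b, c, d)   # same result as a .* b + c .* d + a
```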
[julia-users] Re: Displaying a polygon mesh
Winston has an experimental/undocumented function surf plus some stuff around it (https://github.com/nolta/Winston.jl/blob/master/src/canvas3d.jl), which might be sufficient if you just want to have a look at your meshes.

Best, Alex.

On Tuesday, 11 November 2014 03:09:29 UTC+1, Simon Kornblith wrote:
> Is there an easy way to display a polygon mesh in Julia, i.e., vertices and faces loaded from an STL file or created by marching tetrahedra using Meshes.jl? So far, I see:
> - PyPlot/matplotlib, which seems to be surprisingly difficult to convince to do this.
> - GLPlot, which doesn't currently work for me on 0.4. (I haven't tried very hard yet.)
> - ihnorton's VTK bindings, which aren't registered in METADATA.jl.
> Is there another option I'm missing? If not, can I convince one of these packages to show my mesh with minimal time investment, or should I use a separate volume viewer (or maybe a Python package via PyPlot)? Thanks, Simon
[julia-users] Re: Initialize dict of dicts with => syntax
How to initialize an array of dicts? Is there any suggested way to do it?

julia> (Int64=>Int64)[]
Dict{Int64,Int64} with 0 entries

# And since brackets create Arrays:
julia> Any[]
0-element Array{Any,1}

# So I supposed this would generate an array of dicts, until it failed:
julia> ((Int64=>Int64)[])[]
ERROR: `getindex` has no method matching getindex(::Dict{Int64,Int64})

On Sunday, May 4, 2014 5:02:14 AM UTC+8, thom lake wrote:
> One thing that I like about {} for initializing Array{Any,1} is the consistency with comprehension syntax. Namely, braces for Any, brackets for specific types:
>
> julia> typeof({i=>2i for i = 1:10})
> Dict{Any,Any}
> julia> typeof([i=>2i for i = 1:10])
> Dict{Int64,Int64}
> julia> typeof({2i for i = 1:10})
> Array{Any,1}
> julia> typeof([2i for i = 1:10])
> Array{Int64,1}
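For the array-of-dicts question at the top, prefixing the element type to `[]` gives an empty typed vector that individual dicts can be pushed onto. A minimal sketch (one way, not the only way; this syntax works on both 0.3 and 0.4):

```julia
# An empty vector whose elements are Dict{Int64,Int64}
ds = Dict{Int64,Int64}[]
push!(ds, Dict{Int64,Int64}())   # append one empty dict
ds[1][1] = 2                     # mutate the first dict in place
```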
[julia-users] Re: parallel for loop in Julia
Thank you for your answer. Do you have any suggestions how to deal with that?

On Monday, November 10, 2014 23:25:23 UTC+1, ele...@gmail.com wrote:
> On Tuesday, November 11, 2014 5:10:30 AM UTC+11, DrKey wrote:
>> Here is what I tried.
>>
>> Variant 1:
>>
>> forcp = zeros(3,1);
>> forcp = @parallel (hcat) for partA = 1:nPart
>>     for partB = (partA+1):nPart
>>         ...
>>     end
>>     forcp = forces[:,partA];
>> end
>>
>> Variant 2:
>>
>> # np ... number of processes, i ... current process
>> function calcforces(coords,L,np,i)
>>     for partA = i+1:np:nPart-1
>>         for partB = (partA+1):nPart
>>             ...
>>     return forces
>> end
>>
>> and then calling calcforces with:
>>
>> np = nprocs();
>> parad = Array(RemoteRef,np);
>> for i=1:np
>>     parad[i] = @spawn LJ_Force_MT(coords,L,np,i);
>> end
>> for i=1:np
>>     forces = fetch(parad[i]);
>> end
>>
>> Both ways give me wrong results over more than one timestep.
>
> You have multiple parallel loops modifying the forces array. They will be generating races for sure.
>
> Cheers, Lex
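One standard way around the race Lex points out is for each chunk of work to build its own local `forces` array and to combine the per-chunk arrays with `(+)` afterwards — in the 0.3-era API, that is what `@parallel (+) for ...` does. A serial sketch of the pattern (the `ones(3)` pair force is a placeholder, not the thread's actual force law, and `pairforces` is a made-up name):

```julia
nPart = 4

# Each call builds its own local forces array; no shared state is mutated.
function pairforces(range, nPart)
    forces = zeros(3, nPart)
    for partA in range, partB in (partA+1):nPart
        f = ones(3)               # placeholder for the real pair force
        forces[:, partA] += f     # action ...
        forces[:, partB] -= f     # ... and reaction
    end
    return forces
end

# Combining per-chunk results with (+) is race-free; split the outer
# particle range however you like across workers.
total = pairforces(1:2, nPart) + pairforces(3:nPart-1, nPart)
```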