Re: [julia-users] Re: eachline() work with pmap() is slow

2016-10-16 Thread Jeremy McNees
Care to share the code you used?

On Sun, Oct 16, 2016 at 9:24 AM  wrote:

> Problem solved by hand-writing an implementation of ``pmap``; the code
> has been updated :)
>
>


[julia-users] Re: eachline() work with pmap() is slow

2016-10-14 Thread Jeremy McNees
I need to run something similar because I have a large number of text files.
They are too large to load into memory one at a time, let alone several at
once. I find that pmap() works very well here.

First, you should wrap your for loop in a function. In general you should
organize your code into functions in Julia, since code in global scope is
much slower. Second, can you pass a delimiter to the split function?
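
To illustrate those two points, here is a minimal sketch, assuming Julia 1.x
and a tab-delimited file (tab is the default for writedlm). The name
count_fields and the delim keyword are made up for this example and are not
from the original post.

```julia
# Hypothetical helper: the loop lives inside a function instead of global scope.
function count_fields(path::AbstractString; delim::Char = '\t')
    total = 0
    for line in eachline(path)          # reads one line at a time
        # Pass the delimiter explicitly rather than splitting on any whitespace.
        total += length(split(line, delim))
    end
    return total
end
```

Calling count_fields("tmp.txt") under @time a second time (after compilation)
should give a fairer picture than timing a bare loop at the REPL.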

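Third, you may not need 32 processes for this job. Every item that pmap
sends to a worker carries communication overhead, so when the work per
item is as small as one split(), that overhead can dominate.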
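
One way to keep that overhead down, sketched here under the assumption of
Julia 1.x with the Distributed standard library, is to hand each worker a
whole file rather than a single line. The names count_fields_in_file and
process_files are hypothetical, and the tab delimiter is again an assumption.

```julia
using Distributed

# Assumption: start a few workers first, e.g. addprocs(4).
# Without any added workers, pmap simply runs on the calling process.

# Hypothetical per-file task, defined on every process.
@everywhere function count_fields_in_file(path::AbstractString)
    n = 0
    for line in eachline(path)
        n += length(split(line, '\t'))
    end
    return n
end

# One task per file: only the path goes out and only one integer comes back,
# so very little data crosses process boundaries.
process_files(paths) = pmap(count_fields_in_file, paths)
```

If the work really has to be split per line, pmap also accepts a batch_size
keyword so that several items are sent to a worker at once.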

This Stack Overflow post has some more information that might be useful:
http://stackoverflow.com/questions/21890893/reading-csv-in-julia-is-slow-compared-to-python/35120894?noredirect=1#comment66827279_35120894



On Thursday, October 13, 2016 at 11:45:36 PM UTC-4, love...@gmail.com wrote:
>
> I want to process each line of a large text file (100 GB) in parallel
> using the following code:
>
> pmap(process_fun, eachline(the_file))
>
> However, pmap seems to be slow. Here is a dummy experiment:
>
> julia> writedlm("tmp.txt",rand(10,100)) # produce a large file
> julia> @time for l in eachline("tmp.txt")
>   split(l)
>   end
>   5.678517 seconds (11.00 M allocations: 732.637 MB, 40.67% gc time)
>
> julia> addprocs() # 32 core
>
> julia> @time map(split, eachline("tmp.txt"));
>   4.834571 seconds (11.00 M allocations: 734.638 MB, 32.84% gc time)
>
> julia> @time pmap(split, eachline("tmp.txt"));
> 112.275411 seconds (227.06 M allocations: 10.024 GB, 50.72% gc time)
>
> The goal is to process those files (300+) as fast as possible. Maybe
> there is a better way to call pmap?
>


[julia-users] Re: RDatasets "UndefVarError: displaysize not defined"

2016-09-08 Thread Jeremy McNees
I have the same issue. It works in the REPL, but not in IJulia.

On Monday, August 29, 2016 at 5:13:52 PM UTC-4, Rock Pereira wrote:
>
> It also works in 0.4.6 in the REPL
>