[julia-users] Re: Parallel file access

2016-10-17 Thread Zachary Roth
Thanks for the responses.

Raph, thank you again.  I very much appreciate your "humble offering". 
 I'll take a further look into your gist.

Steven, I'm happy to use the right tool for the job...so long as I have an 
idea of what it is.  Would you care to offer more insights or suggestions 
for the ill-informed (such as myself)?

---Zachary



On Sunday, October 16, 2016 at 7:51:19 AM UTC-4, Steven Sagaert wrote:
>
> That's because SQLite isn't a multi-user DB server but a single-user 
> embedded (desktop) DB. Use the right tool for the job.
>
> On Saturday, October 15, 2016 at 7:02:58 PM UTC+2, Ralph Smith wrote:
>>
>> How are the processes supposed to interact with the database?  Without 
>> extra synchronization logic, SQLite.jl gives (occasionally)
>> ERROR: LoadError: On worker 2:
>> SQLite.SQLiteException("database is locked")
>> which on the face of it suggests that all workers are using the same 
>> connection, although I opened the DB separately in each process.
>> (I think we should get "busy" instead of "locked", but then still have no 
>> good way to test for this and wait for a wake-up signal.)
>> So we seem to be at least as badly off as the original post, except with 
>> DB calls instead of simple writes.
>>
>> We shouldn't have to stand up a separate multithreaded DB server just for 
>> this. Would you be kind enough to give us an example of simple (i.e. not 
>> client-server) multiprocess DB access in Julia?
>>
>> On Saturday, October 15, 2016 at 9:40:17 AM UTC-4, Steven Sagaert wrote:
>>>
>>> It still surprises me how in the scientific computing field people still 
>>> refuse to learn about databases and then replicate database functionality 
>>> in files in a complicated and probably buggy way. HDF5 is one example; 
>>> there are many others. If you want to do fancy search (i.e. speed up 
>>> search via indices) or do things like parallel writes/concurrency, you 
>>> REALLY should use databases. That's what they were invented for decades 
>>> ago. Nowadays there's a bigger choice than ever: relational or 
>>> non-relational (NoSQL), single host or distributed, web interface or not, 
>>> disk-based or in-memory... There really is no excuse anymore not to use a 
>>> database if you want to go beyond just reading a bunch of data into 
>>> memory in one go.
>>>
>>> On Monday, October 10, 2016 at 5:09:39 PM UTC+2, Zachary Roth wrote:
>>>>
>>>> Hi, everyone,
>>>>
>>>> I'm trying to save to a single file from multiple worker processes, but 
>>>> don't know of a nice way to coordinate this.  When I don't coordinate, 
>>>> saving works fine much of the time.  But I sometimes get errors with 
>>>> reading/writing of files, which I'm assuming is happening because multiple 
>>>> processes are trying to use the same file simultaneously.
>>>>
>>>> I tried to coordinate this with a queue/channel of `Condition`s managed 
>>>> by a task running in process 1, but this isn't working for me.  I've tried 
>>>> to simplify this to track down the problem.  At least part of the issue 
>>>> seems to be writing to the channel from process 2.  Specifically, when I 
>>>> `put!` something onto a channel (or `push!` onto an array) from process 2, 
>>>> the channel/array is still empty back on process 1.  I feel like I'm 
>>>> missing something simple.  Is there an easier way to go about coordinating 
>>>> multiple processes that are trying to access the same file?  If not, does 
>>>> anyone have any tips?
>>>>
>>>> Thanks for any help you can offer.
>>>>
>>>> Cheers,
>>>> ---Zachary
>>>>
>>>

[julia-users] Re: Parallel file access

2016-10-14 Thread Zachary Roth
Thanks for the reply and suggestion, Ralph.  I tried to get this working 
with semaphores/mutexes/locks/etc.  But I've not been having any luck.

Here's a simplified, incomplete version of what I'm trying to do.  I'm 
hoping that someone can offer a suggestion if they see some sample code.  

function localfunction()
    files = listfiles()
    locks = [Threads.SpinLock() for _ in files]
    ranges = getindexranges(length(files))

    pmap(pairs(ranges)) do rows_and_cols
        rows, cols = rows_and_cols
        workerfunction(files, locks, rows, cols)
    end
end

function workerfunction(files, locks, rows, cols)
    data = kindofexpensive(...)

    @sync for idx in unique([rows; cols])
        @async begin
            # Hold each file's lock only while updating that file.
            lock(locks[idx])
            try
                updatefile(files[idx], data[idx])
            finally
                unlock(locks[idx])
            end
        end
    end
end

This (obviously) does not work.  I think that the problem is that the locks 
are being copied when the function is spawned on each process.  I've tried 
wrapping the locks/semaphores in Futures/RemoteChannels, but that also 
hasn't worked for me.
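One restructuring I've been sketching (untested, and `listfiles`, `getindexranges`, `updatefile`, and `kindofexpensive` are the same placeholder helpers as above) is to funnel every write through a single task on process 1 over a `RemoteChannel`, so that no lock ever has to cross a process boundary:

```julia
function localfunction()
    files = listfiles()
    ranges = getindexranges(length(files))

    # Workers put (file index, payload) pairs here; only process 1 takes
    # them, so access to each file is serialized without any shared locks.
    writech = RemoteChannel(() -> Channel{Any}(32))

    writer = @async while true
        item = take!(writech)
        item === nothing && break        # sentinel: all workers are done
        idx, payload = item
        updatefile(files[idx], payload)  # all writes happen on process 1
    end

    pmap(pairs(ranges)) do rows_and_cols
        rows, cols = rows_and_cols
        data = kindofexpensive(rows, cols)   # placeholder call
        for idx in unique([rows; cols])
            put!(writech, (idx, data[idx]))  # reaches process 1's channel
        end
    end

    put!(writech, nothing)
    wait(writer)
end
```

The `RemoteChannel` serializes as a reference, so the workers' `put!` calls land in the one channel owned by process 1 rather than in local copies.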

I found that I could do the sort of coordination that I need by starting 
Tasks on the local process.  More specifically, each file would have an 
associated Task to handle the coordination between processes.  But this 
only worked for me in a simplified situation with the Tasks being declared 
globally.  When I tried to implement this coordination within localfunction 
above, I got an error (really a bunch of errors) that said that a running 
Task cannot be serialized.
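Another direction I've been meaning to try, since the filesystem is the one thing all of the processes already share, is a crude lockfile built on the atomicity of `mkdir` (untested sketch; `withfilelock` and the `.lock` suffix are my own invention):

```julia
# Rough cross-process lock exploiting the fact that mkdir is atomic:
# it succeeds in exactly one process and throws in all the others.
function withfilelock(f, path)
    lockdir = path * ".lock"
    while true
        try
            mkdir(lockdir)                # acquired the lock
            break
        catch
            sleep(0.01 + 0.05 * rand())   # held elsewhere; back off, retry
        end
    end
    try
        return f()
    finally
        rm(lockdir)                       # release
    end
end

# Intended usage inside workerfunction:
# withfilelock(files[idx]) do
#     updatefile(files[idx], data[idx])
# end
```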

Sorry for the long post, but I'm really hoping that someone can help me 
out.  I have a feeling that I'm missing something pretty simple.

---Zachary




On Tuesday, October 11, 2016 at 10:15:06 AM UTC-4, Ralph Smith wrote:
>
> You can do it with 2 (e.g. integer) channels per worker (requests and 
> replies) and a task for each pair in the main process. That's so ugly I'd 
> be tempted to write an interface to named system semaphores. Or just use 
> a separate file for each worker.
>
> On Monday, October 10, 2016 at 11:09:39 AM UTC-4, Zachary Roth wrote:
>>
>> Hi, everyone,
>>
>> I'm trying to save to a single file from multiple worker processes, but 
>> don't know of a nice way to coordinate this.  When I don't coordinate, 
>> saving works fine much of the time.  But I sometimes get errors with 
>> reading/writing of files, which I'm assuming is happening because multiple 
>> processes are trying to use the same file simultaneously.
>>
>> I tried to coordinate this with a queue/channel of `Condition`s managed 
>> by a task running in process 1, but this isn't working for me.  I've tried 
>> to simplify this to track down the problem.  At least part of the issue 
>> seems to be writing to the channel from process 2.  Specifically, when I 
>> `put!` something onto a channel (or `push!` onto an array) from process 2, 
>> the channel/array is still empty back on process 1.  I feel like I'm 
>> missing something simple.  Is there an easier way to go about coordinating 
>> multiple processes that are trying to access the same file?  If not, does 
>> anyone have any tips?
>>
>> Thanks for any help you can offer.
>>
>> Cheers,
>> ---Zachary
>>
>

[julia-users] Parallel file access

2016-10-10 Thread Zachary Roth
Hi, everyone,

I'm trying to save to a single file from multiple worker processes, but 
don't know of a nice way to coordinate this.  When I don't coordinate, 
saving works fine much of the time.  But I sometimes get errors with 
reading/writing of files, which I'm assuming is happening because multiple 
processes are trying to use the same file simultaneously.

I tried to coordinate this with a queue/channel of `Condition`s managed by 
a task running in process 1, but this isn't working for me.  I've tried to 
simplify this to track down the problem.  At least part of the issue seems 
to be writing to the channel from process 2.  Specifically, when I `put!` 
something onto a channel (or `push!` onto an array) from process 2, the 
channel/array is still empty back on process 1.  I feel like I'm missing 
something simple.  Is there an easier way to go about coordinating multiple 
processes that are trying to access the same file?  If not, does anyone 
have any tips?
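My current guess (which I'd love to have confirmed or corrected) is that a plain `Channel` is local to the process that created it, so when it gets captured in a closure and sent to worker 2 it is serialized, and the `put!` lands on worker 2's copy. If that's right, a `RemoteChannel` should behave the way I expected; a minimal sketch of what I plan to try:

```julia
addprocs(1)

# A RemoteChannel is a *reference* to a channel owned by process 1,
# so a put! on a worker reaches the original, not a serialized copy.
# (In later Julia versions these live in the Distributed stdlib.)
const ch = RemoteChannel(() -> Channel{Int}(10))

remotecall_fetch(first(workers())) do
    put!(ch, 42)      # goes back to the channel on process 1
    nothing
end

take!(ch)             # should yield 42 on process 1
```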

Thanks for any help you can offer.

Cheers,
---Zachary


Re: [julia-users] Troubles saving/loading data (HDF5 / JLD)

2014-03-12 Thread Zachary Roth
Wonderful.  Running `Pkg.update()` did indeed do the trick (even for the 
original data set that I had tried to load).  Thanks so much for fixing 
this.  It's really great that a problem like this can be addressed so 
quickly after one simple post; it's certainly a breath of fresh air after 
working with something like MATLAB.

---Zach


[julia-users] Troubles saving/loading data (HDF5 / JLD)

2014-03-10 Thread Zachary Roth
I'm having some trouble loading saved data, and I haven't had any luck in 
looking for solutions online.  I've reduced the code to as short of an 
example as I could:

using HDF5, JLD

type Test
    values::Vector{Integer}
end

x = Test([])
@save "test.jld" x
@load "test.jld" x

When I run the above code, I get the following error:

Error dereferencing object
while loading In[1], in expression starting on line 9
 in h5r_dereference at /home/zroth/.julia/HDF5/src/plain.jl:1673
 in getindex at /home/zroth/.julia/HDF5/src/jld.jl:178
 in getrefs at /home/zroth/.julia/HDF5/src/jld.jl:413
 in read at /home/zroth/.julia/HDF5/src/jld.jl:344
 in read at /home/zroth/.julia/HDF5/src/jld.jl:220
 in getrefs at /home/zroth/.julia/HDF5/src/jld.jl:415
 in read at /home/zroth/.julia/HDF5/src/jld.jl:384
 in read at /home/zroth/.julia/HDF5/src/jld.jl:220
 in read at /home/zroth/.julia/HDF5/src/jld.jl:209
 in anonymous at no file


HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 139795549558592:
  #000: ../../../src/H5R.c line 533 in H5Rdereference(): unable dereference 
object
major: References
minor: Unable to initialize object
  #001: ../../../src/H5R.c line 419 in H5R_dereference(): dereferencing deleted 
object
major: References
minor: Bad object header link count
  #002: ../../../src/H5O.c line 1542 in H5O_link(): unable to load object header
major: Object header
minor: Unable to load metadata into cache
  #003: ../../../src/H5AC.c line 1831 in H5AC_protect(): H5C_protect() failed.
major: Object cache
minor: Unable to protect metadata
  #004: ../../../src/H5C.c line 6160 in H5C_protect(): can't load entry
major: Object cache
minor: Unable to load metadata into cache
  #005: ../../../src/H5C.c line 10990 in H5C_load_entry(): unable to load entry
major: Object cache
minor: Unable to load metadata into cache
  #006: ../../../src/H5Ocache.c line 154 in H5O_load(): unable to read object 
header
major: Object header
minor: Read failed
  #007: ../../../src/H5Fio.c line 113 in H5F_block_read(): read from metadata 
accumulator failed
major: Low-level I/O
minor: Read failed
  #008: ../../../src/H5Faccum.c line 196 in H5F_accum_read(): driver read 
request failed
major: Low-level I/O
minor: Read failed
  #009: ../../../src/H5FDint.c line 142 in H5FD_read(): driver read request 
failed
major: Virtual File Layer
minor: Read failed
  #010: ../../../src/H5FDsec2.c line 739 in H5FD_sec2_read(): addr overflow
major: Invalid arguments to routine
minor: Address overflowed


The problem only seems to occur when [] is passed to Test; for example, 
saving the output of Test([1:4]) works just fine for me.  (For the record, 
I am working with real data that will sometimes be empty; it took me a 
while to track down that empty vectors were causing the problem.)  And from 
what I can tell, the resultant file doesn't seem to be corrupted.  If 
somebody can help me out, I'd greatly appreciate it.
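For what it's worth, the workaround I plan to try in the meantime is to give the field a concrete element type and construct the empty vector explicitly (the `Test2` name is just for illustration, and I can't yet confirm this sidesteps the dereference error):

```julia
using HDF5, JLD

type Test2                   # variant of Test with a concrete element type
    values::Vector{Int}
end

x = Test2(Int[])             # explicitly-typed empty vector
@save "test2.jld" x
@load "test2.jld" x
```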


Re: [julia-users] Re: Issue with broadcast

2013-12-16 Thread Zachary Roth
I looked around a bit, and it seems that there is some plan to attach 
return types to methods.  Ideally, it would be nice if this information 
could eventually be used to determine the type in question without having 
to explicitly specify it; but it also seems that doing this could be 
problematic since different methods (with different output types) could 
potentially be chosen for each entry of the resultant array.  So, at this 
point, it seems to me that knowing the output type is so important that 
omitting the type should be thought of as a special case.  (In fact, I had 
played around with `broadcast` before this; and I ran into some issue then, 
too.  I suspect that this was also my problem then.)  That said, my 
preference would be that the type should be toward the beginning of the 
argument list as Toivo originally suggested.