Hi Sam

When running my application I sometimes hit this assertion:

"src/client/sysint/sys-io.sm:1860: io_post_write_ack_recv:
Assertion `ret == 0' failed:"

I tried to reproduce this assertion by running one of the tests included in
the pvfs2 package.

I built this setup:

4 pvfs servers with their storage space on the same machine, and the client
runs on this machine as well.

I made small modifications to io-stress.c:

1. I linked the test module with pvfs-threaded instead of pvfs only.
2. I changed the write block size to 1MB.
3. I am running the test in a loop, writing 50MB in each iteration.

Running this test reproduces the problem.

My observations:

1. This problem looks like a race between the thread responsible for sending
   the write buffer and the BMI thread.

2. Because everything runs on the same machine and message latency is
   very low, the reply returns very fast.

3. To check this assumption, I ran all 4 pvfs servers on a
   RAM disk, and the assert reproduced in every run.
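
To make the suspected failure mode concrete, here is a minimal sketch of a caller that tolerates both outcomes of a post call instead of asserting that only one can happen. It is not PVFS2 code: the function and enum names are hypothetical, and the return convention (0 = posted, 1 = completed immediately, negative = error) is assumed from the description of job_bmi_recv later in this thread.

```c
#include <assert.h>

/* Assumed return convention, taken from the description in this thread:
 * 0 = receive posted (completes later), 1 = completed immediately,
 * negative = error. All names below are hypothetical. */
enum next_step { WAIT_FOR_COMPLETION, PROCESS_NOW, REPORT_ERROR };

/* Decide what the caller should do with the result of a post call. */
enum next_step handle_post_result(int ret)
{
    if (ret < 0)
        return REPORT_ERROR;        /* the post itself failed */
    if (ret == 1)
        return PROCESS_NOW;         /* reply already arrived */
    assert(ret == 0);               /* only remaining legal value */
    return WAIT_FOR_COMPLETION;     /* normal asynchronous path */
}
```

Whether the client state machine should handle immediate completion here, or whether a return of 1 points at a bug elsewhere, is exactly the open question.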

Appreciate any help
Thanx
Hagai


-----Original Message-----
From: Sam Lang [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 3:37 PM
To: Hagai Avrahami
Subject: Re: Question


On Jul 2, 2007, at 12:38 PM, Hagai Avrahami wrote:

> Hi Sam
>
> I took io-stress, changed it a little bit, and then tried to write
> blocks after EOF, and it all went well.
> So I guess it is something I did wrong in my code....
>
> Maybe you have some suggestions about what could lead to this assert?

Not without knowing more about the code you've written.  My best
guess would be that you have memory corruption somewhere else and it's
causing an erroneous error in your code.  That's unlikely though.
You could try running it in valgrind and see if you get any errors.

>
> Does PVFS2 fill the gaps with 0 (zero)?

Yes.

>
> When I try to write to an offset after EOF, how does PVFS2 know
> it has passed EOF (how does it know the size of the file)? As I
> understood, Get Size is an operation that involves querying all IO
> servers in the collection.

Yes, each IO server's stripe size is maintained on the IO server.  The
client doesn't need to *know* that you're writing to an offset past  
EOF, it just determines the writes that need to be made to each  
server.  Determining the size of the file and the end of file are  
only important for a read operation.
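
As an illustration of how a client can determine the per-server writes without knowing the file size, here is a minimal sketch of plain round-robin striping. It is an assumption for illustration only, not PVFS2's actual distribution code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which server holds a byte, and where. */
struct stripe_loc {
    unsigned server;        /* index of the IO server */
    uint64_t local_offset;  /* offset within that server's portion */
};

/* Map a logical file offset to a server, assuming simple round-robin
 * striping with a fixed strip size. No file size is needed. */
struct stripe_loc locate(uint64_t logical_offset, uint64_t strip_size,
                         unsigned num_servers)
{
    struct stripe_loc loc;
    uint64_t strip;

    assert(strip_size > 0 && num_servers > 0);
    strip = logical_offset / strip_size;           /* global strip index */
    loc.server = (unsigned)(strip % num_servers);  /* round-robin owner */
    loc.local_offset = (strip / num_servers) * strip_size
                       + logical_offset % strip_size;
    return loc;
}
```

Under these assumptions, with 4 servers and a 1MB strip size, a write at offset 5MB lands on server 1 at local offset 1MB, whether or not the earlier strips were ever written.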

-sam

>
> Thanx
> Hagai
>
>
> -----Original Message-----
> From: Sam Lang [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 01, 2007 8:24 PM
> To: Hagai Avrahami
> Subject: Re: Question
>
>
> Hi Hagai,
>
> What you're trying to do should work (writing past EOF).  Do you have
> a test program that I could run to reproduce the problem?
>
> -sam
>
> On Jul 1, 2007, at 1:56 PM, Hagai Avrahami wrote:
>
>> Hi Sam
>>
>> I use PVFS_isys_io(....) to write data,
>> and I use PVFS_sys_testsome(...) to fetch all finished operations.
>>
>> I am trying to write a file of size 512MB with a stripe size of 1MB.
>>
>> After every 3 contiguous blocks I write the next one after a gap of 1MB:
>> Write 1 (offset 0), Write 2 (offset 1MB), Write 3 (offset 2MB),
>> Write 4 (offset 4MB), Write 5 (offset 5MB), Write 6 (offset 6MB),
>> Write 7 (offset 8MB), ...
>>
>> If I write with no gaps I don't get this assert, and with the
>> gaps it happens every time during the write.
>>
>> Can't I write to an offset bigger than the size of the file?
>> I assumed that if I do so, PVFS2 will fill the gap with 0 (zero).
>> Is this true?
>>
>> Appreciate your help
>> Thanx Hagai
>>
>> -----Original Message-----
>> From: Sam Lang [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, June 28, 2007 6:42 PM
>> To: Hagai Avrahami
>> Subject: Re: Question
>>
>>
>> Hagai,
>>
>> It's not possible to get immediate completion from that particular
>> bmi_recv, because it's a post of a receive for write completion, but
>> the post occurs before the write request has been made, so getting a
>> receive immediately isn't going to happen.
>>
>> How are you using the IO state machine?
>>
>> -sam
>>
>> On Jun 28, 2007, at 7:36 AM, Hagai Avrahami wrote:
>>
>>> Hi Sam
>>>
>>> I am getting this error when I am running the PVFS2 client:
>>>
>>> "src/client/sysint/sys-io.sm:1860: io_post_write_ack_recv:
>>> Assertion `ret == 0' failed:"
>>>
>>> When I debugged it I found that
>>>
>>> ret = job_bmi_recv(
>>>         cur_ctx->msg.svr_addr, cur_ctx->write_ack.encoded_resp_p,
>>>         cur_ctx->write_ack.max_resp_sz, cur_ctx->session_tag,
>>>         BMI_PRE_ALLOC, sm_p, status_user_tag,
>>>         &cur_ctx->write_ack.recv_status,
>>>         &cur_ctx->write_ack.recv_id,
>>>         pint_client_sm_context, JOB_TIMEOUT_INF);
>>>
>>> returns 1 for immediate completion,
>>> but io_post_write_ack_recv asserts (ret == 0).
>>>
>>> Do I understand the situation correctly?
>>>
>>> Thanx a lot
>>> Hagai
>>>
>>
>


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
