Hi Anthony,

Argh.. that is really bad.. :( Can you share a snippet of your code so that we can reproduce it locally, fix the bug, and add it to the nightlies to catch future regressions?

A couple of questions, though:
- Is this seen only through the Linux VFS interface? Does it work if you use the PVFS system interfaces/MPI-IO?
- What distro and/or glibc version is on the server?
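In the meantime, here is the kind of minimal sequential-write loop we would use to try a local repro. Everything in it (path, chunk size, fill byte, the write_pattern helper itself) is an assumption on my part, not taken from your run; the point is just to write nonzero data sequentially so any hole shows up as zeros on read-back:

```c
/* Hypothetical repro sketch: write nchunks chunks of chunk_size nonzero
 * bytes sequentially to path through the mounted filesystem. Holes that
 * appear later will read back as zeros against the 0x5a fill. */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns total bytes written, or -1 on error. */
long write_pattern(const char *path, size_t chunk_size, int nchunks)
{
    char *buf = malloc(chunk_size);
    if (!buf)
        return -1;
    memset(buf, 0x5a, chunk_size);   /* nonzero fill: holes read back as 0 */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long total = 0;
    for (int i = 0; i < nchunks; i++) {
        size_t done = 0;
        while (done < chunk_size) {   /* tolerate short writes and EINTR */
            ssize_t n = write(fd, buf + done, chunk_size - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                close(fd);
                free(buf);
                return -1;
            }
            done += (size_t)n;
            total += n;
        }
    }
    free(buf);
    return close(fd) == 0 ? total : -1;
}
```

Something like write_pattern("/mnt/pvfs2/testfile", 64 * 1024, 16384) would give a gigabyte file comparable to your workload; knowing whether your code goes through write()/writev()/aio instead would tell us which kernel path to stare at.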
Regarding this:

> For the first instance of the hole, I see "Posted UNKNOWN" in
> the log. The offset (1301371504) corresponds with where the first
> hole is in my test file.

The message itself is harmless. It is a buglet in client-state-machine.c's PINT_client_get_name_str(): we don't have an entry for PVFS_CLIENT_PERF_COUNT_TIMER, hence it prints "UNKNOWN". This gets called only on pvfs2-client-core startup, though, and what is weird is that it appears more than twice. I suspect a bug in the file offset handling in the kernel code, but I need some information on what interface is being used (readv/writev/aio/...?) and, if possible, the code itself..

thanks,
Murali

On 10/5/07, Anthony Tong <[EMAIL PROTECTED]> wrote:
> I'm getting file holes with pvfs 2.6.3 on linux 2.6 (rhel4 kernels i386,
> and a vanilla 2.6 as well) on a test system and can consistently reproduce
> them. Holes are about 16k+ of zeros.
>
> Simple setup: 4 io servers, 1 metadata, over TCP; these nodes also mount
> the filesystem.
>
> I am writing gigabyte files sequentially from a client via the vfs
> interface.
>
> I finally had some time to do some debugging this morning and here's
> what I have found so far. "io,client" is the gossip mask on for
> pvfs2-client-core-threaded.
>
> Snippet from output of cmp -l good.file corrupt.file:
> 1301371505 127 0
> 1301371506 376 0
> 1301371509 115 0
> 1301371510 221 0
> 1301371511 132 0
> ... (and so forth till)..
> 1301438544 110 0
> 1571986033 7 0
>
> Searching for other "Posted UNKNOWN" messages, if there's a
> file_req_off nearby, it corresponds to the other holes as well.
>
> Gossip snippets:
>
> [D 11:09:40.531507] * mem req size is 67040, file_req size is 67040 (bytes)
> [D 11:09:40.531534] bstream_size = 325343856, datafile nr=1, ct=4, file_req_off = 1301371504
> [D 11:09:40.531712] posted flow for context 0xb4bfd720
> [D 11:09:40.531790] preposting write ack for context 0xb4bfd720.
> [D 11:09:41.563356] Posted UNKNOWN (waiting for test)
> [D 11:09:41.563558] Posted UNKNOWN (waiting for test)
> [D 11:09:41.640702] get_config state: server_get_config_setup_msgpair
> [D 11:09:41.641900] Posted PVFS_SYS_FS_ADD (waiting for test)
> [D 11:09:41.644099] * Adding new dynamic mount point <DYNAMIC-1> [7,0]
> [D 11:09:41.644148] PINT_server_config_mgr_add_config: adding config 0x84e6680
> [D 11:09:41.644177] mapped fs_id 1867692515 => config 0x84e6680
> [D 11:09:41.644218] Set min handle recycle time to 360 seconds
> [D 11:09:41.644249] Reloading handle mappings for fs_id 1867692515
> [D 11:09:41.644472] PVFS_isys_io entered [1048186]
> [D 11:09:41.644548] (0x84f0c68) io state: io_init
> [D 11:09:41.644582] (0x84f0c68) getattr_setup_msgpair
> [D 11:09:41.644702] Posted PVFS_SYS_IO (waiting for test)
> [D 11:09:41.645097] trying to add object reference to acache
> [D 11:09:41.645138] (0x84f0c68) getattr state: getattr_cleanup
> [D 11:09:41.645169] (0x84f0c68) io state: io_datafile_setup_msgpairs
> [D 11:09:41.645201] - io_find_target_datafiles called
> [D 11:09:41.645279] io_find_target_datafiles: datafile[1] might have data (out=1)
> [D 11:09:41.645319] io_find_target_datafiles: datafile[2] might have data (out=2)
>
> ...
>
> [D 11:09:55.609389] * mem req size is 100272, file_req size is 100272 (bytes)
> [D 11:09:55.609417] bstream_size = 393019392, datafile nr=0, ct=4, file_req_off = 1571986032
> [D 11:09:55.609526] posted flow for context 0xb55fd318
> [D 11:09:55.609554] preposting write ack for context 0xb55fd318.
> [D 11:09:56.627065] Posted UNKNOWN (waiting for test)
> [D 11:09:56.627238] Posted UNKNOWN (waiting for test)
> [D 11:09:56.693300] get_config state: server_get_config_setup_msgpair
> [D 11:09:56.694529] Posted PVFS_SYS_FS_ADD (waiting for test)
> [D 11:09:56.700558] * Adding new dynamic mount point <DYNAMIC-1> [7,0]
> [D 11:09:56.700620] PINT_server_config_mgr_add_config: adding config 0x83c6680
> [D 11:09:56.700650] mapped fs_id 1867692515 => config 0x83c6680
> [D 11:09:56.700692] Set min handle recycle time to 360 seconds
> [D 11:09:56.700735] Reloading handle mappings for fs_id 1867692515
> [D 11:09:56.700954] PVFS_isys_io entered [1048186]
> [D 11:09:56.701033] (0x83d0c68) io state: io_init
> [D 11:09:56.701066] (0x83d0c68) getattr_setup_msgpair
> [D 11:09:56.701188] Posted PVFS_SYS_IO (waiting for test)
> [D 11:09:56.701589] trying to add object reference to acache
> [D 11:09:56.701631] (0x83d0c68) getattr state: getattr_cleanup
> [D 11:09:56.701663] (0x83d0c68) io state: io_datafile_setup_msgpairs
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
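One more thought: to locate the holes without keeping a known-good copy around for cmp -l, a small zero-run scanner along these lines would print the offset and length of each suspect region. This is only a sketch; the function name and the minimum run length are my inventions:

```c
/* Hypothetical sketch: scan a file byte-by-byte and report every maximal
 * run of zero bytes whose length is at least min_run -- a quick way to
 * find 16k+ holes without a reference file to diff against. */
#include <stdio.h>

/* Returns the number of qualifying zero runs found, or -1 if the file
 * cannot be opened; prints each run's start offset and length. */
long count_zero_runs(const char *path, long min_run)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    long runs = 0;
    long run_start = -1;   /* start offset of the current zero run, or -1 */
    long pos = 0;
    int c;

    while ((c = fgetc(f)) != EOF) {
        if (c == 0) {
            if (run_start < 0)
                run_start = pos;       /* a new zero run begins here */
        } else if (run_start >= 0) {
            if (pos - run_start >= min_run) {
                printf("zero run at %ld, length %ld\n",
                       run_start, pos - run_start);
                runs++;
            }
            run_start = -1;            /* run ended on a nonzero byte */
        }
        pos++;
    }
    /* A run can extend to end-of-file. */
    if (run_start >= 0 && pos - run_start >= min_run) {
        printf("zero run at %ld, length %ld\n", run_start, pos - run_start);
        runs++;
    }

    fclose(f);
    return runs;
}
```

Running it over corrupt.file with a min_run of a few KB should report regions whose offsets line up with the cmp -l output above (modulo the odd nonzero byte inside a hole, which would split one hole into several reported runs).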
