Hi Murali,

no, this didn't help. (In my versions (CVS and 1.4.0) I didn't see a src/io/bmi/bmi-tcp.c, so I made the change in src/io/bmi/bmi_tcp/bmi-tcp.c on line 2050.)

--------8<---------------
[EMAIL PROTECTED] bmi_tcp]# /usr/local/pvfs2_nodes/bin/pvfs2-cp sockio.o /mnt/pvfs2/
[E 10:07:29.224827] Receive immediately failed: Value too large for defined data type
[E 10:07:29.224933] msgpair failed, will retry:: Value too large for defined data type
[E 10:07:29.224970] *** msgpairarray_completion_fn: msgpair to server tcp://node10:3334 failed: Value too large for defined data type
[E 10:07:29.224981] *** Non-BMI failure.
[E 10:07:29.224991] getattr_object_getattr_failure : Value too large for defined data type
PVFS_sys_create: Value too large for defined data type
Could not open /mnt/pvfs2/
Segmentation fault
-------->8---------------

BTW, after I set up a 1.8TB pvfs2, using the kernel module and "cp", I get a segfault copying within the pvfs2 filesystem. According to "diff" the file copy itself succeeded.

... Matt

Murali Vilayannur wrote:

>Hi Matt,
>Could you replace line 2083 in src/io/bmi/bmi-tcp.c from
>int copy_size = 0;
>to bmi_size_t copy_size = 0;
>and see if that helps?
>Thanks,
>Murali
>
>On Fri, 26 May 2006, Matt wrote:
>
>>Hi
>>
>>I did some more testing.
>>
>>My problems may be related to a 2TB limit. If my combined storage size
>>is below 2TB (9 nodes at 200GB each), everything seems to work. However, if
>>I use 10 or more nodes I get the reported error.
>>
>>The same holds if I stay away from our head node and use only Mandriva
>>nodes with FC4 kernel. However, in this case the error message changes:
>>
>>---------8<---------------
>>[EMAIL PROTECTED] pvfs2]# /usr/local/pvfs2_nodes/bin/pvfs2-cp -t /home/munnich/Soft/pvfs2/pvfs2-1.4.0.tar.gz /mnt/pvfs2/testfile
>>[E 14:22:06.071020] Receive immediately failed: Value too large for defined data type
>>[E 14:22:06.071114] msgpair failed, will retry:: Value too large for defined data type
>>[E 14:22:06.071159] *** msgpairarray_completion_fn: msgpair to server tcp://node10:3334 failed: Value too large for defined data type
>>[E 14:22:06.071170] *** Non-BMI failure.
>>[E 14:22:06.071179] getattr_object_getattr_failure : Value too large for defined data type
>>PVFS_sys_create: Value too large for defined data type
>>Could not open /mnt/pvfs2/testfile
>>Segmentation fault (core dumped)
>>--------->8---------------
>>
>>Does this behavior ring a bell?
>>
>>... Matt
>>
>>
>>Log files
>>[EMAIL PROTECTED] pvfs2]# cat /tmp/pvfs2-server.log
>>[D 14:21:59.274612] PVFS2 Server version 1.4.1pre1-2006-05-25-230553 starting.
>>[D 14:21:59.275359] Passing tcp://node10:3334 as BMI listen address.
>>[D 14:21:59.275417] BMI_tcp_initialize: Initializing TCP/IP module.
>>[D 14:21:59.275495] BMI_tcp_initialize: TCP/IP module successfully initialized.
>>[D 14:21:59.276813] dbpf_thread_initialize: initialized
>>[D 14:21:59.278074] collection lookup: version is 0.1.2
>>[D 14:21:59.278238] dbpf_thread_function started
>>[D 14:21:59.278312] - set handle re-use timeout to 360 seconds (ret=0)
>>[D 14:21:59.301219] File system pvfs2-fs using handles:
>>    4-390451575
>>[D 14:21:59.301276] Sync on metadata update for pvfs2-fs: yes
>>[D 14:21:59.301287] Sync on I/O data update for pvfs2-fs: no
>>[D 14:21:59.301320] Storage Init Complete (aio-threaded)
>>[D 14:21:59.301331] 1 filesystem(s) initialized
>>[D 14:21:59.301816] Initialization completed successfully.
>>[D 14:22:06.068882] handle_new_connection: Assigning socket 12 to new method addr.
>>[D 14:22:06.068956] tcp_do_work_recv: Reading header for new op.
>>[D 14:22:06.068972] tcp_do_work_recv: Received new message; mode: 2.
>>[D 14:22:06.068983] tcp_do_work_recv: tag: 1
>>[D 14:22:06.069054] (0x5e1b70) getconfig (prelude sm) state: req_sched
>>[D 14:22:06.069118] (0x5e1b70) getconfig (prelude sm) state: getattr_if_needed
>>[D 14:22:06.069132] (0x5e1b70) getconfig (prelude sm) state: perm_check (status = 0)
>>[D 14:22:06.069147] (0x5e1b70) getconfig state: init
>>[D 14:22:06.069162] (0x5e1b70) getconfig (FR sm) state: release: (error_code = 0)
>>[D 14:22:06.069179] (0x5e1b70) getconfig (FR sm) state: send_resp (status = 0)
>>[D 14:22:06.069204] BMI_post_send_list: addr: 65, count: 1, total_size: 1632
>>[D 14:22:06.069216] element 0: offset: 0x61b6f0, size: 1632
>>[D 14:22:06.069258] BMI_tcp_post_send_generic: Sent: 1632 bytes of data.
>>[D 14:22:06.069273] (0x5e1b70) getconfig (FR sm) state: cleanup
>>[D 14:22:06.069305] (0x5e1b70) getconfig state: cleanup
>>
>>[EMAIL PROTECTED] ~]# cat /tmp/pvfs2-server.log
>>[D 13:44:33.574416] PVFS2 Server version 1.4.1pre1-2006-05-25-230553 starting.
>>[E 13:44:37.699990] PVFS2 server got signal 15 (server_status_flag: 262143)
>>[D 13:44:39.722231] PVFS2 Server version 1.4.1pre1-2006-05-25-230553 starting.
>>[E 13:47:29.145477] PVFS2 server got signal 15 (server_status_flag: 262143)
>>[D 13:47:31.168877] PVFS2 Server version 1.4.1pre1-2006-05-25-230553 starting.
>>[D 13:47:31.169626] Passing tcp://node2:3334 as BMI listen address.
>>[D 13:47:31.169682] BMI_tcp_initialize: Initializing TCP/IP module.
>>[D 13:47:31.169755] BMI_tcp_initialize: TCP/IP module successfully initialized.
>>[D 13:47:31.171265] dbpf_thread_initialize: initialized
>>[D 13:47:31.172522] collection lookup: version is 0.1.2
>>[D 13:47:31.172670] - set handle re-use timeout to 360 seconds (ret=0)
>>[D 13:47:31.172826] dbpf_thread_function started
>>[D 13:47:31.172883] File system pvfs2-fs using handles:
>>    1171354720-1561806291
>>[D 13:47:31.172895] Sync on metadata update for pvfs2-fs: yes
>>[D 13:47:31.172909] Sync on I/O data update for pvfs2-fs: no
>>[D 13:47:31.172942] Storage Init Complete (aio-threaded)
>>[D 13:47:31.172953] 1 filesystem(s) initialized
>>[D 13:47:31.173461] Initialization completed successfully.
>>
>>
>>[EMAIL PROTECTED] pvfs2]# cat /etc/pvfs2-fs.conf
>><Defaults>
>>    UnexpectedRequests 50
>>    LogFile /tmp/pvfs2-server.log
>>    EventLogging storage,network,server
>>    LogStamp usec
>>    BMIModules bmi_tcp
>>    FlowModules flowproto_multiqueue
>>    PerfUpdateInterval 1000
>>    ServerJobBMITimeoutSecs 30
>>    ServerJobFlowTimeoutSecs 30
>>    ClientJobBMITimeoutSecs 300
>>    ClientJobFlowTimeoutSecs 300
>>    ClientRetryLimit 5
>>    ClientRetryDelayMilliSecs 2000
>></Defaults>
>>
>><Aliases>
>>    Alias node1 tcp://node10:3334
>>    Alias node10 tcp://node11:3334
>>    Alias node11 tcp://node1:3334
>>    Alias node2 tcp://node2:3334
>>    Alias node3 tcp://node3:3334
>>    Alias node4 tcp://node4:3334
>>    Alias node5 tcp://node5:3334
>>    Alias node6 tcp://node6:3334
>>    Alias node7 tcp://node7:3334
>>    Alias node8 tcp://node8:3334
>>    Alias node9 tcp://node9:3334
>></Aliases>
>>
>><Filesystem>
>>    Name pvfs2-fs
>>    ID 833677876
>>    RootHandle 1048576
>>    <MetaHandleRanges>
>>        Range node1 4-390451575
>>    </MetaHandleRanges>
>>    <DataHandleRanges>
>>        Range node10 390451576-780903147
>>        Range node11 780903148-1171354719
>>        Range node2 1171354720-1561806291
>>        Range node3 1561806292-1952257863
>>        Range node4 1952257864-2342709435
>>        Range node5 2342709436-2733161007
>>        Range node6 2733161008-3123612579
>>        Range node7 3123612580-3514064151
>>        Range node8 3514064152-3904515723
>>        Range node9 3904515724-4294967295
>>    </DataHandleRanges>
>>    <StorageHints>
>>        TroveSyncMeta yes
>>        TroveSyncData no
>>        AttrCacheKeywords datafile_handles,metafile_dist
>>        AttrCacheKeywords dir_ent, symlink_target
>>        AttrCacheSize 4093
>>        AttrCacheMaxNumElems 32768
>>    </StorageHints>
>></Filesystem>
>>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
