Thanks for the profiling help; it worked, but I ended up getting the wrong things profiled, something weird about profiling an entire suite, I guess.
Instead, I used oprofile as an attempt at getting (not necessarily very accurate) results for the pvfs2 I/O profiling.
The results were fairly straightforward (attached below).
A couple other (related) BMI/IB questions:
According to the profiling, roughly 50% of total CPU time in a given run is spent inside check_cq() (src/io/bmi/bmi_ib/ib.c:check_cq()).
I'll run some longer tests tomorrow to get more accurate readings; however, the initial results show check_cq() being called overwhelmingly often.
The polling function inside check_cq() accounts for << 1%, so it doesn't appear that the polling itself is blocking, which was one of my original thoughts.
Is it normal for the BMI/IB interface to call check_cq() this often?
Can we aggregate some of these calls?
Any ideas/thoughts?
thanks,
- Kyle
---128MB test via 'pvfs2-cp -t /tmp/bigfile /mnt/pvfs2' ---
Initial results:
samples % symbol name
2151 51.5210 check_cq
318 7.6168 construct_poll_plan
303 7.2575 BMI_testcontext
257 6.1557 BMI_ib_testcontext
257 6.1557 bmi_thread_function
181 4.3353 PINT_thread_mgr_bmi_push
165 3.9521 job_testcontext
98 2.3473 gen_posix_mutex_lock
72 1.7246 do_one_work_cycle_all
60 1.4371 gen_posix_mutex_unlock
58 1.3892 completion_query_context
36 0.8623 .plt
35 0.8383 job_desc_q_shownext
35 0.8383 test_rq
18 0.4311 ibv_poll_cq
17 0.4072 test_sq
9 0.2156 PINT_process_request
6 0.1437 PINT_distribute
5 0.1198 encourage_send_incoming_cts
4 0.0958 BMI_ib_post_send_list
4 0.0958 PINT_acache_finalize
4 0.0958 hash_key
4 0.0958 mem_to_bmi_callback_fn
4 0.0958 qhash_init
3 0.0719 __qlist_add
3 0.0719 generic_post_send
3 0.0719 id_gen_safe_register
2 0.0479 Malloc
2 0.0479 PINT_request_disp
2 0.0479 PINT_state_machine_next
2 0.0479 build_context_flow
2 0.0479 contiguous_length
2 0.0479 id_gen_safe_unregister
2 0.0479 io_find_target_datafiles
2 0.0479 lebf_encode_req
2 0.0479 logical_to_physical_offset
2 0.0479 memcache_deregister
2 0.0479 post_sr_rdmaw
2 0.0479 qhash_search_and_remove
1 0.0240 BMI_ib_method_addr_lookup
1 0.0240 BMI_post_send_list
1 0.0240 PINT_acache_lookup
1 0.0240 PINT_cached_config_map_to_server
1 0.0240 PINT_client_state_machine_post
1 0.0240 PINT_client_state_machine_test
1 0.0240 PINT_copy_object_attr
1 0.0240 PINT_decode
1 0.0240 PINT_dotconf_handle_command
1 0.0240 PINT_encode_release
1 0.0240 PINT_req_sched_finalize
1 0.0240 PINT_subreq
1 0.0240 PVFS_Request_hvector
1 0.0240 PVFS_util_parse_pvfstab
1 0.0240 __PINT_server_config_mgr_get_config
1 0.0240 __PINT_server_config_mgr_put_config
1 0.0240 __job_time_mgr_add
1 0.0240 __libc_csu_init
1 0.0240 __qlist_del
1 0.0240 bmi_thread_mgr_callback
1 0.0240 check_context_status
1 0.0240 copy_filesystem
1 0.0240 create_cleanup
1 0.0240 do_one_test_cycle_req_sched
1 0.0240 encode_PVFS_handle_extent_array
1 0.0240 encode_PVFS_server_req
1 0.0240 ensure_connected
1 0.0240 hash_key_compare
1 0.0240 ibv_post_send
1 0.0240 io_analyze_results
1 0.0240 io_datafile_setup_msgpairs
1 0.0240 lebf_decode_resp
1 0.0240 memcache_lookup_cover
1 0.0240 post_sr_ack
1 0.0240 qhash_search
1 0.0240 qlist_add_tail
1 0.0240 qlist_del
1 0.0240 qlist_empty
1 0.0240 qlist_try_del_head
1 0.0240 ref_list_search_addr
1 0.0240 skip_whitespace
Sam Lang wrote:
The server should shut down gracefully (clean up and then exit 0) on a SIGHUP (kill -1 <pid>). There's signal handling code in the server that is supposed to catch at least SIGHUP and begin shutdown. Can you try kill -1 and see if that works for you?
-sam
On Feb 20, 2006, at 5:49 PM, Kyle Schochenmaier wrote:
Is there any way to stop the pvfs2-server without using the kill signal?
I'd like to be able to profile some of the execution so I can work out some of the bottlenecks I'm having with the OpenIB BMI port; however, for this to work, the process needs a normal, non-KILL termination. I looked through some of the code and wasn't able to find anything, and it appears the /etc/init.d/* scripts just send kill signals as well.
From the gprof documentation:
"In order to write the `gmon.out' file properly, your program must
exit normally: by returning from main or by calling exit. Calling
the low-level function _exit does not write the profile data, and
neither does abnormal termination due to an unhandled signal."
I'd like to do the profile on the server-side i/o, not the client-
side test programs...
Any ideas?
thanks,
- Kyle
--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers