Re: QPID-1148, r671604, lockf, flock and fcntl

Alan Conway Fri, 27 Jun 2008 09:01:22 -0700

On Fri, 2008-06-27 at 17:14 +0200, Manuel Teira wrote:
> Alan Conway wrote:
> > On Thu, 2008-06-26 at 12:12 +0200, Manuel Teira wrote:
> >   
> >> Hello.
> >> This follows further investigation and tests related to the change in
> >> r671604, which dropped the file locking strategy in favour of a flock on
> >> the data dir.
> >>
> >> Trying to write similar code using lockf, I hit the issue that
> >> the file must be opened using O_RDWR or O_WRONLY, and that's not allowed
> >> for a directory.
> >> The same happens when trying to use an fcntl call.
> >> And, unexpectedly, the same goes for flock. From the Solaris manual page:
> >>
> >> <snip>
> >>      Read permission is required on a file  to  obtain  a  shared
> >>      lock,   and  write  permission  is  required  to  obtain  an
> >>      exclusive lock.
> >> </snip>
> >>
> >> But the Linux man page claims:
> >>
> >> <snip>
> >> A shared or exclusive lock can be placed on a file regardless of the
> >> mode in which the file was opened.
> >> </snip>
> >>
> >> I've searched the web for some BSD system pages, but they don't say
> >> anything about the file mode.
> >>
> >>
> >> On the other hand, the POSIX fcntl specification says, regarding the
> >> failure causes:
> >>
> >> [EBADF]
> >>     The /fildes/ argument is not a valid open file descriptor, or the
> >>     argument /cmd/ is F_SETLK or F_SETLKW, the type of lock, *l_type*,
> >>     is a shared lock (F_RDLCK), and /fildes/ is not a valid file
> >>     descriptor open for reading, or the type of lock *l_type*, is an
> >>     exclusive lock (F_WRLCK), and /fildes/ is not a valid file
> >>     descriptor open for writing.
> >>
> >> The POSIX spec also requires write permission for lockf:
> >> http://www.opengroup.org/onlinepubs/007908799/xsh/lockf.html
> >>
> >>
> >>
> >> I'm afraid this means Solaris cannot place a lock directly on a
> >> directory. Any ideas?
> >>     
> >
> >
> > Yes, we can create (if it doesn't already exist) a lock file in the
> > directory and then use lockf to lock it. There's already code in
> > Daemon.cpp that does exactly this for the PID file. The reason I
> > switched to flock was because crashing or killed brokers were sometimes
> > leaving the lock file behind them, whereas a flock (or lockf)  lock is
> > automatically released when the process exits.
> >
> > We need to
> >  - create a qpid::sys::LockFile class that can be re-implemented on
> > different platforms.
> >  - use the Daemon.cpp code as the posix implementation.
> >  - Replace the locking code in Daemon.cpp and DataDir.cpp with the
> > common sys::LockFile.
> >
> > It's JIRA https://issues.apache.org/jira/browse/QPID-1158
> > Could you take this on, Manuel? I can do it but it may take a couple of
> > days to get to it.
> >   
> Of course, I will try (I will try to start on Monday). For the moment I've 
> reverted the changes to keep using the old DataDir.cpp code. I was able to 
> pass most of the tests on Solaris (more changes to remove bashisms are 
> needed, though). I will have to look into a few stray messages, but this 
> is a dump of a 'make check' session now:
> 
> -bash-3.00$ make check
> make  libshlibtest.la libdlclose_noop.la unit_test  perftest  txtest 
> latencytest client_test  topic_listener topic_publisher  publish consume
> `libshlibtest.la' is up to date.
> `libdlclose_noop.la' is up to date.
> `unit_test' is up to date.
> `perftest' is up to date.
> `txtest' is up to date.
> `latencytest' is up to date.
> `client_test' is up to date.
> `topic_listener' is up to date.
> `topic_publisher' is up to date.
> `publish' is up to date.
> `consume' is up to date.
> make  check-TESTS
> Running 154 test cases...
> 2008-jun-27 17:09:18 error Exception in client dispatch thread: 
> Connection closed by broker
> 
> *** No errors detected
> PASS: unit_test
> PASS: start_broker
> PASS: client_test
> SubscribeThread exception: Sequence error: expected  n==1 but got 0 
> (perftest.cpp:524)
> FAIL: quick_perftest
> PASS: quick_topictest
> sh: objdump: not found
> test_example (tests_0-10.example.ExampleTest) ... ok
> test_auto_rollback (tests_0-10.tx.TxTests) ... ok
> test_commit (tests_0-10.tx.TxTests) ... ok
> test_rollback (tests_0-10.tx.TxTests) ... ok
> test_broker_connectivity (tests_0-10.management.ManagementTest) ... ok
> test_self_session_id (tests_0-10.management.ManagementTest) ... ok
> test_standard_exchanges (tests_0-10.management.ManagementTest) ... ok
> test_system_object (tests_0-10.management.ManagementTest) ... ok
> test_bad_resume (tests_0-10.dtx.DtxTests) ... ok
> test_commit_unknown (tests_0-10.dtx.DtxTests) ... ok
> test_end (tests_0-10.dtx.DtxTests) ... ok
> test_end_suspend_and_fail (tests_0-10.dtx.DtxTests) ... ok
> test_end_unknown_xid (tests_0-10.dtx.DtxTests) ... ok
> test_forget_xid_on_completion (tests_0-10.dtx.DtxTests) ... ok
> test_get_timeout (tests_0-10.dtx.DtxTests) ... ok
> test_get_timeout_unknown (tests_0-10.dtx.DtxTests) ... ok
> test_implicit_end (tests_0-10.dtx.DtxTests) ... ok
> test_invalid_commit_not_ended (tests_0-10.dtx.DtxTests) ... ok
> test_invalid_commit_one_phase_false (tests_0-10.dtx.DtxTests) ... ok
> test_invalid_commit_one_phase_true (tests_0-10.dtx.DtxTests) ... ok
> test_invalid_prepare_not_ended (tests_0-10.dtx.DtxTests) ... ok
> test_invalid_rollback_not_ended (tests_0-10.dtx.DtxTests) ... ok
> test_prepare_unknown (tests_0-10.dtx.DtxTests) ... ok
> test_recover (tests_0-10.dtx.DtxTests) ... ok
> test_rollback_unknown (tests_0-10.dtx.DtxTests) ... ok
> test_select_required (tests_0-10.dtx.DtxTests) ... ok
> test_set_timeout (tests_0-10.dtx.DtxTests) ... ok
> test_simple_commit (tests_0-10.dtx.DtxTests) ... ok
> test_simple_prepare_commit (tests_0-10.dtx.DtxTests) ... ok
> test_simple_prepare_rollback (tests_0-10.dtx.DtxTests) ... ok
> test_simple_rollback (tests_0-10.dtx.DtxTests) ... ok
> test_start_already_known (tests_0-10.dtx.DtxTests) ... ok
> test_start_join (tests_0-10.dtx.DtxTests) ... ok
> test_start_join_and_resume (tests_0-10.dtx.DtxTests) ... ok
> test_suspend_resume (tests_0-10.dtx.DtxTests) ... ok
> test_suspend_start_end_resume (tests_0-10.dtx.DtxTests) ... ok
> test_delete_while_used_by_exchange 
> (tests_0-10.alternate_exchange.AlternateExchangeTests) ... ok
> test_delete_while_used_by_queue 
> (tests_0-10.alternate_exchange.AlternateExchangeTests) ... ok
> test_queue_delete (tests_0-10.alternate_exchange.AlternateExchangeTests) 
> ... ok
> test_unroutable (tests_0-10.alternate_exchange.AlternateExchangeTests) 
> ... ok
> test (tests_0-10.exchange.DeclareMethodPassiveFieldNotFoundRuleTests) ... ok
> testDefaultExchange (tests_0-10.exchange.DefaultExchangeRuleTests) ... ok
> testHeadersBindNoMatchArg (tests_0-10.exchange.ExchangeTests) ... ok
> testMatchAll (tests_0-10.exchange.HeadersExchangeTests) ... ok
> testMatchAny (tests_0-10.exchange.HeadersExchangeTests) ... ok
> testDifferentDeclaredType (tests_0-10.exchange.MiscellaneousErrorsTests) 
> ... ok
> testTypeNotKnown (tests_0-10.exchange.MiscellaneousErrorsTests) ... ok
> testDirect (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
> testFanout (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
> testHeaders (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
> testTopic (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
> testAmqDirect (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
> testAmqFanOut (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
> testAmqMatch (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
> testAmqTopic (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
> test_ack_and_no_ack (tests_0-10.broker.BrokerTests) ... ok
> test_simple_delivery_immediate (tests_0-10.broker.BrokerTests) ... ok
> test_simple_delivery_queued (tests_0-10.broker.BrokerTests) ... ok
> test_ack (tests_0-10.message.MessageTests) ... ok
> test_acquire (tests_0-10.message.MessageTests) ... ok
> test_acquire_with_no_accept_and_credit_flow 
> (tests_0-10.message.MessageTests) ... ok
> test_cancel (tests_0-10.message.MessageTests) ... ok
> test_consume_exclusive (tests_0-10.message.MessageTests) ... ok
> test_consume_exclusive2 (tests_0-10.message.MessageTests) ... ok
> test_consume_queue_not_found (tests_0-10.message.MessageTests) ... ok
> test_consume_queue_not_specified (tests_0-10.message.MessageTests) ... ok
> test_consume_unique_consumers (tests_0-10.message.MessageTests) ... ok
> test_credit_flow_bytes (tests_0-10.message.MessageTests) ... ok
> test_credit_flow_messages (tests_0-10.message.MessageTests) ... ok
> test_empty_body (tests_0-10.message.MessageTests) ... ok
> test_incoming_start (tests_0-10.message.MessageTests) ... ok
> test_no_local (tests_0-10.message.MessageTests) ... ok
> test_no_local_awkward (tests_0-10.message.MessageTests) ... ok
> test_no_local_exclusive_subscribe (tests_0-10.message.MessageTests) ... ok
> test_ranged_ack (tests_0-10.message.MessageTests) ... ok
> test_reject (tests_0-10.message.MessageTests) ... ok
> test_release (tests_0-10.message.MessageTests) ... ok
> test_release_ordering (tests_0-10.message.MessageTests) ... ok
> test_release_unacquired (tests_0-10.message.MessageTests) ... ok
> test_subscribe_not_acquired (tests_0-10.message.MessageTests) ... ok
> test_subscribe_not_acquired_2 (tests_0-10.message.MessageTests) ... ok
> test_subscribe_not_acquired_3 (tests_0-10.message.MessageTests) ... ok
> test_window_flow_bytes (tests_0-10.message.MessageTests) ... ok
> test_window_flow_messages (tests_0-10.message.MessageTests) ... ok
> test_ack_message_from_deleted_queue 
> (tests_0-10.persistence.PersistenceTests) ... ok
> test_delete_queue_after_publish 
> (tests_0-10.persistence.PersistenceTests) ... ok
> test_queue_deletion (tests_0-10.persistence.PersistenceTests) ... ok
> test_autodelete_shared (tests_0-10.queue.QueueTests) ... ok
> test_bind (tests_0-10.queue.QueueTests) ... ok
> test_bind_queue_existence (tests_0-10.queue.QueueTests) ... ok
> test_declare_exclusive (tests_0-10.queue.QueueTests) ... ok
> test_declare_passive (tests_0-10.queue.QueueTests) ... ok
> test_delete_ifempty (tests_0-10.queue.QueueTests) ... ok
> test_delete_ifunused (tests_0-10.queue.QueueTests) ... ok
> test_delete_queue_exists (tests_0-10.queue.QueueTests) ... ok
> test_delete_simple (tests_0-10.queue.QueueTests) ... ok
> test_purge (tests_0-10.queue.QueueTests) ... ok
> test_purge_empty_name (tests_0-10.queue.QueueTests) ... ok
> test_purge_queue_exists (tests_0-10.queue.QueueTests) ... ok
> test_unbind_direct (tests_0-10.queue.QueueTests) ... ok
> test_unbind_fanout (tests_0-10.queue.QueueTests) ... ok
> test_unbind_headers (tests_0-10.queue.QueueTests) ... ok
> test_unbind_topic (tests_0-10.queue.QueueTests) ... ok
> test_exchange_bound_direct (tests_0-10.query.QueryTests) ... ok
> test_exchange_bound_fanout (tests_0-10.query.QueryTests) ... ok
> test_exchange_bound_header (tests_0-10.query.QueryTests) ... ok
> test_exchange_bound_topic (tests_0-10.query.QueryTests) ... ok
> test_exchange_query (tests_0-10.query.QueryTests) ... ok
> test_queue_query (tests_0-10.query.QueryTests) ... ok
> test_queue_query_unknown (tests_0-10.query.QueryTests) ... ok
> 
> ----------------------------------------------------------------------
> Ran 110 tests in 88.510s
> 
> OK
> PASS: python_tests
> PASS: stop_broker
> Running federation tests using brokers on ports 45428 45429
> sh: objdump: not found
> test_bridge_create_and_close (federation.FederationTests) ... ok
> test_pull_from_exchange (federation.FederationTests) ... ok
> test_pull_from_queue (federation.FederationTests) ... ok
> test_tracing (federation.FederationTests) ... ok
> 
> ----------------------------------------------------------------------
> Ran 4 tests in 48.880s
> 
> OK
> PASS: run_federation_tests
> ==============================================
> 1 of 8 tests failed
> Please report to [email protected]
> ==============================================
> 
> 
> 
> 
> Only one test is failing. There's also a weird message during unit_test 
> (Exception in client dispatch thread: Connection closed by broker), and


That is not an error; it's coming from a test that deliberately provokes
various error conditions. It's being printed because the broker logs
errors on stderr by default. I can fix the tests to hide this message,
thanks for reminding me.

>  
> also those "sh: objdump: not found" messages; I'm still not sure where 
> they're coming from, since at first glance I was not able to find any 
> objdump invocation. Other than that, it gives me hope of having a 
> working Solaris version soon.

It looks fantastic, definitely ready for a test drive on Linux. Will try
to do this next week.
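
For reference, here is a rough sketch of the lock-file approach discussed
above: create a regular file inside the data directory, open it read/write,
and take a lockf lock on it. This is only an illustration under stated
assumptions, not the Daemon.cpp code and not the eventual qpid::sys::LockFile;
the class layout and the error handling are made up for the example.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical sketch of a POSIX lock-file holder (names are illustrative,
// not the real qpid::sys::LockFile interface).
class LockFile {
  public:
    explicit LockFile(const std::string& path) : fd_(-1) {
        // lockf needs a descriptor open for writing, which a directory
        // cannot provide, so we lock a regular file inside the directory.
        fd_ = ::open(path.c_str(), O_CREAT | O_RDWR, 0644);
        if (fd_ < 0)
            throw std::runtime_error("Cannot open " + path + ": " +
                                     std::strerror(errno));
        // F_TLOCK returns an error immediately instead of blocking when
        // another process already holds the lock.
        if (::lockf(fd_, F_TLOCK, 0) < 0) {
            int err = errno;
            ::close(fd_);
            throw std::runtime_error("Cannot lock " + path + ": " +
                                     std::strerror(err));
        }
    }

    ~LockFile() {
        // Explicit unlock for clarity; the kernel also drops the lock when
        // the process exits, so a crashed broker leaves no stale lock.
        ::lockf(fd_, F_ULOCK, 0);
        ::close(fd_);
    }

  private:
    LockFile(const LockFile&);            // non-copyable
    LockFile& operator=(const LockFile&);
    int fd_;
};
```

The file itself is deliberately left behind on exit, since only the lock on
it matters. One caveat with lockf/fcntl locks: they belong to the process, so
a second LockFile on the same path within one process will not be refused,
and closing any other descriptor for that file in the same process also
releases the lock.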
