Re: Compiling for FreeBSD, trouble in buffer.c

2015-12-11 Thread Willem Jan Withagen

On 10-12-2015 16:03, Willem Jan Withagen wrote:

I have a failure in:
  ./unittest_erasure_code_shec_arguments
All tests before this PASS (other than rbd, which is disabled for
the time being).

Which I traced back to code in ErasureCodeShec.cc,
line 218:
 unsigned blocksize = (*chunks.begin()).second.length();
After a few iterations I get a "negative" blocksize, which causes
allocations further on to really thrash the system out of swap.

At first I suspected it could be due to a Clang typecasting problem.
But after more debugging I found the following in
buffer.h:
  unsigned length() const {
#if 0
    // DEBUG: verify _len
    unsigned len = 0;
    for (std::list<ptr>::const_iterator it = _buffers.begin();
         it != _buffers.end();
         it++) {
      len += (*it).length();
    }
    assert(len == _len);
#endif
    return _len;
  }

Which suggests that debugging was needed at this point earlier in life.
If I enable this debug block, the assert does fire.
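
The check in that debug block can be tried in isolation. Below is a minimal sketch, not the Ceph code: MiniBufferList and all of its members are hypothetical stand-ins, with std::string playing the role of one buffer segment. It caches a total length the way _len does and compares it against a full walk of the list, so a stray write to the cached value shows up as a mismatch.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a buffer list: segments plus a cached total.
class MiniBufferList {
  std::list<std::string> _buffers;  // each string acts as one buffer segment
  unsigned _len = 0;                // cached sum of all segment lengths

public:
  void append(const std::string& s) {
    _buffers.push_back(s);
    _len += static_cast<unsigned>(s.size());
  }

  // Recompute the length by walking every segment (slow but trustworthy).
  unsigned computed_length() const {
    unsigned len = 0;
    for (std::list<std::string>::const_iterator it = _buffers.begin();
         it != _buffers.end(); ++it) {
      len += static_cast<unsigned>(it->size());
    }
    return len;
  }

  // True while the cached value still matches the real contents.
  bool length_is_consistent() const { return computed_length() == _len; }

  // Fast path: returns the cached value without checking it.
  unsigned length() const { return _len; }

  // Checked path: aborts (in builds without NDEBUG) if the cache is stale.
  unsigned checked_length() const {
    assert(length_is_consistent());
    return _len;
  }

  // Test hook only: simulates a stray write landing on _len.
  void corrupt_cached_length_for_test(unsigned bogus) { _len = bogus; }
};
```

Calling length_is_consistent() at a few points along the failing path narrows down where the cached value stops matching, without aborting the way the assert does.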

Now the next question is: why? Given the debug snippet, this evidently
needed analyzing before.
And the derived question then is:
 What is the easiest path to find out what is actually wrong here?



A further follow-up on this.

After some extensive debugging with gdb and watchpoints, I've come to the
conclusion that the location of _len is used by more than one part of the
code. The location gets alternately written during:
TestErasureCodeShec_arguments.cc:136
shec_table.insert(std::make_pair(table_key,table_value));

Old value = 63015016
New value = 4294954344

Old value = 4294954344
New value = 63015016

It ends up retaining the value 4294954344, which is definitely not the
length: printing the value on the Linux variant gives 32, which sounds much
more sensible.
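
The watched value and the "negative" blocksize are the same bit pattern seen two ways: 4294954344 is 2^32 - 12952, so read as a signed 32-bit integer it is -12952. A small sketch of that arithmetic follows. The helper names as_signed32 and as_hex32 are made up for illustration. The hex form is included only because it can be compared with addresses gdb prints, in case the stray value turns out to be part of a pointer; that is a guess, not something established above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Value a 32-bit two's-complement signed integer would hold for the same bit
// pattern. Computed in 64-bit arithmetic to avoid implementation-defined casts.
std::int64_t as_signed32(std::uint32_t v) {
  const std::int64_t wide = static_cast<std::int64_t>(v);
  return (v & 0x80000000u) ? wide - 4294967296LL : wide;
}

// Hexadecimal rendering with a fixed width of eight digits.
std::string as_hex32(std::uint32_t v) {
  char buf[11];  // "0x" + 8 hex digits + terminating NUL
  std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(v));
  return std::string(buf);
}
```

With the two values from the watchpoint output, as_signed32(4294954344u) yields -12952, and as_hex32 yields 0xFFFFCD68 and 0x03C18868 respectively.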

So there are a few possibilities that I can think of:
 1) Clang gets it wrong
 2) There is a mixup of different types of libs that makes for different
    offsets in the bufferlist structs
 3) the bufferlist code has portability issues
 4) the bufferlist code has errors that do not show with gcc

Most likely it will be either 2) or 3).
But other suggestions are welcome...
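
One way to probe possibility 2) is to print the size and member offsets of the struct from each translation unit or library involved, then compare the lines. The following is only a sketch under assumptions: ExampleList is a hypothetical plain struct, and layout_line and LAYOUT_LINE are invented helper names. For a class that is not standard-layout, offsetof is only conditionally supported, so for the real bufferlist comparing sizeof alone may be the safer first step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in; a real check would name the actual type instead.
struct ExampleList {
  void*    head;
  void*    tail;
  unsigned len;
};

// Builds one comparable line: "<type> size=<bytes> <member>@<offset>".
std::string layout_line(const char* type, std::size_t size,
                        const char* member, std::size_t offset) {
  char buf[160];
  std::snprintf(buf, sizeof buf, "%s size=%zu %s@%zu",
                type, size, member, offset);
  return std::string(buf);
}

// Convenience wrapper so type and member names are written only once.
#define LAYOUT_LINE(type, member) \
  layout_line(#type, sizeof(type), #member, offsetof(type, member))
```

If the line produced by LAYOUT_LINE(ExampleList, len) differs between the test binary and the library, the two were built against different struct layouts.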

And since bufferlists are at the center of Ceph, we had better get this right.
So I'm going to go over the test/bufferlist.cc code and see what is in there,
and/or extract a less convoluted example from
TestErasureCodeShec_arguments.cc and see if the problem is in there as well.

--WjW






--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Compiling for FreeBSD, trouble in buffer.c

2015-12-10 Thread Willem Jan Withagen

I have a failure in:
 ./unittest_erasure_code_shec_arguments
All tests before this PASS (other than rbd, which is disabled for
the time being).

Which I traced back to code in ErasureCodeShec.cc,
line 218:
unsigned blocksize = (*chunks.begin()).second.length();
After a few iterations I get a "negative" blocksize, which causes
allocations further on to really thrash the system out of swap.

At first I suspected it could be due to a Clang typecasting problem.
But after more debugging I found the following in
buffer.h:
  unsigned length() const {
#if 0
    // DEBUG: verify _len
    unsigned len = 0;
    for (std::list<ptr>::const_iterator it = _buffers.begin();
         it != _buffers.end();
         it++) {
      len += (*it).length();
    }
    assert(len == _len);
#endif
    return _len;
  }

Which suggests that debugging was needed at this point earlier in life.
If I enable this debug block, the assert does fire.

Now the next question is: why? Given the debug snippet, this evidently
needed analyzing before.

And the derived question then is:
What is the easiest path to find out what is actually wrong here?

All suggestions welcome.

--WjW