Re: Let's Not Destroy the World in 2038

2015-12-23 Thread Adam C. Emerson
On 22/12/2015, Gregory Farnum wrote:
[snip]
> So I think we're stuck with creating a new utime_t and incrementing
> the struct_v on everything that contains them. :/
[snip]
> We'll also then need the full feature bit system to make
> sure we send the old encoding to clients which don't understand the
> new one, and to prevent a mid-upgrade cluster from writing data on a
> new node that then gets moved to an old node which doesn't understand it.

That is my understanding. I have the impression that network connections
know the feature bits of the other nodes, while on-disk structures are
explicitly versioned. If I'm mistaken, please hurl corrections at me.

> Given that utime_t occurs in a lot of places, and really can't change
> *again* after this, we probably shouldn't set up the new version with
> versioned encoding?

You're overly pessimistic. I'm hoping our post-human descendants store
their unfathomably alien, reconstructed minds in some galaxy-spanning
descendant of Ceph and need more than a 64-bit second count.

However, I agree that the time value itself should not have an encoded
version tag.

To my intuition, the best way forward would be to:

(1) Add non-defaulted feature parameters to encode/decode for utime_t and
ceph::real_time. This will break every caller, so the compiler will find
each one for us.

(2) Add explicit encode_old/encode_new functions. That way, when we KNOW which
one we want at compile time, we don't have to pay for a runtime check.

(3) When we have feature bits, pass them in.

(4) When we have a version, bump it. For new versions, explicitly call
encode_new. When we know we want the old encoding, call encode_old.

(5) If there are classes that we encode that have neither feature bits nor
versioning available, see what uses them and act accordingly. Hopefully the
special cases will be few.
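Steps (1) through (3) might look roughly like the sketch below. FEATURE_TIME64 and its bit position are invented for illustration, and TimeValue is a hypothetical stand-in for utime_t/ceph::real_time. Having encode_old assert that the value fits in 32 bits, rather than silently truncating it, is a choice made in this sketch and was not proposed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bit meaning "peer understands 64-bit second counts".
constexpr uint64_t FEATURE_TIME64 = 1ULL << 42;

// Hypothetical stand-in for utime_t / ceph::real_time.
struct TimeValue { int64_t sec; uint32_t nsec; };

// Append an unsigned integer in little-endian byte order.
template <typename T>
void put_le(std::vector<uint8_t>& out, T v) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Step (2): explicit encoders for call sites that know the format at compile time.
inline void encode_old(const TimeValue& t, std::vector<uint8_t>& out) {
    // The old layout only has room for a 32-bit second count.
    assert(t.sec >= 0 && t.sec <= static_cast<int64_t>(UINT32_MAX));
    put_le<uint32_t>(out, static_cast<uint32_t>(t.sec));
    put_le<uint32_t>(out, t.nsec);
}

inline void encode_new(const TimeValue& t, std::vector<uint8_t>& out) {
    put_le<uint64_t>(out, static_cast<uint64_t>(t.sec));
    put_le<uint32_t>(out, t.nsec);
}

// Step (1): the generic encoder takes a mandatory (non-defaulted) feature mask,
// so existing call sites stop compiling until someone decides what to pass.
inline void encode(const TimeValue& t, std::vector<uint8_t>& out, uint64_t features) {
    if (features & FEATURE_TIME64)  // step (3): runtime check on the peer's features
        encode_new(t, out);
    else
        encode_old(t, out);
}
```

Call sites that know their format statically can call encode_old or encode_new directly and skip the feature check, which is the point of step (2).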

Does that seem reasonable?

I thank you.

And all hypothetical post-human Ceph users thank you.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9




Let's Not Destroy the World in 2038

2015-12-22 Thread Adam C. Emerson
Comrades,

Ceph's victory is assured. It will be the storage system of The Future.
Matt Benjamin has reminded me that if we don't act fast¹ Ceph will be
responsible for destroying the world.

utime_t uses a 32-bit second count internally. This isn't great, but it's
something we can fix. ceph::real_time currently uses a 64-bit count of
nanoseconds, which is better, and we can change it to something else
without having to rewrite much other code.
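To put the limits in concrete terms, the arithmetic below computes the last second each representation can hold. This is a sketch that uses Howard Hinnant's public-domain days_from_civil algorithm, not anything from Ceph, and the helper names are illustrative. Whether a 32-bit field runs out in 2038 or in 2106 depends on whether it is read as signed or unsigned, which this thread doesn't specify. The 64-bit nanosecond figure assumes a signed count.

```cpp
#include <cstdint>
#include <limits>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat January and February as the end of the previous year
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Seconds since the Unix epoch for a UTC date and time of day.
constexpr int64_t unix_seconds(int64_t y, unsigned mo, unsigned d,
                               unsigned h, unsigned mi, unsigned s) {
    return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
}

// Last representable second for each representation.
static_assert(unix_seconds(2038, 1, 19, 3, 14, 7) ==
              std::numeric_limits<int32_t>::max(), "signed 32-bit seconds");
static_assert(unix_seconds(2106, 2, 7, 6, 28, 15) ==
              std::numeric_limits<uint32_t>::max(), "unsigned 32-bit seconds");
static_assert(unix_seconds(2262, 4, 11, 23, 47, 16) ==
              std::numeric_limits<int64_t>::max() / 1000000000, "signed 64-bit nanoseconds");
```

All three checks are compile-time static_asserts, so the snippet only compiles if the dates are right.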

The problem lies in our encode/decode functions for time: for both
utime_t and ceph::real_time (the latter because I didn't want to break
compatibility), we use a 32-bit second count. I would like to change the
wire and disk representation to a 64-bit second count and a 32-bit
nanosecond count.
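The proposed change in representation can be sketched as follows. TimeValue, encode_old, encode_new and the byte-order helpers are hypothetical names. The sketch assumes little-endian byte order, as Ceph's encoders use, and assumes the old 32-bit second count is read back as unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory time value: 64-bit seconds plus nanoseconds.
struct TimeValue {
    int64_t sec;
    uint32_t nsec;
};

// Append an unsigned integer in little-endian byte order.
template <typename T>
void put_le(std::vector<uint8_t>& out, T v) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read an unsigned little-endian integer at pos and advance pos past it.
template <typename T>
T get_le(const std::vector<uint8_t>& in, std::size_t& pos) {
    assert(pos + sizeof(T) <= in.size());
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v |= static_cast<T>(static_cast<T>(in[pos + i]) << (8 * i));
    pos += sizeof(T);
    return v;
}

// Current layout: 32-bit seconds + 32-bit nanoseconds (8 bytes); seconds are truncated.
void encode_old(const TimeValue& t, std::vector<uint8_t>& out) {
    put_le<uint32_t>(out, static_cast<uint32_t>(t.sec));
    put_le<uint32_t>(out, t.nsec);
}

// Proposed layout: 64-bit seconds + 32-bit nanoseconds (12 bytes).
void encode_new(const TimeValue& t, std::vector<uint8_t>& out) {
    put_le<uint64_t>(out, static_cast<uint64_t>(t.sec));
    put_le<uint32_t>(out, t.nsec);
}

TimeValue decode_old(const std::vector<uint8_t>& in, std::size_t& pos) {
    TimeValue t;
    t.sec = get_le<uint32_t>(in, pos);
    t.nsec = get_le<uint32_t>(in, pos);
    return t;
}

TimeValue decode_new(const std::vector<uint8_t>& in, std::size_t& pos) {
    TimeValue t;
    t.sec = static_cast<int64_t>(get_le<uint64_t>(in, pos));
    t.nsec = get_le<uint32_t>(in, pos);
    return t;
}
```

A value past 2^32 seconds survives encode_new/decode_new intact. encode_old silently keeps only the low 32 bits, which is the loss the wider field avoids.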

Would there be resistance to a project to do this? I don't know if a
FEATURE bit would help. A FEATURE bit to toggle the width of the second
count would be ideal if it would work. Otherwise it looks like the best
way to do this would be to find all the structures currently ::encoded
that hold time values, bump the version number and have an 'old_utime'
that we use for everything pre-change.
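Here is a sketch of the version-bump route for a single on-disk structure. FileStamp is invented for illustration. The version is stored on the containing struct rather than on the time value, and decoding a pre-change struct_v falls back to the old 32-bit second count (the 'old_utime' path). Ceph's own ENCODE_START/DECODE_START macros also record a compat version and a length. This sketch keeps only a one-byte version for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical on-disk structure that embeds a time value.
struct FileStamp {
    uint64_t ino = 0;
    int64_t mtime_sec = 0;
    uint32_t mtime_nsec = 0;
};

template <typename T>
void put_le(std::vector<uint8_t>& out, T v) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

template <typename T>
T get_le(const std::vector<uint8_t>& in, std::size_t& pos) {
    assert(pos + sizeof(T) <= in.size());
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v |= static_cast<T>(static_cast<T>(in[pos + i]) << (8 * i));
    pos += sizeof(T);
    return v;
}

// struct_v 1 held a 32-bit second count; struct_v 2 is the bumped version.
constexpr uint8_t FILESTAMP_V = 2;

void encode(const FileStamp& s, std::vector<uint8_t>& out) {
    out.push_back(FILESTAMP_V);  // the struct carries the version, not the time value
    put_le<uint64_t>(out, s.ino);
    put_le<uint64_t>(out, static_cast<uint64_t>(s.mtime_sec));  // new 64-bit seconds
    put_le<uint32_t>(out, s.mtime_nsec);
}

FileStamp decode(const std::vector<uint8_t>& in, std::size_t& pos) {
    assert(pos < in.size());
    const uint8_t struct_v = in[pos++];
    FileStamp s;
    s.ino = get_le<uint64_t>(in, pos);
    if (struct_v < 2)
        s.mtime_sec = get_le<uint32_t>(in, pos);  // pre-change data: old 32-bit seconds
    else
        s.mtime_sec = static_cast<int64_t>(get_le<uint64_t>(in, pos));
    s.mtime_nsec = get_le<uint32_t>(in, pos);
    return s;
}
```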

Thank you!

¹ Within the next twenty-three years. But that's not really a long time in the
  larger scheme of things.



Re: Let's Not Destroy the World in 2038

2015-12-22 Thread Gregory Farnum
On Tue, Dec 22, 2015 at 12:10 PM, Adam C. Emerson wrote:
[snip]

Unfortunately, we include utimes in structures that are written to
disk. So I think we're stuck with creating a new utime_t and
incrementing the struct_v on everything that contains them. :/

Of course, we'll also then need the full feature bit system to make
sure we send the old encoding to clients which don't understand the
new one, and to prevent a mid-upgrade cluster from writing data on a
new node that then gets moved to an old node which doesn't understand it.
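The mid-upgrade concern can be sketched as a check over the feature bits advertised by every daemon that might later read the data. The new encoding is only safe to persist once all of them understand it. The helper names and FEATURE_TIME64 are hypothetical, and this does not describe how Ceph actually tracks cluster-wide features.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit advertised by daemons that can decode 64-bit seconds.
constexpr uint64_t FEATURE_TIME64 = 1ULL << 42;

// Features supported by every peer in the set: bitwise AND of all advertised masks.
uint64_t common_features(const std::vector<uint64_t>& peer_features) {
    uint64_t common = ~0ULL;
    for (uint64_t f : peer_features)
        common &= f;
    return common;
}

// Write the new time format only if every daemon that might later read the data
// (for example after it is moved) advertises support; otherwise keep the old one.
bool may_write_new_time_format(const std::vector<uint64_t>& peer_features) {
    assert(!peer_features.empty());
    return (common_features(peer_features) & FEATURE_TIME64) != 0;
}
```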

Given that utime_t occurs in a lot of places, and really can't change
*again* after this, we probably shouldn't set up the new version with
versioned encoding?
-Greg
