Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-17 Thread H. Peter Anvin
On 12/17/2012 10:56 AM, Pavel Emelyanov wrote:
> On 12/17/2012 07:21 PM, H. Peter Anvin wrote:
>> Because it is almost impossible to do right?
> 
> In the generic case -- I tend to agree. But it's possible to describe
> how a library should communicate with crtools to make it possible.
> 
> Anyway, what I wanted to say -- we didn't have this scenario in our
> plans, but the criu project is open, and if someone comes up with a sane
> idea, we will not object to merging it.
> 

I doubt it is possible using existing compiler toolchains.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-17 Thread Pavel Emelyanov
On 12/17/2012 07:21 PM, H. Peter Anvin wrote:
> Because it is almost impossible to do right?

In the generic case -- I tend to agree. But it's possible to describe
how a library should communicate with crtools to make it possible.

Anyway, what I wanted to say -- we didn't have this scenario in our
plans, but the criu project is open, and if someone comes up with a sane
idea, we will not object to merging it.

> Pavel Emelyanov  wrote:
> 
>> On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
>>> On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin  wrote:
>>>> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>>>>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>>>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>>>>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>>>>> really need more control I'd almost push for a device/filesystem node
>>>>>> that could be mmapped the usual way.
>>>>>>
>>>>>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>>>>>> criu is stable enough yet that we should care.  Criu people?
>>>>>
>>>>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>>>>
>>>>>> (In brief summary: how annoying would it be if the vdso was no longer
>>>>>> just a bunch of constant bytes that lived somewhere?)
>>>>>
>>>>> It depends on what the vdso is going to be. In the perfect case it should
>>>>> a) be mremap-able to any address (or be at a fixed address _forever_, but
>>>>>    I assume this is not feasible);
>>>>> b) have entry points at fixed (or somehow movable) places.
>>>>>
>>>>> I admit that I may have misunderstood your question; if I did,
>>>>> please correct me.
>>>>>
>>>>
>>>> mremap() should work.  At the same time, the code itself is not going to
>>>> have any stability guarantees between kernel versions -- it obviously
>>>> cannot.
>>>
>>> We could guarantee that the symbols in the vdso resolve to particular
>>> offsets within the vdso.  (Yes, this is ugly.)
>>>
>>> Does criu support checkpointing with one version of a shared library
>>> and restoring with another?
>>
>> No, and we don't have this in our plans either.
>> However, if somebody needs this and implements it -- why not?!
>>
>> Thanks,
>> Pavel
> 


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-17 Thread H. Peter Anvin
Because it is almost impossible to do right?

Pavel Emelyanov  wrote:

> On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
>> On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin  wrote:
>>> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>>>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>>>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>>>> really need more control I'd almost push for a device/filesystem node
>>>>> that could be mmapped the usual way.
>>>>>
>>>>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>>>>> criu is stable enough yet that we should care.  Criu people?
>>>>
>>>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>>>
>>>>> (In brief summary: how annoying would it be if the vdso was no longer
>>>>> just a bunch of constant bytes that lived somewhere?)
>>>>
>>>> It depends on what the vdso is going to be. In the perfect case it should
>>>> a) be mremap-able to any address (or be at a fixed address _forever_, but
>>>>    I assume this is not feasible);
>>>> b) have entry points at fixed (or somehow movable) places.
>>>>
>>>> I admit that I may have misunderstood your question; if I did,
>>>> please correct me.
>>>>
>>>
>>> mremap() should work.  At the same time, the code itself is not going to
>>> have any stability guarantees between kernel versions -- it obviously
>>> cannot.
>>
>> We could guarantee that the symbols in the vdso resolve to particular
>> offsets within the vdso.  (Yes, this is ugly.)
>>
>> Does criu support checkpointing with one version of a shared library
>> and restoring with another?
>
> No, and we don't have this in our plans either.
> However, if somebody needs this and implements it -- why not?!
>
> Thanks,
> Pavel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-17 Thread Pavel Emelyanov
On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
> On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin  wrote:
>> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>>> really need more control I'd almost push for a device/filesystem node
>>>> that could be mmapped the usual way.
>>>>
>>>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>>>> criu is stable enough yet that we should care.  Criu people?
>>>
>>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>>
>>>> (In brief summary: how annoying would it be if the vdso was no longer
>>>> just a bunch of constant bytes that lived somewhere?)
>>>
>>> It depends on what the vdso is going to be. In the perfect case it should
>>> a) be mremap-able to any address (or be at a fixed address _forever_, but
>>>    I assume this is not feasible);
>>> b) have entry points at fixed (or somehow movable) places.
>>>
>>> I admit that I may have misunderstood your question; if I did,
>>> please correct me.
>>>
>>>
>>
>> mremap() should work.  At the same time, the code itself is not going to
>> have any stability guarantees between kernel versions -- it obviously
>> cannot.
> 
> We could guarantee that the symbols in the vdso resolve to particular
> offsets within the vdso.  (Yes, this is ugly.)
> 
> Does criu support checkpointing with one version of a shared library
> and restoring with another?

No, and we don't have this in our plans either.
However, if somebody needs this and implements it -- why not?!

Thanks,
Pavel
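
Andy's idea of guaranteeing that vdso symbols resolve to particular offsets presumes that a tool can look those offsets up at run time. The following is a minimal userspace sketch of such a lookup, loosely modeled on the kernel's parse_vdso.c example. The function name vdso_sym_offset() is hypothetical, it assumes the vdso exports a DT_HASH table (used only to learn the symbol count), and symbol names such as __vdso_clock_gettime are x86-specific.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper: return the offset of function symbol `name` from the
 * start of this process's vdso image, or -1 if it cannot be found.
 * Assumes the vdso has a DT_HASH table, whose nchain field gives the number
 * of dynamic symbols.
 */
static long vdso_sym_offset(const char *name)
{
    assert(name != NULL);

    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;                          /* no vdso mapped */

    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *ph = (const ElfW(Phdr) *)(base + eh->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t load_offset = 0;
    int found_load = 0;

    for (int i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && !found_load) {
            /* Converts link-time virtual addresses to runtime addresses. */
            load_offset = base + ph[i].p_offset - ph[i].p_vaddr;
            found_load = 1;
        } else if (ph[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *)(base + ph[i].p_offset);
        }
    }
    if (!found_load || dyn == NULL)
        return -1;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_SYMTAB)
            symtab = (const ElfW(Sym) *)(load_offset + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_STRTAB)
            strtab = (const char *)(load_offset + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_HASH)
            hash = (const ElfW(Word) *)(load_offset + dyn->d_un.d_ptr);
    }
    if (symtab == NULL || strtab == NULL || hash == NULL)
        return -1;

    ElfW(Word) nsyms = hash[1];             /* nchain == number of symbols */
    for (ElfW(Word) i = 0; i < nsyms; i++) {
        const ElfW(Sym) *sym = &symtab[i];
        /* ST_TYPE is (st_info & 0xf) for both ELF32 and ELF64. */
        if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC || sym->st_shndx == SHN_UNDEF)
            continue;
        if (strcmp(strtab + sym->st_name, name) == 0)
            return (long)(load_offset + sym->st_value - base);
    }
    return -1;
}
```

Comparing such offsets at checkpoint time and again after restore is one conceivable way for a C/R tool to notice that the vdso layout changed; as the thread notes, nothing currently guarantees that those offsets stay the same across kernels or toolchains.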


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 03:48 PM, John Stultz wrote:
> On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
>> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>>
>>>
>>> This won't help in the scenario you pointed out in your
>>> previous email (where c/r happens in the middle of the vdso),
>>> will it? Because we still need some way to be sure we're not
>>> checkpointing in the middle of a signal handler which will return
>>> to some vdso location.
>> It is okay if and only if those vdso places never change... which I
>> think is doable if they only contain trivial system call wrappers, i.e.
>> something like:
>>
>> movl $__SYS_gettimeofday, %eax
>> syscall
>> ret
> 
> Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
> I know Andi always wanted to avoid having syscall instructions at a
> fixed location for the old vsyscall code (though I know we had it
> nonetheless for a while).   But maybe I'm confusing issues here?
> 

They aren't in fixed addresses across processes... the vdso location can
still be randomized.  It just has to be the same across the
checkpoint/restart operation, just like all the other instructions.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread John Stultz
On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>
>> This won't help in the scenario you pointed out in your
>> previous email (where c/r happens in the middle of the vdso),
>> will it? Because we still need some way to be sure we're not
>> checkpointing in the middle of a signal handler which will return
>> to some vdso location.
> It is okay if and only if those vdso places never change... which I
> think is doable if they only contain trivial system call wrappers, i.e.
> something like:
>
> movl $__SYS_gettimeofday, %eax
> syscall
> ret

Though doesn't this make it easier for exploits (somewhat undoing ASLR)? 
I know Andi always wanted to avoid having syscall instructions at a 
fixed location for the old vsyscall code (though I know we had it 
nonetheless for a while).   But maybe I'm confusing issues here?

thanks
-john


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 03:09 PM, Stefani Seibold wrote:
> 
> Sorry for not following the discussion, but I am currently trying to
> compile vclock_gettime.c as a 32-bit object. Most of the (clever) work
> is done.
> 
> After this the next step is to map the needed fixmaps into the 32-bit
> address space. Maybe this can be done with install_special_mapping().
> 

install_special_mapping() is indeed how it is done.  The suggestion is
to make the vvar page an actual section inside the vdso, and then just
substitute the vvar page into the mapping array when installing the vdso
into the process user space.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Stefani Seibold
On Friday, 14.12.2012, 14:46 -0800, H. Peter Anvin wrote:
> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> > On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
> >> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
> >>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
> >> really need more control I'd almost push for a device/filesystem node
> >> that could be mmapped the usual way.
> >>
> >> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
> >> criu is stable enough yet that we should care.  Criu people?
> > 
> > It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
> > 
> >> (In brief summary: how annoying would it be if the vdso was no longer
> >> just a bunch of constant bytes that lived somewhere?)
> > 
> > It depends on what the vdso is going to be. In the perfect case it should
> > a) be mremap-able to any address (or be at a fixed address _forever_, but
> >    I assume this is not feasible);
> > b) have entry points at fixed (or somehow movable) places.
> > 
> > I admit that I may have misunderstood your question; if I did,
> > please correct me.
> > 
> 
> Either way... criu on the side, we should proceed with this vdso
> redesign and get support for the 32-bit entry points including compat
> mode on x86-64.
> 
>   -hpa
> 
> 

Sorry for not following the discussion, but I am currently trying to
compile vclock_gettime.c as a 32-bit object. Most of the (clever) work
is done.

After this the next step is to map the needed fixmaps into the 32-bit
address space. Maybe this can be done with install_special_mapping().

I think I will do this job in the next few days.

- Stefani


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
>>>
>>> this would allow us to defer the checkpoint until the task finishes the
>>> vdso code. Peter, if I understand you correctly, you propose that we
>>> provide our own proxy-vdso which would redirect calls to the real ones,
>>> right? But the main problem is that the whole idea is to be able to c/r
>>> existing programs without recompiling and such (or am I missing
>>> something here?).
>>
>> No, I'm proposing that you use a proxy-vdso which does nothing but
>> system calls, and therefore can be stable indefinitely.
> 
> This won't help in the scenario you pointed out in your
> previous email (where c/r happens in the middle of the vdso),
> will it? Because we still need some way to be sure we're not
> checkpointing in the middle of a signal handler which will return
> to some vdso location.

It is okay if and only if those vdso places never change... which I
think is doable if they only contain trivial system call wrappers, i.e.
something like:

movl $__SYS_gettimeofday, %eax
syscall
ret

These kinds of wrappers don't rely on live data provided by the kernel,
and so can be checkpointed together with the rest of the process.

-hpa
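
The proxy-vdso hpa describes can be approximated in userspace. The sketch below, with the hypothetical name pseudo_vdso_gettimeofday(), simply enters the kernel through syscall(2) with SYS_gettimeofday instead of reading the kernel-maintained time data the real vdso uses, so its code contains nothing tied to the running kernel's vdso layout. A real pseudo-vdso would presumably be a small hand-written stub along the lines of the three instructions above rather than a call into libc's syscall() wrapper; this only illustrates the idea.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical pseudo-vdso entry point. It never touches kernel-provided
 * data pages; it just makes the real system call, so it can be checkpointed
 * and restored together with the rest of the process on a different kernel.
 */
static int pseudo_vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
    /* syscall(2) is used for portability; a real stub would load the
     * syscall number, trap into the kernel and return, nothing more. */
    return (int)syscall(SYS_gettimeofday, tv, tz);
}
```

Called as pseudo_vdso_gettimeofday(&tv, NULL), it should behave like gettimeofday(2), only slower, since every call is a genuine system call rather than a read of shared time data.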


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>> really need more control I'd almost push for a device/filesystem node
>> that could be mmapped the usual way.
>>
>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>> criu is stable enough yet that we should care.  Criu people?
> 
> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
> 
>> (In brief summary: how annoying would it be if the vdso was no longer
>> just a bunch of constant bytes that lived somewhere?)
> 
> It depends on what the vdso is going to be. In the perfect case it should
> a) be mremap-able to any address (or be at a fixed address _forever_, but
>    I assume this is not feasible);
> b) have entry points at fixed (or somehow movable) places.
> 
> I admit that I may have misunderstood your question; if I did,
> please correct me.
> 

Either way... criu on the side, we should proceed with this vdso
redesign and get support for the 32-bit entry points including compat
mode on x86-64.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
> > 
> > this would allow us to defer the checkpoint until the task finishes the
> > vdso code. Peter, if I understand you correctly, you propose that we
> > provide our own proxy-vdso which would redirect calls to the real ones,
> > right? But the main problem is that the whole idea is to be able to c/r
> > existing programs without recompiling and such (or am I missing
> > something here?).
> 
> No, I'm proposing that you use a proxy-vdso which does nothing but
> system calls, and therefore can be stable indefinitely.

This won't help in the scenario you pointed out in your
previous email (where c/r happens in the middle of the vdso),
will it? Because we still need some way to be sure we're not
checkpointing in the middle of a signal handler which will return
to some vdso location.


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
> 
> this would allow us to defer the checkpoint until the task finishes the
> vdso code. Peter, if I understand you correctly, you propose that we
> provide our own proxy-vdso which would redirect calls to the real ones,
> right? But the main problem is that the whole idea is to be able to c/r
> existing programs without recompiling and such (or am I missing
> something here?).
> 

No, I'm proposing that you use a proxy-vdso which does nothing but
system calls, and therefore can be stable indefinitely.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 02:00:17PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
> > 
> > I don't know all that much about the linux vm.  Can we create a
> > special vdso address_space or struct inode or something so that a
> > single vma can contain pages with different flags?
> > 
> 
> No, that is still different vmas, but it probably isn't a big deal.
> 
> The advantage of having an inode/namespace is that it lets you use
> mmap() as opposed to mremap() with it, which might be useful, I don't know.
> 
> One option for the checkpoint people might actually be to not use the
> vdso for a process that needs to be checkpointed and restarted on a
> different machine or different kernel version.  Instead they can install
> a pseudo-vdso which just calls normal system calls, and is simply a
> static piece of code that makes normal system calls ... since the
> internals of the kernel are hidden from userspace it is "clean" that way.
> 
> With any actual vdso you risk something like:
> 

Is there a chance to make it work something like this (assuming the
dumpee is ptraced):

>   -> vdso entry

mark the task as vdso-entered

>   -> signal received, transfer to signal handler
>   -> signal handler exit

before the task leaves the vdso, the vdso-entered mark gets cleared
and, if the task is ptraced, the ptracing task is notified

> ... and now you return to the address in the old vdso, but the internals
> of the vdso may have changed.

this would allow us to defer the checkpoint until the task finishes the
vdso code. Peter, if I understand you correctly, you propose that we
provide our own proxy-vdso which would redirect calls to the real ones,
right? But the main problem is that the whole idea is to be able to c/r
existing programs without recompiling and such (or am I missing
something here?).

Cyrill
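
Cyrill's idea of deferring the dump while the task is inside the vdso boils down to an address-range test on the task's instruction pointer and, following hpa's signal-handler scenario, on any return addresses left on its stack. A minimal sketch follows; struct addr_range and must_defer_checkpoint() are hypothetical names, and collecting the IP (for example via ptrace) and the stacked return addresses (via unwinding) is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end), e.g. the [vdso] vma. */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

static bool addr_in_range(const struct addr_range *r, uintptr_t addr)
{
    return addr >= r->start && addr < r->end;
}

/*
 * Hypothetical checkpoint-time check: returns true if dumping should be
 * deferred because the task is executing in the vdso, or because some
 * frame (such as an interrupted call below a signal handler) will later
 * return into it.
 */
static bool must_defer_checkpoint(const struct addr_range *vdso, uintptr_t ip,
                                  const uintptr_t *ret_addrs, size_t n)
{
    assert(vdso != NULL && vdso->start <= vdso->end);
    assert(n == 0 || ret_addrs != NULL);

    if (addr_in_range(vdso, ip))
        return true;
    for (size_t i = 0; i < n; i++)
        if (addr_in_range(vdso, ret_addrs[i]))
            return true;
    return false;
}
```

As hpa notes elsewhere in this thread, a task blocked in a system call inside the vdso may never leave it, so any real deferral loop built on a check like this would need a timeout or some other fallback.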


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
> 
> I don't know all that much about the linux vm.  Can we create a
> special vdso address_space or struct inode or something so that a
> single vma can contain pages with different flags?
> 

No, that is still different vmas, but it probably isn't a big deal.

The advantage of having an inode/namespace is that it lets you use
mmap() as opposed to mremap() with it, which might be useful, I don't know.

One option for the checkpoint people might actually be to not use the
vdso for a process that needs to be checkpointed and restarted on a
different machine or different kernel version.  Instead they can install
a pseudo-vdso which just calls normal system calls, and is simply a
static piece of code that makes normal system calls ... since the
internals of the kernel are hidden from userspace it is "clean" that way.

With any actual vdso you risk something like:

-> vdso entry
-> signal received, transfer to signal handler
-> checkpoint
-> restart
-> signal handler exit

... and now you return to the address in the old vdso, but the internals
of the vdso may have changed.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Andy Lutomirski
On Fri, Dec 14, 2012 at 1:08 PM, H. Peter Anvin  wrote:
> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:

>>> The real issue is what happens if the process is checkpointed while
>>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>>> This is not impossible or even unlikely; especially on 32 bits it is
>>> downright likely.
>>
>> I fear that if there are stacked IPs pointing into the vdso, we simply
>> won't be able to restore properly if the vdso's internal format changed
>> significantly between kernel versions. (At the moment we restore the vdso
>> at exactly the same position and with the same content as at checkpoint
>> time, iirc.)
>>
>
> I don't think there is a way around that.  It is completely unreasonable
> to say that the vdso cannot change between kernel versions, for obvious
> reasons.  It's worse than "significantly"... changing even one
> instruction makes it plausible your eip/rip will point into the middle
> of an instruction.

It's not just kernel versions -- different toolchains may generate
different code.  Heck, building from a different directory can
sometimes generate different output.

The ABI of each vdso function is stable, though -- a sufficiently
clever tool could (maybe) use that knowledge along with unwind data in
the vdso to fix everything up.  This would be interesting, perhaps,
but certainly not easy.

I say we declare "if you want a working vdso in a weird location,
mremap it".  But how does userspace figure out what size to pass to
mremap?  If it's one vma, it's easy.

I don't know all that much about the linux vm.  Can we create a
special vdso address_space or struct inode or something so that a
single vma can contain pages with different flags?

--Andy
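
Andy's question of what size userspace should pass to mremap() can be answered, at least for a sketch, from /proc/self/maps, where the vdso appears as a special mapping tagged [vdso]. The helper find_special_mapping() below is hypothetical; it returns the [start, end) range of one tagged mapping, so end - start is the length of that single vma. Whether the kernel's data page shows up as a separate entry (for example [vvar]) may depend on the kernel version, which is exactly the multi-vma case Andy raises.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the [start, end) range of the mapping whose
 * pathname field equals `tag` (e.g. "[vdso]") in /proc/self/maps.
 * Returns 0 on success, -1 if no such mapping exists.
 */
static int find_special_mapping(const char *tag, uintptr_t *start, uintptr_t *end)
{
    assert(tag != NULL && start != NULL && end != NULL);

    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long lo, hi;
        char name[256] = "";
        /* Each line: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s", &lo, &hi, name) >= 2 &&
            strcmp(name, tag) == 0) {
            *start = (uintptr_t)lo;
            *end = (uintptr_t)hi;
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

This sketch does not handle maps lines longer than its buffer; that is tolerable here only because the special mappings it looks for have short names.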


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 01:20 PM, Cyrill Gorcunov wrote:
> On Fri, Dec 14, 2012 at 01:08:35PM -0800, H. Peter Anvin wrote:
>> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
>
>>>> The real issue is what happens if the process is checkpointed while
>>>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>>>> This is not impossible or even unlikely, especially on 32 bits it is
>>>> downright likely.
>>>
>>> I fear that if there are stacked IPs pointing into the vdso, we simply won't
>>> be able to restore properly if the vdso's internal format changed significantly
>>> between kernel versions. (At the moment we restore the vdso at exactly the same position
>>> it had at the checkpoint stage, with the same content, iirc).
>>>
>>
>> I don't think there is a way around that.  It is completely unreasonable
>> to say that the vdso cannot change between kernel versions, for obvious
>> reasons.  It's worse than "significantly"... changing even one
>> instruction makes it plausible your eip/rip will point into the middle
>> of an instruction.
> 
> Well, one idea was to avoid dumping while a dumpee is inside the vdso area,
> wait until it leaves this zone, and then proceed with dumping. Then, if the vdso
> has changed (say some new instructions were added), we zap the original prologues
> with jmps to the new symbols in the fresh vdso provided to us by the kernel. I'm not
> really sure this would help us much, but just saying (I must admit I
> haven't looked into the vdso implementation details yet, so sorry if it sounds
> stupid).
> 

Well, if the vdso contains a system call you may be waiting indefinitely.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 01:08:35PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
> >>>
> >> The real issue is what happens if the process is checkpointed while
> >> inside the vdso and now eip/rip or a stack frame points into the vdso.
> >> This is not impossible or even unlikely, especially on 32 bits it is
> >> downright likely.
> > 
> > I fear that if there are stacked IPs pointing into the vdso, we simply won't
> > be able to restore properly if the vdso's internal format changed significantly
> > between kernel versions. (At the moment we restore the vdso at exactly the same position
> > it had at the checkpoint stage, with the same content, iirc).
> > 
> 
> I don't think there is a way around that.  It is completely unreasonable
> to say that the vdso cannot change between kernel versions, for obvious
> reasons.  It's worse than "significantly"... changing even one
> instruction makes it plausible your eip/rip will point into the middle
> of an instruction.

Well, one idea was to avoid dumping while a dumpee is inside the vdso area,
wait until it leaves this zone, and then proceed with dumping. Then, if the vdso
has changed (say some new instructions were added), we zap the original prologues
with jmps to the new symbols in the fresh vdso provided to us by the kernel. I'm not
really sure this would help us much, but just saying (I must admit I
haven't looked into the vdso implementation details yet, so sorry if it sounds
stupid).

Cyrill
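Cyrill's prologue-zapping idea amounts to overwriting the start of each old vdso function with a jump to the same-named function in the new vdso. On x86 the shortest such patch is the 5-byte `jmp rel32` (opcode 0xE9), whose 32-bit displacement is counted from the end of the instruction. The hypothetical helper below sketches only the encoding step; it assumes every old prologue is at least 5 bytes long, that the old vdso copy is temporarily writable, and that no thread is executing inside the bytes being replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "jmp rel32" (0xE9 followed by a little-endian 32-bit displacement)
 * for an instruction placed at address `from` that should transfer control
 * to `to`. The displacement is relative to the end of the 5-byte
 * instruction, i.e. to - (from + 5).
 * Returns false if `to` is out of rel32 reach (+/-2 GiB). */
bool encode_jmp_rel32(uint8_t buf[5], uint64_t from, uint64_t to)
{
    assert(buf != NULL);

    int64_t disp = (int64_t)(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;

    uint32_t rel = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;                      /* jmp rel32 opcode */
    for (int i = 0; i < 4; i++)         /* x86 is little-endian */
        buf[1 + i] = (uint8_t)(rel >> (8 * i));
    return true;
}
```

When the new symbol is more than 2 GiB away, which can happen on x86-64, the helper returns false and a longer sequence (for example an indirect jump through a register) would be needed instead.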


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
>>>
>> The real issue is what happens if the process is checkpointed while
>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>> This is not impossible or even unlikely, especially on 32 bits it is
>> downright likely.
> 
> I fear that if there are stacked IPs pointing into the vdso, we simply won't
> be able to restore properly if the vdso's internal format changed significantly
> between kernel versions. (At the moment we restore the vdso at exactly the same position
> it had at the checkpoint stage, with the same content, iirc).
> 

I don't think there is a way around that.  It is completely unreasonable
to say that the vdso cannot change between kernel versions, for obvious
reasons.  It's worse than "significantly"... changing even one
instruction makes it plausible your eip/rip will point into the middle
of an instruction.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 10:47:53AM -0800, H. Peter Anvin wrote:
> On 12/14/2012 10:44 AM, Andy Lutomirski wrote:
> >>
> >> mremap() should work.  At the same time, the code itself is not going to
> >> have any stability guarantees between kernel versions -- it obviously
> >> cannot.
> > 
> > We could guarantee that the symbols in the vdso resolve to particular
> > offsets within the vdso.  (Yes, this is ugly.)
> > 
> > Does criu support checkpointing with one version of a shared library
> > and restoring with another?  If there are no textrels (or whatever the
> > relocation type that actually modifies text as opposed to just the plt
> > or got) then, in principle, it should be doable.  Otherwise some
> > kernel help will be needed to checkpoint reliably on one kernel and
> > restore somewhere else.
> > 
> > (This isn't a regression -- it's already broken.)
> > 
> The real issue is what happens if the process is checkpointed while
> inside the vdso and now eip/rip or a stack frame points into the vdso.
> This is not impossible or even unlikely, especially on 32 bits it is
> downright likely.

I fear that if there are stacked IPs pointing into the vdso, we simply won't
be able to restore properly if the vdso's internal format changed significantly
between kernel versions. (At the moment we restore the vdso at exactly the same position
it had at the checkpoint stage, with the same content, iirc).

Cyrill


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 10:44 AM, Andy Lutomirski wrote:
>>
>> mremap() should work.  At the same time, the code itself is not going to
>> have any stability guarantees between kernel versions -- it obviously
>> cannot.
> 
> We could guarantee that the symbols in the vdso resolve to particular
> offsets within the vdso.  (Yes, this is ugly.)
> 
> Does criu support checkpointing with one version of a shared library
> and restoring with another?  If there are no textrels (or whatever the
> relocation type that actually modifies text as opposed to just the plt
> or got) then, in principle, it should be doable.  Otherwise some
> kernel help will be needed to checkpoint reliably on one kernel and
> restore somewhere else.
> 
> (This isn't a regression -- it's already broken.)
> 

The real issue is what happens if the process is checkpointed while
inside the vdso and now eip/rip or a stack frame points into the vdso.
This is not impossible or even unlikely, especially on 32 bits it is
downright likely.

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Andy Lutomirski
On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin  wrote:
> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>> really need more control I'd almost push for a device/filesystem node
>>> that could be mmapped the usual way.
>>>
>>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>>> criu is stable enough yet that we should care.  Criu people?
>>
>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>
>>> (In brief summary: how annoying would it be if the vdso was no longer
>>> just a bunch of constant bytes that lived somewhere?)
>>
>> It depends on what vdso is going to be. In the perfect case it should
>> a) be mremap-able to any address (or be at fixed address _forever_, but
>>    I assume this is not feasible);
>> b) have entry points at fixed (or somehow movable) places.
>>
>> I admit that I may have misunderstood your question; if I did,
>> please correct me.
>>
>
> mremap() should work.  At the same time, the code itself is not going to
> have any stability guarantees between kernel versions -- it obviously
> cannot.

We could guarantee that the symbols in the vdso resolve to particular
offsets within the vdso.  (Yes, this is ugly.)

Does criu support checkpointing with one version of a shared library
and restoring with another?  If there are no textrels (or whatever the
relocation type that actually modifies text as opposed to just the plt
or got) then, in principle, it should be doable.  Otherwise some
kernel help will be needed to checkpoint reliably on one kernel and
restore somewhere else.

(This isn't a regression -- it's already broken.)

>
> Incidentally, the MAYWRITE bit which is there to allow breakpoints is
> obviously problematic for the vvar page.  We could mark the vvar page
> differently, meaning more vmas, or we could decide it just doesn't
> matter and that if you mprotect() the vvar page and write to it you get
> exactly what you asked for...

I have no strong preference here.

--Andy


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>> really need more control I'd almost push for a device/filesystem node
>> that could be mmapped the usual way.
>>
>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>> criu is stable enough yet that we should care.  Criu people?
> 
> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
> 
>> (In brief summary: how annoying would it be if the vdso was no longer
>> just a bunch of constant bytes that lived somewhere?)
> 
> It depends on what vdso is going to be. In the perfect case it should
> a) be mremap-able to any address (or be at fixed address _forever_, but
>    I assume this is not feasible);
> b) have entry points at fixed (or somehow movable) places.
> 
> I admit that I may have misunderstood your question; if I did,
> please correct me.
> 

mremap() should work.  At the same time, the code itself is not going to
have any stability guarantees between kernel versions -- it obviously
cannot.

Incidentally, the MAYWRITE bit which is there to allow breakpoints is
obviously problematic for the vvar page.  We could mark the vvar page
differently, meaning more vmas, or we could decide it just doesn't
matter and that if you mprotect() the vvar page and write to it you get
exactly what you asked for...

-hpa


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Pavel Emelyanov
On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
> really need more control I'd almost push for a device/filesystem node
> that could be mmapped the usual way.
> 
> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
> criu is stable enough yet that we should care.  Criu people?

It's not yet, but we'd still appreciate the criu-friendly vdso redesign.

> (In brief summary: how annoying would it be if the vdso was no longer
> just a bunch of constant bytes that lived somewhere?)

It depends on what vdso is going to be. In the perfect case it should
a) be mremap-able to any address (or be at fixed address _forever_, but
   I assume this is not feasible);
b) have entry points at fixed (or somehow movable) places.

I admit that I may have misunderstood your question; if I did,
please correct me.

Thanks,
Pavel


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
>
> I don't know all that much about the linux vm.  Can we create a
> special vdso address_space or struct inode or something so that a
> single vma can contain pages with different flags?
>

No, that is still different vmas, but it probably isn't a big deal.

The advantage of having an inode/namespace is that it lets you use
mmap() as opposed to mremap() with it, which might be useful, I don't know.

One option for the checkpoint people might actually be to not use the
vdso for a process that needs to be checkpointed and restarted on a
different machine or different kernel version.  Instead they can install
a pseudo-vdso which just calls normal system calls, and is simply a
static piece of code that makes normal system calls ... since the
internals of the kernel are hidden from userspace it is clean that way.

With any actual vdso you risk something like:

- vdso entry
- signal received, transfer to signal handler
- checkpoint
- restart
- signal handler exit

... and now you return to the address in the old vdso, but the internals
of the vdso may have changed.

-hpa
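The sequence hpa lists can at least be detected at checkpoint time: if any saved code address (the interrupted eip/rip, or a return address found in a signal frame or on the stack) falls inside the vdso range, the restored task may resume into vdso code whose layout has changed. A minimal C sketch of that check; the names are hypothetical, and gathering the return addresses (which needs frame pointers or unwind data) is assumed to be done elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end), e.g. the vdso mapping as
 * recorded at checkpoint time. */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

static bool addr_in_range(const struct addr_range *r, uintptr_t addr)
{
    return addr >= r->start && addr < r->end;
}

/* Count how many saved code addresses (interrupted ip, return addresses
 * from signal frames or the stack) point into the old vdso. Any hit means
 * execution would resume inside code whose layout may differ after a
 * restore on another kernel. */
size_t count_stale_vdso_ips(const struct addr_range *old_vdso,
                            const uintptr_t *ips, size_t n)
{
    assert(old_vdso != NULL && (ips != NULL || n == 0));

    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (addr_in_range(old_vdso, ips[i]))
            hits++;
    return hits;
}
```

A non-zero result is the problematic case above: the checkpoint tool would have to retry later, refuse to restore on a kernel with a different vdso, or apply some fix-up.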


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 02:00:17PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
> >
> > I don't know all that much about the linux vm.  Can we create a
> > special vdso address_space or struct inode or something so that a
> > single vma can contain pages with different flags?
> >
>
> No, that is still different vmas, but it probably isn't a big deal.
>
> The advantage of having an inode/namespace is that it lets you use
> mmap() as opposed to mremap() with it, which might be useful, I don't know.
>
> One option for the checkpoint people might actually be to not use the
> vdso for a process that needs to be checkpointed and restarted on a
> different machine or different kernel version.  Instead they can install
> a pseudo-vdso which just calls normal system calls, and is simply a
> static piece of code that makes normal system calls ... since the
> internals of the kernel are hidden from userspace it is clean that way.
>
> With any actual vdso you risk something like:
>

Is there a chance to do something like this (assuming the
dumpee is ptraced):

   - vdso entry

mark task as vdso-entered

   - signal received, transfer to signal handler
   - signal handler exit

before the task leaves the vdso, its vdso-entered mark gets cleared
and, if the task is ptraced, the ptracing task is notified

> ... and now you return to the address in the old vdso, but the internals
> of the vdso may have changed.

This would allow us to defer the checkpoint until the task finishes the vdso code. Peter,
if I understand you correctly, you propose that we provide our own proxy-vdso
which would redirect calls to the real ones, right? But the main problem
is that the whole idea is to be able to c/r existing programs without
recompiling and such (or am I missing something here?).

Cyrill
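Cyrill's deferral scheme can be modeled as a small user-space C sketch. It is only an illustration under stated assumptions: the per-task mark is a hypothetical counter `vdso_depth`, and `vdso_enter()`, `vdso_leave()` and `checkpoint_allowed()` are made-up names, not kernel or CRIU interfaces. A counter is used instead of a single flag so that a signal handler which itself calls into the vdso cannot clear the mark of the interrupted outer call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-task mark: number of vdso calls currently in progress.
 * A counter (not a flag) keeps the mark set if a signal handler re-enters
 * the vdso while the interrupted call is still pending. */
static atomic_int vdso_depth = 0;

static void vdso_enter(void)
{
    atomic_fetch_add(&vdso_depth, 1);
}

static void vdso_leave(void)
{
    /* In the proposal, the tracer would be notified here once the
     * depth drops back to zero. */
    atomic_fetch_sub(&vdso_depth, 1);
}

/* The checkpointer defers the dump while any vdso call is in flight. */
static bool checkpoint_allowed(void)
{
    return atomic_load(&vdso_depth) == 0;
}
```

In this model the tracer would simply poll `checkpoint_allowed()`; in the proposal the kernel would instead notify the tracer when the mark is cleared.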


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
 
 This would allow us to defer the checkpoint until the task has finished
 executing vdso code. Peter, if I understand you correctly, you propose
 that we provide our own proxy-vdso which would redirect calls to the
 real one, right? But the main problem is that the whole idea is to be
 able to c/r existing programs without recompiling them and such (or am
 I missing something here?).
 

No, I'm proposing that you use a proxy-vdso which does nothing but
system calls, and therefore can be stable indefinitely.

-hpa
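A proxy-vdso of the kind hpa describes might look roughly like this in C. The function names `proxy_gettimeofday()` and `proxy_clock_gettime()` are hypothetical; each body only issues the corresponding system call through glibc's `syscall()`, so the code reads no kernel-maintained data and its bytes would not need to change between kernel versions.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical proxy-vdso entry points: each one only issues the real
 * system call, so nothing here depends on kernel-internal data layout. */
int proxy_gettimeofday(struct timeval *tv, struct timezone *tz)
{
    return (int)syscall(SYS_gettimeofday, tv, tz);
}

int proxy_clock_gettime(clockid_t clk, struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, clk, ts);
}
```

The trade-off is that every call pays the full cost of a system call, which is what the real vdso normally avoids.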




Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Cyrill Gorcunov
On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
 On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
  
  This would allow us to defer the checkpoint until the task has finished
  executing vdso code. Peter, if I understand you correctly, you propose
  that we provide our own proxy-vdso which would redirect calls to the
  real one, right? But the main problem is that the whole idea is to be
  able to c/r existing programs without recompiling them and such (or am
  I missing something here?).
 
 No, I'm proposing that you use a proxy-vdso which does nothing but
 system calls, and therefore can be stable indefinitely.

This won't help in the scenario you pointed out in your previous
email (where c/r happens in the middle of the vdso), will it? We
still need some way to be sure we're not checkpointing in the middle
of a signal handler that will return to some place in the vdso.
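For the scenario Cyrill raises, a checkpointer first needs to know whether a saved instruction pointer, or a return address sitting in a signal frame, points into the vdso. A minimal sketch, assuming the `[vdso]` label that Linux uses in `/proc/<pid>/maps`; `find_vdso()` and `ip_in_vdso()` are hypothetical helpers, and this only detects the problem case, it does not make it safe to checkpoint.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: locate the [vdso] mapping by scanning the maps file.
 * It reads /proc/self/maps for simplicity; a dumper would read the
 * dumpee's /proc/<pid>/maps instead. Returns true if the mapping is found. */
static bool find_vdso(unsigned long *start, unsigned long *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[4096];
    bool found = false;

    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        /* Each line starts with "start-end" in hex. */
        if (strstr(line, "[vdso]") &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}

/* True if an address (saved IP or a return address in a signal frame)
 * lies inside the half-open range [start, end). */
static bool ip_in_vdso(unsigned long ip, unsigned long start, unsigned long end)
{
    return ip >= start && ip < end;
}
```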


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
 On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
 On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin h...@zytor.com wrote:
 Wouldn't the vdso get mapped already and could be mremap()'d.  If we
 really need more control I'd almost push for a device/filesystem node
 that could be mmapped the usual way.

 Hmm.  That may work, but it'll still break ABI.  I'm not sure that
 criu is stable enough yet that we should care.  Criu people?
 
 It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
 
 (In brief summary: how annoying would it be if the vdso was no longer
 just a bunch of constant bytes that lived somewhere?)
 
 It depends on what vdso is going to be. In the perfect case it should
 a) be mremap-able to any address (or be at a fixed address _forever_,
    but I assume this is not feasible);
 b) have entry points at fixed (or somehow movable) places.
 
 I admit that I may have misunderstood your question; if I did,
 please correct me.
 

Either way... criu on the side, we should proceed with this vdso
redesign and get support for the 32-bit entry points including compat
mode on x86-64.

-hpa
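Pavel's requirement (b) matters because a checkpointed task may hold addresses inside the vdso. If, and only if, the old and new vdso images have identical layout, such an address can be carried over by keeping its offset from the vdso base. A sketch of that arithmetic; `struct vdso_region` and `rebase_vdso_addr()` are hypothetical names, and the identical-layout assumption is exactly what a changing vdso would break.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where a vdso image was mapped. */
struct vdso_region {
    uintptr_t base;
    uintptr_t size;
};

/* Translate an address inside the old vdso (e.g. a saved return address)
 * to the same offset inside the new vdso. Only meaningful if both images
 * are byte-identical, i.e. entry points sit at fixed offsets. Returns false
 * if the address is outside the old region or the offset does not fit. */
static bool rebase_vdso_addr(uintptr_t addr, struct vdso_region old_r,
                             struct vdso_region new_r, uintptr_t *out)
{
    if (addr < old_r.base || addr - old_r.base >= old_r.size)
        return false;
    uintptr_t off = addr - old_r.base;
    if (off >= new_r.size)
        return false;
    *out = new_r.base + off;
    return true;
}
```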




Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
 On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
 On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:

 This would allow us to defer the checkpoint until the task has finished
 executing vdso code. Peter, if I understand you correctly, you propose
 that we provide our own proxy-vdso which would redirect calls to the
 real one, right? But the main problem is that the whole idea is to be
 able to c/r existing programs without recompiling them and such (or am
 I missing something here?).

 No, I'm proposing that you use a proxy-vdso which does nothing but
 system calls, and therefore can be stable indefinitely.
 
 This won't help in the scenario you pointed out in your previous
 email (where c/r happens in the middle of the vdso), will it? We
 still need some way to be sure we're not checkpointing in the middle
 of a signal handler that will return to some place in the vdso.

It is okay if and only if those vdso places never change... which I
think is doable if they only contain trivial system call wrappers, i.e.
something like:

movl $__SYS_gettimeofday, %eax
syscall
ret

These kinds of wrappers don't rely on live data provided by the kernel,
and so can be checkpointed together with the rest of the process.

-hpa
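By contrast, a real vdso time function reads data that the kernel keeps updating in a shared page, using a sequence-counter retry loop of roughly the following shape. This is a user-space model only: `struct fake_vvar`, `read_time()` and `write_time()` are invented, the struct does not match the kernel's actual vvar layout, and a single writer is assumed. It illustrates why such code is tied to the running kernel: both the field layout and the retry protocol are kernel-internal details.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel-maintained data page. */
struct fake_vvar {
    atomic_uint     seq;   /* odd while the (single) writer is updating */
    _Atomic int64_t sec;
    _Atomic int64_t nsec;
};

/* Writer side (the "kernel"): bump seq to odd, update, bump to even. */
static void write_time(struct fake_vvar *v, int64_t sec, int64_t nsec)
{
    unsigned s = atomic_load_explicit(&v->seq, memory_order_relaxed);
    atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&v->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&v->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&v->seq, s + 2, memory_order_release);
}

/* Reader side (the "vdso"): retry until a consistent snapshot is seen. */
static void read_time(struct fake_vvar *v, int64_t *sec, int64_t *nsec)
{
    for (;;) {
        unsigned start = atomic_load_explicit(&v->seq, memory_order_acquire);
        if (start & 1)
            continue;                      /* writer mid-update: retry */
        int64_t s  = atomic_load_explicit(&v->sec, memory_order_relaxed);
        int64_t ns = atomic_load_explicit(&v->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&v->seq, memory_order_relaxed) == start) {
            *sec = s;                      /* seq unchanged: snapshot is consistent */
            *nsec = ns;
            return;
        }
    }
}
```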




Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread Stefani Seibold
Am Freitag, den 14.12.2012, 14:46 -0800 schrieb H. Peter Anvin:
 On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
  On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
  On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin h...@zytor.com wrote:
  Wouldn't the vdso get mapped already and could be mremap()'d.  If we
  really need more control I'd almost push for a device/filesystem node
  that could be mmapped the usual way.
 
  Hmm.  That may work, but it'll still break ABI.  I'm not sure that
  criu is stable enough yet that we should care.  Criu people?
  
  It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
  
  (In brief summary: how annoying would it be if the vdso was no longer
  just a bunch of constant bytes that lived somewhere?)
  
  It depends on what vdso is going to be. In the perfect case it should
  a) be mremap-able to any address (or be at a fixed address _forever_,
     but I assume this is not feasible);
  b) have entry points at fixed (or somehow movable) places.
  
  I admit that I may have misunderstood your question; if I did,
  please correct me.
  
 
 Either way... criu on the side, we should proceed with this vdso
 redesign and get support for the 32-bit entry points including compat
 mode on x86-64.
 
   -hpa
 
 

Sorry for not following the discussion, but I am currently trying to
compile vclocktime.c as a 32-bit object. Most of the (clever) work
is done.

After this, the next step is to map the needed fixmaps into the 32-bit
address space. Maybe this can be done with install_special_mapping().

I think I will do this in the next few days.

- Stefani




Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 03:09 PM, Stefani Seibold wrote:
 
 Sorry for not following the discussion, but I am currently trying to
 compile vclocktime.c as a 32-bit object. Most of the (clever) work
 is done.
 
 After this, the next step is to map the needed fixmaps into the 32-bit
 address space. Maybe this can be done with install_special_mapping().
 

install_special_mapping() is indeed how it is done.  The suggestion is
to make the vvar page an actual section inside the vdso, and then just
substitute the vvar page into the mapping array when installing the vdso
into the process user space.

-hpa
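The page-array substitution hpa describes can be modeled in user space. In this hypothetical sketch a "page" is just a 4 KiB buffer, `build_vdso_pages()` is an invented helper, and `vvar_index` is the slot reserved for the vvar section inside the vdso image; the resulting array is what would then be handed to a mapping call such as `install_special_mapping()`. Real kernel code would deal in `struct page` pointers instead.

```c
#include <stddef.h>

#define PAGE_SZ 4096

/* Hypothetical model: a "page" is a 4 KiB buffer. */
typedef unsigned char page_t[PAGE_SZ];

/* Build the page array for the mapping: code pages come from the vdso
 * image, while the slot reserved for the vvar section is replaced by the
 * single kernel-owned vvar page shared by all processes.
 * `out` must have room for `npages` entries. */
static void build_vdso_pages(page_t *image_pages, size_t npages,
                             size_t vvar_index, page_t *vvar_page,
                             page_t **out)
{
    for (size_t i = 0; i < npages; i++)
        out[i] = (i == vvar_index) ? vvar_page : &image_pages[i];
}
```

Because only the array entry is swapped, the vdso image itself can keep a fixed layout with the vvar section at a known offset.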




Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread John Stultz

On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>
>> This won't help in the scenario you pointed out in your previous
>> email (where c/r happens in the middle of the vdso), will it? We
>> still need some way to be sure we're not checkpointing in the middle
>> of a signal handler that will return to some place in the vdso.
>
> It is okay if and only if those vdso places never change... which I
> think is doable if they only contain trivial system call wrappers, i.e.
> something like:
>
> movl $__SYS_gettimeofday, %eax
> syscall
> ret


Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
I know Andi always wanted to avoid having syscall instructions at a
fixed location for the old vsyscall code (though I know we had it
nonetheless for a while). But maybe I'm confusing issues here?


thanks
-john


Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin
On 12/14/2012 03:48 PM, John Stultz wrote:
 On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
 On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
 On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:


 This won't help in the scenario you pointed out in your previous
 email (where c/r happens in the middle of the vdso), will it? We
 still need some way to be sure we're not checkpointing in the middle
 of a signal handler that will return to some place in the vdso.
 It is okay if and only if those vdso places never change... which I
 think is doable if they only contain trivial system call wrappers, i.e.
 something like:

 movl $__SYS_gettimeofday, %eax
 syscall
 ret
 
 Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
 I know Andi always wanted to avoid having syscall instructions at a
 fixed location for the old vsyscall code (though I know we had it
 nonetheless for a while). But maybe I'm confusing issues here?
 

They aren't in fixed addresses across processes... the vdso location can
still be randomized.  It just has to be the same across the
checkpoint/restart operation, just like all the other instructions.

-hpa
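If the vdso stays randomized per process but must sit at the same address across checkpoint/restart, the restorer has to move the freshly mapped vdso to the recorded address. `mremap(2)` with `MREMAP_MAYMOVE | MREMAP_FIXED` is one way to do that. The sketch uses an anonymous mapping as a stand-in, since moving the live vdso of the running process could break its own time calls, and `move_mapping_to()` is a hypothetical helper; whether the vdso itself survives being moved is exactly Pavel's requirement (a).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Move the mapping [old_addr, old_addr + len) so it starts exactly at
 * `wanted`. MREMAP_FIXED requires MREMAP_MAYMOVE and replaces anything
 * already mapped at the target, so `wanted` must be a range the caller
 * owns (for example, a PROT_NONE reservation). Returns MAP_FAILED on error. */
static void *move_mapping_to(void *old_addr, size_t len, void *wanted)
{
    return mremap(old_addr, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, wanted);
}
```

A restorer would first reserve the recorded range, then move the new mapping over it, so that nothing else can be placed there in between.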

