Re: [PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation

2016-07-15 Thread Rafael J. Wysocki
On Thursday, July 14, 2016 09:15:46 PM Chen Yu wrote:
> snapshot test mode is used to verify if the snapshot data
> written to the swap device can be successfully restored to
> the memory. It is useful to ease the debugging process on
> hibernation, since this mode can not only bypass the
> BIOSes/bootloader, but also the system re-initialization.
> 
> For example:
> 
> echo snapshot > /sys/power/pm_test
> echo disk > /sys/power/state
> 
> [  267.959784] PM: Image saving progress:  80%
> [  268.038669] PM: Image saving progress:  90%
> [  268.111745] PM: Image saving progress: 100%
> [  268.129269] PM: Image saving done.
> [  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
> [  268.140564] PM: S|
> [  268.160067] hibernation debug: Waiting for 5 seconds.
> ...
> [  273.508638] PM: Looking for hibernation image.
> [  273.516583] PM: Image signature found, resuming
> [  273.926547] PM: Loading and decompressing image data (129653 pages)...
> [  274.122452] PM: Image loading progress:   0%
> [  274.322127] PM: Image loading progress:  10%
> ...
> 
> Comments and suggestions would be appreciated.
> 
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Chen Yu 
> ---
> v4:
>  - Fix some errors and modify the comment for software_resume_unthaw.
> v3:
>  - As Pavel mentioned, there was a potential risk in previous
>version that might break the filesystem. According to Rafael's suggestion,
>this version avoids that issue by restoring the pages with user/kernel
>threads kept in frozen. Also updated the document.
> ---
>  kernel/power/hibernate.c | 54 
> +++-
>  kernel/power/main.c  |  3 +++
>  kernel/power/power.h |  3 +++
>  kernel/power/suspend.c   |  8 +++
>  kernel/power/swap.c  |  7 +++
>  5 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 51441d8..57864fc 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -643,11 +643,59 @@ static void power_down(void)
>  }
>  
>  /**
> + * software_resume_unthaw - Resume from a saved hibernation image with 
> threads frozen.
> + *
> + * This routine is similar to software_resume, except that this one tries
> + * to resume from hibernation with user/kernel threads frozen. And it is 
> mainly
> + * used for snapshot test mode because it is important to keep the threads 
> frozen,
> + * otherwise the filesystem might be broken due to inconsistent 
> disk-data/metadata
> + * across hibernation.
> + *
> + * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related 
> callbacks,
> + * since it is unclear whether it is safe to 'jump' from 
> PM_HIBERNATION_PREPARE
> + * to PM_RESTORE_PREPARE directly, and bypass this message should not impact 
> much.
> + */
> +static int software_resume_unthaw(void)

This name is really not the best one and quite confusing.

> +{
> + int error;
> + unsigned int flags;
> +
> + pr_debug("PM: Hibernation image partition %d:%d present\n",
> + MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

I don't think the above is necessary.

The image partition is known to be present at this point.

> +
> + pr_debug("PM: Looking for hibernation image.\n");
> + error = swsusp_check();
> + if (error)
> + return error;

The code below can be shared with software_resume(), can't it?  And I'd move
the above check to hibernate().

So I'd move the part of software_resume() between the "PM: Loading hibernation
image.\n" message and the invocation of unlock_device_hotplug() inclusive to a
separate function and call that from software_resume() and from here.

That new function can be called load_image_and_restore() or similar.

> +
> + pr_debug("PM: Loading hibernation image.\n");
> +
> + lock_device_hotplug();
> + error = create_basic_memory_bitmaps();
> + if (error)
> + goto Unlock;
> +
> + error = swsusp_read();
> + swsusp_close(FMODE_READ);
> + if (!error)
> + hibernation_restore(flags & SF_PLATFORM_MODE);
> +
> + printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
> + swsusp_free();
> + free_basic_memory_bitmaps();
> + Unlock:
> + unlock_device_hotplug();
> +
> + return error;
> +}
> +
> +/**
>   * hibernate - Carry out system hibernation, including saving the image.
>   */
>  int hibernate(void)
>  {
>   int error, nr_calls = 0;
> + bool snapshot_test = false;
>  
>   if (!hibernation_available()) {
>   pr_debug("PM: Hibernation not available.\n");
> @@ -699,7 +747,9 @@ int hibernate(void)
>   pr_debug("PM: writing image.\n");
>   error = swsusp_write(flags);
>   swsusp_free();
> - if (!error)
> + if (hibernation_test(TEST_SNAPSHOT))
> + snapshot_test = true;
> + 

Re: [PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation

2016-07-15 Thread Rafael J. Wysocki
On Thursday, July 14, 2016 09:15:46 PM Chen Yu wrote:
> snapshot test mode is used to verify if the snapshot data
> written to the swap device can be successfully restored to
> the memory. It is useful to ease the debugging process on
> hibernation, since this mode can not only bypass the
> BIOSes/bootloader, but also the system re-initialization.
> 
> For example:
> 
> echo snapshot > /sys/power/pm_test
> echo disk > /sys/power/state
> 
> [  267.959784] PM: Image saving progress:  80%
> [  268.038669] PM: Image saving progress:  90%
> [  268.111745] PM: Image saving progress: 100%
> [  268.129269] PM: Image saving done.
> [  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
> [  268.140564] PM: S|
> [  268.160067] hibernation debug: Waiting for 5 seconds.
> ...
> [  273.508638] PM: Looking for hibernation image.
> [  273.516583] PM: Image signature found, resuming
> [  273.926547] PM: Loading and decompressing image data (129653 pages)...
> [  274.122452] PM: Image loading progress:   0%
> [  274.322127] PM: Image loading progress:  10%
> ...
> 
> Comments and suggestions would be appreciated.
> 
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Chen Yu 
> ---
> v4:
>  - Fix some errors and modify the comment for software_resume_unthaw.
> v3:
>  - As Pavel mentioned, there was a potential risk in previous
>version that might break the filesystem. According to Rafael's suggestion,
>this version avoids that issue by restoring the pages with user/kernel
>threads kept in frozen. Also updated the document.
> ---
>  kernel/power/hibernate.c | 54 
> +++-
>  kernel/power/main.c  |  3 +++
>  kernel/power/power.h |  3 +++
>  kernel/power/suspend.c   |  8 +++
>  kernel/power/swap.c  |  7 +++
>  5 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 51441d8..57864fc 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -643,11 +643,59 @@ static void power_down(void)
>  }
>  
>  /**
> + * software_resume_unthaw - Resume from a saved hibernation image with 
> threads frozen.
> + *
> + * This routine is similar to software_resume, except that this one tries
> + * to resume from hibernation with user/kernel threads frozen. And it is 
> mainly
> + * used for snapshot test mode because it is important to keep the threads 
> frozen,
> + * otherwise the filesystem might be broken due to inconsistent 
> disk-data/metadata
> + * across hibernation.
> + *
> + * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related 
> callbacks,
> + * since it is unclear whether it is safe to 'jump' from 
> PM_HIBERNATION_PREPARE
> + * to PM_RESTORE_PREPARE directly, and bypass this message should not impact 
> much.
> + */
> +static int software_resume_unthaw(void)

This name is really not the best one and quite confusing.

> +{
> + int error;
> + unsigned int flags;
> +
> + pr_debug("PM: Hibernation image partition %d:%d present\n",
> + MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

I don't think the above is necessary.

The image partition is known to be present at this point.

> +
> + pr_debug("PM: Looking for hibernation image.\n");
> + error = swsusp_check();
> + if (error)
> + return error;

The code below can be shared with software_resume(), can't it?  And I'd move
the above check to hibernate().

So I'd move the part of software_resume() between the "PM: Loading hibernation
image.\n" message and the invocation of unlock_device_hotplug() inclusive to a
separate function and call that from software_resume() and from here.

That new function can be called load_image_and_restore() or similar.

> +
> + pr_debug("PM: Loading hibernation image.\n");
> +
> + lock_device_hotplug();
> + error = create_basic_memory_bitmaps();
> + if (error)
> + goto Unlock;
> +
> + error = swsusp_read();
> + swsusp_close(FMODE_READ);
> + if (!error)
> + hibernation_restore(flags & SF_PLATFORM_MODE);
> +
> + printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
> + swsusp_free();
> + free_basic_memory_bitmaps();
> + Unlock:
> + unlock_device_hotplug();
> +
> + return error;
> +}
> +
> +/**
>   * hibernate - Carry out system hibernation, including saving the image.
>   */
>  int hibernate(void)
>  {
>   int error, nr_calls = 0;
> + bool snapshot_test = false;
>  
>   if (!hibernation_available()) {
>   pr_debug("PM: Hibernation not available.\n");
> @@ -699,7 +747,9 @@ int hibernate(void)
>   pr_debug("PM: writing image.\n");
>   error = swsusp_write(flags);
>   swsusp_free();
> - if (!error)
> + if (hibernation_test(TEST_SNAPSHOT))
> + snapshot_test = true;
> + if (!error && !snapshot_test)
>  

[PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation

2016-07-14 Thread Chen Yu
snapshot test mode is used to verify if the snapshot data
written to the swap device can be successfully restored to
the memory. It is useful to ease the debugging process on
hibernation, since this mode can not only bypass the
BIOSes/bootloader, but also the system re-initialization.

For example:

echo snapshot > /sys/power/pm_test
echo disk > /sys/power/state

[  267.959784] PM: Image saving progress:  80%
[  268.038669] PM: Image saving progress:  90%
[  268.111745] PM: Image saving progress: 100%
[  268.129269] PM: Image saving done.
[  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
[  268.140564] PM: S|
[  268.160067] hibernation debug: Waiting for 5 seconds.
...
[  273.508638] PM: Looking for hibernation image.
[  273.516583] PM: Image signature found, resuming
[  273.926547] PM: Loading and decompressing image data (129653 pages)...
[  274.122452] PM: Image loading progress:   0%
[  274.322127] PM: Image loading progress:  10%
...

Comments and suggestions would be appreciated.

Suggested-by: Rafael J. Wysocki 
Signed-off-by: Chen Yu 
---
v4:
 - Fix some errors and modify the comment for software_resume_unthaw.
v3:
 - As Pavel mentioned, there was a potential risk in previous
   version that might break the filesystem. According to Rafael's suggestion,
   this version avoids that issue by restoring the pages with user/kernel
   threads kept in frozen. Also updated the document.
---
 kernel/power/hibernate.c | 54 +++-
 kernel/power/main.c  |  3 +++
 kernel/power/power.h |  3 +++
 kernel/power/suspend.c   |  8 +++
 kernel/power/swap.c  |  7 +++
 5 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 51441d8..57864fc 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -643,11 +643,59 @@ static void power_down(void)
 }
 
 /**
+ * software_resume_unthaw - Resume from a saved hibernation image with threads 
frozen.
+ *
+ * This routine is similar to software_resume, except that this one tries
+ * to resume from hibernation with user/kernel threads frozen. And it is mainly
+ * used for snapshot test mode because it is important to keep the threads 
frozen,
+ * otherwise the filesystem might be broken due to inconsistent 
disk-data/metadata
+ * across hibernation.
+ *
+ * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related 
callbacks,
+ * since it is unclear whether it is safe to 'jump' from PM_HIBERNATION_PREPARE
+ * to PM_RESTORE_PREPARE directly, and bypass this message should not impact 
much.
+ */
+static int software_resume_unthaw(void)
+{
+   int error;
+   unsigned int flags;
+
+   pr_debug("PM: Hibernation image partition %d:%d present\n",
+   MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+
+   pr_debug("PM: Looking for hibernation image.\n");
+   error = swsusp_check();
+   if (error)
+   return error;
+
+   pr_debug("PM: Loading hibernation image.\n");
+
+   lock_device_hotplug();
+   error = create_basic_memory_bitmaps();
+   if (error)
+   goto Unlock;
+
+   error = swsusp_read();
+   swsusp_close(FMODE_READ);
+   if (!error)
+   hibernation_restore(flags & SF_PLATFORM_MODE);
+
+   printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
+   swsusp_free();
+   free_basic_memory_bitmaps();
+ Unlock:
+   unlock_device_hotplug();
+
+   return error;
+}
+
+/**
  * hibernate - Carry out system hibernation, including saving the image.
  */
 int hibernate(void)
 {
int error, nr_calls = 0;
+   bool snapshot_test = false;
 
if (!hibernation_available()) {
pr_debug("PM: Hibernation not available.\n");
@@ -699,7 +747,9 @@ int hibernate(void)
pr_debug("PM: writing image.\n");
error = swsusp_write(flags);
swsusp_free();
-   if (!error)
+   if (hibernation_test(TEST_SNAPSHOT))
+   snapshot_test = true;
+   if (!error && !snapshot_test)
power_down();
in_suspend = 0;
pm_restore_gfp_mask();
@@ -711,6 +761,8 @@ int hibernate(void)
free_basic_memory_bitmaps();
  Thaw:
unlock_device_hotplug();
+   if (snapshot_test)
+   software_resume_unthaw();
thaw_processes();
 
/* Don't bother checking whether freezer_test_done is true */
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 5ea50b1..80fe48e 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -83,6 +83,9 @@ int pm_test_level = TEST_NONE;
 
 static const char * const pm_tests[__TEST_AFTER_LAST] = {
[TEST_NONE] = "none",
+#ifdef CONFIG_HIBERNATION
+   [TEST_SNAPSHOT] = "snapshot",
+#endif
[TEST_CORE] = "core",

[PATCH 1/2][RFC v4] PM / hibernate: Introduce snapshot test mode for hibernation

2016-07-14 Thread Chen Yu
snapshot test mode is used to verify if the snapshot data
written to the swap device can be successfully restored to
the memory. It is useful to ease the debugging process on
hibernation, since this mode can not only bypass the
BIOSes/bootloader, but also the system re-initialization.

For example:

echo snapshot > /sys/power/pm_test
echo disk > /sys/power/state

[  267.959784] PM: Image saving progress:  80%
[  268.038669] PM: Image saving progress:  90%
[  268.111745] PM: Image saving progress: 100%
[  268.129269] PM: Image saving done.
[  268.133485] PM: Wrote 518612 kbytes in 0.75 seconds (691.48 MB/s)
[  268.140564] PM: S|
[  268.160067] hibernation debug: Waiting for 5 seconds.
...
[  273.508638] PM: Looking for hibernation image.
[  273.516583] PM: Image signature found, resuming
[  273.926547] PM: Loading and decompressing image data (129653 pages)...
[  274.122452] PM: Image loading progress:   0%
[  274.322127] PM: Image loading progress:  10%
...

Comments and suggestions would be appreciated.

Suggested-by: Rafael J. Wysocki 
Signed-off-by: Chen Yu 
---
v4:
 - Fix some errors and modify the comment for software_resume_unthaw.
v3:
 - As Pavel mentioned, there was a potential risk in previous
   version that might break the filesystem. According to Rafael's suggestion,
   this version avoids that issue by restoring the pages with user/kernel
   threads kept in frozen. Also updated the document.
---
 kernel/power/hibernate.c | 54 +++-
 kernel/power/main.c  |  3 +++
 kernel/power/power.h |  3 +++
 kernel/power/suspend.c   |  8 +++
 kernel/power/swap.c  |  7 +++
 5 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 51441d8..57864fc 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -643,11 +643,59 @@ static void power_down(void)
 }
 
 /**
+ * software_resume_unthaw - Resume from a saved hibernation image with threads 
frozen.
+ *
+ * This routine is similar to software_resume, except that this one tries
+ * to resume from hibernation with user/kernel threads frozen. And it is mainly
+ * used for snapshot test mode because it is important to keep the threads 
frozen,
+ * otherwise the filesystem might be broken due to inconsistent 
disk-data/metadata
+ * across hibernation.
+ *
+ * software_resume_unthaw() does not invoke PM_RESTORE_PREPARE related 
callbacks,
+ * since it is unclear whether it is safe to 'jump' from PM_HIBERNATION_PREPARE
+ * to PM_RESTORE_PREPARE directly, and bypass this message should not impact 
much.
+ */
+static int software_resume_unthaw(void)
+{
+   int error;
+   unsigned int flags;
+
+   pr_debug("PM: Hibernation image partition %d:%d present\n",
+   MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));
+
+   pr_debug("PM: Looking for hibernation image.\n");
+   error = swsusp_check();
+   if (error)
+   return error;
+
+   pr_debug("PM: Loading hibernation image.\n");
+
+   lock_device_hotplug();
+   error = create_basic_memory_bitmaps();
+   if (error)
+   goto Unlock;
+
+   error = swsusp_read();
+   swsusp_close(FMODE_READ);
+   if (!error)
+   hibernation_restore(flags & SF_PLATFORM_MODE);
+
+   printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
+   swsusp_free();
+   free_basic_memory_bitmaps();
+ Unlock:
+   unlock_device_hotplug();
+
+   return error;
+}
+
+/**
  * hibernate - Carry out system hibernation, including saving the image.
  */
 int hibernate(void)
 {
int error, nr_calls = 0;
+   bool snapshot_test = false;
 
if (!hibernation_available()) {
pr_debug("PM: Hibernation not available.\n");
@@ -699,7 +747,9 @@ int hibernate(void)
pr_debug("PM: writing image.\n");
error = swsusp_write(flags);
swsusp_free();
-   if (!error)
+   if (hibernation_test(TEST_SNAPSHOT))
+   snapshot_test = true;
+   if (!error && !snapshot_test)
power_down();
in_suspend = 0;
pm_restore_gfp_mask();
@@ -711,6 +761,8 @@ int hibernate(void)
free_basic_memory_bitmaps();
  Thaw:
unlock_device_hotplug();
+   if (snapshot_test)
+   software_resume_unthaw();
thaw_processes();
 
/* Don't bother checking whether freezer_test_done is true */
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 5ea50b1..80fe48e 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -83,6 +83,9 @@ int pm_test_level = TEST_NONE;
 
 static const char * const pm_tests[__TEST_AFTER_LAST] = {
[TEST_NONE] = "none",
+#ifdef CONFIG_HIBERNATION
+   [TEST_SNAPSHOT] = "snapshot",
+#endif
[TEST_CORE] = "core",
[TEST_CPUS] = "processors",