Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-21 Thread Roy Franz
On Fri, Sep 20, 2013 at 8:00 AM, H. Peter Anvin  wrote:
> On 09/20/2013 04:27 AM, Matt Fleming wrote:
>> On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
>>> Would it be acceptable to fix the naming/comments, and convert values
>>> above 126 to '?'
>>> in the current patchset, and address a more thorough fix in another patch 
>>> set?
>>> The ARM and ARM64 EFI stub patchsets that are mostly complete depend
>>> on this one,
>>> so getting this merged soon would be helpful.
>>
>> Just fixing the function name and comments is enough for this patch
>> series. Anything else should be separate.
>>
>
> I just whipped up a patch to do proper UTF-16 to UTF-8 conversion.
> Completely untested, of course.
>
> -hpa
>

Thanks for putting this together.  I fixed up a few minor issues,
and it works.  Updated version below.  I'll submit this as a separate patch
as part of the EFI stub common code series.

Roy


commit 827285bac3daa79cd562bf79b5e9e88a61d357be
Author: H. Peter Anvin 
Date:   Fri Sep 20 12:46:16 2013 -0700

Do proper conversion from UTF-16 to UTF-8

Improve the conversion of the UTF-16 EFI command line
to UTF-8 for passing to the kernel.

Signed-off-by: Roy Franz 

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 5e708c0..4723dc89 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle,
efi_system_table_t *_table)
  hdr->type_of_loader = 0x21;

  /* Convert unicode cmdline to ascii */
- cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image,
-   _size);
+ cmdline_ptr = efi_convert_cmdline(sys_table, image, _size);
  if (!cmdline_ptr)
  goto fail;
  hdr->cmd_line_ptr = (unsigned long)cmdline_ptr;
diff --git a/drivers/firmware/efi/efi-stub-helper.c
b/drivers/firmware/efi/efi-stub-helper.c
index 335d17d..8a3ab4b 100644
--- a/drivers/firmware/efi/efi-stub-helper.c
+++ b/drivers/firmware/efi/efi-stub-helper.c
@@ -548,61 +548,112 @@ static efi_status_t
efi_relocate_kernel(efi_system_table_t *sys_table_arg,

  return status;
 }
-/* Convert the unicode UEFI command line to ASCII to pass to kernel.
+
+/*
+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character.
+ * This overestimates for surrogates, but that is okay.
+ */
+static int efi_utf8_bytes(u16 c)
+{
+ return 1 + (c >= 0x80) + (c >= 0x800);
+}
+
+/*
+ * Convert an UTF-16 string, not necessarily null terminated, to UTF-8.
+ */
+static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n)
+{
+ unsigned int c;
+
+ while (n--) {
+ c = *src++;
+ if (n && c >= 0xd800 && c <= 0xdbff &&
+*src >= 0xdc00 && *src <= 0xdfff) {
+ c = 0x1 + ((c & 0x3ff) << 10) + (*src & 0x3ff);
+ src++;
+ n--;
+ }
+ if (c >= 0xd800 && c <= 0xdfff)
+ c = 0xfffd; /* Unmatched surrogate */
+ if (c < 0x80) {
+ *dst++ = c;
+ continue;
+ }
+ if (c < 0x800) {
+ *dst++ = 0xc0 + (c >> 6);
+ goto t1;
+ }
+ if (c < 0x1) {
+ *dst++ = 0xe0 + (c >> 12);
+ goto t2;
+ }
+ *dst++ = 0xf0 + (c >> 18);
+ *dst++ = 0x80 + ((c >> 12) & 0x3f);
+ t2:
+ *dst++ = 0x80 + ((c >> 6) & 0x3f);
+ t1:
+ *dst++ = 0x80 + (c & 0x3f);
+ }
+
+ return dst;
+}
+
+/*
+ * Convert the unicode UEFI command line to ASCII to pass to kernel.
  * Size of memory allocated return in *cmd_line_len.
  * Returns NULL on error.
  */
-static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
-  efi_loaded_image_t *image,
-  int *cmd_line_len)
+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg,
+ efi_loaded_image_t *image,
+ int *cmd_line_len)
 {
- u16 *s2;
+ const u16 *s2;
  u8 *s1 = NULL;
  unsigned long cmdline_addr = 0;
  int load_options_size = image->load_options_size / 2; /* ASCII */
- void *options = image->load_options;
- int options_size = 0;
+ const u16 *options = image->load_options;
+ int options_bytes = 0; /* UTF-8 bytes */
+ int options_chars = 0; /* UTF-16 chars */
  efi_status_t status;
  int i;
  u16 zero = 0;

  if (options) {
  s2 = options;
- while (*s2 && *s2 != '\n' && options_size < load_options_size) {
+ while (*s2 && *s2 != '\n' && options_bytes < load_options_size) {
+ options_bytes += efi_utf8_bytes(*s2);
  s2++;
- options_size++;
  }
+ options_chars = s2 - options;
  }

- if (options_size == 0) {
- /* No command line options, so return empty string*/
- options_size = 1;
+ if (!options_chars) {
+ /* No command line options, so return empty string */
  options = 
  }

- options_size++;  /* NUL termination */
+ options_bytes++; /* NUL termination */
+
 #ifdef CONFIG_ARM
  /* For ARM, allocate at a high address to avoid reserved
  * regions at low addresses that we don't know the specfics of
  * at the time we are processing the command line.
  */
- status = efi_high_alloc(sys_table_arg, options_size, 0,
+ status = efi_high_alloc(sys_table_arg, options_bytes, 0,
 _addr, 0xf000);
 #else
- status = efi_low_alloc(sys_table_arg, options_size, 0,
+ status = efi_low_alloc(sys_table_arg, 

Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-21 Thread Roy Franz
On Fri, Sep 20, 2013 at 8:00 AM, H. Peter Anvin h...@zytor.com wrote:
 On 09/20/2013 04:27 AM, Matt Fleming wrote:
 On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
 Would it be acceptable to fix the naming/comments, and convert values
 above 126 to '?'
 in the current patchset, and address a more thorough fix in another patch 
 set?
 The ARM and ARM64 EFI stub patchsets that are mostly complete depend
 on this one,
 so getting this merged soon would be helpful.

 Just fixing the function name and comments is enough for this patch
 series. Anything else should be separate.


 I just whipped up a patch to do proper UTF-16 to UTF-8 conversion.
 Completely untested, of course.

 -hpa


Thanks for putting this together.  I fixed up a few minor issues,
and it works.  Updated version below.  I'll submit this as a separate patch
as part of the EFI stub common code series.

Roy


commit 827285bac3daa79cd562bf79b5e9e88a61d357be
Author: H. Peter Anvin h...@zytor.com
Date:   Fri Sep 20 12:46:16 2013 -0700

Do proper conversion from UTF-16 to UTF-8

Improve the conversion of the UTF-16 EFI command line
to UTF-8 for passing to the kernel.

Signed-off-by: Roy Franz roy.fr...@linaro.org

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 5e708c0..4723dc89 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle,
efi_system_table_t *_table)
  hdr-type_of_loader = 0x21;

  /* Convert unicode cmdline to ascii */
- cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image,
-   options_size);
+ cmdline_ptr = efi_convert_cmdline(sys_table, image, options_size);
  if (!cmdline_ptr)
  goto fail;
  hdr-cmd_line_ptr = (unsigned long)cmdline_ptr;
diff --git a/drivers/firmware/efi/efi-stub-helper.c
b/drivers/firmware/efi/efi-stub-helper.c
index 335d17d..8a3ab4b 100644
--- a/drivers/firmware/efi/efi-stub-helper.c
+++ b/drivers/firmware/efi/efi-stub-helper.c
@@ -548,61 +548,112 @@ static efi_status_t
efi_relocate_kernel(efi_system_table_t *sys_table_arg,

  return status;
 }
-/* Convert the unicode UEFI command line to ASCII to pass to kernel.
+
+/*
+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character.
+ * This overestimates for surrogates, but that is okay.
+ */
+static int efi_utf8_bytes(u16 c)
+{
+ return 1 + (c = 0x80) + (c = 0x800);
+}
+
+/*
+ * Convert an UTF-16 string, not necessarily null terminated, to UTF-8.
+ */
+static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n)
+{
+ unsigned int c;
+
+ while (n--) {
+ c = *src++;
+ if (n  c = 0xd800  c = 0xdbff 
+*src = 0xdc00  *src = 0xdfff) {
+ c = 0x1 + ((c  0x3ff)  10) + (*src  0x3ff);
+ src++;
+ n--;
+ }
+ if (c = 0xd800  c = 0xdfff)
+ c = 0xfffd; /* Unmatched surrogate */
+ if (c  0x80) {
+ *dst++ = c;
+ continue;
+ }
+ if (c  0x800) {
+ *dst++ = 0xc0 + (c  6);
+ goto t1;
+ }
+ if (c  0x1) {
+ *dst++ = 0xe0 + (c  12);
+ goto t2;
+ }
+ *dst++ = 0xf0 + (c  18);
+ *dst++ = 0x80 + ((c  12)  0x3f);
+ t2:
+ *dst++ = 0x80 + ((c  6)  0x3f);
+ t1:
+ *dst++ = 0x80 + (c  0x3f);
+ }
+
+ return dst;
+}
+
+/*
+ * Convert the unicode UEFI command line to ASCII to pass to kernel.
  * Size of memory allocated return in *cmd_line_len.
  * Returns NULL on error.
  */
-static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
-  efi_loaded_image_t *image,
-  int *cmd_line_len)
+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg,
+ efi_loaded_image_t *image,
+ int *cmd_line_len)
 {
- u16 *s2;
+ const u16 *s2;
  u8 *s1 = NULL;
  unsigned long cmdline_addr = 0;
  int load_options_size = image-load_options_size / 2; /* ASCII */
- void *options = image-load_options;
- int options_size = 0;
+ const u16 *options = image-load_options;
+ int options_bytes = 0; /* UTF-8 bytes */
+ int options_chars = 0; /* UTF-16 chars */
  efi_status_t status;
  int i;
  u16 zero = 0;

  if (options) {
  s2 = options;
- while (*s2  *s2 != '\n'  options_size  load_options_size) {
+ while (*s2  *s2 != '\n'  options_bytes  load_options_size) {
+ options_bytes += efi_utf8_bytes(*s2);
  s2++;
- options_size++;
  }
+ options_chars = s2 - options;
  }

- if (options_size == 0) {
- /* No command line options, so return empty string*/
- options_size = 1;
+ if (!options_chars) {
+ /* No command line options, so return empty string */
  options = zero;
  }

- options_size++;  /* NUL termination */
+ options_bytes++; /* NUL termination */
+
 #ifdef CONFIG_ARM
  /* For ARM, allocate at a high address to avoid reserved
  * regions at low addresses that we don't know the specfics of
  * at the time we are processing the command line.
  */
- status = efi_high_alloc(sys_table_arg, options_size, 0,
+ status = efi_high_alloc(sys_table_arg, options_bytes, 0,
 cmdline_addr, 0xf000);
 #else
- status = efi_low_alloc(sys_table_arg, options_size, 0,
+ status = efi_low_alloc(sys_table_arg, options_bytes, 0,

Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread H. Peter Anvin
On 09/20/2013 04:27 AM, Matt Fleming wrote:
> On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
>> Would it be acceptable to fix the naming/comments, and convert values
>> above 126 to '?'
>> in the current patchset, and address a more thorough fix in another patch 
>> set?
>> The ARM and ARM64 EFI stub patchsets that are mostly complete depend
>> on this one,
>> so getting this merged soon would be helpful.
> 
> Just fixing the function name and comments is enough for this patch
> series. Anything else should be separate.
> 

I just whipped up a patch to do proper UTF-16 to UTF-8 conversion.
Completely untested, of course.

-hpa

>From 7d6cf630c1adbb9787a24c2994230373c2b20a8f Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" 
Date: Fri, 20 Sep 2013 09:55:39 -0500
Subject: [PATCH] efi: Handle arbitrary Unicode characters

Instead of truncating UTF-16 assuming all characters is ASCII,
properly convert it to UTF-8.

Signed-off-by: H. Peter Anvin 
---
 arch/x86/boot/compressed/eboot.c   |  3 +-
 drivers/firmware/efi/efi-stub-helper.c | 89 ++
 2 files changed, 71 insertions(+), 21 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index bf56206..aacbe50 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -488,8 +488,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table)
 	hdr->type_of_loader = 0x21;
 
 	/* Convert unicode cmdline to ascii */
-	cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image,
-		   _size);
+	cmdline_ptr = efi_convert_cmdline(sys_table, image, _size);
 	if (!cmdline_ptr)
 		goto fail;
 	hdr->cmd_line_ptr = (unsigned long)cmdline_ptr;
diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c
index c91e757..7d4c1a4 100644
--- a/drivers/firmware/efi/efi-stub-helper.c
+++ b/drivers/firmware/efi/efi-stub-helper.c
@@ -567,20 +567,72 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg,
 
 	return status;
 }
-/* Convert the unicode UEFI command line to ASCII to pass to kernel.
+
+/*
+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character.
+ * This overestimates for surrogates, but that is okay.
+ */
+static int efi_utf8_bytes(u16 c)
+{
+	return 1 + (c >= 0x80) + (c >= 0x800);
+}
+
+/*
+ * Convert an UTF-16 string, not necessarily null terminated, to UTF-8.
+ */
+static u8 *efi_utf8_to_utf16(u8 *dst, const u16 *src, int n)
+{
+	unsigned int c;
+
+	while (n--) {
+		c = *src++;
+		if (n && c >= 0xd800 && c <= 0xdbff &&
+		*src >= 0xdc00 && *src <= 0xdfff) {
+			c = 0x1 + ((c & 0x3ff) << 10) + (*src & 0x3ff);
+			src++;
+			n--;
+		}
+		if (c >= 0xd800 && c <= 0xdfff)
+			c = 0xfffd; /* Unmatched surrogate */
+		if (c < 0x80) {
+			*dst++ = c;
+			continue;
+		}
+		if (c < 0x800) {
+			*dst++ = 0xc0 + (c >> 6);
+			goto t1;
+		}
+		if (c < 0x1) {
+			*dst++ = 0xe0 + (c >> 12);
+			goto t2;
+		}
+		*dst++ = 0xf0 + (c >> 18);
+		*dst++ = 0x80 + ((c >> 12) & 0x3f);
+	t2:
+		*dst++ = 0x80 + ((c >> 6) & 0x3f);
+	t1:
+		*dst++ = 0x80 + (c & 0x3f);
+	}
+
+	return dst;
+}
+
+/*
+ * Convert the unicode UEFI command line to ASCII to pass to kernel.
  * Size of memory allocated return in *cmd_line_len.
  * Returns NULL on error.
  */
-static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
-  efi_loaded_image_t *image,
-  int *cmd_line_len)
+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg,
+ efi_loaded_image_t *image,
+ int *cmd_line_len)
 {
-	u16 *s2;
+	const u16 *s2;
 	u8 *s1 = NULL;
 	unsigned long cmdline_addr = 0;
 	int load_options_size = image->load_options_size / 2; /* ASCII */
-	void *options = image->load_options;
-	int options_size = 0;
+	const void *options = image->load_options;
+	int options_bytes = 0;	/* UTF-8 bytes */
+	int options_chars = 0;	/* UTF-16 chars */
 	efi_status_t status;
 	int i;
 	u16 zero = 0;
@@ -588,40 +640,39 @@ static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
 	if (options) {
 		s2 = options;
 		while (*s2 && *s2 != '\n' && options_size < load_options_size) {
+			options_bytes += efi_utf8_bytes(*s2);
 			s2++;
-			options_size++;
 		}
+		options_chars = s2 - options;
 	}
 
-	if (options_size == 0) {
-		/* No command line options, so return empty string*/
-		options_size = 1;
+	if (!options_chars) {
+		/* No command line options, so return empty string */
 		options = 
 	}
 
-	options_size++;  /* NUL termination */
+	options_bytes++;	/* NUL termination */
+
 #ifdef CONFIG_ARM
 	/* For ARM, allocate at a high address to avoid reserved
 	 * regions at low addresses that we don't know the specfics of
 	 * at the time we are processing the command line.
 	 */
-	status = efi_high_alloc(sys_table_arg, options_size, 0,
+	status = efi_high_alloc(sys_table_arg, options_bytes, 0,
 			_addr, 0xf000);
 #else
-	status = efi_low_alloc(sys_table_arg, 

Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread Matt Fleming
On Fri, 20 Sep, at 01:02:41AM, Adam Borowski wrote:
> Are we on the same page so far?  If so, I can make a patch atop yours.

Patches to fix this would be most welcome, though I'd wait until Roy
sends out another version of this patch with the name/comment changes.

This series will eventually end up in the 'next' branch at,

  git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread Matt Fleming
On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
> Would it be acceptable to fix the naming/comments, and convert values
> above 126 to '?'
> in the current patchset, and address a more thorough fix in another patch set?
> The ARM and ARM64 EFI stub patchsets that are mostly complete depend
> on this one,
> so getting this merged soon would be helpful.

Just fixing the function name and comments is enough for this patch
series. Anything else should be separate.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread Matt Fleming
On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
 Would it be acceptable to fix the naming/comments, and convert values
 above 126 to '?'
 in the current patchset, and address a more thorough fix in another patch set?
 The ARM and ARM64 EFI stub patchsets that are mostly complete depend
 on this one,
 so getting this merged soon would be helpful.

Just fixing the function name and comments is enough for this patch
series. Anything else should be separate.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread Matt Fleming
On Fri, 20 Sep, at 01:02:41AM, Adam Borowski wrote:
 Are we on the same page so far?  If so, I can make a patch atop yours.

Patches to fix this would be most welcome, though I'd wait until Roy
sends out another version of this patch with the name/comment changes.

This series will eventually end up in the 'next' branch at,

  git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-20 Thread H. Peter Anvin
On 09/20/2013 04:27 AM, Matt Fleming wrote:
 On Wed, 18 Sep, at 09:48:44PM, Roy Franz wrote:
 Would it be acceptable to fix the naming/comments, and convert values
 above 126 to '?'
 in the current patchset, and address a more thorough fix in another patch 
 set?
 The ARM and ARM64 EFI stub patchsets that are mostly complete depend
 on this one,
 so getting this merged soon would be helpful.
 
 Just fixing the function name and comments is enough for this patch
 series. Anything else should be separate.
 

I just whipped up a patch to do proper UTF-16 to UTF-8 conversion.
Completely untested, of course.

-hpa

From 7d6cf630c1adbb9787a24c2994230373c2b20a8f Mon Sep 17 00:00:00 2001
From: H. Peter Anvin h...@linux.intel.com
Date: Fri, 20 Sep 2013 09:55:39 -0500
Subject: [PATCH] efi: Handle arbitrary Unicode characters

Instead of truncating UTF-16 assuming all characters is ASCII,
properly convert it to UTF-8.

Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/boot/compressed/eboot.c   |  3 +-
 drivers/firmware/efi/efi-stub-helper.c | 89 ++
 2 files changed, 71 insertions(+), 21 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index bf56206..aacbe50 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -488,8 +488,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table)
 	hdr-type_of_loader = 0x21;
 
 	/* Convert unicode cmdline to ascii */
-	cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image,
-		   options_size);
+	cmdline_ptr = efi_convert_cmdline(sys_table, image, options_size);
 	if (!cmdline_ptr)
 		goto fail;
 	hdr-cmd_line_ptr = (unsigned long)cmdline_ptr;
diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c
index c91e757..7d4c1a4 100644
--- a/drivers/firmware/efi/efi-stub-helper.c
+++ b/drivers/firmware/efi/efi-stub-helper.c
@@ -567,20 +567,72 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg,
 
 	return status;
 }
-/* Convert the unicode UEFI command line to ASCII to pass to kernel.
+
+/*
+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character.
+ * This overestimates for surrogates, but that is okay.
+ */
+static int efi_utf8_bytes(u16 c)
+{
+	return 1 + (c = 0x80) + (c = 0x800);
+}
+
+/*
+ * Convert an UTF-16 string, not necessarily null terminated, to UTF-8.
+ */
+static u8 *efi_utf8_to_utf16(u8 *dst, const u16 *src, int n)
+{
+	unsigned int c;
+
+	while (n--) {
+		c = *src++;
+		if (n  c = 0xd800  c = 0xdbff 
+		*src = 0xdc00  *src = 0xdfff) {
+			c = 0x1 + ((c  0x3ff)  10) + (*src  0x3ff);
+			src++;
+			n--;
+		}
+		if (c = 0xd800  c = 0xdfff)
+			c = 0xfffd; /* Unmatched surrogate */
+		if (c  0x80) {
+			*dst++ = c;
+			continue;
+		}
+		if (c  0x800) {
+			*dst++ = 0xc0 + (c  6);
+			goto t1;
+		}
+		if (c  0x1) {
+			*dst++ = 0xe0 + (c  12);
+			goto t2;
+		}
+		*dst++ = 0xf0 + (c  18);
+		*dst++ = 0x80 + ((c  12)  0x3f);
+	t2:
+		*dst++ = 0x80 + ((c  6)  0x3f);
+	t1:
+		*dst++ = 0x80 + (c  0x3f);
+	}
+
+	return dst;
+}
+
+/*
+ * Convert the unicode UEFI command line to ASCII to pass to kernel.
  * Size of memory allocated return in *cmd_line_len.
  * Returns NULL on error.
  */
-static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
-  efi_loaded_image_t *image,
-  int *cmd_line_len)
+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg,
+ efi_loaded_image_t *image,
+ int *cmd_line_len)
 {
-	u16 *s2;
+	const u16 *s2;
 	u8 *s1 = NULL;
 	unsigned long cmdline_addr = 0;
 	int load_options_size = image-load_options_size / 2; /* ASCII */
-	void *options = image-load_options;
-	int options_size = 0;
+	const void *options = image-load_options;
+	int options_bytes = 0;	/* UTF-8 bytes */
+	int options_chars = 0;	/* UTF-16 chars */
 	efi_status_t status;
 	int i;
 	u16 zero = 0;
@@ -588,40 +640,39 @@ static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
 	if (options) {
 		s2 = options;
 		while (*s2  *s2 != '\n'  options_size  load_options_size) {
+			options_bytes += efi_utf8_bytes(*s2);
 			s2++;
-			options_size++;
 		}
+		options_chars = s2 - options;
 	}
 
-	if (options_size == 0) {
-		/* No command line options, so return empty string*/
-		options_size = 1;
+	if (!options_chars) {
+		/* No command line options, so return empty string */
 		options = zero;
 	}
 
-	options_size++;  /* NUL termination */
+	options_bytes++;	/* NUL termination */
+
 #ifdef CONFIG_ARM
 	/* For ARM, allocate at a high address to avoid reserved
 	 * regions at low addresses that we don't know the specfics of
 	 * at the time we are processing the command line.
 	 */
-	status = efi_high_alloc(sys_table_arg, options_size, 0,
+	status = efi_high_alloc(sys_table_arg, options_bytes, 0,
 			cmdline_addr, 0xf000);
 #else
-	status = efi_low_alloc(sys_table_arg, 

Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-19 Thread Adam Borowski
On Wed, Sep 18, 2013 at 09:48:44PM -0700, Roy Franz wrote:
> On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski  wrote:
> > [UCS2 truncation]
> 
> I stuck to re-arranging the code that was there, as I don't know enough
> about character encodings to propose changes.

I on the other hand don't know the kernel (lurking because of my first
patch), but I'm on a crusade against mangled Unicode (so far in the
userland).  Can't let such a blatant error slip through on my watch :)

> Also, this code is running as part of the kernel decompressor, rather than
> the kernel itself, so it doesn't have access to any kernel facilities, and
> it also needs to be position independent.

Ok, so it can't reuse common libraries.  No problem, a simplified, sanitized
and optimized copy of utf16s_to_utf8s() can be done in quite less code than
the original.

> It's running in a quite limited environment - the decompressor has
> its own copy of strstr(), and other string functions.

I'd need nothing but a way to alloc the new string.  And I see this is
already done (efi_{low,high_alloc()).

> I checked the UEFI specification, and it states that all 16 bit strings
> are UCS-2, unless otherwise noted.

... which means it will either get upgraded to UTF-16 in a subsequent
version, or some Unicode strings get mangled.  I'd ignore this bit and
implement full UTF-16 from the start: every legal UCS-2 string can be
decoded as UTF-16 so it's a strict superset.

> The load options that the command line is provided through a void pointer
> specified as: [snip]

Either a null pointer or a 16-bit string, that sounds clear enough.

I see not a word about endianness (does anything do EFI on big endian?),
but "same as host" seems to be a reasonable assumption.

> Would it be acceptable to fix the naming/comments, and convert values
> above 126 to '?' in the current patchset, and address a more thorough fix
> in another patch set?  The ARM and ARM64 EFI stub patchsets that are
> mostly complete depend on this one, so getting this merged soon would be
> helpful.

I don't want to hinder your work, so what about putting in your version
as-is and fixing it later?

> > There's just one problem: which encoding to use, but
> > these days, most distributions have either dropped non-UTF8 or hardly pay
> > lip service, so we could get away with hard-coding UTF-8: those few who
> > use ancient charsets can stick to ASCII.

Not being able to use regular kernel facilities makes supporting ancient
charsets a lost cause.  I'm so weeping about them... not.

> I would certainly appreciate your help improving this

Are we on the same page so far?  If so, I can make a patch atop yours.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-19 Thread Adam Borowski
On Wed, Sep 18, 2013 at 09:48:44PM -0700, Roy Franz wrote:
 On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski kilob...@angband.pl wrote:
  [UCS2 truncation]
 
 I stuck to re-arranging the code that was there, as I don't know enough
 about character encodings to propose changes.

I on the other hand don't know the kernel (lurking because of my first
patch), but I'm on a crusade against mangled Unicode (so far in the
userland).  Can't let such a blatant error slip through on my watch :)

 Also, this code is running as part of the kernel decompressor, rather than
 the kernel itself, so it doesn't have access to any kernel facilities, and
 it also needs to be position independent.

Ok, so it can't reuse common libraries.  No problem, a simplified, sanitized
and optimized copy of utf16s_to_utf8s() can be done in quite less code than
the original.

 It's running in a quite limited environment - the decompressor has
 its own copy of strstr(), and other string functions.

I'd need nothing but a way to alloc the new string.  And I see this is
already done (efi_{low,high_alloc()).

 I checked the UEFI specification, and it states that all 16 bit strings
 are UCS-2, unless otherwise noted.

... which means it will either get upgraded to UTF-16 in a subsequent
version, or some Unicode strings get mangled.  I'd ignore this bit and
implement full UTF-16 from the start: every legal UCS-2 string can be
decoded as UTF-16 so it's a strict superset.

 The load options that the command line is provided through a void pointer
 specified as: [snip]

Either a null pointer or a 16-bit string, that sounds clear enough.

I see not a word about endianness (does anything do EFI on big endian?),
but same as host seems to be a reasonable assumption.

 Would it be acceptable to fix the naming/comments, and convert values
 above 126 to '?' in the current patchset, and address a more thorough fix
 in another patch set?  The ARM and ARM64 EFI stub patchsets that are
 mostly complete depend on this one, so getting this merged soon would be
 helpful.

I don't want to hinder your work, so what about putting in your version
as-is and fixing it later?

  There's just one problem: which encoding to use, but
  these days, most distributions have either dropped non-UTF8 or hardly pay
  lip service, so we could get away with hard-coding UTF-8: those few who
  use ancient charsets can stick to ASCII.

Not being able to use regular kernel facilities makes supporting ancient
charsets a lost cause.  I'm so weeping about them... not.

 I would certainly appreciate your help improving this

Are we on the same page so far?  If so, I can make a patch atop yours.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread Roy Franz
On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski  wrote:
> On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
>> +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
>> + * Size of memory allocated return in *cmd_line_len.
>> + * Returns NULL on error.
>> + */
>> +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
>
>> +   int load_options_size = image->load_options_size / 2; /* ASCII */
>
>> + for (i = 0; i < options_size - 1; i++)
>> + *s1++ = *s2++;
>
> I'm afraid both this commit and comments inside are misnamed.  What you're
> changing here is the encoding rather than character set.
>
> In fact, these days it's 8-bit encodings that are more likely to be Unicode
> than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
> In either case, though, we have here is a 7-bit charset encoded as either
> 8-bit or 16-bit units.  What this function does is blindly truncating upper
> byte.  The supported payload is in both cases ASCII.
>
> I'd thus rename the function to what it already does: truncating u16 to u8,
> and adjust comments accordingly.
>
> Replacing values above 126 with a token character like '?' would be good
> too: that'd avoid producing corrupted characters and/or random ASCII chars.

I stuck to re-arranging the code that was there, as I don't know
enough about character encodings
to propose changes.  Also, this code is running as part of the kernel
decompressor, rather
than the kernel itself, so it doesn't have access to any kernel
facilities, and it also needs to
be position independent.  It's running in a quite limited environment
-  the decompressor has
its own copy of strstr(), and other string functions.

I checked the UEFI specification, and it states that all 16 bit
strings are UCS-2, unless
otherwise noted.  The load options that the command line is provided
through a void pointer
specified as:

"LoadOptions A pointer to the image's binary load options."

I couldn't quickly find any other mentions of LoadOptions in the
spec., so I don't know what
other types of values could be, or are typically provided.  The spec
isn't helpful regarding the type
of this data, and I don't know what common practice is here among the
various EFI/UEFI firmwares
out there.

Would it be acceptable to fix the naming/comments, and convert values
above 126 to '?'
in the current patchset, and address a more thorough fix in another patch set?
The ARM and ARM64 EFI stub patchsets that are mostly complete depend
on this one,
so getting this merged soon would be helpful.

Something like:
>> +/* Convert the UEFI command line from 16 bit to 8 bit encoding to pass to 
>> kernel.
>> + * by truncating to 7 bits.  Values above 126 are converted to 63 (ASCII 
>> '?')
>> + * Size of memory allocated return in *cmd_line_len.
>> + * Returns NULL on error.
>> + */
+static char *efi_truncate_cmdline_to_8bit_encoding(efi_system_table_t
*sys_table_arg,
(update code to truncate and convert to '?')

>
> Your commit only moves things around, so it might be out of scope for now,
> but I wonder: what if the kernel actually supported Unicode here?  Few
> cmdline arguments take values where non-ASCII makes sense, but at least some
> do: for example, a Russian guy is not unlikely to name subvolumes using
> cyrillic.  Supporting that would be easy (estimating the length then
> tf16us_to_utf8s()).  There's just one problem: which encoding to use, but
> these days, most distributions have either dropped non-UTF8 or hardly pay
> lip service, so we could get away with hard-coding UTF-8: those few who
> use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
> I can code this if you don't care enough.

I would certainly appreciate your help improving this, although I think we'll
need to have a better understanding of what the UEFI firmware is providing
before we can implement something better.


>
> --
> ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread H. Peter Anvin
On 09/18/2013 10:44 PM, Adam Borowski wrote:
> 
> In fact, these days it's 8-bit encodings that are more likely to be Unicode
> than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
> In either case, though, we have here is a 7-bit charset encoded as either
> 8-bit or 16-bit units.  What this function does is blindly truncating upper
> byte.  The supported payload is in both cases ASCII.
> 
> I'd thus rename the function to what it already does: truncating u16 to u8,
> and adjust comments accordingly.
> 
> Replacing values above 126 with a token character like '?' would be good
> too: that'd avoid producing corrupted characters and/or random ASCII chars.
> 
> Your commit only moves things around, so it might be out of scope for now,
> but I wonder: what if the kernel actually supported Unicode here?  Few
> cmdline arguments take values where non-ASCII makes sense, but at least some
> do: for example, a Russian guy is not unlikely to name subvolumes using
> cyrillic.  Supporting that would be easy (estimating the length then
> utf16s_to_utf8s()).  There's just one problem: which encoding to use, but
> these days, most distributions have either dropped non-UTF8 or hardly pay
> lip service, so we could get away with hard-coding UTF-8: those few who
> use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
> I can code this if you don't care enough.
> 

We should, indeed, do proper conversion to UTF-8 here.

I also suspect we should assume the input is UTF-16 rather than UCS-2,
although that is a bit more exotic.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread Adam Borowski
On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
> +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
> + * Size of memory allocated return in *cmd_line_len.
> + * Returns NULL on error.
> + */
> +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,

> +   int load_options_size = image->load_options_size / 2; /* ASCII */

> + for (i = 0; i < options_size - 1; i++)
> + *s1++ = *s2++;

I'm afraid both this commit and comments inside are misnamed.  What you're
changing here is the encoding rather than character set.

In fact, these days it's 8-bit encodings that are more likely to be Unicode
than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
In either case, though, we have here is a 7-bit charset encoded as either
8-bit or 16-bit units.  What this function does is blindly truncating upper
byte.  The supported payload is in both cases ASCII.

I'd thus rename the function to what it already does: truncating u16 to u8,
and adjust comments accordingly.

Replacing values above 126 with a token character like '?' would be good
too: that'd avoid producing corrupted characters and/or random ASCII chars.

Your commit only moves things around, so it might be out of scope for now,
but I wonder: what if the kernel actually supported Unicode here?  Few
cmdline arguments take values where non-ASCII makes sense, but at least some
do: for example, a Russian guy is not unlikely to name subvolumes using
cyrillic.  Supporting that would be easy (estimating the length then
utf16s_to_utf8s()).  There's just one problem: which encoding to use, but
these days, most distributions have either dropped non-UTF8 or hardly pay
lip service, so we could get away with hard-coding UTF-8: those few who
use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
I can code this if you don't care enough.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread Adam Borowski
On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
 +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
 + * Size of memory allocated return in *cmd_line_len.
 + * Returns NULL on error.
 + */
 +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,

 +   int load_options_size = image-load_options_size / 2; /* ASCII */

 + for (i = 0; i  options_size - 1; i++)
 + *s1++ = *s2++;

I'm afraid both this commit and comments inside are misnamed.  What you're
changing here is the encoding rather than character set.

In fact, these days it's 8-bit encodings that are more likely to be Unicode
than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
In either case, though, we have here is a 7-bit charset encoded as either
8-bit or 16-bit units.  What this function does is blindly truncating upper
byte.  The supported payload is in both cases ASCII.

I'd thus rename the function to what it already does: truncating u16 to u8,
and adjust comments accordingly.

Replacing values above 126 with a token character like '?' would be good
too: that'd avoid producing corrupted characters and/or random ASCII chars.

Your commit only moves things around, so it might be out of scope for now,
but I wonder: what if the kernel actually supported Unicode here?  Few
cmdline arguments take values where non-ASCII makes sense, but at least some
do: for example, a Russian guy is not unlikely to name subvolumes using
cyrillic.  Supporting that would be easy (estimating the length then
utf16s_to_utf8s()).  There's just one problem: which encoding to use, but
these days, most distributions have either dropped non-UTF8 or hardly pay
lip service, so we could get away with hard-coding UTF-8: those few who
use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
I can code this if you don't care enough.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread H. Peter Anvin
On 09/18/2013 10:44 PM, Adam Borowski wrote:
 
 In fact, these days it's 8-bit encodings that are more likely to be Unicode
 than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
 In either case, though, we have here is a 7-bit charset encoded as either
 8-bit or 16-bit units.  What this function does is blindly truncating upper
 byte.  The supported payload is in both cases ASCII.
 
 I'd thus rename the function to what it already does: truncating u16 to u8,
 and adjust comments accordingly.
 
 Replacing values above 126 with a token character like '?' would be good
 too: that'd avoid producing corrupted characters and/or random ASCII chars.
 
 Your commit only moves things around, so it might be out of scope for now,
 but I wonder: what if the kernel actually supported Unicode here?  Few
 cmdline arguments take values where non-ASCII makes sense, but at least some
 do: for example, a Russian guy is not unlikely to name subvolumes using
 cyrillic.  Supporting that would be easy (estimating the length then
 utf16s_to_utf8s()).  There's just one problem: which encoding to use, but
 these days, most distributions have either dropped non-UTF8 or hardly pay
 lip service, so we could get away with hard-coding UTF-8: those few who
 use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
 I can code this if you don't care enough.
 

We should, indeed, do proper conversion to UTF-8 here.

I also suspect we should assume the input is UTF-16 rather than UCS-2,
although that is a bit more exotic.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

2013-09-18 Thread Roy Franz
On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski kilob...@angband.pl wrote:
 On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
 +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
 + * Size of memory allocated return in *cmd_line_len.
 + * Returns NULL on error.
 + */
 +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,

 +   int load_options_size = image-load_options_size / 2; /* ASCII */

 + for (i = 0; i  options_size - 1; i++)
 + *s1++ = *s2++;

 I'm afraid both this commit and comments inside are misnamed.  What you're
 changing here is the encoding rather than character set.

 In fact, these days it's 8-bit encodings that are more likely to be Unicode
 than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
 In either case, though, we have here is a 7-bit charset encoded as either
 8-bit or 16-bit units.  What this function does is blindly truncating upper
 byte.  The supported payload is in both cases ASCII.

 I'd thus rename the function to what it already does: truncating u16 to u8,
 and adjust comments accordingly.

 Replacing values above 126 with a token character like '?' would be good
 too: that'd avoid producing corrupted characters and/or random ASCII chars.

I stuck to re-arranging the code that was there, as I don't know
enough about character encodings
to propose changes.  Also, this code is running as part of the kernel
decompressor, rather
than the kernel itself, so it doesn't have access to any kernel
facilities, and it also needs to
be position independent.  It's running in a quite limited environment
-  the decompressor has
its own copy of strstr(), and other string functions.

I checked the UEFI specification, and it states that all 16 bit
strings are UCS-2, unless
otherwise noted.  The load options that the command line is provided
through a void pointer
specified as:

LoadOptions A pointer to the image's binary load options.

I couldn't quickly find any other mentions of LoadOptions in the
spec., so I don't know what
other types of values could be, or are typically provided.  The spec
isn't helpful regarding the type
of this data, and I don't know what common practice is here among the
various EFI/UEFI firmwares
out there.

Would it be acceptable to fix the naming/comments, and convert values
above 126 to '?'
in the current patchset, and address a more thorough fix in another patch set?
The ARM and ARM64 EFI stub patchsets that are mostly complete depend
on this one,
so getting this merged soon would be helpful.

Something like:
 +/* Convert the UEFI command line from 16 bit to 8 bit encoding to pass to 
 kernel.
 + * by truncating to 7 bits.  Values above 126 are converted to 63 (ASCII 
 '?')
 + * Size of memory allocated return in *cmd_line_len.
 + * Returns NULL on error.
 + */
+static char *efi_truncate_cmdline_to_8bit_encoding(efi_system_table_t
*sys_table_arg,
(update code to truncate and convert to '?')


 Your commit only moves things around, so it might be out of scope for now,
 but I wonder: what if the kernel actually supported Unicode here?  Few
 cmdline arguments take values where non-ASCII makes sense, but at least some
 do: for example, a Russian guy is not unlikely to name subvolumes using
 cyrillic.  Supporting that would be easy (estimating the length then
 tf16us_to_utf8s()).  There's just one problem: which encoding to use, but
 these days, most distributions have either dropped non-UTF8 or hardly pay
 lip service, so we could get away with hard-coding UTF-8: those few who
 use ancient charsets can stick to ASCII.  Would this be ok?  If so, shout,
 I can code this if you don't care enough.

I would certainly appreciate your help improving this, although I think we'll
need to have a better understanding of what the UEFI firmware is providing
before we can implement something better.



 --
 ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/