Hello,

1) The best reference I could find on how lower case and upper case letters collate is [1].
The Wikipedia page you link to describes a code page, and the numeric order of its codes is the same as ASCII for these letters, so no, that's not it.

2) In the cp1252 table "A" < "a" because it follows the numeric order of the codes, 0x41 < 0x61. But R does not compare those codes; it uses the collating sequence of the locale's LC_COLLATE setting, not that of the "C" locale.
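
To make the difference concrete, here is a small sketch that looks only at the numeric codes and at an ordering that does not depend on LC_COLLATE. The variable names are just for illustration; as far as I know, sort() with method = "radix" uses the C-locale ordering for character vectors.

```r
# Numeric codes of the two letters: "A" is 0x41 (65), "a" is 0x61 (97)
code_A <- utf8ToInt("A")
code_a <- utf8ToInt("a")
format(as.hexmode(c(code_A, code_a)))
#> [1] "41" "61"

# Comparing the codes themselves never depends on the locale
codes_less <- code_A < code_a
codes_less
#> [1] TRUE

# The radix method sorts in C-locale order, so capitals come first
radix_sorted <- sort(c("a", "A", "b", "B"), method = "radix")
radix_sorted
#> [1] "A" "B" "a" "b"
```

A plain "A" < "a", on the other hand, goes through the collation rules of whatever LC_COLLATE is in effect, which is why it can return FALSE.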

How to validate the end result? The code page table will not tell you; the best way is to check the current setting with Sys.getlocale("LC_COLLATE").
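
A minimal sketch of such a check follows. The helper less_than_in() is hypothetical (it is not part of R); it only wraps Sys.getlocale() and Sys.setlocale() so that the previous setting is restored even if something fails.

```r
# Hypothetical helper: evaluate x < y under a given LC_COLLATE setting
# and restore the previous setting when the function exits.
less_than_in <- function(x, y, collate = "C") {
  old_loc <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old_loc))
  Sys.setlocale("LC_COLLATE", collate)
  x < y
}

# The collation currently in effect (the value depends on your system)
current_collate <- Sys.getlocale("LC_COLLATE")
current_collate

# Under "C" the comparison follows the numeric codes
c_result <- less_than_in("A", "a")
c_result
#> [1] TRUE
```

If the result of a plain "A" < "a" differs from c_result, the difference comes from the LC_COLLATE value printed above, not from the code page.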



[1] https://books.google.pt/books?id=GkajBQAAQBAJ&pg=PA259&lpg=PA259&dq=collating+sequence+portuguese&source=bl&ots=fVnUYHz0ev&sig=ACfU3U3xjpJfPNcWEfvwb_2nScYb89CeOw&hl=pt-PT&sa=X&ved=2ahUKEwiAoNTW-JP3AhVI1xoKHXT-C4oQ6AF6BAgUEAM#v=onepage&q=collating%20sequence%20portuguese&f=false


Hope this helps,

Rui Barradas

At 16:33 on 14/04/2022, Kristjan Kure wrote:
Hi Rui

Thank you for the code snippet.

1) How do you find your "Portuguese_Portugal.1252" symbols table now?
Is it this https://en.wikipedia.org/wiki/Windows-1252?

2) What attributes and values do you check to validate the end result?
I see there is a section "Codepage layout" and I can find "A" and "a" symbols.

What values on that table tell you "A" is bigger than "a"?
"A" < "a" # returns FALSE
"A" > "a" # returns TRUE

PS! My locale is Estonian_Estonia.1257

Regards,
Kristjan

On Thu, Apr 14, 2022 at 5:05 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:

    Hello,

    This is a locale issue. You are counting on the ASCII table codes, but
    that ordering is only valid for the "C" locale.

    old_loc <- Sys.getlocale("LC_COLLATE")

    "A" < "a"
    #> [1] FALSE
    "A" > "a"
    #> [1] TRUE

    Sys.setlocale("LC_COLLATE", locale = "C")
    #> [1] "C"

    "A" < "a"
    #> [1] TRUE
    "A" > "a"
    #> [1] FALSE

    Sys.setlocale("LC_COLLATE", old_loc)
    #> [1] "Portuguese_Portugal.1252"


    Hope this helps,

    Rui Barradas

    At 15:06 on 13/04/2022, Kristjan Kure wrote:
     > Hi!
     >
     > Sorry, I am a beginner in R.
     >
     > I was not able to find answers to my questions (tried Google, Stack
     > Overflow, etc). Please correct me if anything is wrong here.
     >
     > When comparing symbols/strings in R - raw numeric values are compared
     > symbol by symbol starting from left? If raw numeric values are not used is
     > there an ASCII / Unicode table where symbols have values/ranking/order and
     > R compares those values?
     >
     > *2) Comparing symbols*
     > Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct?
     >
     > # Raw value for "a" = 61
     > a_raw <- charToRaw("a")
     > a_raw
     >
     > # Raw value for "b" = 62
     > b_raw <- charToRaw("b")
     > b_raw
     >
     > # equals TRUE
     > "a" < "b"
     >
     > Ok, so 61 is less than 62 so it's TRUE. Is this correct?
     >
     > *3) Comparing strings #1*
     > "1040" <= "12000"
     >
     > raw_1040 <- charToRaw("1040")
     > raw_1040
     > #31 *30* (comparison happens with the second symbol) 34 30
     >
     > raw_12000 <- charToRaw("12000")
     > raw_12000
     > #31 *32* (comparison happens with the second symbol) 30 30 30
     >
     > The symbol in the second position is 30 and it's less than 32. Equals to
     > true. Is this correct?
     >
     > *4) Comparing strings #2*
     > "1040" <= "10000"
     >
     > raw_1040 <- charToRaw("1040")
     > raw_1040
     > #31 30 *34*  (comparison happens with third symbol) 30
     >
     > raw_10000 <- charToRaw("10000")
     > raw_10000
     > #31 30 *30*  (comparison happens with third symbol) 30 30
     >
     > The symbol in the third position is 34 is greater than 30. Equals to false.
     > Is this correct?
     >
     > *5) Problem - Why does this equal FALSE?*
     > *"A" < "a"*
     >
     > 41 < 61 # FALSE?
     >
     > # Raw value for "A" = 41
     > A_raw <- charToRaw("A")
     > A_raw
     >
     > # Raw value for "a" = 61
     > a_raw <- charToRaw("a")
     > a_raw
     >
     > Why is capitalized "A" not less than lowercase "a"? Based on raw values it
     > should be. What am I missing here?
     >
     > Thanks
     > Kristjan
     >
     >       [[alternative HTML version deleted]]
     >
     > ______________________________________________
     > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
     > https://stat.ethz.ch/mailman/listinfo/r-help
     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
     > and provide commented, minimal, self-contained, reproducible code.


