Log Message
TextCodec refinements
https://bugs.webkit.org/show_bug.cgi?id=216219

Reviewed by Sam Weinig.

Source/JavaScriptCore:

* parser/Lexer.h:
(JSC::Lexer<UChar>::isWhiteSpace): Use byteOrderMark constant.

Source/WebCore:

* dom/TextDecoder.cpp:
(WebCore::TextDecoder::bytesNeededForFullBOMIgnoreCheck const): Deleted.
(WebCore::TextDecoder::isBeginningOfIncompleteBOM const): Deleted.
(WebCore::TextDecoder::ignoreBOMIfNecessary): Deleted.
(WebCore::TextDecoder::decode): Call stripByteOrderMark on the TextCodec to tell it to ignore the BOM, once when creating the codec, and also after each non-streaming invocation. Removed the rest of the BOM handling.

* dom/TextDecoder.h: Removed WaitForMoreBOMBytes, ignoreBOMIfNecessary, bytesNeededForFullBOMIgnoreCheck, isBeginningOfIncompleteBOM, m_buffer, m_bomIgnoredIfNecessary, and made m_options const.

* platform/text/TextCodec.h: Added a virtual stripByteOrderMark function to be used before decoding; does nothing by default. Changed the encode function to be a const member function to help implementers remember to not have it save any state, unlike the decode function which should.

* platform/text/TextCodecCJK.cpp: Moved the TextCodecCJK::Encoding enumeration in here.
(WebCore::jis0208DecodeIndex): Use a named type and use std::size instead of WTF_ARRAY_LENGTH.
(WebCore::codePointJIS0212): Ditto.
(WebCore::iso2022JPEncode): Made this a non-member function and moved the encoding state in here since each call to encode is separate and we don't want to leave any state behind in the TextCodec between calls.
(WebCore::eucKREncodingIndex): Use a reference instead of a pointer and std::size instead of WTF_ARRAY_LENGTH.
(WebCore::big5DecodeIndex): Ditto.
(WebCore::TextCodecCJK::encode const): Made const.

* platform/text/TextCodecCJK.h: Marked the class final, moved the enumeration values for Encoding, ISO2022JPEncoderState, and m_iso2022JPEncoderState out of the class definition. Made encode const.

* platform/text/TextCodecICU.cpp:
(WebCore::TextCodecICU::encode const): Made const.
* platform/text/TextCodecICU.h: Marked the class final, and rearranged the class members to match other TextCodec classes, with register functions before the constructor and destructor. Made encode const.

* platform/text/TextCodecLatin1.cpp:
(WebCore::TextCodecLatin1::encode const): Made const.
* platform/text/TextCodecLatin1.h: Removed a stray blank line and made encode const.

* platform/text/TextCodecReplacement.cpp:
(WebCore::TextCodecReplacement::encode const): Added.
* platform/text/TextCodecReplacement.h: Marked the class final, changed it to no longer derive from TextCodecUTF8, and added an encode function.

* platform/text/TextCodecSingleByte.cpp: Moved the TextCodecSingleByte::Encoding enumeration in here and changed from Iso to ISO. Added SingleByteDecodeTable type and moved TextCodecSingleByte::EncodeTable in here and renamed it to SingleByteEncodeTable so it's not a class member any more. Marked all the decode tables static since we don't get internal linkage from just marking them constexpr, while moving to use the SingleByteDecodeTable type.
(WebCore::tableForEncoding): Use SingleByteDecodeTable and SingleByteEncodeTable type names. Use std::count to count the replacement characters instead of writing our own loop. Use std::size(decodeTable) instead of defining a tableSize. Update for ISO name change. Use RELEASE_ASSERT_NOT_REACHED so we don't have to write a dead code return statement.
(WebCore::tableForDecoding): Use SingleByteDecodeTable type for return value. Update for ISO name change. Use RELEASE_ASSERT_NOT_REACHED so we don't have to write a dead code return statement. Also make this constexpr since it's just selecting a global based on an enumeration value.
(WebCore::encode): Made this a non-member function since it does not need access to TextCodec members. This helps us keep implementation details out of the header.
(WebCore::decode): Ditto.
(WebCore::TextCodecSingleByte::encode const): Made this const and updated to call the non-member function.
(WebCore::TextCodecSingleByte::decode): Ditto.
(WebCore::TextCodecSingleByte::registerCodecs): Update for the ISO name change.

* platform/text/TextCodecSingleByte.h: Marked the class final, and moved the enumeration values for Encoding, EncodeTable, and the encode and decode functions that take table arguments all out of the class definition. Made encode const.

* platform/text/TextCodecUTF16.cpp:
(WebCore::TextCodecUTF16::decode): Added logic to drop the first byte order mark after m_shouldStripByteOrderMark is set to true. Changed code to call through rather than recursively calling self in the case of an unpaired lead surrogate, removing the need to put the processBytesShared lambda into a Function. Renamed the processBytesShared lambda to processCodeUnit.
(WebCore::TextCodecUTF16::encode const): Made const.

* platform/text/TextCodecUTF16.h: Marked the class final, added a stripByteOrderMark member function and a m_shouldStripByteOrderMark data member, and made encode const.

* platform/text/TextCodecUTF8.cpp:
(WebCore::TextCodecUTF8::handlePartialSequence): Added logic to drop the first byte order mark after m_shouldStripByteOrderMark is set to true, making sure to keep it out of the hot ASCII decode loop.
(WebCore::TextCodecUTF8::decode): Ditto. Also added code to make sure the partial sequence is cleared out as part of a flush even when stopOnError is true.
(WebCore::TextCodecUTF8::encodeUTF8): Renamed this so it can be a static member function, so it can be called by TextCodecReplacement.
(WebCore::TextCodecUTF8::encode const): Made this const and have it call encodeUTF8.

* platform/text/TextCodecUTF8.h: Marked the class final, added a stripByteOrderMark member function and a m_shouldStripByteOrderMark data member, added static member function encodeUTF8, and made the encode function const.

* platform/text/TextCodecUserDefined.cpp:
(WebCore::TextCodecUserDefined::encode const): Made const.
* platform/text/TextCodecUserDefined.h: Marked the class final and made encode const.

* xml/XSLStyleSheetLibxslt.cpp:
(WebCore::XSLStyleSheet::parseString): Use byteOrderMark.
* xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::switchToUTF16): Ditto.
(WebCore::nativeEndianUTF16Encoding): Ditto.

Source/WTF:

* wtf/unicode/CharacterNames.h: Use constexpr instead of just const. Added byteOrderMark, synonym for zeroWidthNoBreakSpace.
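
The central idea in this change, moving byte order mark stripping out of TextDecoder and into the codec so that a BOM split across streaming chunks is handled by the codec's own state, can be illustrated with a minimal sketch. This is not the WebKit implementation: SketchCodec, SketchUTF8Codec, and m_pending are hypothetical names, std::string stands in for both input bytes and output, and the sketch passes bytes through unchanged instead of decoding UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a codec base class; not WebKit's TextCodec.
class SketchCodec {
public:
    virtual ~SketchCodec() = default;
    // Does nothing by default, so codecs whose encoding has no BOM need no changes.
    virtual void stripByteOrderMark() { }
    virtual std::string decode(const std::string& bytes, bool flush) = 0;
};

// Passes bytes through unchanged, except that it drops one leading
// EF BB BF sequence after stripByteOrderMark() has been called.
class SketchUTF8Codec final : public SketchCodec {
public:
    void stripByteOrderMark() override { m_shouldStripByteOrderMark = true; }

    std::string decode(const std::string& bytes, bool flush) override
    {
        static const std::string bom = "\xEF\xBB\xBF";
        std::string output;
        for (char byte : bytes) {
            if (!m_shouldStripByteOrderMark) {
                output += byte;
                continue;
            }
            // Hold back bytes while they could still be the start of a BOM.
            m_pending += byte;
            if (bom.compare(0, m_pending.size(), m_pending) != 0) {
                // Not a BOM after all: emit everything that was held back.
                output += m_pending;
                m_pending.clear();
                m_shouldStripByteOrderMark = false;
            } else if (m_pending.size() == bom.size()) {
                // Complete BOM: drop it and stop looking for another one.
                m_pending.clear();
                m_shouldStripByteOrderMark = false;
            }
        }
        if (flush) {
            // End of input: a partial BOM prefix is treated as ordinary data here.
            output += m_pending;
            m_pending.clear();
        }
        return output;
    }

private:
    bool m_shouldStripByteOrderMark { false };
    std::string m_pending;
};
```

A caller in the style of the new TextDecoder::decode would call stripByteOrderMark() once after creating the codec and again after each non-streaming (flushing) call, unless ignoreBOM is set. Because the pending bytes live in the codec, the caller no longer needs its own buffer or a "wait for more BOM bytes" state.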
Modified Paths
- trunk/Source/JavaScriptCore/ChangeLog
- trunk/Source/JavaScriptCore/parser/Lexer.h
- trunk/Source/WTF/ChangeLog
- trunk/Source/WTF/wtf/unicode/CharacterNames.h
- trunk/Source/WebCore/ChangeLog
- trunk/Source/WebCore/dom/TextDecoder.cpp
- trunk/Source/WebCore/dom/TextDecoder.h
- trunk/Source/WebCore/platform/text/TextCodec.h
- trunk/Source/WebCore/platform/text/TextCodecCJK.cpp
- trunk/Source/WebCore/platform/text/TextCodecCJK.h
- trunk/Source/WebCore/platform/text/TextCodecICU.cpp
- trunk/Source/WebCore/platform/text/TextCodecICU.h
- trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp
- trunk/Source/WebCore/platform/text/TextCodecLatin1.h
- trunk/Source/WebCore/platform/text/TextCodecReplacement.cpp
- trunk/Source/WebCore/platform/text/TextCodecReplacement.h
- trunk/Source/WebCore/platform/text/TextCodecSingleByte.cpp
- trunk/Source/WebCore/platform/text/TextCodecSingleByte.h
- trunk/Source/WebCore/platform/text/TextCodecUTF16.cpp
- trunk/Source/WebCore/platform/text/TextCodecUTF16.h
- trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp
- trunk/Source/WebCore/platform/text/TextCodecUTF8.h
- trunk/Source/WebCore/platform/text/TextCodecUserDefined.cpp
- trunk/Source/WebCore/platform/text/TextCodecUserDefined.h
- trunk/Source/WebCore/xml/XSLStyleSheetLibxslt.cpp
- trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp
Diff
Modified: trunk/Source/JavaScriptCore/ChangeLog (266680 => 266681)
--- trunk/Source/JavaScriptCore/ChangeLog 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/JavaScriptCore/ChangeLog 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,3 +1,13 @@
+2020-09-06 Darin Adler <[email protected]>
+
+ TextCodec refinements
+ https://bugs.webkit.org/show_bug.cgi?id=216219
+
+ Reviewed by Sam Weinig.
+
+ * parser/Lexer.h:
+ (JSC::Lexer<UChar>::isWhiteSpace): Use byteOrderMark constant.
+
2020-09-05 Yusuke Suzuki <[email protected]>
Unreviewed, suppress exception checking after unwrapForOldFunctions
Modified: trunk/Source/JavaScriptCore/parser/Lexer.h (266680 => 266681)
--- trunk/Source/JavaScriptCore/parser/Lexer.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/JavaScriptCore/parser/Lexer.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -29,9 +29,12 @@
#include "SourceCode.h"
#include <wtf/ASCIICType.h>
#include <wtf/Vector.h>
+#include <wtf/unicode/CharacterNames.h>
namespace JSC {
+struct ParsedUnicodeEscapeValue;
+
enum class LexerFlags : uint8_t {
IgnoreReservedWords = 1 << 0,
DontBuildStrings = 1 << 1,
@@ -38,8 +41,6 @@
DontBuildKeywords = 1 << 2
};
-struct ParsedUnicodeEscapeValue;
-
bool isLexerKeyword(const Identifier&);
template <typename T>
@@ -240,7 +241,7 @@
template <>
ALWAYS_INLINE bool Lexer<UChar>::isWhiteSpace(UChar ch)
{
- return isLatin1(ch) ? Lexer<LChar>::isWhiteSpace(static_cast<LChar>(ch)) : (u_charType(ch) == U_SPACE_SEPARATOR || ch == 0xFEFF);
+ return isLatin1(ch) ? Lexer<LChar>::isWhiteSpace(static_cast<LChar>(ch)) : (u_charType(ch) == U_SPACE_SEPARATOR || ch == byteOrderMark);
}
template <>
Modified: trunk/Source/WTF/ChangeLog (266680 => 266681)
--- trunk/Source/WTF/ChangeLog 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WTF/ChangeLog 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,3 +1,13 @@
+2020-09-06 Darin Adler <[email protected]>
+
+ TextCodec refinements
+ https://bugs.webkit.org/show_bug.cgi?id=216219
+
+ Reviewed by Sam Weinig.
+
+ * wtf/unicode/CharacterNames.h: Use constexpr instead of just const.
+ Added byteOrderMark, synonym for zeroWidthNoBreakSpace.
+
2020-09-05 Myles C. Maxfield <[email protected]>
[Cocoa] USE(PLATFORM_SYSTEM_FALLBACK_LIST) is true on all Cocoa platforms, so there's no need to consult it in Cocoa-specific files
Modified: trunk/Source/WTF/wtf/unicode/CharacterNames.h (266680 => 266681)
--- trunk/Source/WTF/wtf/unicode/CharacterNames.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WTF/wtf/unicode/CharacterNames.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -35,74 +35,75 @@
// Most of these are UChar constants, not UChar32, which makes them
// more convenient for WebCore code that mostly uses UTF-16.
-const UChar AppleLogo = 0xF8FF;
-const UChar HiraganaLetterSmallA = 0x3041;
-const UChar32 aegeanWordSeparatorDot = 0x10101;
-const UChar32 aegeanWordSeparatorLine = 0x10100;
-const UChar apostrophe = 0x0027;
-const UChar blackCircle = 0x25CF;
-const UChar blackSquare = 0x25A0;
-const UChar blackUpPointingTriangle = 0x25B2;
-const UChar bullet = 0x2022;
-const UChar bullseye = 0x25CE;
-const UChar carriageReturn = 0x000D;
-const UChar combiningEnclosingKeycap = 0x20E3;
-const UChar ethiopicPrefaceColon = 0x1366;
-const UChar ethiopicWordspace = 0x1361;
-const UChar firstStrongIsolate = 0x2068;
-const UChar fisheye = 0x25C9;
-const UChar hebrewPunctuationGeresh = 0x05F3;
-const UChar hebrewPunctuationGershayim = 0x05F4;
-const UChar horizontalEllipsis = 0x2026;
-const UChar hyphen = 0x2010;
-const UChar hyphenMinus = 0x002D;
-const UChar ideographicComma = 0x3001;
-const UChar ideographicFullStop = 0x3002;
-const UChar ideographicSpace = 0x3000;
-const UChar leftDoubleQuotationMark = 0x201C;
-const UChar leftLowDoubleQuotationMark = 0x201E;
-const UChar leftSingleQuotationMark = 0x2018;
-const UChar leftLowSingleQuotationMark = 0x201A;
-const UChar leftToRightEmbed = 0x202A;
-const UChar leftToRightIsolate = 0x2066;
-const UChar leftToRightMark = 0x200E;
-const UChar leftToRightOverride = 0x202D;
-const UChar minusSign = 0x2212;
-const UChar narrowNoBreakSpace = 0x202F;
-const UChar narrowNonBreakingSpace = 0x202F;
-const UChar newlineCharacter = 0x000A;
-const UChar noBreakSpace = 0x00A0;
-const UChar objectReplacementCharacter = 0xFFFC;
-const UChar optionKey = 0x2325;
-const UChar popDirectionalFormatting = 0x202C;
-const UChar popDirectionalIsolate = 0x2069;
-const UChar quotationMark = 0x0022;
-const UChar replacementCharacter = 0xFFFD;
-const UChar rightDoubleQuotationMark = 0x201D;
-const UChar rightSingleQuotationMark = 0x2019;
-const UChar rightToLeftEmbed = 0x202B;
-const UChar rightToLeftIsolate = 0x2067;
-const UChar rightToLeftMark = 0x200F;
-const UChar rightToLeftOverride = 0x202E;
-const UChar sesameDot = 0xFE45;
-const UChar smallLetterSharpS = 0x00DF;
-const UChar softHyphen = 0x00AD;
-const UChar space = 0x0020;
-const UChar tabCharacter = 0x0009;
-const UChar tibetanMarkDelimiterTshegBstar = 0x0F0C;
-const UChar tibetanMarkIntersyllabicTsheg = 0x0F0B;
-const UChar32 ugariticWordDivider = 0x1039F;
-const UChar upArrowhead = 0x2303;
-const UChar whiteBullet = 0x25E6;
-const UChar whiteCircle = 0x25CB;
-const UChar whiteSesameDot = 0xFE46;
-const UChar whiteUpPointingTriangle = 0x25B3;
-const UChar wordJoiner = 0x2060;
-const UChar yenSign = 0x00A5;
-const UChar zeroWidthJoiner = 0x200D;
-const UChar zeroWidthNoBreakSpace = 0xFEFF;
-const UChar zeroWidthNonJoiner = 0x200C;
-const UChar zeroWidthSpace = 0x200B;
+constexpr UChar AppleLogo = 0xF8FF;
+constexpr UChar HiraganaLetterSmallA = 0x3041;
+constexpr UChar32 aegeanWordSeparatorDot = 0x10101;
+constexpr UChar32 aegeanWordSeparatorLine = 0x10100;
+constexpr UChar apostrophe = 0x0027;
+constexpr UChar blackCircle = 0x25CF;
+constexpr UChar blackSquare = 0x25A0;
+constexpr UChar blackUpPointingTriangle = 0x25B2;
+constexpr UChar bullet = 0x2022;
+constexpr UChar bullseye = 0x25CE;
+constexpr UChar byteOrderMark = 0xFEFF;
+constexpr UChar carriageReturn = 0x000D;
+constexpr UChar combiningEnclosingKeycap = 0x20E3;
+constexpr UChar ethiopicPrefaceColon = 0x1366;
+constexpr UChar ethiopicWordspace = 0x1361;
+constexpr UChar firstStrongIsolate = 0x2068;
+constexpr UChar fisheye = 0x25C9;
+constexpr UChar hebrewPunctuationGeresh = 0x05F3;
+constexpr UChar hebrewPunctuationGershayim = 0x05F4;
+constexpr UChar horizontalEllipsis = 0x2026;
+constexpr UChar hyphen = 0x2010;
+constexpr UChar hyphenMinus = 0x002D;
+constexpr UChar ideographicComma = 0x3001;
+constexpr UChar ideographicFullStop = 0x3002;
+constexpr UChar ideographicSpace = 0x3000;
+constexpr UChar leftDoubleQuotationMark = 0x201C;
+constexpr UChar leftLowDoubleQuotationMark = 0x201E;
+constexpr UChar leftSingleQuotationMark = 0x2018;
+constexpr UChar leftLowSingleQuotationMark = 0x201A;
+constexpr UChar leftToRightEmbed = 0x202A;
+constexpr UChar leftToRightIsolate = 0x2066;
+constexpr UChar leftToRightMark = 0x200E;
+constexpr UChar leftToRightOverride = 0x202D;
+constexpr UChar minusSign = 0x2212;
+constexpr UChar narrowNoBreakSpace = 0x202F;
+constexpr UChar narrowNonBreakingSpace = 0x202F;
+constexpr UChar newlineCharacter = 0x000A;
+constexpr UChar noBreakSpace = 0x00A0;
+constexpr UChar objectReplacementCharacter = 0xFFFC;
+constexpr UChar optionKey = 0x2325;
+constexpr UChar popDirectionalFormatting = 0x202C;
+constexpr UChar popDirectionalIsolate = 0x2069;
+constexpr UChar quotationMark = 0x0022;
+constexpr UChar replacementCharacter = 0xFFFD;
+constexpr UChar rightDoubleQuotationMark = 0x201D;
+constexpr UChar rightSingleQuotationMark = 0x2019;
+constexpr UChar rightToLeftEmbed = 0x202B;
+constexpr UChar rightToLeftIsolate = 0x2067;
+constexpr UChar rightToLeftMark = 0x200F;
+constexpr UChar rightToLeftOverride = 0x202E;
+constexpr UChar sesameDot = 0xFE45;
+constexpr UChar smallLetterSharpS = 0x00DF;
+constexpr UChar softHyphen = 0x00AD;
+constexpr UChar space = 0x0020;
+constexpr UChar tabCharacter = 0x0009;
+constexpr UChar tibetanMarkDelimiterTshegBstar = 0x0F0C;
+constexpr UChar tibetanMarkIntersyllabicTsheg = 0x0F0B;
+constexpr UChar32 ugariticWordDivider = 0x1039F;
+constexpr UChar upArrowhead = 0x2303;
+constexpr UChar whiteBullet = 0x25E6;
+constexpr UChar whiteCircle = 0x25CB;
+constexpr UChar whiteSesameDot = 0xFE46;
+constexpr UChar whiteUpPointingTriangle = 0x25B3;
+constexpr UChar wordJoiner = 0x2060;
+constexpr UChar yenSign = 0x00A5;
+constexpr UChar zeroWidthJoiner = 0x200D;
+constexpr UChar zeroWidthNoBreakSpace = 0xFEFF;
+constexpr UChar zeroWidthNonJoiner = 0x200C;
+constexpr UChar zeroWidthSpace = 0x200B;
} // namespace Unicode
} // namespace WTF
@@ -116,6 +117,7 @@
using WTF::Unicode::blackUpPointingTriangle;
using WTF::Unicode::bullet;
using WTF::Unicode::bullseye;
+using WTF::Unicode::byteOrderMark;
using WTF::Unicode::carriageReturn;
using WTF::Unicode::combiningEnclosingKeycap;
using WTF::Unicode::ethiopicPrefaceColon;
Modified: trunk/Source/WebCore/ChangeLog (266680 => 266681)
--- trunk/Source/WebCore/ChangeLog 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/ChangeLog 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,3 +1,137 @@
+2020-09-06 Darin Adler <[email protected]>
+
+ TextCodec refinements
+ https://bugs.webkit.org/show_bug.cgi?id=216219
+
+ Reviewed by Sam Weinig.
+
+ * dom/TextDecoder.cpp:
+ (WebCore::TextDecoder::bytesNeededForFullBOMIgnoreCheck const): Deleted.
+ (WebCore::TextDecoder::isBeginningOfIncompleteBOM const): Deleted.
+ (WebCore::TextDecoder::ignoreBOMIfNecessary): Deleted.
+ (WebCore::TextDecoder::decode): Call stripByteOrderMark on the TextCodec
+ to tell it to ignore the BOM, once when creating the codec, and also after
+ each non-streaming invocation. Removed the rest of the BOM handling.
+
+ * dom/TextDecoder.h: Removed WaitForMoreBOMBytes, ignoreBOMIfNecessary,
+ bytesNeededForFullBOMIgnoreCheck, isBeginningOfIncompleteBOM, m_buffer,
+ m_bomIgnoredIfNecessary, and made m_options const.
+
+ * platform/text/TextCodec.h: Added a virtual stripByteOrderMark function
+ to be used before decoding; does nothing by default. Changed the encode
+ function to be a const member function to help implementers remember to
+ not have it save any state, unlike the decode function which should.
+
+ * platform/text/TextCodecCJK.cpp: Moved the TextCodecCJK::Encoding
+ enumeration in here.
+ (WebCore::jis0208DecodeIndex): Use a named type and use std::size
+ instead of WTF_ARRAY_LENGTH.
+ (WebCore::codePointJIS0212): Ditto.
+ (WebCore::iso2022JPEncode): Made this a non-member function and moved
+ the encoding state in here since each call to encode is separate and
+ we don't want to leave any state behind in the TextCodec between calls.
+ (WebCore::eucKREncodingIndex): Use a reference instead of a pointer
+ and std::size instead of WTF_ARRAY_LENGTH.
+ (WebCore::big5DecodeIndex): Ditto.
+ (WebCore::TextCodecCJK::encode const): Made const.
+
+ * platform/text/TextCodecCJK.h: Marked the class final, moved
+ the enumeration values for Encoding, ISO2022JPEncoderState, and
+ m_iso2022JPEncoderState out of the class definition. Made encode const.
+
+ * platform/text/TextCodecICU.cpp:
+ (WebCore::TextCodecICU::encode const): Made const.
+ * platform/text/TextCodecICU.h: Marked the class final, and
+ rearranged the class members to match other TextCodec classes,
+ with register functions before the constructor and destructor.
+ Made encode const.
+
+ * platform/text/TextCodecLatin1.cpp:
+ (WebCore::TextCodecLatin1::encode const): Made const.
+ * platform/text/TextCodecLatin1.h: Removed a stray blank line
+ and made encode const.
+
+ * platform/text/TextCodecReplacement.cpp:
+ (WebCore::TextCodecReplacement::encode const): Added.
+ * platform/text/TextCodecReplacement.h: Marked the class final,
+ changed it to no longer derive from TextCodecUTF8, and added
+ an encode function.
+
+ * platform/text/TextCodecSingleByte.cpp: Moved the
+ TextCodecSingleByte::Encoding enumeration in here and changed
+ from Iso to ISO. Added SingleByteDecodeTable type and
+ moved TextCodecSingleByte::EncodeTable in here and renamed
+ it to SingleByteEncodeTable so it's not a class member any more.
+ Marked all the decode tables static since we don't get internal
+ linkage from just marking them constexpr, while moving to use
+ the SingleByteDecodeTable type.
+ (WebCore::tableForEncoding): Use SingleByteDecodeTable and
+ SingleByteEncodeTable type names. Use std::count to count the
+ replacement characters instead of writing our own loop.
+ Use std::size(decodeTable) instead of defining a tableSize.
+ Update for ISO name change. Use RELEASE_ASSERT_NOT_REACHED so
+ we don't have to write a dead code return statement.
+ (WebCore::tableForDecoding): Use SingleByteDecodeTable type
+ for return value. Update for ISO name change. Use
+ RELEASE_ASSERT_NOT_REACHED so we don't have to write a dead
+ code return statement. Also make this constexpr since it's
+ just selecting a global based on an enumeration value.
+ (WebCore::encode): Made this a non-member function since it
+ does not need access to TextCodec members. This helps us keep
+ implementation details out of the header.
+ (WebCore::decode): Ditto.
+ (WebCore::TextCodecSingleByte::encode const): Made this const
+ and updated to call the non-member function.
+ (WebCore::TextCodecSingleByte::decode): Ditto.
+ (WebCore::TextCodecSingleByte::registerCodecs): Update for
+ the ISO name change.
+
+ * platform/text/TextCodecSingleByte.h: Marked the class
+ final, and moved the enumeration values for Encoding, EncodeTable,
+ and the encode and decode functions that take table arguments
+ all out of the class definition. Made encode const.
+
+ * platform/text/TextCodecUTF16.cpp:
+ (WebCore::TextCodecUTF16::decode): Added logic to drop the first
+ byte order mark after m_shouldStripByteOrderMark is set to true.
+ Changed code to call through rather than recursively calling self
+ in the case of an unpaired lead surrogate, removing the need to
+ put the processBytesShared lambda into a Function. Renamed the
+ processBytesShared lambda to processCodeUnit.
+ (WebCore::TextCodecUTF16::encode const): Made const.
+
+ * platform/text/TextCodecUTF16.h: Marked the class final, added
+ a stripByteOrderMark member function and a m_shouldStripByteOrderMark
+ data member, and made encode const.
+
+ * platform/text/TextCodecUTF8.cpp:
+ (WebCore::TextCodecUTF8::handlePartialSequence): Added logic to
+ drop the first byte order mark after m_shouldStripByteOrderMark is
+ set to true, making sure to keep it out of the hot ASCII decode loop.
+ (WebCore::TextCodecUTF8::decode): Ditto. Also added code to make sure
+ the partial sequence is cleared out as part of a flush even when
+ stopOnError is true.
+ (WebCore::TextCodecUTF8::encodeUTF8): Renamed this so it can be
+ a static member function, so it can be called by TextCodecReplacement.
+ (WebCore::TextCodecUTF8::encode const): Made this const and have it
+ call encodeUTF8.
+
+ * platform/text/TextCodecUTF8.h: Marked the class final, added
+ a stripByteOrderMark member function and a m_shouldStripByteOrderMark
+ data member, added static member function encodeUTF8, and made the
+ encode function const.
+
+ * platform/text/TextCodecUserDefined.cpp:
+ (WebCore::TextCodecUserDefined::encode const): Made const.
+ * platform/text/TextCodecUserDefined.h: Marked the class final and
+ made encode const.
+
+ * xml/XSLStyleSheetLibxslt.cpp:
+ (WebCore::XSLStyleSheet::parseString): Use byteOrderMark.
+ * xml/parser/XMLDocumentParserLibxml2.cpp:
+ (WebCore::switchToUTF16): Ditto.
+ (WebCore::nativeEndianUTF16Encoding): Ditto.
+
2020-09-06 Zalan Bujtas <[email protected]>
[LFC][IFC] Move Line handing to LineBuilder
Modified: trunk/Source/WebCore/dom/TextDecoder.cpp (266680 => 266681)
--- trunk/Source/WebCore/dom/TextDecoder.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/dom/TextDecoder.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2016-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -32,6 +32,12 @@
namespace WebCore {
+TextDecoder::TextDecoder(const char* label, Options options)
+ : m_textEncoding(label)
+ , m_options(options)
+{
+}
+
TextDecoder::~TextDecoder() = default;
ExceptionOr<Ref<TextDecoder>> TextDecoder::create(const String& label, Options options)
@@ -46,81 +52,6 @@
return decoder;
}
-TextDecoder::TextDecoder(const char* label, Options options)
- : m_textEncoding(label)
- , m_options(options)
-{
-}
-
-constexpr uint8_t utf8BOMBytes[3] { 0xEF, 0xBB, 0xBF };
-constexpr uint8_t utf16BEBOMBytes[2] { 0xFE, 0xFF };
-constexpr uint8_t utf16LEBOMBytes[2] { 0xFF, 0xFE };
-
-size_t TextDecoder::bytesNeededForFullBOMIgnoreCheck() const
-{
- if (m_textEncoding == UTF8Encoding())
- return sizeof(utf8BOMBytes);
- if (m_textEncoding == UTF16BigEndianEncoding())
- return sizeof(utf16BEBOMBytes);
- if (m_textEncoding == UTF16LittleEndianEncoding())
- return sizeof(utf16LEBOMBytes);
- return 0;
-}
-
-bool TextDecoder::isBeginningOfIncompleteBOM(const uint8_t* bytes, size_t length) const
-{
- if (!length)
- return true;
-
- if (m_textEncoding == UTF8Encoding()) {
- if (length == 1)
- return bytes[0] == utf8BOMBytes[0];
- return length == 2 && bytes[0] == utf8BOMBytes[0] && bytes[1] == utf8BOMBytes[1];
- }
- if (m_textEncoding == UTF16BigEndianEncoding())
- return length == 1 && bytes[0] == utf16BEBOMBytes[0];
- if (m_textEncoding == UTF16LittleEndianEncoding())
- return length == 1 && bytes[0] == utf16LEBOMBytes[0];
-
- return false;
-}
-
-auto TextDecoder::ignoreBOMIfNecessary(const uint8_t*& data, size_t& length, bool stream) -> WaitForMoreBOMBytes
-{
- if (m_bomIgnoredIfNecessary || m_options.ignoreBOM)
- return WaitForMoreBOMBytes::No;
-
- if (stream && length < bytesNeededForFullBOMIgnoreCheck()) {
- if (isBeginningOfIncompleteBOM(data, length))
- return WaitForMoreBOMBytes::Yes;
- m_bomIgnoredIfNecessary = true;
- return WaitForMoreBOMBytes::No;
- }
-
- if (m_textEncoding == UTF8Encoding()
- && length >= sizeof(utf8BOMBytes)
- && data[0] == utf8BOMBytes[0]
- && data[1] == utf8BOMBytes[1]
- && data[2] == utf8BOMBytes[2]) {
- data += sizeof(utf8BOMBytes);
- length -= sizeof(utf8BOMBytes);
- } else if (m_textEncoding == UTF16BigEndianEncoding()
- && length >= sizeof(utf16BEBOMBytes)
- && data[0] == utf16BEBOMBytes[0]
- && data[1] == utf16BEBOMBytes[1]) {
- data += sizeof(utf16BEBOMBytes);
- length -= sizeof(utf16BEBOMBytes);
- } else if (m_textEncoding == UTF16LittleEndianEncoding()
- && length >= sizeof(utf16LEBOMBytes)
- && data[0] == utf16LEBOMBytes[0]
- && data[1] == utf16LEBOMBytes[1]) {
- data += sizeof(utf16LEBOMBytes);
- length -= sizeof(utf16LEBOMBytes);
- }
- m_bomIgnoredIfNecessary = true;
- return WaitForMoreBOMBytes::No;
-}
-
ExceptionOr<String> TextDecoder::decode(Optional<BufferSource::VariantType> input, DecodeOptions options)
{
Optional<BufferSource> inputBuffer;
@@ -132,31 +63,18 @@
length = inputBuffer->length();
}
- if (!options.stream)
- m_bomIgnoredIfNecessary = false;
-
- bool alreadyBuffered = false;
- if (m_buffer.size()) {
- m_buffer.append(data, length);
- data = ""
- length = m_buffer.size();
- alreadyBuffered = true;
+ if (!m_codec) {
+ m_codec = newTextCodec(m_textEncoding);
+ if (!m_options.ignoreBOM)
+ m_codec->stripByteOrderMark();
}
- if (ignoreBOMIfNecessary(data, length, options.stream) == WaitForMoreBOMBytes::Yes) {
- ASSERT(options.stream);
- if (!alreadyBuffered)
- m_buffer.append(data, length);
- return String();
- }
+ bool sawError = false;
+ String result = m_codec->decode(reinterpret_cast<const char*>(data), length, !options.stream, m_options.fatal, sawError);
- auto oldBuffer = std::exchange(m_buffer, { });
+ if (!options.stream && !m_options.ignoreBOM)
+ m_codec->stripByteOrderMark();
- if (!m_codec)
- m_codec = newTextCodec(m_textEncoding);
-
- bool sawError = false;
- String result = m_codec->decode(reinterpret_cast<const char*>(data), length, !options.stream, false, sawError);
if (sawError && m_options.fatal)
return Exception { TypeError };
return result;
Modified: trunk/Source/WebCore/dom/TextDecoder.h (266680 => 266681)
--- trunk/Source/WebCore/dom/TextDecoder.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/dom/TextDecoder.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016 Apple Inc. All rights reserved.
+ * Copyright (C) 2016-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -28,7 +28,6 @@
#include "ExceptionOr.h"
#include "TextEncoding.h"
#include <wtf/RefCounted.h>
-#include <wtf/text/WTFString.h>
namespace WebCore {
@@ -36,35 +35,29 @@
class TextDecoder : public RefCounted<TextDecoder> {
public:
+ ~TextDecoder();
+
struct Options {
bool fatal { false };
bool ignoreBOM { false };
};
- struct DecodeOptions {
- bool stream { false };
- };
-
static ExceptionOr<Ref<TextDecoder>> create(const String& label, Options);
- ~TextDecoder();
String encoding() const;
bool fatal() const { return m_options.fatal; }
bool ignoreBOM() const { return m_options.ignoreBOM; }
+
+ struct DecodeOptions {
+ bool stream { false };
+ };
ExceptionOr<String> decode(Optional<BufferSource::VariantType>, DecodeOptions);
private:
TextDecoder(const char*, Options);
- enum class WaitForMoreBOMBytes : bool { No, Yes };
- WaitForMoreBOMBytes ignoreBOMIfNecessary(const uint8_t*& data, size_t& length, bool stream);
- size_t bytesNeededForFullBOMIgnoreCheck() const;
- bool isBeginningOfIncompleteBOM(const uint8_t*, size_t) const;
-
const TextEncoding m_textEncoding;
+ const Options m_options;
std::unique_ptr<TextCodec> m_codec;
- Options m_options;
- Vector<uint8_t> m_buffer;
- bool m_bomIgnoredIfNecessary { false };
};
}
Modified: trunk/Source/WebCore/platform/text/TextCodec.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodec.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodec.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2004-2017 Apple Inc. All rights reserved.
+ * Copyright (C) 2004-2020 Apple Inc. All rights reserved.
* Copyright (C) 2006 Alexey Proskuryakov <[email protected]>
*
* Redistribution and use in source and binary forms, with or without
@@ -45,9 +45,11 @@
TextCodec() = default;
virtual ~TextCodec() = default;
+ virtual void stripByteOrderMark() { }
virtual String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) = 0;
- virtual Vector<uint8_t> encode(StringView, UnencodableHandling) = 0;
+ virtual Vector<uint8_t> encode(StringView, UnencodableHandling) const = 0;
+
// Fills a null-terminated string representation of the given
// unencodable character into the given replacement buffer.
// The length of the string (not including the null) will be returned.
@@ -62,4 +64,3 @@
using TextCodecRegistrar = void (*)(const char* name, NewTextCodecFunction&&);
} // namespace WebCore
-
Modified: trunk/Source/WebCore/platform/text/TextCodecCJK.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecCJK.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecCJK.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -33,6 +33,14 @@
namespace WebCore {
+enum class TextCodecCJK::Encoding : uint8_t {
+ EUC_JP,
+ ISO2022JP,
+ Shift_JIS,
+ EUC_KR,
+ Big5
+};
+
TextCodecCJK::TextCodecCJK(Encoding encoding)
: m_encoding(encoding)
{
@@ -113,11 +121,12 @@
});
}
-static const std::array<std::pair<uint16_t, UChar>, WTF_ARRAY_LENGTH(jis0208)>& jis0208DecodeIndex()
+using JIS0208DecodeIndex = std::array<std::pair<uint16_t, UChar>, std::size(jis0208)>;
+static const JIS0208DecodeIndex& jis0208DecodeIndex()
{
- static auto* table = [] {
- auto* table = new std::array<std::pair<uint16_t, UChar>, WTF_ARRAY_LENGTH(jis0208)>();
- for (size_t i = 0; i < WTF_ARRAY_LENGTH(jis0208); i++)
+ static auto& table = *[] {
+ auto* table = new JIS0208DecodeIndex;
+ for (size_t i = 0; i < std::size(jis0208); i++)
(*table)[i] = { jis0208[i].second, jis0208[i].first };
std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
return a.first < b.first;
@@ -124,7 +133,7 @@
});
return table;
}();
- return *table;
+ return table;
}
String TextCodecCJK::decodeCommon(const uint8_t* bytes, size_t length, bool flush, bool stopOnError, bool& sawError, const Function<SawError(uint8_t, StringBuilder&)>& byteParser)
@@ -183,7 +192,7 @@
static Optional<UChar> codePointJIS0212(uint16_t pointer)
{
- auto range = std::equal_range(jis0212, jis0212 + WTF_ARRAY_LENGTH(jis0212), std::pair<uint16_t, UChar>(pointer, 0), [](const auto& a, const auto& b) {
+ auto range = std::equal_range(jis0212, jis0212 + std::size(jis0212), std::pair<uint16_t, UChar>(pointer, 0), [](const auto& a, const auto& b) {
return a.first < b.first;
});
if (range.first != range.second) {
@@ -471,13 +480,16 @@
}
// https://encoding.spec.whatwg.org/#iso-2022-jp-encoder
-Vector<uint8_t> TextCodecCJK::iso2022JPEncode(StringView string, Function<void(UChar32, Vector<uint8_t>&)> unencodableHandler)
+static Vector<uint8_t> iso2022JPEncode(StringView string, Function<void(UChar32, Vector<uint8_t>&)> unencodableHandler)
{
+ enum class State : uint8_t { ASCII, Roman, Jis0208 };
+ State state { State::ASCII };
+
Vector<uint8_t> result;
result.reserveInitialCapacity(string.length());
auto changeStateToASCII = [&] {
- m_iso2022JPEncoderState = ISO2022JPEncoderState::ASCII;
+ state = State::ASCII;
result.append(0x1B);
result.append(0x28);
result.append(0x42);
@@ -484,7 +496,7 @@
};
auto statefulUnencodableHandler = [&] (UChar32 codePoint, Vector<uint8_t>& result) {
- if (m_iso2022JPEncoderState == ISO2022JPEncoderState::Jis0208)
+ if (state == State::Jis0208)
changeStateToASCII();
unencodableHandler(codePoint, result);
};
@@ -491,11 +503,11 @@
Function<void(UChar32)> parseCodePoint;
parseCodePoint = [&] (UChar32 codePoint) {
- if (m_iso2022JPEncoderState == ISO2022JPEncoderState::ASCII && isASCII(codePoint)) {
+ if (state == State::ASCII && isASCII(codePoint)) {
result.append(static_cast<uint8_t>(codePoint));
return;
}
- if (m_iso2022JPEncoderState == ISO2022JPEncoderState::Roman) {
+ if (state == State::Roman) {
if (isASCII(codePoint) && codePoint != 0x005C && codePoint !=0x007E) {
result.append(static_cast<uint8_t>(codePoint));
return;
@@ -509,14 +521,14 @@
return;
}
}
- if (isASCII(codePoint) && m_iso2022JPEncoderState != ISO2022JPEncoderState::ASCII) {
- if (m_iso2022JPEncoderState != ISO2022JPEncoderState::ASCII)
+ if (isASCII(codePoint) && state != State::ASCII) {
+ if (state != State::ASCII)
changeStateToASCII();
parseCodePoint(codePoint);
return;
}
- if ((codePoint == 0x00A5 || codePoint == 0x203E) && m_iso2022JPEncoderState != ISO2022JPEncoderState::Roman) {
- m_iso2022JPEncoderState = ISO2022JPEncoderState::Roman;
+ if ((codePoint == 0x00A5 || codePoint == 0x203E) && state != State::Roman) {
+ state = State::Roman;
result.append(0x1B);
result.append(0x28);
result.append(0x4A);
@@ -527,7 +539,7 @@
codePoint = 0xFF0D;
if (codePoint >= 0xFF61 && codePoint <= 0xFF9F) {
// From https://encoding.spec.whatwg.org/index-iso-2022-jp-katakana.txt
- static const UChar32 iso2022JPKatakana[63] {
+ static constexpr std::array<UChar32, 63> iso2022JPKatakana {
0x3002, 0x300C, 0x300D, 0x3001, 0x30FB, 0x30F2, 0x30A1, 0x30A3, 0x30A5, 0x30A7, 0x30A9, 0x30E3, 0x30E5, 0x30E7, 0x30C3, 0x30FC,
0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA, 0x30AB, 0x30AD, 0x30AF, 0x30B1, 0x30B3, 0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD, 0x30BF,
0x30C1, 0x30C4, 0x30C6, 0x30C8, 0x30CA, 0x30CB, 0x30CC, 0x30CD, 0x30CE, 0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB, 0x30DE, 0x30DF,
@@ -549,8 +561,8 @@
statefulUnencodableHandler(codePoint, result);
return;
}
- if (m_iso2022JPEncoderState != ISO2022JPEncoderState::Jis0208) {
- m_iso2022JPEncoderState = ISO2022JPEncoderState::Jis0208;
+ if (state != State::Jis0208) {
+ state = State::Jis0208;
result.append(0x1B);
result.append(0x24);
result.append(0x42);
@@ -566,7 +578,7 @@
for (WTF::CodePointIterator<UChar> iterator(characters.get(), characters.get() + string.length()); !iterator.atEnd(); ++iterator)
parseCodePoint(*iterator);
- if (m_iso2022JPEncoderState != ISO2022JPEncoderState::ASCII)
+ if (state != State::ASCII)
changeStateToASCII();
return result;
@@ -669,12 +681,12 @@
return result;
}
-using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, WTF_ARRAY_LENGTH(eucKRDecodingIndex)>;
+using EUCKREncodingIndex = std::array<std::pair<UChar, uint16_t>, std::size(eucKRDecodingIndex)>;
static const EUCKREncodingIndex& eucKREncodingIndex()
{
- static auto* table = [] {
- auto table = new EUCKREncodingIndex();
- for (size_t i = 0; i < WTF_ARRAY_LENGTH(eucKRDecodingIndex); i++)
+ static auto& table = *[] {
+ auto table = new EUCKREncodingIndex;
+ for (size_t i = 0; i < std::size(eucKRDecodingIndex); i++)
(*table)[i] = { eucKRDecodingIndex[i].second, eucKRDecodingIndex[i].first };
std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
return a.first < b.first;
@@ -681,7 +693,7 @@
});
return table;
}();
- return *table;
+ return table;
}
// https://encoding.spec.whatwg.org/#euc-kr-encoder
@@ -839,21 +851,21 @@
return entityUnencodableHandler;
}
-using Big5DecodeIndex = std::array<std::pair<uint16_t, UChar32>, WTF_ARRAY_LENGTH(big5DecodingExtras) + WTF_ARRAY_LENGTH(big5EncodingMap)>;
+using Big5DecodeIndex = std::array<std::pair<uint16_t, UChar32>, std::size(big5DecodingExtras) + std::size(big5EncodingMap)>;
static const Big5DecodeIndex& big5DecodeIndex()
{
- static auto* table = [] {
- auto table = new Big5DecodeIndex();
- for (size_t i = 0; i < WTF_ARRAY_LENGTH(big5DecodingExtras); i++)
+ static auto& table = *[] {
+ auto table = new Big5DecodeIndex;
+ for (size_t i = 0; i < std::size(big5DecodingExtras); i++)
(*table)[i] = big5DecodingExtras[i];
- for (size_t i = 0; i < WTF_ARRAY_LENGTH(big5EncodingMap); i++)
- (*table)[i + WTF_ARRAY_LENGTH(big5DecodingExtras)] = { big5EncodingMap[i].second, big5EncodingMap[i].first };
+ for (size_t i = 0; i < std::size(big5EncodingMap); i++)
+ (*table)[i + std::size(big5DecodingExtras)] = { big5EncodingMap[i].second, big5EncodingMap[i].first };
std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
return a.first < b.first;
});
return table;
}();
- return *table;
+ return table;
}
String TextCodecCJK::big5Decode(const uint8_t* bytes, size_t length, bool flush, bool stopOnError, bool& sawError)
@@ -923,7 +935,7 @@
return { };
}
-Vector<uint8_t> TextCodecCJK::encode(StringView string, UnencodableHandling handling)
+Vector<uint8_t> TextCodecCJK::encode(StringView string, UnencodableHandling handling) const
{
switch (m_encoding) {
case Encoding::EUC_JP:
@@ -942,4 +954,3 @@
}
} // namespace WebCore
-
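The index functions in the TextCodecCJK.cpp diff above share one pattern: a function-local static reference bound to a heap-allocated table that an immediately invoked lambda builds once, sized with `std::size`. The following is a minimal sketch of that pattern under stated assumptions: `decodeTable`, `EncodeIndex`, `encodeIndex`, and `lookUp` are hypothetical names, the table data is made up, and C++17 is assumed.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <utility>

// Made-up decode table: (pointer, code unit) pairs sorted by pointer.
static constexpr std::pair<uint16_t, char16_t> decodeTable[] {
    { 1, u'\u3042' }, { 2, u'\u3041' }, { 3, u'\u3044' },
};

using EncodeIndex = std::array<std::pair<char16_t, uint16_t>, std::size(decodeTable)>;

static const EncodeIndex& encodeIndex()
{
    // Built on first use and intentionally never freed, so no destructor runs at exit.
    static auto& table = *[] {
        auto* table = new EncodeIndex;
        for (size_t i = 0; i < std::size(decodeTable); i++)
            (*table)[i] = { decodeTable[i].second, decodeTable[i].first };
        std::sort(table->begin(), table->end(), [] (auto& a, auto& b) {
            return a.first < b.first;
        });
        return table;
    }();
    return table;
}

// Binary search by code unit; returns the pointer if the code unit is present.
static std::optional<uint16_t> lookUp(char16_t codeUnit)
{
    auto& index = encodeIndex();
    auto range = std::equal_range(index.begin(), index.end(), std::pair<char16_t, uint16_t>(codeUnit, 0), [] (const auto& a, const auto& b) {
        return a.first < b.first;
    });
    if (range.first == range.second)
        return std::nullopt;
    return range.first->second;
}
```

Binding the static to a reference rather than a pointer lets the accessor end with `return table;`, which matches the `return *table;` to `return table;` change in the diff.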
Modified: trunk/Source/WebCore/platform/text/TextCodecCJK.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecCJK.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecCJK.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -26,29 +26,21 @@
#pragma once
#include "TextCodec.h"
-#include <wtf/Forward.h>
#include <wtf/Optional.h>
namespace WebCore {
-class TextCodecCJK : public TextCodec {
+class TextCodecCJK final : public TextCodec {
public:
- enum class Encoding : uint8_t {
- EUC_JP,
- ISO2022JP,
- Shift_JIS,
- EUC_KR,
- Big5
- };
-
- explicit TextCodecCJK(Encoding);
-
static void registerEncodingNames(EncodingNameRegistrar);
static void registerCodecs(TextCodecRegistrar);
+ enum class Encoding : uint8_t;
+ explicit TextCodecCJK(Encoding);
+
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
enum class SawError : bool { No, Yes };
String decodeCommon(const uint8_t*, size_t, bool, bool, bool&, const Function<SawError(uint8_t, StringBuilder&)>&);
@@ -58,7 +50,6 @@
String shiftJISDecode(const uint8_t*, size_t, bool, bool, bool&);
String eucKRDecode(const uint8_t*, size_t, bool, bool, bool&);
String big5Decode(const uint8_t*, size_t, bool, bool, bool&);
- Vector<uint8_t> iso2022JPEncode(StringView, Function<void(UChar32, Vector<uint8_t>&)> unencodableHandler);
const Encoding m_encoding;
@@ -70,12 +61,8 @@
bool m_iso2022JPOutput { false };
Optional<uint8_t> m_iso2022JPSecondPrependedByte;
- enum class ISO2022JPEncoderState : uint8_t { ASCII, Roman, Jis0208 };
- ISO2022JPEncoderState m_iso2022JPEncoderState { ISO2022JPEncoderState::ASCII };
-
uint8_t m_lead { 0x00 };
Optional<uint8_t> m_prependedByte;
};
} // namespace WebCore
-
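The removal of `m_iso2022JPEncoderState` above goes with the log message's point that each call to encode is separate and should leave no state behind. A simplified sketch of that idea follows. It is a toy, not the WHATWG ISO-2022-JP encoder: `encodeWithEscapes` is a hypothetical name, only ASCII plus U+00A5 and U+203E are handled, and anything else becomes `?` purely for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical escape-sequence encoder. It handles only ASCII plus
// U+00A5 and U+203E, and is not the full WHATWG ISO-2022-JP encoder.
static std::vector<uint8_t> encodeWithEscapes(const std::u32string& text)
{
    enum class State : uint8_t { ASCII, Roman };
    State state { State::ASCII }; // Local: starts fresh on every call.

    std::vector<uint8_t> result;
    auto switchTo = [&] (State newState) {
        if (state == newState)
            return;
        state = newState;
        result.push_back(0x1B);
        result.push_back(0x28);
        result.push_back(newState == State::ASCII ? 0x42 : 0x4A);
    };

    for (char32_t codePoint : text) {
        if (codePoint == 0x00A5 || codePoint == 0x203E) {
            switchTo(State::Roman);
            result.push_back(codePoint == 0x00A5 ? 0x5C : 0x7E);
        } else if (codePoint < 0x80) {
            // Backslash and tilde are not available in Roman mode.
            if (state == State::Roman && (codePoint == 0x5C || codePoint == 0x7E))
                switchTo(State::ASCII);
            result.push_back(static_cast<uint8_t>(codePoint));
        } else
            result.push_back('?'); // Unencodable in this toy example.
    }

    switchTo(State::ASCII); // Always end in the ASCII state.
    return result;
}
```

Because the state is a local variable, two consecutive calls cannot influence each other, and the function could be called from a `const` member function without any `mutable` workaround.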
Modified: trunk/Source/WebCore/platform/text/TextCodecICU.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecICU.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecICU.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -357,7 +357,7 @@
UCNV_FROM_U_CALLBACK_SUBSTITUTE(context, fromUArgs, codeUnits, length, codePoint, reason, error);
}
-Vector<uint8_t> TextCodecICU::encode(StringView string, UnencodableHandling handling)
+Vector<uint8_t> TextCodecICU::encode(StringView string, UnencodableHandling handling) const
{
if (string.isEmpty())
return { };
Modified: trunk/Source/WebCore/platform/text/TextCodecICU.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecICU.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecICU.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -33,17 +33,17 @@
using ICUConverterPtr = std::unique_ptr<UConverter, void (*)(UConverter*)>;
-class TextCodecICU : public TextCodec {
+class TextCodecICU final : public TextCodec {
public:
+ static void registerEncodingNames(EncodingNameRegistrar);
+ static void registerCodecs(TextCodecRegistrar);
+
explicit TextCodecICU(const char* encoding, const char* canonicalConverterName);
virtual ~TextCodecICU();
- static void registerEncodingNames(EncodingNameRegistrar);
- static void registerCodecs(TextCodecRegistrar);
-
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
void createICUConverter() const;
void releaseICUConverter() const;
@@ -65,4 +65,3 @@
};
} // namespace WebCore
-
Modified: trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecLatin1.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -225,7 +225,7 @@
return result;
}
-Vector<uint8_t> TextCodecLatin1::encode(StringView string, UnencodableHandling handling)
+Vector<uint8_t> TextCodecLatin1::encode(StringView string, UnencodableHandling handling) const
{
{
Vector<uint8_t> result(string.length());
Modified: trunk/Source/WebCore/platform/text/TextCodecLatin1.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecLatin1.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecLatin1.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -36,8 +36,7 @@
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
};
} // namespace WebCore
-
Modified: trunk/Source/WebCore/platform/text/TextCodecReplacement.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecReplacement.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecReplacement.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -59,4 +59,9 @@
return String { &replacementCharacter, 1 };
}
+Vector<uint8_t> TextCodecReplacement::encode(StringView string, UnencodableHandling unencodableHandling) const
+{
+ return TextCodecUTF8::encodeUTF8(string, unencodableHandling);
+}
+
} // namespace WebCore
Modified: trunk/Source/WebCore/platform/text/TextCodecReplacement.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecReplacement.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecReplacement.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Apple Inc. All rights reserved.
+ * Copyright (C) 2016-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -29,7 +29,7 @@
namespace WebCore {
-class TextCodecReplacement : public TextCodecUTF8 {
+class TextCodecReplacement final : public TextCodec {
public:
static void registerEncodingNames(EncodingNameRegistrar);
static void registerCodecs(TextCodecRegistrar);
@@ -36,9 +36,9 @@
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
bool m_sentEOF { false };
-
};
} // namespace WebCore
Modified: trunk/Source/WebCore/platform/text/TextCodecSingleByte.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecSingleByte.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecSingleByte.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -32,8 +32,24 @@
namespace WebCore {
+enum class TextCodecSingleByte::Encoding : uint8_t {
+ ISO_8859_3,
+ ISO_8859_6,
+ ISO_8859_7,
+ ISO_8859_8,
+ Windows_874,
+ Windows_1253,
+ Windows_1255,
+ Windows_1257,
+ IBM866,
+ KOI8U,
+};
+
+using SingleByteDecodeTable = std::array<UChar, 128>;
+using SingleByteEncodeTable = std::pair<const std::pair<UChar, uint8_t>*, size_t>;
+
// From https://encoding.spec.whatwg.org/index-iso-8859-3.txt with 0xFFFD filling the gaps
-constexpr UChar iso88593[128] {
+static constexpr SingleByteDecodeTable iso88593 {
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087, 0x0088, 0x0089, 0x008A, 0x008B, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097, 0x0098, 0x0099, 0x009A, 0x009B, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0x0126, 0x02D8, 0x00A3, 0x00A4, 0xFFFD, 0x0124, 0x00A7, 0x00A8, 0x0130, 0x015E, 0x011E, 0x0134, 0x00AD, 0xFFFD, 0x017B,
@@ -45,7 +61,7 @@
};
// From https://encoding.spec.whatwg.org/index-iso-8859-6.txt with 0xFFFD filling the gaps
-constexpr UChar iso88596[128] {
+static constexpr SingleByteDecodeTable iso88596 {
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087, 0x0088, 0x0089, 0x008A, 0x008B, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097, 0x0098, 0x0099, 0x009A, 0x009B, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0xFFFD, 0xFFFD, 0xFFFD, 0x00A4, 0xFFFD, 0xFFFD, 0xFFFD, 0xFFFD, 0xFFFD, 0xFFFD, 0xFFFD, 0x060C, 0x00AD, 0xFFFD, 0xFFFD,
@@ -57,7 +73,7 @@
};
// From https://encoding.spec.whatwg.org/index-iso-8859-7.txt with 0xFFFD filling the gaps
-constexpr UChar iso88597[128] {
+static constexpr SingleByteDecodeTable iso88597 {
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087, 0x0088, 0x0089, 0x008A, 0x008B, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097, 0x0098, 0x0099, 0x009A, 0x009B, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0x2018, 0x2019, 0x00A3, 0x20AC, 0x20AF, 0x00A6, 0x00A7, 0x00A8, 0x00A9, 0x037A, 0x00AB, 0x00AC, 0x00AD, 0xFFFD, 0x2015,
@@ -69,7 +85,7 @@
};
// From https://encoding.spec.whatwg.org/index-iso-8859-8.txt with 0xFFFD filling the gaps
-constexpr UChar iso88598[128] {
+static constexpr SingleByteDecodeTable iso88598 {
0x0080, 0x0081, 0x0082, 0x0083, 0x0084, 0x0085, 0x0086, 0x0087, 0x0088, 0x0089, 0x008A, 0x008B, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x0091, 0x0092, 0x0093, 0x0094, 0x0095, 0x0096, 0x0097, 0x0098, 0x0099, 0x009A, 0x009B, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0xFFFD, 0x00A2, 0x00A3, 0x00A4, 0x00A5, 0x00A6, 0x00A7, 0x00A8, 0x00A9, 0x00D7, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x00AF,
@@ -81,7 +97,7 @@
};
// From https://encoding.spec.whatwg.org/index-windows-874.txt with 0xFFFD filling the gaps
-constexpr UChar windows874[128] {
+static constexpr SingleByteDecodeTable windows874 {
0x20AC, 0x0081, 0x0082, 0x0083, 0x0084, 0x2026, 0x0086, 0x0087, 0x0088, 0x0089, 0x008A, 0x008B, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x0098, 0x0099, 0x009A, 0x009B, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0x0E01, 0x0E02, 0x0E03, 0x0E04, 0x0E05, 0x0E06, 0x0E07, 0x0E08, 0x0E09, 0x0E0A, 0x0E0B, 0x0E0C, 0x0E0D, 0x0E0E, 0x0E0F,
@@ -93,7 +109,7 @@
};
// From https://encoding.spec.whatwg.org/index-windows-1253.txt with 0xFFFD filling the gaps
-constexpr UChar windows1253[128] {
+static constexpr SingleByteDecodeTable windows1253 {
0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x0088, 0x2030, 0x008A, 0x2039, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x0098, 0x2122, 0x009A, 0x203A, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0x0385, 0x0386, 0x00A3, 0x00A4, 0x00A5, 0x00A6, 0x00A7, 0x00A8, 0x00A9, 0xFFFD, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x2015,
@@ -104,7 +120,7 @@
0x03C0, 0x03C1, 0x03C2, 0x03C3, 0x03C4, 0x03C5, 0x03C6, 0x03C7, 0x03C8, 0x03C9, 0x03CA, 0x03CB, 0x03CC, 0x03CD, 0x03CE, 0xFFFD
};
-constexpr UChar windows1255[128] {
+static constexpr SingleByteDecodeTable windows1255 {
0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030, 0x008A, 0x2039, 0x008C, 0x008D, 0x008E, 0x008F,
0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x02DC, 0x2122, 0x009A, 0x203A, 0x009C, 0x009D, 0x009E, 0x009F,
0x00A0, 0x00A1, 0x00A2, 0x00A3, 0x20AA, 0x00A5, 0x00A6, 0x00A7, 0x00A8, 0x00A9, 0x00D7, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x00AF,
@@ -116,7 +132,7 @@
};
// From https://encoding.spec.whatwg.org/index-windows-1257.txt with 0xFFFD filling the gaps
-constexpr UChar windows1257[128] {
+static constexpr SingleByteDecodeTable windows1257 {
0x20AC, 0x0081, 0x201A, 0x0083, 0x201E, 0x2026, 0x2020, 0x2021, 0x0088, 0x2030, 0x008A, 0x2039, 0x008C, 0x00A8, 0x02C7, 0x00B8,
0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x0098, 0x2122, 0x009A, 0x203A, 0x009C, 0x00AF, 0x02DB, 0x009F,
0x00A0, 0xFFFD, 0x00A2, 0x00A3, 0x00A4, 0xFFFD, 0x00A6, 0x00A7, 0x00D8, 0x00A9, 0x0156, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x00C6,
@@ -128,7 +144,7 @@
};
// From https://encoding.spec.whatwg.org/index-koi8-u.txt
-constexpr UChar koi8u[128] {
+static constexpr SingleByteDecodeTable koi8u {
0x2500, 0x2502, 0x250C, 0x2510, 0x2514, 0x2518, 0x251C, 0x2524, 0x252C, 0x2534, 0x253C, 0x2580, 0x2584, 0x2588, 0x258C, 0x2590,
0x2591, 0x2592, 0x2593, 0x2320, 0x25A0, 0x2219, 0x221A, 0x2248, 0x2264, 0x2265, 0x00A0, 0x2321, 0x00B0, 0x00B2, 0x00B7, 0x00F7,
0x2550, 0x2551, 0x2552, 0x0451, 0x0454, 0x2554, 0x0456, 0x0457, 0x2557, 0x2558, 0x2559, 0x255A, 0x255B, 0x0491, 0x045E, 0x255E,
@@ -140,7 +156,7 @@
};
// From https://encoding.spec.whatwg.org/index-ibm866.txt
-constexpr UChar ibm866[128] {
+static constexpr SingleByteDecodeTable ibm866 {
0x0410, 0x0411, 0x0412, 0x0413, 0x0414, 0x0415, 0x0416, 0x0417, 0x0418, 0x0419, 0x041A, 0x041B, 0x041C, 0x041D, 0x041E, 0x041F,
0x0420, 0x0421, 0x0422, 0x0423, 0x0424, 0x0425, 0x0426, 0x0427, 0x0428, 0x0429, 0x042A, 0x042B, 0x042C, 0x042D, 0x042E, 0x042F,
0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0437, 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E, 0x043F,
@@ -151,22 +167,14 @@
0x0401, 0x0451, 0x0404, 0x0454, 0x0407, 0x0457, 0x040E, 0x045E, 0x00B0, 0x2219, 0x00B7, 0x221A, 0x2116, 0x00A4, 0x25A0, 0x00A0
};
-template<const UChar* decodeTable>
-std::pair<const std::pair<UChar, uint8_t>*, size_t> tableForEncoding()
+template<const SingleByteDecodeTable& decodeTable> SingleByteEncodeTable tableForEncoding()
{
- constexpr auto tableSize = 128;
- static size_t size = [&] {
- size_t size = 0;
- for (uint8_t i = 0; i < tableSize; i++) {
- if (decodeTable[i] != replacementCharacter)
- size++;
- }
- return size;
- }();
- static auto* table = [&] {
- static auto* table = new std::pair<UChar, uint8_t>[size];
+ // FIXME: With the C++20 version of std::count, we should be able to change this from const to constexpr and get it computed at compile time.
+ static const size_t size = std::size(decodeTable) - std::count(std::begin(decodeTable), std::end(decodeTable), replacementCharacter);
+ static const auto table = [&] {
+ auto table = new std::pair<UChar, uint8_t>[size];
size_t j = 0;
- for (uint8_t i = 0; i < tableSize; i++) {
+ for (uint8_t i = 0; i < std::size(decodeTable); i++) {
if (decodeTable[i] != replacementCharacter)
table[j++] = { decodeTable[i], i };
}
@@ -175,16 +183,16 @@
return { table, size };
}
-static std::pair<const std::pair<UChar, uint8_t>*, size_t> tableForEncoding(TextCodecSingleByte::Encoding encoding)
+static SingleByteEncodeTable tableForEncoding(TextCodecSingleByte::Encoding encoding)
{
switch (encoding) {
- case TextCodecSingleByte::Encoding::Iso_8859_3:
+ case TextCodecSingleByte::Encoding::ISO_8859_3:
return tableForEncoding<iso88593>();
- case TextCodecSingleByte::Encoding::Iso_8859_6:
+ case TextCodecSingleByte::Encoding::ISO_8859_6:
return tableForEncoding<iso88596>();
- case TextCodecSingleByte::Encoding::Iso_8859_7:
+ case TextCodecSingleByte::Encoding::ISO_8859_7:
return tableForEncoding<iso88597>();
- case TextCodecSingleByte::Encoding::Iso_8859_8:
+ case TextCodecSingleByte::Encoding::ISO_8859_8:
return tableForEncoding<iso88598>();
case TextCodecSingleByte::Encoding::Windows_874:
return tableForEncoding<windows874>();
@@ -199,20 +207,19 @@
case TextCodecSingleByte::Encoding::KOI8U:
return tableForEncoding<koi8u>();
}
- ASSERT_NOT_REACHED();
- return { nullptr, 0 };
+ RELEASE_ASSERT_NOT_REACHED();
}
-static const UChar* tableForDecoding(TextCodecSingleByte::Encoding encoding)
+static constexpr const SingleByteDecodeTable& tableForDecoding(TextCodecSingleByte::Encoding encoding)
{
switch (encoding) {
- case TextCodecSingleByte::Encoding::Iso_8859_3:
+ case TextCodecSingleByte::Encoding::ISO_8859_3:
return iso88593;
- case TextCodecSingleByte::Encoding::Iso_8859_6:
+ case TextCodecSingleByte::Encoding::ISO_8859_6:
return iso88596;
- case TextCodecSingleByte::Encoding::Iso_8859_7:
+ case TextCodecSingleByte::Encoding::ISO_8859_7:
return iso88597;
- case TextCodecSingleByte::Encoding::Iso_8859_8:
+ case TextCodecSingleByte::Encoding::ISO_8859_8:
return iso88598;
case TextCodecSingleByte::Encoding::Windows_874:
return windows874;
@@ -227,12 +234,11 @@
case TextCodecSingleByte::Encoding::KOI8U:
return koi8u;
}
- ASSERT_NOT_REACHED();
- return nullptr;
+ RELEASE_ASSERT_NOT_REACHED();
}
// https://encoding.spec.whatwg.org/#single-byte-encoder
-Vector<uint8_t> TextCodecSingleByte::encode(const EncodeTable& table, StringView string, Function<void(UChar32, Vector<uint8_t>&)>&& unencodableHandler)
+static Vector<uint8_t> encode(const SingleByteEncodeTable& table, StringView string, Function<void(UChar32, Vector<uint8_t>&)>&& unencodableHandler)
{
Vector<uint8_t> result;
result.reserveInitialCapacity(string.length());
@@ -264,7 +270,7 @@
}
// https://encoding.spec.whatwg.org/#single-byte-decoder
-String TextCodecSingleByte::decode(const UChar* table, const uint8_t* bytes, size_t length, bool, bool stopOnError, bool& sawError)
+static String decode(const SingleByteDecodeTable& table, const uint8_t* bytes, size_t length, bool, bool stopOnError, bool& sawError)
{
StringBuilder result;
result.reserveCapacity(length);
@@ -293,18 +299,20 @@
return result.toString();
}
-Vector<uint8_t> TextCodecSingleByte::encode(StringView string, UnencodableHandling handling)
+Vector<uint8_t> TextCodecSingleByte::encode(StringView string, UnencodableHandling handling) const
{
- return encode(tableForEncoding(m_encoding), string, unencodableHandler(handling));
+ return WebCore::encode(tableForEncoding(m_encoding), string, unencodableHandler(handling));
}
String TextCodecSingleByte::decode(const char* bytes, size_t length, bool flush, bool stopOnError, bool& sawError)
{
- return decode(tableForDecoding(m_encoding), reinterpret_cast<const uint8_t*>(bytes), length, flush, stopOnError, sawError);
+ return WebCore::decode(tableForDecoding(m_encoding), reinterpret_cast<const uint8_t*>(bytes), length, flush, stopOnError, sawError);
}
TextCodecSingleByte::TextCodecSingleByte(Encoding encoding)
- : m_encoding(encoding) { }
+ : m_encoding(encoding)
+{
+}
void TextCodecSingleByte::registerEncodingNames(EncodingNameRegistrar registrar)
{
@@ -420,19 +428,19 @@
void TextCodecSingleByte::registerCodecs(TextCodecRegistrar registrar)
{
registrar("ISO-8859-3", [] {
- return makeUnique<TextCodecSingleByte>(Encoding::Iso_8859_3);
+ return makeUnique<TextCodecSingleByte>(Encoding::ISO_8859_3);
});
registrar("ISO-8859-6", [] {
- return makeUnique<TextCodecSingleByte>(Encoding::Iso_8859_6);
+ return makeUnique<TextCodecSingleByte>(Encoding::ISO_8859_6);
});
registrar("ISO-8859-7", [] {
- return makeUnique<TextCodecSingleByte>(Encoding::Iso_8859_7);
+ return makeUnique<TextCodecSingleByte>(Encoding::ISO_8859_7);
});
registrar("ISO-8859-8", [] {
- return makeUnique<TextCodecSingleByte>(Encoding::Iso_8859_8);
+ return makeUnique<TextCodecSingleByte>(Encoding::ISO_8859_8);
});
registrar("ISO-8859-8-I", [] {
- return makeUnique<TextCodecSingleByte>(Encoding::Iso_8859_8);
+ return makeUnique<TextCodecSingleByte>(Encoding::ISO_8859_8);
});
registrar("windows-874", [] {
return makeUnique<TextCodecSingleByte>(Encoding::Windows_874);
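The `tableForEncoding` template in the TextCodecSingleByte.cpp diff above derives an encode table from a decode table by skipping the 0xFFFD gaps, with one instantiation per decode table. Here is a hedged sketch of the same idea. `DecodeTable`, `toyTable`, and `encodeCodeUnit` are hypothetical, the 4-entry table is made up, and unlike the diff this sketch uses a `std::vector` and stores the final byte value (0x80 plus the index) rather than the raw index.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr char16_t replacementCharacter = 0xFFFD;

// Hypothetical 4-entry decode table for bytes 0x80..0x83; 0xFFFD marks a gap.
using DecodeTable = std::array<char16_t, 4>;
static constexpr DecodeTable toyTable { 0x20AC, 0xFFFD, 0x00A3, 0x00A2 };

// Reference non-type template parameter: one sorted encode table per decode table.
template<const DecodeTable& decodeTable>
const std::vector<std::pair<char16_t, uint8_t>>& tableForEncoding()
{
    static const auto table = [] {
        std::vector<std::pair<char16_t, uint8_t>> table;
        auto gaps = static_cast<size_t>(std::count(decodeTable.begin(), decodeTable.end(), replacementCharacter));
        table.reserve(decodeTable.size() - gaps);
        for (size_t i = 0; i < decodeTable.size(); i++) {
            if (decodeTable[i] != replacementCharacter)
                table.push_back({ decodeTable[i], static_cast<uint8_t>(0x80 + i) });
        }
        std::sort(table.begin(), table.end());
        return table;
    }();
    return table;
}

// Returns the byte for a code unit, or 0 if this toy encoding cannot represent it.
template<const DecodeTable& decodeTable>
uint8_t encodeCodeUnit(char16_t codeUnit)
{
    auto& table = tableForEncoding<decodeTable>();
    auto it = std::lower_bound(table.begin(), table.end(), codeUnit, [] (const auto& entry, char16_t value) {
        return entry.first < value;
    });
    return it != table.end() && it->first == codeUnit ? it->second : 0;
}
```

Using `std::array` for the decode table means the template can ask the table for its own size instead of repeating the constant 128, which appears to be the motivation for the `SingleByteDecodeTable` alias in the diff.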
Modified: trunk/Source/WebCore/platform/text/TextCodecSingleByte.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecSingleByte.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecSingleByte.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -29,36 +29,19 @@
namespace WebCore {
-class TextCodecSingleByte : public TextCodec {
+class TextCodecSingleByte final : public TextCodec {
public:
- enum class Encoding : uint8_t {
- Iso_8859_3,
- Iso_8859_6,
- Iso_8859_7,
- Iso_8859_8,
- Windows_874,
- Windows_1253,
- Windows_1255,
- Windows_1257,
- IBM866,
- KOI8U,
- };
+ static void registerEncodingNames(EncodingNameRegistrar);
+ static void registerCodecs(TextCodecRegistrar);
+ enum class Encoding : uint8_t;
explicit TextCodecSingleByte(Encoding);
- static void registerEncodingNames(EncodingNameRegistrar);
- static void registerCodecs(TextCodecRegistrar);
-
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
- using EncodeTable = std::pair<const std::pair<UChar, uint8_t>*, size_t>;
- Vector<uint8_t> encode(const EncodeTable&, StringView, Function<void(UChar32, Vector<uint8_t>&)>&&);
- String decode(const UChar* table, const uint8_t*, size_t length, bool flush, bool stopOnError, bool& sawError);
-
const Encoding m_encoding;
};
} // namespace WebCore
-
Modified: trunk/Source/WebCore/platform/text/TextCodecUTF16.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUTF16.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUTF16.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2004-2019 Apple Inc. All rights reserved.
+ * Copyright (C) 2004-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -73,8 +73,9 @@
StringBuilder result;
result.reserveCapacity(length / 2);
- Function<void(UChar)> processBytesShared;
- processBytesShared = [&] (UChar codeUnit) {
+ auto processCodeUnit = [&] (UChar codeUnit) {
+ if (std::exchange(m_shouldStripByteOrderMark, false) && codeUnit == byteOrderMark)
+ return;
if (m_leadSurrogate) {
auto leadSurrogate = *std::exchange(m_leadSurrogate, WTF::nullopt);
if (U16_IS_TRAIL(codeUnit)) {
@@ -83,8 +84,6 @@
}
sawError = true;
result.append(replacementCharacter);
- processBytesShared(codeUnit);
- return;
}
if (U16_IS_LEAD(codeUnit)) {
m_leadSurrogate = codeUnit;
@@ -98,10 +97,10 @@
result.append(codeUnit);
};
auto processBytesLE = [&] (uint8_t first, uint8_t second) {
- processBytesShared(first | (second << 8));
+ processCodeUnit(first | (second << 8));
};
auto processBytesBE = [&] (uint8_t first, uint8_t second) {
- processBytesShared((first << 8) | second);
+ processCodeUnit((first << 8) | second);
};
if (m_leadByte && p < end) {
@@ -130,18 +129,21 @@
m_leadByte = p[0];
} else
ASSERT(!p || p == end);
-
- if (flush && (m_leadByte || m_leadSurrogate)) {
- m_leadByte = WTF::nullopt;
- m_leadSurrogate = WTF::nullopt;
- sawError = true;
- result.append(replacementCharacter);
+
+ if (flush) {
+ m_shouldStripByteOrderMark = false;
+ if (m_leadByte || m_leadSurrogate) {
+ m_leadByte = WTF::nullopt;
+ m_leadSurrogate = WTF::nullopt;
+ sawError = true;
+ result.append(replacementCharacter);
+ }
}
return result.toString();
}
-Vector<uint8_t> TextCodecUTF16::encode(StringView string, UnencodableHandling)
+Vector<uint8_t> TextCodecUTF16::encode(StringView string, UnencodableHandling) const
{
Vector<uint8_t> result(WTF::checkedProduct<size_t>(string.length(), 2).unsafeGet());
auto* bytes = result.data();
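The TextCodecUTF16.cpp change above moves BOM stripping into the codec: a flag set by `stripByteOrderMark` is consumed with `std::exchange` when the first code unit is processed and is also cleared on flush. The sketch below illustrates that flow for a streaming decoder. `ToyUTF16LEDecoder` is hypothetical, handles little-endian input only, and omits surrogate validation, so it is not a substitute for the real decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical little-endian UTF-16 decoder that only pairs bytes into code
// units; surrogate validation is omitted to keep the sketch short.
class ToyUTF16LEDecoder {
public:
    void stripByteOrderMark() { m_shouldStripByteOrderMark = true; }

    std::u16string decode(const uint8_t* bytes, size_t length, bool flush)
    {
        std::u16string result;
        for (size_t i = 0; i < length; i++) {
            if (!m_leadByte) {
                m_leadByte = bytes[i];
                continue;
            }
            auto codeUnit = static_cast<char16_t>(*std::exchange(m_leadByte, std::nullopt) | (bytes[i] << 8));
            // Only the very first code unit is a BOM candidate; the flag clears either way.
            if (std::exchange(m_shouldStripByteOrderMark, false) && codeUnit == byteOrderMark)
                continue;
            result.push_back(codeUnit);
        }
        if (flush) {
            m_shouldStripByteOrderMark = false;
            if (std::exchange(m_leadByte, std::nullopt))
                result.push_back(replacementCharacter); // Dangling odd byte.
        }
        return result;
    }

private:
    static constexpr char16_t byteOrderMark = 0xFEFF;
    static constexpr char16_t replacementCharacter = 0xFFFD;

    std::optional<uint8_t> m_leadByte;
    bool m_shouldStripByteOrderMark { false };
};
```

Because the pending lead byte lives in the decoder, a BOM split across two chunks is still recognized, which is the kind of buffering the caller no longer has to do once the codec owns BOM handling.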
Modified: trunk/Source/WebCore/platform/text/TextCodecUTF16.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUTF16.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUTF16.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2004-2017 Apple Inc. All rights reserved.
+ * Copyright (C) 2004-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -30,7 +30,7 @@
namespace WebCore {
-class TextCodecUTF16 : public TextCodec {
+class TextCodecUTF16 final : public TextCodec {
public:
static void registerEncodingNames(EncodingNameRegistrar);
static void registerCodecs(TextCodecRegistrar);
@@ -38,12 +38,14 @@
explicit TextCodecUTF16(bool littleEndian);
private:
+ void stripByteOrderMark() final { m_shouldStripByteOrderMark = true; }
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
bool m_littleEndian;
Optional<uint8_t> m_leadByte;
Optional<UChar> m_leadSurrogate;
+ bool m_shouldStripByteOrderMark { false };
};
} // namespace WebCore
Modified: trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2004-2017 Apple Inc. All rights reserved.
+ * Copyright (C) 2004-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -270,10 +270,12 @@
}
m_partialSequenceSize -= count;
+ if (std::exchange(m_shouldStripByteOrderMark, false) && character == byteOrderMark)
+ continue;
destination = appendCharacter(destination, character);
} while (m_partialSequenceSize);
}
-
+
String TextCodecUTF8::decode(const char* bytes, size_t length, bool flush, bool stopOnError, bool& sawError)
{
// Each input byte might turn into a character.
@@ -354,7 +356,10 @@
} while (flush && m_partialSequenceSize);
buffer.shrink(destination - buffer.characters());
-
+ if (flush)
+ m_partialSequenceSize = 0;
+ if (flush || buffer.length())
+ m_shouldStripByteOrderMark = false;
return String::adopt(WTFMove(buffer));
upConvertTo16Bit:
@@ -379,7 +384,7 @@
if (m_partialSequenceSize)
break;
}
-
+
while (source < end) {
if (isASCII(*source)) {
// Fast path for ASCII. Most UTF-8 text will be ASCII.
@@ -424,16 +429,21 @@
continue;
}
source += count;
+ if (character == byteOrderMark && destination16 == buffer16.characters() && std::exchange(m_shouldStripByteOrderMark, false))
+ continue;
destination16 = appendCharacter(destination16, character);
}
} while (flush && m_partialSequenceSize);
-
+
buffer16.shrink(destination16 - buffer16.characters());
-
+ if (flush)
+ m_partialSequenceSize = 0;
+ if (flush || buffer16.length())
+ m_shouldStripByteOrderMark = false;
return String::adopt(WTFMove(buffer16));
}
-Vector<uint8_t> TextCodecUTF8::encode(StringView string, UnencodableHandling)
+Vector<uint8_t> TextCodecUTF8::encodeUTF8(StringView string, UnencodableHandling)
{
// The maximum number of UTF-8 bytes needed per UTF-16 code unit is 3.
// BMP characters take only one UTF-16 code unit and can take up to 3 bytes (3x).
@@ -446,4 +456,9 @@
return bytes;
}
+Vector<uint8_t> TextCodecUTF8::encode(StringView string, UnencodableHandling unencodableHandling) const
+{
+ return encodeUTF8(string, unencodableHandling);
+}
+
} // namespace WebCore
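Note on the TextCodecUTF8.cpp hunks above: the decoder now drops a leading U+FEFF at most once per stripByteOrderMark() request, using std::exchange to read and clear the flag in one step. The following is a minimal sketch of that pattern, not WebKit's code. BOMStrippingDecoder and exampleUsage are hypothetical names, and the sketch assumes the input is already a sequence of code points so that UTF-8 parsing can be left out.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streaming decoder; this is not TextCodecUTF8.
class BOMStrippingDecoder {
public:
    // Ask the decoder to drop one leading U+FEFF from the upcoming output.
    void stripByteOrderMark() { m_shouldStripByteOrderMark = true; }

    // Assumption: input is already decoded to code points.
    std::u32string decode(const std::vector<char32_t>& codePoints, bool flush)
    {
        std::u32string output;
        for (char32_t character : codePoints) {
            // Only a U+FEFF that would be the first output character is dropped.
            // std::exchange returns the old flag value and clears the flag.
            if (character == byteOrderMark && output.empty() && std::exchange(m_shouldStripByteOrderMark, false))
                continue;
            output.push_back(character);
        }
        // After any output, or at end of stream, a later U+FEFF is ordinary content.
        if (flush || !output.empty())
            m_shouldStripByteOrderMark = false;
        return output;
    }

private:
    static constexpr char32_t byteOrderMark = 0xFEFF;
    bool m_shouldStripByteOrderMark { false };
};

// Hypothetical usage: an empty first chunk keeps the request pending.
inline void exampleUsage()
{
    BOMStrippingDecoder decoder;
    decoder.stripByteOrderMark();
    assert(decoder.decode({}, false).empty());
    assert(decoder.decode({ U'\uFEFF', U'x' }, false) == U"x");
}
```

Clearing the flag once any output exists keeps a U+FEFF that appears mid-stream from being mistaken for a byte order mark.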
Modified: trunk/Source/WebCore/platform/text/TextCodecUTF8.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUTF8.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUTF8.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2011-2017 Apple Inc. All rights reserved.
+ * Copyright (C) 2011-2020 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -31,14 +31,17 @@
namespace WebCore {
-class TextCodecUTF8 : public TextCodec {
+class TextCodecUTF8 final : public TextCodec {
public:
static void registerEncodingNames(EncodingNameRegistrar);
static void registerCodecs(TextCodecRegistrar);
+ static Vector<uint8_t> encodeUTF8(StringView, UnencodableHandling);
+
private:
- String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) override;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ void stripByteOrderMark() final { m_shouldStripByteOrderMark = true; }
+ String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
bool handlePartialSequence(LChar*& destination, const uint8_t*& source, const uint8_t* end, bool flush);
void handlePartialSequence(UChar*& destination, const uint8_t*& source, const uint8_t* end, bool flush, bool stopOnError, bool& sawError);
@@ -46,8 +49,7 @@
int m_partialSequenceSize { 0 };
uint8_t m_partialSequence[U8_MAX_LENGTH];
-
+ bool m_shouldStripByteOrderMark { false };
};
} // namespace WebCore
-
Modified: trunk/Source/WebCore/platform/text/TextCodecUserDefined.cpp (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUserDefined.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUserDefined.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -75,7 +75,7 @@
return result;
}
-Vector<uint8_t> TextCodecUserDefined::encode(StringView string, UnencodableHandling handling)
+Vector<uint8_t> TextCodecUserDefined::encode(StringView string, UnencodableHandling handling) const
{
{
Vector<uint8_t> result(string.length());
Modified: trunk/Source/WebCore/platform/text/TextCodecUserDefined.h (266680 => 266681)
--- trunk/Source/WebCore/platform/text/TextCodecUserDefined.h 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/platform/text/TextCodecUserDefined.h 2020-09-06 19:10:11 UTC (rev 266681)
@@ -29,7 +29,7 @@
namespace WebCore {
-class TextCodecUserDefined : public TextCodec {
+class TextCodecUserDefined final : public TextCodec {
public:
static void registerEncodingNames(EncodingNameRegistrar);
static void registerCodecs(TextCodecRegistrar);
@@ -36,7 +36,7 @@
private:
String decode(const char*, size_t length, bool flush, bool stopOnError, bool& sawError) final;
- Vector<uint8_t> encode(StringView, UnencodableHandling) final;
+ Vector<uint8_t> encode(StringView, UnencodableHandling) const final;
};
} // namespace WebCore
Modified: trunk/Source/WebCore/xml/XSLStyleSheetLibxslt.cpp (266680 => 266681)
--- trunk/Source/WebCore/xml/XSLStyleSheetLibxslt.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/xml/XSLStyleSheetLibxslt.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -36,6 +36,7 @@
#include <libxml/uri.h>
#include <libxslt/xsltutils.h>
#include <wtf/CheckedArithmetic.h>
+#include <wtf/unicode/CharacterNames.h>
#if OS(DARWIN) && !PLATFORM(GTK)
#include "SoftLinkLibxslt.h"
@@ -130,8 +131,7 @@
bool XSLStyleSheet::parseString(const String& string)
{
// Parse in a single chunk into an xmlDocPtr
- const UChar BOM = 0xFEFF;
- const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&BOM);
+ const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&byteOrderMark);
clearXSLStylesheetDocument();
PageConsoleClient* console = nullptr;
Modified: trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp (266680 => 266681)
--- trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp 2020-09-06 15:59:35 UTC (rev 266680)
+++ trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp 2020-09-06 19:10:11 UTC (rev 266681)
@@ -59,6 +59,7 @@
#include "XMLNSNames.h"
#include "XMLDocumentParserScope.h"
#include <libxml/parserInternals.h>
+#include <wtf/unicode/CharacterNames.h>
#include <wtf/unicode/UTF8Conversion.h>
#if ENABLE(XSLT)
@@ -410,8 +411,7 @@
// FIXME: Can we just use XML_PARSE_IGNORE_ENC now?
- const UChar BOM = 0xFEFF;
- const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&BOM);
+ const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&byteOrderMark);
xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);
}
@@ -1351,8 +1351,7 @@
#if ENABLE(XSLT)
static inline const char* nativeEndianUTF16Encoding()
{
- const UChar BOM = 0xFEFF;
- const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&BOM);
+ const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&byteOrderMark);
return BOMHighByte == 0xFF ? "UTF-16LE" : "UTF-16BE";
}
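Note on the three libxml-related hunks above: each one reads the first byte in memory of the 16-bit byteOrderMark constant to decide whether native UTF-16 is little or big endian. The sketch below shows the same idea in standalone form. It is an illustration under stated assumptions, not the WebKit code: byteOrderMark is redefined locally, the helper names are hypothetical, and std::memcpy stands in for the reinterpret_cast used in the diff.

```cpp
#include <cassert>
#include <cstring>

// Local stand-in for the constant from <wtf/unicode/CharacterNames.h>.
constexpr char16_t byteOrderMark = 0xFEFF;

// Hypothetical helper: true when the low byte (0xFF) of U+FEFF is stored first,
// which is the case on a little-endian machine.
inline bool firstByteOfBOMIsLowByte()
{
    unsigned char firstByte;
    std::memcpy(&firstByte, &byteOrderMark, 1);
    return firstByte == 0xFF;
}

// Hypothetical helper that maps the result to an encoding name.
inline const char* nativeEndianUTF16EncodingName()
{
    return firstByteOfBOMIsLowByte() ? "UTF-16LE" : "UTF-16BE";
}

// Hypothetical sanity check: the name is always one of the two UTF-16 variants.
inline void checkEncodingName()
{
    const char* name = nativeEndianUTF16EncodingName();
    assert(!std::strcmp(name, "UTF-16LE") || !std::strcmp(name, "UTF-16BE"));
}
```

The variable in the diff is called BOMHighByte, but the byte it holds is simply whichever byte comes first in memory; a value of 0xFF there means the low byte is stored first.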
