date:20240123

[jira] [Comment Edited] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810175#comment-17810175
 ] 

Joe Kesselman edited comment on XALANJ-2725 at 1/24/24 2:44 AM:


Unfortunately, no, the patch doesn't seem to be working for me. For both 
ToXMLStreamTest and ToHTMLStreamTest, the UTF-8 pass reports:
{code:java}
{code}
I should admit that I don't, at first glance, see +why+ it's failing; might be 
time to fire up the debugger and watch it fail.

 

 

Note that we have some annoyingly parallel solutions in this class – 
isHighSurrogate() is tested in five different places under multiple serializing 
loops. One of them, in the *characters* method, actually does admit that 
there's a buffer bounds risk in its look-ahead; that's the one at line 1598 of 
my copy:
{code:java}
else if (Encodings.isHighUTF16Surrogate(ch) && i < end-1 &&
         Encodings.isLowUTF16Surrogate(chars[i+1])) {
{code}
though it doesn't do more than dump the surrogates as numeric character 
references when the boundary is crossed.  The others (two in 
*writeNormalizedCharacters*, one in *accumDefaultEscape*, one in 
*writeAttrString*) appear to blithely assume that if buffer division is 
possible it has been done on Unicode Character, rather than UTF16 unit, 
boundaries. Which is what I would expect to happen in most of our code. But 
mistakes get made, and the serializer APIs can be invoked from code other than 
Xalan, so guarding against this isn't an unreasonable idea.
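
For illustration, the guarded look-ahead can be sketched on its own. This is not the ToStream code: it uses java.lang.Character's surrogate tests rather than the Encodings helpers, and the class, enum, and method names are invented for the example. It only shows how the "i < end-1" test separates a complete pair from a high surrogate stranded at the end of the buffer.

```java
public final class LookAheadGuard {
    /** What the UTF-16 unit at index i turns out to be, given only this buffer. */
    public enum Kind { ORDINARY, COMPLETE_PAIR, HIGH_AT_BUFFER_END, ISOLATED_SURROGATE }

    /**
     * Classifies chars[i], looking ahead one unit only when that unit lies
     * inside [start, start + length). Hypothetical helper, not Xalan API.
     */
    public static Kind classify(char[] chars, int start, int length, int i) {
        final int end = start + length;
        final char ch = chars[i];
        if (Character.isHighSurrogate(ch)) {
            if (i < end - 1) {
                // Safe to look ahead: chars[i + 1] belongs to this call.
                return Character.isLowSurrogate(chars[i + 1])
                        ? Kind.COMPLETE_PAIR
                        : Kind.ISOLATED_SURROGATE;
            }
            // The partner, if there is one, would only arrive in the next call.
            return Kind.HIGH_AT_BUFFER_END;
        }
        // A low surrogate reached here was not consumed as part of a pair.
        return Character.isLowSurrogate(ch) ? Kind.ISOLATED_SURROGATE : Kind.ORDINARY;
    }
}
```

A caller that gets COMPLETE_PAIR would be expected to skip the following low surrogate; HIGH_AT_BUFFER_END is the case the four unguarded loops cannot currently distinguish.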

In fact, that would probably be the right way to test this – write unit-test 
code that exercises ToStream directly at the API level, rather than trying to 
do so from the functional-test level.
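
A rough sketch of what such an API-level check might look like, under stated assumptions: Chunked is a hypothetical stand-in for whatever wraps the serializer's characters(char[], int, int) entry point, and NaiveSink is a deliberately broken toy (not ToStream) included only so the sketch runs without Xalan on the classpath. The idea is simply to feed the same text whole and then split at every possible index, and compare.

```java
import java.util.function.Supplier;

public final class SplitBufferCheck {
    /** Hypothetical stand-in for the serializer entry point under test. */
    public interface Chunked {
        void characters(char[] ch, int start, int length);
        String result();
    }

    /**
     * Feeds the text once whole, then as two chunks split at every index.
     * Returns the first split index whose output differs, or -1 if none does.
     */
    public static int firstDisagreement(Supplier<Chunked> factory, String text) {
        char[] chars = text.toCharArray();
        Chunked whole = factory.get();
        whole.characters(chars, 0, chars.length);
        String expected = whole.result();
        for (int split = 0; split <= chars.length; split++) {
            Chunked parts = factory.get();
            parts.characters(chars, 0, split);
            parts.characters(chars, split, chars.length - split);
            if (!expected.equals(parts.result())) {
                return split;
            }
        }
        return -1;
    }

    /** Toy sink that treats each call in isolation, so a split pair goes wrong. */
    public static final class NaiveSink implements Chunked {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            final int end = start + length;
            for (int i = start; i < end; i++) {
                char c = ch[i];
                if (Character.isHighSurrogate(c) && i < end - 1
                        && Character.isLowSurrogate(ch[i + 1])) {
                    // Complete pair inside this call: emit the real character.
                    out.appendCodePoint(Character.toCodePoint(c, ch[++i]));
                } else if (Character.isSurrogate(c)) {
                    // Unpaired as far as this call can tell: emit a per-unit reference.
                    out.append("&#x").append(Integer.toHexString(c)).append(';');
                } else {
                    out.append(c);
                }
            }
        }

        @Override
        public String result() {
            return out.toString();
        }
    }
}
```

With the toy sink, firstDisagreement(NaiveSink::new, "a\uD83D\uDE00b") reports index 2, the split that lands between the two surrogates; a real test would plug an adapter around the actual serializer in place of NaiveSink.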

Some off-the-cuff code review on your patch while I was glancing at it:

I notice that you clear m_highUTF16Surrogate after a surrogate pair, and flush 
it out before the "fallback plan". The former makes sense. The latter ... Well, 
ill-formed UTF16 isn't supposed to occur, so that combination really shouldn't 
happen. If it does, I'm not sure writing high out as a numeric character 
reference makes sense as anything but an error indication, in which case I'd be 
tempted to write it as _HIGH_SURROGATE_; to make clear that this 
is what's going on.

If we're going to assume that isolated surrogates are possible at all, your 
code risks combining them into a single character that never actually existed, 
since the high-surrogate cache isn't cleared until it's used. That could be 
hard to diagnose. ("Spooky action at a distance"?) I dislike spending the 
cycles, but we might want to make sure the high surrogate doesn't outlast the 
next UTF16 unit even if it isn't a low surrogate.
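
To make that concrete, here is a minimal sketch of a carried-over high surrogate that is cleared unconditionally on the very next unit. It is not the patch and not ToStream: the class and method names are hypothetical, java.lang.Character stands in for the Encodings helpers, and writing an isolated surrogate as a hex character reference is just one arbitrary way to flag the error.

```java
public final class PendingSurrogateEncoder {
    private static final int NONE = -1;

    // High surrogate carried over from the previous unit (possibly the previous call).
    private int pendingHigh = NONE;
    private final StringBuilder out = new StringBuilder();

    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            accept(ch[i]);
        }
    }

    private void accept(char c) {
        if (pendingHigh != NONE) {
            char high = (char) pendingHigh;
            pendingHigh = NONE; // cleared no matter what c is, so it never outlasts one unit
            if (Character.isLowSurrogate(c)) {
                out.appendCodePoint(Character.toCodePoint(high, c));
                return;
            }
            writeIsolated(high); // ill-formed input: flag it now, never combine it later
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;
        } else if (Character.isLowSurrogate(c)) {
            writeIsolated(c);
        } else {
            out.append(c);
        }
    }

    /** Flushes any trailing high surrogate and returns everything written so far. */
    public String finish() {
        if (pendingHigh != NONE) {
            writeIsolated((char) pendingHigh);
            pendingHigh = NONE;
        }
        return out.toString();
    }

    // Arbitrary error marker for the sketch: a per-unit hex character reference.
    private void writeIsolated(char c) {
        out.append("&#x").append(Integer.toHexString(c)).append(';');
    }
}
```

The cost is one extra comparison per unit, which is the "spending the cycles" trade-off; in exchange, a stray high surrogate can no longer pair up with a low surrogate that shows up arbitrarily later.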

Or we can assert that providing correct UTF16 input is the responsibility of 
the users, and sweep the whole issue of isolated surrogates under the carpet.

One more thought: Do we really need to construct a Character to cache a 
surrogate? Couldn't we just stash the numeric value (unsigned short?), with 0 
acting as the "none" case rather than null?  Object churn is generally a Bad 
thing in inner loops.
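
A small sketch of the primitive alternative, with hypothetical names. Java has no unsigned short, but char is its unsigned 16-bit type, and U+0000 is not a surrogate, so 0 can serve as the "none" marker without ambiguity. For comparison, Character.valueOf is only guaranteed to cache '\u0000' through '\u007F', so boxing a surrogate may allocate unless the JIT manages to optimize it away.

```java
public final class HighSurrogateSlot {
    // 0 means "nothing cached"; a real high surrogate is always in D800..DBFF.
    private char pending = 0;

    public boolean isEmpty() {
        return pending == 0;
    }

    /** Caches a high surrogate; rejects anything else so 0 stays unambiguous. */
    public void store(char high) {
        if (!Character.isHighSurrogate(high)) {
            throw new IllegalArgumentException("not a high surrogate");
        }
        pending = high;
    }

    /** Returns the cached unit (0 if none) and clears the slot in one step. */
    public char take() {
        char high = pending;
        pending = 0;
        return high;
    }
}
```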



[jira] [Comment Edited] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810175#comment-17810175
 ] 

Joe Kesselman edited comment on XALANJ-2725 at 1/24/24 2:43 AM:


Unfortunately, no, the patch doesn't seem to be working for me. For both 
ToXMLStreamTest and ToHTMLStreamTest, the UTF-8 pass reports:
{code:java}
{code}
I should admit that I don't, at first glance, see +why+ it's failing; might be 
time to fire up the debugger and watch it fail.

 

 

Note that we have some annoyingly parallel solutions in this class – 
isHighSurrogate() is tested in five different places under multiple serializing 
loops. One of them, in the *characters* method, actually does admit that 
there's a buffer bounds risk in its look-ahead; that's the one at line 1598 of 
my copy:
{code:java}
else if (Encodings.isHighUTF16Surrogate(ch) && i < end-1 &&
         Encodings.isLowUTF16Surrogate(chars[i+1])) {
{code}
though it doesn't do more than dump the surrogates as numeric character 
references when the boundary is crossed.  The others (two in 
*writeNormalizedCharacters*, one in *accumDefaultEscape*, one in 
*writeAttrString*) appear to blithely assume that if buffer division is 
possible, it has been done on Unicode Character, rather than UTF16 unit, 
boundaries. Which is what I would expect to happen in most of our code. But 
mistakes get made, and the serializer APIs can be invoked from code other than 
Xalan, so guarding against this isn't an unreasonable idea.

In fact, that would probably be the right way to test this – write unit-test 
code that exercises ToStream directly, rather than trying to do so from the 
functional-test level.

Some off-the-cuff code review on your patch while I was glancing at it:

I notice that you clear m_highUTF16Surrogate after a surrogate pair, and flush 
it out before the "fallback plan". The former makes sense. The latter ... Well, 
ill-formed UTF16 isn't supposed to occur, so that combination really shouldn't 
happen. If it does, I'm not sure writing high out as a numeric character 
reference makes sense as anything but an error indication, in which case I'd be 
tempted to write it as _HIGH_SURROGATE_; to make clear that this 
is what's going on.

If we're going to assume that isolated surrogates are possible at all, your 
code risks combining them into a single character that never actually existed, 
since the high-surrogate cache isn't cleared until it's used. That could be 
hard to diagnose. ("Spooky action at a distance"?) I dislike spending the 
cycles, but we might want to make sure the high surrogate doesn't outlast the 
next UTF16 unit even if it isn't a low surrogate.

Or we can assert that providing correct UTF16 input is the responsibility of 
the users, and sweep the whole issue of isolated surrogates under the carpet.

One more thought: Do we really need to construct a Character to cache a 
surrogate? Couldn't we just stash the numeric value (unsigned short?), with 0 
acting as the "none" case rather than null?  Object churn is generally a Bad 
thing in inner loops.



[jira] [Commented] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810175#comment-17810175
 ] 

Joe Kesselman commented on XALANJ-2725:
---

Unfortunately, no, the patch doesn't seem to be working for me. For both 
ToXMLStreamTest and ToHTMLStreamTest, the UTF-8 pass reports:


{code:java}
{code}
I should admit that I don't, at first glance, see +why+ it's failing; might 
be time to fire up the debugger and watch it fail.

 

 

Note that we have some annoyingly parallel solutions in this class – 
isHighSurrogate() is tested in five different places under multiple serializing 
loops. One of them, in the *characters* method, actually does admit that 
there's a buffer bounds risk in its look-ahead; that's the one at line 1598 of 
my copy:
{code:java}
else if (Encodings.isHighUTF16Surrogate(ch) && i < end-1 &&
         Encodings.isLowUTF16Surrogate(chars[i+1])) {
{code}
though it doesn't do more than dump the surrogates as numeric character 
references when the boundary is crossed.  The others (two in 
*writeNormalizedCharacters*, one in *accumDefaultEscape*, one in 
*writeAttrString*) appear to blithely assume that if buffer division is 
possible, it has been done on Unicode Character, rather than UTF16 unit, 
boundaries. Which is what I would expect to happen in most of our code. But 
mistakes get made, and the serializer APIs can be invoked from code other than 
Xalan, so guarding against this isn't an unreasonable idea.

In fact, that would probably be the right way to test this – write unit-test 
code that exercises ToStream directly, rather than trying to do so from the 
functional-test level.





Some off-the-cuff code review on your patch while I was glancing at it:

I notice that you clear m_highUTF16Surrogate after a surrogate pair, and flush 
it out before the "fallback plan". The former makes sense. The latter ... Well, 
ill-formed UTF16 isn't supposed to occur, so that combination really shouldn't 
happen. If it does, I'm not sure writing high out as a numeric character 
reference makes sense as anything but an error indication, in which case I'd be 
tempted to write it as _HIGH_SURROGATE_; to make clear that this 
is what's going on.

If we're going to assume that isolated surrogates are possible at all, your 
code risks combining them into a single character that never actually existed, 
since the high-surrogate cache isn't cleared until it's used. That could be 
hard to diagnose. ("Spooky action at a distance"?) I dislike spending the 
cycles, but we might want to make sure the high surrogate doesn't outlast the 
next UTF16 unit even if it isn't a low surrogate.

Or we can assert that providing correct UTF16 input is the responsibility of 
the users, and sweep the whole issue of isolated surrogates under the carpet.


One more thought: Do we really need to construct a Character to cache a 
surrogate? Couldn't we just stash the numeric value (unsigned short?), with 0 
acting as the "none" case rather than null?  Object churn is generally a Bad 
thing in inner loops.

> Possible buffer-boundry issue when serializing surrogate pairs
> --
>
> Key: XALANJ-2725
> URL: https://issues.apache.org/jira/browse/XALANJ-2725
> Project: XalanJ2
>  Issue Type: Improvement
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization
>Reporter: Joe Kesselman
>Assignee: Joe Kesselman
>Priority: Major
>  Labels: Surrogates, escaping, unicode, utf
> Attachments: astral-chars-split-buffer.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a 
> surrogate pair (two UTF-16 units), were not being serialized correctly. We 
> have a proposed fix for that.
> There is reported to still be an edge case when a surrogate pair which 
> crosses buffer boundaries might not be handled correctly. [~maxfortun] 
> offered what looks like a reasonable proposed fix 
> (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
>  but in my testing this was not serializing the surrogate pairs correctly, 
> causing regression on the tests XALANJ-2419 introduced. I don't know whether 
> that's because we're taking multiple paths through
> But the edge case does appear to be real, and if so we will need some such 
> solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail:

[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810125#comment-17810125
 ] 

Joe Kesselman commented on XALANJ-2419:
---

Everything should have been carried forward. That may have been done manually 
and history may have been lost, but I don't believe any actual work has been 
lost.

In any case, Master *IS* where new development is going. So if you can find 
anything which has not been addressed there, please flag it for our attention. 
Just don't assume that the absence of a particular commit, or a particular 
merge, means something is missing. Real-world git histories sometimes get 
messy, especially when operated by real-world humans.

("Begin by assuming a spherical cow...")

> Astral characters written as a pair of NCRs with the surrogate scalar values 
> when using UTF-8
> -
>
> Key: XALANJ-2419
> URL: https://issues.apache.org/jira/browse/XALANJ-2419
> Project: XalanJ2
>  Issue Type: Bug
>  Components: Serialization
>Affects Versions: 2.7.1
>Reporter: Henri Sivonen
>Assignee: Joe Kesselman
>Priority: Major
> Fix For: The Latest Development Code
>
> Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
> else if (m_encodingInfo.isInEncoding(ch)) {
> // If the character is in the encoding, and
> // not in the normal ASCII range, we also
> // just leave it get added on to the clean characters
> 
> }
> else {
> // This is a fallback plan, we should never get here
> // but if the character wasn't previously handled
> // (i.e. isn't in the encoding, etc.) then what
> // should we do?  We choose to write out an entity
> writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>

[jira] [Comment Edited] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810125#comment-17810125
 ] 

Joe Kesselman edited comment on XALANJ-2419 at 1/23/24 10:41 PM:
-

Everything should have been carried forward. That may have been done manually 
and history may have been lost, but I don't believe any actual work has been 
lost.

In any case, Master *IS* where new development is going. So if you can find 
anything which has not been addressed there, please flag it for our attention. 
Just don't assume that the absence of a particular commit, or a particular 
merge, means something is missing. Real-world git histories sometimes get 
messy, especially when operated by real-world humans. And don't let past 
mistakes get in the way of doing what's right in the future.

("Begin by assuming a spherical cow...")




[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Jira



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810119#comment-17810119
 ] 

Cédric Damioli commented on XALANJ-2419:


I totally agree with you in theory, but the fact is that 2.7.2 and 2.7.3 were 
*not* released from master, or am I wrong here?

There is a 2_7_x_maint, but with no commits in 5 years.

I'm afraid we've lost some commits here with the loss of 2_7_1_maint?

Or were all commits on 2_7_1_maint from recent years actually backports from 
master?


[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810117#comment-17810117
 ] 

Joe Kesselman commented on XALANJ-2419:
---

[~cdamioli] : I believe that was due to some mistakes in how the release was 
handled, and the awkward juggling needed to correct those mistakes.

*Master* is always supposed to be our primary development branch. New code may 
be developed on other branches but isn't official until it is merged into 
*Master*.

When a release is made, a tag or fork is created for that release number. Thus, 
there should be branches/tags for *2.7.1*, *2.7.2*, and *2.7.3* (along 
with older checkpoints).

If hot fixes are needed which must be applied to code that has already been 
released (rather than just being included in the next release), we may create 
*maint* branches where the change is back-ported to the earlier versions.  
Essentially *2.7.1.maint* is the "development master" for *2.7.1.1*. This does 
_not_ mean *Master* should be derived from *maint* branches. It does mean that 
if something is fixed in an old release, Master should also be fixed – but due 
to code evolution over time, the fix may not be identical, and *maint* is not 
one of *Master's* dependencies, so that must be done manually.

I believe that what I've just described is standard SCCS "best practice". It's 
certainly how we managed Xalan (mumble) years ago before I dropped out of it.

 


[jira] [Resolved] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



 [ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Kesselman resolved XALANJ-2419.
---
Fix Version/s: The Latest Development Code
   Resolution: Fixed

Fixed on head. Haven't yet incremented semantic version to 2.7.3.1. Should?


[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Jira



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810114#comment-17810114
 ] 

Cédric Damioli commented on XALANJ-2419:


I may be wrong here, but I think 2.7.2 and 2.7.3 were not released from master 
but from some maintenance branch.

Something like 2_7_1_maint, IIRC.

By the way, I can't find that branch anymore in the GitHub repo. Do you know 
where it is?


Re: [PR] Make sure the new stream tests gate apitest success or failure [xalan-test]

2024-01-23 Thread via GitHub



jkesselm merged PR #9:
URL: https://github.com/apache/xalan-test/pull/9


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org

[PR] Make sure the new stream tests gate apitest success or failure [xalan-test]

2024-01-23 Thread via GitHub



jkesselm opened a new pull request, #9:
URL: https://github.com/apache/xalan-test/pull/9

   Notes to myself:
   
   Apparently we're relying on an explicit list of expected-good tests in 
build.xml, looking for the Pass-testname files.
   
   There is currently one known fail in SmoketestOuttakes (which is why it's an 
outtake), which I believe is also reflected in Harness failing. Should sanity 
check that it's something we're aware of and have either accepted divergence on 
or opened a work item for.
   
   Also note that the performance tests (TimeDTM*) are deliberately considered 
ambiguous, since we don't have anything calculating "reasonable" time limits 
for a platform, and indeed that's nontrivial to do. Performance testing is 
usually conducted manually/locally.



[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810111#comment-17810111
 ] 

Joe Kesselman commented on XALANJ-2419:
---

Merged into Master. Obviously if there seems to be a regression vs. 2.7.3, 
please let me know.


Re: [PR] Xalanj 2419 -- Issues serializing astral characters [xalan-test]

2024-01-23 Thread via GitHub



jkesselm merged PR #8:
URL: https://github.com/apache/xalan-test/pull/8



Re: [PR] XALANJ-2419: Erroneous serialization of astral characters [xalan-java]

2024-01-23 Thread via GitHub



jkesselm merged PR #163:
URL: https://github.com/apache/xalan-java/pull/163



[jira] [Commented] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Max (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810065#comment-17810065
 ] 

Max commented on XALANJ-2725:
-

[~kesh...@alum.mit.edu] ,
I added a patch I tried with your PR. See if your regression tests pass now. 
Need to figure out a good test case for the split buffer.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org

[jira] [Updated] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Max (Jira)



 [ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max updated XALANJ-2725:

Attachment: astral-chars-split-buffer.patch





Re: [PR] improving xpath 3.1 function fn:deep-equal's implementation, by adding support for collation argument. adding a new working related test case as well. committing new xercesj implementation ja

2024-01-23 Thread via GitHub



mukulga merged PR #164:
URL: https://github.com/apache/xalan-java/pull/164



[PR] improving xpath 3.1 function fn:deep-equal's implementation, by adding support for collation argument. adding a new working related test case as well. committing new xercesj implementation jar as

2024-01-23 Thread via GitHub



mukulga opened a new pull request, #164:
URL: https://github.com/apache/xalan-java/pull/164

   (no comment)



[PR] XALANJ-2419: Erroneous serialization of astral characters [xalan-java]

2024-01-23 Thread via GitHub



jkesselm opened a new pull request, #163:
URL: https://github.com/apache/xalan-java/pull/163

   See also https://github.com/apache/xalan-test/compare/master...XALANJ-2419
   
   Note that there is a remaining edge case, 
https://issues.apache.org/jira/browse/XALANJ-2725, but this PR is a definite 
step forward.
   
   



[PR] Xalanj 2419 -- Issues serializing astral characters [xalan-test]

2024-01-23 Thread via GitHub



jkesselm opened a new pull request, #8:
URL: https://github.com/apache/xalan-test/pull/8

   See also https://github.com/apache/xalan-java/compare/master...XALANJ-2419



[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810047#comment-17810047
 ] 

Joe Kesselman commented on XALANJ-2419:
---

Opened https://issues.apache.org/jira/browse/XALANJ-2725 for [~maxfortun] 's 
issue.

> Astral characters written as a pair of NCRs with the surrogate scalar values 
> when using UTF-8
> -
>
> Key: XALANJ-2419
> URL: https://issues.apache.org/jira/browse/XALANJ-2419
> Project: XalanJ2
> Issue Type: Bug
> Components: Serialization
> Affects Versions: 2.7.1
> Reporter: Henri Sivonen
> Assignee: Joe Kesselman
> Priority: Major
> Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do?  We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>
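
To make the symptom in the issue title concrete, here is a small self-contained sketch (not taken from ToStream; the class name AstralNcr and its methods are hypothetical) that contrasts one numeric character reference per UTF-16 unit with a single reference for the combined code point. For U+1F600 the first form gives `&#55357;&#56832;`, which is not well-formed XML because character references may not name surrogate code points, while the second gives `&#128512;`.

```java
/** Hypothetical illustration of the two ways an astral character can be escaped. */
final class AstralNcr {

    /** The behaviour the issue title describes: one NCR per UTF-16 unit. */
    static String perUnit(char high, char low) {
        return "&#" + (int) high + ";&#" + (int) low + ";";
    }

    /** One NCR for the code point that the surrogate pair encodes. */
    static String perCodePoint(char high, char low) {
        return "&#" + Character.toCodePoint(high, low) + ";";
    }
}
```

When the output encoding is UTF-8, the character can presumably be written directly with no reference at all; the single-reference form matters mainly for encodings that cannot represent it.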

[jira] [Created] (XALANJ-2725) Possible buffer-boundry issue when serializing surrogate pairs

2024-01-23 Thread Joe Kesselman (Jira)

Joe Kesselman created XALANJ-2725:
-

 Summary: Possible buffer-boundry issue when serializing surrogate 
pairs
 Key: XALANJ-2725
 URL: https://issues.apache.org/jira/browse/XALANJ-2725
 Project: XalanJ2
  Issue Type: Improvement
  Security Level: No security risk; visible to anyone (Ordinary problems in 
Xalan projects.  Anybody can view the issue.)
  Components: Serialization
Reporter: Joe Kesselman
Assignee: Joe Kesselman


XALANJ-2419 addressed a case where "astral" Unicode characters, which require a 
surrogate pair (two UTF-16 code units), were not being serialized correctly. We 
have a proposed fix for that.

An edge case reportedly remains: a surrogate pair that crosses a buffer boundary 
might not be handled correctly. [~maxfortun] offered what looks like a 
reasonable proposed fix 
(https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
 but in my testing it did not serialize the surrogate pairs correctly, causing 
regressions in the tests that XALANJ-2419 introduced. I don't know whether 
that's because we're taking multiple paths through

But the edge case does appear to be real, and if so we will need some such 
solution.
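
As a rough illustration of the general idea only (this is neither the attached patch nor the actual ToStream code), a handler can remember a trailing high surrogate between calls and complete the pair when the next buffer arrives. The class SurrogateCarry, its pendingHigh field, and the choice to substitute U+FFFD for unpaired surrogates are all assumptions made for this sketch; a real serializer would encode or escape the character rather than append it to a StringBuilder.

```java
/** Hypothetical sketch: carry a pending high surrogate across characters() calls. */
final class SurrogateCarry {
    private char pendingHigh; // 0 means no high surrogate is waiting for its partner
    private final StringBuilder out = new StringBuilder();

    void characters(char[] chars, int start, int length) {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char ch = chars[i];
            if (pendingHigh != 0) {
                char high = pendingHigh;
                pendingHigh = 0;
                if (Character.isLowSurrogate(ch)) {
                    // Pair completed, possibly across a buffer boundary.
                    out.appendCodePoint(Character.toCodePoint(high, ch));
                    continue;
                }
                out.append('\uFFFD'); // unpaired high surrogate (assumed policy)
                // ch itself is still handled by the code below
            }
            if (Character.isHighSurrogate(ch)) {
                pendingHigh = ch; // may be completed in this call or the next one
            } else if (Character.isLowSurrogate(ch)) {
                out.append('\uFFFD'); // low surrogate with no preceding high
            } else {
                out.append(ch);
            }
        }
    }

    /** Flushes any dangling high surrogate and returns the accumulated text. */
    String finish() {
        if (pendingHigh != 0) {
            out.append('\uFFFD');
            pendingHigh = 0;
        }
        return out.toString();
    }
}
```

The point of keeping the state in one place is that every serializing loop could share it, rather than each loop doing its own look-ahead within the current buffer.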




[jira] [Comment Edited] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Max (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810043#comment-17810043
 ] 

Max edited comment on XALANJ-2419 at 1/23/24 5:08 PM:
--

[~kesh...@alum.mit.edu] , thank you for working on this. As you suggested, why 
don't you merge what works, and I can try to help you work on the split-buffer 
issue afterwards, on known-good code?

 


was (Author: maxfortun):
[~kesh...@alum.mit.edu] , thank you for working on this. As you suggested, why 
don't you merge what works and I can try to help you work on the split buffer 
issue after on a good code?

 


[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Max (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810043#comment-17810043
 ] 

Max commented on XALANJ-2419:
-

[~kesh...@alum.mit.edu] , thank you for working on this. As you suggested, why 
don't you merge what works and I can try to help you work on the split buffer 
issue after on a good code?

 


[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810039#comment-17810039
 ] 

Joe Kesselman commented on XALANJ-2419:
---

Still having trouble with Max's suggestion. Current recommendation: Merge what 
we've got and open a new work item for that concern.


[jira] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ https://issues.apache.org/jira/browse/XALANJ-2419 ]


Joe Kesselman deleted comment on XALANJ-2419:
---

was (Author: JIRAUSER285361):
Max's alternative does cause a regression in some of the new tests, assuming I 
applied it correctly. Surprising. Can take a longer look, but may want to merge 
what we have first since it *is* an improvement over the previous code.




 

 


[jira] [Commented] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

2024-01-23 Thread Joe Kesselman (Jira)



[ 
https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810029#comment-17810029
 ] 

Joe Kesselman commented on XALANJ-2419:
---

Max's alternative does cause a regression in some of the new tests, assuming I 
applied it correctly. Surprising. Can take a longer look, but may want to merge 
what we have first since it *is* an improvement over the previous code.




 

 


