[jira] [Commented] (GEOMETRY-99) Inconsistent equals and hashCode

2020-07-04 Thread Matt Juntunen (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151450#comment-17151450
 ] 

Matt Juntunen commented on GEOMETRY-99:
---

Fixed in fc9272b2366d6b5944e1c1a9ecf568ff89f95a30

> Inconsistent equals and hashCode
> 
>
> Key: GEOMETRY-99
> URL: https://issues.apache.org/jira/browse/GEOMETRY-99
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: beta1
>
> The following classes have inconsistent behavior between {{equals}} and 
> {{hashCode}} in regard to +0.0 and -0.0:
>  * EpsilonDoublePrecisionContext
>  * Vector1D
>  * Vector2D
>  * Vector2D
>  * SphericalCoordinates
>  * PolarCoordinates
>  * AffineTransformMatrix2D
>  * AffineTransformMatrix3D
>  * AxisAngleSequence
> This inconsistency comes from the use of == or {{Precision.equals}} to 
> compare doubles in the {{equals}} method (which consider +0.0 and -0.0 to be 
> equal) and {{Double.hashCode}} in the {{hashCode}} method (which returns 
> different hash codes for +0.0 and -0.0). Hence, instances could be considered 
> equal but have different hash codes.
> We should standardize on the use of {{Double.compare}} in {{equals}} and 
> {{Double.hashCode}} in {{hashCode}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (GEOMETRY-99) Inconsistent equals and hashCode

2020-07-04 Thread Matt Juntunen (Jira)


 [ 
https://issues.apache.org/jira/browse/GEOMETRY-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Juntunen resolved GEOMETRY-99.
---
Resolution: Fixed

> Inconsistent equals and hashCode
> 
>
> Key: GEOMETRY-99
> URL: https://issues.apache.org/jira/browse/GEOMETRY-99
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: beta1
>
> The following classes have inconsistent behavior between {{equals}} and 
> {{hashCode}} in regard to +0.0 and -0.0:
>  * EpsilonDoublePrecisionContext
>  * Vector1D
>  * Vector2D
>  * Vector2D
>  * SphericalCoordinates
>  * PolarCoordinates
>  * AffineTransformMatrix2D
>  * AffineTransformMatrix3D
>  * AxisAngleSequence
> This inconsistency comes from the use of == or {{Precision.equals}} to 
> compare doubles in the {{equals}} method (which consider +0.0 and -0.0 to be 
> equal) and {{Double.hashCode}} in the {{hashCode}} method (which returns 
> different hash codes for +0.0 and -0.0). Hence, instances could be considered 
> equal but have different hash codes.
> We should standardize on the use of {{Double.compare}} in {{equals}} and 
> {{Double.hashCode}} in {{hashCode}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-geometry] darkma773r closed pull request #94: Windows Build Fixes

2020-07-04 Thread GitBox


darkma773r closed pull request #94:
URL: https://github.com/apache/commons-geometry/pull/94


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #94: Windows Build Fixes

2020-07-04 Thread GitBox


darkma773r commented on pull request #94:
URL: https://github.com/apache/commons-geometry/pull/94#issuecomment-653830224


   Merged in afde7adad1d9e7b893d37a8278812aaff6590313



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] asfgit merged pull request #93: GEOMETRY-99: Making equals and hashCode consistent

2020-07-04 Thread GitBox


asfgit merged pull request #93:
URL: https://github.com/apache/commons-geometry/pull/93


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r opened a new pull request #94: Windows Build Fixes

2020-07-04 Thread GitBox


darkma773r opened a new pull request #94:
URL: https://github.com/apache/commons-geometry/pull/94


   - fixing checkstyle suppression on windows
   - adjusting unit test value for more consistent result sorting



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (TEXT-178) StringSubstitutor incorrectly removes some escape characters

2020-07-04 Thread Gary D. Gregory (Jira)


 [ 
https://issues.apache.org/jira/browse/TEXT-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary D. Gregory updated TEXT-178:
-
Summary: StringSubstitutor incorrectly removes some escape characters  
(was: StringSubstitutor incorrectly removes escape characters)

> StringSubstitutor incorrectly removes some escape characters
> 
>
> Key: TEXT-178
> URL: https://issues.apache.org/jira/browse/TEXT-178
> Project: Commons Text
>  Issue Type: Bug
>Reporter: Gary D. Gregory
>Priority: Major
>
> The class {{StringSubstitutor}} incorrectly removes escape characters.
> Note that these tests pass:
> {code:java}
> /**
>  * Tests interpolation with weird boundary patterns.
>  */
> @Test
> public void testReplaceWeirdPattens() throws IOException {
> doTestNoReplace(StringUtils.EMPTY);
> doTestNoReplace(EMPTY_EXPR);
> doTestNoReplace("${ }");
> doTestNoReplace("${\t}");
> doTestNoReplace("${\n}");
> doTestNoReplace("${\b}");
> doTestNoReplace("${");
> doTestNoReplace("$}");
> doTestNoReplace("$$}");
> doTestNoReplace("}");
> doTestNoReplace("${}$");
> doTestNoReplace("${}$$");
> doTestNoReplace("${${");
> doTestNoReplace("${${}}");
> doTestNoReplace("${$${}}");
> doTestNoReplace("${$$${}}");
> doTestNoReplace("${$$${$}}");
> doTestNoReplace("${${}}");
> doTestNoReplace("${${ }}");
> }
> {code}
> But these tests fail:
> {code:java}
> /**
>  * Tests interpolation with weird boundary patterns.
>  */
> @Test
> @Disabled
> public void testReplaceWeirdPattensJira() throws IOException {
>  doTestNoReplace("$${");
>  doTestNoReplace("$${a");
>  doTestNoReplace("$$${");
>  doTestNoReplace("$$${a");
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (TEXT-178) StringSubstitutor incorrectly removes escape characters

2020-07-04 Thread Gary D. Gregory (Jira)
Gary D. Gregory created TEXT-178:


 Summary: StringSubstitutor incorrectly removes escape characters
 Key: TEXT-178
 URL: https://issues.apache.org/jira/browse/TEXT-178
 Project: Commons Text
  Issue Type: Bug
Reporter: Gary D. Gregory


The class {{StringSubstitutor}} incorrectly removes escape characters.

Note that these tests pass:
{code:java}
/**
 * Tests interpolation with weird boundary patterns.
 */
@Test
public void testReplaceWeirdPattens() throws IOException {
doTestNoReplace(StringUtils.EMPTY);
doTestNoReplace(EMPTY_EXPR);
doTestNoReplace("${ }");
doTestNoReplace("${\t}");
doTestNoReplace("${\n}");
doTestNoReplace("${\b}");
doTestNoReplace("${");
doTestNoReplace("$}");
doTestNoReplace("$$}");
doTestNoReplace("}");
doTestNoReplace("${}$");
doTestNoReplace("${}$$");
doTestNoReplace("${${");
doTestNoReplace("${${}}");
doTestNoReplace("${$${}}");
doTestNoReplace("${$$${}}");
doTestNoReplace("${$$${$}}");
doTestNoReplace("${${}}");
doTestNoReplace("${${ }}");
}
{code}

But these tests fail:

{code:java}
/**
 * Tests interpolation with weird boundary patterns.
 */
@Test
@Disabled
public void testReplaceWeirdPattensJira() throws IOException {
 doTestNoReplace("$${");
 doTestNoReplace("$${a");
 doTestNoReplace("$$${");
 doTestNoReplace("$$${a");
}
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-geometry] darkma773r opened a new pull request #93: GEOMETRY-99: Making equals and hashCode consistent

2020-07-04 Thread GitBox


darkma773r opened a new pull request #93:
URL: https://github.com/apache/commons-geometry/pull/93


   Using Double.compare in equals and Double.hashCode in hashCode



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (GEOMETRY-99) Inconsistent equals and hashCode

2020-07-04 Thread Matt Juntunen (Jira)


 [ 
https://issues.apache.org/jira/browse/GEOMETRY-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Juntunen updated GEOMETRY-99:
--
Labels: beta1  (was: )

> Inconsistent equals and hashCode
> 
>
> Key: GEOMETRY-99
> URL: https://issues.apache.org/jira/browse/GEOMETRY-99
> Project: Apache Commons Geometry
>  Issue Type: Bug
>Reporter: Matt Juntunen
>Priority: Major
>  Labels: beta1
>
> The following classes have inconsistent behavior between {{equals}} and 
> {{hashCode}} in regard to +0.0 and -0.0:
>  * EpsilonDoublePrecisionContext
>  * Vector1D
>  * Vector2D
>  * Vector2D
>  * SphericalCoordinates
>  * PolarCoordinates
>  * AffineTransformMatrix2D
>  * AffineTransformMatrix3D
>  * AxisAngleSequence
> This inconsistency comes from the use of == or {{Precision.equals}} to 
> compare doubles in the {{equals}} method (which consider +0.0 and -0.0 to be 
> equal) and {{Double.hashCode}} in the {{hashCode}} method (which returns 
> different hash codes for +0.0 and -0.0). Hence, instances could be considered 
> equal but have different hash codes.
> We should standardize on the use of {{Double.compare}} in {{equals}} and 
> {{Double.hashCode}} in {{hashCode}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEOMETRY-99) Inconsistent equals and hashCode

2020-07-04 Thread Matt Juntunen (Jira)
Matt Juntunen created GEOMETRY-99:
-

 Summary: Inconsistent equals and hashCode
 Key: GEOMETRY-99
 URL: https://issues.apache.org/jira/browse/GEOMETRY-99
 Project: Apache Commons Geometry
  Issue Type: Bug
Reporter: Matt Juntunen


The following classes have inconsistent behavior between {{equals}} and 
{{hashCode}} in regard to +0.0 and -0.0:
 * EpsilonDoublePrecisionContext
 * Vector1D
 * Vector2D
 * Vector2D
 * SphericalCoordinates
 * PolarCoordinates
 * AffineTransformMatrix2D
 * AffineTransformMatrix3D
 * AxisAngleSequence

This inconsistency comes from the use of == or {{Precision.equals}} to compare 
doubles in the {{equals}} method (which consider +0.0 and -0.0 to be equal) and 
{{Double.hashCode}} in the {{hashCode}} method (which returns different hash 
codes for +0.0 and -0.0). Hence, instances could be considered equal but have 
different hash codes.

We should standardize on the use of {{Double.compare}} in {{equals}} and 
{{Double.hashCode}} in {{hashCode}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEOMETRY-95) CSG Examples

2020-07-04 Thread Matt Juntunen (Jira)


[ 
https://issues.apache.org/jira/browse/GEOMETRY-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151412#comment-17151412
 ] 

Matt Juntunen commented on GEOMETRY-95:
---

I've added a tutorial for BSP trees in general that will be in beta1. However, 
a full 3D CSG tutorial will wait for another release.

> CSG Examples
> 
>
> Key: GEOMETRY-95
> URL: https://issues.apache.org/jira/browse/GEOMETRY-95
> Project: Apache Commons Geometry
>  Issue Type: New Feature
>Reporter: Matt Juntunen
>Priority: Major
>
> Adding Constructive Solid Geometry examples and userguide entries to help new 
> users to the library use these features.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEOMETRY-95) CSG Examples

2020-07-04 Thread Matt Juntunen (Jira)


 [ 
https://issues.apache.org/jira/browse/GEOMETRY-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Juntunen updated GEOMETRY-95:
--
Labels:   (was: beta1)

> CSG Examples
> 
>
> Key: GEOMETRY-95
> URL: https://issues.apache.org/jira/browse/GEOMETRY-95
> Project: Apache Commons Geometry
>  Issue Type: New Feature
>Reporter: Matt Juntunen
>Priority: Major
>
> Adding Constructive Solid Geometry examples and userguide entries to help new 
> users to the library use these features.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [commons-geometry] darkma773r commented on pull request #87: suggest use enhanced for when we can

2020-07-04 Thread GitBox


darkma773r commented on pull request #87:
URL: https://github.com/apache/commons-geometry/pull/87#issuecomment-653757469


   Change implemented in 385c02a9b2cc924ebdaacc8ba02f93b76cac31cc



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #87: suggest use enhanced for when we can

2020-07-04 Thread GitBox


darkma773r closed pull request #87:
URL: https://github.com/apache/commons-geometry/pull/87


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #91: Change a class name from Error to Exception as it is an Exception not an Error

2020-07-04 Thread GitBox


darkma773r closed pull request #91:
URL: https://github.com/apache/commons-geometry/pull/91


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #91: Change a class name from Error to Exception as it is an Exception not an Error

2020-07-04 Thread GitBox


darkma773r commented on pull request #91:
URL: https://github.com/apache/commons-geometry/pull/91#issuecomment-653756123


   Merged in commit 3c25e07d19e0d7c36073713c6a60d15adf4a5ecf



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #90: simplify if conditions

2020-07-04 Thread GitBox


darkma773r commented on pull request #90:
URL: https://github.com/apache/commons-geometry/pull/90#issuecomment-653756077


   Merged in commit c2f929e2b182d070caf6ed3f605380d09cd1765b



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #90: simplify if conditions

2020-07-04 Thread GitBox


darkma773r closed pull request #90:
URL: https://github.com/apache/commons-geometry/pull/90


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #89: add break to a loop.

2020-07-04 Thread GitBox


darkma773r commented on pull request #89:
URL: https://github.com/apache/commons-geometry/pull/89#issuecomment-653756063


   Merged in commit b06c457b98ff09f529c25fe9fb3f6e282320ad54



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #89: add break to a loop.

2020-07-04 Thread GitBox


darkma773r closed pull request #89:
URL: https://github.com/apache/commons-geometry/pull/89


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #88: change a field to local variable, as I didn't see anywhere using it.

2020-07-04 Thread GitBox


darkma773r closed pull request #88:
URL: https://github.com/apache/commons-geometry/pull/88


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #88: change a field to local variable, as I didn't see anywhere using it.

2020-07-04 Thread GitBox


darkma773r commented on pull request #88:
URL: https://github.com/apache/commons-geometry/pull/88#issuecomment-653756040


   Merged in commit 6c90e34ff11fb9fa279d9b060abf70c14ce3cd2a



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #86: Fix readme in submodule

2020-07-04 Thread GitBox


darkma773r closed pull request #86:
URL: https://github.com/apache/commons-geometry/pull/86


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #86: Fix readme in submodule

2020-07-04 Thread GitBox


darkma773r commented on pull request #86:
URL: https://github.com/apache/commons-geometry/pull/86#issuecomment-653755976


   Merged in commit bcfed9b1e161b70542f0356fb189a36f5eaea062



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #85: upgrade dependencies

2020-07-04 Thread GitBox


darkma773r closed pull request #85:
URL: https://github.com/apache/commons-geometry/pull/85


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #85: upgrade dependencies

2020-07-04 Thread GitBox


darkma773r commented on pull request #85:
URL: https://github.com/apache/commons-geometry/pull/85#issuecomment-653755919


   Merged in commit 735ebe6b14f87b7fb0985249b81069ad8dee4840



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r commented on pull request #84: fix javadocs and typos.

2020-07-04 Thread GitBox


darkma773r commented on pull request #84:
URL: https://github.com/apache/commons-geometry/pull/84#issuecomment-653755842


   Merged in commit 6bba60a617c720753021b25a53db6925bfe05189



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-geometry] darkma773r closed pull request #84: fix javadocs and typos.

2020-07-04 Thread GitBox


darkma773r closed pull request #84:
URL: https://github.com/apache/commons-geometry/pull/84


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151287#comment-17151287
 ] 

Stefan Bodewig commented on COMPRESS-539:
-

I'm sorry we crossed with our actions mid-flight [~peterlee] . I'm afraid I 
have to ask you to revert your change to {{IOUtils.skip}}.

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Assignee: Peter Lee
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151285#comment-17151285
 ] 

Stefan Bodewig commented on COMPRESS-539:
-

COMPRESS-449 rather than 443, sorry.

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Assignee: Peter Lee
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Stefan Bodewig (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151283#comment-17151283
 ] 

Stefan Bodewig commented on COMPRESS-539:
-

not calling skip may reduce the amount of memory consumed but will slow things 
down considerably as skip is usually far more efficient than read. In your case 
I'd recommend wrapping your input stream into a {{SkipShieldingInputStream}} 
instead, which will have the same effect as commenting out the skip calls.

Re-using the buffer in {{TarArchiveInputStream}} sounds right. No idea why this 
is different from TarArchiveOutputStream where it seems we've used a shared 
buffer since forever.

[~peterlee] the team has had this discussion around COMPRESS-443 - a while 
before you joined - and back then explicitly decided we wanted to keep using 
{{skip}} for performance reasons.

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Assignee: Peter Lee
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151284#comment-17151284
 ] 

Peter Lee commented on COMPRESS-539:


Hey [~rschimpf] . Please have a look at commit 
101137bfa0f3ae709c2a2771368b190ceb899ea0 if it helps.

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Assignee: Peter Lee
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Peter Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Lee updated COMPRESS-539:
---
Assignee: Peter Lee

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Assignee: Peter Lee
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151216#comment-17151216
 ] 

Peter Lee edited comment on COMPRESS-539 at 7/4/20, 8:01 AM:
-

IIRC the IOUtils.skip was originally copied from Apache Commons IO(not a maven 
dependency to keep Compress as simple as we want it).

I just checked the IOUtils in Apache Commons IO and found out the skip method 
looks like this:
{code:java}
/**
 * Skips characters from an input character stream.
 * This implementation guarantees that it will read as many characters
 * as possible before giving up; this may not always be the case for
 * skip() implementations in subclasses of {@link Reader}.
 * 
 * Note that the implementation uses {@link Reader#read(char[], int, int)} 
rather
 * than delegating to {@link Reader#skip(long)}.
 * This means that the method may be considerably less efficient than using the 
actual skip implementation,
 * this is done to guarantee that the correct number of characters are skipped.
 * 
 *
 * @param input character stream to skip
 * @param toSkip number of characters to skip.
 * @return number of characters actually skipped.
 * @throws IOException  if there is a problem reading the file
 * @throws IllegalArgumentException if toSkip is negative
 * @see Reader#skip(long)
 * @see https://issues.apache.org/jira/browse/IO-203;>IO-203 - Add 
skipFully() method for InputStreams
 * @since 2.0
 */
public static long skip(final Reader input, final long toSkip) throws 
IOException {
if (toSkip < 0) {
throw new IllegalArgumentException("Skip count must be non-negative, 
actual: " + toSkip);
}
/*
 * N.B. no need to synchronize this because: - we don't care if the buffer 
is created multiple times (the data
 * is ignored) - we always use the same size buffer, so if it it is 
recreated it will still be OK (if the buffer
 * size were variable, we would need to synch. to ensure some other thread 
did not create a smaller one)
 */
if (SKIP_CHAR_BUFFER == null) {
SKIP_CHAR_BUFFER = new char[SKIP_BUFFER_SIZE];
}
long remain = toSkip;
while (remain > 0) {
// See https://issues.apache.org/jira/browse/IO-203 for why we use 
read() rather than delegating to skip()
final long n = input.read(SKIP_CHAR_BUFFER, 0, (int) Math.min(remain, 
SKIP_BUFFER_SIZE));
if (n < 0) { // EOF
break;
}
remain -= n;
}
return toSkip - remain;
}
{code}
Not calling inputstream.skip as you did in your patch. I believe this 
implemention would get the same memory benefit as your patch did.

-BTW seems the IOUtils methods in Commons Compress are somehow out of date - 
maybe we can update them using reference from Commons IO.-

Update : checked all the methods in IOUtils of Compress and it seems that only 
the skip method need to be updated.


was (Author: peterlee):
IIRC the IOUtils.skip was originally copied from Apache Commons IO(not a maven 
dependency to keep Compress as simple as we want it).

I just checked the IOUtils in Apache Commons IO and found out the skip method 
looks like this:
{code:java}
/**
 * Skips characters from an input character stream.
 * This implementation guarantees that it will read as many characters
 * as possible before giving up; this may not always be the case for
 * skip() implementations in subclasses of {@link Reader}.
 * 
 * Note that the implementation uses {@link Reader#read(char[], int, int)} 
rather
 * than delegating to {@link Reader#skip(long)}.
 * This means that the method may be considerably less efficient than using the 
actual skip implementation,
 * this is done to guarantee that the correct number of characters are skipped.
 * 
 *
 * @param input character stream to skip
 * @param toSkip number of characters to skip.
 * @return number of characters actually skipped.
 * @throws IOException  if there is a problem reading the file
 * @throws IllegalArgumentException if toSkip is negative
 * @see Reader#skip(long)
 * @see https://issues.apache.org/jira/browse/IO-203;>IO-203 - Add 
skipFully() method for InputStreams
 * @since 2.0
 */
public static long skip(final Reader input, final long toSkip) throws 
IOException {
if (toSkip < 0) {
throw new IllegalArgumentException("Skip count must be non-negative, 
actual: " + toSkip);
}
/*
 * N.B. no need to synchronize this because: - we don't care if the buffer 
is created multiple times (the data
 * is ignored) - we always use the same size buffer, so if it it is 
recreated it will still be OK (if the buffer
 * size were variable, we would need to synch. to ensure some other thread 
did not create a smaller one)
 */
if (SKIP_CHAR_BUFFER == null) {
SKIP_CHAR_BUFFER = new char[SKIP_BUFFER_SIZE];
}
long remain = toSkip;
while (remain > 0) {
// See 

[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151216#comment-17151216
 ] 

Peter Lee commented on COMPRESS-539:


IIRC the IOUtils.skip was originally copied from Apache Commons IO(not a maven 
dependency to keep Compress as simple as we want it).

I just checked the IOUtils in Apache Commons IO and found out the skip method 
looks like this:
{code:java}
/**
 * Skips characters from an input character stream.
 * This implementation guarantees that it will read as many characters
 * as possible before giving up; this may not always be the case for
 * skip() implementations in subclasses of {@link Reader}.
 * 
 * Note that the implementation uses {@link Reader#read(char[], int, int)} 
rather
 * than delegating to {@link Reader#skip(long)}.
 * This means that the method may be considerably less efficient than using the 
actual skip implementation,
 * this is done to guarantee that the correct number of characters are skipped.
 * 
 *
 * @param input character stream to skip
 * @param toSkip number of characters to skip.
 * @return number of characters actually skipped.
 * @throws IOException  if there is a problem reading the file
 * @throws IllegalArgumentException if toSkip is negative
 * @see Reader#skip(long)
 * @see https://issues.apache.org/jira/browse/IO-203;>IO-203 - Add 
skipFully() method for InputStreams
 * @since 2.0
 */
public static long skip(final Reader input, final long toSkip) throws 
IOException {
if (toSkip < 0) {
throw new IllegalArgumentException("Skip count must be non-negative, 
actual: " + toSkip);
}
/*
 * N.B. no need to synchronize this because: - we don't care if the buffer 
is created multiple times (the data
 * is ignored) - we always use the same size buffer, so if it it is 
recreated it will still be OK (if the buffer
 * size were variable, we would need to synch. to ensure some other thread 
did not create a smaller one)
 */
if (SKIP_CHAR_BUFFER == null) {
SKIP_CHAR_BUFFER = new char[SKIP_BUFFER_SIZE];
}
long remain = toSkip;
while (remain > 0) {
// See https://issues.apache.org/jira/browse/IO-203 for why we use 
read() rather than delegating to skip()
final long n = input.read(SKIP_CHAR_BUFFER, 0, (int) Math.min(remain, 
SKIP_BUFFER_SIZE));
if (n < 0) { // EOF
break;
}
remain -= n;
}
return toSkip - remain;
}
{code}
Not calling inputstream.skip as you did in your patch. I believe this 
implemention would get the same memory benefit as your patch did.

BTW seems the IOUtils methods in Commons Compress are somehow out of date - 
maybe we can update them using reference from Commons IO.

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer 

[jira] [Commented] (COMPRESS-539) TarArchiveInputStream allocates a lot of memory when iterating through an archive

2020-07-04 Thread Peter Lee (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151207#comment-17151207
 ] 

Peter Lee commented on COMPRESS-539:


Looks great. Would you create a github PR so the tests could be applied?

> TarArchiveInputStream allocates a lot of memory when iterating through an 
> archive
> -
>
> Key: COMPRESS-539
> URL: https://issues.apache.org/jira/browse/COMPRESS-539
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: Don't_call_InputStream#skip.patch, 
> Reuse_recordBuffer.patch, image-2020-06-21-10-58-07-917.png, 
> image-2020-06-21-10-58-43-255.png, image-2020-06-21-10-59-10-825.png
>
>
>  I iterated through the linux source tar and noticed some unneeded 
> allocations happen without extracting any data.
> Reproducing code
> {code:java}
> File tarFile = new File("linux-5.7.1.tar");
> try (TarArchiveInputStream in = new 
> TarArchiveInputStream(Files.newInputStream(tarFile.toPath( {
> TarArchiveEntry entry;
> while ((entry = in.getNextTarEntry()) != null) {
> }
> }
> {code}
> The measurement was done on Java 11.0.7 with the Java Flight Recorder. 
> Options used: 
> -XX:StartFlightRecording=settings=profile,filename=allocations.jfr
> Baseline with the current master implementation:
>  Estimated TLAB allocation: 293MiB
> !image-2020-06-21-10-58-07-917.png!
> 1. IOUtils.skip -> input.skip(numToSkip)
>  This delegates in my test scenario to the InputStream.skip implementation 
> which allocates a new byte[] for every invocation. By simply commenting out 
> the while loop which calls the skip method the estimated TLAB allocation 
> drops to 164MiB (-129MiB).
>  !image-2020-06-21-10-58-43-255.png! 
>  Commenting out the skip call does not seem to be the best solution but it 
> was quick for me to see how much memory can be saved. Also no unit tests 
> where failing for me.
> 2. TarArchiveInputStream.readRecord
>  For every read of the record a new byte[] is created. Since the record size 
> does not change the byte[] can be reused and created when instantiating the 
> TarStream. This optimization is already present in the 
> TarArchiveOutputStream. Reusing the buffer reduces the estimated TLAB 
> allocations further to 128MiB (-36MiB).
>  !image-2020-06-21-10-59-10-825.png!
> I attached the patches I used so the results can be verified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)