Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
Aggarwal-Raghav commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2124978152

@abstractdog, can you please help with the review?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@tez.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2077742024

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 :ok: | reexec | 14m 22s | Docker mode activated. |
||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 15m 10s | master passed |
| +1 :green_heart: | compile | 0m 20s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | compile | 0m 20s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | checkstyle | 1m 8s | master passed |
| +1 :green_heart: | javadoc | 0m 30s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 15s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +0 :ok: | spotbugs | 1m 4s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 2s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 11s | the patch passed |
| +1 :green_heart: | compile | 0m 12s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javac | 0m 12s | the patch passed |
| +1 :green_heart: | compile | 0m 10s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | javac | 0m 10s | the patch passed |
| -0 :warning: | checkstyle | 0m 5s | tez-plugins/tez-protobuf-history-plugin: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 7s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 7s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | findbugs | 0m 27s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 0m 27s | tez-protobuf-history-plugin in the patch passed. |
| +1 :green_heart: | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 35m 49s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/334 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 012dcf99c519 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / b5b622614 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/diff-checkstyle-tez-plugins_tez-protobuf-history-plugin.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/testReport/ |
| Max. process+thread count | 107 (vs. ulimit of 5500) |
| modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
Aggarwal-Raghav commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2077678494

> CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?

I have added a basic unit test that checks that cin.resetSizeCounter() is called.
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
abstractdog commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2073251105

CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?
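One way to picture the check requested above: if the counter is reset before each read, then after the second read the total-bytes-read value should cover only the second record, not both. The sketch below uses only the JDK. `CountingInputStream`, `readRecord`, and the 4-byte length prefix are assumptions made for illustration; they are not `ProtoMessageWritable`, the protobuf API, or the test that was added to the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ResetCounterCheck {

  /** Hypothetical wrapper that only mimics a "total bytes read" counter. */
  static final class CountingInputStream extends FilterInputStream {
    private long totalBytesRead;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      int b = super.read();
      if (b >= 0) {
        totalBytesRead++;
      }
      return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
      int n = super.read(buf, off, len);
      if (n > 0) {
        totalBytesRead += n;
      }
      return n;
    }

    void resetSizeCounter() {
      totalBytesRead = 0;
    }

    long getTotalBytesRead() {
      return totalBytesRead;
    }
  }

  /** Reads one length-prefixed record, resetting the counter first (the behaviour being checked). */
  static byte[] readRecord(CountingInputStream cin) throws IOException {
    cin.resetSizeCounter();
    DataInputStream din = new DataInputStream(cin);
    byte[] payload = new byte[din.readInt()];
    din.readFully(payload);
    return payload;
  }

  /** Writes two records, reads both back, and returns the counter value after the second read. */
  public static long counterAfterSecondRead(int firstSize, int secondSize) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int size : new int[] {firstSize, secondSize}) {
        out.writeInt(size);        // assumed framing: 4-byte length prefix
        out.write(new byte[size]); // zero-filled payload
      }
      CountingInputStream cin =
          new CountingInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      readRecord(cin);
      readRecord(cin);
      return cin.getTotalBytesRead();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Only the second record is counted: 4-byte prefix + 40-byte payload.
    System.out.println(counterAfterSecondRead(100, 40)); // prints 44
  }
}
```

Without the reset in `readRecord`, the same call would report 148 (both prefixes plus both payloads), which is the difference a test of this shape can detect.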
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
abstractdog commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1929754065

> @abstractdog @harishjp. Can you please help get this in tez 0.10.3?

Thanks @Aggarwal-Raghav for the patch, let me check soon.

I'm really sorry, but tez 0.10.3 rc1 is currently being released, so we cannot add this.
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
Aggarwal-Raghav commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1929745499

@abstractdog @harishjp. Can you please help get this in tez 0.10.3?
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1926923425

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 :ok: | reexec | 0m 15s | Docker mode activated. |
||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 17m 26s | master passed |
| +1 :green_heart: | compile | 0m 29s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | compile | 0m 28s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | checkstyle | 1m 17s | master passed |
| +1 :green_heart: | javadoc | 0m 35s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 22s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +0 :ok: | spotbugs | 1m 13s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 11s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 17s | the patch passed |
| +1 :green_heart: | compile | 0m 17s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javac | 0m 17s | the patch passed |
| +1 :green_heart: | compile | 0m 16s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | javac | 0m 16s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 8s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 8s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 9s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | findbugs | 0m 38s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 0m 32s | tez-protobuf-history-plugin in the patch passed. |
| +1 :green_heart: | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 25m 42s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/334 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 006060d13f5e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 5e1cdee75 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/testReport/ |
| Max. process+thread count | 105 (vs. ulimit of 5500) |
| modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
Aggarwal-Raghav commented on code in PR #334:
URL: https://github.com/apache/tez/pull/334#discussion_r1478118024

## tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java: ##
@@ -96,6 +96,9 @@ public void readFields(DataInput in) throws IOException {
       cin = CodedInputStream.newInstance(din);
       cin.setSizeLimit(Integer.MAX_VALUE);
     }
+    if (din.in != in) {
+      cin.resetSizeCounter();
+    }

Review Comment:
Thanks for the review, @zabetak. **I missed this Javadoc statement.** I suspected that resetting _totalBytesRetired_ after every message read might have an unexpected impact, so I reset it only after every HDFS split read. Based on the Javadoc, though, I think we can reset the counter after every message read. I will modify the patch. Thanks.
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
zabetak commented on code in PR #334:
URL: https://github.com/apache/tez/pull/334#discussion_r1477969135

## tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java: ##
@@ -96,6 +96,9 @@ public void readFields(DataInput in) throws IOException {
       cin = CodedInputStream.newInstance(din);
       cin.setSizeLimit(Integer.MAX_VALUE);
     }
+    if (din.in != in) {
+      cin.resetSizeCounter();
+    }

Review Comment:
The javadoc of `CodedInputStream#setSizeLimit` says the following:
```
If you want to read several messages from a single CodedInputStream, you could call resetSizeCounter() after each one to avoid hitting the size limit.
```
Based on that, I would be inclined to reset the counter after every single message; otherwise, it still seems feasible to hit the same error if the `DataInput` is sufficiently large.
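To illustrate the point in the comment above: the size limit applies to a running byte total, so many small messages read through one stream can trip it even though no single message is large. The following is a minimal sketch under stated assumptions. `LimitedReader`, `readAll`, and the 10-byte limit are hypothetical stand-ins chosen for illustration (the patch uses `Integer.MAX_VALUE`, roughly 2 GB); this is a simplified model, not the protobuf `CodedInputStream` implementation or the Tez patch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SizeLimitSketch {

  /** Hypothetical reader with a cumulative size limit; a simplified model only. */
  static final class LimitedReader {
    private final DataInputStream in;
    private final int sizeLimit;
    private int totalBytesRead; // grows across messages until it is reset

    LimitedReader(InputStream in, int sizeLimit) {
      this.in = new DataInputStream(in);
      this.sizeLimit = sizeLimit;
    }

    void resetSizeCounter() {
      totalBytesRead = 0;
    }

    /** Reads one message of the given size, failing if the running total would pass the limit. */
    byte[] readMessage(int size) throws IOException {
      if ((long) totalBytesRead + size > sizeLimit) {
        throw new IOException("size limit exceeded");
      }
      byte[] buf = new byte[size];
      in.readFully(buf);
      totalBytesRead += size;
      return buf;
    }
  }

  /** Tries to read `count` messages of `size` bytes; returns how many were read before a failure. */
  public static int readAll(int count, int size, int limit, boolean resetPerMessage) {
    LimitedReader reader =
        new LimitedReader(new ByteArrayInputStream(new byte[count * size]), limit);
    int read = 0;
    try {
      for (int i = 0; i < count; i++) {
        if (resetPerMessage) {
          reader.resetSizeCounter(); // the per-message reset suggested in the review
        }
        reader.readMessage(size);
        read++;
      }
    } catch (IOException e) {
      // The cumulative limit was hit; stop and report how far we got.
    }
    return read;
  }

  public static void main(String[] args) {
    System.out.println(readAll(5, 4, 10, false)); // prints 2: the third message would push the total to 12
    System.out.println(readAll(5, 4, 10, true));  // prints 5: each message is checked against the limit alone
  }
}
```

In this model, resetting only when the underlying input changes would behave like the first call whenever a single input carries more bytes than the limit, which is the concern raised in the review comment.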
Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]
tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1925287394

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 :ok: | reexec | 22m 4s | Docker mode activated. |
||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 17m 29s | master passed |
| +1 :green_heart: | compile | 0m 30s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | compile | 0m 28s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | checkstyle | 1m 17s | master passed |
| +1 :green_heart: | javadoc | 0m 36s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 22s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +0 :ok: | spotbugs | 1m 16s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 15s | master passed |
||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 17s | the patch passed |
| +1 :green_heart: | compile | 0m 17s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javac | 0m 17s | the patch passed |
| +1 :green_heart: | compile | 0m 15s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | javac | 0m 15s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 9s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 8s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 8s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 :green_heart: | findbugs | 0m 37s | the patch passed |
||| _ Other Tests _ |
| +1 :green_heart: | unit | 0m 32s | tez-protobuf-history-plugin in the patch passed. |
| +1 :green_heart: | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 47m 39s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/334 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux a99fc39f95c1 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 5e1cdee75 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/testReport/ |
| Max. process+thread count | 105 (vs. ulimit of 5500) |
| modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.