Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-04-25 Thread via GitHub


tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2077742024

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  14m 22s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 10s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   0m 20s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   1m  4s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m  2s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 11s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 12s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javac  |   0m 12s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 10s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  javac  |   0m 10s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  5s |  tez-plugins/tez-protobuf-history-plugin: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m  7s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m  7s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   0m 27s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 27s |  tez-protobuf-history-plugin in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 14s |  The patch does not generate ASF License warnings.  |
   |  |   |  35m 49s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/334 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 012dcf99c519 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / b5b622614 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/diff-checkstyle-tez-plugins_tez-protobuf-history-plugin.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/testReport/ |
   | Max. process+thread count | 107 (vs. ulimit of 5500) |
   | modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@tez.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-04-25 Thread via GitHub


Aggarwal-Raghav commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2077678494

   > CodedInputStream.totalBytesRetired can be easily checked by 
CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads 
at least twice with ProtoMessageWritable and validates that 
cin.resetSizeCounter() was indeed called?
   
   I have added a basic unit test that checks that cin.resetSizeCounter() is called.





Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-04-23 Thread via GitHub


abstractdog commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-2073251105

   CodedInputStream.totalBytesRetired can be easily checked by 
CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads 
at least twice with ProtoMessageWritable and validates that 
cin.resetSizeCounter() was indeed called?





Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-06 Thread via GitHub


abstractdog commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1929754065

   > @abstractdog @harishjp. Can you please help get this in tez 0.10.3?
   
   Thanks @Aggarwal-Raghav for the patch, let me check soon.
   I'm really sorry, but Tez 0.10.3 RC1 is currently being released, so we 
cannot add this.





Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-06 Thread via GitHub


Aggarwal-Raghav commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1929745499

   @abstractdog @harishjp. Can you please help get this in tez 0.10.3?





Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-05 Thread via GitHub


tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1926923425

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 15s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  17m 26s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  compile  |   0m 28s |  master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +0 :ok: |  spotbugs  |   1m 13s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m 11s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  javac  |   0m 16s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  8s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m  8s |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javadoc  |   0m  9s |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 38s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 32s |  tez-protobuf-history-plugin in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 16s |  The patch does not generate ASF License warnings.  |
   |  |   |  25m 42s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/334 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 006060d13f5e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 5e1cdee75 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/testReport/ |
   | Max. process+thread count | 105 (vs. ulimit of 5500) |
   | modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/2/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-05 Thread via GitHub


Aggarwal-Raghav commented on code in PR #334:
URL: https://github.com/apache/tez/pull/334#discussion_r1478118024


##
tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java:
##
@@ -96,6 +96,9 @@ public void readFields(DataInput in) throws IOException {
   cin = CodedInputStream.newInstance(din);
   cin.setSizeLimit(Integer.MAX_VALUE);
 }
+if (din.in != in) {
+  cin.resetSizeCounter();
+}

Review Comment:
   Thanks for the review @zabetak.
   **I missed this Javadoc statement.** I suspected that resetting 
_totalBytesRetired_ after every message read might have an unexpected impact, 
so I reset it after every HDFS split read instead. Based on the Javadoc, I 
think we can reset the counter after every message read. I will modify the 
patch.
   
   Thanks.






Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-05 Thread via GitHub


zabetak commented on code in PR #334:
URL: https://github.com/apache/tez/pull/334#discussion_r1477969135


##
tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java:
##
@@ -96,6 +96,9 @@ public void readFields(DataInput in) throws IOException {
   cin = CodedInputStream.newInstance(din);
   cin.setSizeLimit(Integer.MAX_VALUE);
 }
+if (din.in != in) {
+  cin.resetSizeCounter();
+}

Review Comment:
   The javadoc of `CodedInputStream#setSizeLimit` says the following:
   ```
   If you want to read several messages from a single CodedInputStream, you 
could call resetSizeCounter() after each one to avoid hitting the size limit.
   ```
   Based on that, I would be inclined to reset the counter after every single 
message; otherwise it still seems feasible to hit the same error if the 
`DataInput` is sufficiently large.
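
The difference between the two reset strategies can be illustrated with a small, self-contained model. This is only a sketch: `CountingReader` and `readAll` are hypothetical stand-ins written for this illustration, not protobuf's `CodedInputStream` and not the code in the patch, and the limit of 100 bytes is an arbitrary stand-in for `Integer.MAX_VALUE`. It assumes the counter simply accumulates the bytes consumed since the last reset.

```java
/** Simplified model of a cumulative size limit; NOT protobuf's implementation. */
public class SizeCounterSketch {

    /** Hypothetical reader: counts bytes consumed since the last reset. */
    static final class CountingReader {
        private final long sizeLimit;
        private long totalBytesRead;

        CountingReader(long sizeLimit) {
            this.sizeLimit = sizeLimit;
        }

        /** Consumes one message and fails once the running total would pass the limit. */
        void readMessage(int messageSize) {
            if (totalBytesRead + messageSize > sizeLimit) {
                throw new IllegalStateException(
                    "size limit exceeded after " + totalBytesRead + " bytes");
            }
            totalBytesRead += messageSize;
        }

        void resetSizeCounter() {
            totalBytesRead = 0;
        }
    }

    /**
     * Reads messagesPerSplit messages from each split with one shared reader.
     * Returns true if every message was read without hitting the limit.
     */
    static boolean readAll(int splits, int messagesPerSplit, int messageSize,
                           long limit, boolean resetPerMessage) {
        CountingReader reader = new CountingReader(limit);
        try {
            for (int s = 0; s < splits; s++) {
                if (!resetPerMessage) {
                    reader.resetSizeCounter(); // reset only when the split changes
                }
                for (int m = 0; m < messagesPerSplit; m++) {
                    reader.readMessage(messageSize);
                    if (resetPerMessage) {
                        reader.resetSizeCounter(); // reset after every message
                    }
                }
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Each split holds 5 messages of 30 bytes, i.e. 150 bytes against a limit of 100.
        System.out.println(readAll(2, 5, 30, 100, false)); // prints "false"
        System.out.println(readAll(2, 5, 30, 100, true));  // prints "true"
    }
}
```

In this model, resetting only at split boundaries still fails as soon as a single split carries more bytes than the limit, while resetting after each message only fails if one message alone exceeds it.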






Re: [PR] TEZ-4540: Reading proto data more than 2GB from multiple splits fails [tez]

2024-02-03 Thread via GitHub


tez-yetus commented on PR #334:
URL: https://github.com/apache/tez/pull/334#issuecomment-1925287394

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  22m  4s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  17m 29s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  compile  |   0m 28s |  master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +0 :ok: |  spotbugs  |   1m 16s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m 15s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 15s |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  javac  |   0m 15s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  9s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m  8s |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04  |
   | +1 :green_heart: |  javadoc  |   0m  8s |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 37s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 32s |  tez-protobuf-history-plugin in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 17s |  The patch does not generate ASF License warnings.  |
   |  |   |  47m 39s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/334 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux a99fc39f95c1 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 5e1cdee75 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/testReport/ |
   | Max. process+thread count | 105 (vs. ulimit of 5500) |
   | modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/1/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   

