Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-12 Thread via GitHub
Lobo2008 commented on code in PR #2607: URL: https://github.com/apache/uniffle/pull/2607#discussion_r2343531709 ## client-mr/core/src/main/java/org/apache/hadoop/mapred/RssMapOutputCollector.java: ## @@ -78,12 +78,31 @@ public void init(Context context) throws IOException, Clas

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-12 Thread via GitHub
zhengchenyu commented on code in PR #2607: URL: https://github.com/apache/uniffle/pull/2607#discussion_r2343508582 ## client-mr/core/src/main/java/org/apache/hadoop/mapred/RssMapOutputCollector.java: ## @@ -78,12 +78,31 @@ public void init(Context context) throws IOException, C

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-12 Thread via GitHub
Lobo2008 commented on code in PR #2607: URL: https://github.com/apache/uniffle/pull/2607#discussion_r2343491893 ## client-mr/core/src/main/java/org/apache/hadoop/mapred/RssMapOutputCollector.java: ## @@ -78,12 +78,31 @@ public void init(Context context) throws IOException, Clas

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-12 Thread via GitHub
Lobo2008 commented on code in PR #2607: URL: https://github.com/apache/uniffle/pull/2607#discussion_r2343327242 ## client-mr/core/src/main/java/org/apache/hadoop/mapred/RssMapOutputCollector.java: ## @@ -78,12 +78,60 @@ public void init(Context context) throws IOException, Clas

Re: [PR] [#2591] feat(client): Introduce the mechanism to report localfile read plan [uniffle]

2025-09-11 Thread via GitHub
zuston commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3268622023 This is ready to review @jerqi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-11 Thread via GitHub
Lobo2008 opened a new pull request, #2607: URL: https://github.com/apache/uniffle/pull/2607 ### What changes were proposed in this pull request? Introduce a configuration `mapreduce.rss.client.combiner.enable` to control whether the map-stage combiner runs in Uniffle MapReduce client.
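
A minimal sketch of how such a switch could gate the combiner, assuming a plain string map stands in for the job configuration. Only the key `mapreduce.rss.client.combiner.enable` comes from the PR description; the class name, the `combinerEnabled` helper, and the fallback parameter are hypothetical, and the snippet above does not state the real default.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinerSwitchSketch {
  // Configuration key quoted in the PR description.
  static final String COMBINER_ENABLE_KEY = "mapreduce.rss.client.combiner.enable";

  // Hypothetical helper: the fallback is a parameter because the
  // archived snippet does not say what the default value is.
  static boolean combinerEnabled(Map<String, String> conf, boolean fallback) {
    String raw = conf.get(COMBINER_ENABLE_KEY);
    return raw == null ? fallback : Boolean.parseBoolean(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(COMBINER_ENABLE_KEY, "false");
    if (combinerEnabled(conf, true)) {
      System.out.println("run the map-stage combiner before sending output");
    } else {
      System.out.println("skip the map-stage combiner");
    }
  }
}
```

Passing the fallback explicitly keeps the sketch neutral about whether the combiner is on or off when the key is unset.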

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-11 Thread via GitHub
github-actions[bot] commented on PR #2607: URL: https://github.com/apache/uniffle/pull/2607#issuecomment-3283493368 ## Test Results  3 108 files   - 12   3 108 suites   - 12   6h 44m 38s ⏱️ - 4m 49s  1 201 tests ± 0   1 199 ✅ ± 0   1 💤 ±0  1 ❌ +1  15 199 runs   - 12  15 183 ✅  - 12  

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-11 Thread via GitHub
zhengchenyu commented on code in PR #2607: URL: https://github.com/apache/uniffle/pull/2607#discussion_r2342814141 ## client-mr/core/src/main/java/org/apache/hadoop/mapred/RssMapOutputCollector.java: ## @@ -78,12 +78,60 @@ public void init(Context context) throws IOException, C

Re: [PR] [#2606] feat(client-mr): Add safety switch for map-stage combiner [uniffle]

2025-09-11 Thread via GitHub
zuston commented on PR #2607: URL: https://github.com/apache/uniffle/pull/2607#issuecomment-3283425820 Could you help review this? @zhengchenyu

Re: [PR] [#2595] feat(client): Dedicated retry times on request assignment when partition reassign [uniffle]

2025-09-11 Thread via GitHub
github-actions[bot] commented on PR #2608: URL: https://github.com/apache/uniffle/pull/2608#issuecomment-3281984387 ## Test Results   886 files   - 2 234    886 suites   - 2 234   20m 44s ⏱️ - 6h 28m 43s   410 tests  -   791    410 ✅  -   789  0 💤  -  1  0 ❌ ±0  6 136 runs   - 9 075 

[PR] [#2595] feat(client): Dedicated retry times on request assignment when partition reassign [uniffle]

2025-09-11 Thread via GitHub
cchung100m opened a new pull request, #2608: URL: https://github.com/apache/uniffle/pull/2608 ### What changes were proposed in this pull request? Dedicated retry times on request assignment when partition reassign ### Why are the changes needed? for https://github.com/apache/uni

[I] [Bug] [MR] Enable Map-Stage Combiner by Default Causes Severe GC and Job Failures [uniffle]

2025-09-11 Thread via GitHub
Lobo2008 opened a new issue, #2606: URL: https://github.com/apache/uniffle/issues/2606 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in t

Re: [PR] [#2591] fix(client): Missing task_id propagation in getLocalShuffleDataV3 [uniffle]

2025-09-11 Thread via GitHub
zuston merged PR #2605: URL: https://github.com/apache/uniffle/pull/2605

Re: [PR] [#2591] fix(client): Incorrect header length for getLocalShuffleDataV3 [uniffle]

2025-09-11 Thread via GitHub
zuston merged PR #2604: URL: https://github.com/apache/uniffle/pull/2604

Re: [PR] [#2591] fix(client): Incorrect header length for getLocalShuffleDataV3 [uniffle]

2025-09-11 Thread via GitHub
zuston commented on PR #2604: URL: https://github.com/apache/uniffle/pull/2604#issuecomment-3278658225 > Could you add a UT for this fix? This is the followup PR for #2603 . Due to the missing server side implementation, it's hard to add end-to-end tests. But I think I will implement

Re: [PR] [#2591] fix(client): Missing task_id propagation in getLocalShuffleDataV3 [uniffle]

2025-09-10 Thread via GitHub
github-actions[bot] commented on PR #2605: URL: https://github.com/apache/uniffle/pull/2605#issuecomment-3274561869 ## Test Results  3 120 files  ±0   3 120 suites  ±0   6h 51m 8s ⏱️ +7s  1 201 tests ±0   1 200 ✅ ±0   1 💤 ±0  0 ❌ ±0  15 211 runs  ±0  15 196 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Re: [PR] [#2591] fix(client): Incorrect header length for getLocalShuffleDataV3 [uniffle]

2025-09-09 Thread via GitHub
jerqi commented on PR #2604: URL: https://github.com/apache/uniffle/pull/2604#issuecomment-3273543948 Could you add a UT for this fix?

Re: [PR] [#2591] fix(client): Incorrect header length for getLocalShuffleDataV3 [uniffle]

2025-09-09 Thread via GitHub
github-actions[bot] commented on PR #2604: URL: https://github.com/apache/uniffle/pull/2604#issuecomment-3273312971 ## Test Results  3 120 files  ±0   3 120 suites  ±0   6h 49m 51s ⏱️ - 1m 10s  1 201 tests ±0   1 200 ✅ ±0   1 💤 ±0  0 ❌ ±0  15 211 runs  ±0  15 196 ✅ ±0  15 💤 ±0  0 ❌ ±

[PR] [#2591] fix(client): Incorrect header length for getLocalShuffleDataV3 [uniffle]

2025-09-09 Thread via GitHub
zuston opened a new pull request, #2604: URL: https://github.com/apache/uniffle/pull/2604 ### What changes were proposed in this pull request? Fix incorrect header length for getLocalShuffleDataV3 ### Why are the changes needed? This will make getLocalShuffleDataV3 invali

Re: [I] [FEATURE] Propogate read localfiles plan to shuffle server to acheive better read ahead performance [uniffle]

2025-09-09 Thread via GitHub
zuston closed issue #2591: [FEATURE] Propogate read localfiles plan to shuffle server to acheive better read ahead performance URL: https://github.com/apache/uniffle/issues/2591

Re: [PR] [#2591] feat(client): Introduce the mechanism to report localfile read plan [uniffle]

2025-09-09 Thread via GitHub
zuston merged PR #2603: URL: https://github.com/apache/uniffle/pull/2603

Re: [I] [FEATURE] Dedicated retry times on request assignment when partition reassign [uniffle]

2025-09-08 Thread via GitHub
zuston commented on issue #2595: URL: https://github.com/apache/uniffle/issues/2595#issuecomment-3268623468 > Hi [@zuston](https://github.com/zuston) > > Do we need to add a new configuration option for a retry logic with backoff and assign it to `GrpcClient`? > > https://gith

Re: [PR] [#2591] feat(client): Introduce the mechanism to report localfile read plan [uniffle]

2025-09-08 Thread via GitHub
zuston closed pull request #2603: [#2591] feat(client): Introduce the mechanism to report localfile read plan URL: https://github.com/apache/uniffle/pull/2603

Re: [PR] [#2591] feat(client): Introduce the mechanism to report localfile read plan [uniffle]

2025-09-08 Thread via GitHub
zuston commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3265957922 cc @jerqi

Re: [PR] [#2601] feat(spark): Overlapping decompression for shuffle read [uniffle]

2025-09-07 Thread via GitHub
zuston merged PR #2602: URL: https://github.com/apache/uniffle/pull/2602

Re: [I] [FEATURE] Overlapping decompression for shuffle read [uniffle]

2025-09-07 Thread via GitHub
zuston closed issue #2601: [FEATURE] Overlapping decompression for shuffle read URL: https://github.com/apache/uniffle/issues/2601

Re: [PR] [#2591] feat(client): Introduce a mechanism to report localfile read plan before reading [uniffle]

2025-09-07 Thread via GitHub
jerqi commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3257618533 Is it possible to merge this request to other requests?

Re: [PR] [#2591] feat(client): Introduce a mechanism to report localfile read plan before reading [uniffle]

2025-09-07 Thread via GitHub
zuston commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3257613401 cc @jerqi

Re: [PR] [#2569] feat(spark): Add statistic of shuffle read times [uniffle]

2025-09-07 Thread via GitHub
zuston merged PR #2598: URL: https://github.com/apache/uniffle/pull/2598

Re: [PR] [#2591] feat(client): Introduce a mechanism to report localfile read plan before reading [uniffle]

2025-09-06 Thread via GitHub
github-actions[bot] commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3257034208 ## Test Results  3 090 files  ±0   3 090 suites  ±0   6h 47m 19s ⏱️ - 2m 10s  1 198 tests ±0   1 196 ✅ ±0   1 💤 ±0  0 ❌ ±0  1 🔥 ±0  15 166 runs  ±0  15 150 ✅ +1  15 💤 ±

Re: [I] [FEATURE] Propogate read localfiles plan to shuffle server to acheive better read ahead performance [uniffle]

2025-09-06 Thread via GitHub
zuston commented on issue #2591: URL: https://github.com/apache/uniffle/issues/2591#issuecomment-3256891441 I will implement a new grpc interface to support this. cc @jerqi

Re: [I] [FEATURE] Dedicated retry times on request assignment when partition reassign [uniffle]

2025-09-05 Thread via GitHub
cchung100m commented on issue #2595: URL: https://github.com/apache/uniffle/issues/2595#issuecomment-3261380458 Hi @zuston Do we need to add a new configuration option for a retry logic with backoff and assign it to `GrpcClient`? https://github.com/apache/uniffle/blob/1e48bc6
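
The retry-with-backoff idea raised in this comment could look roughly like the sketch below. It is an illustration only: `callWithRetry`, `maxRetries`, and `baseBackoffMs` are hypothetical names, not Uniffle's `GrpcClient` API or any existing configuration option.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoffSketch {
  // Hypothetical helper: runs the call once, then retries up to maxRetries
  // more times, sleeping base, 2x base, 4x base, ... between attempts.
  static <T> T callWithRetry(Callable<T> call, int maxRetries, long baseBackoffMs)
      throws Exception {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxRetries) {
          // Cap the shift so the delay cannot overflow.
          Thread.sleep(baseBackoffMs << Math.min(attempt, 20));
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("assignment not ready");
      }
      return "assigned";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Keeping the retry count as its own parameter is what would let the reassignment path use a different budget from other requests.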

Re: [PR] [#2599] fix(spark): Fix bug the incorrect shuffle read metric for spark [uniffle]

2025-09-05 Thread via GitHub
cchung100m commented on code in PR #2600: URL: https://github.com/apache/uniffle/pull/2600#discussion_r2326565975 ## client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java: ## @@ -117,17 +117,20 @@ public boolean hasNext() { // If Sh

Re: [PR] [#2591] feat(client): Introduce a mechanism to report localfile read plan before reading [uniffle]

2025-09-05 Thread via GitHub
zuston commented on PR #2603: URL: https://github.com/apache/uniffle/pull/2603#issuecomment-3257687273 > Is it possible to merge this request to other requests? Is it better to add the read ahead information to the read request? read A and read ahead B,C,D. Sounds good. Let me try to u

[PR] [#2591] feat(client): Introduce a mechanism to report localfile read plan before reading [uniffle]

2025-09-04 Thread via GitHub
zuston opened a new pull request, #2603: URL: https://github.com/apache/uniffle/pull/2603 ### What changes were proposed in this pull request? This PR is to introduce a mechanism to report localfile read plan before real reading, and the changes only are scoped in the client side. Mor

Re: [I] [FEATURE] Propogate read localfiles plan to shuffle server to acheive better read ahead performance [uniffle]

2025-09-04 Thread via GitHub
zuston commented on issue #2591: URL: https://github.com/apache/uniffle/issues/2591#issuecomment-3256922460 And this feature could use the grpc since that is not heavy data transfer. The proto definition will be like this: ``` message ReportLocalReadPlanRequest { string appId

[PR] [#2601] feat(spark): Overlapping decompression for shuffle read [uniffle]

2025-09-04 Thread via GitHub
zuston opened a new pull request, #2602: URL: https://github.com/apache/uniffle/pull/2602 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patc

Re: [PR] [#2601] feat(spark): Overlapping decompression for shuffle read [uniffle]

2025-09-04 Thread via GitHub
github-actions[bot] commented on PR #2602: URL: https://github.com/apache/uniffle/pull/2602#issuecomment-3253019982 ## Test Results  2 985 files   - 105   2 985 suites   - 105   5h 37m 54s ⏱️ - 1h 11m 35s  1 068 tests  - 130   1 066 ✅  - 130   1 💤 ±0  1 ❌ +1  14 920 runs   - 246  14 

Re: [PR] [#2601] feat(spark): Overlapping decompression for shuffle read [uniffle]

2025-09-04 Thread via GitHub
jerqi commented on PR #2602: URL: https://github.com/apache/uniffle/pull/2602#issuecomment-3253062184 Spark is OK, but MR may need ordering.

Re: [PR] [#2569] feat(spark): Add statistic of shuffle read times [uniffle]

2025-09-04 Thread via GitHub
zuston commented on PR #2598: URL: https://github.com/apache/uniffle/pull/2598#issuecomment-3252203907 cc @jerqi

Re: [PR] [#2601] feat(spark): Overlapping decompression for shuffle read [uniffle]

2025-09-04 Thread via GitHub
zuston commented on PR #2602: URL: https://github.com/apache/uniffle/pull/2602#issuecomment-3253276878 > Spark is OK, but MR may need ordering. Thanks for sharing this. For now, this PR only applies to Spark.

[I] [FEATURE] Overlapping decompression for shuffle read [uniffle]

2025-09-03 Thread via GitHub
zuston opened a new issue, #2601: URL: https://github.com/apache/uniffle/issues/2601 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [PR] [#2599] fix(spark): Fix bug the incorrect shuffle read metric for spark [uniffle]

2025-09-03 Thread via GitHub
zuston commented on code in PR #2600: URL: https://github.com/apache/uniffle/pull/2600#discussion_r2320738532 ## client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java: ## @@ -117,17 +117,20 @@ public boolean hasNext() { // If Shuffl

Re: [PR] [#2599] fix(spark): Fix bug the incorrect shuffle read metric for spark [uniffle]

2025-09-03 Thread via GitHub
github-actions[bot] commented on PR #2600: URL: https://github.com/apache/uniffle/pull/2600#issuecomment-3249925243 ## Test Results  3 082 files   -  8   3 082 suites   - 8   6h 33m 20s ⏱️ - 16m 16s  1 198 tests ± 0   1 195 ✅  -  2   1 💤 ±0  1 ❌ +1  1 🔥 +1  15 154 runs   - 12  15 135

[PR] [#2599] Fix bug: Incorrect shuffle read metric for spark [uniffle]

2025-09-03 Thread via GitHub
cchung100m opened a new pull request, #2600: URL: https://github.com/apache/uniffle/pull/2600 ### What changes were proposed in this pull request? Fix bug: Incorrect shuffle read metric for Spark ### Why are the changes needed? for https://github.com/apache/uniffle/issues/2599

Re: [PR] [#2569] feat(spark): Add statistic of shuffle read times [uniffle]

2025-09-03 Thread via GitHub
github-actions[bot] commented on PR #2598: URL: https://github.com/apache/uniffle/pull/2598#issuecomment-3248531993 ## Test Results  2 982 files   - 108   2 982 suites   - 108   5h 43m 56s ⏱️ - 1h 5m 40s  1 066 tests  - 132   1 064 ✅  - 133   1 💤 ±0  0 ❌ ±0  1 🔥 +1  14 902 runs   - 2

[I] [Bug] Incorrect shuffle read metric for spark [uniffle]

2025-09-03 Thread via GitHub
zuston opened a new issue, #2599: URL: https://github.com/apache/uniffle/issues/2599 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

[PR] [#2569] feat(spark): Add statistic of shuffle read times [uniffle]

2025-09-03 Thread via GitHub
zuston opened a new pull request, #2598: URL: https://github.com/apache/uniffle/pull/2598 ### What changes were proposed in this pull request? Add statistic of shuffle read times to find the bottleneck for shuffle reading ### Why are the changes needed? for #2569

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-31 Thread via GitHub
jerqi commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3240762777 The serialization of Spark happens in the shuffle write stage.

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233250339 Data records in Spark SQL are already binary, so no serialization happens. I suggest benchmarking first before optimizing.

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233276105 > Data records in Spark SQL are already binary, so no serialization happens. I suggest benchmarking first before optimizing. It seems that serialization is still happening. https:

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233227564 > Only if you are using spark rdd with raw java objects, there will be serialization bottleneck. Such cases are similar to datastream in flink. We've observed several times of e2e perform

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233213379 Only if you are using spark rdd with raw java objects, there will be serialization bottleneck. Such cases are similar to datastream in flink. We've observed several times of e2e perf

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307189119 ## client-spark/common/src/main/scala/org/apache/spark/serializer/ForySerializer.scala: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307184888 ## client-spark/common/pom.xml: ## @@ -89,6 +89,96 @@ net.jpountz.lz4 lz4 + + +org.apache.fory +fory-

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233199861 Big thanks for your quick and patient review. @chaokunyang > Shuffle data should already be binary, is there anything that needs being serialized? If using the vanilla spark,

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307144902 ## client-spark/common/src/main/scala/org/apache/spark/serializer/ForySerializer.scala: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3233159317 Shuffle data should already be binary, is there anything that needs to be serialized? Have you benchmarked your job to see whether serialization is a bottleneck?

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307141150 ## client-spark/common/src/main/scala/org/apache/spark/serializer/ForySerializer.scala: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307124010 ## client-spark/common/src/main/scala/org/apache/spark/serializer/ForySerializer.scala: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307122000 ## client-spark/common/src/main/scala/org/apache/spark/serializer/ForySerializer.scala: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
chaokunyang commented on code in PR #2597: URL: https://github.com/apache/uniffle/pull/2597#discussion_r2307115444 ## client-spark/common/pom.xml: ## @@ -89,6 +89,96 @@ net.jpountz.lz4 lz4 + + +org.apache.fory +

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
github-actions[bot] commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3232844747 ## Test Results  3 067 files   - 23   3 067 suites   - 23   6h 7m 44s ⏱️ - 41m 52s  1 207 tests + 9   1 123 ✅  -  74   1 💤 ±0  0 ❌ ±0   83 🔥 + 83  15 225 runs  +59  15 

Re: [PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston commented on PR #2597: URL: https://github.com/apache/uniffle/pull/2597#issuecomment-3232744941 cc @chaokunyang . If you have time, could you help review this integration with Fory? So far, this implementation hasn’t shown significant improvements. I would greatly appreciate

[PR] [#2596] feat(spark): Introduce fory serializer [uniffle]

2025-08-28 Thread via GitHub
zuston opened a new pull request, #2597: URL: https://github.com/apache/uniffle/pull/2597 ### What changes were proposed in this pull request? This is an experimental feature to introduce the fory serializer to replace the vanilla spark serializer to speed up. ### Why are the c

Re: [I] [FEATURE] Dedicated faster serialization when shuffle writing/reading [uniffle]

2025-08-28 Thread via GitHub
zuston commented on issue #2596: URL: https://github.com/apache/uniffle/issues/2596#issuecomment-3232457340 > > cc [@jerqi](https://github.com/jerqi) > > There are some points that: > > 1. The type system of serialization > 2. supportsRelocationOfSerializedObjects Yes.

Re: [I] [FEATURE] Dedicated faster serialization when shuffle writing/reading [uniffle]

2025-08-28 Thread via GitHub
jerqi commented on issue #2596: URL: https://github.com/apache/uniffle/issues/2596#issuecomment-3232439848 cc @zhengchenyu

Re: [I] [FEATURE] Dedicated faster serialization when shuffle writing/reading [uniffle]

2025-08-28 Thread via GitHub
jerqi commented on issue #2596: URL: https://github.com/apache/uniffle/issues/2596#issuecomment-3232436453 > cc [@jerqi](https://github.com/jerqi) There are some points that: 1. The type system of serialization 2. supportsRelocationOfSerializedObjects

Re: [I] [FEATURE] Dedicated faster serialization when shuffle writing/reading [uniffle]

2025-08-28 Thread via GitHub
zuston commented on issue #2596: URL: https://github.com/apache/uniffle/issues/2596#issuecomment-3232267329 cc @jerqi

[I] [FEATURE] Dedicated faster serialization when shuffle writing/reading [uniffle]

2025-08-28 Thread via GitHub
zuston opened a new issue, #2596: URL: https://github.com/apache/uniffle/issues/2596 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [I] [FEATURE] Collect the shuffle reader different phase times [uniffle]

2025-08-27 Thread via GitHub
zuston commented on issue #2569: URL: https://github.com/apache/uniffle/issues/2569#issuecomment-3231559410 I will take this on as part of the read performance improvement. There will be more small tasks to do; if you are interested, please feel free to tell me.

[I] [FEATURE] Dedicated retry times on request assignment when partition reassign [uniffle]

2025-08-27 Thread via GitHub
zuston opened a new issue, #2595: URL: https://github.com/apache/uniffle/issues/2595 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

[I] [Improvement] Partition reassignment improvements tracking [uniffle]

2025-08-27 Thread via GitHub
zuston opened a new issue, #2594: URL: https://github.com/apache/uniffle/issues/2594 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [I] [FEATURE] Speed up the registering on reassign [uniffle]

2025-08-27 Thread via GitHub
zuston commented on issue #2532: URL: https://github.com/apache/uniffle/issues/2532#issuecomment-3231463680 In the case of partition reassignment, this will block the RPC response. It should be handled in a thread pool.

Re: [I] [Bug] No available replacement server when reassign is enabled [uniffle]

2025-08-27 Thread via GitHub
zuston closed issue #2563: [Bug] No available replacement server when reassign is enabled URL: https://github.com/apache/uniffle/issues/2563

Re: [I] [FEATURE] Ignore failure when reporting shuffle metrics [uniffle]

2025-08-26 Thread via GitHub
zuston closed issue #2592: [FEATURE] Ignore failure when reporting shuffle metrics URL: https://github.com/apache/uniffle/issues/2592

Re: [PR] [#2592] fix(spark): Ignore failure when reporting shuffle read metrics to driver [uniffle]

2025-08-26 Thread via GitHub
zuston merged PR #2593: URL: https://github.com/apache/uniffle/pull/2593

Re: [PR] [#2592] fix(spark): Ignore failure when reporting shuffle read metrics to driver [uniffle]

2025-08-26 Thread via GitHub
github-actions[bot] commented on PR #2593: URL: https://github.com/apache/uniffle/pull/2593#issuecomment-3222983665 ## Test Results  3 090 files  ±0   3 090 suites  ±0   6h 51m 16s ⏱️ + 1m 44s  1 198 tests ±0   1 197 ✅ ±0   1 💤 ±0  0 ❌ ±0  15 166 runs  ±0  15 151 ✅ ±0  15 💤 ±0  0 ❌ ±

[PR] [#2592] fix(spark): Ignore failure when reporting shuffle read metrics to driver [uniffle]

2025-08-25 Thread via GitHub
zuston opened a new pull request, #2593: URL: https://github.com/apache/uniffle/pull/2593 ### What changes were proposed in this pull request? Ignore failure when reporting shuffle read metrics to driver ### Why are the changes needed? fix #2592 ### Does this PR i

Re: [I] [Bug] java.lang.IndexOutOfBoundsException: len is negative [uniffle]

2025-08-25 Thread via GitHub
zuston closed issue #2575: [Bug] java.lang.IndexOutOfBoundsException: len is negative URL: https://github.com/apache/uniffle/issues/2575

Re: [PR] [#2575] fix(spark): Fix java.lang.IndexOutOfBoundsException: len is negative [uniffle]

2025-08-25 Thread via GitHub
zuston merged PR #2589: URL: https://github.com/apache/uniffle/pull/2589

[I] [FEATURE] Ignore failure when reporting shuffle metrics [uniffle]

2025-08-25 Thread via GitHub
zuston opened a new issue, #2592: URL: https://github.com/apache/uniffle/issues/2592 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [I] [FEATURE] Ignore failure when reporting shuffle metrics [uniffle]

2025-08-25 Thread via GitHub
zuston commented on issue #2592: URL: https://github.com/apache/uniffle/issues/2592#issuecomment-3222330015 This is not critical path, when encountering failure, we could ignore this

[I] [FEATURE] Propagate read localfiles plan to shuffle server to achieve better read ahead performance [uniffle]

2025-08-25 Thread via GitHub
zuston opened a new issue, #2591: URL: https://github.com/apache/uniffle/issues/2591 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [I] [Umbrella] Tracking version release of 0.10.0 [uniffle]

2025-08-24 Thread via GitHub
jerqi commented on issue #2590: URL: https://github.com/apache/uniffle/issues/2590#issuecomment-3218711983 > I think it's time to release 0.10.0 version. cc [@jerqi](https://github.com/jerqi) [@xianjingfeng](https://github.com/xianjingfeng) +1.

Re: [I] [Umbrella] Tracking version release of 0.10.0 [uniffle]

2025-08-24 Thread via GitHub
zuston commented on issue #2590: URL: https://github.com/apache/uniffle/issues/2590#issuecomment-3218703760 I think it's time to release 0.10.0 version. cc @jerqi @xianjingfeng

[I] [Umbrella] Tracking version release of 0.10.0 [uniffle]

2025-08-24 Thread via GitHub
zuston opened a new issue, #2590: URL: https://github.com/apache/uniffle/issues/2590 ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [x] I have searched in the

Re: [I] [Improvement] Client compression optimization [uniffle]

2025-08-24 Thread via GitHub
zuston commented on issue #2494: URL: https://github.com/apache/uniffle/issues/2494#issuecomment-3218695410 All done. Close this

Re: [I] [Improvement] Client compression optimization [uniffle]

2025-08-24 Thread via GitHub
zuston closed issue #2494: [Improvement] Client compression optimization URL: https://github.com/apache/uniffle/issues/2494

Re: [PR] [#2494] feat(spark): Enable overlapping compression by default [uniffle]

2025-08-24 Thread via GitHub
zuston merged PR #2588: URL: https://github.com/apache/uniffle/pull/2588

Re: [PR] [#2494] feat(spark): Enable overlapping compression by default [uniffle]

2025-08-24 Thread via GitHub
zuston commented on PR #2588: URL: https://github.com/apache/uniffle/pull/2588#issuecomment-3218606304 cc @jerqi

Re: [PR] [#2575] fix(client-spark): Fix java.lang.IndexOutOfBoundsException: len is negative [uniffle]

2025-08-24 Thread via GitHub
github-actions[bot] commented on PR #2589: URL: https://github.com/apache/uniffle/pull/2589#issuecomment-3218475010 ## Test Results  3 090 files  +12   3 090 suites  +12   6h 48m 16s ⏱️ + 3m 19s  1 198 tests ± 0   1 197 ✅ + 1   1 💤 ±0  0 ❌  - 1  15 166 runs  +12  15 151 ✅ +13  15 💤 ±

[PR] [#2575] fix(client-spark): Fix java.lang.IndexOutOfBoundsException: len is negative [uniffle]

2025-08-24 Thread via GitHub
cchung100m opened a new pull request, #2589: URL: https://github.com/apache/uniffle/pull/2589 ### What changes were proposed in this pull request? Fix java.lang.IndexOutOfBoundsException: len is negative ### Why are the changes needed? for https://github.com/apache/uniffle/issues

Re: [PR] [#2586] fix(spark): Support writer switching servers on partition split with LOAD_BALANCE mode without reassign [uniffle]

2025-08-21 Thread via GitHub
zuston merged PR #2587: URL: https://github.com/apache/uniffle/pull/2587

Re: [I] [Bug] No available replacement server when partition split is enabled [uniffle]

2025-08-21 Thread via GitHub
zuston closed issue #2586: [Bug] No available replacement server when partition split is enabled URL: https://github.com/apache/uniffle/issues/2586

Re: [PR] [#2494] feat(spark): Enable overlapping compression by default [uniffle]

2025-08-20 Thread via GitHub
github-actions[bot] commented on PR #2588: URL: https://github.com/apache/uniffle/pull/2588#issuecomment-3209158920 ## Test Results  3 016 files   -  74   3 016 suites   - 74   6h 26m 11s ⏱️ - 23m 1s  1 163 tests  -  34   1 158 ✅  -  38   1 💤 ±0   4 ❌ + 4  14 877 runs   - 274  14 822

[PR] [#2494] feat(spark): Enable overlapping compression by default [uniffle]

2025-08-20 Thread via GitHub
zuston opened a new pull request, #2588: URL: https://github.com/apache/uniffle/pull/2588 ### What changes were proposed in this pull request? 1. Enable overlapping compression by default 2. Add doc for this feature ### Why are the changes needed? for #2494 ###

Re: [I] [FEATURE] Expose the sequential read feature to the shuffle server for localfile read ahead optimization [uniffle]

2025-08-20 Thread via GitHub
zuston closed issue #2565: [FEATURE] Expose the sequential read feature to the shuffle server for localfile read ahead optimization URL: https://github.com/apache/uniffle/issues/2565
