[GitHub] [incubator-celeborn] FMX commented on pull request #1166: [WIP][CELEBORN-148] Flink shuffle read.

2023-01-22 Thread via GitHub
FMX commented on PR #1166: URL: https://github.com/apache/incubator-celeborn/pull/1166#issuecomment-1399677366 Closed and will resubmit in another pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [incubator-celeborn] FMX closed pull request #1166: [WIP][CELEBORN-148] Flink shuffle read.

2023-01-22 Thread via GitHub
FMX closed pull request #1166: [WIP][CELEBORN-148] Flink shuffle read. URL: https://github.com/apache/incubator-celeborn/pull/1166

[GitHub] [incubator-celeborn] FMX opened a new pull request, #1177: [CELEBORN-235] Implement flink read plugin.

2023-01-27 Thread via GitHub
FMX opened a new pull request, #1177: URL: https://github.com/apache/incubator-celeborn/pull/1177 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1174: [CELEBORN-229][FOLLOWUP] Support collect metrics with customized labels

2023-01-27 Thread via GitHub
AngersZhuuuu commented on PR #1174: URL: https://github.com/apache/incubator-celeborn/pull/1174#issuecomment-1407275116 ping @waitinfuture , could you take a look if this change will have other impacts?

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1174: [CELEBORN-229][FOLLOWUP] Support collect metrics with customized labels

2023-01-27 Thread via GitHub
waitinfuture commented on code in PR #1174: URL: https://github.com/apache/incubator-celeborn/pull/1174#discussion_r1089630457 ## common/src/main/scala/org/apache/celeborn/common/metrics/MetricLabels.scala: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1178: [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info

2023-01-27 Thread via GitHub
AngersZhuuuu opened a new pull request, #1178: URL: https://github.com/apache/incubator-celeborn/pull/1178 ### What changes were proposed in this pull request? Currently push data failed only show failed reason, but don't have partition info. In this pr, we enable push failed error messag

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1178: [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info

2023-01-27 Thread via GitHub
AngersZhuuuu commented on PR #1178: URL: https://github.com/apache/incubator-celeborn/pull/1178#issuecomment-1407311222 ping @nafiyAix @waitinfuture @FMX

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1178: [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info

2023-01-27 Thread via GitHub
waitinfuture commented on code in PR #1178: URL: https://github.com/apache/incubator-celeborn/pull/1178#discussion_r1089670215 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala: ## @@ -423,7 +427,9 @@ class PushDataHandler extends BaseMess

[GitHub] [incubator-celeborn] waitinfuture merged pull request #1174: [CELEBORN-229][FOLLOWUP] Support collect metrics with customized labels

2023-01-28 Thread via GitHub
waitinfuture merged PR #1174: URL: https://github.com/apache/incubator-celeborn/pull/1174

[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1178: [CELEBORN-237][IMPROVEMENT] push failed error message should show partition info

2023-01-28 Thread via GitHub
AngersZhuuuu merged PR #1178: URL: https://github.com/apache/incubator-celeborn/pull/1178

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1179: [CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout > network timeout *2

2023-01-28 Thread via GitHub
AngersZhuuuu opened a new pull request, #1179: URL: https://github.com/apache/incubator-celeborn/pull/1179 ### What changes were proposed in this pull request? When enable replicate, push data timeout should > push master timeout + push slave timeout When disable replicate, push data t
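
Editor's note: the ordering named in this PR's title (limit in-flight push timeout > push data timeout > network timeout * 2) can be sketched as a small consistency check. This is only an illustration under stated assumptions: the class name, method name, parameter names, and the use of milliseconds are hypothetical and are not taken from Celeborn's code or configuration keys.

```java
// Hypothetical helper, not part of Celeborn: all names and the millisecond unit are assumptions.
final class PushTimeoutCheck {
    private PushTimeoutCheck() {}

    /**
     * Returns true when the three timeouts respect the ordering from the PR title:
     * limitInFlightTimeoutMs > pushDataTimeoutMs > 2 * networkTimeoutMs.
     */
    static boolean isConsistent(long networkTimeoutMs, long pushDataTimeoutMs, long limitInFlightTimeoutMs) {
        // multiplyExact throws ArithmeticException on overflow instead of silently wrapping.
        long minPushDataMs = Math.multiplyExact(networkTimeoutMs, 2L);
        return pushDataTimeoutMs > minPushDataMs && limitInFlightTimeoutMs > pushDataTimeoutMs;
    }

    public static void main(String[] args) {
        // 150s > 2 * 60s and 300s > 150s, so the ordering holds.
        System.out.println(isConsistent(60_000L, 150_000L, 300_000L));
        // 120s is not strictly greater than 2 * 60s, so the ordering is violated.
        System.out.println(isConsistent(60_000L, 120_000L, 300_000L));
    }
}
```

The comparison is strict on both sides because the title uses ">" rather than ">="; whether the project enforces the relationship at configuration time or only documents it is not visible from this snippet.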

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1179: [CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout > network timeout *2

2023-01-28 Thread via GitHub
AngersZhuuuu commented on PR #1179: URL: https://github.com/apache/incubator-celeborn/pull/1179#issuecomment-1407550996 ping @waitinfuture Could you take a look?

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

2023-01-28 Thread via GitHub
AngersZhuuuu opened a new pull request, #1180: URL: https://github.com/apache/incubator-celeborn/pull/1180 ### What changes were proposed in this pull request? After we change PUSH_DATA_TIMEOUT to double of network connection timeout If throw PUSH_DATA_TIMEOUT, it should be bot master a

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

2023-01-28 Thread via GitHub
AngersZhuuuu commented on PR #1180: URL: https://github.com/apache/incubator-celeborn/pull/1180#issuecomment-1407581552 ping @waitinfuture

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1181: [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type

2023-01-29 Thread via GitHub
AngersZhuuuu opened a new pull request, #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181 ### What changes were proposed in this pull request? Push data connect failed is a critical error, we should have a divided error status for these case and add it to blacklist and

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1181: [CELEBORN-243][IMPROVEMENT] Create push client failed should have a ERROR type

2023-01-29 Thread via GitHub
AngersZhuuuu commented on PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#issuecomment-1407601264 ping @waitinfuture

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-29 Thread via GitHub
AngersZhuuuu opened a new pull request, #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [incubator-celeborn] kaijchen opened a new pull request, #1183: [CELEBORN-248] Non-ASCII characters in source code

2023-01-29 Thread via GitHub
kaijchen opened a new pull request, #1183: URL: https://github.com/apache/incubator-celeborn/pull/1183 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [incubator-celeborn] waitinfuture merged pull request #1183: [CELEBORN-248] Non-ASCII characters in source code

2023-01-29 Thread via GitHub
waitinfuture merged PR #1183: URL: https://github.com/apache/incubator-celeborn/pull/1183

[GitHub] [incubator-celeborn] waitinfuture merged pull request #1179: [CELEBORN-241][IMPROVEMENT] limit inflight push timeout should > push data timeout

2023-01-29 Thread via GitHub
waitinfuture merged PR #1179: URL: https://github.com/apache/incubator-celeborn/pull/1179

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed should have a ERROR type

2023-01-30 Thread via GitHub
AngersZhuuuu commented on PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#issuecomment-1408162188 ping @waitinfuture Have change the code as discussed this morning, pls take a look

[GitHub] [incubator-celeborn] nafiyAix commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-30 Thread via GitHub
nafiyAix commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1090307197 ## common/src/main/scala/org/apache/celeborn/common/metrics/source/ResourceConsumptionSource.scala: ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Soft

[GitHub] [incubator-celeborn] nafiyAix commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-30 Thread via GitHub
nafiyAix commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1090313649 ## master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala: ## @@ -624,12 +625,34 @@ private[celeborn] class Master( context.repl

[GitHub] [incubator-celeborn] nafiyAix commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-30 Thread via GitHub
nafiyAix commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1090315580 ## common/src/main/scala/org/apache/celeborn/common/metrics/source/AbstractSource.scala: ## @@ -72,8 +72,10 @@ abstract class AbstractSource(conf: Celeborn

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1149: [CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker

2023-01-30 Thread via GitHub
waitinfuture commented on PR #1149: URL: https://github.com/apache/incubator-celeborn/pull/1149#issuecomment-1408219824 plz test q23a with multiple worker kills and restarts

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1090348345 ## common/src/main/scala/org/apache/celeborn/common/metrics/source/AbstractSource.scala: ## @@ -72,8 +72,10 @@ abstract class AbstractSource(conf: Cele

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1090350101 ## common/src/main/scala/org/apache/celeborn/common/metrics/source/ResourceConsumptionSource.scala: ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should

2023-01-30 Thread via GitHub
waitinfuture commented on code in PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#discussion_r1090332733 ## client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java: ## @@ -739,12 +739,22 @@ public void onSuccess(ByteBuffer response) {

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#discussion_r1090366527 ## client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java: ## @@ -739,12 +739,22 @@ public void onSuccess(ByteBuffer response) {

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#discussion_r1090368545 ## common/src/main/java/org/apache/celeborn/common/protocol/message/StatusCode.java: ## @@ -68,7 +68,11 @@ public enum StatusCode { REGION_FINISH_FA

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

2023-01-30 Thread via GitHub
waitinfuture commented on PR #1180: URL: https://github.com/apache/incubator-celeborn/pull/1180#issuecomment-1408263069 currently we have no evidence whether pushdata timeout is caused by main or slave. I think we can refine the design that ```celeborn.push.data.timeout``` means **ONE WAY*

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

2023-01-30 Thread via GitHub
waitinfuture commented on PR #1180: URL: https://github.com/apache/incubator-celeborn/pull/1180#issuecomment-1408265495 > currently we have no evidence whether pushdata timeout is caused by main or slave. I think we can refine the design that `celeborn.push.data.timeout` means **ONE WAY**

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181#discussion_r1090374021 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala: ## @@ -268,8 +275,9 @@ class PushDataHandler extends BaseMess

[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1181: [CELEBORN-243][CELEBORN-245][IMPROVEMENT] Create push client failed and connection failed cause push failed should have their own

2023-01-30 Thread via GitHub
AngersZhuuuu merged PR #1181: URL: https://github.com/apache/incubator-celeborn/pull/1181

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1180: [CELEBORN-238][IMPROVEMENT] Push data to master partition timeout should add to blacklist and reserve for some time

2023-01-30 Thread via GitHub
waitinfuture commented on PR #1180: URL: https://github.com/apache/incubator-celeborn/pull/1180#issuecomment-1408321911 > > currently we have no evidence whether pushdata timeout is caused by main or slave. I think we can refine the design that `celeborn.push.data.timeout` means **ONE WAY*

[GitHub] [incubator-celeborn] zhongqiangczq opened a new pull request, #1184: [CELEBORN-243] fix bug that os's disk usage is low but celeborn think…

2023-01-30 Thread via GitHub
zhongqiangczq opened a new pull request, #1184: URL: https://github.com/apache/incubator-celeborn/pull/1184 …s that it's high_disk_usage ### What changes were proposed in this pull request? StorageManager.workingDirWriters doesn't cleanup expired filewriter, so that it will

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1185: [CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too

2023-01-30 Thread via GitHub
AngersZhuuuu opened a new pull request, #1185: URL: https://github.com/apache/incubator-celeborn/pull/1185 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [incubator-celeborn] boneanxs commented on a diff in pull request #1066: [CELEBORN-207] Support network bakpressure and control

2023-01-30 Thread via GitHub
boneanxs commented on code in PR #1066: URL: https://github.com/apache/incubator-celeborn/pull/1066#discussion_r1091371568 ## common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala: ## @@ -2656,6 +2667,59 @@ object CelebornConf extends Logging { .timeConf(Ti

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1185: [CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too

2023-01-30 Thread via GitHub
AngersZhuuuu commented on PR #1185: URL: https://github.com/apache/incubator-celeborn/pull/1185#issuecomment-1409704977 ping @waitinfuture Could you take a look? have add a UT and it's behavior as expected.

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1185: [CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too

2023-01-30 Thread via GitHub
AngersZhuuuu commented on PR #1185: URL: https://github.com/apache/incubator-celeborn/pull/1185#issuecomment-1409706119 One question is that if we need to pass client side PUSH_DATA_TIMEOUT to worker side to keep same timeout

[GitHub] [incubator-celeborn] waitinfuture opened a new pull request, #1186: [CELEBORN-252] Delete slides

2023-01-30 Thread via GitHub
waitinfuture opened a new pull request, #1186: URL: https://github.com/apache/incubator-celeborn/pull/1186 ### What changes were proposed in this pull request? Delete unnecessary slides from codebase. ### Why are the changes needed? ### Does this PR introd

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1186: [CELEBORN-252] Delete slides

2023-01-30 Thread via GitHub
pan3793 commented on PR #1186: URL: https://github.com/apache/incubator-celeborn/pull/1186#issuecomment-1409779650 how about moving it to the website repo?

[GitHub] [incubator-celeborn] zwangsheng opened a new pull request, #1187: [HELM] Improve master/worker statefulset security context

2023-01-30 Thread via GitHub
zwangsheng opened a new pull request, #1187: URL: https://github.com/apache/incubator-celeborn/pull/1187 ### What changes were proposed in this pull request? Improve helm values, helps users configure custom security context for master/worker statefulsets. ### Why are t

[GitHub] [incubator-celeborn] boneanxs opened a new pull request, #1188: [MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA

2023-01-30 Thread via GitHub
boneanxs opened a new pull request, #1188: URL: https://github.com/apache/incubator-celeborn/pull/1188 ### What changes were proposed in this pull request? `haMasterNodeIds` could resolve the `celeborn.ha.master.node.id` if we set it, ```java assertion failed: id doe

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1188: [CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA

2023-01-30 Thread via GitHub
AngersZhuuuu commented on code in PR #1188: URL: https://github.com/apache/incubator-celeborn/pull/1188#discussion_r1091534878 ## common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala: ## @@ -574,6 +574,7 @@ class CelebornConf(loadDefaults: Boolean) extends Clonea

[GitHub] [incubator-celeborn] FMX opened a new pull request, #1189: [CELEBORN-224][FOLLOWUP]Correct license and notices.

2023-01-30 Thread via GitHub
FMX opened a new pull request, #1189: URL: https://github.com/apache/incubator-celeborn/pull/1189 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1188: [CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA

2023-01-30 Thread via GitHub
AngersZhuuuu merged PR #1188: URL: https://github.com/apache/incubator-celeborn/pull/1188

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1188: [CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA

2023-01-31 Thread via GitHub
pan3793 commented on PR #1188: URL: https://github.com/apache/incubator-celeborn/pull/1188#issuecomment-1409940033 Should it be ported to branch-0.2?

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1189: [CELEBORN-224][FOLLOWUP]Correct license and notices.

2023-01-31 Thread via GitHub
waitinfuture commented on PR #1189: URL: https://github.com/apache/incubator-celeborn/pull/1189#issuecomment-1409940312 cc @carp84 please review this pr, thanks!

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1187: [HELM] Improve master/worker statefulset security context

2023-01-31 Thread via GitHub
pan3793 commented on PR #1187: URL: https://github.com/apache/incubator-celeborn/pull/1187#issuecomment-1409945019 Please fill a JIRA ticket

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1188: [CELEBORN-253][MINOR] Fix the wrongly resolve celeborn.ha.master.node.id issue if enable HA

2023-01-31 Thread via GitHub
waitinfuture commented on PR #1188: URL: https://github.com/apache/incubator-celeborn/pull/1188#issuecomment-1409957587 > ported

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1186: [CELEBORN-252] Delete slides

2023-01-31 Thread via GitHub
waitinfuture commented on PR #1186: URL: https://github.com/apache/incubator-celeborn/pull/1186#issuecomment-1409961442 > how about moving it to the website repo? In fact the contents in the slides are out-of-date. We can upgrade the contends and put it in website.

[GitHub] [incubator-celeborn] FMX merged pull request #1186: [CELEBORN-252] Delete slides

2023-01-31 Thread via GitHub
FMX merged PR #1186: URL: https://github.com/apache/incubator-celeborn/pull/1186

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingRpcs to rpcs & pushs

2023-01-31 Thread via GitHub
AngersZhuuuu opened a new pull request, #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1184: [CELEBORN-243] fix bug that os's disk usage is low but celeborn think…

2023-01-31 Thread via GitHub
waitinfuture commented on code in PR #1184: URL: https://github.com/apache/incubator-celeborn/pull/1184#discussion_r1091643836 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/Worker.scala: ## @@ -422,6 +422,13 @@ private[celeborn] class Worker( fileW

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingRpcs to rpcs & pushs

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#issuecomment-1410034957 ping @waitinfuture Could you take a look?

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingRpcs to rpcs & pushs

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#issuecomment-1410055385 After detail thought, in current to add PushResponse PushFailure will cause unhandled Compatibility Issues, so we may only do like this now.

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1149: [CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker

2023-01-31 Thread via GitHub
waitinfuture commented on PR #1149: URL: https://github.com/apache/incubator-celeborn/pull/1149#issuecomment-1410145690 I have tested by killing worker while running q23a, LGMT, merging to main.

[GitHub] [incubator-celeborn] waitinfuture merged pull request #1149: [CELEBORN-201] Separate partitionLocationInfo in LifecycleManager and worker

2023-01-31 Thread via GitHub
waitinfuture merged PR #1149: URL: https://github.com/apache/incubator-celeborn/pull/1149

[GitHub] [incubator-celeborn] pan3793 opened a new pull request, #1191: [CELEBORN-171][FOLLOWUP] Auto activate jdk-8 profile

2023-01-31 Thread via GitHub
pan3793 opened a new pull request, #1191: URL: https://github.com/apache/incubator-celeborn/pull/1191 ### What changes were proposed in this pull request? Auto activate `jdk-8` maven profile when using JDK 8. ### Why are the changes needed? The `jdk-*` profile sho

[GitHub] [incubator-celeborn] pan3793 merged pull request #1191: [CELEBORN-171][FOLLOWUP] Auto activate jdk-8 profile

2023-01-31 Thread via GitHub
pan3793 merged PR #1191: URL: https://github.com/apache/incubator-celeborn/pull/1191

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1191: [CELEBORN-171][FOLLOWUP] Auto activate jdk-8 profile

2023-01-31 Thread via GitHub
pan3793 commented on PR #1191: URL: https://github.com/apache/incubator-celeborn/pull/1191#issuecomment-1410217432 Merged to main/branch-0.2

[GitHub] [incubator-celeborn] pan3793 merged pull request #1187: [CELEBORN-256][HELM] Improve master/worker statefulset security context

2023-01-31 Thread via GitHub
pan3793 merged PR #1187: URL: https://github.com/apache/incubator-celeborn/pull/1187

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingRpcs to rpcs & pushs

2023-01-31 Thread via GitHub
waitinfuture commented on code in PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#discussion_r1091816605 ## common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java: ## @@ -44,6 +44,7 @@ public class TransportResponseHan

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1187: [CELEBORN-256][HELM] Improve master/worker statefulset security context

2023-01-31 Thread via GitHub
pan3793 commented on PR #1187: URL: https://github.com/apache/incubator-celeborn/pull/1187#issuecomment-1410219188 Thanks, merged to main for 0.3.0

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-31 Thread via GitHub
waitinfuture commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1091842862 ## master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala: ## @@ -624,12 +625,34 @@ private[celeborn] class Master( context.

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1185: [CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too

2023-01-31 Thread via GitHub
waitinfuture commented on PR #1185: URL: https://github.com/apache/incubator-celeborn/pull/1185#issuecomment-1410258102 > One question is that if we need to pass client side PUSH_DATA_TIMEOUT to worker side to keep same timeout I think so, different jobs may pass different configs

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
waitinfuture commented on code in PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#discussion_r1091893844 ## common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java: ## @@ -321,4 +321,21 @@ protected void handleFailure(String err

[GitHub] [incubator-celeborn] zhongqiangczq commented on a diff in pull request #1184: [CELEBORN-243] fix bug that os's disk usage is low but celeborn think…

2023-01-31 Thread via GitHub
zhongqiangczq commented on code in PR #1184: URL: https://github.com/apache/incubator-celeborn/pull/1184#discussion_r1092657526 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/Worker.scala: ## @@ -47,7 +47,7 @@ import org.apache.celeborn.common.quota.Resource

[GitHub] [incubator-celeborn] zhongqiangczq commented on a diff in pull request #1184: [CELEBORN-243] fix bug that os's disk usage is low but celeborn think…

2023-01-31 Thread via GitHub
zhongqiangczq commented on code in PR #1184: URL: https://github.com/apache/incubator-celeborn/pull/1184#discussion_r1092658646 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/storage/StorageManager.scala: ## @@ -420,6 +420,15 @@ final private[worker] class S

[GitHub] [incubator-celeborn] zhongqiangczq commented on a diff in pull request #1184: [CELEBORN-243] fix bug that os's disk usage is low but celeborn think…

2023-01-31 Thread via GitHub
zhongqiangczq commented on code in PR #1184: URL: https://github.com/apache/incubator-celeborn/pull/1184#discussion_r1092660074 ## worker/src/main/scala/org/apache/celeborn/service/deploy/worker/Worker.scala: ## @@ -422,6 +422,13 @@ private[celeborn] class Worker( file

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
AngersZhuuuu commented on code in PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#discussion_r1092693000 ## common/src/main/java/org/apache/celeborn/common/network/client/TransportClient.java: ## @@ -321,4 +321,21 @@ protected void handleFailure(String err

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
AngersZhuuuu commented on code in PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#discussion_r1092693119 ## common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java: ## @@ -44,6 +44,7 @@ public class TransportResponseHan

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#issuecomment-1411359462 ping @waitinfuture -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
waitinfuture commented on code in PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190#discussion_r1092704305 ## common/src/main/java/org/apache/celeborn/common/network/client/TransportResponseHandler.java: ## @@ -184,13 +216,19 @@ public void handle(ResponseMe

[GitHub] [incubator-celeborn] waitinfuture merged pull request #1190: [CELEBORN-244][IMPROVEMENT] Separate outstandingPushes from outstandingRpcs

2023-01-31 Thread via GitHub
waitinfuture merged PR #1190: URL: https://github.com/apache/incubator-celeborn/pull/1190 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubsc

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1185: [CELEBORN-239][IMPROVEMENT] Worker replicate should enable push data timeout too

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1185: URL: https://github.com/apache/incubator-celeborn/pull/1185#issuecomment-1411523229 ping @waitinfuture Could you take a review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[GitHub] [incubator-celeborn] boneanxs opened a new pull request, #1192: [CELEBORN-258] restart-worker.sh could miss CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY settings

2023-01-31 Thread via GitHub
boneanxs opened a new pull request, #1192: URL: https://github.com/apache/incubator-celeborn/pull/1192 ### What changes were proposed in this pull request? If we restart worker using `restart-worker.sh`, `CELEBORN_WORKER_OFFHEAP_MEMORY` and `CELEBORN_WORKER_MEMORY` will not ta

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1193: [CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler

2023-01-31 Thread via GitHub
AngersZhuuuu opened a new pull request, #1193: URL: https://github.com/apache/incubator-celeborn/pull/1193 ### What changes were proposed in this pull request? Avoid one time of search hash key when handling response. ### Why are the changes needed? ### Does this

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1193: [CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1193: URL: https://github.com/apache/incubator-celeborn/pull/1193#issuecomment-1411530231 ping @waitinfuture @FMX -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [incubator-celeborn] pan3793 merged pull request #1192: [CELEBORN-258] `sbin/restart-worker.sh` should respect CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY

2023-01-31 Thread via GitHub
pan3793 merged PR #1192: URL: https://github.com/apache/incubator-celeborn/pull/1192 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1192: [CELEBORN-258] `sbin/restart-worker.sh` should respect CELEBORN_WORKER_MEMORY and CELEBORN_WORKER_OFFHEAP_MEMORY

2023-01-31 Thread via GitHub
pan3793 commented on PR #1192: URL: https://github.com/apache/incubator-celeborn/pull/1192#issuecomment-1411539400 Thanks, merged to main/branch-0.2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1194: [CELEBORN-259][BUG] Correct wrong comment in restart.sh

2023-01-31 Thread via GitHub
AngersZhuuuu opened a new pull request, #1194: URL: https://github.com/apache/incubator-celeborn/pull/1194 ### What changes were proposed in this pull request? Correct wrong comment in restart.sh ### Why are the changes needed? ### Does this PR introduce _

[GitHub] [incubator-celeborn] pan3793 merged pull request #1194: [CELEBORN-259][BUG] Correct wrong comment in restart.sh

2023-01-31 Thread via GitHub
pan3793 merged PR #1194: URL: https://github.com/apache/incubator-celeborn/pull/1194 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

[GitHub] [incubator-celeborn] pan3793 commented on pull request #1194: [CELEBORN-259][BUG] Correct wrong comment in restart.sh

2023-01-31 Thread via GitHub
pan3793 commented on PR #1194: URL: https://github.com/apache/incubator-celeborn/pull/1194#issuecomment-1411550790 Thanks, merged to main/branch-0.2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1193: [CELEBORN-257][IMPROVEMENT] Avoid one hash searching when process message in TransportResponseHandler

2023-01-31 Thread via GitHub
AngersZhuuuu merged PR #1193: URL: https://github.com/apache/incubator-celeborn/pull/1193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubsc

[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-31 Thread via GitHub
AngersZhuuuu commented on PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#issuecomment-1411579153 How about current @waitinfuture @nafiyAix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [incubator-celeborn] nafiyAix commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-31 Thread via GitHub
nafiyAix commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1092850907 ## common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala: ## @@ -482,6 +482,8 @@ class CelebornConf(loadDefaults: Boolean) extends Cloneable

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-31 Thread via GitHub
AngersZhuuuu commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1092855945 ## common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala: ## @@ -482,6 +482,8 @@ class CelebornConf(loadDefaults: Boolean) extends Clonea

[GitHub] [incubator-celeborn] nafiyAix commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-01-31 Thread via GitHub
nafiyAix commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1092866184 ## master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala: ## @@ -624,12 +627,48 @@ private[celeborn] class Master( context.repl

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-02-01 Thread via GitHub
AngersZhuuuu commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1092880461 ## master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala: ## @@ -624,12 +627,48 @@ private[celeborn] class Master( context.

[GitHub] [incubator-celeborn] WillemJiang commented on a diff in pull request #1189: [CELEBORN-224][FOLLOWUP] Correct license and notices.

2023-02-01 Thread via GitHub
WillemJiang commented on code in PR #1189: URL: https://github.com/apache/incubator-celeborn/pull/1189#discussion_r1092803672 ## NOTICE-binary: ## @@ -63,4 +58,10 @@ Apache Ratis Server API Copyright 2017-2020 The Apache Software Foundation Apache Ratis Thirdparty Miscellan

[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1182: [CELEBORN-247] Add metrics for each user's quota usage

2023-02-01 Thread via GitHub
AngersZhuuuu commented on code in PR #1182: URL: https://github.com/apache/incubator-celeborn/pull/1182#discussion_r1092923263 ## master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala: ## @@ -624,12 +625,34 @@ private[celeborn] class Master( context.

[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1189: [CELEBORN-224][FOLLOWUP] Correct license and notices.

2023-02-01 Thread via GitHub
waitinfuture commented on code in PR #1189: URL: https://github.com/apache/incubator-celeborn/pull/1189#discussion_r1092932857 ## NOTICE-binary: ## @@ -63,4 +58,10 @@ Apache Ratis Server API Copyright 2017-2020 The Apache Software Foundation Apache Ratis Thirdparty Miscella

[GitHub] [incubator-celeborn] WillemJiang commented on a diff in pull request #1189: [CELEBORN-224][FOLLOWUP] Correct license and notices.

2023-02-01 Thread via GitHub
WillemJiang commented on code in PR #1189: URL: https://github.com/apache/incubator-celeborn/pull/1189#discussion_r1092947399 ## NOTICE-binary: ## @@ -63,4 +58,10 @@ Apache Ratis Server API Copyright 2017-2020 The Apache Software Foundation Apache Ratis Thirdparty Miscellan

[GitHub] [incubator-celeborn] boneanxs opened a new pull request, #1195: [CELEBORN-258][FOLLOW UP] `sbin/restart-worker.sh` should also import the `sbin/celeborn-config.sh`

2023-02-01 Thread via GitHub
boneanxs opened a new pull request, #1195: URL: https://github.com/apache/incubator-celeborn/pull/1195 ### What changes were proposed in this pull request? Follow up the pr #1192 to read the `sbin/celeborn-config.sh` firstly, otherwise `CELEBORN_WORKER_MEMORY` and `CELEBORN_WO
