[GitHub] [samza] ajothomas merged pull request #1649: [Docs] Samza 1.8.0 release website updates
ajothomas merged PR #1649: URL: https://github.com/apache/samza/pull/1649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas opened a new pull request, #1649: [Docs] Samza 1.8.0 release website updates
ajothomas opened a new pull request, #1649: URL: https://github.com/apache/samza/pull/1649 This PR has all website/doc related changes w.r.t to samza 1.8.0 release. Since we did not add anything for 1.7.0 release, this PR adds a release.md and download links for 1.7.0 release. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on PR #1647: URL: https://github.com/apache/samza/pull/1647#issuecomment-1382703530 Good catch. I updated publish-site.sh and tested the best that I could as I don't have SVN write access. Let me know if it doesn't work correctly. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1204: Bump jekyll from 3.4.5 to 3.6.3 in /docs
li-afaris commented on PR #1204: URL: https://github.com/apache/samza/pull/1204#issuecomment-1381196091 This can probably be closed "won't-fix" as Jekyll 4.3.1 is in the dependency tree from pull request #1647. The following is from `./gradlew docs:dependencies` ``` gems +--- rubygems:jekyll-sass-converter:2.2.0 |\--- rubygems:sassc:]2.0.1,3.0[ -> 2.4.0 | \--- rubygems:ffi:[1.9.0,2.0[ -> 1.15.5 +--- rubygems:jekyll:4.3.1 |+--- rubygems:addressable:[2.4.0,3.0[ -> 2.8.1 ||\--- rubygems:public_suffix:[2.0.2,6.0[ -> 5.0.1 |+--- rubygems:colorator:[1.0.0,2.0[ -> 1.1.0 |+--- rubygems:em-websocket:[0.5.0,1.0[ -> 0.5.3 ||+--- rubygems:eventmachine:[0.12.9,) -> 1.2.7 ||\--- rubygems:http_parser.rb:[0,) -> 0.6.0 |+--- rubygems:i18n:[1.0.0,2.0[ -> 1.12.0 ||\--- rubygems:concurrent-ruby:[1.0.0,2.0[ -> 1.1.10 |+--- rubygems:jekyll-sass-converter:[2.0,4.0[ -> 2.2.0 (*) |+--- rubygems:jekyll-watch:[2.0.0,3.0[ -> 2.2.1 ||\--- rubygems:listen:[3.0.0,4.0[ -> 3.8.0 || +--- rubygems:rb-fsevent:[0.10.3,1.0[ -> 0.11.2 || \--- rubygems:rb-inotify:[0.9.10,1.0[ -> 0.10.1 || \--- rubygems:ffi:[1.0.0,2.0[ -> 1.15.5 |+--- rubygems:kramdown:[2.3.1,3.0[ -> 2.4.0 ||\--- rubygems:rexml:[0,) -> 3.2.5 |+--- rubygems:kramdown-parser-gfm:[1.0.0,2.0[ -> 1.1.0 ||\--- rubygems:kramdown:[2.0.0,3.0[ -> 2.4.0 (*) |+--- rubygems:liquid:[4.0.0,5.0[ -> 4.0.4 |+--- rubygems:mercenary:[0.3.6,0.5[ -> 0.4.0 |+--- rubygems:pathutil:[0.9.0,1.0[ -> 0.16.2 ||\--- rubygems:forwardable-extended:[2.6.0,3.0[ -> 2.6.0 |+--- rubygems:rouge:[3.0,5.0[ -> 3.30.0 |+--- rubygems:safe_yaml:[1.0.0,2.0[ -> 1.0.5 |+--- rubygems:terminal-table:[1.8,4.0[ -> 3.0.2 ||\--- rubygems:unicode-display_width:[1.1.1,3[ -> 2.4.2 |\--- rubygems:webrick:[1.7.0,2.0[ -> 1.7.0 +--- rubygems:http_parser.rb:0.6.0 +--- rubygems:webrick:1.7.0 +--- rubygems:json:2.6.3 +--- rubygems:rouge:3.30.0 +--- rubygems:kramdown:2.4.0 (*) \--- rubygems:kramdown-parser-gfm:1.1.0 (*) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1621: Bump ffi from 1.9.23 to 1.15.5 in /docs
li-afaris commented on PR #1621: URL: https://github.com/apache/samza/pull/1621#issuecomment-1381195077 This can probably be closed "won't-fix" as ffi was upgraded to 1.15.5 from pull request #1647. The following is from `./gradlew docs:dependencies` ``` gems +--- rubygems:jekyll-sass-converter:2.2.0 |\--- rubygems:sassc:]2.0.1,3.0[ -> 2.4.0 | \--- rubygems:ffi:[1.9.0,2.0[ -> 1.15.5 +--- rubygems:jekyll:4.3.1 |+--- rubygems:addressable:[2.4.0,3.0[ -> 2.8.1 ||\--- rubygems:public_suffix:[2.0.2,6.0[ -> 5.0.1 |+--- rubygems:colorator:[1.0.0,2.0[ -> 1.1.0 |+--- rubygems:em-websocket:[0.5.0,1.0[ -> 0.5.3 ||+--- rubygems:eventmachine:[0.12.9,) -> 1.2.7 ||\--- rubygems:http_parser.rb:[0,) -> 0.6.0 |+--- rubygems:i18n:[1.0.0,2.0[ -> 1.12.0 ||\--- rubygems:concurrent-ruby:[1.0.0,2.0[ -> 1.1.10 |+--- rubygems:jekyll-sass-converter:[2.0,4.0[ -> 2.2.0 (*) |+--- rubygems:jekyll-watch:[2.0.0,3.0[ -> 2.2.1 ||\--- rubygems:listen:[3.0.0,4.0[ -> 3.8.0 || +--- rubygems:rb-fsevent:[0.10.3,1.0[ -> 0.11.2 || \--- rubygems:rb-inotify:[0.9.10,1.0[ -> 0.10.1 || \--- rubygems:ffi:[1.0.0,2.0[ -> 1.15.5 |+--- rubygems:kramdown:[2.3.1,3.0[ -> 2.4.0 ||\--- rubygems:rexml:[0,) -> 3.2.5 |+--- rubygems:kramdown-parser-gfm:[1.0.0,2.0[ -> 1.1.0 ||\--- rubygems:kramdown:[2.0.0,3.0[ -> 2.4.0 (*) |+--- rubygems:liquid:[4.0.0,5.0[ -> 4.0.4 |+--- rubygems:mercenary:[0.3.6,0.5[ -> 0.4.0 |+--- rubygems:pathutil:[0.9.0,1.0[ -> 0.16.2 ||\--- rubygems:forwardable-extended:[2.6.0,3.0[ -> 2.6.0 |+--- rubygems:rouge:[3.0,5.0[ -> 3.30.0 |+--- rubygems:safe_yaml:[1.0.0,2.0[ -> 1.0.5 |+--- rubygems:terminal-table:[1.8,4.0[ -> 3.0.2 ||\--- rubygems:unicode-display_width:[1.1.1,3[ -> 2.4.2 |\--- rubygems:webrick:[1.7.0,2.0[ -> 1.7.0 +--- rubygems:http_parser.rb:0.6.0 +--- rubygems:webrick:1.7.0 +--- rubygems:json:2.6.3 +--- rubygems:rouge:3.30.0 +--- rubygems:kramdown:2.4.0 (*) \--- rubygems:kramdown-parser-gfm:1.1.0 (*) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1623: Bump json from 2.1.0 to 2.3.0 in /docs
li-afaris commented on PR #1623: URL: https://github.com/apache/samza/pull/1623#issuecomment-1381194149 This can probably be closed "won't-fix" as JSON 2.6.3 is in the dependency tree from pull request #1647. The following is from `./gradlew docs:dependencies` ``` gems +--- rubygems:jekyll-sass-converter:2.2.0 |\--- rubygems:sassc:]2.0.1,3.0[ -> 2.4.0 | \--- rubygems:ffi:[1.9.0,2.0[ -> 1.15.5 +--- rubygems:jekyll:4.3.1 |+--- rubygems:addressable:[2.4.0,3.0[ -> 2.8.1 ||\--- rubygems:public_suffix:[2.0.2,6.0[ -> 5.0.1 |+--- rubygems:colorator:[1.0.0,2.0[ -> 1.1.0 |+--- rubygems:em-websocket:[0.5.0,1.0[ -> 0.5.3 ||+--- rubygems:eventmachine:[0.12.9,) -> 1.2.7 ||\--- rubygems:http_parser.rb:[0,) -> 0.6.0 |+--- rubygems:i18n:[1.0.0,2.0[ -> 1.12.0 ||\--- rubygems:concurrent-ruby:[1.0.0,2.0[ -> 1.1.10 |+--- rubygems:jekyll-sass-converter:[2.0,4.0[ -> 2.2.0 (*) |+--- rubygems:jekyll-watch:[2.0.0,3.0[ -> 2.2.1 ||\--- rubygems:listen:[3.0.0,4.0[ -> 3.8.0 || +--- rubygems:rb-fsevent:[0.10.3,1.0[ -> 0.11.2 || \--- rubygems:rb-inotify:[0.9.10,1.0[ -> 0.10.1 || \--- rubygems:ffi:[1.0.0,2.0[ -> 1.15.5 |+--- rubygems:kramdown:[2.3.1,3.0[ -> 2.4.0 ||\--- rubygems:rexml:[0,) -> 3.2.5 |+--- rubygems:kramdown-parser-gfm:[1.0.0,2.0[ -> 1.1.0 ||\--- rubygems:kramdown:[2.0.0,3.0[ -> 2.4.0 (*) |+--- rubygems:liquid:[4.0.0,5.0[ -> 4.0.4 |+--- rubygems:mercenary:[0.3.6,0.5[ -> 0.4.0 |+--- rubygems:pathutil:[0.9.0,1.0[ -> 0.16.2 ||\--- rubygems:forwardable-extended:[2.6.0,3.0[ -> 2.6.0 |+--- rubygems:rouge:[3.0,5.0[ -> 3.30.0 |+--- rubygems:safe_yaml:[1.0.0,2.0[ -> 1.0.5 |+--- rubygems:terminal-table:[1.8,4.0[ -> 3.0.2 ||\--- rubygems:unicode-display_width:[1.1.1,3[ -> 2.4.2 |\--- rubygems:webrick:[1.7.0,2.0[ -> 1.7.0 +--- rubygems:http_parser.rb:0.6.0 +--- rubygems:webrick:1.7.0 +--- rubygems:json:2.6.3 +--- rubygems:rouge:3.30.0 +--- rubygems:kramdown:2.4.0 (*) \--- rubygems:kramdown-parser-gfm:1.1.0 (*) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1624: Bump redcarpet from 3.4.0 to 3.5.1 in /docs
li-afaris commented on PR #1624: URL: https://github.com/apache/samza/pull/1624#issuecomment-1381193456 This can probably be closed "won't-fix" as redcarpet was replaced with Kramdown from pull request #1647 and is no longer used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1632: Bump addressable from 2.5.2 to 2.8.1 in /docs
li-afaris commented on PR #1632: URL: https://github.com/apache/samza/pull/1632#issuecomment-1381192497 This can probably be closed "won't-fix" as addressable 2.8.1 is in the dependency tree from pull request #1647. The following is from `./gradlew docs:dependencies` ``` gems +--- rubygems:jekyll-sass-converter:2.2.0 |\--- rubygems:sassc:]2.0.1,3.0[ -> 2.4.0 | \--- rubygems:ffi:[1.9.0,2.0[ -> 1.15.5 +--- rubygems:jekyll:4.3.1 |+--- rubygems:addressable:[2.4.0,3.0[ -> 2.8.1 ||\--- rubygems:public_suffix:[2.0.2,6.0[ -> 5.0.1 |+--- rubygems:colorator:[1.0.0,2.0[ -> 1.1.0 |+--- rubygems:em-websocket:[0.5.0,1.0[ -> 0.5.3 ||+--- rubygems:eventmachine:[0.12.9,) -> 1.2.7 ||\--- rubygems:http_parser.rb:[0,) -> 0.6.0 |+--- rubygems:i18n:[1.0.0,2.0[ -> 1.12.0 ||\--- rubygems:concurrent-ruby:[1.0.0,2.0[ -> 1.1.10 |+--- rubygems:jekyll-sass-converter:[2.0,4.0[ -> 2.2.0 (*) |+--- rubygems:jekyll-watch:[2.0.0,3.0[ -> 2.2.1 ||\--- rubygems:listen:[3.0.0,4.0[ -> 3.8.0 || +--- rubygems:rb-fsevent:[0.10.3,1.0[ -> 0.11.2 || \--- rubygems:rb-inotify:[0.9.10,1.0[ -> 0.10.1 || \--- rubygems:ffi:[1.0.0,2.0[ -> 1.15.5 |+--- rubygems:kramdown:[2.3.1,3.0[ -> 2.4.0 ||\--- rubygems:rexml:[0,) -> 3.2.5 |+--- rubygems:kramdown-parser-gfm:[1.0.0,2.0[ -> 1.1.0 ||\--- rubygems:kramdown:[2.0.0,3.0[ -> 2.4.0 (*) |+--- rubygems:liquid:[4.0.0,5.0[ -> 4.0.4 |+--- rubygems:mercenary:[0.3.6,0.5[ -> 0.4.0 |+--- rubygems:pathutil:[0.9.0,1.0[ -> 0.16.2 ||\--- rubygems:forwardable-extended:[2.6.0,3.0[ -> 2.6.0 |+--- rubygems:rouge:[3.0,5.0[ -> 3.30.0 |+--- rubygems:safe_yaml:[1.0.0,2.0[ -> 1.0.5 |+--- rubygems:terminal-table:[1.8,4.0[ -> 3.0.2 ||\--- rubygems:unicode-display_width:[1.1.1,3[ -> 2.4.2 |\--- rubygems:webrick:[1.7.0,2.0[ -> 1.7.0 +--- rubygems:http_parser.rb:0.6.0 +--- rubygems:webrick:1.7.0 +--- rubygems:json:2.6.3 +--- rubygems:rouge:3.30.0 +--- rubygems:kramdown:2.4.0 (*) \--- rubygems:kramdown-parser-gfm:1.1.0 (*) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on a diff in pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on code in PR #1647: URL: https://github.com/apache/samza/pull/1647#discussion_r1068741225 ## docs/meetups/index.html: ## @@ -21,7 +21,7 @@ --> -{% assign sorted = (site.meetups | sort: 'date') | reverse %} +{% assign sorted = site.meetups | sort: 'date' | reverse %} Review Comment: Yes this is due to the change from redcarpet to kramdown -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on a diff in pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on code in PR #1647: URL: https://github.com/apache/samza/pull/1647#discussion_r1068741065 ## docs/talks/index.html: ## @@ -21,7 +21,7 @@ --> -{% assign sorted = (site.talks | sort: 'date') | reverse %} +{% assign sorted = site.talks | sort: 'date' | reverse %} Review Comment: Yes this is due to the change from redcarpet to kramdown ## docs/_blog/index.md: ## @@ -22,7 +22,7 @@ exclude_from_loop: true - {% assign sorted = (site.blog | sort: 'date') | reverse %} + {% assign sorted = site.blog | sort: 'date' | reverse %} Review Comment: Yes this is due to the change from redcarpet to kramdown -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
mynameborat commented on code in PR #1647: URL: https://github.com/apache/samza/pull/1647#discussion_r1068483167 ## docs/meetups/index.html: ## @@ -21,7 +21,7 @@ --> -{% assign sorted = (site.meetups | sort: 'date') | reverse %} +{% assign sorted = site.meetups | sort: 'date' | reverse %} Review Comment: why do we need these changes? ## docs/talks/index.html: ## @@ -21,7 +21,7 @@ --> -{% assign sorted = (site.talks | sort: 'date') | reverse %} +{% assign sorted = site.talks | sort: 'date' | reverse %} Review Comment: same as above. can we exclude these changes if they are not part of the original issue this RB is solving? ## docs/_blog/index.md: ## @@ -22,7 +22,7 @@ exclude_from_loop: true - {% assign sorted = (site.blog | sort: 'date') | reverse %} + {% assign sorted = site.blog | sort: 'date' | reverse %} Review Comment: revert unless its related to this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat merged pull request #1648: Update RocksDB to version 7.8.3 to reduce write amplification
mynameborat merged PR #1648: URL: https://github.com/apache/samza/pull/1648 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on pull request #1648: Update RocksDB to version 7.8.3 to reduce write amplification
mynameborat commented on PR #1648: URL: https://github.com/apache/samza/pull/1648#issuecomment-1376380697 @shekhars-li can you run the build locally/publish another dummy commit to trigger the checks again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] shekhars-li opened a new pull request, #1648: Update RocksDB to version 7.8.3 to reduce write amplification
shekhars-li opened a new pull request, #1648: URL: https://github.com/apache/samza/pull/1648 What: Updated RocksDB to version 7.8.3 Why: This version reduces the write compaction by aligning compaction output file boundaries. More details can be found in [this blog post](http://rocksdb.org/blog/2022/10/31/align-compaction-output-file.html). TL;DR: Write amplification was observed to be reduced by 10% by optimizing cutting the compaction output file earlier. How: Feature is enabled by default in version 7.8.0 and above of RocksDB. Tests: None. Past tests were re-run to verify they are still passing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on PR #1647: URL: https://github.com/apache/samza/pull/1647#issuecomment-1372953087 FYI: Removing the Jekyll incremental build flag resolved the file name differences I found in the previous comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] amf3 commented on pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
amf3 commented on PR #1647: URL: https://github.com/apache/samza/pull/1647#issuecomment-1371765244 I found the following differences after generating documents with Jekyll using both MRI Ruby & Jruby. Input was generated by running find on the _site directory and saving the output to a sorted text file. ``` $ sdiff -Ws -w 220 ~/mri_ruby_docs ~/jruby_docs ./Gemfile < ./Gemfile.lock < ./index.md.bak < ./meetups/aug-2016.md < ``` The missing Gemfile, Gemfile.lock, & index.md.bak files I'm not concerned with as they were removed as part of this change. The meetups/aug-2016.md file was a surprise. FWIW: it's not referenced in the MRI Ruby build or at https://samza.apache.org/meetups so I'm not concerned about it right now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on PR #1647: URL: https://github.com/apache/samza/pull/1647#issuecomment-1361904668 > I'm open to suggestions for improving confidence relating to the Jekyll plugin changes. One thing I thought of was to list & compare the generated file names in _site. I found some of the javadoc files were missing with the jruby generated site. I think I need to revisit the Jekyll plugin changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris commented on pull request #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris commented on PR #1647: URL: https://github.com/apache/samza/pull/1647#issuecomment-1360645861 I'm not sure why the last commit (ccb932d) failed testing when the previous one (651a1bc) worked. You can see the diff by clicking the ccb932d link above. Is the samza-kafka_2.12:test task stable? I ran `./gradlew clean build test` locally and the build was successful. Let me know if I need to trigger another build with an empty commit or if it's something else. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] li-afaris opened a new pull request, #1647: [SAMZA-2772] Manage Jekyll dependencies with Gradle
li-afaris opened a new pull request, #1647: URL: https://github.com/apache/samza/pull/1647 # Summary This change moves Jekyll dependency management into Gradle. Contributors will no longer be required to install ruby & Jekyll by hand. # Changes * Gemfiles no longer manage dependencies. This is done with the jruby-gradle-plugin Gradle plugin. * rubygems.org is still the source of Ruby libraries * The locally install MRI Ruby interpreter is replaced with JRuby. * This allows the Ruby interpreter to be a part of the Samza project. * Jekyll plugin changes * Rouge replaces pygments.rb. Rouge is native Ruby and does not have a python requirement. * Kramdown replaces Redcarpet because Redcarpet does not work with JRuby. * Both Kramdown & Rouge can run inside the JRuby JVM. * buildscript.gradle * jcenter.bintray.com is no longer active. * plugins.gradle.org has the required plugins # Testing Testing was done with JDK 1.8 and on Linux. I visually confirmed that content generated with JRuby looks like content generated by MRI Ruby by running two Jekyll instances at the same time. Both Jekyll instances threw the same errors for missing content at `/learn/documentation/latest`. I'm open to suggestions for improving confidence relating to the Jekyll plugin changes. I also ran `./gradlew test` which built successfully. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool merged pull request #1646: [Docs] Add Ajo Thomas to committers list and GPG keys
xinyuiscool merged PR #1646: URL: https://github.com/apache/samza/pull/1646 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas opened a new pull request, #1646: [DOC] Add Ajo Thomas to committers list and GPG keys
ajothomas opened a new pull request, #1646: URL: https://github.com/apache/samza/pull/1646 Add Ajo Thomas to committers list and GPG keys. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] dxichen merged pull request #1645: Fix restore factory not configured log
dxichen merged PR #1645: URL: https://github.com/apache/samza/pull/1645 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] dxichen opened a new pull request, #1645: Fix restore factory not configured log
dxichen opened a new pull request, #1645: URL: https://github.com/apache/samza/pull/1645 Currently if a restore factory is missing due to a misconfiguration we exit due to a NPE ``` 2022-12-13 18:23:07.794 [main] SamzaContainer [ERROR] Caught exception/error while initializing container. java.lang.NullPointerException: null at org.apache.samza.storage.ContainerStorageManager.lambda$createTaskRestoreManagers$15(ContainerStorageManager.java:485) ~[samza-core_2.12-320.1081.0.4.jar:?] at java.util.HashMap.forEach(HashMap.java:1289) ~[?:1.8.0_282] at org.apache.samza.storage.ContainerStorageManager.createTaskRestoreManagers(ContainerStorageManager.java:480) ~[samza-core_2.12-320.1081.0.4.jar:?] at org.apache.samza.storage.ContainerStorageManager.lambda$restoreStores$27(ContainerStorageManager.java:857) ~[samza-core_2.12-320.1081.0.4.jar:?] at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684) ~[?:1.8.0_282] ``` Changed the behavior to output a more meaningful log -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] dxichen merged pull request #1644: Fix performance bug in kafka/blobstore commit lifecycle.
dxichen merged PR #1644: URL: https://github.com/apache/samza/pull/1644 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] shekhars-li opened a new pull request, #1644: Fix perfomance bug
shekhars-li opened a new pull request, #1644: URL: https://github.com/apache/samza/pull/1644 Bug - We introduced blob store backed for state backup and restore in [this commit](https://github.com/apache/samza/commit/7cc4eaa96fff244f6dce9c18af804917db7c3b2b). - `StateBackendFactory` implementations like for Kafka and BlobStore create systemAdmins every time `getBackupFactory` is called from `SamzaContainer`. This leads to creation of duplicate kafka admin threads. - This has impact on performance of large samza jobs as we have thousands of duplicate threads spawned. Fix - Pass `systemAdminsMap` to `getBackendFactory` method. This initializes the SystemAdmins with the system admin map rather than creating system admins every time the method is called. Test - Tested with local jobs to verify the kafka admin threads are not duplicated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] shanthoosh merged pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
shanthoosh merged PR #1641: URL: https://github.com/apache/samza/pull/1641 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] shanthoosh commented on pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
shanthoosh commented on PR #1641: URL: https://github.com/apache/samza/pull/1641#issuecomment-1317938599 Thanks for adding the tests @jia-gao . After going through the linkedin documents and JIRA tickets, it seems like we are just proceeding with the approach finalized as a part of the investigation. We had just removed the recursive call check from the log4j2-appender implementation and we're relying upon native-recursive check from log4j2. Mostly the changes look good to me, let us wait for the build to pass and we can merge this in! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] jia-gao commented on pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
jia-gao commented on PR #1641: URL: https://github.com/apache/samza/pull/1641#issuecomment-1317936984 > validating > @jia-gao As discussed offline, it would be great if we can add a unit-test for validating these changes in this patch. Added unit tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool merged pull request #1642: SAMZA-2769: [PipelineDrain] Add drainMode to DrainNotification
xinyuiscool merged PR #1642: URL: https://github.com/apache/samza/pull/1642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1642: SAMZA-2769: [PipelineDrain] Add drainMode to DrainNotification
ajothomas commented on code in PR #1642: URL: https://github.com/apache/samza/pull/1642#discussion_r1024611620 ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -54,44 +74,34 @@ private DrainUtils() { * @return generated uuid for the DrainNotification */ @VisibleForTesting - public static UUID writeDrainNotification(MetadataStore metadataStore, String runId) { + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore, String runId) { Review Comment: Merged this with `writeDrainNotification(metadataStore)` ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -54,44 +74,34 @@ private DrainUtils() { * @return generated uuid for the DrainNotification */ @VisibleForTesting - public static UUID writeDrainNotification(MetadataStore metadataStore, String runId) { + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore, String runId) { Preconditions.checkArgument(metadataStore != null, "MetadataStore cannot be null."); Preconditions.checkArgument(!Strings.isNullOrEmpty(runId), "runId should be non-null."); LOG.info("Attempting to write DrainNotification to metadata-store for the deployment ID {}", runId); +final UUID uuid = UUID.randomUUID(); +final DrainNotification message = DrainNotification.simple(uuid, runId); +return writeDrainNotification(metadataStore, message); + } + + /** + * Writes a {@link DrainNotification} to an underlying metadata store. + * */ + @VisibleForTesting + public static UUID writeDrainNotification(MetadataStore metadataStore, DrainNotification drainNotification) { Review Comment: This is package private now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] shanthoosh commented on pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
shanthoosh commented on PR #1641: URL: https://github.com/apache/samza/pull/1641#issuecomment-1317817165 @jia-gao As discussed offline, it would be great if we can add a unit-test for validating these changes in this patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1642: SAMZA-2769: [PipelineDrain] Add drainMode to DrainNotification
ajothomas commented on code in PR #1642: URL: https://github.com/apache/samza/pull/1642#discussion_r1024598787 ## samza-api/src/main/java/org/apache/samza/drain/DrainMode.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.drain; + +/** + * Defines the type of drain operation. + * */ +public enum DrainMode { + /** + * Simple drain mode is the default behavior of the drain operation. + * All intermediate streams and any samza managed state will be drained and the pipeline will be gracefully shutdown + * as a result. User state will not be drained. + * */ + SIMPLE; Review Comment: Done ## samza-api/src/main/java/org/apache/samza/drain/DrainNotification.java: ## @@ -35,9 +35,23 @@ public class DrainNotification { */ private final String runId; - public DrainNotification(UUID uuid, String runId) { + /***/ + private final DrainMode drainMode; + + public DrainNotification(UUID uuid, String runId, DrainMode drainMode) { +Preconditions.checkNotNull(uuid); +Preconditions.checkNotNull(runId); +Preconditions.checkNotNull(drainMode); this.uuid = uuid; this.runId = runId; +this.drainMode = drainMode; + } + + /** + * Creates a DrainNotification in {@link DrainMode#SIMPLE} mode. + * */ + public static DrainNotification simple(UUID uuid, String runId) { Review Comment: Done ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -45,6 +45,26 @@ public class DrainUtils { private DrainUtils() { } + /** + * Writes a {@link DrainNotification} to the underlying metastore. This method should be used by external controllers + * to issue a DrainNotification to the JobCoordinator and Samza Containers. + * @param metadataStore Metadata store to write drain notification to. + * + * @return generated uuid for the DrainNotification + */ + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore) { Review Comment: Changed it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool commented on a diff in pull request #1642: SAMZA-2769: [PipelineDrain] Add drainMode to DrainNotification
xinyuiscool commented on code in PR #1642: URL: https://github.com/apache/samza/pull/1642#discussion_r1024576429 ## samza-api/src/main/java/org/apache/samza/drain/DrainMode.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.drain; + +/** + * Defines the type of drain operation. + * */ +public enum DrainMode { + /** + * Simple drain mode is the default behavior of the drain operation. + * All intermediate streams and any samza managed state will be drained and the pipeline will be gracefully shutdown + * as a result. User state will not be drained. + * */ + SIMPLE; Review Comment: As your javadoc suggests, let's call it default? ## samza-api/src/main/java/org/apache/samza/drain/DrainNotification.java: ## @@ -35,9 +35,23 @@ public class DrainNotification { */ private final String runId; - public DrainNotification(UUID uuid, String runId) { + /***/ + private final DrainMode drainMode; + + public DrainNotification(UUID uuid, String runId, DrainMode drainMode) { +Preconditions.checkNotNull(uuid); +Preconditions.checkNotNull(runId); +Preconditions.checkNotNull(drainMode); this.uuid = uuid; this.runId = runId; +this.drainMode = drainMode; + } + + /** + * Creates a DrainNotification in {@link DrainMode#SIMPLE} mode. + * */ + public static DrainNotification simple(UUID uuid, String runId) { Review Comment: let's just call it create(). We can have another variant factory method for create with other mode. ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -54,44 +74,34 @@ private DrainUtils() { * @return generated uuid for the DrainNotification */ @VisibleForTesting - public static UUID writeDrainNotification(MetadataStore metadataStore, String runId) { + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore, String runId) { Review Comment: If this is only used for testing, we should be able to test the variant of using DrainNotification. If not user-facing, let's get rid of it. ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -54,44 +74,34 @@ private DrainUtils() { * @return generated uuid for the DrainNotification */ @VisibleForTesting - public static UUID writeDrainNotification(MetadataStore metadataStore, String runId) { + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore, String runId) { Preconditions.checkArgument(metadataStore != null, "MetadataStore cannot be null."); Preconditions.checkArgument(!Strings.isNullOrEmpty(runId), "runId should be non-null."); LOG.info("Attempting to write DrainNotification to metadata-store for the deployment ID {}", runId); +final UUID uuid = UUID.randomUUID(); +final DrainNotification message = DrainNotification.simple(uuid, runId); +return writeDrainNotification(metadataStore, message); + } + + /** + * Writes a {@link DrainNotification} to an underlying metadata store. + * */ + @VisibleForTesting + public static UUID writeDrainNotification(MetadataStore metadataStore, DrainNotification drainNotification) { Review Comment: If it's only used for testing, please make it package private. ## samza-core/src/main/java/org/apache/samza/drain/DrainUtils.java: ## @@ -45,6 +45,26 @@ public class DrainUtils { private DrainUtils() { } + /** + * Writes a {@link DrainNotification} to the underlying metastore. This method should be used by external controllers + * to issue a DrainNotification to the JobCoordinator and Samza Containers. + * @param metadataStore Metadata store to write drain notification to. + * + * @return generated uuid for the DrainNotification + */ + public static UUID writeSimpleDrainNotification(MetadataStore metadataStore) { Review Comment: Let's just use writeDrainNotification(). We can overload it with other parameters later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To
[GitHub] [samza] ajothomas closed pull request #1643: SAMZA-2770: [Pipeline Drain] Enable drain monitor by default
ajothomas closed pull request #1643: SAMZA-2770: [Pipeline Drain] Enable drain monitor by default URL: https://github.com/apache/samza/pull/1643 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas opened a new pull request, #1643: SAMZA-2770: [Pipeline Drain] Enable drain monitor by default
ajothomas opened a new pull request, #1643: URL: https://github.com/apache/samza/pull/1643 # Summary Drain monitor had been turned off by default. The idea was to turn it on after testing the pipeline drain feature. # Changes - Set `DRAIN_MONITOR_ENABLED_DEFAULT` to true in `JobConfig` - Alter Drain integration tests which had explicitly set the `job.drain-monitor.enabled` to true # Tests - No new tests. Re-run existing integration tests # API Changes None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas opened a new pull request, #1642: SAMZA-2769: [PipelineDrain] Abstract DrainNotification to make it extensible
ajothomas opened a new pull request, #1642: URL: https://github.com/apache/samza/pull/1642 # Summary The current `DrainNotification` is a concrete class tasked with signaling samza engine to clear intermediate state. It is currently inextensible and any changes to the class in the future to accommodate new features would make the class backward incompatible. As DrainNotifications are stored in the metadata store, we need to be make sure that it can be extended in a backward compatible way. # Changes - Convert `DrainNotification` to an abstract class. Add `BasicDrainNotification` to represent a standard drain operation which involves samza clearing intermediate streams and any system managed state. Treat `DrainNotification` and `BasicDrainNotification` as api by putting in under `samza-api` - Change `DrainNotificationObjectMapper` - Alter `DrainUtils` and `DrainMonitor` to accommodate the above changes # Tests - Tests for `DrainNotificationObjectMapper` - Tests for `DrainUtils` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] jia-gao commented on a diff in pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
jia-gao commented on code in PR #1641: URL: https://github.com/apache/samza/pull/1641#discussion_r1024297945 ## samza-log4j2/src/main/java/org/apache/samza/logging/log4j2/StreamAppenderMetrics.java: ## @@ -46,7 +43,6 @@ public class StreamAppenderMetrics extends MetricsBase { public StreamAppenderMetrics(String prefix, MetricsRegistry registry) { super(prefix + "-", registry); bufferFillPct = newGauge("buffer-fill-percent", 0); -recursiveCalls = newCounter("recursive-calls"); Review Comment: Didn't find StreamAppenderMetrics published in any oss samza doc https://samza.apache.org/learn/documentation/latest/container/metrics-table.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
alnzng commented on code in PR #1641: URL: https://github.com/apache/samza/pull/1641#discussion_r1021917783 ## samza-log4j2/src/main/java/org/apache/samza/logging/log4j2/StreamAppenderMetrics.java: ## @@ -46,7 +43,6 @@ public class StreamAppenderMetrics extends MetricsBase { public StreamAppenderMetrics(String prefix, MetricsRegistry registry) { super(prefix + "-", registry); bufferFillPct = newGauge("buffer-fill-percent", 0); -recursiveCalls = newCounter("recursive-calls"); Review Comment: is this metric published in some Samza doc? If yes, let's remove it from there as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] jia-gao commented on pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
jia-gao commented on PR #1641: URL: https://github.com/apache/samza/pull/1641#issuecomment-1314166098 > Is the default support to prevent recursion not available when this change was introduced or did we upgrade to newer version of log4j that adds this prevention at the framework layer? log4j1 doesn't have this prevention that's when we added the change in samza. Later we upgraded to log4j2 which has he prevention at the framework layer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat merged pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
mynameborat merged PR #1637: URL: https://github.com/apache/samza/pull/1637 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1018537087 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -106,7 +110,10 @@ public void run() { callback.failure(new SamzaException(msg)); } }; - ScheduledFuture scheduledFuture = timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); + + final ScheduledFuture scheduledFuture = isDraining && (envelope.isDrain() || envelope.isWatermark()) + ? timer.schedule(timerTask, drainCallbackTimeout, TimeUnit.MILLISECONDS) + : timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); Review Comment: Added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1018502824 ## samza-core/src/main/java/org/apache/samza/config/TaskConfig.java: ## @@ -90,6 +90,13 @@ public class TaskConfig extends MapConfig { // timeout for triggering a callback public static final String CALLBACK_TIMEOUT_MS = "task.callback.timeout.ms"; static final long DEFAULT_CALLBACK_TIMEOUT_MS = -1L; + + // timeout for triggering a callback during drain + public static final String DRAIN_CALLBACK_TIMEOUT_MS = "task.callback.drain.timeout.ms"; + + // default timeout for triggering a callback during drain + static final long DEFAULT_DRAIN_CALLBACK_TIMEOUT_MS = -1L; Review Comment: The defaults for the process callback timeout and drain callback timeout are both -1 here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1018501997 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -106,7 +110,10 @@ public void run() { callback.failure(new SamzaException(msg)); } }; - ScheduledFuture scheduledFuture = timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); + + final ScheduledFuture scheduledFuture = isDraining && (envelope.isDrain() || envelope.isWatermark()) Review Comment: Thanks, refactored the `TaskCallbackManager`'s `createCallback` method to incorporate timeout and moved the logic to decide what timeout to use to the `process()` method inside `AsyncTaskWorker` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1018500682 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -127,4 +127,14 @@ public List updateCallback(TaskCallbackImpl callback) { return ImmutableList.of(callback); } } + + /** + * Override the timeout set in the callback manager with the given new timeout. + * This is intended to be used with pipeline drain as we want to override the existing timeout with a higher timeout. + * + * @param timeout new timeout for process callbacks + * */ + public void updateTaskCallbackTimeout(long timeout) { Review Comment: Changed this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool merged pull request #1640: SAMZA-2741: [Pipeline Drain] Fix ApplicationUtil.isHighLevelApiJob to work for anonymous and lambda `SamzaApplication` classes
xinyuiscool merged PR #1640: URL: https://github.com/apache/samza/pull/1640 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1640: SAMZA-2741: [Pipeline Drain] Fix ApplicationUtil.isHighLevelApiJob to work for anonymous and lambda `SamzaApplication` classes
mynameborat commented on code in PR #1640: URL: https://github.com/apache/samza/pull/1640#discussion_r1017221725 ## samza-core/src/main/java/org/apache/samza/config/ApplicationConfig.java: ## @@ -46,6 +46,11 @@ public enum ApplicationMode { BATCH } + public enum ApplicationApiType { +LEGACY, +LOW_LEVEL, +HIGH_LEVEL + } Review Comment: The reason to do so even if its internally used is to make sure any changes to this enum should be treated as API changes and would need to be treated as backward compatible as this will be used in the coordinator stream configuration which are persisted and read across different versions of samza. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on pull request #1641: LISAMZA-27395 removing the current recursive call prevention logic
mynameborat commented on PR #1641: URL: https://github.com/apache/samza/pull/1641#issuecomment-1307762089 Is the default support to prevent recursion not available when this change was introduced or did we upgrade to newer version of log4j that adds this prevention at the framework layer? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] jia-gao opened a new pull request, #1641: LISAMZA-27395 removing the current recursive call prevention logic
jia-gao opened a new pull request, #1641: URL: https://github.com/apache/samza/pull/1641 Issue: We have observed logging data loss happen in StreamAppender The RC is that current implementation of the “[append()](https://github.com/apache/samza/blob/master/samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java#L146)” method uses one AtomicBoolean variable to detect if the thread is called recursively. However, it won’t work as expected since Log4j2 framework already applies recursive call prevention and it has the side effect that data loss might occur in the parallel logging scenario Change: Remove current recursive call prevention logic, Leverage [log4j2 framework](https://github.com/apache/logging-log4j2/blob/release-2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/AppenderControl.java#L122) to prevent recursive calls API Changes: Remove a metric recursivecall from StreamAppenderMetrics since it is no longer used Test Done: ./gradlew build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1640: SAMZA-2741: [Pipeline Drain] Fix ApplicationUtil.isHighLevelApiJob to work for anonymous and lambda `SamzaApplication` classes
mynameborat commented on code in PR #1640: URL: https://github.com/apache/samza/pull/1640#discussion_r1016065159 ## samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java: ## @@ -69,6 +69,8 @@ public static Runnable createRunLoop(scala.collection.immutable.Map
[GitHub] [samza] ajothomas opened a new pull request, #1640: SAMZA-2741: [Pipeline Drain] Fix ApplicationUtil.isHighLevelApiJob to work for anonymous and lambda `SamzaApplication` classes
ajothomas opened a new pull request, #1640: URL: https://github.com/apache/samza/pull/1640 # Summary `ApplicationUtil.isHighLevelApiJob` doesn't work for cases where `SamzaApplication` could have been created as an anonymous class and lambdas # Cause Anonymous class names and lambdas classes cannot be created using `Class forName()`. As a result, the current logic currently assumes that the class name was incorrect. # Fix Introduce an internal config `app.api.type` and we set the config using `AppDescriptor` in `JobPlanner`. # Test - Unit test for `ApplicationUtil.isHighLevelApiJob` - Tested with a sample application -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat merged pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
mynameborat merged PR #1636: URL: https://github.com/apache/samza/pull/1636 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on PR #1636: URL: https://github.com/apache/samza/pull/1636#issuecomment-1306372678 @mynameborat I have fixed the check issue(checksytle issue) and resolved all the comments. Can you please check? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015983431 ## samza-core/src/main/java/org/apache/samza/container/host/ProcessCPUStatistics.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + + +/** + * A {@link ProcessCPUStatistics} object represents recent CPU usage percentage about the container process(including its child processes) + */ +public class ProcessCPUStatistics { Review Comment: This is for the consistency with the memory metric implementation: `SystemMemoryStatistics` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
mynameborat commented on PR #1636: URL: https://github.com/apache/samza/pull/1636#issuecomment-1306336433 Looks good to me mostly. Can you close out the conversations that have been resolved and respond to the comments that are pending. Also, fix the latest checks so that I can merge them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
mynameborat commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015978318 ## samza-core/src/main/java/org/apache/samza/container/host/ProcessCPUStatistics.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + + +/** + * A {@link ProcessCPUStatistics} object represents recent CPU usage percentage about the container process(including its child processes) + */ +public class ProcessCPUStatistics { Review Comment: as in why not use primitive and have a class to represent CPU statistics? Do you plan to expand the PCU stats? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015958371 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); Review Comment: Yes, we can. I can update it and return null for consistent with `PosxCommandBasedStatsGetter ` implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015949373 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: Oh, I see the gap here: Your point is that host CPU num != JVM available CPU num, and I thought they were the same. I think your point is correct in some scenarios. I did some research/test on the `OperatingSystemMXBean. getAvailableProcessors()`, the count is for logical CPU count instead of phsical CPU count. For non-containerization or VM use cases, the available logic/phsical CPU count in JVM could be the same as in host, IIUC. I will update this oshi implementation based on logical CPU count anyway since we should be consistent with JVM. Note: I didn't verify if oshi shows the consistent behavior/result regarding the CPU count in the containerizaiton or VM land(e.g. docker, kubernetes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015949373 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: Oh, I see the gap here: Your point is that host CPU num != JVM available CPU num, and I thought they were the same. I think your point is correct in some scenarios. I did some research/test on the `OperatingSystemMXBean. getAvailableProcessors()`, the count is for logical CPU count instead of phsical CPU count. For non-containerization or VM use cases, the available logic/phsical CPU count in JVM should be same as in host, IIUC. I will update this oshi implementation based on logical CPU count anyway since we should be consistent with JVM. Note: I didn't verify if oshi shows the consistent behavior/result regarding the CPU count in the containerizaiton or VM land(e.g. docker, kubernetes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015949373 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: Oh, I see the gap here: Your point is that host CPU num != JVM available CPU num, and I thought they were the same. I think your point is correct. I did some research/test on the `OperatingSystemMXBean. getAvailableProcessors()`, the count is for logical CPU count instead of phsical CPU count. For non-containerization or VM use cases, the available logic/phsical CPU count in JVM should be same as in host, IIUC. I will update this oshi implementation based on logical CPU count anyway since we should be consistent with JVM. Note: I didn't verify if oshi shows the consistent behavior/result regarding the CPU count in the containerizaiton or VM land(e.g. docker, kubernetes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015949373 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: Oh, I see the gap here: Your point is that host CPU num != JVM available CPU num, and I thought they were the same. I think your point is correct. I did some research/test on the `OperatingSystemMXBean. getAvailableProcessors()`, the count is for logical CPU count instead of phsical CPU count. For non-containerization or VM use cases, the available logic/phsical CPU count in JVM should be same with host, IIUC. I will update this oshi implementation based on logical CPU count anyway since we should be consistent with JVM. Note: I didn't verify if oshi shows the consistent behavior/result regarding the CPU count in the containerizaiton or VM land(e.g. docker, kubernetes). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] lakshmi-manasa-g commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
lakshmi-manasa-g commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015754291 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: no, i am saying that the current samza process-cpu usage does NOT take into account the num cpu cores on the host. OperatingSystemMXBean is actually giving out the fraction of the total available cpu that the jvm process is using. hence it is between [0,1] AND NOT the actual cpu tick count which would need to take into account cpu core count but the proposed change in the pr takes into account num cores and gives the actual cpu tick count ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015751569 ## samza-core/src/main/java/org/apache/samza/container/host/ProcessCPUStatistics.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + + +/** + * A {@link ProcessCPUStatistics} object represents recent CPU usage percentage about the container process(including its child processes) + */ +public class ProcessCPUStatistics { Review Comment: Can you please clarify `wrap this into a container object`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015714131 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); Review Comment: >but in general, an interface impl should have a clear contract instead of bubbling up exceptions that are internal to its impl and except the caller/user of the interface to deal with it. Make sense. But in this case, I didn't see any checked/unchecked/errors that can be thrown in the current oshi implementation. Will you still suggest that we should catch all Exceptions explicitly or throw Samza its own exception? ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting +
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015713429 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: Yeah, your understanding is the same as mine. So you are good with this PR's implementation, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] lakshmi-manasa-g commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
lakshmi-manasa-g commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015642158 ## samza-core/src/main/java/org/apache/samza/container/host/SystemStatisticsGetter.java: ## @@ -30,4 +30,11 @@ public interface SystemStatisticsGetter { * @return {@link SystemMemoryStatistics} for the Samza container */ SystemMemoryStatistics getSystemMemoryStatistics(); + + /** + * Returns the {@link ProcessCPUStatistics} for the current Samza container process(includes its child processes) + * + * @return {@link ProcessCPUStatistics} for the Samza container process + */ + ProcessCPUStatistics getProcessCPUStatistics(); Review Comment: got it. that makes sense to me. ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); + } + + private List getCurrentProcessAndChildProcesses() { +final List processes = new ArrayList<>(); +// get current process +processes.add(os.getProcess(os.getProcessId())); +// get all child processes of current process +processes.addAll(os.getChildProcesses(os.getProcessId(), OperatingSystem.ProcessFiltering.ALL_PROCESSES, +OperatingSystem.ProcessSorting.NO_SORTING, 0)); +return processes; + } + + private double getTotalCPUUsage(List processes) { +return processes.stream() +.mapToDouble(p -> p.getProcessCpuLoadBetweenTicks(previousProcessSnapshots.get(p.getProcessID( Review Comment: actually this brings out an important point: the impl is ``` public ProcessCPUStatistics getProcessCPUStatistics() { final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); refreshProcessSnapshots(currentProcessAndChildProcesses); return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); ``` so the first time this is called, `previousProcessSnapshots` is an empty hash map right. so `get` from it will return null. seems in this case the cpu usage returned cumulative in that case. which seems okay. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact
[GitHub] [samza] lakshmi-manasa-g commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
lakshmi-manasa-g commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1015641830 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: from the desc above, it seems the OperatingSystemMXBean is actually giving out the **fraction** of the total available cpu that the jvm process is using. hence it is between [0,1]. My understanding is based on the following excerpt pasted above >A value of 0.0 means that none of the CPUs were running threads from the JVM process during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running threads from the JVM 100% of the time during the recent period being observed. this to me indicates fraction of the num CPUs being used by jvm process. Also matches the way `getProccessCpuLoad` is used within Samza's JVMMetrics ``` val processCpuLoad = operatingSystemMXBean.getProcessCpuLoad gProcessCpuUsage.set(processCpuLoad * PCT) gProcessCpuUsageProcessors.set(processCpuLoad * operatingSystemMXBean.getAvailableProcessors) gSystemCpuUsage.set(operatingSystemMXBean.getSystemCpuLoad * PCT) ``` Note how we multiply processCpuLoad with 100 to get the percent of total cpu being used by the process. ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014527935 ## samza-core/src/main/java/org/apache/samza/container/host/PosixCommandBasedStatisticsGetter.java: ## @@ -85,4 +86,10 @@ public SystemMemoryStatistics getSystemMemoryStatistics() { return null; } } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +throw new NotImplementedException( +"No appropriate Posix command available for getting recent CPU usage information. For example, the CPU information exposed by ps command 'ps -o %cpu= -p ' represents the percentage of time spent running during the entire lifetime of a process not for the recent CPU usage"); + } Review Comment: Yes, `top` can give the instant/recent CPU usage information for a given process, but it has below cons so that I didn't choose it: - Need to have operating system specific implementation, since the top command is not a POSIX command and its input and output format in Linux and macOS are different. - Slower than ps/oshi command. Especially, on macOS the top command needs to get at least two samples to get the right CPU usage, which takes longer time(~ 2 seconds) than 1 sample. - not clear the reason, but it shows the consistent behavior that one sample(-l1) always return 0.0 on my macOS -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014528695 ## samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala: ## @@ -49,6 +49,7 @@ class SamzaContainerMetrics( val executorWorkFactor = newGauge("executor-work-factor", 1.0) val physicalMemoryMb = newGauge("physical-memory-mb", 0.0F) val physicalMemoryUtilization = newGauge("physical-memory-utilization", 0.0F) + val totalProcessCpuUsage = newGauge("total-process-cpu-usage", 0.0) Review Comment: Prefer to create a new metric `total-process-cpu-usage` to represent the cpu usage for the Samza container’s JVM process and all its child/descendant processes because of - I view it as a non-backward compatible change if let the existing “process-cpu-usage” metric includes child/descendant processes especially for non-JVM processes. - Samza official doc clearly states that the `process-cpu-usage` returns the “recent cpu usage” for the Java Virtual Machine process: https://samza.apache.org/learn/documentation/1.6.0/operations/monitoring.html - Having a new metric could be consistent with how Samza exposes `physical-memory-mb`. This memory related metric which includes child process is available for Samza container only, the AM doesn’t have it. - Implementation/Maintainance complexity. If keep using the existing `process-cpu-usage` we will have to deprecate the JVMMetrics implementation and add a new hook into AM to expose the metric. Good point on the doc, let me update the document. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014528695 ## samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala: ## @@ -49,6 +49,7 @@ class SamzaContainerMetrics( val executorWorkFactor = newGauge("executor-work-factor", 1.0) val physicalMemoryMb = newGauge("physical-memory-mb", 0.0F) val physicalMemoryUtilization = newGauge("physical-memory-utilization", 0.0F) + val totalProcessCpuUsage = newGauge("total-process-cpu-usage", 0.0) Review Comment: Prefer to create a new metric(e.g. total-process-cpu-usage) to represent the cpu usage for the Samza container’s JVM process and all its child/descendant processes because of - I view it as a non-backward compatible change if let the existing “process-cpu-usage” metric to include child/descendant processes especially for non-JVM processes. - Samza official doc clearly states that the `process-cpu-usage` returns the “recent cpu usage” for the Java Virtual Machine process: https://samza.apache.org/learn/documentation/1.6.0/operations/monitoring.html - Having a new metric could be consistent with how Samza exposes physical-memory-mb . This memory related metric which includes child process is available for Samza container only, the AM doesn’t have it. - Implementation/Maintainance complexity. If keep using the existing `process-cpu-usage` we will have to deprecate the JVMMetrics implementation and add a new hook into AM to expose the metric. Good point on the doc, let me update the document. ## samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala: ## @@ -49,6 +49,7 @@ class SamzaContainerMetrics( val executorWorkFactor = newGauge("executor-work-factor", 1.0) val physicalMemoryMb = newGauge("physical-memory-mb", 0.0F) val physicalMemoryUtilization = newGauge("physical-memory-utilization", 0.0F) + val totalProcessCpuUsage = newGauge("total-process-cpu-usage", 0.0) Review Comment: Prefer to create a new metric(e.g. total-process-cpu-usage) to represent the cpu usage for the Samza container’s JVM process and all its child/descendant processes because of - I view it as a non-backward compatible change if let the existing “process-cpu-usage” metric to include child/descendant processes especially for non-JVM processes. - Samza official doc clearly states that the `process-cpu-usage` returns the “recent cpu usage” for the Java Virtual Machine process: https://samza.apache.org/learn/documentation/1.6.0/operations/monitoring.html - Having a new metric could be consistent with how Samza exposes `physical-memory-mb`. This memory related metric which includes child process is available for Samza container only, the AM doesn’t have it. - Implementation/Maintainance complexity. If keep using the existing `process-cpu-usage` we will have to deprecate the JVMMetrics implementation and add a new hook into AM to expose the metric. Good point on the doc, let me update the document. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014527935 ## samza-core/src/main/java/org/apache/samza/container/host/PosixCommandBasedStatisticsGetter.java: ## @@ -85,4 +86,10 @@ public SystemMemoryStatistics getSystemMemoryStatistics() { return null; } } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +throw new NotImplementedException( +"No appropriate Posix command available for getting recent CPU usage information. For example, the CPU information exposed by ps command 'ps -o %cpu= -p ' represents the percentage of time spent running during the entire lifetime of a process not for the recent CPU usage"); + } Review Comment: Yes, `top` can give the instant/recent CPU usage information for a given process, but it has below cons so that I didn't choose it: - Need to have operating system specific implementation, since the top command is not a POSIX command and its input and output format in Linux and macOS are different. - Slower than ps command. Especially, on macOS the top command needs to get at least two samples to get the right CPU usage, which takes longer time(~ 2 seconds) than 1 sample. - not clear the reason, but it shows the consistent behavior that one sample(-l1) always return 0.0 on my macOS -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014527343 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); + } + + private List getCurrentProcessAndChildProcesses() { +final List processes = new ArrayList<>(); +// get current process +processes.add(os.getProcess(os.getProcessId())); +// get all child processes of current process +processes.addAll(os.getChildProcesses(os.getProcessId(), OperatingSystem.ProcessFiltering.ALL_PROCESSES, +OperatingSystem.ProcessSorting.NO_SORTING, 0)); +return processes; + } + + private double getTotalCPUUsage(List processes) { +return processes.stream() +.mapToDouble(p -> p.getProcessCpuLoadBetweenTicks(previousProcessSnapshots.get(p.getProcessID( Review Comment: No, it won't: https://github.com/oshi/oshi/blob/master/oshi-core/src/main/java/oshi/software/common/AbstractOSProcess.java#L52 ```java @Override public double getProcessCpuLoadBetweenTicks(OSProcess priorSnapshot) { if (priorSnapshot != null && this.processID == priorSnapshot.getProcessID() && getUpTime() > priorSnapshot.getUpTime()) { return (getUserTime() - priorSnapshot.getUserTime() + getKernelTime() - priorSnapshot.getKernelTime()) / (double) (getUpTime() - priorSnapshot.getUpTime()); } return getProcessCpuLoadCumulative(); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014525707 ## samza-core/src/main/java/org/apache/samza/container/host/SystemStatisticsGetter.java: ## @@ -30,4 +30,11 @@ public interface SystemStatisticsGetter { * @return {@link SystemMemoryStatistics} for the Samza container */ SystemMemoryStatistics getSystemMemoryStatistics(); + + /** + * Returns the {@link ProcessCPUStatistics} for the current Samza container process(includes its child processes) + * + * @return {@link ProcessCPUStatistics} for the Samza container process + */ + ProcessCPUStatistics getProcessCPUStatistics(); Review Comment: Exactly that's the point why I decided to use `ProcessCPUStats` which is trying to avoid confusion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014525560 ## samza-core/src/main/java/org/apache/samza/container/host/SystemStatistics.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + +/** + * A {@link SystemStatistics} object represents system related information about the physical process that runs the + * {@link org.apache.samza.container.SamzaContainer}. + */ +public class SystemStatistics { + + private final ProcessCPUStatistics cpuStatistics; + private final SystemMemoryStatistics memoryStatistics; + + public SystemStatistics(ProcessCPUStatistics cpuStatistics, SystemMemoryStatistics memoryStatistics) { Review Comment: The actual logic to handle the null is inside the Listener implementaiton: https://github.com/apache/samza/pull/1636/files#diff-f79781ad4c55ae7860829b06fd9dfd15e8069c37e64f8854d8f27ca2cd1f3ee5R653 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014525279 ## samza-core/src/main/java/org/apache/samza/container/host/StatisticsMonitorImpl.java: ## @@ -117,23 +118,23 @@ public void run() { } private void sampleStatistics() { -SystemMemoryStatistics statistics = null; +SystemMemoryStatistics memoryStatistics = null; +ProcessCPUStatistics cpuStatistics = null; try { - statistics = statisticsGetter.getSystemMemoryStatistics(); + memoryStatistics = statisticsGetter.getSystemMemoryStatistics(); + cpuStatistics = statisticsGetter.getProcessCPUStatistics(); } catch (Throwable e) { LOG.error("Error during obtaining statistics: ", e); } - +SystemStatistics systemStatistics = new SystemStatistics(cpuStatistics, memoryStatistics); for (Listener listener : listenerSet.keySet()) { - if (statistics != null) { Review Comment: I moved the null check in the listener: https://github.com/apache/samza/pull/1636/files#diff-f79781ad4c55ae7860829b06fd9dfd15e8069c37e64f8854d8f27ca2cd1f3ee5R653 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014524771 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); Review Comment: There is already a try-catch block to catch all the exceptions: https://github.com/apache/samza/pull/1636/files#diff-3b72fed01e1d344575c2eaf6dd353a69ae1fcb8c304c0ebb7543ab23ded6ad9fR126, it seems not necessary to do it here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014523821 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); Review Comment: The child processes might be different not only for start-up but also for runtime, think about a case that some child processes are died because of exceptions or they are completed. I think we'd better get the real-time information. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014523336 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); Review Comment: My understanding is that the current Samza implementation only runs one thread to trigger all SystemStatisticsGetter implementations, so there won't be any race conditions. And in general, for such container/system/global metric, it doesn't make sense that we run multiple threads to collect data points. If we update the class as NotThreadSafe as you suggested in the first comment, will it help? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014522272 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw new NotImplementedException("Not implemented"); + } + + @Override + public ProcessCPUStatistics getProcessCPUStatistics() { +final List currentProcessAndChildProcesses = getCurrentProcessAndChildProcesses(); +final double totalCPUUsage = getTotalCPUUsage(currentProcessAndChildProcesses); +refreshProcessSnapshots(currentProcessAndChildProcesses); +return new ProcessCPUStatistics(100d * totalCPUUsage / cpuCount); Review Comment: The existing CPU reporting actually takes into account the core-count on the host, right? Because it leverages on Java function `OperatingSystemMXBean. getProcessCpuLoad ()` whose value is a double in the [0.0,1.0] interval. If not, the value could be larger than 1.0. https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuTime-- >Returns the "recent cpu usage" for the Java Virtual Machine process. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that none of the CPUs were running threads from the JVM process during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running threads from the JVM 100% of the time during the recent period being observed. Threads from the JVM include the application threads as well as the JVM internal threads. All values betweens 0.0 and 1.0 are possible depending of the activities going on in the JVM process and the whole system. If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value. ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import
[GitHub] [samza] mynameborat commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
mynameborat commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1013440757 ## samza-core/src/main/java/org/apache/samza/container/host/DefaultSystemStatisticsGetter.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; + + +/** + * An default implementation of {@link SystemStatisticsGetter} that relies on {@link PosixCommandBasedStatisticsGetter} + * and {@link OshiBasedStatisticsGetter} implementations + */ +public class DefaultSystemStatisticsGetter implements SystemStatisticsGetter { + Review Comment: nit: remove extra line ## samza-core/src/main/java/org/apache/samza/container/host/ProcessCPUStatistics.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + + +/** + * A {@link ProcessCPUStatistics} object represents recent CPU usage percentage about the container process(including its child processes) + */ +public class ProcessCPUStatistics { Review Comment: why do we need to wrap this into a container object? ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); + + private final OperatingSystem os; + private final int cpuCount; + + public OshiBasedStatisticsGetter() { +this(new SystemInfo()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(SystemInfo si) { +this(si.getOperatingSystem(), si.getHardware().getProcessor().getPhysicalProcessorCount()); + } + + @VisibleForTesting + OshiBasedStatisticsGetter(OperatingSystem os, int cpuCount) { +this.os = os; +this.cpuCount = cpuCount; + } + + @Override + public SystemMemoryStatistics getSystemMemoryStatistics() { +throw
[GitHub] [samza] lakshmi-manasa-g commented on a diff in pull request #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
lakshmi-manasa-g commented on code in PR #1636: URL: https://github.com/apache/samza/pull/1636#discussion_r1014326963 ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) Review Comment: would be good to more docs on how the Stat getter works - aka previous snapshots, we should also call out if the impl is thread safe and if not, clarify why its not a problem ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import com.google.common.annotations.VisibleForTesting; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.commons.lang3.NotImplementedException; +import oshi.SystemInfo; +import oshi.software.os.OSProcess; +import oshi.software.os.OperatingSystem; + + +/** + * An implementation of {@link SystemStatisticsGetter} that relies on using oshi framework(https://www.oshi.ooo/) + */ +public class OshiBasedStatisticsGetter implements SystemStatisticsGetter { + // the snapshots of current JVM process and its child processes + private final Map previousProcessSnapshots = new HashMap<>(); Review Comment: should we use ConcurrentHashMap? ## samza-core/src/main/java/org/apache/samza/container/host/ProcessCPUStatistics.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.samza.container.host; + +import java.util.Objects; + + +/** + * A {@link ProcessCPUStatistics} object represents recent CPU usage percentage about the container process(including its child processes) Review Comment: small clarification might be good to say that the percentage is against total cpu of host ## samza-core/src/main/java/org/apache/samza/container/host/OshiBasedStatisticsGetter.java: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license
[GitHub] [samza] alnzng commented on pull request #1638: SAMZA-2764: add the metric "container-active-threads" back
alnzng commented on PR #1638: URL: https://github.com/apache/samza/pull/1638#issuecomment-1303916282 thanks for merging it @mynameborat -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat merged pull request #1638: SAMZA-2764: add the metric "container-active-threads" back
mynameborat merged PR #1638: URL: https://github.com/apache/samza/pull/1638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
mynameborat commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1013403151 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -106,7 +110,10 @@ public void run() { callback.failure(new SamzaException(msg)); } }; - ScheduledFuture scheduledFuture = timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); + + final ScheduledFuture scheduledFuture = isDraining && (envelope.isDrain() || envelope.isWatermark()) + ? timer.schedule(timerTask, drainCallbackTimeout, TimeUnit.MILLISECONDS) + : timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); Review Comment: Can we add unit tests to the addition to ensure we use the drainTimeout vs timeout appropriately? ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -106,7 +110,10 @@ public void run() { callback.failure(new SamzaException(msg)); } }; - ScheduledFuture scheduledFuture = timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); + + final ScheduledFuture scheduledFuture = isDraining && (envelope.isDrain() || envelope.isWatermark()) Review Comment: why do we need both `isDraining` and `envelope.isDrain()`? What happens if the runloop has not propagated its intent but the envelope is draining? Will it be treated as regular message processing? ## samza-core/src/main/java/org/apache/samza/config/TaskConfig.java: ## @@ -90,6 +90,13 @@ public class TaskConfig extends MapConfig { // timeout for triggering a callback public static final String CALLBACK_TIMEOUT_MS = "task.callback.timeout.ms"; static final long DEFAULT_CALLBACK_TIMEOUT_MS = -1L; + + // timeout for triggering a callback during drain + public static final String DRAIN_CALLBACK_TIMEOUT_MS = "task.callback.drain.timeout.ms"; + + // default timeout for triggering a callback during drain + static final long DEFAULT_DRAIN_CALLBACK_TIMEOUT_MS = -1L; Review Comment: What does -1L default denote? Wait forever in case of drain message flow stuck is it? ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -106,7 +110,10 @@ public void run() { callback.failure(new SamzaException(msg)); } }; - ScheduledFuture scheduledFuture = timer.schedule(timerTask, timeout, TimeUnit.MILLISECONDS); + + final ScheduledFuture scheduledFuture = isDraining && (envelope.isDrain() || envelope.isWatermark()) Review Comment: Might be easier to drive this in one place as opposed to split across? e.g., Would adding a new method w/ `createCallback(..., timeout)` fit better here? That way you can keep all the logic of drain and watermark and where we are abstracted from this piece of code (ideally thats how it should be) and then have the caller determine the timeout based on its need. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool merged pull request #1639: SAMZA-2741: [Pipeline Drain] Fix processing of Drain messages for High-Level and Low-level API
xinyuiscool merged PR #1639: URL: https://github.com/apache/samza/pull/1639 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1639: SAMZA-2741: [Pipeline Drain] Fix processing of Drain messages for High-Level and Low-level API
ajothomas commented on code in PR #1639: URL: https://github.com/apache/samza/pull/1639#discussion_r1013123821 ## samza-test/src/test/java/org/apache/samza/test/drain/DrainHighLevelApiIntegrationTest.java: ## @@ -74,20 +74,30 @@ public void describe(StreamApplicationDescriptor appDescriptor) { .map(KV::getValue) .partitionBy(PageView::getMemberId, pv -> pv, KVSerde.of(new IntegerSerde(), new TestTableData.PageViewJsonSerde()), "p1") + .map(kv -> KV.of(kv.getKey() * 31, kv.getValue())) + .partitionBy(KV::getKey, KV::getValue, KVSerde.of(new IntegerSerde(), new TestTableData.PageViewJsonSerde()), "p2") .sink((m, collector, coordinator) -> { RECEIVED.add(m.getValue()); }); } } - // The test can be occasionally flaky, so we set Ignore annotation - // Remove ignore annotation and run the test as follows: - // ./gradlew :samza-test:test --tests org.apache.samza.test.drain.DrainHighLevelApiIntegrationTest -PscalaSuffix=2.12 + /** + * This test will test drain and consumption of some messages from the in-memory topic. + * In order to simulate the real-world behaviour of drain, the test adds messages to the in-memory topic buffer periodically + * in a delayed fashion instead of all at once. The test then writes the drain notification message to the in-memory + * metadata store to drain and stop the pipeline. This write is done shortly after the pipeline starts and before all + * the messages are written to the topic's buffer. As a result, the total count of the processed messages will be less + * than the expected count of messages. + * */ @Ignore @Test - public void testPipeline() { + public void testDrain() { Review Comment: Tests have been turned on now after a few tweaks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1639: SAMZA-2741: [Pipeline Drain] Fix processing of Drain messages for High-Level and Low-level API
ajothomas commented on code in PR #1639: URL: https://github.com/apache/samza/pull/1639#discussion_r1013123456 ## samza-core/src/main/java/org/apache/samza/container/RunLoop.java: ## @@ -875,49 +876,40 @@ private boolean shouldDrain() { return false; } - if (!pendingEnvelopeQueue.isEmpty()) { -PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); -IncomingMessageEnvelope envelope = pendingEnvelope.envelope; + if (pendingEnvelopeQueue.size() > 0) { +final PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); +final IncomingMessageEnvelope envelope = pendingEnvelope.envelope; if (envelope.isDrain()) { final DrainMessage message = (DrainMessage) envelope.getMessage(); if (!message.getRunId().equals(runId)) { -// Removing the drain message from the pending queue as it doesn't match with the current runId -// Removing it will ensure that it is not picked up by process() -pendingEnvelopeQueue.remove(); +// Removing the drain message from the pending queue as it doesn't match with the current deployment +final PendingEnvelope discardedDrainMessage = pendingEnvelopeQueue.remove(); + consumerMultiplexer.tryUpdate(discardedDrainMessage.envelope.getSystemStreamPartition()); } else { +// Found drain message matching the current deployment + // set the RunLoop to drain mode if (!isDraining) { drain(); } -if (elasticityFactor <= 1) { - SystemStreamPartition ssp = envelope.getSystemStreamPartition(); - processingSspSetToDrain.remove(ssp); -} else { - // SystemConsumers will write only one envelope (enclosing DrainMessage) per SSP in its buffer. - // This envelope doesn't have keybucket info it's SSP. With elasticity, the same SSP can be processed by - // multiple tasks. Therefore, if envelope contains drain message, the ssp of envelope should be removed - // from task's processing set irrespective of keyBucket. - SystemStreamPartition sspOfEnvelope = envelope.getSystemStreamPartition(); - Optional ssp = processingSspSetToDrain.stream() - .filter(sspInSet -> sspInSet.getSystemStream().equals(sspOfEnvelope.getSystemStream()) - && sspInSet.getPartition().equals(sspOfEnvelope.getPartition())) - .findFirst(); - ssp.ifPresent(processingSspSetToDrain::remove); -} if (!hasIntermediateStreams) { - // Don't remove from the pending queue as we want the DAG to pick up Drain message and propagate it to - // intermediate streams + // The flow below only applies to samza low-level API + + // For high-level API, we do not remove the message from pending queue. + // It will be picked by the process flow instead of drain flow, as we want the drain control message + // to be processed by the High-level API Operator DAG. + + processingSspSetToDrain.remove(envelope.getSystemStreamPartition()); pendingEnvelopeQueue.remove(); + return processingSspSetToDrain.isEmpty(); Review Comment: Changed it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1639: SAMZA-2741: [Pipeline Drain] Fix processing of Drain messages for High-Level and Low-level API
ajothomas commented on code in PR #1639: URL: https://github.com/apache/samza/pull/1639#discussion_r1013123301 ## samza-core/src/main/java/org/apache/samza/container/RunLoop.java: ## @@ -875,49 +876,40 @@ private boolean shouldDrain() { return false; } - if (!pendingEnvelopeQueue.isEmpty()) { -PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); -IncomingMessageEnvelope envelope = pendingEnvelope.envelope; + if (pendingEnvelopeQueue.size() > 0) { +final PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); +final IncomingMessageEnvelope envelope = pendingEnvelope.envelope; if (envelope.isDrain()) { final DrainMessage message = (DrainMessage) envelope.getMessage(); if (!message.getRunId().equals(runId)) { -// Removing the drain message from the pending queue as it doesn't match with the current runId -// Removing it will ensure that it is not picked up by process() -pendingEnvelopeQueue.remove(); +// Removing the drain message from the pending queue as it doesn't match with the current deployment +final PendingEnvelope discardedDrainMessage = pendingEnvelopeQueue.remove(); + consumerMultiplexer.tryUpdate(discardedDrainMessage.envelope.getSystemStreamPartition()); } else { +// Found drain message matching the current deployment + // set the RunLoop to drain mode if (!isDraining) { drain(); } -if (elasticityFactor <= 1) { - SystemStreamPartition ssp = envelope.getSystemStreamPartition(); - processingSspSetToDrain.remove(ssp); -} else { - // SystemConsumers will write only one envelope (enclosing DrainMessage) per SSP in its buffer. - // This envelope doesn't have keybucket info it's SSP. With elasticity, the same SSP can be processed by - // multiple tasks. Therefore, if envelope contains drain message, the ssp of envelope should be removed - // from task's processing set irrespective of keyBucket. - SystemStreamPartition sspOfEnvelope = envelope.getSystemStreamPartition(); - Optional ssp = processingSspSetToDrain.stream() - .filter(sspInSet -> sspInSet.getSystemStream().equals(sspOfEnvelope.getSystemStream()) - && sspInSet.getPartition().equals(sspOfEnvelope.getPartition())) - .findFirst(); - ssp.ifPresent(processingSspSetToDrain::remove); -} if (!hasIntermediateStreams) { Review Comment: Changed this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1012008633 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -76,7 +76,7 @@ List update(TaskCallbackImpl cb) { private final TaskCallbacks completedCallbacks = new TaskCallbacks(); private final ScheduledExecutorService timer; private final TaskCallbackListener listener; - private final long timeout; + private long timeout; Review Comment: Fixed this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] xinyuiscool commented on a diff in pull request #1639: SAMZA-2741: [Pipeline Drain] Fix processing of Drain messages for High-Level and Low-level API
xinyuiscool commented on code in PR #1639: URL: https://github.com/apache/samza/pull/1639#discussion_r1010901435 ## samza-test/src/test/java/org/apache/samza/test/drain/DrainHighLevelApiIntegrationTest.java: ## @@ -74,20 +74,30 @@ public void describe(StreamApplicationDescriptor appDescriptor) { .map(KV::getValue) .partitionBy(PageView::getMemberId, pv -> pv, KVSerde.of(new IntegerSerde(), new TestTableData.PageViewJsonSerde()), "p1") + .map(kv -> KV.of(kv.getKey() * 31, kv.getValue())) + .partitionBy(KV::getKey, KV::getValue, KVSerde.of(new IntegerSerde(), new TestTableData.PageViewJsonSerde()), "p2") .sink((m, collector, coordinator) -> { RECEIVED.add(m.getValue()); }); } } - // The test can be occasionally flaky, so we set Ignore annotation - // Remove ignore annotation and run the test as follows: - // ./gradlew :samza-test:test --tests org.apache.samza.test.drain.DrainHighLevelApiIntegrationTest -PscalaSuffix=2.12 + /** + * This test will test drain and consumption of some messages from the in-memory topic. + * In order to simulate the real-world behaviour of drain, the test adds messages to the in-memory topic buffer periodically + * in a delayed fashion instead of all at once. The test then writes the drain notification message to the in-memory + * metadata store to drain and stop the pipeline. This write is done shortly after the pipeline starts and before all + * the messages are written to the topic's buffer. As a result, the total count of the processed messages will be less + * than the expected count of messages. + * */ @Ignore @Test - public void testPipeline() { + public void testDrain() { Review Comment: Seems most of the integration tests are ignored due to relying on wall-clock time. Is it possible to enable a couple and we don't need to do this timed wait? Not sure this TestRunner support waitForFinish(). but samza runners have this api so in theory we should be able to use it. ## samza-core/src/main/java/org/apache/samza/container/RunLoop.java: ## @@ -875,49 +876,40 @@ private boolean shouldDrain() { return false; } - if (!pendingEnvelopeQueue.isEmpty()) { -PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); -IncomingMessageEnvelope envelope = pendingEnvelope.envelope; + if (pendingEnvelopeQueue.size() > 0) { +final PendingEnvelope pendingEnvelope = pendingEnvelopeQueue.peek(); +final IncomingMessageEnvelope envelope = pendingEnvelope.envelope; if (envelope.isDrain()) { final DrainMessage message = (DrainMessage) envelope.getMessage(); if (!message.getRunId().equals(runId)) { -// Removing the drain message from the pending queue as it doesn't match with the current runId -// Removing it will ensure that it is not picked up by process() -pendingEnvelopeQueue.remove(); +// Removing the drain message from the pending queue as it doesn't match with the current deployment +final PendingEnvelope discardedDrainMessage = pendingEnvelopeQueue.remove(); + consumerMultiplexer.tryUpdate(discardedDrainMessage.envelope.getSystemStreamPartition()); } else { +// Found drain message matching the current deployment + // set the RunLoop to drain mode if (!isDraining) { drain(); } -if (elasticityFactor <= 1) { - SystemStreamPartition ssp = envelope.getSystemStreamPartition(); - processingSspSetToDrain.remove(ssp); -} else { - // SystemConsumers will write only one envelope (enclosing DrainMessage) per SSP in its buffer. - // This envelope doesn't have keybucket info it's SSP. With elasticity, the same SSP can be processed by - // multiple tasks. Therefore, if envelope contains drain message, the ssp of envelope should be removed - // from task's processing set irrespective of keyBucket. - SystemStreamPartition sspOfEnvelope = envelope.getSystemStreamPartition(); - Optional ssp = processingSspSetToDrain.stream() - .filter(sspInSet -> sspInSet.getSystemStream().equals(sspOfEnvelope.getSystemStream()) - && sspInSet.getPartition().equals(sspOfEnvelope.getPartition())) - .findFirst(); - ssp.ifPresent(processingSspSetToDrain::remove); -} if (!hasIntermediateStreams) { - // Don't remove from the pending queue as we want the DAG to pick up Drain message and propagate it to - // intermediate streams + // The flow below only applies to samza low-level API + + // For high-level API, we do not remove the
[GitHub] [samza] ajothomas opened a new pull request, #1639: SAMZA-2741: Pipeline Drain- Fix processing of Drain messages for High-Level and Low-level API
ajothomas opened a new pull request, #1639: URL: https://github.com/apache/samza/pull/1639 # Symptoms and Cause: There were a few issues with the processing of drain messages that this PR attempts to fix. The following issues were encountered during the end-to-end testing with test pipelines. 1. It was observed that the `RunLoop` was running into an exception when the tasks were trying to consume from the intermediate stream. This was happening as we stop all consumers when we encounter drain. This also stops the consumers which ingest data from the intermediate streams. After 1 was resolved, some additional issues were observed- 2. The pipeline was getting stuck after partial consumption of data after the last shuffle stage. This was happening as we were removing the SSPs from the processing set of SSPs for a task for high-level API processing. This was leading to unprocessed drain messages and thereby the pipeline would remain stuck. 3. The pipeline was stuck if a drain message was encountered prior to the start of the pipeline. This is happening as we set the `SystemConsumers` to drain mode before it is even started by `SamzaContainer` # Changes: 1. Changed the `SystemConsumers` code to remove the line which stopped registered `SystemConsumer`s on drain. 2. Restrict the logic to remove the SSP from the processing SSP set only to high-level API. Additionally, ask the `TaskCoordinator` to commit the task once all streams for a task have drained. 3. To fix the issue of pipeline getting stuck if a drain control message was present prior to container start, we write drain and watermark control messages on `SystemConsumers` start if it is in drain mode. Additionally, Drain related integration tests (`DrainLowLevelApiIntegrationTest` and `DrainHighLevelApiIntegrationTest`) have been enabled as most of the flakiness and edge-cases have been fixed now. There is a small chance that both the tests can be flaky on rare occasions as `TestRunner.run` and in-memory metadata store writes are happening in separate threads. Despite adding a generous 5 seconds delay for metadata store writes, it is possible for the order to get mixed up which causes the test to fail. Added a test rule `RetryRule` to attempt a retry if the tests fail the first time. # Tests: - End to end tests for both high-level and low-level API - Unit tests for Drain in RunLoop - Integ test for Drain for High-level and Low-Level API- DrainLowLevelApiIntegrationTest & DrainHighLevelApiIntegrationTest # API changes: None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng opened a new pull request, #1638: SAMZA-2764: add the metric "container-active-threads" back
alnzng opened a new pull request, #1638: URL: https://github.com/apache/samza/pull/1638 # Symptom No data was emitted for the metric `container-active-threads` # Cause This PR(#1501 ) removed the logic to emit data for metric accidently: https://github.com/apache/samza/pull/1501/files#diff-f79781ad4c55ae7860829b06fd9dfd15e8069c37e64f8854d8f27ca2cd1f3ee5L637 # Changes - Add the deleted logic back in a new class`SamzaContainerMonitorListener` # Tests - Unit tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on a diff in pull request #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
mynameborat commented on code in PR #1637: URL: https://github.com/apache/samza/pull/1637#discussion_r1010012333 ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -76,7 +76,7 @@ List update(TaskCallbackImpl cb) { private final TaskCallbacks completedCallbacks = new TaskCallbacks(); private final ScheduledExecutorService timer; private final TaskCallbackListener listener; - private final long timeout; + private long timeout; Review Comment: In general, its anti pattern to override instance variables post construction. We should probably find another way to enforce timeout for drain. ## samza-core/src/main/java/org/apache/samza/config/TaskConfig.java: ## @@ -90,6 +90,13 @@ public class TaskConfig extends MapConfig { // timeout for triggering a callback public static final String CALLBACK_TIMEOUT_MS = "task.callback.timeout.ms"; static final long DEFAULT_CALLBACK_TIMEOUT_MS = -1L; + + // timeout for triggering a callback during drain + public static final String DRAIN_CALLBACK_TIMEOUT_MS = "task.callback.drain.timeout.ms"; + + // default timeout for triggering a callback during drain + static final long DEFAULT_DRAIN_CALLBACK_TIMEOUT_MS = Duration.ofMinutes(30).toMillis(); Review Comment: The default seems too high for applications to get notified in case of faults. What is the rationale behind choosing such a high value? Is there a systematic way to compute the value here? Like what are the downstream dependencies of this function and what SLAs to account for when computing the value for this. Document all of these as part of the configuration documentation as well. ## samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java: ## @@ -127,4 +127,14 @@ public List updateCallback(TaskCallbackImpl callback) { return ImmutableList.of(callback); } } + + /** + * Override the timeout set in the callback manager with the given new timeout. + * This is intended to be used with pipeline drain as we want to override the existing timeout with a higher timeout. + * + * @param timeout new timeout for process callbacks + * */ + public void updateTaskCallbackTimeout(long timeout) { Review Comment: Why are we overriding the existing timeout set as part of the construction of the class? This is confusing and breaks the timeout originally configured that gets passed to the callback manager (which is `task.callback.timeout.ms`). With this change, we are changing user behavior with setting the default message processing timeout latency to 30 minutes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] ajothomas opened a new pull request, #1637: SAMZA-2765: [Pipeline Drain] Adding config for task callback timeout during drain
ajothomas opened a new pull request, #1637: URL: https://github.com/apache/samza/pull/1637 # Improvement: This PR is a part of the Pipeline Drain work and aims to provide a way to specify a task callback timeout to be used for pipeline drain. The standard `task.callback.timeout.ms` parameter might be too small for the drain operation to complete. Drain operation will involve clearing of intermediate state which might require a larger timeout for the task callback. We need a way to provide a configurable timeout to be used during drain operation. # Changes: - Add a new config `task.callback.drain.timeout.ms` in `TaskConfig` - Added logic to override `TaskCallbackManager`'s timeout parameter in `TaskWorker`s on drain. # Tests: - Unit test changes for `TaskConfig` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] Treydone commented on pull request #1633: Add GraalSystems in "powered-by"
Treydone commented on PR #1633: URL: https://github.com/apache/samza/pull/1633#issuecomment-1297695724 Sorry, I can't build the site locally... Even from the master branch with the command `bundle exec jekyll build,` I got an error ``` jekyll 3.4.5 | Error: No header received back. Traceback (most recent call last): 49: from /usr/local/bin/jekyll:23:in `' 48: from /usr/local/bin/jekyll:23:in `load' 47: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/exe/jekyll:13:in `' 46: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program' 45: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go' 44: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute' 43: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each' 42: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute' 41: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:16:in `block (2 levels) in init_with_program' 40: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:34:in `process' 39: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:63:in `build' 38: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/command.rb:26:in `process_site' 37: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:69:in `process' 36: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:190:in `render' 35: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:449:in `render_docs' 34: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:449:in `each' 33: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:450:in `block in render_docs' 32: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:450:in `each' 31: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:452:in `block (2 levels) in render_docs' 30: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/renderer.rb:82:in `run' 29: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/renderer.rb:134:in `render_liquid' 28: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/liquid_renderer/file.rb:26:in `render!' 27: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time' 26: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/liquid_renderer/file.rb:27:in `block in render!' 25: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/liquid_renderer/file.rb:40:in `measure_bytes' 24: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/liquid_renderer/file.rb:28:in `block (2 levels) in render!' 23: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/template.rb:222:in `render!' 22: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/template.rb:209:in `render' 21: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/template.rb:262:in `with_profiling' 20: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/template.rb:210:in `block in render' 19: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/block.rb:108:in `render' 18: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/block.rb:122:in `render_all' 17: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/block.rb:122:in `each' 16: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/block.rb:135:in `block in render_all' 15: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/profiler/hooks.rb:4:in `render_token_with_profiling' 14: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/profiler.rb:80:in `profile_token_render' 13: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/profiler/hooks.rb:5:in `block in render_token_with_profiling' 12: from /var/lib/gems/2.7.0/gems/liquid-3.0.6/lib/liquid/block.rb:151:in `render_token' 11: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/tags/highlight.rb:39:in `render' 10: from /var/lib/gems/2.7.0/gems/jekyll-3.4.5/lib/jekyll/tags/highlight.rb:89:in `render_pygments' 9: from /usr/lib/ruby/2.7.0/forwardable.rb:235:in `highlight' 8: from /var/lib/gems/2.7.0/gems/pygments.rb-1.2.1/lib/pygments/popen.rb:236:in `highlight' 7: from /var/lib/gems/2.7.0/gems/pygments.rb-1.2.1/lib/pygments/popen.rb:254:in `mentos' 6: from /usr/lib/ruby/2.7.0/timeout.rb:110:in `timeout' 5: from /usr/lib/ruby/2.7.0/timeout.rb:33:in `catch' 4: from /usr/lib/ruby/2.7.0/timeout.rb:33:in `catch' 3: from
[GitHub] [samza] mynameborat commented on pull request #1635: SAMZA-2763: Support worker JVM opts for Samza Beam portable mode
mynameborat commented on PR #1635: URL: https://github.com/apache/samza/pull/1635#issuecomment-1292317691 We will revisit this when we have Samza Beam portable support in oSS -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat closed pull request #1635: SAMZA-2763: Support worker JVM opts for Samza Beam portable mode
mynameborat closed pull request #1635: SAMZA-2763: Support worker JVM opts for Samza Beam portable mode URL: https://github.com/apache/samza/pull/1635 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] alnzng opened a new pull request, #1636: SAMZA-2762: new cpu usage metric which counts child processes usage
alnzng opened a new pull request, #1636: URL: https://github.com/apache/samza/pull/1636 # Symptom We have observed that some use cases used quasar(TensorFlow framework) to do model inference and this framework spawn child processes(non-JVM) to run TensorFlow serving. These child processes were using high CPU usage(200%) however their CPU usage can't be captured by the existing CPU usage metric `process-cpu-usage` # Cause The existing metric `process-cpu-usage` metric was designed for capturing the [CPU usage for the JVM process](https://samza.apache.org/learn/documentation/1.6.0/operations/monitoring.html) only, it can't count the child processes(especially for non-JVM processes) usage. # Changes - Reply on [oshi framwork](https://www.oshi.ooo/) to capture the CPU usage for the JVM process and all its child processes, and create a new metric to display the total CPU usage. # API Changes - Added a new metric `total-process-cpu-usage` in `SamzaContainerMetrics` which is similar with [how we provided `physical-memory-mb` metric](https://github.com/apache/samza/pull/1530) # Tests - Unit tests - Tested with `samza-hello-samza` and verify the metric data points ![Screen Shot 2022-10-25 at 10 23 58 PM](https://user-images.githubusercontent.com/59407935/197942249-21dae598-7e18-4bb0-88e5-6a752ca49765.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat opened a new pull request, #1635: SAMZA-2763: Support worker JVM opts for Samza Beam portable mode
mynameborat opened a new pull request, #1635: URL: https://github.com/apache/samza/pull/1635 **Summary**: Support JVM options for worker process in Samza Beam portable mode **Description**: With portable mode support for Samza Beam, we want to tune and configure the JVM options for worker process. In this PR, we add support by introducing `worker.opts` configuration. **Changes**: - Added `worker.opts` configuration - Updated configuration table and website **API Changes**: None **Usage Instructions**: `worker.opts` can be used similar to other samza application configuration although it only applies to Samza Beam portable execution mode and is ignored otherwise. **Upgrade Instructions**: None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] sborya merged pull request #1634: Update zookeeper and kazoo version
sborya merged PR #1634: URL: https://github.com/apache/samza/pull/1634 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [samza] mynameborat commented on pull request #1633: Add GraalSystems in "powered-by"
mynameborat commented on PR #1633: URL: https://github.com/apache/samza/pull/1633#issuecomment-1281000897 can you publish the website locally to make sure the changes look as expected? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@samza.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org