Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
leerho merged PR #51: URL: https://github.com/apache/datasketches-rust/pull/51 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#issuecomment-3701471850
@leerho this PR is mergeable. I'll follow up with a change that uses the new codec utils for the frequencies serde impls, but I won't extend this PR any further, to keep it from becoming too mixed.
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654818591
##
datasketches/src/resize.rs:
##
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/// For the Families that accept this configuration parameter, it controls the size multiple that
+/// affects how fast the internal cache grows, when more space is required.
+///
+/// For Theta Sketches, the Resize Factor is a dynamic, speed performance vs. memory size tradeoff.
+/// The sketches created on-heap and configured with a Resize Factor of > X1 start out with an
+/// internal hash table size that is the smallest submultiple of the target Nominal Entries
+/// and larger than the minimum required hash table size for that sketch.
+///
+/// When the sketch needs to be resized larger, then the Resize Factor is used as a multiplier of
+/// the current sketch cache array size.
+///
+/// "X1" means no resizing is allowed and the sketch will be initialized at full size.
+///
+/// "X2" means the internal cache will start very small and double in size until the target size is
+/// reached.
+///
+/// Similarly, "X4" is a factor of 4 and "X8" is a factor of 8.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ResizeFactor {
+    /// Do not resize. Sketch will be configured to full size.
+    X1,
+    /// Resize by factor of 2
+    X2,
+    /// Resize by factor of 4
+    X4,
+    /// Resize by factor of 8
+    X8,
+}
Review Comment:
Moved `ResizeFactor` up to be a top-level concept. It is shared among
ThetaSketch, TupleSketch, and several sampling sketches.
cc @leerho to double-check that making `ResizeFactor` a top-level exported
symbol looks good.
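To make the doc comment above concrete, here is a minimal sketch of the growth schedule it describes. The helpers `lg_value`, `starting_lg_size`, and `growth_schedule` are hypothetical names introduced for illustration and are not part of this PR; the starting-size rule is an assumption modeled on the "smallest submultiple of the target that is at least the minimum size" wording.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResizeFactor {
    X1,
    X2,
    X4,
    X8,
}

impl ResizeFactor {
    /// Hypothetical helper: log2 of the multiplier (X1 -> 0, X2 -> 1, X4 -> 2, X8 -> 3).
    pub fn lg_value(self) -> u8 {
        match self {
            ResizeFactor::X1 => 0,
            ResizeFactor::X2 => 1,
            ResizeFactor::X4 => 2,
            ResizeFactor::X8 => 3,
        }
    }
}

/// Hypothetical helper: log2 of the initial table size, chosen so that repeated
/// multiplication by the resize factor lands exactly on the target size.
pub fn starting_lg_size(lg_target: u8, lg_min: u8, rf: ResizeFactor) -> u8 {
    let lg_rf = rf.lg_value();
    if lg_target <= lg_min {
        lg_min
    } else if lg_rf == 0 {
        // X1: no resizing, start at full size.
        lg_target
    } else {
        (lg_target - lg_min) % lg_rf + lg_min
    }
}

/// Hypothetical helper: the sequence of log2 table sizes visited while growing.
pub fn growth_schedule(lg_target: u8, lg_min: u8, rf: ResizeFactor) -> Vec<u8> {
    let mut lg = starting_lg_size(lg_target, lg_min, rf);
    let mut sizes = vec![lg];
    // For X1 the loop never runs, because the start already equals the target.
    while lg < lg_target {
        lg = (lg + rf.lg_value()).min(lg_target);
        sizes.push(lg);
    }
    sizes
}
```

Under these assumptions, a target of 2^12 entries with a minimum of 2^5 is allocated once at full size with `X1`, while `X4` grows through four table sizes.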
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654651691
##
datasketches/src/frequencies/mod.rs:
##
@@ -29,7 +29,6 @@ mod serde;
mod serialization;
mod sketch;
-pub use self::serde::ItemsSerde;
Review Comment:
To @PsiACE: there is no need for `ItemsSerde` if we already implement serde over the concrete `FrequentItemsSketch` and `FrequentItemsSketch` types. But the C++ impl does have a Serde abstraction that you may want to investigate:
https://github.com/apache/datasketches-cpp/blob/7bb979d3/common/include/serde.hpp
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654655382
##
datasketches/src/frequencies/mod.rs:
##
@@ -29,7 +29,6 @@ mod serde;
mod serialization;
mod sketch;
-pub use self::serde::ItemsSerde;
Review Comment:
cc @AlexanderSaydakov I'd appreciate it if you could share the background and usage of `serde.hpp`.
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654654273
##
datasketches/src/theta/mod.rs:
##
@@ -32,4 +32,6 @@
mod hash_table;
mod sketch;
+pub use self::hash_table::ResizeFactor;
pub use self::sketch::ThetaSketch;
+pub use self::sketch::ThetaSketchBuilder;
Review Comment:
To @ZENOTME, `ThetaSketchBuilder` and `ResizeFactor` are needed because `ThetaSketch::builder` exposes the builder, which has a `resize_factor` method that only accepts a `ResizeFactor` instance.
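The re-export requirement can be illustrated with a stripped-down module layout. This is only a sketch: the types below are simplified stand-ins rather than the crate's real definitions, and the `X8` default and the `build` method are assumptions made for the example.

```rust
pub mod theta {
    mod hash_table {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum ResizeFactor {
            X1,
            X2,
            X4,
            X8,
        }
    }

    mod sketch {
        use super::hash_table::ResizeFactor;

        // Simplified stand-in: only records the chosen resize factor.
        pub struct ThetaSketch {
            resize_factor: ResizeFactor,
        }

        impl ThetaSketch {
            pub fn builder() -> ThetaSketchBuilder {
                // The default value here is an assumption for illustration.
                ThetaSketchBuilder {
                    resize_factor: ResizeFactor::X8,
                }
            }

            pub fn resize_factor(&self) -> ResizeFactor {
                self.resize_factor
            }
        }

        pub struct ThetaSketchBuilder {
            resize_factor: ResizeFactor,
        }

        impl ThetaSketchBuilder {
            pub fn resize_factor(mut self, rf: ResizeFactor) -> Self {
                self.resize_factor = rf;
                self
            }

            pub fn build(self) -> ThetaSketch {
                ThetaSketch {
                    resize_factor: self.resize_factor,
                }
            }
        }
    }

    // Without these re-exports, callers outside `theta` could reach the builder
    // through `ThetaSketch::builder()` but could not name `ResizeFactor` to
    // pass to `resize_factor`, nor name the builder type itself.
    pub use self::hash_table::ResizeFactor;
    pub use self::sketch::ThetaSketch;
    pub use self::sketch::ThetaSketchBuilder;
}
```

A caller outside the module can then write `theta::ThetaSketch::builder().resize_factor(theta::ResizeFactor::X2).build()`.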
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654654273
##
datasketches/src/theta/mod.rs:
##
@@ -32,4 +32,6 @@
mod hash_table;
mod sketch;
+pub use self::hash_table::ResizeFactor;
pub use self::sketch::ThetaSketch;
+pub use self::sketch::ThetaSketchBuilder;
Review Comment:
`ThetaSketchBuilder` and `ResizeFactor` are needed because `ThetaSketch::builder` exposes the builder, and the builder has a `resize_factor` method that only accepts a `ResizeFactor` instance.
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654653263
##
datasketches/tests/frequencies_update_test.rs:
##
@@ -479,20 +479,6 @@ fn test_longs_reset() {
assert_eq!(sketch.lg_max_map_size(), 3);
}
-#[test]
-#[should_panic(expected = "count may not be negative")]
-fn test_longs_negative_count_panics() {
-    let mut sketch: FrequentItemsSketch = FrequentItemsSketch::new(8);
-    sketch.update_with_count(1, -1);
-}
-
-#[test]
-#[should_panic(expected = "count may not be negative")]
-fn test_items_negative_count_panics() {
-    let mut sketch = FrequentItemsSketch::new(8);
-    sketch.update_with_count("a".to_string(), -1);
-}
-
Review Comment:
These cases can never panic now: with an unsigned count type, a negative count is ruled out by the type check.
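A minimal sketch of the idea, using a hypothetical `ToyCounts` type rather than the crate's `FrequentItemsSketch`: once the count parameter is `u64`, a negative argument is rejected by the compiler, so there is no runtime check left to test.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a frequency sketch, for illustration only.
#[derive(Default)]
pub struct ToyCounts {
    counts: HashMap<String, u64>,
}

impl ToyCounts {
    /// `count` is unsigned, so a negative value cannot be expressed by the caller.
    pub fn update_with_count(&mut self, item: &str, count: u64) {
        *self.counts.entry(item.to_string()).or_insert(0) += count;
    }

    /// Returns the accumulated count, or 0 for an item that was never seen.
    pub fn count(&self, item: &str) -> u64 {
        self.counts.get(item).copied().unwrap_or(0)
    }
}

// The removed tests have no equivalent here, because this line does not compile:
// ToyCounts::default().update_with_count("a", -1);
```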
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654652501
##
datasketches/src/frequencies/reverse_purge_item_hash_map.rs:
##
@@ -118,16 +118,16 @@ impl ReversePurgeItemHashMap {
/// Shifts all values by `adjust_amount`.
///
/// This is used during purges to decrement counters.
-pub fn adjust_all_values_by(&mut self, adjust_amount: i64) {
Review Comment:
Drop `pub` when the method is only used within the same file. A smaller
public surface is easier to reason about.
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654652019
##
datasketches/src/frequencies/reverse_purge_item_hash_map.rs:
##
@@ -35,7 +35,7 @@ pub(super) struct ReversePurgeItemHashMap {
lg_length: u8,
load_threshold: usize,
keys: Vec>,
-values: Vec<i64>,
+values: Vec<u64>,
Review Comment:
All offsets and counts are non-negative. Use `u64` for all these places.
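One consequence of unsigned counters is that a purge can no longer subtract first and test for a non-positive result afterwards. A possible shape for that step is sketched below; `purge_by` is a hypothetical free function written for illustration, not the method on `ReversePurgeItemHashMap`.

```rust
/// Hypothetical helper: subtracts `amount` from every counter and drops the
/// counters that would otherwise reach zero or go below it.
pub fn purge_by(values: &mut Vec<u64>, amount: u64) {
    values.retain_mut(|v| {
        if *v > amount {
            // Cannot underflow: guarded by the comparison above.
            *v -= amount;
            true
        } else {
            false
        }
    });
}
```

Comparing before subtracting keeps the arithmetic within `u64` without needing wrapping or saturating operations.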
Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)
tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654651691
##
datasketches/src/frequencies/mod.rs:
##
@@ -29,7 +29,6 @@ mod serde;
mod serialization;
mod sketch;
-pub use self::serde::ItemsSerde;
Review Comment:
To @PsiACE: there is no need for `ItemsSerde` if we already implement serde over the concrete `FrequentItemsSketch` and `FrequentItemsSketch` types. But the C++ impl does have a Serde abstraction that you may want to investigate:
https://github.com/apache/datasketches-cpp/blob/master/common/include/serde.hpp
