Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


leerho merged PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#issuecomment-3701471850

   @leerho this PR is mergeable. I'll have a follow-up to use the new codec 
utils for the frequencies serde impls, but I won't extend this PR further, to 
keep it from becoming too mixed.





Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654818591


##
datasketches/src/resize.rs:
##
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/// For the Families that accept this configuration parameter, it controls the size multiple that
+/// affects how fast the internal cache grows, when more space is required.
+///
+/// For Theta Sketches, the Resize Factor is a dynamic, speed performance vs. memory size tradeoff.
+/// The sketches created on-heap and configured with a Resize Factor of > X1 start out with an
+/// internal hash table size that is the smallest submultiple of the target Nominal Entries
+/// and larger than the minimum required hash table size for that sketch.
+///
+/// When the sketch needs to be resized larger, then the Resize Factor is used as a multiplier of
+/// the current sketch cache array size.
+///
+/// "X1" means no resizing is allowed and the sketch will be initialized at full size.
+///
+/// "X2" means the internal cache will start very small and double in size until the target size is
+/// reached.
+///
+/// Similarly, "X4" is a factor of 4 and "X8" is a factor of 8.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ResizeFactor {
+    /// Do not resize. Sketch will be configured to full size.
+    X1,
+    /// Resize by factor of 2
+    X2,
+    /// Resize by factor of 4
+    X4,
+    /// Resize by factor of 8
+    X8,
+}

Review Comment:
   Moved `ResizeFactor` to the top level. It is shared among 
ThetaSketch, TupleSketch, and several sampling sketches.
   
   cc @leerho to double-check that making `ResizeFactor` a top-level exported 
symbol looks good.
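
To illustrate the growth behaviour described in the doc comment, here is a minimal sketch. The enum is a local copy of the one in the diff; `lg_value` and `growth_steps` are hypothetical helpers introduced only for this example and are not claimed to exist in the crate. Clamping the last step to the target is a simplification of the "smallest submultiple" rule in the doc comment.

```rust
/// Local copy of the enum from the diff, so this example compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResizeFactor {
    X1,
    X2,
    X4,
    X8,
}

impl ResizeFactor {
    /// Hypothetical helper: log2 of the growth multiplier.
    pub fn lg_value(self) -> u8 {
        match self {
            ResizeFactor::X1 => 0,
            ResizeFactor::X2 => 1,
            ResizeFactor::X4 => 2,
            ResizeFactor::X8 => 3,
        }
    }
}

/// Hypothetical helper: the table sizes (as log2) a sketch would pass through
/// while growing from `lg_start` up to `lg_target`.
pub fn growth_steps(rf: ResizeFactor, lg_start: u8, lg_target: u8) -> Vec<u8> {
    // X1 never resizes, so the table is allocated at full size up front.
    if rf == ResizeFactor::X1 {
        return vec![lg_target];
    }
    let mut lg = lg_start.min(lg_target);
    let mut steps = vec![lg];
    while lg < lg_target {
        // Multiply the size by the factor, but never overshoot the target.
        lg = (lg + rf.lg_value()).min(lg_target);
        steps.push(lg);
    }
    steps
}

fn main() {
    println!("{:?}", growth_steps(ResizeFactor::X2, 5, 8));
    println!("{:?}", growth_steps(ResizeFactor::X4, 5, 9));
    println!("{:?}", growth_steps(ResizeFactor::X8, 5, 12));
    println!("{:?}", growth_steps(ResizeFactor::X1, 5, 12));
}
```

A larger factor reaches the target in fewer resizes at the cost of allocating memory earlier, which is the speed vs. memory tradeoff the doc comment mentions.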






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654651691


##
datasketches/src/frequencies/mod.rs:
##
@@ -29,7 +29,6 @@ mod serde;
 mod serialization;
 mod sketch;
 
-pub use self::serde::ItemsSerde;

Review Comment:
   To @PsiACE,
   
   There is no need for `ItemsSerde` if we already implement serde over the two 
concrete `FrequentItemsSketch` instantiations.
   
   But the C++ impl does have a Serde abstraction you may investigate in 
https://github.com/apache/datasketches-cpp/blob/7bb979d3/common/include/serde.hpp
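
For readers unfamiliar with the idea, a per-item serialization abstraction can be sketched in Rust as a trait. This is only an illustration: the `ItemCodec` trait, its method names, and the byte layout (little-endian, `u32` length prefix for strings) are all assumptions made for this example. It is neither the removed `ItemsSerde` nor a port of `serde.hpp`.

```rust
/// Hypothetical trait: how a single item type is written to and read from bytes.
pub trait ItemCodec: Sized {
    /// Appends the encoded item to `out`.
    fn encode(&self, out: &mut Vec<u8>);
    /// Decodes one item from the front of `input`.
    /// Returns the item and the number of bytes consumed, or `None` if the
    /// input is too short or malformed.
    fn decode(input: &[u8]) -> Option<(Self, usize)>;
}

impl ItemCodec for i64 {
    fn encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }

    fn decode(input: &[u8]) -> Option<(Self, usize)> {
        let bytes: [u8; 8] = input.get(..8)?.try_into().ok()?;
        Some((i64::from_le_bytes(bytes), 8))
    }
}

impl ItemCodec for String {
    fn encode(&self, out: &mut Vec<u8>) {
        // Assumed layout: u32 little-endian length, then the UTF-8 bytes.
        out.extend_from_slice(&(self.len() as u32).to_le_bytes());
        out.extend_from_slice(self.as_bytes());
    }

    fn decode(input: &[u8]) -> Option<(Self, usize)> {
        let len_bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
        let len = u32::from_le_bytes(len_bytes) as usize;
        let body = input.get(4..4 + len)?;
        let s = String::from_utf8(body.to_vec()).ok()?;
        Some((s, 4 + len))
    }
}

fn main() {
    let mut buf = Vec::new();
    42i64.encode(&mut buf);
    "abc".to_string().encode(&mut buf);

    let (n, used) = i64::decode(&buf).expect("i64 should decode");
    let (s, _) = String::decode(&buf[used..]).expect("string should decode");
    println!("{n} {s}");
}
```

With a trait like this, a generic sketch could serialize any item type that implements it. Implementing serde directly on the concrete sketch types, as the comment describes, avoids the extra public surface.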






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654655382


##
datasketches/src/frequencies/mod.rs:
##
@@ -29,7 +29,6 @@ mod serde;
 mod serialization;
 mod sketch;
 
-pub use self::serde::ItemsSerde;

Review Comment:
   cc @AlexanderSaydakov I'd appreciate it if you could share the background and 
usage of `serde.hpp`.






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654654273


##
datasketches/src/theta/mod.rs:
##
@@ -32,4 +32,6 @@
 mod hash_table;
 mod sketch;
 
+pub use self::hash_table::ResizeFactor;
 pub use self::sketch::ThetaSketch;
+pub use self::sketch::ThetaSketchBuilder;

Review Comment:
   To @ZENOTME,
   
   `ThetaSketchBuilder` and `ResizeFactor` are needed because 
`ThetaSketch::builder` exposes the builder, which has a `resize_factor` method 
that only accepts a `ResizeFactor` instance.
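
A minimal sketch of why both re-exports matter, using stand-in types. The module layout mirrors the diff, but the struct fields, the `build` method, the `resize_factor` getter on the sketch, and the default factor are assumptions made for this example, not the crate's actual definitions.

```rust
mod theta {
    mod hash_table {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum ResizeFactor {
            X1,
            X2,
            X4,
            X8,
        }
    }

    mod sketch {
        use super::hash_table::ResizeFactor;

        #[derive(Debug)]
        pub struct ThetaSketch {
            resize_factor: ResizeFactor,
        }

        impl ThetaSketch {
            /// Exposes the builder, so callers see `ThetaSketchBuilder` in the API.
            pub fn builder() -> ThetaSketchBuilder {
                // The default here is arbitrary, chosen only for the example.
                ThetaSketchBuilder { resize_factor: ResizeFactor::X8 }
            }

            pub fn resize_factor(&self) -> ResizeFactor {
                self.resize_factor
            }
        }

        #[derive(Debug)]
        pub struct ThetaSketchBuilder {
            resize_factor: ResizeFactor,
        }

        impl ThetaSketchBuilder {
            /// Only accepts a `ResizeFactor`, so callers must be able to name one.
            pub fn resize_factor(mut self, rf: ResizeFactor) -> Self {
                self.resize_factor = rf;
                self
            }

            pub fn build(self) -> ThetaSketch {
                ThetaSketch { resize_factor: self.resize_factor }
            }
        }
    }

    // Without these re-exports, code outside `theta` could not name the
    // builder type or construct a `ResizeFactor` argument.
    pub use self::hash_table::ResizeFactor;
    pub use self::sketch::ThetaSketch;
    pub use self::sketch::ThetaSketchBuilder;
}

use theta::{ResizeFactor, ThetaSketch, ThetaSketchBuilder};

fn main() {
    let builder: ThetaSketchBuilder = ThetaSketch::builder();
    let sketch = builder.resize_factor(ResizeFactor::X2).build();
    println!("{:?}", sketch.resize_factor());
}
```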






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654653263


##
datasketches/tests/frequencies_update_test.rs:
##
@@ -479,20 +479,6 @@ fn test_longs_reset() {
     assert_eq!(sketch.lg_max_map_size(), 3);
 }
 
-#[test]
-#[should_panic(expected = "count may not be negative")]
-fn test_longs_negative_count_panics() {
-    let mut sketch: FrequentItemsSketch = FrequentItemsSketch::new(8);
-    sketch.update_with_count(1, -1);
-}
-
-#[test]
-#[should_panic(expected = "count may not be negative")]
-fn test_items_negative_count_panics() {
-    let mut sketch = FrequentItemsSketch::new(8);
-    sketch.update_with_count("a".to_string(), -1);
-}
-

Review Comment:
   These cases can never panic now: the type check rules out negative counts.
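
A minimal sketch of the idea, assuming the count parameter becomes unsigned. `Counter` is a hypothetical stand-in for illustration only; it is not the crate's `FrequentItemsSketch`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a frequency counter.
#[derive(Default)]
pub struct Counter {
    counts: HashMap<String, u64>,
}

impl Counter {
    /// `count` is unsigned, so callers cannot express a negative value and
    /// no runtime check (or panic) is needed.
    pub fn update_with_count(&mut self, item: &str, count: u64) {
        let slot = self.counts.entry(item.to_string()).or_insert(0);
        *slot = slot.saturating_add(count);
    }

    pub fn count(&self, item: &str) -> u64 {
        self.counts.get(item).copied().unwrap_or(0)
    }
}

fn main() {
    let mut counter = Counter::default();
    counter.update_with_count("a", 3);
    counter.update_with_count("a", 2);
    // counter.update_with_count("a", -1); // rejected at compile time for a u64 parameter
    println!("{}", counter.count("a"));
}
```

Because the invalid input is unrepresentable, a `#[should_panic]` test for it has nothing left to exercise.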






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654652501


##
datasketches/src/frequencies/reverse_purge_item_hash_map.rs:
##
@@ -118,16 +118,16 @@ impl ReversePurgeItemHashMap {
     /// Shifts all values by `adjust_amount`.
     ///
     /// This is used during purges to decrement counters.
-    pub fn adjust_all_values_by(&mut self, adjust_amount: i64) {

Review Comment:
   Drop `pub` when a method is only used within the same file. Less visibility 
makes the code easier to reason about.






Re: [PR] chore: fine tune frequencies and theta sketches (datasketches-rust)

2025-12-30 Thread via GitHub


tisonkun commented on code in PR #51:
URL: https://github.com/apache/datasketches-rust/pull/51#discussion_r2654652019


##
datasketches/src/frequencies/reverse_purge_item_hash_map.rs:
##
@@ -35,7 +35,7 @@ pub(super) struct ReversePurgeItemHashMap {
 lg_length: u8,
 load_threshold: usize,
 keys: Vec>,
-values: Vec<i64>,
+values: Vec<u64>,

Review Comment:
   All offsets and counts are non-negative, so use `u64` in all these places.
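
One consequence of unsigned counters is that a purge-style decrement has to be written as a subtraction that cannot go below zero. The following is a sketch under that assumption. `purge` is a hypothetical free function for illustration; it is not the crate's `adjust_all_values_by` and makes no claim about how the crate handles this.

```rust
/// Hypothetical helper: subtracts `decrement` from every counter and drops
/// the counters that fall to zero.
pub fn purge(values: &mut Vec<u64>, decrement: u64) {
    for v in values.iter_mut() {
        // saturating_sub clamps at 0 instead of wrapping or panicking.
        *v = v.saturating_sub(decrement);
    }
    values.retain(|&v| v > 0);
}

fn main() {
    let mut values = vec![5, 2, 9, 3];
    purge(&mut values, 3);
    println!("{:?}", values);
}
```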





