Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
alamb commented on PR #7865: URL: https://github.com/apache/arrow-rs/pull/7865#issuecomment-3046096988 Thanks again @scovich and @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
alamb merged PR #7865: URL: https://github.com/apache/arrow-rs/pull/7865 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187517979
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
> The new code doesn't insert a new object field's name to the metadata
dictionary until the builder's finishmethod is called.
Doesn't it? I saw that `insert` already modify the metadata builder. It is
happened before `finish` is called, right?
```rust
pub fn insert<'m, 'd, T: Into>>(&mut self, key: &str, value:
T) {
let metadata_builder = self.parent_state.metadata_builder();
let field_id = metadata_builder.upsert_field_name(key); // <- it is
inserted into the metadata builder
}
```
The only difference to previous code, is `check_pending_field` is removed
now so if it is not finished, the field won't be updated to `fields` and no
effect to `finish`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187517979
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
> The new code doesn't insert a new object field's name to the metadata
dictionary until the builder's finishmethod is called.
Doesn't it? I saw that `insert` already modify the metadata builder. It is
happened before `finish` is called, right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187512080
##
parquet-variant/src/builder.rs:
##
@@ -626,50 +712,35 @@ impl Drop for ListBuilder<'_> {
/// A builder for creating [`Variant::Object`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
-pub struct ObjectBuilder<'a, 'b> {
-parent_buffer: &'a mut ValueBuffer,
-metadata_builder: &'a mut MetadataBuilder,
+pub struct ObjectBuilder<'a> {
+parent_state: ParentState<'a>,
fields: IndexMap, // (field_id, offset)
buffer: ValueBuffer,
-/// Is there a pending list or object that needs to be finalized?
-pending: Option<(&'b str, usize)>,
validate_unique_fields: bool,
/// Set of duplicate fields to report for errors
duplicate_fields: HashSet,
}
-impl<'a, 'b> ObjectBuilder<'a, 'b> {
-fn new(parent_buffer: &'a mut ValueBuffer, metadata_builder: &'a mut
MetadataBuilder) -> Self {
+impl<'a> ObjectBuilder<'a> {
+fn new(parent_state: ParentState<'a>, validate_unique_fields: bool) ->
Self {
Self {
-parent_buffer,
-metadata_builder,
+parent_state,
fields: IndexMap::new(),
buffer: ValueBuffer::default(),
-pending: None,
-validate_unique_fields: false,
+validate_unique_fields,
duplicate_fields: HashSet::new(),
}
}
-fn check_pending_field(&mut self) {
-let Some(&(field_name, field_start)) = self.pending.as_ref() else {
-return;
-};
-
-let field_id = self.metadata_builder.upsert_field_name(field_name);
-self.fields.insert(field_id, field_start);
-
-self.pending = None;
-}
-
/// Add a field with key and value to the object
///
/// Note: when inserting duplicate keys, the new value overwrites the
previous mapping,
/// but the old value remains in the buffer, resulting in a larger variant
pub fn insert<'m, 'd, T: Into>>(&mut self, key: &str,
value: T) {
-self.check_pending_field();
+// Get metadata_builder from parent state
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Seems so, looks like the metadata builder doesn't affect the finalization
too much. It is just used to provide unique field id.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187498765
##
parquet-variant/src/builder.rs:
##
@@ -626,50 +712,35 @@ impl Drop for ListBuilder<'_> {
/// A builder for creating [`Variant::Object`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
-pub struct ObjectBuilder<'a, 'b> {
-parent_buffer: &'a mut ValueBuffer,
-metadata_builder: &'a mut MetadataBuilder,
+pub struct ObjectBuilder<'a> {
+parent_state: ParentState<'a>,
fields: IndexMap, // (field_id, offset)
buffer: ValueBuffer,
-/// Is there a pending list or object that needs to be finalized?
-pending: Option<(&'b str, usize)>,
validate_unique_fields: bool,
/// Set of duplicate fields to report for errors
duplicate_fields: HashSet,
}
-impl<'a, 'b> ObjectBuilder<'a, 'b> {
-fn new(parent_buffer: &'a mut ValueBuffer, metadata_builder: &'a mut
MetadataBuilder) -> Self {
+impl<'a> ObjectBuilder<'a> {
+fn new(parent_state: ParentState<'a>, validate_unique_fields: bool) ->
Self {
Self {
-parent_buffer,
-metadata_builder,
+parent_state,
fields: IndexMap::new(),
buffer: ValueBuffer::default(),
-pending: None,
-validate_unique_fields: false,
+validate_unique_fields,
duplicate_fields: HashSet::new(),
}
}
-fn check_pending_field(&mut self) {
-let Some(&(field_name, field_start)) = self.pending.as_ref() else {
-return;
-};
-
-let field_id = self.metadata_builder.upsert_field_name(field_name);
-self.fields.insert(field_id, field_start);
-
-self.pending = None;
-}
-
/// Add a field with key and value to the object
///
/// Note: when inserting duplicate keys, the new value overwrites the
previous mapping,
/// but the old value remains in the buffer, resulting in a larger variant
pub fn insert<'m, 'd, T: Into>>(&mut self, key: &str,
value: T) {
-self.check_pending_field();
+// Get metadata_builder from parent state
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Hmm, so modification to metadata builder from a nested builder won't affect
the parent builder if the nested builder doesn't finalize in the end ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187498765
##
parquet-variant/src/builder.rs:
##
@@ -626,50 +712,35 @@ impl Drop for ListBuilder<'_> {
/// A builder for creating [`Variant::Object`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
-pub struct ObjectBuilder<'a, 'b> {
-parent_buffer: &'a mut ValueBuffer,
-metadata_builder: &'a mut MetadataBuilder,
+pub struct ObjectBuilder<'a> {
+parent_state: ParentState<'a>,
fields: IndexMap, // (field_id, offset)
buffer: ValueBuffer,
-/// Is there a pending list or object that needs to be finalized?
-pending: Option<(&'b str, usize)>,
validate_unique_fields: bool,
/// Set of duplicate fields to report for errors
duplicate_fields: HashSet,
}
-impl<'a, 'b> ObjectBuilder<'a, 'b> {
-fn new(parent_buffer: &'a mut ValueBuffer, metadata_builder: &'a mut
MetadataBuilder) -> Self {
+impl<'a> ObjectBuilder<'a> {
+fn new(parent_state: ParentState<'a>, validate_unique_fields: bool) ->
Self {
Self {
-parent_buffer,
-metadata_builder,
+parent_state,
fields: IndexMap::new(),
buffer: ValueBuffer::default(),
-pending: None,
-validate_unique_fields: false,
+validate_unique_fields,
duplicate_fields: HashSet::new(),
}
}
-fn check_pending_field(&mut self) {
-let Some(&(field_name, field_start)) = self.pending.as_ref() else {
-return;
-};
-
-let field_id = self.metadata_builder.upsert_field_name(field_name);
-self.fields.insert(field_id, field_start);
-
-self.pending = None;
-}
-
/// Add a field with key and value to the object
///
/// Note: when inserting duplicate keys, the new value overwrites the
previous mapping,
/// but the old value remains in the buffer, resulting in a larger variant
pub fn insert<'m, 'd, T: Into>>(&mut self, key: &str,
value: T) {
-self.check_pending_field();
+// Get metadata_builder from parent state
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Hmm, so modification to metadata builder from an un-finalized builder won't
affect the parent builder in the end?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187497103
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Oh, I saw these tests are updated to include `insert`, looks like they are
good now?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187487621
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
No, I don't argue that we should do finalization in `drop`. I meant, is this
"no effect until finalized" only effective for builders which are added but are
called "insert" yet?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
alamb commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2187106825
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
> If so, this seems not resolve the issue @alamb raised at the beginning?
Because I guess the case @alamb concern is like
>
> ```
> let mut builder = VariantBuilder::new();
> ...
> {
> let _object_builder = builder.new_object();
> _object_builder.insert(xxx);
> _object_builder.insert(yyy);
> _object_builder.insert(zzz);
> // _object_builder doesn't been finalized.
> }
> ...
> builder.finish();
> ```
My initial concern was that it was too easy to forget to call `finish` and
thus lose the structure
Along the way @scovich convinced me that this was acceptable (and actually
preferrable) behavior rather than auto completing builders on drop
THe rationale is that there are other ways that `drop` gets called other
than forgetting to call `build` -- for example an exit on error when using `?`
where the object probably *shouldn't* get dropped
I think after this PR the behavior makes much more sense and is now
documented
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186133081
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
ohh this relates to
* https://github.com/apache/arrow-rs/issues/7870
If we didn't finish the outer builder, the top-level variant builder didn't
contain any value at all, and the resulting `value` buffer is empty.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186133081
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
ohh this relates to
* https://github.com/apache/arrow-rs/issues/7870
Because we didn't finish the outer builder, the top-level variant builder
didn't contain any value at all, and the resulting `value` buffer is empty.
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effec
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186128104
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Hmm... I split the four inner/outer builder tests to not finish exactly one
of outer or inner... and the outer not-finish cases all break with "Received
empty bytes". So it looks like there _IS_ some problem still.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186127640
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
It's easy enough to add the nested insert(s), but I don't think I understand
"the issue" you refer to, that the current code does not fix?
The new code doesn't insert a new object field's name to the metadata
dictionary until the builder's `finish`method is called. You are correct, that
a scenario like the above could leave extra/unused field names in the metadata
dictionary (because the field names were added when the field finished
inserting, but then the object that contains those fields was never actually
added to its parent). However, spurious names in the metadata dictionary are
harmless -- the spec does not require that every metadata entry be referenced
as a field id in the corresponding value. In fact, the variant shredding spec
_requires_ the metadata dictionary to have "extra" field names, corresponding
to fields that were shredded out. That way, a reader who wants to unshred the
those fields back to normal variant values doesn't have to rewrite the metadata
dictionary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186112108
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
Review Comment:
Oh! You meant the other two tests, that exercise the top-level variant
builder:
```
running 6 tests
test builder::tests::test_variant_builder_to_list_builder_no_finish ... ok
test builder::tests::test_variant_builder_to_object_builder_no_finish ... ok
test builder::tests::test_object_builder_to_object_builder_no_finish ...
FAILED
test builder::tests::test_list_builder_to_list_builder_no_finish ... FAILED
test builder::tests::test_list_builder_to_object_builder_no_finish ... FAILED
test builder::tests::test_object_builder_to_list_builder_no_finish ... FAILED
```
The variant builder didn't have any `pending` mechanism, and failing to
`finish` its child builder already had no side effects. We just didn't have a
test that verified this before.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on PR #7865: URL: https://github.com/apache/arrow-rs/pull/7865#issuecomment-3037233062 Hmm, when trying to rebase on top of https://github.com/apache/arrow-rs/pull/7843, I'm getting compilation failures such as: ``` error[E0509]: cannot move out of type `builder::ListBuilder<'_>`, which implements the `Drop` trait --> parquet-variant/src/builder.rs:697:43 | 697 | parent_buffer.append_offset_array(self.offsets, Some(data_size), offset_size); | | | | cannot move out of here | move occurs because `self.offsets` has type `std::vec::Vec`, which does not implement the `Copy` trait ``` That's not such a nice side effect of the empty `impl Drop`... but I guess easy enough to solve with `std::mem::take`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186084248
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
Review Comment:
I just tried cherry picking the tests to main branch and all four failed in
the expected ways (empty values for lists; extra fields for objects)?
```
% cargo test --no-fail-fast -- no_finish
...
failures:
builder::tests::test_list_builder_to_list_builder_no_finish stdout
thread 'builder::tests::test_list_builder_to_list_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1545:59:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received
empty bytes")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
builder::tests::test_list_builder_to_object_builder_no_finish stdout
thread 'builder::tests::test_list_builder_to_object_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1566:59:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received
empty bytes")
builder::tests::test_object_builder_to_object_builder_no_finish stdout
thread 'builder::tests::test_object_builder_to_object_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1610:9:
assertion `left == right` failed
left: 3
right: 2
builder::tests::test_object_builder_to_list_builder_no_finish stdout
thread 'builder::tests::test_object_builder_to_list_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1589:9:
assertion `left == right` failed
left: 3
right: 2
failures:
builder::tests::test_list_builder_to_list_builder_no_finish
builder::tests::test_list_builder_to_object_builder_no_finish
builder::tests::test_object_builder_to_list_builder_no_finish
builder::tests::test_object_builder_to_object_builder_no_finish
test result: FAILED. 2 passed; 4 failed; 0 ignored; 0 measured; 90 filtered
out; finished in 0.00s
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186084248
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
Review Comment:
I just tried cherry picking the tests to main branch and all four failed?
```
% cargo test --no-fail-fast -- no_finish
...
failures:
builder::tests::test_list_builder_to_list_builder_no_finish stdout
thread 'builder::tests::test_list_builder_to_list_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1545:59:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received
empty bytes")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
builder::tests::test_list_builder_to_object_builder_no_finish stdout
thread 'builder::tests::test_list_builder_to_object_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1566:59:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Received
empty bytes")
builder::tests::test_object_builder_to_object_builder_no_finish stdout
thread 'builder::tests::test_object_builder_to_object_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1610:9:
assertion `left == right` failed
left: 3
right: 2
builder::tests::test_object_builder_to_list_builder_no_finish stdout
thread 'builder::tests::test_object_builder_to_list_builder_no_finish'
panicked at parquet-variant/src/builder.rs:1589:9:
assertion `left == right` failed
left: 3
right: 2
failures:
builder::tests::test_list_builder_to_list_builder_no_finish
builder::tests::test_list_builder_to_object_builder_no_finish
builder::tests::test_object_builder_to_list_builder_no_finish
builder::tests::test_object_builder_to_object_builder_no_finish
test result: FAILED. 2 passed; 4 failed; 0 ignored; 0 measured; 90 filtered
out; finished in 0.00s
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186071888
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
///
-/// This consumes self and writes the object to the parent buffer.
-pub fn finish(mut self) -> Result<(), ArrowError> {
-self.check_pending_field();
+/// WARNING: The builder will have no effect unless/until
[`ListBuilder::finish`] is called.
+pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ListBuilder::new(parent_state, validate_unique_fields)
+}
+/// Finalizes this object and appends it to its parent, which otherwise
remains unmodified.
+pub fn finish(mut self) -> Result<(), ArrowError> {
+let metadata_builder = self.parent_state.metadata_builder();
Review Comment:
Correct me if I miss anything. The only major difference after this change
is the metadata builder and buffer of parent builder are pulled into parent
state.
But I took at the current codebase, `parent_buffer` is also only modified in
`finish`. So for this modification, before and after this change has no
difference.
For `metadata_builder`, looks like it is the major change. Now
`check_pending_field` is removed so any pending nested builder won't be updated
to metadata builder when we create another new nested builder.
But seems it is only limited to if the pending nested builder hasn't insert
any field yet? If we has `insert`ed fields to the pending nested builder,
metadata builder is already updated even the pending nested builder is not
finalized.
Maybe it is why all un-finialized nested builders in these new "no finish"
tests are without any "insert` calls?
If so, this seems not resolve the issue @alamb raised at the beginning?
Because I guess the case @alamb concern is like
```
let mut builder = VariantBuilder::new();
...
{
let _object_builder = builder.new_object();
_object_builder.insert(xxx);
_object_builder.insert(yyy);
_object_builder.insert(zzz);
// _object_builder doesn't been finalized.
}
...
builder.finish();
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186067388
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
Review Comment:
If a builder didn't `finish`, the next append to its parent would finalize
the orphaned `pending` stuff. For a list, that meant an extra offset (as if
there were a zero-sized value there, without even a header). For an object, it
would associate two different fields with the same value bytes.
That's a good question, why the tests pass without these fixes tho?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186063515
##
parquet-variant/src/builder.rs:
##
@@ -479,20 +572,29 @@ impl VariantBuilder {
self
}
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state(&mut self) -> (ParentState, bool) {
+let state = ParentState::Variant {
+buffer: &mut self.buffer,
+metadata_builder: &mut self.metadata_builder,
+};
+(state, self.validate_unique_fields)
+}
+
/// Create an [`ListBuilder`] for creating [`Variant::List`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
pub fn new_list(&mut self) -> ListBuilder {
-ListBuilder::new(&mut self.buffer, &mut self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields)
+let (parent_state, validate_unique_fields) = self.parent_state();
+ListBuilder::new(parent_state, validate_unique_fields)
Review Comment:
That's what I was doing at first, but it was extra code duplicated at every
call site.
I also tried introducing a `ProvidesParentState` type trait, that the
builder `new` methods could take `&mut impl` of... but that also didn't work
great because `parent_state` for object builder takes a param while this one
does not.
So yeah... the current PR state is the least bad option I thought of.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186027159
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
Review Comment:
Hmm what the difference before and after this change? Without this PR, this
test also passes without issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186021770
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
Review Comment:
Oh, it is mutable borrowed by the parent state.
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
Review Comment:
Oh, it is mutably borrowed by the parent state.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
viirya commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186003389
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
+///
+/// WARNING: The builder will have no effect unless/until
[`ObjectBuilder::finish`] is called.
+pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+let (parent_state, validate_unique_fields) = self.parent_state(key);
+ObjectBuilder::new(parent_state, validate_unique_fields)
}
-/// Finalize object
+/// Returns a list builder that can be used to append a new (nested) list
to this list.
Review Comment:
```suggestion
/// Returns a list builder that can be used to append a new (nested)
list to this object.
```
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool)
{
+let state = ParentState::Object {
+buffer: &mut self.buffer,
+metadata_builder: self.parent_state.metadata_builder(),
+fields: &mut self.fields,
+field_name: key,
+};
+(state, self.validate_unique_fields)
}
-/// Return a new [`ListBuilder`] to add a list with the specified key to
the
-/// object.
-pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let list_builder = ListBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-list_builder
+/// Returns an object builder that can be used to append a new (nested)
object to this list.
Review Comment:
```suggestion
/// Returns an object builder that can be used to append a new (nested)
object to this object.
```
##
parquet-variant/src/builder.rs:
##
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
self
}
-/// Return a new [`ObjectBuilder`] to add a nested object with the
specified
-/// key to the object.
-pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-self.check_pending_field();
-
-let field_start = self.buffer.offset();
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = Some((key, field_start));
-
-obj_builder
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
Revie
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
alamb commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2185946272
##
parquet-variant/src/builder.rs:
##
@@ -479,20 +572,29 @@ impl VariantBuilder {
self
}
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state(&mut self) -> (ParentState, bool) {
+let state = ParentState::Variant {
+buffer: &mut self.buffer,
+metadata_builder: &mut self.metadata_builder,
+};
+(state, self.validate_unique_fields)
+}
+
/// Create an [`ListBuilder`] for creating [`Variant::List`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
pub fn new_list(&mut self) -> ListBuilder {
-ListBuilder::new(&mut self.buffer, &mut self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields)
+let (parent_state, validate_unique_fields) = self.parent_state();
+ListBuilder::new(parent_state, validate_unique_fields)
Review Comment:
If you want to avoid returning`validate_unique_fields` in a local field you
could do something like
```suggestion
let validate_unique_fields = self.validate_unique_fields;
let parent_state = self.parent_state();
ListBuilder::new(parent_state, validate_unique_fields)
```
##
parquet-variant/src/builder.rs:
##
@@ -549,107 +635,92 @@ impl<'a> ListBuilder<'a> {
self
}
-pub fn new_object(&mut self) -> ObjectBuilder {
-self.check_new_offset();
-
-let obj_builder = ObjectBuilder::new(&mut self.buffer,
self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields);
-self.pending = true;
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
Review Comment:
same comment as above -- I think you can remove the need to return
`validate_unique_fields` if you want
##
parquet-variant/src/builder.rs:
##
@@ -1499,4 +1564,116 @@ mod tests {
let valid_result = valid_obj.finish();
assert!(valid_result.is_ok());
}
+
+#[test]
+fn test_variant_builder_to_list_builder_no_finish() {
+// Create a list builder but never finish it
+let mut builder = VariantBuilder::new();
+let _list_builder = builder.new_list();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_variant_builder_to_object_builder_no_finish() {
+// Create an object builder but never finish it
+let mut builder = VariantBuilder::new();
+let _object_builder = builder.new_object();
+
+builder.append_value(42i8);
+
+// The original builder should be unchanged
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+assert_eq!(variant, Variant::Int8(42));
+}
+
+#[test]
+fn test_list_builder_to_list_builder_no_finish() {
+let mut builder = VariantBuilder::new();
+let mut list_builder = builder.new_list();
+list_builder.append_value(1i8);
+
+// Create a nested list builder but never finish it
+let _nested_list_builder = list_builder.new_list();
+
+list_builder.append_value(2i8);
+
+// The parent list should only contain the original values
+list_builder.finish();
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+let list = variant.as_list().unwrap();
+assert_eq!(list.len(), 2);
+assert_eq!(list.get(0).unwrap(), Variant::Int8(1));
+assert_eq!(list.get(1).unwrap(), Variant::Int8(2));
+}
+
+#[test]
+fn test_list_builder_to_object_builder_no_finish() {
+let mut builder = VariantBuilder::new();
+let mut list_builder = builder.new_list();
+list_builder.append_value(1i8);
+
+// Create a nested object builder but never finish it
+let _nested_object_builder = list_builder.new_object();
+
+list_builder.append_value(2i8);
+
+// The parent list should only contain the original values
+list_builder.finish();
+let (metadata, value) = builder.finish();
+let variant = Variant::try_new(&metadata, &value).unwrap();
+let list = variant.as_list().unwrap();
+assert_eq!(list.len(), 2);
+assert_eq!(list.get(0).unwrap(), Variant::Int8(1));
+assert_eq!(list.get(1).unwrap(), Variant::Int8(2));
+}
+
+#[test]
+fn test_object_builder_to_list_builder_no_finish() {
+let mut builder = VariantBuild
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
alamb commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2185946272
##
parquet-variant/src/builder.rs:
##
@@ -479,20 +572,29 @@ impl VariantBuilder {
self
}
+// Returns validate_unique_fields because we can no longer reference self
once this method returns.
+fn parent_state(&mut self) -> (ParentState, bool) {
+let state = ParentState::Variant {
+buffer: &mut self.buffer,
+metadata_builder: &mut self.metadata_builder,
+};
+(state, self.validate_unique_fields)
+}
+
/// Create an [`ListBuilder`] for creating [`Variant::List`] values.
///
/// See the examples on [`VariantBuilder`] for usage.
pub fn new_list(&mut self) -> ListBuilder {
-ListBuilder::new(&mut self.buffer, &mut self.metadata_builder)
-.with_validate_unique_fields(self.validate_unique_fields)
+let (parent_state, validate_unique_fields) = self.parent_state();
+ListBuilder::new(parent_state, validate_unique_fields)
Review Comment:
If you want to avoid returning`validate_unique_fields` in a local field you
could probably copy it by value with something like
```suggestion
let validate_unique_fields = self.validate_unique_fields;
let parent_state = self.parent_state();
ListBuilder::new(parent_state, validate_unique_fields)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [Variant] List and object builders have no effect until finalized [arrow-rs]
scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2183980555
##
parquet-variant/src/builder.rs:
##
@@ -291,14 +310,88 @@ impl MetadataBuilder {
write_offset(&mut metadata, cur_offset, offset_size);
Review Comment:
NOTE: We can't use `append_offset_array` because the final offset isn't
known until after we've written all the other offsets. Also, because `metadata`
isn't a `ValueBuffer`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
