[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2020-06-02 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124509#comment-17124509
 ] 

Wes McKinney commented on ARROW-4067:
-

I removed this from any milestone since it does not seem to be doing much harm 
to us at the moment. 

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2019-01-08 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737336#comment-16737336
 ] 

Wes McKinney commented on ARROW-4067:
-

I'm in favor of these changes in principle, so long as we are careful to 
examine benchmarks while the changes are being made (to ascertain confidently 
that we are not creating performance regressions). Adding affordances for 
nested builders sounds like a good idea.

Sounds like a number of functions will have to undergo a deprecation cycle as a 
result of this. It would not be the first time. Given the invasiveness of this 
change to some developers, and since our CI scripts will require us to fix all 
the deprecated API uses, we'll probably want to have a test file like 
{{arrow-deprecated-test.cc}} where we suppress {{-Wdeprecated}}

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2019-01-02 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732262#comment-16732262
 ] 

Benjamin Kietzman commented on ARROW-4067:
--

[~wesmckinn]

BufferBuilders and ArrayBuilders have very similarly named methods with 
different semantics:

||Method||BufferBuilder||ArrayBuilder||Comments||
|Resize|set capacity, optionally shrinking to fit|set capacity, attempting to 
shrink is an error|Users might expect a method named 'resize' to modify the 
built container's length (as in the STL), but this is not the case here.|
|Reserve|expand capacity|expand capacity, use growth factor|BufferBuilders use 
a growth factor only inside Append methods.|

We could simplify this in several ways. My preference would be to remove Resize 
and add an optional parameter {{bool use_growth_factor}} to Reserve() in both 
ArrayBuilder and BufferBuilder. To replace Resize in BufferBuilder, I'd add 
SetCapacity (which would always shrink to fit). Finally, Reserve will accept an 
absolute capacity instead of an additional element count; the current method is 
confusing enough that we've even got an [error in one of our 
tests|https://github.com/apache/arrow/blob/73f94c93d7eee25a43415dfa7a806b887942abd1/cpp/src/arrow/array-test.cc#L273-L276]

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2018-12-18 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724314#comment-16724314
 ] 

Benjamin Kietzman commented on ARROW-4067:
--

TODO address multiple append of nested type

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2018-12-18 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724313#comment-16724313
 ] 

Benjamin Kietzman commented on ARROW-4067:
--

In many cases we do know the exact types at compile time, and for those we 
shouldn't have to suffer boilerplate. In general you are correct and we'd have 
to fallback to boilerplate or use visitors to accommodate unknown types

{code}
RETURN_NOT_OK(list_builder.Append(nullptr)); // unknown type
RETURN_NOT_OK(VisitBuilder(list_builder.value_builder(), [](auto *b) {
  return b->AppendNull();
}));
{code}

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2018-12-18 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724296#comment-16724296
 ] 

Antoine Pitrou commented on ARROW-4067:
---

+1 on the general idea of making the Builder interface more regular. We 
probably want to deprecate old methods before removing them.

As for nested types, your proposal only works if the exact types are known at 
compile-time, AFAICT. That can be the case in application code, but rarely in 
Arrow core libraries.


> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2018-12-18 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724244#comment-16724244
 ] 

Wes McKinney commented on ARROW-4067:
-

No objections to adding builder visitor tools. 

Having helper classes for building elements in nested types sounds pretty 
useful. 

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4067) [C++] RFC: standardize ArrayBuilder subclasses

2018-12-18 Thread Benjamin Kietzman (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724232#comment-16724232
 ] 

Benjamin Kietzman commented on ARROW-4067:
--

Related: It would be useful to add {{VisitBuilderInline}} to visit_inline.h

> [C++] RFC: standardize ArrayBuilder subclasses
> --
>
> Key: ARROW-4067
> URL: https://issues.apache.org/jira/browse/ARROW-4067
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>  Labels: usability
>
> Each builder supports different and frequently differently named methods for 
> appending. It should be possible to establish a more consistent convention, 
> which would alleviate dev confusion and simplify generics.
> For example, let all Builders be required to define at minimum:
>  * {{Reserve(int64_t)}}
>  * a nested type named {{Scalar}}, which is the canonical scalar appended to 
> this builder. Append other types may be supported for convenience.
>  * {{UnsafeAppend(Scalar)}}
>  * {{UnsafeAppendNull()}}
> The other methods described below can be overridden if an optimization is 
> available or left defaulted (a CRTP helper can contain the default 
> implementations, for example {{Append(Scalar)}} would simply be a call to 
> Reserve then UnsafeAppend.
> In addition to their unsafe equivalents, {{Append(Scalar)}} and 
> {{AppendNull()}} should be available for appending without manual capacity 
> maintenance.
> It is not necessary for the rest of this RFC, but it would simplify builders 
> further if scalar append methods always had a single argument. For example, 
> this would mean abolishing {{BinaryBuilder::Append(const uint8_t*, int32_t)}} 
> in favor of {{BinaryBuilder::Append(basic_string_view)}}. There's no 
> runtime overhead involved in this change, and developers who have a pointer 
> and a length instead of a view can just construct one without boilerplate 
> using brace initialization: {code}b->Append({pointer, length});{code}
> Unsafe and safe methods should be provided for appending multiple values as 
> well. The default implementation will be a trivial loop but if optimizations 
> are available then this could be overridden (for example instead of copying 
> bits one by one into a BooleanBuilder, bytes could be memcpy'd). Append 
> methods for multiple values should accept two arguments, the first of which 
> contains values and the second of which defines validity. The canonical 
> multiple append method has signature {{Status(array_view values, 
> const uint8_t* valid_bytes)}}, but other overloads and helpers could be 
> provided as well:
> {code}
> b->Append({{1, 3, 4}}, all_valid); // append values with no nulls
> b->Append({{1, 3, 4}}, bool_vector); // use the elements of a vector 
> for validity
> b->Append({{1, 3, 4}}, bits(ptr)); // interpret ptr as a buffer of valid 
> bits, rather than valid bytes
> {code}
> Builders of nested types currently require developers to write boilerplate 
> wrangling the child builders. This could be alleviated by letting nested 
> builders' append methods return a helper as an output argument:
> {code}
> ListBuilder::List lst;
> RETURN_NOT_OK(list_builder.Append()); // ListBuilder::Scalar == 
> ListBuilder::ListBase*
> RETURN_NOT_OK(lst->Append(3));
> RETURN_NOT_OK(lst->Append(4));
> StructBuilder::Struct strct;
> RETURN_NOT_OK(struct_builder.Append());
> RETURN_NOT_OK(strct.Set(0, "uuid"));
> RETURN_NOT_OK(strct.Set(2, 47));
> RETURN_NOT_OK(strct->Finish()); // appends null to unspecified fields
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)