[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432608#comment-16432608
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on issue #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#issuecomment-380179127
 
 
   Thanks @kou !


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432605#comment-16432605
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou closed pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/array-test.cc b/cpp/src/arrow/array-test.cc
index fb1bebfca..2ae9f48db 100644
--- a/cpp/src/arrow/array-test.cc
+++ b/cpp/src/arrow/array-test.cc
@@ -1022,6 +1022,72 @@ TEST_F(TestStringBuilder, TestAppendVector) {
   }
 }
 
+TEST_F(TestStringBuilder, TestAppendCStringsWithValidBytes) {
+  const char* strings[] = {nullptr, "aaa", nullptr, "ignored", ""};
+  vector valid_bytes = {1, 1, 1, 0, 1};
+
+  int N = static_cast(sizeof(strings) / sizeof(strings[0]));
+  int reps = 1000;
+
+  for (int j = 0; j < reps; ++j) {
+ASSERT_OK(builder_->Append(strings, N, valid_bytes.data()));
+  }
+  Done();
+
+  ASSERT_EQ(reps * N, result_->length());
+  ASSERT_EQ(reps * 3, result_->null_count());
+  ASSERT_EQ(reps * 3, result_->value_data()->size());
+
+  int32_t length;
+  int32_t pos = 0;
+  for (int i = 0; i < N * reps; ++i) {
+auto string = strings[i % N];
+if (string && valid_bytes[i % N]) {
+  ASSERT_FALSE(result_->IsNull(i));
+  result_->GetValue(i, &length);
+  ASSERT_EQ(pos, result_->value_offset(i));
+  ASSERT_EQ(static_cast(strlen(string)), length);
+  ASSERT_EQ(strings[i % N], result_->GetString(i));
+
+  pos += length;
+} else {
+  ASSERT_TRUE(result_->IsNull(i));
+}
+  }
+}
+
+TEST_F(TestStringBuilder, TestAppendCStringsWithoutValidBytes) {
+  const char* strings[] = {"", "bb", "a", nullptr, "ccc"};
+
+  int N = static_cast(sizeof(strings) / sizeof(strings[0]));
+  int reps = 1000;
+
+  for (int j = 0; j < reps; ++j) {
+ASSERT_OK(builder_->Append(strings, N));
+  }
+  Done();
+
+  ASSERT_EQ(reps * N, result_->length());
+  ASSERT_EQ(reps, result_->null_count());
+  ASSERT_EQ(reps * 6, result_->value_data()->size());
+
+  int32_t length;
+  int32_t pos = 0;
+  for (int i = 0; i < N * reps; ++i) {
+if (strings[i % N]) {
+  ASSERT_FALSE(result_->IsNull(i));
+  result_->GetValue(i, &length);
+  ASSERT_EQ(pos, result_->value_offset(i));
+  ASSERT_EQ(static_cast(strlen(strings[i % N])), length);
+  ASSERT_EQ(strings[i % N], result_->GetString(i));
+
+  pos += length;
+} else {
+  ASSERT_TRUE(result_->IsNull(i));
+}
+  }
+}
+
 TEST_F(TestStringBuilder, TestZeroLength) {
   // All buffers are null
   Done();
diff --git a/cpp/src/arrow/builder.cc b/cpp/src/arrow/builder.cc
index c97253e64..78c42f4fc 100644
--- a/cpp/src/arrow/builder.cc
+++ b/cpp/src/arrow/builder.cc
@@ -1413,6 +1413,64 @@ Status StringBuilder::Append(const 
std::vector& values,
   return Status::OK();
 }
 
+Status StringBuilder::Append(const char** values, int64_t length,
+ const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector value_lengths(length);
+  bool have_null_value = false;
+  for (int64_t i = 0; i < length; ++i) {
+if (values[i]) {
+  auto value_length = strlen(values[i]);
+  value_lengths[i] = value_length;
+  total_length += value_length;
+} else {
+  have_null_value = true;
+}
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+int64_t valid_bytes_offset = 0;
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  if (valid_bytes[i]) {
+if (values[i]) {
+  RETURN_NOT_OK(value_data_builder_.Append(
+  reinterpret_cast(values[i]), value_lengths[i]));
+} else {
+  UnsafeAppendToBitmap(valid_bytes + valid_bytes_offset, i - 
valid_bytes_offset);
+  UnsafeAppendToBitmap(false);
+  valid_bytes_offset = i + 1;
+}
+  }
+}
+UnsafeAppendToBitmap(valid_bytes + valid_bytes_offset, length - 
valid_bytes_offset);
+  } else {
+if (have_null_value) {
+  std::vector valid_vector(length, 0);
+  for (int64_t i = 0; i < length; ++i) {
+RETURN_NOT_OK(AppendNextOffset());
+if (values[i]) {
+  RETURN_NOT_OK(value_data_builder_.Append(
+  reinterpret_cast(values[i]), value_lengths[i]));
+  valid_vector[i] = 1;
+}
+  }
+  UnsafeAppendToBitmap(valid_vector.data(), length);
+} else {
+  for (int64_t i = 0; i < length; ++i) {
+RETURN_NOT_OK(AppendNextOffset());
+RETUR

[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432225#comment-16432225
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r180414507
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   Ok, I think we can do this in a separate PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432215#comment-16432215
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

xhochy commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r180411014
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   I'm happy to have methods with better names. We should deprecate the old 
variants then and leave them in for one release. Removing methods completely 
between release without deprecation already creates a lot of overhead in the 
libraries consuming Arrow. While there are not that many public visible, I 
would like to keep maintenance for them as smooth as possible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432148#comment-16432148
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r180394086
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   I would agree to rename other vector append method to `AppendValues`, to 
avoid ambiguities and user errors, but I don't know how much we value API 
stability at this point. @xhochy ?
   
   (for example, in Python `list.append` always appends a single item and 
`list.extend` must be called to append multiple items at once)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429793#comment-16429793
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179952045
 
 

 ##
 File path: cpp/src/arrow/builder.cc
 ##
 @@ -1413,6 +1413,41 @@ Status StringBuilder::Append(const 
std::vector& values,
   return Status::OK();
 }
 
+Status StringBuilder::Append(const char** values,
+ int64_t length,
+ const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector value_lengths(length);
+  for (int64_t i = 0; i < length; ++i) {
+if (values[i]) {
+  auto value_length = strlen(values[i]);
+  value_lengths[i] = value_length;
+  total_length += value_length;
+}
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  if (valid_bytes[i]) {
+RETURN_NOT_OK(value_data_builder_.Append(
+reinterpret_cast(values[i]), value_lengths[i]));
+  }
+}
+  } else {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  RETURN_NOT_OK(value_data_builder_.Append(
+  reinterpret_cast(values[i]), value_lengths[i]));
 
 Review comment:
   Hmm, yes, I think it's a bad idea. If we accept `NULL` value pointers, it 
should IMHO *always* mean a null item in the array (regardless of 
`valid_bytes`). If we don't accept `NULL` value pointers, we should at least 
add a DCHECK against them.
   
   I think it's a good idea to accept them, that way people don't have to 
allocate a separate `valid_bytes` array.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429645#comment-16429645
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on issue #1852: ARROW-2411: [C++] Add StringBuilder::Append(const 
char **values)
URL: https://github.com/apache/arrow/pull/1852#issuecomment-379524557
 
 
   > "NULL-terminated  C strings" means each C string is terminated by a nul 
char, not that  the array is terminated by a NULL pointer :-)
   
   Thanks for the English lesson. :-)
   It's very helpful to me because I'm not a native English speaker.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429642#comment-16429642
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936915
 
 

 ##
 File path: cpp/src/arrow/builder.cc
 ##
 @@ -1413,6 +1413,41 @@ Status StringBuilder::Append(const 
std::vector& values,
   return Status::OK();
 }
 
+Status StringBuilder::Append(const char** values,
+ int64_t length,
+ const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector value_lengths(length);
+  for (int64_t i = 0; i < length; ++i) {
+if (values[i]) {
+  auto value_length = strlen(values[i]);
+  value_lengths[i] = value_length;
+  total_length += value_length;
+}
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  if (valid_bytes[i]) {
+RETURN_NOT_OK(value_data_builder_.Append(
+reinterpret_cast(values[i]), value_lengths[i]));
+  }
+}
+  } else {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  RETURN_NOT_OK(value_data_builder_.Append(
+  reinterpret_cast(values[i]), value_lengths[i]));
 
 Review comment:
   It's interesting.
   I've implemented as the followings:
   
 * If `values[i]` is `NULL` and `valid_bytes` isn't `nullptr`, the value is 
an empty string not a null value. It's for respecting `valid_bytes` data.
 * If `values[i]` is `NULL` and `valid_bytes` is `nullptr`, the value is a 
null value.
   
   But users may be confused...
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429643#comment-16429643
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936937
 
 

 ##
 File path: cpp/src/arrow/array-test.cc
 ##
 @@ -1022,6 +1022,39 @@ TEST_F(TestStringBuilder, TestAppendVector) {
   }
 }
 
+TEST_F(TestStringBuilder, TestAppendCStrings) {
 
 Review comment:
   Exactly.
   I've added.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429638#comment-16429638
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936463
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
 
 Review comment:
   I'm using "C string" as "nul-terminated char *".
   
   I changed to use "nul-terminated char*" instead of "C string". It'll be more 
clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429637#comment-16429637
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179936398
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   Umm. We already use `Append()` for scalar append and vector append in other 
builders.
   If we use separated names for append variations, it may be better that we 
also change the current vector append methods to `AppendValues()` or something 
for consistency.
   
   @xhochy what do you think about this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429429#comment-16429429
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on issue #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#issuecomment-379478751
 
 
   > This implementation uses not NULL-terminated C strings and the length of 
values instead of NULL-terminated C strings
   
   "NULL-terminated C strings" means each C string is terminated by a nul char, 
not that the array is terminated by a NULL pointer :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429426#comment-16429426
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919647
 
 

 ##
 File path: cpp/src/arrow/builder.cc
 ##
 @@ -1413,6 +1413,41 @@ Status StringBuilder::Append(const 
std::vector& values,
   return Status::OK();
 }
 
+Status StringBuilder::Append(const char** values,
+ int64_t length,
+ const uint8_t* valid_bytes) {
+  std::size_t total_length = 0;
+  std::vector value_lengths(length);
+  for (int64_t i = 0; i < length; ++i) {
+if (values[i]) {
+  auto value_length = strlen(values[i]);
+  value_lengths[i] = value_length;
+  total_length += value_length;
+}
+  }
+  RETURN_NOT_OK(Reserve(length));
+  RETURN_NOT_OK(value_data_builder_.Reserve(total_length));
+  RETURN_NOT_OK(offsets_builder_.Reserve(length));
+
+  if (valid_bytes) {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  if (valid_bytes[i]) {
+RETURN_NOT_OK(value_data_builder_.Append(
+reinterpret_cast(values[i]), value_lengths[i]));
+  }
+}
+  } else {
+for (int64_t i = 0; i < length; ++i) {
+  RETURN_NOT_OK(AppendNextOffset());
+  RETURN_NOT_OK(value_data_builder_.Append(
+  reinterpret_cast(values[i]), value_lengths[i]));
 
 Review comment:
   Does this create a non-null array item even if `values[i]` is NULL?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429427#comment-16429427
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919834
 
 

 ##
 File path: cpp/src/arrow/array-test.cc
 ##
 @@ -1022,6 +1022,39 @@ TEST_F(TestStringBuilder, TestAppendVector) {
   }
 }
 
+TEST_F(TestStringBuilder, TestAppendCStrings) {
 
 Review comment:
   You should also write a test without a `valid_bytes` argument.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429425#comment-16429425
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919633
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
 
 Review comment:
   "nul-terminated C strings".


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429428#comment-16429428
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

pitrou commented on a change in pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852#discussion_r179919627
 
 

 ##
 File path: cpp/src/arrow/builder.h
 ##
 @@ -720,6 +720,16 @@ class ARROW_EXPORT StringBuilder : public BinaryBuilder {
 
   Status Append(const std::vector& values,
 const uint8_t* valid_bytes = NULLPTR);
+
+  /// \brief Append a sequence of C strings in one shot
+  /// \param[in] values a contiguous C array of C strings
+  /// \param[in] length the number of values to append
+  /// \param[in] valid_bytes an optional sequence of bytes where non-zero
+  /// indicates a valid (non-null) value
+  /// \return Status
+  Status Append(const char** values,
 
 Review comment:
   I'd rather have it called `AppendCStrings` as @xhochy proposed. Too many 
overloads can create ambiguity.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2411) [C++] Add method to append batches of null-terminated strings to StringBuilder

2018-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429363#comment-16429363
 ] 

ASF GitHub Bot commented on ARROW-2411:
---

kou opened a new pull request #1852: ARROW-2411: [C++] Add 
StringBuilder::Append(const char **values)
URL: https://github.com/apache/arrow/pull/1852
 
 
   This implementation uses not NULL-terminated C strings and the length of 
values instead of NULL-terminated C strings. Because other builder uses the 
interface such as `PrimitiveBuilder::Append()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add method to append batches of null-terminated strings to StringBuilder
> --
>
> Key: ARROW-2411
> URL: https://issues.apache.org/jira/browse/ARROW-2411
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GLib
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We should add a method {{StringBuilder::AppendCStrings(const char** values, 
> const uint8_t* valid_bytes = NULLPTR)}} to the {{StringBuilder}} class to 
> have fast inserts for these strings. See 
> https://github.com/apache/arrow/pull/1845/files for a use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)