[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-29 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446816192



##
File path: r/src/array_from_vector.cpp
##
@@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter {
   }
 };
 
+template <typename Builder>
+class BinaryVectorConverter : public VectorConverter {
+ public:
+  ~BinaryVectorConverter() {}
+
+  Status Init(ArrayBuilder* builder) {
+    typed_builder_ = checked_cast<Builder*>(builder);
+    return Status::OK();
+  }
+
+  Status Ingest(SEXP obj) {
+    ARROW_RETURN_IF(TYPEOF(obj) != VECSXP, Status::RError("Expecting a list"));
+    R_xlen_t n = XLENGTH(obj);
+
+    // Reserve enough space before appending
+    int64_t size = 0;
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP obj_i = VECTOR_ELT(obj, i);
+      if (!Rf_isNull(obj_i)) {
+        ARROW_RETURN_IF(TYPEOF(obj_i) != RAWSXP,
+                        Status::RError("Expecting a raw vector"));
+        size += XLENGTH(obj_i);
+      }
+    }
+    RETURN_NOT_OK(typed_builder_->Reserve(size));
+
+    // append
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP obj_i = VECTOR_ELT(obj, i);
+      if (Rf_isNull(obj_i)) {
+        RETURN_NOT_OK(typed_builder_->AppendNull());
+      } else {
+        RETURN_NOT_OK(typed_builder_->Append(RAW(obj_i), XLENGTH(obj_i)));
+      }
+    }
+    return Status::OK();
+  }
+
+  Status GetResult(std::shared_ptr<arrow::Array>* result) {
+    return typed_builder_->Finish(result);
+  }
+
+ private:
+  Builder* typed_builder_;
+};
+
+template <typename Builder>
+class StringVectorConverter : public VectorConverter {
+ public:
+  ~StringVectorConverter() {}
+
+  Status Init(ArrayBuilder* builder) {
+    typed_builder_ = checked_cast<Builder*>(builder);
+    return Status::OK();
+  }
+
+  Status Ingest(SEXP obj) {
+    ARROW_RETURN_IF(TYPEOF(obj) != STRSXP,
+                    Status::RError("Expecting a character vector"));
+    R_xlen_t n = XLENGTH(obj);
+
+    // Reserve enough space before appending
+    int64_t size = 0;
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP string_i = STRING_ELT(obj, i);
+      if (string_i != NA_STRING) {
+        size += XLENGTH(string_i);
+      }
+    }
+    RETURN_NOT_OK(typed_builder_->Reserve(size));
+
+    // append
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP string_i = STRING_ELT(obj, i);
+      if (string_i == NA_STRING) {
+        RETURN_NOT_OK(typed_builder_->AppendNull());
+      } else {
+        RETURN_NOT_OK(typed_builder_->Append(CHAR(string_i), XLENGTH(string_i)));

Review comment:
   I'll have a look at this once I'm back. Overall, there seem to be two 
concurrent systems for no real reason, and I believe we should keep only the 
one powered by `VectorToArrayConverter`.
   
   I'm on my `dplyr` week now, but I'll still try to make some time for this.
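
For readers following the hunk above, here is a minimal sketch of the reserve-then-append pattern that both converters use. It is deliberately self-contained: `ToyBinaryBuilder`, `ToyValue` and `ToyIngest` are hypothetical stand-ins, not the Arrow or R API. The sketch keeps slot capacity and byte capacity separate; as far as I know Arrow's binary builders make a similar distinction between `Reserve()` and `ReserveData()`, but treat that as an assumption to check against the builder headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a binary/string builder; NOT the Arrow API.
class ToyBinaryBuilder {
 public:
  // Make room for `slots` more entries (values or nulls).
  void Reserve(int64_t slots) {
    assert(slots >= 0);
    validity_.reserve(validity_.size() + static_cast<std::size_t>(slots));
    offsets_.reserve(offsets_.size() + static_cast<std::size_t>(slots));
  }

  // Make room for `bytes` more bytes of value data.
  void ReserveData(int64_t bytes) {
    assert(bytes >= 0);
    data_.reserve(data_.size() + static_cast<std::size_t>(bytes));
  }

  // Append one non-null value of `len` bytes.
  void Append(const char* ptr, int64_t len) {
    data_.append(ptr, static_cast<std::size_t>(len));
    offsets_.push_back(static_cast<int64_t>(data_.size()));
    validity_.push_back(true);
  }

  // Append one null: the slot exists but contributes no bytes.
  void AppendNull() {
    offsets_.push_back(static_cast<int64_t>(data_.size()));
    validity_.push_back(false);
  }

  int64_t length() const { return static_cast<int64_t>(validity_.size()); }
  int64_t data_size() const { return static_cast<int64_t>(data_.size()); }
  int64_t null_count() const {
    int64_t n = 0;
    for (bool valid : validity_) n += valid ? 0 : 1;
    return n;
  }

 private:
  std::string data_;                 // concatenated value bytes
  std::vector<int64_t> offsets_{0};  // end offset of each slot
  std::vector<bool> validity_;       // false marks a null slot
};

// Hypothetical input element: either a missing value or some bytes.
struct ToyValue {
  bool is_null;
  std::string bytes;
};

// Pass 1 sizes the input; pass 2 appends into the reserved space.
void ToyIngest(const std::vector<ToyValue>& input, ToyBinaryBuilder* builder) {
  int64_t total_bytes = 0;
  for (const ToyValue& v : input) {
    if (!v.is_null) total_bytes += static_cast<int64_t>(v.bytes.size());
  }
  builder->Reserve(static_cast<int64_t>(input.size()));
  builder->ReserveData(total_bytes);

  for (const ToyValue& v : input) {
    if (v.is_null) {
      builder->AppendNull();
    } else {
      builder->Append(v.bytes.data(), static_cast<int64_t>(v.bytes.size()));
    }
  }
}
```

Calling `ToyIngest()` on two values and one null yields three slots, one null, and a data size equal to the summed byte lengths. The first pass costs an extra loop over the input but lets the second pass append into space that is already sized.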





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-26 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446483934



##
File path: r/tests/testthat/test-Array.R
##
@@ -18,16 +18,16 @@
 context("Array")
 
 expect_array_roundtrip <- function(x, type) {
-  a <- Array$create(x)
+  a <- Array$create(x, type = type)

Review comment:
   ✅ 









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-26 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446269445



##
File path: r/tests/testthat/test-Array.R
##
@@ -18,16 +18,16 @@
 context("Array")
 
 expect_array_roundtrip <- function(x, type) {
-  a <- Array$create(x)
+  a <- Array$create(x, type = type)

Review comment:
   Fair enough. I'll update 









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-26 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446269132



##
File path: r/tests/testthat/test-Array.R
##
@@ -18,16 +18,16 @@
 context("Array")
 
 expect_array_roundtrip <- function(x, type) {
-  a <- Array$create(x)
+  a <- Array$create(x, type = type)
   expect_type_equal(a$type, type)
   expect_identical(length(a), length(x))
-  if (!inherits(type, "ListType")) {
+  if (!inherits(type, "ListType") && !inherits(type, "LargeListType")) {

Review comment:
   I don't think there is any such inheritance in the C++ code.









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-26 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r446041588



##
File path: r/src/array_from_vector.cpp
##
@@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) {
   if (Rf_inherits(x, "data.frame")) {
     return InferArrowTypeFromDataFrame(x);
   } else {
-    if (XLENGTH(x) == 0) {
-      Rcpp::stop(
-          "Requires at least one element to infer the values' type of a list vector");
-    }
+    SEXP ptype = Rf_getAttrib(x, symbols::ptype);
+    if (ptype == R_NilValue) {
+      if (XLENGTH(x) == 0) {
+        Rcpp::stop(
+            "Requires at least one element to infer the values' type of a list vector");
+      }
 
-    return arrow::list(InferArrowType(VECTOR_ELT(x, 0)));
+      return arrow::list(InferArrowType(VECTOR_ELT(x, 0)));
+    } else {
+      // special case list(raw()) -> BinaryArray
+      if (TYPEOF(ptype) == RAWSXP) {
+        return arrow::binary();
+      }
+
+      return arrow::list(InferArrowType(ptype));

Review comment:
   Done. I had to modify the roundtrip checks to use `expect_equivalent()` 
because a roundtrip might add information: 
   
   ```
   list() -> List Array -> list_of( ptype = ) 
   ```
   









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-25 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r445994055



##
File path: r/src/array_from_vector.cpp
##
@@ -915,6 +924,39 @@ class Time64Converter : public TimeConverter {
   }
 };
 
+class BinaryVectorConverter : public VectorConverter {
+ public:
+  ~BinaryVectorConverter() {}
+
+  Status Init(ArrayBuilder* builder) {
+    typed_builder_ = checked_cast(builder);
+    return Status::OK();
+  }
+
+  Status Ingest(SEXP obj) {
+    ARROW_RETURN_IF(TYPEOF(obj) != VECSXP, Status::RError("Expecting a list"));
+    R_xlen_t n = XLENGTH(obj);
+    for (R_xlen_t i = 0; i < n; i++) {
+      SEXP obj_i = VECTOR_ELT(obj, i);
+      if (Rf_isNull(obj_i)) {
+        RETURN_NOT_OK(typed_builder_->AppendNull());
+      } else {
+        ARROW_RETURN_IF(TYPEOF(obj_i) != RAWSXP,
+                        Status::RError("Expecting a raw vector"));
+        RETURN_NOT_OK(typed_builder_->Append(RAW(obj_i), XLENGTH(obj_i)));

Review comment:
   Thanks. 









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r444308097



##
File path: r/src/array_from_vector.cpp
##
@@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) {
   if (Rf_inherits(x, "data.frame")) {
     return InferArrowTypeFromDataFrame(x);
   } else {
-    if (XLENGTH(x) == 0) {
-      Rcpp::stop(
-          "Requires at least one element to infer the values' type of a list vector");
-    }
+    SEXP ptype = Rf_getAttrib(x, symbols::ptype);
+    if (ptype == R_NilValue) {

Review comment:
   I count 6 uses of `Rf_isNull()` and 3 of `== R_NilValue`, so I'll 
switch to `Rf_isNull()` here for consistency.
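
To make explicit why either spelling works, here is a toy model rather than the R API. It assumes, as I understand R's internals, that `R_NilValue` is the single object of type `NILSXP` and that `Rf_isNull()` tests the type tag. All `Toy*` names are hypothetical, and the numeric tag values are only illustrative.

```cpp
#include <cassert>

// Toy model of an R object header; the real definitions live in Rinternals.h.
enum ToyType { TOY_NILSXP = 0, TOY_RAWSXP = 24 };

struct ToyObject {
  ToyType type;
};

typedef const ToyObject* ToySexp;

// Stand-in for R_NilValue: one shared object, the only one tagged TOY_NILSXP.
inline ToySexp ToyNilValue() {
  static const ToyObject nil = {TOY_NILSXP};
  return &nil;
}

// Mirrors the type-tag test that Rf_isNull() is assumed to perform.
inline bool ToyIsNull(ToySexp s) {
  assert(s != nullptr);
  return s->type == TOY_NILSXP;
}

// Mirrors the `== R_NilValue` pointer comparison.
inline bool ToyIsNilPointer(ToySexp s) { return s == ToyNilValue(); }
```

As long as the nil object is a singleton, the two predicates agree on every input, so choosing between them is a matter of consistency within the file.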
   









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r444283172



##
File path: r/src/array_from_vector.cpp
##
@@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) {
   if (Rf_inherits(x, "data.frame")) {
     return InferArrowTypeFromDataFrame(x);
   } else {
-    if (XLENGTH(x) == 0) {
-      Rcpp::stop(
-          "Requires at least one element to infer the values' type of a list vector");
-    }
+    SEXP ptype = Rf_getAttrib(x, symbols::ptype);
+    if (ptype == R_NilValue) {
+      if (XLENGTH(x) == 0) {
+        Rcpp::stop(
+            "Requires at least one element to infer the values' type of a list vector");
+      }
 
-    return arrow::list(InferArrowType(VECTOR_ELT(x, 0)));
+      return arrow::list(InferArrowType(VECTOR_ELT(x, 0)));
+    } else {
+      // special case list(raw()) -> BinaryArray
+      if (TYPEOF(ptype) == RAWSXP) {
+        return arrow::binary();
+      }
+
+      return arrow::list(InferArrowType(ptype));

Review comment:
   Oh yeah, that makes sense. This only looks at the attribute and does not 
specifically check that the object is a `vctrs_list_of`, but for the arrow -> R 
conversion I think it does not hurt to make the result a `vctrs_list_of`.
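
The decision logic in the hunk above can be sketched without the R or Arrow headers. Everything named `Toy*` is hypothetical, and the returned strings are illustrative labels, not Arrow type names. The point is only the order of the checks: use the `ptype` attribute when present, special-case a raw `ptype`, and otherwise fall back to the first element.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical element kinds standing in for R vector types.
enum class ToyKind { Raw, Integer, Character };

// Hypothetical model of an R list with an optional `ptype` attribute.
struct ToyList {
  bool has_ptype;                 // false models a missing attribute
  ToyKind ptype;                  // only meaningful when has_ptype is true
  std::vector<ToyKind> elements;  // kind of each list element
};

inline std::string ToyKindName(ToyKind k) {
  switch (k) {
    case ToyKind::Raw:
      return "raw";
    case ToyKind::Integer:
      return "integer";
    case ToyKind::Character:
      return "character";
  }
  return "unknown";
}

// Same branch order as the diff: no ptype means inspecting the first element.
inline std::string ToyInferListType(const ToyList& x) {
  if (!x.has_ptype) {
    if (x.elements.empty()) {
      throw std::invalid_argument(
          "Requires at least one element to infer the values' type of a list vector");
    }
    return "list<" + ToyKindName(x.elements.front()) + ">";
  }
  // special case: a raw ptype maps to a binary array, not a list
  if (x.ptype == ToyKind::Raw) {
    return "binary";
  }
  return "list<" + ToyKindName(x.ptype) + ">";
}
```

With a `ptype` present, an empty list no longer raises an error, which is the practical gain of reading the attribute before inspecting elements.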









[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox


romainfrancois commented on a change in pull request #7514:
URL: https://github.com/apache/arrow/pull/7514#discussion_r444281970



##
File path: r/src/array_from_vector.cpp
##
@@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) {
   if (Rf_inherits(x, "data.frame")) {
     return InferArrowTypeFromDataFrame(x);
   } else {
-    if (XLENGTH(x) == 0) {
-      Rcpp::stop(
-          "Requires at least one element to infer the values' type of a list vector");
-    }
+    SEXP ptype = Rf_getAttrib(x, symbols::ptype);
+    if (ptype == R_NilValue) {

Review comment:
   IIRC @lionel- ? said he prefers `== R_NilValue` 




