On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew <[email protected]> wrote:
>
> I managed to get conversion from CH to arrow using a CHToArrowType<> 
> inter-type traits concept. However, I am still trying to crack the use of:
>
>  arrow::VisitArrayInline

Here's a minimal example of VisitArrayInline

struct ArrayVisitor {
  Status Visit(const Array& arr) {
    return Status::OK();
  }
};

Status VisitArrayInlineExample(const Array& arr) {
  ArrayVisitor visitor;
  return VisitArrayInline(arr, &visitor);
}

You can add more Visit overloads to match specific Array subclasses
or groups of types (e.g. integers, floating point, etc.).
std::enable_if is helpful for this, as are the various helper
templates in arrow/type_traits.h.
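
As a rough sketch of the enable_if idea, here is a standalone version
that does not depend on Arrow at all. The Fake*Array types, the EnableIf
alias and KindVisitor are made-up stand-ins for illustration, and bool
stands in for Status. In real Arrow code the conditions would come from
the helper templates in arrow/type_traits.h rather than from
std::is_integral and friends.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Stand-ins for concrete array classes (NOT the real Arrow types); each one
// only advertises the C++ type of its elements.
struct FakeInt32Array { using value_type = std::int32_t; };
struct FakeDoubleArray { using value_type = double; };
struct FakeStringArray { using value_type = std::string; };

// Hypothetical shorthand: the overload exists only when Cond is true.
template <bool Cond>
using EnableIf = typename std::enable_if<Cond, bool>::type;

struct KindVisitor {
  std::string kind;

  // Selected for every array whose elements are integers.
  template <typename ArrayT>
  EnableIf<std::is_integral<typename ArrayT::value_type>::value> Visit(
      const ArrayT&) {
    kind = "integer";
    return true;  // bool stands in for a Status return value
  }

  // Selected for every array whose elements are floating point.
  template <typename ArrayT>
  EnableIf<std::is_floating_point<typename ArrayT::value_type>::value> Visit(
      const ArrayT&) {
    kind = "floating";
    return true;
  }

  // Fallback for everything that is not arithmetic (strings and so on).
  template <typename ArrayT>
  EnableIf<!std::is_arithmetic<typename ArrayT::value_type>::value> Visit(
      const ArrayT&) {
    kind = "other";
    return true;
  }
};
```

The three conditions are mutually exclusive, so exactly one overload is
viable for any given array type and the call is never ambiguous.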

>
> and
>
> arrow::ArrayDataVisitor

Here's an example (I didn't compile this, but hopefully it gives the idea):

struct BooleanValueVisitor {
  int64_t num_true = 0;
  int64_t num_null = 0;

  Status VisitNull() {
    ++num_null;
    return Status::OK();
  }

  Status VisitValue(bool value) {
    if (value) ++num_true;
    return Status::OK();
  }
};


Status VisitBooleanValues(const Array& arr) {
  BooleanValueVisitor visitor;
  return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor);
}
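
To make the calling convention concrete, here is a standalone sketch of
what a value-level visit does as I understand it: walk the slots, call
VisitNull for null slots and VisitValue for the rest, and stop at the
first failure. FakeBooleanData, CountingVisitor and VisitFakeBooleanData
are made-up stand-ins, not Arrow classes, and bool stands in for Status.
This is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for boolean array data (NOT arrow::ArrayData): one value per
// slot plus a validity flag saying whether the slot is non-null.
struct FakeBooleanData {
  std::vector<bool> values;
  std::vector<bool> valid;
};

// Same shape as the visitor above, with bool in place of Status.
struct CountingVisitor {
  std::int64_t num_true = 0;
  std::int64_t num_null = 0;

  bool VisitNull() {
    ++num_null;
    return true;
  }

  bool VisitValue(bool value) {
    if (value) ++num_true;
    return true;
  }
};

// Calls VisitNull or VisitValue once per slot, in order, and stops early
// if the visitor reports a failure.
template <typename Visitor>
bool VisitFakeBooleanData(const FakeBooleanData& data, Visitor* visitor) {
  assert(data.values.size() == data.valid.size());
  for (std::size_t i = 0; i < data.values.size(); ++i) {
    const bool ok = data.valid[i] ? visitor->VisitValue(data.values[i])
                                  : visitor->VisitNull();
    if (!ok) return false;
  }
  return true;
}
```

The visitor never sees the validity flags directly; it only receives
one callback per slot.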

If you have a type-parameterized visitor, then you could have

template <typename ArrowType>
Status VisitArrayValues(const Array& arr) {
  MyValueVisitor<ArrowType> visitor;
  return ArrayDataVisitor<ArrowType>::Visit(*arr.data(), &visitor);
}
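
For illustration, a type-parameterized visitor could copy the values
out into a std::vector. The sketch below is standalone and does not use
Arrow: FakeTypedData and VisitFakeValues are made-up stand-ins, this
MyValueVisitor is only one hypothetical way to fill in that name, and
bool stands in for Status. It is parameterized on the element type
directly; in real code that type would be derived from the Arrow type
(for primitive types, ArrowType::c_type).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for typed array data (NOT arrow::ArrayData).
template <typename CType>
struct FakeTypedData {
  std::vector<CType> values;
  std::vector<bool> valid;
};

// Hypothetical visitor: copies every slot into a vector and records which
// slots were null (null slots hold a default-constructed placeholder).
template <typename CType>
struct MyValueVisitor {
  std::vector<CType> values;
  std::vector<bool> is_null;

  bool VisitNull() {
    values.push_back(CType());
    is_null.push_back(true);
    return true;
  }

  bool VisitValue(const CType& value) {
    values.push_back(value);
    is_null.push_back(false);
    return true;
  }
};

// One function template serves every element type; the compiler deduces
// CType from the arguments.
template <typename CType>
bool VisitFakeValues(const FakeTypedData<CType>& data,
                     MyValueVisitor<CType>* visitor) {
  assert(data.values.size() == data.valid.size());
  for (std::size_t i = 0; i < data.values.size(); ++i) {
    const bool ok = data.valid[i] ? visitor->VisitValue(data.values[i])
                                  : visitor->VisitNull();
    if (!ok) return false;
  }
  return true;
}
```

Keeping a separate is_null vector avoids reserving a sentinel value to
mean "missing".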

(FWIW, we developed ArrayDataVisitor primarily for internal library
use and not as a public API)

I would personally try VisitArrayInline first if at all possible,
since it is simpler.

>
> I have a struct:
>
> struct AnArrayUser
> {
>      template <typename T> arrow::Status Visit(const T &a)
>      {
>            // How to invoke ArrayDataVisitor?
>      }
>
>      void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
>
>
>      arrow::Status VisitNull() {return arrow::Status::OK();}
>      template <class T> arrow::Status VisitValue(T val) {return 
> arrow::Status::OK();}
> };
>
> Which appears to have its "Use" method called appropriately. But inside of 
> the Visit method I have so far been unable to find the incantation to make a 
> call through the ArrayDataVisitor. I've tried several variations of:
>
> arrow::ArrayDataVisitor<typename T::TypeClass>::Visit(*(array.data()), this);
>
> at the // How to .. line above but can't seem to get it to work. I'm sure I 
> just have some fundamental misunderstanding of how this is supposed to work. 
> Can someone give me some guidance?
>
> Matt
>
>
>
> -----Original Message-----
> From: Wes McKinney <[email protected]>
> Sent: Wednesday, January 22, 2020 12:03 PM
> To: [email protected]
> Subject: Re: Converting clickhouse column to arrow array
>
> If you search for "VisitTypeInline" or "VisitArrayInline" in the C++ codebase 
> you can find numerous examples of where this is used
>
> On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann <[email protected]> 
> wrote:
> >
> > Hi,
> > I was looking for something similar, but didn't find a good example in the 
> > docs or the source code showing how to use the visitor pattern. It would be 
> > great, e.g., to have an example similar to the "Row to columnar 
> > conversion", showing a templated way to read arrow columns into C++ vectors 
> > using the visitor pattern, and without implementing a separate reader 
> > function for each arrow type. Would that be possible?
> >
> > Many thanks,
> > Thomas
> >
> > On Wed, 22 Jan 2020 at 17:13, Wes McKinney <[email protected]> wrote:
> >>
> >> hi Matt,
> >>
> >> I recommend you use the visitor pattern combined with the
> >> arrow::TypeTraits that we provide
> >>
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
> >>
> >> You'll need to provide a compile-time mapping from Clickhouse types
> >> to Arrow types, but then you can statically access the correct
> >> builder type at compile time
> >>
> >> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
> >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
> >>
> >> ...
> >>
> >> or similar. In cases where the exported Clickhouse data does not have
> >> an associated AppendValues method in Arrow you may have to write a
> >> special case (please open JIRA issues if you think there should be
> >> more AppendValues methods)
> >>
> >> Thanks
> >>
> >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <[email protected]> 
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> >
> >> >
> >> > I am interfacing arrow to a Clickhouse database using their c++ client. 
> >> > Both arrow and CH have generic array-like classes with the element data 
> >> > type internalized. Ideally, I would like to be able to write something 
> >> > like:
> >> >
> >> >
> >> >
> >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >> >
> >> >
> >> >
> >> > Where the array and column have the same element type (int, double, 
> >> > string, …) but the code is generic to the specific type.
> >> >
> >> >
> >> >
> >> > I can do this by explicitly handling specific types through template 
> >> > specialization but I thought that since arrow already has pretty generic 
> >> > type handling through its templates, and clickhouse also has similar 
> >> > capability there ought to be a more seamless way to do the conversion. 
> >> > Zero copy would probably be a lot to ask, but something short of 
> >> > template specializations for every type is what I am aiming for.
> >> >
> >> >
> >> >
> >> > I currently do explicit type specialization. For example I have 
> >> > functions like:
> >> >
> >> >
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> >> > {
> >> >     arrow::DoubleBuilder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> >
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> >> > {
> >> >     arrow::Int32Builder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> >
> >> >
> >> > Which I suspect is unnecessarily explicit. Is there a more generic way 
> >> > of handling the variety of underlying array element data types when 
> >> > constructing arrow::Array objects? And can someone point me to examples 
> >> > that interface arrow to another similarly generically typed library 
> >> > (doesn’t have to be clickhouse). Thanks for any guidance.
> >> >
> >> >
> >> >
> >> > Matt
> >> >
> >> >
> >> >
> >> >
> >> > The information contained in this e-mail may be confidential and is 
> >> > intended solely for the use of the named addressee.
> >> >
> >> > Access, copying or re-use of the e-mail or any information contained 
> >> > therein by any other person is not authorized.
> >> >
> >> > If you are not the intended recipient please notify us immediately by 
> >> > returning the e-mail to the originator.
> >> >
> >> > Disclaimer Version MB.US.1
