On Thu, Jan 30, 2020 at 3:43 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> (FWIW, we developed ArrayDataVisitor primarily for internal library
>> use and not as a public API)
>> I would personally try to first use VisitArrayInline if at all
>> possible since it is simpler
>
>
> Is VisitArrayInline meant to be for public use? visitor_inline.h still has
> the disclaimer "Private header, not to be exported".

I think it should be fine for public use -- we should amend the
documentation. Using the "inline" version is simpler in many ways than
the virtual Visitor when you have a templated Visit function that
matches many type cases.

> Thanks,
> Micah
>
> On Wed, Jan 29, 2020 at 8:57 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew <mcal...@xbktrading.com>
>> wrote:
>> >
>> > I managed to get conversion from CH to arrow using a CHToArrowType<>
>> > inter-type traits concept. However, I am still trying to crack the use of:
>> >
>> > arrow::VisitArrayInline
>>
>> Here's a minimal example of VisitArrayInline
>>
>> struct ArrayVisitor {
>>   Status Visit(const Array& arr) {
>>     return Status::OK();
>>   }
>> };
>>
>> Status VisitArrayInlineExample(const Array& arr) {
>>   ArrayVisitor visitor;
>>   return VisitArrayInline(arr, &visitor);
>> }
>>
>> You can add different Visit functions to match different specific
>> Array subclasses or groups of types (e.g. integers, floating point,
>> etc.).
>> std::enable_if is helpful (and the various helper templates in
>> arrow/type_traits.h)
>>
>> >
>> > and
>> >
>> > arrow::ArrayDataVisitor
>>
>> Here's an example (didn't compile this, but hopefully this gives the idea)
>>
>> struct BooleanValueVisitor {
>>   int64_t num_true = 0;
>>   int64_t num_null = 0;
>>
>>   Status VisitNull() {
>>     ++num_null;
>>     return Status::OK();
>>   }
>>
>>   Status VisitValue(bool value) {
>>     if (value) ++num_true;
>>     return Status::OK();
>>   }
>> };
>>
>>
>> Status VisitBooleanValues(const Array& arr) {
>>   BooleanValueVisitor visitor;
>>   return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor);
>> }
>>
>> If you have a type-parameterized visitor, then you could have
>>
>> template <typename ArrowType>
>> Status VisitArrayValues(const Array& arr) {
>>   MyValueVisitor<ArrowType> visitor;
>>   return ArrayDataVisitor<ArrowType>::Visit(*arr.data(), &visitor);
>> }
>>
>> (FWIW, we developed ArrayDataVisitor primarily for internal library
>> use and not as a public API)
>>
>> I would personally try to first use VisitArrayInline if at all
>> possible since it is simpler
>>
>> >
>> > I have a struct:
>> >
>> > struct AnArrayUser
>> > {
>> >     template <typename T> arrow::Status Visit(const T &a)
>> >     {
>> >         // How to invoke ArrayDataVisitor?
>> >     }
>> >
>> >     void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
>> >
>> >
>> >     arrow::Status VisitNull() {return arrow::Status::OK();}
>> >     template <class T> arrow::Status VisitValue(T val) {return arrow::Status::OK();}
>> > };
>> >
>> > Which appears to have its "Use" method called appropriately. But inside
>> > of the Visit method I have so far been unable to find the incantation to
>> > make a call through the ArrayDataVisitor. I've tried several variations of:
>> >
>> > arrow::ArrayDataVisitor<typename T::TypeClass>::Visit(*(array.data()),
>> > this);
>> >
>> > at the // How to .. line above but can't seem to get it to work.
>> > I'm sure I just have some fundamental misunderstanding of how this is
>> > supposed to work. Can someone give me some guidance?
>> >
>> > Matt
>> >
>> > -----Original Message-----
>> > From: Wes McKinney <wesmck...@gmail.com>
>> > Sent: Wednesday, January 22, 2020 12:03 PM
>> > To: user@arrow.apache.org
>> > Subject: Re: Converting clickhouse column to arrow array
>> >
>> > If you search for "VisitTypeInline" or "VisitArrayInline" in the C++
>> > codebase you can find numerous examples of where this is used
>> >
>> > On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann
>> > <thomas.buehrm...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > > I was looking for something similar, but didn't find a good example in
>> > > the docs or the source code showing how to use the visitor pattern. It
>> > > would be great, e.g., to have an example similar to the "Row to columnar
>> > > conversion", showing a templated way to read arrow columns into C++
>> > > vectors using the visitor pattern, and without implementing a separate
>> > > reader function for each arrow type. Would that be possible?
>> > >
>> > > Many thanks,
>> > > Thomas
>> > >
>> > > On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmck...@gmail.com> wrote:
>> > >>
>> > >> hi Matt,
>> > >>
>> > >> I recommend you use the visitor pattern combined with the
>> > >> arrow::TypeTraits that we provide
>> > >>
>> > >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
>> > >>
>> > >> You'll need to provide a compile-time mapping from Clickhouse types
>> > >> to Arrow types, but then you can statically access the correct
>> > >> builder type at compile time
>> > >>
>> > >> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
>> > >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
>> > >>
>> > >> ...
>> > >>
>> > >> or similar.
>> > >> In cases where the exported Clickhouse data does not have
>> > >> an associated AppendValues method in Arrow you may have to write a
>> > >> special case (please open JIRA issues if you think there should be
>> > >> more AppendValues methods)
>> > >>
>> > >> Thanks
>> > >>
>> > >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew
>> > >> <mcal...@xbktrading.com> wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > I am interfacing arrow to a Clickhouse database using their c++
>> > >> > client. Both arrow and CH have generic array-like classes with the
>> > >> > element data type internalized. Ideally, I would like to be able to
>> > >> > write something like:
>> > >> >
>> > >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
>> > >> >
>> > >> > Where the array and column have the same element type (int, double,
>> > >> > string, …) but the code is generic to the specific type.
>> > >> >
>> > >> > I can do this by explicitly handling specific types through template
>> > >> > specialization but I thought that since arrow already has pretty
>> > >> > generic type handling through its templates, and clickhouse also has
>> > >> > similar capability there ought to be a more seamless way to do the
>> > >> > conversion. Zero copy would probably be a lot to ask, but something
>> > >> > short of template specializations for every type is what I am aiming
>> > >> > for.
>> > >> >
>> > >> > I currently do explicit type specialization.
>> > >> > For example I have functions like:
>> > >> >
>> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
>> > >> > {
>> > >> >     arrow::DoubleBuilder builder;
>> > >> >     builder.AppendValues(v);
>> > >> >     std::shared_ptr<arrow::Array> array;
>> > >> >     builder.Finish(&array);
>> > >> >     return array;
>> > >> > }
>> > >> >
>> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
>> > >> > {
>> > >> >     arrow::Int32Builder builder;
>> > >> >     builder.AppendValues(v);
>> > >> >     std::shared_ptr<arrow::Array> array;
>> > >> >     builder.Finish(&array);
>> > >> >     return array;
>> > >> > }
>> > >> >
>> > >> > Which I suspect is unnecessarily explicit. Is there a more generic
>> > >> > way of handling the variety of underlying array element data types
>> > >> > when constructing arrow::Array objects? And can someone point me to
>> > >> > examples that interface arrow to another similarly generically typed
>> > >> > library (doesn’t have to be clickhouse). Thanks for any guidance.
>> > >> >
>> > >> > Matt
>> > >> >
>> > >> > The information contained in this e-mail may be confidential and is
>> > >> > intended solely for the use of the named addressee.
>> > >> >
>> > >> > Access, copying or re-use of the e-mail or any information contained
>> > >> > therein by any other person is not authorized.
>> > >> >
>> > >> > If you are not the intended recipient please notify us immediately by
>> > >> > returning the e-mail to the originator.
>> > >> >
>> > >> > Disclaimer Version MB.US.1
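
For anyone landing on this thread later: the traits-based mapping suggested in the reply quoted above (a CHToArrowType-style mapping plus TypeTraits<...>::BuilderType) can fold the per-type makeArray overloads into one template. Here is a minimal sketch of that idea. To keep it compilable without Arrow installed, it uses stand-in types in a `sketch` namespace: the type tags, Array, NumericBuilder, TypeTraits and the ToArrowType mapping below are all hypothetical and only imitate the shape of the real classes. In particular, the real builders' AppendValues and Finish return arrow::Status, which should be checked rather than ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in type tags (assumption: they only mimic arrow::DoubleType etc.).
struct DoubleType { using c_type = double; };
struct Int32Type { using c_type = int32_t; };

// Stand-in for a type-erased array base class.
struct Array {
  virtual ~Array() = default;
  virtual int64_t length() const = 0;
};

template <typename ArrowType>
struct NumericArray : Array {
  std::vector<typename ArrowType::c_type> values;
  int64_t length() const override {
    return static_cast<int64_t>(values.size());
  }
};

// Stand-in builder; the real Arrow builders return Status from these calls.
template <typename ArrowType>
class NumericBuilder {
 public:
  using value_type = typename ArrowType::c_type;

  void AppendValues(const std::vector<value_type>& v) {
    values_.insert(values_.end(), v.begin(), v.end());
  }

  std::shared_ptr<Array> Finish() {
    auto out = std::make_shared<NumericArray<ArrowType>>();
    out->values = std::move(values_);
    values_.clear();  // leave the builder empty and reusable
    return out;
  }

 private:
  std::vector<value_type> values_;
};

// Maps a type tag to its builder, in the spirit of TypeTraits<T>::BuilderType.
template <typename ArrowType>
struct TypeTraits {
  using BuilderType = NumericBuilder<ArrowType>;
};

// Hypothetical compile-time mapping from a source element type to a type tag.
// It plays the role of CHToArrowType; unmapped types fail to compile.
template <typename T> struct ToArrowType;
template <> struct ToArrowType<double> { using ArrowType = DoubleType; };
template <> struct ToArrowType<int32_t> { using ArrowType = Int32Type; };

// One template replaces the hand-written overload per element type.
template <typename T>
std::shared_ptr<Array> MakeArray(const std::vector<T>& v) {
  using ArrowType = typename ToArrowType<T>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(v);
  std::shared_ptr<Array> out = builder.Finish();
  assert(out->length() == static_cast<int64_t>(v.size()));
  return out;
}

}  // namespace sketch
```

Supporting another element type then means adding one ToArrowType specialization (and, for non-numeric data such as strings, a matching TypeTraits specialization) instead of another copy of the function body.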