[jira] [Commented] (ARROW-6377) [C++] Extending STL API to support row-wise conversion
[ https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947781#comment-16947781 ] Omer Ozarslan commented on ARROW-6377:
--
||Arrow Type||C++ Type||
|NA BOOL UINT8 INT8 UINT16 INT16 UINT32 INT32 UINT64 INT64 HALF_FLOAT FLOAT DOUBLE STRING BINARY FIXED_SIZE_BINARY DATE32 DATE64 TIMESTAMP TIME32 TIME64 INTERVAL DECIMAL LIST STRUCT UNION DICTIONARY MAP EXTENSION FIXED_SIZE_LIST DURATION LARGE_STRING LARGE_BINARY LARGE_LIST| |

> [C++] Extending STL API to support row-wise conversion
> --
>
> Key: ARROW-6377
> URL: https://issues.apache.org/jira/browse/ARROW-6377
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Omer Ozarslan
> Priority: Major
> Fix For: 1.0.0
>
> Using array builders is currently the recommended way in the documentation for converting row-wise data to Arrow tables. However, array builders have a low-level interface to support various use cases in the library. They require additional boilerplate due to type erasure, although some of this boilerplate could be avoided at compile time if the schema is already known and fixed (also discussed in ARROW-4067).
> In another part of the library, the STL API provides a nice abstraction over builders by inferring data types and builders from the values provided, reducing the boilerplate significantly. It currently handles automatically converting tuples with a limited set of native types: numeric types, string and vector (plus nullable variations of these once ARROW-6326 is merged). It also allows passing references in tuple values (implemented recently in ARROW-6284).
> As a more concrete example, this is the code that can be used to convert {{row_data}} provided in the examples:
>
> {code:cpp}
> arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
>                                        std::shared_ptr<arrow::Table>* table) {
>   auto rng = rows | ranges::views::transform([](const data_row& row) {
>     return std::tuple<int64_t, double, const std::vector<double>&>(
>         row.id, row.cost, row.cost_components);
>   });
>   return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
>                                          {"id", "cost", "cost_components"},
>                                          table);
> }
> {code}
> So, it allows more concise code for consumers of the API compared to using builders directly.
> There is no direct support by the library for other types (binary, struct, union etc., or converting iterable objects other than vectors to lists). Users are provided a way to specialize it for their own data structures. One limitation of implicit inference is that it is hard (or even impossible) to infer the exact type to use in some cases. For example, should a {{std::string_view}} value be inferred as string, binary, large binary or list? This ambiguity can be avoided by providing some way for the user to explicitly state the correct type for storing a column. For example, a user can return a so-called {{BinaryCell}} class to return binary values.
> Proposed changes:
> * Implementing cell "adapters": cells are non-owning references for each type. It's the user's responsibility to keep the pointed-to values alive. (Can scalars be used in this context?)
> ** BinaryCell
> ** StringCell
> ** ListCell (for adapting any Range)
> ** StructCell
> ** ...
> * Primitive types don't need such adapters since their values are trivial to cast (e.g. just use int8_t(value) to use Int8Type).
> * Adding benchmarks comparing with builder performance. There is likely to be some performance penalty due to hindering compiler optimizations. Yet, this is acceptable in exchange for more concise code IMHO. For fine-grained control over performance, it will still be possible to use builders directly.
> I have implemented something similar to BinaryCell for my use case. If the above changes sound reasonable, I will go ahead and start implementing other cells to submit.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
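A minimal sketch of what such non-owning cell adapters could look like; every name here ({{BinaryCell}}, {{StringCell}}, {{record}}, {{AsBinary}}) is an assumption for illustration and not part of Arrow's API:

```cpp
#include <string_view>

// Hypothetical, non-owning cell adapters (assumed names, not Arrow API).
// Each cell tags a value with the Arrow type it should be stored as, so the
// STL layer no longer has to guess between string/binary for a string_view.
// The cell does not own the bytes: the caller must keep them alive until the
// table is built.
struct BinaryCell {
  std::string_view data;  // stored as BinaryType
};

struct StringCell {
  std::string_view data;  // stored as StringType (utf8)
};

// A row type with an ambiguous field: is `blob` a string or binary column?
struct record {
  std::string_view blob;
};

// The row-to-tuple lambda would wrap the field explicitly instead of
// returning a bare string_view, making the intended builder unambiguous.
BinaryCell AsBinary(const record& r) { return BinaryCell{r.blob}; }
```

A row-wise conversion would then return {{std::tuple<..., BinaryCell>}} from its transform lambda, and a {{ConversionTraits}} specialization for the cell type would forward to the matching builder.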
[jira] [Issue Comment Deleted] (ARROW-6377) [C++] Extending STL API to support row-wise conversion
[ https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omer Ozarslan updated ARROW-6377:
-
Comment: was deleted
(was: ||Arrow Type||C++ Type||
|NA BOOL UINT8 INT8 UINT16 INT16 UINT32 INT32 UINT64 INT64 HALF_FLOAT FLOAT DOUBLE STRING BINARY FIXED_SIZE_BINARY DATE32 DATE64 TIMESTAMP TIME32 TIME64 INTERVAL DECIMAL LIST STRUCT UNION DICTIONARY MAP EXTENSION FIXED_SIZE_LIST DURATION LARGE_STRING LARGE_BINARY LARGE_LIST| |)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6405) [Python] Add std::move wrapper for use in Cython
[ https://issues.apache.org/jira/browse/ARROW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942124#comment-16942124 ] Omer Ozarslan commented on ARROW-6405:
--
Ah, thanks. Sorry for the delay, I was occupied with some other stuff. I also thought I could delay this to yield CI to more critical bugs before the release. It sounds okay to me to close this issue with that PR merged; I'm not sure it's necessary to create a separate PR just for this. There seem to be a few other places where move is used. You may want to replace those in the PR as well:
{code:java}
~/src/ext/arrow/python/pyarrow master
grep "move" -rnw . | grep -E ".(pyx|pxi|pxd)"
./_flight.pyx:946: move(handler)))
./_flight.pyx:1358: new CPyFlightDataStream(result, move(data_stream)))
./_flight.pyx:1485: new CPyFlightDataStream(result, move(data_stream)))
./includes/libarrow_flight.pxd:378: unique_ptr[CFlightDataStream] move(unique_ptr[CFlightDataStream]) nogil
./includes/libarrow_flight.pxd:379: unique_ptr[CServerAuthHandler] move(unique_ptr[CServerAuthHandler]) nogil
./includes/libarrow_flight.pxd:380: unique_ptr[CClientAuthHandler] move(unique_ptr[CClientAuthHandler]) nogil
./_fs.pyx:249: def move(self, src, dest):
{code}
cymove doesn't enforce nogil, as it made sense to me to leave the decision about the GIL to the caller of move (it's just a cast after all).

> [Python] Add std::move wrapper for use in Cython
>
> Key: ARROW-6405
> URL: https://issues.apache.org/jira/browse/ARROW-6405
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 1.0.0
>
> [~bkietz] pointed this out to me: https://github.com/ozars/cymove
> This is small enough that we should simply copy this code into our codebase (MIT-licensed) and fix the couple of places where we have {{std::move}}-related workarounds.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6552) [C++] boost::optional in STL test fails compiling in gcc 4.8.2
[ https://issues.apache.org/jira/browse/ARROW-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928779#comment-16928779 ] Omer Ozarslan commented on ARROW-6552:
--
Is there a simple way (such as a Dockerfile) to test changes on gcc 4.8.2?

> [C++] boost::optional in STL test fails compiling in gcc 4.8.2
> --
>
> Key: ARROW-6552
> URL: https://issues.apache.org/jira/browse/ARROW-6552
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Omer Ozarslan
> Priority: Major
>
> Quoting [~bkietz] from the mailing list:
> {code:java}
> a tuple constructor is choking on implicit conversion from
> string literal (char[6]) to boost::optional{code}
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6552) [C++] boost::optional in STL test fails compiling in gcc 4.8.2
Omer Ozarslan created ARROW-6552:
Summary: [C++] boost::optional in STL test fails compiling in gcc 4.8.2
Key: ARROW-6552
URL: https://issues.apache.org/jira/browse/ARROW-6552
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Omer Ozarslan

Quoting [~bkietz] from the mailing list:
{code:java}
a tuple constructor is choking on implicit conversion from
string literal (char[6]) to boost::optional{code}
-- This message was sent by Atlassian Jira (v8.3.2#803003)
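The failing pattern can be reproduced in miniature. This sketch substitutes {{std::optional}} for {{boost::optional}} (an assumption: the trigger is the implicit char-array conversion inside tuple construction, as quoted above); modern compilers accept it, while gcc 4.8-era standard library tuples rejected the equivalent form:

```cpp
#include <optional>
#include <string>
#include <tuple>

// Constructing the tuple below from a string literal needs the implicit
// conversion chain char[6] -> std::string -> std::optional<std::string>
// inside tuple's converting constructor. This is the shape of conversion
// that the gcc 4.8 toolchain reportedly choked on with boost::optional.
std::tuple<int, std::optional<std::string>> MakeRow() {
  return {1, "hello"};
}
```

A workaround in test code is to construct the optional explicitly, e.g. {{std::make_optional<std::string>("hello")}}, so no implicit conversion is required.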
[jira] [Commented] (ARROW-6405) [Python] Add std::move wrapper for use in Cython
[ https://issues.apache.org/jira/browse/ARROW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920544#comment-16920544 ] Omer Ozarslan commented on ARROW-6405:
--
I'd be happy to work on this if it isn't assigned to anyone. :)
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6387) [Archery] Errors with make
[ https://issues.apache.org/jira/browse/ARROW-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918795#comment-16918795 ] Omer Ozarslan commented on ARROW-6387:
--
Okay, submitting a PR soon.

> [Archery] Errors with make
> --
>
> Key: ARROW-6387
> URL: https://issues.apache.org/jira/browse/ARROW-6387
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Omer Ozarslan
> Priority: Minor
>
> {{archery --debug benchmark run}} gives an error on Debian 10, CMake 3.13.4, GNU make 4.2.1:
> {code:java}
> (.venv) omer@omer ~/src/ext/arrow/cpp/build master ● archery --debug benchmark run
> DEBUG:archery:Running benchmark WORKSPACE
> DEBUG:archery:Executing `['/usr/bin/cmake', '-GMake', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DCMAKE_BUILD_TYPE=release', '-DBUILD_WARNING_LEVEL=production', '-DARROW_BUILD_TESTS=ON', '-DARROW_BUILD_BENCHMARKS=ON', '-DARROW_PYTHON=OFF', '-DARROW_PARQUET=OFF', '-DARROW_GANDIVA=OFF', '-DARROW_PLASMA=OFF', '-DARROW_FLIGHT=OFF', '/home/omer/src/ext/arrow/cpp']`
> CMake Error: Could not create named generator Make
>
> Generators
>   Unix Makefiles                  = Generates standard UNIX makefiles.
>   Ninja                           = Generates build.ninja files.
>   Watcom WMake                    = Generates Watcom WMake makefiles.
>   CodeBlocks - Ninja              = Generates CodeBlocks project files.
>   CodeBlocks - Unix Makefiles     = Generates CodeBlocks project files.
>   CodeLite - Ninja                = Generates CodeLite project files.
>   CodeLite - Unix Makefiles       = Generates CodeLite project files.
>   Sublime Text 2 - Ninja          = Generates Sublime Text 2 project files.
>   Sublime Text 2 - Unix Makefiles = Generates Sublime Text 2 project files.
>   Kate - Ninja                    = Generates Kate project files.
>   Kate - Unix Makefiles           = Generates Kate project files.
>   Eclipse CDT4 - Ninja            = Generates Eclipse CDT 4.0 project files.
>   Eclipse CDT4 - Unix Makefiles   = Generates Eclipse CDT 4.0 project files.
> Traceback (most recent call last):
> [[[cropped]]]{code}
> After the trivial fix:
> {code:java}
> diff --git a/dev/archery/archery/utils/cmake.py b/dev/archery/archery/utils/cmake.py
> index 38aedab2d..3150ea9a6 100644
> --- a/dev/archery/archery/utils/cmake.py
> +++ b/dev/archery/archery/utils/cmake.py
> @@ -34,7 +34,7 @@ class CMake(Command):
>          in the search path.
>          """
>          found_ninja = which("ninja")
> -        return "Ninja" if found_ninja else "Make"
> +        return "Ninja" if found_ninja else "Unix Makefiles"{code}
> I get another error:
> {code:java}
> [[[cropped]]]
> -- Generating done
> -- Build files have been written to: /tmp/arrow-bench-48x_yleb/WORKSPACE/build
> DEBUG:archery:Executing `[None]`
> Traceback (most recent call last):
>   File "/home/omer/src/ext/arrow/.venv/bin/archery", line 11, in <module>
>     load_entry_point('archery', 'console_scripts', 'archery')()
>   File "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", line 764, in __call__
>     return self.main(*args, **kwargs)
>   File >
[jira] [Commented] (ARROW-6387) [Archery] Errors with make
[ https://issues.apache.org/jira/browse/ARROW-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918784#comment-16918784 ] Omer Ozarslan commented on ARROW-6387:
--
Thanks. How about calling cmake --build instead of the build program itself?
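The generator-agnostic alternative suggested here can be sketched as follows; the helper name is an assumption for illustration, not Archery's actual code:

```python
def cmake_build_command(build_dir, target=None):
    """Build through `cmake --build`, which dispatches to whichever
    generator (Unix Makefiles, Ninja, ...) configured the tree, so the
    caller never needs to locate make or ninja itself."""
    cmd = ["cmake", "--build", build_dir]
    if target:
        # `--target` is a standard `cmake --build` option.
        cmd += ["--target", target]
    return cmd
```

The returned list would be handed to something like {{subprocess.run}}; keeping command construction separate from execution also makes the logic easy to test.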
[jira] [Created] (ARROW-6387) [Archery] Errors with make
Omer Ozarslan created ARROW-6387:
Summary: [Archery] Errors with make
Key: ARROW-6387
URL: https://issues.apache.org/jira/browse/ARROW-6387
Project: Apache Arrow
Issue Type: Bug
Reporter: Omer Ozarslan

{{archery --debug benchmark run}} gives an error on Debian 10, CMake 3.13.4, GNU make 4.2.1:
{code:java}
(.venv) omer@omer ~/src/ext/arrow/cpp/build master ● archery --debug benchmark run
DEBUG:archery:Running benchmark WORKSPACE
DEBUG:archery:Executing `['/usr/bin/cmake', '-GMake', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DCMAKE_BUILD_TYPE=release', '-DBUILD_WARNING_LEVEL=production', '-DARROW_BUILD_TESTS=ON', '-DARROW_BUILD_BENCHMARKS=ON', '-DARROW_PYTHON=OFF', '-DARROW_PARQUET=OFF', '-DARROW_GANDIVA=OFF', '-DARROW_PLASMA=OFF', '-DARROW_FLIGHT=OFF', '/home/omer/src/ext/arrow/cpp']`
CMake Error: Could not create named generator Make

Generators
  Unix Makefiles                  = Generates standard UNIX makefiles.
  Ninja                           = Generates build.ninja files.
  Watcom WMake                    = Generates Watcom WMake makefiles.
  CodeBlocks - Ninja              = Generates CodeBlocks project files.
  CodeBlocks - Unix Makefiles     = Generates CodeBlocks project files.
  CodeLite - Ninja                = Generates CodeLite project files.
  CodeLite - Unix Makefiles       = Generates CodeLite project files.
  Sublime Text 2 - Ninja          = Generates Sublime Text 2 project files.
  Sublime Text 2 - Unix Makefiles = Generates Sublime Text 2 project files.
  Kate - Ninja                    = Generates Kate project files.
  Kate - Unix Makefiles           = Generates Kate project files.
  Eclipse CDT4 - Ninja            = Generates Eclipse CDT 4.0 project files.
  Eclipse CDT4 - Unix Makefiles   = Generates Eclipse CDT 4.0 project files.

Traceback (most recent call last):
[[[cropped]]]{code}
After the trivial fix:
{code:java}
diff --git a/dev/archery/archery/utils/cmake.py b/dev/archery/archery/utils/cmake.py
index 38aedab2d..3150ea9a6 100644
--- a/dev/archery/archery/utils/cmake.py
+++ b/dev/archery/archery/utils/cmake.py
@@ -34,7 +34,7 @@ class CMake(Command):
         in the search path.
""" found_ninja = which("ninja") -return "Ninja" if found_ninja else "Make" +return "Ninja" if found_ninja else "Unix Makefiles"{code} I get another error: {code:java} [[[cropped]] -- Generating done -- Build files have been written to: /tmp/arrow-bench-48x_yleb/WORKSPACE/build DEBUG:archery:Executing `[None]` Traceback (most recent call last): File "/home/omer/src/ext/arrow/.venv/bin/archery", line 11, in load_entry_point('archery', 'console_scripts', 'archery')() File "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", line 764, in __call__ return self.main(*args, **kwargs) File "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", line 717, in main rv = self.invoke(ctx) File "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", line 1137, in invoke return
[jira] [Commented] (ARROW-6371) [Doc] Row to columnar conversion example mentions arrow::Column in comments
[ https://issues.apache.org/jira/browse/ARROW-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917965#comment-16917965 ] Omer Ozarslan commented on ARROW-6371:
--
Thanks. I replied to this thread over email yesterday, but I guess the response didn't go through for some reason. I submitted the PR.

> [Doc] Row to columnar conversion example mentions arrow::Column in comments
> ---
>
> Key: ARROW-6371
> URL: https://issues.apache.org/jira/browse/ARROW-6371
> Project: Apache Arrow
> Issue Type: Bug
> Components: Documentation
> Reporter: Omer Ozarslan
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html
> {code:cpp}
> // The final representation should be an `arrow::Table` which in turn is made up of
> // an `arrow::Schema` and a list of `arrow::Column`. An `arrow::Column` is again a
> // named collection of one or more `arrow::Array` instances. As the first step, we
> // will iterate over the data and build up the arrays incrementally.
> {code}
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6377) [C++] Extending STL API to support row-wise conversion
[ https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917936#comment-16917936 ] Omer Ozarslan commented on ARROW-6377:
--
On a side note, this _might_ have a better performance due to use of compile time knowledge, but it eventually comes down to benchmark.
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (ARROW-6377) [C++] Extending STL API to support row-wise conversion
[ https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917936#comment-16917936 ] Omer Ozarslan edited comment on ARROW-6377 at 8/28/19 5:08 PM:
---
On a side note, this _might_ have a better performance due to use of compile time knowledge, but it eventually comes down to benchmarking.
was (Author: ozars): On a side note, this _might_ have a better performance due to use of compile time knowledge, but it eventually comes down to benchmark.
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6377) [C++] Extending STL API to support row-wise conversion
Omer Ozarslan created ARROW-6377:
Summary: [C++] Extending STL API to support row-wise conversion
Key: ARROW-6377
URL: https://issues.apache.org/jira/browse/ARROW-6377
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Omer Ozarslan

Using array builders is currently the recommended way in the documentation for converting row-wise data to Arrow tables. However, array builders have a low-level interface to support various use cases in the library. They require additional boilerplate due to type erasure, although some of this boilerplate could be avoided at compile time if the schema is already known and fixed (also discussed in ARROW-4067).

In another part of the library, the STL API provides a nice abstraction over builders by inferring data types and builders from the values provided, reducing the boilerplate significantly. It currently handles automatically converting tuples with a limited set of native types: numeric types, string and vector (plus nullable variations of these once ARROW-6326 is merged). It also allows passing references in tuple values (implemented recently in ARROW-6284).

As a more concrete example, this is the code that can be used to convert {{row_data}} provided in the examples:

{code:cpp}
arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
                                       std::shared_ptr<arrow::Table>* table) {
  auto rng = rows | ranges::views::transform([](const data_row& row) {
    return std::tuple<int64_t, double, const std::vector<double>&>(
        row.id, row.cost, row.cost_components);
  });
  return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
                                         {"id", "cost", "cost_components"},
                                         table);
}
{code}

So, it allows more concise code for consumers of the API compared to using builders directly. There is no direct support by the library for other types (binary, struct, union etc., or converting iterable objects other than vectors to lists). Users are provided a way to specialize it for their own data structures.
One limitation of implicit inference is that it is hard (or even impossible) to infer the exact type to use in some cases. For example, should a {{std::string_view}} value be inferred as string, binary, large binary or list? This ambiguity can be avoided by providing some way for the user to explicitly state the correct type for storing a column. For example, a user can return a so-called {{BinaryCell}} class to return binary values. Proposed changes:
* Implementing cell "adapters": cells are non-owning references for each type. It is the user's responsibility to keep the pointed-to values alive. (Can scalars be used in this context?)
** BinaryCell
** StringCell
** ListCell (for adapting any Range)
** StructCell
** ...
* Primitive types don't need such adapters since their values are trivial to cast (e.g. just use {{int8_t(value)}} to use {{Int8Type}}).
* Adding benchmarks comparing against builder performance. There is likely to be some performance penalty due to hindering compiler optimizations. Yet, this is acceptable in exchange for more concise code IMHO. For fine-grained control over performance, it will still be possible to use builders directly.
I have implemented something similar to BinaryCell for my use case. If the above changes sound reasonable, I will go ahead and implement the other cells to submit. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
[ https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917884#comment-16917884 ] Omer Ozarslan commented on ARROW-6375: -- [~pitrou] Sure, I will. I'm also opening another issue about extending the STL API for row-wise conversion in general. > [C++] Extend ConversionTraits to allow efficiently appending list values in > STL API > --- > > Key: ARROW-6375 > URL: https://issues.apache.org/jira/browse/ARROW-6375 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Omer Ozarslan >Priority: Major > > I was trying to benchmark the performance of using array builders vs. the STL API > for converting some row data to Arrow tables. I realized it is around 1.5-1.8 > times slower to convert {{std::vector}} values with the STL API than doing so > with the builder API. It appears this is primarily due to appending rows via the > {{...::Append}} method by iterating over > {{ConversionTraits<T>::AppendRow}} for each value. > Calling {{...::AppendValues}} would make it more efficient; however, > {{ConversionTraits}} doesn't offer a way to append more than one cell > ({{AppendRow}} takes a builder and a single cell as its parameters). > Would it be possible to extend the conversion traits with an optional method > {{AppendRows(Builder, Cell*, size_t)}}, which allows a template specialization > to efficiently append multiple cells at once? In the example above this > function would be called with {{std::vector::data()}} and > {{std::vector::size()}} if provided. If such a method isn't provided by the > specialization, the current behavior (i.e. iterating over {{AppendRow}}) can be > used as the default. > [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100] > is the particular part of the code that will be replaced in practice. Instead of > directly calling AppendRow in a for loop, a public helper function (e.g.
> {{stl::AppendRows}}) can be provided, which implements the above logic. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
Omer Ozarslan created ARROW-6375: Summary: [C++] Extend ConversionTraits to allow efficiently appending list values in STL API Key: ARROW-6375 URL: https://issues.apache.org/jira/browse/ARROW-6375 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Omer Ozarslan I was trying to benchmark the performance of using array builders vs. the STL API for converting some row data to Arrow tables. I realized it is around 1.5-1.8 times slower to convert {{std::vector}} values with the STL API than with the builder API. It appears this is primarily due to appending rows via the {{...::Append}} method by iterating over {{ConversionTraits<T>::AppendRow}} for each value. Calling {{...::AppendValues}} would make it more efficient; however, {{ConversionTraits}} doesn't offer a way to append more than one cell ({{AppendRow}} takes a builder and a single cell as its parameters). Would it be possible to extend the conversion traits with an optional method {{AppendRows(Builder, Cell*, size_t)}} which allows a template specialization to efficiently append multiple values at once? In the example above this function would be called with {{std::vector::data()}} and {{std::vector::size()}} if provided. If such a method isn't provided by the specialization, the current behavior (i.e. iterating over {{AppendRow}}) can be used as the default. [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100] is the particular part of the code that will be replaced in practice. Instead of directly calling AppendRow in a for loop, a public helper function (e.g. {{stl::AppendRows}}) can be provided, which implements the above logic. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6371) [Doc] Row to columnar conversion example mentions arrow::Column in comments
Omer Ozarslan created ARROW-6371: Summary: [Doc] Row to columnar conversion example mentions arrow::Column in comments Key: ARROW-6371 URL: https://issues.apache.org/jira/browse/ARROW-6371 Project: Apache Arrow Issue Type: Bug Components: Documentation Reporter: Omer Ozarslan https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html
{code:cpp}
// The final representation should be an `arrow::Table` which in turn is made up of
// an `arrow::Schema` and a list of `arrow::Column`. An `arrow::Column` is again a
// named collection of one or more `arrow::Array` instances. As the first step, we
// will iterate over the data and build up the arrays incrementally.
{code}
-- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6326) [C++] Nullable fields when converting std::tuple to Table
Omer Ozarslan created ARROW-6326: Summary: [C++] Nullable fields when converting std::tuple to Table Key: ARROW-6326 URL: https://issues.apache.org/jira/browse/ARROW-6326 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Omer Ozarslan {{std::optional}} isn't used for representing nullable fields in Arrow's current STL conversion API since it requires C++17. There are also other ways to represent an optional field besides {{std::optional}}, such as using pointers or external implementations of optional ({{boost::optional}}, {{type_safe::optional}} and the like). Since it is hard to maintain so many different kinds of specializations, introducing an {{Optional}} concept covering these classes could solve this issue and allow implementing nullable fields consistently. So, the gist of the proposed change will be something along the lines of:
{code:cpp}
template <typename T>
constexpr bool is_optional_like_v = ...;

template <typename T>
struct CTypeTraits<T, std::enable_if_t<is_optional_like_v<T>>> {
  //...
};

template <typename T>
struct ConversionTraits<T, std::enable_if_t<is_optional_like_v<T>>>
    : public CTypeTraits<...> {
  //...
};
{code}
For a type {{T}} to be considered an {{Optional}}: 1) It should be convertible (implicitly or explicitly) to {{bool}}, i.e. it implements {{[explicit] operator bool()}}, 2) It should be dereferenceable, i.e. it implements {{operator*()}}. These two requirements provide a generalized way of templating nullable fields based on pointers, {{std::optional}}, {{boost::optional}}, etc. However, it would be better (necessary?) if this implementation acted as a default while not breaking users' existing specializations (e.g. an existing implementation in which {{std::optional}} is specialized by the user). Are there any issues this approach may cause that I may have missed? I will open a draft PR to work on this meanwhile. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6284) [C++] Allow references in std::tuple when converting tuple to arrow array
Omer Ozarslan created ARROW-6284: Summary: [C++] Allow references in std::tuple when converting tuple to arrow array Key: ARROW-6284 URL: https://issues.apache.org/jira/browse/ARROW-6284 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Omer Ozarslan This allows using std::tuple of references (e.g. via std::tie) to convert user data types. More details will be provided in the PR. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed
[ https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omer Ozarslan closed ARROW-6195. Resolution: Invalid > [C++] CMake fails with file not found error while bundling thrift if python > is not installed > > > Key: ARROW-6195 > URL: https://issues.apache.org/jira/browse/ARROW-6195 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Omer Ozarslan >Priority: Minor > > I had this error message while I was trying to reproduce another issue in > docker. > To reproduce: > ``` > FROM debian:buster > RUN apt-get update > RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential > cmake > > WORKDIR /app > RUN git clone https://github.com/apache/arrow.git > WORKDIR /app/arrow/cpp/build > RUN git checkout 167cea0 # HEAD as of 10-Aug-19 > RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. > RUN cmake --build . --target thrift_ep -j 8 > ``` > Relevant part of output: > ``` > Scanning dependencies of target thrift_ep > [ 66%] Creating directories for 'thrift_ep' > [ 66%] Performing download step (verify and extract) for 'thrift_ep' > CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message): > File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz > make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: > thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 > make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2 > make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2 > make: *** [Makefile:487: thrift_ep] Error 2 > ``` > Installing python fixes the problem, but this isn't directly clear from the > error message. The source of issue is that execute_process in > get_apache_mirrors macro silently fails and returns empty APACHE_MIRROR value > since PYTHON_EXECUTABLE was empty. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed
[ https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904476#comment-16904476 ] Omer Ozarslan commented on ARROW-6195: -- Never mind. This is already documented in https://arrow.apache.org/docs/python/development.html. > [C++] CMake fails with file not found error while bundling thrift if python > is not installed > > > Key: ARROW-6195 > URL: https://issues.apache.org/jira/browse/ARROW-6195 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Omer Ozarslan >Priority: Minor > > I had this error message while I was trying to reproduce another issue in > docker. > To reproduce: > ``` > FROM debian:buster > RUN apt-get update > RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential > cmake > > WORKDIR /app > RUN git clone https://github.com/apache/arrow.git > WORKDIR /app/arrow/cpp/build > RUN git checkout 167cea0 # HEAD as of 10-Aug-19 > RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. > RUN cmake --build . --target thrift_ep -j 8 > ``` > Relevant part of output: > ``` > Scanning dependencies of target thrift_ep > [ 66%] Creating directories for 'thrift_ep' > [ 66%] Performing download step (verify and extract) for 'thrift_ep' > CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message): > File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz > make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: > thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 > make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2 > make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2 > make: *** [Makefile:487: thrift_ep] Error 2 > ``` > Installing python fixes the problem, but this isn't directly clear from the > error message. The source of issue is that execute_process in > get_apache_mirrors macro silently fails and returns empty APACHE_MIRROR value > since PYTHON_EXECUTABLE was empty. 
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed
Omer Ozarslan created ARROW-6195:
---------------------------------
             Summary: [C++] CMake fails with file not found error while bundling thrift if python is not installed
                 Key: ARROW-6195
                 URL: https://issues.apache.org/jira/browse/ARROW-6195
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Omer Ozarslan

I got this error while trying to reproduce another issue in Docker.

To reproduce:
```
FROM debian:buster
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential cmake

WORKDIR /app
RUN git clone https://github.com/apache/arrow.git
RUN git checkout 167cea0 # HEAD as of 10-Aug-19
WORKDIR /app/arrow/cpp/build
RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED ..
RUN cmake --build . --target thrift_ep -j 8
```
Relevant part of the output:
```
Scanning dependencies of target thrift_ep
[ 66%] Creating directories for 'thrift_ep'
[ 66%] Performing download step (verify and extract) for 'thrift_ep'
CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
  File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1
make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
make: *** [Makefile:487: thrift_ep] Error 2
```
Installing Python fixes the problem, but that is far from clear from the error message. The root cause is that the execute_process call in the get_apache_mirrors macro silently fails and returns an empty APACHE_MIRROR value because PYTHON_EXECUTABLE is empty.
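The failure mode lends itself to a small illustration. The C++ sketch below is not Arrow's actual code (the real logic lives in CMake, and RunAndCapture/BuildThriftUrl are made-up names); it only mimics what the unchecked execute_process does: take a subprocess's stdout as the mirror URL, discard the exit status, and thereby turn a missing interpreter into the hostless URL seen in the error above.

```cpp
#include <cstdio>
#include <string>

// Run a shell command and return its stdout, with trailing newlines stripped.
// The exit status is discarded, mirroring the unchecked execute_process call.
std::string RunAndCapture(const std::string& cmd) {
  std::string out;
  FILE* pipe = popen(cmd.c_str(), "r");
  if (pipe == nullptr) return out;
  char buf[256];
  while (std::fgets(buf, sizeof(buf), pipe) != nullptr) out += buf;
  pclose(pipe);  // failure is silently ignored here
  while (!out.empty() && out.back() == '\n') out.pop_back();
  return out;
}

std::string BuildThriftUrl(const std::string& python_executable) {
  // With an empty python_executable the command fails, stdout is empty,
  // and the "mirror" prefix degenerates to "".
  std::string mirror = RunAndCapture(
      python_executable + " -c \"print('https://mirror.example.org')\" 2>/dev/null");
  return mirror + "/thrift/0.12.0/thrift-0.12.0.tar.gz";
}
```

With an empty executable this produces exactly the hostless path from the report, "/thrift/0.12.0/thrift-0.12.0.tar.gz"; checking the subprocess result (e.g. RESULT_VARIABLE in CMake) would surface the real problem instead.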
[jira] [Commented] (ARROW-6190) [C++] Define and declare functions regardless of NDEBUG
[ https://issues.apache.org/jira/browse/ARROW-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904070#comment-16904070 ]

Omer Ozarslan commented on ARROW-6190:
--------------------------------------
Submitted PR: https://github.com/apache/arrow/pull/5049

> [C++] Define and declare functions regardless of NDEBUG
> -------------------------------------------------------
>
>                 Key: ARROW-6190
>                 URL: https://issues.apache.org/jira/browse/ARROW-6190
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Omer Ozarslan
>            Priority: Minor
>
> NDEBUG is not shipped in the linker flags, so I got a linker error in a release build on a FixedSizeBinaryBuilder::UnsafeAppend(util::string_view value) call, since it makes a call to CheckValueSize.
> This is somewhat of a follow-up to ARROW-2313: I took the same path by removing the NDEBUG ifdefs around the CheckValueSize definition and declaration.
> I applied the same fix to CheckUTF8Initialized as well, after grepping the source code for "#ifndef NDEBUG" and finding that it had the same issue.
[jira] [Updated] (ARROW-6190) [C++] Define and declare functions regardless of NDEBUG
[ https://issues.apache.org/jira/browse/ARROW-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omer Ozarslan updated ARROW-6190:
---------------------------------
    Summary: [C++] Define and declare functions regardless of NDEBUG  (was: Define and declare functions regardless of NDEBUG)

> [C++] Define and declare functions regardless of NDEBUG
> -------------------------------------------------------
>
>                 Key: ARROW-6190
>                 URL: https://issues.apache.org/jira/browse/ARROW-6190
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Omer Ozarslan
>            Priority: Minor
>
[jira] [Created] (ARROW-6190) Define and declare functions regardless of NDEBUG
Omer Ozarslan created ARROW-6190:
---------------------------------
             Summary: Define and declare functions regardless of NDEBUG
                 Key: ARROW-6190
                 URL: https://issues.apache.org/jira/browse/ARROW-6190
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Omer Ozarslan

NDEBUG is not shipped in the linker flags, so I got a linker error in a release build on a FixedSizeBinaryBuilder::UnsafeAppend(util::string_view value) call, since it makes a call to CheckValueSize.

This is somewhat of a follow-up to ARROW-2313: I took the same path by removing the NDEBUG ifdefs around the CheckValueSize definition and declaration.

I applied the same fix to CheckUTF8Initialized as well, after grepping the source code for "#ifndef NDEBUG" and finding that it had the same issue.
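The pattern behind the fix can be sketched as follows. This is an illustration, not Arrow's actual code: only CheckValueSize and FixedSizeBinaryBuilder::UnsafeAppend are names from the report, and the signatures here are invented. The helper is declared and defined unconditionally, so its symbol always exists; only the call site, if anything, depends on NDEBUG.

```cpp
#include <cstdint>
#include <stdexcept>

// Declared and defined regardless of NDEBUG, so every translation unit can
// link against it no matter how NDEBUG was set when the library was built.
void CheckValueSize(std::int64_t actual, std::int64_t expected) {
  if (actual != expected) throw std::invalid_argument("value size mismatch");
}

// An inline function like FixedSizeBinaryBuilder::UnsafeAppend lives in a
// header and gets compiled into *user* code. If the declaration above were
// wrapped in "#ifndef NDEBUG", a release-built library combined with a
// debug-built user (or vice versa) would emit a call to a symbol that was
// never defined -- the linker error described in this issue.
inline void UnsafeAppendChecked(std::int64_t value_size, std::int64_t byte_width) {
#ifndef NDEBUG
  CheckValueSize(value_size, byte_width);
#endif
  // ... append the value ...
}
```

Guarding only the call keeps the release-mode fast path while leaving the symbol available to any NDEBUG configuration.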