[jira] [Commented] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-10-09 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947781#comment-16947781
 ] 

Omer Ozarslan commented on ARROW-6377:
--

||Arrow Type||C++ Type||
|NA
 BOOL
 UINT8
 INT8
 UINT16
 INT16
 UINT32
 INT32
 UINT64
 INT64
 HALF_FLOAT
 FLOAT
 DOUBLE
 STRING
 BINARY
 FIXED_SIZE_BINARY
 DATE32
 DATE64
 TIMESTAMP
 TIME32
 TIME64
 INTERVAL
 DECIMAL
 LIST
 STRUCT
 UNION
 DICTIONARY
 MAP
 EXTENSION
 FIXED_SIZE_LIST
 DURATION
 LARGE_STRING
 LARGE_BINARY
 LARGE_LIST| |
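The (since-deleted) table above lost its C++ Type column in archiving, but the underlying question is how C++ value types map to Arrow types at compile time. The idea can be sketched as a standalone traits table; the names below are illustrative assumptions, not Arrow's actual API:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical compile-time mapping from C++ value types to Arrow type names.
// Unsupported types fail to compile because the primary template is undefined.
template <typename T>
struct ArrowTypeName;

template <> struct ArrowTypeName<bool>         { static constexpr const char* value = "BOOL"; };
template <> struct ArrowTypeName<std::int8_t>  { static constexpr const char* value = "INT8"; };
template <> struct ArrowTypeName<std::int64_t> { static constexpr const char* value = "INT64"; };
template <> struct ArrowTypeName<double>       { static constexpr const char* value = "DOUBLE"; };
template <> struct ArrowTypeName<std::string>  { static constexpr const char* value = "STRING"; };

// A vector maps to a LIST of its element type.
template <typename T>
struct ArrowTypeName<std::vector<T>> { static constexpr const char* value = "LIST"; };
```

This is the same shape of dispatch the STL layer relies on: the tuple's template arguments select builders at compile time instead of through type erasure.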

> [C++] Extending STL API to support row-wise conversion
> --
>
> Key: ARROW-6377
> URL: https://issues.apache.org/jira/browse/ARROW-6377
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
> Fix For: 1.0.0
>
>
> Using array builders is currently the recommended way in the documentation for 
> converting row-wise data to Arrow tables. However, array builders have a 
> low-level interface in order to support the library's various use cases. They 
> require additional boilerplate due to type erasure, although some of this 
> boilerplate could be avoided at compile time if the schema is known and fixed 
> in advance (also discussed in ARROW-4067).
> Elsewhere in the library, the STL API provides a nice abstraction over 
> builders by inferring data types and builders from the values provided, 
> reducing the boilerplate significantly. It currently handles automatic 
> conversion of tuples for a limited set of native types: numeric types, string 
> and vector (plus nullable variations of these, if ARROW-6326 is merged). It 
> also allows passing references in tuple values (implemented recently in 
> ARROW-6284).
> As a more concrete example, this is the code that can be used to convert the 
> {{row_data}} provided in the examples:
>   
> {code:cpp}
> arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
>                                        std::shared_ptr<arrow::Table>* table) {
>   auto rng = rows | ranges::views::transform([](const data_row& row) {
>     return std::tuple<int64_t, double, const std::vector<double>&>(
>         row.id, row.cost, row.cost_components);
>   });
>   return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
>                                          {"id", "cost", "cost_components"},
>                                          table);
> }
> {code}
> So it allows more concise code for consumers of the API compared to using 
> builders directly.
> The library has no direct support for other types (binary, struct, union, 
> etc., or converting iterable objects other than vectors to lists), though 
> users are given a way to write specializations for their own data structures. 
> One limitation of implicit inference is that it is hard (or even impossible) 
> to infer the exact type to use in some cases. For example, should a 
> {{std::string_view}} value be inferred as a string, binary, large binary, or 
> list? This ambiguity can be avoided by giving the user a way to explicitly 
> state the correct type for storing a column. For example, a user could return 
> a so-called {{BinaryCell}} class to emit binary values.
> Proposed changes:
>  * Implementing cell "adapters": cells are non-owning references for each 
> type. It is the user's responsibility to keep the pointed-to values alive. 
> (Can scalars be used in this context?)
>  ** BinaryCell
>  ** StringCell
>  ** ListCell (for adapting any Range)
>  ** StructCell
>  ** ...
>  * Primitive types don't need such adapters since their values are trivial to 
> cast (e.g. just use int8_t(value) to use Int8Type).
>  * Adding benchmarks to compare against builder performance. There is likely 
> to be some performance penalty due to hindered compiler optimizations. Yet 
> this is acceptable in exchange for more concise code, IMHO. For fine-grained 
> control over performance, it will still be possible to use builders directly.
> I have implemented something similar to BinaryCell for my use case. If the 
> above changes sound reasonable, I will go ahead and start implementing the 
> other cells and submit them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-10-09 Thread Omer Ozarslan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6377:
-
Comment: was deleted

(was: ||Arrow Type||C++ Type||
|NA
 BOOL
 UINT8
 INT8
 UINT16
 INT16
 UINT32
 INT32
 UINT64
 INT64
 HALF_FLOAT
 FLOAT
 DOUBLE
 STRING
 BINARY
 FIXED_SIZE_BINARY
 DATE32
 DATE64
 TIMESTAMP
 TIME32
 TIME64
 INTERVAL
 DECIMAL
 LIST
 STRUCT
 UNION
 DICTIONARY
 MAP
 EXTENSION
 FIXED_SIZE_LIST
 DURATION
 LARGE_STRING
 LARGE_BINARY
 LARGE_LIST| |)

> [C++] Extending STL API to support row-wise conversion
> --
>
> Key: ARROW-6377
> URL: https://issues.apache.org/jira/browse/ARROW-6377
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
> Fix For: 1.0.0
>
>
> Using array builders is currently the recommended way in the documentation for 
> converting row-wise data to Arrow tables. However, array builders have a 
> low-level interface in order to support the library's various use cases. They 
> require additional boilerplate due to type erasure, although some of this 
> boilerplate could be avoided at compile time if the schema is known and fixed 
> in advance (also discussed in ARROW-4067).
> Elsewhere in the library, the STL API provides a nice abstraction over 
> builders by inferring data types and builders from the values provided, 
> reducing the boilerplate significantly. It currently handles automatic 
> conversion of tuples for a limited set of native types: numeric types, string 
> and vector (plus nullable variations of these, if ARROW-6326 is merged). It 
> also allows passing references in tuple values (implemented recently in 
> ARROW-6284).
> As a more concrete example, this is the code that can be used to convert the 
> {{row_data}} provided in the examples:
>   
> {code:cpp}
> arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
>                                        std::shared_ptr<arrow::Table>* table) {
>   auto rng = rows | ranges::views::transform([](const data_row& row) {
>     return std::tuple<int64_t, double, const std::vector<double>&>(
>         row.id, row.cost, row.cost_components);
>   });
>   return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
>                                          {"id", "cost", "cost_components"},
>                                          table);
> }
> {code}
> So it allows more concise code for consumers of the API compared to using 
> builders directly.
> The library has no direct support for other types (binary, struct, union, 
> etc., or converting iterable objects other than vectors to lists), though 
> users are given a way to write specializations for their own data structures. 
> One limitation of implicit inference is that it is hard (or even impossible) 
> to infer the exact type to use in some cases. For example, should a 
> {{std::string_view}} value be inferred as a string, binary, large binary, or 
> list? This ambiguity can be avoided by giving the user a way to explicitly 
> state the correct type for storing a column. For example, a user could return 
> a so-called {{BinaryCell}} class to emit binary values.
> Proposed changes:
>  * Implementing cell "adapters": cells are non-owning references for each 
> type. It is the user's responsibility to keep the pointed-to values alive. 
> (Can scalars be used in this context?)
>  ** BinaryCell
>  ** StringCell
>  ** ListCell (for adapting any Range)
>  ** StructCell
>  ** ...
>  * Primitive types don't need such adapters since their values are trivial to 
> cast (e.g. just use int8_t(value) to use Int8Type).
>  * Adding benchmarks to compare against builder performance. There is likely 
> to be some performance penalty due to hindered compiler optimizations. Yet 
> this is acceptable in exchange for more concise code, IMHO. For fine-grained 
> control over performance, it will still be possible to use builders directly.
> I have implemented something similar to BinaryCell for my use case. If the 
> above changes sound reasonable, I will go ahead and start implementing the 
> other cells and submit them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6405) [Python] Add std::move wrapper for use in Cython

2019-10-01 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942124#comment-16942124
 ] 

Omer Ozarslan commented on ARROW-6405:
--

Ah, thanks. Sorry for the delay; I was occupied with some other stuff. I also 
thought I could hold this off to yield CI time to more critical bugs before the 
release.

It sounds okay to me to close this issue with that PR merged. I'm not sure it's 
necessary to create a separate PR just for this.

There seem to be a few other places where move is used. You may want to replace 
those in the PR as well:
{code:java}
 ~/src/ext/arrow/python/pyarrow   master  grep "move" -rnw . | grep -E 
".(pyx|pxi|pxd)"
./_flight.pyx:946:   move(handler)))
./_flight.pyx:1358:new CPyFlightDataStream(result, 
move(data_stream)))
./_flight.pyx:1485:new CPyFlightDataStream(result, move(data_stream)))
./includes/libarrow_flight.pxd:378:unique_ptr[CFlightDataStream] 
move(unique_ptr[CFlightDataStream]) nogil
./includes/libarrow_flight.pxd:379:unique_ptr[CServerAuthHandler] 
move(unique_ptr[CServerAuthHandler]) nogil
./includes/libarrow_flight.pxd:380:unique_ptr[CClientAuthHandler] 
move(unique_ptr[CClientAuthHandler]) nogil
./_fs.pyx:249:def move(self, src, dest):
{code}
cymove doesn't enforce nogil, since it made more sense to me to leave the 
decision about the GIL to the caller of move (it's just a cast, after all).
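The remark that move "is just casting after all" refers to the C++ semantics: {{std::move}} does no work at runtime; it merely casts its argument to an rvalue reference so that a move constructor can be selected. A minimal standalone illustration:

```cpp
#include <string>
#include <utility>

// std::move(src) copies nothing by itself; it is a cast to std::string&&
// that lets the move constructor of `dst` take over src's buffer.
std::string take_ownership(std::string src) {
  std::string dst = std::move(src);  // src is left valid but unspecified
  return dst;
}
```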

> [Python] Add std::move wrapper for use in Cython
> 
>
> Key: ARROW-6405
> URL: https://issues.apache.org/jira/browse/ARROW-6405
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> [~bkietz] pointed this out to me:
> https://github.com/ozars/cymove
> This is small enough that we should simply copy this code into our codebase 
> (MIT-licensed) and fix the couple of places where we have 
> {{std::move}}-related workarounds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6552) [C++] boost::optional in STL test fails compiling in gcc 4.8.2

2019-09-12 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928779#comment-16928779
 ] 

Omer Ozarslan commented on ARROW-6552:
--

Is there a simple way (such as a Dockerfile) to test changes against gcc 4.8.2?

> [C++] boost::optional in STL test fails compiling in gcc 4.8.2
> --
>
> Key: ARROW-6552
> URL: https://issues.apache.org/jira/browse/ARROW-6552
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> Quoting [~bkietz] from mailgroup:
> {code:java}
> a tuple constructor is choking on implicit conversion from
> string literal (char[6]) to boost::optional{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6552) [C++] boost::optional in STL test fails compiling in gcc 4.8.2

2019-09-12 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6552:


 Summary: [C++] boost::optional in STL test fails compiling in gcc 
4.8.2
 Key: ARROW-6552
 URL: https://issues.apache.org/jira/browse/ARROW-6552
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Omer Ozarslan


Quoting [~bkietz] from mailgroup:
{code:java}
a tuple constructor is choking on implicit conversion from
string literal (char[6]) to boost::optional{code}
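The failure is an implicit conversion from a string literal to an optional inside a tuple's converting constructor, which old gcc releases mishandle. A usual workaround is to construct the optional explicitly. Sketched here with {{std::optional}} (C++17) standing in for boost::optional; the function name is illustrative only:

```cpp
#include <optional>
#include <string>
#include <tuple>

// Constructing the optional explicitly sidesteps the implicit
// char[N] -> optional<string> conversion that gcc 4.8-era compilers
// choke on inside tuple's converting constructor.
std::tuple<int, std::optional<std::string>> make_row(int id, const char* name) {
  return std::tuple<int, std::optional<std::string>>(
      id, std::optional<std::string>(name));
}
```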



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6405) [Python] Add std::move wrapper for use in Cython

2019-09-01 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920544#comment-16920544
 ] 

Omer Ozarslan commented on ARROW-6405:
--

I'd be happy to work on this if this isn't assigned to anyone. :)

> [Python] Add std::move wrapper for use in Cython
> 
>
> Key: ARROW-6405
> URL: https://issues.apache.org/jira/browse/ARROW-6405
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> [~bkietz] pointed this out to me:
> https://github.com/ozars/cymove
> This is small enough that we should simply copy this code into our codebase 
> (MIT-licensed) and fix the couple of places where we have 
> {{std::move}}-related workarounds.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6387) [Archery] Errors with make

2019-08-29 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918795#comment-16918795
 ] 

Omer Ozarslan commented on ARROW-6387:
--

Okay, submitting a PR soon.

> [Archery] Errors with make
> --
>
> Key: ARROW-6387
> URL: https://issues.apache.org/jira/browse/ARROW-6387
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Omer Ozarslan
>Priority: Minor
>
> {{archery --debug benchmark run}} gives an error on Debian 10 (CMake 3.13.4, 
> GNU make 4.2.1):
> {code:java}
> (.venv)  omer@omer  ~/src/ext/arrow/cpp/build   master ●  archery --debug 
> benchmark run 
>
> DEBUG:archery:Running benchmark WORKSPACE 
>   
>
> DEBUG:archery:Executing `['/usr/bin/cmake', '-GMake', 
> '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DCMAKE_BUILD_TYPE=release', 
> '-DBUILD_WARNING_LEVEL=production', '-DARROW_BUILD_TESTS=ON', 
> '-DARROW_BUILD_BENCHMARKS=ON', '-DARROW_PYTHON=OFF', '-DARROW_PARQUET=OFF', 
> '-DARROW_GANDIVA=OFF', '-DARROW_PLASMA=OFF', '-DARROW_FLIGHT=OFF', 
> '/home/omer/src/ext/arrow/cpp']`
> CMake Error: Could not create named generator Make
>   
>   
> 
> Generators
>
>   Unix Makefiles   = Generates standard UNIX makefiles.   
>   
>
>   Ninja= Generates build.ninja files. 
>   
>
>   Watcom WMake = Generates Watcom WMake makefiles.
>   
>
>   CodeBlocks - Ninja   = Generates CodeBlocks project files.  
>   
>
>   CodeBlocks - Unix Makefiles  = Generates CodeBlocks project files.  
>   
>
>   CodeLite - Ninja = Generates CodeLite project files.
>
>   CodeLite - Unix Makefiles= Generates CodeLite project files.
>  
>   Sublime Text 2 - Ninja   = Generates Sublime Text 2 project files.  
> 
>   Sublime Text 2 - Unix Makefiles
>= Generates Sublime Text 2 project files.  
> 
>   Kate - Ninja = Generates Kate project files.
>  
>   Kate - Unix Makefiles= Generates Kate project files.
>   Eclipse CDT4 - Ninja = Generates Eclipse CDT 4.0 project files.
>   Eclipse CDT4 - Unix Makefiles= Generates Eclipse CDT 4.0 project files.
> Traceback (most recent call last):
> [[[cropped]]]{code}
> After a trivial fix:
> {code:java}
> diff --git a/dev/archery/archery/utils/cmake.py 
> b/dev/archery/archery/utils/cmake.py
> index 38aedab2d..3150ea9a6 100644
> --- a/dev/archery/archery/utils/cmake.py
> +++ b/dev/archery/archery/utils/cmake.py
> @@ -34,7 +34,7 @@ class CMake(Command):
>  in the search path.
>  """
>  found_ninja = which("ninja")
> -return "Ninja" if found_ninja else "Make"
> +return "Ninja" if found_ninja else "Unix Makefiles"{code}
> I get another error:
> {code:java}
> [[[cropped]]]
> -- Generating done
> -- Build files have been written to: /tmp/arrow-bench-48x_yleb/WORKSPACE/build
> DEBUG:archery:Executing `[None]`
> Traceback (most recent call last):
>   File "/home/omer/src/ext/arrow/.venv/bin/archery", line 11, in 
> load_entry_point('archery', 'console_scripts', 'archery')()
>   File 
> "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
> line 764, in __call__
> return self.main(*args, **kwargs)
>   File 
> 

[jira] [Commented] (ARROW-6387) [Archery] Errors with make

2019-08-29 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918784#comment-16918784
 ] 

Omer Ozarslan commented on ARROW-6387:
--

Thanks. How about calling {{cmake --build}} instead of invoking the build 
program directly?

> [Archery] Errors with make
> --
>
> Key: ARROW-6387
> URL: https://issues.apache.org/jira/browse/ARROW-6387
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Omer Ozarslan
>Priority: Minor
>
> {{archery --debug benchmark run}} gives an error on Debian 10 (CMake 3.13.4, 
> GNU make 4.2.1):
> {code:java}
> (.venv)  omer@omer  ~/src/ext/arrow/cpp/build   master ●  archery --debug 
> benchmark run 
>
> DEBUG:archery:Running benchmark WORKSPACE 
>   
>
> DEBUG:archery:Executing `['/usr/bin/cmake', '-GMake', 
> '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DCMAKE_BUILD_TYPE=release', 
> '-DBUILD_WARNING_LEVEL=production', '-DARROW_BUILD_TESTS=ON', 
> '-DARROW_BUILD_BENCHMARKS=ON', '-DARROW_PYTHON=OFF', '-DARROW_PARQUET=OFF', 
> '-DARROW_GANDIVA=OFF', '-DARROW_PLASMA=OFF', '-DARROW_FLIGHT=OFF', 
> '/home/omer/src/ext/arrow/cpp']`
> CMake Error: Could not create named generator Make
>   
>   
> 
> Generators
>
>   Unix Makefiles   = Generates standard UNIX makefiles.   
>   
>
>   Ninja= Generates build.ninja files. 
>   
>
>   Watcom WMake = Generates Watcom WMake makefiles.
>   
>
>   CodeBlocks - Ninja   = Generates CodeBlocks project files.  
>   
>
>   CodeBlocks - Unix Makefiles  = Generates CodeBlocks project files.  
>   
>
>   CodeLite - Ninja = Generates CodeLite project files.
>
>   CodeLite - Unix Makefiles= Generates CodeLite project files.
>  
>   Sublime Text 2 - Ninja   = Generates Sublime Text 2 project files.  
> 
>   Sublime Text 2 - Unix Makefiles
>= Generates Sublime Text 2 project files.  
> 
>   Kate - Ninja = Generates Kate project files.
>  
>   Kate - Unix Makefiles= Generates Kate project files.
>   Eclipse CDT4 - Ninja = Generates Eclipse CDT 4.0 project files.
>   Eclipse CDT4 - Unix Makefiles= Generates Eclipse CDT 4.0 project files.
> Traceback (most recent call last):
> [[[cropped]]]{code}
> After a trivial fix:
> {code:java}
> diff --git a/dev/archery/archery/utils/cmake.py 
> b/dev/archery/archery/utils/cmake.py
> index 38aedab2d..3150ea9a6 100644
> --- a/dev/archery/archery/utils/cmake.py
> +++ b/dev/archery/archery/utils/cmake.py
> @@ -34,7 +34,7 @@ class CMake(Command):
>  in the search path.
>  """
>  found_ninja = which("ninja")
> -return "Ninja" if found_ninja else "Make"
> +return "Ninja" if found_ninja else "Unix Makefiles"{code}
> I get another error:
> {code:java}
> [[[cropped]]]
> -- Generating done
> -- Build files have been written to: /tmp/arrow-bench-48x_yleb/WORKSPACE/build
> DEBUG:archery:Executing `[None]`
> Traceback (most recent call last):
>   File "/home/omer/src/ext/arrow/.venv/bin/archery", line 11, in 
> load_entry_point('archery', 'console_scripts', 'archery')()
>   File 
> "/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
> line 764, in __call__
> return self.main(*args, **kwargs)
>   

[jira] [Created] (ARROW-6387) [Archery] Errors with make

2019-08-29 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6387:


 Summary: [Archery] Errors with make
 Key: ARROW-6387
 URL: https://issues.apache.org/jira/browse/ARROW-6387
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Omer Ozarslan


{{archery --debug benchmark run}} gives an error on Debian 10 (CMake 3.13.4, GNU 
make 4.2.1):
{code:java}
(.venv)  omer@omer  ~/src/ext/arrow/cpp/build   master ●  archery --debug 
benchmark run   
 
DEBUG:archery:Running benchmark WORKSPACE   

   
DEBUG:archery:Executing `['/usr/bin/cmake', '-GMake', 
'-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DCMAKE_BUILD_TYPE=release', 
'-DBUILD_WARNING_LEVEL=production', '-DARROW_BUILD_TESTS=ON', 
'-DARROW_BUILD_BENCHMARKS=ON', '-DARROW_PYTHON=OFF', '-DARROW_PARQUET=OFF', 
'-DARROW_GANDIVA=OFF', '-DARROW_PLASMA=OFF', '-DARROW_FLIGHT=OFF', 
'/home/omer/src/ext/arrow/cpp']`
CMake Error: Could not create named generator Make  


  
Generators  
 
  Unix Makefiles   = Generates standard UNIX makefiles. 

   
  Ninja= Generates build.ninja files.   

   
  Watcom WMake = Generates Watcom WMake makefiles.  

   
  CodeBlocks - Ninja   = Generates CodeBlocks project files.

   
  CodeBlocks - Unix Makefiles  = Generates CodeBlocks project files.

   
  CodeLite - Ninja = Generates CodeLite project files.  
 
  CodeLite - Unix Makefiles= Generates CodeLite project files.  
   
  Sublime Text 2 - Ninja   = Generates Sublime Text 2 project files.
  
  Sublime Text 2 - Unix Makefiles
   = Generates Sublime Text 2 project files.
  
  Kate - Ninja = Generates Kate project files.  
   
  Kate - Unix Makefiles= Generates Kate project files.
  Eclipse CDT4 - Ninja = Generates Eclipse CDT 4.0 project files.
  Eclipse CDT4 - Unix Makefiles= Generates Eclipse CDT 4.0 project files.
Traceback (most recent call last):
[[[cropped]]]{code}
After a trivial fix:
{code:java}
diff --git a/dev/archery/archery/utils/cmake.py 
b/dev/archery/archery/utils/cmake.py
index 38aedab2d..3150ea9a6 100644
--- a/dev/archery/archery/utils/cmake.py
+++ b/dev/archery/archery/utils/cmake.py
@@ -34,7 +34,7 @@ class CMake(Command):
 in the search path.
 """
 found_ninja = which("ninja")
-return "Ninja" if found_ninja else "Make"
+return "Ninja" if found_ninja else "Unix Makefiles"{code}
I get another error:
{code:java}
[[[cropped]]]
-- Generating done
-- Build files have been written to: /tmp/arrow-bench-48x_yleb/WORKSPACE/build
DEBUG:archery:Executing `[None]`
Traceback (most recent call last):
  File "/home/omer/src/ext/arrow/.venv/bin/archery", line 11, in 
load_entry_point('archery', 'console_scripts', 'archery')()
  File 
"/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
line 764, in __call__
return self.main(*args, **kwargs)
  File 
"/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
line 717, in main
rv = self.invoke(ctx)
  File 
"/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
  File 
"/home/omer/src/ext/arrow/.venv/lib/python3.7/site-packages/click/core.py", 
line 1137, in invoke
return 

[jira] [Commented] (ARROW-6371) [Doc] Row to columnar conversion example mentions arrow::Column in comments

2019-08-28 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917965#comment-16917965
 ] 

Omer Ozarslan commented on ARROW-6371:
--

Thanks. I replied to this thread over email yesterday, but I guess the response 
didn't get through for some reason.

I submitted the PR.

> [Doc] Row to columnar conversion example mentions arrow::Column in comments
> ---
>
> Key: ARROW-6371
> URL: https://issues.apache.org/jira/browse/ARROW-6371
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Omer Ozarslan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html
> {code:cpp}
> // The final representation should be an `arrow::Table` which in turn is made 
> up of
> // an `arrow::Schema` and a list of `arrow::Column`. An `arrow::Column` is 
> again a
> // named collection of one or more `arrow::Array` instances. As the first 
> step, we
> // will iterate over the data and build up the arrays incrementally.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-08-28 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917936#comment-16917936
 ] 

Omer Ozarslan commented on ARROW-6377:
--

On a side note, this _might_ have better performance due to the use of 
compile-time knowledge, but it eventually comes down to benchmarking.
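The potential gain from compile-time knowledge is that the per-value append path is visible to the optimizer rather than hidden behind a type-erased interface. A standalone sketch of the two dispatch styles (illustrative only, not Arrow's builder API):

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Runtime dispatch: every append goes through a type-erased std::function,
// roughly analogous to driving builders through a dynamic interface.
std::int64_t sum_erased(const std::vector<std::int64_t>& values) {
  std::int64_t total = 0;
  std::function<void(std::int64_t)> append = [&total](std::int64_t v) { total += v; };
  for (std::int64_t v : values) append(v);
  return total;
}

// Compile-time dispatch: the whole loop is visible to the optimizer, as it is
// when the column types are fixed by the tuple's template arguments.
std::int64_t sum_static(const std::vector<std::int64_t>& values) {
  return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}
```

Both return the same result; only a benchmark can say how much the erased path actually costs in a given build.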

> [C++] Extending STL API to support row-wise conversion
> --
>
> Key: ARROW-6377
> URL: https://issues.apache.org/jira/browse/ARROW-6377
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> Using array builders is currently the recommended way in the documentation for 
> converting row-wise data to Arrow tables. However, array builders have a 
> low-level interface in order to support the library's various use cases. They 
> require additional boilerplate due to type erasure, although some of this 
> boilerplate could be avoided at compile time if the schema is known and fixed 
> in advance (also discussed in ARROW-4067).
> Elsewhere in the library, the STL API provides a nice abstraction over 
> builders by inferring data types and builders from the values provided, 
> reducing the boilerplate significantly. It currently handles automatic 
> conversion of tuples for a limited set of native types: numeric types, string 
> and vector (plus nullable variations of these, if ARROW-6326 is merged). It 
> also allows passing references in tuple values (implemented recently in 
> ARROW-6284).
> As a more concrete example, this is the code that can be used to convert the 
> {{row_data}} provided in the examples:
>   
> {code:cpp}
> arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
>                                        std::shared_ptr<arrow::Table>* table) {
>   auto rng = rows | ranges::views::transform([](const data_row& row) {
>     return std::tuple<int64_t, double, const std::vector<double>&>(
>         row.id, row.cost, row.cost_components);
>   });
>   return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
>                                          {"id", "cost", "cost_components"},
>                                          table);
> }
> {code}
> So it allows more concise code for consumers of the API compared to using 
> builders directly.
> The library has no direct support for other types (binary, struct, union, 
> etc., or converting iterable objects other than vectors to lists), though 
> users are given a way to write specializations for their own data structures. 
> One limitation of implicit inference is that it is hard (or even impossible) 
> to infer the exact type to use in some cases. For example, should a 
> {{std::string_view}} value be inferred as a string, binary, large binary, or 
> list? This ambiguity can be avoided by giving the user a way to explicitly 
> state the correct type for storing a column. For example, a user could return 
> a so-called {{BinaryCell}} class to emit binary values.
> Proposed changes:
>  * Implementing cell "adapters": cells are non-owning references for each 
> type. It is the user's responsibility to keep the pointed-to values alive. 
> (Can scalars be used in this context?)
>  ** BinaryCell
>  ** StringCell
>  ** ListCell (for adapting any Range)
>  ** StructCell
>  ** ...
>  * Primitive types don't need such adapters since their values are trivial to 
> cast (e.g. just use int8_t(value) to use Int8Type).
>  * Adding benchmarks to compare against builder performance. There is likely 
> to be some performance penalty due to hindered compiler optimizations. Yet 
> this is acceptable in exchange for more concise code, IMHO. For fine-grained 
> control over performance, it will still be possible to use builders directly.
> I have implemented something similar to BinaryCell for my use case. If the 
> above changes sound reasonable, I will go ahead and start implementing the 
> other cells and submit them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-08-28 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917936#comment-16917936
 ] 

Omer Ozarslan edited comment on ARROW-6377 at 8/28/19 5:08 PM:
---

On a side note, this _might_ have better performance due to the use of 
compile-time knowledge, but it eventually comes down to benchmarking.


was (Author: ozars):
On a side note, this _might_ have a better performance due to use of compile 
time knowledge, but it eventually comes down to benchmark.

> [C++] Extending STL API to support row-wise conversion
> --
>
> Key: ARROW-6377
> URL: https://issues.apache.org/jira/browse/ARROW-6377
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> Using array builders is currently the recommended way in the documentation for 
> converting row-wise data to Arrow tables. However, array builders have a 
> low-level interface in order to support the library's various use cases. They 
> require additional boilerplate due to type erasure, although some of this 
> boilerplate could be avoided at compile time if the schema is known and fixed 
> in advance (also discussed in ARROW-4067).
> Elsewhere in the library, the STL API provides a nice abstraction over 
> builders by inferring data types and builders from the values provided, 
> reducing the boilerplate significantly. It currently handles automatic 
> conversion of tuples for a limited set of native types: numeric types, string 
> and vector (plus nullable variations of these, if ARROW-6326 is merged). It 
> also allows passing references in tuple values (implemented recently in 
> ARROW-6284).
> As a more concrete example, this is the code that can be used to convert the 
> {{row_data}} provided in the examples:
>   
> {code:cpp}
> arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
>                                        std::shared_ptr<arrow::Table>* table) {
>   auto rng = rows | ranges::views::transform([](const data_row& row) {
>     return std::tuple<int64_t, double, const std::vector<double>&>(
>         row.id, row.cost, row.cost_components);
>   });
>   return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
>                                          {"id", "cost", "cost_components"},
>                                          table);
> }
> {code}
> So it allows more concise code for consumers of the API compared to using 
> builders directly.
> The library has no direct support for other types (binary, struct, union, 
> etc., or converting iterable objects other than vectors to lists), though 
> users are given a way to write specializations for their own data structures. 
> One limitation of implicit inference is that it is hard (or even impossible) 
> to infer the exact type to use in some cases. For example, should a 
> {{std::string_view}} value be inferred as a string, binary, large binary, or 
> list? This ambiguity can be avoided by giving the user a way to explicitly 
> state the correct type for storing a column. For example, a user could return 
> a so-called {{BinaryCell}} class to emit binary values.
> Proposed changes:
>  * Implementing cell "adapters": cells are non-owning references for each 
> type. It is the user's responsibility to keep the pointed-to values alive. 
> (Can scalars be used in this context?)
>  ** BinaryCell
>  ** StringCell
>  ** ListCell (for adapting any Range)
>  ** StructCell
>  ** ...
>  * Primitive types don't need such adapters since their values are trivial to 
> cast (e.g. just use int8_t(value) to use Int8Type).
>  * Adding benchmarks to compare against builder performance. There is likely 
> to be some performance penalty due to hindered compiler optimizations. Yet 
> this is acceptable in exchange for more concise code, IMHO. For fine-grained 
> control over performance, it will still be possible to use builders directly.
> I have implemented something similar to BinaryCell for my use case. If the 
> above changes sound reasonable, I will go ahead and start implementing the 
> other cells and submit them.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-08-28 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6377:


 Summary: [C++] Extending STL API to support row-wise conversion
 Key: ARROW-6377
 URL: https://issues.apache.org/jira/browse/ARROW-6377
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Omer Ozarslan


Using array builders is currently the recommended way in the documentation for 
converting row-wise data to Arrow tables. However, array builders have a 
low-level interface to support various use cases in the library. They require 
additional boilerplate due to type erasure, although some of this boilerplate 
could be avoided at compile time if the schema is already known and fixed 
(also discussed in ARROW-4067).

In another part of the library, the STL API provides a nice abstraction over 
builders by inferring data types and builders from the values provided, 
reducing the boilerplate significantly. It currently handles automatic 
conversion of tuples with a limited set of native types: numeric types, string 
and vector (+ nullable variations of these in case ARROW-6326 is merged). It 
also allows passing references in tuple values (implemented recently in 
ARROW-6284).

As a more concrete example, this is the code which can be used to convert 
{{row_data}} provided in examples:
  
{code:cpp}
arrow::Status VectorToColumnarTableSTL(const std::vector<data_row>& rows,
                                       std::shared_ptr<arrow::Table>* table) {
  auto rng = rows | ranges::views::transform([](const data_row& row) {
    return std::tuple<int64_t, double, const std::vector<double>&>(
        row.id, row.cost, row.cost_components);
  });
  return arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rng,
                                         {"id", "cost", "cost_components"},
                                         table);
}

{code}
So it allows consumers of the API to write more concise code compared to using 
builders directly.

There is no direct support in the library for other types (binary, struct, 
union, etc., or converting iterable objects other than vectors to lists). 
Users are provided a way to specialize for their own data structures. One 
limitation of implicit inference is that it is hard (or even impossible) to 
infer the exact type to use in some cases. For example, should a 
{{std::string_view}} value be inferred as string, binary, large binary or 
list? This ambiguity can be avoided by providing some way for the user to 
explicitly state the correct type for storing a column. For example, a user 
can return a so-called {{BinaryCell}} class to indicate binary values.

Proposed changes:
 * Implementing cell "adapters": Cells are non-owning references for each 
type. It is the user's responsibility to keep the pointed-to values alive. 
(Can scalars be used in this context?)
 ** BinaryCell
 ** StringCell
 ** ListCell (for adapting any Range)
 ** StructCell
 ** ...
 * Primitive types don't need such adapters since their values are trivial to 
cast (e.g. just use int8_t(value) to use Int8Type).
 * Adding benchmarks to compare with builder performance. There is likely to 
be some performance penalty due to hindering compiler optimizations; yet this 
is acceptable in exchange for more concise code, IMHO. For fine-grained 
control over performance, it will still be possible to use builders directly.

I have implemented something similar to BinaryCell for my use case. If the 
above changes sound reasonable, I will go ahead and start implementing the 
other cells to submit.

 

 

 





[jira] [Commented] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917884#comment-16917884
 ] 

Omer Ozarslan commented on ARROW-6375:
--

[~pitrou] Sure, I will. I'm also opening another issue about extending the STL 
API for row-wise conversion in general.

> [C++] Extend ConversionTraits to allow efficiently appending list values in 
> STL API
> ---
>
> Key: ARROW-6375
> URL: https://issues.apache.org/jira/browse/ARROW-6375
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> I was trying to benchmark performances of using array builders vs. STL API 
> for converting some row data to arrow tables. I realized it is around 1.5-1.8 
> times slower to convert {{std::vector}} values with STL API than doing so 
> with builder API. It appears this is primarily due to appending rows via 
> {{...::Append}} method by iterating over 
> {{ConversionTrait>::AppendRow}} for each value.
> Calling {{...::AppendValues}} would make it more efficient, however, 
> {{ConversionTraits}} doesn't offer a way for appending more than one cell 
> ({{AppendRow}} takes a builder and a single cell as its parameters).
> Would it be possible to extend conversion traits with an optional method 
> {{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
> to efficiently append multiple cells at once? In the example above this 
> function would be called with {{std::vector::data()}} and 
> {{std::vector::size()}} if provided. If such method isn't provided by the 
> specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
> used as default.
> [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
>  is the particular part in code that will be replaced in practice. Instead of 
> directly calling AppendRow in a for loop, a public helper function (e.g. 
> {{stl::AppendRows}}) can be provided, in which it implements above logic.





[jira] [Updated] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6375:
-
Description: 
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than doing so with 
builder API. It appears this is primarily due to appending rows via 
{{...::Append}} method by iterating over 
{{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization to 
efficiently append multiple cells at once? In the example above this function 
would be called with {{std::vector::data()}} and {{std::vector::size()}} if 
provided. If such method isn't provided by the specialization, current behavior 
(i.e. iterating over {{AppendRow}}) can be used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.

  was:
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than doing so with 
builder API. It appears this is primarily due to appending rows via 
{{...::Append}} method by iterating over 
{{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t),}} which allows template specialization to 
efficiently append multiple values at once? In the example above this function 
would be called with {{std::vector::data()}} and {{std::vector::size()}} if 
provided. If such method isn't provided by the specialization, current behavior 
(i.e. iterating over {{AppendRow}}) can be used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.


> [C++] Extend ConversionTraits to allow efficiently appending list values in 
> STL API
> ---
>
> Key: ARROW-6375
> URL: https://issues.apache.org/jira/browse/ARROW-6375
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> I was trying to benchmark performances of using array builders vs. STL API 
> for converting some row data to arrow tables. I realized it is around 1.5-1.8 
> times slower to convert {{std::vector}} values with STL API than doing so 
> with builder API. It appears this is primarily due to appending rows via 
> {{...::Append}} method by iterating over 
> {{ConversionTrait>::AppendRow}} for each value.
> Calling {{...::AppendValues}} would make it more efficient, however, 
> {{ConversionTraits}} doesn't offer a way for appending more than one cell 
> ({{AppendRow}} takes a builder and a single cell as its parameters).
> Would it be possible to extend conversion traits with an optional method 
> {{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
> to efficiently append multiple cells at once? In the example above this 
> function would be called with {{std::vector::data()}} and 
> {{std::vector::size()}} if provided. If such method isn't provided by the 
> specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
> used as default.
> [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
>  is the particular part in code that will be replaced in practice. Instead of 
> directly calling AppendRow in a for loop, a public helper function (e.g. 
> {{stl::AppendRows}}) can be provided, in which it implements above logic.





[jira] [Updated] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6375:
-
Description: 
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than doing so with 
builder API. It appears this is primarily due to appending rows via 
{{...::Append}} method by iterating over 
{{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization to 
efficiently append multiple values at once? In the example above this function 
would be called with {{std::vector::data()}} and {{std::vector::size()}} if 
provided. If such method isn't provided by the specialization, current behavior 
(i.e. iterating over {{AppendRow}}) can be used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.

  was:
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than doing so with 
builder API. It appears this is primarily due to appending rows via 
{{...::Append}} method by iterating over 
{{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
to efficiently append multiple values at once? In the example above this 
function would be called with {{std::vector::data()}} and 
{{std::vector::size()}} if provided. If such method isn't provided by the 
specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.


> [C++] Extend ConversionTraits to allow efficiently appending list values in 
> STL API
> ---
>
> Key: ARROW-6375
> URL: https://issues.apache.org/jira/browse/ARROW-6375
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> I was trying to benchmark performances of using array builders vs. STL API 
> for converting some row data to arrow tables. I realized it is around 1.5-1.8 
> times slower to convert {{std::vector}} values with STL API than doing so 
> with builder API. It appears this is primarily due to appending rows via 
> {{...::Append}} method by iterating over 
> {{ConversionTrait>::AppendRow}} for each value.
> Calling {{...::AppendValues}} would make it more efficient, however, 
> {{ConversionTraits}} doesn't offer a way for appending more than one cell 
> ({{AppendRow}} takes a builder and a single cell as its parameters).
> Would it be possible to extend conversion traits with an optional method 
> {{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
> to efficiently append multiple values at once? In the example above this 
> function would be called with {{std::vector::data()}} and 
> {{std::vector::size()}} if provided. If such method isn't provided by the 
> specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
> used as default.
> [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
>  is the particular part in code that will be replaced in practice. Instead of 
> directly calling AppendRow in a for loop, a public helper function (e.g. 
> {{stl::AppendRows}}) can be provided, in which it implements above logic.





[jira] [Updated] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6375:
-
Description: 
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than doing so with 
builder API. It appears this is primarily due to appending rows via 
{{...::Append}} method by iterating over 
{{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
to efficiently append multiple values at once? In the example above this 
function would be called with {{std::vector::data()}} and 
{{std::vector::size()}} if provided. If such method isn't provided by the 
specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.

  was:
I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than with builder API. It 
appears this is primarily due to appending rows via {{...::Append}} method by 
iterating over {{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
to efficiently append multiple values at once? In the example above this 
function would be called with {{std::vector::data()}} and 
{{std::vector::size()}} if provided. If such method isn't provided by the 
specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.


> [C++] Extend ConversionTraits to allow efficiently appending list values in 
> STL API
> ---
>
> Key: ARROW-6375
> URL: https://issues.apache.org/jira/browse/ARROW-6375
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Major
>
> I was trying to benchmark performances of using array builders vs. STL API 
> for converting some row data to arrow tables. I realized it is around 1.5-1.8 
> times slower to convert {{std::vector}} values with STL API than doing so 
> with builder API. It appears this is primarily due to appending rows via 
> {{...::Append}} method by iterating over 
> {{ConversionTrait>::AppendRow}} for each value.
> Calling {{...::AppendValues}} would make it more efficient, however, 
> {{ConversionTraits}} doesn't offer a way for appending more than one cell 
> ({{AppendRow}} takes a builder and a single cell as its parameters).
> Would it be possible to extend conversion traits with an optional method 
> {{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
> to efficiently append multiple values at once? In the example above this 
> function would be called with {{std::vector::data()}} and 
> {{std::vector::size()}} if provided. If such method isn't provided by the 
> specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
> used as default.
> [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
>  is the particular part in code that will be replaced in practice. Instead of 
> directly calling AppendRow in a for loop, a public helper function (e.g. 
> {{stl::AppendRows}}) can be provided, in which it implements above logic.





[jira] [Created] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6375:


 Summary: [C++] Extend ConversionTraits to allow efficiently 
appending list values in STL API
 Key: ARROW-6375
 URL: https://issues.apache.org/jira/browse/ARROW-6375
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Omer Ozarslan


I was trying to benchmark performances of using array builders vs. STL API for 
converting some row data to arrow tables. I realized it is around 1.5-1.8 times 
slower to convert {{std::vector}} values with STL API than with builder API. It 
appears this is primarily due to appending rows via {{...::Append}} method by 
iterating over {{ConversionTrait>::AppendRow}} for each value.

Calling {{...::AppendValues}} would make it more efficient, however, 
{{ConversionTraits}} doesn't offer a way for appending more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}}, which allows template specialization 
to efficiently append multiple values at once? In the example above this 
function would be called with {{std::vector::data()}} and 
{{std::vector::size()}} if provided. If such method isn't provided by the 
specialization, current behavior (i.e. iterating over {{AppendRow}}) can be 
used as default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part in code that will be replaced in practice. Instead of 
directly calling AppendRow in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) can be provided, in which it implements above logic.





[jira] [Created] (ARROW-6371) [Doc] Row to columnar conversion example mentions arrow::Column in comments

2019-08-27 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6371:


 Summary: [Doc] Row to columnar conversion example mentions 
arrow::Column in comments
 Key: ARROW-6371
 URL: https://issues.apache.org/jira/browse/ARROW-6371
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Omer Ozarslan


https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html

{code:cpp}
// The final representation should be an `arrow::Table` which in turn is made 
up of
// an `arrow::Schema` and a list of `arrow::Column`. An `arrow::Column` is 
again a
// named collection of one or more `arrow::Array` instances. As the first step, 
we
// will iterate over the data and build up the arrays incrementally.
{code}





[jira] [Created] (ARROW-6326) [C++] Nullable fields when converting std::tuple to Table

2019-08-22 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6326:


 Summary: [C++] Nullable fields when converting std::tuple to Table
 Key: ARROW-6326
 URL: https://issues.apache.org/jira/browse/ARROW-6326
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Omer Ozarslan


{{std::optional}} isn't used for representing nullable fields in Arrow's 
current STL conversion API since it requires C++17. Also, there are ways other 
than {{std::optional}} to represent an optional field, such as using pointers 
or external implementations of optional ({{boost::optional}}, 
{{type_safe::optional}}, and the like).

Since it is hard to maintain so many different kinds of specializations, 
introducing an {{Optional}} concept covering these classes could solve this 
issue and allow implementing nullable fields consistently.

So, the gist of proposed change will be something along the lines of:

{code:cpp}

template<typename T>
constexpr bool is_optional_like_v = ...;

template<typename T>
struct CTypeTraits<T, std::enable_if_t<is_optional_like_v<T>>> {
   //...
};

template<typename T>
struct ConversionTraits<T, std::enable_if_t<is_optional_like_v<T>>> : 
public CTypeTraits<T> {
   //...
};
{code}

For a type {{T}} to be considered as an {{Optional}}:
1) It should be convertible (implicitly or explicitly) to {{bool}}, i.e. it 
implements {{[explicit] operator bool()}},
2) It should be dereferenceable, i.e. it implements {{operator*()}}.

These two requirements provide a generalized way of templating nullable fields 
based on pointers, {{std::optional}}, {{boost::optional}}, etc. However, it 
would be better (necessary?) for this implementation to act as a default 
while not breaking users' existing specializations (e.g. an existing 
implementation in which {{std::optional}} is specialized by the user).

Are there any issues this approach may cause that I may have missed?

I will open a draft PR to work on this in the meantime.





[jira] [Created] (ARROW-6284) [C++] Allow references in std::tuple when converting tuple to arrow array

2019-08-17 Thread Omer Ozarslan (JIRA)
Omer Ozarslan created ARROW-6284:


 Summary: [C++] Allow references in std::tuple when converting 
tuple to arrow array
 Key: ARROW-6284
 URL: https://issues.apache.org/jira/browse/ARROW-6284
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Omer Ozarslan


This allows using std::tuple (e.g. std::tie) to convert user data types. More 
details will be provided in the PR.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan closed ARROW-6195.

Resolution: Invalid

> [C++] CMake fails with file not found error while bundling thrift if python 
> is not installed
> 
>
> Key: ARROW-6195
> URL: https://issues.apache.org/jira/browse/ARROW-6195
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>
> I had this error message while I was trying to reproduce another issue in 
> docker.
> To reproduce:
> ```
> FROM debian:buster 
> RUN apt-get update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential 
> cmake 
>  
> WORKDIR /app 
> RUN git clone https://github.com/apache/arrow.git 
> WORKDIR /app/arrow/cpp/build 
> RUN git checkout 167cea0 # HEAD as of 10-Aug-19
> RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
> RUN cmake --build . --target thrift_ep -j 8
> ```
> Relevant part of output:
> ```
> Scanning dependencies of target thrift_ep
> [ 66%] Creating directories for 'thrift_ep'
> [ 66%] Performing download step (verify and extract) for 'thrift_ep'
> CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
>  File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
> make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
> thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
> make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
> make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
> make: *** [Makefile:487: thrift_ep] Error 2
> ```
> Installing python fixes the problem, but this isn't directly clear from the 
> error message. The source of the issue is that execute_process in the 
> get_apache_mirrors macro silently fails and returns an empty APACHE_MIRROR 
> value since PYTHON_EXECUTABLE was empty.





[jira] [Commented] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904476#comment-16904476
 ] 

Omer Ozarslan commented on ARROW-6195:
--

Never mind. This is already documented in 
https://arrow.apache.org/docs/python/development.html.

> [C++] CMake fails with file not found error while bundling thrift if python 
> is not installed
> 
>
> Key: ARROW-6195
> URL: https://issues.apache.org/jira/browse/ARROW-6195
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>
> I had this error message while I was trying to reproduce another issue in 
> docker.
> To reproduce:
> ```
> FROM debian:buster 
> RUN apt-get update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential 
> cmake 
>  
> WORKDIR /app 
> RUN git clone https://github.com/apache/arrow.git 
> WORKDIR /app/arrow/cpp/build 
> RUN git checkout 167cea0 # HEAD as of 10-Aug-19
> RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
> RUN cmake --build . --target thrift_ep -j 8
> ```
> Relevant part of output:
> ```
> Scanning dependencies of target thrift_ep
> [ 66%] Creating directories for 'thrift_ep'
> [ 66%] Performing download step (verify and extract) for 'thrift_ep'
> CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
>  File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
> make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
> thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
> make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
> make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
> make: *** [Makefile:487: thrift_ep] Error 2
> ```
> Installing python fixes the problem, but this isn't directly clear from the 
> error message. The source of the issue is that execute_process in the 
> get_apache_mirrors macro silently fails and returns an empty APACHE_MIRROR 
> value since PYTHON_EXECUTABLE was empty.





[jira] [Updated] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6195:
-
Component/s: C++

> [C++] CMake fails with file not found error while bundling thrift if python 
> is not installed
> 
>
> Key: ARROW-6195
> URL: https://issues.apache.org/jira/browse/ARROW-6195
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>
> I had this error message while I was trying to reproduce another issue in 
> docker.
> To reproduce:
> ```
> FROM debian:buster 
> RUN apt-get update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential 
> cmake 
>  
> WORKDIR /app 
> RUN git clone https://github.com/apache/arrow.git 
> WORKDIR /app/arrow/cpp/build 
> RUN git checkout 167cea0 # HEAD as of 10-Aug-19
> RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
> RUN cmake --build . --target thrift_ep -j 8
> ```
> Relevant part of output:
> ```
> Scanning dependencies of target thrift_ep
> [ 66%] Creating directories for 'thrift_ep'
> [ 66%] Performing download step (verify and extract) for 'thrift_ep'
> CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
>  File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
> make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
> thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
> make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
> make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
> make: *** [Makefile:487: thrift_ep] Error 2
> ```
> Installing python fixes the problem, but this isn't directly clear from the 
> error message. The source of the issue is that execute_process in the 
> get_apache_mirrors macro silently fails and returns an empty APACHE_MIRROR 
> value since PYTHON_EXECUTABLE was empty.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6195:
-
Description: 
I hit this error message while trying to reproduce another issue in 
Docker.

To reproduce:

```
FROM debian:buster 
RUN apt-get update 
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential cmake 
 
WORKDIR /app 
RUN git clone https://github.com/apache/arrow.git 
WORKDIR /app/arrow/cpp/build 
RUN git checkout 167cea0 # HEAD as of 10-Aug-19
RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
RUN cmake --build . --target thrift_ep -j 8
```

Relevant part of output:
```
Scanning dependencies of target thrift_ep
[ 66%] Creating directories for 'thrift_ep'
[ 66%] Performing download step (verify and extract) for 'thrift_ep'
CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
 File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
make: *** [Makefile:487: thrift_ep] Error 2
```

Installing Python fixes the problem, but this isn't apparent from the 
error message. The source of the issue is that the execute_process call in the 
get_apache_mirrors macro fails silently and returns an empty APACHE_MIRROR 
value, since PYTHON_EXECUTABLE was empty.

  was:
I hit this error message while trying to reproduce another issue in 
Docker.

To reproduce:

```
FROM debian:buster 
RUN apt-get update 
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential cmake 
 
WORKDIR /app 
RUN git clone https://github.com/apache/arrow.git 
RUN git checkout 167cea0 # HEAD as of 10-Aug-19
WORKDIR /app/arrow/cpp/build 
RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
RUN cmake --build . --target thrift_ep -j 8
```

Relevant part of output:
```
Scanning dependencies of target thrift_ep
[ 66%] Creating directories for 'thrift_ep'
[ 66%] Performing download step (verify and extract) for 'thrift_ep'
CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
 File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
make: *** [Makefile:487: thrift_ep] Error 2
```

Installing Python fixes the problem, but this isn't apparent from the 
error message. The source of the issue is that the execute_process call in the 
get_apache_mirrors macro fails silently and returns an empty APACHE_MIRROR 
value, since PYTHON_EXECUTABLE was empty.


> [C++] CMake fails with file not found error while bundling thrift if python 
> is not installed
> 
>
> Key: ARROW-6195
> URL: https://issues.apache.org/jira/browse/ARROW-6195
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Omer Ozarslan
>Priority: Minor
>
> I hit this error message while trying to reproduce another issue in 
> Docker.
> To reproduce:
> ```
> FROM debian:buster 
> RUN apt-get update 
> RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential 
> cmake 
>  
> WORKDIR /app 
> RUN git clone https://github.com/apache/arrow.git 
> WORKDIR /app/arrow/cpp/build 
> RUN git checkout 167cea0 # HEAD as of 10-Aug-19
> RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
> RUN cmake --build . --target thrift_ep -j 8
> ```
> Relevant part of output:
> ```
> Scanning dependencies of target thrift_ep
> [ 66%] Creating directories for 'thrift_ep'
> [ 66%] Performing download step (verify and extract) for 'thrift_ep'
> CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
>  File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
> make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
> thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
> make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
> make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
> make: *** [Makefile:487: thrift_ep] Error 2
> ```
> Installing Python fixes the problem, but this isn't apparent from the 
> error message. The source of the issue is that the execute_process call in the 
> get_apache_mirrors macro fails silently and returns an empty APACHE_MIRROR 
> value, since PYTHON_EXECUTABLE was empty.





[jira] [Created] (ARROW-6195) [C++] CMake fails with file not found error while bundling thrift if python is not installed

2019-08-10 Thread Omer Ozarslan (JIRA)
Omer Ozarslan created ARROW-6195:


 Summary: [C++] CMake fails with file not found error while 
bundling thrift if python is not installed
 Key: ARROW-6195
 URL: https://issues.apache.org/jira/browse/ARROW-6195
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Omer Ozarslan


I hit this error message while trying to reproduce another issue in 
Docker.

To reproduce:

```
FROM debian:buster 
RUN apt-get update 
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git build-essential cmake 
 
WORKDIR /app 
RUN git clone https://github.com/apache/arrow.git 
WORKDIR /app/arrow/cpp/build 
RUN git checkout 167cea0 # HEAD as of 10-Aug-19; must run inside the cloned repo
RUN cmake -DARROW_PARQUET=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED .. 
RUN cmake --build . --target thrift_ep -j 8
```

Relevant part of output:
```
Scanning dependencies of target thrift_ep
[ 66%] Creating directories for 'thrift_ep'
[ 66%] Performing download step (verify and extract) for 'thrift_ep'
CMake Error at thrift_ep-stamp/verify-thrift_ep.cmake:11 (message):
 File not found: /thrift/0.12.0/thrift-0.12.0.tar.gz
make[3]: *** [CMakeFiles/thrift_ep.dir/build.make:90: 
thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download] Error 1 
make[2]: *** [CMakeFiles/Makefile2:916: CMakeFiles/thrift_ep.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:928: CMakeFiles/thrift_ep.dir/rule] Error 2
make: *** [Makefile:487: thrift_ep] Error 2
```

Installing Python fixes the problem, but this isn't apparent from the 
error message. The source of the issue is that the execute_process call in the 
get_apache_mirrors macro fails silently and returns an empty APACHE_MIRROR 
value, since PYTHON_EXECUTABLE was empty.
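
A defensive pattern for this failure mode can be sketched in CMake. The snippet below is hypothetical (the macro's real body is not quoted in this report; the APACHE_MIRROR_QUERY variable and the message wording are assumed), but it shows how capturing RESULT_VARIABLE turns the silent empty-mirror result into an actionable configure-time error:

```cmake
# Hypothetical sketch: check the exit status of the mirror-discovery call so a
# missing interpreter fails loudly instead of yielding an empty URL prefix
# (which is what produces the "/thrift/0.12.0/..." path in the error above).
execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "${APACHE_MIRROR_QUERY}"  # assumed query snippet
  OUTPUT_VARIABLE APACHE_MIRROR
  RESULT_VARIABLE _mirror_status
  OUTPUT_STRIP_TRAILING_WHITESPACE)
if(NOT _mirror_status EQUAL 0 OR "${APACHE_MIRROR}" STREQUAL "")
  message(FATAL_ERROR
          "Could not determine an Apache mirror (is Python installed?); "
          "execute_process returned '${_mirror_status}'.")
endif()
```

With a check like this, the configure step would point at the missing Python executable directly instead of failing later in the thrift_ep download step.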





[jira] [Commented] (ARROW-6190) [C++] Define and declare functions regardless of NDEBUG

2019-08-09 Thread Omer Ozarslan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904070#comment-16904070
 ] 

Omer Ozarslan commented on ARROW-6190:
--

Submitted a PR: https://github.com/apache/arrow/pull/5049.

> [C++] Define and declare functions regardless of NDEBUG
> ---
>
> Key: ARROW-6190
> URL: https://issues.apache.org/jira/browse/ARROW-6190
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>
> NDEBUG is not shipped in the linker flags, so I got a linker error with a 
> release build on a FixedSizeBinaryBuilder::UnsafeAppend(util::string_view 
> value) call, since it makes a call to CheckValueSize.
> This is somewhat of a follow-up to ARROW-2313. I took the same path by removing 
> the NDEBUG ifdefs around the CheckValueSize definition and declaration.
> I applied the same fix to CheckUTF8Initialized as well after grepping the 
> source code for "#ifndef NDEBUG" and finding that it has the same issue.





[jira] [Updated] (ARROW-6190) [C++] Define and declare functions regardless of NDEBUG

2019-08-09 Thread Omer Ozarslan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Ozarslan updated ARROW-6190:
-
Summary: [C++] Define and declare functions regardless of NDEBUG  (was: 
Define and declare functions regardless of NDEBUG)

> [C++] Define and declare functions regardless of NDEBUG
> ---
>
> Key: ARROW-6190
> URL: https://issues.apache.org/jira/browse/ARROW-6190
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Omer Ozarslan
>Priority: Minor
>
> NDEBUG is not shipped in the linker flags, so I got a linker error with a 
> release build on a FixedSizeBinaryBuilder::UnsafeAppend(util::string_view 
> value) call, since it makes a call to CheckValueSize.
> This is somewhat of a follow-up to ARROW-2313. I took the same path by removing 
> the NDEBUG ifdefs around the CheckValueSize definition and declaration.
> I applied the same fix to CheckUTF8Initialized as well after grepping the 
> source code for "#ifndef NDEBUG" and finding that it has the same issue.





[jira] [Created] (ARROW-6190) Define and declare functions regardless of NDEBUG

2019-08-09 Thread Omer Ozarslan (JIRA)
Omer Ozarslan created ARROW-6190:


 Summary: Define and declare functions regardless of NDEBUG
 Key: ARROW-6190
 URL: https://issues.apache.org/jira/browse/ARROW-6190
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Omer Ozarslan


NDEBUG is not shipped in the linker flags, so I got a linker error with a 
release build on a FixedSizeBinaryBuilder::UnsafeAppend(util::string_view 
value) call, since it makes a call to CheckValueSize.

This is somewhat of a follow-up to ARROW-2313. I took the same path by removing 
the NDEBUG ifdefs around the CheckValueSize definition and declaration.

I applied the same fix to CheckUTF8Initialized as well after grepping the 
source code for "#ifndef NDEBUG" and finding that it has the same issue.


