[jira] [Created] (ARROW-18346) [Python] Dataset writer API papercuts

2022-11-16 Thread David Li (Jira)
David Li created ARROW-18346:


 Summary: [Python] Dataset writer API papercuts
 Key: ARROW-18346
 URL: https://issues.apache.org/jira/browse/ARROW-18346
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 10.0.0
Reporter: David Li


* Writer options are not very discoverable. Perhaps "file_options" should 
mention compression as an example of something you can control, so people 
looking for it know where to go next?
 * Compression seems like it might be common enough to warrant a top-level 
parameter somehow (even if it gets implemented differently internally)?
 * Either way, this needs a cookbook example.
 * {{make_write_options}} is lacking a docstring
 * Writer options objects are lacking {{__repr__}}s



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18320) [C++] Flight client may crash due to improper Result/Status conversion

2022-11-14 Thread David Li (Jira)
David Li created ARROW-18320:


 Summary: [C++] Flight client may crash due to improper 
Result/Status conversion
 Key: ARROW-18320
 URL: https://issues.apache.org/jira/browse/ARROW-18320
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 6.0.0
Reporter: David Li


Reported on user@ 
https://lists.apache.org/thread/84z329t1djhnbr5bq936v4hr8cyngj2l 

{noformat}
I have an issue in my project: we have a query execution engine that
returns result data as a Flight stream and a C++ client that receives the
stream. When a query has no results but the result schema implies
dictionary-encoded fields, the client app crashes.

The cause is in cpp/src/arrow/flight/client.cc:461:

  ::arrow::Result<std::unique_ptr<ipc::Message>> ReadNextMessage() override {
    if (stream_finished_) {
      return nullptr;
    }
    internal::FlightData* data;
    {
      auto guard = read_mutex_ ? std::unique_lock<std::mutex>(*read_mutex_)
                               : std::unique_lock<std::mutex>();
      peekable_reader_->Next(&data);
    }
    if (!data) {
      stream_finished_ = true;
      return stream_->Finish(Status::OK());  // Here the issue
    }
    // Validate IPC message
    auto result = data->OpenMessage();
    if (!result.ok()) {
      return stream_->Finish(std::move(result).status());
    }
    *app_metadata_ = std::move(data->app_metadata);
    return result;
  }

The method returns a Result object while stream_->Finish(..) returns a Status.
So there is an implicit conversion from Status to Result that causes the
Result(Status) constructor to be called, but the constructor accepts only
error statuses, which in turn causes the app to abort:

/// Constructs a Result object with the given non-OK Status object. All
/// calls to ValueOrDie() on this object will abort. The given `status` must
/// not be an OK status, otherwise this constructor will abort.
///
/// This constructor is not declared explicit so that a function with a return
/// type of `Result<T>` can return a Status object, and the status will be
/// implicitly converted to the appropriate return type as a matter of
/// convenience.
///
/// \param status The non-OK Status object to initialize to.
Result(const Status& status) noexcept  // NOLINT(runtime/explicit)
    : status_(status) {
  if (ARROW_PREDICT_FALSE(status.ok())) {
    internal::DieWithMessage(std::string("Constructed with a non-error status: ") +
                             status.ToString());
  }
}

Is there a way to work around or fix it? We use Arrow 6.0.0, but it seems
that the issue exists in all later versions.
{noformat}
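
The failure mode in the report can be modelled in a few lines. The sketch below is a simplified Python stand-in, not Arrow code: the {{Status}} and {{Result}} classes and both {{read_next_message_*}} functions are hypothetical and only mimic the rule that a Result may be built from a Status only when that Status is an error. Where the C++ constructor aborts, the model raises. The "fixed" variant shows one plausible shape for a fix, namely checking the Status before converting it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for arrow::Status."""
    message: Optional[str] = None  # None means OK

    def ok(self) -> bool:
        return self.message is None


class Result:
    """Hypothetical stand-in for arrow::Result<T>."""

    def __init__(self, value: Any = None, status: Optional[Status] = None):
        if status is not None and status.ok():
            # Models the abort in Result(const Status&) for an OK status.
            raise RuntimeError("Constructed with a non-error status: OK")
        self.value = value
        self.status = status if status is not None else Status()


def finish_stream() -> Status:
    """Models stream_->Finish(Status::OK()) on a cleanly finished stream."""
    return Status()


def read_next_message_buggy() -> Result:
    # Status -> Result conversion without a check: fails when Finish() is OK.
    return Result(status=finish_stream())


def read_next_message_fixed() -> Result:
    # Convert the Status to a Result only when it is an error.
    status = finish_stream()
    if not status.ok():
        return Result(status=status)
    return Result(value=None)  # end of stream, no message


try:
    read_next_message_buggy()
    buggy_outcome = "ok"
except RuntimeError as exc:
    buggy_outcome = f"aborted: {exc}"

fixed = read_next_message_fixed()
print(buggy_outcome)
print(fixed.status.ok(), fixed.value)
```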





[jira] [Created] (ARROW-18229) [C++][Python] RecordBatchReader can be created with a 'dict' schema which then crashes on use

2022-11-02 Thread David Li (Jira)
David Li created ARROW-18229:


 Summary: [C++][Python] RecordBatchReader can be created with a 
'dict' schema which then crashes on use
 Key: ARROW-18229
 URL: https://issues.apache.org/jira/browse/ARROW-18229
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 10.0.0
Reporter: David Li


Presumably we should disallow this or convert it to a schema?

https://github.com/duckdb/duckdb/issues/5143

{noformat}
>>> import pyarrow as pa
>>> pa.__version__
'10.0.0'
>>> reader = pa.RecordBatchReader.from_batches({"a": pa.int8()}, [])
>>> reader.schema
fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)

(gdb) bt
#0  0x74247580 in arrow::Schema::num_fields() const ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#1  0x742b93f7 in arrow::(anonymous namespace)::SchemaPrinter::Print() ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#2  0x742b98a7 in arrow::PrettyPrint(arrow::Schema const&, arrow::PrettyPrintOptions const&, std::string*) ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#3  0x764f814b in __pyx_pw_7pyarrow_3lib_6Schema_52to_string(_object*, _object*, _object*) ()
{noformat}





[jira] [Created] (ARROW-18191) [C++] Valgrind failure in arrow-gcsfs-test

2022-10-28 Thread David Li (Jira)
David Li created ARROW-18191:


 Summary: [C++] Valgrind failure in arrow-gcsfs-test
 Key: ARROW-18191
 URL: https://issues.apache.org/jira/browse/ARROW-18191
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


{noformat}
==11267== 
==11267== HEAP SUMMARY:
==11267== in use at exit: 12,091 bytes in 190 blocks
==11267==   total heap usage: 982,685 allocs, 982,495 frees, 1,332,264,705 
bytes allocated
==11267== 
==11267== 192 bytes in 8 blocks are definitely lost in loss record 35 of 45
==11267==at 0x40377A5: operator new(unsigned long, std::nothrow_t const&) 
(vg_replace_malloc.c:542)
==11267==by 0x682B079: __cxa_thread_atexit (atexit_thread.cc:152)
==11267==by 0x672F2D6: 
google::cloud::v2_3_0::internal::OptionsSpan::OptionsSpan(google::cloud::v2_3_0::Options)
 (in /opt/conda/envs/arrow/lib/libgoogle_cloud_cpp_common.so.2.3.0)
==11267==by 0x5DFCA33: google::cloud::v2_3_0::Status 
google::cloud::storage::v2_3_0::Client::DeleteObject(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, google::cloud::storage::v2_3_0::Generation&&) (client.h:1285)
==11267==by 0x5DFD022: operator() (gcsfs.cc:550)
==11267==by 0x5DFD022: 
operator()&)>&,
 
google::cloud::v2_3_0::StatusOr&>
 (future.h:150)
==11267==by 0x5DFD022: __invoke_impl&, 
arrow::fs::GcsFileSystem::Impl::DeleteDirContents(const arrow::fs::(anonymous 
namespace)::GcsPath&, bool, const arrow::io::IOContext&)::&)>&,
 
google::cloud::v2_3_0::StatusOr&>
 (invoke.h:60)
==11267==by 0x5DFD022: __invoke&, 
arrow::fs::GcsFileSystem::Impl::DeleteDirContents(const arrow::fs::(anonymous 
namespace)::GcsPath&, bool, const arrow::io::IOContext&)::&)>&,
 
google::cloud::v2_3_0::StatusOr&>
 (invoke.h:95)
==11267==by 0x5DFD022: __call (functional:416)
==11267==by 0x5DFD022: operator()<> (functional:499)
==11267==by 0x5DFD022: arrow::internal::FnOnce::FnImpl, 
arrow::fs::GcsFileSystem::Impl::DeleteDirContents(arrow::fs::(anonymous 
namespace)::GcsPath const&, bool, arrow::io::IOContext 
const&)::{lambda(google::cloud::v2_3_0::StatusOr
 const&)#1}, 
google::cloud::v2_3_0::StatusOr)>
 >::invoke() (functional.h:152)
==11267==by 0x50BDAA1: operator() (functional.h:140)
==11267==by 0x50BDAA1: 
arrow::internal::WorkerLoop(std::shared_ptr,
 std::_List_iterator) (thread_pool.cc:243)
==11267==by 0x50BE161: operator() (thread_pool.cc:414)
==11267==by 0x50BE161: __invoke_impl > 
(invoke.h:60)
==11267==by 0x50BE161: 
__invoke > 
(invoke.h:95)
==11267==by 0x50BE161: _M_invoke<0> (thread:264)
==11267==by 0x50BE161: operator() (thread:271)
==11267==by 0x50BE161: 
std::thread::_State_impl
 > >::_M_run() (thread:215)
==11267==by 0x6849A92: execute_native_thread_routine (thread.cc:82)
==11267==by 0x69666DA: start_thread (pthread_create.c:463)
==11267==by 0x6C9F61E: clone (clone.S:95)
==11267== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:_ZnwmRKSt9nothrow_t
   fun:execute_native_thread_routine
   fun:start_thread
   fun:clone
}
{noformat}





[jira] [Created] (ARROW-18060) [C++] Writing a dataset with 0 rows doesn't create any files

2022-10-14 Thread David Li (Jira)
David Li created ARROW-18060:


 Summary: [C++] Writing a dataset with 0 rows doesn't create any 
files
 Key: ARROW-18060
 URL: https://issues.apache.org/jira/browse/ARROW-18060
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 9.0.0
Reporter: David Li


If the input data has no rows, no files get created. This is potentially 
unexpected, as it looks like "nothing happened". It might be nicer to create an 
empty file. With partitioning, though, that gets weird (there are no 
partition values), so an error might make more sense instead.

Reproduction in Python
{code:python}
import tempfile
from pathlib import Path

import pyarrow
import pyarrow.dataset

print("PyArrow version:", pyarrow.__version__)

table = pyarrow.table([
    [],
], schema=pyarrow.schema([
    ("ints", "int64"),
]))

with tempfile.TemporaryDirectory() as d:
    pyarrow.dataset.write_dataset(table, d, format="feather")
    print(list(Path(d).iterdir()))
{code}
Output
{noformat}
> python repro.py
PyArrow version: 9.0.0
[]
{noformat}





[jira] [Created] (ARROW-18035) [Java] Enable allocator logging in CI

2022-10-13 Thread David Li (Jira)
David Li created ARROW-18035:


 Summary: [Java] Enable allocator logging in CI
 Key: ARROW-18035
 URL: https://issues.apache.org/jira/browse/ARROW-18035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


This would help track down certain flaky tests.





[jira] [Created] (ARROW-18034) [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI

2022-10-13 Thread David Li (Jira)
David Li created ARROW-18034:


 Summary: [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch 
is flaky on Windows CI
 Key: ARROW-18034
 URL: https://issues.apache.org/jira/browse/ARROW-18034
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li


{noformat}
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (134217728)
Allocator(ROOT) 0/134217728/270532608/9223372036854775807 (res/actual/peak/limit)

    at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
    at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
    at org.apache.arrow.flight.TestBasicOperation$Producer.close(TestBasicOperation.java:514)
    at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:333)
    at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:312)
    at org.apache.arrow.flight.TestBasicOperation.getStreamLargeBatch(TestBasicOperation.java:270)
{noformat}





[jira] [Created] (ARROW-17971) [Format][Docs] Add ADBC page

2022-10-10 Thread David Li (Jira)
David Li created ARROW-17971:


 Summary: [Format][Docs] Add ADBC page
 Key: ARROW-17971
 URL: https://issues.apache.org/jira/browse/ARROW-17971
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Documentation, Format
Reporter: David Li
Assignee: David Li


See ML vote thread: 
https://lists.apache.org/thread/7gb8dooz554ykbk5wlrngzkgmq0qx7y0





[jira] [Created] (ARROW-17914) [Java] Support reading a subset of fields from an IPC file or stream

2022-10-03 Thread David Li (Jira)
David Li created ARROW-17914:


 Summary: [Java] Support reading a subset of fields from an IPC 
file or stream
 Key: ARROW-17914
 URL: https://issues.apache.org/jira/browse/ARROW-17914
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


C++ supports {{IpcReadOptions.included_fields}} which lets you load a subset of 
(top-level) fields from an IPC file or stream, potentially saving on I/O costs. 
It would be useful to support this in Java as well. Some refactoring would be 
required since MessageSerializer currently reads record batch messages in as a 
whole, and it would be good to quantify how much of a benefit this provides in 
different scenarios.





[jira] [Created] (ARROW-17867) [C++][FlightRPC] Expose bulk parameter binding in Flight SQL client

2022-09-27 Thread David Li (Jira)
David Li created ARROW-17867:


 Summary: [C++][FlightRPC] Expose bulk parameter binding in Flight 
SQL client
 Key: ARROW-17867
 URL: https://issues.apache.org/jira/browse/ARROW-17867
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: David Li
Assignee: David Li


Also fix various issues noticed as part of ARROW-17661





[jira] [Created] (ARROW-17857) [C++] Table::CombineChunksToBatch segfaults on empty tables

2022-09-27 Thread David Li (Jira)
David Li created ARROW-17857:


 Summary: [C++] Table::CombineChunksToBatch segfaults on empty 
tables
 Key: ARROW-17857
 URL: https://issues.apache.org/jira/browse/ARROW-17857
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: David Li


A ChunkedArray can have 0 chunks, which {{Table::CombineChunksToBatch}} presumably does not handle.





[jira] [Created] (ARROW-17840) [Java] Remove flaky JaCoCo check in JDBC driver

2022-09-25 Thread David Li (Jira)
David Li created ARROW-17840:


 Summary: [Java] Remove flaky JaCoCo check in JDBC driver
 Key: ARROW-17840
 URL: https://issues.apache.org/jira/browse/ARROW-17840
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


It doesn't seem to bring much value and can make builds flaky (e.g. a branch may 
or may not be hit depending on when exactly an exception occurs).





[jira] [Created] (ARROW-17830) [C++][Gandiva] AppVeyor Windows builds failing due to 'diaguids.lib'

2022-09-23 Thread David Li (Jira)
David Li created ARROW-17830:


 Summary: [C++][Gandiva] AppVeyor Windows builds failing due to 
'diaguids.lib'
 Key: ARROW-17830
 URL: https://issues.apache.org/jira/browse/ARROW-17830
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Gandiva
Reporter: David Li


Observed in AppVeyor across a few PRs
{noformat}
(arrow) C:\projects\arrow\cpp\build>cmake --build . --target install --config Release   || exit /B 
ninja: error: 'C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/DIA SDK/lib/amd64/diaguids.lib', needed by 'release/gandiva.dll', missing and no known rule to make it
{noformat}





[jira] [Created] (ARROW-17810) [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI

2022-09-21 Thread David Li (Jira)
David Li created ARROW-17810:


 Summary: [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI
 Key: ARROW-17810
 URL: https://issues.apache.org/jira/browse/ARROW-17810
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li
Assignee: David Li


Not sure why this didn't fail before, but we need to bump JaCoCo for Java 18 to 
work:

{noformat}
java.lang.instrument.IllegalClassFormatException: Error while instrumenting 
org/apache/calcite/avatica/AvaticaConnection$MockitoMock$854659140$auxiliary$kA4H37GT.
at 
org.jacoco.agent.rt.internal_3570298.CoverageTransformer.transform(CoverageTransformer.java:94)
at 
java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
at 
java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at 
java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1013)
at 
java.base/java.lang.ClassLoader$ByteBuddyAccessor$PXg8JwS3.defineClass(Unknown 
Source)
at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at 
net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection$Dispatcher$UsingUnsafeInjection.defineClass(ClassInjector.java:1027)
at 
net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection.injectRaw(ClassInjector.java:279)
at 
net.bytebuddy.dynamic.loading.ClassInjector$AbstractBase.inject(ClassInjector.java:114)
at 
net.bytebuddy.dynamic.loading.ClassLoadingStrategy$Default$InjectionDispatcher.load(ClassLoadingStrategy.java:233)
at 
net.bytebuddy.dynamic.TypeResolutionStrategy$Passive.initialize(TypeResolutionStrategy.java:100)
at 
net.bytebuddy.dynamic.DynamicType$Default$Unloaded.load(DynamicType.java:6154)
at 
org.mockito.internal.creation.bytebuddy.SubclassBytecodeGenerator.mockClass(SubclassBytecodeGenerator.java:268)
at 
org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
at 
net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
at 
net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
at 
org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
at 
org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.mockClass(InlineBytecodeGenerator.java:216)
at 
org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
at 
net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
at 
net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
at 
org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
at 
org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMockType(InlineDelegateByteBuddyMockMaker.java:391)
at 
org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.doCreateMock(InlineDelegateByteBuddyMockMaker.java:351)
at 
org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMock(InlineDelegateByteBuddyMockMaker.java:330)
at 
org.mockito.internal.creation.bytebuddy.InlineByteBuddyMockMaker.createMock(InlineByteBuddyMockMaker.java:58)
at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:53)
at org.mockito.internal.MockitoCore.mock(MockitoCore.java:84)
at org.mockito.Mockito.mock(Mockito.java:1964)
at 
org.mockito.internal.configuration.MockAnnotationProcessor.processAnnotationForMock(MockAnnotationProcessor.java:66)
at 
org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:27)
at 
org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:24)
at 
org.mockito.internal.configuration.IndependentAnnotationEngine.createMockFor(IndependentAnnotationEngine.java:45)
at 
org.mockito.internal.configuration.IndependentAnnotationEngine.process(IndependentAnnotationEngine.java:73)
at 

[jira] [Created] (ARROW-17797) [Java] Remove deprecated methods from Java dataset module in Arrow 11

2022-09-21 Thread David Li (Jira)
David Li created ARROW-17797:


 Summary: [Java] Remove deprecated methods from Java dataset module 
in Arrow 11
 Key: ARROW-17797
 URL: https://issues.apache.org/jira/browse/ARROW-17797
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


ARROW-15745 deprecated some things in the Dataset module which should be 
removed for Arrow >= 11





[jira] [Created] (ARROW-17787) [Docs][Java] javadoc failing on flight-integration-tests

2022-09-20 Thread David Li (Jira)
David Li created ARROW-17787:


 Summary: [Docs][Java] javadoc failing on flight-integration-tests
 Key: ARROW-17787
 URL: https://issues.apache.org/jira/browse/ARROW-17787
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation, Java
Reporter: David Li
Assignee: David Li


Observed on master
{noformat}
 Loading source files for package org.apache.arrow.flight.integration.tests...
Constructing Javadoc information...
1 error
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Arrow Java Root POM 10.0.0-SNAPSHOT . SUCCESS [07:35 min]
[INFO] Arrow Format ... SUCCESS [ 26.940 s]
[INFO] Arrow Memory ... SUCCESS [ 23.462 s]
[INFO] Arrow Memory - Core  SUCCESS [ 13.328 s]
[INFO] Arrow Memory - Unsafe .. SUCCESS [ 14.376 s]
[INFO] Arrow Memory - Netty ... SUCCESS [ 16.075 s]
[INFO] Arrow Vectors .. SUCCESS [05:51 min]
[INFO] Arrow Compression .. SUCCESS [ 36.824 s]
[INFO] Arrow Tools  SUCCESS [ 43.014 s]
[INFO] Arrow JDBC Adapter . SUCCESS [ 40.846 s]
[INFO] Arrow Plasma Client  SUCCESS [ 26.950 s]
[INFO] Arrow Flight ... SUCCESS [ 23.166 s]
[INFO] Arrow Flight Core .. SUCCESS [02:01 min]
[INFO] Arrow Flight GRPC .. SUCCESS [ 33.919 s]
[INFO] Arrow Flight SQL ... SUCCESS [ 27.265 s]
[INFO] Arrow Flight SQL JDBC Driver ... SKIPPED
[INFO] Arrow Flight Integration Tests . FAILURE [ 16.021 s]
[INFO] Arrow AVRO Adapter . SUCCESS [ 38.905 s]
[INFO] Arrow Algorithms ... SUCCESS [ 30.490 s]
[INFO] Arrow Performance Benchmarks 10.0.0-SNAPSHOT ... SUCCESS [ 43.648 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 17:06 min (Wall Clock)
[INFO] Finished at: 2022-09-20T16:26:54Z
[INFO] 
Error:  Failed to execute goal 
org.apache.maven.plugins:maven-site-plugin:3.5.1:site (default-site) on project 
flight-integration-tests: Error generating 
maven-javadoc-plugin:3.0.0-M1:test-javadoc: 
Error:  Exit code: 1 - javadoc: error - No public or protected classes found to 
document.
Error:  
Error:  Command line was: /usr/lib/jvm/java-8-openjdk-amd64/jre/../bin/javadoc 
@options @packages
Error:  
Error:  Refer to the generated Javadoc files in 
'/arrow/java/flight/flight-integration-tests/target/site/testapidocs' dir.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e 
switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please 
read the following articles:
Error:  [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :flight-integration-tests
{noformat}





[jira] [Created] (ARROW-17785) [Java] Flakiness in JDBC driver test ArrowFlightJdbcConnectionCookieTest.testCookies

2022-09-20 Thread David Li (Jira)
David Li created ARROW-17785:


 Summary: [Java] Flakiness in JDBC driver test 
ArrowFlightJdbcConnectionCookieTest.testCookies
 Key: ARROW-17785
 URL: https://issues.apache.org/jira/browse/ARROW-17785
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: David Li
Assignee: David Li


I think we should just suppress this kind of exception in Flight SQL as it's 
not really actionable

{noformat}
 Error:  
org.apache.arrow.driver.jdbc.ArrowFlightJdbcConnectionCookieTest.testCookies  
Time elapsed: 0.805 s  <<< ERROR!
java.sql.SQLException: While closing statement
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.avatica.AvaticaStatement.close(AvaticaStatement.java:254)
at 
org.apache.arrow.driver.jdbc.ArrowFlightJdbcConnectionCookieTest.testCookies(ArrowFlightJdbcConnectionCookieTest.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.apache.arrow.driver.jdbc.FlightServerTestRule$1.evaluate(FlightServerTestRule.java:166)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
at 
org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
at 
org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
at 
org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
Sep 19, 2022 12:52:16 AM io.grpc.netty.NettyServerHandler onStreamError
WARNING: Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed 
before write could take place
at 
io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:173)
at 
io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481)
at 
io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
at 
io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357)
at 
io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007)
at 

[jira] [Created] (ARROW-17741) [Packaging] Add JDBC driver to release tasks

2022-09-15 Thread David Li (Jira)
David Li created ARROW-17741:


 Summary: [Packaging] Add JDBC driver to release tasks
 Key: ARROW-17741
 URL: https://issues.apache.org/jira/browse/ARROW-17741
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Packaging
Reporter: David Li
Assignee: David Li


The java-jars task has a list of artifacts to upload; the JDBC driver needs to 
be included there: 
https://github.com/apache/arrow/blob/7cfdfbb0d5472f8f8893398b51042a3ca1dd0adf/dev/tasks/tasks.yml#L816-L820





[jira] [Created] (ARROW-17732) [Docs][Java] Add documentation page for Flight SQL JDBC driver

2022-09-14 Thread David Li (Jira)
David Li created ARROW-17732:


 Summary: [Docs][Java] Add documentation page for Flight SQL JDBC 
driver
 Key: ARROW-17732
 URL: https://issues.apache.org/jira/browse/ARROW-17732
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Documentation, Java
Reporter: David Li
Assignee: David Li








[jira] [Created] (ARROW-17731) [Website] Add blog post about Flight SQL JDBC driver

2022-09-14 Thread David Li (Jira)
David Li created ARROW-17731:


 Summary: [Website] Add blog post about Flight SQL JDBC driver
 Key: ARROW-17731
 URL: https://issues.apache.org/jira/browse/ARROW-17731
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Website
Reporter: David Li
Assignee: David Li
 Fix For: 10.0.0








[jira] [Created] (ARROW-17729) [Java][FlightRPC] Flight SQL JDBC driver improvements

2022-09-14 Thread David Li (Jira)
David Li created ARROW-17729:


 Summary: [Java][FlightRPC] Flight SQL JDBC driver improvements
 Key: ARROW-17729
 URL: https://issues.apache.org/jira/browse/ARROW-17729
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: David Li


Follow-ups for ARROW-7744

* Rename internal classes to not imply everything is part of Flight RPC (e.g. 
ArrowFlightJdbcArray -> FieldVectorArray or similar)
* Don't throw bare exceptions (always provide some error context)
* Log a warning if the {{arrow-flight:}} URI scheme is used instead of 
{{arrow-flight-sql:}}
* Create a documentation page (one that can be used by people approaching this 
from the JDBC side, not necessarily Arrow users)
* Replace {{// TODO}} comments with {{throw new 
UnsupportedOperationException()}}
* Document how timestamp/time/date types are handled in converting between the 
two type schemas
* Document the type conversions in general
* [timestamp handling is 
suspect|https://github.com/apache/arrow/pull/13800#discussion_r938908230]
* Upgrade to JUnit5/AssertJ instead of JUnit4/Hamcrest
* Get rid of FreePortFinder





[jira] [Created] (ARROW-17718) [C++][Java][FlightRPC] Get rid of FlightTestUtil.getStartedServer etc.

2022-09-13 Thread David Li (Jira)
David Li created ARROW-17718:


 Summary: [C++][Java][FlightRPC] Get rid of 
FlightTestUtil.getStartedServer etc.
 Key: ARROW-17718
 URL: https://issues.apache.org/jira/browse/ARROW-17718
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li


Anything that picks a random port and expects to bind to it in CI is an 
antipattern and makes tests flaky, since another process can claim the port 
first. All tests should bind to port 0 and let the OS assign a port.
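
The port 0 pattern can be illustrated with plain sockets. This is a generic sketch using only the Python standard library, not Flight test code; the variable names are illustrative. The server binds to port 0, reads back the port the OS assigned, and hands that port to the client.

```python
import socket

# Port 0 asks the operating system to pick any free ephemeral port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Read back the port that was actually assigned and give it to the client.
host, port = server.getsockname()

client = socket.create_connection((host, port))
conn, _ = server.accept()
client.sendall(b"ping")
received = conn.recv(4)
print(port, received)

for s in (client, conn, server):
    s.close()
```

Because the port is only learned after a successful bind, there is no window in which another process can take it between "finding" a free port and using it.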





[jira] [Created] (ARROW-17688) [Format][FlightRPC][C++][Java] Add Substrait for Flight SQL

2022-09-12 Thread David Li (Jira)
David Li created ARROW-17688:


 Summary: [Format][FlightRPC][C++][Java] Add Substrait for Flight 
SQL
 Key: ARROW-17688
 URL: https://issues.apache.org/jira/browse/ARROW-17688
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, FlightRPC, Format, Java
Reporter: David Li
Assignee: David Li


See ML: https://lists.apache.org/thread/3k3np6314dwb0n7n1hrfwony5fcy7kzl





[jira] [Created] (ARROW-17675) [C++] Null pointer dereference in Substrait.BasicPlanRoundTripping

2022-09-10 Thread David Li (Jira)
David Li created ARROW-17675:


 Summary: [C++] Null pointer dereference in 
Substrait.BasicPlanRoundTripping
 Key: ARROW-17675
 URL: https://issues.apache.org/jira/browse/ARROW-17675
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


{noformat}
[ RUN  ] Substrait.BasicPlanRoundTripping
file_path_str /tmp/substrait-tempdir-3pvz0v47/
/arrow/cpp/src/arrow/dataset/file_base.cc:97:19: runtime error: member call on 
null pointer of type 'arrow::Buffer'
#0 0x7fba39909ef1 in 
arrow::dataset::FileSource::Equals(arrow::dataset::FileSource const&) const 
/arrow/cpp/src/arrow/dataset/file_base.cc:97:19
#1 0x7fba3990f1ed in 
arrow::dataset::FileFragment::Equals(arrow::dataset::FileFragment const&) const 
/arrow/cpp/src/arrow/dataset/file_base.cc:147:18
#2 0x76e22c in 
arrow::engine::Substrait_BasicPlanRoundTripping_Test::TestBody() 
/arrow/cpp/src/arrow/engine/substrait/serde_test.cc:1977:5
#3 0x7fba3c92fa9a in void 
testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607:10
#4 0x7fba3c915759 in void 
testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643:14
#5 0x7fba3c8ef652 in testing::Test::Run() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2682:5
#6 0x7fba3c8f0418 in testing::TestInfo::Run() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2861:11
#7 0x7fba3c8f0c33 in testing::TestSuite::Run() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:3015:28
#8 0x7fba3c901a14 in testing::internal::UnitTestImpl::RunAllTests() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5855:44
#9 0x7fba3c93289a in bool 
testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool 
(testing::internal::UnitTestImpl::*)(), char const*) 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607:10
#10 0x7fba3c917f79 in bool 
testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool 
(testing::internal::UnitTestImpl::*)(), char const*) 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643:14
#11 0x7fba3c901570 in testing::UnitTest::Run() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5438:10
#12 0x7fba3c968210 in RUN_ALL_TESTS() 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2490:46
#13 0x7fba3c9681ec in main 
/build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:52:10
#14 0x7fba1b6f8082 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#15 0x4d4b2d in _start 
(/build/cpp/debug/arrow-substrait-substrait-test+0x4d4b2d)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior 
/arrow/cpp/src/arrow/dataset/file_base.cc:97:19 in
/build/cpp/src/arrow/engine
{noformat}

https://github.com/apache/arrow/runs/8274057341?check_suite_focus=true





[jira] [Created] (ARROW-17661) [C++][Python][FlightRPC] Add Flight SQL ADBC driver and Python bindings

2022-09-09 Thread David Li (Jira)
David Li created ARROW-17661:


 Summary: [C++][Python][FlightRPC] Add Flight SQL ADBC driver and 
Python bindings
 Key: ARROW-17661
 URL: https://issues.apache.org/jira/browse/ARROW-17661
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, FlightRPC, Python
Reporter: David Li
Assignee: David Li


Pending ADBC acceptance.

This will finally make Flight SQL accessible in Python, though it will rely on 
having the ADBC driver manager available to provide the Python bindings.  





[jira] [Created] (ARROW-17645) [CI] conda-integration builds failing due to pinned zlib

2022-09-07 Thread David Li (Jira)
David Li created ARROW-17645:


 Summary: [CI] conda-integration builds failing due to pinned zlib
 Key: ARROW-17645
 URL: https://issues.apache.org/jira/browse/ARROW-17645
 Project: Apache Arrow
  Issue Type: Bug
Reporter: David Li


{noformat}
Encountered problems while solving:
  - package libsqlite-3.39.2-h753d276_1 requires libzlib >=1.2.12,<1.3.0a0, but 
none of the providers can be installed
{noformat}

However, in ARROW-17410 we pinned zlib to 1.2.11 to avoid a zlib bug that was 
causing failures in the JS tests, so the two constraints conflict.
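
To make the conflict concrete, the sketch below checks the pinned version against the range that libsqlite requires. It is illustrative only: the helper names are hypothetical, the "a0" pre-release suffix on the upper bound is dropped for simplicity, and it assumes the zlib pin also fixes libzlib at the same version.

```python
def parse_version(text):
    """Turn a dotted numeric version such as "1.2.11" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower_inclusive, upper_exclusive):
    """Return True if lower_inclusive <= version < upper_exclusive."""
    return (
        parse_version(lower_inclusive)
        <= parse_version(version)
        < parse_version(upper_exclusive)
    )


# libsqlite-3.39.2 requires libzlib >=1.2.12,<1.3.0a0 (suffix "a0" omitted here).
PINNED = "1.2.11"  # the pin from ARROW-17410

print(in_range(PINNED, "1.2.12", "1.3.0"))    # False: the pin is below the range
print(in_range("1.2.12", "1.2.12", "1.3.0"))  # True: the lowest version that would solve
```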





[jira] [Created] (ARROW-17604) [Java][Docs] Improve docs around JVM flags

2022-09-02 Thread David Li (Jira)
David Li created ARROW-17604:


 Summary: [Java][Docs] Improve docs around JVM flags
 Key: ARROW-17604
 URL: https://issues.apache.org/jira/browse/ARROW-17604
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: David Li


* Clarify where the {{--add-opens}} flag should be added (as an argument to 
{{java}})
* Demonstrate how to configure Surefire with it
* Demonstrate how to configure IntelliJ with it





[jira] [Created] (ARROW-17603) [C++][FlightRPC] Print build logs if gRPC TlsCredentialsOptions detection fails

2022-09-02 Thread David Li (Jira)
David Li created ARROW-17603:


 Summary: [C++][FlightRPC] Print build logs if gRPC 
TlsCredentialsOptions detection fails
 Key: ARROW-17603
 URL: https://issues.apache.org/jira/browse/ARROW-17603
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


Make it easier to debug build failures in CI.





[jira] [Created] (ARROW-17568) [FlightRPC][Integration] Ensure all RPC methods are covered by integration testing

2022-08-30 Thread David Li (Jira)
David Li created ARROW-17568:


 Summary: [FlightRPC][Integration] Ensure all RPC methods are 
covered by integration testing
 Key: ARROW-17568
 URL: https://issues.apache.org/jira/browse/ARROW-17568
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC, Go, Integration, Java
Reporter: David Li


This would help catch issues like https://github.com/apache/arrow/issues/13853





[jira] [Created] (ARROW-17558) [C++][FlightRPC] Inconsistent use of int, int32_t, uint32_t for SqlInfo enum values

2022-08-29 Thread David Li (Jira)
David Li created ARROW-17558:


 Summary: [C++][FlightRPC] Inconsistent use of int, int32_t, 
uint32_t for SqlInfo enum values
 Key: ARROW-17558
 URL: https://issues.apache.org/jira/browse/ARROW-17558
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li


These should all be uint32_t, always. Not a big deal in practice at least.





[jira] [Created] (ARROW-17538) [C++] Importing an ArrowArrayStream can't handle errors from get_schema

2022-08-26 Thread David Li (Jira)
David Li created ARROW-17538:


 Summary: [C++] Importing an ArrowArrayStream can't handle errors 
from get_schema
 Key: ARROW-17538
 URL: https://issues.apache.org/jira/browse/ARROW-17538
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 9.0.0
Reporter: David Li


As indicated in the code: 
https://github.com/apache/arrow/blob/cd3c6ead97d584366aafd2f14d99a1cb8ace9ca2/cpp/src/arrow/c/bridge.cc#L1823
 

This probably needs a static initializer so we can catch things.





[jira] [Created] (ARROW-17537) [Java][FlightRPC] Update benchmark to be on par with C++

2022-08-26 Thread David Li (Jira)
David Li created ARROW-17537:


 Summary: [Java][FlightRPC] Update benchmark to be on par with C++
 Key: ARROW-17537
 URL: https://issues.apache.org/jira/browse/ARROW-17537
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: David Li


See https://github.com/apache/arrow/issues/13980

The Java benchmark isn't comparable out of the box (and it seems like there's 
an unexplained gap between it and the C++ benchmark)





[jira] [Created] (ARROW-17433) [C++] AppVeyor build fails due to Boost/Flight

2022-08-16 Thread David Li (Jira)
David Li created ARROW-17433:


 Summary: [C++] AppVeyor build fails due to Boost/Flight
 Key: ARROW-17433
 URL: https://issues.apache.org/jira/browse/ARROW-17433
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


Observed on master

{noformat}
[182/351] Building CXX object 
src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj
FAILED: 
src/arrow/filesystem/CMakeFiles/arrow-s3fs-test.dir/Unity/unity_0_cxx.cxx.obj 
C:\Miniconda37-x64\Scripts\clcache.exe  /nologo /TP -DARROW_HAVE_RUNTIME_AVX2 
-DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 
-DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC 
-DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 
-DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD 
-DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT 
-DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT 
-DAWS_IO_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=8 
-DAWS_SDK_VERSION_PATCH=186 -DAWS_USE_IO_COMPLETION_PORTS -DBOOST_ALL_DYN_LINK 
-DBOOST_ALL_NO_LIB -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB 
-DBOOST_FILESYSTEM_DYN_LINK -DBOOST_FILESYSTEM_NO_LIB -DBOOST_SYSTEM_DYN_LINK 
-DBOOST_SYSTEM_NO_LIB -DPROTOBUF_USE_DLLS -DURI_STATIC_BUILD 
-DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS 
-D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE 
-IC:\projects\arrow\cpp\build\src -IC:\projects\arrow\cpp\src 
-IC:\projects\arrow\cpp\src\generated 
-IC:\projects\arrow\cpp\thirdparty\flatbuffers\include 
-IC:\Miniconda37-x64\envs\arrow\Library\include 
-IC:\projects\arrow\cpp\thirdparty\hadoop\include 
-IC:\projects\arrow\cpp\build\mimalloc_ep\src\mimalloc_ep\include\mimalloc-2.0 
/DWIN32 /D_WINDOWS  /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING   
/EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065  /WX /MP /MD /Od 
/UNDEBUG /showIncludes 
/Fosrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx.obj
 /Fdsrc\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\ /FS -c 
C:\projects\arrow\cpp\build\src\arrow\filesystem\CMakeFiles\arrow-s3fs-test.dir\Unity\unity_0_cxx.cxx
Please define _WIN32_WINNT or _WIN32_WINDOWS appropriately. For example:
- add -D_WIN32_WINNT=0x0601 to the compiler command line; or
- add _WIN32_WINNT=0x0601 to your project's Preprocessor Definitions.
Assuming _WIN32_WINNT=0x0601 (i.e. Windows 7 target).
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(266):
 error C2220: warning treated as error - no 'object' file generated
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(261):
 note: while compiling class template member function 
'boost::iterators::transform_iterator>,Char
 
**,boost::process::detail::entry>,boost::process::detail::entry>>
 
boost::process::basic_environment_impl::find(const
 std::basic_string,std::allocator> &)'
with
[
Char=char
]
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(361):
 note: see reference to function template instantiation 
'boost::iterators::transform_iterator>,Char
 
**,boost::process::detail::entry>,boost::process::detail::entry>>
 
boost::process::basic_environment_impl::find(const
 std::basic_string,std::allocator> &)' being 
compiled
with
[
Char=char
]
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/environment.hpp(632):
 note: see reference to class template instantiation 
'boost::process::basic_environment_impl'
 being compiled
with
[
Char=char
]
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(176): note: 
see reference to class template instantiation 
'boost::process::basic_environment' being compiled
C:\Miniconda37-x64\envs\arrow\Library\include\boost/process/env.hpp(183): note: 
see reference to class template instantiation 
'boost::process::detail::env_init' being compiled
C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/relationship.hpp(595):
 note: see reference to class template instantiation 
'boost::asio::execution::detail::relationship_t<0>' being compiled
C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/outstanding_work.hpp(597):
 note: see reference to class template instantiation 
'boost::asio::execution::detail::outstanding_work_t<0>' being compiled
C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/occupancy.hpp(163):
 note: see reference to class template instantiation 
'boost::asio::execution::detail::occupancy_t<0>' being compiled
C:\Miniconda37-x64\envs\arrow\Library\include\boost/asio/execution/mapping.hpp(764):
 note: see reference to class template instantiation 
'boost::asio::execution::detail::mapping_t<0>' being 

[jira] [Created] (ARROW-17420) [C++][FlightRPC] Flight SQL integration tests don't fully compare schema definitions

2022-08-15 Thread David Li (Jira)
David Li created ARROW-17420:


 Summary: [C++][FlightRPC] Flight SQL integration tests don't fully 
compare schema definitions
 Key: ARROW-17420
 URL: https://issues.apache.org/jira/browse/ARROW-17420
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


Matt pointed this out in the Go tests: 
https://github.com/apache/arrow/pull/13868#discussion_r945827399





[jira] [Created] (ARROW-17413) [JS] Integration test build fails with 'gulp-google-closure-compiler: java.util.zip.ZipException: invalid entry CRC'

2022-08-15 Thread David Li (Jira)
David Li created ARROW-17413:


 Summary: [JS] Integration test build fails with 
'gulp-google-closure-compiler: java.util.zip.ZipException: invalid entry CRC'
 Key: ARROW-17413
 URL: https://issues.apache.org/jira/browse/ARROW-17413
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, JavaScript
Reporter: David Li


Seen on master and on some PRs

{noformat}
[07:42:29] Error: gulp-google-closure-compiler: java.util.zip.ZipException: 
invalid entry CRC (expected 0x4e1f14a4 but got 0xb1e0eb5b)
at java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:410)
at java.util.zip.ZipInputStream.read(ZipInputStream.java:199)
at java.util.zip.ZipInputStream.closeEntry(ZipInputStream.java:143)
at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:121)
at 
com.google.javascript.jscomp.AbstractCommandLineRunner.getBuiltinExterns(AbstractCommandLineRunner.java:500)
at 
com.google.javascript.jscomp.CommandLineRunner.createExterns(CommandLineRunner.java:2084)
at 
com.google.javascript.jscomp.AbstractCommandLineRunner.doRun(AbstractCommandLineRunner.java:1187)
at 
com.google.javascript.jscomp.AbstractCommandLineRunner.run(AbstractCommandLineRunner.java:551)
at 
com.google.javascript.jscomp.CommandLineRunner.main(CommandLineRunner.java:2246)
Error writing to stdin of the compiler. write EPIPE

CustomError: gulp-google-closure-compiler: Compilation errors occurred
at CompilationStream._compilationComplete 
(/arrow/js/node_modules/google-closure-compiler/lib/gulp/index.js:238:28)
at /arrow/js/node_modules/google-closure-compiler/lib/gulp/index.js:208:14

at formatError 
(/arrow/js/node_modules/gulp-cli/lib/versioned/^4.0.0/format-error.js:21:10)
at Gulp. 
(/arrow/js/node_modules/gulp-cli/lib/versioned/^4.0.0/log/events.js:33:15)
at Gulp.emit (node:events:538:35)
at Gulp.emit (node:domain:475:12)
at Object.error 
(/arrow/js/node_modules/undertaker/lib/helpers/createExtensions.js:61:10)
at handler (/arrow/js/node_modules/now-and-later/lib/mapSeries.js:47:14)
at f (/arrow/js/node_modules/once/once.js:25:25)
at f (/arrow/js/node_modules/once/once.js:25:25)
at tryCatch 
(/arrow/js/node_modules/bach/node_modules/async-done/index.js:24:15)
at done (/arrow/js/node_modules/bach/node_modules/async-done/index.js:40:12)
[07:42:29] 'build:es2015:umd' errored after 3.02 min
{noformat}





[jira] [Created] (ARROW-17385) [Integration] Re-enable disabled Rust Flight middleware test

2022-08-11 Thread David Li (Jira)
David Li created ARROW-17385:


 Summary: [Integration] Re-enable disabled Rust Flight middleware 
test
 Key: ARROW-17385
 URL: https://issues.apache.org/jira/browse/ARROW-17385
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration
Reporter: David Li
Assignee: David Li


Follow-up for ARROW-10961. The linked Rust issue was fixed, so we should 
re-enable the integration test case.





[jira] [Created] (ARROW-17342) [Java] Improve testing of Dataset bindings

2022-08-08 Thread David Li (Jira)
David Li created ARROW-17342:


 Summary: [Java] Improve testing of Dataset bindings
 Key: ARROW-17342
 URL: https://issues.apache.org/jira/browse/ARROW-17342
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


From https://github.com/apache/arrow/pull/13811

* We should ensure all types are tested
* We should organize tests in a way that Parquet, IPC, and eventually CSV/ORC 
can mostly share test code (save for perhaps skipping/overriding specific 
format-type pairs)

Incidentally: it may be good to incrementally port this module to JUnit5 and 
drop JUnit4





[jira] [Created] (ARROW-17307) [C++][FlightRPC] Fix linking of Flight/gRPC example on MacOS

2022-08-04 Thread David Li (Jira)
David Li created ARROW-17307:


 Summary: [C++][FlightRPC] Fix linking of Flight/gRPC example on 
MacOS
 Key: ARROW-17307
 URL: https://issues.apache.org/jira/browse/ARROW-17307
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li


{{flight_grpc_example}} uses {{--no-as-needed}} but this doesn't work on MacOS.





[jira] [Created] (ARROW-17300) [Java][Docs] Compare/contrast the Netty and Unsafe memory backends

2022-08-03 Thread David Li (Jira)
David Li created ARROW-17300:


 Summary: [Java][Docs] Compare/contrast the Netty and Unsafe memory 
backends
 Key: ARROW-17300
 URL: https://issues.apache.org/jira/browse/ARROW-17300
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: David Li


We should compare why you might want to use each.

Are there benchmarks in the Java benchmark suite that might also be useful? 





[jira] [Created] (ARROW-17270) [Docs] Move Java nightlies instructions to developer docs to comply with ASF policies

2022-08-01 Thread David Li (Jira)
David Li created ARROW-17270:


 Summary: [Docs] Move Java nightlies instructions to developer docs 
to comply with ASF policies
 Key: ARROW-17270
 URL: https://issues.apache.org/jira/browse/ARROW-17270
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: David Li


https://github.com/apache/arrow/pull/13755#pullrequestreview-1056673168

{quote}
BTW, can we move the "Installing Nightly Packages" section to development 
documents (in a follow-up task)? It seems that this doesn't follow the ASF 
policy (It seems that "Use them at your own risk" isn't suitable for the ASF 
policy):

https://www.apache.org/legal/release-policy.html#publication

Projects SHALL publish official releases and SHALL NOT publish unreleased 
materials outside the development community.

During the process of developing software and preparing a release, various 
packages are made available to the development community for testing purposes. 
Projects MUST direct outsiders towards official releases rather than raw source 
repositories, nightly builds, snapshots, release candidates, or any other 
similar packages. Projects SHOULD make available developer resources to support 
individuals actively participating in development or following the dev list and 
thus aware of the conditions placed on unreleased materials.
{quote}





[jira] [Created] (ARROW-17268) [C++] JSON kernels

2022-07-30 Thread David Li (Jira)
David Li created ARROW-17268:


 Summary: [C++] JSON kernels
 Key: ARROW-17268
 URL: https://issues.apache.org/jira/browse/ARROW-17268
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


As discussed on dev@: 
https://lists.apache.org/thread/onzgogx2c2djxs0wbhmvqp2dbx7kjf6o "[ARROW-17255] 
Logical JSON type in Arrow"

It would be interesting to have JSON parsing/serializing compute functions that 
operate on columns of (stringified) JSON records. For parsing, the problem is 
we need to know the output schema without being able to look at the data, so we 
would probably only be able to decode into a {{map[string, union]}} type at 
best. And/or we could offer "extraction" functions akin to what things like 
SQLite and Postgres provide (at the cost of having to reparse the JSON over and 
over).

Also see ARROW-17255 for a logical JSON type.
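
As a rough illustration of the "extraction" idea above, here is a pure-Python sketch rather than an Arrow compute kernel. The function name json_extract and the sample column are hypothetical, and every non-null record is assumed to be a JSON object.

```python
import json


def json_extract(column, key):
    """Return the value stored under `key` for each stringified JSON record.

    Null inputs and records without the key both yield None.
    """
    out = []
    for record in column:
        if record is None:
            out.append(None)
            continue
        # Every call parses the record from scratch; this is the
        # "reparse the JSON over and over" cost of the extraction approach.
        out.append(json.loads(record).get(key))
    return out


column = ['{"a": 1, "b": "x"}', None, '{"b": "y"}']
print(json_extract(column, "a"))  # [1, None, None]
print(json_extract(column, "b"))  # ['x', None, 'y']
```

The caller names the field up front, so the output type can be decided per call and no schema has to be inferred from the data. The price is that extracting two fields parses each record twice.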





[jira] [Created] (ARROW-17254) [C++][FlightRPC] Flight SQL server does not implement GetSchema

2022-07-29 Thread David Li (Jira)
David Li created ARROW-17254:


 Summary: [C++][FlightRPC] Flight SQL server does not implement 
GetSchema
 Key: ARROW-17254
 URL: https://issues.apache.org/jira/browse/ARROW-17254
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


This is specified, but not actually implemented!

It needs to be covered in integration tests, too.





[jira] [Created] (ARROW-17242) [C++][FlightRPC] Implement and call FlightDataStream::Close()

2022-07-28 Thread David Li (Jira)
David Li created ARROW-17242:


 Summary: [C++][FlightRPC] Implement and call 
FlightDataStream::Close()
 Key: ARROW-17242
 URL: https://issues.apache.org/jira/browse/ARROW-17242
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li


For RecordBatchStream, this should dispatch to the underlying 
RecordBatchReader::Close.





[jira] [Created] (ARROW-17230) [C++] Fix minor bugs in Substrait to ExecPlan conversion

2022-07-27 Thread David Li (Jira)
David Li created ARROW-17230:


 Summary: [C++] Fix minor bugs in Substrait to ExecPlan conversion
 Key: ARROW-17230
 URL: https://issues.apache.org/jira/browse/ARROW-17230
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li
Assignee: David Li


* The return type of DeserializePlan is wrong: it should be 
{{shared_ptr}}, else we get a use-after-free.
* Errors are ignored where they shouldn't be: you can get a half-constructed 
plan instead of an error.
* A stateful callback is called twice, leading to invalid options being passed.





[jira] [Created] (ARROW-17229) [C++] ReadRel is translated to a source node that emits unexpected fields

2022-07-27 Thread David Li (Jira)
David Li created ARROW-17229:


 Summary: [C++] ReadRel is translated to a source node that emits 
unexpected fields
 Key: ARROW-17229
 URL: https://issues.apache.org/jira/browse/ARROW-17229
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Currently, a Substrait plan with a RelRoot containing a ReadRel will contain 
extra, unexpected fields, namely {{__fragment_index}} et al. Right now they 
are always included by default. There are a few things to be done:

* ReadRel's {{base_schema}} could be converted into a 
{{ScanOptions.dataset_schema}} to limit the fields read. (Also see ARROW-15585, 
these fields should be used for pushdown projection)
* The scanner always adds these extra fields - maybe it should be opt-in instead
* There's no way to manually insert a Project to "fix" things because as 
implemented, it can only add new columns





[jira] [Created] (ARROW-17214) [C++] Implement Scalar CastTo from all types to String

2022-07-26 Thread David Li (Jira)
David Li created ARROW-17214:


 Summary: [C++] Implement Scalar CastTo from all types to String
 Key: ARROW-17214
 URL: https://issues.apache.org/jira/browse/ARROW-17214
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


As reported on the mailing list: 
https://lists.apache.org/thread/rp7vpjtt4lgtjxj35oyjyqh9b6on94jf

Some types, including LIST, LARGE_LIST, and MAP, do not implement casts to 
string. Ideally we'd implement these (perhaps all to-string casts) by 
leveraging the existing cast for any formattable type.





[jira] [Created] (ARROW-17199) [FlightRPC][Java] Fix example Flight SQL server

2022-07-25 Thread David Li (Jira)
David Li created ARROW-17199:


 Summary: [FlightRPC][Java] Fix example Flight SQL server
 Key: ARROW-17199
 URL: https://issues.apache.org/jira/browse/ARROW-17199
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li
Assignee: David Li


There are a number of small bugs in the Java Flight SQL example (e.g. binding 
parameters to the wrong index, not handling null parameter values, not properly 
reporting errors) that should be fixed.





[jira] [Created] (ARROW-17191) [C++] MinGW Flight tests failing

2022-07-23 Thread David Li (Jira)
David Li created ARROW-17191:


 Summary: [C++] MinGW Flight tests failing 
 Key: ARROW-17191
 URL: https://issues.apache.org/jira/browse/ARROW-17191
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: David Li


Noticed across several PRs
{noformat}
[ RUN      ] GrpcDataTest.TestDoExchangeError
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:490: Failure
Value of: _st.IsNotImplemented()
  Actual: false
Expected: true
Expected 'writer->Close()' to fail with NotImplemented, but got IOError: Stream 
finished before first message sent. gRPC client debug context: UNKNOWN:Error 
received from peer ipv4:127.0.0.1:52323 
{created_time:"2022-07-23T01:21:23.785644223+00:00", grpc_status:2, 
grpc_message:"Stream finished before first message sent"}. Client context: OK. 
Detail: Failed
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:490: Failure
Value of: _st.ToString()
Expected: has substring "Expected error"
  Actual: "IOError: Stream finished before first message sent. gRPC client 
debug context: UNKNOWN:Error received from peer ipv4:127.0.0.1:52323 
{created_time:\"2022-07-23T01:21:23.785644223+00:00\", grpc_status:2, 
grpc_message:\"Stream finished before first message sent\"}. Client context: 
OK. Detail: Failed"
[  FAILED  ] GrpcDataTest.TestDoExchangeError (5 ms)
[ RUN      ] GrpcDataTest.TestDoExchangeConcurrency
[       OK ] GrpcDataTest.TestDoExchangeConcurrency (5 ms)
[ RUN      ] GrpcDataTest.TestDoExchangeUndrained
[       OK ] GrpcDataTest.TestDoExchangeUndrained (4 ms)
[ RUN      ] GrpcDataTest.TestIssue5095
[       OK ] GrpcDataTest.TestIssue5095 (9 ms)
[--] 17 tests from GrpcDataTest (891 ms total)
[--] 7 tests from GrpcDoPutTest
[ RUN      ] GrpcDoPutTest.TestInts
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0L��. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52331 {grpc_message:"Expected app_metadata to be foo bar but got 
\x00L\xf4\x86\x02\xe0\xa1", grpc_status:3, 
created_time:"2022-07-23T01:21:23.810734286+00:00"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestInts (4 ms)
[ RUN      ] GrpcDoPutTest.TestFloats
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0<���. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52333 {grpc_message:"Expected app_metadata to be foo bar but got 
\x00<\xee\xc6\x02\xe0\xa1", grpc_status:3, 
created_time:"2022-07-23T01:21:23.815439591+00:00"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestFloats (4 ms)
[ RUN      ] GrpcDoPutTest.TestEmptyBatch
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0���. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52335 {grpc_message:"Expected app_metadata to be foo bar but got 
\x00\x9c\xef\xa6\x02\xe0\xa1", grpc_status:3, 
created_time:"2022-07-23T01:21:23.819872813+00:00"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestEmptyBatch (4 ms)
[ RUN      ] GrpcDoPutTest.TestDicts
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0\���. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52337 {grpc_message:"Expected app_metadata to be foo bar but got 
\x00\\\xf0\xc6\x02\xe0\xa1", grpc_status:3, 
created_time:"2022-07-23T01:21:23.824172893+00:00"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestDicts (4 ms)
[ RUN      ] GrpcDoPutTest.TestLargeBatch
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:690: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0|��. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52339 {created_time:"2022-07-23T01:21:24.001437714+00:00", 
grpc_status:3, grpc_message:"Expected app_metadata to be foo bar but got 
\x00|\xf2\xa6\x02\xe0\xa1"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestLargeBatch (185 ms)
[ RUN      ] GrpcDoPutTest.TestSizeLimit
D:/a/arrow/arrow/cpp/src/arrow/flight/test_definitions.cc:802: Failure
Failed
'writer->Close()' failed with Invalid: Expected app_metadata to be foo bar but 
got \0\�,�. gRPC client debug context: UNKNOWN:Error received from peer 
ipv4:127.0.0.1:52341 {grpc_message:"Expected app_metadata to be foo bar but got 
\x00\\\xef,\x07\xe0\xa1", grpc_status:3, 
created_time:"2022-07-23T01:21:24.016917836+00:00"}. Client context: OK
[  FAILED  ] GrpcDoPutTest.TestSizeLimit (8 ms)
[ RUN      ] 

[jira] [Created] (ARROW-17163) [C++] Don't install jni_util.h

2022-07-21 Thread David Li (Jira)
David Li created ARROW-17163:


 Summary: [C++] Don't install jni_util.h
 Key: ARROW-17163
 URL: https://issues.apache.org/jira/browse/ARROW-17163
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


ARROW-17086 fixed some compiler warnings and restored the installation of 
jni_util.h to match prior behavior. But we never intended to expose this 
header, and the downstream Gluten project [no longer depends on 
it|https://github.com/apache/arrow/pull/13614#issuecomment-1191198106], so we 
can stop installing it.





[jira] [Created] (ARROW-17113) [Java] All static initializers should catch and report exceptions

2022-07-18 Thread David Li (Jira)
David Li created ARROW-17113:


 Summary: [Java] All static initializers should catch and report 
exceptions
 Key: ARROW-17113
 URL: https://issues.apache.org/jira/browse/ARROW-17113
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


As reported on the mailing list: 
https://lists.apache.org/thread/gysn25gsm4v1fvvx9l0sjyr627xy7q65

All static initializers should catch and report exceptions, or else they will 
get swallowed by the JVM.





[jira] [Created] (ARROW-17109) [C++] Clean up includes in exec_plan.h

2022-07-18 Thread David Li (Jira)
David Li created ARROW-17109:


 Summary: [C++] Clean up includes in exec_plan.h
 Key: ARROW-17109
 URL: https://issues.apache.org/jira/browse/ARROW-17109
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Notably, it includes logging.h transitively via exec/util.h, which we should 
avoid. We should perhaps create (or add to) an arrow/compute/exec/type_fwd.h.





[jira] [Created] (ARROW-17105) [Python] test_filesystem_dataset_no_filesystem_interaction segfault on s390x

2022-07-18 Thread David Li (Jira)
David Li created ARROW-17105:


 Summary: [Python] 
test_filesystem_dataset_no_filesystem_interaction segfault on s390x
 Key: ARROW-17105
 URL: https://issues.apache.org/jira/browse/ARROW-17105
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: David Li


Python on s390x test failed:
{noformat}
usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_filesystem_dataset_no_filesystem_interaction[threaded]
 Fatal Python error: Segmentation fault

Thread 0x03ff954f3700 (most recent call first):

PASSED [ 23%]  File 
"/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 60 in 
_multicall

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in 
_hookexec

  File "/usr/local/lib/python3.8/

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_filesystem_dataset_no_filesystem_interaction[serial]
 dist-packages/pluggy/_hooks.py", line 265 in __call__

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in 
_hookexec

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.pyPASSED [ 
23%]", line 18 in _multicall

  File "/usr/loc

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_dataset[threaded]
 al/lib/python3.SKIPPED [ 23%]

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_dataset[serial]
 SKIPPED [ 23%]

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py::test_scanner[threaded]
 8/dist-packages/pluggy/_callers.py", line 33 in _multicall

  File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 223 in 
call_and_report

  File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 137 in 
runtestprotocol

  File "/usr/local/lib/python3.8/dist-packages/_pytest/runner.py", line 113 in 
pytest_runtest_protocol

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 60 in 
_multicall

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in 
_hookexec

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in 
__call__

  File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 347 in 
pytest_runtestloop

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in 
_multicall

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in 
_hookexec

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in 
__call__

  File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 322 in 
_main

  File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 268 in 
wrap_session

  File "/usr/local/lib/python3.8/dist-packages/_pytest/main.py", line 315 in 
pytest_cmdline_main

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_callers.py", line 39 in 
_multicall

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_manager.py", line 80 in 
_hookexec

  File "/usr/local/lib/python3.8/dist-packages/pluggy/_hooks.py", line 265 in 
__call__

  File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", 
line 164 in main

  File "/usr/local/lib/python3.8/dist-packages/_pytest/config/__init__.py", 
line 187 in console_main

  File "/usr/local/bin/pytest", line 8 in 

/arrow/ci/scripts/python_test.sh: line 57: 10190 Segmentation fault  (core 
dumped) pytest -r s -v ${PYTEST_ARGS} --pyargs pyarrow

139

Error: `docker-compose --file 
/home/travis/build/apache/arrow/docker-compose.yml run --rm -e 
ARROW_BUILD_STATIC=OFF -e ARROW_FLIGHT=ON -e ARROW_GCS=OFF -e 
ARROW_MIMALLOC=OFF -e ARROW_ORC=OFF -e ARROW_PARQUET=OFF -e ARROW_PYTHON=ON -e 
ARROW_S3=OFF -e CMAKE_BUILD_PARALLEL_LEVEL=2 -e CMAKE_UNITY_BUILD=ON -e 
PARQUET_BUILD_EXAMPLES=OFF -e PARQUET_BUILD_EXECUTABLES=OFF -e 
Protobuf_SOURCE=BUNDLED -e gRPC_SOURCE=BUNDLED --volume 
/home/travis/build/apache/arrow/build:/build ubuntu-python` exited with a 
non-zero exit code 139, see the process log above.
{noformat}





[jira] [Created] (ARROW-17093) [C++][CI] Enable libSegFault for C++ tests

2022-07-15 Thread David Li (Jira)
David Li created ARROW-17093:


 Summary: [C++][CI] Enable libSegFault for C++ tests
 Key: ARROW-17093
 URL: https://issues.apache.org/jira/browse/ARROW-17093
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: David Li


Adding libSegFault.so could make it easier to diagnose CI failures. It will 
print a backtrace on segfault.
{noformat}
  env SEGFAULT_SIGNALS=all \
  LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so
{noformat}
This will give a backtrace like this on segfault:
{noformat}
Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f8f4a0b900b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f8f4a098859]
/lib/x86_64-linux-gnu/libc.so.6(+0x8d26e)[0x7f8f4a10326e]
/lib/x86_64-linux-gnu/libc.so.6(+0x952fc)[0x7f8f4a10b2fc]
/lib/x86_64-linux-gnu/libc.so.6(+0x96f6d)[0x7f8f4a10cf6d]
/tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt8_Rb_treeISt10shared_ptrIN5arrow8DataTypeEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E+0x39)[0x5557a9a83b19]
/tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt8_Rb_treeISt10shared_ptrIN5arrow8DataTypeEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E+0x1f)[0x5557a9a83aff]
/tmp/arrow-HEAD.y8UwB/cpp-build/release/flight-test-integration-client(_ZNSt3setISt10shared_ptrIN5arrow8DataTypeEESt4lessIS3_ESaIS3_EED1Ev+0x33)[0x5557a9a83b83]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xce)[0x7f8f4a0bcfde]
/tmp/arrow-HEAD.y8UwB/cpp-build/release/libarrow.so.900(+0x440b67)[0x7f8f47d56b67]
{noformat}
Caveats:
 * The path is OS-specific
 * We could integrate it into the build tooling instead of doing it via env var
 * Are there easily accessible equivalents for MacOS and Windows we could use?
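
One way to cope with the OS-specific path, if we stay with the env var approach, is to enable the preload only when the library is actually present. A minimal Python sketch, assuming a hypothetical helper named segfault_env (it is not part of Arrow's CI scripts); the default path is the one quoted above:

```python
import os

# Path quoted in this issue; it is OS/distribution-specific.
DEFAULT_LIB = "/lib/x86_64-linux-gnu/libSegFault.so"


def segfault_env(lib_path=DEFAULT_LIB, base_env=None):
    """Return a copy of the environment with libSegFault preloaded if present.

    Hypothetical helper: when lib_path does not exist, the environment is
    returned unchanged so the same command line works on other platforms.
    """
    env = dict(os.environ if base_env is None else base_env)
    if os.path.exists(lib_path):
        env["SEGFAULT_SIGNALS"] = "all"
        # Keep any LD_PRELOAD entry that was already set.
        existing = env.get("LD_PRELOAD")
        env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env
```

The result could then be passed as the env argument of subprocess.run when launching a test binary.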





[jira] [Created] (ARROW-17052) [C++][Python][FlightRPC] Ensure ::Serialize and ::Deserialize are consistently implemented

2022-07-12 Thread David Li (Jira)
David Li created ARROW-17052:


 Summary: [C++][Python][FlightRPC] Ensure ::Serialize and 
::Deserialize are consistently implemented
 Key: ARROW-17052
 URL: https://issues.apache.org/jira/browse/ARROW-17052
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC, Python
Reporter: David Li


Structures like Action don't expose these methods even though ones like 
FlightInfo do.





[jira] [Created] (ARROW-17025) [Dev] Merge script could warn if username pings would be present in commit message

2022-07-08 Thread David Li (Jira)
David Li created ARROW-17025:


 Summary: [Dev] Merge script could warn if username pings would be 
present in commit message
 Key: ARROW-17025
 URL: https://issues.apache.org/jira/browse/ARROW-17025
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: David Li


If a PR gets merged and its description {{@}} references a user, then the user 
will get a GitHub notification every time that commit gets pushed to a fork. 
This can be rather a bother, so it might be nice if the merge script could warn 
about this, or possibly even rewrite the commit message.
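
A rough sketch of what such a check might look like in Python. The names MENTION, find_mentions and strip_mentions are hypothetical and not taken from the existing merge script, and the username pattern is an approximation of GitHub's rules:

```python
import re

# Approximate GitHub username: alphanumerics and hyphens, up to 39 characters.
# The lookbehind skips e-mail addresses such as those in Co-authored-by lines.
MENTION = re.compile(r"(?<![\w.@/-])@([A-Za-z0-9][A-Za-z0-9-]{0,38})")


def find_mentions(message):
    """Return the usernames that are @-mentioned in a commit message."""
    return MENTION.findall(message)


def strip_mentions(message):
    """Rewrite '@user' to 'user' so that no @-mention remains."""
    return MENTION.sub(r"\1", message)
```

The merge script could print the result of find_mentions as a warning and offer strip_mentions as an optional rewrite.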





[jira] [Created] (ARROW-17024) [Java] Ensure Flight with native Netty transport is actually being tested

2022-07-08 Thread David Li (Jira)
David Li created ARROW-17024:


 Summary: [Java] Ensure Flight with native Netty transport is 
actually being tested
 Key: ARROW-17024
 URL: https://issues.apache.org/jira/browse/ARROW-17024
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: David Li
Assignee: David Li


Only one test exercises the domain socket path, and it appears to be skipped 
on CI:
{noformat}
[INFO] Running org.apache.arrow.flight.TestServerOptions
Warning:  Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.024 
s - in org.apache.arrow.flight.TestServerOptions {noformat}
We should make sure this test works and figure out whatever Maven magic we need 
to get the right dependencies on the right platforms.





[jira] [Created] (ARROW-17006) [Java] arrow-jdbc defines type but not value mapping for struct types

2022-07-07 Thread David Li (Jira)
David Li created ARROW-17006:


 Summary: [Java] arrow-jdbc defines type but not value mapping for 
struct types
 Key: ARROW-17006
 URL: https://issues.apache.org/jira/browse/ARROW-17006
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


While Types.STRUCT is mapped to ArrowType.Struct, we need additional config to 
be able to actually read such values, similar to ARROW-4142.





[jira] [Created] (ARROW-17004) [Java] Implement Arrow->JDBC prepared statement parameters for arrow-jdbc

2022-07-07 Thread David Li (Jira)
David Li created ARROW-17004:


 Summary: [Java] Implement Arrow->JDBC prepared statement 
parameters for arrow-jdbc
 Key: ARROW-17004
 URL: https://issues.apache.org/jira/browse/ARROW-17004
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li
Assignee: David Li


arrow-jdbc can turn JDBC ResultSets into Arrow VectorSchemaRoots. However, it 
would also be useful to have the opposite: bind values from a VectorSchemaRoot 
to a PreparedStatement for inserting/updating data.

This is necessary for the ADBC project but isn't ADBC specific, so it could be 
added to arrow-jdbc. We should also document the type mapping it uses and how 
to customize the mapping.





[jira] [Created] (ARROW-17003) [Java][Docs] Document JDBC module

2022-07-07 Thread David Li (Jira)
David Li created ARROW-17003:


 Summary: [Java][Docs] Document JDBC module
 Key: ARROW-17003
 URL: https://issues.apache.org/jira/browse/ARROW-17003
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: David Li
Assignee: David Li


The arrow-jdbc submodule could use its own documentation page.

In particular, we should document the type mapping it uses (and the rationale 
where applicable) and how to customize it.





[jira] [Created] (ARROW-16994) [Docs][CI] Clean up some docs warnings and increase CI timeout

2022-07-06 Thread David Li (Jira)
David Li created ARROW-16994:


 Summary: [Docs][CI] Clean up some docs warnings and increase CI 
timeout
 Key: ARROW-16994
 URL: https://issues.apache.org/jira/browse/ARROW-16994
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Documentation
Reporter: David Li


The docs are starting to take just about 30 minutes to build, causing spurious 
timeouts.

Also, there are several warnings that could/should be fixed.





[jira] [Created] (ARROW-16958) [C++][FlightRPC] Flight generates misaligned buffers

2022-07-01 Thread David Li (Jira)
David Li created ARROW-16958:


 Summary: [C++][FlightRPC] Flight generates misaligned buffers
 Key: ARROW-16958
 URL: https://issues.apache.org/jira/browse/ARROW-16958
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li


Protobuf's wire format design + our zero-copy serializer/deserializer mean that 
buffers can end up misaligned. On some Arrow versions, this can cause segfaults 
in kernels assuming alignment (and generally violates expectations). 

We should:
* Possibly include buffer alignment in array validation
* See if we can adjust the serializer to somehow pad things properly
* See if we can do anything about this in the deserializer

Example:
{code:python}
import pyarrow as pa
import pyarrow.flight as flight

class TestServer(flight.FlightServerBase):
    def do_get(self, context, ticket):
        schema = pa.schema(
            [
                ("index", pa.int64()),
                ("int8", pa.float64()),
                ("int16", pa.float64()),
                ("int32", pa.float64()),
            ]
        )
        return flight.RecordBatchStream(pa.table([
            [0, 1, 2, 3],
            [0, 1, None, 3],
            [0, 1, 2, None],
            [0, None, 2, 3],
        ], schema=schema))


with TestServer() as server:
    client = flight.connect(f"grpc://localhost:{server.port}")
    table = client.do_get(flight.Ticket(b"")).read_all()
    for col in table:
        print(col.type)
        for chunk in col.chunks:
            for buf in chunk.buffers():
                if not buf: continue
                print("buffer is 8-byte aligned?", buf.address % 8)
            chunk.cast(pa.float32())
{code}

On Arrow 8
{noformat}
int64
buffer is 8-byte aligned? 1
double
buffer is 8-byte aligned? 1
buffer is 8-byte aligned? 1
double
buffer is 8-byte aligned? 1
buffer is 8-byte aligned? 1
double
buffer is 8-byte aligned? 1
buffer is 8-byte aligned? 1
{noformat}
On Arrow 7
{noformat}
int64
buffer is 8-byte aligned? 4
double
buffer is 8-byte aligned? 4
buffer is 8-byte aligned? 4
fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address boundary 
error)
{noformat}





[jira] [Created] (ARROW-16944) [C++] Create macro-benchmarks of file format readers

2022-06-30 Thread David Li (Jira)
David Li created ARROW-16944:


 Summary: [C++] Create macro-benchmarks of file format readers
 Key: ARROW-16944
 URL: https://issues.apache.org/jira/browse/ARROW-16944
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: David Li


Currently we have (some) microbenchmarks, but measuring performance of our 
various readers (CSV, JSON, IPC, Parquet, ORC) over "real world" files would 
also be interesting and hopefully more illustrative of the use cases we 
actually care about. Such benchmarks may be expensive, though.

Ideally, we would do this in a variety of scenarios: in-memory (to focus on CPU 
optimization), on-disk (though such measurements would likely be extremely 
noisy?), and over the network (perhaps with something like Minio + Toxiproxy to 
try to have a consistent, reproducible setup) so that we can also judge the I/O 
characteristics of the readers.





[jira] [Created] (ARROW-16913) [Java] Implement ArrowArrayStream/C Stream Interface

2022-06-27 Thread David Li (Jira)
David Li created ARROW-16913:


 Summary: [Java] Implement ArrowArrayStream/C Stream Interface
 Key: ARROW-16913
 URL: https://issues.apache.org/jira/browse/ARROW-16913
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: David Li


ARROW-12965 implemented the core C Data Interface, but we still need to 
implement the streaming interface.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16902) [C++] Flight SQL fails to build on Windows due to incorrect usage of DLL linkage specifiers

2022-06-24 Thread David Li (Jira)
David Li created ARROW-16902:


 Summary: [C++] Flight SQL fails to build on Windows due to 
incorrect usage of DLL linkage specifiers
 Key: ARROW-16902
 URL: https://issues.apache.org/jira/browse/ARROW-16902
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 8.0.0
Reporter: David Li
Assignee: David Li
 Fix For: 9.0.0


Flight SQL uses "ARROW_EXPORT" in places, and also fails to define 
"ARROW_FLIGHT_EXPORTING", leading to linker issues.





[jira] [Created] (ARROW-16877) [C++] Valgrind failure (unintialized value) in arrow-compute-internals-test

2022-06-22 Thread David Li (Jira)
David Li created ARROW-16877:


 Summary: [C++] Valgrind failure (unintialized value) in 
arrow-compute-internals-test
 Key: ARROW-16877
 URL: https://issues.apache.org/jira/browse/ARROW-16877
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: David Li


Looks like GTest is trying to print an uninitialized unique_ptr.

https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=27986&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181

{noformat}
27/68 Test #28: arrow-compute-internals-test .***Failed   15.30 sec
==11317== Memcheck, a memory error detector
==11317== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==11317== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==11317==by 0x1C31BF: void 
testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 
2ul>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, 
std::integral_constant, std::ostream*) 
(gtest-printers.h:641)
==11317==by 0x1C31F8: void 
testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 
3ul>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, 
std::integral_constant, std::ostream*) 
(gtest-printers.h:641)
==11317==by 0x1C3231: void 
testing::internal::PrintTupleTo > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > >, 
4ul>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, 
std::integral_constant, std::ostream*) 
(gtest-printers.h:641)
==11317==by 0x1C3285: void 
testing::internal::PrintTo > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > 
>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::ostream*) 
(gtest-printers.h:654)
==11317==by 0x1C32AA: Print (gtest-printers.h:691)
==11317==by 0x1C32AA: void 
testing::internal::UniversalPrint > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > 
>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&, std::ostream*) 
(gtest-printers.h:980)
==11317==by 0x1C32E7: Print (gtest-printers.h:865)
==11317==by 0x1C32E7: std::__cxx11::basic_string, std::allocator > 
testing::PrintToString > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > 
>(std::tuple > ()>, std::function, std::function, std::allocator >, 
std::allocator, 
std::allocator > > > ()>, std::__cxx11::basic_string, std::allocator > > const&) (gtest-printers.h:1018)
==11317==by 0x1C4033: 
testing::internal::ParameterizedTestSuiteInfo::RegisterTests()
 (gtest-param-util.h:590)
==11317==by 0x6438DBC: 
testing::internal::ParameterizedTestSuiteRegistry::RegisterTests() 
(gtest-param-util.h:726)
==11317==by 0x6445597: 
testing::internal::UnitTestImpl::RegisterParameterizedTests() (gtest.cc:2823)
==11317==by 0x64558D3: 
testing::internal::UnitTestImpl::PostFlagParsingInit() (gtest.cc:5639)
==11317==by 0x646C550: void 
testing::internal::InitGoogleTestImpl(int*, char**) (gtest.cc:6646)
==11317==by 0x64584C4: testing::InitGoogleTest(int*, char**) (gtest.cc:6664)
==11317==by 0x4205956: main (gtest_main.cc:51)
==11317== 
{
   
   Memcheck:Cond
   fun:vfprintf
   fun:vsnprintf
   fun:snprintf
   fun:_ZN7testing12_GLOBAL__N_126PrintByteSegmentInObjectToEPKhmmPSo
   fun:_ZN7testing12_GLOBAL__N_124PrintBytesInObjectToImplEPKhmPSo
   fun:_ZN7testing8internal20PrintBytesInObjectToEPKhmPSo
   
fun:PrintValue()>
 >
   
fun:_ZN7testing8internal17PrintWithFallbackISt8functionIFSt10unique_ptrIN5arrow7compute16FunctionRegistryESt14default_deleteIS6_EEvvRKT_PSo
   
fun:_ZN7testing8internal7PrintToISt8functionIFSt10unique_ptrIN5arrow7compute16FunctionRegistryESt14default_deleteIS6_EEvvRKT_PSo
   fun:Print
   

[jira] [Created] (ARROW-16873) [Python] test_debug_memory_pool_disabled segfaulting on MacOS CI

2022-06-21 Thread David Li (Jira)
David Li created ARROW-16873:


 Summary: [Python] test_debug_memory_pool_disabled segfaulting on 
MacOS CI
 Key: ARROW-16873
 URL: https://issues.apache.org/jira/browse/ARROW-16873
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: David Li


Observed on master and many PRs, example: 
https://github.com/apache/arrow/runs/6991997196?check_suite_focus=true

From a quick read, it's likely just that the stderr isn't necessarily empty as 
the test expects.

{noformat}

=== FAILURES ===
_ test_debug_memory_pool_disabled[system_memory_pool] __
pool_factory = 
@pytest.mark.parametrize('pool_factory', supported_factories())
def test_debug_memory_pool_disabled(pool_factory):
res = run_debug_memory_pool(pool_factory.__name__, "")
# The subprocess either returned successfully or was killed by a signal
# (due to writing out of bounds), depending on the underlying allocator.
if os.name == "posix":
assert res.returncode <= 0
else:
res.check_returncode()
>   assert res.stderr == ""
E   assert 'Fatal Python...in \n' == ''
E + Fatal Python error: Segmentation fault
E + 
E + Current thread 0x000102009e00 (most recent call first):
E +   File "", line 12 in 
/usr/local/lib/python3.9/site-packages/pyarrow/tests/test_memory.py:245: 
AssertionError
{noformat}





[jira] [Created] (ARROW-16836) [C++] Have exported ArrowArrayStreams call RecordBatchReader::Close

2022-06-15 Thread David Li (Jira)
David Li created ARROW-16836:


 Summary: [C++] Have exported ArrowArrayStreams call 
RecordBatchReader::Close
 Key: ARROW-16836
 URL: https://issues.apache.org/jira/browse/ARROW-16836
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


We added RecordBatchReader::Close(). Should an exported ArrowArrayStream call 
it?

The issue is that {{release()}} can't return errors. We could call {{Close()}} 
implicitly after the last batch if the user drains the ArrowArrayStream, and 
return any error there, but if they don't drain the stream (but call 
{{release}}) we'll have no way to return the error. (Or we could make an ABI 
break…)





[jira] [Created] (ARROW-16788) [C++] Some packing builds fail to build bundled gRPC

2022-06-08 Thread David Li (Jira)
David Li created ARROW-16788:


 Summary: [C++] Some packing builds fail to build bundled gRPC
 Key: ARROW-16788
 URL: https://issues.apache.org/jira/browse/ARROW-16788
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


[https://github.com/ursacomputing/crossbow/runs/6789534725?check_suite_focus=true]
{noformat}
FAILED: 
CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o
 
/usr/lib/ccache/c++   
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/include 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/address_sorting/include
 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/upb-generated
 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/upbdefs-generated
 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/upb
 
-I/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/third_party/xxhash
 -Igens -isystem 
/build/apache-arrow-9.0.0.dev191/cpp_build/protobuf_ep-install/include -isystem 
/build/apache-arrow-9.0.0.dev191/cpp_build/absl_ep-install/include -g -O2 
-fdebug-prefix-map=/build/apache-arrow-9.0.0.dev191=. -fstack-protector-strong 
-Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fdiagnostics-color=always 
-O3 -DNDEBUG -O3 -DNDEBUG -fPIC   -g -O2 
-fdebug-prefix-map=/build/apache-arrow-9.0.0.dev191=. -fstack-protector-strong 
-Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fdiagnostics-color=always 
-O3 -DNDEBUG -O3 -DNDEBUG -fPIC -fPIC   -pthread -std=c++11 -MD -MT 
CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o
 -MF 
CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o.d
 -o 
CMakeFiles/grpc++.dir/src/core/ext/transport/binder/transport/binder_transport.cc.o
 -c 
/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc
/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc:
 In function ‘void set_pollset_set(grpc_transport*, grpc_stream*, 
grpc_pollset_set*)’:
/build/apache-arrow-9.0.0.dev191/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/binder/transport/binder_transport.cc:135:29:
 error: format not a string literal and no format arguments 
[-Werror=format-security]
  135 |   gpr_log(GPR_INFO, __func__);
      |                             ^ {noformat}





[jira] [Created] (ARROW-16671) [C++][Docs] Include StopToken in documentation

2022-05-26 Thread David Li (Jira)
David Li created ARROW-16671:


 Summary: [C++][Docs] Include StopToken in documentation
 Key: ARROW-16671
 URL: https://issues.apache.org/jira/browse/ARROW-16671
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: David Li


It's used in Flight APIs at the very least so it would be good to have a doc 
page for it.





[jira] [Created] (ARROW-16644) [C++] Unsuppress -Wno-return-stack-address

2022-05-24 Thread David Li (Jira)
David Li created ARROW-16644:


 Summary: [C++] Unsuppress -Wno-return-stack-address
 Key: ARROW-16644
 URL: https://issues.apache.org/jira/browse/ARROW-16644
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


Follow-up for ARROW-16643: this code in {{small_vector_benchmark.cc}} generates 
a warning on clang-14 that we should unsuppress:

{code:cpp}
template <typename Vector>
ARROW_NOINLINE int64_t ConsumeVector(Vector v) {
  return reinterpret_cast<intptr_t>(v.data());
}

template <typename Vector>
ARROW_NOINLINE int64_t IngestVector(const Vector& v) {
  return reinterpret_cast<intptr_t>(v.data());
}
{code}





[jira] [Created] (ARROW-16625) [C++] IPC: validate batch schema equals stream schema in debug mode

2022-05-20 Thread David Li (Jira)
David Li created ARROW-16625:


 Summary: [C++] IPC: validate batch schema equals stream schema in 
debug mode
 Key: ARROW-16625
 URL: https://issues.apache.org/jira/browse/ARROW-16625
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


This came up in a Flight/Flight SQL demo a colleague was working on: it was 
possible to write a batch whose schema differed from the one declared for the 
stream, which would lead to a decoding failure on the other side. It might be 
useful to DCHECK this in DEBUG mode and fail fast.

The error message could also be improved; {{ArrayLoader.GetBuffer}} could at 
least report the buffer index and the actual number of buffers.





[jira] [Created] (ARROW-16597) [Python][FlightRPC] Active server may segfault if Python interpreter shuts down

2022-05-17 Thread David Li (Jira)
David Li created ARROW-16597:


 Summary: [Python][FlightRPC] Active server may segfault if Python 
interpreter shuts down
 Key: ARROW-16597
 URL: https://issues.apache.org/jira/browse/ARROW-16597
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Python
Affects Versions: 8.0.0
Reporter: David Li
Assignee: David Li


On Linux, the script below reliably segfaults for me with {{FATAL: exception 
not rethrown}}. Adding a {{server.shutdown()}} call to the end fixes it.

The reason is that the Python interpreter exits after running the script, and 
other Python threads [call 
PyThread_exit_thread|https://github.com/python/cpython/blob/v3.10.4/Python/ceval_gil.h#L221].
 But one of the Python threads is currently in the middle of executing the RPC 
handler. PyThread_exit_thread boils down to pthread_exit which works by 
throwing an exception that it expects will not be caught. But gRPC places a 
{{catch(...)}} around RPC handlers and catches this exception, and then 
pthreads aborts when it doesn't catch the exception.

We should force servers to shutdown at exit to avoid this.

{code:python}
import traceback
import pyarrow as pa
import pyarrow.flight as flight

class Server(flight.FlightServerBase):
    def do_put(self, context, descriptor, reader, writer):
        raise flight.FlightCancelledError("foo", extra_info=b"bar")


print("PyArrow version:", pa.__version__)
server = Server("grpc://localhost:0")
client = flight.connect(f"grpc://localhost:{server.port}")

schema = pa.schema([])
writer, reader = client.do_put(flight.FlightDescriptor.for_command(b""), schema)
try:
    writer.done_writing()
except flight.FlightError as e:
    traceback.print_exc()
    print(e.extra_info)
except Exception:
    traceback.print_exc()
{code}





[jira] [Created] (ARROW-16588) [C++][FlightRPC] Don't inherit from ::testing::Test in Flight common tests

2022-05-16 Thread David Li (Jira)
David Li created ARROW-16588:


 Summary: [C++][FlightRPC] Don't inherit from ::testing::Test in 
Flight common tests
 Key: ARROW-16588
 URL: https://issues.apache.org/jira/browse/ARROW-16588
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li


https://github.com/apache/arrow/pull/13101#issuecomment-1127553809



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16573) [C++] Add canonical header guards for C Data Interface

2022-05-13 Thread David Li (Jira)
David Li created ARROW-16573:


 Summary: [C++] Add canonical header guards for C Data Interface
 Key: ARROW-16573
 URL: https://issues.apache.org/jira/browse/ARROW-16573
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Tom Drabas


See https://lists.apache.org/thread/fxrbpo9ywm0yjol9b5zgb04w6tns59qj



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ARROW-16436) [C++] Datasets ignores CSV autogenerate_column_names during discovery

2022-05-02 Thread David Li (Jira)
David Li created ARROW-16436:


 Summary: [C++] Datasets ignores CSV autogenerate_column_names 
during discovery
 Key: ARROW-16436
 URL: https://issues.apache.org/jira/browse/ARROW-16436
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 7.0.0
Reporter: David Li


Reproduction

{code:python}
import tempfile
from pathlib import Path

import pyarrow as pa
import pyarrow.csv as csv
import pyarrow.dataset as ds

print("PyArrow version:", pa.__version__)

ro = csv.ReadOptions(autogenerate_column_names=True)
po = csv.ParseOptions()
co = csv.ConvertOptions()
file_format = ds.CsvFileFormat(read_options=ro, parse_options=po, convert_options=co)

with tempfile.TemporaryDirectory() as td:
    td = Path(td).resolve()
    with (td / "test.csv").open("w") as sink:
        sink.write("1,a,true,1\n")

    dataset = ds.dataset(str(td), format=file_format)
    print(dataset.to_table())
{code}

Result:

{noformat}
PyArrow version: 7.0.0
Traceback (most recent call last):
  File "/home/lidavidm/csvdemo.py", line 20, in <module>
    dataset = ds.dataset(str(td), format=file_format)
  File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 667, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 422, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1680, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/tmp5rz0ipmm/test.csv': Could not open CSV input source '/tmp/tmp5rz0ipmm/test.csv': Invalid: CSV file contained multiple columns named 1. Is this a 'csv' file?
{noformat}
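
The error message suggests that discovery consumed the first data row as a header, which yields two columns named 1. A minimal standard-library sketch of the difference follows. It is not pyarrow code: the helper {{column_names}} is made up for illustration, and only the f0, f1, ... naming is borrowed from what pyarrow's CSV reader generates.

```python
import csv
import io
from collections import Counter


def column_names(text, autogenerate):
    """Return column names for CSV text, mimicking the two behaviours."""
    first_row = next(csv.reader(io.StringIO(text)))
    if autogenerate:
        # Every row is data; names are synthesized as f0, f1, ...
        return [f"f{i}" for i in range(len(first_row))]
    # Otherwise the first row is consumed as the header.
    return first_row


data = "1,a,true,1\n"

# Treating the only row as a header produces a duplicate column name.
as_header = column_names(data, autogenerate=False)
duplicates = [name for name, n in Counter(as_header).items() if n > 1]
print(as_header, "duplicates:", duplicates)

# Autogenerated names are unique by construction.
generated = column_names(data, autogenerate=True)
print(generated)
```

If discovery honoured the read options, the second path is the one that would be expected here.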





[jira] [Created] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16420:


 Summary: [Python] pq.write_to_dataset always ignores partitioning
 Key: ARROW-16420
 URL: https://issues.apache.org/jira/browse/ARROW-16420
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: David Li


The code unconditionally sets {{partitioning}} to None, so the user-supplied 
partitioning is ignored. 

https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146





[jira] [Created] (ARROW-16419) [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16419:


 Summary: [Python] pyarrow._exec_plan.execplan doesn't wait for 
plan to finish
 Key: ARROW-16419
 URL: https://issues.apache.org/jira/browse/ARROW-16419
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: David Li


It calls StopProducing but doesn't actually wait for finished(). This tends to 
cause "Plan was destroyed before finishing" to get printed.
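
A toy sketch of the ordering problem, using plain threads rather than the real exec plan API (the class {{ToyPlan}} and its methods are stand-ins invented for this example): requesting a stop returns immediately, so the caller has to block on the finished signal before letting the plan go away.

```python
import threading
import time


class ToyPlan:
    """Stand-in for an execution plan; not the pyarrow/Acero API."""

    def __init__(self):
        self._stop = threading.Event()
        self._finished = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        # Work until asked to stop, then do some teardown before finishing.
        while not self._stop.is_set():
            time.sleep(0.001)
        time.sleep(0.01)  # teardown still running after the stop request
        self._finished.set()

    def stop_producing(self):
        # Only requests a stop; returns immediately.
        self._stop.set()

    def finished(self):
        return self._finished


plan = ToyPlan()
plan.start()
plan.stop_producing()
done_right_after_stop = plan.finished().is_set()  # usually False here
plan.finished().wait()  # the missing step: block until the plan is done
done_after_wait = plan.finished().is_set()
plan._thread.join()
print(done_right_after_stop, done_after_wait)
```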





[jira] [Created] (ARROW-16417) [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16417:


 Summary: [C++][Python] Segfault in test_exec_plan.py / test_joins
 Key: ARROW-16417
 URL: https://issues.apache.org/jira/browse/ARROW-16417
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 8.0.0
Reporter: David Li


Occurs during wheel verification. It also happens on master. The failure is 
sporadic but fairly reliable. test_joins is parameterized; it's not consistent 
in the parameters it occurs on, but it consistently occurs on that test.

The backtrace reaches into malloc_consolidate. MALLOC_CHECK doesn't help. 
However:
{noformat}
(gdb) b main
Breakpoint 1 at 0x11ea20: file /home/conda/feedstock_root/build_artifacts/python-split_1625973859697/work/Programs/python.c, line 15.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>call mcheck(0)
>continue
>end {noformat}
This fairly consistently fails with "memory clobbered before allocated block" 
but the location varies. 

This may be a red herring though. I also tried LD_PRELOADING a secure build of 
mimalloc to see if it would catch any sort of heap corruption but instead the 
tests pass consistently with mimalloc.

 





[jira] [Created] (ARROW-16372) [Python] Tests failing on s390x because they use Parquet

2022-04-27 Thread David Li (Jira)
David Li created ARROW-16372:


 Summary: [Python] Tests failing on s390x because they use Parquet
 Key: ARROW-16372
 URL: https://issues.apache.org/jira/browse/ARROW-16372
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: David Li


If I understand correctly, the Parquet implementation does not work on 
big-endian? So these tests need to be properly marked?

https://app.travis-ci.com/github/apache/arrow/jobs/568309096

{noformat}
=== FAILURES ===
__ test_dataset_join ___

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join0')

    @pytest.mark.dataset
    def test_dataset_join(tempdir):
        t1 = pa.table({
            "colA": [1, 2, 6],
            "col2": ["a", "b", "f"]
        })
>       ds.write_dataset(t1, tempdir / "t1", format="parquet")

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4428:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset
    format = _ensure_format(format)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'parquet'

    def _ensure_format(obj):
        if isinstance(obj, FileFormat):
            return obj
        elif obj == "parquet":
            if not _parquet_available:
>               raise ValueError(_parquet_msg)
E               ValueError: The pyarrow installation is not built with support for the Parquet file format.

usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError
_ test_dataset_join_unique_key _

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join_unique_key0')

    @pytest.mark.dataset
    def test_dataset_join_unique_key(tempdir):
        t1 = pa.table({
            "colA": [1, 2, 6],
            "col2": ["a", "b", "f"]
        })
>       ds.write_dataset(t1, tempdir / "t1", format="parquet")

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4459:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset
    format = _ensure_format(format)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'parquet'

    def _ensure_format(obj):
        if isinstance(obj, FileFormat):
            return obj
        elif obj == "parquet":
            if not _parquet_available:
>               raise ValueError(_parquet_msg)
E               ValueError: The pyarrow installation is not built with support for the Parquet file format.

usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError
_ test_dataset_join_collisions _

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_dataset_join_collisions0')

    @pytest.mark.dataset
    def test_dataset_join_collisions(tempdir):
        t1 = pa.table({
            "colA": [1, 2, 6],
            "colB": [10, 20, 60],
            "colVals": ["a", "b", "f"]
        })
>       ds.write_dataset(t1, tempdir / "t1", format="parquet")

usr/local/lib/python3.8/dist-packages/pyarrow/tests/test_dataset.py:4491:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:880: in write_dataset
    format = _ensure_format(format)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'parquet'

    def _ensure_format(obj):
        if isinstance(obj, FileFormat):
            return obj
        elif obj == "parquet":
            if not _parquet_available:
>               raise ValueError(_parquet_msg)
E               ValueError: The pyarrow installation is not built with support for the Parquet file format.

usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:283: ValueError
_ test_parquet_invalid_version _

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_parquet_invalid_version0')

    def test_parquet_invalid_version(tempdir):
        table = pa.table({'a': [1, 2, 3]})
        with pytest.raises(ValueError, match="Unsupported Parquet format version"):
>           _write_table(table, tempdir / 'test_version.parquet', version="2.2")
E           NameError: name '_write_table' is not defined

usr/local/lib/python3.8/dist-packages/pyarrow/tests/parquet/test_basic.py:52: NameError
{noformat}





[jira] [Created] (ARROW-16347) [Packaging] verify-release-candidate fails oddly if a Conda environment is active

2022-04-26 Thread David Li (Jira)
David Li created ARROW-16347:


 Summary: [Packaging] verify-release-candidate fails oddly if a 
Conda environment is active
 Key: ARROW-16347
 URL: https://issues.apache.org/jira/browse/ARROW-16347
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Affects Versions: 8.0.0
Reporter: David Li


{noformat}
Conda environment is active despite that USE_CONDA is set to 0.

CommandNotFoundError: No command 'conda deactive'.
Did you mean 'conda deactivate'?
{noformat}

The next line is {{echo "Deactivate the environment using `conda deactive` 
before running the verification script."}} but this tries to _evaluate_ "conda 
deactive" which of course fails. The typo should be fixed, but also the 
backticks should be escaped.
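
For illustration, the effect of unescaped backticks inside a double-quoted shell string can be reproduced with a harmless command. This sketch assumes a POSIX {{sh}} is on the PATH and uses {{echo SUBSTITUTED}} as a stand-in for the misspelled conda command; {{run_sh}} is a helper made up for the example.

```python
import subprocess


def run_sh(script):
    """Run a one-line script with sh and return its stdout."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Unescaped: sh runs the text between the backticks and splices in its output.
unescaped = run_sh('echo "Run `echo SUBSTITUTED` first."')

# Escaped: the backticks are printed literally, which is what the message wants.
escaped = run_sh('echo "Run \\`echo SUBSTITUTED\\` first."')

print(unescaped)
print(escaped)
```

Single-quoting the message in the script would presumably work as well, since command substitution does not happen inside single quotes.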





[jira] [Created] (ARROW-16271) [C++] Implement full chunked array support for replace_with_mask

2022-04-21 Thread David Li (Jira)
David Li created ARROW-16271:


 Summary: [C++] Implement full chunked array support for 
replace_with_mask
 Key: ARROW-16271
 URL: https://issues.apache.org/jira/browse/ARROW-16271
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


ARROW-15928 enables this function to accept chunked arrays for the input array, 
but not for the mask or replacements array. More work is needed to implement 
those cases (which currently just return an error).

We should also consider how to make this work at least somewhat reusable for 
similar kernels (e.g. replace_with_indices)
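
To make the missing cases concrete, here is a pure-Python sketch of the semantics over chunked inputs, with lists of lists standing in for chunked arrays. It assumes the usual replace_with_mask behaviour (each position where the mask is true takes the next replacement value, in order), ignores nulls entirely, and is not the Arrow kernel; the function name is invented for the example.

```python
from itertools import chain


def replace_with_mask_chunked(value_chunks, mask_chunks, replacement_chunks):
    """Toy replace_with_mask over lists of lists; nulls are not handled."""
    mask = chain.from_iterable(mask_chunks)
    replacements = chain.from_iterable(replacement_chunks)
    out = []
    for chunk in value_chunks:
        new_chunk = []
        for value in chunk:
            # Consume the mask in lockstep with the values, regardless of
            # how the mask itself happens to be chunked.
            if next(mask):
                new_chunk.append(next(replacements))
            else:
                new_chunk.append(value)
        out.append(new_chunk)
    return out


# The three inputs deliberately use different chunk layouts.
values = [[1, 2], [3, 4, 5]]
mask = [[False, True, True], [False, True]]
replacements = [[20], [30, 50]]
result = replace_with_mask_chunked(values, mask, replacements)
print(result)
```

The output keeps the chunk layout of the values; a real kernel would also have to decide that, which is part of what makes this reusable (or not) for similar kernels.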





[jira] [Created] (ARROW-16238) [C++] Fix nullptr dereference in ipc/reader.cc

2022-04-19 Thread David Li (Jira)
David Li created ARROW-16238:


 Summary: [C++] Fix nullptr dereference in ipc/reader.cc
 Key: ARROW-16238
 URL: https://issues.apache.org/jira/browse/ARROW-16238
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


MinGW GCC catches this
{noformat}
[20/278] Building CXX object 
src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx.obj
In file included from 
C:/msys64/home/User/arrow/build/cpp/src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx:3:
C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc: In member function 
'virtual 
arrow::Result 
>()> > arrow::ipc::RecordBatchFileReaderImpl::GetRecordBatchGenerator(bool, 
const arrow::io::IOContext&, arrow::io::CacheOptions, 
arrow::internal::Executor*)':
C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc:1303:34: warning: 'this' 
pointer is null [-Wnonnull]
 1303 |       return cached_source->Cache({{0, footer_offset_}});
      |              ^~~
In file included from C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.h:28,
                 from C:/msys64/home/User/arrow/cpp/src/arrow/ipc/reader.cc:18,
                 from 
C:/msys64/home/User/arrow/build/cpp/src/arrow/CMakeFiles/arrow_shared.dir/Unity/unity_24_cxx.cxx:3:
C:/msys64/home/User/arrow/cpp/src/arrow/io/caching.h:124:10: note: in a call to 
non-static member function 'arrow::Status 
arrow::io::internal::ReadRangeCache::Cache(std::vector)'
  124 |   Status Cache(std::vector ranges);
      |          ^ {noformat}

This is pretty clearly wrong:

{code:cpp}
std::shared_ptr<io::internal::ReadRangeCache> cached_source;
if (coalesce && file_->supports_zero_copy()) {
  if (!owned_file_) return Status::Invalid("Cannot coalesce without an owned file");
  // Since the user is asking for all fields then we can cache the entire
  // file (up to the footer)
  return cached_source->Cache({{0, footer_offset_}});
}
return WholeIpcFileRecordBatchGenerator(std::move(state), std::move(cached_source),
                                        io_context, executor);
{code}

It seems ARROW-14577 removed one too many lines: in the snippet above, 
{{cached_source}} is never assigned before it is dereferenced.





[jira] [Created] (ARROW-16235) [C++][FlightRPC] Flight does not build on MinGW

2022-04-19 Thread David Li (Jira)
David Li created ARROW-16235:


 Summary: [C++][FlightRPC] Flight does not build on MinGW
 Key: ARROW-16235
 URL: https://issues.apache.org/jira/browse/ARROW-16235
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li


https://github.com/apache/arrow/runs/6077889425?check_suite_focus=true
{noformat}
[180/316] Building CXX object 
src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj
FAILED: 
src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj
 
D:\a\_temp\msys64\mingw32\bin\ccache.exe D:\a\_temp\msys64\mingw32\bin\c++.exe 
-DARROW_FLIGHT_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_BMI2 
-DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS 
-DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 
-DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD 
-DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=8 -DAWS_SDK_VERSION_PATCH=149 
-DAWS_USE_IO_COMPLETION_PORTS -DBOOST_USE_WINDOWS_H=1 
-DGRPC_NAMESPACE_FOR_TLS_CREDENTIALS_OPTIONS=grpc::experimental 
-DGRPC_USE_CERTIFICATE_VERIFIER -DGRPC_USE_TLS_CHANNEL_CREDENTIALS_OPTIONS 
-DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DURI_STATIC_BUILD -DUSE_IMPORT_EXPORT 
-DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -Darrow_flight_testing_shared_EXPORTS 
-ID:/a/arrow/arrow/build/cpp/src -ID:/a/arrow/arrow/cpp/src 
-ID:/a/arrow/arrow/cpp/src/generated -isystem 
D:/a/arrow/arrow/cpp/thirdparty/flatbuffers/include -isystem /mingw32/include 
-isystem D:/a/arrow/arrow/build/cpp/xsimd_ep/src/xsimd_ep-install/include 
-isystem D:/a/arrow/arrow/cpp/thirdparty/hadoop/include -Wno-noexcept-type 
-Wno-subobject-linkage  -fdiagnostics-color=always -O3 -DNDEBUG  -Wa,-mbig-obj 
-Wall -Wno-conversion -Wno-deprecated-declarations -Wno-sign-conversion 
-Wunused-result -fno-semantic-interposition -mxsave -msse4.2  -O3 -DNDEBUG 
-std=c++11 -MD -MT 
src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj
 -MF 
src\arrow\flight\CMakeFiles\arrow_flight_testing_shared.dir\Unity\unity_0_cxx.cxx.obj.d
 -o 
src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx.obj
 -c 
D:/a/arrow/arrow/build/cpp/src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx
In file included from 
D:/a/arrow/arrow/build/cpp/src/arrow/flight/CMakeFiles/arrow_flight_testing_shared.dir/Unity/unity_0_cxx.cxx:5:
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc: In function 'arrow::Status 
arrow::flight::ExampleTlsCertificates(std::vector*)':
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:775:31: error: variable 
'std::ifstream cert_file' has initializer but incomplete type
  775 |       std::ifstream cert_file(cert_path.str());
      |                               ^
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:782:30: error: variable 
'std::ifstream key_file' has initializer but incomplete type
  782 |       std::ifstream key_file(key_path.str());
      |                              ^~~~
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected 
unqualified-id before '&' token
  790 |     } catch (const std::ifstream::failure& e) {
      |                                          ^
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected ')' 
before '&' token
  790 |     } catch (const std::ifstream::failure& e) {
      |             ~                            ^
      |                                          )
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:42: error: expected '{' 
before '&' token
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:790:44: error: 'e' was not 
declared in this scope
  790 |     } catch (const std::ifstream::failure& e) {
      |                                            ^
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc: In function 'arrow::Status 
arrow::flight::ExampleTlsCertificateRoot(arrow::flight::CertKeyPair*)':
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:805:29: error: variable 
'std::ifstream cert_file' has initializer but incomplete type
  805 |     std::ifstream cert_file(path.str());
      |                             ^~~~
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: expected 
unqualified-id before '&' token
  814 |   } catch (const std::ifstream::failure& e) {
      |                                        ^
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: expected ')' 
before '&' token
  814 |   } catch (const std::ifstream::failure& e) {
      |           ~                            ^
      |                                        )
D:/a/arrow/arrow/cpp/src/arrow/flight/test_util.cc:814:40: error: expected '{' 
before 

[jira] [Created] (ARROW-16232) [C++] Include OpenTelemetry in LICENSE.txt

2022-04-19 Thread David Li (Jira)
David Li created ARROW-16232:


 Summary: [C++] Include OpenTelemetry in LICENSE.txt
 Key: ARROW-16232
 URL: https://issues.apache.org/jira/browse/ARROW-16232
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: David Li
 Fix For: 8.0.0


While I don't think we're distributing it yet, we shouldn't forget to do this.





[jira] [Created] (ARROW-16221) [C++][Docs] Provide more complete linking/CMake project example

2022-04-18 Thread David Li (Jira)
David Li created ARROW-16221:


 Summary: [C++][Docs] Provide more complete linking/CMake project 
example
 Key: ARROW-16221
 URL: https://issues.apache.org/jira/browse/ARROW-16221
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: David Li


While there's a minimal example of using CMake to link against Arrow, a fuller 
example (or two) showing some of the Arrow libraries, the bundled dependencies 
(in the static build), etc. would also be useful. 





[jira] [Created] (ARROW-16217) [C++][FlightRPC] Don't use ExecutionError in Flight SQL

2022-04-18 Thread David Li (Jira)
David Li created ARROW-16217:


 Summary: [C++][FlightRPC] Don't use ExecutionError in Flight SQL
 Key: ARROW-16217
 URL: https://issues.apache.org/jira/browse/ARROW-16217
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


{{ExecutionError}} is meant for Gandiva; we should use a more relevant error.





[jira] [Created] (ARROW-16216) [Python][FlightRPC] Fix test_flight.py when flight is not available

2022-04-18 Thread David Li (Jira)
David Li created ARROW-16216:


 Summary: [Python][FlightRPC] Fix test_flight.py when flight is not 
available
 Key: ARROW-16216
 URL: https://issues.apache.org/jira/browse/ARROW-16216
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Python
Reporter: Kouhei Sutou
Assignee: David Li


https://github.com/apache/arrow/pull/12749#discussion_r851671770

{{flight}} is {{None}} when Flight is not built, so the tests must not use the 
module at module level (that is, at import time).
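
A generic sketch of the pattern, independent of the actual test file: the module name below is hypothetical and chosen so that the import fails, and {{optional_import}} and {{make_server_class}} are names invented for the example.

```python
import importlib


def optional_import(name):
    """Return the module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical module name chosen so that the import fails.
flight = optional_import("flight_module_that_is_not_built")

# Fragile: touching the module at import time fails when it is None, e.g.
#     class Server(flight.FlightServerBase): ...


def make_server_class():
    """Defer every use of the optional module until it is actually needed."""
    if flight is None:
        raise RuntimeError("flight support is not available")

    class Server(flight.FlightServerBase):
        pass

    return Server


try:
    make_server_class()
    outcome = "built"
except RuntimeError as exc:
    outcome = str(exc)
print(outcome)
```

With the use deferred into a function (or a fixture), importing the test module succeeds even when the optional component is missing, and the tests can then be skipped cleanly.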





[jira] [Created] (ARROW-16215) [C++][FlightRPC] Segfault in TestBasicAuthHandler.FailUnauthenticatedCalls

2022-04-18 Thread David Li (Jira)
David Li created ARROW-16215:


 Summary: [C++][FlightRPC] Segfault in 
TestBasicAuthHandler.FailUnauthenticatedCalls
 Key: ARROW-16215
 URL: https://issues.apache.org/jira/browse/ARROW-16215
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: David Li


{noformat}
[ RUN  ] TestBasicAuthHandler.FailUnauthenticatedCalls
C:/projects/arrow/cpp/src/arrow/flight/client.cc:363: Close() failed: IOError: 
Flight returned unauthenticated error, with message: Invalid token. Detail: 
Unauthenticated. gRPC client debug context: 
{"created":"@1650191019.67300","description":"Error received from peer 
ipv4:127.0.0.1:1955","file":"D:\bld\grpc-cpp_1646464801475\work\src\core\lib\surface\call.cc","file_line":904,"grpc_message":"Invalid
 token. Detail: Unauthenticated","grpc_status":16}. Client context: OK. Detail: 
Unauthenticated
unknown file: error: SEH exception with code 0xc005 thrown in the test body.
{noformat}





[jira] [Created] (ARROW-16205) [C++][FlightRPC] Flight does not build in MacOS release verification

2022-04-15 Thread David Li (Jira)
David Li created ARROW-16205:


 Summary: [C++][FlightRPC] Flight does not build in MacOS release 
verification
 Key: ARROW-16205
 URL: https://issues.apache.org/jira/browse/ARROW-16205
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Kouhei Sutou
Assignee: David Li
 Fix For: 8.0.0


https://github.com/apache/arrow/pull/12749#issuecomment-1100388959





[jira] [Created] (ARROW-16173) [C++] Add benchmarks for temporal functions/kernels

2022-04-12 Thread David Li (Jira)
David Li created ARROW-16173:


 Summary: [C++] Add benchmarks for temporal functions/kernels
 Key: ARROW-16173
 URL: https://issues.apache.org/jira/browse/ARROW-16173
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


See ML: https://lists.apache.org/thread/bp2f036sgfj72o46yqmglnx20zfc6tfq





[jira] [Created] (ARROW-16162) [C++][FlightRPC] Flight does not build on Ubuntu 18.04

2022-04-11 Thread David Li (Jira)
David Li created ARROW-16162:


 Summary: [C++][FlightRPC] Flight does not build on Ubuntu 18.04
 Key: ARROW-16162
 URL: https://issues.apache.org/jira/browse/ARROW-16162
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Kouhei Sutou
Assignee: David Li


See this nightly for instance: 
https://github.com/ursacomputing/crossbow/runs/5953173410?check_suite_focus=true#step:5:8623





[jira] [Created] (ARROW-16149) [Python][FlightRPC] Expose UCX transport to Python

2022-04-07 Thread David Li (Jira)
David Li created ARROW-16149:


 Summary: [Python][FlightRPC] Expose UCX transport to Python
 Key: ARROW-16149
 URL: https://issues.apache.org/jira/browse/ARROW-16149
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Python
Reporter: David Li


The UCX transport lives in a separate shared library, which may complicate 
distribution (though for 8.0.0 we probably don't care about that yet).





[jira] [Created] (ARROW-16146) [C++] arrow-gcsfs-test is timing out

2022-04-07 Thread David Li (Jira)
David Li created ARROW-16146:


 Summary: [C++] arrow-gcsfs-test is timing out
 Key: ARROW-16146
 URL: https://issues.apache.org/jira/browse/ARROW-16146
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


{noformat}
The following tests FAILED:
101 - arrow-gcsfs-test (Timeout)
{noformat}

Appears to have started with [an unrelated minor 
PR|https://github.com/apache/arrow/commit/e047c9a6c9df565b86143036cc6bab26d3a59306].
 Observed on master and across several PRs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-16145) [C++] Vector kernels should implement or reject null_handling = INTERSECTION

2022-04-07 Thread David Li (Jira)
David Li created ARROW-16145:


 Summary: [C++] Vector kernels should implement or reject 
null_handling = INTERSECTION
 Key: ARROW-16145
 URL: https://issues.apache.org/jira/browse/ARROW-16145
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: David Li


As discovered in ARROW-13530, right now the framework will let you register a 
vector kernel with null_handling = INTERSECTION, but doesn't actually implement 
that (it'll preallocate but won't compute the result). We should either 
implement it, or decide it makes no sense and explicitly reject registering 
kernels with this null handling mode.
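
As a reference point for what an implementation would have to produce, here is a small sketch of intersection-style null handling on plain Python lists, with {{None}} standing in for a null slot: an output slot is valid only where every input slot is valid. The helper name is invented and this is not the kernel framework's code.

```python
def apply_with_intersection(func, *columns):
    """Apply func element-wise; a slot is null (None) if any input is null."""
    out = []
    for row in zip(*columns):
        if any(v is None for v in row):
            # Output validity is the AND of the input validities.
            out.append(None)
        else:
            out.append(func(*row))
    return out


left = [1, None, 3, 4]
right = [10, 20, None, 40]
summed = apply_with_intersection(lambda a, b: a + b, left, right)
print(summed)
```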





[jira] [Created] (ARROW-16135) [C++][FlightRPC] Investigate TSAN with gRPC/UCX tests

2022-04-06 Thread David Li (Jira)
David Li created ARROW-16135:


 Summary: [C++][FlightRPC] Investigate TSAN with gRPC/UCX tests
 Key: ARROW-16135
 URL: https://issues.apache.org/jira/browse/ARROW-16135
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


The gRPC Flight tests trigger lots of TSAN errors and the UCX Flight tests 
segfault inside UCX when TSAN is enabled. [This gRPC 
issue|https://github.com/grpc/grpc/issues/16749] is quite old, but suggests we 
need to build gRPC itself with TSAN. We should investigate these cases.





[jira] [Created] (ARROW-16127) [C++][FlightRPC] Improve concurrent call implementation in UCX client

2022-04-05 Thread David Li (Jira)
David Li created ARROW-16127:


 Summary: [C++][FlightRPC] Improve concurrent call implementation 
in UCX client
 Key: ARROW-16127
 URL: https://issues.apache.org/jira/browse/ARROW-16127
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


This currently relies on a pool of workers and endpoints; ideally we would be 
able to share a worker or even better multiplex multiple calls over a single 
endpoint (this would require wire protocol changes, however!). Care should be 
taken not to hurt performance if we do enable a multithreaded worker (which 
would be necessary, unless we switch to a model where all threads send work to 
a single worker thread).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-16126) [C++][FlightRPC] Pipeline memory allocation/registration

2022-04-05 Thread David Li (Jira)
David Li created ARROW-16126:


 Summary: [C++][FlightRPC] Pipeline memory allocation/registration
 Key: ARROW-16126
 URL: https://issues.apache.org/jira/browse/ARROW-16126
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


Where possible in the UCX transport, we should allocate and register buffers in 
the background instead of blocking the thread doing UCX work.
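
One possible shape for this, sketched with a thread pool and plain bytearrays: {{prepare_buffer}} is a stand-in for the allocate-and-register step, the append at the end stands in for the UCX send, and none of the names come from the transport itself.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_buffer(size):
    """Stand-in for allocating and registering a buffer (the slow part)."""
    return bytearray(size)


def send_all(payloads):
    """Fill and 'send' one buffer per payload, preparing the next in the background."""
    sent = []
    if not payloads:
        return sent
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(prepare_buffer, len(payloads[0]))
        for i, payload in enumerate(payloads):
            buf = pending.result()  # ideally ready by the time it is needed
            if i + 1 < len(payloads):
                # Kick off preparation of the next buffer before doing I/O.
                pending = pool.submit(prepare_buffer, len(payloads[i + 1]))
            buf[:] = payload
            sent.append(bytes(buf))  # stand-in for the actual send
    return sent


sent = send_all([b"abc", b"defgh", b"i"])
print(sent)
```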



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-16125) [C++][FlightRPC] Implement shutdown with deadline for UCX

2022-04-05 Thread David Li (Jira)
David Li created ARROW-16125:


 Summary: [C++][FlightRPC] Implement shutdown with deadline for UCX
 Key: ARROW-16125
 URL: https://issues.apache.org/jira/browse/ARROW-16125
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


The UCX server in ARROW-15706 does not implement shutdown with deadline.
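
The expected behaviour, roughly sketched with a plain thread (this is a toy, not the Flight server API; {{ToyServer}} and its parameters are invented): shutdown asks the server to stop, waits for in-flight work up to the deadline, and reports whether it drained in time.

```python
import threading
import time


class ToyServer:
    """Not the Flight API: a worker that finishes in-flight work on request."""

    def __init__(self, remaining_work_s):
        self._stop = threading.Event()
        self._remaining = remaining_work_s
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        self._stop.wait()
        time.sleep(self._remaining)  # in-flight calls draining

    def shutdown(self, deadline_s=None):
        """Request shutdown; return True if drained before the deadline."""
        self._stop.set()
        self._thread.join(timeout=deadline_s)
        return not self._thread.is_alive()


fast = ToyServer(remaining_work_s=0.01).shutdown(deadline_s=2.0)
slow = ToyServer(remaining_work_s=1.0).shutdown(deadline_s=0.05)
print(fast, slow)
```

A real implementation would presumably close the remaining connections forcibly once the deadline passes; the sketch only reports the timeout.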





[jira] [Created] (ARROW-16124) [C++][FlightRPC] UCX server should be able to shed load

2022-04-05 Thread David Li (Jira)
David Li created ARROW-16124:


 Summary: [C++][FlightRPC] UCX server should be able to shed load
 Key: ARROW-16124
 URL: https://issues.apache.org/jira/browse/ARROW-16124
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


The UCX server from ARROW-15706 will accept connections and put them into a 
queue to be handled. If they aren't handled quickly enough this can lead to a 
lot of clients stuck waiting for the server. The server should reject 
connections if too many pile up so the client can error or retry or connect to 
a different server. (This is a pitfall of gRPC/Java that we should avoid here.)
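
A minimal sketch of the idea with a bounded queue: once the backlog is full, new connections are refused immediately instead of being left to wait. {{AcceptQueue}} and the limit of 2 are illustrative assumptions, not part of the UCX server.

```python
import queue


class AcceptQueue:
    """Toy bounded queue of pending connections; not the UCX server code."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def offer(self, conn):
        """Return True if queued, False if the connection is rejected."""
        try:
            self._pending.put_nowait(conn)
            return True
        except queue.Full:
            return False  # shed load: tell the client right away

    def take(self):
        return self._pending.get_nowait()


accepts = AcceptQueue(max_pending=2)
results = [accepts.offer(f"conn-{i}") for i in range(4)]
handled = accepts.take()           # a worker picks up the oldest connection
retry = accepts.offer("conn-3")    # there is room again, so a retry succeeds
print(results, handled, retry)
```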



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-16116) [C++] Properly handle non-nullable fields in Parquet reading

2022-04-04 Thread David Li (Jira)
David Li created ARROW-16116:


 Summary: [C++] Properly handle non-nullable fields in Parquet 
reading
 Key: ARROW-16116
 URL: https://issues.apache.org/jira/browse/ARROW-16116
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li


ARROW-15961 found that the Parquet Arrow reader wasn't respecting the nullable 
aspect of fields. We need to ensure that when we reconstruct an array for a 
non-nullable field, it has no validity bitmap. We also need to add tests for 
this case: it is implicitly tested in a few places, but we should test it 
explicitly for all supported types.




