Re: ADBC - OS-level driver manager

2024-03-20 Thread Wenbo Hu
On Wed, Mar 20, 2024 at 22:03, Wenbo Hu wrote: > > Hi David, > > I've been working on xDBC with Arrow for a while. I have some thoughts on > ODBC. > > We connect to the DBMS in Arrow stream using Python through four > different methods: JDBC, ADBC, ODBC, and the Python DB client li

Re: ADBC - OS-level driver manager

2024-03-20 Thread Wenbo Hu
e there been any > discussions of ADBC having a similar system-wide driver registration paradigm > like ODBC does? -- ----- Best Regards, Wenbo Hu,

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-31 Thread Wenbo Hu
in > the ticket if you are interested. In the meantime, the only workaround I > can think of is probably to slow down the data source enough that the queue > doesn't fill up. > > [1] https://github.com/apache/arrow/issues/36951 > > > On Sun, Jul 30, 2023 at 8:15 PM Wenbo Hu

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-30 Thread Wenbo Hu
om_batches(schema, rb_generator(64, 32768, 100)) local_fs = pa.fs.LocalFileSystem() pa.dataset.write_dataset( reader, "/tmp/data_f", format="feather", partitioning=["bucket"], filesystem=local_fs, existing_data_be

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-30 Thread Wenbo Hu
ble that has thousands of waiters and is constantly doing a > notify_all. > > I think we will need to figure out some kind of reproducible test case. I > will try and find some time to run some experiments on Monday. Maybe I can > reproduce this by setting the backpressure limi

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-28 Thread Wenbo Hu
supply of data is slower than your writer and I wouldn't expect memory to > accumulate. These things are solutions but might give us more clues into > what is happening. > > [1] > https://unix.stackexchange.com/questions/300106/why-is-the-oom-killer-killing-processes-when-swap-

dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-27 Thread Wenbo Hu
. -- - Best Regards, Wenbo Hu,

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
xt(); }); }}; ac::Declaration source{"record_batch_source", std::move(rb_source_options)}; ``` Works as expected. On Wed, Jul 26, 2023 at 10:22, Wenbo Hu wrote: > > Hi, > I'll open an issue on the DeclarationToReader problem. > I think the key problem is that the input stream is unordered. Th

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
lready in memory (the in-memory sources do set the batch index). > > I think your understanding of the concept is correct however. Can you > share a sample plan that is not working for you? If you use > DeclarationToTable do you get consistently ordered results? > > On Tue, J

Re: how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
cannot follow the input batch order? Then how does Substrait work in this scenario? Does it also produce out-of-order output? On Tue, Jul 25, 2023 at 19:12, Wenbo Hu wrote: > > Hi, > I'm trying to zip two streams with the same order but different processes. > For example, the original stream comes wit

how to make acero output order by batch index

2023-07-25 Thread Wenbo Hu
that keep the order as the original input? -- - Best Regards, Wenbo Hu,

Re: Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Wenbo Hu
rrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout > [2] https://github.com/apache/arrow/issues/36123 > > > On Mon, Jul 17, 2023 at 4:44 PM Wenbo Hu wrote: > > > Hi, > > I'm using Acero as the stream executor to run large scale data >

Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Wenbo Hu
::fixed_size_binary(32)`, then how can I write directly to the out buffers, and what type should I actually get from `GetValues`? Maybe `auto *out_values = out->array_span_mutable()->GetValues<uint8_t *>(1);` and `memcpy(*out_values++, some_ptr, 32);`? -- - Best Regar

Re: Traffic control and cancel detect on flight do_get in python implementation

2023-07-07 Thread Wenbo Hu
Sorry, my bad. It works over time. It seems that gRPC starts with a default window size, then updates to a stable value according to the options. On Fri, Jul 7, 2023 at 23:32, Wenbo Hu wrote: > > Both my server and client are implemented in Python now; a Java client > may come in the future. > Back pr

Re: Traffic control and cancel detect on flight do_get in python implementation

2023-07-07 Thread Wenbo Hu
tions to tweak this, IIRC) > > On Thu, Jul 6, 2023, at 23:18, Wenbo Hu wrote: > > Hi, > > I'm using Arrow Flight to transfer data in a distributed system, but > > the lightning speed makes both client and server face out-of-memory > > issues. > > For do_put and

Traffic control and cancel detect on flight do_get in python implementation

2023-07-06 Thread Wenbo Hu
for client download data, or is there any better way to implement that? -- - Best Regards, Wenbo Hu,

Re: detect memory leak between java and python

2023-06-30 Thread Wenbo Hu
exported record batch, destroying the Python RecordBatch calls the > record batch's release callback. > > Regards > > Antoine. > > > > > > > On 29/06/2023 at 15:05, Wenbo Hu wrote: > > Thanks for your explanation, Antoine. > > > > I figured out

Re: detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
lives allocator (as long as the consumer/callback), code works as expected. On Thu, Jun 29, 2023 at 17:55, Antoine Pitrou wrote: > > > On 29/06/2023 at 09:50, Wenbo Hu wrote: > > Hi, > > > > I'm using JPype to pass streams between Java and Python back and forth. > > > &

Re: detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
org.apache.arrow.c.Data.exportArrayStream(allocator, r, s) with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream: for rb in stream: # type: pa.RecordBatch writer.write(rb) del rb del writer ``` Wenbo Hu

detect memory leak between java and python

2023-06-29 Thread Wenbo Hu
referenced by downstream users. Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed RecordBatchReader work with other pyarrow APIs (dataset)? -- --------- Best Regards, Wenbo Hu,

Add limit and offset to ScannerOption

2023-06-02 Thread Wenbo Hu
ption of dataset may need to have a dedicated implementation rather than directly calling compute to filter. Furthermore, Acero may also benefit from this feature for scansink. Or are there any other ideas for this situation? -- --------- Best Regards, Wenbo Hu,

Best practice on populating from VectorSchemaRoot to VectorSchemaRoot, ArrowStreamReader to ArrowStreamWriter

2023-04-03 Thread Wenbo Hu
making an intermediate ArrowRecordBatch unnecessarily? (ArrowBuf -> VectorSchemaRoot@UpstreamReader -> ArrowBuf@Loader -> VectorSchemaRoot@DownstreamWriter -> ArrowBuf) Maybe it relates to the allocator; is there a better implementation when both sides share the same allocator? -- - Best Regards, Wenbo Hu,

Exposing Mutual TLS to java flight server

2023-03-23 Thread Wenbo Hu
is supported in C++/Python (https://github.com/apache/arrow/search?q=mtls). Is there any plan to expose mTLS in the Java implementation? -- - Best Regards, Wenbo Hu,

Re: [DISCUSS][FLIGHT SQL] Intentions around JDBC and/or ODBC for Flight SQL?

2021-12-14 Thread Wenbo Hu
906 > > [2] https://github.com/apache/arrow/pull/11507 > > [3] https://issues.apache.org/jira/browse/ARROW-7744 -- - Best Regards, Wenbo Hu,

Re: [C++] Decouple Flight RPC from GRPC

2021-09-03 Thread Wenbo Hu
implementation, but fake certain parts of gRPC since you're doing your own > in-process proxying/translation? If so the implementation would be different > than what we would do for a truly new transport. Also, it would effectively > mean exposing the internals of Flight/gRPC as part of the AP

[C++] Decouple Flight RPC from GRPC

2021-09-03 Thread Wenbo Hu
Hi all, I've just posted an issue [ARROW-13889] on Jira, as below. Maybe this is the right place to discuss it. I'm trying to implement Flight RPC on an RPC framework with protobuf message support in a distributed system. However, Flight RPC is tied to gRPC. Classes from gRPC used in Flight