Wenbo Hu wrote on Wed, Mar 20, 2024 at 22:03:
>
> Hi David,
>
> I've been working on xDBC with Arrow for a while. I have some thoughts on
> ODBC.
>
> We connect to the DBMS in Arrow stream using Python through four
> different methods: JDBC, ADBC, ODBC, and the Python DB client library.
> [...] Have there been any
> discussions of ADBC having a similar system-wide driver registration paradigm
> like ODBC does?
--
Best Regards,
Wenbo Hu,

> [...] in
> the ticket if you are interested. In the meantime, the only workaround I
> can think of is probably to slow down the data source enough that the queue
> doesn't fill up.
>
> [1] https://github.com/apache/arrow/issues/36951
>
>
> On Sun, Jul 30, 2023 at 8:15 PM Wenbo Hu wrote:

```python
reader = pa.RecordBatchReader.from_batches(schema,
                                           rb_generator(64, 32768, 100))
local_fs = pa.fs.LocalFileSystem()
pa.dataset.write_dataset(
    reader,
    "/tmp/data_f",
    format="feather",
    partitioning=["bucket"],
    filesystem=local_fs,
    existing_data_behavior=...,  # value truncated in the archive
)
```
> [...] a condition variable that has thousands of waiters and is constantly
> doing a notify_all.
>
> I think we will need to figure out some kind of reproducible test case. I
> will try and find some time to run some experiments on Monday. Maybe I can
> reproduce this by setting the backpressure limit [...]
> [...] supply of data is slower than your writer and I wouldn't expect memory to
> accumulate. These things are solutions but might give us more clues into
> what is happening.
>
> [1]
> https://unix.stackexchange.com/questions/300106/why-is-the-oom-killer-killing-processes-when-swap-
--
Best Regards,
Wenbo Hu,
```cpp
// [...] (preceding generator lambda truncated in the archive)
ac::Declaration source{"record_batch_source", std::move(rb_source_options)};
```
Works as expected.
Wenbo Hu wrote on Wed, Jul 26, 2023 at 10:22:
>
> Hi,
> I'll open an issue on the DeclarationToReader problem.
> I think the key problem is that the input stream is unordered. [...]

> [...] already in memory (the in-memory sources do set the batch index).
>
> I think your understanding of the concept is correct however. Can you
> share a sample plan that is not working for you? If you use
> DeclarationToTable do you get consistently ordered results?
>
> On Tue, J[...]

[...] cannot follow the input batch order?
Then how does Substrait work in this scenario? Does it also output
out of order?
Wenbo Hu wrote on Tue, Jul 25, 2023 at 19:12:
>
> Hi,
> I'm trying to zip two streams with the same order but different processing.
> For example, the original stream comes with [...]

[...] that keeps the order of the original
input?
--
Best Regards,
Wenbo Hu,
> [1] https://arrow.apache.org/docs/format/Columnar.html#buffer-listing-for-each-layout
> [2] https://github.com/apache/arrow/issues/36123
>
>
> On Mon, Jul 17, 2023 at 4:44 PM Wenbo Hu wrote:
>
> > Hi,
> > I'm using Acero as the stream executor to run large scale data [...]
[...] `::fixed_size_binary(32)`, then
how can I directly write to the output buffers, and what is the actual
type I should get from `GetValues`?
Maybe `auto *out_values = out->array_span_mutable()->GetValues<uint8_t>(1);`
and `memcpy(out_values, some_ptr, 32); out_values += 32;`?
--
Best Regards,
Wenbo Hu,
Sorry, my bad. It works over time.
It seems that gRPC starts with a default window size, then updates
to a stable value according to the options.
Wenbo Hu wrote on Fri, Jul 7, 2023 at 23:32:
>
> Both my server and client are implemented in Python now; a Java client
> may come in the future.
> Back pressure [...]
[...] options to tweak this, IIRC)
>
> On Thu, Jul 6, 2023, at 23:18, Wenbo Hu wrote:
> > Hi,
> > I'm using arrow flight to transfer data in distributed system, but
> > the lightning speed makes both client and server faces out of memory
> > issue.
> > For do_put and [...]
[...] for clients to
download data, or is there any better way to implement that?
--
Best Regards,
Wenbo Hu,
> [...] exported record batch, destroying the Python RecordBatch calls the
> record batch's release callback.
>
> Regards
>
> Antoine.
>
> On 29/06/2023 15:05, Wenbo Hu wrote:
> > Thanks for your explanation, Antoine.
> >
> > I figured out [...]

[...] the allocator lives
as long as the consumer/callback, and the code works as expected.
Antoine Pitrou wrote on Thu, Jun 29, 2023 at 17:55:
>
>
> On 29/06/2023 09:50, Wenbo Hu wrote:
> > Hi,
> >
> > I'm using Jpype to pass streams between Java and Python back and forth.
> >
> > [...]
```python
# (snippet truncated; the Java side is driven through Jpype)
org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)
with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
    for rb in stream:  # type: pa.RecordBatch
        writer.write(rb)
        del rb
del writer
```
Wenbo Hu [...]

[...] referenced by downstream users.
Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
RecordBatchReader work with other pyarrow APIs (dataset)?
--
Best Regards,
Wenbo Hu,
[...] of dataset may need
to have a dedicated implementation rather than directly calling compute
to filter. Furthermore, Acero may also benefit from this feature for
scansink.
Or any other ideas for this situation?
--
Best Regards,
Wenbo Hu,
[...] making an intermediate ArrowRecordBatch
unnecessarily? (ArrowBuf -> VectorSchemaRoot@UpstreamReader ->
ArrowBuf@Loader -> VectorSchemaRoot@DownstreamWriter -> ArrowBuf)
Maybe it relates to the allocator; are there any better implementations
using the same allocator?
--
Best Regards,
Wenbo Hu,
[...] mTLS is supported in C++/Python
(https://github.com/apache/arrow/search?q=mtls). Is there any plan to
expose mTLS in the Java implementation?
--
Best Regards,
Wenbo Hu,
> > [1] [...]906
> > [2] https://github.com/apache/arrow/pull/11507
> > [3] https://issues.apache.org/jira/browse/ARROW-7744
--
Best Regards,
Wenbo Hu,
> [...] implementation, but fake certain parts of gRPC since you're doing your own
> in-process proxying/translation? If so the implementation would be different
> than what we would do for a truly new transport. Also, it would effectively
> mean exposing the internals of Flight/gRPC as part of the API. [...]
Hi all,
I've just posted an issue [ARROW-13889] on JIRA, as below. Maybe this is
the right place to discuss it.
I'm trying to implement Flight RPC on an RPC framework with protobuf
message support in a distributed system.
However, Flight RPC is tied to gRPC.
Classes from gRPC used in Flight [...]