Thanks for your effort. I'd like to help with the code review.

Best,
Liya Fan

On Fri, May 7, 2021 at 5:20 PM Joris Peeters <[email protected]> wrote:

> https://issues.apache.org/jira/browse/ARROW-12679
>
> On Fri, May 7, 2021 at 8:54 AM Joris Peeters <[email protected]>
> wrote:
>
>> Fair enough.
>> I have this data moving through a few different servers and clients, in
>> IPC streaming format, consumed on various platforms/languages. The
>> nullability in the schema is often used in "language-friendly" clients,
>> e.g. to build a `std::vector<bool>` or `std::vector<std::optional<bool>>`
>> depending on whether the bit column is nullable, so preserving this
>> information is quite important, even if locally in Java it makes little
>> difference.
>>
>> I've worked around it for now by fudging the VectorSchemaRoot's schema
>> myself, but I'll open a JIRA to track, and I'll assign it to myself and
>> provide a fix.
>>
>> Cheers!
>> -Joris.
>>
>>
>> On Fri, May 7, 2021 at 3:22 AM Fan Liya <[email protected]> wrote:
>>
>>> Hi Joris,
>>>
>>> I think you are right.
>>>
>>> We only use the nullability information in the consumers, because it
>>> makes a difference in performance.
>>>
>>> The nullability information in the schema is not accurate, as you have
>>> observed.
>>> However, such information is not well-used in the Java implementation
>>> (IMHO). For example, the validity buffer is allocated even if the vector is
>>> non-nullable.
>>>
>>> That said, I think it would be better to keep the nullability
>>> information in sync.
>>> So maybe we can open a JIRA to track it?
>>>
>>> Best,
>>> Liya Fan
>>>
>>>
>>> On Thu, May 6, 2021 at 3:09 PM Joris Peeters <[email protected]>
>>> wrote:
>>>
>>>> Hello Fan,
>>>>
>>>> Yes, but it seems that code path only affects the consumers, and
>>>> whether they set a value in the vector or not, see e.g.
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
>>>> However, the VectorSchemaRoot's schema, defined I believe at
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
>>>> does not appear to use this info, and just sets every column's nullability
>>>> to true (as per the link in my original email).
>>>>
>>>> Note that we are indeed using the ArrowVectorIterator, and it's when
>>>> iterating over the iterator and inspecting the schema of the elements
>>>> (VectorSchemaRoot) that I notice this.
>>>> Maybe all this needs is an `isColumnNullable(i, ..)` instead of `true`
>>>> in `final FieldType fieldType = new FieldType(true, arrowType, /*
>>>> dictionary encoding */ null, metadata);`.
>>>>
>>>> Cheers,
>>>> -J
>>>>
>>>> On Thu, May 6, 2021 at 5:53 AM Fan Liya <[email protected]> wrote:
>>>>
>>>>> Hi Joris,
>>>>>
>>>>> Thanks for reporting the problem.
>>>>>
>>>>> We make use of the nullable information
>>>>> in ArrowVectorIterator#initialize. (Details can be found in
>>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>>>>> )
>>>>>
>>>>> Please note that the ArrowVectorIterator is our encouraged way of
>>>>> using the JDBC adapter.
>>>>>
>>>>> Best,
>>>>> Liya Fan
>>>>>
>>>>>
>>>>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I would need to look further, but I thought we handled null vs not
>>>>>> null. At least I thought we had specialized conversion code to avoid
>>>>>> branches. If this isn't the case it seems reasonable to contribute a
>>>>>> patch.
>>>>>>
>>>>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>>>>> Server into Arrow record batches.
>>>>>>>
>>>>>>> At first glance the Arrow JDBC adapter seems to work well but,
>>>>>>> unless I'm mistaken, it simply makes every vector nullable,
>>>>>>> irrespective of whether the corresponding SQL column is nullable or not.
>>>>>>>
>>>>>>> I think the line
>>>>>>>
>>>>>>> final FieldType fieldType = new FieldType(true, arrowType, /*
>>>>>>> dictionary encoding */ null, metadata);
>>>>>>>
>>>>>>> in
>>>>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>>>>> might be the cause here.
>>>>>>>
>>>>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>>>>> transferred, or is this something I could contribute in a PR? (which
>>>>>>> I'd be happy to) I guess it's just a matter of inspecting the result
>>>>>>> metadata.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -J
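
For illustration, the change proposed in this thread (replacing the hard-coded `true` with a nullability check based on the result metadata) could look roughly like the sketch below. It is only a sketch under stated assumptions, not the code merged for ARROW-12679: the class name `NullabilitySketch` and the single-argument `isColumnNullable(int)` signature are hypothetical, and only the `java.sql.ResultSetMetaData` constants come from the JDK. The sketch treats a column as nullable unless the driver explicitly reports `columnNoNulls`, so an "unknown" answer stays on the safe side.

```java
import java.sql.ResultSetMetaData;

// Hypothetical helper class, named for this example only.
class NullabilitySketch {

    /**
     * Maps the int returned by ResultSetMetaData#isNullable(int) to a boolean
     * that could be passed as the first argument of an Arrow FieldType.
     * Assumption: "unknown" is treated as nullable, the conservative choice.
     */
    static boolean isColumnNullable(int jdbcNullability) {
        return jdbcNullability != ResultSetMetaData.columnNoNulls;
    }

    public static void main(String[] args) {
        // A NOT NULL column maps to a non-nullable field.
        System.out.println(isColumnNullable(ResultSetMetaData.columnNoNulls));         // false
        // A nullable column stays nullable.
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullable));        // true
        // The driver could not tell, so assume nulls may occur.
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullableUnknown)); // true
    }
}
```

In the adapter, the returned boolean would take the place of `true` in `new FieldType(true, arrowType, /* dictionary encoding */ null, metadata)`, with the input obtained from `ResultSetMetaData#isNullable(int)` for the column in question.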
