No CSV in these instances.

On Wed., 24 Mar. 2021, 2:26 pm Micah Kornfield, <[email protected]> wrote:

> What is the source of the record batch? There was a patch since 3.0 that
> fixed some potential memory corruption when reading parquet in certain
> scenarios (but from the description it doesn't sound like libparquet is
> being used?)
>
> On Tue, Mar 23, 2021 at 8:04 PM Matt Youill <[email protected]>
> wrote:
>
>> So this seems to be caused by the variable in memory_pool.cc:
>>
>> const util::optional<MemoryPoolBackend> user_selected_backend = UserSelectedBackend();
>>
>> being (or becoming) garbage.
>>
>> For some reason, after a few Gandiva batch evaluations
>> user_selected_backend is no longer "jemalloc" but "system" (probably
>> actually just null because "system" is 0) and after a while it isn't valid
>> at all and crashes.
>>
>> There aren't multiple copies of Arrow AFAICT but I do have two apps using
>> arrow. Both use libarrow.a, libarrow-glib.a and libgandiva.a... one (that
>> I'm not super familiar with) shows the above behavior and the other doesn't.
>>
>> On 22/3/21 10:27 pm, Matt Youill wrote:
>>
>> Could be the build creating multiple Arrows I suppose. It's a mixture of
>> quite an old Makefile calling cmake to build arrow and arrow c lib.
>>
>> Will double check.
>>
>> Thanks, Matt
>>
>> On Mon., 22 Mar. 2021, 9:35 pm Antoine Pitrou, <[email protected]>
>> wrote:
>>
>>> On Mon, 22 Mar 2021 19:34:19 +1100
>>> Matt Youill <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > Not sure if anyone knows anything about this, but am getting a strange
>>> > error when evaluating a record batch with a gandiva filter...
>>> >
>>> > __GI_raise 0x00007f2b8f01718b
>>> > __GI_abort 0x00007f2b8eff6859
>>> > arrow::util::ArrowLog::~ArrowLog() 0x000056309fe94c12
>>> > arrow::default_memory_pool() 0x000056309fd6fff4
>>> > gandiva::Annotator::PrepareEvalBatch(arrow::RecordBatch const&,
>>> > std::vector<std::shared_ptr<arrow::ArrayData>,
>>> > std::allocator<std::shared_ptr<arrow::ArrayData> > > const&)
>>> > 0x000056309facdfce
>>> > gandiva::LLVMGenerator::Execute(arrow::RecordBatch const&,
>>> > std::vector<std::shared_ptr<arrow::ArrayData>,
>>> > std::allocator<std::shared_ptr<arrow::ArrayData> > > const&)
>>> > 0x000056309faa66a2
>>> > gandiva::Filter::Evaluate(arrow::RecordBatch const&,
>>> > std::shared_ptr<gandiva::SelectionVector>) 0x000056309fa9ea1d
>>> >
>>> > The error reported is "Internal error: cannot create default memory pool"
>>> >
>>> > I'm using jemalloc
>>> >
>>> > Not even really sure how a call to arrow::default_memory_pool() can
>>> > fail? This is only occurring in a release build if that helps?
>>>
>>> This logically should not happen. How did you compile Arrow and
>>> Gandiva? Do you have two versions of Arrow lying around perhaps?
