Could you resend your reply to https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60 on dev@ so that it is connected to the existing thread?

In <CAJdzkC2Rdz9wfM1_a3V4TqWF-U-3gs0TztHfdpkvKcxphdx=d...@mail.gmail.com>
  "Re: StreamReader" on Tue, 12 Jul 2022 10:01:00 +0200,
  L Ait <[email protected]> wrote:

> Thank you, I will look into that.
> The real problem is that I read data in chunks and the end of the chunk is
> truncated (not a complete line). I need to wait for the next chunk to have
> the line completion.
>
> Is there a way you suggest to process only the chunks smoothly?
>
> Thank you
>
>
> On Fri, 8 Jul 2022 at 03:37, Sutou Kouhei <[email protected]> wrote:
>
>> Answered on dev@:
>> https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60
>>
>> In <CAJdzkC04+Uxa6bdmozPQFDkQ07M4Q=fmuhh2gvqzz-na2lm...@mail.gmail.com>
>>   "StreamReader" on Sat, 2 Jul 2022 16:04:45 +0200,
>>   L Ait <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I need help to integrate arrow cpp in my current project. In fact I built
>> > cpp library and can call api.
>> >
>> > What I need is that:
>> >
>> > I have a c++ project that reads data by chunks then uses some erasure code
>> > to rebuild original data.
>> >
>> > The rebuild is done in chunks. At each iteration I can access a buffer of
>> > rebuilt data.
>> >
>> > My need is to pass this data as a stream to arrow process then send the
>> > processed stream.
>> >
>> > For example if my original file is a csv and I would like to filter and
>> > save first column:
>> >
>> > file
>> >
>> > col1,col2, col3, col3
>> > a1,b1,c1,d1
>> > an,bn,cn,dn
>> >
>> > split to 6 chunks of equal sizes chunk1:
>> >
>> > a1,b1,c1,d1
>> > ak,bk
>> >
>> > chunk2:
>> >
>> > ck,dk
>> > ...
>> > am,bm,cm,dm
>> >
>> > and so on.
>> >
>> > My question is how to use the right StreamReader in arrow and how this
>> > deals with incomplete records (lines) at the beginning and end of each
>> > chunk?
>> >
>> > Here a snippet of code I use:
>> > buffer_type_t res = fut.get0();
>> > BOOST_LOG_TRIVIAL(trace) <<
>> > "RawxBackendReader: Got result with buffer size: " << res.size();
>> > std::shared_ptr<arrow::io::InputStream> input;
>> >
>> > std::shared_ptr<arrow::io::BufferReader> buffer(new arrow::io::BufferReader(
>> > reinterpret_cast<const uint8_t*>(res.get()), res.size()));
>> > input = buffer;
>> > BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
>> >
>> > ArrowFilter arrow_filter = ArrowFilter(input);
>> > arrow_filter.ToCsv();
>> >
>> >
>> > result.push_back(std::move(res));
>> >
>> > Thank you
>>
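
For the truncated-line problem quoted above, one possible approach that does not depend on any Arrow API is to keep the trailing partial line of each chunk and prepend it to the next chunk, so that only complete lines are passed on. The following is a minimal sketch in plain C++. LineReassembler and FirstColumn are hypothetical names made up for illustration, not Arrow classes, and the sketch assumes '\n' line endings and no quoted CSV fields containing newlines or commas.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): buffers the incomplete line at the
// end of a chunk until the next chunk supplies the rest of it.
class LineReassembler {
 public:
  // Appends `chunk` to the pending bytes and returns every line that is now
  // complete, without its trailing '\n'. Bytes after the last '\n' stay
  // buffered for the next call.
  std::vector<std::string> Feed(std::string_view chunk) {
    pending_.append(chunk.data(), chunk.size());
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (;;) {
      const std::size_t end = pending_.find('\n', start);
      if (end == std::string::npos) break;
      lines.emplace_back(pending_, start, end - start);
      start = end + 1;
    }
    pending_.erase(0, start);  // drop what was emitted, keep the partial line
    return lines;
  }

  // Call once after the last chunk. Returns the final line if the input did
  // not end with '\n', otherwise an empty string.
  std::string Flush() {
    std::string rest = std::move(pending_);
    pending_.clear();
    return rest;
  }

 private:
  std::string pending_;
};

// Returns the text before the first ',' (the whole line if there is none).
// Assumption: fields are not quoted.
std::string FirstColumn(const std::string& line) {
  return line.substr(0, line.find(','));
}
```

With chunks like the ones in the quoted example, Feed() on chunk1 would return the complete lines and hold back "ak,bk"; Feed() on chunk2 would then return "ak,bk,ck,dk" first. The complete lines could be joined and wrapped in an arrow::io::BufferReader as in the quoted snippet. Alternatively, exposing the rebuilt data as one continuous arrow::io::InputStream instead of one BufferReader per chunk may let the CSV reader handle line boundaries itself, but I have not checked that against this setup.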
