I think this is a gap in the streaming reader library: DATE isn't supported, and there is a check to ensure that logical types aren't ignored.
This can be worked around by updating the map:
https://github.com/apache/arrow/blob/main/cpp/src/parquet/stream_reader.cc#L40

Or by defining a new type/overload to read the values into. My preference would be to make this explicit if possible (i.e. a new overload), but either is likely reasonable.

Thanks,
Micah

On Saturday, November 18, 2023, <[email protected]> wrote:

> Thank you for the tip, Micah. It turns out I hadn’t built Arrow with any
> compression support, and my file used compression. After rebuilding with
> compression, I was able to read the file. The next problem is how to cast
> a Parquet date (format: 2023-11-18) column in C++. The rest of my code:
>
> parquet::StreamReader stream{ parquet::ParquetFileReader::Open(infile) };
>
> int date;
> //int32_t date;
> std::string symbol;
>
> while (!stream.eof())
> {
>     stream >> date >> parquet::EndRow;
>     // ...
> }
>
> Error occurred: Column converted type mismatch. Column 'date' has
> converted type 'DATE' not 'INT_32'.
>
> I tried int, int32_t (same as int), time_t (long).
>
> Kind regards,
>
> Nick
>
> From: Micah Kornfield <[email protected]>
> Sent: Friday, November 17, 2023 17:11
> To: [email protected]
> Subject: Re: [C++][Parquet] Unable to read memory??
>
> I read this error as the program crashing because the code is throwing
> an exception that isn't being caught. Can you add code to catch the
> exception and print the error message, which might be more informative?
>
> Thanks,
> Micah
>
> On Friday, November 17, 2023, <[email protected]> wrote:
>
> These were the steps I followed:
>
> 1. Download from GitHub - https://github.com/apache/arrow - unzip it
> 2. Open Developer PowerShell for VS 2022 as Administrator
> 3. cd D:\path_to_arrow\14.0.1
> 4. cd .\cpp
> 5. mkdir build
> 6. cd build
> 7. cmake .. -G "Visual Studio 17 2022" -A x64 -DARROW_BUILD_TESTS=ON -DARROW_PARQUET=ON
> 8. Open arrow.sln files in build folder in VS
> 9. Build ALL_BUILD
> 10. Copy arrow.dll, arrow.pdb, parquet.dll, parquet.pdb files to Debug
>     folder of project
>
> In VS project Solution Explorer > Properties:
>
> 1. C/C++ > General > Additional Include Directories: add src directory
> 2. Linker > General > Additional Library Directories: add
>    build/release/Debug directory
> 3. Linker > Input > Additional Dependencies: arrow.lib;parquet.lib
>
> CMake is version 3.27.7
>
> -----Original message-----
> From: Bryce Mecum <[email protected]>
> Sent: Thursday, November 16, 2023 20:43
> To: [email protected]
> Subject: Re: [C++][Parquet] Unable to read memory??
>
> Your code is correct, so I think something else is going on. Can you give
> us more details about your environment, such as how you're getting the
> Arrow C++ DLLs (nuget, conda, building from source) and how you're
> compiling your program?
>
> On Thu, Nov 16, 2023 at 4:27 AM <[email protected]> wrote:
> >
> > Hi,
> >
> > I’m trying to get Parquet to work in C++. I have the following code:
> >
> > #include "arrow/io/api.h"
> > #include "parquet/arrow/reader.h"
> > #include "arrow/io/file.h"
> > #include "parquet/stream_reader.h"
> >
> > int main()
> > {
> >     std::shared_ptr<arrow::io::ReadableFile> infile;
> >
> >     PARQUET_ASSIGN_OR_THROW(
> >         infile,
> >         arrow::io::ReadableFile::Open("D:/path_to_parquet_file/file.parquet"));
> > }
> >
> > I get an error on PARQUET_ASSIGN_OR_THROW. It seems to be unable to read
> > memory. Exception that I’m getting:
> >
> > Unhandled exception at 0x00007FFE2866CF19 in cpp.exe: Microsoft C++
> > exception: parquet::ParquetStatusException at memory location
> > 0x000000648DCFFC60.: parquet::ParquetStatusException at memory location
> > 0x000000648DCFFC60.
> >
> > What is wrong with this code? I’m using VS Community 2022 and Windows 10
> > 64bit.
> >
> > Kind regards,
> >
> > Nick
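
On the DATE question raised in the thread: Parquet stores a DATE column as a 32-bit integer counting days since 1970-01-01, so once the raw value has been read by one of the workarounds mentioned in the reply, it still needs converting to a calendar date. The following is only a minimal sketch of that conversion using the C++ standard library. The helper name FormatParquetDate is hypothetical and not part of Arrow or Parquet, and the sketch assumes a 64-bit time_t.

```cpp
#include <cstdint>
#include <ctime>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Arrow or Parquet): formats a Parquet DATE
// value, i.e. the number of days since 1970-01-01, as "YYYY-MM-DD" in UTC.
std::string FormatParquetDate(std::int32_t days_since_epoch) {
  // Convert days to seconds since the Unix epoch (assumes 64-bit time_t).
  const std::time_t seconds =
      static_cast<std::time_t>(days_since_epoch) * 86400;

  // std::gmtime returns a pointer to a shared static buffer (not thread-safe),
  // or nullptr if the value cannot be represented on this platform.
  const std::tm* utc = std::gmtime(&seconds);
  if (utc == nullptr) {
    throw std::out_of_range("date is outside the range supported by gmtime");
  }

  // "YYYY-MM-DD" needs 10 characters plus the terminating null.
  char buffer[16];
  if (std::strftime(buffer, sizeof(buffer), "%Y-%m-%d", utc) == 0) {
    throw std::runtime_error("failed to format date");
  }
  return std::string(buffer);
}
```

For example, FormatParquetDate(19679) gives "2023-11-18", the date format mentioned in the thread. Dates before 1970 are negative day counts, and some platforms' gmtime rejects them, which is why the null check is there.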
