[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support
nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-673589031

Merged as https://github.com/apache/arrow/commit/80a9c027b7c356f25a4c22e71587936a54959db6, not sure why the merge tool didn't close the issue.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support
nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-670899165

@sunchao @andygrove (CC @wesm @kszucs @emkornfield) over the past few months we haven't had enough review bandwidth on Rust's Parquet implementation (mostly relying on Chao for non-trivial reviews). Given the amount of work needed for an Arrow writer and the interest so far (I think a few people are already using this fork), I'd like to propose:

* We create a temporary branch in the apache/arrow repo, where the Arrow writer can temporarily live
* We can merge changes into the branch, especially if there aren't enough reviewers at the time
* When we're close to a release, we merge what's on the temp branch into the branch that's currently called `master` but will be renamed soon

In terms of this PR, I think I've gotten arbitrary nesting covered, but there's a lot more work that we can now divide more easily so others can contribute better. I'm also unsure of how to test deeply nested arrays directly in the code (I had to use Spark because the Arrow reader doesn't yet support that).

I'll also bring this up on the mailing list for wider visibility.
[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support
nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-670878573

> ... One thing to note which is currently a bug in C++ is once rep/def levels are computed for any anything
> with deep nesting (any leaf column one or more direct struct/group ancestor), nullness should be determined rep/def-levels and not leaf-arrays (this is currently a bug in C++).

@emkornfield I'm not sure I understand this part. I'll try to create a nested batch with a few levels, and have one record have the top level be nested. Would this cover the case above? I might also be limited by https://issues.apache.org/jira/browse/ARROW-5408 for now.
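One way to read the quoted remark is sketched below in plain Rust, without the arrow or parquet crates. It assumes a hypothetical column `a.b` where `a` is a nullable struct and `b` is a nullable leaf, so the maximum definition level is 2. The function names and the two-level schema are illustrative assumptions, not code from this PR. The point is that a leaf slot can look valid in its own validity bitmap even though its parent struct is null, so nullness is better read from the definition level.

```rust
/// Maximum definition level for the hypothetical column `a.b`:
/// one level for the nullable struct `a`, one for the nullable leaf `b`.
const MAX_DEF_LEVEL: i16 = 2;

/// Compute one definition level per row from the validity of the parent
/// struct and the validity of the leaf array.
fn definition_levels(struct_valid: &[bool], leaf_valid: &[bool]) -> Vec<i16> {
    struct_valid
        .iter()
        .zip(leaf_valid)
        .map(|(&s, &l)| match (s, l) {
            // Parent struct is null: the leaf's own validity is irrelevant.
            (false, _) => 0,
            // Struct is present but the leaf value is null.
            (true, false) => 1,
            // Both are present: the value is defined at the maximum level.
            (true, true) => 2,
        })
        .collect()
}

/// A value is non-null only when its definition level reaches the maximum.
fn non_null_mask(levels: &[i16]) -> Vec<bool> {
    levels.iter().map(|&d| d == MAX_DEF_LEVEL).collect()
}
```

With `struct_valid = [true, false, true]` and `leaf_valid = [true, true, false]`, the leaf bitmap alone would report the second row as non-null, while the mask derived from the definition levels marks it as null because its parent struct is null.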
[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support
nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-669128126

@maxburke there's been some interest from other people in this PR. I haven't been able to continue working on it because, whenever I have a bit of free time, I've been looking at the IPC/integration issues (Rust doesn't work with 0.15+ files). Please feel free to push changes against this PR, or to open a PR against my fork with upstream changes. Someone also reached out to me on Twitter asking how they can continue with this.