[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support

2020-08-13 Thread GitBox


nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-673589031


   Merged as 
https://github.com/apache/arrow/commit/80a9c027b7c356f25a4c22e71587936a54959db6,
 not sure why the merge tool didn't close the issue



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support

2020-08-08 Thread GitBox


nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-670899165


   @sunchao @andygrove (CC @wesm @kszucs @emkornfield) in the past few months 
we haven't had enough review bandwidth on Rust's Parquet implementation (mostly 
relying on Chao for non-trivial reviews). Given the amount of work needed 
for an Arrow writer and the interest so far (I think a few people are already 
using this fork), I'd like to propose:
   
   * We create a temporary branch in the apache/arrow repo, where the arrow 
writer can temporarily live
   * We can merge changes into the branch, especially if there aren't enough 
reviewers at the time
   * When we're close to a release, we merge what's on the temp branch into the 
branch that's currently called `master` but will be renamed soon  
   
   In terms of this PR, I think I've gotten arbitrary nesting covered, but 
there's a lot more work that we can now divide more easily so others can 
contribute better. I'm also unsure of how to test deeply nested arrays directly 
in the code (I had to use Spark because the Arrow reader doesn't yet support that).
   
   I'll also bring this up on the mailing list for wider visibility.







[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support

2020-08-08 Thread GitBox


nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-670878573


   > ... One thing to note, which is currently a bug in C++, is that once rep/def 
levels are computed for anything
   > with deep nesting (any leaf column with one or more direct struct/group 
ancestors), nullness should be determined by rep/def levels and not by leaf 
arrays.
   
   @emkornfield I'm not sure I understand this part. I'll try to create a nested 
batch with a few levels, and have one record have the top level be nested. 
Would this cover the case above? I might also be limited by 
https://issues.apache.org/jira/browse/ARROW-5408 for now
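   
   To make the quoted point concrete, here is a minimal sketch, assuming a 
single optional leaf inside a single optional struct (maximum definition level 
2). The function `def_levels` and the plain `bool` slices are hypothetical 
stand-ins for validity bitmaps; none of this is the arrow or parquet crate API. 
It shows that a slot whose parent struct is null gets definition level 0 even 
when the leaf array itself marks that slot as valid, so nullness has to be read 
from the levels rather than from the leaf array alone.

```rust
/// Definition levels for an optional leaf nested in an optional struct.
/// Hypothetical helper: validity is modelled as plain bool slices.
///   0 = the struct is null
///   1 = the struct is present, the leaf is null
///   2 = the leaf value is present
fn def_levels(struct_valid: &[bool], leaf_valid: &[bool]) -> Vec<i16> {
    struct_valid
        .iter()
        .zip(leaf_valid)
        .map(|(&s, &l)| {
            if !s {
                0 // parent null: the leaf's own validity is ignored
            } else if !l {
                1
            } else {
                2
            }
        })
        .collect()
}

fn main() {
    // Slot 1: the struct is null, but the leaf array still says "valid".
    let struct_valid = [true, false, true];
    let leaf_valid = [true, true, false];
    let levels = def_levels(&struct_valid, &leaf_valid);
    println!("{:?}", levels);
}
```

   A slot counts as null whenever its level is below the maximum (2 here), which 
is why slot 1 is null despite the leaf's validity flag.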







[GitHub] [arrow] nevi-me commented on pull request #7319: ARROW-8289: [Rust] Parquet Arrow writer with nested support

2020-08-05 Thread GitBox


nevi-me commented on pull request #7319:
URL: https://github.com/apache/arrow/pull/7319#issuecomment-669128126


   @maxburke there's been some interest from other people in this PR. I haven't 
been able to continue working on it because when I have a bit of free time 
I've been looking at the IPC/integration issues (Rust doesn't work with 0.15+ 
files).
   
   Please feel free to push changes against this PR, or to open a PR against my 
fork with upstream changes. There's also someone who reached out to me on 
Twitter asking how they can continue with this. 


