[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-04-16 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-820963616


   @pitrou Really thanks for your detailed comments! I have addressed all of 
them. Please review again since we need to release it. Thanks!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-04-14 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-819293482


   > Sorry still reviewing, will try to do more tomorrow.
   
   That's fine. I have addressed all the comments you gave and pushed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-04-11 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-817451185


   @pitrou I completely revamped adapter_util.cc and switched to 
arrow::ArrayDataVisitor. That is, all comments have been addressed. Please 
review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-14 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-798971411


   @pitrou Could you please check again? My next PR is going to be ready soon 
and it is dependent on this one merging to be clean.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-10 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-796515523


   @pitrou These two errors should be intermittent and have nothing to do with 
ORC. So please review again. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-08 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-793383602


   @pitrou Now all tests have passed. Please review.
   @kou That’s fine haha. Next time I will check these files as well!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792387596


   @kou  Really thanks! I didn't realize that arrow::Type::type::MAP was not 
fully supported!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-07 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792360387


   @kou Thanks! I think I have already made all the changes in GLib (more 
precisely the Ruby tests) I think I need to make. Not sure why adding maps to 
buildable.rb led to a complaint about data_type = builder.value_data_type 
though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-06 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792195893


   @pitrou  Now the problem has been fixed and it is ready for another review.
   @kou I managed to install Arrow GLib as a developer but didn't manage to run 
the tests in Ruby due to the following error:
   
/Users/karlkatzen/Documents/code/arrow-dev/arrow/c_glib/test/test-extension-data-type.rb:19:in
 `': uninitialized constant Arrow::ExtensionArray 
(NameError)
   The functionality I changed is in ARROW-1117, namely I load ORC MAPs as 
Arrow MAPs instead of LISTs of STRUCTs. 
arrow/c_glib/test/test-orc-file-reader.rb is the only file where tests fail. 
Please help me with the C_Glib part. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-06 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-792154086


   @pitrou I have found the cause of the problem which is me failing to account 
for the possibility of orc::Decimal64VectorBatch. I will edit my writer to 
account for that.
   
   After this PR I'd like to work with you on adapter/testing/random. As you 
said before tests need to be added. Moreover we should expand random things to 
RecordBatches, ChunkedArrays & Tables as well as all types.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-04 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-791209166


   @pitrou Yup I found your changes to Random 6 days ago in Arrow-11662. Things 
did break after that. Now I’m trying to figure out whether it is my ORC writer 
or Decimal128 generator that needs to be fixed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-04 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-790615411


   @pitrou It is due to some new issue in Decimal128s.  Starting from 
https://github.com/apache/arrow/pull/8648/commits/68fd76fc5eba3350f341eb0fd7e4f83f7c83c51e
 we suddenly began to get "NotImplemented: random decimal128 generation with 
precision > 18". I reduced precision to 18 which caused ORC to misbehave. Shall 
random decimal128 generation with precision > 18 be impossible? It used to be 
perfectly fine.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-03 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-790297527


   @pitrou  Yes now it is ready for another review. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-02-08 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-775537466


   > @mathyingzhou Feel free to ping me when this is ready for review again!
   
   Thanks! Right now it seems that I will have to fix a bug in 
arrow/testing/random first if I want to further shorten the tests. I really 
don’t want to make this important PR dependent on these things though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-02-05 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-773880147


   I will make some further changes to the code to address Micah Kornfield's 
concerns but pretty much that's it. Further simplification of the tests require 
serious expansion of arrow/testing/random and arrow/compute/cast to include 
nested types (and in the case of the latter at least the fixed_size_binary -> 
binary cast) or it will remain very ugly. If possible I can work on such 
features in at least 2 separate PRs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-02-05 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-773880147


   I will make some further changes to the code to address Micah Kornfield's 
concerns but pretty much that's it. Further simplification of the tests require 
serious expansion of arrow/testing/random and arrow/compute/cast to include 
nested types (and in the case of the latter at least the fixed_size_binary -> 
binary cast) or it will remain very ugly. If possible I can work on such 
features in at least 2 separate PRs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-21 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-764050107


   > I didn't see anything rust related in this PR so I removed the Rust label
   
   @alamb Sorry it didn’t appear in the right place but there is nothing 
parquet-related in this PR either. Can the parquet label be removed? Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-764050107


   > I didn't see anything rust related in this PR so I removed the Rust label
   
   @alamb Sorry it didn’t appear in the right place but there is nothing 
parquet-related in this PR either. Can the parquet label be removed? Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-13 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-759780885


   @nevi-me Thanks! That’s done! I’m waiting for the PR to be reviewed and 
merged..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-12 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-759104480


   @xhochy Please review it when you can. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-10 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-757507160


   I have finished the Python binding as well. Note that I have made no changes 
to the Rust code.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org