[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-23 Thread GitBox
zenfenan commented on issue #11474: URL: https://github.com/apache/flink/pull/11474#issuecomment-618300290 Understood. I assume this suggestion comes from the belief that the users may update or add new metadata on the fly based on some external factor (either based on the input element

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-20 Thread GitBox
zenfenan commented on issue #11474: URL: https://github.com/apache/flink/pull/11474#issuecomment-616701185 @kl0u I agree with the point that the VectorizedRowBatch shouldn't be exposed to the implementor. I am changing the `vectorize()` method to take VectorizedRowBatch as an argument along

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-18 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-615831483 @kl0u I have moved the implementation to `flink-orc` module. I think it should be easier to take a complete look now. :)

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-17 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-615329178 @kl0u I checked your branch and I think it looks exactly like what you had commented earlier. My changes pretty much are

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-15 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-614053436 Yes, @kl0u I am moving the codebase to `flink-orc` but before that I think we need to bump ORC dependency

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-13 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-612987289 @kl0u Not at all. In fact, I have added `appendUserMetadata()` already in the latest commit which I pushed yesterday. As

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-12 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-61261 @kl0u @JingsongLi I have updated the PR with the changes. Appreciate if you could take a look. Thanks :)

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-11 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-612393911 @kl0u On the Vectorizer improvement, I believe we cannot have the addUserMetadata() in Vectorizer class because it needs

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-09 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-611569646 @kl0u Hey Kostas. Thanks for the suggestions. I agree with the improvements in `Vectorizer`. I actually had already made

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-04-01 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-607620461 @kl0u and @JingsongLi Thank you so much for reviewing the work. Regarding the way `VectorizedRowBatch` gets handled,

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-03-30 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-605911657 Cool. I'll update the PR.

[GitHub] [flink] zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink

2020-03-30 Thread GitBox
zenfenan commented on issue #11474: FLINK-10114: Add ORC BulkWriter support for StreamingFileSink URL: https://github.com/apache/flink/pull/11474#issuecomment-605903600 @kl0u Got it. I can do that. Where should that go? Is it okay if I just add them in the Javadoc comments themselves?