Re: Avro to ORC Conversion

2020-07-15 Thread Dongjoon Hyun
Hi Ryan. If you need to build from scratch, you may want to look at the standalone converter example in the Apache ORC repository: https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/convert/ConvertTool.java Although it doesn't support Avro, there are CsvReader

Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
I'm writing a standalone Java process and am interested in converting the consumed Avro messages to ORC. I've seen a plethora of examples of writing to ORC, but I can't seem to find many examples of converting from Avro to ORC. This is just a standard Java process running inside of a

[GitHub] [orc] dongjoon-hyun edited a comment on pull request #517: Fix build with clang-10

2020-07-15 Thread GitBox
dongjoon-hyun edited a comment on pull request #517: URL: https://github.com/apache/orc/pull/517#issuecomment-658275200 @nikitamikhaylov. Apache ORC has a dockerized testing environment like the following. When I ran this PR with clang 9 on Ubuntu 20.04, it didn't pass the test. So, I'm

[GitHub] [orc] shameersss1 commented on pull request #505: ORC-626: Reading Struct Column Having Multiple Fields With Same Name Causes java.io.EOFException

2020-07-15 Thread GitBox
shameersss1 commented on pull request #505: URL: https://github.com/apache/orc/pull/505#issuecomment-658563394 @omalley @prasanthj ping for review This is an automated message from the Apache Git Service. To respond to the

[GitHub] [orc] dongjoon-hyun opened a new pull request #519: Add macOS 10.15.5 to Travis CI

2020-07-15 Thread GitBox
dongjoon-hyun opened a new pull request #519: URL: https://github.com/apache/orc/pull/519

Re: Avro to ORC Conversion

2020-07-15 Thread Matt Burgess
Ryan, In Apache NiFi we have a ConvertAvroToOrc processor [1]; you may find code there that you can use in your Java program (take a look at line 212 and down). We had to create our own OrcFileWriter because the one in Apache ORC writes to a FileSystem, whereas we needed to write to our own FlowFile

[GitHub] [orc] dongjoon-hyun commented on pull request #519: ORC-647: Add macOS 10.15 test to Travis CI

2020-07-15 Thread GitBox
dongjoon-hyun commented on pull request #519: URL: https://github.com/apache/orc/pull/519#issuecomment-659162096 Hi, @omalley. Could you review this PR when you have some time?

Re: Avro to ORC Conversion

2020-07-15 Thread Matt Burgess
Ryan, It's possible that some changes would prevent that code from compiling against Hive 2. I have done some work porting similar processors to Hive 2, though, and as I recall the problems were mostly API-breaking changes rather than behavioral ones, more of a Maven and

Re: Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
Great, thanks Matt! I'm looking at this code now and feel it will really help me a lot. Is there anything you think would break if I used this logic with Hive 2.3.5? On Wed, Jul 15, 2020 at 5:04 PM Matt Burgess wrote: > Ryan, > > In Apache NiFi we have a ConvertAvroToOrc processor [1], you may find > code

[jira] [Created] (ORC-647) Add macOS 10.15 test to Travis CI

2020-07-15 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created ORC-647: Summary: Add macOS 10.15 test to Travis CI Key: ORC-647 URL: https://issues.apache.org/jira/browse/ORC-647 Project: ORC Issue Type: Test Components:

Re: Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
Great response, thanks for the info. I have a question on the Spark approach, which I have been thinking about but don't know much about. Let's say I develop my application, call it AvroToORC.java, which has a dependency on Spark. Would the entire thing run within the same