I'm trying to create a little utility to convert CSV files into ORC files.
I've noticed that the resulting ORC files don't seem quite correct,
though.  In an effort to create a simple reproducible test case, I just
changed the "Writing/Reading ORC Files" examples here:

https://orc.apache.org/docs/core-java.html

to create a file based on a pair of strings instead of integers.  The first
issue I hit is that TypeDescription.fromString() isn't available in 2.1.0,
so instead I built the schema as follows:

        TypeDescription schema = TypeDescription.createStruct()
            .addField("first", TypeDescription.createString())
            .addField("last", TypeDescription.createString());

Then I changed the loop as follows:

        BytesColumnVector first = (BytesColumnVector) writeBatch.cols[0];
        BytesColumnVector last = (BytesColumnVector) writeBatch.cols[1];
        for (int r = 0; r < 10; ++r)
        {
            String firstName = ("First-" + r).intern();
            String lastName = ("Last-" + (r * 3)).intern();
            ...
        }

The file writes without errors, and if I write it with no compression, I
can see the data using "strings my-file.orc".  However, when I then try to
read the data back from the file and print out the resulting batches to the
console, I get the following:

["       ", "      "]
["       ", "      "]
["       ", "      "]
["       ", "      "]
["       ", "       "]
["       ", "       "]
["       ", "       "]
["       ", "       "]
["       ", "       "]
["       ", "       "]

Any insights about what I may be doing wrong here would be greatly
appreciated!

Regards,
Scott
