You need to call setRef on the BytesColumnVectors. The relevant part is:
byte[] buffer = ("Last-" + (r * 3)).getBytes(StandardCharsets.UTF_8);
y.setRef(row, buffer, 0, buffer.length);
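
To illustrate what setRef records for each row, here is a rough, self-contained
sketch. ToyBytesColumn is a hypothetical stand-in written for this mail, not
ORC's BytesColumnVector. It only models the idea that each row keeps a reference
to a buffer plus an offset and a length. The "unset row reads back as empty"
behaviour is an assumption of the toy, chosen to mirror the symptom described
below.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a bytes column; NOT ORC's BytesColumnVector. */
final class ToyBytesColumn {
    private final byte[][] vector;  // per-row reference to a backing buffer
    private final int[] start;      // per-row offset into that buffer
    private final int[] length;     // per-row byte count

    ToyBytesColumn(int rows) {
        vector = new byte[rows][];
        start = new int[rows];
        length = new int[rows];
    }

    /** Records a reference to the caller's buffer; no bytes are copied. */
    void setRef(int row, byte[] buffer, int offset, int len) {
        vector[row] = buffer;
        start[row] = offset;
        length[row] = len;
    }

    /** Decodes the row as UTF-8; in this toy, a row that was never set is "". */
    String get(int row) {
        if (vector[row] == null) {
            return "";
        }
        return new String(vector[row], start[row], length[row], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyBytesColumn first = new ToyBytesColumn(10);
        ToyBytesColumn last = new ToyBytesColumn(10);
        for (int r = 0; r < 10; ++r) {
            // A fresh buffer per row: setRef keeps a reference, so reusing one
            // buffer across rows would change earlier rows as well.
            byte[] f = ("First-" + r).getBytes(StandardCharsets.UTF_8);
            byte[] l = ("Last-" + (r * 3)).getBytes(StandardCharsets.UTF_8);
            first.setRef(r, f, 0, f.length);
            last.setRef(r, l, 0, l.length);
        }
        for (int r = 0; r < 10; ++r) {
            System.out.println("[\"" + first.get(r) + "\", \"" + last.get(r) + "\"]");
        }
    }
}
```

Because setRef stores a reference rather than a copy, the buffer must not be
modified until the batch has been written. If you want to reuse a scratch
buffer, BytesColumnVector.setVal copies the bytes instead.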
I've created a gist with the example modified to write one int column and one
string column, here:
https://gist.github.com/omalley/75093e104381ab9d157313993afcbbdf
I realized that we should include the example code in the code base and
created ORC-116.
.. Owen
On Tue, Dec 6, 2016 at 6:52 AM, Scott Wells wrote:
> I'm trying to create a little utility to convert CSV files into ORC
> files. I've noticed that the resulting ORC files don't seem quite correct,
> though. In an effort to create a simple reproducible test case, I just
> changed the "Writing/Reading ORC Files" examples here:
>
> https://orc.apache.org/docs/core-java.html
>
> to create a file based on a pair of strings instead of integers. The
> first issue I hit is that TypeDescription.fromString() isn't available in
> 2.1.0, but instead I did the following:
>
> TypeDescription schema = TypeDescription.createStruct()
>     .addField("first", TypeDescription.createString())
>     .addField("last", TypeDescription.createString());
>
> Then I changed the loop as follows:
>
> BytesColumnVector first = (BytesColumnVector) writeBatch.cols[0];
> BytesColumnVector last = (BytesColumnVector) writeBatch.cols[1];
> for (int r = 0; r < 10; ++r)
> {
>     String firstName = ("First-" + r).intern();
>     String lastName = ("Last-" + (r * 3)).intern();
>     ...
> }
>
> The file writes without errors, and if I write it with no compression, I
> can see the data using "strings my-file.orc". However, when I then try to
> read the data back from the file and print out the resulting batches to the
> console, I get the following:
>
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
> [" ", " "]
>
> Any insights about what I may be doing wrong here would be greatly
> appreciated!
>
> Regards,
> Scott
>