Re: Unable to write string data into ORC file (or at least read it back)

Owen O'Malley Tue, 06 Dec 2016 08:35:54 -0800

You need to call setRef on the BytesColumnVectors. The relevant part is:

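// y is the BytesColumnVector for the string column; setRef stores a
// reference to buffer (it does not copy it), so use a fresh array per value.
byte[] buffer = ("Last-" + (r * 3)).getBytes(StandardCharsets.UTF_8);
y.setRef(row, buffer, 0, buffer.length);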
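For context, here is a minimal round-trip sketch. It assumes orc-core 1.x plus the Hadoop client jars are on the classpath. The class name OrcStringRoundTrip and the methods write, read and text are hypothetical helpers, not part of ORC. The sketch writes the two-string schema from the question using setRef, then reads it back, decoding each value from its vector/start/length slice instead of from the whole backing array:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

// Hypothetical helper class; assumes orc-core 1.x and hadoop-common are available.
public class OrcStringRoundTrip {

    // Writes ten (first, last) rows; the target file must not already exist.
    static void write(Configuration conf, Path path) throws IOException {
        TypeDescription schema = TypeDescription.createStruct()
            .addField("first", TypeDescription.createString())
            .addField("last", TypeDescription.createString());
        Writer writer = OrcFile.createWriter(path,
            OrcFile.writerOptions(conf).setSchema(schema));
        VectorizedRowBatch batch = schema.createRowBatch();
        BytesColumnVector first = (BytesColumnVector) batch.cols[0];
        BytesColumnVector last = (BytesColumnVector) batch.cols[1];
        for (int r = 0; r < 10; ++r) {
            int row = batch.size++;
            // setRef keeps a reference to each array, so allocate a new one per value.
            byte[] f = ("First-" + r).getBytes(StandardCharsets.UTF_8);
            byte[] l = ("Last-" + (r * 3)).getBytes(StandardCharsets.UTF_8);
            first.setRef(row, f, 0, f.length);
            last.setRef(row, l, 0, l.length);
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        writer.close();
    }

    // Reads every row back as "[first, last]".
    static List<String> read(Configuration conf, Path path) throws IOException {
        Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
        RecordReader rows = reader.rows();
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();
        List<String> out = new ArrayList<>();
        while (rows.nextBatch(batch)) {
            BytesColumnVector first = (BytesColumnVector) batch.cols[0];
            BytesColumnVector last = (BytesColumnVector) batch.cols[1];
            for (int r = 0; r < batch.size; ++r) {
                out.add("[" + text(first, r) + ", " + text(last, r) + "]");
            }
        }
        rows.close();
        return out;
    }

    // A value is the slice vector[i][start[i] .. start[i] + length[i]).
    static String text(BytesColumnVector col, int r) {
        int i = col.isRepeating ? 0 : r;
        if (!col.noNulls && col.isNull[i]) {
            return null;
        }
        return new String(col.vector[i], col.start[i], col.length[i],
            StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(args.length > 0 ? args[0] : "strings.orc");
        write(conf, path);
        for (String line : read(conf, path)) {
            System.out.println(line);
        }
    }
}
```

The read side matters as much as the write side. Several rows can share one backing array, so the String is built from start and length rather than from the whole vector[i] array. BytesColumnVector also has setVal, which copies the bytes instead of referencing them. Whether it needs an explicit initBuffer() call first depends on the storage-api version, so check that before switching.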


I've created a gist with the example modified to do one int and one string,
here:

https://gist.github.com/omalley/75093e104381ab9d157313993afcbbdf

I realized that we should include this example code in the code base, so I
created ORC-116 for it.

.. Owen

On Tue, Dec 6, 2016 at 6:52 AM, Scott Wells wrote:

> I'm trying to create a little utility to convert CSV files into ORC
> files.  I've noticed that the resulting ORC files don't seem quite correct,
> though.  In an effort to create a simple reproducible test case, I just
> changed the "Writing/Reading ORC Files" examples here:
>
> https://orc.apache.org/docs/core-java.html
>
> to create a file based on a pair of strings instead of integers.  The
> first issue I hit is that TypeDescription.fromString() isn't available in
> 2.1.0, so instead I did the following:
>
>         TypeDescription schema = TypeDescription.createStruct()
>             .addField("first", TypeDescription.createString())
>             .addField("last", TypeDescription.createString());
>
> Then I changed the loop as follows:
>
>         BytesColumnVector first = (BytesColumnVector) writeBatch.cols[0];
>         BytesColumnVector last = (BytesColumnVector) writeBatch.cols[1];
>         for (int r = 0; r < 10; ++r)
>         {
>             String firstName = ("First-" + r).intern();
>             String lastName = ("Last-" + (r * 3)).intern();
>             ...
>         }
>
> The file writes without errors, and if I write it with no compression, I
> can see the data using "strings my-file.orc".  However, when I then try to
> read the data back from the file and print out the resulting batches to the
> console, I get the following:
>
> ["       ", "      "]
> ["       ", "      "]
> ["       ", "      "]
> ["       ", "      "]
> ["       ", "       "]
> ["       ", "       "]
> ["       ", "       "]
> ["       ", "       "]
> ["       ", "       "]
> ["       ", "       "]
>
> Any insights about what I may be doing wrong here would be greatly
> appreciated!
>
> Regards,
> Scott
>
