As an example of why having the code be executable is a good idea, I
noticed that I was dropping the last batch and needed to add:

if (batch.size != 0) {
  writer.addRowBatch(batch);
}

before the close.
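
For anyone following along, here is a minimal sketch of the fill/flush pattern that fix belongs to. SketchBatch, SketchWriter and BatchFlushSketch are hypothetical stand-ins written for this illustration, not the ORC classes. They only mimic the shape of a row batch with a size field and a writer with addRowBatch(), so the sketch runs without ORC on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row batch; NOT the ORC class.
class SketchBatch {
    final int maxSize;   // capacity of the batch
    int size = 0;        // number of rows currently buffered

    SketchBatch(int maxSize) { this.maxSize = maxSize; }

    void reset() { size = 0; }
}

// Hypothetical stand-in for a writer; it only records the size of each batch it receives.
class SketchWriter {
    final List<Integer> writtenBatchSizes = new ArrayList<>();

    void addRowBatch(SketchBatch batch) { writtenBatchSizes.add(batch.size); }
}

class BatchFlushSketch {
    // Pushes totalRows rows through a batch of the given capacity and
    // returns the size of every batch handed to the writer.
    static List<Integer> writeRows(int totalRows, int batchCapacity) {
        SketchBatch batch = new SketchBatch(batchCapacity);
        SketchWriter writer = new SketchWriter();
        for (int r = 0; r < totalRows; ++r) {
            batch.size++;  // claim the next row slot (column values would be set here)
            if (batch.size == batch.maxSize) {
                // The batch is full: hand it to the writer and start over.
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        // Without this check, rows left in a partially filled batch are dropped.
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        return writer.writtenBatchSizes;
    }
}
```

With these stand-ins, BatchFlushSketch.writeRows(10, 4) returns [4, 4, 2]. Take out the final check and the last two rows never reach the writer.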

.. Owen

On Tue, Dec 6, 2016 at 8:35 AM, Owen O'Malley <[email protected]> wrote:

> You need to call setRef on the BytesColumnVectors. The relevant part is:
>
> byte[] buffer = ("Last-" + (r * 3)).getBytes(StandardCharsets.UTF_8);
> y.setRef(row, buffer, 0, buffer.length);
>
> I've created a gist with the example modified to do one int and one
> string, here:
>
> https://gist.github.com/omalley/75093e104381ab9d157313993afcbbdf
>
> I realized that we should include the example code in the code base and
> created ORC-116.
>
> .. Owen
>
> On Tue, Dec 6, 2016 at 6:52 AM, Scott Wells <[email protected]> wrote:
>
>> I'm trying to create a little utility to convert CSV files into ORC
>> files.  I've noticed that the resulting ORC files don't seem quite correct,
>> though.  In an effort to create a simple reproducible test case, I just
>> changed the "Writing/Reading ORC Files" examples here:
>>
>> https://orc.apache.org/docs/core-java.html
>>
>> to create a file based on a pair of strings instead of integers.  The
>> first issue I hit is that TypeDescription.fromString() isn't available in
>> 2.1.0, so instead I did the following:
>>
>>         TypeDescription schema = TypeDescription.createStruct()
>>             .addField("first", TypeDescription.createString())
>>             .addField("last", TypeDescription.createString());
>>
>> Then I changed the loop as follows:
>>
>>         BytesColumnVector first = (BytesColumnVector) writeBatch.cols[0];
>>         BytesColumnVector last = (BytesColumnVector) writeBatch.cols[1];
>>         for (int r = 0; r < 10; ++r)
>>         {
>>             String firstName = ("First-" + r).intern();
>>             String lastName = ("Last-" + (r * 3)).intern();
>>             ...
>>         }
>>
>> The file writes without errors, and if I write it with no compression, I
>> can see the data using "strings my-file.orc".  However, when I then try to
>> read the data back from the file and print out the resulting batches to the
>> console, I get the following:
>>
>> ["       ", "      "]
>> ["       ", "      "]
>> ["       ", "      "]
>> ["       ", "      "]
>> ["       ", "       "]
>> ["       ", "       "]
>> ["       ", "       "]
>> ["       ", "       "]
>> ["       ", "       "]
>> ["       ", "       "]
>>
>> Any insights about what I may be doing wrong here would be greatly
>> appreciated!
>>
>> Regards,
>> Scott
>>
>
>
