This problem was due to an error on my part, not an Avro bug. The bb.array()
call below returns the backing array of the ByteBuffer bb (so that the
programmer can mutate the ByteBuffer’s contents via the array), and that array
contains extra bytes whenever the ByteBuffer’s capacity is larger than its
limit. Avro sometimes reuses (and clears) a ByteBuffer of larger capacity. In
this case, a buffer of capacity 4 was reused to hold a 3-byte value (limit 3).
Here is the code in avro/io/BinaryDecoder.java that is reusing the larger
buffer:
@Override
public ByteBuffer readBytes(ByteBuffer old) throws IOException {
  int length = readInt();
  ByteBuffer result;
  if (old != null && length <= old.capacity()) {
    result = old;   // REUSE THE OLD BUFFER, WHICH MAY BE LARGER.
    result.clear(); // THE OLD BUFFER IS CLEARED HERE, WHICH MIGHT BE
                    // SLOWER THAN ALLOCATING A NEW ONE.
  } else {
    result = ByteBuffer.allocate(length);
  }
  doReadBytes(result.array(), result.position(), length);
  result.limit(length);
  return result;
}
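For anyone hitting the same thing, here is a minimal sketch of reading only the valid bytes instead of calling array(). The class name SafeBytes and the helper validBytes are hypothetical names used for illustration, not part of Avro. Note that ByteBuffer.clear() only resets position and limit and does not erase the contents, which is why a stale byte can survive in a reused buffer. The main method simulates such a reuse with the values from this thread.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class SafeBytes {
    // Hypothetical helper: copy only the bytes between position and limit.
    // duplicate() shares the content but has its own position, so the
    // caller's buffer is left untouched.
    static byte[] validBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        // Simulate a reused buffer: capacity 4, stale byte 32 at index 3.
        ByteBuffer reused = ByteBuffer.wrap(new byte[] {0, 0, 0, 32});
        reused.clear();                          // resets position/limit only
        reused.put(new byte[] {21, -125, 108});  // the 3-byte decimal value
        reused.flip();                           // position 0, limit 3

        System.out.println(reused.array().length);     // 4 (includes stale byte)
        System.out.println(validBytes(reused).length); // 3
        BigDecimal bd = new BigDecimal(new BigInteger(validBytes(reused)), 2);
        System.out.println(bd);                        // 14099.00
    }
}
```

Copying the remaining() bytes also works for buffers whose position is not 0 and for buffers that have no accessible backing array, where array() would throw.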
From: Dave Oshinsky [mailto:[email protected]]
Sent: Friday, December 04, 2015 11:49 PM
To: [email protected]
Subject: Re: bytes array for decimal logical type sometimes corrupted
No, the byte array (directly from byte buffer) is directly converted into
BigInteger. Logic for reading from avro or parquet (the latter using avro
parquet reader) is like this:
ByteBuffer bb = (ByteBuffer) obj; // obtained from GenericRecord
byte[] b = bb.array();
BigInteger bi = new BigInteger(b);
BigDecimal bd = new BigDecimal(bi, scale);
Logic for writing to avro or parquet (the latter using avro parquet writer) is
like this:
BigDecimal bd = (BigDecimal) obj;
...
BigInteger bi = bd.unscaledValue();
byte[] barray = bi.toByteArray();
putBinary(rec, name, barray);
...
public static void putBinary(GenericRecord rec, String name, byte[] b) {
  ByteBuffer byteBuffer = ByteBuffer.allocate(b.length);
  byteBuffer.put(b);
  byteBuffer.rewind(); // if not done, writes nothing to avro or parquet
  rec.put(name, byteBuffer);
}
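As an aside, the allocate/put/rewind sequence above can probably be shortened with ByteBuffer.wrap, which returns a buffer whose position is already 0 and whose limit equals the array length. The sketch below is only an illustration: WrapBytes and toBuffer are hypothetical names, and the Avro GenericRecord call is left out so the example stays self-contained.

```java
import java.nio.ByteBuffer;

public class WrapBytes {
    // Hypothetical helper: wrap a copy of the array so that later changes
    // to the caller's array do not leak into the buffer. wrap() leaves
    // position at 0 and limit at the array length, so no rewind is needed.
    static ByteBuffer toBuffer(byte[] b) {
        return ByteBuffer.wrap(b.clone());
    }

    public static void main(String[] args) {
        ByteBuffer bb = toBuffer(new byte[] {21, -125, 108});
        // prints "0 3 3": position, limit, capacity
        System.out.println(bb.position() + " " + bb.limit() + " " + bb.capacity());
    }
}
```

The clone() is optional; without it the buffer shares the caller's array, which saves a copy but ties the two together.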
The byte array that was written into Avro was 3 bytes in length, as explained
earlier. The byte array that was read from Avro was 4 bytes in length, with
the decimal 32 (space character) padding the end.
Either I am using Avro incorrectly, or the bytes that are read back are not the
same as the bytes originally written. The exact same code works properly with
Parquet (accessed using Avro Parquet reader and writer). I followed the format
for the bytes (representing the decimal number) as specified here:
https://avro.apache.org/docs/1.7.7/spec.html#Decimal
________________________________
From: [email protected]
Sent: Friday, December 4, 2015 6:34 PM
To: [email protected]
Subject: Re: bytes array for decimal logical type sometimes corrupted
When reading the byte[] from avro do you copy the bytebuffer to your own array?
From: Dave Oshinsky [mailto:[email protected]]
Sent: Friday, December 04, 2015 05:47 PM
To: [email protected]
Subject: bytes array for decimal logical type sometimes corrupted
I am a new user of Avro 1.7.7. My (Java) application is reading rows from an
Oracle DB, and archiving them to Avro (and Parquet). For NUMBER Oracle data,
my code converts the unscaled BigInteger (from BigDecimal) number into a bytes
array, and archives that to Avro using a ByteBuffer in the GenericRecord. In
one case, the NUMBER value from Oracle is 14099 (precision 8, scale 2, for a
column named “CURBAL”), which is archived to Avro based on the unscaled value
of 1409900. This corresponds to a bytes array of length 3, consisting of these
3 bytes (decimal values): 21, -125, and 108. When my code reads this CURBAL
value back from Avro, it is corrupted (padded?), with a fourth byte added that
happens to be decimal 32 (an ASCII space), i.e., the 4 byte decimal values seen
upon reading back from Avro are 21, -125, 108, and 32. Has anyone seen a
similar issue? I am archiving the same data to Parquet, and reading it back
without any corruption. I am wondering whether I am using Avro improperly here.
The schema that I’m using is shown below my signature, with various additions
(look for “cv_” prepended in the key) for JDBC ResultSetMetaData info that my
code is preserving for later on. I have attached sample Avro and Parquet files
to this email, with corruption in CURBAL of the tenth record of the Avro. (I
realize that the attachments may not get forwarded – let me know if I should
send them to you individually.) Interestingly, if I write this tenth record to
another Avro file as the first record, it does not get corrupted (an
alignment/padding issue?).
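The byte values quoted above can be checked with a short sketch. It assumes only java.math and uses the hypothetical names DecimalBytes, encode and decode. It shows that 1409900 encodes to exactly the three bytes listed, and what value results when the trailing byte 32 is included in the decode.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class DecimalBytes {
    // Two's-complement big-endian bytes of the unscaled value.
    static byte[] encode(BigDecimal bd) {
        return bd.unscaledValue().toByteArray();
    }

    // Rebuild the decimal from the bytes and the scale from the schema.
    static BigDecimal decode(byte[] b, int scale) {
        return new BigDecimal(new BigInteger(b), scale);
    }

    public static void main(String[] args) {
        byte[] written = encode(new BigDecimal("14099.00"));
        System.out.println(Arrays.toString(written)); // [21, -125, 108]

        byte[] padded = {21, -125, 108, 32};           // as read back with the extra byte
        System.out.println(decode(written, 2));        // 14099.00
        System.out.println(decode(padded, 2));         // 3609344.32
    }
}
```

The extra byte shifts the unscaled value left by eight bits and adds 32, so 1409900 becomes 1409900 * 256 + 32 = 360934432.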
Thanks in advance,
Dave Oshinsky
Commvault Systems
[email protected]
JSON schema:
{
"type" : "record",
"name" : "my_table",
"namespace" : "com.commvault",
"fields" : [ {
"name" : "ACCT_NO",
"type" : {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_connection" : "oracle.jdbc.driver.T4CConnection",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 0,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 1,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
}
}, {
"name" : "SF_NO",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 10,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 2,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "LF_NO",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 10,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 3,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "BRANCH_NO",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 4,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "INTRO_CUST_NO",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 5,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "INTRO_ACCT_NO",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 6,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "INTRO_SIGN",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 1,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 7,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "TYPE",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 2,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 8,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "OPR_MODE",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 2,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 9,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "CUR_ACCT_TYPE",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 4,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 10,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "TITLE",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 30,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 11,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "CORP_CUST_NO",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 12,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "APLNDT",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.sql.Timestamp",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 0,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 13,
"cv_type" : 93,
"cv_typename" : "DATE",
"cv_writable" : true
} ]
}, {
"name" : "OPNDT",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.sql.Timestamp",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 0,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 14,
"cv_type" : 93,
"cv_typename" : "DATE",
"cv_writable" : true
} ]
}, {
"name" : "VERI_EMP_NO",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 20,
"scale" : 0,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 20,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 15,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "VERI_SIGN",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 1,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 16,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "MANAGER_SIGN",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 1,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 17,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
}, {
"name" : "CURBAL",
"type" : [ "null", {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 8,
"scale" : 2,
"cv_auto_incr" : false,
"cv_case_sensitive" : false,
"cv_column_class" : "java.math.BigDecimal",
"cv_currency" : true,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 8,
"cv_read_only" : false,
"cv_scale" : 2,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 18,
"cv_type" : 2,
"cv_typename" : "NUMBER",
"cv_writable" : true
} ]
}, {
"name" : "STATUS",
"type" : [ "null", {
"type" : "string",
"cv_auto_incr" : false,
"cv_case_sensitive" : true,
"cv_column_class" : "java.lang.String",
"cv_currency" : false,
"cv_def_writable" : false,
"cv_nullable" : 1,
"cv_precision" : 1,
"cv_read_only" : false,
"cv_scale" : 0,
"cv_searchable" : true,
"cv_signed" : true,
"cv_subscript" : 19,
"cv_type" : 12,
"cv_typename" : "VARCHAR2",
"cv_writable" : true
} ]
} ]
}
***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************
PLEASE READ: This message is for the named person's use only. It may contain
confidential, proprietary or legally privileged information. No confidentiality
or privilege is waived or lost by any mistransmission. If you receive this
message in error, please delete it and all copies from your system, destroy any
hard copies and notify the sender. You must not, directly or indirectly, use,
disclose, distribute, print, or copy any part of this message if you are not
the intended recipient. Nomura Holding America Inc., Nomura Securities
International, Inc, and their respective subsidiaries each reserve the right to
monitor all e-mail communications through its networks. Any views expressed in
this message are those of the individual sender, except where the message
states otherwise and the sender is authorized to state the views of such
entity. Unless otherwise stated, any pricing information in this message is
indicative only, is subject to change and does not constitute an offer to deal
at any price quoted. Any reference to the terms of executed transactions should
be treated as preliminary only and subject to our formal written confirmation.