[jira] [Updated] (AVRO-3841) Align the specification of the way to encode NaN to the actual implementations
[ https://issues.apache.org/jira/browse/AVRO-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated AVRO-3841:
---------------------------------
    Labels: pull-request-available  (was: )

> Align the specification of the way to encode NaN to the actual implementations
> ------------------------------------------------------------------------------
>
>                 Key: AVRO-3841
>                 URL: https://issues.apache.org/jira/browse/AVRO-3841
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: spec
>    Affects Versions: 1.12.0
>            Reporter: Kousuke Saruta
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The specification describes how to encode float/double as follows.
> {code}
> a float is written as 4 bytes. The float is converted into a 32-bit integer
> using a method equivalent to Java’s floatToIntBits and then encoded in
> little-endian format.
> a double is written as 8 bytes. The double is converted into a 64-bit integer
> using a method equivalent to Java’s doubleToLongBits and then encoded in
> little-endian format.
> {code}
> However, the actual implementation in Java uses
> floatToRawIntBits/doubleToRawLongBits rather than
> floatToIntBits/doubleToLongBits.
> They differ in how they encode NaN:
> floatToIntBits/doubleToLongBits don't distinguish between NaN and -NaN, but
> floatToRawIntBits/doubleToRawLongBits do.
> I confirmed that all the implementations distinguish between NaN and -NaN.
> So, I think it's better to modify the specification.
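The difference described above can be sketched in Python. This is only an illustration, not Avro code: the helper names double_to_long_bits, double_to_raw_long_bits and encode_double are hypothetical stand-ins that mimic the semantics of Java's Double.doubleToLongBits and Double.doubleToRawLongBits, and the sketch assumes a platform where Python floats are IEEE 754 doubles.

```python
import math
import struct

# Canonical NaN pattern that Java's Double.doubleToLongBits returns for any NaN.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def double_to_raw_long_bits(d: float) -> int:
    """Hypothetical helper: return the IEEE 754 bit pattern of d unchanged."""
    return struct.unpack("<Q", struct.pack("<d", d))[0]


def double_to_long_bits(d: float) -> int:
    """Hypothetical helper: same as above, but collapse every NaN to one pattern."""
    return CANONICAL_NAN_BITS if math.isnan(d) else double_to_raw_long_bits(d)


def encode_double(d: float, raw: bool = True) -> bytes:
    """Encode d as 8 little-endian bytes from either the raw or canonical bits."""
    bits = double_to_raw_long_bits(d) if raw else double_to_long_bits(d)
    return struct.pack("<Q", bits)


# Build NaNs with an explicit sign so the example does not depend on defaults.
pos_nan = math.copysign(float("nan"), 1.0)
neg_nan = math.copysign(float("nan"), -1.0)

# Canonicalizing conversion: NaN and -NaN produce identical bytes.
print(encode_double(pos_nan, raw=False) == encode_double(neg_nan, raw=False))

# Raw conversion: the sign bit survives, so the encoded bytes differ.
print(encode_double(pos_nan).hex(), encode_double(neg_nan).hex())
```

With the canonicalizing variant a reader cannot tell NaN from -NaN, while the raw variant keeps the sign bit in the last encoded byte, which matches the behaviour the report attributes to the existing implementations.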
> Java
> {code}
>   public static int encodeFloat(float f, byte[] buf, int pos) {
>     final int bits = Float.floatToRawIntBits(f);
>     buf[pos + 3] = (byte) (bits >>> 24);
>     buf[pos + 2] = (byte) (bits >>> 16);
>     buf[pos + 1] = (byte) (bits >>> 8);
>     buf[pos] = (byte) (bits);
>     return 4;
>   }
>   public static int encodeDouble(double d, byte[] buf, int pos) {
>     final long bits = Double.doubleToRawLongBits(d);
>     int first = (int) (bits & 0x);
>     int second = (int) ((bits >>> 32) & 0x);
>     // the compiler seems to execute this order the best, likely due to
>     // register allocation -- the lifetime of constants is minimized.
>     buf[pos] = (byte) (first);
>     buf[pos + 4] = (byte) (second);
>     buf[pos + 5] = (byte) (second >>> 8);
>     buf[pos + 1] = (byte) (first >>> 8);
>     buf[pos + 2] = (byte) (first >>> 16);
>     buf[pos + 6] = (byte) (second >>> 16);
>     buf[pos + 7] = (byte) (second >>> 24);
>     buf[pos + 3] = (byte) (first >>> 24);
>     return 8;
>   }
> {code}
> Rust
> {code}
> Value::Float(x) => buffer.extend_from_slice(_le_bytes()),
> Value::Double(x) => buffer.extend_from_slice(_le_bytes()),
> {code}
> Python
> {code}
>     def write_float(self, datum: float) -> None:
>         """
>         A float is written as 4 bytes.
>         The float is converted into a 32-bit integer using a method equivalent to
>         Java's floatToIntBits and then encoded in little-endian format.
>         """
>         self.write(STRUCT_FLOAT.pack(datum))
>     def write_double(self, datum: float) -> None:
>         """
>         A double is written as 8 bytes.
>         The double is converted into a 64-bit integer using a method equivalent to
>         Java's doubleToLongBits and then encoded in little-endian format.
>         """
>         self.write(STRUCT_DOUBLE.pack(datum))
> {code}
> C
> {code}
> static int write_float(avro_writer_t writer, const float f)
> {
> #if AVRO_PLATFORM_IS_BIG_ENDIAN
>         uint8_t buf[4];
> #endif
[jira] [Updated] (AVRO-3841) Align the specification of the way to encode NaN to the actual implementations
[ https://issues.apache.org/jira/browse/AVRO-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated AVRO-3841:
---------------------------------
    Summary: Align the specification of the way to encode NaN to the actual implementations  (was: Align the specification of encoding NaN to the actual implementations)

> Align the specification of the way to encode NaN to the actual implementations
> ------------------------------------------------------------------------------
>
>                 Key: AVRO-3841
>                 URL: https://issues.apache.org/jira/browse/AVRO-3841
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: spec
>    Affects Versions: 1.12.0
>            Reporter: Kousuke Saruta
>            Priority: Minor
>
> The specification describes how to encode float/double as follows.
> {code}
> a float is written as 4 bytes. The float is converted into a 32-bit integer
> using a method equivalent to Java’s floatToIntBits and then encoded in
> little-endian format.
> a double is written as 8 bytes. The double is converted into a 64-bit integer
> using a method equivalent to Java’s doubleToLongBits and then encoded in
> little-endian format.
> {code}
> However, the actual implementation in Java uses
> floatToRawIntBits/doubleToRawLongBits rather than
> floatToIntBits/doubleToLongBits.
> They differ in how they encode NaN:
> floatToIntBits/doubleToLongBits don't distinguish between NaN and -NaN, but
> floatToRawIntBits/doubleToRawLongBits do.
> I confirmed that all the implementations distinguish between NaN and -NaN.
> So, I think it's better to modify the specification.