[jira] [Updated] (AVRO-3841) Align the specification of the way to encode NaN to the actual implementations

2023-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated AVRO-3841:
-
Labels: pull-request-available  (was: )

> Align the specification of the way to encode NaN to the actual implementations
> --
>
> Key: AVRO-3841
> URL: https://issues.apache.org/jira/browse/AVRO-3841
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: spec
>Affects Versions: 1.12.0
>Reporter: Kousuke Saruta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The specification describes how to encode float/double as follows:
> {code}
> a float is written as 4 bytes. The float is converted into a 32-bit integer 
> using a method equivalent to Java’s floatToIntBits and then encoded in 
> little-endian format.
> a double is written as 8 bytes. The double is converted into a 64-bit integer 
> using a method equivalent to Java’s doubleToLongBits and then encoded in 
> little-endian format.
> {code}
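The sketch below makes the current wording concrete. It is a minimal Java encoder that follows the spec text literally: it calls `Float.floatToIntBits` and writes the four bytes least-significant first. The class and method names (`SpecFloatEncoder`, `encodeFloatPerSpec`) are hypothetical and are not part of Avro.

```java
// Hypothetical encoder that follows the spec wording literally:
// Float.floatToIntBits, then the 4 bytes in little-endian order.
final class SpecFloatEncoder {
  static byte[] encodeFloatPerSpec(float f) {
    int bits = Float.floatToIntBits(f); // every NaN becomes 0x7fc00000 here
    return new byte[] {
      (byte) bits,          // least-significant byte first
      (byte) (bits >>> 8),
      (byte) (bits >>> 16),
      (byte) (bits >>> 24)  // most-significant byte last
    };
  }

  public static void main(String[] args) {
    // 1.0f has the IEEE 754 bit pattern 0x3f800000.
    byte[] b = encodeFloatPerSpec(1.0f);
    System.out.printf("%02x %02x %02x %02x%n", b[0], b[1], b[2], b[3]); // 00 00 80 3f
  }
}
```

Under this literal reading, `Float.NaN` and a NaN with the sign bit set encode to the same four bytes (00 00 c0 7f).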
> But the actual Java implementation uses
> floatToRawIntBits/doubleToRawLongBits rather than
> floatToIntBits/doubleToLongBits.
> The two pairs differ in how they encode NaN.
> floatToIntBits/doubleToLongBits collapse every NaN, including -NaN, into one
> canonical bit pattern (0x7fc00000 for float, 0x7ff8000000000000L for double),
> whereas floatToRawIntBits/doubleToRawLongBits return the stored bits, so NaN
> and -NaN stay distinct.
> I confirmed that all the implementations distinguish between NaN and -NaN.
> So I think it's better to modify the specification to match them.
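The difference is visible directly with the JDK methods named here. The sketch below builds a NaN with the sign bit set from an explicit bit pattern and compares what each conversion returns. The patterns 0xffc00000 (float) and 0xfff8000000000000 (double) are chosen for illustration. They are the canonical quiet NaNs with bit 31 or bit 63 turned on. `NanBitsDemo` is a hypothetical class name.

```java
// Compares the canonicalizing and raw bit conversions on a negative NaN.
final class NanBitsDemo {
  // Quiet NaN with the sign bit set ("-NaN"), built from explicit bit patterns.
  static final float NEG_NAN = Float.intBitsToFloat(0xffc00000);
  static final double NEG_NAN_D = Double.longBitsToDouble(0xfff8000000000000L);

  public static void main(String[] args) {
    // floatToIntBits canonicalizes every NaN to 0x7fc00000, so the sign is lost.
    System.out.println(Integer.toHexString(Float.floatToIntBits(NEG_NAN)));      // 7fc00000
    // floatToRawIntBits returns the stored bits, so the sign survives.
    System.out.println(Integer.toHexString(Float.floatToRawIntBits(NEG_NAN)));   // ffc00000
    // The double variants behave the same way, with 0x7ff8000000000000 as the canonical NaN.
    System.out.println(Long.toHexString(Double.doubleToLongBits(NEG_NAN_D)));    // 7ff8000000000000
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(NEG_NAN_D))); // fff8000000000000
  }
}
```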
> Java
> {code}
>   public static int encodeFloat(float f, byte[] buf, int pos) {
>     final int bits = Float.floatToRawIntBits(f);
>     buf[pos + 3] = (byte) (bits >>> 24);
>     buf[pos + 2] = (byte) (bits >>> 16);
>     buf[pos + 1] = (byte) (bits >>> 8);
>     buf[pos] = (byte) (bits);
>     return 4;
>   }
>   public static int encodeDouble(double d, byte[] buf, int pos) {
>     final long bits = Double.doubleToRawLongBits(d);
>     int first = (int) (bits & 0xFFFFFFFF);
>     int second = (int) ((bits >>> 32) & 0xFFFFFFFF);
>     // the compiler seems to execute this order the best, likely due to
>     // register allocation -- the lifetime of constants is minimized.
>     buf[pos] = (byte) (first);
>     buf[pos + 4] = (byte) (second);
>     buf[pos + 5] = (byte) (second >>> 8);
>     buf[pos + 1] = (byte) (first >>> 8);
>     buf[pos + 2] = (byte) (first >>> 16);
>     buf[pos + 6] = (byte) (second >>> 16);
>     buf[pos + 7] = (byte) (second >>> 24);
>     buf[pos + 3] = (byte) (first >>> 24);
>     return 8;
>   }
> {code}
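The encoder above writes the raw bits in little-endian order. The following decode-side sketch assumes the reader mirrors it with `Double.longBitsToDouble`. It shows that a negative NaN survives an encode/decode round trip. `RawDoubleCodec`, `encode`, and `decode` are hypothetical names. The loop is a simplification of the unrolled byte stores above, not Avro's code.

```java
// Hypothetical raw-bits codec: little-endian, no NaN canonicalization.
final class RawDoubleCodec {
  static byte[] encode(double d) {
    long bits = Double.doubleToRawLongBits(d);
    byte[] buf = new byte[8];
    for (int i = 0; i < 8; i++) {
      buf[i] = (byte) (bits >>> (8 * i)); // least-significant byte first
    }
    return buf;
  }

  static double decode(byte[] buf) {
    long bits = 0;
    for (int i = 0; i < 8; i++) {
      bits |= (buf[i] & 0xffL) << (8 * i); // mask to avoid sign extension
    }
    return Double.longBitsToDouble(bits);
  }

  public static void main(String[] args) {
    double negNaN = Double.longBitsToDouble(0xfff8000000000000L);
    double back = decode(encode(negNaN));
    // The sign bit of the NaN is still set after the round trip.
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(back))); // fff8000000000000
  }
}
```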
> Rust
> {code}
> Value::Float(x) => buffer.extend_from_slice(&x.to_le_bytes()),
> Value::Double(x) => buffer.extend_from_slice(&x.to_le_bytes()),
> {code}
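Rust's `f32::to_le_bytes` and `f64::to_le_bytes` return the value's in-memory bit pattern in little-endian order, with no NaN canonicalization. The following Java sketch is a rough analog. It assumes `java.nio.ByteBuffer`, which in OpenJDK writes floats and doubles via `floatToRawIntBits`/`doubleToRawLongBits`. `LeBytes` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers mirroring Rust's to_le_bytes for f32/f64.
final class LeBytes {
  static byte[] floatToLeBytes(float f) {
    // Little-endian buffer sized to one float; array() returns the backing bytes.
    return ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN).putFloat(f).array();
  }

  static byte[] doubleToLeBytes(double d) {
    return ByteBuffer.allocate(Double.BYTES).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array();
  }
}
```

For example, `LeBytes.floatToLeBytes(Float.intBitsToFloat(0xffc00000))` keeps the sign bit in the last byte (0xff) rather than canonicalizing it to 0x7f.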
> Python
> {code}
> def write_float(self, datum: float) -> None:
>     """
>     A float is written as 4 bytes.
>     The float is converted into a 32-bit integer using a method equivalent to
>     Java's floatToIntBits and then encoded in little-endian format.
>     """
>     self.write(STRUCT_FLOAT.pack(datum))
>
> def write_double(self, datum: float) -> None:
>     """
>     A double is written as 8 bytes.
>     The double is converted into a 64-bit integer using a method equivalent to
>     Java's doubleToLongBits and then encoded in little-endian format.
>     """
>     self.write(STRUCT_DOUBLE.pack(datum))
> {code}
> C
> {code}
> static int write_float(avro_writer_t writer, const float f)
> {
> #if AVRO_PLATFORM_IS_BIG_ENDIAN
> uint8_t buf[4];
> #endif
> 

[jira] [Updated] (AVRO-3841) Align the specification of the way to encode NaN to the actual implementations

2023-08-23 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated AVRO-3841:
-
Summary: Align the specification of the way to encode NaN to the actual 
implementations  (was: Align the specification of encoding NaN to the actual 
implementations)
