[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612586#comment-13612586 ]

Leo Romanoff commented on AVRO-1282:

I have implemented an initial version of the Unsafe-based serialization/deserialization. It uses Unsafe (when it is available) to read and write the fields of an object. It improves the performance of reflection-based (de)serialization by a factor of two, which is less than I expected.

One problem I see is that the read/write methods work with Objects, so primitive values are boxed and unboxed every time they are read or written. It would be good to improve this. How about introducing setIntField, setFloatField, setDoubleField, setShortField and so on (setXXXField) in GenericData? By default each of them would invoke setField, but derived classes such as ReflectData could override them. The GenericDatumReader#read() method could then read a primitive value and write it to the field without any boxing/unboxing. What do you think?

Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
---
Key: AVRO-1282
URL: https://issues.apache.org/jira/browse/AVRO-1282
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor

Unsafe can be used to significantly speed up the serialization process if a JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs support it. Some platforms, such as Android, do not yet support Unsafe properly.

There are three ways to use Unsafe for serialization:
1) Very quick access to the fields of objects. It is much faster than the reflection-based approach using Field.get/set.
2) Input and output streams can use Unsafe to perform very quick input/output.
3) Moreover, Unsafe makes it possible to serialize to and deserialize from off-heap memory directly and very quickly, without any intermediate buffers allocated on the heap. There is virtually no overhead compared to the usual byte arrays.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612901#comment-13612901 ]

Leo Romanoff commented on AVRO-1282:

I implemented the optimization that avoids boxing/unboxing. The Perf test now shows that performance is about 3-4 times better than the vanilla reflection-based Avro version. I want to check a few more potential improvements before I submit a patch.
[jira] [Created] (AVRO-1284) Python: validation should be a method of Schema objects
Jeremy Kahn created AVRO-1284:
---
Summary: Python: validation should be a method of Schema objects
Key: AVRO-1284
URL: https://issues.apache.org/jira/browse/AVRO-1284
Project: Avro
Issue Type: Improvement
Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
Fix For: 1.7.5

In Python, validation of a datum against a schema is done in the {{avro.io.validate}} function, which is a complex, recursively called switch statement.

Instead of calling a two-argument {{avro.io.validate}} with a Schema object and a datum, the code is easier to understand and extend if validation is a one-argument method on each schema class.

I (Jeremy) have written a patch that implements {{validate}} methods on Schema objects. This patch is a prerequisite for AVRO-1265 (see "easier to extend" above).
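A minimal sketch of the "validation as a method" idea described above. It does not use the real avro package: the classes below (Schema, IntSchema, StringSchema, ArraySchema, RecordSchema) are hypothetical stand-ins invented for illustration, and the attached patch may be structured differently. Each schema class validates only its own type and delegates to its child schemas, which replaces one large type switch.

```python
class Schema:
    """Hypothetical base class; not the real avro.schema.Schema."""

    def validate(self, datum):
        raise NotImplementedError


class IntSchema(Schema):
    def validate(self, datum):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(datum, bool) or not isinstance(datum, int):
            return False
        # Avro "int" is a 32-bit signed integer.
        return -2**31 <= datum < 2**31


class StringSchema(Schema):
    def validate(self, datum):
        return isinstance(datum, str)


class ArraySchema(Schema):
    def __init__(self, items):
        self.items = items  # schema of each element

    def validate(self, datum):
        # Delegate element checks to the item schema.
        return isinstance(datum, list) and all(
            self.items.validate(element) for element in datum
        )


class RecordSchema(Schema):
    def __init__(self, fields):
        self.fields = fields  # dict: field name -> Schema

    def validate(self, datum):
        # Every declared field must be present and valid for its schema.
        return isinstance(datum, dict) and all(
            name in datum and schema.validate(datum[name])
            for name, schema in self.fields.items()
        )


user = RecordSchema({"name": StringSchema(), "scores": ArraySchema(IntSchema())})
print(user.validate({"name": "ann", "scores": [1, 2, 3]}))
print(user.validate({"name": "ann", "scores": [1, "two"]}))
```

Adding a new schema type in this layout means adding one class with its own validate method, rather than editing a shared function.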
[jira] [Work started] (AVRO-1284) Python: validation should be a method of Schema objects
[ https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AVRO-1284 started by Jeremy Kahn.
[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects
[ https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Kahn updated AVRO-1284:
--
Attachment: validation-as-method.patch
[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects
[ https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Kahn updated AVRO-1284:
--
Attachment: validation-as-method-backwards-compatible.patch

The {{validation-as-method-backwards-compatible}} patch preserves the behavior of the {{avro.io.validate}} function by having it call the new method, in case users are calling {{avro.io.validate}} directly. Prefer this patch to the simpler {{validation-as-method}} patch.

Attachments: validation-as-method-backwards-compatible.patch, validation-as-method.patch
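The backwards-compatible approach can be sketched roughly as follows. This is an illustration only, not the content of the attached patch: StringSchema is a hypothetical stand-in class, and the module-level validate function here merely mimics the two-argument shape of {{avro.io.validate}}.

```python
class StringSchema:
    """Hypothetical schema class with the new one-argument method."""

    def validate(self, datum):
        return isinstance(datum, str)


def validate(expected_schema, datum):
    """Legacy two-argument entry point, kept so existing callers still work.

    It no longer switches on the schema type itself; it simply forwards
    to the method on the schema object.
    """
    return expected_schema.validate(datum)


schema = StringSchema()
# Old-style and new-style calls give the same answer.
print(validate(schema, "hello") == schema.validate("hello"))
```

Keeping the function as a thin wrapper means callers of the old API need no changes while new code can call the method directly.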
[jira] [Work started] (AVRO-1265) Python: schema objects should support builder() default-filling behavior
[ https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AVRO-1265 started by Jeremy Kahn.

Python: schema objects should support builder() default-filling behavior
---
Key: AVRO-1265
URL: https://issues.apache.org/jira/browse/AVRO-1265
Project: Avro
Issue Type: Improvement
Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
Labels: features
Fix For: 1.7.5

There seems to be no easy way to use the Avro libraries in Python (where I feel most qualified to comment) to encode generic datums that omit fields with default values and have them transmitted as well-formed Avro binary. If you fill in the missing default values yourself, the Python libraries transmit correctly.

I'd be happy to add methods to the avro.schema.RecordSchema objects (in the Python libraries) that recursively fill in defaults for missing member fields of a record (which probably means adding methods to the other schema classes as well).

For backwards compatibility (and probably to avoid unnecessary data traversal), clients should probably ask the schema explicitly to fill in defaults before transmission in the cases where they set only the non-default values.
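A rough sketch of what explicit, recursive default filling could look like. None of these names come from the avro package: Field, PrimitiveSchema, RecordSchema and the fill_defaults method are hypothetical, chosen only to illustrate the proposal. The caller asks the schema to fill defaults before encoding; nothing happens implicitly.

```python
import copy

_MISSING = object()  # sentinel: distinguishes "no default" from a default of None


class Field:
    """Hypothetical record field: a name, a schema, and an optional default."""

    def __init__(self, name, schema, default=_MISSING):
        self.name = name
        self.schema = schema
        self.default = default


class PrimitiveSchema:
    def fill_defaults(self, datum):
        # Primitives have no member fields, so there is nothing to fill.
        return datum


class RecordSchema:
    def __init__(self, fields):
        self.fields = fields  # list of Field

    def fill_defaults(self, datum):
        """Return a new dict with missing fields replaced by their defaults."""
        filled = {}
        for field in self.fields:
            if field.name in datum:
                value = datum[field.name]
            elif field.default is not _MISSING:
                # Copy so that mutable defaults are never shared between datums.
                value = copy.deepcopy(field.default)
            else:
                raise ValueError(
                    "field %r is missing and has no default" % field.name
                )
            # Recurse so nested records are filled as well.
            filled[field.name] = field.schema.fill_defaults(value)
        return filled


address = RecordSchema([
    Field("city", PrimitiveSchema()),
    Field("country", PrimitiveSchema(), default="US"),
])
person = RecordSchema([
    Field("name", PrimitiveSchema()),
    Field("nickname", PrimitiveSchema(), default=None),
    Field("address", address),
])

partial = {"name": "Ann", "address": {"city": "Oslo"}}
print(person.fill_defaults(partial))
```

The sketch returns a new dict and leaves the input untouched, so a client that never calls fill_defaults sees no change in behavior.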