[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-03-25 Thread Leo Romanoff (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612586#comment-13612586
 ] 

Leo Romanoff commented on AVRO-1282:


I have implemented an initial version of the Unsafe-based 
serialization/deserialization. It uses Unsafe (when it is available) to 
read/write fields of an object. I can see that it improves performance of 
reflection-based (de)serialization by a factor of two. This is actually less 
than I expected. 

One problem I see is that the read/write methods work with Objects, so 
primitive values are boxed and unboxed on every read and write. It would be 
good to improve this.
How about introducing setIntField, setFloatField, setDoubleField, 
setShortField, and similar setXXXField methods in GenericData? By default each 
of them would invoke setField, but derived classes, e.g. ReflectData, could 
override them. Then the GenericDatumReader#read() method could read a primitive 
value and write it to the field without any boxing/unboxing.
What do you think?

 Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
 it
 ---

 Key: AVRO-1282
 URL: https://issues.apache.org/jira/browse/AVRO-1282
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor

 Unsafe can significantly speed up the serialization process if a JDK 
 implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs 
 support it. Some platforms, like Android, do not yet support Unsafe properly.
 There are three ways to use Unsafe for serialization:
 1) Very quick access to the fields of objects. It is much faster than the 
 reflection-based approach using Field.get/set.
 2) Input and output streams can use Unsafe to perform very quick 
 input/output.
 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
 off-heap memory directly and very quickly, without any intermediate buffers 
 allocated on heap. There is virtually no overhead compared to the usual byte 
 arrays.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-03-25 Thread Leo Romanoff (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612901#comment-13612901
 ] 

Leo Romanoff commented on AVRO-1282:


I implemented the optimization that avoids boxing/unboxing. The Perf test now 
shows that performance is about 3-4 times better than the vanilla 
reflection-based Avro version. I want to check a few more potential 
improvements before I submit a patch.




[jira] [Created] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1284:
-

 Summary: Python: validation should be a method of Schema objects
 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5


In Python, validation of a datum against a schema is done in the 
{{avro.io.validate}} function, which is a complex, recursively called 
switch-style statement.

Instead of calling the two-argument {{avro.io.validate}} with a Schema object 
and a datum, validation would be easier to understand and extend as 
one-argument {{validate}} methods on the Schema objects.

I (Jeremy) have written a patch that implements {{validate}} methods on Schema 
objects. This patch is a prerequisite for AVRO-1265 (see "easier to extend" 
above).
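
To illustrate the idea (this is not the attached patch), here is a minimal 
sketch of validation as one-argument methods. The class names, constructor 
signatures, and the small set of primitive checks are all hypothetical 
stand-ins, not the real avro.schema classes.

```python
# Hypothetical stand-ins for schema classes; not the real avro.schema API.

class PrimitiveSchema:
    """Schema for a primitive type; only a few types are sketched here."""

    _CHECKS = {
        "null": lambda d: d is None,
        "boolean": lambda d: isinstance(d, bool),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "int": lambda d: (isinstance(d, int) and not isinstance(d, bool)
                          and -2**31 <= d < 2**31),
        "string": lambda d: isinstance(d, str),
    }

    def __init__(self, type_name):
        self.type = type_name

    def validate(self, datum):
        return self._CHECKS[self.type](datum)


class ArraySchema:
    def __init__(self, items):
        self.items = items  # schema of each element

    def validate(self, datum):
        # Each schema class only knows its own rule and delegates the rest.
        return isinstance(datum, list) and all(
            self.items.validate(d) for d in datum)


class RecordSchema:
    def __init__(self, fields):
        self.fields = fields  # dict: field name -> schema

    def validate(self, datum):
        return isinstance(datum, dict) and all(
            schema.validate(datum.get(name))
            for name, schema in self.fields.items())


user_schema = RecordSchema({
    "name": PrimitiveSchema("string"),
    "tags": ArraySchema(PrimitiveSchema("string")),
})
print(user_schema.validate({"name": "ann", "tags": ["a", "b"]}))
print(user_schema.validate({"name": 42, "tags": []}))
```

With this shape, supporting a new schema type means adding a class with its 
own validate method rather than another branch in a shared function.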





[jira] [Work started] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1284 started by Jeremy Kahn.




[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Attachment: validation-as-method.patch




[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Attachment: validation-as-method-backwards-compatible.patch

The {{validation-as-method-backwards-compatible}} patch maintains the 
functional behavior of {{avro.io.validate}} by calling the method indirectly, 
in case users are calling {{avro.io.validate}}.

Prefer this patch to the simpler {{validation-as-method}} patch.
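
A rough sketch of the backwards-compatible arrangement, assuming a 
hypothetical IntSchema class in place of the real schema classes: the 
module-level two-argument function remains, but only forwards to the method.

```python
class IntSchema:
    """Hypothetical schema class that carries its own validation rule."""

    def validate(self, datum):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)


def validate(expected_schema, datum):
    """Legacy two-argument entry point, kept so existing callers still work.

    It delegates to the schema's own method instead of switching on type.
    """
    return expected_schema.validate(datum)


print(validate(IntSchema(), 3))
print(validate(IntSchema(), "3"))
```

Old call sites keep passing a schema and a datum, while new code can call 
the method on the schema directly.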




[jira] [Work started] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1265 started by Jeremy Kahn.

 Python: schema objects should support builder() default-filling behavior
 

 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Fix For: 1.7.5


 There seems to be no way to easily use the avro libraries in Python (where I 
 feel most qualified to comment) to encode generics with missing default 
 values and have them transmitted in well-formed avro binary.
 If you fill in the missing default values, the Python libraries will 
 transmit correctly.
 I'd be happy to add methods to the avro.RecordSchema objects (in the Python 
 libraries) that fill defaults on missing member fields of a record, 
 recursively (which probably means method extension of other schema classes as 
 well).
 For backwards compatibility (and probably to avoid unnecessary data 
 traversal), clients probably want to explicitly ask the schema to fill in 
 defaults before transmission in the cases where you'd like to set only the 
 non-default values.
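
 One way the proposed default-filling could look, as a sketch only: Field, 
 RecordSchema, and fill_defaults below are hypothetical names chosen for 
 illustration and do not exist in the avro Python library in this form. The 
 client asks the schema for a filled copy before encoding.

```python
import copy

_MISSING = object()  # sentinel: distinguishes "no default" from a None default


class Field:
    """Hypothetical record field: a name, an optional schema, an optional default."""

    def __init__(self, name, schema=None, default=_MISSING):
        self.name = name
        self.schema = schema
        self.default = default

    @property
    def has_default(self):
        return self.default is not _MISSING


class RecordSchema:
    """Hypothetical record schema holding a list of Field objects."""

    def __init__(self, fields):
        self.fields = fields

    def fill_defaults(self, datum):
        """Return a copy of datum with missing fields set to their defaults."""
        filled = dict(datum)
        for field in self.fields:
            if field.name in filled:
                value = filled[field.name]
            elif field.has_default:
                # Copy so callers cannot mutate the schema's default value.
                value = copy.deepcopy(field.default)
            else:
                raise ValueError("no value or default for field %r" % field.name)
            # Recurse into nested records so their defaults are filled too.
            if isinstance(field.schema, RecordSchema) and isinstance(value, dict):
                value = field.schema.fill_defaults(value)
            filled[field.name] = value
        return filled


address = RecordSchema([Field("city"), Field("country", default="US")])
person = RecordSchema([
    Field("name"),
    Field("age", default=0),
    Field("address", schema=address),
])
print(person.fill_defaults({"name": "Ann", "address": {"city": "Oslo"}}))
```

 Because filling is an explicit call that returns a new dict, existing code 
 that already supplies every field is unaffected and pays no extra traversal.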
