[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661468#comment-16661468
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on issue #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#issuecomment-432458389
 
 
   Yes, I ran tests (outside of Docker) before I committed this. Sorry to hear
   it's broken for you.
   
   Doug
   
   On Tue, Oct 23, 2018, 12:05 PM Fokko Driesprong wrote:
   
   > Did you check the tests? I'm trying to set up CI/CD, but the following
   > tests are failing:
   >
   > ---
   >  T E S T S
   > ---
   > Running org.apache.avro.specific.TestGeneratedCode
   > Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.353 sec <<< FAILURE! - in org.apache.avro.specific.TestGeneratedCode
   > withSchemaMigration(org.apache.avro.specific.TestGeneratedCode)  Time elapsed: 0.042 sec  <<< FAILURE!
   > java.lang.AssertionError: Test schema must allow for custom coders.
   >     at org.apache.avro.specific.TestGeneratedCode.withSchemaMigration(TestGeneratedCode.java:75)
   >
   > withoutSchemaMigration(org.apache.avro.specific.TestGeneratedCode)  Time elapsed: 0.001 sec  <<< FAILURE!
   > java.lang.AssertionError: Test schema must allow for custom coders.
   >     at org.apache.avro.specific.TestGeneratedCode.withoutSchemaMigration(TestGeneratedCode.java:54)
   >
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: customcoders.md, perf-data.txt
>
>
> New implementation for generation of code for SpecificRecord that improves 
> decoding by over 10% and encoding over 30% (more improvements are on the 
> way).  This feature is behind a feature flag 
> ({{org.apache.avro.specific.use_custom_coders}}) and (for now) turned off by 
> default.  See [Getting Started 
> (Java)|https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code]
>  for instructions.
> (A bit more info: Compared to GenericRecords, SpecificRecords offer 
> type-safety plus the performance of traditional getters/setters/instance 
> variables.  But these are only beneficial to Java code accessing those 
> records.  SpecificRecords inherit serialization and deserialization code from 
> GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that 
> serialization and deserialization is _slower_ for SpecificRecord than for 
> GenericRecord).  This patch extends record.vm to generate custom, 
> higher-performance encoder and decoder functions for SpecificRecords.)
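
The difference the description draws between dynamic and generated coders can be illustrated with a small, self-contained sketch. It does not use Avro: `CoderSketch`, `Player`, `FieldType`, and `dynamicEncode` are hypothetical names, and `java.io.DataOutputStream` stands in for Avro's encoder. Only the method name `customEncode` is borrowed from the commit message for this patch. The dynamic path walks a schema-like type list and boxes each field, while the generated-style path writes the fields directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class CoderSketch {
  // Hypothetical stand-in for a schema: field types in declaration order.
  enum FieldType { INT, STRING }
  static final List<FieldType> TYPES = Arrays.asList(FieldType.INT, FieldType.STRING);

  // Hypothetical record type, not a real Avro-generated class.
  static class Player {
    final int id;
    final String name;

    Player(int id, String name) { this.id = id; this.name = name; }

    // Generic positional access: the int field is boxed on every call.
    Object get(int i) { return i == 0 ? (Object) id : name; }

    // Generated-style coder: straight-line writes, no type dispatch, no boxing.
    void customEncode(DataOutputStream out) throws IOException {
      out.writeInt(id);
      out.writeUTF(name);
    }
  }

  // Dynamic coder: interprets the type list for every record it writes.
  static void dynamicEncode(Player p, DataOutputStream out) throws IOException {
    for (int i = 0; i < TYPES.size(); i++) {
      Object value = p.get(i);
      switch (TYPES.get(i)) {
        case INT: out.writeInt((Integer) value); break;
        case STRING: out.writeUTF((String) value); break;
      }
    }
  }

  // Encodes with either path; both must produce identical bytes.
  static byte[] encode(Player p, boolean custom) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (custom) { p.customEncode(out); } else { dynamicEncode(p, out); }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Player p = new Player(7, "ann");
    System.out.println(Arrays.equals(encode(p, false), encode(p, true)));
  }
}
```

Both paths emit the same bytes, so the only difference is the per-record work done to get there. That per-record work is the cost the description attributes to the inherited, dynamic code.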



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661328#comment-16661328
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

Fokko commented on issue #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#issuecomment-432429845
 
 
   Thanks for the quick response! Using `./build.sh docker` you should get a full-fledged Docker environment to run the tests. If you run `mvn clean test` in `lang/avro/`, the tests fail on my current master.
   
   For example, trailing whitespace has been added to the test reference file that should not be in the generated file: https://github.com/rstata-projects/avro/blob/98b3df3410c4dc14aa6b5890e2f7482da55350ca/lang/java/tools/src/test/compiler/output/Player.java#L543-L553
   This showed up in the diff of the failed test.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661274#comment-16661274
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

Fokko commented on issue #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#issuecomment-432415634
 
 
   It looks like three more tests are failing later on: 
https://gist.github.com/Fokko/40602f95c04313bd6c7aca00316ec84c
   
   PTAL @rstata 




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661059#comment-16661059
 ] 

ASF subversion and git services commented on AVRO-2090:
---

Commit b4ede4b116b24b5308e8419504a73e02b7f7e406 in avro's branch 
refs/heads/master from [~raymie]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=b4ede4b ]

AVRO-2090 second try (#350)

* Finished initial implementation (not tested).

* Added Reader/Decoder code

* Made some of the changes suggested by Doug.

* Hide helper methods related to custom coding.  Changed them from public to 
protected.  Also changed name of encode and decode to customEncode and 
customDecode to be more clear as to their function.

* Allow dynamic changes to flag that controls whether or not the custom 
en/decoders are used.

* Fixed typos in TestSpecificCompiler.java

* New test case: breaks new code-gen when schema needs resolution.

* Fixed bug with decoding when the schema has been migrated.

* Added test-with-custom-coders execution of testing and fixed some problems 
that this uncovered.

* Fixed potential performance problem of redundantly allocating objects.

* Added documentation (also update AVRO-2090 description to point to new docs).

* Small doc fix (I forgot to commit these changes before pushing)





[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661058#comment-16661058
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting closed pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/doc/src/content/xdocs/gettingstartedjava.xml b/doc/src/content/xdocs/gettingstartedjava.xml
index fe6c7d284..7f331e347 100644
--- a/doc/src/content/xdocs/gettingstartedjava.xml
+++ b/doc/src/content/xdocs/gettingstartedjava.xml
@@ -319,6 +319,43 @@ $ mvn compile # includes code generation via Avro Maven plugin
 $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
 
   
+  
+Beta feature: Generating faster code
+
+  In this release we have introduced a new approach to
+  generating code that speeds up decoding of objects by more
+  than 10% and encoding by more than 30% (future performance
+  enhancements are underway).  To ensure a smooth introduction
+  of this change into production systems, this feature is
+  controlled by a feature flag, the system
+  property org.apache.avro.specific.use_custom_coders.
+  In this first release, this feature is off by default.  To
+  turn it on, set the system flag to true at
+  runtime.  In the sample above, for example, you could enable
+  the fater coders as follows:
+
+
+$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
+-Dorg.apache.avro.specific.use_custom_coders=true
+
+
+  Note that you do not have to recompile your Avro
+  schema to have access to this feature.  The feature is
+  compiled and built into your code, and you turn it on and
+  off at runtime using the feature flag.  As a result, you can
+  turn it on during testing, for example, and then off in
+  production.  Or you can turn it on in production, and
+  quickly turn it off if something breaks.
+
+
+  We encourage the Avro community to exercise this new feature
+  early to help build confidence.  (For those paying
+  one-demand for compute resources in the cloud, it can lead
+  to meaningful cost savings.)  As confidence builds, we will
+  turn this feature on by default, and eventually eliminate
+  the feature flag (and the old code).
+
+  
 
 
 
diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
index cb9a82b48..073ca27b0 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
@@ -116,9 +116,7 @@ public static Object resolve(Schema writer, Schema reader)
* the above loop will always be correct.
*
* Throws a runtime exception if we're not just about to read the
-   * field of a record.  Also, this method will consume the field
-   * information, and thus may only be called once before
-   * reading the field value.  (However, if the client knows the
+   * first field of a record.  (If the client knows the
* order of incoming fields, then the client does not
* need to call this method but rather can just start reading the
* field values.)
diff --git a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
index 1c179007a..79558ba47 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
@@ -66,6 +66,9 @@
 
 /** Utilities to use existing Java classes and interfaces via reflection. */
 public class ReflectData extends SpecificData {
+  @Override
+  public boolean useCustomCoders() { return false; }
+
   /** {@link ReflectData} implementation that permits null field values.  The
* schema generated for each field is a union of its declared type and
* null. */
diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
index 44de5c434..60d43dcf0 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
@@ -122,6 +122,22 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { 

[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-22 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659671#comment-16659671
 ] 

Doug Cutting commented on AVRO-2090:


Looks reasonable to me.  Anyone object to me committing this soon?



[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-22 Thread Raymie Stata (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658852#comment-16658852
 ] 

Raymie Stata commented on AVRO-2090:


Any more feedback on this patch?



[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657461#comment-16657461
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226786418
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private boolean useCustomCoderFlag
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return useCustomCoderFlag; }
+  public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
+
 
 Review comment:
   I just pushed a change that added the missing Javadoc, and also updated the 
Getting Started Guide for Java to include a subsection on the new feature flag 
and how to use it.  Finally, since we point to JIRA tickets as our release 
notes, I updated the description of AVRO-2090 to be more informative to users 
looking to see what's new in Avro, and also included there a pointer to the 
more detailed documentation in the Getting Started Guide.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657223#comment-16657223
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226734720
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private boolean useCustomCoderFlag
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return useCustomCoderFlag; }
+  public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
+
 
 Review comment:
   Regarding piecemeal: yes, it would definitely be valuable to have a finite 
list of todo's here rather than an indefinitely growing one :-)




> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md, perf-data.txt
>
>
> Compared to GenericRecords, SpecificRecords offer type-safety plus the 
> performance of traditional getters/setters/instance variables.  But these are 
> only beneficial to Java code accessing those records.  SpecificRecords 
> inherit serialization and deserialization code from GenericRecords, which is 
> dynamic and thus slow (in fact, benchmarks show that serialization and 
> deserialization is _slower_ for SpecificRecord than for GenericRecord).
> This patch extends record.vm to generate custom, higher-performance encoder 
> and decoder functions for SpecificRecords.  We've run a public benchmark 
> showing that the new code reduces serialization time by 2/3 and 
> deserialization time by close to 50%.





[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657220#comment-16657220
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226734142
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private boolean useCustomCoderFlag
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return useCustomCoderFlag; }
+  public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
+
 
 Review comment:
   Regarding documentation: there is a larger issue here that this feature flag 
isn't documented.  I'll add the suggested Javadoc but also look for other 
places to document this new feature and how Avro users can enable it.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657219#comment-16657219
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226733860
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private boolean useCustomCoderFlag
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return useCustomCoderFlag; }
+  public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
+
 
 Review comment:
   Regarding compiler options: the controllability here isn't intended to be
   a long-term option; it's intended to be a "feature flag" that lets more
   risk-seeking users adopt the feature early and help find corner-case bugs.
   After a release or two, the flag should go away and this should be the only
   way to generate code.  As a result, this feature flag is more appropriate as
   a runtime option (as it is now) than as a compile-time option.  For example,
   imagine that the custom coders look really good during testing, get shipped
   to production, and then trigger a bug.  In that scenario, the ops team will
   want to turn off the custom coders with a runtime flag rather than wait
   for the system to be recompiled with a changed compile-time flag.
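
   A minimal sketch of the runtime-flag pattern discussed above, assuming only what the diff shows: the property name and the getter/setter pair. `CustomCoderFlagSketch` is a hypothetical stand-in, not Avro's SpecificData, and the `volatile` modifier is my own addition so that a change made by one thread is visible to others.

```java
// Sketch only: a stand-in class, not Avro's SpecificData.
final class CustomCoderFlagSketch {
  // Property name taken from the diff under review.
  static final String PROPERTY = "org.apache.avro.specific.use_custom_coders";

  // Read once at construction; defaults to false when the property is unset.
  // 'volatile' is an assumption of this sketch, not part of the patch.
  private volatile boolean useCustomCoderFlag =
      Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));

  /** Returns true if the custom (generated) coders should be used. */
  boolean useCustomCoders() { return useCustomCoderFlag; }

  /** Overrides the flag at runtime, without recompiling anything. */
  void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
}
```

   With this shape, the flag can be set at JVM start with `-Dorg.apache.avro.specific.use_custom_coders=true`, or switched off later by calling `setCustomCoders(false)`.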




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657118#comment-16657118
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226716217
 
 

 ##
 File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private boolean useCustomCoderFlag
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return useCustomCoderFlag; }
+  public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
+
 
 Review comment:
   These new public methods need javadoc.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656690#comment-16656690
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#issuecomment-431334712
 
 
   I pushed a patch to fix the problem Doug found.  I think this is ready to 
go...




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654741#comment-16654741
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#issuecomment-430901185
 
 
   I will make the change to SpecificDatumReader as discussed with Doug.  I'd 
like to batch this change with a larger set of changes based on further 
feedback on this work.
   
   More feedback?




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654225#comment-16654225
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #350: AVRO-2090 second try
URL: https://github.com/apache/avro/pull/350#discussion_r226101432
 
 

 ##
 File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
 ##
 @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
     }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+    throws IOException {
+    SpecificData data = getSpecificData();
+    if (data.useCustomCoders()) {
+      Object r = data.newRecord(old, expected);
 
 Review comment:
   What about the following:
   ```
   old = data.newRecord(old, expected);
   if (old instanceof SpecificRecordBase) {
     ...
   ```
   That is to say, if I get a record back from newRecord and it doesn't have
   custom coders, then I pass the old-or-newly-allocated object to
   `super.readRecord`, which then won't need to allocate another object.
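
   The reuse idea above can be sketched with stand-in classes. Everything here is hypothetical (`RecordSketch`, `ReaderSketch`, the allocation counter, and the assumption that newRecord returns a non-null `old` unchanged); it only illustrates why reassigning `old` lets the fallback path skip a second allocation.

```java
// Stand-ins for illustration; none of these are Avro classes.
class RecordSketch {
  boolean decoded;                      // set by whichever path filled the record
  boolean hasCustomCoders() { return false; }
  void decode() { decoded = true; }
}

class FastRecordSketch extends RecordSketch {
  @Override boolean hasCustomCoders() { return true; }
}

class ReaderSketch {
  int allocations = 0;                  // counts how often newRecord allocates
  boolean useCustomCoders = true;

  // Assumed behaviour: reuse 'old' when present, otherwise allocate.
  RecordSketch newRecord(RecordSketch old,
                         java.util.function.Supplier<RecordSketch> factory) {
    if (old != null) return old;
    allocations++;
    return factory.get();
  }

  // Generic fallback: calls newRecord again, which reuses a non-null 'old'.
  RecordSketch genericRead(RecordSketch old,
                           java.util.function.Supplier<RecordSketch> factory) {
    RecordSketch r = newRecord(old, factory);
    r.decoded = true;
    return r;
  }

  RecordSketch readRecord(RecordSketch old,
                          java.util.function.Supplier<RecordSketch> factory) {
    if (useCustomCoders) {
      old = newRecord(old, factory);    // reassign so the fallback can reuse it
      if (old.hasCustomCoders()) {
        old.decode();                   // fast path: record decodes itself
        return old;
      }
    }
    return genericRead(old, factory);   // no second allocation when old != null
  }
}
```

   In this sketch a record without custom coders is allocated once in `readRecord` and then handed to the generic path, rather than being discarded and allocated again.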




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653005#comment-16653005
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata closed pull request #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
index be6bde8d2..bb918e41d 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
@@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private static final boolean USE_CUSTOM_CODERS
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return USE_CUSTOM_CODERS; }
+
   @Override
   protected boolean isEnum(Object datum) {
     return datum instanceof Enum || super.isEnum(datum);
diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
index 774ca0944..0a9c97014 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
@@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
     }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+    throws IOException {
+    SpecificData data = getSpecificData();
+    Object r = data.newRecord(old, expected);
+    if (SpecificData.get().useCustomCoders()
+        && r instanceof SpecificRecordBase) // TODO: Is this needed?
+    {
+      SpecificRecordBase d = (SpecificRecordBase) r;
+      if (d.hasCustomCoders()) {
+        d.decode(in);
+        return d;
+      }
+    }
+    return super.readRecord(old, expected, in);
+  }
+
   @Override
   protected void readField(Object r, Schema.Field f, Object oldDatum,
                            ResolvingDecoder in, Object state)
diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
index 7bee02a65..ee1d850a7 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
@@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
     writeString(datum, out);
   }
 
+  @Override
+  protected void writeRecord(Schema schema, Object datum, Encoder out)
+    throws IOException {
+    if (SpecificData.get().useCustomCoders()
+        && datum instanceof SpecificRecordBase) // TODO: Is this needed?
+    {
+      SpecificRecordBase d = (SpecificRecordBase) datum;
+      if (d.hasCustomCoders()) {
+        d.encode(out);
+        return;
+      }
+    }
+    super.writeRecord(schema, datum, out);
+  }
+
   @Override
   protected void writeField(Object datum, Schema.Field f, Encoder out,
                             Object state) throws IOException {
diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
index 20d3dc331..2c26d0282 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
@@ -25,6 +25,8 @@
 import org.apache.avro.Conversion;
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.Decoder;
+import org.apache.avro.io.Encoder;
 
 /** Base class for generated record classes. */
 public abstract class SpecificRecordBase
@@ -90,4 +92,19 @@ public void readExternal(ObjectInput in)
     new SpecificDatumReader(getSchema())
       .read(this, SpecificData.getDecoder(in));
   }
+
+  /** Returns true iff an instance supports the {@link #encode} and
+   * {@link #decode} operations.  Should only be used by
+   * SpecificDatumReader/Writer to selectively use
+   * {@link #encode} and {@link #decode} to optimize the (de)serialization of
+   * values. */
+  public boolean hasCustomCoders() { return false; }
+
+  public void encode(Encoder out) throws 

[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-10-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653004#comment-16653004
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-430501383
 
 
   I've opened a new pull request (#350) to replace this one.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633209#comment-16633209
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-425691627
 
 
   @juwex, would love your help if you're still interested.  Good catch on the
   field-order problem.  It'd be great if you could develop a test case that
   demonstrates the bug you've found; it would make a fantastic regression
   test.  As for a fix, we should check whether the Decoder provided is an
   instance of ResolvingDecoder and, if it is, the code should use
   readFieldOrder to read the fields in the correct order.
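
   A minimal sketch of the proposed fix, under stated assumptions. The interfaces below are hypothetical stand-ins; only the name readFieldOrder comes from the comment above, and for simplicity this sketch has it return field positions as an `int[]`, which may not match the real decoder's return type.

```java
// Hypothetical stand-ins; not Avro's Decoder / ResolvingDecoder.
interface DecoderSketch {
  int readInt();
  String readString();
}

interface ResolvingDecoderSketch extends DecoderSketch {
  // Field positions in the order the data must be read (simplified to int[]).
  int[] readFieldOrder();
}

// Test helper: replays canned values in the given field order.
final class ListDecoderSketch implements ResolvingDecoderSketch {
  private final java.util.Iterator<Object> values;
  private final int[] order;
  ListDecoderSketch(int[] order, Object... values) {
    this.order = order;
    this.values = java.util.Arrays.asList(values).iterator();
  }
  public int[] readFieldOrder() { return order; }
  public int readInt() { return (Integer) values.next(); }
  public String readString() { return (String) values.next(); }
}

final class UserSketch {
  int id;        // field position 0
  String name;   // field position 1

  void decode(DecoderSketch in) {
    if (in instanceof ResolvingDecoderSketch) {
      // Resolving case: follow the order the decoder reports.
      for (int pos : ((ResolvingDecoderSketch) in).readFieldOrder()) {
        readField(pos, in);
      }
    } else {
      // Plain case: fields arrive in declaration order.
      readField(0, in);
      readField(1, in);
    }
  }

  private void readField(int pos, DecoderSketch in) {
    switch (pos) {
      case 0: id = in.readInt(); break;
      case 1: name = in.readString(); break;
      default: throw new IllegalStateException("unknown field position " + pos);
    }
  }
}
```

   A regression test along these lines would feed the fields in a non-declaration order (for example name before id) and check that each value still lands in the right field.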




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633206#comment-16633206
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-425690934
 
 
   I will update this patch.  The old changes were easily rebased onto Avro's 
current master.  You can see that result here:
   
   https://github.com/rstata-projects/avro/tree/specific-new-rebased
   
   I then started addressing the comments above.  I just closed a few of the 
change requests from above -- you can find the resulting changes here:
   
   https://github.com/rstata-projects/avro/tree/AVRO-2090-again
   
   I'll keep working on this over the next few days.  In the meantime, if folks 
have applications that use SpecificRecord, it'd be great if you could enable 
this feature and run it through your test cases.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633196#comment-16633196
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r221445803
 
 

 ##
 File path: lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
 ##
 @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
 READER$.read(this, SpecificData.getDecoder(in));
   }
 
+#if ($this.isCustomCodable($schema))
+  @Override public boolean hasCustomCoders() { return true; }
+
+  @Override public void encode(org.apache.avro.io.Encoder out)
+throws java.io.IOException
+  {
+#set ($nv = 0)## Counter to ensure unique var-names
+#set ($maxnv = 0)## Holds high-water mark during recursion
+#foreach ($field in $schema.getFields())
+#set ($n = $this.mangle($field.name(), $schema.isError()))
+#set ($s = $field.schema())
+#encodeVar(0 "this.${n}" $s)
+
+#set ($nv = $maxnv)
+#end
+  }
+
+  @Override public void decode(org.apache.avro.io.Decoder in)
+throws java.io.IOException
+  {
+#set ($nv = 0)## Counter to ensure unique var-names
+#set ($maxnv = 0)## Holds high-water mark during recursion
+#foreach ($field in $schema.getFields())
+#set ($n = $this.mangle($field.name(), $schema.isError()))
+#set ($s = $field.schema())
+#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
+#decodeVar(0 "this.${n}" $s $rs)
+
+#set ($nv = $maxnv)
+#end
+  }
+#end
 }
+
+#macro( encodeVar $indent $var $s )
+#set ($I = $this.indent($indent))
+# Compound types (array, map, and union) require calls
+# that will recurse back into this encodeVar macro:
+#if ($s.Type.Name.equals("array"))
+#encodeArray($indent $var $s)
+#elseif ($s.Type.Name.equals("map"))
+#encodeMap($indent $var $s)
+#elseif ($s.Type.Name.equals("union"))
+#encodeUnion($indent $var $s)
+# Use the generated "encode" method as fast way to write
+# (specific) record types:
+#elseif ($s.Type.Name.equals("record"))
+$I${var}.encode(out);
+# For rest of cases, generate calls out.writeXYZ:
+#elseif ($s.Type.Name.equals("null"))
+$Iout.writeNull();
+#elseif ($s.Type.Name.equals("boolean"))
+$Iout.writeBoolean(${var});
+#elseif ($s.Type.Name.equals("int"))
+$Iout.writeInt(${var});
+#elseif ($s.Type.Name.equals("long"))
+$Iout.writeLong(${var});
+#elseif ($s.Type.Name.equals("float"))
+$Iout.writeFloat(${var});
+#elseif ($s.Type.Name.equals("double"))
+$Iout.writeDouble(${var});
+#elseif ($s.Type.Name.equals("string"))
+#if ($this.isStringable($s))
+$Iout.writeString(${var}.toString());
+#else
+$Iout.writeString(${var});
+#end
+#elseif ($s.Type.Name.equals("bytes"))
+$Iout.writeBytes(${var});
+#elseif ($s.Type.Name.equals("fixed"))
+$Iout.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
+#elseif ($s.Type.Name.equals("enum"))
+$Iout.writeEnum(${var}.ordinal());
+#else
+## TODO -- signal a code-gen-time error
+#end
+#end
+
+#macro( encodeArray $indent $var $s )
+#set ($I = $this.indent($indent))
+#set ($et = $this.javaType($s.ElementType))
+$Ilong size${nv} = ${var}.size();
+$Iout.writeArrayStart();
+$Iout.setItemCount(size${nv});
+$Ilong actualSize${nv} = 0;
+$Ifor ($et e${nv}: ${var}) {
+$I  actualSize${nv}++;
+$I  out.startItem();
+#set ($var = "e${nv}")
+#set ($nv = $nv + 1)
+#set ($maxnv = $nv)
+#set ($indent = $indent + 2)
+#encodeVar($indent $var $s.ElementType)
+#set ($nv = $nv - 1)
+#set ($indent = $indent - 2)
+#set ($I = $this.indent($indent))
+$I}
+$Iout.writeArrayEnd();
+$Iif (actualSize${nv} != size${nv})
+$I  throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
+#end
+
+#macro( encodeMap $indent $var $s )
+#set ($I = $this.indent($indent))
+#set ($kt = $this.getStringType($s))
+#set ($vt = $this.javaType($s.ValueType))
+$Ilong size${nv} = ${var}.size();
+$Iout.writeMapStart();
+$Iout.setItemCount(size${nv});
+$Ilong actualSize${nv} = 0;
+$Ifor (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
+$I  actualSize${nv}++;
+$I  out.startItem();
+#if ($this.isStringable($s))
+$I  out.writeString(e${nv}.getKey().toString());
+#else
+$I  out.writeString(e${nv}.getKey());
+#end
+$I  $vt v${nv} = e${nv}.getValue();
+#set ($var = "v${nv}")
+#set ($nv = $nv + 1)
+#set ($maxnv = $nv)
+#set ($indent = $indent + 2)
+#encodeVar($indent $var $s.ValueType)
+#set ($nv = $nv - 1)
+#set ($indent = $indent - 2)
+#set ($I = $this.indent($indent))
+$I}
+$Iout.writeMapEnd();
+$Iif (actualSize${nv} != size${nv})
+  throw new 

[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633197#comment-16633197
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r221445811
 
 

 ##
 File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
 ##
 @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
     writeString(datum, out);
   }
 
+  @Override
+  protected void writeRecord(Schema schema, Object datum, Encoder out)
+    throws IOException {
+    if (SpecificData.get().useCustomCoders()
 
 Review comment:
   ok




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633195#comment-16633195
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r221445798
 
 

 ##
 File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
 ##
 @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in)
     new SpecificDatumReader(getSchema())
       .read(this, SpecificData.getDecoder(in));
   }
+
+  /** Returns true iff an instance supports the {@link #encode} and
+   * {@link #decode} operations.  Should only be used by
+   * SpecificDatumReader/Writer to selectively use
+   * {@link #encode} and {@link #decode} to optimize the (de)serialization of
+   * values. */
+  public boolean hasCustomCoders() { return false; }
+
 
 Review comment:
   protected is enough


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve encode/decode time for SpecificRecord using code generation
> ---
>
> Key: AVRO-2090
> URL: https://issues.apache.org/jira/browse/AVRO-2090
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
> Attachments: customcoders.md
>
>
> Compared to GenericRecords, SpecificRecords offer type-safety plus the 
> performance of traditional getters/setters/instance variables.  But these are 
> only beneficial to Java code accessing those records.  SpecificRecords 
> inherit serialization and deserialization code from GenericRecords, which is 
> dynamic and thus slow (in fact, benchmarks show that serialization and 
> deserialization are _slower_ for SpecificRecord than for GenericRecord).
> This patch extends record.vm to generate custom, higher-performance encoder 
> and decoder functions for SpecificRecords.  We've run a public benchmark 
> showing that the new code reduces serialization time by 2/3 and 
> deserialization time by close to 50%.
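
The contrast drawn in the description can be sketched with a toy model. This is not Avro code: `ToyEncoder`, `ToyPoint` and `ToyWriters` are hypothetical stand-ins, and the sketch only assumes that the generic path inspects a per-field type tag at run time while a generated coder makes straight-line calls.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an encoder; it just records what was written.
class ToyEncoder {
  final List<String> out = new ArrayList<>();
  void writeInt(int v) { out.add("int:" + v); }
  void writeString(String v) { out.add("string:" + v); }
}

// Hypothetical record with two fields.
class ToyPoint {
  int x;
  String label;
  ToyPoint(int x, String label) { this.x = x; this.label = label; }

  // Generic-style access: lookup by position, values boxed as Object.
  Object get(int i) { return i == 0 ? (Object) x : label; }

  // Generated-style coder: straight-line calls, no type dispatch, no boxing.
  void encode(ToyEncoder e) {
    e.writeInt(x);
    e.writeString(label);
  }
}

class ToyWriters {
  // Schema-driven writer: checks a type tag for every field at run time.
  static void writeDynamic(String[] fieldTypes, ToyPoint p, ToyEncoder e) {
    for (int i = 0; i < fieldTypes.length; i++) {
      Object v = p.get(i);
      switch (fieldTypes[i]) {
        case "int": e.writeInt((Integer) v); break;
        case "string": e.writeString((String) v); break;
        default: throw new IllegalArgumentException(fieldTypes[i]);
      }
    }
  }

  public static void main(String[] args) {
    ToyPoint p = new ToyPoint(7, "origin");
    ToyEncoder a = new ToyEncoder();
    ToyEncoder b = new ToyEncoder();
    writeDynamic(new String[] {"int", "string"}, p, a);
    p.encode(b);
    // Both paths are expected to emit the same sequence of items.
    System.out.println(a.out.equals(b.out));
  }
}
```

Both paths produce the same output; the generated-style method simply skips the per-field dispatch and boxing, which is where the description locates the cost.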





[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633198#comment-16633198
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r221445814
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
 ##
 @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
 }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+throws IOException {
+SpecificData data = getSpecificData();
+Object r = data.newRecord(old, expected);
+if (SpecificData.get().useCustomCoders()
 
 Review comment:
   ok




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-09-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633194#comment-16633194
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r221445787
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
 ##
 @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
 }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+throws IOException {
+SpecificData data = getSpecificData();
+Object r = data.newRecord(old, expected);
 
 Review comment:
   yep




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559292#comment-16559292
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

juwex commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-408317797
 
 
   I'm very excited about the performance improvements with this feature and 
would like to see this in a usable state sooner rather than later. If I can 
assist getting this branch into a mergeable state, I'd love to do so (with 
@rstata 's consent)
   
   There's a rather big issue with schema evolution currently, however. The 
generated reader relies on the field order within the `ResolvingDecoder` 
staying the same. Hence, data written with a schema that differs only in the 
order of its fields will result in an error or (even worse) invalid data. If 
required, I can provide a code sample.
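
A toy model of the field-order problem described above, as a minimal sketch rather than the code sample offered in the comment. It does not use Avro: `FieldOrderDemo`, `Rec` and both decode methods are hypothetical, and the "wire" is just a queue of values in the writer's field order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model, not Avro: values arrive in the order the writer declared them.
class FieldOrderDemo {
  // Reader-side record with fields declared as (id, count).
  static class Rec {
    int id;
    int count;
  }

  // Generated-style decoder that assumes the reader's own declared order.
  static Rec decodeAssumingOwnOrder(Deque<Integer> wire) {
    Rec r = new Rec();
    r.id = wire.removeFirst();
    r.count = wire.removeFirst();
    return r;
  }

  // Decoder that follows the order in which the data was actually written.
  static Rec decodeFollowingWriterOrder(String[] writerOrder, Deque<Integer> wire) {
    Rec r = new Rec();
    for (String name : writerOrder) {
      int v = wire.removeFirst();
      switch (name) {
        case "id": r.id = v; break;
        case "count": r.count = v; break;
        default: break; // field unknown to the reader: value is dropped
      }
    }
    return r;
  }

  // The writer declared the same fields as (count, id): count=3, id=42.
  static Deque<Integer> sampleWire() {
    Deque<Integer> wire = new ArrayDeque<>();
    wire.add(3);  // count
    wire.add(42); // id
    return wire;
  }

  public static void main(String[] args) {
    Rec wrong = decodeAssumingOwnOrder(sampleWire());
    Rec right = decodeFollowingWriterOrder(new String[] {"count", "id"}, sampleWire());
    System.out.println("own order:    id=" + wrong.id + " count=" + wrong.count);
    System.out.println("writer order: id=" + right.id + " count=" + right.count);
  }
}
```

Because both fields have the same type here, the first decoder silently swaps the values; with differently typed fields it would more likely fail with an error. In Avro itself, `ResolvingDecoder.readFieldOrder()` reports the order in which fields will be delivered; whether the generated code should consult it is a design question for the pull request.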
   




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436077#comment-16436077
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-380896347
 
 
   If it demonstrates a big performance improvement under Perf.java then I 
think it would be a good change to have & am willing to help get it committed.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429197#comment-16429197
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-379423675
 
 
   BTW, I believe that Thiru presented a bit of this work at Strata last Fall:
   
   
https://conferences.oreilly.com/strata/strata-ny-2017/public/schedule/detail/60729




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429195#comment-16429195
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

rstata commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-379423354
 
 
   This work has gotten a bit old in my head.  I can dust it off and address 
the issues raised in this ticket, but want to make sure there is genuine 
interest before making the effort.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419838#comment-16419838
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178191398
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
 ##
 @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
 writeString(datum, out);
   }
 
+  @Override
+  protected void writeRecord(Schema schema, Object datum, Encoder out)
+throws IOException {
+if (SpecificData.get().useCustomCoders()
 
 Review comment:
   should use this.getSpecificData() instead of the static instance




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419837#comment-16419837
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178190902
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
 ##
 @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) {
   /** Return the singleton instance. */
   public static SpecificData get() { return INSTANCE; }
 
+  private static final boolean USE_CUSTOM_CODERS
+    = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
+  public boolean useCustomCoders() { return USE_CUSTOM_CODERS; }
 
 Review comment:
   useCustomCoders should probably be a settable member variable on each 
SpecificData instance, so folks can more easily switch it on and off without 
restarting the JVM.
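
One way the suggestion could look, as a minimal sketch: `CoderSettings` is a hypothetical stand-in for `SpecificData`, and the setter name `setCustomCoders` is an assumption made for illustration. Only the property name is taken from the diff above.

```java
// Hypothetical stand-in for SpecificData; only the flag handling is shown.
class CoderSettings {
  // Property name from the diff; here it only supplies the initial default.
  static final String PROP = "org.apache.avro.specific.use_custom_coders";

  // Per-instance, and volatile so a change is visible to other threads.
  private volatile boolean useCustomCoders =
      Boolean.parseBoolean(System.getProperty(PROP, "false"));

  boolean useCustomCoders() { return useCustomCoders; }

  // Assumed setter name; lets callers toggle the fast path at run time.
  void setCustomCoders(boolean flag) { this.useCustomCoders = flag; }

  public static void main(String[] args) {
    CoderSettings a = new CoderSettings();
    CoderSettings b = new CoderSettings();
    a.setCustomCoders(true);  // switched on without restarting the JVM
    b.setCustomCoders(false); // a second instance keeps its own setting
    System.out.println(a.useCustomCoders() + " " + b.useCustomCoders());
  }
}
```

Keeping the system property as the default preserves the behaviour of the patch as posted, while each instance can still be switched independently.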




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419839#comment-16419839
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178191480
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
 ##
 @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
 }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+throws IOException {
+SpecificData data = getSpecificData();
+Object r = data.newRecord(old, expected);
+if (SpecificData.get().useCustomCoders()
 
 Review comment:
   should use this.getSpecificData() instead of using the static instance




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419597#comment-16419597
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178152178
 
 

 ##
 File path: 
lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
 ##
 @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
 READER$.read(this, SpecificData.getDecoder(in));
   }
 
+#if ($this.isCustomCodable($schema))
+  @Override public boolean hasCustomCoders() { return true; }
+
+  @Override public void encode(org.apache.avro.io.Encoder out)
+throws java.io.IOException
+  {
+#set ($nv = 0)## Counter to ensure unique var-names
+#set ($maxnv = 0)## Holds high-water mark during recursion
+#foreach ($field in $schema.getFields())
+#set ($n = $this.mangle($field.name(), $schema.isError()))
+#set ($s = $field.schema())
+#encodeVar(0 "this.${n}" $s)
+
+#set ($nv = $maxnv)
+#end
+  }
+
+  @Override public void decode(org.apache.avro.io.Decoder in)
+throws java.io.IOException
+  {
+#set ($nv = 0)## Counter to ensure unique var-names
+#set ($maxnv = 0)## Holds high-water mark during recursion
+#foreach ($field in $schema.getFields())
+#set ($n = $this.mangle($field.name(), $schema.isError()))
+#set ($s = $field.schema())
+#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
+#decodeVar(0 "this.${n}" $s $rs)
+
+#set ($nv = $maxnv)
+#end
+  }
+#end
 }
+
+#macro( encodeVar $indent $var $s )
+#set ($I = $this.indent($indent))
+# Compound types (array, map, and union) require calls
+# that will recurse back into this encodeVar macro:
+#if ($s.Type.Name.equals("array"))
+#encodeArray($indent $var $s)
+#elseif ($s.Type.Name.equals("map"))
+#encodeMap($indent $var $s)
+#elseif ($s.Type.Name.equals("union"))
+#encodeUnion($indent $var $s)
+# Use the generated "encode" method as fast way to write
+# (specific) record types:
+#elseif ($s.Type.Name.equals("record"))
+$I${var}.encode(out);
+# For rest of cases, generate calls out.writeXYZ:
+#elseif ($s.Type.Name.equals("null"))
+$Iout.writeNull();
+#elseif ($s.Type.Name.equals("boolean"))
+$Iout.writeBoolean(${var});
+#elseif ($s.Type.Name.equals("int"))
+$Iout.writeInt(${var});
+#elseif ($s.Type.Name.equals("long"))
+$Iout.writeLong(${var});
+#elseif ($s.Type.Name.equals("float"))
+$Iout.writeFloat(${var});
+#elseif ($s.Type.Name.equals("double"))
+$Iout.writeDouble(${var});
+#elseif ($s.Type.Name.equals("string"))
+#if ($this.isStringable($s))
+$Iout.writeString(${var}.toString());
+#else
+$Iout.writeString(${var});
+#end
+#elseif ($s.Type.Name.equals("bytes"))
+$Iout.writeBytes(${var});
+#elseif ($s.Type.Name.equals("fixed"))
+$Iout.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
+#elseif ($s.Type.Name.equals("enum"))
+$Iout.writeEnum(${var}.ordinal());
+#else
+## TODO -- signal a code-gen-time error
+#end
+#end
+
+#macro( encodeArray $indent $var $s )
+#set ($I = $this.indent($indent))
+#set ($et = $this.javaType($s.ElementType))
+$Ilong size${nv} = ${var}.size();
+$Iout.writeArrayStart();
+$Iout.setItemCount(size${nv});
+$Ilong actualSize${nv} = 0;
+$Ifor ($et e${nv}: ${var}) {
+$I  actualSize${nv}++;
+$I  out.startItem();
+#set ($var = "e${nv}")
+#set ($nv = $nv + 1)
+#set ($maxnv = $nv)
+#set ($indent = $indent + 2)
+#encodeVar($indent $var $s.ElementType)
+#set ($nv = $nv - 1)
+#set ($indent = $indent - 2)
+#set ($I = $this.indent($indent))
+$I}
+$Iout.writeArrayEnd();
+$Iif (actualSize${nv} != size${nv})
+$I  throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
+#end
+
+#macro( encodeMap $indent $var $s )
+#set ($I = $this.indent($indent))
+#set ($kt = $this.getStringType($s))
+#set ($vt = $this.javaType($s.ValueType))
+$Ilong size${nv} = ${var}.size();
+$Iout.writeMapStart();
+$Iout.setItemCount(size${nv});
+$Ilong actualSize${nv} = 0;
+$Ifor (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
+$I  actualSize${nv}++;
+$I  out.startItem();
+#if ($this.isStringable($s))
+$I  out.writeString(e${nv}.getKey().toString());
+#else
+$I  out.writeString(e${nv}.getKey());
+#end
+$I  $vt v${nv} = e${nv}.getValue();
+#set ($var = "v${nv}")
+#set ($nv = $nv + 1)
+#set ($maxnv = $nv)
+#set ($indent = $indent + 2)
+#encodeVar($indent $var $s.ValueType)
+#set ($nv = $nv - 1)
+#set ($indent = $indent - 2)
+#set ($I = $this.indent($indent))
+$I}
+$Iout.writeMapEnd();
+$Iif (actualSize${nv} != size${nv})
+  throw new 

[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419596#comment-16419596
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178148789
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
 ##
 @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
 }
   }
 
+  @Override
+  protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
+throws IOException {
+SpecificData data = getSpecificData();
+Object r = data.newRecord(old, expected);
 
 Review comment:
   'r' should only be created when custom coders are used, no?
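
A sketch of the control flow this comment seems to ask for, using toy stand-ins rather than Avro types. `LazyRecordDemo`, `Rec` and `genericRead` are hypothetical; `genericRead` stands in for the inherited `readRecord` and is assumed to accept an optional record to reuse.

```java
// Toy stand-ins (not Avro types); only the control flow is of interest.
class LazyRecordDemo {
  static int allocations = 0; // counts record objects created

  static class Rec {
    final boolean customCoders;
    String decodedBy = "none";
    Rec(boolean customCoders) { this.customCoders = customCoders; allocations++; }
    boolean hasCustomCoders() { return customCoders; }
    void decode() { decodedBy = "generated"; }
  }

  // Stand-in for the inherited generic path; reuses a record if one is given.
  static Rec genericRead(Rec reuse, boolean recHasCoders) {
    Rec r = (reuse != null) ? reuse : new Rec(recHasCoders);
    r.decodedBy = "generic";
    return r;
  }

  // The record is created here only when the custom-coder flag is on.
  static Rec readRecord(boolean useCustomCoders, boolean recHasCoders) {
    Rec reuse = null;
    if (useCustomCoders) {
      reuse = new Rec(recHasCoders);
      if (reuse.hasCustomCoders()) {
        reuse.decode(); // fast path: generated decoder
        return reuse;
      }
    }
    // Flag off, or record lacks coders: hand over to the generic path.
    return genericRead(reuse, recHasCoders);
  }

  public static void main(String[] args) {
    Rec a = readRecord(false, true);
    Rec b = readRecord(true, true);
    Rec c = readRecord(true, false);
    System.out.println(a.decodedBy + " " + b.decodedBy + " " + c.decodedBy
        + " allocations=" + allocations);
  }
}
```

With the flag off, the override allocates nothing itself and leaves record creation to the generic path, so each call creates exactly one record.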




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419595#comment-16419595
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

cutting commented on a change in pull request #256: AVRO-2090: Improve 
encode/decode time for SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#discussion_r178149568
 
 

 ##
 File path: 
lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
 ##
 @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in)
 new SpecificDatumReader(getSchema())
   .read(this, SpecificData.getDecoder(in));
   }
+
+  /** Returns true iff an instance supports the {@link #encode} and
+* {@link #decode} operations.  Should only be used by
+* SpecificDatumReader/Writer to selectively use
+* {@link #encode} and {@link #decode} to optimize the (de)serialization of
+* values. */
+  public boolean hasCustomCoders() { return false; }
+
 
 Review comment:
   Do these need to be public, or is protected enough?  Also, they need some 
javadoc.




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418582#comment-16418582
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

scottcarey commented on issue #256: AVRO-2090: Improve encode/decode time for 
SpecificRecord using code generation
URL: https://github.com/apache/avro/pull/256#issuecomment-377157550
 
 
   This is a good idea.  What is the performance improvement?  Did you run any 
benchmarks?
   
   I had imagined generating bytecode using ASM, but generating it in the 
generated class would work too.  




[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2017-10-11 Thread Thiruvalluvan M. G. (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200287#comment-16200287
 ] 

Thiruvalluvan M. G. commented on AVRO-2090:
---

+1 for the patch.


[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2017-10-11 Thread Bridger Howell (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200271#comment-16200271
 ] 

Bridger Howell commented on AVRO-2090:
--

This looks like a really good idea. If I have some free time, I'll try to help 
with code review.


[jira] [Commented] (AVRO-2090) Improve encode/decode time for SpecificRecord using code generation

2017-10-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195955#comment-16195955
 ] 

ASF GitHub Bot commented on AVRO-2090:
--

GitHub user rstata opened a pull request:

https://github.com/apache/avro/pull/256

AVRO-2090: Improve encode/decode time for SpecificRecord using code generation

Initial patch for AVRO-2090

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rstata-projects/avro AVRO-2090

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #256


commit d2127c7a4051bf7efa56cb0d7e8d9de6ead31c16
Author: rstata 
Date:   2017-07-13T08:36:39Z

Saving initial work, still have more to do.

commit 474cc97315ecfeb5bc79dc366424c342d57d83e2
Author: rstata 
Date:   2017-07-14T04:22:46Z

Finished initial implementation (not tested).

commit 456d667c58df1493190b99bea40b24408e969679
Author: rstata 
Date:   2017-07-14T05:51:24Z

Poorly done feature flag, and formatting improvements (incl proper 
indentation).

commit b1caba57a90a7fe9a7779c137137315d1d6a99ec
Author: rstata 
Date:   2017-07-16T08:40:03Z

Added Reader/Decoder code

commit 9f8c853f6f43c9ce07eb9ad10da0d6acf9263c5e
Author: rstata 
Date:   2017-09-16T01:33:16Z

Updated output files to reflect new specific-compiler strategy.

commit 83698d9e3ea04e00cd7da87373409b74f77d708a
Author: rstata 
Date:   2017-10-03T21:44:05Z

Reverting changes to SpecificFixed

commit 84e4cbb1ada1ceb97dcec0364a231055cd25142a
Author: rstata 
Date:   2017-10-04T01:54:24Z

Change name of feature from Encodable to CustomCoders

commit e57289bae26683ba4ea3ed30f863be5a79983bc0
Author: rstata 
Date:   2017-10-04T04:49:15Z

Fixed bugs in codegen template

commit f8fae7bc307fae7d51afab7a99025b4213937d40
Author: rstata 
Date:   2017-10-04T05:32:20Z

Added feature flag for custom coders

commit d5b45607ace5fbaf9ee526df2fa285a047365548
Author: rstata 
Date:   2017-10-08T00:18:34Z

Remove stale TODO comment



