[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-02-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028831#comment-17028831
 ] 

ASF subversion and git services commented on AVRO-2247:
---

Commit cbc6e500710864545a1b2f9ffa28edef532e26af in avro's branch 
refs/heads/branch-1.9 from Martin Jubelgas
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=cbc6e50 ]

AVRO-2247 - improved java reading performance with new reader (#391)

* AVRO-2247 - Add FastDatumReaderBuilder and dependencies (rebased)

* Addressed comments to pull request


> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Martin Jubelgas
>Assignee: Martin Jubelgas
>Priority: Major
> Fix For: 1.10.0, 1.9.2
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once per DatumReader. 
> This execution plan contains all required default/lookup values, so they need 
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system-wide default is set via the Java system property 
> "org.apache.avro.fastread" (defaults to "false").
> Attached is a performance comparison of the existing implementation with the 
> proposed one. I will open a pull request with the respective code shortly (not 
> yet including interoperability with the optimizations of AVRO-2090). Please 
> let me know your opinion on whether this is worth pursuing further.
>  
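The core idea in the description is to resolve the writer and reader schemas once, up front, so that decoding each record only replays precomputed steps and never looks up fields or defaults. The sketch below illustrates that idea without depending on Avro. It assumes records are modeled as `Map<String, Object>` and that the encoded input is simulated as a list of values in writer-field order. All names (`ExecutionPlanSketch`, `FieldAction`, `buildPlan`) are hypothetical and are not part of Avro's API or the patch itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExecutionPlanSketch {

  /** One precomputed step of the plan, executed once per record. */
  public interface FieldAction {
    void apply(Iterator<Object> input, Map<String, Object> record);
  }

  /**
   * Builds the plan once per (writer, reader) pair. Schema resolution,
   * including default lookup, happens here and never during reading.
   */
  public static List<FieldAction> buildPlan(List<String> writerFields,
                                            List<String> readerFields,
                                            Map<String, Object> readerDefaults) {
    List<FieldAction> plan = new ArrayList<>();
    for (String name : writerFields) {
      if (readerFields.contains(name)) {
        plan.add((in, rec) -> rec.put(name, in.next())); // copy value into record
      } else {
        plan.add((in, rec) -> in.next()); // writer-only field: consume and drop
      }
    }
    for (String name : readerFields) {
      if (writerFields.contains(name)) {
        continue;
      }
      if (!readerDefaults.containsKey(name)) {
        throw new IllegalArgumentException("no default for reader-only field " + name);
      }
      Object dflt = readerDefaults.get(name); // looked up once, at build time
      plan.add((in, rec) -> rec.put(name, dflt));
    }
    return plan;
  }

  /** Decodes one record by running the precomputed actions in order. */
  public static Map<String, Object> read(List<FieldAction> plan, List<Object> encoded) {
    Map<String, Object> record = new LinkedHashMap<>();
    Iterator<Object> in = encoded.iterator();
    for (FieldAction action : plan) {
      action.apply(in, record);
    }
    return record;
  }

  public static void main(String[] args) {
    List<FieldAction> plan = buildPlan(
        List.of("id", "legacy", "name"),
        List.of("id", "name", "country"),
        Map.of("country", "unknown"));
    // The same plan is reused for every record read with this schema pair.
    System.out.println(read(plan, List.of(1, "old", "Ada")));
  }
}
```

Running `main` prints `{id=1, name=Ada, country=unknown}`. A missing default is reported when the plan is built rather than on the first record, which is one practical side effect of moving resolution out of the read loop.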



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-02-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028830#comment-17028830
 ] 

ASF subversion and git services commented on AVRO-2247:
---

Commit cbc6e500710864545a1b2f9ffa28edef532e26af in avro's branch 
refs/heads/branch-1.9 from Martin Jubelgas
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=cbc6e50 ]

AVRO-2247 - improved java reading performance with new reader (#391)

* AVRO-2247 - Add FastDatumReaderBuilder and dependencies (rebased)

* Addressed comments to pull request




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-02-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028785#comment-17028785
 ] 

Hudson commented on AVRO-2247:
--

SUCCESS: Integrated in Jenkins build AvroJava #813 (See 
[https://builds.apache.org/job/AvroJava/813/])
AVRO-2247 - improved java reading performance with new reader (#391) (github: 
[https://github.com/apache/avro/commit/3ad0106f5fa15fbe718727016c600d14cd23294c])
* (edit) lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java
* (add) lang/java/avro/src/main/java/org/apache/avro/io/ReflectionUtils.java
* (edit) 
lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
* (edit) lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
* (edit) 
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java
* (edit) lang/java/avro/pom.xml
* (add) lang/java/avro/src/main/java/org/apache/avro/io/FastReaderBuilder.java
* (edit) lang/java/avro/src/main/java/org/apache/avro/Resolver.java
* (edit) lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-02-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028769#comment-17028769
 ] 

ASF subversion and git services commented on AVRO-2247:
---

Commit 3ad0106f5fa15fbe718727016c600d14cd23294c in avro's branch 
refs/heads/master from Martin Jubelgas
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=3ad0106 ]

AVRO-2247 - improved java reading performance with new reader (#391)

* AVRO-2247 - Add FastDatumReaderBuilder and dependencies (rebased)

* Addressed comments to pull request


> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Martin Jubelgas
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once per DatumReader. 
> This execution plan contains all required default/lookup values, so they need 
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system-wide default is set via the Java system property 
> "org.apache.avro.fastread" (defaults to "false").
> Attached is a performance comparison of the existing implementation with the 
> proposed one. I will open a pull request with the respective code shortly (not 
> yet including interoperability with the optimizations of AVRO-2090). Please 
> let me know your opinion on whether this is worth pursuing further.
>  
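The description says the new reader is switched on or off per GenericData instance, with a JVM-wide default taken from `org.apache.avro.fastread` (default `false`). The sketch below shows that configuration pattern in isolation. `FastReadToggleSketch` and `ReaderConfig` are hypothetical stand-ins, not Avro classes; in the commit the switch lives on `GenericData` itself, whose exact method names are not shown here.

```java
public class FastReadToggleSketch {

  /** Property name taken from the issue description. */
  public static final String FAST_READ_PROP = "org.apache.avro.fastread";

  /** Hypothetical stand-in for a per-instance data model such as GenericData. */
  public static class ReaderConfig {
    // JVM-wide default, read when the instance is created: true only if the
    // property equals "true" (case-insensitive); unset or anything else is false.
    private boolean fastReadEnabled =
        Boolean.parseBoolean(System.getProperty(FAST_READ_PROP, "false"));

    public boolean isFastReadEnabled() {
      return fastReadEnabled;
    }

    /** Per-instance override; other instances keep their own setting. */
    public ReaderConfig setFastReadEnabled(boolean enabled) {
      this.fastReadEnabled = enabled;
      return this;
    }
  }

  public static void main(String[] args) {
    System.clearProperty(FAST_READ_PROP);
    ReaderConfig a = new ReaderConfig();
    ReaderConfig b = new ReaderConfig().setFastReadEnabled(true);
    System.out.println(a.isFastReadEnabled() + " " + b.isFastReadEnabled()); // false true

    System.setProperty(FAST_READ_PROP, "true"); // e.g. -Dorg.apache.avro.fastread=true
    System.out.println(new ReaderConfig().isFastReadEnabled()); // true
  }
}
```

Because the property is read at construction time in this sketch, changing it later only affects instances created afterwards; an explicit per-instance setting always wins.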





[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-02-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028770#comment-17028770
 ] 

ASF subversion and git services commented on AVRO-2247:
---

Commit 3ad0106f5fa15fbe718727016c600d14cd23294c in avro's branch 
refs/heads/master from Martin Jubelgas
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=3ad0106 ]

AVRO-2247 - improved java reading performance with new reader (#391)

* AVRO-2247 - Add FastDatumReaderBuilder and dependencies (rebased)

* Addressed comments to pull request




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-01-31 Thread Raymie Stata (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027873#comment-17027873
 ] 

Raymie Stata commented on AVRO-2247:


I'm in favor.



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2020-01-29 Thread Martin Jubelgas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025924#comment-17025924
 ] 

Martin Jubelgas commented on AVRO-2247:
---

[~rskraba] - *peers* Any news/input?



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2019-12-18 Thread Martin Jubelgas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999568#comment-16999568
 ] 

Martin Jubelgas commented on AVRO-2247:
---

No worries. As long as I have the impression that it's not all in vain, I won't 
drop this. I just don't want it to be dropped because it got overlooked. 
If there is interest, I'd even go the extra mile to backport it and everything, to 
be able to get more feedback on the feature.
There might still be things to improve there, and I'm willing to put (if 
need be, lots of) more work into it.



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2019-12-18 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999377#comment-16999377
 ] 

Ryan Skraba commented on AVRO-2247:
---

Please don't drop this. I'm enthusiastic about the technique, and the results 
look really impressive!  It's been on my "must-dive-in" list since I heard of 
it.  If nobody else is actively reviewing it, I will *definitely* take the 
review on.  (My apologies in advance, however -- the end of the year will be 
risky!)



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2019-12-18 Thread Martin Jubelgas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999354#comment-16999354
 ] 

Martin Jubelgas commented on AVRO-2247:
---

It's been quite a while, so I'd like to once more ask for feedback on 
https://github.com/apache/avro/pull/391

The proposed change improves reading performance in existing benchmark tests by 
up to 60% (more realistically maybe 30%) without breaking compatibility, so I 
personally think it would be a shame to let the chance of such a performance 
increase go to waste.

So... could anyone please let me know what needs to be addressed in order to get 
this merged, or just tell me to drop the issue.




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2019-04-24 Thread Martin Jubelgas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825498#comment-16825498
 ] 

Martin Jubelgas commented on AVRO-2247:
---

Again rebased the PR [https://github.com/apache/avro/pull/391] onto the master 
branch, using the now merged work of AVRO-2275.

CI is still acting up: it fails while building the test image, not while executing 
tests.

> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Martin Jubelgas
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once per DatumReader. 
> This execution plan contains all required default/lookup values, so they need 
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system-wide default is set via the Java system property 
> "org.apache.avro.fastread" (defaults to "false").
> Attached is a performance comparison of the existing implementation with the 
> proposed one. I will open a pull request with the respective code shortly (not 
> yet including interoperability with the optimizations of AVRO-2090). Please 
> let me know your opinion on whether this is worth pursuing further.
>  
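The description expects the largest gains for nested objects with defaults. One plausible contributor, sketched here as an assumption rather than a description of the committed code: a reader that rebuilds a nested default from its schema-level form for every record repeats that work each time, whereas a precomputed plan can build the default once and hand each record its own copy. `NestedDefaultSketch`, `parseCount`, and the map-based records are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedDefaultSketch {

  /** Counts how often the (simulated) expensive default materialization runs. */
  public static int parseCount = 0;

  /** Stand-in for building a nested default record from its schema-level (JSON) form. */
  public static Map<String, Object> materializeDefault() {
    parseCount++;
    Map<String, Object> address = new HashMap<>();
    address.put("city", "n/a");
    address.put("zip", "00000");
    return address;
  }

  /** Per-record approach: rebuild the nested default for every record read. */
  public static Map<String, Object> readNaive() {
    Map<String, Object> rec = new HashMap<>();
    rec.put("address", materializeDefault());
    return rec;
  }

  /** Planned approach: build the default once, then copy it per record. */
  public static final class Plan {
    private final Map<String, Object> addressDefault = materializeDefault();

    public Map<String, Object> read() {
      Map<String, Object> rec = new HashMap<>();
      // A one-level copy suffices here because the nested values are immutable
      // Strings; deeper defaults would need a deep copy. Copying keeps records
      // from sharing (and mutating) the same default instance.
      rec.put("address", new HashMap<>(addressDefault));
      return rec;
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1_000; i++) {
      readNaive();
    }
    System.out.println("naive: " + parseCount);   // naive: 1000

    parseCount = 0;
    Plan plan = new Plan();
    for (int i = 0; i < 1_000; i++) {
      plan.read();
    }
    System.out.println("planned: " + parseCount); // planned: 1
  }
}
```

The copy is still a per-record cost; the sketch only shows that the expensive materialization step moves out of the read loop.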



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2019-04-04 Thread Martin Jubelgas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809919#comment-16809919
 ] 

Martin Jubelgas commented on AVRO-2247:
---

Rebased the branch for the pull request ( 
[https://github.com/apache/avro/pull/391] ) and fixed formatting, including 
only those parts of Raymie's work on AVRO-2275 that I had already included in 
my work.

I ran across a funny effect with CI: a test failed in an unexpected place on 
the first attempt, but after a re-push, CI went through smoothly. Something's 
still amiss there and should be addressed.



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704451#comment-16704451
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #391: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-443139676
 
 
   @unchuckable -- send an email to "rstata - at - yahoo - . - com" to better 
coordinate.  Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703867#comment-16703867
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #391: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-443005833
 
 
   On the one hand, the performance results I posted a few days ago certainly 
demonstrate that there are some performance improvements to be had for 
GenericDatumReader.
   
   On the other hand, this change introduces 2800 lines of new code that look 
like they'd be tedious to maintain.  Also, the comparison here isn't apples to 
apples, because the old code is more aggressive about reusing objects, and it 
attempts to apply conversions, which is pure overhead for the performance tests 
we're using but isn't in other cases.  Finally, looking more closely at 
GenericDatumReader, it has built into it a BUNCH of "customization" points -- 
methods and objects that can be replaced to customize the reading process, all 
of which add overhead in the inner-most loop.  It's not clear how much 
of the performance gains come from the pre-computation of actions versus simply 
getting rid of all these customization points.
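The object-reuse point matters for benchmarks: Avro's `DatumReader.read(D reuse, Decoder in)` accepts a previously returned object to refill, so a reader that honours `reuse` and one that always allocates are doing different amounts of work. The hypothetical sketch below (`ReuseSketch`, `Rec`, and the simulated inputs are not Avro code) counts allocations for both paths.

```java
public class ReuseSketch {

  /** Minimal mutable record with two fields. */
  public static final class Rec {
    public long id;
    public String name;
  }

  public static int allocations = 0;

  /**
   * Reads one record from simulated input values. If 'reuse' is non-null its
   * fields are overwritten in place; otherwise a fresh instance is allocated.
   */
  public static Rec read(Rec reuse, long id, String name) {
    Rec r = reuse;
    if (r == null) {
      r = new Rec();
      allocations++;
    }
    r.id = id;
    r.name = name;
    return r;
  }

  public static void main(String[] args) {
    Rec current = null;
    for (int i = 0; i < 1_000; i++) {
      current = read(current, i, "n" + i); // reuse path: one allocation in total
    }
    int reused = allocations;

    allocations = 0;
    for (int i = 0; i < 1_000; i++) {
      read(null, i, "n" + i);              // no reuse: one allocation per record
    }
    System.out.println(reused + " vs " + allocations); // 1 vs 1000
  }
}
```

A fair comparison would run both readers with the same reuse policy, so that allocation and GC cost do not get attributed to the decoding strategy.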
   
   I'm tempted to extend the AVRO-2275 work so that the Action-tree generated 
by Resolver is a complete mirror of the reader's schema (right now, it stops at 
DoNothing nodes, which for Unions in particular could be pretty high-up in the 
schema's tree).  Then one could write a FastGenericDatumReader class that 
simply walks that tree to decode the object.  I suspect the resulting code 
would be on the order of 100 lines and would capture almost all the speed found 
in this fast-avro patch.  (And one could decorate the Action objects with any 
Conversions for LogicalTypes found in the reader's schema, making it quick and 
easy to apply conversions while doing the walk.)
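Raymie's alternative is to have the resolver emit a tree that fully mirrors the reader's schema, so the reader becomes a short recursive walk, with LogicalType conversions attached to the nodes. The sketch below illustrates that shape under stated assumptions: the `Node` interface, its factory methods, and the `Iterator<Object>` standing in for an Avro `Decoder` are all hypothetical, and are not Avro's `Resolver.Action` classes.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ActionTreeWalkSketch {

  /** A node of a (hypothetical) tree that mirrors the reader's schema. */
  public interface Node {
    Object read(Iterator<Object> in);
  }

  /** Primitive value, with an optional conversion attached when the tree is built. */
  public static Node primitive(Function<Object, Object> conversion) {
    return in -> conversion == null ? in.next() : conversion.apply(in.next());
  }

  /** Writer-only value: consumed and discarded. */
  public static final Node SKIP = in -> {
    in.next();
    return null;
  };

  /** Reader-only field: a default resolved when the tree was built. */
  public static Node constant(Object value) {
    return in -> value;
  }

  /** Record node: children run in order; a null name marks a skipped field. */
  public static Node record(List<String> names, List<Node> children) {
    return in -> {
      Map<String, Object> rec = new LinkedHashMap<>();
      for (int i = 0; i < children.size(); i++) {
        Object value = children.get(i).read(in);
        if (names.get(i) != null) {
          rec.put(names.get(i), value);
        }
      }
      return rec;
    };
  }

  public static void main(String[] args) {
    Node address = record(List.of("city"), List.of(primitive(null)));
    Node root = record(
        Arrays.asList("id", null, "createdAt", "address", "country"),
        List.of(
            primitive(null),
            SKIP,
            primitive(v -> Instant.ofEpochMilli((Long) v)), // e.g. timestamp-millis
            address,
            constant("unknown")));
    // The whole "reader" is a single recursive walk over the tree.
    System.out.println(root.read(List.<Object>of(7L, "legacy", 0L, "Paris").iterator()));
  }
}
```

This prints `{id=7, createdAt=1970-01-01T00:00:00Z, address={city=Paris}, country=unknown}`. A union node would read the branch index and dispatch to one child; it is left out to keep the sketch short.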




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703673#comment-16703673
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #391: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-442957245
 
 
   I am currently refactoring the code to use the refactored `Resolver` from #395. 
I will post updates soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702319#comment-16702319
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #391: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-442576105
 
 
   I agree that JMH will still be hard-pressed to make before/after comparisons
unless the change can be toggled with a feature switch at runtime (which,
fortunately, is the case with the proposed change).




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701425#comment-16701425
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #391: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-442330623
 
 
   I've run your code against `Perf.java` and uploaded the
[results here](https://github.com/apache/avro/files/2623075/AVRO-2247-Perf-results-11-27.pdf).
This report contains two sets of results:
   
   * The "avro-2247 (calibration)" column presents the results of running the
2247 branch against itself three different times. These results are useful for
understanding where the Perf.java benchmark tends to have a lot of internal
variability. As an example, the BooleanRead/Write benchmark shows a lot of natural
variability, which is something I've noticed in a lot of my previous performance
testing.
   
   * The "avro-2274 (w/ custom coders) vs" column presents the results of
running three different treatments against my avro-2274 branch. The three
sub-columns are as follows: "master" is the Apache Avro master branch
(just prior to avro-2274 being merged into it); "2247 (off)" is the 2247
branch with the fast reader turned off; "2247 (on)" is the 2247 branch with
the fast reader turned on.
   
   The last sub-column of the "avro-2274 (...) vs" results is the most relevant.
What we see there is a large number of record-related cases showing speedups of
20-30% and even more. This is very promising.
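The calibration column supports a simple rule of thumb: a before/after difference only means something if it is larger than the spread observed when the same code is run against itself. A minimal sketch of that check; the class, the (max - min) / mean spread measure, and the example timings are assumptions for illustration, not taken from the attached report.

```java
/** Hypothetical helpers for judging a before/after change against calibration noise. */
final class BenchmarkNoise {
  /** Spread of repeated runs of the same code, as (max - min) / mean, in percent. */
  static double relativeSpreadPercent(double[] calibrationRuns) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0;
    for (double t : calibrationRuns) {
      min = Math.min(min, t);
      max = Math.max(max, t);
      sum += t;
    }
    double mean = sum / calibrationRuns.length;
    return 100.0 * (max - min) / mean;
  }

  /** Reduction in run time of the treatment relative to the baseline, in percent. */
  static double timeReductionPercent(double baselineTime, double treatmentTime) {
    return 100.0 * (baselineTime - treatmentTime) / baselineTime;
  }

  /** True if the change (faster or slower) is larger than the calibration spread. */
  static boolean exceedsNoise(double[] calibrationRuns, double baselineTime, double treatmentTime) {
    return Math.abs(timeReductionPercent(baselineTime, treatmentTime))
        > relativeSpreadPercent(calibrationRuns);
  }
}
```

For instance, if calibration runs of 98, 100 and 102 ms (a 4% spread) are followed by a drop from 100 ms to 75 ms, the 25% reduction clearly clears the bar, whereas a 5% change on a benchmark whose calibration spread is 20% does not. These numbers are made up for illustration.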
   
   I am currently running the JMH-based benchmarks.  These do _not_ have an 
(obvious) mechanism for comparing the "before/after" performance of your 
proposed changes, but I will be interested in seeing if they do better in 
reducing the variance between runs.
   
   I haven't inspected your code yet.  I'll do that as well, and offer some 
opinions.
   




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700045#comment-16700045
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #391: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-441965011
 
 
   Hi, @rstata.
   
   First of all, thanks for looking into it. It means a lot. I'm sorry about
the license files; I totally forgot about them this time.
   
   I pulled your change from your repo and pushed it into mine. No clue what's
up with GitHub and the pull request there; if anybody has a pointer on what I
would need to set in my repo, any advice is welcome.
   
   Invoking the benchmark:
   `cd lang/java/benchmark`
   `mvn clean package`
   `java -jar target/benchmarks.jar` (not the `benchmark-1.9.0-SNAPSHOT`)
   
   By default, it uses 5 warmup iterations and 5 measurement iterations of
10 seconds each, and repeats all of that in 5 forks, which adds up to almost 3
hours. It can easily be reduced to more reasonable limits (about 20 minutes),
for example:
   `java -jar target/benchmarks.jar -wi 3 -i 3 -f 1` (3 iterations each for warmup
and measurement, and only 1 fork)
   Adding `-e Building` excludes the benchmarks that build the DatumReaders,
which currently cuts the total evaluation time in half.
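The run-time figures above follow from multiplying out the JMH settings. A back-of-the-envelope sketch; it ignores JVM start-up and per-fork setup, and the count of 20 benchmarks used below is an assumption inferred from the quoted totals, not a number given in the comment.

```java
import java.time.Duration;

/** Back-of-the-envelope JMH run time; ignores JVM start-up and per-fork setup costs. */
final class JmhRunEstimate {
  static Duration estimate(int benchmarks, int warmupIterations, int measurementIterations,
                           int secondsPerIteration, int forks) {
    // Each fork runs all warmup and measurement iterations of one benchmark.
    long perBenchmarkSeconds =
        (long) (warmupIterations + measurementIterations) * secondsPerIteration * forks;
    return Duration.ofSeconds(perBenchmarkSeconds * benchmarks);
  }
}
```

With 20 benchmarks, the defaults give 20 × (5 + 5) × 10 s × 5 = 10,000 s (about 2.8 hours), while `-wi 3 -i 3 -f 1` gives 20 × (3 + 3) × 10 s × 1 = 1,200 s (20 minutes), which matches the estimates in the comment.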
   
   The current benchmark classes cover only a small subset of the cases in
Perf.java (but try to replicate them as closely as possible). I will gladly add
more if it helps the project; it might make sense to move that to a different
ticket, though.
   




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699651#comment-16699651
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #391: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-441827690
 
 
   I will play with it.
   
   To get it to build, I added licenses to all the files:
   
   
[https://github.com/rstata-projects/avro/tree/unchuckable-fast-avro](https://github.com/rstata-projects/avro/tree/unchuckable-fast-avro)
   
   For some reason I can't issue a pull request to your fork; can you pull this
change from my repo?
   
   Also, how do you invoke the benchmark?




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698788#comment-16698788
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable opened a new pull request #391: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/391
 
 
   I cannot reopen the original PR (#354), since I've rebased onto the current master.
   
   I've tried to address the points that @rstata brought up about my approach.
The feature switch between the traditional and the newly proposed reader mechanism
is now done inside `GenericDatumReader`. All tests provided with the Avro project
run smoothly (I stole @rstata's idea to run the tests an additional time
with the feature switch enabled). I also fixed defaulting so that it takes
advantage of immutable values and only re-reads default objects with a separate
decoder when actually required.
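One way to implement "take advantage of immutable values and only re-read default objects when required" is sketched below: an immutable default (a boxed number or a `String`) is decoded once and shared, while a mutable default is kept in encoded form and decoded into a fresh object for each record. This is a hypothetical JDK-only illustration; `DefaultValueSupplier` and its length-prefixed int encoding are not the PR's code or Avro's binary format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical per-field default suppliers, built once when the reader is created. */
final class DefaultValueSupplier {

  /** Immutable default (e.g. Integer, String): decoded once, the same instance is shared. */
  static <T> Supplier<T> shared(T immutableValue) {
    return () -> immutableValue;
  }

  /** Mutable default (here a list of ints): kept encoded, decoded into a fresh list per call. */
  static Supplier<List<Integer>> freshList(List<Integer> defaultValue) {
    byte[] encoded = encode(defaultValue);
    return () -> decode(encoded);
  }

  // Length-prefixed big-endian ints; a stand-in for a real binary encoding.
  private static byte[] encode(List<Integer> values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(values.size());
      for (int v : values) {
        out.writeInt(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  private static List<Integer> decode(byte[] encoded) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      int size = in.readInt();
      List<Integer> result = new ArrayList<>(size);
      for (int i = 0; i < size; i++) {
        result.add(in.readInt());
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Strings, boxed numbers and booleans can be shared safely; lists, maps, byte buffers and records need the fresh-copy path, because a caller may modify the object it receives. Decoding from bytes is one option; a deep copy, such as `GenericData.deepCopy` performs for Avro data, is another.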
   
   If there are any more things that need testing, please do give me a
pointer.
   
   Overall, the newly proposed reader spends more time building a `DatumReader`,
which allows it to perform the actual reading at a much higher rate. For any
application that is even remotely "big data", that trade-off should turn out to be
highly beneficial.
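The trade-off can be quantified as a break-even record count: the fast reader pays off once its per-record savings cover its extra construction cost. A sketch with hypothetical timings; the thread reports no build-time numbers, and `BuildCostBreakEven` is an illustrative name only.

```java
/** Hypothetical break-even calculation for a reader that is slower to build but faster to use. */
final class BuildCostBreakEven {
  /**
   * Smallest record count n with
   * fastBuild + n * fastPerRecord <= legacyBuild + n * legacyPerRecord,
   * or Long.MAX_VALUE if the fast reader never catches up.
   */
  static long breakEvenRecords(double fastBuildNanos, double fastPerRecordNanos,
                               double legacyBuildNanos, double legacyPerRecordNanos) {
    double extraBuildCost = fastBuildNanos - legacyBuildNanos;
    double savingPerRecord = legacyPerRecordNanos - fastPerRecordNanos;
    if (extraBuildCost <= 0) {
      return 0; // n = 0 already satisfies the inequality
    }
    if (savingPerRecord <= 0) {
      return Long.MAX_VALUE; // the extra build cost is never recovered
    }
    return (long) Math.ceil(extraBuildCost / savingPerRecord);
  }
}
```

For example, a reader that costs 0.4 ms more to build (500 µs versus 100 µs) but saves 200 ns per record (800 ns versus 1,000 ns) breaks even after 400,000 / 200 = 2,000 records; past that point every record is a net gain, which is why the approach favors building a reader once and reusing it across many records.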
   
   I also included a small module (`benchmark`) that uses JMH to test the
performance of the proposed reader approach against the current generic reader.
Using JMH should be preferable to Perf.java, since it allows benchmarks to be run
in a controlled and statistically significant way.
   
   As stated in the last PR, I'm open to any changes, fire ahead. It's the
overall concept and its apparent reader performance gains that I'm chasing
after, not having my implementation find its way into the main branch 1:1.




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682275#comment-16682275
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-437568430
 
 
   Note: I'd still be grateful for feedback on the concept of the readers as I 
tried to implement them (i.e. unifying `DatumReader`, `ResolvingDecoder` and 
`Parser` into one functional structure for faster evaluation)
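A rough illustration of what "one functional structure" can mean: the schema is walked once to build a tree of small reader functions, so the per-record path only invokes those functions and never consults a grammar or the schema. Everything here is a hypothetical simplification: `ValueReader`, `FieldType` and `PlanBuilder` are illustrative names, and `DataInputStream`'s encoding stands in for Avro's decoder and binary format.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads one value from the input; built once per schema, invoked once per datum. */
@FunctionalInterface
interface ValueReader {
  Object read(DataInputStream in) throws IOException;
}

/** Hypothetical mini-schema: a record is an ordered map of field name to primitive type. */
enum FieldType { INT, LONG, BOOLEAN, STRING }

final class PlanBuilder {
  /** Resolved once per field: pick the decoding function for its type. */
  static ValueReader forType(FieldType type) {
    switch (type) {
      case INT: return in -> in.readInt();
      case LONG: return in -> in.readLong();
      case BOOLEAN: return in -> in.readBoolean();
      case STRING: return in -> in.readUTF();
      default: throw new IllegalArgumentException("unsupported type: " + type);
    }
  }

  /** Resolved once per record schema: compose the field readers into one record reader. */
  static ValueReader forRecord(Map<String, FieldType> fields) {
    String[] names = new String[fields.size()];
    ValueReader[] readers = new ValueReader[fields.size()];
    int i = 0;
    for (Map.Entry<String, FieldType> field : fields.entrySet()) {
      names[i] = field.getKey();
      readers[i] = forType(field.getValue());
      i++;
    }
    // Per-record path: no schema inspection, only calls to the precomputed functions.
    return in -> {
      Map<String, Object> record = new LinkedHashMap<>();
      for (int f = 0; f < readers.length; f++) {
        record.put(names[f], readers[f].read(in));
      }
      return record;
    };
  }
}
```

Extending the sketch to nested records would mean letting a field's reader be another composed record reader, so that the function tree mirrors the shape of the schema.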




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682263#comment-16682263
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-437565988
 
 
   Okay, I just ran across a major showstopper with this approach when it comes
to using default values (and subsequent modifications of the latter).
   
   Also, the discussion above and the subsequent study of some of the code
Raymie pointed me to have helped me understand some concepts that I somehow
couldn't wrap my head around before.
   I'll try to evaluate how I can fix the problem, or how I might be able to
incorporate some ideas into the existing code in a less intrusive way ;)
   
   Thanks to everyone who took the time to look over my submission. I've
learned plenty from it already.
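The problem described here is an aliasing hazard: if a plan built once holds a single default object and hands that same instance to every record, a caller that modifies the default in one record silently changes it for every later record. A minimal, hypothetical JDK-only illustration (`DefaultAliasing` and `Row` are not Avro classes):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader plan holding the default for a list-valued field. */
final class DefaultAliasing {
  /** Hypothetical record with one list field that is missing from the writer's data. */
  static final class Row {
    final List<String> tags;

    Row(List<String> tags) {
      this.tags = tags;
    }
  }

  // Computed once, at plan-building time.
  private final List<String> defaultTags;

  DefaultAliasing(List<String> defaultTags) {
    this.defaultTags = new ArrayList<>(defaultTags);
  }

  /** Buggy: every row receives the same mutable list instance. */
  Row readSharingDefault() {
    return new Row(defaultTags);
  }

  /** Safe: every row receives its own copy (shallow, which suffices for a list of strings). */
  Row readCopyingDefault() {
    return new Row(new ArrayList<>(defaultTags));
  }
}
```

Copying costs an allocation per record, which is why sharing is attractive for immutable defaults; nested mutable defaults (a list of records, say) need a deep copy rather than the shallow one shown.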




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682264#comment-16682264
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable closed pull request #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java b/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
index ba538d2fb..093d01154 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
@@ -49,6 +49,7 @@
 import org.apache.avro.io.BinaryEncoder;
 import org.apache.avro.io.DecoderFactory;
 import org.apache.avro.io.EncoderFactory;
+import org.apache.avro.io.fastreader.FastReader;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.io.DatumWriter;
 import org.apache.avro.util.Utf8;
@@ -70,6 +71,9 @@
   public static final String STRING_PROP = "avro.java.string";
   protected static final String STRING_TYPE_STRING = "String";
 
+  private boolean fastReaderEnabled = Boolean.parseBoolean( System.getProperty("org.apache.avro.fastread", "false" ) );
+  private ThreadLocal<FastReader> fastReader = ThreadLocal.withInitial( ()->new FastReader( this ) );
+
   private final ClassLoader classLoader;
 
   /** Set the Java type to be used when reading this schema.  Meaningful only
@@ -99,6 +103,18 @@ public GenericData(ClassLoader classLoader) {
   /** Return the class loader that's used (by subclasses). */
   public ClassLoader getClassLoader() { return classLoader; }
 
+  public void setFastReaderEnabled( boolean enabled ) {
+this.fastReaderEnabled = enabled;
+  }
+
+  public boolean isFastReaderEnabled() {
+return fastReaderEnabled;
+  }
+
+  public FastReader getFastReader() {
+return this.fastReader.get();
+  }
+
   private Map> conversions =
   new HashMap<>();
 
@@ -420,12 +436,12 @@ public int compareTo(GenericEnumSymbol that) {
 
   /** Returns a {@link DatumReader} for this kind of data. */
   public DatumReader createDatumReader(Schema schema) {
-return new GenericDatumReader(schema, schema, this);
+return createDatumReader( schema, schema );
   }
 
   /** Returns a {@link DatumReader} for this kind of data. */
   public DatumReader createDatumReader(Schema writer, Schema reader) {
-return new GenericDatumReader(writer, reader, this);
+  return new GenericDatumReader( writer, reader, this );
   }
 
   /** Returns a {@link DatumWriter} for this kind of data. */
@@ -1097,7 +1113,7 @@ private Object deepCopyRaw(Schema schema, Object value) {
 Map mapCopy =
   new HashMap<>(mapValue.size());
 for (Map.Entry entry : mapValue.entrySet()) {
-  mapCopy.put((CharSequence)(deepCopy(STRINGS, entry.getKey())),
+  mapCopy.put((deepCopy(STRINGS, entry.getKey())),
   deepCopy(schema.getValueType(), entry.getValue()));
 }
 return mapCopy;
diff --git a/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java
index 9b7b04cd9..0a513e411 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java
@@ -18,14 +18,13 @@
 package org.apache.avro.generic;
 
 import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.nio.ByteBuffer;
+import java.util.Collection;
 import java.util.HashMap;
 import java.util.IdentityHashMap;
 import java.util.Map;
-import java.util.Collection;
-import java.nio.ByteBuffer;
-import java.lang.reflect.Constructor;
-import java.lang.reflect.InvocationTargetException;
-
 import org.apache.avro.AvroRuntimeException;
 import org.apache.avro.Conversion;
 import org.apache.avro.Conversions;
@@ -36,6 +35,7 @@
 import org.apache.avro.io.Decoder;
 import org.apache.avro.io.DecoderFactory;
 import org.apache.avro.io.ResolvingDecoder;
+import org.apache.avro.io.fastreader.FastReader;
 import org.apache.avro.util.Utf8;
 import org.apache.avro.util.WeakIdentityHashMap;
 
@@ -45,6 +45,9 @@
   private Schema actual;
   private Schema expected;
 
+  private DatumReader fastDatumReader = null;
+  private FastReader creatorFastReader = null;
+
   private ResolvingDecoder creatorResolver = null;
   private final Thread creator;
 
@@ -86,6 +89,7 @@ public void setSchema(Schema writer) {
   expected = actual;
 }
 creatorResolver = 

[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682093#comment-16682093
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable edited a comment on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-437531886
 
 
   Thanks for your feedback, Raymie. I hadn't expected to receive that much
input, but I'll try to address your points:
   
   * I moved the feature switch from `GenericData.createDatumReader` to
`GenericDatumReader.read` (even though I don't quite get why the factory method
isn't the default way to instantiate a new reader; can someone help me
understand?). This way, the newly written code is subject to ALL unit tests of
the project. (Admittedly, I had to disable the `ReusingArrayReader` due to some
problems there; I will address that later on if desired.) With small adjustments
(mostly fixing exception types and messages), all tests seem to pass. If there
are further test frameworks to use, please do let me know.
   
   * I am well aware of the deviations in performance measurements. That's why
everyone should take `Perf.java` results with a grain of salt. Most tests there
are VERY sensitive to even small changes and have too large a spread. Actually,
for performance measurement, a module that makes use of JMH or similar would be
preferable (ideally one that measures not only time, but also allocation
activity). Also, structures of different depth and diversity should be checked.
   
   * It's nice to see that you have pursued similar ideas in your branch. I
think the main difference is that I am trying to do all verifications ONCE at
reader creation, while your take still makes use of the existing `DatumReader`,
which requires some avoidable lookups at read time (things like
`SpecificData.getClass()`, etc.), and of `ResolvingDecoder`, which still relies on
parser information for nearly every read operation. My approach **replaces**
the usage of `ResolvingDecoder` and `DatumReader` with a different approach that
makes all necessary decisions only once where possible, storing the results in
instance variables instead of maps (hoping to improve performance that way, see
note above). The downside of my approach in its current form is that it only
works for generic and specific records. I have not looked into what needs to be
changed in order to support the other kinds of data (reflective, protobuf,
thrift), but I consider generic and specific records to be the most important
use case.
   
   * Sadly, the mechanism I present does not take advantage of the generated
reader code of AVRO-2090, but it offers performance benefits to a similar extent
(depending on the actual data structures) and works for all kinds of generic and
specific records.
   
   * The big benefit of the current approach is the strong speedup when dealing
with default values. Maybe a large part of the gain could be achieved with a
smaller change; I do agree.
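The contrast drawn in the third point, resolving lookups once and keeping the results in plain fields rather than consulting maps on every read, can be sketched as a writer-to-reader field mapping computed once per schema pair. `FieldResolution` and its name-only matching are hypothetical simplifications; real Avro schema resolution also handles aliases, type promotion, and nested types.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical writer-to-reader field mapping, resolved once per schema pair. */
final class FieldResolution {
  /** readerPosition[i] is the reader index of writer field i, or -1 if the reader lacks it. */
  final int[] readerPosition;
  /** defaultNeeded[j] is true if reader field j does not exist in the writer schema. */
  final boolean[] defaultNeeded;

  FieldResolution(List<String> writerFields, List<String> readerFields) {
    readerPosition = new int[writerFields.size()];
    defaultNeeded = new boolean[readerFields.size()];
    Arrays.fill(defaultNeeded, true);
    for (int i = 0; i < writerFields.size(); i++) {
      // The name lookup happens here, once, instead of once per record.
      int j = readerFields.indexOf(writerFields.get(i));
      readerPosition[i] = j;
      if (j >= 0) {
        defaultNeeded[j] = false;
      }
    }
  }

  /** Per-record path: only array reads, no name or map lookups. */
  Object[] apply(Object[] writerValues, Object[] readerDefaults) {
    Object[] result = new Object[defaultNeeded.length];
    for (int i = 0; i < writerValues.length; i++) {
      int j = readerPosition[i];
      if (j >= 0) {
        result[j] = writerValues[i];
      }
    }
    for (int j = 0; j < result.length; j++) {
      if (defaultNeeded[j]) {
        result[j] = readerDefaults[j];
      }
    }
    return result;
  }
}
```

The sketch assumes the writer's values are already decoded; in a streaming decoder, a writer-only field (position `-1`) must still be read or skipped to advance past it in the input.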
   
   I would be very grateful for some feedback on whether you consider the 
current approach I present worth spending more time on or whether there are 
more/other things that would keep it from being considered beneficial for the 
project.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Martin Jubelgas
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once, when the 
> DatumReader is created. This execution plan contains all required 
> defaulting/lookup values, so they need not be looked up during object 
> traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system default is set via the system property 
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the 
> proposed one. Will open a pull request with respective code in a bit (not 
> including interoperability with the optimizations of AVRO-2090 yet). Please 
> let me know your opinion of whether this is worth pursuing further.
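To make the "execution plan" idea concrete, here is a minimal, self-contained Java sketch. It is not the PR's code and uses no Avro classes: `Field`, `Step` and `PlanBuilder` are hypothetical names, a record is modelled as an `Object[]`, and the encoded input is an `Iterator<Object>` of already-decoded field values in writer order. Resolution is simplified to matching fields by name (no type promotion, unions or nested records): writer-only fields are skipped, matched fields are copied to the reader's position, and reader-only fields are filled from a default captured when the plan is built, so the per-record loop does no lookups.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical field description; hasDefault distinguishes "no default" from a null default.
record Field(String name, boolean hasDefault, Object defaultValue) {}

// One precomputed action: positions and default values are fixed when the plan is built.
interface Step {
  void apply(Iterator<Object> in, Object[] out);
}

final class PlanBuilder {
  private PlanBuilder() {}

  // Resolves writer fields against reader fields once; reading never looks anything up by name.
  static List<Step> build(List<Field> writer, List<Field> reader) {
    List<Step> steps = new ArrayList<>();
    boolean[] filled = new boolean[reader.size()];
    for (Field w : writer) {
      int pos = indexOf(reader, w.name());
      if (pos < 0) {
        steps.add((in, out) -> in.next()); // writer-only field: consume and discard
      } else {
        filled[pos] = true;
        steps.add((in, out) -> out[pos] = in.next()); // matched field: store at reader position
      }
    }
    for (int i = 0; i < reader.size(); i++) {
      if (filled[i]) {
        continue;
      }
      Field r = reader.get(i);
      if (!r.hasDefault()) {
        throw new IllegalArgumentException("reader field '" + r.name() + "' has no default");
      }
      int pos = i;
      Object value = r.defaultValue();
      steps.add((in, out) -> out[pos] = value); // reader-only field: default captured up front
    }
    return steps;
  }

  // Executes a prebuilt plan against one encoded record.
  static Object[] read(List<Step> plan, int readerFieldCount, Iterator<Object> in) {
    Object[] record = new Object[readerFieldCount];
    for (Step step : plan) {
      step.apply(in, record);
    }
    return record;
  }

  private static int indexOf(List<Field> fields, String name) {
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).name().equals(name)) {
        return i;
      }
    }
    return -1;
  }
}
```

Building the plan is the comparatively expensive part; the idea is that a reader would build it once per writer/reader schema pair and then call `read` for every record.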

[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682085#comment-16682085
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-437531886
 
 
   Thanks for your feedback, Raymie. I hadn't expected to receive that much 
input, but I'll try to address your points:
   
   * I moved the feature switch from `GenericData.createDatumReader` to 
`GenericDatumReader.read` (even though I don't quite understand why the factory 
method isn't the default way to instantiate a new reader; can someone help me 
understand?). This way, the newly written code is subject to ALL unit tests of 
the project. (Admittedly, I had to disable the `ReusingArrayReader` due to some 
problems there; I will address that later if desired.) With small adjustments 
(mostly fixing exception types and messages), all tests seem to pass. If there 
are further test frameworks to use, please do let me know.
   
   * I am well aware of the variance in performance measurements, which is why 
everyone should take `Perf.java` results with a grain of salt. Most tests there 
are VERY sensitive to even small changes and have too large a spread. For 
performance measurement, a module based on JMH or similar would be preferable 
(ideally one that measures not only time but also allocation activity). 
Structures of different depth and diversity should also be checked.
   
   * It's nice to see that you have pursued similar ideas in your branch. I 
think the main difference is that I am trying to do all verifications ONCE at 
reader creation, while your approach still uses the existing `DatumReader`, 
which requires some avoidable lookups at read time (things like 
`SpecificData.getClass()`, etc.), and `ResolvingDecoder`, which still relies on 
parser information for nearly every read operation. My approach **replaces** 
`ResolvingDecoder` and `DatumReader` with one that makes all necessary 
decisions only once where possible and stores the results in instance variables 
instead of maps (hoping to improve performance that way; see the note above). 
The downside of my approach in its current form is that it only works for 
Generic and Specific records. I have not looked into what would need to change 
to support the other kinds of data (reflective, protobuf, thrift), but I 
consider generic and specific records to be the most important use case.
   
   * Sadly, the mechanism I present does not take advantage of the generated 
reader code of AVRO-2090, but it offers performance benefits of a similar 
magnitude and works for all kinds of generic and specific records.
   
   * The big benefit of the current approach is the strong speedup when dealing 
with default values. I do agree, though, that a large part of the gain might be 
achievable with a smaller change.
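A minimal sketch of the "switch inside `read`" arrangement described in the first point, assuming hypothetical types rather than Avro's actual `GenericDatumReader` internals: the flag is consulted on every call, so all existing callers and tests pass through it, while the fast path's plan is built lazily on first use and then held in an instance field rather than fetched from a map.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical reader: the feature flag is checked inside read(), so every caller
// (and every existing test) goes through the switch. The fast path's plan is built
// at most once and then kept in an instance field. Not thread-safe as written.
final class SwitchingReader<I, T> {
  private final boolean fastEnabled;
  private final Function<I, T> legacyPath;
  private final Supplier<Function<I, T>> planFactory;
  private Function<I, T> plan; // null until the first fast read
  private int plansBuilt;      // exposed only to show the one-time build

  SwitchingReader(boolean fastEnabled, Function<I, T> legacyPath,
      Supplier<Function<I, T>> planFactory) {
    this.fastEnabled = fastEnabled;
    this.legacyPath = legacyPath;
    this.planFactory = planFactory;
  }

  T read(I input) {
    if (!fastEnabled) {
      return legacyPath.apply(input);
    }
    if (plan == null) {
      plan = planFactory.get(); // the expensive analysis happens here, once
      plansBuilt++;
    }
    return plan.apply(input);
  }

  int plansBuilt() {
    return plansBuilt;
  }
}
```

Keeping the decision in `read` rather than in a factory method means that code constructing readers directly still exercises the new path, which is the point made above about running the full unit-test suite against it.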
   
   I would be very grateful for feedback on whether you consider the current 
approach worth spending more time on, or whether there are other issues that 
would keep it from being considered beneficial for the project.
   



[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681607#comment-16681607
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #354: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-437398604
 
 
   I've been meaning to comment on this for a while.  Looking at your code 
quickly, I wasn't convinced that it worked for recursive records (and maybe not 
even for nested records).  Also, the solution as posted re-implements schema 
resolution.  The schema-resolution code is subjected to a large number of 
regression tests that came about because the resolution logic is subtle in 
places.  A re-implementation of that logic should subject itself to that test 
suite, which yours does not.
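Recursive records are the classic pitfall for a plan built eagerly at reader creation: naively descending into a self-referencing type never terminates. One common remedy, sketched here with hypothetical `RecordType`, `RecordPlan` and `RecursivePlanner` classes (not Avro's, and not a claim about what the PR does), is to register each record's plan in an identity-keyed map before building its fields, so that a recursive or repeated reference resolves to the same, possibly still incomplete, plan object. Only record-typed fields are modelled, for brevity.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema model: only the record-typed fields of a record are tracked.
final class RecordType {
  final String name;
  final List<RecordType> children = new ArrayList<>();

  RecordType(String name) {
    this.name = name;
  }
}

// Hypothetical plan node mirroring a RecordType.
final class RecordPlan {
  final String name;
  final List<RecordPlan> children = new ArrayList<>();

  RecordPlan(String name) {
    this.name = name;
  }
}

final class RecursivePlanner {
  // Plans that are finished or still being built, keyed by schema identity.
  private final Map<RecordType, RecordPlan> plans = new IdentityHashMap<>();

  RecordPlan plan(RecordType type) {
    RecordPlan existing = plans.get(type);
    if (existing != null) {
      return existing; // recursive or shared reference: reuse instead of rebuilding
    }
    RecordPlan plan = new RecordPlan(type.name);
    plans.put(type, plan); // register BEFORE visiting children so cycles terminate
    for (RecordType child : type.children) {
      plan.children.add(plan(child));
    }
    return plan;
  }
}
```

In a real schema the self-reference would typically sit inside a union with `null`; the sketch leaves unions out and only shows how the cycle is closed.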
   
   Inspired by both your JIRA (AVRO-2247) and my own thoughts about further 
improving performance of reading with resolution, I have refactored the 
schema-resolution logic away from the resolving-grammar generation logic.  I 
have published this in the branch 
[`refactor-resolving-2018-11-09`](https://github.com/rstata-projects/avro/tree/refactor-resolving-2018-11-09)
 of my Avro fork.  This code is "bug for bug" compatible with Avro's existing 
schema resolution (e.g., it implements the funky "best match" algorithm 
currently used for unions), and it passes the full schema-resolution regression 
suite.
   
   You might want to look to see if this would be a good foundation for 
implementing your improvements.  Start at the new 
[Resolver](https://github.com/rstata-projects/avro/blob/refactor-resolving-2018-11-09/lang/java/avro/src/main/java/org/apache/avro/Resolver.java)
 class, and also look to see how ResolvingGrammarGenerator uses the output of 
Resolver.  However, be warned that I intend to "re-write history" on this code 
pretty severely before proposing it as an actual improvement, so you might want 
to wait about a week before actually depending upon this code.
   
   Over the last few weeks I've been working on the performance-testing suite. 
What I've found is that results vary _significantly_ between runs of this 
suite: in places by over 30%! Across the board, I see variance of over 5% 
between runs for over 40% of the test cases. With this much variance, it's 
impossible to say if a proposed performance improvement is really an 
improvement (and impossible to tell whether or not an attempt to improve one 
set of performance cases has degraded performance elsewhere).
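One simple way to quantify the run-to-run variation described here is to repeat each test case several times and compute the relative spread, (max - min) / mean, or the coefficient of variation, sample standard deviation / mean. The hypothetical `RunSpread` helper below (not part of Avro's `Perf.java`) does both, plus a deliberately crude check that a difference between two configurations exceeds the spread of either; it is a rule of thumb, not a statistical test.

```java
import java.util.Arrays;

// Hypothetical helper for judging run-to-run noise in repeated benchmark timings.
final class RunSpread {
  private RunSpread() {}

  static double mean(double[] xs) {
    return Arrays.stream(xs).average().orElseThrow();
  }

  // (max - min) / mean; 0.05 means the runs span 5% of their mean.
  static double relativeSpread(double[] xs) {
    double max = Arrays.stream(xs).max().orElseThrow();
    double min = Arrays.stream(xs).min().orElseThrow();
    return (max - min) / mean(xs);
  }

  // Sample standard deviation divided by the mean; needs at least two runs.
  static double coefficientOfVariation(double[] xs) {
    if (xs.length < 2) {
      throw new IllegalArgumentException("need at least two runs");
    }
    double m = mean(xs);
    double sumSq = 0;
    for (double x : xs) {
      sumSq += (x - m) * (x - m);
    }
    return Math.sqrt(sumSq / (xs.length - 1)) / m;
  }

  // Crude heuristic: is the relative difference of the means larger than either run set's spread?
  static boolean exceedsNoise(double[] baseline, double[] candidate) {
    double diff = Math.abs(mean(baseline) - mean(candidate)) / mean(baseline);
    return diff > Math.max(relativeSpread(baseline), relativeSpread(candidate));
  }
}
```

The inputs should be real timings from repeated runs of the same test case on the same machine; the helper only summarizes them.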
   
   By the end of next week I should have a proposed set of changes to the 
performance benchmark, plus a "cookbook" for using it (on AWS), which minimizes 
variance between runs of the suite.  With that in place, I will return to the 
`refactor-resolving` work and submit it along with testing that shows it 
doesn't degrade performance (and, in fact, improves it in places).





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678879#comment-16678879
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436800295
 
 
   @Fokko - Actually, the reason was twofold. For one, I was looking at Raymie's 
code generation for AVRO-2090 and was considering working up a concept for 
on-the-fly bytecode generation for deserialization, and coming up with 
something that creates an execution plan was kind of the natural first step for 
that. I'd really like to extend this later so that the ExecutionSteps generate 
inlined bytecode on the fly, letting the JVM optimize even more. On the other 
hand, I tried to understand the ResolvingGrammarGenerator and had a hard time 
with it, so I tried to build something that felt easier to me, and was somewhat 
surprised by the results. I'm well aware it would be preferable to improve on 
what's already there, but I felt that the one-stage "execution plan" approach 
was too different from the two-stage "DatumReader and ResolvingDecoder" 
approach. Still, I'm happy even if this PR only serves as inspiration for other 
changes, and I am willing to assist in getting things done another way, too.
   
   @cutting - The ExecutionSteps are created in 
`FastReader.initializeRecordReader(...)`.




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678839#comment-16678839
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

cutting commented on issue #354: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436792455
 
 
   I don't see where ExecutionSteps are created. Is some of the code missing 
from the patch?




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678762#comment-16678762
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

Fokko commented on issue #354: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436773869
 
 
   Wow, that is an incredible speedup. Just curious: why implement a completely 
new reader instead of optimizing the existing ones? It would be nice to run the 
speed tests every time, to avoid performance regressions.




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678735#comment-16678735
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436765147
 
 
   Rebased as requested, and added a small change to Perf.java to use 
`GenericData.get().createDatumReader( schema )` instead of 
`new GenericDatumReader( schema )`.
   Also switched schema lookups from `WeakHashMap` to `WeakIdentityHashMap` 
for an additional speedup.
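The likely reason an identity-keyed map helps here: `java.util.WeakHashMap` hashes keys with their own `hashCode()` (and calls `equals()` when a bucket holds a different key), which for a structurally compared object such as a schema can mean walking the whole structure, whereas an identity map uses `System.identityHashCode` and `==`. The JDK has no weak identity map, so the sketch below uses `java.util.IdentityHashMap` only to count `hashCode()` calls; `CountingKey` and `LookupDemo` are hypothetical stand-ins, and how much this saves in practice depends on how costly the real key's `hashCode()` is.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a schema whose hashCode() is structural and possibly costly.
final class CountingKey {
  static int hashCalls = 0;
  private final String structure;

  CountingKey(String structure) {
    this.structure = structure;
  }

  @Override
  public int hashCode() {
    hashCalls++; // count how often a map asks for the structural hash
    return structure.hashCode();
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CountingKey k && k.structure.equals(structure);
  }
}

final class LookupDemo {
  private LookupDemo() {}

  // Performs n lookups of the same key and returns how many hashCode() calls they caused.
  static int hashCallsFor(Map<CountingKey, String> cache, CountingKey key, int n) {
    cache.put(key, "plan");
    CountingKey.hashCalls = 0;
    for (int i = 0; i < n; i++) {
      cache.get(key);
    }
    return CountingKey.hashCalls;
  }
}
```

With a `WeakHashMap` every lookup pays for one `hashCode()` call, while the identity map never calls it; a weak identity map keeps that property and additionally lets unused schemas be garbage-collected.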
   
   As noted, I am curious about any feedback and willing to work on 
implementation and style details. I just need to know if this is something 
worth pursuing.
   
   With the current changes, I get the following Perf.java comparison:
   
   test name | time (fast read disabled) | time (fast read enabled)
   --- | ---: | ---:
   FooBarSpecificRecordTestRead | 5534 ms | 3115 ms
   GenericRead | 4711 ms | 3422 ms
   GenericStringsRead | 4902 ms | 3695 ms
   GenericNested_Read | 7190 ms | 4961 ms
   GenericNestedFake_Read | 2581 ms | 2461 ms
   GenericWithDefault_Read | 8400 ms | 3746 ms
   GenericWithOutOfOrder_Read | 4627 ms | 3549 ms
   GenericWithPromotion_Read | 4991 ms | 3673 ms
   GenericOneTimeDecoderUse_Read | 4618 ms | 3496 ms
   GenericOneTimeReaderUse_Read | 7035 ms | 4693 ms
   GenericOneTimeUse_Read | 6965 ms | 4721 ms
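For reading the table, the relative reduction in run time is (disabled - enabled) / disabled and the speedup factor is disabled / enabled. The hypothetical `PerfDelta` helper below only applies that arithmetic to a few rows copied from the table; nothing is re-measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: relative change between two timings from the Perf.java table.
final class PerfDelta {
  private PerfDelta() {}

  // Fraction of run time saved, e.g. 0.25 means 25% less time.
  static double reduction(double disabledMs, double enabledMs) {
    return (disabledMs - enabledMs) / disabledMs;
  }

  // How many times faster the fast-read configuration is.
  static double speedup(double disabledMs, double enabledMs) {
    return disabledMs / enabledMs;
  }

  public static void main(String[] args) {
    // A few rows copied from the table: {fast read disabled ms, fast read enabled ms}.
    Map<String, double[]> rows = new LinkedHashMap<>();
    rows.put("FooBarSpecificRecordTestRead", new double[] {5534, 3115});
    rows.put("GenericNestedFake_Read", new double[] {2581, 2461});
    rows.put("GenericWithDefault_Read", new double[] {8400, 3746});
    rows.forEach((name, t) -> System.out.printf("%-30s %5.1f%% less time, %.2fx%n",
        name, 100 * reduction(t[0], t[1]), speedup(t[0], t[1])));
  }
}
```

For instance, GenericWithDefault_Read goes from 8400 ms to 3746 ms, about 55% less time (roughly 2.24x), while GenericNestedFake_Read changes by only about 4.6%, small enough that it could plausibly be run-to-run noise given the variance of over 5% that rstata reports for many cases in this suite.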
   
   
   
   




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678347#comment-16678347
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

Fokko commented on issue #354: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436652418
 
 
   Can you rebase and fix the merge conflict?




[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662357#comment-16662357
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable opened a new pull request #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354
 
 
   This is the first implementation of the new reader design proposed in 
AVRO-2247, which improves reading performance for both generic and specific 
records. Please let me know what you think. Classes could be consolidated into 
inner classes, but I did not want to spend too much effort on aesthetics before 
getting feedback on whether this feature is feasible.
   
   The feature can be enabled per GenericData or SpecificData instance, or by 
setting the system property `org.apache.avro.fastread` to `true`. Note that in 
order to see its effects in Perf, calls to `new GenericDatumReader( schema )` 
would need to be replaced with `GenericData.get().createDatumReader( schema )` 
(this change is not included yet).
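The switch described here, a per-instance flag whose default comes from the `org.apache.avro.fastread` system property, can be sketched as follows. `ReaderSettings` is a hypothetical class, not Avro's `GenericData`; it relies on the JDK's `Boolean.getBoolean`, which returns `true` only if the named system property exists and equals `"true"` (ignoring case), matching the "defaults to false" behaviour described in the issue.

```java
// Hypothetical per-instance setting that takes its default from a system property.
final class ReaderSettings {
  static final String FAST_READ_PROPERTY = "org.apache.avro.fastread";

  // Read when the instance is created, so changing the property affects new instances only.
  private boolean fastReadEnabled = Boolean.getBoolean(FAST_READ_PROPERTY);

  boolean isFastReadEnabled() {
    return fastReadEnabled;
  }

  // Per-instance override, independent of the system-wide default.
  ReaderSettings setFastReadEnabled(boolean enabled) {
    this.fastReadEnabled = enabled;
    return this;
  }
}
```

Launching the JVM with `-Dorg.apache.avro.fastread=true` would then turn the fast path on for every new instance, while `setFastReadEnabled(false)` opts a single instance out again.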

