[jira] [Updated] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2017-06-20 Thread Nandor Kollar (JIRA)

 [ https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandor Kollar updated AVRO-1821:

Fix Version/s: 1.8.1

> Avro (Java) Memory Leak in ReflectData Caching
> --
>
> Key: AVRO-1821
> URL: https://issues.apache.org/jira/browse/AVRO-1821
> Project: Avro
>  Issue Type: Bug
>  Components: java
> Environment: OS X 10.11.3
> {code}java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode){code}
> Reporter: Bryan Harclerode
> Assignee: Bryan Harclerode
> Fix For: 1.8.1
>
> Attachments: 0001-AVRO-1821-Fix-memory-leak-of-Schemas-in-ReflectData.patch
>
>
> I think I have encountered one of the memory leaks described by AVRO-1283 in
> the way Java Avro implements field accessor caching in {{ReflectData}}. When a
> reflected object is serialized, the key of {{ClassAccessorData.bySchema}} (as
> retained by {{ReflectData.ACCESSOR_CACHE}}) holds a strong reference to the
> schema that was used to serialize the object, but there is no code path that
> clears these references once a schema is no longer in use.
> While in most cases a class will probably have only one schema associated with
> it (created and cached by {{ReflectData.getSchema(Type)}}), I hit
> {{OutOfMemoryError}} when serializing generic classes with dynamically
> generated schemas. The following minimal example exhausts a 50 MiB heap
> ({{-Xmx50m}}) after about 190K iterations:
> {code:title=AvroMemoryLeakMinimal.java|borderStyle=solid}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.reflect.ReflectDatumWriter;
>
> public class AvroMemoryLeakMinimal {
>     public static void main(String[] args) throws IOException {
>         long count = 0;
>         EncoderFactory encFactory = EncoderFactory.get();
>         try {
>             while (true) {
>                 // Create a fresh (structurally identical) schema on every iteration
>                 Schema schema = Schema.createRecord("schema", null, null, false);
>                 schema.setFields(Collections.emptyList());
>                 // Serialize an empty record with the new schema
>                 ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
>                 BinaryEncoder encoder = encFactory.binaryEncoder(baos, null);
>                 new ReflectDatumWriter<Object>(schema).write(new Object(), encoder);
>                 encoder.flush();
>                 byte[] result = baos.toByteArray();
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.print("Memory exhausted after ");
>             System.out.print(count);
>             System.out.println(" schemas");
>             throw e;
>         }
>     }
> }
> {code}
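> As a point of comparison (a sketch of a workaround, not part of the attached patch):
> hoisting the schema out of the loop and reusing a single instance should keep
> {{bySchema}} at one entry, since the cache grows with each distinct {{Schema}}
> instance. With the same imports and {{main()}} signature as above, the loop would become:
> {code:title=Reusing one Schema instance (illustrative)|borderStyle=solid}
> // Same imports and main() signature as the example above; only the loop changes.
> Schema schema = Schema.createRecord("schema", null, null, false);
> schema.setFields(Collections.emptyList());
> ReflectDatumWriter<Object> writer = new ReflectDatumWriter<Object>(schema);
> while (true) {
>     ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
>     BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
>     // Every iteration reuses the same Schema instance, so the accessor cache does not grow.
>     writer.write(new Object(), encoder);
>     encoder.flush();
> }
> {code}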
> I was able to fix the bug in the latest 1.9.0-SNAPSHOT from git with the
> following patch to {{ClassAccessorData.bySchema}}, switching it to weak keys so
> that the {{Schema}} objects are released once nothing else references them:
> {code:title=ReflectData.java.patch|borderStyle=solid}
> --- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> @@ -57,6 +57,7 @@ import org.apache.avro.io.DatumWriter;
>  import org.apache.avro.specific.FixedSize;
>  import org.apache.avro.specific.SpecificData;
>  import org.apache.avro.SchemaNormalization;
> +import org.apache.avro.util.WeakIdentityHashMap;
>  import org.codehaus.jackson.JsonNode;
>  import org.codehaus.jackson.node.NullNode;
>  
> @@ -234,8 +235,8 @@ public class ReflectData extends SpecificData {
>  private final Class clazz;
>  private final Map byName =
>  new HashMap();
> -private final IdentityHashMap bySchema =
> -new IdentityHashMap();
> +private final WeakIdentityHashMap bySchema =
> +new WeakIdentityHashMap();
>  
>  private ClassAccessorData(Class c) {
>clazz = c;
> {code}
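> For reference, the general shape of a weak-identity-keyed map is roughly the
> following. This is an illustrative sketch only, not the actual
> {{org.apache.avro.util.WeakIdentityHashMap}}; a real implementation also needs a
> {{ReferenceQueue}} to purge entries whose keys have already been collected:
> {code:title=Sketch: weak-identity-keyed map (illustrative)|borderStyle=solid}
> import java.lang.ref.WeakReference;
> import java.util.HashMap;
> import java.util.Map;
>
> /** Illustrative only: keys are compared by identity and held weakly. */
> public class WeakIdentityCache<K, V> {
>     /** Wraps a key in a WeakReference but hashes/compares by referent identity. */
>     private static final class IdentityWeakKey<K> extends WeakReference<K> {
>         private final int hash;
>         IdentityWeakKey(K referent) {
>             super(referent);
>             this.hash = System.identityHashCode(referent);
>         }
>         @Override public int hashCode() { return hash; }
>         @Override public boolean equals(Object o) {
>             if (this == o) return true;
>             if (!(o instanceof IdentityWeakKey)) return false;
>             // Identity comparison of the referents; a cleared reference equals nothing.
>             Object mine = get(), theirs = ((IdentityWeakKey<?>) o).get();
>             return mine != null && mine == theirs;
>         }
>     }
>
>     private final Map<IdentityWeakKey<K>, V> delegate = new HashMap<>();
>
>     public V get(K key) { return delegate.get(new IdentityWeakKey<>(key)); }
>     public V put(K key, V value) { return delegate.put(new IdentityWeakKey<>(key), value); }
> }
> {code}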
> Additionally, I'm not sure why an {{IdentityHashMap}} was used instead of a
> standard {{HashMap}}, since two equivalent schemas have the same set of
> {{FieldAccessor}}s. Everything appears to work and all tests pass if I use a
> {{WeakHashMap}} instead of a {{WeakIdentityHashMap}}, but I don't know whether
> there was some other reason object identity matters for this map. If a
> non-identity map can be used, it would further reduce memory and CPU usage by
> not regenerating all the field accessors for equivalent schemas.
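> To illustrate why the distinction matters (a small sketch, not part of the
> attached patch): two schemas built the same way are equal per {{Schema.equals}}
> but remain distinct instances, so a {{WeakHashMap}} would coalesce them into a
> single cache entry while a {{WeakIdentityHashMap}} keeps one entry per instance:
> {code:title=SchemaEqualityVsIdentity.java (illustrative)|borderStyle=solid}
> import java.util.Collections;
> import org.apache.avro.Schema;
>
> public class SchemaEqualityVsIdentity {
>     public static void main(String[] args) {
>         Schema a = Schema.createRecord("schema", null, null, false);
>         a.setFields(Collections.emptyList());
>         Schema b = Schema.createRecord("schema", null, null, false);
>         b.setFields(Collections.emptyList());
>         System.out.println(a.equals(b)); // true: structurally equivalent schemas compare equal
>         System.out.println(a == b);      // false: distinct instances, so an identity-keyed map holds two entries
>     }
> }
> {code}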

[jira] [Updated] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-16 Thread Ryan Blue (JIRA)

 [ https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue updated AVRO-1821:

Assignee: Bryan Harclerode

[jira] [Updated] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-11 Thread Bryan Harclerode (JIRA)

 [ https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Harclerode updated AVRO-1821:
---
Attachment: 0001-AVRO-1821-Fix-memory-leak-of-Schemas-in-ReflectData.patch

Attaching changes and unit tests as a patch from {{git format-patch}}.
