[jira] [Updated] (AVRO-2050) Clear Array To Allow GC

2017-07-17 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated AVRO-2050:
--
Attachment: AVRO-2050.2.patch

[~nkollar]

This implementation is essentially an {{ArrayList}}.  {{ArrayList}} overrides 
the {{clear}} method because the default {{AbstractList}} implementation 
instantiates an Iterator and then removes each item through it one at a time.  
This performs badly in terms of constant stack manipulation, and it also 
amounts to draining the array from the head of the list.  Draining from the 
head requires an array copy for each item removed to shift down the remaining 
records.  It is much better to override the method, as {{ArrayList}} has done.  
However, I did see some overlap with the {{toString}} and {{add}} methods which 
can be leveraged, so I changed the patch to remove those two overrides.
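
To make the argument above concrete, here is a minimal sketch of an 
array-backed list whose {{clear()}} nulls out the used slots in one pass 
instead of relying on the inherited {{AbstractList}} behaviour.  The class and 
method names ({{ClearForGcSketch}}, {{SimpleArray}}, {{rawSlot}}) are 
hypothetical stand-ins for illustration only; this is not the Avro source or 
the attached patch.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical stand-in for an array-backed list; NOT Avro's actual code.
class ClearForGcSketch {

    static final class SimpleArray<T> extends AbstractList<T> {
        private Object[] elements;
        private int size;

        SimpleArray(int capacity) {
            elements = new Object[Math.max(capacity, 1)];
        }

        @Override
        public int size() {
            return size;
        }

        @Override
        @SuppressWarnings("unchecked")
        public T get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("Index: " + index);
            }
            return (T) elements[index];
        }

        // AbstractList.add(E) delegates to add(size(), e), so only this
        // overload needs to be implemented.
        @Override
        public void add(int index, T value) {
            if (index < 0 || index > size) {
                throw new IndexOutOfBoundsException("Index: " + index);
            }
            if (size == elements.length) {
                elements = Arrays.copyOf(elements, size * 2);
            }
            System.arraycopy(elements, index, elements, index + 1, size - index);
            elements[index] = value;
            size++;
            modCount++;
        }

        // Null out the used slots in a single pass so the old elements become
        // unreachable, then reset the size. No iterator, no per-item shifting.
        @Override
        public void clear() {
            Arrays.fill(elements, 0, size, null);
            size = 0;
            modCount++;
        }

        // Illustration-only hook to inspect the backing array.
        Object rawSlot(int index) {
            return elements[index];
        }
    }

    public static void main(String[] args) {
        SimpleArray<String> array = new SimpleArray<>(2);
        array.add("a");
        array.add("b");
        array.add("c");
        array.clear();
        System.out.println("size=" + array.size() + ", slot0=" + array.rawSlot(0));
    }
}
```

The design point is the single {{Arrays.fill}} call: it releases the references 
without the repeated array shifts that draining from the head would incur.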

> Clear Array To Allow GC
> ---
>
> Key: AVRO-2050
> URL: https://issues.apache.org/jira/browse/AVRO-2050
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.7, 1.8.2
> Reporter: BELUGA BEHR
> Priority: Minor
> Attachments: AVRO-2050.1.patch, AVRO-2050.2.patch
>
>
> Java's {{ArrayList}} implementation clears all Objects from the internal 
> buffer when the {{clear()}} method is called.  This makes those Objects 
> eligible for GC.  We should do the same in Avro's 
> {{org.apache.avro.generic.GenericData}}.
> [ArrayList 
> Source|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/ArrayList.java#ArrayList.clear%28%29]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-2050) Clear Array To Allow GC

2017-07-17 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089518#comment-16089518
 ] 

Nandor Kollar commented on AVRO-2050:
-

I'm wondering why the {{clear()}} method is overridden. It looks like the base 
class is {{AbstractList}}, which already implements {{clear()}} correctly, so 
might we instead implement the {{Array}} iterator's {{remove()}} method?



[jira] [Commented] (AVRO-2046) avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData

2017-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089511#comment-16089511
 ] 

ASF GitHub Bot commented on AVRO-2046:
--

GitHub user manu-chroma opened a pull request:

https://github.com/apache/avro/pull/235

schema.py: No sys traceback in parse exception

In the ``SchemaParseException``, do not provide the sys traceback. 

For our project, CWL Tool, we're using `avro/py` in our Python 3 builds. 
More on this has been discussed here: 
https://issues.apache.org/jira/browse/AVRO-2046 

To do this, we use the `autotranslate` tool, which converts the `avro/py` 
code to Python 2 and 3 compatible code at runtime. 
The problem arises when it tries to convert this `raise Exception` statement. 
There is no way to achieve this in a cross-compatible way without using an 
external library.
 
Thus, I've created this PR. It is a very minimal change and solves our problem 
for the time being. We hope you'll consider it, or at least give us your 
feedback.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manu-chroma/avro patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #235


commit 92525fda5cbae1ea7b9e5e255a52ad7e8f0ff71f
Author: Manvendra Singh 
Date:   2017-07-17T08:53:28Z

schema.py: No sys traceback in parse exception





> avro-python3: Very restricted set of data types which are allowed in 
> AvroSchemaFromJSONData
> ---
>
> Key: AVRO-2046
> URL: https://issues.apache.org/jira/browse/AVRO-2046
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.8.2
> Environment: avro-python3 (1.8.2)
> Reporter: Manvendra Singh
>
> Hey, I come from the CWL project: 
> https://github.com/common-workflow-language/cwltool and, as part of my GSoC 
> project, I'm working on adding Python 3 compatibility to the *cwltool* 
> codebase. We've been using avro-python2 for a long time now and it has worked 
> great for us in our projects, schema_salad and cwltool.
> In the process of porting cwltool, I'm facing issues with the avro-python3 
> library. I've found the following bug.
> Minimal reproducible example:
> {code:none}
> from collections import OrderedDict
> import avro.schema
> AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData
> a = {
>     "fields": [
>         {
>             "name": "name",
>             "type": "string"
>         },
>         {
>             "name": "favorite_number",
>             "type": [
>                 "int",
>                 "null"
>             ]
>         },
>         {
>             "name": "favorite_color",
>             "type": [
>                 "string",
>                 "null"
>             ]
>         }
>     ],
>     "name": "User",
>     "namespace": "example.avro",
>     "type": "record"
> }
> b = OrderedDict(a)
> AvroSchemaFromJSONData(a)
> AvroSchemaFromJSONData(b)
> {code}
> Output: 
> {code}
> ~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
>    1252   if parser is None:
>    1253     raise SchemaParseException(
> -> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
>    1255   return parser(json_data, names=names)
>    1256 
> SchemaParseException: Invalid JSON descriptor for an Avro schema: 
> OrderedDict([('namespace', 'example.avro'), ('type', 'record'), ('name', 
> 'User'), ('fields', [{'type': 'string', 'name': 'name'}, {'type': ['int', 
> 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'], 'name': 
> 'favorite_color'}])]).
> {code}
>  
> h5. The current implementation of this function does not allow for *any 
> dict-like data type*. It does, however, work in avro-python2. 
> Relevant line of code: 
> https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250
> Apart from this, I've tried 

[GitHub] avro pull request #235: schema.py: No sys traceback in parse exception

2017-07-17 Thread manu-chroma
GitHub user manu-chroma opened a pull request:

https://github.com/apache/avro/pull/235

schema.py: No sys traceback in parse exception

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString

2017-07-17 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089823#comment-16089823
 ] 

BELUGA BEHR commented on AVRO-1786:
---

I may be experiencing this issue as well; trying to collect more information...

> Strange IndexOutofBoundException in GenericDatumReader.readString
> -
>
> Key: AVRO-1786
> URL: https://issues.apache.org/jira/browse/AVRO-1786
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.4, 1.7.7
> Environment: CentOS 6.5 Linux x64, 2.6.32-358.14.1.el6.x86_64
> Use IBM JVM:
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 
> 20140515_199835 (JIT enabled, AOT enabled)
> Reporter: Yong Zhang
> Priority: Minor
>
> Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running 
> IBM BigInsight V3.0.0.2. In Apache terms, it is Hadoop 2.2.0 with MRv1 (no 
> YARN), and it comes with Avro 1.7.4, running on the IBM J9 VM (build 2.7, JRE 
> 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT 
> enabled)). Not sure if the JDK matters, but it is NOT the Oracle JVM.
> We have an ETL implemented as a chain of MR jobs. One MR job merges two sets 
> of Avro data. Dataset1 is in HDFS location A, Dataset2 is in HDFS location B, 
> and both contain Avro records bound to the same Avro schema. Each record 
> contains a unique ID field and a timestamp field. The MR job merges the 
> records based on the ID, uses the record with the later timestamp to replace 
> the record with the earlier timestamp, and emits the final Avro record. Very 
> straightforward.
> Now we faced a problem that one reducer keeps failing with the following 
> stacktrace on JobTracker:
> {code}
> java.lang.IndexOutOfBoundsException
>   at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
>   at java.io.DataInputStream.read(DataInputStream.java:160)
>   at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
>   at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
>   at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
>   at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
>   at org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>   at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>   at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>   at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(AccessController.java:366)
>   at javax.security.auth.Subject.doAs(Subject.java:572)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> Here are my Mapper and Reducer method signatures:
> Mapper:
> public void map(AvroKey key, NullWritable value, Context context) throws IOException, InterruptedException 
> Reducer:
> protected void reduce(CustomPartitionKeyClass key, Iterable values, Context context) throws IOException, InterruptedException 
> What bothers me are the following facts:
> 1) All the mappers finish without error.
> 2) Most of the reducers finish without error, but one reducer keeps failing 
> with the above error.
> 3) It looks like it is caused by the data? But keep in mind that all the Avro 
> records passed the mapper side, and failed only in one reducer. 
> 4) From the stacktrace, it looks like our reducer code was NOT invoked yet, 
> but failed 

[GitHub] avro pull request #236: [AVRO-2051] Remove synchronization for JsonPropertie...

2017-07-17 Thread dkulp
GitHub user dkulp opened a pull request:

https://github.com/apache/avro/pull/236

[AVRO-2051] Remove synchronization for JsonProperties.getJsonProp

This change does two basic things:

1) Makes "props" a private field and requires subclasses to access it 
via the additional methods.  This makes it a bit easier to change the 
underlying implementation.

2) Changes props to an AtomicReference and makes it act like an immutable 
map.  The addProp method makes a full copy of the map, adds the new value, and 
then atomically swaps in the new map, thus not affecting other threads that 
are still using the map that was "current" when they called the get method.
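
For readers who want to see the copy+add+swap idea in isolation, the following 
is a rough sketch under stated assumptions: the class name 
{{CopyOnWritePropsSketch}}, the {{snapshot()}} helper, and the use of String 
values are all hypothetical simplifications, and the code is not taken from the 
pull request.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of copy+add+swap; NOT the code from the PR.
class CopyOnWritePropsSketch {

    // The current map is never mutated after publication; writers replace it.
    private final AtomicReference<Map<String, String>> props =
        new AtomicReference<>(Collections.<String, String>emptyMap());

    // Readers need no lock: they read whichever map is current right now.
    String getProp(String name) {
        return props.get().get(name);
    }

    // Returns the map that is current at the time of the call.
    Map<String, String> snapshot() {
        return props.get();
    }

    // Copy the current map, add the entry, and swap the copy in atomically.
    // If another writer got there first, retry against the newer map.
    void addProp(String name, String value) {
        while (true) {
            Map<String, String> current = props.get();
            Map<String, String> copy = new LinkedHashMap<>(current);
            copy.put(name, value);
            if (props.compareAndSet(current, Collections.unmodifiableMap(copy))) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        CopyOnWritePropsSketch schema = new CopyOnWritePropsSketch();
        Map<String, String> before = schema.snapshot();
        schema.addProp("doc", "example");
        System.out.println("old snapshot size: " + before.size());
        System.out.println("new value: " + schema.getProp("doc"));
    }
}
```

A thread holding the old snapshot keeps seeing the old contents, which is the 
property the description relies on; the cost is one full map copy per write.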

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dkulp/avro master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #236


commit ad14635fa3af97b90282a79b7e04a0b8753e45b5
Author: Daniel Kulp 
Date:   2017-07-17T19:08:10Z

[AVRO-2051] Remove synchronization for JsonProperties.getJsonProp






[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090365#comment-16090365
 ] 

ASF GitHub Bot commented on AVRO-2051:
--

GitHub user dkulp opened a pull request:

https://github.com/apache/avro/pull/236

[AVRO-2051] Remove synchronization for JsonProperties.getJsonProp




> Thread contention accessing JsonProperties props
> 
>
> Key: AVRO-2051
> URL: https://issues.apache.org/jira/browse/AVRO-2051
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.2
> Reporter: Daniel Kulp
>
> See 
> https://lists.apache.org/thread.html/dd34ab8439137a81a9de29ad4161f37b17638394cea0806765689976@%3Cuser.avro.apache.org%3E
> Basically, the getJsonProp method, being synchronized, is causing thread 
> contention issues when trying to share schemas between threads.  My 
> proposal (pull request forthcoming) is to treat "props" as an immutable map 
> and do a copy+add+swap for the addProp method.  This will make the addProp 
> call slower (particularly for large maps of props), but would make the reads 
> significantly faster as no locking will be needed.





[jira] [Created] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread Daniel Kulp (JIRA)
Daniel Kulp created AVRO-2051:
-

 Summary: Thread contention accessing JsonProperties props
 Key: AVRO-2051
 URL: https://issues.apache.org/jira/browse/AVRO-2051
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.8.2
Reporter: Daniel Kulp


See 
https://lists.apache.org/thread.html/dd34ab8439137a81a9de29ad4161f37b17638394cea0806765689976@%3Cuser.avro.apache.org%3E

Basically, the getJsonProp method, being synchronized, is causing thread 
contention issues when trying to share schemas between threads.  My proposal 
(pull request forthcoming) is to treat "props" as an immutable map and do a 
copy+add+swap for the addProp method.  This will make the addProp call slower 
(particularly for large maps of props), but would make the reads significantly 
faster as no locking will be needed.





[jira] [Created] (AVRO-2052) Remove org.apache.avro.file.DataFileWriter Double Buffering

2017-07-17 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created AVRO-2052:
-

 Summary: Remove org.apache.avro.file.DataFileWriter Double 
Buffering
 Key: AVRO-2052
 URL: https://issues.apache.org/jira/browse/AVRO-2052
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.8.2, 1.7.7
Reporter: BELUGA BEHR
Priority: Trivial


{code:title=org.apache.avro.file.DataFileWriter}
  private void init(OutputStream outs) throws IOException {
    this.underlyingStream = outs;
    this.out = new BufferedFileOutputStream(outs);
    EncoderFactory efactory = new EncoderFactory();
    this.vout = efactory.binaryEncoder(out, null);
    dout.setSchema(schema);
    buffer = new NonCopyingByteArrayOutputStream(
        Math.min((int)(syncInterval * 1.25), Integer.MAX_VALUE/2 -1));
    this.bufOut = efactory.binaryEncoder(buffer, null);
    if (this.codec == null) {
      this.codec = CodecFactory.nullCodec().createInstance();
    }
    this.isOpen = true;
  }
{code}

It's clear here that both encoders write to an already-buffered destination, 
{{BufferedFileOutputStream}} and {{ByteArrayOutputStream}}, so there is no 
need for a buffered encoder; instead, write directly to the buffered streams 
with {{directBinaryEncoder}}.
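
The general double-buffering pattern can be illustrated with plain {{java.io}} 
classes.  This is only an analogy, assuming {{BufferedOutputStream}} as a 
stand-in for a buffered encoder and {{ByteArrayOutputStream}} as the 
already-buffered destination; it does not use Avro's encoder classes, and the 
class and method names below are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Analogy only: java.io stand-ins, NOT Avro's encoders.
class DoubleBufferingSketch {

    // Writes through an extra buffering layer and reports how many bytes have
    // reached the in-memory destination before any flush.
    static int bytesVisibleWithExtraBuffer(byte[] data) {
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        BufferedOutputStream extraLayer = new BufferedOutputStream(destination, 8192);
        try {
            extraLayer.write(data); // copied into the extra buffer first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return destination.size();
    }

    // Writes straight to the in-memory destination, which is itself a buffer.
    static int bytesVisibleWhenDirect(byte[] data) {
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        destination.write(data, 0, data.length);
        return destination.size();
    }

    public static void main(String[] args) {
        byte[] record = new byte[16];
        System.out.println("with extra buffer: " + bytesVisibleWithExtraBuffer(record));
        System.out.println("direct:            " + bytesVisibleWhenDirect(record));
    }
}
```

With the extra layer, small writes sit in a second buffer until it is flushed, 
so every byte is copied twice; writing directly skips that intermediate copy.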





[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090654#comment-16090654
 ] 

Doug Cutting commented on AVRO-2051:


This can make building schemas quadratic in the number of properties, no?  
While for most schemas this is probably not an issue, for some it might 
significantly impact performance.
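
As a rough illustration of the quadratic concern: if every added property 
copies the whole existing map, building a schema with n properties copies 
0 + 1 + ... + (n-1) = n(n-1)/2 entries in total.  The sketch below counts those 
copies for a copy-on-add map; the class and method names are hypothetical and 
it does not touch Avro itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; counts entries copied by a copy-on-add map.
class QuadraticCopySketch {

    // Adds n properties one at a time, copying the whole map on each add,
    // and returns how many existing entries were copied overall.
    static long entriesCopied(int n) {
        Map<String, String> props = new LinkedHashMap<>();
        long copied = 0;
        for (int i = 0; i < n; i++) {
            copied += props.size();                     // entries duplicated by this add
            Map<String, String> next = new LinkedHashMap<>(props);
            next.put("prop" + i, "value");
            props = next;                               // swap in the new map
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " properties -> " + entriesCopied(n) + " entries copied");
        }
    }
}
```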

I think instead we should just bite the bullet and make Schema immutable, 
eliminating the addProp method altogether.  At the same time, we should stop 
exposing JsonNode in the public API, instead using only Object, as intended in 
AVRO-1585.



[jira] [Updated] (AVRO-2052) Remove org.apache.avro.file.DataFileWriter Double Buffering

2017-07-17 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated AVRO-2052:
--
Attachment: AVRO-2052.1.patch

Call {{directBinaryEncoder}} instead of the buffered {{binaryEncoder}}



[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread Daniel Kulp (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090896#comment-16090896
 ] 

Daniel Kulp commented on AVRO-2051:
---

I'm trying to find something that will work for Avro 1.8.x, as that's what 
we'll need.  Thus, removing all of that is likely not an option. 

That said, I just discovered that we already have parts of Guava shaded in as a 
dependency.  Thus, I believe I can use the CacheBuilder to create the 
equivalent of a "ConcurrentLinkedHashMap" (there are some Google links that 
mention this) that would work for this and not have the quadratic issue.  I'll 
investigate more tomorrow.  Another option would be to either add a dependency 
on something else (like Caffeine) that has a ConcurrentLinkedHashMap, or 
copy/shade an Apache-licensed version (like 
https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java)
into the src and use it.



[jira] [Comment Edited] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread Daniel Kulp (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090911#comment-16090911
 ] 

Daniel Kulp edited comment on AVRO-2051 at 7/18/17 1:04 AM:


Of course another option is to just surround the access to the props with a 
ReentrantReadWriteLock.  Bunch of ideas to test and benchmark.
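
A minimal sketch of the read/write lock option, for comparison with the 
copy-on-write idea: many readers may hold the read lock at once, while a 
writer takes the exclusive write lock.  The class name and the String-valued 
map are hypothetical simplifications, not code from Avro or from any patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of guarding a props map with a read/write lock.
class ReadWriteLockPropsSketch {

    // Insertion-ordered map; get() does not modify it, so concurrent readers
    // under the shared read lock are safe.
    private final Map<String, String> props = new LinkedHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so they do not block each other.
    String getProp(String name) {
        lock.readLock().lock();
        try {
            return props.get(name);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; no map copy is needed.
    void addProp(String name, String value) {
        lock.writeLock().lock();
        try {
            props.put(name, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteLockPropsSketch schema = new ReadWriteLockPropsSketch();
        schema.addProp("doc", "example");
        System.out.println(schema.getProp("doc"));
    }
}
```

Compared with copy+add+swap, writes stay cheap, but reads still pay for 
acquiring the read lock, which is why benchmarking the options makes sense.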


was (Author: dkulp):
Of course another option is to just surround the access to the props with a 
ReentrantReadWriteLock.



[jira] [Commented] (AVRO-2051) Thread contention accessing JsonProperties props

2017-07-17 Thread Daniel Kulp (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090911#comment-16090911
 ] 

Daniel Kulp commented on AVRO-2051:
---

Of course another option is to just surround the access to the props with a 
ReentrantReadWriteLock.
