[jira] [Commented] (AVRO-1410) Explicit version specification in pom prevents dependency management
[ https://issues.apache.org/jira/browse/AVRO-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846234#comment-13846234 ] Tom White commented on AVRO-1410: - The patch simply moves the version numbers into the top-level Java POM's dependency management section (where other version numbers are already defined), rather than having them defined in the avro-mapred POM. The dependencies and version numbers for each profile (Hadoop 1 and 2) are unchanged. So I'm +1 on the change. I verified that running the tests under each profile was successful with the change:
{noformat}
mvn clean install
mvn -Dhadoop.version=2 clean install
{noformat}
Explicit version specification in pom prevents dependency management Key: AVRO-1410 URL: https://issues.apache.org/jira/browse/AVRO-1410 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.5 Reporter: E. Sammer Assignee: E. Sammer Fix For: 1.7.6 Attachments: AVRO-1410-2.patch, AVRO-1410.patch While building Avro against different versions of Hadoop I found that avro-mapred explicitly specifies the versions of some of the Hadoop dependencies rather than deferring to its own dependency management. This makes it impossible to build against other versions of Hadoop without pom-hackery. It's also a potential maintenance issue for Avro in that changes to the dependency management section of the Java top level pom won't be picked up by the mapred module. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: [jira] [Commented] (AVRO-1382) Support for python3
Ant only replaced a few strings, and I would prefer the Python module to be more standalone. On the other hand, we still need to put the Avro version somewhere; so far it is an Ant replacement string in the released versions. Pedro. On Thu, Dec 12, 2013 at 12:18 AM, Doug Cutting (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845820#comment-13845820 ] Doug Cutting commented on AVRO-1382: Ant is not required. All that releases require is that the top-level build.sh script works. In particular, that './build.sh dist' puts binary release artifacts in the top-level dist/ directory, that 'test' runs unit tests, and 'clean' removes files generated by the other commands. If ant is replaced with some other build tool then the top-level build.sh should be updated to invoke the new tool rather than ant. Some languages implement a lang/*/build.sh script that invokes a language-specific build tool and then copies source code archive files up to ../../dist. Also, if the build tools change then the top-level BUILD.txt file should be updated. Support for python3 --- Key: AVRO-1382 URL: https://issues.apache.org/jira/browse/AVRO-1382 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.7.5 Reporter: Christophe Taton Attachments: AVRO-1382.20131203-001922.diff Hi, I'd need to use Avro from Python3, which would require essentially the following changes, which I am happy to contribute: - rewrite except statements according to new syntax - rewrite print statements according to new syntax - basestring becomes str - update some imports (StringIO becomes io.StringIO, httplib becomes http.client) This would apparently require branching the python code to maintain a version for python2 and a separate version for python3. Any thoughts on how to approach this? Thanks! -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
Tie Liu created AVRO-1411: - Summary: org.apache.avro.util.Utf8 performance improvement by remove private Charset in class Key: AVRO-1411 URL: https://issues.apache.org/jira/browse/AVRO-1411 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Tie Liu Priority: Minor Inside the org.apache.avro.util.Utf8 class there is a private member field defined as: private static final Charset UTF8 = Charset.forName("UTF-8"); and it is used as: public static final byte[] getBytesFor(String str) { return str.getBytes(UTF8); } I guess the intention of creating this object is to save object creation, but when we dive into the String.getBytes code, we find that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding: static byte[] encode(Charset cs, char[] ca, int off, int len) { StringEncoder se = new StringEncoder(cs, cs.name()); char[] c = Arrays.copyOf(ca, ca.length); return se.encode(c, off, len); } If we instead call it with the string literal "UTF-8", it just reuses the thread-local StringEncoder. We tried replacing this class with a version that passes the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to replace the class. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
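For reference, the two call styles compared in this report can be put side by side. This is only a minimal sketch using the JDK: the class name Utf8EncodeSketch and the method names viaCharset and viaName are hypothetical and not part of Avro. It checks that both overloads produce the same bytes; it does not measure anything, and whether the name-based overload is actually cheaper depends on the JDK version and would have to be benchmarked.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration class; not Avro's org.apache.avro.util.Utf8.
class Utf8EncodeSketch {
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // Style described in the report: pass a cached Charset object.
  static byte[] viaCharset(String s) {
    return s.getBytes(UTF8);
  }

  // Style proposed in the report: pass the charset name as a string.
  static byte[] viaName(String s) {
    try {
      return s.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 support is mandatory on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String sample = "Avro \u00e9\u00e8 \u4e2d\u6587";
    // Both overloads should encode to identical byte sequences.
    System.out.println(Arrays.equals(viaCharset(sample), viaName(sample)));
  }
}
```

Because the output bytes are the same either way, a change like this could be evaluated purely on allocation and timing results.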
[jira] [Updated] (AVRO-1406) Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument
[ https://issues.apache.org/jira/browse/AVRO-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iaroslav Zeigerman updated AVRO-1406: - Attachment: AVRO-1406_2.patch Added new patch according to discussion with Doug Cutting and Thiruvalluvan M. G. Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument Key: AVRO-1406 URL: https://issues.apache.org/jira/browse/AVRO-1406 Project: Avro Issue Type: Bug Components: c++ Affects Versions: 1.7.5 Reporter: Iaroslav Zeigerman Labels: c++ Fix For: 1.7.6 Attachments: AVRO-1406.patch, AVRO-1406.patch, AVRO-1406_2.patch In Java implementation there is GenericData.Record which can use field names to set and get data. There is nothing similar in C++ implementation. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1410) Explicit version specification in pom prevents dependency management
[ https://issues.apache.org/jira/browse/AVRO-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846469#comment-13846469 ] Doug Cutting commented on AVRO-1410: Thanks, Tom, for testing this. I was somehow mis-reading the patch. +1 Explicit version specification in pom prevents dependency management Key: AVRO-1410 URL: https://issues.apache.org/jira/browse/AVRO-1410 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.5 Reporter: E. Sammer Assignee: E. Sammer Fix For: 1.7.6 Attachments: AVRO-1410-2.patch, AVRO-1410.patch While building Avro against different versions of Hadoop I found that avro-mapred explicitly specifies the versions of some of the Hadoop dependencies rather than deferring to its own dependency management. This makes it impossible to build against other versions of Hadoop without pom-hackery. It's also a potential maintenance issue for Avro in that changes to the dependency management section of the Java top level pom won't be picked up by the mapred module. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1382) Support for python3
[ https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846478#comment-13846478 ] Doug Cutting commented on AVRO-1382: One more thing to update relevant to releases are the publishing instructions for PyPi. These are in: https://cwiki.apache.org/confluence/display/AVRO/How+To+Release#HowToRelease-Publishing We need to add instructions here for publishing the Python 3 release artifacts. Support for python3 --- Key: AVRO-1382 URL: https://issues.apache.org/jira/browse/AVRO-1382 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.7.5 Reporter: Christophe Taton Attachments: AVRO-1382.20131203-001922.diff Hi, I'd need to use Avro from Python3, which would require essentially the following changes, which I am happy to contribute: - rewrite except statements according to new syntax - rewrite print statements according to new syntax - basestring becomes str - update some imports (StringIO becomes io.StringIO, httplib becomes http.client) This would apparently require branching the python code to maintain a version for python2 and a separate version for python3. Any thoughts on how to approach this? Thanks! -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: unsigned types
On Thu, Dec 12, 2013 at 7:26 AM, Pedro Larroy pedro.larroy.li...@gmail.com wrote: Is there a plan / schedule for Avro 2.0? Christophe Taton started a discussion about this topic ten days ago on this list. You can find it in the archives. http://mail-archives.apache.org/mod_mbox/avro-dev/ A major release is required for incompatible changes to the schema language. Thus 2.0 planning work must focus on determining a set of such changes. I think in most (if not all) cases one can implement equivalent features compatibly in the 1.0 branch at the cost of making schemas more verbose. So there's a tension between preserving compatibility and improving schema elegance, as the issue discussed in this thread illustrates. Further discussion of a 2.0 release should probably occur on that thread. Doug
Re: Effort towards Avro 2.0?
On Wed, Dec 4, 2013 at 11:40 PM, Christophe Taton christophe.ta...@gmail.com wrote: Well, I guess one can always handle such things externally to Avro. This needn't be done externally. When an extension schema is encountered, the schema compiler can generate Object references, the DatumWriter can write the schema signature and encode the object, and the DatumReader can locate the referenced schema and create and deserialize the appropriate object. This behavior could be compatibly added to the existing generic, specific and reflect representations, since the extension schema is in Avro's namespace and should thus not conflict with any user schema. So this could be implemented compatibly. Until support is added in other languages (Python, C, C++, etc.) these objects would be opaque, but no matter how the feature is implemented it will require implementation work in each language. Doug
[jira] [Commented] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
[ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846520#comment-13846520 ] Doug Cutting commented on AVRO-1411: Please contribute a patch with this change. Also please provide benchmark results. Ideally these would use the existing performance suite (lang/java/ipc/src/test/java/org/apache/avro/io/Perf.java). Once we can validate the performance improvement then we can probably get the change committed. org.apache.avro.util.Utf8 performance improvement by remove private Charset in class Key: AVRO-1411 URL: https://issues.apache.org/jira/browse/AVRO-1411 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Tie Liu Priority: Minor Inside the org.apache.avro.util.Utf8 class there is a private member field defined as: private static final Charset UTF8 = Charset.forName("UTF-8"); and it is used as: public static final byte[] getBytesFor(String str) { return str.getBytes(UTF8); } I guess the intention of creating this object is to save object creation, but when we dive into the String.getBytes code, we find that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding: static byte[] encode(Charset cs, char[] ca, int off, int len) { StringEncoder se = new StringEncoder(cs, cs.name()); char[] c = Arrays.copyOf(ca, ca.length); return se.encode(c, off, len); } If we instead call it with the string literal "UTF-8", it just reuses the thread-local StringEncoder. We tried replacing this class with a version that passes the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to replace the class. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1406) Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument
[ https://issues.apache.org/jira/browse/AVRO-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846536#comment-13846536 ] Doug Cutting commented on AVRO-1406: Looks good to me except the new 'setFieldAt' method should be named just 'setField', no? Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument Key: AVRO-1406 URL: https://issues.apache.org/jira/browse/AVRO-1406 Project: Avro Issue Type: Bug Components: c++ Affects Versions: 1.7.5 Reporter: Iaroslav Zeigerman Labels: c++ Fix For: 1.7.6 Attachments: AVRO-1406.patch, AVRO-1406.patch, AVRO-1406_2.patch In Java implementation there is GenericData.Record which can use field names to set and get data. There is nothing similar in C++ implementation. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1406) Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument
[ https://issues.apache.org/jira/browse/AVRO-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846540#comment-13846540 ] Doug Cutting commented on AVRO-1406: Also, we should add unit tests that call these new methods before we commit this. Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument Key: AVRO-1406 URL: https://issues.apache.org/jira/browse/AVRO-1406 Project: Avro Issue Type: Bug Components: c++ Affects Versions: 1.7.5 Reporter: Iaroslav Zeigerman Labels: c++ Fix For: 1.7.6 Attachments: AVRO-1406.patch, AVRO-1406.patch, AVRO-1406_2.patch In Java implementation there is GenericData.Record which can use field names to set and get data. There is nothing similar in C++ implementation. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
[ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846558#comment-13846558 ] Rob Turner commented on AVRO-1411: -- This is exactly the same as AVRO-1348; I have added a patch there. org.apache.avro.util.Utf8 performance improvement by remove private Charset in class Key: AVRO-1411 URL: https://issues.apache.org/jira/browse/AVRO-1411 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Tie Liu Priority: Minor Inside the org.apache.avro.util.Utf8 class there is a private member field defined as: private static final Charset UTF8 = Charset.forName("UTF-8"); and it is used as: public static final byte[] getBytesFor(String str) { return str.getBytes(UTF8); } I guess the intention of creating this object is to save object creation, but when we dive into the String.getBytes code, we find that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding: static byte[] encode(Charset cs, char[] ca, int off, int len) { StringEncoder se = new StringEncoder(cs, cs.name()); char[] c = Arrays.copyOf(ca, ca.length); return se.encode(c, off, len); } If we instead call it with the string literal "UTF-8", it just reuses the thread-local StringEncoder. We tried replacing this class with a version that passes the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to replace the class. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
[ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846578#comment-13846578 ] Tie Liu commented on AVRO-1411: --- I checked the patches; patch v2 (https://issues.apache.org/jira/secure/attachment/12616422/AVRO-1348v2.patch) is exactly the one I was going to submit. So I guess it is now safe to close this JIRA, since it duplicates https://issues.apache.org/jira/browse/AVRO-1348. After changing the code to use the string literal "UTF-8", I saw improvements similar to those Rob Turner saw. org.apache.avro.util.Utf8 performance improvement by remove private Charset in class Key: AVRO-1411 URL: https://issues.apache.org/jira/browse/AVRO-1411 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Tie Liu Priority: Minor Inside the org.apache.avro.util.Utf8 class there is a private member field defined as: private static final Charset UTF8 = Charset.forName("UTF-8"); and it is used as: public static final byte[] getBytesFor(String str) { return str.getBytes(UTF8); } I guess the intention of creating this object is to save object creation, but when we dive into the String.getBytes code, we find that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding: static byte[] encode(Charset cs, char[] ca, int off, int len) { StringEncoder se = new StringEncoder(cs, cs.name()); char[] c = Arrays.copyOf(ca, ca.length); return se.encode(c, off, len); } If we instead call it with the string literal "UTF-8", it just reuses the thread-local StringEncoder. We tried replacing this class with a version that passes the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to replace the class. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (AVRO-1411) org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
[ https://issues.apache.org/jira/browse/AVRO-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tie Liu resolved AVRO-1411. --- Resolution: Duplicate. This duplicates https://issues.apache.org/jira/browse/AVRO-1348. org.apache.avro.util.Utf8 performance improvement by remove private Charset in class Key: AVRO-1411 URL: https://issues.apache.org/jira/browse/AVRO-1411 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Tie Liu Priority: Minor Inside the org.apache.avro.util.Utf8 class there is a private member field defined as: private static final Charset UTF8 = Charset.forName("UTF-8"); and it is used as: public static final byte[] getBytesFor(String str) { return str.getBytes(UTF8); } I guess the intention of creating this object is to save object creation, but when we dive into the String.getBytes code, we find that when it is called with a Charset it actually creates a new StringEncoder in java.lang.StringCoding: static byte[] encode(Charset cs, char[] ca, int off, int len) { StringEncoder se = new StringEncoder(cs, cs.name()); char[] c = Arrays.copyOf(ca, ca.length); return se.encode(c, off, len); } If we instead call it with the string literal "UTF-8", it just reuses the thread-local StringEncoder. We tried replacing this class with a version that passes the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to replace the class. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846597#comment-13846597 ] Tie Liu commented on AVRO-1348: --- I tested it under Java 6. I will be out of the office tomorrow and will try to produce the benchmark when I get back on Monday. Improve Utf8 to String conversion - Key: AVRO-1348 URL: https://issues.apache.org/jira/browse/AVRO-1348 Project: Avro Issue Type: Bug Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: AVRO-1348v2.patch, AVRO1348v1.patch AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could be made faster. The same method is being used in the Utf8.toString(), and could likely be sped up by doing the same thing. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846867#comment-13846867 ] Rob Turner commented on AVRO-1348: -- Here are results on 32-bit Linux with the Sun JDK:
||jdk||change||test name||time||M entries/sec||M bytes/sec||bytes/cycle||
|jdk1.6.0_45|UTF-8|StringRead:|34721 ms|1.152|41.032|1780910|
|jdk1.6.0_45|Charset|StringRead:|45231 ms|0.884|31.499|1780910|
|jdk1.7.0_40|UTF-8|StringRead:|28211 ms|1.418|50.502|1780910|
|jdk1.7.0_40|Charset|StringRead:|34729 ms|1.152|41.024|1780910|
Improve Utf8 to String conversion - Key: AVRO-1348 URL: https://issues.apache.org/jira/browse/AVRO-1348 Project: Avro Issue Type: Bug Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: AVRO-1348v2.patch, AVRO1348v1.patch AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could be made faster. The same method is being used in the Utf8.toString(), and could likely be sped up by doing the same thing. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (AVRO-1398) Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000
[ https://issues.apache.org/jira/browse/AVRO-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated AVRO-1398: --- Resolution: Fixed Status: Resolved (was: Patch Available) I committed this. Thanks, Rob. Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000 - Key: AVRO-1398 URL: https://issues.apache.org/jira/browse/AVRO-1398 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Rob Turner Assignee: Rob Turner Priority: Minor Fix For: 1.7.6 Attachments: AVRO-1398.patch This improves compression especially for deflate. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1398) Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000
[ https://issues.apache.org/jira/browse/AVRO-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846873#comment-13846873 ] ASF subversion and git services commented on AVRO-1398: --- Commit 1550578 from [~cutting] in branch 'avro/trunk' [ https://svn.apache.org/r1550578 ] AVRO-1398. Increase default sync interval from 16k to 64k. Contributed by Rob Turner. Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000 - Key: AVRO-1398 URL: https://issues.apache.org/jira/browse/AVRO-1398 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Rob Turner Assignee: Rob Turner Priority: Minor Fix For: 1.7.6 Attachments: AVRO-1398.patch This improves compression especially for deflate. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
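The claim that a larger sync interval helps deflate can be explored outside Avro with a rough sketch like the one below. It does not use Avro's data file writer at all: it simply compresses the same synthetic data in independent blocks with java.util.zip.Deflater and sums the output sizes. The class name SyncIntervalSketch, the synthetic record format, and the choice of 64 * 1024 for "64K" are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical illustration; this is not Avro's DataFileWriter.
class SyncIntervalSketch {

  // Compresses data in independent blocks of blockSize bytes (each block gets
  // a fresh Deflater, so no history is shared) and returns the total size.
  static long compressedSize(byte[] data, int blockSize) {
    long total = 0;
    byte[] buf = new byte[8192];
    for (int off = 0; off < data.length; off += blockSize) {
      int len = Math.min(blockSize, data.length - off);
      Deflater deflater = new Deflater();
      deflater.setInput(data, off, len);
      deflater.finish();
      while (!deflater.finished()) {
        total += deflater.deflate(buf);
      }
      deflater.end();
    }
    return total;
  }

  // Builds about 1 MB of repetitive, record-like text with a fixed seed.
  static byte[] sampleData() {
    Random rnd = new Random(42);
    StringBuilder sb = new StringBuilder();
    while (sb.length() < 1_000_000) {
      sb.append("user=").append(rnd.nextInt(5000))
        .append(",status=").append(rnd.nextBoolean() ? "ok" : "error")
        .append(",latencyMs=").append(rnd.nextInt(300)).append('\n');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] data = sampleData();
    System.out.println("16000-byte blocks: " + compressedSize(data, 16000));
    System.out.println("64K blocks:        " + compressedSize(data, 64 * 1024));
  }
}
```

Smaller blocks pay the per-stream overhead more often and restart with an empty match history each time, which is the effect the larger default is meant to reduce. Actual gains on real Avro files depend on the data and codec.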
[jira] [Commented] (AVRO-1397) Binary fragment tools should allow more than one datum or JSON object
[ https://issues.apache.org/jira/browse/AVRO-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846894#comment-13846894 ] Doug Cutting commented on AVRO-1397: This is looking good. Sorry it took me so long to get back to it! For compatibility, pretty-printing should be off by default, so the option should be --pretty to enable it. Other than that, +1. Binary fragment tools should allow more than one datum or JSON object - Key: AVRO-1397 URL: https://issues.apache.org/jira/browse/AVRO-1397 Project: Avro Issue Type: New Feature Components: java Affects Versions: 1.7.5 Reporter: Rob Turner Assignee: Rob Turner Priority: Minor Attachments: AVRO-1397.patch, AVRO-1397.patch, AVRO-1397.patch It would be useful for the binary fragment tools to process more than one datum or JSON object. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1407) NettyTransceiver can cause a infinite loop when slow to connect
[ https://issues.apache.org/jira/browse/AVRO-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846905#comment-13846905 ] Doug Cutting commented on AVRO-1407: Why can't the channelFuture be closed in the finally clause? NettyTransceiver can cause a infinite loop when slow to connect --- Key: AVRO-1407 URL: https://issues.apache.org/jira/browse/AVRO-1407 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.5, 1.7.6 Reporter: Gareth Davis Attachments: AVRO-1407-1.patch When a new {{NettyTransceiver}} is created, it forces the channel to be allocated and connected to the remote host. It waits up to connectTimeout ms on the [connect channel future|https://github.com/apache/avro/blob/1579ab1ac95731630af58fc303a07c9bf28541d6/lang/java/ipc/src/main/java/org/apache/avro/ipc/NettyTransceiver.java#L271]. This is obviously a good thing; the problem is that when the connect is unsuccessful, i.e. {{!channelFuture.isSuccess()}}, an exception is thrown and the call to the constructor fails with an {{IOException}}, but this can leave an active channel associated with the {{ChannelFactory}}. A Netty {{NioClientSocketChannelFactory}} will not shut down if there are active channels still around, so if you have supplied the {{ChannelFactory}} to the {{NettyTransceiver}} you will not be able to cancel it by calling {{ChannelFactory.releaseExternalResources()}} as the [Flume Avro RPC client does|https://github.com/apache/flume/blob/b8cf789b8509b1e5be05dd0b0b16c5d9af9698ae/flume-ng-sdk/src/main/java/org/apache/flume/api/NettyAvroRpcClient.java#L158]. To recreate this you need a very laggy network, where the connect attempt takes longer than the connect timeout but does eventually succeed. This is very hard to organise in a test case, although I do have a test setup using Vagrant VMs that recreates it every time, using the Flume RPC client and server. The following stack is from a production system; it won't ever recover until the channel is disconnected (by forcing a disconnect at the remote host) or the JVM is restarted.
{noformat:title=Production stack trace}
TLOG-0 daemon prio=10 tid=0x7f581c7be800 nid=0x39a1 waiting on condition [0x7f57ef9f2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	parking to wait for 0x0007218b16e0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1253)
	at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.releaseExternalResources(AbstractNioWorkerPool.java:80)
	at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:181)
	at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:142)
	at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:101)
	at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:564)
	locked 0x0006c30ae7b0 (a org.apache.flume.api.NettyAvroRpcClient)
	at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
	at org.apache.flume.api.LoadBalancingRpcClient.createClient(LoadBalancingRpcClient.java:214)
	at org.apache.flume.api.LoadBalancingRpcClient.getClient(LoadBalancingRpcClient.java:205)
	locked 0x0006a97b18e8 (a org.apache.flume.api.LoadBalancingRpcClient)
	at org.apache.flume.api.LoadBalancingRpcClient.appendBatch(LoadBalancingRpcClient.java:95)
	at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:45)
	at com.ean.platform.components.tlog.client.service.AvroRpcEventRouter$1.call(AvroRpcEventRouter.java:43)
{noformat}
The solution is very simple, and a patch should be along in a moment. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
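The general pattern under discussion here, releasing a resource when the constructor that acquired it fails, can be sketched without Netty. The following is not the attached patch and does not use any Netty or Avro API: FakeChannel and Transceiver are hypothetical stand-ins, and the "connect" is simulated by a boolean. It only shows a finally clause that closes the channel when the connect did not succeed, along the lines of the question above.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration; none of these types exist in Netty or Avro.
class ConnectCleanupSketch {

  // Stand-in for a channel whose connect attempt may not finish in time.
  static final class FakeChannel implements Closeable {
    private final boolean connectsInTime;
    private boolean open = true;

    FakeChannel(boolean connectsInTime) {
      this.connectsInTime = connectsInTime;
    }

    // Simulates waiting for the connect timeout; true means connected.
    boolean awaitConnect() {
      return connectsInTime;
    }

    boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  // Stand-in for a transceiver that connects in its constructor.
  static final class Transceiver {
    private final FakeChannel channel;

    Transceiver(FakeChannel candidate) throws IOException {
      boolean connected = false;
      try {
        connected = candidate.awaitConnect();
        if (!connected) {
          throw new IOException("Error connecting: timed out");
        }
      } finally {
        // On any failure path, do not leave an open channel behind.
        if (!connected) {
          candidate.close();
        }
      }
      this.channel = candidate;
    }

    FakeChannel channel() {
      return channel;
    }
  }

  public static void main(String[] args) {
    FakeChannel slow = new FakeChannel(false);
    try {
      new Transceiver(slow);
    } catch (IOException e) {
      System.out.println("constructor failed, channel open = " + slow.isOpen());
    }
  }
}
```

With cleanup on the failure path, a caller that owns the factory is not left with a lingering channel that blocks shutdown.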
[jira] [Updated] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated AVRO-1234: - Status: Patch Available (was: Open) Avro MapReduce jobs silently ignore input data without '.avro' extension Key: AVRO-1234 URL: https://issues.apache.org/jira/browse/AVRO-1234 Project: Avro Issue Type: Bug Affects Versions: 1.7.3 Reporter: Dave Beech Assignee: Dave Beech Fix For: 1.7.6 Attachments: AVRO-1234-1.patch, AVRO-1234.patch The AvroInputFormat class explicitly checks each input path for a '.avro' extension. If only some of the input paths have the correct extension, the remainder are silently ignored and not included in the job. However, if none of the input paths have the extension, the job will continue and succeed even though no map tasks are allocated, and no work is done. This only happens using the old mapred API. The new mapreduce API version will happily read files regardless of extension. Is the check necessary? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1408) JsonDecoder fails when fields with default values are omitted
[ https://issues.apache.org/jira/browse/AVRO-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846925#comment-13846925 ] Doug Cutting commented on AVRO-1408: Avro requires the schema that was used to write the data (the writer schema) when reading. A second schema (the reader schema) may also be provided that is used to interpret the data. When the writer schema is missing a field that the reader schema has, the default value from the reader schema is used. You are not providing the writer schema when reading, but only the reader schema. Here's a test that illustrates:
{code}
@Test public void testDefault() throws Exception {
  // Writer schema: record Foo with a single int field x.
  String writerJson = "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":["
    +"{\"type\":\"int\",\"name\":\"x\"}]}";
  // Reader schema: adds an int field y with default -1.
  String readerJson = "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":["
    +"{\"type\":\"int\",\"name\":\"x\"},"
    +"{\"type\":\"int\",\"name\":\"y\",\"default\":-1}]}";
  Schema writerSchema = Schema.parse(writerJson);
  Schema readerSchema = Schema.parse(readerJson);
  DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(writerSchema, readerSchema);
  // The JSON input is decoded with the writer schema, so y is not expected in it.
  Decoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, "{\"x\":1}");
  GenericRecord r = reader.read(null, decoder);
  // y is filled in from the reader schema's default.
  Assert.assertEquals(-1, r.get("y"));
}
{code}
JsonDecoder fails when fields with default values are omitted - Key: AVRO-1408 URL: https://issues.apache.org/jira/browse/AVRO-1408 Project: Avro Issue Type: Bug Affects Versions: 1.7.4, 1.7.5 Reporter: Adam Cataldo Start with an IDL like: protocol Foo { record Bar { int x; int y = 0; } } Then use JsonDecoder to deserialize: {"x":30} Bug: this fails because y is not provided. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846926#comment-13846926 ] Sandy Ryza commented on AVRO-1234: -- Attached a patch that adds a flag for opt-in and adds a test Avro MapReduce jobs silently ignore input data without '.avro' extension Key: AVRO-1234 URL: https://issues.apache.org/jira/browse/AVRO-1234 Project: Avro Issue Type: Bug Affects Versions: 1.7.3 Reporter: Dave Beech Assignee: Dave Beech Fix For: 1.7.6 Attachments: AVRO-1234-1.patch, AVRO-1234.patch The AvroInputFormat class explicitly checks each input path for a '.avro' extension. If only some of the input paths have the correct extension, the remainder are silently ignored and not included in the job. However, if none of the input paths have the extension, the job will continue and succeed even though no map tasks are allocated, and no work is done. This only happens using the old mapred API. The new mapreduce API version will happily read files regardless of extension. Is the check necessary? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated AVRO-1234: Attachment: AVRO-1234-1.patch
[jira] [Commented] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846937#comment-13846937 ] Doug Cutting commented on AVRO-1234: Patch looks good. We might add more javadoc, noting that the default is to ignore files without the extension, plus add doc for the default constant. Other than that, +1.
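For readers following the AVRO-1234 thread, the behavior under discussion (skip inputs without the '.avro' extension by default, with a flag to opt in to reading them) can be sketched in plain Java. This is a rough illustration only: ExtensionFilterSketch, selectInputs and the constant names are hypothetical and are not taken from the patch, and plain strings stand in for Hadoop paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of extension-based input filtering; not the actual AvroInputFormat code.
class ExtensionFilterSketch {

    static final String AVRO_EXTENSION = ".avro";

    // Assumed default, following the review comment: files without the extension are ignored.
    static final boolean IGNORE_WITHOUT_EXTENSION_DEFAULT = true;

    /** Returns the paths a job would read, given the ignore flag. */
    static List<String> selectInputs(List<String> paths, boolean ignoreWithoutExtension) {
        List<String> selected = new ArrayList<>();
        for (String path : paths) {
            // Keep everything when the flag is off; otherwise keep only '.avro' files.
            if (!ignoreWithoutExtension || path.endsWith(AVRO_EXTENSION)) {
                selected.add(path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("part-0.avro", "part-1", "part-2.avro");
        // Default behavior: the file without the extension is dropped without any error.
        System.out.println(selectInputs(inputs, IGNORE_WITHOUT_EXTENSION_DEFAULT));
        // Opt-in behavior: every input is read regardless of its name.
        System.out.println(selectInputs(inputs, false));
    }
}
```

Keeping the ignore behavior as the default preserves what existing jobs see, which is presumably why the review asks for the default to be documented.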
[jira] [Updated] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated AVRO-1234: Resolution: Fixed Status: Resolved (was: Patch Available) I committed this. Thanks Dave & Sandy! Avro MapReduce jobs silently ignore input data without '.avro' extension Key: AVRO-1234 URL: https://issues.apache.org/jira/browse/AVRO-1234 Project: Avro Issue Type: Bug Affects Versions: 1.7.3 Reporter: Dave Beech Assignee: Dave Beech Fix For: 1.7.6 Attachments: AVRO-1234-1.patch, AVRO-1234-2.patch, AVRO-1234.patch
[jira] [Commented] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847000#comment-13847000 ] ASF subversion and git services commented on AVRO-1234: Commit 1550605 from [~cutting] in branch 'avro/trunk' [ https://svn.apache.org/r1550605 ] AVRO-1234. Java: Permit AvroInputFormat to process files whose names don't end in .avro. Contributed by Dave Beech & Sandy Ryza.
[jira] [Commented] (AVRO-1406) Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument
[ https://issues.apache.org/jira/browse/AVRO-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847022#comment-13847022 ] Thiruvalluvan M. G. commented on AVRO-1406: The patch looks good to me. +1 for Doug's two comments. Avro C++ GenericRecord (GenericDatum, etc.) doesn't support getters and setters with field name argument Key: AVRO-1406 URL: https://issues.apache.org/jira/browse/AVRO-1406 Project: Avro Issue Type: Bug Components: c++ Affects Versions: 1.7.5 Reporter: Iaroslav Zeigerman Labels: c++ Fix For: 1.7.6 Attachments: AVRO-1406.patch, AVRO-1406.patch, AVRO-1406_2.patch In the Java implementation there is GenericData.Record, which can use field names to set and get data. There is nothing similar in the C++ implementation.
[jira] [Resolved] (AVRO-1408) JsonDecoder fails when fields with default values are omitted
[ https://issues.apache.org/jira/browse/AVRO-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Cataldo resolved AVRO-1408. Resolution: Not A Problem Release Note: Thanks for the clarification, Doug.
[jira] [Commented] (AVRO-1234) Avro MapReduce jobs silently ignore input data without '.avro' extension
[ https://issues.apache.org/jira/browse/AVRO-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847240#comment-13847240 ] Hudson commented on AVRO-1234: SUCCESS: Integrated in AvroJava #406 (See [https://builds.apache.org/job/AvroJava/406/]) AVRO-1234. Java: Permit AvroInputFormat to process files whose names don't end in .avro. Contributed by Dave Beech & Sandy Ryza. (cutting: rev 1550605) * /avro/trunk/CHANGES.txt * /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroAsTextInputFormat.java * /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroInputFormat.java * /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapred/tether/TetherInputFormat.java * /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapred/TestAvroInputFormat.java
[jira] [Commented] (AVRO-1400) Introduce annotation to specify default values
[ https://issues.apache.org/jira/browse/AVRO-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847238#comment-13847238 ] Hudson commented on AVRO-1400: SUCCESS: Integrated in AvroJava #406 (See [https://builds.apache.org/job/AvroJava/406/]) AVRO-1400. Java: Add AvroDefault reflect annotation to specify default values. (cutting: rev 1550260) * /avro/trunk/CHANGES.txt * /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java * /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/reflect/AvroDefault.java * /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java * /avro/trunk/lang/java/avro/src/test/java/org/apache/avro/reflect/TestReflect.java Introduce annotation to specify default values Key: AVRO-1400 URL: https://issues.apache.org/jira/browse/AVRO-1400 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: dennis lucero Assignee: Doug Cutting Labels: annotation, features, reflection, schema Fix For: 1.7.6 Attachments: AVRO-1400.patch It would be nice if there were an annotation in org.apache.avro.reflect to specify default values for schemata derived reflectively, allowing for proper schema evolution. I suggest the following:
{code:java}
@AvroDefault("1")
int someNum;

@Nullable
@AvroDefault("null")
AnotherRecord myRec;
{code}
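As a rough illustration of how a field-level annotation can carry a default into a reflectively derived schema, here is a self-contained sketch. It deliberately does not use Avro: the annotation DefaultJson, the class DefaultAnnotationSketch and the method collectDefaults are hypothetical stand-ins. The default is held as a JSON string because Java annotation members cannot be null, so a null default has to be spelled as the text "null".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; DefaultJson is a stand-in, not org.apache.avro.reflect.AvroDefault.
class DefaultAnnotationSketch {

    /** Carries a field's default value as JSON text; kept at runtime so reflection can see it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface DefaultJson {
        String value();
    }

    /** Example record class with two defaulted fields and one without a default. */
    static class Example {
        @DefaultJson("1")
        int someNum;

        @DefaultJson("null")
        Object myRec;

        String noDefault;
    }

    /** Collects field name to JSON default for every annotated field of the given class. */
    static Map<String, String> collectDefaults(Class<?> type) {
        // TreeMap gives a stable order; getDeclaredFields() does not guarantee one.
        Map<String, String> defaults = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            DefaultJson annotation = field.getAnnotation(DefaultJson.class);
            if (annotation != null) {
                defaults.put(field.getName(), annotation.value());
            }
        }
        return defaults;
    }

    public static void main(String[] args) {
        System.out.println(collectDefaults(Example.class));
    }
}
```

A schema generator could consult such a map while emitting each field, which is what makes reflectively derived schemas usable as reader schemas during evolution.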
[jira] [Commented] (AVRO-1398) Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000
[ https://issues.apache.org/jira/browse/AVRO-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847239#comment-13847239 ] Hudson commented on AVRO-1398: SUCCESS: Integrated in AvroJava #406 (See [https://builds.apache.org/job/AvroJava/406/]) AVRO-1398. Increase default sync interval from 16k to 64k. Contributed by Rob Turner. (cutting: rev 1550578) * /avro/trunk/CHANGES.txt * /avro/trunk/lang/c++/api/DataFile.hh * /avro/trunk/lang/c++/impl/DataFile.cc * /avro/trunk/lang/csharp/src/apache/main/File/DataFileConstants.cs * /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileConstants.java * /avro/trunk/lang/php/lib/avro/data_file.php * /avro/trunk/lang/py/src/avro/datafile.py * /avro/trunk/lang/ruby/lib/avro/data_file.rb Increase DEFAULT_SYNC_INTERVAL to 64K from 16,000 Key: AVRO-1398 URL: https://issues.apache.org/jira/browse/AVRO-1398 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.5 Reporter: Rob Turner Assignee: Rob Turner Priority: Minor Fix For: 1.7.6 Attachments: AVRO-1398.patch This improves compression, especially for deflate.
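The compression claim in AVRO-1398 can be explored with the JDK alone. The sketch below is an approximation under stated assumptions: it compresses a byte array in independent chunks with java.util.zip.Deflater, using the chunk size as a stand-in for the sync interval, on synthetic sample records. The names SyncIntervalSketch, compressedSize and sampleData are hypothetical, and real Avro data files also carry per-block counts and sync markers that are not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical experiment: total deflate output when data is compressed in independent blocks.
class SyncIntervalSketch {

    /** Compresses data in independent blocks of blockSize bytes; returns total compressed bytes. */
    static long compressedSize(byte[] data, int blockSize) {
        long total = 0;
        byte[] out = new byte[8192];
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int length = Math.min(blockSize, data.length - offset);
            // A fresh Deflater per block: no history is shared between blocks.
            Deflater deflater = new Deflater();
            deflater.setInput(data, offset, length);
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(out);
            }
            deflater.end();
        }
        return total;
    }

    /** Generates simple, repetitive text records as sample input. */
    static byte[] sampleData(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("user-").append(i).append(",event-").append(i % 7).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(20000);
        System.out.println("input bytes:   " + data.length);
        System.out.println("16,000 blocks: " + compressedSize(data, 16000));
        System.out.println("64,000 blocks: " + compressedSize(data, 64000));
    }
}
```

Each independent block pays its own stream overhead and starts with an empty match history, so fewer, larger blocks tend to compress better on data like this; the trade-off is coarser granularity for seeking and splitting.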