[jira] [Commented] (AVRO-806) add a column-major codec for data files
[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257077#comment-13257077 ]

Doug Cutting commented on AVRO-806:

I've implemented a new column-file format at:

https://github.com/cutting/trevni

This supports writing Avro data. If folks find this useful then I intend to contribute it to Apache.

add a column-major codec for data files
Key: AVRO-806
URL: https://issues.apache.org/jira/browse/AVRO-806
Project: Avro
Issue Type: New Feature
Components: java, spec
Reporter: Doug Cutting
Assignee: Doug Cutting
Fix For: 1.7.0
Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf

Define a codec that, when a data file's schema is a record schema, writes blocks within the file in column-major order. This would permit better compression and also permit efficient skipping of fields that are not of interest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
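The column-major idea can be made concrete with a small illustration. It is not the codec from the attached patches and not the Trevni format. The sketch assumes a block of records is modeled as hypothetical `Object[]` rows with one slot per field, and it transposes them into one array per field. A reader that only wants one field can then read just that column.

```java
import java.util.Arrays;

/** Hypothetical illustration of row-major vs. column-major block layout. */
final class ColumnMajorBlock {
  private ColumnMajorBlock() {}

  /**
   * Transposes rows (one Object[] per record, one slot per field) into
   * columns (one Object[] per field, one slot per record).
   */
  static Object[][] toColumns(Object[][] rows, int fieldCount) {
    Object[][] columns = new Object[fieldCount][rows.length];
    for (int r = 0; r < rows.length; r++) {
      if (rows[r].length != fieldCount) {
        throw new IllegalArgumentException("row " + r + " has " + rows[r].length + " fields");
      }
      for (int f = 0; f < fieldCount; f++) {
        columns[f][r] = rows[r][f];
      }
    }
    return columns;
  }

  /** Returns one field's values without touching the other columns. */
  static Object[] readField(Object[][] columns, int field) {
    return Arrays.copyOf(columns[field], columns[field].length);
  }

  public static void main(String[] args) {
    Object[][] rows = { {1, "a"}, {2, "b"}, {3, "c"} };
    Object[][] columns = toColumns(rows, 2);
    System.out.println(Arrays.toString(columns[0]));            // [1, 2, 3]
    System.out.println(Arrays.toString(readField(columns, 1))); // [a, b, c]
  }
}
```

Storing each field's values contiguously puts values of the same type and similar content next to each other. This is the property the issue relies on for better compression and for skipping unwanted fields.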
[jira] [Commented] (AVRO-1062) DataFileWriter uses java.rmi.server.UID to generate unique id, which causes avro compilation problem on Android Dalvik
[ https://issues.apache.org/jira/browse/AVRO-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255802#comment-13255802 ]

Doug Cutting commented on AVRO-1062:

Looks good to me. We could perhaps improve sync marker generation, but this issue should make it no worse and permits Avro to run on Android. I'll commit this soon unless someone objects.

DataFileWriter uses java.rmi.server.UID to generate unique id, which causes avro compilation problem on Android Dalvik
Key: AVRO-1062
URL: https://issues.apache.org/jira/browse/AVRO-1062
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.6.3
Environment: Android 2.3.3-API level 10
Reporter: Kevin Zhao
Labels: patch
Fix For: 1.7.0
Attachments: AVRO-1062.patch

Because Android Dalvik does not have the java.rmi.* packages and org.apache.avro.file.DataFileWriter references java.rmi.server.UID, Avro fails to compile on Android.
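`java.util.UUID` is one Android-friendly way to produce the 16-byte sync marker that Avro data files use. This is a sketch of the idea only. It is not necessarily what AVRO-1062.patch does, and the `SyncMarkers` class and `newSyncMarker` method are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper: builds a 16-byte sync marker without java.rmi. */
final class SyncMarkers {
  static final int SYNC_SIZE = 16; // Avro data files use 16-byte sync markers

  private SyncMarkers() {}

  static byte[] newSyncMarker() {
    // randomUUID() draws its bits from a cryptographically strong PRNG.
    UUID uuid = UUID.randomUUID();
    ByteBuffer buf = ByteBuffer.allocate(SYNC_SIZE);
    buf.putLong(uuid.getMostSignificantBits());
    buf.putLong(uuid.getLeastSignificantBits());
    return buf.array();
  }
}
```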
[jira] [Commented] (AVRO-1061) Add sync interval option to Avro commandline tools
[ https://issues.apache.org/jira/browse/AVRO-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255174#comment-13255174 ]

Doug Cutting commented on AVRO-1061:

This looks good to me. We should probably add a test to TestDataFileTools that checks this. Should we also increase the default sync interval to 64k?

Add sync interval option to Avro commandline tools
Key: AVRO-1061
URL: https://issues.apache.org/jira/browse/AVRO-1061
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.7.0
Reporter: Ari Pollak
Priority: Trivial
Attachments: AVRO-1061.patch

It would be nice to expose the sync interval to the avro commandline writer tools, since I've seen a 20%+ decrease in file size using deflate compression and a 64K+ sync interval instead of the default of 16K.
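Each block in an Avro data file is compressed on its own, so the sync interval limits how much repetition deflate can see at once. The self-contained sketch below uses only `java.util.zip.Deflater`, not Avro's codec classes. It compresses the same synthetic 64 KiB input two ways: as four independent 16 KiB blocks and as one 64 KiB block. The input is an artificial 4 KiB random chunk repeated 16 times, chosen so the effect is obvious. Real data will show a smaller, data-dependent gain, such as the roughly 20% reported in the issue.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Compares deflate output for small vs. large independently compressed blocks. */
final class BlockSizeDemo {
  private BlockSizeDemo() {}

  /** Deflates data in independent blocks of blockSize bytes; returns total output size. */
  static int compressedSize(byte[] data, int blockSize) {
    int total = 0;
    byte[] out = new byte[8192]; // scratch output buffer; only the byte counts matter here
    for (int off = 0; off < data.length; off += blockSize) {
      int len = Math.min(blockSize, data.length - off);
      Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
      try {
        deflater.setInput(data, off, len);
        deflater.finish();
        while (!deflater.finished()) {
          total += deflater.deflate(out);
        }
      } finally {
        deflater.end(); // release native zlib resources
      }
    }
    return total;
  }

  /** Synthetic input: a 4 KiB random chunk repeated to fill 64 KiB. */
  static byte[] sampleData() {
    byte[] chunk = new byte[4 * 1024];
    new Random(42).nextBytes(chunk);
    byte[] data = new byte[64 * 1024];
    for (int off = 0; off < data.length; off += chunk.length) {
      System.arraycopy(chunk, 0, data, off, chunk.length);
    }
    return data;
  }

  public static void main(String[] args) {
    byte[] data = sampleData();
    System.out.println("16 KiB blocks: " + compressedSize(data, 16 * 1024) + " bytes");
    System.out.println("64 KiB blocks: " + compressedSize(data, 64 * 1024) + " bytes");
  }
}
```

With 16 KiB blocks, every block has to emit the random chunk as literals again. A single 64 KiB block emits it once and encodes the other 15 copies as back-references.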
[jira] [Commented] (AVRO-1057) Java builder API fails when default value does not match the first type in a union
[ https://issues.apache.org/jira/browse/AVRO-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253666#comment-13253666 ]

Doug Cutting commented on AVRO-1057:

I think this is correct. The specification says, "Default values for union fields correspond to the first schema in the union" (http://avro.apache.org/docs/current/spec.html). So the default value for a union { boolean, null } field must be a boolean.

Java builder API fails when default value does not match the first type in a union
Key: AVRO-1057
URL: https://issues.apache.org/jira/browse/AVRO-1057
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.3
Reporter: Christophe Taton
Priority: Minor

The following definition works fine with the builder:

record Rec { union { boolean, null } field = false; }

but this one fails:

record Rec { union { boolean, null } field = null; }

Rec.newBuilder().build() fails with this error:

org.apache.avro.AvroRuntimeException: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
    at Rec$Builder.build
Caused by: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:393)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:350)
    at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:178)
    at Rec$Builder.build
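The spec rule quoted above fits in a small check. This is a simplified, hypothetical model, not Avro's API. Union branches are given as primitive type names, and a default is represented by the Java value a JSON parser would produce (`Boolean`, `Integer`, `String`, or `null`). Only the first branch is consulted. That is why `{ boolean, null }` with a `null` default is rejected, while reordering the union to `{ null, boolean }` makes it valid.

```java
import java.util.List;

/** Hypothetical check: does a JSON default match the first branch of a union? */
final class UnionDefaults {
  private UnionDefaults() {}

  /** Only a few primitive types are modeled in this sketch. */
  static boolean matchesType(String type, Object jsonDefault) {
    switch (type) {
      case "null":    return jsonDefault == null;
      case "boolean": return jsonDefault instanceof Boolean;
      case "int":
      case "long":    return jsonDefault instanceof Integer || jsonDefault instanceof Long;
      case "string":  return jsonDefault instanceof String;
      default: throw new IllegalArgumentException("type not modeled here: " + type);
    }
  }

  /** Per the spec, only the first schema in the union governs the default. */
  static boolean isValidUnionDefault(List<String> branches, Object jsonDefault) {
    if (branches.isEmpty()) throw new IllegalArgumentException("empty union");
    return matchesType(branches.get(0), jsonDefault);
  }
}
```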
[jira] [Commented] (AVRO-1057) Java builder API fails when default value does not match the first type in a union
[ https://issues.apache.org/jira/browse/AVRO-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253692#comment-13253692 ]

Doug Cutting commented on AVRO-1057:

Yes, this could be checked more aggressively. The existing logic for this is in RecordBuilderBase#defaultValue(). It writes the default JSON parse tree, then reads it into the appropriate Avro data structure (generic, specific, reflect). The writing does the error checking, and that's in ResolvingGrammarGenerator#encode. So the compiler and/or parser could call that. We probably don't want to check it unconditionally in the schema parser, as schema parsing is performance sensitive.

Java builder API fails when default value does not match the first type in a union
Key: AVRO-1057
URL: https://issues.apache.org/jira/browse/AVRO-1057
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.3
Reporter: Christophe Taton
Priority: Minor

The following definition works fine with the builder:

record Rec { union { boolean, null } field = false; }

but this one fails:

record Rec { union { boolean, null } field = null; }

Rec.newBuilder().build() fails with this error:

org.apache.avro.AvroRuntimeException: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
    at Rec$Builder.build
Caused by: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:393)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:350)
    at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:178)
    at Rec$Builder.build
[jira] [Commented] (AVRO-1055) Race condition in Java fingerprinting code
[ https://issues.apache.org/jira/browse/AVRO-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250094#comment-13250094 ]

Doug Cutting commented on AVRO-1055:

+1 Looks good to me. You might add a comment saying something like, "Nested class used so that table is not built unless it's used."

Race condition in Java fingerprinting code
Key: AVRO-1055
URL: https://issues.apache.org/jira/browse/AVRO-1055
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.7.0
Reporter: Thiruvalluvan M. G.
Assignee: Thiruvalluvan M. G.
Priority: Minor
Attachments: AVRO-1055.patch

There is a subtle race condition. If fpTable64 is not yet initialized and two threads try to compute fingerprints for two schemas (or the same schema) at the same time, one thread will start initializing the table while the other can start using the partially initialized table, giving a wrong result. The forthcoming patch fixes that issue.
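The suggested comment describes the initialization-on-demand holder idiom. The JVM initializes a nested class once, on first use, and class initialization is thread-safe, so no thread can observe a half-built table. The sketch below applies the idiom to the 64-bit Rabin fingerprint table described in the Avro specification. The class and method names are illustrative and need not match the committed patch.

```java
/** Illustrative 64-bit schema fingerprint with a lazily, safely built table. */
final class Fingerprint64 {
  private static final long EMPTY = 0xc15d213aa4d7a795L; // seed value from the Avro spec

  private Fingerprint64() {}

  /** Holder class: FP_TABLE is built the first time Holder is touched, exactly once. */
  private static final class Holder {
    static final long[] FP_TABLE = buildTable();

    private static long[] buildTable() {
      long[] table = new long[256];
      for (int i = 0; i < 256; i++) {
        long fp = i;
        for (int j = 0; j < 8; j++) {
          fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
        }
        table[i] = fp;
      }
      return table;
    }
  }

  static long fingerprint64(byte[] data) {
    long[] table = Holder.FP_TABLE; // first access triggers thread-safe class initialization
    long fp = EMPTY;
    for (byte b : data) {
      fp = (fp >>> 8) ^ table[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }
}
```

Because the JVM's class-initialization lock protects `Holder`, no explicit synchronization or volatile field is needed, and the table is never built if no fingerprint is ever computed.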
[jira] [Commented] (AVRO-551) C: Build and pass tests on Win32
[ https://issues.apache.org/jira/browse/AVRO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245856#comment-13245856 ]

Doug Cutting commented on AVRO-551:

It's fine to include MIT and BSD licensed code, but the licenses for these files should be appended to the end of Avro's top-level LICENSE.txt file. For more information see:

http://apache.org/legal/resolved.html

C: Build and pass tests on Win32
Key: AVRO-551
URL: https://issues.apache.org/jira/browse/AVRO-551
Project: Avro
Issue Type: Improvement
Components: c
Reporter: Bruce Mitchener
Attachments: AVRO-551.patch

Avro C does not currently build on Win32. We need to address that.
[jira] [Commented] (AVRO-1045) deepCopy of BYTES underflow exception
[ https://issues.apache.org/jira/browse/AVRO-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229630#comment-13229630 ]

Doug Cutting commented on AVRO-1045:

It's a little odd to require that deepCopy() preserve more than is checked by equals(). Some folks might reasonably expect deepCopy() to compact large ByteBuffers. We could perhaps add a 'protected ByteBuffer GenericData#copyBytes(ByteBuffer)' method that could be overridden in a subclass? Would that work in your case? Am I being overly cautious?

deepCopy of BYTES underflow exception
Key: AVRO-1045
URL: https://issues.apache.org/jira/browse/AVRO-1045
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.2
Reporter: Jeremy Lewi
Priority: Minor
Fix For: 1.6.3
Attachments: AVRO-1045.patch

In org.apache.avro.generic.GenericData.deepCopy, the code for copying a ByteBuffer is:

    ByteBuffer byteBufferValue = (ByteBuffer) value;
    byte[] bytesCopy = new byte[byteBufferValue.capacity()];
    byteBufferValue.rewind();
    byteBufferValue.get(bytesCopy);
    byteBufferValue.rewind();
    return ByteBuffer.wrap(bytesCopy);

I think this is problematic because it will cause a BufferUnderflowException to be thrown if the ByteBuffer's limit is less than its capacity.

My use case is as follows. I have ByteBuffers backed by large arrays so I can avoid resizing the array every time I write data. So limit < capacity. When the data is written or copied, I think Avro should respect this. When data is serialized, Avro should automatically use the minimum number of bytes. When an object is copied, I think it makes sense to preserve the capacity of the underlying buffer as opposed to compacting it.

So I think the code could be fixed by replacing the get call with:

    byteBufferValue.get(bytesCopy, 0, byteBufferValue.limit());
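Here is a copy routine that avoids the underflow. It reads only the bytes up to the source's limit, so the exception cannot occur. It works on a `duplicate()`, so the caller's position is never touched. As the reporter prefers, it keeps the original capacity, limit, and position. The name `copyBytes` echoes the hook proposed in the comment, but this standalone static method is a hypothetical illustration, not GenericData's actual code. A subclass that prefers compaction could allocate only `limit()` bytes instead.

```java
import java.nio.ByteBuffer;

/** Hypothetical standalone version of a deepCopy helper for ByteBuffers. */
final class ByteBufferCopies {
  private ByteBufferCopies() {}

  /** Copies bytes [0, limit) into a new buffer with the same capacity, limit, and position. */
  static ByteBuffer copyBytes(ByteBuffer src) {
    ByteBuffer view = src.duplicate(); // independent position/limit; src is not modified
    view.rewind();                     // position = 0, limit unchanged
    byte[] copy = new byte[src.capacity()];
    view.get(copy, 0, view.limit());   // reads exactly limit() bytes, so no underflow
    ByteBuffer result = ByteBuffer.wrap(copy);
    result.limit(src.limit());         // set limit first so position <= limit holds
    result.position(src.position());
    return result;
  }
}
```

The original code fails because `get(byte[])` demands `dst.length` remaining bytes, and a capacity-sized array asks for more than `limit()` provides.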
[jira] [Commented] (AVRO-987) Make Avro OSGi ready
[ https://issues.apache.org/jira/browse/AVRO-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223408#comment-13223408 ]

Doug Cutting commented on AVRO-987:

The patch looks reasonable to me, but fails to apply cleanly. Can someone please provide a version that applies to the current trunk? Thanks!

Make Avro OSGi ready
Key: AVRO-987
URL: https://issues.apache.org/jira/browse/AVRO-987
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Ioannis Canellos
Attachments: AVRO-987-patch-updated.txt, AVRO-987-patch.txt

It would be really nice to be able to use Avro inside OSGi. To achieve this, two things are required:
i) Provide a proper MANIFEST.MF.
ii) Deal with potential class loading issues. Avro uses Class.forName a lot and that is not very OSGi friendly.
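A common way to make `Class.forName` friendlier to OSGi and other multi-classloader environments is to try the thread context class loader first. If that fails, it falls back to the loader that loaded the library itself. The sketch below is this generic pattern in a hypothetical `ClassLoading` helper. It is not taken from the AVRO-987 patches. Whether it is enough also depends on how bundles export and import packages in MANIFEST.MF.

```java
/** Hypothetical helper: resolve classes without relying on a single class loader. */
final class ClassLoading {
  private ClassLoading() {}

  static Class<?> forName(String className) throws ClassNotFoundException {
    ClassLoader context = Thread.currentThread().getContextClassLoader();
    if (context != null) {
      try {
        return Class.forName(className, false, context);
      } catch (ClassNotFoundException e) {
        // Not visible to the context loader; fall through to the library's own loader.
      }
    }
    // A null loader here means the bootstrap loader, which Class.forName accepts.
    ClassLoader own = ClassLoading.class.getClassLoader();
    return Class.forName(className, false, own);
  }
}
```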
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223419#comment-13223419 ]

Doug Cutting commented on AVRO-1022:

Raymie, this is a good approach. The spec language that requires ASCII should be changed from MUST to SHOULD.

One use case that Scott mentioned, and that your prose does not, is transmitting schemas from other systems; e.g., Avro schemas might often be generated automatically from Pig or SQL schemas. In these cases, accepting names liberally permits schemas to pass through Avro losslessly. Strict validation is really only useful when a developer is the schema author. In many (most?) cases Avro might be an underlying tool, used indirectly through an application, and in these cases strict validation is probably not useful.

Error in validate name
Key: AVRO-1022
URL: https://issues.apache.org/jira/browse/AVRO-1022
Project: Avro
Issue Type: Bug
Components: java
Reporter: Raymie Stata
Priority: Minor
Attachments: AVRO-1022.patch, AVRO-1022.patch, unicode-recommendation.html

Fix schema.validateName to allow only ASCII letters, not Unicode letters.
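The two policies under discussion can be contrasted in a few lines. The strict variant follows the spec's ASCII rule for names: start with [A-Za-z_], then use only [A-Za-z0-9_]. The liberal variant accepts any Unicode letters and digits as classified by `java.lang.Character`. Both methods are hypothetical illustrations for comparison, not Avro's validateName.

```java
/** Hypothetical strict vs. liberal name validators, for comparison only. */
final class NameRules {
  private NameRules() {}

  /** Strict: [A-Za-z_][A-Za-z0-9_]* using ASCII only. */
  static boolean isValidAsciiName(String name) {
    if (name == null || name.isEmpty()) return false;
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
      boolean digit = c >= '0' && c <= '9';
      if (!(letter || (i > 0 && digit))) return false; // digits allowed after the first char
    }
    return true;
  }

  /** Liberal: a Unicode letter or '_' first, then Unicode letters, digits, or '_'. */
  static boolean isValidUnicodeName(String name) {
    if (name == null || name.isEmpty()) return false;
    int first = name.codePointAt(0);
    if (!(Character.isLetter(first) || first == '_')) return false;
    for (int i = Character.charCount(first); i < name.length(); ) {
      int cp = name.codePointAt(i);
      if (!(Character.isLetterOrDigit(cp) || cp == '_')) return false;
      i += Character.charCount(cp); // step over surrogate pairs correctly
    }
    return true;
  }
}
```

A schema generated from Pig or SQL with a field named `café` would pass the liberal check but fail the strict one. The two policies differ only in which characters count as letters.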
[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223428#comment-13223428 ]

Doug Cutting commented on AVRO-1006:

Patch looks good. +1. I'll commit this unless there are objections.

Fingerprints for Avro Schemas
Key: AVRO-1006
URL: https://issues.apache.org/jira/browse/AVRO-1006
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Labels: features
Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html

Add a function that returns a standardized, 64-bit fingerprint for schemas. Fingerprints are designed such that the chance of collisions is very, very low.
[jira] [Commented] (AVRO-784) SpecificCompiler should generate accessors
[ https://issues.apache.org/jira/browse/AVRO-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222847#comment-13222847 ]

Doug Cutting commented on AVRO-784:

Scott, good point. Let's open a new issue to add unboxed accessors.

SpecificCompiler should generate accessors
Key: AVRO-784
URL: https://issues.apache.org/jira/browse/AVRO-784
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.5.0
Reporter: E. Sammer
Labels: features
Attachments: avro-784.diff, avro-784.diff

Avro's Java SpecificCompiler should generate JavaBean-style accessors.
[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
[ https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221291#comment-13221291 ]

Doug Cutting commented on AVRO-1027:

James, thanks for running your tests. I guess I'll go ahead and commit this now and roll a release candidate today.

NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
Key: AVRO-1027
URL: https://issues.apache.org/jira/browse/AVRO-1027
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
Fix For: 1.6.3
Attachments: AVRO-1027-v2.patch, AVRO-1027.patch

If an Exception is caught while trying to write to a Channel, Netty can deliver the Exception to a ChannelUpstreamHandler on the same thread that attempted to write to the Channel. If this occurs with the NettyClientAvroHandler implementation of ChannelUpstreamHandler then the thread will deadlock.

Specifically, NettyClientAvroHandler overrides the ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, which requires the NettyTransceiver's write lock. However, in the above situation, the thread will already have locked the NettyTransceiver's read lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading from a read to a write lock, hence the thread deadlocks.
Example stack trace (simplified):

"SessionManager-TimeoutPoller" prio=10 tid=0x7b689c00 nid=0x375d waiting on condition [0x7b0ad000..0x7b0ade70]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xf2a944d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
    [Acquire write lock] at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:285)
    at org.apache.avro.ipc.NettyTransceiver.access$2(NettyTransceiver.java:281)
    at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.exceptionCaught(NettyTransceiver.java:499)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
    at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:473)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
    at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:661)
    at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:372)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:117)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
    at org.jboss.netty.channel.Channels.write(Channels.java:632)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.jboss.netty.channel.Channels.write(Channels.java:611)
    at org.jboss.netty.channel.Channels.write(Channels.java:578)
    at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)
    [Acquire read lock] at org.apache.avro.ipc.NettyTransceiver.writeDataPack(NettyTransceiver.java:413)
    [Acquire read lock] at ...
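The core problem can be reproduced with a bare `ReentrantReadWriteLock`. A thread that holds the read lock can never acquire the write lock, so a blocking `writeLock().lock()` at that point waits forever. The sketch below first uses `tryLock()` to show the failure without hanging. It then shows one possible mitigation: when the current thread holds a read lock, the disconnect is handed to another thread. `MiniTransceiver` is a hypothetical, simplified class, not the AVRO-1027 patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified transceiver showing the read-to-write upgrade hazard. */
final class MiniTransceiver {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ExecutorService deferred = Executors.newSingleThreadExecutor();
  private volatile boolean connected = true;

  /** Returns whether the write lock could be taken while this thread holds a read lock. */
  boolean canUpgrade() {
    lock.readLock().lock();
    try {
      boolean acquired = lock.writeLock().tryLock(); // false: read-to-write upgrade is not supported
      if (acquired) lock.writeLock().unlock();
      return acquired;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Sends under the read lock; a failing send reports the error on this same thread. */
  void write(Runnable send) {
    lock.readLock().lock();
    try {
      send.run();
    } catch (RuntimeException e) {
      exceptionCaught(e);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Calling disconnect() directly here would deadlock while a read hold is active. */
  void exceptionCaught(Throwable cause) {
    if (lock.getReadHoldCount() > 0) {
      // Another thread waits for the write lock until this thread releases its read lock.
      deferred.execute(this::disconnect);
    } else {
      disconnect();
    }
  }

  void disconnect() {
    lock.writeLock().lock();
    try {
      connected = false;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isConnected() { return connected; }

  /** Stops the helper thread and waits for any deferred disconnect to finish. */
  boolean shutdownAndWait(long timeoutMillis) {
    deferred.shutdown();
    try {
      return deferred.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Deferring is only one option. Another is to restructure the write path so the read lock is released before any error handling that may need the write lock.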
[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221342#comment-13221342 ]

Doug Cutting commented on AVRO-1006:

Raymie: "SchemaFingerprint.fingerprint seems unnecessarily long..."

Now this becomes SchemaNormalization.fp(SchemaNormalization.toParsingForm(schema)). The 'fp' might better be spelled out as 'fingerprint'. Also a utility method like SchemaNormalization.parsingFingerprint(schema) might be useful.

Graham: "pass a Normalizer instance..."

With the latest API, someone can already call SchemaNormalization.fingerprint() with a differently normalized schema, so I don't see the need for this. As we add more normalizers to Avro we can add new methods, so I'm not (yet) seeing the advantage of adding a Normalization interface.

Fingerprints for Avro Schemas
Key: AVRO-1006
URL: https://issues.apache.org/jira/browse/AVRO-1006
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Labels: features
Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html

Add a function that returns a standardized, 64-bit fingerprint for schemas. Fingerprints are designed such that the chance of collisions is very, very low.
[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221386#comment-13221386 ]

Doug Cutting commented on AVRO-1006:

A few minor nits:
- the 'href' in one of the documentation links is missing its 'h'
- the WHITESPACE comment should perhaps read, "Eliminate all whitespace in JSON outside of string literals."
- we might define a nested FingerprintAlgorithm enum for the implemented fingerprint algorithm names.
- SchemaNormalization should probably have a private constructor, e.g., 'private SchemaNormalization() {}'
- the #fingerprint link in the class documentation is broken.

Otherwise I'm +1 and look forward to committing this early next week unless there are objections.

Fingerprints for Avro Schemas
Key: AVRO-1006
URL: https://issues.apache.org/jira/browse/AVRO-1006
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Labels: features
Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html

Add a function that returns a standardized, 64-bit fingerprint for schemas. Fingerprints are designed such that the chance of collisions is very, very low.
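Two of the nits above might look like this in code: a nested enum for algorithm names, and a private constructor for a utility class with only static methods. The class name `FingerprintUtil`, the enum, and the method signature are all hypothetical. Only MD5 and SHA-256 are shown, delegated to `java.security.MessageDigest`. A 64-bit CRC-style fingerprint would need its own implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical static utility; not Avro's SchemaNormalization. */
final class FingerprintUtil {
  private FingerprintUtil() {} // static methods only; never instantiated

  /** Supported digest algorithms, mapped to their java.security names. */
  enum FingerprintAlgorithm {
    MD5("MD5"),
    SHA_256("SHA-256");

    final String jcaName;

    FingerprintAlgorithm(String jcaName) { this.jcaName = jcaName; }
  }

  static byte[] fingerprint(FingerprintAlgorithm algorithm, String canonicalForm) {
    try {
      MessageDigest md = MessageDigest.getInstance(algorithm.jcaName);
      return md.digest(canonicalForm.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      // Every Java SE platform is required to provide MD5 and SHA-256.
      throw new IllegalStateException(e);
    }
  }
}
```

An enum parameter turns a misspelled algorithm name into a compile-time error rather than a runtime `NoSuchAlgorithmException`.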
[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
[ https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220296#comment-13220296 ]

Doug Cutting commented on AVRO-1027:

Simon, do you expect to supply a test today? It would be good to include this in 1.6.3, but I don't think it's a showstopper, since it's not a regression, is it?

NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
Key: AVRO-1027
URL: https://issues.apache.org/jira/browse/AVRO-1027
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
Fix For: 1.6.3
Attachments: AVRO-1027-v2.patch, AVRO-1027.patch

If an Exception is caught while trying to write to a Channel, Netty can deliver the Exception to a ChannelUpstreamHandler on the same thread that attempted to write to the Channel. If this occurs with the NettyClientAvroHandler implementation of ChannelUpstreamHandler then the thread will deadlock.

Specifically, NettyClientAvroHandler overrides the ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, which requires the NettyTransceiver's write lock. However, in the above situation, the thread will already have locked the NettyTransceiver's read lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading from a read to a write lock, hence the thread deadlocks.
[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
[ https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220363#comment-13220363 ]

Doug Cutting commented on AVRO-1027:

Should we:
- a. commit this without tests and roll a 1.6.3 RC today
- b. hold off on 1.6.3 until this has tests next week
- c. roll a 1.6.3 RC today without this

I don't have a strong opinion.

NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread
Key: AVRO-1027
URL: https://issues.apache.org/jira/browse/AVRO-1027
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
Fix For: 1.6.3
Attachments: AVRO-1027-v2.patch, AVRO-1027.patch

If an Exception is caught while trying to write to a Channel, Netty can deliver the Exception to a ChannelUpstreamHandler on the same thread that attempted to write to the Channel. If this occurs with the NettyClientAvroHandler implementation of ChannelUpstreamHandler then the thread will deadlock.

Specifically, NettyClientAvroHandler overrides the ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, which requires the NettyTransceiver's write lock. However, in the above situation, the thread will already have locked the NettyTransceiver's read lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading from a read to a write lock, hence the thread deadlocks.
[jira] [Commented] (AVRO-999) NPE in Java, RecordBuilderBase.defaultValue
[ https://issues.apache.org/jira/browse/AVRO-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214909#comment-13214909 ] Doug Cutting commented on AVRO-999: --- The test case added in this patch passes with the changes in AVRO-1007. Also, the tests added in AVRO-1007 are substantially similar to the test added here. NPE in Java, RecordBuilderBase.defaultValue --- Key: AVRO-999 URL: https://issues.apache.org/jira/browse/AVRO-999 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.1 Environment: Java Reporter: Jay Rutten Assignee: James Baldassari Attachments: AVRO-999.patch If you have a union with a default of null, the code in RecordBuilderBase.defaultValue will cause an NPE in ConcurrentHashMap, since it is trying to add a null to the map. Sample union: {code} record Sample { union{null, string} value = null; } {code} Code: {code} // If not cached, get the default Java value by encoding the default JSON // value and then decoding it: if (defaultValue == null) { ByteArrayOutputStream baos = new ByteArrayOutputStream(); encoder = EncoderFactory.get().binaryEncoder(baos, encoder); ResolvingGrammarGenerator.encode(encoder, field.schema(), defaultJsonValue); encoder.flush(); decoder = DecoderFactory.get().binaryDecoder(baos.toByteArray(), decoder); defaultValue = new GenericDatumReader(field.schema()).read(null, decoder); defaultSchemaValues.putIfAbsent(field.pos(), defaultValue); // -- NPE from here } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
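The NPE comes from `ConcurrentHashMap` itself, which rejects null keys and values. The sketch below reproduces that and shows one common workaround, caching a sentinel object in place of a null default. `DefaultValueCache` and `NULL_DEFAULT` are hypothetical names. This is not necessarily how AVRO-1007 resolved it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache of decoded default values keyed by field position.
class DefaultValueCache {
  // ConcurrentHashMap rejects null values, so a null default
  // (e.g. union{null, string} value = null) is stored as a sentinel.
  private static final Object NULL_DEFAULT = new Object();

  private final ConcurrentMap<Integer, Object> defaults = new ConcurrentHashMap<>();

  void put(int fieldPos, Object defaultValue) {
    defaults.putIfAbsent(fieldPos, defaultValue == null ? NULL_DEFAULT : defaultValue);
  }

  /** Distinguishes "cached as null" from "not cached yet". */
  boolean isCached(int fieldPos) {
    return defaults.containsKey(fieldPos);
  }

  /** Returns the cached default, translating the sentinel back to null. */
  Object get(int fieldPos) {
    Object v = defaults.get(fieldPos);
    return v == NULL_DEFAULT ? null : v;
  }

  /** Reproduces the reported failure: putIfAbsent with a null value. */
  static boolean rawPutOfNullThrows() {
    ConcurrentMap<Integer, Object> raw = new ConcurrentHashMap<>();
    try {
      raw.putIfAbsent(0, null);
      return false;
    } catch (NullPointerException expected) {
      return true;
    }
  }
}
```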
[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214932#comment-13214932 ] Doug Cutting commented on AVRO-1006: This looks generally good. A few nits: - In class and method names, should we abbreviate 'fp' or spell out 'fingerprint'? FP means floating point to my eye. - Might we instead put this in org.apache.avro.SchemaFingerprint, rather than in util? Right now things in the util package depend only on the JDK, not on other parts of Avro. - Public methods and classes need javadoc comments. - The changes to the spec are not correctly processed by Forrest 0.8 for me. Fingerprints for Avro Schemas - Key: AVRO-1006 URL: https://issues.apache.org/jira/browse/AVRO-1006 Project: Avro Issue Type: New Feature Components: java Reporter: Raymie Stata Assignee: Raymie Stata Labels: features Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html Add a function that returns a standardized, 64-bit fingerprint for schemas. Fingerprints are designed such that the chance of a collision is very, very low. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
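For readers unfamiliar with 64-bit schema fingerprints, here is a sketch of a table-driven 64-bit CRC-style (Rabin) fingerprint over the UTF-8 bytes of a schema's JSON text. The `EMPTY` constant is the one given in the Avro specification's schema-fingerprint section. The class name is hypothetical, it spells out "fingerprint" per the naming nit above, and canonicalizing the schema text first is out of scope here. Treat it as an illustration, not the attached patch.

```java
import java.nio.charset.StandardCharsets;

// Sketch of a 64-bit Rabin-style fingerprint (CRC with a fixed polynomial).
final class SchemaFingerprint64 {
  /** Fingerprint of the empty input; also serves as the polynomial. */
  static final long EMPTY = 0xc15d213aa4d7a795L;
  private static final long[] FP_TABLE = buildTable();

  private static long[] buildTable() {
    long[] table = new long[256];
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        // Shift right one bit; XOR in the polynomial if the low bit was set.
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      table[i] = fp;
    }
    return table;
  }

  /** Processes one byte per step using the precomputed table. */
  static long fingerprint64(byte[] buf) {
    long fp = EMPTY;
    for (byte b : buf) {
      fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  /** Convenience overload for a schema's (ideally canonicalized) JSON text. */
  static long fingerprint64(String schemaJson) {
    return fingerprint64(schemaJson.getBytes(StandardCharsets.UTF_8));
  }
}
```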
[jira] [Commented] (AVRO-1036) IDL processing fails with multi-level nested imports
[ https://issues.apache.org/jira/browse/AVRO-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215125#comment-13215125 ] Doug Cutting commented on AVRO-1036: Looks like we patched this in parallel! You added some tests, which is great! My patch is a little different. When you include a file in a different directory, any imports it contains should be relative to its directory, not the directory of the original file, no? So this.inputDir is not the right value for the new inputDir; rather, it should come from the imported file. IDL processing fails with multi-level nested imports Key: AVRO-1036 URL: https://issues.apache.org/jira/browse/AVRO-1036 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.2 Reporter: George Fletcher Assignee: Doug Cutting Fix For: 1.6.3 Attachments: AVRO-1036.patch, jira-1036.patch The change to support finding IDL-related files on the classpath, in addition to the maven-plugin-defined directory, caused the context of the sourceDirectory to be lost when the InputStream returned by findFile() is used to create a new Idl instance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
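The resolution rule proposed above, where imports are relative to the importing file's directory, can be sketched with `java.nio.file.Path`. `ImportResolver.resolveImport` is a hypothetical helper for illustration, not the Idl class's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolve an IDL import relative to the directory of the
// file containing the import statement, not the top-level input directory.
final class ImportResolver {
  static Path resolveImport(Path importingFile, String importName) {
    Path dir = importingFile.getParent();
    if (dir == null) {
      // The importing file has no directory component: resolve against the current one.
      return Paths.get(importName).normalize();
    }
    return dir.resolve(importName).normalize();
  }
}
```

With this rule, a second-level import inside `proto/common/types.avdl` resolves under `proto/common/`, rather than under the directory of the top-level file.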
[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2
[ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213866#comment-13213866 ] Doug Cutting commented on AVRO-593: --- Ideally, anything in the .io, .util, and .file packages does not reference the .mapred or .mapreduce packages [ ... ] Much in these packages references AvroKey and AvroValue and/or AvroJob. These uses aren't mapreduce-specific and could be refactored away, e.g., by moving AvroKey and AvroValue from o.a.a.mapred to o.a.a.hadoop.io, but that would be incompatible. SortedKeyValueFile is the Avro equivalent of Hadoop's MapFile. Arguably it should be moved into o.a.a.io. It depends on AvroKeyValue, which might also be moved to the core. AvroKeyValue is very similar in functionality to o.a.a.mapred.Pair. Perhaps SortedKeyValueFile should be switched to use Pair and both moved to the core. I have implemented a SequenceFile shim and it works. There's now just a tiny class that needs to be in o.a.h.io, a base class that exposes two package-private nested classes from within SequenceFile. I've re-arranged the classes per Scott's #4 variant but can revert that. We need to decide how much refactoring we want to do here. Finally, I note that io.SeekableHadoopInput replicates functionality that's already in mapred.FsInput, so we should replace the former with the latter in the new code. Avro mapreduce apis incompatible with hadoop 0.20.2 --- Key: AVRO-593 URL: https://issues.apache.org/jira/browse/AVRO-593 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.2, 1.3.3 Environment: Avro 1.3.3, Hadoop 0.20.2 Reporter: Steve Severance Assignee: Garrett Wu Attachments: AVRO-593.patch, AVRO-593.patch The avro api's for hadoop use the hadoop mapreduce api that has been deprecated. A new avro mapreduce api should be implemented for hadoop 0.20 and higher. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1035) Add the possibility to append to existing avro files
[ https://issues.apache.org/jira/browse/AVRO-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213874#comment-13213874 ] Doug Cutting commented on AVRO-1035: Note that append is not reliable in current Hadoop releases. Append support in Hadoop 1.0 just means that flush() works reliably, not that append actually works. Append should be reliable in 0.23 releases, although I doubt it's been well tested there yet. Add the possibility to append to existing avro files -- Key: AVRO-1035 URL: https://issues.apache.org/jira/browse/AVRO-1035 Project: Avro Issue Type: New Feature Reporter: Vyacheslav Zholudev Currently it is not possible to append to avro files that were written and closed. Here is Scott Carey's reply on the mailing list: {quote} It is not possible without modifying DataFileWriter. Please open a JIRA ticket. It could not simply append to an OutputStream, since it must either: * Seek to the start to validate the schemas match and find the sync marker, or * Trust that the schemas match and find the sync marker from the last block. DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we could add something to the mapred module that takes a Path and FileSystem and returns something that implements an interface that DataFileWriter can append to. This would be something that is both a http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html and an OutputStream, or has both an InputStream from the start of the existing file and an OutputStream at the end. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
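The second shape described in the quote, an InputStream from the start of the file plus an OutputStream at its end, might look roughly like the interface below. `AppendTarget` and `LocalFileAppendTarget` are hypothetical names, not existing Avro APIs. A Hadoop-backed version would wrap a `FileSystem` and `Path` in the mapred module instead of a local `File`.

```java
import java.io.*;

// Hypothetical interface (not an existing Avro API): read the existing file
// from its start (to check the schema and find the sync marker) and write
// new blocks at its end.
interface AppendTarget {
  InputStream openFromStart() throws IOException;
  OutputStream openAtEnd() throws IOException;
}

// Local-file implementation using the JDK's append mode.
final class LocalFileAppendTarget implements AppendTarget {
  private final File file;

  LocalFileAppendTarget(File file) {
    this.file = file;
  }

  @Override
  public InputStream openFromStart() throws IOException {
    return new FileInputStream(file);
  }

  @Override
  public OutputStream openAtEnd() throws IOException {
    return new FileOutputStream(file, true); // true = append mode
  }
}
```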
[jira] [Commented] (AVRO-1023) Saved state should be restored in finally clause
[ https://issues.apache.org/jira/browse/AVRO-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214094#comment-13214094 ] Doug Cutting commented on AVRO-1023: This might be triggered if a Schema.Parser is reused after a SchemaParseException is thrown. Currently the default namespace is that of the preceding schema parsed. If a SchemaParseException is thrown and the parser is reused, then the default namespace could be that of a schema nested within the previous schema. Saved state should be restored in finally clause Key: AVRO-1023 URL: https://issues.apache.org/jira/browse/AVRO-1023 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Assignee: Raymie Stata Priority: Minor Attachments: AVRO-1023.patch Schema.parse(JsonParser) and Schema.parse(JsonNode,Names) save global state in a local variable; they should restore that state in a finally clause, but they don't. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
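The save-and-restore-in-finally pattern this issue asks for is sketched below on a toy parser. `NamespaceParser` is hypothetical and only tracks a default namespace. A name ending in `!` simulates a parse error, and `IllegalArgumentException` stands in for `SchemaParseException` (also an unchecked exception).

```java
// Hypothetical parser with one piece of saved state: the default namespace.
final class NamespaceParser {
  private String defaultNamespace = null;

  String currentNamespace() {
    return defaultNamespace;
  }

  /** Parses a fake full name "ns.Name"; a trailing '!' simulates a parse error. */
  String parse(String fullName) {
    String saved = defaultNamespace;   // save state before nested parsing
    try {
      int dot = fullName.lastIndexOf('.');
      if (dot >= 0) {
        defaultNamespace = fullName.substring(0, dot);
      }
      if (fullName.endsWith("!")) {
        throw new IllegalArgumentException("bad schema: " + fullName);
      }
      return fullName.substring(dot + 1);
    } finally {
      defaultNamespace = saved;        // restored even when parsing throws
    }
  }
}
```

Without the `finally`, a failed parse would leave the nested namespace in place, so a reused parser would pick up the wrong default namespace, which is the scenario described above.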
[jira] [Commented] (AVRO-999) NPE in Java, RecordBuilderBase.defaultValue
[ https://issues.apache.org/jira/browse/AVRO-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214103#comment-13214103 ] Doug Cutting commented on AVRO-999: --- I think this was already fixed in AVRO-1007. Can I close it as a duplicate? NPE in Java, RecordBuilderBase.defaultValue --- Key: AVRO-999 URL: https://issues.apache.org/jira/browse/AVRO-999 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.1 Environment: Java Reporter: Jay Rutten Assignee: James Baldassari Attachments: AVRO-999.patch If you have a union with a default of null, the code in RecordBuilderBase.defaultValue will cause an NPE in ConcurrentHashMap, since it is trying to add a null to the map. Sample union: {code} record Sample { union{null, string} value = null; } {code} Code: {code} // If not cached, get the default Java value by encoding the default JSON // value and then decoding it: if (defaultValue == null) { ByteArrayOutputStream baos = new ByteArrayOutputStream(); encoder = EncoderFactory.get().binaryEncoder(baos, encoder); ResolvingGrammarGenerator.encode(encoder, field.schema(), defaultJsonValue); encoder.flush(); decoder = DecoderFactory.get().binaryDecoder(baos.toByteArray(), decoder); defaultValue = new GenericDatumReader(field.schema()).read(null, decoder); defaultSchemaValues.putIfAbsent(field.pos(), defaultValue); // -- NPE from here } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2
[ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212819#comment-13212819 ] Doug Cutting commented on AVRO-593: --- I see a few choices: 1. org.apache.avro.{mapred,mapreduce,io,file,util}. This is what the code on github does. This would make the avro-mapred module contain things outside the org.apache.avro.mapred package, and splits Avro's io, file and util packages across multiple modules. 2. org.apache.avro.mapred.{mapreduce,io,file,util}. This is what my patch does. This is back-compatible and consistent with the module name, but places mapreduce under mapred, which is different than the Hadoop layout. 3. org.apache.avro.hadoop.{mapred,mapreduce,io,file,util}. We'd rename the module to be avro-hadoop. This would be incompatible but consistent with Hadoop. For back-compatibility we might leave the mapred classes in their current package. 4. org.apache.avro.{mapred,mapreduce,mapred.io,mapred.file,mapred.util}. This is back-compatible but includes a package that's not under the package of the module name. Tom, are you advocating for (4)? I'd be okay with that, I guess. I'm also leaning towards moving AvroSequenceFile under org.apache.avro and adding just a shim base class into org.apache.hadoop.io that subclasses SequenceFile and makes public the bits we need. That way if we get Hadoop to expose these bits the Avro API would not change. Avro mapreduce apis incompatible with hadoop 0.20.2 --- Key: AVRO-593 URL: https://issues.apache.org/jira/browse/AVRO-593 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.2, 1.3.3 Environment: Avro 1.3.3, Hadoop 0.20.2 Reporter: Steve Severance Assignee: Garrett Wu Attachments: AVRO-593.patch, AVRO-593.patch The avro api's for hadoop use the hadoop mapreduce api that has been deprecated. A new avro mapreduce api should be implemented for hadoop 0.20 and higher. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2
[ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213071#comment-13213071 ] Doug Cutting commented on AVRO-593: --- Is it possible to move AvroSequenceFile under o.a.a ? I discussed that above. We could move it, but we'd still need a shim in o.a.h.io, since the subclass accesses package-private bits. if we need to produce two otherwise identical modules in a build – one 0.23.x + compatible and one for the 0.20 / 0.22 / 1.0 users The nested Context classes in mapreduce's Mapper and Reducer went from abstract classes to interfaces (MAPREDUCE-954), requiring re-compilation of code that references these. But the mapreduce support added here does not reference these. So I think we're spared. Avro mapreduce apis incompatible with hadoop 0.20.2 --- Key: AVRO-593 URL: https://issues.apache.org/jira/browse/AVRO-593 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.2, 1.3.3 Environment: Avro 1.3.3, Hadoop 0.20.2 Reporter: Steve Severance Assignee: Garrett Wu Attachments: AVRO-593.patch, AVRO-593.patch The avro api's for hadoop use the hadoop mapreduce api that has been deprecated. A new avro mapreduce api should be implemented for hadoop 0.20 and higher. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-672) Convert JSON Text Input to Avro Tool
[ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210586#comment-13210586 ] Doug Cutting commented on AVRO-672: --- Leith, is the tool that Ron provided here the one you need? If so, then we can probably resuscitate this patch and get it committed. If not, is there a specific tool you need (e.g., CSV or TSV)? Thanks! Convert JSON Text Input to Avro Tool Key: AVRO-672 URL: https://issues.apache.org/jira/browse/AVRO-672 Project: Avro Issue Type: New Feature Components: java Reporter: Ron Bodkin Attachments: AVRO-672.patch, AVRO-672.patch The attached patch allows reading a JSON-formatted text file in, converting to a conforming Avro text file, emitting one record per line, e.g., it can read this input file: {"intval":12} {"intval":-73,"strval":"hello, there!!"} with this schema: { "type":"record", "name":"TestRecord", "fields": [ {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]} returning valid Avro. This is different from the DataFileWriteTool, which would read in the following internal encoding: {"intval":12,"strval":null} {"intval":-73,"strval":{"string":"hello, there!!"}} In general, the internal encodings used by Avro aren't natural when reading in JSON text that appears in the wild. Likewise, this utility allows changing invalid Avro identifier characters into an underscore, again to tolerate JSON that wasn't designed to be readable by Avro. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
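The key difference between the two inputs is how union fields appear. Avro's JSON encoding wraps a non-null union value in a single-entry object keyed by the branch type, and it requires the field to be present. The sketch below is not the attached tool's code. It converts a "natural" record, represented as plain Java maps, into that form, and its names and type mapping are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: convert "natural" JSON values (as Java maps) into Avro's JSON
// encoding for union fields such as ["string", "null"].
final class UnionJson {
  /** Assumed mapping from Java value types to Avro branch names. */
  static String branchName(Object value) {
    if (value == null) return "null";
    if (value instanceof String) return "string";
    if (value instanceof Integer) return "int";
    if (value instanceof Long) return "long";
    if (value instanceof Double) return "double";
    if (value instanceof Boolean) return "boolean";
    throw new IllegalArgumentException("unsupported: " + value.getClass());
  }

  /** null stays null; anything else becomes {branchName: value}. */
  static Object wrapUnionValue(Object value) {
    if (value == null) return null;
    Map<String, Object> wrapped = new LinkedHashMap<>();
    wrapped.put(branchName(value), value);
    return wrapped;
  }

  /** Wraps each union field; an absent union field becomes an explicit null. */
  static Map<String, Object> toAvroJson(Map<String, Object> record, List<String> unionFields) {
    Map<String, Object> out = new LinkedHashMap<>(record);
    for (String field : unionFields) {
      out.put(field, wrapUnionValue(record.get(field)));
    }
    return out;
  }
}
```

Applied to the two sample records above with `strval` as the union field, this yields the two "internal encoding" lines shown in the description.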
[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2
[ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210813#comment-13210813 ] Doug Cutting commented on AVRO-593: --- Garrett, this code looks great! Thanks for contributing it. I renamed all of the packages to reside under org.apache.avro.mapred. So that package now has subpackages named io, file, util and mapreduce. That's consistent with other Avro modules, where classes are under org.apache.avro.module. The only exception is org.apache.hadoop.io.AvroSequenceFile. This is in a Hadoop package so that it can access some package-private parts of SequenceFile. This is fragile, as SequenceFile could change these non-public APIs. We should probably file an issue with Hadoop to make these items protected so that SequenceFile can be subclassed in a supported way. I plan to improve the javadoc a bit (adding package.html files to new packages) and move versions for new dependencies from mapred/pom.xml into the parent pom. Then I think this should be ready to commit. Avro mapreduce apis incompatible with hadoop 0.20.2 --- Key: AVRO-593 URL: https://issues.apache.org/jira/browse/AVRO-593 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.2, 1.3.3 Environment: Avro 1.3.3, Hadoop 0.20.2 Reporter: Steve Severance Assignee: Garrett Wu Attachments: AVRO-593.patch, AVRO-593.patch The avro api's for hadoop use the hadoop mapreduce api that has been deprecated. A new avro mapreduce api should be implemented for hadoop 0.20 and higher. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1025) migrate website dist to svnpubsub
[ https://issues.apache.org/jira/browse/AVRO-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209735#comment-13209735 ] Doug Cutting commented on AVRO-1025: I committed the docs to subversion and asked Infrastructure to switch the website to automatically update from there in INFRA-4443. migrate website dist to svnpubsub --- Key: AVRO-1025 URL: https://issues.apache.org/jira/browse/AVRO-1025 Project: Avro Issue Type: Improvement Reporter: Doug Cutting Assignee: Doug Cutting ASF infrastructure has requested that all projects migrate to svnpubsub for their websites and release distributions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207029#comment-13207029 ] Doug Cutting commented on AVRO-1022: If names are restricted, then consuming schemas from other systems will be difficult. Good point. The question is where the escaping burden lies: either with adapter layers (e.g., in Pig or Hive) or in the code generation layer. I'd argue that the code generation layer already has to handle reserved words, so adding character escaping there is not a significant burden. It's also safer, i.e., more tolerant, not to assume that other implementations have correctly escaped all names. Finally, escaping as late as possible maximizes legibility through the system. Error in validate name -- Key: AVRO-1022 URL: https://issues.apache.org/jira/browse/AVRO-1022 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Priority: Minor Attachments: AVRO-1022.patch, AVRO-1022.patch Fix schema.validateName to allow only ASCII letters, not Unicode letters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
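Late escaping in a Java code generator could look like the sketch below. It keeps any character Java accepts in an identifier, escapes the rest, and mangles reserved words. The class name, the abbreviated reserved-word list, the `$` suffix, and the `_uXXXX` escape scheme are all assumptions for illustration. Collisions between escaped and literal names are not handled.

```java
import java.util.Set;

// Hypothetical late-escaping step in a Java code generator.
final class IdentifierEscaper {
  // Abbreviated for the sketch; a real generator would list every Java keyword.
  private static final Set<String> RESERVED =
      Set.of("class", "int", "public", "static", "void", "new", "return");

  static String escape(String avroName) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < avroName.length(); ) {
      int cp = avroName.codePointAt(i);
      boolean ok = (sb.length() == 0)
          ? Character.isJavaIdentifierStart(cp)
          : Character.isJavaIdentifierPart(cp);
      if (ok && cp != '$') {          // keep '$' free for our own mangling
        sb.appendCodePoint(cp);
      } else {
        sb.append(String.format("_u%04X", cp));
      }
      i += Character.charCount(cp);
    }
    String id = sb.toString();
    return RESERVED.contains(id) ? id + "$" : id;
  }
}
```

Because Java accepts Unicode letters in identifiers, a name such as `名前` passes through unchanged, and only characters the target language cannot represent are escaped.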
[jira] [Commented] (AVRO-1028) IPC transceiver doesn't gracefully handle server connection resets.
[ https://issues.apache.org/jira/browse/AVRO-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207313#comment-13207313 ] Doug Cutting commented on AVRO-1028: Patch works for me, but with the addition of urllib3 requirement perhaps we should push this to 1.7.0? IPC transceiver doesn't gracefully handle server connection resets. --- Key: AVRO-1028 URL: https://issues.apache.org/jira/browse/AVRO-1028 Project: Avro Issue Type: Improvement Components: python Affects Versions: 1.6.2 Reporter: Bo Shi Assignee: Bo Shi Fix For: 1.6.2 Attachments: AVRO-1028.patch The current Python HTTPTransceiver class forces users to handle connection resets. I've refactored the class using urllib3 and incorporated some features we get for free from said library into the transceiver. Added a test case for test_ipc.py that uses the twisted server implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205616#comment-13205616 ] Doug Cutting commented on AVRO-1022: An implementation would be naive to trust that other implementations have validated all names in schemas it receives. Java currently disables validation when reading a schema from a data file, since it's more important to be able to read the data. With Generic APIs, name validation isn't required, and many applications use only generic APIs. This would not require support for Unicode identifiers in programming languages. A code generator should escape any character in a name that's not easy for it to represent in an identifier. We'd just be permitting code generators to take advantage of Unicode identifiers when a programming language supports them. If we went the other way (change the spec), we'd have to answer a bunch of design questions (decide what is a letter, decide on normalization, figure out how to mangle names in various languages, etc.), and then implement validation in each language [ ... ] I disagree. Even if we removed all restrictions on naming, I don't think we'd add much burden to implementations. Most implementations don't do code generation. Code generators already need to mangle names. A code generator should already escape rather than die when it sees an unexpected character in a name. (The alternative is an inability to generate code for schemas that someone else controls, a poor choice.) So I don't see a new interoperability problem this would create. We already have schemas in the wild whose names are invalid. Perhaps we should change the spec to recommend that names be restricted to ASCII for ease of programming with generated APIs in all languages.
And we might check that in the compiler, forcing folks to specify --escape-non-ASCII-names if they really want to generate code for a schema whose names contain non-ASCII characters, to discourage the use of non-ASCII in schemas that you do control. In general, we could encourage implementations both not to trust that identifiers are all-ASCII and to encourage all-ASCII identifiers. Error in validate name -- Key: AVRO-1022 URL: https://issues.apache.org/jira/browse/AVRO-1022 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Priority: Minor Attachments: AVRO-1022.patch Fix schema.validateName to allow only ASCII letters, not Unicode letters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
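The compiler-side policy proposed here can be sketched as follows. Non-ASCII names are rejected unless an escape option is set, in which case each non-ASCII UTF-16 unit is escaped. `AsciiNamePolicy`, its boolean option, and the `_uXXXX` scheme are hypothetical. The `--escape-non-ASCII-names` flag is only the proposal from this comment, not an existing option.

```java
// Hypothetical compiler check: refuse non-ASCII names unless escaping is requested.
final class AsciiNamePolicy {
  static boolean isAscii(String name) {
    return name.chars().allMatch(c -> c < 0x80);
  }

  static String checkName(String name, boolean escapeNonAsciiNames) {
    if (isAscii(name)) {
      return name;
    }
    if (!escapeNonAsciiNames) {
      throw new IllegalArgumentException(
          "Non-ASCII name '" + name + "'; rerun with --escape-non-ASCII-names");
    }
    StringBuilder sb = new StringBuilder();
    // Iterates UTF-16 units, so a supplementary character becomes two escapes.
    name.chars().forEach(c -> {
      if (c < 0x80) {
        sb.append((char) c);
      } else {
        sb.append(String.format("_u%04X", c));
      }
    });
    return sb.toString();
  }
}
```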
[jira] [Commented] (AVRO-973) Union behavior not consistent
[ https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205678#comment-13205678 ] Doug Cutting commented on AVRO-973: --- There is no change you can make to the current validation-based mechanic that guarantees correctness for record types. That's right. Full recursive field validation is not required for unions. See AVRO-654 and http://avro.apache.org/docs/current/spec.html#Unions The object's schema name should be checked against the names of the schemas in the union. If the fields don't match but the names are the same, then a runtime error should be generated. This is a longstanding misfeature of the Python implementation. Union behavior not consistent - Key: AVRO-973 URL: https://issues.apache.org/jira/browse/AVRO-973 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.6.1, 1.6.2 Reporter: Gaurav Nanda Labels: patch Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, AVRO-973-patch-3.patch, AVRO-973-wrapper.patch, AVRO-973-wrapper.patch, test_unions.py Original Estimate: 0.25h Remaining Estimate: 0.25h Python's union does not respect the order in which types are specified. For the following schema: {"type":"map","values":["int","long","float","double","string","boolean"]}, an integer value is written as a double, but it should respect the order in which types have been specified. Fixed Code (io.py):
def write_union(self, writers_schema, datum, encoder):
  """
  A union is encoded by first writing a long value indicating
  the zero-based position within the union of the schema of its value.
  The value is then encoded per the indicated schema within the union.
  """
  # resolve union
  index_of_schema = -1
  for i, candidate_schema in enumerate(writers_schema.schemas):
    if validate(candidate_schema, datum):
      index_of_schema = i
      break  # XXX Add break statement here XXX
  if index_of_schema < 0: raise AvroTypeException(writers_schema, datum)
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
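The effect of the missing `break` is easier to see side by side. The sketch below contrasts first-match and last-match branch selection. The acceptance rules are simplified assumptions modeled on the report, where an int datum also validates as long, float, and double, and the class is hypothetical.

```java
import java.util.List;

// Sketch contrasting first-match and last-match union branch selection.
final class UnionResolution {
  /** Simplified, assumed acceptance rules per branch type. */
  static boolean accepts(String branch, Object datum) {
    switch (branch) {
      case "null":    return datum == null;
      case "boolean": return datum instanceof Boolean;
      case "int":     return datum instanceof Integer;
      case "long":    return datum instanceof Integer || datum instanceof Long;
      case "float":
      case "double":  return datum instanceof Number; // Boolean is not a Number
      case "string":  return datum instanceof String;
      default:        return false;
    }
  }

  /** Intended behavior: the first matching branch wins (the loop breaks). */
  static int firstMatch(List<String> branches, Object datum) {
    for (int i = 0; i < branches.size(); i++) {
      if (accepts(branches.get(i), datum)) {
        return i;
      }
    }
    return -1; // caller raises a type error
  }

  /** Reported bug: without a break, the last matching branch wins. */
  static int lastMatch(List<String> branches, Object datum) {
    int index = -1;
    for (int i = 0; i < branches.size(); i++) {
      if (accepts(branches.get(i), datum)) {
        index = i;
      }
    }
    return index;
  }
}
```

For the map-values union in the report, an integer selects index 0 (`int`) with first-match, but index 3 (`double`) with last-match.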
[jira] [Commented] (AVRO-973) Union behavior not consistent
[ https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205715#comment-13205715 ] Doug Cutting commented on AVRO-973: --- Ultimately I think the fix is AVRO-283. Python would use objects to represent records, not dictionaries. This was recently discussed on the dev list, where Marcio expressed interest in working on it (http://s.apache.org/deA). Perhaps you can collaborate there? Union behavior not consistent - Key: AVRO-973 URL: https://issues.apache.org/jira/browse/AVRO-973 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.6.1, 1.6.2 Reporter: Gaurav Nanda Labels: patch Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, AVRO-973-patch-3.patch, AVRO-973-wrapper.patch, AVRO-973-wrapper.patch, test_unions.py Original Estimate: 0.25h Remaining Estimate: 0.25h Python's union does not respect the order in which types are specified. For the following schema: {"type":"map","values":["int","long","float","double","string","boolean"]}, an integer value is written as a double, but it should respect the order in which types have been specified. Fixed Code (io.py):
def write_union(self, writers_schema, datum, encoder):
  """
  A union is encoded by first writing a long value indicating
  the zero-based position within the union of the schema of its value.
  The value is then encoded per the indicated schema within the union.
  """
  # resolve union
  index_of_schema = -1
  for i, candidate_schema in enumerate(writers_schema.schemas):
    if validate(candidate_schema, datum):
      index_of_schema = i
      break  # XXX Add break statement here XXX
  if index_of_schema < 0: raise AvroTypeException(writers_schema, datum)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
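If records are objects that carry their schema's full name, as the AVRO-283 direction suggests, a union branch can be chosen by name instead of by validating every field. Field mismatches would then surface as runtime errors during writing. `NamedRecord` and `NamedUnionResolver` are hypothetical classes for illustration.

```java
import java.util.List;

// Hypothetical record object that knows its schema's full name.
final class NamedRecord {
  final String schemaName;

  NamedRecord(String schemaName) {
    this.schemaName = schemaName;
  }
}

// Chooses a union branch by schema name rather than by recursive field validation.
final class NamedUnionResolver {
  static int resolve(List<String> branchNames, NamedRecord datum) {
    int i = branchNames.indexOf(datum.schemaName);
    if (i < 0) {
      throw new IllegalArgumentException("No union branch named " + datum.schemaName);
    }
    return i;
  }
}
```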
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204844#comment-13204844 ] Doug Cutting commented on AVRO-1022: And doing Unicode right is a lot of work; doing it poorly will just create a nasty source of interop problems. I don't see this. Avro already requires that JSON parsers do Unicode right. Permitting non-ASCII in identifiers only creates problems when generating code. The potential interoperability problem could be that some implementations, when given a schema, would be unable to generate valid code in their programming language for that schema, rendering that schema unreadable by generated code (although it would still be readable by generic code). That would be a bug in that implementation. Code generators already have to mangle names that are reserved words in the generated programming language. If we permit non-ASCII characters in identifiers then implementations might also need to escape non-ASCII characters when generating code. This doesn't seem a huge burden. It's important that the specification is clear about what characters implementations might expect to see in identifiers so that they know what characters need to be escaped. A conservative implementation might simply escape anything that's not permitted in their programming language. If the spec is changed we should specify precisely what characters are permitted. Unicode characters have properties. We can use these properties to make the specification precise. One property is 'letter', another is 'number'. Java's isLetterOrDigit() includes these two sets. Stepping back, it would be good if folks could use their own languages when writing Avro schemas. It should be possible to use, e.g., column names that are in Japanese, Chinese, Hindi, etc. 
Error in validate name -- Key: AVRO-1022 URL: https://issues.apache.org/jira/browse/AVRO-1022 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Priority: Minor Attachments: AVRO-1022.patch Fix schema.validateName to allow only ASCII letters, not Unicode letters.
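The 'letter' and 'number' properties mentioned above correspond to Unicode general categories. Below is a minimal Python sketch comparing the ASCII-only rule with one possible Unicode rule. The Unicode rule used here (a letter or '_' first, then letters, decimal digits of category Nd, or '_') is an assumption, not spec text; note that Nd is narrower than the full 'number' property. escape_for_ascii_language is a hypothetical helper of the kind a code generator for an ASCII-only language might use; a real generator would also need to avoid collisions with names that already contain "_u".

```python
import re
import unicodedata

ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_valid_ascii_name(name):
    """ASCII-only rule: [A-Za-z_] followed by [A-Za-z0-9_]*."""
    return ASCII_NAME.match(name) is not None


def _is_letter(ch):
    return unicodedata.category(ch).startswith("L")  # Lu, Ll, Lt, Lm, Lo


def _is_digit(ch):
    return unicodedata.category(ch) == "Nd"  # decimal digits only


def is_valid_unicode_name(name):
    """Hypothetical Unicode rule: letter or '_' first, then letters, digits or '_'."""
    if not name or not (_is_letter(name[0]) or name[0] == "_"):
        return False
    return all(_is_letter(c) or _is_digit(c) or c == "_" for c in name[1:])


def escape_for_ascii_language(name):
    """Replace each non-ASCII character with _uXXXX (its code point in hex)."""
    return "".join(c if ord(c) < 128 else "_u%04x" % ord(c) for c in name)


print(is_valid_ascii_name("名前"), is_valid_unicode_name("名前"))  # False True
print(escape_for_ascii_language("名前"))
```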
[jira] [Commented] (AVRO-990) ruby impl failed when the local_protocol not same with remote_protocol
[ https://issues.apache.org/jira/browse/AVRO-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204914#comment-13204914 ] Doug Cutting commented on AVRO-990: --- It would also be good to have a test case included in the patch, if possible. ruby impl failed when the local_protocol not same with remote_protocol -- Key: AVRO-990 URL: https://issues.apache.org/jira/browse/AVRO-990 Project: Avro Issue Type: Bug Components: ruby Affects Versions: 1.6.1 Reporter: kafka0102 Fix For: 1.7.0 Attachments: ipc.patch Original Estimate: 24h Remaining Estimate: 24h In the Requestor class, when local_protocol is not the same as remote_protocol, Requestor stores a value in REMOTE_HASHES[transport.remote_name] and then skips self.remote_protocol = local_protocol in the write_handshake_request method, so the @remote_protocol of the next new Requestor object is always nil.
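One way to avoid the nil @remote_protocol is to cache the remote protocol alongside its hash, so that a new Requestor never finds a cached hash without the matching protocol. The Python sketch below models that idea; REMOTE_PROTOCOLS, handshake_request and handshake_response are hypothetical names and this is not necessarily what ipc.patch does. It assumes handshake hashes are MD5 digests of the protocol text.

```python
import hashlib

# Process-wide caches keyed by the transport's remote name (hypothetical
# Python analogue of the Ruby Requestor's class-level REMOTE_HASHES).
REMOTE_HASHES = {}
REMOTE_PROTOCOLS = {}


def protocol_hash(protocol_json):
    """MD5 digest of the protocol text, used as the handshake hash."""
    return hashlib.md5(protocol_json.encode("utf-8")).digest()


class Requestor:
    def __init__(self, local_protocol, remote_name):
        self.local_protocol = local_protocol
        self.remote_name = remote_name
        # Restore what an earlier Requestor learned about this server:
        # the hash AND the protocol, never one without the other.
        self.remote_hash = REMOTE_HASHES.get(remote_name)
        self.remote_protocol = REMOTE_PROTOCOLS.get(remote_name)

    def handshake_request(self):
        """Return the server hash to send: the cached one, or our own."""
        if self.remote_hash is None:
            # First contact: assume the server speaks our protocol.
            self.remote_hash = protocol_hash(self.local_protocol)
            self.remote_protocol = self.local_protocol
        return self.remote_hash

    def handshake_response(self, server_protocol):
        """The server reported a different protocol: cache hash and protocol together."""
        self.remote_protocol = server_protocol
        self.remote_hash = protocol_hash(server_protocol)
        REMOTE_HASHES[self.remote_name] = self.remote_hash
        REMOTE_PROTOCOLS[self.remote_name] = server_protocol
```

If REMOTE_PROTOCOLS were dropped, a second Requestor would pick up the cached hash but leave remote_protocol as None, which is the failure described in the issue.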
[jira] [Commented] (AVRO-1021) Fix a few name-related imperfections in Avro spec
[ https://issues.apache.org/jira/browse/AVRO-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203788#comment-13203788 ] Doug Cutting commented on AVRO-1021: "if there is an Avro Data File or old schema that used a name before defining it that currently works" I don't think any implementation currently supports use-before-define, does it? A left-to-right traversal of JSON only makes sense for array elements. The only schemas that include multiple types, and for which traversal order matters, are unions and records, but these use JSON arrays, so left-to-right traversal works. The types array in a protocol definition could come textually after the messages, but the types must be processed before the messages, and in order. Should we clarify that too? Fix a few name-related imperfections in Avro spec - Key: AVRO-1021 URL: https://issues.apache.org/jira/browse/AVRO-1021 Project: Avro Issue Type: Bug Components: spec Reporter: Raymie Stata Assignee: Raymie Stata Priority: Minor Attachments: AVRO-1021.patch, AVRO-1021.patch Require that names be defined before they are used; disallow multiple definitions of names; clarify that name equality is case-sensitive (for type names, field names, and enum symbols).
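The proposed rules (define before use, no redefinition, case-sensitive comparison, protocol types processed before messages) can all be checked in a single left-to-right pass. This is a minimal Python sketch that assumes unqualified names only (namespaces are ignored); Names, parse and parse_protocol are hypothetical helpers, not any implementation's API.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "error", "enum", "fixed"}


class Names:
    """Names defined so far; comparison is case-sensitive (plain str equality)."""

    def __init__(self):
        self.defined = set()

    def define(self, name):
        if name in self.defined:
            raise ValueError("multiple definitions of name: " + name)
        self.defined.add(name)


def parse(schema, names):
    """Walk a parsed JSON schema left to right, enforcing the name rules."""
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in names.defined:
            raise ValueError("name used before it is defined: " + schema)
    elif isinstance(schema, list):  # union: branches in array order
        for branch in schema:
            parse(branch, names)
    else:
        kind = schema["type"]
        if isinstance(kind, str) and kind in NAMED_TYPES:
            # The name is defined before the fields are visited,
            # so a recursive record can refer to itself.
            names.define(schema["name"])
            for field in schema.get("fields", []):
                parse(field["type"], names)
        elif kind == "array":
            parse(schema["items"], names)
        elif kind == "map":
            parse(schema["values"], names)
        else:  # e.g. {"type": "int"} or a nested schema object
            parse(kind, names)


def parse_protocol(protocol):
    """Process "types" first, in order, regardless of textual key order."""
    names = Names()
    for t in protocol.get("types", []):
        parse(t, names)
    for message in protocol.get("messages", {}).values():
        for param in message.get("request", []):
            parse(param["type"], names)
        parse(message.get("response", "null"), names)
        for err in message.get("errors", []):
            parse(err, names)
    return names
```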
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203818#comment-13203818 ] Doug Cutting commented on AVRO-1022: Every language that currently implements Avro supports Unicode identifiers. So I wonder if we should instead amend the specification to permit non-ASCII characters? Error in validate name -- Key: AVRO-1022 URL: https://issues.apache.org/jira/browse/AVRO-1022 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Priority: Minor Attachments: AVRO-1022.patch Fix schema.validateName to allow only ASCII letters, not Unicode letters.
[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2
[ https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203823#comment-13203823 ] Doug Cutting commented on AVRO-995: --- +1 Java: Update Dependencies for 1.6.2 --- Key: AVRO-995 URL: https://issues.apache.org/jira/browse/AVRO-995 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.2 Attachments: AVRO-995.2.patch, AVRO-995.patch A few of our dependencies need upgrading. In particular, I have been hit by a bug in Jackson that is fixed by the latest release (http://jira.codehaus.org/browse/JACKSON-462). Summary: I will submit a patch that updates everything to the next bugfix or very minor release, other than paranamer, thrift, and hadoop. Details: (using maven versions plugin)
On the dependency side:
[INFO] com.thoughtworks.paranamer:paranamer 2.3 -> 2.4-debug
  could not find info about what is new in 2.4. I do not think we should upgrade until we have more info
[INFO] net.sf.jopt-simple:jopt-simple 4.1 -> 4.3
  minor extra features (http://pholser.github.com/jopt-simple/changes.html)
[INFO] org.apache.hadoop:hadoop-core 0.20.205.0 -> 1.0.0
  renamed 0.20.205, no need to update yet.
[INFO] org.codehaus.jackson:jackson-mapper-asl 1.8.6 -> 1.9.3
  I suggest we upgrade to 1.8.7.
[INFO] org.jboss.netty:netty 3.2.6.Final -> 3.2.7.Final
  bugfix release
[INFO] org.apache.thrift:libthrift 0.7.0 -> 0.8.0
  Is this a minor / bugfix release? If so we should update, otherwise wait until Avro 1.7.x
[INFO] org.slf4j:slf4j-api 1.6.3 -> 1.6.4
[INFO] org.slf4j:slf4j-simple 1.6.3 -> 1.6.4
  Minor bugfixes (http://www.slf4j.org/news.html)
On the plugin side: (mvn versions:display-plugin-updates)
[INFO] maven-antrun-plugin 1.6 -> 1.7
  minor, looks safe: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWke1=w6nv2u85nkgqm0zxo3khyzdc8hazkhhvywjbuv...@mail.gmail.com%3E)
[INFO] maven-gpg-plugin 1.3 -> 1.4
  minor update (http://mail-archives.apache.org/mod_mbox/maven-announce/201108.mbox/%3CCA+nPnMw_3zQQCpzybQvo-QZFMCogvH31WEhxQnZ=cdzgxsr...@mail.gmail.com%3E)
[INFO] maven-checkstyle-plugin 2.6 -> 2.8
  we avoided 2.7 before for some reason: (http://mail-archives.apache.org/mod_mbox/maven-dev/201108.mbox/%3ccapoybqsvu+kup5vuce8rc6mjb9rykr2cpig+rvbe5o8teo6...@mail.gmail.com%3E)
  useful new feature: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C15365449.01320142181746.JavaMail.mark@MARK%3E)
[INFO] maven-surefire-plugin 2.10 -> 2.11
  lots of bug fixes and long requested new features: (http://mail-archives.apache.org/mod_mbox/maven-announce/201112.mbox/%3CCA+jQputH_uA2Ue6JqiHp1YeNo=qqxgcpdtgq9vv1aw_psqk...@mail.gmail.com%3E)
[INFO] maven-shade-plugin 1.4 -> 1.5
  minor looks safe: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C1076639049.01320107865464.JavaMail.benson@tinfoilhat.local%3E)
[INFO] maven-archetype-plugin 2.1 -> 2.2
  http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWkeLyc-tA2NCh3xYR06W+eGiWd46fS=R=ngjon14zrd...@mail.gmail.com%3E
[jira] [Commented] (AVRO-1022) Error in validate name
[ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203876#comment-13203876 ] Doug Cutting commented on AVRO-1022: I doubt it's well tested in implementations so there are probably bugs there. Error in validate name -- Key: AVRO-1022 URL: https://issues.apache.org/jira/browse/AVRO-1022 Project: Avro Issue Type: Bug Components: java Reporter: Raymie Stata Priority: Minor Attachments: AVRO-1022.patch Fix schema.validateName to allow only ASCII letters, not Unicode letters.
[jira] [Commented] (AVRO-1007) Insufficient validation in generated specific record builder implementations
[ https://issues.apache.org/jira/browse/AVRO-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203928#comment-13203928 ] Doug Cutting commented on AVRO-1007: According to the spec, that schema is malformed. The default value of a union should always be interpreted as a value of the first type in the union. I don't think this schema currently works when reading records that lack the field f. See line 348 of ResolvingGrammarGenerator, where the default value is encoded using the first type in the union. I suppose we could change the spec to permit a null default value if any element of the union is null, but I don't see why we should. It makes the spec more complex and doesn't provide any additional expressive power. This patch would make the builder API enforce the spec, consistent with ResolvingDecoder. Insufficient validation in generated specific record builder implementations Key: AVRO-1007 URL: https://issues.apache.org/jira/browse/AVRO-1007 Project: Avro Issue Type: Bug Affects Versions: 1.6.1 Reporter: James Baldassari Assignee: James Baldassari Labels: java Fix For: 1.6.2 Attachments: AVRO-1007-v2.patch, AVRO-1007-v3.patch, AVRO-1007-v4.patch, AVRO-1007.patch, AVRO-1007.patch, AVRO-1007.patch There are two main problems with the generated build() method in specific record builders:
* For non-primitive types, if there is no default value and the user does not set the value, build() will execute successfully without throwing an exception.
** Instead, an AvroRuntimeException should be thrown with an exception message indicating the name of the required field that was not set.
* For primitive types, if there is no default value and the user does not set the value, an AvroRuntimeException is thrown with the 'cause' set to a NullPointerException, which is not very helpful.
** The NPE comes from attempting to set the primitive field to the result of defaultValue(), which is null.
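Both points can be sketched in a few lines of Python: a default check that only consults the first branch of a union, and a builder whose build() names the missing field. This is a simplified model assuming primitive-only default checks; default_is_valid, RecordBuilder and MissingFieldError are hypothetical names, with MissingFieldError standing in for the AvroRuntimeException the issue asks for.

```python
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def default_matches(schema, value):
    """Check a JSON default against a primitive schema (others are not handled)."""
    check = PRIMITIVE_CHECKS.get(schema) if isinstance(schema, str) else None
    return check is not None and check(value)


def default_is_valid(field):
    """Per the spec, a union field's default must match the FIRST branch."""
    schema = field["type"]
    first = schema[0] if isinstance(schema, list) else schema
    return default_matches(first, field["default"])


class MissingFieldError(Exception):
    """Stands in for the AvroRuntimeException requested in the issue."""


class RecordBuilder:
    def __init__(self, fields):
        self._fields = fields
        self._values = {}

    def set(self, name, value):
        self._values[name] = value
        return self

    def build(self):
        record = {}
        for field in self._fields:
            name = field["name"]
            if name in self._values:
                record[name] = self._values[name]
            elif "default" in field:
                record[name] = field["default"]
            else:
                # Report which field is missing instead of an opaque NPE-style cause.
                raise MissingFieldError("Field %s not set and has no default value" % name)
        return record
```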
[jira] [Commented] (AVRO-1021) Fix a few name-related imperfections in Avro spec
[ https://issues.apache.org/jira/browse/AVRO-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204034#comment-13204034 ] Doug Cutting commented on AVRO-1021: "where the types attribute of a protocol is always deemed to come before the messages attribute" Works for me. Fix a few name-related imperfections in Avro spec - Key: AVRO-1021 URL: https://issues.apache.org/jira/browse/AVRO-1021 Project: Avro Issue Type: Bug Components: spec Reporter: Raymie Stata Assignee: Raymie Stata Priority: Minor Attachments: AVRO-1021.patch, AVRO-1021.patch Require that names be defined before they are used; disallow multiple definitions of names; clarify that name equality is case-sensitive (for type names, field names, and enum symbols).
[jira] [Commented] (AVRO-850) Python protocol parsing doesn't set message error union to ['string'] when no errors declared
[ https://issues.apache.org/jira/browse/AVRO-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204165#comment-13204165 ] Doug Cutting commented on AVRO-850: --- "This is a duplicate of AVRO-748 as far as I can tell" Yes, but this one has a patch! I'll commit this tomorrow unless there are objections. Python protocol parsing doesn't set message error union to ['string'] when no errors declared - Key: AVRO-850 URL: https://issues.apache.org/jira/browse/AVRO-850 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.5.0 Reporter: Jeremy Lewi Assignee: Jeremy Lewi Attachments: AVRO-850.patch This bug applies to the Python module. According to the protocol specification (http://avro.apache.org/docs/current/spec.html#Messages), when no errors are declared in the protocol for a message, the effective error union is ['string']. The behavior of avro.protocol is not consistent with this specification. In particular, if no errors are declared, the errors property of Message will be None and not an instance of ErrorUnionSchema. Consequently, if a message returns an error, an exception is thrown. Patch to follow shortly.
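The specification's rule is that the effective error union has "string" prepended to the declared errors, so a message with no declared errors ends up with ["string"]. A minimal Python sketch over a message's parsed JSON; effective_error_union is a hypothetical helper, not the avro.protocol API.

```python
def effective_error_union(message):
    """Return the effective error union for a protocol message (parsed JSON dict).

    "string" is always prepended so undeclared system errors can be sent;
    with no declared errors the result is ["string"], never None.
    """
    declared = message.get("errors") or []
    return ["string"] + list(declared)


print(effective_error_union({"request": [], "response": "null"}))  # ['string']
```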
[jira] [Commented] (AVRO-973) Union behavior not consistent
[ https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204169#comment-13204169 ] Doug Cutting commented on AVRO-973: --- I'll commit this tomorrow unless there are objections. Union behavior not consistent - Key: AVRO-973 URL: https://issues.apache.org/jira/browse/AVRO-973 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.6.1, 1.6.2 Reporter: Gaurav Nanda Labels: patch Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, test_unions.py Original Estimate: 0.25h Remaining Estimate: 0.25h Python's union does not respect the order in which the types are specified. For the following schema: {"type":"map","values":["int","long","float","double","string","boolean"]}, an integer value is written as a double, but it should respect the order in which the types have been specified. Fixed Code (io.py):
  def write_union(self, writers_schema, datum, encoder):
    """
    A union is encoded by first writing a long value indicating
    the zero-based position within the union of the schema of its value.
    The value is then encoded per the indicated schema within the union.
    """
    # resolve union
    index_of_schema = -1
    for i, candidate_schema in enumerate(writers_schema.schemas):
      if validate(candidate_schema, datum):
        index_of_schema = i
        break  # XXX Add break statement here XXX
    if index_of_schema < 0: raise AvroTypeException(writers_schema, datum)
[jira] [Commented] (AVRO-301) Handle non-reserved properties appropriately in the Python implementation
[ https://issues.apache.org/jira/browse/AVRO-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204171#comment-13204171 ] Doug Cutting commented on AVRO-301: --- I'll commit this tomorrow unless there are objections. Handle non-reserved properties appropriately in the Python implementation - Key: AVRO-301 URL: https://issues.apache.org/jira/browse/AVRO-301 Project: Avro Issue Type: New Feature Components: python Reporter: Jeff Hammerbacher Assignee: Marcio Silva Attachments: AVRO-301-patch-1.patch, AVRO-301-patch-2.patch
[jira] [Commented] (AVRO-1013) NettyTransceiver can hang after server restart
[ https://issues.apache.org/jira/browse/AVRO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202679#comment-13202679 ] Doug Cutting commented on AVRO-1013: The new test passes for me even without the changes to NettyTransceiver.java. NettyTransceiver can hang after server restart -- Key: AVRO-1013 URL: https://issues.apache.org/jira/browse/AVRO-1013 Project: Avro Issue Type: Bug Affects Versions: 1.6.1 Reporter: James Baldassari Priority: Blocker Attachments: AVRO-1013.patch I ran into a very specific scenario today which can lead to NettyTransceiver hanging indefinitely:
# Start up a NettyServer
# Initialize a NettyTransceiver and SpecificRequestor
# Execute an RPC to establish the connection/handshake with the server
# Shut down the server
# Immediately execute another RPC
After Step 4, NettyTransceiver will detect that the connection has been closed and call NettyTransceiver#disconnect(boolean, boolean, Throwable), which sets 'remote' to null, indicating to Requestor that the NettyTransceiver is now disconnected. However, if an RPC is executed just after the server has closed its socket (Step 5) and before disconnect() has been called, NettyTransceiver may still try to send this RPC because 'remote' has not yet been set to null. This race condition is normally OK because NettyTransceiver#getChannel() will detect that the socket has been closed and then try to reestablish the connection. Unfortunately, in this scenario getChannel() blocks forever when it attempts to acquire the write lock, because the read lock has been acquired twice rather than once as getChannel() expects. The read lock is acquired once by transceive(List<ByteBuffer>, Callback<List<ByteBuffer>>) and again by writeDataPack(NettyDataPack). The fix is fairly simple. With the fix, the writeDataPack(NettyDataPack) method (which is private) does not acquire the read lock; instead, its contract specifies that the read lock must be acquired before calling this method. This change prevents the read lock from being acquired more than once by any single thread. Another change is to have NettyTransceiver#isConnected() perform two checks instead of one: remote != null && isChannelReady(channel). This second change should allow NettyTransceiver to detect disconnect events more quickly.
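The hang can be modeled with a small readers/writer lock. This is a simplified Python model, not NettyTransceiver itself: the lock class, the Transceiver methods and the 0.2-second timeout are assumptions (the timeout only lets the model report the deadlock instead of hanging). _get_channel releases exactly one read hold before requesting the write lock, which succeeds only if the caller held the read lock once.

```python
import threading


class ReadWriteLock:
    """Minimal readers/writer lock that counts read holds (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        """Return True once no read holds or writer remain, False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._readers == 0 and not self._writer, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Transceiver:
    def __init__(self, buggy):
        self.lock = ReadWriteLock()
        self.buggy = buggy
        self.reconnected = False

    def transceive(self, data):
        self.lock.acquire_read()  # first (and, after the fix, only) read hold
        try:
            self._write_data_pack(data)
        finally:
            self.lock.release_read()

    def _write_data_pack(self, data):
        # Fixed contract: the caller must already hold the read lock.
        if self.buggy:
            self.lock.acquire_read()  # second read hold by the same thread
        try:
            self._get_channel()
        finally:
            if self.buggy:
                self.lock.release_read()

    def _get_channel(self):
        # The socket is closed: trade one read hold for the write lock and reconnect.
        self.lock.release_read()
        try:
            if self.lock.acquire_write(timeout=0.2):
                self.reconnected = True
                self.lock.release_write()
        finally:
            self.lock.acquire_read()
```

With buggy=True one read hold is still outstanding when the write lock is requested, so the request times out (and would block forever without a timeout); with buggy=False the reconnect succeeds.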
[jira] [Commented] (AVRO-975) Support RPC in C#
[ https://issues.apache.org/jira/browse/AVRO-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202978#comment-13202978 ] Doug Cutting commented on AVRO-975: --- Andrew, is this the complete implementation? If so, can anyone review it? The patches don't apply cleanly for me on Linux, perhaps due to EOL differences. Also, it would be great to be able to add this to the tests in share/test/interop/test_rpc_interop.sh. These use HTTP, though. Support RPC in C# - Key: AVRO-975 URL: https://issues.apache.org/jira/browse/AVRO-975 Project: Avro Issue Type: New Feature Components: csharp Affects Versions: 1.6.1 Reporter: Jeff Hammerbacher Attachments: 975.patch, buildtask.patch
[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas
[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203082#comment-13203082 ] Doug Cutting commented on AVRO-1006: Some notes:
- Primitive types may have attributes, e.g., {"type":"int", "java-class":"java.lang.Short"}, so only primitives without any attributes may be represented by their name alone.
- Attributes within JSON objects are not ordered. A correct JSON parser need not preserve ordering. Relying on order preservation may require some implementations to write their own JSON libraries.
- With multiple Avro implementations, the chance of an inconsistent canonicalization implementation is significant. Creating an adequate test suite and validating all implementations would require significant effort.
Given the above, I'd be hesitant to build a system that depends on consistent canonical schemas for correct operation. Folks who build systems that use Avro would thus be wise to design them to gracefully handle inconsistent canonicalization. For example, Avro's RPC handshake currently uses a fingerprint-like approach without requiring canonicalization. Two implementations that represent a schema using the same string will have more efficient handshakes, but implementations that produce different strings for equivalent schemas will still interoperate correctly. So a standard, recommended canonical form could be useful, but folks should perhaps not assume that every implementation is correct. I like the idea of a schema repository. A related idea I've had is to use something like a URL shortener. Instead of mapping url->url, it could map url->schema. One would register one's schema with the shortener, then hand out references. A shortener would, as an optimization, return the same ID for equivalent schemas. The shortener would need to rely on only a single canonicalization implementation: its own.
Fingerprints for Avro Schemas - Key: AVRO-1006 URL: https://issues.apache.org/jira/browse/AVRO-1006 Project: Avro Issue Type: New Feature Components: java Reporter: Raymie Stata Assignee: Raymie Stata Labels: features Attachments: schema-fingerprinting.html, schema-fingerprinting.html, schema-fingerprinting.html Add a function that returns a standardized, 64-bit fingerprint for schemas. Fingerprints are designed such that the chances of collisions are very, very low.
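The URL-shortener idea can be sketched as a registry that applies exactly one canonicalization (its own) and hands out the same ID for equivalent schemas. This minimal Python sketch assumes a simple canonical form: object keys sorted, compact separators, and an attribute-free primitive such as {"type": "int"} collapsed to "int", while a primitive carrying attributes like "java-class" stays an object. SchemaRegistry, canonical_key and fingerprint64 are hypothetical, and the 64-bit fingerprint is truncated SHA-256, a stand-in rather than any fingerprint Avro defines.

```python
import hashlib
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def canonicalize(schema):
    """Collapse attribute-free primitives like {"type": "int"} to "int", recursively."""
    if isinstance(schema, list):
        return [canonicalize(s) for s in schema]
    if isinstance(schema, dict):
        out = {k: canonicalize(v) for k, v in schema.items()}
        t = out.get("type")
        if set(out) == {"type"} and isinstance(t, str) and t in PRIMITIVES:
            return t
        return out
    return schema


def canonical_key(schema):
    """Sorted keys make JSON attribute order irrelevant."""
    return json.dumps(canonicalize(schema), sort_keys=True, separators=(",", ":"))


class SchemaRegistry:
    """Maps schemas to short integer IDs, as a URL shortener maps URLs."""

    def __init__(self):
        self._ids = {}      # canonical key -> id
        self._schemas = {}  # id -> schema as first registered

    def register(self, schema):
        key = canonical_key(schema)
        if key not in self._ids:
            schema_id = len(self._ids)
            self._ids[key] = schema_id
            self._schemas[schema_id] = schema
        return self._ids[key]

    def lookup(self, schema_id):
        return self._schemas[schema_id]


def fingerprint64(schema):
    """64-bit stand-in fingerprint: first 8 bytes of SHA-256 of the canonical key."""
    digest = hashlib.sha256(canonical_key(schema).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")
```

Because only the registry canonicalizes, clients that disagree about canonical form still interoperate; they may just receive different IDs for schemas the registry does not recognize as equivalent.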
[jira] [Commented] (AVRO-986) Avro files generated from avro-c dont work with the Java mapred implementation.
[ https://issues.apache.org/jira/browse/AVRO-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192387#comment-13192387 ] Doug Cutting commented on AVRO-986: --- share/test/data/syncInMeta.avro was just where I placed the quickstop.db file already attached to this issue. Avro files generated from avro-c dont work with the Java mapred implementation. --- Key: AVRO-986 URL: https://issues.apache.org/jira/browse/AVRO-986 Project: Avro Issue Type: Bug Components: c, java Environment: avro-c 1.6.2-SNAPSHOT avro-java 1.6.2-SNAPSHOT hadoop 0.20.2 Reporter: Michael Cooper Priority: Critical Labels: c, hadoop, java, mapreduce Fix For: 1.6.2 Attachments: 0001-Remove-sync-marker-from-metadata-in-header.patch, 0001-avromod-utility.patch, AVRO-986-java.patch, AVRO-986-java.patch, quickstop.db When a file generated from the Avro-C implementation is fed into Hadoop, it fails with "Block size invalid or too large for this implementation: -49". This is caused by the sync marker, namely the one that Avro-C puts into the header. The org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work out where it should read from, but this class is not particularly smart; it just divides the file into equal-size chunks, the first starting at position 0. So org.apache.avro.mapred.AvroRecordReader gets 0 as the start of its chunk, and calls {code:title=AvroRecordReader.java}reader.sync(split.getStart()); // sync to start{code} Then org.apache.avro.file.DataFileReader::seek() goes to 0 and searches for a sync marker. It encounters one at position 32: the one in the header metadata map, avro.sync. No other implementations add the sync marker to the metadata map, and none read it from there, not even the C version. I suggest we remove it from the header as the simplest solution. Another solution would be to create an AvroFileSplit class in mapred that knows where the blocks are and provides the correct locations in the first place.
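The failure can be reproduced with a toy byte layout. This Python sketch assumes a simplified file made of the magic bytes, an opaque metadata region, the 16-byte sync marker that ends the header, and blocks each followed by the marker; scan_to_sync and split_start are hypothetical names. A plain forward scan from offset 0 normally stops at the marker that terminates the header, but when a copy of the marker sits in the metadata (as with the avro.sync entry), the scan stops inside the header. Treating a split that starts at 0 as beginning right after the header is one possible workaround, not necessarily the committed fix.

```python
MAGIC = b"Obj\x01"
MARKER = bytes(range(16))  # fixed 16-byte sync marker so the example is reproducible


def scan_to_sync(data, position, marker=MARKER):
    """Offset just past the first marker at or after position, or len(data) if none
    (a simplified model of DataFileReader's sync)."""
    i = data.find(marker, position)
    return len(data) if i < 0 else i + len(marker)


def split_start(data, position, header_len, marker=MARKER):
    """Workaround sketch: a split starting at 0 begins right after the header,
    so the header (whose metadata may contain the marker) is never scanned."""
    if position == 0:
        return header_len
    return scan_to_sync(data, position, marker)


blocks = b"BLOCK1" + MARKER + b"BLOCK2" + MARKER
plain_header = MAGIC + b"meta;" + MARKER                          # marker only ends the header
c_header = MAGIC + b"meta:avro.sync=" + MARKER + b";" + MARKER  # extra copy in the metadata
plain_file = plain_header + blocks
c_file = c_header + blocks

print(scan_to_sync(plain_file, 0) == len(plain_header))  # True: lands on BLOCK1
print(scan_to_sync(c_file, 0) < len(c_header))           # True: lands inside the header
```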
[jira] [Commented] (AVRO-981) Python Avro library does not build/install on OS X
[ https://issues.apache.org/jira/browse/AVRO-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192539#comment-13192539 ] Doug Cutting commented on AVRO-981: --- "snappy and python-snappy are dependencies that are not included in the project, and should be" What do you mean by saying they should be included? Do you mean we should mention them in the top-level BUILD.txt, or something else? Do you still feel we should remove them from setup.py? Thanks! Python Avro library does not build/install on OS X -- Key: AVRO-981 URL: https://issues.apache.org/jira/browse/AVRO-981 Project: Avro Issue Type: Bug Components: python Affects Versions: 1.5.4, 1.6.1 Environment: Mac OS X 10.6.8, Python 2.5, 2.6, 2.7 Reporter: Russell Jurney Priority: Blocker Labels: avro, fun, happy, pants, python Fix For: 1.5.4, 1.6.1 Attachments: AVRO-981.patch russell-jurneys-macbook-pro:py rjurney$ sudo python2.5 setup.py install Password: running install running bdist_egg running egg_info writing requirements to avro.egg-info/requires.txt writing avro.egg-info/PKG-INFO writing top-level names to avro.egg-info/top_level.txt writing dependency_links to avro.egg-info/dependency_links.txt reading manifest file 'avro.egg-info/SOURCES.txt' writing manifest file 'avro.egg-info/SOURCES.txt' installing library code to build/bdist.macosx-10.6-i386/egg running install_lib running build_py creating build/bdist.macosx-10.6-i386 creating build/bdist.macosx-10.6-i386/egg creating build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/__init__.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/datafile.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/io.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/ipc.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/protocol.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/schema.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/tool.py - 
build/bdist.macosx-10.6-i386/egg/avro copying build/lib/avro/txipc.py - build/bdist.macosx-10.6-i386/egg/avro copying build/lib/pyAntTasks-1.3-LICENSE.txt - build/bdist.macosx-10.6-i386/egg copying build/lib/pyAntTasks-1.3.jar - build/bdist.macosx-10.6-i386/egg creating build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/__init__.py - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/_speedups.c - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/decoder.py - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/encoder.py - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/LICENSE.txt - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/scanner.py - build/bdist.macosx-10.6-i386/egg/simplejson copying build/lib/simplejson/tool.py - build/bdist.macosx-10.6-i386/egg/simplejson byte-compiling build/bdist.macosx-10.6-i386/egg/avro/__init__.py to __init__.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/datafile.py to datafile.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/io.py to io.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/ipc.py to ipc.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/protocol.py to protocol.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/schema.py to schema.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/tool.py to tool.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/avro/txipc.py to txipc.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/__init__.py to __init__.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/decoder.py to decoder.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/encoder.py to encoder.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/scanner.py to scanner.pyc byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/tool.py to tool.pyc creating build/bdist.macosx-10.6-i386/egg/EGG-INFO 
installing scripts to build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts running install_scripts running build_scripts creating build/scripts-2.5 copying and adjusting ./scripts/avro - build/scripts-2.5 changing mode of build/scripts-2.5/avro from 644 to 755 creating build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts copying build/scripts-2.5/avro - build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts changing mode of build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts/avro to 755 copying avro.egg-info/PKG-INFO - build/bdist.macosx-10.6-i386/egg/EGG-INFO copying avro.egg-info/SOURCES.txt - build/bdist.macosx-10.6-i386/egg/EGG-INFO copying
[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2
[ https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192567#comment-13192567 ] Doug Cutting commented on AVRO-995: --- Should we be concerned about updating versions of dependencies for non-bugfix reasons in a bugfix release? It's possible that updating one of our dependencies could break a project that depends on Avro. So it might be safer to simply update Jackson in 1.6.2 and save the rest of these for 1.7.0. Am I being too paranoid? Java: Update Dependencies for 1.6.2 --- Key: AVRO-995 URL: https://issues.apache.org/jira/browse/AVRO-995 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.2 Attachments: AVRO-995.patch A few of our dependencies need upgrading. In particular, I have been hit by a bug in Jackson that is fixed by the latest release (http://jira.codehaus.org/browse/JACKSON-462). Summary: I will submit a patch that updates everything to the next bugfix or very minor release, other than paranamer, thrift, and hadoop. Details: (using maven versions plugin)
On the dependency side:
[INFO] com.thoughtworks.paranamer:paranamer 2.3 -> 2.4-debug
  could not find info about what is new in 2.4. I do not think we should upgrade until we have more info
[INFO] net.sf.jopt-simple:jopt-simple 4.1 -> 4.3
  minor extra features (http://pholser.github.com/jopt-simple/changes.html)
[INFO] org.apache.hadoop:hadoop-core 0.20.205.0 -> 1.0.0
  renamed 0.20.205, no need to update yet.
[INFO] org.codehaus.jackson:jackson-mapper-asl 1.8.6 -> 1.9.3
  I suggest we upgrade to 1.8.7.
[INFO] org.jboss.netty:netty 3.2.6.Final -> 3.2.7.Final
  bugfix release
[INFO] org.apache.thrift:libthrift 0.7.0 -> 0.8.0
  Is this a minor / bugfix release? If so we should update, otherwise wait until Avro 1.7.x
[INFO] org.slf4j:slf4j-api 1.6.3 -> 1.6.4
[INFO] org.slf4j:slf4j-simple 1.6.3 -> 1.6.4
  Minor bugfixes (http://www.slf4j.org/news.html)
On the plugin side: (mvn versions:display-plugin-updates)
[INFO] maven-antrun-plugin 1.6 -> 1.7
  minor, looks safe: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWke1=w6nv2u85nkgqm0zxo3khyzdc8hazkhhvywjbuv...@mail.gmail.com%3E)
[INFO] maven-gpg-plugin 1.3 -> 1.4
  minor update (http://mail-archives.apache.org/mod_mbox/maven-announce/201108.mbox/%3CCA+nPnMw_3zQQCpzybQvo-QZFMCogvH31WEhxQnZ=cdzgxsr...@mail.gmail.com%3E)
[INFO] maven-checkstyle-plugin 2.6 -> 2.8
  we avoided 2.7 before for some reason: (http://mail-archives.apache.org/mod_mbox/maven-dev/201108.mbox/%3ccapoybqsvu+kup5vuce8rc6mjb9rykr2cpig+rvbe5o8teo6...@mail.gmail.com%3E)
  useful new feature: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C15365449.01320142181746.JavaMail.mark@MARK%3E)
[INFO] maven-surefire-plugin 2.10 -> 2.11
  lots of bug fixes and long requested new features: (http://mail-archives.apache.org/mod_mbox/maven-announce/201112.mbox/%3CCA+jQputH_uA2Ue6JqiHp1YeNo=qqxgcpdtgq9vv1aw_psqk...@mail.gmail.com%3E)
[INFO] maven-shade-plugin 1.4 -> 1.5
  minor looks safe: (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C1076639049.01320107865464.JavaMail.benson@tinfoilhat.local%3E)
[INFO] maven-archetype-plugin 2.1 -> 2.2
  http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWkeLyc-tA2NCh3xYR06W+eGiWd46fS=R=ngjon14zrd...@mail.gmail.com%3E
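The concern above amounts to a policy: a bugfix release should only take patch-level dependency updates. A minimal Python sketch of that policy, assuming versions of the form x.y.z with optional suffixes (such as ".Final" or "-debug") and treating a change confined to the third number as a bugfix update; numeric_parts and is_bugfix_update are hypothetical helpers. The version pairs come from the list above, with Jackson at the suggested 1.8.7 target.

```python
import re


def numeric_parts(version):
    """Leading dot-separated integers, e.g. '3.2.6.Final' -> (3, 2, 6)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def is_bugfix_update(old, new):
    """True if major and minor match and only the patch number increases."""
    a, b = numeric_parts(old), numeric_parts(new)
    return len(a) >= 3 and len(b) >= 3 and a[:2] == b[:2] and b > a


UPDATES = [
    ("jopt-simple", "4.1", "4.3"),
    ("hadoop-core", "0.20.205.0", "1.0.0"),
    ("jackson-mapper-asl", "1.8.6", "1.8.7"),
    ("netty", "3.2.6.Final", "3.2.7.Final"),
    ("libthrift", "0.7.0", "0.8.0"),
    ("slf4j-api", "1.6.3", "1.6.4"),
]
bugfix_only = [name for name, old, new in UPDATES if is_bugfix_update(old, new)]
print(bugfix_only)  # ['jackson-mapper-asl', 'netty', 'slf4j-api']
```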
[jira] [Commented] (AVRO-989) Java: Improve Builder performance in Specific API
[ https://issues.apache.org/jira/browse/AVRO-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192608#comment-13192608 ] Doug Cutting commented on AVRO-989: --- +1 for reducing the coupling. It's probably worthy of a separate issue, and it will probably subsume this issue. In addition to making record fields private, we might also change the default representation for strings to String (AVRO-803).
Java: Improve Builder performance in Specific API -- Key: AVRO-989 URL: https://issues.apache.org/jira/browse/AVRO-989 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Attachments: AVRO-989-v2.patch, AVRO-989.patch
The Specific API generates Builder objects for each record. This builder uses a boolean[] to store flags for each field to indicate whether the field is set or not. This is not space-efficient: a boolean[] takes 16 bytes plus one byte per field, rounded up to the nearest 8-byte interval. This can be improved on by using a BitSet for large records, and bitmasks on an int for records with fewer than 32 fields.
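The following is a minimal sketch of the proposed flag storage. It assumes a hypothetical `FieldSetFlags` class, which is not Avro's generated builder code. A single `int` holds one bit per field for records with up to 32 fields, and a `java.util.BitSet` takes over for larger records.

```java
import java.util.BitSet;

/**
 * Hypothetical helper (not part of Avro): records which builder fields have
 * been set. Uses one int as a bitmask for records with up to 32 fields and a
 * BitSet for larger records, instead of a boolean[].
 */
public final class FieldSetFlags {
  private final int numFields;
  private int mask;          // used when numFields <= 32: bit i == field i set
  private final BitSet bits; // used when numFields > 32

  public FieldSetFlags(int numFields) {
    this.numFields = numFields;
    this.bits = numFields > 32 ? new BitSet(numFields) : null;
  }

  public void set(int field) {
    checkIndex(field);
    if (bits == null) {
      mask |= 1 << field; // field is in [0, 31], so the shift is well defined
    } else {
      bits.set(field);
    }
  }

  public boolean isSet(int field) {
    checkIndex(field);
    return bits == null ? (mask & (1 << field)) != 0 : bits.get(field);
  }

  private void checkIndex(int field) {
    if (field < 0 || field >= numFields) {
      throw new IndexOutOfBoundsException("field " + field);
    }
  }

  public static void main(String[] args) {
    FieldSetFlags small = new FieldSetFlags(5);   // int bitmask path
    small.set(3);
    System.out.println(small.isSet(3) + " " + small.isSet(0)); // true false

    FieldSetFlags large = new FieldSetFlags(40);  // BitSet path
    large.set(35);
    System.out.println(large.isSet(35) + " " + large.isSet(34)); // true false
  }
}
```

For small records, the flags then cost one `int` field inside the builder rather than a separate array object with its own header.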
[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized
[ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192672#comment-13192672 ] Doug Cutting commented on AVRO-997: --- I think in current trunk you'd get "Unknown datum type: M", which is somewhat more informative than the 1.5.1 message. Maybe this could be improved to something like "Unknown datum for GenericData: M", to hint that specific, reflect, thrift, protobuf or some other data model might be better?
Union of enum and null cannot be serialized --- Key: AVRO-997 URL: https://issues.apache.org/jira/browse/AVRO-997 Project: Avro Issue Type: Bug Affects Versions: 1.5.1 Reporter: Aaron Kimball
I have a schema like:
{code}
[
  { "type": "enum", "name": "Gender", "symbols": ["M", "F"] },
  { "type" : "record", "name" : "Foo", "fields" : [
      { "type" : ["Gender", "null"], "name" : "gender" },
      ...
  ] }
]
{code}
I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
When I go to serialize this, I get:
{code}
Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
{code}
[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2
[ https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191624#comment-13191624 ] Doug Cutting commented on AVRO-995: --- Should we include this in 1.6.2 or not?
Java: Update Dependencies for 1.6.2 --- Key: AVRO-995 URL: https://issues.apache.org/jira/browse/AVRO-995 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.2 Attachments: AVRO-995.patch
A few of our dependencies need upgrading. In particular, I have been hit by a bug in Jackson that is fixed by the latest release (http://jira.codehaus.org/browse/JACKSON-462). Summary: I will submit a patch that updates everything to the next bugfix or very minor release, other than paranamer, thrift, and hadoop.
Details (using the maven versions plugin). On the dependency side:
[INFO] com.thoughtworks.paranamer:paranamer 2.3 -> 2.4-debug: could not find info about what is new in 2.4. I do not think we should upgrade until we have more info.
[INFO] net.sf.jopt-simple:jopt-simple 4.1 -> 4.3: minor extra features (http://pholser.github.com/jopt-simple/changes.html)
[INFO] org.apache.hadoop:hadoop-core 0.20.205.0 -> 1.0.0: 1.0.0 is a renamed 0.20.205, no need to update yet.
[INFO] org.codehaus.jackson:jackson-mapper-asl 1.8.6 -> 1.9.3: I suggest we upgrade to 1.8.7.
[INFO] org.jboss.netty:netty 3.2.6.Final -> 3.2.7.Final: bugfix release
[INFO] org.apache.thrift:libthrift 0.7.0 -> 0.8.0: Is this a minor / bugfix release? If so we should update; otherwise wait until Avro 1.7.x.
[INFO] org.slf4j:slf4j-api 1.6.3 -> 1.6.4
[INFO] org.slf4j:slf4j-simple 1.6.3 -> 1.6.4: minor bugfixes (http://www.slf4j.org/news.html)
On the plugin side (mvn versions:display-plugin-updates):
[INFO] maven-antrun-plugin 1.6 -> 1.7: minor, looks safe (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWke1=w6nv2u85nkgqm0zxo3khyzdc8hazkhhvywjbuv...@mail.gmail.com%3E)
[INFO] maven-gpg-plugin 1.3 -> 1.4: minor update (http://mail-archives.apache.org/mod_mbox/maven-announce/201108.mbox/%3CCA+nPnMw_3zQQCpzybQvo-QZFMCogvH31WEhxQnZ=cdzgxsr...@mail.gmail.com%3E)
[INFO] maven-checkstyle-plugin 2.6 -> 2.8: we avoided 2.7 before for some reason (http://mail-archives.apache.org/mod_mbox/maven-dev/201108.mbox/%3ccapoybqsvu+kup5vuce8rc6mjb9rykr2cpig+rvbe5o8teo6...@mail.gmail.com%3E); useful new feature (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C15365449.01320142181746.JavaMail.mark@MARK%3E)
[INFO] maven-surefire-plugin 2.10 -> 2.11: lots of bug fixes and long-requested new features (http://mail-archives.apache.org/mod_mbox/maven-announce/201112.mbox/%3CCA+jQputH_uA2Ue6JqiHp1YeNo=qqxgcpdtgq9vv1aw_psqk...@mail.gmail.com%3E)
[INFO] maven-shade-plugin 1.4 -> 1.5: minor, looks safe (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C1076639049.01320107865464.JavaMail.benson@tinfoilhat.local%3E)
[INFO] maven-archetype-plugin 2.1 -> 2.2: http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWkeLyc-tA2NCh3xYR06W+eGiWd46fS=R=ngjon14zrd...@mail.gmail.com%3E
[jira] [Commented] (AVRO-968) Avro C - avro_value_cmp_fast() may return garbage value for AVRO_STRING comparison
[ https://issues.apache.org/jira/browse/AVRO-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191626#comment-13191626 ] Doug Cutting commented on AVRO-968: --- This issue can be resolved as fixed, right?
Avro C - avro_value_cmp_fast() may return garbage value for AVRO_STRING comparison -- Key: AVRO-968 URL: https://issues.apache.org/jira/browse/AVRO-968 Project: Avro Issue Type: Bug Components: c Affects Versions: 1.6.1, 1.6.2, 1.7.0 Environment: All. Currently using gcc 4.6.1 on Ubuntu 11.10. Reporter: Vivek Nadkarni Priority: Minor Fix For: 1.6.2, 1.7.0 Attachments: 0001-AVRO-968.-C-Fixed-avro_value_cmp-on-string-values.patch, AVRO-968.patch Original Estimate: 24h Remaining Estimate: 24h
The compiler warns that variables may be used uninitialized in avro_value_cmp_fast():
/home/user/avro-trunk/lang/c/src/value.c: In function 'avro_value_cmp_fast':
/home/user/avro-trunk/lang/c/src/value.c:387:13: warning: 'size2' may be used uninitialized in this function [-Wuninitialized]
/home/user/avro-trunk/lang/c/src/value.c:387:13: warning: 'size1' may be used uninitialized in this function [-Wuninitialized]
/home/user/avro-trunk/lang/c/src/value.c:388:11: warning: 'buf1' may be used uninitialized in this function [-Wuninitialized]
/home/user/avro-trunk/lang/c/src/value.c:388:11: warning: 'buf2' may be used uninitialized in this function [-Wuninitialized]
Examining the file shows that the warnings are real: the variables size1, buf1, size2, and buf2 should be loaded before they are used. The simple fix is to copy the matching code from the function avro_value_equal_fast(). I will attach that code in an upcoming patch. Cheers, Vivek
[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized
[ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189289#comment-13189289 ] Doug Cutting commented on AVRO-997: --- SpecificData#isEnum() overrides GenericData#isEnum() and returns true for instances of Enum. So this should work. Also, TestSpecificDatumWriter#testResolveUnion writes and reads an instance of a record with a field whose type is a union of null and an enum using generated, specific code. So this appears to be tested. Aaron, can you provide a simple, complete test that fails? Also, what version of Avro are you using?
Union of enum and null cannot be serialized --- Key: AVRO-997 URL: https://issues.apache.org/jira/browse/AVRO-997 Project: Avro Issue Type: Bug Affects Versions: 1.5.1 Reporter: Aaron Kimball
I have a schema like:
{code}
[
  { "type": "enum", "name": "Gender", "symbols": ["M", "F"] },
  { "type" : "record", "name" : "Foo", "fields" : [
      { "type" : ["Gender", "null"], "name" : "gender" },
      ...
  ] }
]
{code}
I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
When I go to serialize this, I get:
{code}
Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
{code}
[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized
[ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189446#comment-13189446 ] Doug Cutting commented on AVRO-997: --- SpecificDatumWriter can write both generic and specific instances, while GenericDatumWriter is only intended to correctly write generic instances.
Union of enum and null cannot be serialized --- Key: AVRO-997 URL: https://issues.apache.org/jira/browse/AVRO-997 Project: Avro Issue Type: Bug Affects Versions: 1.5.1 Reporter: Aaron Kimball
I have a schema like:
{code}
[
  { "type": "enum", "name": "Gender", "symbols": ["M", "F"] },
  { "type" : "record", "name" : "Foo", "fields" : [
      { "type" : ["Gender", "null"], "name" : "gender" },
      ...
  ] }
]
{code}
I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
When I go to serialize this, I get:
{code}
Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
{code}
[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)
[ https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188830#comment-13188830 ] Doug Cutting commented on AVRO-991: --- +1 for user-specified sync markers. That should probably be a separate issue from the appended-stream tool.
Allow combining multiple Avro files within a stream. (no files on disk) --- Key: AVRO-991 URL: https://issues.apache.org/jira/browse/AVRO-991 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Reporter: Frank Grimes
It would be nice to be able to do as follows:
cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
or similarly
hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
See the following thread for details: http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%3cc08f1de9-97a8-4d28-b0ad-5e4a7f32f...@gmail.com%3e
[jira] [Commented] (AVRO-994) TestFileSpanStorage.testTonsOfSpans() fails on my slow VM
[ https://issues.apache.org/jira/browse/AVRO-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187219#comment-13187219 ] Doug Cutting commented on AVRO-994: --- +1 That's a better way to fix such things than just increasing the sleep time, as we have done before. Thanks!
TestFileSpanStorage.testTonsOfSpans() fails on my slow VM - Key: AVRO-994 URL: https://issues.apache.org/jira/browse/AVRO-994 Project: Avro Issue Type: Bug Reporter: James Baldassari Priority: Minor Attachments: AVRO-994.patch
{noformat}
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.554 sec <<< FAILURE!
testTonsOfSpans(org.apache.avro.ipc.trace.TestFileSpanStorage)  Time elapsed: 3.853 sec  <<< FAILURE!
java.lang.AssertionError: expected:<5> but was:<42356>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.avro.ipc.trace.TestFileSpanStorage.testTonsOfSpans(TestFileSpanStorage.java:70)
{noformat}
The issue seems to be the {{Thread.sleep(2000)}} on line 66. Doubling this to 4000 ms causes the test to pass. In general it might be better to make this wait event-based rather than using a fixed sleep time. If that isn't possible, then some sort of retry loop might work.
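The reporter suggests a retry loop as an alternative to a fixed `Thread.sleep(2000)`: poll the condition until it holds or a generous timeout expires. The sketch below is a minimal version of that idea. It assumes a hypothetical `WaitUtil.waitFor` helper, which is not part of Avro or JUnit.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition instead of sleeping for a fixed time. */
public final class WaitUtil {
  private WaitUtil() {}

  /**
   * Checks {@code condition} every {@code pollMillis} ms until it returns true
   * or {@code timeoutMillis} ms have elapsed. Returns whether it became true.
   */
  public static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;                      // condition met: stop waiting early
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;                     // overflow-safe deadline check
      }
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // This condition becomes true roughly 100 ms after start.
    boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 100, 5_000, 10);
    System.out.println(ok); // true
  }
}
```

In the test, the fixed sleep and the `assertEquals` that follows it could become a single `assertTrue` on `waitFor`, with a condition that checks the expected span count. A fast machine then continues as soon as the spans are written, while a slow VM gets up to the full timeout.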
[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)
[ https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187227#comment-13187227 ] Doug Cutting commented on AVRO-991: --- For the record, the thinking behind the varied sync marker is that it makes collisions less likely. In theory this is not true, but in practice my concern was that, once a value was fixed and known, there'd be a significantly higher probability that someone would include it in some data. Perhaps that's not correct, though. As for expanding the spec, as I mentioned above, we can do that at present, since the file's magic number can never be the start of a valid block. So if a block ever starts with the magic number, then a reader could assume that it's an appended file. It's perhaps not the way one would design an appendable format from scratch, but I think it's workable.
Allow combining multiple Avro files within a stream. (no files on disk) --- Key: AVRO-991 URL: https://issues.apache.org/jira/browse/AVRO-991 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Reporter: Frank Grimes
It would be nice to be able to do as follows:
cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
or similarly
hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
See the following thread for details: http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%3cc08f1de9-97a8-4d28-b0ad-5e4a7f32f...@gmail.com%3e
[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)
[ https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187346#comment-13187346 ] Doug Cutting commented on AVRO-991: --- On second thought, I don't think we ought to add this to the spec. I think a tool that can read appended streams and write a single file would be useful, but I don't think we should require every implementation to be able to parse appended files. That would be an incompatible change and, as Scott points out, would also create files that are difficult to split. I also think Scott's idea of permitting user-specified sync markers could be useful.
Allow combining multiple Avro files within a stream. (no files on disk) --- Key: AVRO-991 URL: https://issues.apache.org/jira/browse/AVRO-991 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Reporter: Frank Grimes
It would be nice to be able to do as follows:
cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
or similarly
hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
See the following thread for details: http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%3cc08f1de9-97a8-4d28-b0ad-5e4a7f32f...@gmail.com%3e
[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)
[ https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185746#comment-13185746 ] Doug Cutting commented on AVRO-991: --- I think this would work. We'd need to be able to distinguish the start of a block from the start of the next file. A block starts with the count of items in it, encoded as a variable-length zig-zag-encoded long. A file starts with ASCII 'O'. Interpreted as a variable-length zig-zag-encoded long, this is -40, which is an invalid item count. So when the item count is -40, a DataFileStream would need to try to read a file header and, if its schema is compatible, update its sync marker and codec and keep reading.
Allow combining multiple Avro files within a stream. (no files on disk) --- Key: AVRO-991 URL: https://issues.apache.org/jira/browse/AVRO-991 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Reporter: Frank Grimes
It would be nice to be able to do as follows:
cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
or similarly
hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
See the following thread for details: http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%3cc08f1de9-97a8-4d28-b0ad-5e4a7f32f...@gmail.com%3e
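The -40 figure can be checked with a small decoder for zig-zag variable-length longs, written from the Avro binary-encoding rules. `ZigZagDemo` is an illustrative name and is not Avro's `BinaryDecoder`.

```java
/** Illustrative decoder for Avro's zig-zag variable-length longs (not Avro's BinaryDecoder). */
public final class ZigZagDemo {
  /** Decodes one zig-zag varint long from the start of buf. */
  static long readVarLong(byte[] buf) {
    long raw = 0;
    int shift = 0;
    for (byte b : buf) {
      raw |= (long) (b & 0x7F) << shift;  // low 7 bits of each byte carry data
      if ((b & 0x80) == 0) {              // high bit clear: this is the last byte
        return (raw >>> 1) ^ -(raw & 1);  // undo the zig-zag mapping
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }

  public static void main(String[] args) {
    // Every Avro data file starts with ASCII 'O' (0x4F = 79). Its high bit is
    // clear, so a block reader would treat it as a complete one-byte varint.
    long count = readVarLong(new byte[] {'O'});
    System.out.println(count); // -40: a negative, hence invalid, block item count
  }
}
```

Odd raw values map to negative numbers under zig-zag, so 79 decodes to -(79 + 1) / 2 = -40. That is why a reader can use this value to recognize the start of an appended file.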
[jira] [Commented] (AVRO-839) Implement builder pattern in generated record classes that sets default values when omitted
[ https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13183595#comment-13183595 ] Doug Cutting commented on AVRO-839: ---
> we could add the methods back as deprecated, with a default implementation and some javadoc so that other folks won't be confused by the change and code with or without an override on these will still work. Then in 1.7.x we can drop it and there will be a better paper trail for others to follow on where the new non-deprecated versions are.
+1 We should adopt this as a standard process for incompatible changes to public APIs.
Implement builder pattern in generated record classes that sets default values when omitted --- Key: AVRO-839 URL: https://issues.apache.org/jira/browse/AVRO-839 Project: Avro Issue Type: Improvement Components: java Reporter: James Baldassari Assignee: James Baldassari Fix For: 1.6.0 Attachments: AVRO-839-v2.patch, AVRO-839-v3.patch, AVRO-839-v4.patch, AVRO-839-v4.patch, AVRO-839-v5.patch, AVRO-839.patch, AVRO-839.patch, AVRO-839.patch
This is an idea for an improvement to the SpecificCompiler-generated record classes. There are two main issues to address: # Default values specified in schemas are only used at read time, not when writing/serializing records. For example, a NullPointerException is thrown when attempting to write a record that has an uninitialized array or string type. I'm sure this was done for good reasons, like giving users maximum control and preventing unnecessary garbage collection, but I think it's also somewhat confusing and unintuitive for new users (myself included). # Users have to create their own factory classes/methods for every record type, both to ensure that all non-primitive members are initialized and to facilitate the construction and initialization of record instances (i.e. constructing and setting values in a single statement).
These issues have been discussed previously here: * [http://search-hadoop.com/m/iDVTn1JVeSR1] * AVRO-726 * AVRO-770 * [http://search-hadoop.com/m/JuY1V16pwxh1] I'd like to propose a solution that is used by at least one other messaging framework. For each generated record class there will be a public static inner class called Builder. The Builder inner class has the same fields as the record class, as well as accessors and mutators for each of these fields. Whenever a mutator method is called, the Builder sets a boolean flag indicating that the field has been set. All mutators return a reference to 'this', so it's possible to chain a series of setter invocations, which makes it really easy to construct records in a single statement. The Builder also has a build() method which constructs a record instance using the values that were set in the Builder. When the build() method is invoked, if there are any fields that have not been set but have default values as defined in the schema, the Builder will set the values of these fields using their defaults. One nice thing about implementing the builder pattern in a static inner Builder class rather than in the record itself is that this enhancement will be completely backwards-compatible with existing code. The record class itself would not change, and the public fields would still be there, so existing code would still work. Users would have the option to use the Builder or continue constructing records manually. Eventually the public fields could be phased out, and the record would be made immutable. All changes would have to be done through the Builder. 
Here is an example of what this might look like:
{code}
// Person.newBuilder() returns a new Person.Builder instance
// All Person.Builder setters return 'this', allowing us to chain set calls together for convenience
// Person.Builder.build() returns a Person instance after setting any uninitialized values that have defaults
Person me = Person.newBuilder().setName("James").setCountry("US").setState("MA").build();

// We still have direct access to Person's members, so the records are backwards-compatible
me.state = "CA";

// Person has accessor methods now so that the public fields can be phased out later
System.out.println(me.getState());

// No NPE here because the array<Person> field that stores this person's friends has been automatically
// initialized by the Builder to a new java.util.ArrayList<Person> due to a @java_class annotation in the IDL
System.out.println(me.getFriends().size());
{code}
What do people think about this approach? Any other ideas?
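Below is a hand-written sketch of what such a class might look like. It assumes a hypothetical `Person` schema in which `country` defaults to "US" and `friends` defaults to an empty array. It is not the code SpecificCompiler actually emits; it only shows the set-flag and default-filling mechanics of `build()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a generated record with a static inner Builder.
 * Assumed schema defaults: country = "US", friends = [] (empty array).
 */
public class Person {
  public String name;          // public fields kept for backwards compatibility
  public String country;
  public String state;
  public List<Person> friends;

  public String getState() { return state; }
  public List<Person> getFriends() { return friends; }

  public static Builder newBuilder() { return new Builder(); }

  public static class Builder {
    private String name;
    private String country;
    private String state;
    private List<Person> friends;
    // One flag per field, set whenever the corresponding mutator is called.
    private final boolean[] fieldSetFlags = new boolean[4];

    public Builder setName(String v) { name = v; fieldSetFlags[0] = true; return this; }
    public Builder setCountry(String v) { country = v; fieldSetFlags[1] = true; return this; }
    public Builder setState(String v) { state = v; fieldSetFlags[2] = true; return this; }
    public Builder setFriends(List<Person> v) { friends = v; fieldSetFlags[3] = true; return this; }

    /** Copies set values and fills unset fields from their (assumed) schema defaults. */
    public Person build() {
      Person p = new Person();
      p.name = name;   // no default assumed; a real builder might reject an unset value
      p.country = fieldSetFlags[1] ? country : "US";
      p.state = state; // no default assumed
      p.friends = fieldSetFlags[3] ? friends : new ArrayList<Person>();
      return p;
    }
  }
}
```

Because the record keeps its public fields and gains only accessors plus a nested class, existing code that constructs records manually keeps compiling, as the proposal intends.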
[jira] [Commented] (AVRO-986) Avro files generated from avro-c dont work with the Java mapred implementation.
[ https://issues.apache.org/jira/browse/AVRO-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174936#comment-13174936 ] Doug Cutting commented on AVRO-986: --- +1 This patch sounds like the right way to fix this to me. If we were to instead fix this in Java, then I don't think we should try to make the splitter smarter, since splitting is single-threaded and that's not scalable. Rather, we should make sync(0) skip over the metadata. But there probably shouldn't be any sync markers in the metadata anyway...
Avro files generated from avro-c dont work with the Java mapred implementation. --- Key: AVRO-986 URL: https://issues.apache.org/jira/browse/AVRO-986 Project: Avro Issue Type: Bug Components: c, java Environment: avro-c 1.6.2-SNAPSHOT avro-java 1.6.2-SNAPSHOT hadoop 0.20.2 Reporter: Michael Cooper Priority: Critical Labels: c, hadoop, java, mapreduce Attachments: 0001-Remove-sync-marker-from-metadata-in-header.patch
When a file generated by the Avro-C implementation is fed into Hadoop, it fails with "Block size invalid or too large for this implementation: -49". This is caused by the sync marker, namely the one that Avro-C puts into the header... The org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work out where it should read from, but this class is not particularly smart: it just divides the file into equal-size chunks, the first starting at position 0. So org.apache.avro.mapred.AvroRecordReader gets 0 as the start of its chunk, and calls {code:title=AvroRecordReader.java}reader.sync(split.getStart()); // sync to start{code} Then org.apache.avro.file.DataFileReader::seek() goes to 0 and searches for a sync marker. It encounters one at position 32: the one in the header metadata map, under avro.sync. No other implementations add the sync marker to the metadata map, and none read it from there, not even the C version. I suggest we remove it from the header as the simplest solution.
Another solution would be to create an AvroFileSplit class in mapred that knows where the blocks are, and provides the correct locations in the first place.
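To illustrate why a copy of the marker in the header confuses a split that starts at 0, here is a simplified forward scan. It is not DataFileReader's implementation. The byte layout is hypothetical: the 16-byte marker appears both inside the header metadata (offset 32, as in the report) and at an assumed real end of the header (offset 100).

```java
import java.util.Arrays;

/**
 * Simplified sketch (not Avro's DataFileReader): scan forward from a split's
 * start offset for the first 16-byte sync marker and return the position just
 * after it, where block reading would begin.
 */
public final class SyncScan {
  static final int SYNC_SIZE = 16;

  /** Returns the offset just past the first marker at or after start, or -1. */
  static int syncAfter(byte[] file, byte[] marker, int start) {
    for (int i = start; i + SYNC_SIZE <= file.length; i++) {
      if (Arrays.equals(Arrays.copyOfRange(file, i, i + SYNC_SIZE), marker)) {
        return i + SYNC_SIZE;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] marker = new byte[SYNC_SIZE];
    for (int i = 0; i < SYNC_SIZE; i++) {
      marker[i] = (byte) (0xA0 + i);                  // arbitrary non-zero marker bytes
    }
    byte[] file = new byte[200];                       // hypothetical file image
    file[0] = 'O'; file[1] = 'b'; file[2] = 'j'; file[3] = 1; // magic number
    System.arraycopy(marker, 0, file, 32, SYNC_SIZE);  // copy stored in header metadata
    System.arraycopy(marker, 0, file, 100, SYNC_SIZE); // marker that really ends the header
    System.out.println(syncAfter(file, marker, 0));    // 48, not 116
  }
}
```

A reader that starts decoding at offset 48 would interpret the remaining header bytes as a block count and size, which fits the bogus block size in the report. Removing the copy from the metadata, as the patch does, makes the first match the real end of the header.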
[jira] [Commented] (AVRO-570) python implementation of mapreduce connector
[ https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173450#comment-13173450 ] Doug Cutting commented on AVRO-570: --- I don't seem to have avro installed in /usr/lib/python. The tests you describe above find the version in my build directory, as expected. The scripts in /tmp look fine. Yet I still see:
{code}
[py-test] ./home/cutting/src/avro/trunk/lang/py/build/src/avro/tether/tether_task_runner.py:24: RuntimeWarning: Parent module 'avro.tether' not found while handling absolute import
[py-test]   from avro import tether
{code}
python implementation of mapreduce connector Key: AVRO-570 URL: https://issues.apache.org/jira/browse/AVRO-570 Project: Avro Issue Type: New Feature Components: python Affects Versions: 1.6.0 Reporter: Doug Cutting Assignee: Jeremy Lewi Priority: Critical Labels: hadoop Fix For: 1.7.0 Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch
AVRO-512 defines protocols for implementing mapreduce tasks. It would be good to have a Python implementation of this.
[jira] [Commented] (AVRO-982) NettyTransceiver: can hang on connection interruption
[ https://issues.apache.org/jira/browse/AVRO-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170626#comment-13170626 ] Doug Cutting commented on AVRO-982: --- It would be great to have a test that this fixes. I tried some simple changes to TestNettyServerWithCallbacks to reproduce the problem and could not. Can you devise a test?
NettyTransceiver: can hang on connection interruption - Key: AVRO-982 URL: https://issues.apache.org/jira/browse/AVRO-982 Project: Avro Issue Type: Improvement Components: java Reporter: Bruno Dumon Priority: Minor Attachments: AVRO-982.patch
When stopping my Avro server, I noticed that my Avro client was hanging. This makes it impossible for my client to retry the operation, as it hangs inside the Avro code:
{noformat}
"pool-2-thread-1" prio=10 tid=0x7fc66840e800 nid=0x75fc waiting on condition [0x7fc674176000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0007d7471bd0> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
    at org.apache.avro.ipc.CallFuture.get(CallFuture.java:116)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:106)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
{noformat}
In a similar situation elsewhere in the NettyTransceiver (method exceptionCaught), the pending requests are canceled. It seems appropriate to do that on closed connections as well. I'll attach a patch.
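Here is a minimal sketch of the proposed behavior. It assumes a hypothetical `PendingCalls` registry: when the channel closes, every outstanding future is failed, so callers blocked in `get()` wake up with an error instead of hanging. It uses `java.util.concurrent.CompletableFuture` in place of Avro's `CallFuture` and its `CountDownLatch`, so it illustrates the idea rather than the attached patch.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

/** Hypothetical registry of in-flight requests (not NettyTransceiver's code). */
public final class PendingCalls {
  private final Map<Integer, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

  /** Registers a request with the given id and returns the future the caller waits on. */
  public CompletableFuture<byte[]> send(int id) {
    CompletableFuture<byte[]> f = new CompletableFuture<>();
    pending.put(id, f);
    return f;
  }

  /** Completes the matching request when its response arrives. */
  public void onResponse(int id, byte[] body) {
    CompletableFuture<byte[]> f = pending.remove(id);
    if (f != null) {
      f.complete(body);
    }
  }

  /** Called when the channel closes: fail every request still waiting. */
  public void onChannelClosed() {
    IOException cause = new IOException("Connection closed");
    for (Integer id : pending.keySet()) {  // weakly consistent iteration; removal is safe
      CompletableFuture<byte[]> f = pending.remove(id);
      if (f != null) {
        f.completeExceptionally(cause);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PendingCalls calls = new PendingCalls();
    CompletableFuture<byte[]> f = calls.send(1);
    calls.onChannelClosed();
    try {
      f.get(); // without onChannelClosed() this would block forever
    } catch (ExecutionException e) {
      System.out.println("failed: " + e.getCause().getMessage()); // failed: Connection closed
    }
  }
}
```

The caller then sees an exception it can handle, for example by retrying against a restarted server, which the reporter could not do while the thread was parked.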
[jira] [Commented] (AVRO-724) C implementation does not write datum values that are larger than the memory write buffer (currently 16K)
[ https://issues.apache.org/jira/browse/AVRO-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168536#comment-13168536 ] Doug Cutting commented on AVRO-724: --- The ideal solution would be to have fixed-length block header fields but that would require a change to the spec. This makes sense. To do this we'd probably want to increment the file format's magic number, i.e., from {'O','b','j',1} to {'O','b','j',2}. And it would be best to update all implementations to read the new format before making it the default for any implementation. C implementation does not write datum values that are larger than the memory write buffer (currently 16K) - Key: AVRO-724 URL: https://issues.apache.org/jira/browse/AVRO-724 Project: Avro Issue Type: Bug Components: c Affects Versions: 1.4.1 Reporter: Jeremy Hinegardner The current C implementation does not allow for datum values greater than 16K. The {{avro_file_writer_append}} function flushes blocks to disk over time, but does not handle the case of a single datum that is larger than {{avro_file_writer_t.datum_buffer}}. This is noted in the source code:
{code:title=datafile.c:294-313}
int avro_file_writer_append(avro_file_writer_t w, avro_datum_t datum)
{
    int rval;
    if (!w || !datum) {
        return EINVAL;
    }
    rval = avro_write_data(w->datum_writer, w->writers_schema, datum);
    if (rval) {
        check(rval, file_write_block(w));
        rval = avro_write_data(w->datum_writer, w->writers_schema, datum);
        if (rval) {
            /* TODO: if the datum encoder larger than our buffer,
               just write a single large datum */
            return rval;
        }
    }
    w->block_count++;
    return 0;
}
{code}
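The TODO in avro_file_writer_append describes the fallback: flush the current block, and if the datum still does not fit in an empty buffer, write it as a block on its own. Here is a minimal sketch of that control flow, written in Java like the other examples in this thread rather than in C. BlockWriter and its methods are hypothetical, and the block framing is deliberately simplified; it is neither the Avro C API nor the exact data-file encoding.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of the fallback named in the TODO; this is not the
// Avro C API, and the block framing is simplified (see writeBlock).
class BlockWriter {
  private final OutputStream out;
  private final byte[] buffer;
  private int used = 0;        // bytes buffered for the current block
  private int blockCount = 0;  // datums buffered for the current block
  private int blocksWritten = 0;

  BlockWriter(OutputStream out, int bufferSize) {
    this.out = out;
    this.buffer = new byte[bufferSize];
  }

  /** Appends one already-encoded datum. */
  void append(byte[] datum) throws IOException {
    if (used + datum.length > buffer.length) {
      flushBlock(); // make room by writing out what is buffered so far
    }
    if (datum.length > buffer.length) {
      // Still does not fit in an empty buffer: write it as a block of its own.
      writeBlock(1, datum, datum.length);
      return;
    }
    System.arraycopy(datum, 0, buffer, used, datum.length);
    used += datum.length;
    blockCount++;
  }

  /** Writes the buffered datums, if any, as one block. */
  void flushBlock() throws IOException {
    if (blockCount > 0) {
      writeBlock(blockCount, buffer, used);
      used = 0;
      blockCount = 0;
    }
  }

  int blocksWritten() {
    return blocksWritten;
  }

  // Simplified framing: 4-byte count, 4-byte size, then the bytes. Real Avro
  // data files use zig-zag variable-length longs and a sync marker per block.
  private void writeBlock(int count, byte[] data, int length) throws IOException {
    DataOutputStream dataOut = new DataOutputStream(out);
    dataOut.writeInt(count);
    dataOut.writeInt(length);
    dataOut.write(data, 0, length);
    dataOut.flush();
    blocksWritten++;
  }
}
```

In this sketch an oversized datum simply becomes a block with a count of one, so ordinary datums keep using the buffered path and the framing itself does not change.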
[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2
[ https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167925#comment-13167925 ] Doug Cutting commented on AVRO-593: --- Garrett, I just glanced at this and it looks great! You've factored things so that much of the code is shared between the 'mapred' and 'mapreduce' implementations. The stuff in the 'file' and 'io' packages should probably be renamed. Currently the 'io' and 'file' packages are in the main avro jar, which does not require Hadoop. I think it's best not to split packages across multiple jars, and these classes depend on Hadoop, so they probably belong in the avro-mapred jar. Perhaps they should be renamed 'org.apache.avro.mapred.{io,file}'? Also, do you intend this code to be contributed to Apache Avro? (I ask as a legal formality.) Avro mapreduce apis incompatible with hadoop 0.20.2 --- Key: AVRO-593 URL: https://issues.apache.org/jira/browse/AVRO-593 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.2, 1.3.3 Environment: Avro 1.3.3, Hadoop 0.20.2 Reporter: Steve Severance Attachments: AVRO-593.patch The Avro APIs for Hadoop use the Hadoop mapreduce API that has been deprecated. A new Avro mapreduce API should be implemented for Hadoop 0.20 and higher.
[jira] [Commented] (AVRO-971) IDL Import from project classpath
[ https://issues.apache.org/jira/browse/AVRO-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164881#comment-13164881 ] Doug Cutting commented on AVRO-971: --- In idl.jj, shouldn't we update ImportSchema() too? Also, it'd be good to add a test for this. IDL Import from project classpath - Key: AVRO-971 URL: https://issues.apache.org/jira/browse/AVRO-971 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Environment: Maven java projects Reporter: Victor Chau Priority: Minor Labels: patch Attachments: ImportFromClassPath.patch Currently, it looks like the only option for importing another schema in IDL is to place the file being imported in the same directory as the importing avdl. In a setup where avdl files are spread among several Maven projects owned by different teams, this is logistically difficult to manage. When using the avro-maven-plugin, I would like to be able to just create a dependency from my project on another jar that contains the avdl I want to import, and have Avro be smart enough to look for it in the classpath of the project containing the avdl when compiling my avdl. Attached is a working patch that will:
1. Change the IDLProtocolMojo class to look up the current project's classpath and create a new ClassLoader.
2. Give the Idl compiler class the ClassLoader before parsing the avdl.
3. If the Idl class encounters an import that it cannot resolve to the local directory while parsing, try to use the ClassLoader to load the file being imported.
The patch spans the Avro 1.6.1 tag of the Java avro, avro-compiler, and avro-maven-plugin projects.
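The lookup order in step 3 of the patch (local directory first, then the project classpath) can be sketched as follows. ImportResolver and its methods are hypothetical names for illustration, not the Idl or IDLProtocolMojo API; fromClasspathElements stands in for step 1, assuming the Mojo can supply the project classpath as a list of directory or jar paths.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical helper illustrating the patch's lookup order; it is not the
// Idl or IDLProtocolMojo API.
class ImportResolver {
  private final File baseDir;
  private final ClassLoader classLoader;

  ImportResolver(File baseDir, ClassLoader classLoader) {
    this.baseDir = baseDir;
    this.classLoader = classLoader;
  }

  /** Step 1: build a ClassLoader over the project's classpath elements (dirs or jars). */
  static ClassLoader fromClasspathElements(List<String> elements) throws IOException {
    URL[] urls = new URL[elements.size()];
    for (int i = 0; i < urls.length; i++) {
      urls[i] = new File(elements.get(i)).toURI().toURL();
    }
    return new URLClassLoader(urls, ImportResolver.class.getClassLoader());
  }

  /** Step 3: try the importing file's directory first, then the ClassLoader. */
  InputStream open(String importName) throws IOException {
    File local = new File(baseDir, importName);
    if (local.isFile()) {
      return new FileInputStream(local);
    }
    InputStream resource = classLoader.getResourceAsStream(importName);
    if (resource == null) {
      throw new FileNotFoundException(importName + " not found in " + baseDir + " or on the classpath");
    }
    return resource;
  }
}
```

Because ClassLoader.getResourceAsStream searches jars as well as directories, an .avdl packaged in a dependency jar is found the same way as one in a classpath directory.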
[jira] [Commented] (AVRO-969) Make possible usage of SpecificDatumWriter in avro-mapred
[ https://issues.apache.org/jira/browse/AVRO-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163776#comment-13163776 ] Doug Cutting commented on AVRO-969: --- AVRO-966 is a bug. I supplied a patch there. SpecificDatumWriter might be a bit more efficient. Reflection is not in general used for specific objects though, even when ReflectDatumReader is used. Rather ReflectDatumReader detects the specific object and uses SpecificDatumReader, but that detection adds a small cost. Make possible usage of SpecificDatumWriter in avro-mapred - Key: AVRO-969 URL: https://issues.apache.org/jira/browse/AVRO-969 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.1 Reporter: Vyacheslav Zholudev I realized that ReflectDatumWriter is always used when running a mapred job (in AvroOutputFormat.java). Sometimes this leads to bugs like AVRO-966. Why not just provide a property like {{WRITER_IS_REFLECT = "avro.map.writer.is.reflect";}} to decide which DatumWriter should be used? I created a small patch to solve this:
{code:title=avro-mapred.patch|borderStyle=solid}
Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java (revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java (revision )
@@ -53,6 +53,8 @@
   /** The configuration key for reflection-based map output representation. */
   public static final String MAP_OUTPUT_IS_REFLECT = "avro.map.output.is.reflect";
 
+  public static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";
+
   /** Configure a job's map input schema. */
   public static void setInputSchema(JobConf job, Schema s) {
     job.set(INPUT_SCHEMA, s.toString());
Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision )
@@ -23,6 +23,7 @@
 import java.util.Map;
 import java.net.URLDecoder;
 
+import org.apache.avro.specific.SpecificDatumWriter;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
@@ -102,8 +103,9 @@
       ? AvroJob.getMapOutputSchema(job)
       : AvroJob.getOutputSchema(job);
 
-    final DataFileWriter<T> writer =
-      new DataFileWriter<T>(new ReflectDatumWriter<T>());
+    final DataFileWriter<T> writer = job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) ?
+      new DataFileWriter<T>(new ReflectDatumWriter<T>()) :
+      new DataFileWriter<T>(new SpecificDatumWriter<T>());
 
     configureDataFileWriter(writer, job);
{code}
Does it make sense?
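The detection Doug describes (recognize a specific object and take the specific path instead of reflection, at the price of a type check per datum) looks roughly like this. It is a hypothetical sketch: PositionalRecord stands in for a generated record that exposes its fields by position and is not Avro's SpecificRecord interface, and FieldReader is not an Avro class.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated (specific) record that exposes its
// fields by position; not Avro's SpecificRecord interface itself.
interface PositionalRecord {
  Object get(int position);
}

final class FieldReader {
  /** Reads the named fields: direct path for PositionalRecord, reflection otherwise. */
  static List<Object> read(Object datum, List<String> fieldNames) {
    List<Object> values = new ArrayList<>();
    if (datum instanceof PositionalRecord) { // the detection step: one instanceof per datum
      PositionalRecord record = (PositionalRecord) datum;
      for (int i = 0; i < fieldNames.size(); i++) {
        values.add(record.get(i));
      }
      return values;
    }
    try {
      for (String name : fieldNames) {
        Field field = datum.getClass().getDeclaredField(name);
        field.setAccessible(true);
        values.add(field.get(datum));
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot read fields of " + datum.getClass(), e);
    }
    return values;
  }
}

// Example types: a plain class read reflectively, and a "specific" one.
final class ReflectPoint {
  private final int x, y;
  ReflectPoint(int x, int y) { this.x = x; this.y = y; }
}

final class SpecificPoint implements PositionalRecord {
  private final int x, y;
  SpecificPoint(int x, int y) { this.x = x; this.y = y; }
  @Override public Object get(int position) { return position == 0 ? x : y; }
}
```

The instanceof test is the small extra cost mentioned in the comment; the reflective branch is only taken for objects that are not specific records.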
[jira] [Commented] (AVRO-965) Enhance the IDL parser to allow properties for protocols and messages
[ https://issues.apache.org/jira/browse/AVRO-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163820#comment-13163820 ] Doug Cutting commented on AVRO-965: --- The properties might instead be parsed in ProtocolBody. E.g. that might look something like: {code} { IMPORT ... { } | ( SchemaProperty(props) )* ( NamedSchemaDeclaration(props) { ... } | MessageDeclaration(props) { ... } } {code} Does that make sense? Enhance the IDL parser to allow properties for protocols and messages - Key: AVRO-965 URL: https://issues.apache.org/jira/browse/AVRO-965 Project: Avro Issue Type: Improvement Reporter: George Fletcher Priority: Minor Enhance the IDL parser to support arbitrary properties for protocol and message types. This will allow for attaching metadata to a protocol or message and can be used for versioning and in some cases language annotations. This was partly discussed as part of JIRA ticket 886 https://issues.apache.org/jira/browse/AVRO-886
[jira] [Commented] (AVRO-969) Make possible usage of SpecificDatumWriter in avro-mapred
[ https://issues.apache.org/jira/browse/AVRO-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163006#comment-13163006 ] Doug Cutting commented on AVRO-969: --- ReflectDatumWriter should be able to correctly write a superset of the types that SpecificDatumWriter can write, so this property should not be needed. That said, it might be good to be able to override the DatumWriter and/or DatumReader classes used by Avro's mapred API. This might permit, e.g., ThriftDatumWriter to be used. So the patch that might be best is to switch to using avro.input.datumReader, avro.map_output.datumWriter, and avro.output.datumWriter properties that name classes whose constructor accepts a schema parameter.
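The alternative suggested in the comment, naming the DatumWriter class in a job property and constructing it from the schema, can be sketched with plain reflection. Only the property name avro.output.datumWriter comes from the comment, and it is a proposal rather than an existing key; SchemaWriter, the two writer classes and WriterFactory are hypothetical, and the schema is passed as a JSON string to keep the sketch self-contained.

```java
import java.util.Map;

// Hypothetical writer abstraction standing in for Avro's DatumWriter; the
// schema is a plain JSON string here to keep the sketch self-contained.
interface SchemaWriter {
  String schema();
}

final class ReflectStyleWriter implements SchemaWriter {
  private final String schema;
  ReflectStyleWriter(String schema) { this.schema = schema; }
  @Override public String schema() { return schema; }
}

final class ThriftStyleWriter implements SchemaWriter {
  private final String schema;
  ThriftStyleWriter(String schema) { this.schema = schema; }
  @Override public String schema() { return schema; }
}

final class WriterFactory {
  // Property name taken from the comment; a proposal, not an existing key.
  static final String OUTPUT_DATUM_WRITER = "avro.output.datumWriter";

  /** Instantiates the configured writer class through its one-argument schema constructor. */
  static SchemaWriter create(Map<String, String> conf, String schema) {
    String className = conf.getOrDefault(OUTPUT_DATUM_WRITER, ReflectStyleWriter.class.getName());
    try {
      Class<? extends SchemaWriter> type = Class.forName(className).asSubclass(SchemaWriter.class);
      return type.getDeclaredConstructor(String.class).newInstance(schema);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot instantiate " + className, e);
    }
  }
}
```

A job that wants a different writer, such as a Thrift-based one, would then only set the property; no per-writer boolean flag is needed.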
[jira] [Commented] (AVRO-970) (Java) Allow users to implement their own Codecs
[ https://issues.apache.org/jira/browse/AVRO-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163140#comment-13163140 ] Doug Cutting commented on AVRO-970: --- I think this was kept private originally out of concern that the API might not be stable. But CodecFactory is public, so it doesn't make much sense for Codec to be private. I'm +1 for making this change. Would you like to provide a patch? Should we add a test that implements a Codec in a different package, perhaps one that just performs a bitwise-NOT of the data? (Java) Allow users to implement their own Codecs Key: AVRO-970 URL: https://issues.apache.org/jira/browse/AVRO-970 Project: Avro Issue Type: Improvement Components: java Reporter: Peter Nimmervoll Currently the base class for all codecs (Codec) is not public, which makes it impossible to write your own codecs.
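The test Doug suggests, a codec that just performs a bitwise NOT of the data, is small enough to sketch. SimpleCodec below is a hypothetical stand-in that only mirrors the general idea of a named codec with compress and decompress methods over ByteBuffers; it is not Avro's Codec class, whose exact abstract methods may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a pluggable codec base class; Avro's real Codec
// may declare different or additional abstract methods.
abstract class SimpleCodec {
  abstract String getName();
  abstract ByteBuffer compress(ByteBuffer data);
  abstract ByteBuffer decompress(ByteBuffer data);
}

// Test codec that stores the bitwise NOT of every byte.
final class NotCodec extends SimpleCodec {
  @Override String getName() { return "not"; }
  @Override ByteBuffer compress(ByteBuffer data) { return invert(data); }
  @Override ByteBuffer decompress(ByteBuffer data) { return invert(data); }

  /** Returns a new buffer with each remaining byte inverted; the input's position is not moved. */
  private static ByteBuffer invert(ByteBuffer in) {
    ByteBuffer src = in.duplicate();
    ByteBuffer out = ByteBuffer.allocate(src.remaining());
    while (src.hasRemaining()) {
      out.put((byte) ~src.get());
    }
    out.flip();
    return out;
  }
}
```

Since NOT is its own inverse, compress and decompress share one helper, and a round-trip test only has to check that decompress(compress(x)) equals x.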
[jira] [Commented] (AVRO-953) python ipc with path
[ https://issues.apache.org/jira/browse/AVRO-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156379#comment-13156379 ] Doug Cutting commented on AVRO-953: --- How hard would it be to add a test of the new functionality? python ipc with path Key: AVRO-953 URL: https://issues.apache.org/jira/browse/AVRO-953 Project: Avro Issue Type: Improvement Components: python Reporter: Craig Landry Priority: Minor Labels: patch Attachments: req_resource.patch Original Estimate: 1h Remaining Estimate: 1h Currently the ipc.HTTPTransceiver class has a hardcoded path of '/'. This improvement request is to allow users to provide an override path.
[jira] [Commented] (AVRO-570) python implementation of mapreduce connector
[ https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156391#comment-13156391 ] Doug Cutting commented on AVRO-570: --- Finally looking at this. The Java changes look reasonable and all Java tests pass for me. Python tests fail with: {code} [py-test] ./home/cutting/src/avro/trunk/lang/py/build/src/avro/tether/tether_task_runner.py:24: RuntimeWarning: Parent module 'avro.tether' not found while handling absolute import [py-test] from avro import tether [py-test] INFO:root:tether_task_runner.__main__: Task: word_count_task.WordCountTask [py-test] INFO:TetherTask:TetherTask.open: Opening connection to parent server on port=42343 [py-test] MockParentResponder: Recieved 'configure': inputPort=59800 [py-test] localhost.localdomain - - [23/Nov/2011 15:15:17] POST / HTTP/1.1 200 - [py-test] .E [py-test] == [py-test] space, [py-test] ERROR: test1 (test_tether_word_count.TestTetherWordCount) [py-test]fields: [ {name: foo, type: string} ] }, [py-test] -- [py-test] {name: ReferencedRecord, type: record, [py-test] Traceback (most recent call last): [py-test]fields: [ {name: bar, type: double} ] }, [py-test] File /home/cutting/src/avro/trunk/lang/py/build/test/test_tether_word_count.py, line 187, in test1 [py-test] {name: TestError, [py-test] proc=subprocess.Popen(args) [py-test] type: error, fields: [ {name: message, type: string} ] [py-test] File /usr/lib/python2.6/subprocess.py, line 623, in __init__ [py-test] } [py-test] errread, errwrite) [py-test] ], [py-test] File /usr/lib/python2.6/subprocess.py, line 1141, in _execute_child [py-test] [py-test] raise child_exception [py-test] messages: { [py-test] OSError: [Errno 2] No such file or directory [py-test] echo: { [py-test] [py-test] request: [{name: qualified, [py-test] -- [py-test] type: ReferencedRecord}], [py-test] Ran 45 tests in 8.061s {code} Any idea what's causing that? 
python implementation of mapreduce connector Key: AVRO-570 URL: https://issues.apache.org/jira/browse/AVRO-570 Project: Avro Issue Type: New Feature Components: python Affects Versions: 1.6.0 Reporter: Doug Cutting Assignee: Jeremy Lewi Priority: Critical Labels: hadoop Fix For: 1.7.0 Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch AVRO-512 defines protocols for implementing mapreduce tasks. It would be good to have a Python implementation of this.
[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler
[ https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144097#comment-13144097 ] Doug Cutting commented on AVRO-951: --- Oops. Looks like we both worked on this in parallel and with slightly different approaches. Does anyone have a preference? Records with field named data collide with new builder code from specific compiler Key: AVRO-951 URL: https://issues.apache.org/jira/browse/AVRO-951 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.0 Reporter: Alex Miller Assignee: Doug Cutting Priority: Blocker Fix For: 1.6.1 Attachments: AVRO-951.patch, AVRO-951.patch When I updated my dependencies from 1.5.x to 1.6.0 I found that one of my generated specific data classes failed to compile. The schema definition is:
{code}
record DataResponse {
  string queryId;
  int startRow;
  boolean more;
  array<array<union {IRI, BNode, PlainLiteral, TypedLiteral, string, boolean, int, long, float, double, null}>> data;
}
{code}
which I'm using to create:
{code}
{
  "type" : "record",
  "name" : "DataResponse",
  "fields" : [ {
    "name" : "queryId",
    "type" : "string"
  }, {
    "name" : "startRow",
    "type" : "int"
  }, {
    "name" : "more",
    "type" : "boolean"
  }, {
    "name" : "data",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "array",
        "items" : [ "IRI", "BNode", "PlainLiteral", "TypedLiteral", "string", "boolean", "int", "long", "float", "double", "null" ]
      }
    }
  } ]
}
{code}
which generates this code in the specific compiler:
{code}
public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<DataResponse>
    implements org.apache.avro.data.RecordBuilder<DataResponse> {

  private java.lang.CharSequence queryId;
  private int startRow;
  private boolean more;
  // *** local field named data
  private java.util.List<java.util.List<java.lang.Object>> data;

  // snipped some

  /** Creates a Builder by copying an existing DataResponse instance */
  private Builder(sherpa.protocol.DataResponse other) {
    super(sherpa.protocol.DataResponse.SCHEMA$);
    if (isValidValue(fields[0], other.queryId)) {
      // *** Call intended to go to super class data field
      queryId = (java.lang.CharSequence) data.deepCopy(fields[0].schema(), other.queryId);
      fieldSetFlags[0] = true;
    }
    if (isValidValue(fields[1], other.startRow)) {
      startRow = (java.lang.Integer) data.deepCopy(fields[1].schema(), other.startRow);
      fieldSetFlags[1] = true;
    }
    if (isValidValue(fields[2], other.more)) {
      more = (java.lang.Boolean) data.deepCopy(fields[2].schema(), other.more);
      fieldSetFlags[2] = true;
    }
    if (isValidValue(fields[3], other.data)) {
      data = (java.util.List<java.util.List<java.lang.Object>>) data.deepCopy(fields[3].schema(), other.data);
      fieldSetFlags[3] = true;
    }
  }
{code}
If you note the two ***'ed comments above, the first is the locally generated data field. The second is a reference to a super-class's field, also named data (although it's shadowed by the local data field). The super class is org.apache.avro.data.RecordBuilderBase. Seems like any of the protected fields at that point could potentially collide with actual record field names (schema, fields, fieldSetFlags would all have the same problem). Maybe if those fields were accessed via getters in the generated code, the local fields could shadow the super class without issue.
[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler
[ https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144158#comment-13144158 ] Doug Cutting commented on AVRO-951: --- The class did not exist before 1.6.0. It has created a regression against 1.5.x, so I think it's probably fair to change its API in 1.6.1.
[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler
[ https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13143550#comment-13143550 ] Doug Cutting commented on AVRO-951: --- Maybe if those fields were accessed via getters in the generated code [ ... ] But then the base class getter couldn't be called getData() as that would conflict with the generated getter. I think it would be better to rename base class fields to include a trailing dollar-sign, e.g., data$. This should also be done for method parameters: 'other' above should be 'other$'.
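Doug's proposal, giving base-class members a trailing dollar sign so that generated field and parameter names cannot hide them, can be illustrated in plain Java. Avro names may contain only letters, digits and underscores, so a name ending in $ can never come from a schema. Copier, BuilderBase, DataResponse and DataResponseBuilder are hypothetical stand-ins for GenericData, RecordBuilderBase and the generated classes in the report, not the code Avro actually generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base class's copy helper (GenericData in the report).
class Copier {
  @SuppressWarnings("unchecked")
  <T> T deepCopy(T value) {
    if (value instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object item : (List<Object>) value) {
        copy.add(deepCopy(item));
      }
      return (T) copy;
    }
    return value; // strings, numbers, etc. are treated as immutable here
  }
}

// Base-class members end in '$', which Java allows but Avro names cannot
// contain, so no generated record field can shadow them.
abstract class BuilderBase {
  protected final Copier data$ = new Copier();
  protected final boolean[] fieldSetFlags$;

  protected BuilderBase(int fieldCount) {
    fieldSetFlags$ = new boolean[fieldCount];
  }
}

final class DataResponse {
  final List<List<Object>> data;
  DataResponse(List<List<Object>> data) { this.data = data; }
}

// Shape of the generated builder: the record field "data" and the parameter
// "other$" no longer collide with anything inherited.
final class DataResponseBuilder extends BuilderBase {
  private List<List<Object>> data;

  DataResponseBuilder(DataResponse other$) {
    super(1);
    data = data$.deepCopy(other$.data);
    fieldSetFlags$[0] = true;
  }

  List<List<Object>> getData() { return data; }
}
```

Here the local field data and the inherited helper data$ coexist, and the generated getter getData() no longer competes with any base-class accessor.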
[jira] [Commented] (AVRO-946) GenericData.resolveUnion() performance improvement
[ https://issues.apache.org/jira/browse/AVRO-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142216#comment-13142216 ] Doug Cutting commented on AVRO-946: --- Hernan, that sounds like a good plan to me. Would you like to update the patch or should I? GenericData.resolveUnion() performance improvement -- Key: AVRO-946 URL: https://issues.apache.org/jira/browse/AVRO-946 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.0 Reporter: Hernan Otero Attachments: AVRO-946.patch, AVRO-946.patch Due to the sequential nature of today's implementation of GenericData.resolveUnion() (used when serializing an object):
{code}
public int resolveUnion(Schema union, Object datum) {
  int i = 0;
  for (Schema type : union.getTypes()) {
    if (instanceOf(type, datum))
      return i;
    i++;
  }
  throw new UnresolvedUnionException(union, datum);
}
{code}
it showed up when we were doing some serialization performance analysis. A simple optimization can be implemented by keeping a map within the UnionSchema object (in fact, this could actually be a perfect hash map given the potential values in the map are known in advance). The optimization is obviously most notable when a Union within the schema contains many types (in our particular use case, more than 40 in some cases). In this scenario, we observed a 25% improvement by using an identity hash map. Even though using an identity map provides a significant boost, we have observed an even further improvement (and removed some of the restrictions of relying on object identity) by using a perfect hash map on the schema names (an extra 15% on top of that in some cases). This implementation, unfortunately, is not something we could contribute at this point, but we thought it'd be a good idea to allow users to provide alternative implementations of the indexing behavior, such as adding the following static method to Schema:
{code}
public static void setUnionTypeIndexCacheFactory(UnionIndexCacheFactory factory) {
  unionIndexCacheFactory = factory;
}
{code}
This is what the interface and identity hash map-based implementation would look like:
{code}
/**
 * A factory interface for creating UnionTypeIndexCache instances.
 */
public static interface UnionIndexCacheFactory {
  UnionIndexCache createUnionIndexCache(List<Schema> types);

  /**
   * Used for caching schema indices within a union.
   */
  public static interface UnionIndexCache {
    void setTypeIndex(Schema schema, int index);
    int getTypeIndex(Schema schema);
  }
}

private static class IdentityMapUnionIndexCacheFactory implements UnionIndexCacheFactory {
  @Override
  public UnionIndexCache createUnionIndexCache(List<Schema> types) {
    return new UnionIndexCache() {
      private final IdentityHashMap<Schema, Integer> schemaToIndex =
          new IdentityHashMap<Schema, Integer>();

      @Override
      public void setTypeIndex(Schema schema, int index) {
        schemaToIndex.put(schema, index);
      }

      @Override
      public int getTypeIndex(Schema schema) {
        Integer index = schemaToIndex.get(schema);
        return index == null ? -1 : index;
      }
    };
  }
}
{code}
I will attach a patch later today or early tomorrow. Thanks in advance, Hernan Otero
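The identity-map cache proposed above can be sketched as a lookup in front of the existing linear scan. This is a hypothetical, self-contained sketch: NamedSchema and Datum stand in for Avro's Schema and a record that knows its schema, a name match plays the role of instanceOf, and the IdentityHashMap is neither synchronized nor weak-keyed (the JDK has no built-in weak identity map).

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal schema (just a name), standing in for org.apache.avro.Schema.
final class NamedSchema {
  final String name;
  NamedSchema(String name) { this.name = name; }
}

// Hypothetical datum that knows its own schema, like a generic record.
final class Datum {
  final NamedSchema schema;
  Datum(NamedSchema schema) { this.schema = schema; }
}

// Resolves a datum to a union branch index, consulting an identity-keyed
// cache before falling back to the sequential scan.
final class UnionResolver {
  private final List<NamedSchema> types;
  private final Map<NamedSchema, Integer> cache = new IdentityHashMap<>();
  private int scans = 0; // number of times the linear scan ran

  UnionResolver(List<NamedSchema> types) { this.types = types; }

  int resolve(Datum datum) {
    Integer cached = cache.get(datum.schema);
    if (cached != null) {
      return cached;
    }
    scans++;
    for (int i = 0; i < types.size(); i++) {
      if (types.get(i).name.equals(datum.schema.name)) { // plays the role of instanceOf
        cache.put(datum.schema, i);
        return i;
      }
    }
    throw new IllegalArgumentException("not in union: " + datum.schema.name);
  }

  int linearScans() { return scans; }
}
```

A schema that is equal to a cached one but not the same object just costs one more scan and one more cache entry; lookups stay correct either way.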
[jira] [Commented] (AVRO-946) GenericData.resolveUnion() performance improvement
[ https://issues.apache.org/jira/browse/AVRO-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140260#comment-13140260 ] Doug Cutting commented on AVRO-946: --- Identity equality may result in multiple entries for a given schema but the cache should still work correctly. It would perform poorly if every instance had a different schema, but that's not likely. Also note that Schema now caches hash codes. So even using equals hashing would usually only result in a single call to equals, to verify the hash entry. Equals is fast for identical objects, so, if you used equals hashing, the slow case would be when the cached key is equal but not identical. I think identity hashing with weak keys is probably preferable. GenericData.resolveUnion() performance improvement -- Key: AVRO-946 URL: https://issues.apache.org/jira/browse/AVRO-946 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.6.0 Reporter: Hernan Otero
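Doug's suggestion of identity hashing with weak keys has no direct JDK counterpart. WeakHashMap holds keys weakly but compares them with equals(), while IdentityHashMap compares by identity but holds keys strongly. The following is a minimal, hypothetical sketch (WeakIdentityCache is not an Avro or JDK class) that combines the two. It wraps each key in a WeakReference that hashes with System.identityHashCode and compares referents with ==. Null keys are not supported.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache keyed by object identity whose keys are weakly held, so an
 * entry can disappear once its key (for example, a Schema) is otherwise unreachable.
 */
final class WeakIdentityCache<K, V> {
  private final Map<IdentityWeakKey<K>, V> map = new HashMap<>();
  private final ReferenceQueue<K> queue = new ReferenceQueue<>();

  /** Weak reference that hashes and compares by referent identity. */
  private static final class IdentityWeakKey<T> extends WeakReference<T> {
    private final int hash;

    IdentityWeakKey(T referent, ReferenceQueue<? super T> q) {
      super(referent, q);
      this.hash = System.identityHashCode(referent);
    }

    @Override
    public int hashCode() {
      return hash;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true; // lets a cleared key find itself on removal
      if (!(o instanceof IdentityWeakKey)) return false;
      Object mine = get();
      return mine != null && mine == ((IdentityWeakKey<?>) o).get();
    }
  }

  /** Drops entries whose keys have been garbage collected. */
  private void expungeStale() {
    Object stale;
    while ((stale = queue.poll()) != null) {
      map.remove(stale);
    }
  }

  synchronized V get(K key) {
    expungeStale();
    // Probe key with no queue: it is never stored, so it need not be tracked.
    return map.get(new IdentityWeakKey<K>(key, null));
  }

  synchronized void put(K key, V value) {
    expungeStale();
    map.put(new IdentityWeakKey<K>(key, queue), value);
  }

  synchronized int size() {
    expungeStale();
    return map.size();
  }
}
```

As Doug notes, two equal but non-identical schemas would simply occupy two entries here. That costs some memory but does not affect correctness.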
[jira] [Commented] (AVRO-821) PHP protocol support
[ https://issues.apache.org/jira/browse/AVRO-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140641#comment-13140641 ] Doug Cutting commented on AVRO-821: --- A test would be great! Thanks! PHP protocol support Key: AVRO-821 URL: https://issues.apache.org/jira/browse/AVRO-821 Project: Avro Issue Type: New Feature Components: php Affects Versions: 1.5.1 Environment: all Reporter: Andy Wick Fix For: 1.5.1 Attachments: AVRO-821-fixed.patch, avro.patch PHP version doesn't support protocol format
[jira] [Commented] (AVRO-949) NettyTransiever doesn't call RPCPlugin.clientReceiveResponse on the same thread as clientSendRequest
[ https://issues.apache.org/jira/browse/AVRO-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139311#comment-13139311 ] Doug Cutting commented on AVRO-949: --- RPCContext, I mean. NettyTransiever doesn't call RPCPlugin.clientReceiveResponse on the same thread as clientSendRequest Key: AVRO-949 URL: https://issues.apache.org/jira/browse/AVRO-949 Project: Avro Issue Type: Bug Affects Versions: 1.5.4, 1.6.0 Reporter: Philip Zeyliger RPCPlugin.clientReceiveResponse() is called in the Netty IO thread when using a NettyTransceiver. This is quite different from how HTTPTransceiver does it. Users can use RPCPlugin to do things like tracing and timing. It's bizarre that clientSendRequest() happens in the caller's thread but clientReceiveResponse() happens in a different one, because thread locals are one of the easiest ways to pass information between the two. There's no other easy way, since RPCContext, which is passed along, has no way to associate arbitrary data with itself.
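One way around the thread mismatch is to keep per-call state in a concurrent map keyed by the context object instead of in a ThreadLocal. This assumes the same context object is handed to both the send and the receive callback. The sketch below is hypothetical: CallContext stands in for RPCContext, onSend/onReceive only mimic the roles of clientSendRequest() and clientReceiveResponse(), and a single-thread executor plays the part of the Netty I/O thread.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical timing plugin that keys per-call state on a context object
 * rather than a ThreadLocal, so send and receive may run on different threads.
 */
final class CrossThreadTiming {
  /** Stand-in for a per-call context such as RPCContext; identity equality assumed. */
  static final class CallContext {}

  private final Map<CallContext, Long> startNanos = new ConcurrentHashMap<>();

  /** Runs on the caller's thread, like clientSendRequest(). */
  void onSend(CallContext ctx) {
    startNanos.put(ctx, System.nanoTime());
  }

  /** May run on any thread, like clientReceiveResponse(); returns -1 if no send was seen. */
  long onReceive(CallContext ctx) {
    Long start = startNanos.remove(ctx);
    return start == null ? -1L : System.nanoTime() - start;
  }

  /** Runs a task on a separate thread, standing in for a Netty I/O thread. */
  static <T> T runOnOtherThread(Callable<T> task) {
    ExecutorService io = Executors.newSingleThreadExecutor();
    try {
      return io.submit(task).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      io.shutdown();
    }
  }
}
```

A value stored in a ThreadLocal on the caller's thread reads as null on the other thread. The map entry, by contrast, is visible to whichever thread holds the context.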
[jira] [Commented] (AVRO-941) Avro should support the Apache Maven Shade plugin class relocation feature
[ https://issues.apache.org/jira/browse/AVRO-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138658#comment-13138658 ] Doug Cutting commented on AVRO-941: --- I'll commit this soon unless someone objects. It's not perfect but it's better than nothing. Avro should support the Apache Maven Shade plugin class relocation feature -- Key: AVRO-941 URL: https://issues.apache.org/jira/browse/AVRO-941 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.4 Reporter: Matt Massie Attachments: shade.patch The Apache shade plugin allows Maven builds to create an uber jar that contains the project's dependencies. In addition, the shade plugin allows you to relocate dependencies into a private namespace to prevent class conflicts on shared class paths. Avro does not support relocation. All generated Avro objects contain a string field named SCHEMA$ which serves as the authority for the class namespace. When the shade plugin updates the byte code to relocate the class, it doesn't alter the SCHEMA$ string. This breaks Avro's use of reflection, since the namespace in SCHEMA$ points to an incorrect location. I spoke with Doug about the issue and he was kind enough to provide a quick hack to fix it. The hack is to check for mismatches between the byte code and the SCHEMA$ and, when they don't match, to defer to the byte code. I'll attach Doug's patch to this Jira.
[jira] [Commented] (AVRO-943) TestNettyServerWithCallbacks sometimes hangs
[ https://issues.apache.org/jira/browse/AVRO-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135194#comment-13135194 ] Doug Cutting commented on AVRO-943: --- Here's the thread dump.
{code}
Full thread dump Java HotSpot(TM) Server VM (17.1-b03 mixed mode):

"New I/O client boss #2" prio=10 tid=0x6e8eec00 nid=0x4222 waiting on condition [0x6e5ad000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x9edf6c58> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
    at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:191)
    at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:180)
    at org.apache.avro.ipc.NettyTransceiver.access$200(NettyTransceiver.java:59)
    at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:361)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:344)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:232)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:98)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:404)
    at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:602)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:91)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
    at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.jboss.netty.channel.Channels.close(Channels.java:720)
    at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:200)
    at org.jboss.netty.channel.ChannelFutureListener$2.operationComplete(ChannelFutureListener.java:57)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:334)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:389)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

"New I/O client worker #1-1" prio=10 tid=0x6eaedc00 nid=0x4220 runnable [0x6e75c000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x9e8fdbc0> (a sun.nio.ch.Util$1)
    - locked <0x9e8fdbb0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x9e8fd9c8> (a
[jira] [Commented] (AVRO-943) TestNettyServerWithCallbacks sometimes hangs
[ https://issues.apache.org/jira/browse/AVRO-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135210#comment-13135210 ] Doug Cutting commented on AVRO-943: --- It hangs around one time in 10. I can reproduce this by running TestNettyServerWithCallbacks in a loop, e.g.: {code} while ( true ); do mvn test -Dtest=TestNettyServerWithCallbacks; done {code} TestNettyServerWithCallbacks sometimes hangs Key: AVRO-943 URL: https://issues.apache.org/jira/browse/AVRO-943 Project: Avro Issue Type: Bug Components: java Reporter: Doug Cutting I'm periodically seeing tests hang in TestNettyServerWithCallbacks.
[jira] [Commented] (AVRO-941) Avro should support the Apache Maven Shade plugin class relocation feature
[ https://issues.apache.org/jira/browse/AVRO-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134406#comment-13134406 ] Doug Cutting commented on AVRO-941: --- "the patch should probably only substitute the namespace field specifically" That's a flaw, but probably not the most critical one. Avro identifiers cannot contain dots except in namespaces. So as long as the package name contains a dot (which most do), the global replace should not harm the schema. Fixing this requires a fair amount of code: walking the schema and creating a copy with things renamed. (We already do this in a few places, so we should probably create a SchemaVisitor API to simplify this, but that's a separate issue.) Note that this approach will always be flawed, since it won't always be able to perfectly reconstruct the relocations used when shading. However, replacement is only attempted when things are already broken, so it does no harm, and imperfections are thus tolerable. Probably the biggest flaw of the current patch is that it will fail if nested schemas are not all in the same namespace. To address this, we might look for a common suffix or prefix in the new and old package and then replace the differing text. For example, if the current class is com.baz.hidden.org.foo.Bar and the schema is org.foo.Bar, then the replacement should be to prefix all namespaces with com.baz.hidden. I'd be happy to see such improvements to this patch, but I'd not object to it being committed more-or-less as-is, since it does no harm. Avro should support the Apache Maven Shade plugin class relocation feature -- Key: AVRO-941 URL: https://issues.apache.org/jira/browse/AVRO-941 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.4 Reporter: Matt Massie Attachments: shade.patch
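Doug's prefix idea can be sketched as follows. The sketch assumes the shade plugin only prepended a package prefix, as in the com.baz.hidden.org.foo.Bar example above, and did not rewrite the middle of a package name. ShadePrefix is a hypothetical helper, not part of the attached patch. It derives the prefix from the runtime class name and the full name recorded in SCHEMA$, then applies that prefix to any namespace, so nested schemas in other namespaces are relocated consistently.

```java
/**
 * Hypothetical helper for the AVRO-941 idea: infer the prefix the shade plugin
 * added by comparing the runtime class name with the name recorded in SCHEMA$.
 */
final class ShadePrefix {
  /**
   * Returns the prepended package prefix (e.g. "com.baz.hidden."), "" if the names
   * match, or null if the schema name is not a dotted suffix of the class name.
   */
  static String inferPrefix(String runtimeClassName, String schemaFullName) {
    if (runtimeClassName.equals(schemaFullName)) return "";
    // Require a dot boundary so "xorg.foo.Bar" does not match "org.foo.Bar".
    if (!runtimeClassName.endsWith("." + schemaFullName)) return null;
    return runtimeClassName.substring(0, runtimeClassName.length() - schemaFullName.length());
  }

  /** Applies the prefix to one schema namespace; null or empty namespaces are left alone. */
  static String relocate(String namespace, String prefix) {
    if (namespace == null || namespace.isEmpty()) return namespace;
    return prefix + namespace;
  }
}
```

A null result means the relocation did not follow the simple prefix pattern. In that case a caller could fall back to the current global-replace behavior.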
[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0
[ https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133185#comment-13133185 ] Doug Cutting commented on AVRO-935: --- Scott, do you want to update this and commit it, or should I? Update Java dependencies for 1.6.0 -- Key: AVRO-935 URL: https://issues.apache.org/jira/browse/AVRO-935 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.0 Attachments: AVRO-935.patch Update Java dependencies to the latest version where appropriate.
[jira] [Commented] (AVRO-936) Avro Java does not build with Maven 2.
[ https://issues.apache.org/jira/browse/AVRO-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130979#comment-13130979 ] Doug Cutting commented on AVRO-936: --- +1 This looks fine to me. Avro Java does not build with Maven 2. -- Key: AVRO-936 URL: https://issues.apache.org/jira/browse/AVRO-936 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.0 Reporter: Thiruvalluvan M. G. Assignee: Thiruvalluvan M. G. Attachments: AVRO-936.patch This is because we use the Maven 3 feature "Support Enum-type parameters in mojos": http://jira.codehaus.org/browse/MNG-4292. The forthcoming patch fixes it.
[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0
[ https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130064#comment-13130064 ] Doug Cutting commented on AVRO-935: --- Hmm. 'mvn test' passes for me, but running individual tests with, e.g., 'mvn -Dtest=TestSchema' now fails. That's not critical for the 1.6.0 release, but it's nice if it works for developers. Any idea why this now fails? Update Java dependencies for 1.6.0 -- Key: AVRO-935 URL: https://issues.apache.org/jira/browse/AVRO-935 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.0 Attachments: AVRO-935.patch
[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0
[ https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130120#comment-13130120 ] Doug Cutting commented on AVRO-935: --- +1 for adding that to the surefire plugin configuration. Update Java dependencies for 1.6.0 -- Key: AVRO-935 URL: https://issues.apache.org/jira/browse/AVRO-935 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.6.0 Attachments: AVRO-935.patch
[jira] [Commented] (AVRO-467) CMake: Complete CMake build system and remove autotools build system
[ https://issues.apache.org/jira/browse/AVRO-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129209#comment-13129209 ] Doug Cutting commented on AVRO-467: --- lang/c/build.sh seems to work correctly for me after applying the patch. cmake is already required by the C++ build and is listed in the top-level BUILD.txt as a requirement. We should probably add it to the C requirements there too. I have traditionally used Ubuntu package names in that file, since that's what I install. CMake: Complete CMake build system and remove autotools build system Key: AVRO-467 URL: https://issues.apache.org/jira/browse/AVRO-467 Project: Avro Issue Type: Improvement Components: c Affects Versions: 1.3.0 Reporter: Bruce Mitchener Assignee: Bruce Mitchener Labels: cmake Attachments: 0001-AVRO-467.-C-Switch-from-autotools-to-CMake.patch, 0001-AVRO-467.-C-Switch-from-autotools-to-CMake.patch Placeholder bug to serve as a parent for all of the various remaining tasks for the CMake build system.
[jira] [Commented] (AVRO-570) python implementation of mapreduce connector
[ https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126715#comment-13126715 ] Doug Cutting commented on AVRO-570: --- I'm hoping to release 1.6.0 in the next week or two. As long as this doesn't make any incompatible changes it could go into 1.6.1 which will likely follow in a month or so. If it makes incompatible changes and doesn't make 1.6.0 then it wouldn't go out until 1.7.0, probably sometime in the first half of 2012. python implementation of mapreduce connector Key: AVRO-570 URL: https://issues.apache.org/jira/browse/AVRO-570 Project: Avro Issue Type: New Feature Components: python Affects Versions: 1.6.0 Reporter: Doug Cutting Assignee: Jeremy Lewi Priority: Critical Labels: hadoop Fix For: 1.6.0 Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch AVRO-512 defines protocols for implementing mapreduce tasks. It would be good to have a Python implementation of this.
[jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
[ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125964#comment-13125964 ] Doug Cutting commented on AVRO-923: --- "it seems to me this risk is already taken for other parameters such as avro.mapper. For the case of schemas though there is a second check that occurs when the input file schema does not match the compiled schema." The input schema is not what I was most concerned about; rather, it's the map output schema. If different tasks somehow got a different map output schema, it would result in strange, hard-to-debug I/O exceptions. We require that the map output schema is constant across all tasks in a job for things to work correctly. Of course it's not always possible to prohibit folks from creating erroneous situations; we should try to discourage that, but we don't want to overly limit functionality in the process. "It can also be described with xml files" What I meant was that the XML files can be programmatically constructed. They should ideally not be constructed with cut and paste, but should use the same source for schemas as the Java code that's getting re-generated to build the new version of the jar file. Perhaps you can refer to the schemas with an external entity definition in the XML that fetches the appropriate version?
{code}
<!DOCTYPE job [
  <!ENTITY schemaX SYSTEM "http://svn.foo.com/project/trunk/schemas/x.avsc">
]>
<job>
  ...
  &schemaX;
  ...
</job>
{code}
Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration --- Key: AVRO-923 URL: https://issues.apache.org/jira/browse/AVRO-923 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.4 Environment: any Reporter: Julien Muller Fix For: 1.6.0 Original Estimate: 2h Remaining Estimate: 2h The current implementation of Avro MapRed is designed to use JobConf. While it is possible to use a job.xml file, it is painful, since you have to copy and paste all the schemas for input and output. This is error-prone and time-consuming. Also, any update to a bean requires re-copying and re-pasting the schema (with JobConf, a simple recompile would be enough). One way to improve this while staying backward compatible would be to introduce new keys in AvroJob that reference the actual Avro bean used. This can be implemented as a fallback. New keys would be created:
- avro.input.class (alongside avro.input.schema)
- avro.map.output.class (alongside avro.map.output.schema)
- avro.output.class (alongside avro.output.schema)
Only 3 methods would be impacted in AvroJob:
- getInputSchema(Configuration job) {
      // Implement a fallback like
      String s = job.get(INPUT_SCHEMA);
      if (s == null)
        s = Class.forName(job.get(INPUT_CLASS)).getDeclaredField("SCHEMA$").get(null).toString();
      return Schema.parse(s);
    }
- getMapOutputSchema()
- getOutputSchema()
Also, it would be more consistent to add new setters. This is not mandatory, since in this use case the new keys are filled in directly in the job, not via AvroJob.
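A minimal sketch of the proposed fallback follows. It is hypothetical in three ways. A java.util.Map stands in for Hadoop's Configuration. avro.input.class is the key proposed in this issue, not an existing AvroJob constant. ExampleRecord stands in for a generated class; real generated classes declare SCHEMA$ as a Schema, whose toString() yields its JSON. The configured schema is checked first, so the job configuration stays authoritative whenever it is set. That matters given the cross-node consistency concerns raised on this issue.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the AVRO-923 fallback: prefer the schema JSON stored in
 * the job configuration, else read the static SCHEMA$ field of a configured class.
 */
final class SchemaLookup {
  static final String INPUT_SCHEMA = "avro.input.schema";
  static final String INPUT_CLASS = "avro.input.class"; // key proposed in this issue

  static String inputSchemaJson(Map<String, String> conf) {
    String json = conf.get(INPUT_SCHEMA);
    if (json != null) return json; // the job configuration stays authoritative
    String className = conf.get(INPUT_CLASS);
    if (className == null) {
      throw new IllegalStateException("neither " + INPUT_SCHEMA + " nor " + INPUT_CLASS + " is set");
    }
    try {
      // Works whether SCHEMA$ holds a Schema (toString() gives JSON) or a String.
      Object schema = Class.forName(className).getDeclaredField("SCHEMA$").get(null);
      return schema.toString();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot read SCHEMA$ from " + className, e);
    }
  }
}

/** Stand-in for a generated record class; real ones declare SCHEMA$ as a Schema. */
final class ExampleRecord {
  public static final String SCHEMA$ =
      "{\"type\":\"record\",\"name\":\"ExampleRecord\",\"fields\":[]}";
}
```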
[jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration
[ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125213#comment-13125213 ] Doug Cutting commented on AVRO-923: --- It's slightly riskier to get the schema from the runtime than from the job, in particular the map output schema. If different versions of code are somehow run on different nodes, then different map output schemas could be used, which would create havoc, since the schema does not travel with the map output data. When the schema is in the job.xml, there's very little chance of a lack of coordination, since the framework distributes the same job.xml to every task. If the schema comes from the runtime, there's some chance that different versions of classes could be installed on different nodes. Another concern is that not all schemas have a class that defines them. For example, one might have jobs whose inputs or outputs are bytes or string or Pair<string,bytes>, etc. These are the reasons that schema-in-job.xml is the required and preferred means of specification. However, there may be cases where it's preferable to additionally support specification of schemas via a specific class, as suggested in this issue. A JobConf can be programmatically constructed. Why is it so painful to insert the schema there as part of your job creation/submission pipeline? I'd like to better understand why that's so difficult before we add a new mechanism, since any added mechanism has the potential to create bugs and user confusion. Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration --- Key: AVRO-923 URL: https://issues.apache.org/jira/browse/AVRO-923 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.4 Environment: any Reporter: Julien Muller Fix For: 1.6.0 Original Estimate: 2h Remaining Estimate: 2h
[jira] [Commented] (AVRO-878) TestWordCount.testProjection is broken
[ https://issues.apache.org/jira/browse/AVRO-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125216#comment-13125216 ] Doug Cutting commented on AVRO-878: --- So should we resolve this as "Not a problem", or should we keep it open as a problem when running under Java 7? Is Java 7 a platform that Avro needs to support at this point? TestWordCount.testProjection is broken -- Key: AVRO-878 URL: https://issues.apache.org/jira/browse/AVRO-878 Project: Avro Issue Type: Test Affects Versions: 1.6.0 Reporter: Jeremy Lewi Assignee: Jeremy Lewi Fix For: 1.6.0 Attachments: AVRO-878.patch, TEST-org.apache.avro.mapred.TestWordCount.xml, TEST-org.apache.avro.mapred.TestWordCount.xml, lewi-ipc-reports.tar.gz Original Estimate: 1h Remaining Estimate: 1h TestWordCount.testProjection in avro/mapred/TestWordCount.java is broken. It appears to be using the wrong schema to read the output of the map reduce job.
[jira] [Commented] (AVRO-803) Java generated Avro classes make using Avro painful and surprising
[ https://issues.apache.org/jira/browse/AVRO-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124299#comment-13124299 ] Doug Cutting commented on AVRO-803: --- I'd prefer not to break back-compatibility this time. It makes it impossible for folks to upgrade one project without making source code changes to other projects. If you specify <stringType>String</stringType> in your pom.xml, then all your Map keys become java.lang.String. Java generated Avro classes make using Avro painful and surprising -- Key: AVRO-803 URL: https://issues.apache.org/jira/browse/AVRO-803 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.0 Environment: Any Reporter: Sam Pullara Assignee: Doug Cutting Fix For: 1.6.0 Attachments: AVRO-803.patch, AVRO-803.patch, Foo.java Currently the Avro-generated Java classes expose CharSequence in their API. However, you cannot use any old CharSequence when interacting with them. In fact, you have to use the Utf8 class if you want to get consistent results. I think that Avro should work with any CharSequence if that is the API. Here is an example where this happens: https://github.com/spullara/avro-generated-code/blob/master/src/test/java/AnnoyingTest.java That prints out 'false' three times unexpectedly. If you can't get it to print 'true' three times, then you should probably change it back to Utf8.
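The surprise behind this issue can be reproduced without Avro at all, because the CharSequence interface does not define equals() or hashCode(). In this hypothetical sketch, a StringBuilder stands in for Utf8, since neither compares equal to a java.lang.String with the same characters. withStringKeys() normalizes keys to java.lang.String by hand, which is roughly the effect that the String string-type option achieves at code-generation time.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a map keyed by one CharSequence implementation cannot be
 * probed with another, so keys are normalized to java.lang.String.
 */
final class CharSequenceKeys {
  /** Copies a map, converting every key to java.lang.String via toString(). */
  static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, V> in) {
    Map<String, V> out = new HashMap<>();
    for (Map.Entry<? extends CharSequence, V> e : in.entrySet()) {
      out.put(e.getKey().toString(), e.getValue());
    }
    return out;
  }
}
```

Looking up "a" in a map whose key was inserted as new StringBuilder("a") returns null. The same lookup succeeds on the normalized copy.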
[jira] [Commented] (AVRO-897) Map lookup behavior is ill-defined in Java
[ https://issues.apache.org/jira/browse/AVRO-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123118#comment-13123118 ] Doug Cutting commented on AVRO-897: --- This is addressed by the patch for AVRO-803, described in the comment at http://s.apache.org/VJC. GenericDatumReader will now use java.lang.String everywhere when string schemas are annotated with "avro.java.string":"String". There's a GenericData method to add this annotation. This is perhaps not ideal, but it is back-compatible, which is important. Can we close this issue as a duplicate of AVRO-803? Map lookup behavior is ill-defined in Java -- Key: AVRO-897 URL: https://issues.apache.org/jira/browse/AVRO-897 Project: Avro Issue Type: Bug Affects Versions: 1.5.1 Reporter: Garrett Wu Attachments: avro-charsequence-map-test.tar.gz In Java, an Avro {{map}} is a Java {{Map}}. The map keys are of type {{string}}, which maps to a Java {{CharSequence}}. Clients must know to use {{Utf8}} objects when calling {{get()}} or {{containsKey()}}. Instead, {{GenericDatumReader}} should instantiate a {{Map}} instance with a {{Comparator}} suitable for comparing any type of {{CharSequence}}.
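The issue description proposes a Map built with a Comparator that works for any CharSequence. A rough, JDK-only sketch of that idea follows; CharSequenceMaps, BY_CONTENT and newMap are hypothetical names, and StringBuilder stands in for a non-String key such as Utf8. Note that the comment above resolves the issue differently, via the avro.java.string annotation.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

final class CharSequenceMaps {
  // Orders CharSequences by their UTF-16 content, ignoring the implementing class.
  static final Comparator<CharSequence> BY_CONTENT = (a, b) -> {
    int n = Math.min(a.length(), b.length());
    for (int i = 0; i < n; i++) {
      int d = Character.compare(a.charAt(i), b.charAt(i));
      if (d != 0) return d;
    }
    return Integer.compare(a.length(), b.length()); // shorter prefix sorts first
  };

  // A map whose get()/containsKey() match keys by content, whatever their class.
  static <V> Map<CharSequence, V> newMap() {
    return new TreeMap<>(BY_CONTENT);
  }

  public static void main(String[] args) {
    Map<CharSequence, Integer> counts = newMap();
    counts.put(new StringBuilder("apple"), 3);         // stored as a non-String CharSequence
    System.out.println(counts.get("apple"));           // 3
    System.out.println(counts.containsKey("banana"));  // false
  }
}
```

The trade-off: a TreeMap gives O(log n) lookups and imposes an ordering on keys, whereas the annotation approach keeps ordinary hash maps by making every key a String.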
[jira] [Commented] (AVRO-883) Avro should come with a simple Java example
[ https://issues.apache.org/jira/browse/AVRO-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123183#comment-13123183 ] Doug Cutting commented on AVRO-883: --- Perhaps this should go into lang/java/archetypes/avro-data-archetype? If you agree, would you be willing to convert this into a patch that goes there? Thanks! Avro should come with a simple Java example --- Key: AVRO-883 URL: https://issues.apache.org/jira/browse/AVRO-883 Project: Avro Issue Type: Task Components: java Reporter: William McNeill Fix For: 1.6.0 The Avro distribution should have a simple Java example of how to serialize and deserialize data. As discussed on the mailing list, the example at https://github.com/wpm/AvroExample can serve as a basis.
[jira] [Commented] (AVRO-912) Mapreduce tether test fails on Windows
[ https://issues.apache.org/jira/browse/AVRO-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122121#comment-13122121 ] Doug Cutting commented on AVRO-912: --- +1. Changes look good to me and tests pass on Linux. Mapreduce tether test fails on Windows -- Key: AVRO-912 URL: https://issues.apache.org/jira/browse/AVRO-912 Project: Avro Issue Type: Bug Reporter: Thiruvalluvan M. G. Assignee: Thiruvalluvan M. G. Attachments: AVRO-912.patch The problems are: 1. The executable filename is passed around as a URL. Windows filenames are valid URLs. 2. A typical Windows user's home directory is {{c:\Documents and Settings\username}}. Maven puts the downloaded jar files under {{$HOME/.m2}}, so the classpath has several directories with spaces in their names. Splitting command-line arguments on spaces generates an invalid classpath. 3. Hadoop's {{TaskLog.captureOutAndError()}} generates a command line for Unix systems using bash.
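Problem 2 above comes from joining arguments into one string and later splitting it on spaces. A hedged sketch of the failure and the usual remedy (keeping each argument as its own list element, as java.lang.ProcessBuilder accepts); ChildCommand, the jar path and the main class name are all hypothetical, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChildCommand {
  // Naive approach: build one command string, then split it on spaces.
  static List<String> splitOnSpaces(String commandLine) {
    return Arrays.asList(commandLine.split(" "));
  }

  // Safer approach: keep each argument as a separate list element.
  static List<String> buildArgs(String javaBin, String classpath, String mainClass) {
    List<String> args = new ArrayList<>();
    args.add(javaBin);
    args.add("-cp");
    args.add(classpath);   // may contain spaces; stays a single argument
    args.add(mainClass);
    return args;
  }

  public static void main(String[] args) {
    String cp = "c:\\Documents and Settings\\username\\.m2\\repository\\avro.jar";
    String main = "org.example.TetherMain"; // hypothetical main class
    // The classpath is broken into three pieces: 6 tokens instead of 4.
    System.out.println(splitOnSpaces("java -cp " + cp + " " + main).size()); // 6
    System.out.println(buildArgs("java", cp, main).size());                  // 4
    // new ProcessBuilder(buildArgs("java", cp, main)) would receive the
    // classpath as one element rather than re-splitting a joined string.
  }
}
```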
[jira] [Commented] (AVRO-803) Java generated Avro classes make using Avro painful and surprising
[ https://issues.apache.org/jira/browse/AVRO-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122154#comment-13122154 ] Doug Cutting commented on AVRO-803: --- Here's a new proposal: - add a new Decoder method, 'String readString()', implemented to avoid allocating a new intermediate byte array for each call, as is currently done when Utf8 instances are not reused. - change generated specific code to optionally use String everywhere instead of CharSequence. (We could also add an option to emit Utf8 everywhere.) When String is used, we add a property to the string schemas in the generated code so they become {"type":"string", "java":"String"}. - GenericData#readString() would call the new Decoder method when "java":"String" is present in the string's schema. This is totally back-compatible. Java generated Avro classes make using Avro painful and surprising -- Key: AVRO-803 URL: https://issues.apache.org/jira/browse/AVRO-803 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.5.0 Environment: Any Reporter: Sam Pullara Fix For: 1.6.0 Attachments: Foo.java Currently the Avro-generated Java classes expose CharSequence in their API. However, you cannot use just any CharSequence when interacting with them. In fact, you have to use the Utf8 class if you want to get consistent results. I think that Avro should work with any CharSequence if that is the API. Here is an example where this happens: https://github.com/spullara/avro-generated-code/blob/master/src/test/java/AnnoyingTest.java That prints out 'false' three times unexpectedly. If you can't get it to print 'true' three times, then you should probably change it back to Utf8.
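To make the first and third bullets of the proposal concrete, here is a sketch of a decoder that reads an Avro binary string (a zig-zag varint byte count followed by UTF-8 bytes) straight into a java.lang.String, and picks String or another CharSequence based on a "java":"String" schema property. MiniDecoder, readCharSequence and the use of StringBuilder as a Utf8 stand-in are assumptions for illustration; this is not Avro's Decoder API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical decoder sketching the proposal; not Avro's actual Decoder API.
final class MiniDecoder {
  private final byte[] buf;
  private int pos;

  MiniDecoder(byte[] buf) { this.buf = buf; }

  // Avro binary encoding: a long is a zig-zag encoded variable-length integer.
  long readLong() {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      b = buf[pos++] & 0xff;
      raw |= (long) (b & 0x7f) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
  }

  // A string is a long byte count followed by that many UTF-8 bytes. The bytes
  // are decoded straight from the input buffer; no intermediate byte[] is
  // allocated per call.
  String readString() {
    int len = (int) readLong();
    String s = new String(buf, pos, len, StandardCharsets.UTF_8);
    pos += len;
    return s;
  }

  // Hypothetical dispatch on the proposed "java":"String" schema property.
  // Without it, return a mutable CharSequence (a stand-in for Utf8).
  CharSequence readCharSequence(Map<String, String> schemaProps) {
    String s = readString();
    return "String".equals(schemaProps.get("java")) ? s : new StringBuilder(s);
  }

  public static void main(String[] args) {
    byte[] encoded = {4, 'h', 'i', 4, 'h', 'i'}; // twice: length 2 (zig-zag 4), then "hi"
    MiniDecoder in = new MiniDecoder(encoded);
    CharSequence a = in.readCharSequence(Collections.singletonMap("java", "String"));
    CharSequence b = in.readCharSequence(Collections.<String, String>emptyMap());
    System.out.println(a instanceof String); // true
    System.out.println(b instanceof String); // false
  }
}
```

Because the property only changes which representation the reader returns, data written under either setting is byte-for-byte identical, which is what makes the proposal back-compatible.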