[jira] [Commented] (AVRO-1745) Avro-tools bundles third party jar files
[ https://issues.apache.org/jira/browse/AVRO-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948000#comment-14948000 ]

Philip Zeyliger commented on AVRO-1745:

This is essentially the same issue as AVRO-998. There's a mvn classifier that can get you an isolated copy; avro-tools is largely intended to be used standalone.

> Avro-tools bundles third party jar files
>
> Key: AVRO-1745
> URL: https://issues.apache.org/jira/browse/AVRO-1745
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Eric Yang
>
> Avro-tools bundles the Hadoop jar file and several third party jar files. This
> can introduce problems when integrating with different versions of those third
> party jar files. If someone has avro in their class path and somehow picks up
> the avro-tools Hadoop classes while running a map reduce job, it can potentially
> cause the job to fail.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AVRO-1745) Avro-tools bundles third party jar files
[ https://issues.apache.org/jira/browse/AVRO-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger resolved AVRO-1745.

Resolution: Duplicate

> Avro-tools bundles third party jar files
>
> Key: AVRO-1745
> URL: https://issues.apache.org/jira/browse/AVRO-1745
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Eric Yang
>
> Avro-tools bundles the Hadoop jar file and several third party jar files. This
> can introduce problems when integrating with different versions of those third
> party jar files. If someone has avro in their class path and somehow picks up
> the avro-tools Hadoop classes while running a map reduce job, it can potentially
> cause the job to fail.
[jira] [Created] (AVRO-1610) HttpTransceiver.java allocates arbitrary amount of memory
Philip Zeyliger created AVRO-1610:

Summary: HttpTransceiver.java allocates arbitrary amount of memory
Key: AVRO-1610
URL: https://issues.apache.org/jira/browse/AVRO-1610
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.7.7
Reporter: Philip Zeyliger

In {{HttpTransceiver.java}}, Avro does:

{code}
int length = (in.read()<<24)+(in.read()<<16)+(in.read()<<8)+in.read();
if (length == 0) {                       // end of buffers
  return buffers;
}
ByteBuffer buffer = ByteBuffer.allocate(length);
{code}

This means that badly formatted input (like that produced by {{curl http://host/ --data foo}} and many common security scanners) will trigger an OutOfMemory exception. This is undesirable, especially combined with setups that kill the process on out of memory exceptions.

This bug is similar in spirit to AVRO-.
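The failure mode is easy to see outside Java as well. The following is a minimal Python sketch, not Avro code: the four prefix bytes are combined into a big-endian integer, so arbitrary text becomes a very large length. The helper name `read_frame` and the 16 MiB cap `MAX_FRAME_BYTES` are assumptions made purely for illustration.

```python
import io
import struct

# Hypothetical cap; a real limit would be a policy decision for the server.
MAX_FRAME_BYTES = 16 * 1024 * 1024


def read_frame(stream, max_bytes=MAX_FRAME_BYTES):
    """Read one length-prefixed frame; return None for the zero-length terminator."""
    header = stream.read(4)
    if len(header) < 4:
        raise ValueError("truncated length prefix")
    # 4-byte big-endian length (read as unsigned here; the Java code uses a signed int).
    (length,) = struct.unpack(">I", header)
    if length == 0:
        return None  # end of buffers
    if length > max_bytes:
        # Refuse before allocating anything sized by untrusted input.
        raise ValueError("frame of %d bytes exceeds limit of %d" % (length, max_bytes))
    payload = stream.read(length)
    if len(payload) < length:
        raise ValueError("truncated frame")
    return payload
```

With a check of this kind, the text `GET ` at the start of a stray HTTP request decodes to a length of roughly 1.2 billion bytes and is rejected instead of allocated.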
[jira] [Commented] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace
[ https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634527#comment-13634527 ]

Philip Zeyliger commented on AVRO-1296:

Patches look good to me. Unless there are objections, I'll commit tomorrow or so.

> Python: schemas retrieved from protocol types ignore namespace
>
> Key: AVRO-1296
> URL: https://issues.apache.org/jira/browse/AVRO-1296
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.7.4
> Reporter: Jeremy Kahn
> Assignee: Jeremy Kahn
> Fix For: 1.7.5
> Attachments: AVRO-1296a.patch, AVRO-1296b.patch
>
> If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines {{namespace: ns}} and then retrieve a child schema {{s}} from the protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not have its namespace set (to {{ns}}), even if {{p}} has a namespace.
> This is particularly problematic if I'm using {{s}} to write out an avro file intended to be read by a specific-type reader, because the file header will claim to be objects of type {{s}} (not {{ns.s}}, as expected).
> I've attached two patches: one that makes sure that the {{namespace}} property of protocol types is set to the default namespace of the protocol when not otherwise set. The second patch ensures that the {{namespace}} is *not* rendered into JSON when a default protocol specifies the right value already.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1248) Add a avro tool that dumps the protocol from a remote RPC service
[ https://issues.apache.org/jira/browse/AVRO-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575479#comment-13575479 ]

Philip Zeyliger commented on AVRO-1248:

+1. This is a great addition.

Two nits:

bq. List<ByteBuffer> response = transceiver.transceive (byteBufferOutputStream.getBufferList());

I think style dictates no space between transceive and the arguments.

bq. +transceiver.close(); // can disconnect at this point as all that is left is to output the result

It might be appropriate to put the close() in a finally block to make sure it happens, even if there's an error before that.

> Add a avro tool that dumps the protocol from a remote RPC service
>
> Key: AVRO-1248
> URL: https://issues.apache.org/jira/browse/AVRO-1248
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Gareth Davis
> Priority: Minor
> Attachments: AVRO-1248.patch
>
> Would like to introspect a running RPC service using the avro command line tool.
> The expected command would be:
>   java -jar avro-tools.jar rpcprotocol uri
> The output would be the pretty-printed Protocol definition reported by the remote server during the handshake.
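The second nit can be illustrated with a small sketch, written in Python for brevity rather than in the patch's Java. `FakeTransceiver` and `fetch_protocol` are hypothetical stand-ins, not Avro classes; the point is only that cleanup placed in a `finally` clause runs whether or not the call fails.

```python
class FakeTransceiver(object):
    """Hypothetical stand-in for an RPC transceiver; not an Avro class."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def transceive(self, request):
        if self.fail:
            raise IOError("connection reset")
        return [b"handshake-response"]

    def close(self):
        self.closed = True


def fetch_protocol(transceiver):
    try:
        return transceiver.transceive([b"handshake-request"])
    finally:
        # Runs on success and on error, so the connection is never leaked.
        transceiver.close()
```

In Java the same shape would be a `try { ... } finally { transceiver.close(); }` around the handshake call.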
[jira] [Updated] (AVRO-1248) Add a avro tool that dumps the protocol from a remote RPC service
[ https://issues.apache.org/jira/browse/AVRO-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-1248:

Resolution: Fixed
Status: Resolved (was: Patch Available)

I've committed this (r1444584). Thanks for the contribution, Gareth.

I also tested the tool against an Avro RPC server I have running here locally. Worked like a charm.

> Add a avro tool that dumps the protocol from a remote RPC service
>
> Key: AVRO-1248
> URL: https://issues.apache.org/jira/browse/AVRO-1248
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Gareth Davis
> Priority: Minor
> Attachments: AVRO-1248.patch
>
> Would like to introspect a running RPC service using the avro command line tool.
> The expected command would be:
>   java -jar avro-tools.jar rpcprotocol uri
> The output would be the pretty-printed Protocol definition reported by the remote server during the handshake.
[jira] [Commented] (AVRO-1156) Avro responder swallows thrown Errors
[ https://issues.apache.org/jira/browse/AVRO-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453455#comment-13453455 ]

Philip Zeyliger commented on AVRO-1156:

I misunderstood the patch, so withdraw my +1 for the time being.

> Avro responder swallows thrown Errors
>
> Key: AVRO-1156
> URL: https://issues.apache.org/jira/browse/AVRO-1156
> Project: Avro
> Issue Type: Bug
> Reporter: Mike Percy
> Assignee: Mike Percy
> Fix For: 1.7.2
> Attachments: AVRO-1156-1.patch
>
> The Avro responder wraps caught Errors, such as OutOfMemoryErrors, in Exceptions and rethrows them. That's problematic because an Error should be allowed to crash the JVM, since it's often irrecoverable.
[jira] [Commented] (AVRO-1111) Malformed data can cause OutOfMemoryError in Avro IPC
[ https://issues.apache.org/jira/browse/AVRO-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451727#comment-13451727 ]

Philip Zeyliger commented on AVRO-1111:

Mike, your patch seems like a pragmatic approach. I'm +1 on the patch. You might be even more conservative: 10% of the maximum memory seems like more than large enough. Even 100MB seems large enough.

There are two other places we could annotate this sort of information. We could annotate the protocol description to say maxSize=100MB, to limit the size of arrays. That requires a protocol change, and it also requires keeping track of the size of requests, which is tricky in its own way.

Another approach is to pass a max size to the transceiver when instantiating it. An application might be able to say never accept RPCs > 100MB (in fact, that's a reasonable default). If an application wants to use larger ones, it can configure the server appropriately, thereby bypassing the check.

Thoughts on these alternatives?

> Malformed data can cause OutOfMemoryError in Avro IPC
>
> Key: AVRO-1111
> URL: https://issues.apache.org/jira/browse/AVRO-1111
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.6.3
> Reporter: Hari Shreedharan
> Attachments: AVRO-1111-1.patch
>
> If the data that comes in through the Netty channel buffer is not framed correctly/is not valid Avro data, then the incoming data can cause arbitrarily large array lists to be created, causing OutOfMemoryError.
> The relevant code (org.apache.avro.ipc.NettyTransportCodec):
>
> private boolean decodePackHeader(ChannelHandlerContext ctx, Channel channel,
>     ChannelBuffer buffer) throws Exception {
>   if (buffer.readableBytes() < 8) {
>     return false;
>   }
>   int serial = buffer.readInt();
>   listSize = buffer.readInt();
>   dataPack = new NettyDataPack(serial, new ArrayList<ByteBuffer>(listSize));
>   return true;
> }
>
> If the buffer does not have valid Avro data, the listSize variable can have arbitrary values, causing massive ArrayLists to be created, leading to OutOfMemoryErrors.
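The alternative raised in the comment, passing a maximum size to the transceiver when it is instantiated, might look like the following Python sketch. `PackHeaderDecoder`, its `max_list_size` parameter, and the default of 1024 entries are all hypothetical; the actual Netty codec is Java and its limits are not shown in this thread.

```python
import struct


class PackHeaderDecoder(object):
    """Hypothetical decoder for an 8-byte header: serial (int32), list size (int32)."""

    HEADER = struct.Struct(">ii")  # two big-endian signed 32-bit ints

    def __init__(self, max_list_size=1024):
        # The application chooses the limit when it builds the decoder.
        self.max_list_size = max_list_size

    def decode(self, data):
        if len(data) < self.HEADER.size:
            return None  # not enough bytes yet; wait for more
        serial, list_size = self.HEADER.unpack_from(data)
        if list_size < 0 or list_size > self.max_list_size:
            # Reject before sizing any collection from untrusted input.
            raise ValueError("implausible list size: %d" % list_size)
        return serial, list_size
```

An application that genuinely needs larger requests would construct the decoder with a larger `max_list_size`, which matches the "configure the server appropriately" idea in the comment.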
[jira] [Commented] (AVRO-625) RPC: permit out-of-order responses
[ https://issues.apache.org/jira/browse/AVRO-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295871#comment-13295871 ]

Philip Zeyliger commented on AVRO-625:

It'd be great to get a non-Java implementation as well. Today, I think there are only Http-transceiver clients in python.

> RPC: permit out-of-order responses
>
> Key: AVRO-625
> URL: https://issues.apache.org/jira/browse/AVRO-625
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
> Assignee: Doug Cutting
>
> It should be possible, when using a stateful, connection-based transport, for a client to complete a second request over a connection before the first request has returned. In other words, responses should be permitted to arrive out-of-order.
[jira] [Commented] (AVRO-845) setup.py uses Python2.7+ specific code
[ https://issues.apache.org/jira/browse/AVRO-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057934#comment-13057934 ]

Philip Zeyliger commented on AVRO-845:

Note that I haven't backported this; just checked it into trunk.

> setup.py uses Python2.7+ specific code
>
> Key: AVRO-845
> URL: https://issues.apache.org/jira/browse/AVRO-845
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.5.1
> Reporter: Miki Tebeka
> Assignee: Miki Tebeka
> Labels: python
> Fix For: 1.5.2, 1.6.0
> Attachments: AVRO-845.diff
>
> setup.py uses sys.version_info.major, which was introduced in Python 2.7 (thanks to Jeremy Lewi for reporting).
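For context, `sys.version_info` can always be indexed as a tuple, while named fields such as `.major` exist only from Python 2.7 onward. A version check that also runs on older interpreters can index the tuple instead. This is a sketch of the general technique, not the contents of AVRO-845.diff; the helper name `at_least` is hypothetical.

```python
import sys

# Tuple indexing works on every Python version;
# sys.version_info.major requires 2.7 or newer.
major = sys.version_info[0]
minor = sys.version_info[1]


def at_least(required_major, required_minor):
    """Return True when the running interpreter is at least the given version."""
    return sys.version_info[:2] >= (required_major, required_minor)
```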
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051828#comment-13051828 ]

Philip Zeyliger commented on AVRO-836:

Committed revision 1137526.

A small python suggestion for next time: since you imported avro.schema, using avro as a variable name is discouraged, because it's a local that redefines something in an outer scope (the module avro). pyflakes in my editor complained about it.

Thank you for your contribution!

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff, AVRO-836.diff, AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050170#comment-13050170 ]

Philip Zeyliger commented on AVRO-836:

Miki--I haven't. Would you mind posting that as a patch? I don't have mercurial handy, plus if you post a patch it's quite explicit that you're agreeing to the apache license terms.

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff, AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047510#comment-13047510 ]

Philip Zeyliger commented on AVRO-836:

1) ok
2) Cool; looking forward to a new patch.

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047761#comment-13047761 ]

Philip Zeyliger commented on AVRO-836:

OK leave it.

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047796#comment-13047796 ]

Philip Zeyliger commented on AVRO-836:

Unless anyone has objections, I'll commit this Sunday or Monday. Holler!

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff, AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
[jira] [Commented] (AVRO-836) Have an avro command line utility to display and write Avro files
[ https://issues.apache.org/jira/browse/AVRO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046178#comment-13046178 ]

Philip Zeyliger commented on AVRO-836:

This looks great to me. Some comments:

1) Are you sure you want to put avro into scripts/? Seems like src/avro would be just as good a place.

2) Usage help:

{quote}
[1]air::py(129882)$PYTHONPATH=src ./scripts/avro
error: You must specify `cat` or `write`
[1]air::py(129888)$PYTHONPATH=src ./scripts/avro write
error: No schema specified
{quote}

Have you considered spitting out a more useful message here? You can get optparse to spit out fairly nice error messages. It's hard to figure out what's required and how to use it.

Thanks!

> Have an avro command line utility to display and write Avro files
>
> Key: AVRO-836
> URL: https://issues.apache.org/jira/browse/AVRO-836
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Miki Tebeka
> Labels: command-line, python
> Attachments: AVRO-836.diff
>
> Following email conversation, this is a port of [avrocat|https://bitbucket.org/tebeka/avrocat/src] to the avro code base.
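One way to get friendlier messages out of optparse is to give the parser a usage string and report problems through `parser.error`, which prints the usage followed by the message and exits with status 2. The sketch below is an assumption about how such a script could be organised; the option name `--schema` and the function `parse_args` are illustrative and not taken from the patch.

```python
from optparse import OptionParser

USAGE = "%prog {cat|write} [options] [FILE...]"


def parse_args(argv):
    parser = OptionParser(usage=USAGE)
    parser.add_option("--schema", help="path to the schema file (needed by write)")
    opts, args = parser.parse_args(argv)
    if not args or args[0] not in ("cat", "write"):
        # Prints the usage line plus this message to stderr, then exits with 2.
        parser.error("you must specify `cat` or `write`")
    if args[0] == "write" and not opts.schema:
        parser.error("write needs a schema; pass --schema SCHEMA")
    return args[0], opts, args[1:]
```

Running the script with `--help` would additionally list the registered options, which addresses the "hard to figure out what's required" concern without hand-written help text.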
[jira] [Commented] (AVRO-539) Allow asynchronous clients to specify a callback to be run when server processing completes
[ https://issues.apache.org/jira/browse/AVRO-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043415#comment-13043415 ]

Philip Zeyliger commented on AVRO-539:

BTW, Google Guava has http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/util/concurrent/ListenableFuture.html, which may be a working middle ground.

> Allow asynchronous clients to specify a callback to be run when server processing completes
>
> Key: AVRO-539
> URL: https://issues.apache.org/jira/browse/AVRO-539
> Project: Avro
> Issue Type: New Feature
> Reporter: Jeff Hammerbacher
> Attachments: AVRO-539.patch
[jira] [Resolved] (AVRO-833) Don't require simplejson when not needed
[ https://issues.apache.org/jira/browse/AVRO-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger resolved AVRO-833.

Resolution: Fixed
Assignee: Miki Tebeka
Hadoop Flags: [Reviewed]

Thanks for your contribution. I've checked this in as r1129856 after running build.sh test.

In the future, please try to provide patch files relative to the top-level directory. I find that naming them AVRO-833.patch also helps keep track of things.

Thanks!

-- Philip

> Don't require simplejson when not needed
>
> Key: AVRO-833
> URL: https://issues.apache.org/jira/browse/AVRO-833
> Project: Avro
> Issue Type: Bug
> Components: python
> Reporter: Miki Tebeka
> Assignee: Miki Tebeka
> Priority: Minor
> Labels: python
> Attachments: setup.patch
>
> Currently setup.py requires simplejson unconditionally. However, since json was added in Python 2.6 (and the avro code tries to use the built-in module first), require it only if the Python version is 2.5 or lower.
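The conditional requirement described in the issue can be sketched as follows. This is an assumed shape for the logic, not the contents of setup.patch; the function name `install_requires_for` is hypothetical.

```python
import sys


def install_requires_for(version_info=sys.version_info):
    """Return extra requirements for the given interpreter version."""
    # The json module joined the standard library in Python 2.6,
    # so simplejson is only needed on 2.5 and earlier.
    if tuple(version_info[:2]) < (2, 6):
        return ["simplejson"]
    return []
```

In a setup.py, the result would presumably be passed to `setup()` as `install_requires=install_requires_for()`.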
[jira] [Commented] (AVRO-817) Add __version__ to avro/__init__.py
[ https://issues.apache.org/jira/browse/AVRO-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030150#comment-13030150 ]

Philip Zeyliger commented on AVRO-817:

That approach is a bit of an anti-pattern: you want to be able to point your python path towards the python directory; replacing stuff at build-time is very uncommon in python-land.

> Add __version__ to avro/__init__.py
>
> Key: AVRO-817
> URL: https://issues.apache.org/jira/browse/AVRO-817
> Project: Avro
> Issue Type: Improvement
> Components: python
> Affects Versions: 1.5.1
> Reporter: Miki Tebeka
> Assignee: Miki Tebeka
> Attachments: version.patch
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> Currently, there is no way to know which version of avro is installed on my machine.
> Add a __version__ string to avro/__init__.py
[jira] Updated: (AVRO-296) Extend Avro IDL (was genavro) to do doc fields
[ https://issues.apache.org/jira/browse/AVRO-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-296:

Summary: Extend Avro IDL (was genavro) to do doc fields (was: Extend genavro to do doc fields)

> Extend Avro IDL (was genavro) to do doc fields
>
> Key: AVRO-296
> URL: https://issues.apache.org/jira/browse/AVRO-296
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Philip Zeyliger
> Assignee: Philip Zeyliger
> Priority: Minor
>
> AVRO-152 introduces docs in schemas; genavro should understand those as well.
[jira] Resolved: (AVRO-768) Avro IDL Compiler Should Generate Language Docs
[ https://issues.apache.org/jira/browse/AVRO-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger resolved AVRO-768.

Resolution: Duplicate

This is a duplicate of AVRO-296.

Ed: this is an excellent idea, and I encourage you to work on it! Should be a moderately straightforward and 100% uncontroversial change.

> Avro IDL Compiler Should Generate Language Docs
>
> Key: AVRO-768
> URL: https://issues.apache.org/jira/browse/AVRO-768
> Project: Avro
> Issue Type: Improvement
> Components: build
> Affects Versions: 1.4.1, 1.5.0
> Reporter: Ed Kohlwey
>
> The Avro IDL compiler should support the inline documentation features of Avro. This is already implemented in the Avro protocol specification as the doc field; however, there is no documented feature in Avro IDL for supporting this. I suggest using a style similar to Javadocs, where /** */ is used to represent doc comments.
[jira] Commented: (AVRO-732) Generated protocol's method should not throw AvroRemoteException
[ https://issues.apache.org/jira/browse/AVRO-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985775#action_12985775 ]

Philip Zeyliger commented on AVRO-732:

Serializing Java exceptions only works for Java: I think you've more or less got to coerce the exception into a string.

> Generated protocol's method should not throw AvroRemoteException
>
> Key: AVRO-732
> URL: https://issues.apache.org/jira/browse/AVRO-732
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Sharad Agarwal
>
> If user does NOT define the throws clause in the idl, the code is generated with throws AvroRemoteException clause. However on throwing the AvroRemoteException from the implementation, the serialization fails. This is not intuitive to users.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-656) writing unions with multiple records, fixed or enums can choose wrong branch
[ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977860#action_12977860 ]

Philip Zeyliger commented on AVRO-656:

For what it's worth, I've always slightly agitated for wrappers. I like dynamic typing as much as the next guy, but when type information is available, I want to be able to get at it unambiguously.

It's probably too late to change. (Though part of the beauty of some of this stuff is that it's possible to have two different client implementations to try out.)

> writing unions with multiple records, fixed or enums can choose wrong branch
>
> Key: AVRO-656
> URL: https://issues.apache.org/jira/browse/AVRO-656
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.4.0
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.5.0
> Attachments: AVRO-656.patch, AVRO-656.patch
>
> According to the specification, a union may contain multiple instances of a named type, provided they have different names. There are several bugs in the Java implementation of this when writing data:
> - for record, only the short-name of the record is checked, so the branch for a record of the same name in a different namespace may be used by mistake
> - for enum and fixed, the name of the record is not checked, so the first enum or fixed in the union will always be assumed when writing. in many cases this may cause the wrong data to be written, potentially corrupting output.
> This is not a regression. This has never been implemented correctly by Java. Python and Ruby never check names, but rather perform a full, recursive validation of content.
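To make the record case concrete, here is a small Python sketch of resolving a union branch by full name rather than by short name. The dictionaries stand in for parsed schemas, and `fullname` and `resolve_branch` are hypothetical helpers, not functions from any Avro implementation.

```python
def fullname(schema):
    """Join namespace and name the way Avro full names are formed."""
    namespace = schema.get("namespace")
    return "%s.%s" % (namespace, schema["name"]) if namespace else schema["name"]


def resolve_branch(union, datum_fullname):
    """Return the index of the union branch whose full name matches."""
    for index, branch in enumerate(union):
        if fullname(branch) == datum_fullname:
            return index
    raise ValueError("%s is not in the union" % datum_fullname)


# Two records share the short name "Point" but live in different namespaces.
UNION = [
    {"type": "record", "name": "Point", "namespace": "geo2d", "fields": []},
    {"type": "record", "name": "Point", "namespace": "geo3d", "fields": []},
]
```

Matching on the short name alone would pick index 0 for both records, which is the mistake described for the record case above.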
[jira] Commented: (AVRO-723) Incorrect handling of Undeclared Errors in IPC calls to SpecificResponder
[ https://issues.apache.org/jira/browse/AVRO-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976549#action_12976549 ]

Philip Zeyliger commented on AVRO-723:

Hi Stephen,

I think this is an improvement. I'm a bit worried about catching the AvroRuntimeException but then ignoring it. Could the AvroRuntimeException actually be something other than the schema mismatch from an undeclared error? If so, that error would get swallowed.

> Incorrect handling of Undeclared Errors in IPC calls to SpecificResponder
>
> Key: AVRO-723
> URL: https://issues.apache.org/jira/browse/AVRO-723
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Reporter: Stephen Gargan
> Priority: Minor
> Attachments: undeclared-error.patch
>
> Undeclared errors thrown during service invocations are not getting returned correctly. When they are encountered, the writeError method in the responder will try to encode them using the errors union for the message. However, because they are undeclared they are not present in the union, and encoding causes a further AvroRuntimeException. It's this 'Not in union' exception that gets returned to the client, not the undeclared problem, which gets lost.
> The attached patch handles them like other system errors, calling toString on the exception and writing this as the error to be returned.
[jira] Commented: (AVRO-663) avro-tools-1.4.0.jar doesn't meet the maven2 layout standard, making it inaccessble to maven users
[ https://issues.apache.org/jira/browse/AVRO-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971939#action_12971939 ]

Philip Zeyliger commented on AVRO-663:

There's also shade: http://maven.apache.org/plugins/maven-shade-plugin/. I'm not familiar with the distinctions.

> avro-tools-1.4.0.jar doesn't meet the maven2 layout standard, making it inaccessble to maven users
>
> Key: AVRO-663
> URL: https://issues.apache.org/jira/browse/AVRO-663
> Project: Avro
> Issue Type: Bug
> Components: build
> Affects Versions: 1.4.0
> Reporter: Brian Fox
>
> in /org/apache/avro/avro/1.4.0 you have avro-tools-1.4.0.jar. This should instead be avro-1.4.0-tools.jar (similar to sources/javadoc jar structure) if this is really just an attached artifact.
> If this is supposed to be something separate and avro-tools is the artifactId, then it should be in /org/apache/avro/avro-tools/1.4.0/avro-tools-1.4.0.jar
[jira] Commented: (AVRO-709) Some python speedups
[ https://issues.apache.org/jira/browse/AVRO-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971978#action_12971978 ]

Philip Zeyliger commented on AVRO-709:

Justin,

This is really close to being committed. If you put together a patch with the right licenses and both files, it'd be my pleasure to push it through.

Thanks!

> Some python speedups
>
> Key: AVRO-709
> URL: https://issues.apache.org/jira/browse/AVRO-709
> Project: Avro
> Issue Type: Improvement
> Components: python
> Affects Versions: 1.5.0
> Environment: linux
> Reporter: Justin Azoff
> Priority: Minor
> Fix For: 1.5.0
> Attachments: 0001-python-speedups.patch, av_bench.py
>
> I did some basic profiling of the python library and found it spent most of its time looking up the properties. Converting them to plain class members speeds things up quite a bit. Using __slots__ would probably improve things even more.
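The trade-off mentioned in the issue description can be sketched with two tiny classes. These are illustrative only and are not taken from the patch or from the avro package; `time_access` is a hypothetical helper built on `timeit`, and no timings are claimed here.

```python
import timeit


class WithProperty(object):
    def __init__(self, name):
        self._name = name

    # Each access to .name goes through a function call.
    name = property(lambda self: self._name)


class WithSlots(object):
    # __slots__ fixes the attribute set and avoids a per-instance __dict__.
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


def time_access(obj, number=100000):
    """Return the seconds taken to read obj.name `number` times."""
    return timeit.timeit(lambda: obj.name, number=number)
```

Comparing `time_access(WithProperty("x"))` with `time_access(WithSlots("x"))` on a given interpreter is one way to check whether the change is worthwhile there.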
[jira] Commented: (AVRO-709) Some python speedups
[ https://issues.apache.org/jira/browse/AVRO-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969554#action_12969554 ]

Philip Zeyliger commented on AVRO-709:

Patch looks good. I'd like to check in av_bench.py. It doesn't have the Apache license header on it, which is probably a prerequisite. Justin, would you mind creating a patch that's got both files?

Thanks!

> Some python speedups
>
> Key: AVRO-709
> URL: https://issues.apache.org/jira/browse/AVRO-709
> Project: Avro
> Issue Type: Improvement
> Components: python
> Affects Versions: 1.5.0
> Environment: linux
> Reporter: Justin Azoff
> Priority: Minor
> Fix For: 1.5.0
> Attachments: 0001-python-speedups.patch, av_bench.py
>
> I did some basic profiling of the python library and found it spent most of its time looking up the properties. Converting them to plain class members speeds things up quite a bit. Using __slots__ would probably improve things even more.
[jira] Commented: (AVRO-648) Use a template library for the SpecificCompiler
[ https://issues.apache.org/jira/browse/AVRO-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926851#action_12926851 ]

Philip Zeyliger commented on AVRO-648:

+1. Sorry 'bout the slow review; crazy couple of days.

> Use a template library for the SpecificCompiler
>
> Key: AVRO-648
> URL: https://issues.apache.org/jira/browse/AVRO-648
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Philip Zeyliger
> Assignee: Philip Zeyliger
> Fix For: 1.5.0
> Attachments: AVRO-648.patch, AVRO-648.patch.txt
>
> This JIRA proposes using a templating library instead of string concatenation for the SpecificCompiler.
> We've had conversations on the list about customizing the generated code (by adding getters and setters, adding utility classes for the arrays, adding String-friendly (as opposed to Utf8) accessors, etc.), but we've been stymied by the fact that the specific compiler is hard-coded to use one template, and it's hard to experiment with.
> Sam Pullara (at http://github.com/spullara/avrocompiler) has done pretty much this: he forked/subclassed a copy of SpecificCompiler that uses the Mustache language to generate code. He's also gone ahead and done some of the customizations.
> In the patch I'm about to post, I've replicated the existing code generation using Velocity. We already build Velocity for some of the IPC plugins, and it's an Apache project. The existing tests pass, plus I've added tests that check that the generated code is character-for-character the same, in a handful of cases. This was actually quite painful, since I had to reproduce some questionable indentation and trailing whitespace :). That said, I'm pleased with how easy it was to incorporate the templates.
> Eventually, I hope we support getters and setters, or perhaps support multiple versions of templates. (If someone wants to generate, say, C++ code, the path is now a lot easier for that, as well.)
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-672) Convert JSON Text Input to Avro Tool
[ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918341#action_12918341 ] Philip Zeyliger commented on AVRO-672: -- I like the idea of having tools that manipulate traditional data formats into avro records, including guessing at the schema. CSV and TSV and one-json-per-line are obvious candidates here. Convert JSON Text Input to Avro Tool Key: AVRO-672 URL: https://issues.apache.org/jira/browse/AVRO-672 Project: Avro Issue Type: New Feature Reporter: Ron Bodkin Attachments: AVRO-672.patch, AVRO-672.patch The attached patch allows reading a JSON-formatted text file in, converting to a conforming Avro text file, emitting one record per line, e.g., it can read this input file: {"intval":12} {"intval":-73,"strval":"hello, there!!"} with this schema: { "type":"record", "name":"TestRecord", "fields": [ {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]} returning valid Avro. This is different than the DataFileWriteTool, which would read in the following internal encoding: {"intval":12,"strval":null} {"intval":-73,"strval":{"string":"hello, there!!"}} In general, the internal encodings used by Avro aren't natural when reading in JSON text that appears in the wild. Likewise, this utility allows changing invalid Avro identifier characters into an underscore, again to tolerate JSON that wasn't designed to be readable by Avro. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
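The difference between the "natural" JSON and Avro's own JSON encoding described in AVRO-672 can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in the attached patch: to_avro_json and the PRIMITIVES table are hypothetical names, the sketch handles only records, unions, and a few primitives, and it picks a union branch by Python type.

```python
# Hypothetical mapping from Avro primitive type names to Python types.
PRIMITIVES = {"int": int, "long": int, "string": str,
              "boolean": bool, "float": float, "double": float}


def to_avro_json(schema, datum):
    """Wrap a plain JSON value in the shape Avro's JSON encoding expects.

    Assumption: `schema` is an already-parsed JSON schema (dicts, lists, strings).
    """
    if isinstance(schema, list):
        # Union: null stays null, any other value is tagged with its branch name.
        if datum is None:
            if "null" not in schema:
                raise ValueError("null is not allowed by union %r" % (schema,))
            return None
        for branch in schema:
            if isinstance(branch, str) and isinstance(datum, PRIMITIVES.get(branch, ())):
                return {branch: datum}
        raise ValueError("no union branch matches %r" % (datum,))
    if isinstance(schema, dict) and schema.get("type") == "record":
        # Record: convert field by field; a missing field is treated as null.
        return {f["name"]: to_avro_json(f["type"], datum.get(f["name"]))
                for f in schema["fields"]}
    return datum  # primitives pass through unchanged


TEST_SCHEMA = {"type": "record", "name": "TestRecord", "fields": [
    {"name": "intval", "type": "int"},
    {"name": "strval", "type": ["string", "null"]}]}

for line in ({"intval": 12}, {"intval": -73, "strval": "hello, there!!"}):
    print(to_avro_json(TEST_SCHEMA, line))
```

Fed the two input records from the issue, the sketch produces the same shape as the internal encoding quoted there: an explicit null for the missing field, and the string wrapped in a branch-name object.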
[jira] Commented: (AVRO-311) Run Python tests with nose
[ https://issues.apache.org/jira/browse/AVRO-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917474#action_12917474 ] Philip Zeyliger commented on AVRO-311: -- Hi Daniel, My only hesitation with this patch is that it adds another dependency for folks building AVRO, namely having nosetests installed. I think nosetests is LGPL, so we shouldn't be redistributing it, though it's a fine build dependency. If no one objects, I'll go ahead and commit this, since I much prefer nose to python's built-in unittest myself. Run Python tests with nose -- Key: AVRO-311 URL: https://issues.apache.org/jira/browse/AVRO-311 Project: Avro Issue Type: Improvement Components: build, python Reporter: Jeff Hammerbacher Attachments: AVRO-311.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-673) Reduce time spent validating schemas
[ https://issues.apache.org/jira/browse/AVRO-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914676#action_12914676 ] Philip Zeyliger commented on AVRO-673: -- Patch looks reasonable to me. Haven't downloaded and tried it. Reduce time spent validating schemas Key: AVRO-673 URL: https://issues.apache.org/jira/browse/AVRO-673 Project: Avro Issue Type: Improvement Components: python Reporter: Erik Frey Priority: Minor Attachments: AVRO-673.patch avro.io has a validate method that currently occupies around half the time it takes to serialize a fairly complex record through a datafile. validate() gets called repeatedly during an object's traversal, even though validate itself is already recursive. This introduces combinatorially excessive validation that has a significant impact on the performance of serializing complex records. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
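To see why calling an already-recursive validate() at every level of a traversal gets expensive, here is a toy sketch. It is not the avro.io code: the "schema" is simply "an int or a nested list of ints", and validate, write_slow, write_fast, and count_calls are names made up for the illustration.

```python
calls = 0  # number of validate() invocations, reset by count_calls()


def validate(datum):
    """Recursively check that datum is an int or a (nested) list of ints."""
    global calls
    calls += 1
    if isinstance(datum, list):
        return all(validate(item) for item in datum)
    return isinstance(datum, int)


def write_slow(datum, out):
    """Re-validate at every level, although validate() already recurses."""
    if not validate(datum):
        raise TypeError("invalid datum: %r" % (datum,))
    if isinstance(datum, list):
        for item in datum:
            write_slow(item, out)
    else:
        out.append(datum)


def _write_unchecked(datum, out):
    """Traverse and emit without validating again."""
    if isinstance(datum, list):
        for item in datum:
            _write_unchecked(item, out)
    else:
        out.append(datum)


def write_fast(datum, out):
    """Validate once at the top, then traverse without further checks."""
    if not validate(datum):
        raise TypeError("invalid datum: %r" % (datum,))
    _write_unchecked(datum, out)


def count_calls(writer, datum):
    """Run a writer and report (validate() call count, emitted values)."""
    global calls
    calls = 0
    out = []
    writer(datum, out)
    return calls, out


nested = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(count_calls(write_slow, nested)[0])  # 49
print(count_calls(write_fast, nested)[0])  # 15
```

The datum has 15 nodes, so a single validation costs 15 calls. Re-validating at each level pays for every subtree again (15 + 14 + 12 + 8 = 49), and the gap widens as records get deeper.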
[jira] Updated: (AVRO-670) Allow DataFileWriteTool to accept schema files as input
[ https://issues.apache.org/jira/browse/AVRO-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated AVRO-670: - Status: Resolved (was: Patch Available) Hadoop Flags: [Incompatible change, Reviewed] Release Note: The fromjson tool now requires either a --schema or --schema-file command-line argument to specify the schema. Previously, the schema was to be specified as the first argument. Assignee: Ron Bodkin Resolution: Fixed Hi Ron, Thanks for your contribution! (And congratulations on your first contribution to AVRO.) I've committed it. I made two very minor changes: I fixed one checkstyle bug (ant test runs checkstyle) bq. [checkstyle] /data/6/philip/avro-svn/lang/java/src/java/org/apache/avro/tool/DataFileWriteTool.java:121:69: Redundant throws: 'FileNotFoundException' is subclass of 'IOException'. I also added a --schema to src/test/bin/test_tools.sh in one place, to fix a test failure. Allow DataFileWriteTool to accept schema files as input --- Key: AVRO-670 URL: https://issues.apache.org/jira/browse/AVRO-670 Project: Avro Issue Type: Improvement Reporter: Ron Bodkin Assignee: Ron Bodkin Fix For: 1.5.0 Attachments: AVRO-670.patch, datafilewritefile.patch For non-trivial schemas, it's difficult to pass them inline as a command line argument. I made a patch to use two different arguments: instead of having the first argument be the schema you would now use -schema-file file or -schema schema and then have one other argument (the input JSON file) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
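The command-line shape described in the AVRO-670 release note (exactly one of --schema or --schema-file, plus the input file) can be sketched with Python's argparse. This is purely illustrative: the real tool is Java, and build_parser and load_schema are hypothetical names, not part of Avro.

```python
import argparse
import json


def build_parser():
    """Parser requiring exactly one of --schema / --schema-file, plus an input file."""
    parser = argparse.ArgumentParser(prog="fromjson")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--schema", help="schema given inline as JSON")
    group.add_argument("--schema-file", help="path to a file containing the schema")
    parser.add_argument("input", help="JSON input file")
    return parser


def load_schema(args):
    """Return the parsed schema from whichever option was supplied."""
    if args.schema is not None:
        return json.loads(args.schema)
    with open(args.schema_file) as schema_file:
        return json.load(schema_file)


args = build_parser().parse_args(["--schema", '"int"', "in.json"])
print(load_schema(args), args.input)
```

Making the two options a required, mutually exclusive group means that the old positional-schema invocation fails loudly instead of being silently misread, which matches the "incompatible change" flag on the issue.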
[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
[ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912725#action_12912725 ] Philip Zeyliger commented on AVRO-647: -- I would be +1 full-maven for Java. Amongst the evils available, it's one of the least objectionable. I'm using it on another project now, and, well, I hate that I don't know what it's doing half the time, but it removes a considerable amount of the Ivy and Ant boilerplate. Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar -- Key: AVRO-647 URL: https://issues.apache.org/jira/browse/AVRO-647 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Our dependencies are starting to get a little complicated on the Java side. I propose we build two (possibly more) jars related to our major dependencies and functions. 1. avro.jar (or perhaps avro-core.jar) This contains all of the core avro functionality for _using_ avro as a library. This excludes the specific compiler, avro idl, and other build-time or development tools, as well as avro packages for third party integration such as hadoop. This jar should then have a minimal set of dependencies (jackson, jetty, SLF4J ?). 2. avro-dev.jar This would contain compilers, idl, development tools, etc. Most applications will not need this, but build systems and developers will. 3. avro-hadoop.jar This would contain the hadoop API and possibly pig/hive/whatever related to that. This makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
[ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912786#action_12912786 ] Philip Zeyliger commented on AVRO-647: -- To be clear, I'm too much of a maven incompetent to volunteer. I would be happy to test it out after the fact, though. BTW, it would be totally acceptable and desirable for the maven plugins for avro code generation to be part of Avro's build. Patrick, who wrote the plugin, would be happy to contribute it, if he hasn't already. That solves a versioning problem for the plugin, too. Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar -- Key: AVRO-647 URL: https://issues.apache.org/jira/browse/AVRO-647 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Our dependencies are starting to get a little complicated on the Java side. I propose we build two (possibly more) jars related to our major dependencies and functions. 1. avro.jar (or perhaps avro-core.jar) This contains all of the core avro functionality for _using_ avro as a library. This excludes the specific compiler, avro idl, and other build-time or development tools, as well as avro packages for third party integration such as hadoop. This jar should then have a minimal set of dependencies (jackson, jetty, SLF4J ?). 2. avro-dev.jar This would contain compilers, idl, development tools, etc. Most applications will not need this, but build systems and developers will. 3. avro-hadoop.jar This would contain the hadoop API and possibly pig/hive/whatever related to that. This makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-670) Allow DataFileWriteTool to accept schema files as input
[ https://issues.apache.org/jira/browse/AVRO-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910712#action_12910712 ] Philip Zeyliger commented on AVRO-670: -- Hi Ron, The idea and implementation look good. We should note that it's an incompatible change in terms of the tools command line API, which I think is fine, but should be noted. I couldn't get the patch to apply cleanly. Typically (see https://cwiki.apache.org/AVRO/how-to-contribute.html) patches should be generated at top-level (so, lang/java/... should be the path that's being patched). Many folks name their patches AVRO-670.patch, too, though I'm not a stickler there. I also recommend turning off Eclipse's autoimport. I think our style guide discourages star imports (e.g., +import joptsimple.*; is not something your patch should have introduced). Besides that, patch gave me a rejects file---how did you generate your patch? I was slightly surprised that 'lang/java/src/test/bin/test_tools.sh' (which is run by the ant target 'test-tools') doesn't exercise this code. Would be good to make sure that the tests for this code don't need any modification. (There's a java test somewhere, but it might not exercise the command-line parsing code; I haven't looked.) Could you upload a new patch without the import changes and re-generated against the root of the repo? Thanks! Allow DataFileWriteTool to accept schema files as input --- Key: AVRO-670 URL: https://issues.apache.org/jira/browse/AVRO-670 Project: Avro Issue Type: Improvement Reporter: Ron Bodkin Fix For: 1.5.0 Attachments: datafilewritefile.patch For non-trivial schemas, it's difficult to pass them inline as a command line argument. I made a patch to use two different arguments: instead of having the first argument be the schema you would now use -schema-file file or -schema schema and then have one other argument (the input JSON file) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-159) maven-avro-plugin: Allow maven builds to use avro
[ https://issues.apache.org/jira/browse/AVRO-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907685#action_12907685 ] Philip Zeyliger commented on AVRO-159: -- I've been using phunt's avro-maven-plugin and it's handy. Would be even more handy if it were part of avro's published artifacts. Is anyone listening enough of a build.xml and Maven expert to make that work? maven-avro-plugin: Allow maven builds to use avro - Key: AVRO-159 URL: https://issues.apache.org/jira/browse/AVRO-159 Project: Avro Issue Type: New Feature Components: java Reporter: Hiram Chirino Attachments: AVRO-159.patch, maven-avro-plugin.tar.gz -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-605) Java: make Utf8 implement CharSequence
[ https://issues.apache.org/jira/browse/AVRO-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907831#action_12907831 ] Philip Zeyliger commented on AVRO-605: -- BTW, a caveat for those using this: Utf8("foo").equals("foo") is still false, so it's easy to be lulled into a false sense of security when using Utf8s and Strings interchangeably. Java: make Utf8 implement CharSequence -- Key: AVRO-605 URL: https://issues.apache.org/jira/browse/AVRO-605 Project: Avro Issue Type: Improvement Components: java Reporter: Doug Cutting Assignee: Doug Cutting Fix For: 1.4.0 Attachments: AVRO-605.patch, AVRO-605.patch, AVRO-605.patch If Utf8 implemented CharSequence then interfaces could change to specify CharSequence and accept Utf8 or String interchangeably. This would be mostly back-compatible, except the method signatures of generated protocols would be different, so implementations would need to be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-661) Facilitate moving between Java Strings and Avro's UTF-8
[ https://issues.apache.org/jira/browse/AVRO-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907247#action_12907247 ] Philip Zeyliger commented on AVRO-661: -- I'm not sure what you mean here, but https://issues.apache.org/jira/browse/AVRO-605 has made this considerably more seamless. I would be tempted to close this as 'duplicate' unless you're still finding it awkward to use. Facilitate moving between Java Strings and Avro's UTF-8 --- Key: AVRO-661 URL: https://issues.apache.org/jira/browse/AVRO-661 Project: Avro Issue Type: Improvement Components: java Reporter: Jeff Hammerbacher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-647) Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar
[ https://issues.apache.org/jira/browse/AVRO-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905085#action_12905085 ] Philip Zeyliger commented on AVRO-647: -- I may be missing something: what's http-client used for in the tools category? Break avro.jar into avro.jar, avro-dev.jar and avro-hadoop.jar -- Key: AVRO-647 URL: https://issues.apache.org/jira/browse/AVRO-647 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Our dependencies are starting to get a little complicated on the Java side. I propose we build two (possibly more) jars related to our major dependencies and functions. 1. avro.jar (or perhaps avro-core.jar) This contains all of the core avro functionality for _using_ avro as a library. This excludes the specific compiler, avro idl, and other build-time or development tools, as well as avro packages for third party integration such as hadoop. This jar should then have a minimal set of dependencies (jackson, jetty, SLF4J ?). 2. avro-dev.jar This would contain compilers, idl, development tools, etc. Most applications will not need this, but build systems and developers will. 3. avro-hadoop.jar This would contain the hadoop API and possibly pig/hive/whatever related to that. This makes it easier for pig/hive/hadoop to consume avro-core without circular dependencies. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-649) Allow pluggable ThreadPools in Java HttpServer
[ https://issues.apache.org/jira/browse/AVRO-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905198#action_12905198 ] Philip Zeyliger commented on AVRO-649: -- Could we reverse the dependency, and say that HttpServer requires a servlet container, and folks can pass in whatever they want? I've been wanting to re-use the servlet container that AVRO uses for my own nefarious purposes (there's no reason to open up two ports if you've already got one open). We should still have methods that work out of the box, as well. Allow pluggable ThreadPools in Java HttpServer -- Key: AVRO-649 URL: https://issues.apache.org/jira/browse/AVRO-649 Project: Avro Issue Type: Improvement Components: java Reporter: Stu Hood Fix For: 1.4.1, 1.5.0 Attachments: 0001-Allow-an-alternate-ThreadPool-to-be-passed-to-the-Je.patch The easiest way to tune the threading implementation of an RPC server is to provide an alternate threadpool implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-649) Allow pluggable ThreadPools in Java HttpServer
[ https://issues.apache.org/jira/browse/AVRO-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905221#action_12905221 ] Philip Zeyliger commented on AVRO-649: -- Good point, I didn't realize that ResponderServlet is public and could be used directly. That works, then. (Are the paths used in HTTP requests specified?) Allow pluggable ThreadPools in Java HttpServer -- Key: AVRO-649 URL: https://issues.apache.org/jira/browse/AVRO-649 Project: Avro Issue Type: Improvement Components: java Reporter: Stu Hood Fix For: 1.4.1, 1.5.0 Attachments: 0001-Allow-an-alternate-ThreadPool-to-be-passed-to-the-Je.patch The easiest way to tune the threading implementation of an RPC server is to provide an alternate threadpool implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-637) GenericArray should implement Collection
[ https://issues.apache.org/jira/browse/AVRO-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904241#action_12904241 ] Philip Zeyliger commented on AVRO-637: -- Patch looks good. I took a look at it via reviewboard at https://review.cloudera.org/r/745/, if anyone else is curious what it looks like side-by-side. GenericArray should implement Collection Key: AVRO-637 URL: https://issues.apache.org/jira/browse/AVRO-637 Project: Avro Issue Type: New Feature Components: java Reporter: Doug Cutting Assignee: Doug Cutting Fix For: 1.5.0 Attachments: AVRO-637.patch, AVRO-637.patch, AVRO-637.patch It would be nice if Avro arrays were better integrated with Java collections. The GenericArray interface permits array element reuse, which is awkward with java.util.Collection. But if GenericArray implemented Collection and the Avro runtime permitted arbitrary Collection implementations to be passed for Arrays then it would simplify many applications. The runtime could still reuse elements if an array implemented GenericArray, so performance would not suffer for applications that, e.g., loop over a data file, reusing instances. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-646) Java: disable trace tests
[ https://issues.apache.org/jira/browse/AVRO-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904417#action_12904417 ] Philip Zeyliger commented on AVRO-646: -- +1 for now. Java: disable trace tests - Key: AVRO-646 URL: https://issues.apache.org/jira/browse/AVRO-646 Project: Avro Issue Type: Test Components: java Reporter: Doug Cutting Fix For: 1.4.0 I propose we disable the org.apache.avro.ipc.trace.* tests which are currently unreliable so that we can get the 1.4.0 release out. This package is entirely new code that other previously existing code does not depend on, so its test failures are not a regression. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-637) GenericArray should implement Collection
[ https://issues.apache.org/jira/browse/AVRO-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904075#action_12904075 ] Philip Zeyliger commented on AVRO-637: -- The schema we're talking about here is '{"type":"array", "items":"string"}', no? In Java, "array" reads "List" to me. For the generated code, List is the closest match we've got. (The generated code is not a good match for things that don't fit into memory, so I'm not persuaded by the Iterable argument.) General types are nice for writing, but they're a massive pain for reading. I recently wrote an RPC Server using Avro Java, and the lack of get() on an array that the client wrote was quite irritating (hence I'm a big fan of this jira). At this layer, I think you have to choose the interfaces that are closest to the implementation. GenericArray should implement Collection Key: AVRO-637 URL: https://issues.apache.org/jira/browse/AVRO-637 Project: Avro Issue Type: New Feature Components: java Reporter: Doug Cutting Assignee: Doug Cutting Fix For: 1.5.0 Attachments: AVRO-637.patch, AVRO-637.patch It would be nice if Avro arrays were better integrated with Java collections. The GenericArray interface permits array element reuse, which is awkward with java.util.Collection. But if GenericArray implemented Collection and the Avro runtime permitted arbitrary Collection implementations to be passed for Arrays then it would simplify many applications. The runtime could still reuse elements if an array implemented GenericArray, so performance would not suffer for applications that, e.g., loop over a data file, reusing instances. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-639) python should print short name of records
[ https://issues.apache.org/jira/browse/AVRO-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903531#action_12903531 ] Philip Zeyliger commented on AVRO-639: -- BTW, given that we now prefer AvroIDL for generating these things, I'd be a big fan of deprecating short names and using fully qualified names in the JSON representation always. python should print short name of records -- Key: AVRO-639 URL: https://issues.apache.org/jira/browse/AVRO-639 Project: Avro Issue Type: Improvement Components: python Reporter: Doug Cutting Fix For: 1.4.0 Attachments: AVRO-639.patch When printing a reference to a named schema, the non-full-name is preferred when the type's namespace is the same as its enclosing schema. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-639) python should print short name of records
[ https://issues.apache.org/jira/browse/AVRO-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903543#action_12903543 ] Philip Zeyliger commented on AVRO-639: -- Python patch looks good. Consider writing a quick test in test_schema.py; should be pretty easy. python should print short name of records -- Key: AVRO-639 URL: https://issues.apache.org/jira/browse/AVRO-639 Project: Avro Issue Type: Improvement Components: python Reporter: Doug Cutting Fix For: 1.4.0 Attachments: AVRO-639.patch When printing a reference to a named schema, the non-full-name is preferred when the type's namespace is the same as its enclosing schema. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-639) python should print short name of records
[ https://issues.apache.org/jira/browse/AVRO-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903552#action_12903552 ] Philip Zeyliger commented on AVRO-639: -- Alright, plus one. I introduced this behavior recently in AVRO-620, btw. If it used to work before, it's because previously the schema that we printed out would be more like the schema that originally came in, but that's the wrong behavior if you're printing out a subschema. python should print short name of records -- Key: AVRO-639 URL: https://issues.apache.org/jira/browse/AVRO-639 Project: Avro Issue Type: Improvement Components: python Reporter: Doug Cutting Fix For: 1.4.0 Attachments: AVRO-639.patch When printing a reference to a named schema, the non-full-name is preferred when the type's namespace is the same as its enclosing schema. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-640) python rpc interop tests use wrong src dir
[ https://issues.apache.org/jira/browse/AVRO-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903612#action_12903612 ] Philip Zeyliger commented on AVRO-640: -- +1 python rpc interop tests use wrong src dir -- Key: AVRO-640 URL: https://issues.apache.org/jira/browse/AVRO-640 Project: Avro Issue Type: Bug Components: python Reporter: Doug Cutting Priority: Blocker Fix For: 1.4.0 Attachments: AVRO-640.patch Python's RPC interop tests don't work because they refer to code in lang/py/src/ when they should instead look in lang/py/build/src. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (AVRO-636) Expose Singleton Method for TracePlugin
[ https://issues.apache.org/jira/browse/AVRO-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-636. -- Resolution: Fixed Just committed this; thanks, Patrick. Expose Singleton Method for TracePlugin --- Key: AVRO-636 URL: https://issues.apache.org/jira/browse/AVRO-636 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-636.patch.v1, AVRO-636.patch.v2, AVRO-636.patch.v3, AVRO-636.patch.v4 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-636) Expose Singleton Method for TracePlugin
[ https://issues.apache.org/jira/browse/AVRO-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903173#action_12903173 ] Philip Zeyliger commented on AVRO-636: -- You should probably make the set and get methods synchronized here. Otherwise, looks good. Expose Singleton Method for TracePlugin --- Key: AVRO-636 URL: https://issues.apache.org/jira/browse/AVRO-636 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-636.patch.v1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-614) Improve Trace frontend UI
[ https://issues.apache.org/jira/browse/AVRO-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902491#action_12902491 ] Philip Zeyliger commented on AVRO-614: -- Patch looks good. Do you have commit access yet, or should I commit for you? Improve Trace frontend UI - Key: AVRO-614 URL: https://issues.apache.org/jira/browse/AVRO-614 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-614.patch.v1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-632) Responder should log stack traces for user exceptions
[ https://issues.apache.org/jira/browse/AVRO-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12902722#action_12902722 ] Philip Zeyliger commented on AVRO-632: -- +1. Responder should log stack traces for user exceptions - Key: AVRO-632 URL: https://issues.apache.org/jira/browse/AVRO-632 Project: Avro Issue Type: Bug Components: java Reporter: Doug Cutting Assignee: Doug Cutting Fix For: 1.4.0 Attachments: AVRO-632.patch Responder currently does not log the stack trace for exceptions in user code, making it hard to debug server-side problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (AVRO-613) Create basic frontend to view trace results.
[ https://issues.apache.org/jira/browse/AVRO-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-613. -- Resolution: Fixed I just committed this. Thanks, Patrick! Create basic frontend to view trace results. Key: AVRO-613 URL: https://issues.apache.org/jira/browse/AVRO-613 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-613.v1.txt, AVRO-613.v2.txt, AVRO-613.v3.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (AVRO-622) python avro.ipc doesn't work with python2.4
python avro.ipc doesn't work with python2.4 --- Key: AVRO-622 URL: https://issues.apache.org/jira/browse/AVRO-622 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Similar to AVRO-618, avro.ipc depends on the 'struct' python module, which is newish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (AVRO-623) python avro.ipc has unpythonic build-time dependency
python avro.ipc has unpythonic build-time dependency Key: AVRO-623 URL: https://issues.apache.org/jira/browse/AVRO-623 Project: Avro Issue Type: Bug Components: build, python Reporter: Philip Zeyliger The avro.ipc module requires a build-time replacement of @HANDSHAKE_REQUEST_SCHEMA@ (and the same thing for response). This prevents AVRO from being used unless it's been pre-processed with the ant file. I think it would be better to inline the schema in the code, and generate a test to make sure it stays compatible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly
[ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-620. -- Hadoop Flags: [Reviewed] Assignee: Philip Zeyliger Fix Version/s: 1.4.0 Resolution: Fixed I committed this. Jeff reviewed it: {noformat} Ship it! Okay. - Jeff {noformat} Python implementation doesn't stringify sub-schemas correctly - Key: AVRO-620 URL: https://issues.apache.org/jira/browse/AVRO-620 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Assignee: Philip Zeyliger Fix For: 1.4.0 Attachments: AVRO-620.patch.txt {noformat} In [9]: import avro.schema In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}') In [11]: str(s.fields[0].type) Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}' {noformat} str(schema) is used in avro data files to record the schema. In the case above, when we serialize the schema for Y, we should actually also serialize the schema for X, since Y needs the schema for X. I ran smack into this when using a schema from a protocol to write a data file, and finding that a lot of the types weren't defined when looking at the avro data file generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
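The behavior AVRO-620 asks for (when a sub-schema is stringified on its own, any named type it refers to must be defined in that output) can be sketched as follows. This is a toy model, not the avro.schema implementation or the committed patch: RecordSchema and to_json are hypothetical stand-ins, and only records and primitive type names are handled.

```python
import json


class RecordSchema(object):
    """Toy stand-in for a parsed record schema (not the avro.schema class)."""

    def __init__(self, name):
        self.name = name
        self.fields = []  # list of (field_name, field_type) pairs


def to_json(schema, known_names=None):
    """Serialize a schema, defining each named type the first time it appears."""
    if known_names is None:
        known_names = set()  # a fresh set per top-level call
    if isinstance(schema, str):
        return schema  # primitive type name
    if schema.name in known_names:
        return schema.name  # already defined in this output: refer by name
    known_names.add(schema.name)
    return {
        "type": "record",
        "name": schema.name,
        "fields": [{"name": field_name, "type": to_json(field_type, known_names)}
                   for field_name, field_type in schema.fields],
    }


# The mutually recursive X / Y pair from the issue.
x = RecordSchema("X")
y = RecordSchema("Y")
x.fields.append(("y", y))
y.fields.append(("Z", x))

print(json.dumps(to_json(y)))
```

Because the set of known names starts empty for each top-level call, serializing Y on its own emits the full definition of X where Z refers to it, while the reference from X back to Y is written by name only.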
[jira] Updated: (AVRO-622) python avro.ipc doesn't work with python2.4
[ https://issues.apache.org/jira/browse/AVRO-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated AVRO-622: - Attachment: AVRO-622.patch.txt This patch fixes the issue. I will commit it shortly; it imports the trick from AVRO-618 to overcome the lack of a struct class in older pythons. I've attached a test which does nothing, but it does import the avro.ipc module. I started writing an end-to-end test for python RPC, but it turns out that there weren't any server implementations. The test file will check compilation and can serve as a placeholder. python avro.ipc doesn't work with python2.4 --- Key: AVRO-622 URL: https://issues.apache.org/jira/browse/AVRO-622 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Attachments: AVRO-622.patch.txt Similar to AVRO-618, avro.ipc depends on the 'struct' python module, which is newish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
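For readers wondering what such a compatibility trick might look like, here is a minimal sketch. It assumes the missing piece is the struct.Struct class (added in Python 2.5) rather than the struct module itself; SimpleStruct, struct_class, and BIG_ENDIAN_INT are illustrative names, and this is not necessarily the code in the attached patch.

```python
import struct


class SimpleStruct(object):
    """Fallback offering the pack/unpack subset of struct.Struct."""

    def __init__(self, fmt):
        self.format = fmt

    def pack(self, *values):
        # Delegate to the module-level function, which older Pythons also have.
        return struct.pack(self.format, *values)

    def unpack(self, data):
        return struct.unpack(self.format, data)


# Use the real class when the interpreter provides it, else the shim.
struct_class = getattr(struct, "Struct", SimpleStruct)

# Example: a 4-byte big-endian unsigned int, as used for length prefixes.
BIG_ENDIAN_INT = struct_class("!I")
print(BIG_ENDIAN_INT.unpack(BIG_ENDIAN_INT.pack(1)))
```

Callers only ever touch struct_class, so the rest of the module does not need to know which implementation is in use.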
[jira] Resolved: (AVRO-622) python avro.ipc doesn't work with python2.4
[ https://issues.apache.org/jira/browse/AVRO-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-622. -- Assignee: Philip Zeyliger Fix Version/s: 1.4.0 Resolution: Fixed python avro.ipc doesn't work with python2.4 --- Key: AVRO-622 URL: https://issues.apache.org/jira/browse/AVRO-622 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Assignee: Philip Zeyliger Fix For: 1.4.0 Attachments: AVRO-622.patch.txt Similar to AVRO-618, avro.ipc depends on the 'struct' python module, which is newish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly
[ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901270#action_12901270 ] Philip Zeyliger commented on AVRO-620: -- On reviewboard at https://review.cloudera.org/r/706/ Python implementation doesn't stringify sub-schemas correctly - Key: AVRO-620 URL: https://issues.apache.org/jira/browse/AVRO-620 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Attachments: AVRO-620.patch.txt {noformat} In [9]: import avro.schema In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}') In [11]: str(s.fields[0].type) Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}' {noformat} str(schema) is used in avro data files to record the schema. In the case above, when we serialize the schema for Y, we should actually also serialize the schema for X, since Y needs the schema for X. I ran smack into this when using a schema from a protocol to write a data file, and finding that a lot of the types weren't defined when looking at the avro data file generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly
[ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated AVRO-620: - Attachment: AVRO-620.patch.txt I believe I've fixed this. I implemented a Schema.to_json(names) method, which recursively serializes schema objects to JSON-compatible structures, avoiding re-serializing schemas which we've already seen. (This also means avoiding serializing JSON just to deserialize it again.) I was able to get rid of some variables which tracked how the schema was originally defined, because this recursion is taking care of noticing that. As I needed to, I removed some verbosity from the tests and removed some exception handling. It's very unhelpful when python tests catch exceptions, because they make it that much harder to track down the exact point of the failure. (An exception that propagates through a test is a test failure, so there's no need to separately mark the test as failed.) Printing extra information about what tests are running distracts from where the failures are occurring. I recommend the nose test runner (with flags --pdb --pdb-failure) for running the tests. I've added a test that triggered this in the first place. Python implementation doesn't stringify sub-schemas correctly - Key: AVRO-620 URL: https://issues.apache.org/jira/browse/AVRO-620 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger Attachments: AVRO-620.patch.txt {noformat} In [9]: import avro.schema In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}') In [11]: str(s.fields[0].type) Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}' {noformat} str(schema) is used in avro data files to record the schema. In the case above, when we serialize the schema for Y, we should actually also serialize the schema for X, since Y needs the schema for X.
I ran smack into this when using a schema from a protocol to write a data file, and finding that a lot of the types weren't defined when looking at the avro data file generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
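The recursive serialization described in the AVRO-620 update can be sketched roughly as follows. This is only an illustration of the idea, not the committed patch: `RecordSchema` is a hypothetical, heavily simplified stand-in for the real avro.schema classes, and `names` is assumed here to be a plain set of record names already written out.

```python
import json


class RecordSchema(object):
    """Hypothetical, simplified record schema (not the avro.schema class)."""

    def __init__(self, name, fields=None):
        self.name = name
        # List of (field_name, field_type) pairs; field_type is either a
        # primitive type name (str) or another RecordSchema.
        self.fields = fields if fields is not None else []

    def to_json(self, names=None):
        """Return a JSON-compatible structure for this schema.

        `names` collects the record names already emitted during this
        serialization; a record seen before is written as its bare name.
        """
        if names is None:
            names = set()
        if self.name in names:
            return self.name
        names.add(self.name)
        fields = []
        for field_name, field_type in self.fields:
            if isinstance(field_type, RecordSchema):
                field_type = field_type.to_json(names)
            fields.append({"name": field_name, "type": field_type})
        return {"type": "record", "name": self.name, "fields": fields}

    def __str__(self):
        return json.dumps(self.to_json())


# Rebuild the mutually recursive X/Y pair from the issue description.
x = RecordSchema("X")
y = RecordSchema("Y", [("Z", x)])
x.fields.append(("y", y))
```

With this approach, `str(y)` embeds the full definition of X the first time it is encountered, and the back-reference from X to Y is written as the bare name, so the recursion terminates.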
[jira] Commented: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly
[ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900943#action_12900943 ] Philip Zeyliger commented on AVRO-620: -- I'm working on this. Python implementation doesn't stringify sub-schemas correctly - Key: AVRO-620 URL: https://issues.apache.org/jira/browse/AVRO-620 Project: Avro Issue Type: Bug Components: python Reporter: Philip Zeyliger {noformat} In [9]: import avro.schema In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}') In [11]: str(s.fields[0].type) Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}' {noformat} str(schema) is used in avro data files to record the schema. In the case above, when we serialize the schema for Y, we should actually also serialize the schema for X, since Y needs the schema for X. I ran smack into this when using a schema from a protocol to write a data file, and finding that a lot of the types weren't defined when looking at the avro data file generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (AVRO-618) Avro doesn't work with python 2.4
[ https://issues.apache.org/jira/browse/AVRO-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated AVRO-618: - Attachment: AVRO-618.patch.txt Patch which fixes python2.4 issues. I tested with python2.4 and python2.5 on OS X 10.5. Avro doesn't work with python 2.4 - Key: AVRO-618 URL: https://issues.apache.org/jira/browse/AVRO-618 Project: Avro Issue Type: Bug Affects Versions: 1.4.0 Reporter: Philip Zeyliger Attachments: AVRO-618.patch.txt The avro tests fail on a system with python2.4. The issues can be easily worked around by avoiding python2.5 constructs and libraries. The missing struct.Struct is what started me down this path, but I also ended up tackling a few other similar issues and re-jiggering slightly the build.xml to allow for ant test -Dpython=python2.4 to work. {noformat} File build/bdist.macosx-10.5-i386/egg/avro/io.py, line 52, in ? AttributeError: 'module' object has no attribute 'Struct' {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
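One way to work around a missing struct.Struct, as reported in AVRO-618, is a small fallback class built on the module-level struct functions. The sketch below is an assumption about how such a shim could look, not a copy of the attached patch; `FallbackStruct` and `STRUCT_INT` are names invented for illustration.

```python
import struct


class FallbackStruct(object):
    """Minimal stand-in for struct.Struct on interpreters that lack it.

    Hypothetical helper: it only covers pack, unpack and size.
    """

    def __init__(self, fmt):
        self.format = fmt
        self.size = struct.calcsize(fmt)

    def pack(self, *values):
        return struct.pack(self.format, *values)

    def unpack(self, data):
        return struct.unpack(self.format, data)


# Use the real class when the interpreter has it, the shim otherwise.
Struct = getattr(struct, "Struct", FallbackStruct)

# Big-endian unsigned 32-bit integer, chosen only as an example format.
STRUCT_INT = Struct("!I")
packed = STRUCT_INT.pack(1024)
```

Callers keep using `Struct(...)` either way, so the rest of the code does not need to know which implementation it got.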
[jira] Commented: (AVRO-458) add tools that read/write CSV records from/to avro data files
[ https://issues.apache.org/jira/browse/AVRO-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900433#action_12900433 ] Philip Zeyliger commented on AVRO-458: -- Harsh, I took a quick look at the github, and it looks like this is great stuff. Looking forward to more tools in the ecosystem. The one thing obviously missing from the patch is some tests. Cheers, -- Philip add tools that read/write CSV records from/to avro data files - Key: AVRO-458 URL: https://issues.apache.org/jira/browse/AVRO-458 Project: Avro Issue Type: New Feature Reporter: Doug Cutting It might be useful to have command-line tools that can read/write arbitrary CSV data from/to Avro data files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (AVRO-619) Prefer the JSON module of python's stdlib over simplejson.
[ https://issues.apache.org/jira/browse/AVRO-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-619. -- Hadoop Flags: [Reviewed] Assignee: Harsh J Chouraria Resolution: Fixed Looks great. This was minor, so I went ahead and committed this. Thanks for your first contribution, Harsh. Welcome! Prefer the JSON module of python's stdlib over simplejson. -- Key: AVRO-619 URL: https://issues.apache.org/jira/browse/AVRO-619 Project: Avro Issue Type: Improvement Components: python Affects Versions: 1.3.3 Environment: Any/All, Python 2.6.5 Reporter: Harsh J Chouraria Assignee: Harsh J Chouraria Priority: Trivial Fix For: 1.4.0 Attachments: avro.json.priority.r1.diff Original Estimate: 0.02h Remaining Estimate: 0.02h Give the stdlib's json a higher import priority over the simplejson module, which is only required if the python version is < 2.6. Currently even python 2.6 running avro code would begin utilizing simplejson over its own provided json library, which should not be the case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
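The import priority requested in AVRO-619 is commonly written as a try/except around the import. This is a minimal sketch of that pattern rather than the contents of the attached diff; the `encoded` value exists only to show that the chosen module is used the same way in both cases.

```python
try:
    # Standard library module, available from Python 2.6 onward.
    import json
except ImportError:
    # Third-party fallback, only reached on older interpreters.
    import simplejson as json

# Both modules expose the same dumps/loads interface.
encoded = json.dumps({"type": "string"})
```

Because the stdlib module is tried first, simplejson is never imported on an interpreter that already ships json, even if simplejson happens to be installed.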
[jira] Commented: (AVRO-618) Avro doesn't work with python 2.4
[ https://issues.apache.org/jira/browse/AVRO-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900531#action_12900531 ] Philip Zeyliger commented on AVRO-618: -- They're pretty close. I actually got rid of uuid completely, and just used the same mechanism that it uses to get a random 16 bytes, since there's no need to create the UUID object if all we're looking for is 16 random bytes. I think the code has moved a bit, so mine might apply more cleanly. Haven't tried applying 588. I've also updated the build.xml file to run arbitrary versions of python easily, so this is easier to reproduce. Avro doesn't work with python 2.4 - Key: AVRO-618 URL: https://issues.apache.org/jira/browse/AVRO-618 Project: Avro Issue Type: Bug Affects Versions: 1.4.0 Reporter: Philip Zeyliger Attachments: AVRO-618.patch.txt The avro tests fail on a system with python2.4. The issues can be easily worked around by avoiding python2.5 constructs and libraries. The missing struct.Struct is what started me down this path, but I also ended up tackling a few other similar issues and re-jiggering slightly the build.xml to allow for ant test -Dpython=python2.4 to work. {noformat} File build/bdist.macosx-10.5-i386/egg/avro/io.py, line 52, in ? AttributeError: 'module' object has no attribute 'Struct' {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
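Getting 16 random bytes without constructing a UUID object might look like the sketch below. It assumes os.urandom as the source of randomness, which may or may not be the exact mechanism the patch uses; `random_16_bytes` is a hypothetical helper name.

```python
import os


def random_16_bytes():
    """Return 16 random bytes straight from the operating system.

    Assumption: os.urandom stands in for whatever source the patch uses.
    """
    return os.urandom(16)


token = random_16_bytes()
```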
[jira] Commented: (AVRO-613) Create basic frontend to view trace results.
[ https://issues.apache.org/jira/browse/AVRO-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1285#action_1285 ] Philip Zeyliger commented on AVRO-613: -- Made some comments via reviewboard: --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/655/#review945 --- Cool stuff. Some comments below. lang/java/src/java/org/apache/avro/ipc/trace/StaticServlet.java http://review.cloudera.org/r/655/#comment3103 This will throw an exception on an empty string. You should check for that and handle it more gracefully. Might be useful to write a quick test as well. lang/java/src/java/org/apache/avro/ipc/trace/TraceClientServlet.java http://review.cloudera.org/r/655/#comment3104 This seems very ad hoc. What you're looking for is some sort of URL routing. You have a couple of options here: - Servlets have something called a RequestDispatcher which can be controlled by a web.xml file. - Or do it a bit more methodically, by extracting the path component out of the URL (instead of stripping http://, which is one-off, and may be broken because we might be using https), and then having some logic. It makes sense to put the logic for serving a specific view into its own function. If you had a more heavy-weight framework, it would certainly force you to do it. Either way, separating out the URL parsing logic from the collection logic seems like a nice thing to do. lang/java/src/java/org/apache/avro/ipc/trace/TraceCollection.java http://review.cloudera.org/r/655/#comment3106 Are you not using primitive types so you can store 'null'? I think it would be better to use -1, but, either way, you should document it. lang/java/src/java/org/apache/avro/ipc/trace/TraceCollection.java http://review.cloudera.org/r/655/#comment3105 If you have the getters, you may as well drop the 'public'. lang/java/src/java/org/apache/avro/ipc/trace/TraceCollection.java http://review.cloudera.org/r/655/#comment3107 Commented out code? 
lang/java/src/java/org/apache/avro/ipc/trace/TraceCollection.java http://review.cloudera.org/r/655/#comment3108 Indentation seems inconsistent here. lang/java/src/java/org/apache/avro/ipc/trace/TraceCollection.java http://review.cloudera.org/r/655/#comment3109 More commented out code - Philip Create basic frontend to view trace results. Key: AVRO-613 URL: https://issues.apache.org/jira/browse/AVRO-613 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-613.v1.txt, AVRO-613.v2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
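The second routing option from the review above, extracting the path component instead of stripping a hard-coded http:// prefix and then handing off to one function per view, can be sketched as follows. The reviewed code is a Java servlet; this Python version only illustrates the structure, and every name in it (`path_segments`, `dispatch`, `ROUTES`, the view functions) is hypothetical.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the non-empty path segments of a URL, ignoring scheme and host."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


# Hypothetical view functions; each one serves exactly one kind of page.
def overview_view(args):
    return "overview"


def collection_view(args):
    return "collection " + "/".join(args)


ROUTES = {"overview": overview_view, "collection": collection_view}


def dispatch(url):
    """Pick a view from the first path segment and pass it the rest."""
    segments = path_segments(url)
    if not segments:
        # An empty path (or an empty string) falls back to the overview.
        return overview_view([])
    view = ROUTES.get(segments[0])
    if view is None:
        return "not found"
    return view(segments[1:])
```

Because only the parsed path is inspected, the same routing works for http and https URLs, and URL parsing stays separate from the logic that builds each view.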
[jira] Resolved: (AVRO-606) Add File-Based Span Storage to TracePlugin
[ https://issues.apache.org/jira/browse/AVRO-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-606. -- Resolution: Fixed I just committed this as r985363. Thanks for your contribution, Patrick! Add File-Based Span Storage to TracePlugin -- Key: AVRO-606 URL: https://issues.apache.org/jira/browse/AVRO-606 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-606.v1.txt, AVRO-606.v2.txt, AVRO-606.v3.txt, AVRO-606.v4.txt, AVRO-606.v5.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-606) Add File-Based Span Storage to TracePlugin
[ https://issues.apache.org/jira/browse/AVRO-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895756#action_12895756 ] Philip Zeyliger commented on AVRO-606: -- Pressed add quite early. Looks reasonable, for the most part. It feels a bit odd that the reader and writer of these log directories are the same class, still. That might be worth splitting up at some point. Instead of keeping a list of all previous files, you could also simply scan the directory and look around. Note that it's possible to add some metadata to AVRO data files when you open them, so you might store the open time in the header. (The close time would require a seek...). That way, you could retrieve spans across restarts. Add File-Based Span Storage to TracePlugin -- Key: AVRO-606 URL: https://issues.apache.org/jira/browse/AVRO-606 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-606.v1.txt, AVRO-606.v2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-595) Add Dapper-Style Tracing Plugin to Avro
[ https://issues.apache.org/jira/browse/AVRO-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894832#action_12894832 ] Philip Zeyliger commented on AVRO-595: -- Via reviewboard again: --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/390/#review627 --- Almost there. I tried running the tests and ran into some issues: Testcase: testRecursingTrace took 0.006 sec Caused an ERROR Connection refused java.net.ConnectException: Connection refused Does ant test clean work for you? I think it's trying to re-use a port that it hasn't closed. lang/java/src/java/org/apache/avro/ipc/trace/Util.java http://review.cloudera.org/r/390/#comment2366 If you can avoid this, I would avoid making this public. lang/java/src/java/org/apache/avro/ipc/trace/Util.java http://review.cloudera.org/r/390/#comment2365 This doesn't feel public to me - Philip Add Dapper-Style Tracing Plugin to Avro --- Key: AVRO-595 URL: https://issues.apache.org/jira/browse/AVRO-595 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-595.patch.v1.txt, AVRO-595.patch.v2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (AVRO-587) Add Charts and Templating to Stats View
[ https://issues.apache.org/jira/browse/AVRO-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger resolved AVRO-587. -- Resolution: Fixed Add Charts and Templating to Stats View --- Key: AVRO-587 URL: https://issues.apache.org/jira/browse/AVRO-587 Project: Avro Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.4.0 Attachments: AVRO-587.patch.v1.txt, AVRO-587.patch.v2.txt, AVRO-587.patch.v3, AVRO-587.patch.v4.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-595) Add Dapper-Style Tracing Plugin to Avro
[ https://issues.apache.org/jira/browse/AVRO-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890911#action_12890911 ] Philip Zeyliger commented on AVRO-595: -- I put up a review on reviewboard: --- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/351/#review443 --- Great start. Comments below. lang/java/src/java/org/apache/avro/ipc/trace/InMemorySpanStorage.java http://review.hbase.org/r/351/#comment1866 queryHandle seems unused here. The query stuff still seems to be baking; I haven't seen if your tests depend on it, but it might make sense to stage it out of this commit and put it back in later when it's more strictly necessary. lang/java/src/java/org/apache/avro/ipc/trace/SpanStorage.java http://review.hbase.org/r/351/#comment1864 Drop @author's for Apache code lang/java/src/java/org/apache/avro/ipc/trace/SpanStorage.java http://review.hbase.org/r/351/#comment1865 It's weird that this returns bytes and not objects... lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1851 mention how this might be disabled? lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1852 Could you initialize these guys in-line and mark them final as well? lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1853 I'm not a big fan of using Configuration here. Even though AVRO has a dependency on Hadoop, it seems a bit abusive to use it where there's no Hadoop going on. I recommend a POJO representing TracePluginConfiguration which people can create and pass to this. If you were using Hadoop configuration, you'd want the keys to be better namespaced (i.e., avro.trace.foo instead of just foo), since people would want to use the same global configuration mechanism. 
But my opinion is that the layer that's doing the work should use reasonably typed configuration, and if you want to layer something above it, that can be done above. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1854 These are static, so you shouldn't need to instantiate them per instance. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1855 Why Math.abs()? Is spanID a fixed(4) or a long? It'll store more efficiently if it's a fixed(4), though it might be more of a pain to use. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1856 What's the 100 doing here? lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1857 Cache this? lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1858 Extract createNewEmptySpan() into a method; you've done the same thing twice. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1859 You should document (perhaps in package.html, or in javadoc here) how this system uses the RPC metadata. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1860 Since these aren't supposed to happen, I tend to escalate them into RuntimeException instead of dropping them on the floor. Just in case :) lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1861 I'm a little confused as to why you need both span and parent_span, but I might be missing something. lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1862 Log about this; you'd want to know that something is misbehaving. 
lang/java/src/java/org/apache/avro/ipc/trace/TracePlugin.java http://review.hbase.org/r/351/#comment1863 You're using childSpan and this.childSpan inconsistently in a few places; may as well make it consistent. lang/java/src/test/java/org/apache/avro/ipc/trace/TestBasicTracing.java http://review.hbase.org/r/351/#comment1868 Looks like x, y, w here. lang/java/src/test/java/org/apache/avro/ipc/trace/TestBasicTracing.java http://review.hbase.org/r/351/#comment1869 I wonder if this test would read easier if you used the specific API and generated some classes for it. Not a big concern. share/schemas/org/apache/avro/ipc/trace/avroTrace.avdl http://review.hbase.org/r/351/#comment1870 Perhaps add a microseconds component to this as well? share/schemas/org/apache/avro/ipc/trace/avroTrace.avdl http://review.hbase.org/r/351/#comment1847 It's a little weird that SpanEventType isn't really a type; it's the event
[jira] Commented: (AVRO-578) Add RPC Payload to RPCContext class
[ https://issues.apache.org/jira/browse/AVRO-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880694#action_12880694 ] Philip Zeyliger commented on AVRO-578: -- It's a question of whether we are considering the Plugin API evolving or stable, and I'm not sure. The safest thing to do is to add more methods to the RPCContext class, representing the clearer API you propose, and mark as deprecated the old ones. The code will continue calling both for a release, until we can delete the deprecated methods. If we're less scrupulous, we could delete the old methods altogether, but that might cause someone pain. Doug, do you have an opinion? Add RPC Payload to RPCContext class --- Key: AVRO-578 URL: https://issues.apache.org/jira/browse/AVRO-578 Project: Avro Issue Type: Sub-task Components: java Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: AVRO-578.patch, AVRO-578.patch.v2 For stats/monitoring it is helpful to see how many bytes are encoded for a given RPC call. Right now Encoders don't track how many payload bytes are actually written out when encoding is done. Ideally this bytesWritten() would be in the Encoder interface; however, I'm not sure the JSON plugin can track the number of characters actually written, so alternatively it could just be added to BinaryEncoder, and the stats plugin will only provide payload sizes when that encoder is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-566) test_tools.sh should do something intelligent if JAVA_HOME is not set
[ https://issues.apache.org/jira/browse/AVRO-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876328#action_12876328 ] Philip Zeyliger commented on AVRO-566: -- +1. test_tools.sh should do something intelligent if JAVA_HOME is not set - Key: AVRO-566 URL: https://issues.apache.org/jira/browse/AVRO-566 Project: Avro Issue Type: Wish Components: java Reporter: Jeff Hammerbacher Assignee: Doug Cutting Attachments: AVRO-566.patch Currently, test_tools.sh will fail if the JAVA_HOME environment variable is not set, even if there is a java on the PATH. I generally trust Philip on testing, but I've been bitten a few times by this requirement while trying to build a release. It's low priority, but if it doesn't violate anyone's test-related sensibilities, I'd love to have this test just run with the first java it finds on the path if JAVA_HOME is not set. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (AVRO-567) add tools for text file import and export
[ https://issues.apache.org/jira/browse/AVRO-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876450#action_12876450 ] Philip Zeyliger commented on AVRO-567: -- Would the HDFS integration introduce a dependency on Hadoop Common's FileSystem? add tools for text file import and export - Key: AVRO-567 URL: https://issues.apache.org/jira/browse/AVRO-567 Project: Avro Issue Type: New Feature Components: java Reporter: Doug Cutting Fix For: 1.4.0 It would be good to have command-line tools to convert between newline-delimited text and Avro data files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.