[jira] [Commented] (AVRO-806) add a column-major codec for data files

2012-04-18 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257077#comment-13257077
 ] 

Doug Cutting commented on AVRO-806:
---

I've implemented a new column-file format at:

  https://github.com/cutting/trevni

This supports writing Avro data.

If folks find this useful then I intend to contribute it to Apache.


 add a column-major codec for data files
 ---

 Key: AVRO-806
 URL: https://issues.apache.org/jira/browse/AVRO-806
 Project: Avro
  Issue Type: New Feature
  Components: java, spec
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 1.7.0

 Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf


 Define a codec that, when a data file's schema is a record schema, writes 
 blocks within the file in column-major order.  This would permit better 
 compression and also permit efficient skipping of fields that are not of 
 interest.
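To make the row-to-column idea concrete, here is a minimal Java sketch that transposes one block of records into per-field columns and back. It is not the AVRO-806 codec or Trevni's on-disk format. It assumes records are plain `Object[]` rows with a fixed field count, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnMajorBlock {
  // Transposes a block of records (row-major) into one list per field (column-major).
  // Values of the same field end up adjacent, which tends to compress better,
  // and a reader interested in one field only needs to touch that column.
  static List<List<Object>> toColumns(List<Object[]> rows, int fieldCount) {
    List<List<Object>> columns = new ArrayList<>();
    for (int f = 0; f < fieldCount; f++) {
      columns.add(new ArrayList<>(rows.size()));
    }
    for (Object[] row : rows) {
      for (int f = 0; f < fieldCount; f++) {
        columns.get(f).add(row[f]);
      }
    }
    return columns;
  }

  // Rebuilds the original records from their columns.
  static List<Object[]> toRows(List<List<Object>> columns) {
    int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
    List<Object[]> rows = new ArrayList<>(rowCount);
    for (int r = 0; r < rowCount; r++) {
      Object[] row = new Object[columns.size()];
      for (int f = 0; f < columns.size(); f++) {
        row[f] = columns.get(f).get(r);
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Object[]> rows = Arrays.asList(
        new Object[] {"alice", 30, "US"},
        new Object[] {"bob", 25, "US"},
        new Object[] {"carol", 41, "DE"});
    List<List<Object>> columns = toColumns(rows, 3);
    // Reading only the third field touches a single column.
    System.out.println(columns.get(2)); // [US, US, DE]
  }
}
```

A real columnar codec would also encode and compress each column separately and record where each column starts, so that readers can seek past fields they do not need; that bookkeeping is left out here.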

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-1062) DataFileWriter uses java.rmi.server.UID to generate unique id, which causes avro compilation problem on Android Dalvik

2012-04-17 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255802#comment-13255802
 ] 

Doug Cutting commented on AVRO-1062:


Looks good to me.  We could perhaps improve sync marker generation, but this 
issue should make it no worse and permits Avro to run on Android.  I'll commit 
this soon unless someone objects.

 DataFileWriter uses java.rmi.server.UID to generate unique id, which causes 
 avro compilation problem on Android Dalvik
 -

 Key: AVRO-1062
 URL: https://issues.apache.org/jira/browse/AVRO-1062
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.3
 Environment: Android 2.3.3-API level 10
Reporter: Kevin Zhao
  Labels: patch
 Fix For: 1.7.0

 Attachments: AVRO-1062.patch


 Because Android's Dalvik VM does not have the java.rmi.* packages and 
 org.apache.avro.file.DataFileWriter references java.rmi.server.UID, Avro 
 fails to compile on Android.
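One way to produce a data file's 16-byte sync marker without `java.rmi.server.UID` is to draw it from `java.security.SecureRandom`, which Android provides. This is a hedged sketch, not necessarily what AVRO-1062.patch does; the class and method names are hypothetical.

```java
import java.security.SecureRandom;

public class SyncMarkers {
  // Sync markers in Avro object container files are 16 bytes long.
  static final int SYNC_SIZE = 16;
  private static final SecureRandom RANDOM = new SecureRandom();

  // Returns a fresh random 16-byte marker. Only java.security is used,
  // so nothing here depends on the java.rmi.* packages.
  static byte[] generateSync() {
    byte[] sync = new byte[SYNC_SIZE];
    RANDOM.nextBytes(sync);
    return sync;
  }

  public static void main(String[] args) {
    StringBuilder hex = new StringBuilder();
    for (byte b : generateSync()) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex); // 32 hex digits, different on every run
  }
}
```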

[jira] [Commented] (AVRO-1061) Add sync interval option to Avro commandline tools

2012-04-16 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255174#comment-13255174
 ] 

Doug Cutting commented on AVRO-1061:


This looks good to me.

We should probably add a test to TestDataFileTools that checks this.

Should we also increase the default sync interval to 64k?




 Add sync interval option to Avro commandline tools
 --

 Key: AVRO-1061
 URL: https://issues.apache.org/jira/browse/AVRO-1061
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.0
Reporter: Ari Pollak
Priority: Trivial
 Attachments: AVRO-1061.patch


 It would be nice to expose the sync interval to the avro commandline writer 
 tools, since I've seen a 20%+ decrease in file size using deflate compression 
 and a 64K+ sync interval instead of the default of 16K.
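The saving is plausible because a container file compresses each block independently, so every sync point restarts the deflate dictionary. The self-contained Java sketch below is not Avro code: it uses synthetic data and a hypothetical class name, and simply compresses the same bytes in independent 16 KiB and 64 KiB slices with `java.util.zip.Deflater` in raw (RFC 1951) mode, then compares the totals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class SyncIntervalDemo {
  // Synthetic log-like records; a pool of session ids recurs throughout the data.
  static byte[] sampleData(int records) {
    Random random = new Random(42);
    String[] sessions = new String[300];
    for (int i = 0; i < sessions.length; i++) {
      StringBuilder id = new StringBuilder();
      for (int j = 0; j < 24; j++) {
        id.append(Character.forDigit(random.nextInt(16), 16));
      }
      sessions[i] = id.toString();
    }
    String[] actions = {"click", "view", "purchase", "logout"};
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < records; i++) {
      sb.append("session=").append(sessions[random.nextInt(sessions.length)])
          .append(",action=").append(actions[random.nextInt(actions.length)])
          .append('\n');
    }
    return sb.toString().getBytes(StandardCharsets.US_ASCII);
  }

  // Deflates each blockSize-byte slice independently (a fresh dictionary per
  // block, as at each sync point) and returns the total compressed size.
  static long compressedSize(byte[] data, int blockSize) {
    long total = 0;
    byte[] out = new byte[8192];
    for (int off = 0; off < data.length; off += blockSize) {
      Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
      deflater.setInput(data, off, Math.min(blockSize, data.length - off));
      deflater.finish();
      while (!deflater.finished()) {
        total += deflater.deflate(out);
      }
      deflater.end();
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] data = sampleData(20_000);
    System.out.println("raw:        " + data.length);
    System.out.println("16K blocks: " + compressedSize(data, 16 * 1024));
    System.out.println("64K blocks: " + compressedSize(data, 64 * 1024));
  }
}
```

How much larger blocks help depends heavily on the data; this sketch only shows the direction of the effect and does not reproduce the 20% figure reported above.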

[jira] [Commented] (AVRO-1057) Java builder API fails when default value does not match the first type in a union

2012-04-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253666#comment-13253666
 ] 

Doug Cutting commented on AVRO-1057:


I think this is correct.  The specification says, "Default values for union 
fields correspond to the first schema in the union" 
(http://avro.apache.org/docs/current/spec.html).  So the default value for 
union { boolean, null } must be a boolean.
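The rule can be illustrated with a small, self-contained Java check. This is a hypothetical helper over a simplified model (union branches as primitive type names, the default as an already-parsed JSON value); it is not Avro's `ResolvingGrammarGenerator` logic.

```java
import java.util.List;

public class UnionDefaults {
  // Simplified check of a parsed JSON default against one primitive type.
  static boolean matches(String type, Object defaultValue) {
    switch (type) {
      case "null":
        return defaultValue == null;
      case "boolean":
        return defaultValue instanceof Boolean;
      case "int":
        return defaultValue instanceof Integer;
      case "string":
        return defaultValue instanceof String;
      default:
        throw new IllegalArgumentException("type not handled in this sketch: " + type);
    }
  }

  // A union field's default must match the union's FIRST branch.
  static boolean isValidUnionDefault(List<String> branches, Object defaultValue) {
    return !branches.isEmpty() && matches(branches.get(0), defaultValue);
  }

  public static void main(String[] args) {
    System.out.println(isValidUnionDefault(List.of("boolean", "null"), false)); // true
    System.out.println(isValidUnionDefault(List.of("boolean", "null"), null));  // false
    System.out.println(isValidUnionDefault(List.of("null", "boolean"), null));  // true
  }
}
```

So a field meant to default to null should list null first, e.g. `union { null, boolean } field = null;`.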

 Java builder API fails when default value does not match the first type in a 
 union
 --

 Key: AVRO-1057
 URL: https://issues.apache.org/jira/browse/AVRO-1057
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.3
Reporter: Christophe Taton
Priority: Minor

 The following definition works fine with the builder:
 record Rec {
   union { boolean, null } field = false;
 }
 but this one fails:
 record Rec {
   union { boolean, null } field = null;
 }
 Rec.newBuilder().build() fails with this error:
 org.apache.avro.AvroRuntimeException: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
   at Rec$Builder.build
 Caused by: org.apache.avro.AvroTypeException: Non-boolean default for boolean: null
   at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:393)
   at org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:350)
   at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:178)
   at Rec$Builder.build

[jira] [Commented] (AVRO-1057) Java builder API fails when default value does not match the first type in a union

2012-04-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253692#comment-13253692
 ] 

Doug Cutting commented on AVRO-1057:


Yes, this could be checked more aggressively.  The existing logic for this is 
in RecordBuilderBase#defaultValue().  It writes the default JSON parse tree, 
then reads it into the appropriate Avro data structure (generic, specific, 
reflect).  The writing does the error checking, and that's in 
ResolvingGrammarGenerator#encode.  So the compiler and/or parser could call 
that.  We probably don't want to check it unconditionally in the schema parser, 
as schema parsing is performance sensitive.

 Java builder API fails when default value does not match the first type in a 
 union
 --

 Key: AVRO-1057
 URL: https://issues.apache.org/jira/browse/AVRO-1057
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.3
Reporter: Christophe Taton
Priority: Minor


[jira] [Commented] (AVRO-1055) Race condition in Java fingerprinting code

2012-04-09 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250094#comment-13250094
 ] 

Doug Cutting commented on AVRO-1055:


+1 Looks good to me.  You might add a comment saying something like, "Nested 
class used so that the table is not built unless it's used."

 Race condition in Java fingerprinting code
 --

 Key: AVRO-1055
 URL: https://issues.apache.org/jira/browse/AVRO-1055
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.0
Reporter: Thiruvalluvan M. G.
Assignee: Thiruvalluvan M. G.
Priority: Minor
 Attachments: AVRO-1055.patch


 There is a subtle race condition. If fpTable64 is not yet initialized and 
 two threads try to compute fingerprints for two schemas (or the same schema) 
 at the same time, one thread will start initializing the table while the 
 other may start using the partially initialized table, giving a wrong result.
 The forthcoming patch fixes that issue.
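The nested-class suggestion above is the initialization-on-demand holder idiom: the JVM initializes a nested class exactly once, under its class-initialization lock, the first time it is used, so no thread can observe a half-built table. A minimal sketch, assuming the 64-bit Rabin fingerprint table described in the Avro specification; the class names are hypothetical and this is not the AVRO-1055 patch itself.

```java
import java.nio.charset.StandardCharsets;

public class Fingerprint64 {
  static final long EMPTY = 0xc15d213aa4d7a795L;

  // Nested class used so that the table is not built unless it's used.
  // Class initialization is thread-safe, so every thread sees the full table.
  private static final class TableHolder {
    static final long[] FP_TABLE = new long[256];

    static {
      for (int i = 0; i < 256; i++) {
        long fp = i;
        for (int j = 0; j < 8; j++) {
          fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
        }
        FP_TABLE[i] = fp;
      }
    }
  }

  static long fingerprint64(byte[] data) {
    long[] table = TableHolder.FP_TABLE; // first access triggers one-time initialization
    long fp = EMPTY;
    for (byte b : data) {
      fp = (fp >>> 8) ^ table[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  public static void main(String[] args) {
    byte[] schema = "\"int\"".getBytes(StandardCharsets.UTF_8);
    System.out.printf("%016x%n", fingerprint64(schema));
  }
}
```

Compared with checking a "table initialized" flag by hand, the holder class needs no explicit locking or volatile fields, because the Java Language Specification already makes class initialization safe across threads.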

[jira] [Commented] (AVRO-551) C: Build and pass tests on Win32

2012-04-03 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245856#comment-13245856
 ] 

Doug Cutting commented on AVRO-551:
---

It's fine to include MIT and BSD licensed code, but the licenses for these 
files should be appended to the end of Avro's top-level LICENSE.txt file.  For 
more information see:

http://apache.org/legal/resolved.html

 C: Build and pass tests on Win32
 

 Key: AVRO-551
 URL: https://issues.apache.org/jira/browse/AVRO-551
 Project: Avro
  Issue Type: Improvement
  Components: c
Reporter: Bruce Mitchener
 Attachments: AVRO-551.patch


 Avro C does not currently build on Win32. We need to address that.

[jira] [Commented] (AVRO-1045) deepCopy of BYTES underflow exception

2012-03-14 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229630#comment-13229630
 ] 

Doug Cutting commented on AVRO-1045:


It's a little odd to require that deepCopy() preserve more than is checked by 
equals().  Some folks might reasonably expect deepCopy() to compact large 
ByteBuffers.  We could perhaps add a 'protected ByteBuffer 
GenericData#copyBytes(ByteBuffer)' method that could be overridden in a 
subclass?  Would that work in your case?  Am I being overly cautious?

 deepCopy of BYTES underflow exception
 -

 Key: AVRO-1045
 URL: https://issues.apache.org/jira/browse/AVRO-1045
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.2
Reporter: Jeremy Lewi
Priority: Minor
 Fix For: 1.6.3

 Attachments: AVRO-1045.patch


 In org.apache.avro.generic.GenericData.deepCopy - the code for copying a 
 ByteBuffer is
 ByteBuffer byteBufferValue = (ByteBuffer) value;
 byte[] bytesCopy = new byte[byteBufferValue.capacity()];
 byteBufferValue.rewind();
 byteBufferValue.get(bytesCopy);
 byteBufferValue.rewind();
 return ByteBuffer.wrap(bytesCopy);
 I think this is problematic because it will cause a BufferUnderflowException 
 to be thrown if the ByteBuffer's limit is less than its capacity.
 My use case is as follows. I have ByteBuffers backed by large arrays so I 
 can avoid resizing the array every time I write data. So limit < capacity. 
 When the data is written or copied, I think Avro should respect this. When 
 data is serialized, Avro should automatically use the minimum number of bytes.
 When an object is copied, I think it makes sense to preserve the capacity of 
 the underlying buffer as opposed to compacting it.
 So I think the code could be fixed by replacing get with 
 byteBufferValue.get(bytesCopy, 0, byteBufferValue.limit());
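A minimal sketch of both copying policies discussed here, using only `java.nio.ByteBuffer`. The `copyBytes` name follows the suggestion in the comment, but the classes are hypothetical and not Avro's `GenericData`. Two details matter: reading through `duplicate()` leaves the caller's buffer position untouched, and a capacity-preserving copy must set the new buffer's limit explicitly, because `ByteBuffer.wrap(array)` alone would set the limit to the full capacity and expose the unused tail.

```java
import java.nio.ByteBuffer;

public class ByteBufferCopies {
  // Default policy: copy only the bytes up to the limit (compacting).
  public ByteBuffer copyBytes(ByteBuffer value) {
    ByteBuffer src = value.duplicate(); // independent position/limit; source untouched
    src.rewind();
    byte[] copy = new byte[src.limit()];
    src.get(copy);
    return ByteBuffer.wrap(copy);
  }

  // Alternative policy a subclass might choose: keep the original capacity.
  public static class CapacityPreserving extends ByteBufferCopies {
    @Override
    public ByteBuffer copyBytes(ByteBuffer value) {
      ByteBuffer src = value.duplicate();
      src.rewind();
      byte[] copy = new byte[src.capacity()];
      src.get(copy, 0, src.limit()); // reads only the bytes that exist
      // wrap(array, offset, length) sets limit = length but capacity = array.length.
      return ByteBuffer.wrap(copy, 0, src.limit());
    }
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(64);
    buf.put(new byte[] {1, 2, 3, 4, 5}).flip(); // limit 5 < capacity 64
    ByteBuffer compact = new ByteBufferCopies().copyBytes(buf);
    ByteBuffer roomy = new CapacityPreserving().copyBytes(buf);
    System.out.println(compact.limit() + "/" + compact.capacity()); // 5/5
    System.out.println(roomy.limit() + "/" + roomy.capacity());     // 5/64
  }
}
```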

[jira] [Commented] (AVRO-987) Make Avro OSGi ready

2012-03-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223408#comment-13223408
 ] 

Doug Cutting commented on AVRO-987:
---

The patch looks reasonable to me, but fails to apply cleanly.  Can someone 
please provide a version that applies to the current trunk?  Thanks!

 Make Avro OSGi ready
 

 Key: AVRO-987
 URL: https://issues.apache.org/jira/browse/AVRO-987
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Ioannis Canellos
 Attachments: AVRO-987-patch-updated.txt, AVRO-987-patch.txt


 It would be really nice to be able to use Avro inside OSGi. To achieve this 
 two things are required:
 i) Provide proper MANIFEST.MF.
 ii) Deal with potential class loading issues. Avro uses Class.forName a lot 
 and that is not very OSGi friendly.
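A common way to make `Class.forName` lookups friendlier to OSGi and other multi-classloader environments is to try the thread context class loader first and fall back to the library's own loader. The sketch below is a hypothetical helper, not part of Avro or the attached patches; whether the context loader is the right one to consult depends on how the bundle is deployed.

```java
public class ClassLoading {
  // Tries the thread context class loader (often set by containers such as
  // OSGi frameworks or application servers), then this class's own loader.
  static Class<?> forName(String className) throws ClassNotFoundException {
    ClassLoader context = Thread.currentThread().getContextClassLoader();
    if (context != null) {
      try {
        return Class.forName(className, false, context);
      } catch (ClassNotFoundException ignored) {
        // fall through to the defining class loader
      }
    }
    return Class.forName(className, false, ClassLoading.class.getClassLoader());
  }

  public static void main(String[] args) throws ClassNotFoundException {
    System.out.println(forName("java.util.ArrayList").getName()); // java.util.ArrayList
  }
}
```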

[jira] [Commented] (AVRO-1022) Error in validate name

2012-03-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223419#comment-13223419
 ] 

Doug Cutting commented on AVRO-1022:


Raymie, this is a good approach.  The spec language that requires ASCII should 
be changed from MUST to SHOULD.

One use case that Scott mentioned and your prose does not is transmitting 
schemas from other systems; e.g., Avro schemas might often be generated 
automatically from Pig or SQL schemas.  In these cases, accepting names 
liberally permits schemas to pass through Avro losslessly.  Strict validation 
is really only useful when a developer is the schema author.  In many (most?) 
cases Avro might be an underlying tool, used indirectly through an 
application, and in these cases strict validation is probably not useful.

 Error in validate name
 --

 Key: AVRO-1022
 URL: https://issues.apache.org/jira/browse/AVRO-1022
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Raymie Stata
Priority: Minor
 Attachments: AVRO-1022.patch, AVRO-1022.patch, 
 unicode-recommendation.html


 Fix schema.validateName to allow only ASCII letters, not Unicode letters.
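The difference between the strict and lenient readings can be sketched in Java. The strict check follows the spec's name rule (start with `[A-Za-z_]`, then only `[A-Za-z0-9_]`); the lenient check accepts any Unicode letter or digit, as `Character.isLetter` and `Character.isDigit` do. The `validateName(String, boolean)` signature is hypothetical, not Avro's, and surrogate pairs are ignored for brevity.

```java
public class NameRules {
  static boolean isAsciiLetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
  }

  // strict:  [A-Za-z_][A-Za-z0-9_]*
  // lenient: a Unicode letter or '_' first, then Unicode letters, digits or '_'
  static boolean validateName(String name, boolean strict) {
    if (name == null || name.isEmpty()) {
      return false;
    }
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean letter = strict ? isAsciiLetter(c) : Character.isLetter(c);
      boolean digit = strict ? (c >= '0' && c <= '9') : Character.isDigit(c);
      if (!(letter || c == '_' || (i > 0 && digit))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(validateName("caf\u00e9", true));  // false
    System.out.println(validateName("caf\u00e9", false)); // true
  }
}
```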

[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

2012-03-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223428#comment-13223428
 ] 

Doug Cutting commented on AVRO-1006:


Patch looks good. +1 I'll commit this unless there are objections.

 Fingerprints for Avro Schemas
 -

 Key: AVRO-1006
 URL: https://issues.apache.org/jira/browse/AVRO-1006
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
  Labels: features
 Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, 
 AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, 
 schema-fingerprinting.html, schema-fingerprinting.html, 
 schema-fingerprinting.html


 Add a function that returns a standardized, 64-bit fingerprint for schemas.  
 Fingerprints are designed such that the chance of collisions is very, very 
 low.

[jira] [Commented] (AVRO-784) SpecificCompiler should generate accessors

2012-03-05 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222847#comment-13222847
 ] 

Doug Cutting commented on AVRO-784:
---

Scott, good point.  Let's open a new issue to add unboxed accessors.

 SpecificCompiler should generate accessors
 --

 Key: AVRO-784
 URL: https://issues.apache.org/jira/browse/AVRO-784
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.5.0
Reporter: E. Sammer
  Labels: features
 Attachments: avro-784.diff, avro-784.diff


 Avro's Java SpecificCompiler should generate JavaBeans-style accessors.
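To illustrate what JavaBeans-style accessors on a generated class might look like, here is a hand-written sketch for a hypothetical record with a required `int` field and an optional `union { null, string }` field. It is not SpecificCompiler output. The primitive `getAge()` is an example of an unboxed accessor; the optional field has to use a reference type so that it can hold null.

```java
public class User {
  // Field values as a generated class might hold them (hypothetical record).
  private int age;               // required int field
  private CharSequence nickname; // optional: union { null, string }

  public int getAge() { return age; } // unboxed accessor
  public void setAge(int age) { this.age = age; }

  public CharSequence getNickname() { return nickname; } // may return null
  public void setNickname(CharSequence nickname) { this.nickname = nickname; }

  public static void main(String[] args) {
    User u = new User();
    u.setAge(42);
    u.setNickname("doug");
    System.out.println(u.getAge() + " " + u.getNickname()); // 42 doug
  }
}
```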

[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread

2012-03-02 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221291#comment-13221291
 ] 

Doug Cutting commented on AVRO-1027:


James, thanks for running your tests.

I guess I'll go ahead and commit this now and roll a release candidate today.

 NettyTransceiver will deadlock when attempting transceive/disconnect on the 
 same thread
 ---

 Key: AVRO-1027
 URL: https://issues.apache.org/jira/browse/AVRO-1027
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
 Fix For: 1.6.3

 Attachments: AVRO-1027-v2.patch, AVRO-1027.patch


 If an Exception is caught while trying to write to a Channel, Netty can 
 deliver the Exception to a ChannelUpstreamHandler on the same thread that 
 attempted to write to the Channel. If this occurs with the 
 NettyClientAvroHandler implementation of ChannelUpstreamHandler then the 
 thread will deadlock.
 Specifically, NettyClientAvroHandler overrides the 
 ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, 
 which requires the NettyTransceiver's write lock. However, in the above 
 situation, the thread will already have locked the NettyTransceiver's read 
 lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading 
 from a read to a write lock, hence the thread deadlocks.
 Example stack trace (simplified):
 "SessionManager-TimeoutPoller" prio=10 tid=0x7b689c00 nid=0x375d waiting on condition [0x7b0ad000..0x7b0ade70]
    java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  <0xf2a944d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
 [Acquire write lock] at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:285)
   at org.apache.avro.ipc.NettyTransceiver.access$2(NettyTransceiver.java:281)
   at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.exceptionCaught(NettyTransceiver.java:499)
   at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
   at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:473)
   at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
   at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
   at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
   at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
   at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
   at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
   at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
   at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:661)
   at org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:372)
   at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:117)
   at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
   at org.jboss.netty.channel.Channels.write(Channels.java:632)
   at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
   at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
   at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
   at org.jboss.netty.channel.Channels.write(Channels.java:611)
   at org.jboss.netty.channel.Channels.write(Channels.java:578)
   at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)
 [Acquire read lock] at org.apache.avro.ipc.NettyTransceiver.writeDataPack(NettyTransceiver.java:413)
 [Acquire read lock] at
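The core problem reproduces with `java.util.concurrent.locks.ReentrantReadWriteLock` alone: a thread holding the read lock cannot also acquire the write lock. The sketch below shows this with `tryLock()`, which fails instead of blocking, and one possible workaround: hand the disconnect to another thread so the writer can release its read lock first. The class and method names are hypothetical, and this is not necessarily how AVRO-1027 was fixed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ExecutorService disconnector = Executors.newSingleThreadExecutor();
  private volatile boolean connected = true;

  // Read -> write upgrades are not supported: tryLock() returns false here,
  // whereas writeLock().lock() would block forever (the reported deadlock).
  boolean canUpgrade() {
    lock.readLock().lock();
    try {
      boolean acquired = lock.writeLock().tryLock();
      if (acquired) {
        lock.writeLock().unlock();
      }
      return acquired;
    } finally {
      lock.readLock().unlock();
    }
  }

  void disconnect() {
    lock.writeLock().lock();
    try {
      connected = false;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Error callback that may run on a thread already holding the read lock.
  Future<?> onError() {
    if (lock.getReadHoldCount() > 0) {
      // Defer: the disconnect proceeds once this thread releases its read lock.
      return disconnector.submit(this::disconnect);
    }
    disconnect();
    return CompletableFuture.completedFuture(null);
  }

  // Simulates a write that fails and reports the error on the same thread.
  Future<?> writeThatFails() {
    lock.readLock().lock();
    try {
      return onError();
    } finally {
      lock.readLock().unlock();
    }
  }

  boolean isConnected() { return connected; }

  void shutdown() { disconnector.shutdown(); }

  public static void main(String[] args) throws Exception {
    LockUpgradeDemo demo = new LockUpgradeDemo();
    System.out.println("upgrade possible: " + demo.canUpgrade()); // false
    demo.writeThatFails().get(5, TimeUnit.SECONDS);
    System.out.println("connected: " + demo.isConnected());       // false
    demo.shutdown();
  }
}
```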

[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

2012-03-02 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221342#comment-13221342
]

Doug Cutting commented on AVRO-1006:

Raymie: "SchemaFingerprint.fingerprint seems unnecessarily long..."

Now this becomes
SchemaNormalization.fp(SchemaNormalization.toParsingForm(schema)). The 'fp'
might better be spelled out as 'fingerprint'. Also, a utility method like
SchemaNormalization.parsingFingerprint(schema) might be useful.

Graham: "pass a Normalizer instance..."

With the latest API, someone can already call SchemaNormalization.fingerprint()
with a differently normalized schema, so I don't see the need for this. As we
add more normalizers to Avro, we can add new methods, so I'm not (yet) seeing
the advantage of adding a Normalization interface.

Fingerprints for Avro Schemas
-

Key: AVRO-1006
URL: https://issues.apache.org/jira/browse/AVRO-1006
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Labels: features
Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch,
AVRO-1006.patch, AVRO-1006.patch, schema-fingerprinting.html,
schema-fingerprinting.html, schema-fingerprinting.html

Add a function that returns a standardized, 64-bit fingerprint for schemas.
Fingerprints are designed such that the chance of collisions is very, very
low.

[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

2012-03-02 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221386#comment-13221386
 ] 

Doug Cutting commented on AVRO-1006:


A few minor nits:
 - the 'href' in one of the documentation links is missing its 'h'
 - the WHITESPACE comment should perhaps read, "Eliminate all whitespace in 
JSON outside of string literals."
 - we might define a nested FingerprintAlgorithm enum for the implemented 
fingerprint algorithm names.
 - SchemaNormalization should probably have a private constructor, e.g., 
'private SchemaNormalization() {}'
 - the #fingerprint link in the class documentation is broken.

Otherwise I'm +1 and look forward to committing this early next week unless 
there are objections.
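The nested-enum suggestion might look something like the following sketch. `FingerprintAlgorithm` is the hypothetical name from the comment, and the constant names are assumptions. The MD5 and SHA-256 entries delegate to `java.security.MessageDigest`; the CRC-64-AVRO entry uses the 64-bit Rabin fingerprint from the Avro specification, emitted little-endian (also an assumption here). None of this is the committed `SchemaNormalization` API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Fingerprints {
  // Hypothetical nested enum naming the implemented fingerprint algorithms.
  enum FingerprintAlgorithm {
    CRC_64_AVRO, MD5, SHA_256;

    byte[] fingerprint(byte[] data) {
      switch (this) {
        case CRC_64_AVRO: {
          long fp = rabin64(data);
          byte[] out = new byte[8];
          for (int i = 0; i < 8; i++) {
            out[i] = (byte) (fp >>> (8 * i)); // little-endian byte order
          }
          return out;
        }
        case MD5:
          return digest("MD5", data);
        case SHA_256:
          return digest("SHA-256", data);
        default:
          throw new AssertionError(this);
      }
    }
  }

  private static byte[] digest(String name, byte[] data) {
    try {
      return MessageDigest.getInstance(name).digest(data);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support MD5 and SHA-256.
      throw new IllegalStateException(e);
    }
  }

  static final long EMPTY = 0xc15d213aa4d7a795L;
  private static final long[] TABLE = new long[256];

  static {
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      TABLE[i] = fp;
    }
  }

  // 64-bit Rabin fingerprint as described in the Avro specification.
  static long rabin64(byte[] data) {
    long fp = EMPTY;
    for (byte b : data) {
      fp = (fp >>> 8) ^ TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  public static void main(String[] args) {
    byte[] schema = "\"int\"".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    for (FingerprintAlgorithm alg : FingerprintAlgorithm.values()) {
      System.out.println(alg + ": " + alg.fingerprint(schema).length + " bytes");
    }
  }
}
```

An enum keeps the set of supported names closed and discoverable, whereas a plain `String` parameter would defer typos to runtime.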

 Fingerprints for Avro Schemas
 -

 Key: AVRO-1006
 URL: https://issues.apache.org/jira/browse/AVRO-1006
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
  Labels: features
 Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, 
 AVRO-1006.patch, AVRO-1006.patch, AVRO-1006.patch, 
 schema-fingerprinting.html, schema-fingerprinting.html, 
 schema-fingerprinting.html


 Add a function that returns a standardized, 64-bit fingerprint for schemas.  
 Fingerprints are designed such that the chance of collisions is very, very 
 low.

[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread

2012-03-01 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220296#comment-13220296
 ] 

Doug Cutting commented on AVRO-1027:


Simon, do you expect to supply a test today?

It would be good to include this in 1.6.3, but I don't think it's a 
showstopper, since it's not a regression, is it?

 NettyTransceiver will deadlock when attempting transceive/disconnect on the 
 same thread
 ---

 Key: AVRO-1027
 URL: https://issues.apache.org/jira/browse/AVRO-1027
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
 Fix For: 1.6.3

 Attachments: AVRO-1027-v2.patch, AVRO-1027.patch



[jira] [Commented] (AVRO-1027) NettyTransceiver will deadlock when attempting transceive/disconnect on the same thread

2012-03-01 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220363#comment-13220363
 ] 

Doug Cutting commented on AVRO-1027:


Should we:
- a. commit this without tests and roll a 1.6.3 RC today
- b. hold off on 1.6.3 until this has tests next week
- c. roll a 1.6.3 RC today without this

I don't have a strong opinion.

 NettyTransceiver will deadlock when attempting transceive/disconnect on the 
 same thread
 ---

 Key: AVRO-1027
 URL: https://issues.apache.org/jira/browse/AVRO-1027
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.1
Reporter: Simon Wilkinson
Assignee: James Baldassari
 Fix For: 1.6.3

 Attachments: AVRO-1027-v2.patch, AVRO-1027.patch


 If an Exception is caught while trying to write to a Channel, Netty can 
 deliver the Exception to a ChannelUpstreamHandler on the same thread that 
 attempted to write to the Channel. If this occurs with the 
 NettyClientAvroHandler implementation of ChannelUpstreamHandler then the 
 thread will deadlock.
 Specifically, NettyClientAvroHandler overrides the 
 ChannelUpstreamHandler.exceptionCaught() method to perform a disconnect, 
 which requires the NettyTransceiver's write lock. However, in the above 
 situation, the thread will already have locked the NettyTransceiver's read 
 lock to write to the Channel. ReentrantReadWriteLock does not allow upgrading 
 from a read to a write lock, hence the thread deadlocks.
 Example stack trace (simplified):
 SessionManager-TimeoutPoller prio=10 tid=0x7b689c00 nid=0x375d waiting on 
 condition [0x7b0ad000..0x7b0ade70]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0xf2a944d8 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
 at 
 java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
  [Acquire write lock] at 
  org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:285)
 at 
 org.apache.avro.ipc.NettyTransceiver.access$2(NettyTransceiver.java:281)
 at 
 org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.exceptionCaught(NettyTransceiver.java:499)
 at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
 at 
 org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:473)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
 at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
 at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:661)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.writeFromUserCode(NioWorker.java:372)
 at 
 org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:117)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
 at org.jboss.netty.channel.Channels.write(Channels.java:632)
 at 
 org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:70)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 at org.jboss.netty.channel.Channels.write(Channels.java:611)
 at org.jboss.netty.channel.Channels.write(Channels.java:578)
 at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)
  [Acquire read lock] at
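
The root cause described in this issue can be reproduced without Netty. Below is a minimal, illustrative Java sketch; the class and method names are made up for this example. It shows that java.util.concurrent.locks.ReentrantReadWriteLock permits downgrading from the write lock to the read lock, but not upgrading from read to write. It uses tryLock() so that the failed upgrade is reported instead of parking the thread forever, which is what the blocking lock() call in the trace above does.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
  /**
   * Tries to take the write lock while this thread already holds the read
   * lock. ReentrantReadWriteLock does not support this upgrade, so tryLock()
   * returns false (a plain lock() would block forever instead).
   */
  public static boolean canUpgradeReadToWrite() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();
    try {
      boolean acquired = lock.writeLock().tryLock();
      if (acquired) {
        lock.writeLock().unlock();
      }
      return acquired;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Downgrading (holding the write lock, then taking the read lock) is allowed. */
  public static boolean canDowngradeWriteToRead() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.writeLock().lock();
    try {
      boolean acquired = lock.readLock().tryLock();
      if (acquired) {
        lock.readLock().unlock();
      }
      return acquired;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Any fix therefore has to avoid requesting the write lock on a thread that still holds the read lock. Common options are to release the read lock first or to hand the disconnect off to another thread. This sketch does not show which approach the attached patches take.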

[jira] [Commented] (AVRO-999) NPE in Java, RecordBuilderBase.defaultValue

2012-02-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214909#comment-13214909
 ] 

Doug Cutting commented on AVRO-999:
---

The test case added in this patch passes with the changes in AVRO-1007.  Also, 
the tests added in AVRO-1007 are substantially similar to the test added here.

 NPE in Java, RecordBuilderBase.defaultValue
 ---

 Key: AVRO-999
 URL: https://issues.apache.org/jira/browse/AVRO-999
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.1
 Environment: Java
Reporter: Jay Rutten
Assignee: James Baldassari
 Attachments: AVRO-999.patch


 If you have a union with a default of null, the code in 
 RecordBuilderBase.defaultValue will cause an NPE in ConcurrentHashMap, since 
 it is trying to add a null to the map.
 Sample union:
 {code}
 record Sample {
 union{null, string} value = null;
 }
 {code}
 Code:
 {code}
 // If not cached, get the default Java value by encoding the default JSON
 // value and then decoding it:
 if (defaultValue == null) {
   ByteArrayOutputStream baos = new ByteArrayOutputStream();
   encoder = EncoderFactory.get().binaryEncoder(baos, encoder);
   ResolvingGrammarGenerator.encode(encoder, field.schema(), defaultJsonValue);
   encoder.flush();
   decoder = DecoderFactory.get().binaryDecoder(baos.toByteArray(), decoder);
   defaultValue = new GenericDatumReader(field.schema()).read(null, decoder);
   defaultSchemaValues.putIfAbsent(field.pos(), defaultValue); // -- NPE from here
 }
 {code}
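 ConcurrentHashMap rejects null keys and values, so putIfAbsent(key, null) throws NullPointerException. That is what happens above when a field's default is null. One common workaround is to store a private sentinel object in place of null. The sketch below assumes that defaults are cached by field position, as in the snippet. NullSafeDefaultCache is a hypothetical name, and this is not necessarily how AVRO-1007 fixed the problem. The sentinel also lets a cached null default be distinguished from "not cached yet", which the `if (defaultValue == null)` check above cannot do.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NullSafeDefaultCache {
  // ConcurrentHashMap forbids null values, so a null default (e.g. for a
  // union{null, string} field) is stored as this private sentinel instead.
  private static final Object NULL_SENTINEL = new Object();

  private final ConcurrentMap<Integer, Object> cache = new ConcurrentHashMap<>();

  /** Caches a default value for a field position; null is allowed. */
  public void put(int fieldPos, Object defaultValue) {
    cache.putIfAbsent(fieldPos, defaultValue == null ? NULL_SENTINEL : defaultValue);
  }

  /** True if a default, possibly null, has been cached for this position. */
  public boolean contains(int fieldPos) {
    return cache.containsKey(fieldPos);
  }

  /** Returns the cached default, translating the sentinel back to null. */
  public Object get(int fieldPos) {
    Object v = cache.get(fieldPos);
    return v == NULL_SENTINEL ? null : v;
  }

  /** Reproduces the original failure: putIfAbsent with a null value throws. */
  public static boolean plainMapRejectsNull() {
    ConcurrentMap<Integer, Object> map = new ConcurrentHashMap<>();
    try {
      map.putIfAbsent(0, null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }
}
```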

[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

2012-02-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214932#comment-13214932
 ] 

Doug Cutting commented on AVRO-1006:


This looks generally good.  A few nits:
 - In class and method names, should we abbreviate 'fp' or spell out 
'fingerprint'?  FP means floating point to my eye.
 - Might we instead put this in org.apache.avro.SchemaFingerprint, rather than 
in util?  Right now things in the util package depend only on the JDK, not on 
other parts of Avro.
 - Public methods and classes need javadoc comments.
 - The changes to the spec are not correctly processed by Forrest 0.8 for me.

 Fingerprints for Avro Schemas
 -

 Key: AVRO-1006
 URL: https://issues.apache.org/jira/browse/AVRO-1006
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
  Labels: features
 Attachments: AVRO-1006-prelim.patch, AVRO-1006.patch, 
 schema-fingerprinting.html, schema-fingerprinting.html, 
 schema-fingerprinting.html


 Add a function that returns a standardized 64-bit fingerprint for schemas.  
 Fingerprints are designed so that the chance of collisions is very, very 
 low.
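 For illustration, here is a Java sketch of a 64-bit Rabin fingerprint of the kind described in later versions of the Avro specification. The class name FingerprintSketch is hypothetical. The sketch assumes the caller has already reduced the schema to a canonical string. Canonicalization, which is what makes equivalent schemas fingerprint identically, is not shown.

```java
import java.nio.charset.StandardCharsets;

public class FingerprintSketch {
  // Constant from the Avro specification's fingerprinting section; it is both
  // the fingerprint of empty input and the polynomial used to build the table.
  public static final long EMPTY = 0xc15d213aa4d7a795L;

  // 256-entry lookup table so the main loop consumes one byte per step.
  private static final long[] FP_TABLE = new long[256];

  static {
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        // Shift right one bit; XOR in the polynomial if the dropped bit was 1.
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      FP_TABLE[i] = fp;
    }
  }

  /** 64-bit Rabin fingerprint of raw bytes. */
  public static long fingerprint64(byte[] buf) {
    long fp = EMPTY;
    for (byte b : buf) {
      fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  /** Fingerprints a schema string that is assumed to be canonical already. */
  public static long fingerprint64(String canonicalSchema) {
    return fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
  }
}
```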

[jira] [Commented] (AVRO-1036) IDL processing fails with multi-level nested imports

2012-02-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215125#comment-13215125
 ] 

Doug Cutting commented on AVRO-1036:


Looks like we patched this in parallel!  You added some tests, which is great!

My patch is a little different.  When you include a file in a different 
directory, any imports it contains should be relative to its directory, not 
the directory of the original file, no?  So this.inputDir is not the right 
value for the new inputDir; rather, it should come from the imported file.
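
Here is a minimal sketch of the resolution rule proposed above. An import is resolved against the directory of the file that contains the import statement, so nested imports keep working at any depth. The helper resolveImport and the example paths are hypothetical and do not reflect the Idl class's actual API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImportResolution {
  /**
   * Hypothetical helper: resolves an import named inside importingFile
   * relative to that file's own directory, not the top-level input directory.
   */
  public static Path resolveImport(Path importingFile, String importName) {
    Path dir = importingFile.getParent();
    if (dir == null) {
      dir = Paths.get(""); // file given without a directory: use the current one
    }
    return dir.resolve(importName).normalize();
  }
}
```

For example, if /proj/idl/main.avdl imports common/types.avdl, and that file in turn imports base.avdl, the second import resolves to /proj/idl/common/base.avdl rather than /proj/idl/base.avdl.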

 IDL processing fails with multi-level nested imports
 

 Key: AVRO-1036
 URL: https://issues.apache.org/jira/browse/AVRO-1036
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.2
Reporter: George Fletcher
Assignee: Doug Cutting
 Fix For: 1.6.3

 Attachments: AVRO-1036.patch, jira-1036.patch


 The change to support finding IDL-related files on the classpath, in addition 
 to the maven-plugin-defined directory, caused the context of the 
 sourceDirectory to be lost when the InputStream returned by findFile() is 
 used to create a new Idl instance.

[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

2012-02-22 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213866#comment-13213866
 ] 

Doug Cutting commented on AVRO-593:
---

 Ideally, anything in the .io, .util, and .file packages does not reference 
 the .mapred or .mapreduce packages [ ... ]

Much in these packages references AvroKey and AvroValue and/or AvroJob.  These 
uses aren't mapreduce-specific and could be refactored away, e.g., by moving 
AvroKey and AvroValue from o.a.a.mapred to o.a.a.hadoop.io, but that would be 
incompatible.

SortedKeyValueFile is the Avro equivalent of Hadoop's MapFile.  Arguably it 
should be moved into o.a.a.io.  It depends on AvroKeyValue, which might also be 
moved to the core.  AvroKeyValue is very similar in functionality to 
o.a.a.mapred.Pair.  Perhaps SortedKeyValueFile should be switched to use Pair 
and both moved to the core.

I have implemented a SequenceFile shim and it works.  There's now just a tiny 
class that needs to be in o.a.h.io, a base class that exposes two 
package-private nested classes from within SequenceFile.

I've re-arranged the classes per Scott's #4 variant but can revert that.  We 
need to decide how much refactoring we want to do here.

Finally, I note that io.SeekableHadoopInput replicates functionality that's 
already in mapred.FsInput, so we should replace the former with the latter in 
the new code.

 Avro mapreduce apis incompatible with hadoop 0.20.2
 ---

 Key: AVRO-593
 URL: https://issues.apache.org/jira/browse/AVRO-593
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.2, 1.3.3
 Environment: Avro 1.3.3, Hadoop 0.20.2
Reporter: Steve Severance
Assignee: Garrett Wu
 Attachments: AVRO-593.patch, AVRO-593.patch


 The Avro APIs for Hadoop use the Hadoop MapReduce API that has been 
 deprecated. A new Avro MapReduce API should be implemented for Hadoop 0.20 
 and higher.

[jira] [Commented] (AVRO-1035) Add the possibility to append to existing avro files

2012-02-22 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213874#comment-13213874
 ] 

Doug Cutting commented on AVRO-1035:


Note that append is not reliable in current Hadoop releases.  Append support 
in Hadoop 1.0 just means that flush() works reliably, not that append actually 
works.  Append should be reliable in 0.23 releases although I doubt it's been 
well tested there yet.

 Add the possibility to append to existing avro files  
 --

 Key: AVRO-1035
 URL: https://issues.apache.org/jira/browse/AVRO-1035
 Project: Avro
  Issue Type: New Feature
Reporter: Vyacheslav Zholudev

 Currently it is not possible to append to avro files that were written and 
 closed. 
 Here is Scott Carey's reply on the mailing list:
 {quote}
 It is not possible without modifying DataFileWriter. Please open a JIRA
 ticket.  
 It could not simply append to an OutputStream, since it must either:
 * Seek to the start to validate the schemas match and find the sync
 marker, or
 * Trust that the schemas match and find the sync marker from the last block
 DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
 could add something to the mapred module that takes a Path and FileSystem
 and returns something that implements an interface that DataFileWriter can
 append to.  This would be something that is both a
 http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
 and an OutputStream, or has both an InputStream from the start of the
 existing file and an OutputStream at the end.
 {quote}
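
Here is a rough sketch of the second option in the quoted reply. AppendableFile is a hypothetical adapter, not an Avro class. It exposes an InputStream from the start of an existing file, for reading the header's schema and sync marker, and an OutputStream positioned at the file's end. The sketch uses java.nio.file on a local file; a Hadoop version would wrap a FileSystem and Path instead. Parsing the header itself is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical adapter giving an appender both views of an existing file:
 * a stream from byte 0 (to validate the schema and recover the sync marker)
 * and a stream that writes after the current end (to add new blocks).
 */
public class AppendableFile {
  private final Path path;

  public AppendableFile(Path path) {
    this.path = path;
  }

  /** Reads the existing contents starting at byte 0. */
  public InputStream fromStart() throws IOException {
    return Files.newInputStream(path);
  }

  /** Writes after the current end of the file; existing bytes are untouched. */
  public OutputStream atEnd() throws IOException {
    return Files.newOutputStream(path, StandardOpenOption.APPEND);
  }
}
```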

[jira] [Commented] (AVRO-1023) Saved state should be restored in finally clause

2012-02-22 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214094#comment-13214094
 ] 

Doug Cutting commented on AVRO-1023:


The way this might be triggered is if a Schema.Parser is reused after a 
SchemaParseException is thrown.  Currently the default namespace is that of the 
preceding schema parsed.  If a SchemaParseException is thrown and the parser is 
reused then the default namespace could be that of a schema nested within the 
previous schema.

 Saved state should be restored in finally clause
 

 Key: AVRO-1023
 URL: https://issues.apache.org/jira/browse/AVRO-1023
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Priority: Minor
 Attachments: AVRO-1023.patch


 Schema.parse(JsonParser) and Schema.parse(JsonNode,Names) save global state in 
 a local variable; they should restore that state in a finally clause, but they 
 don't.
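
Here is the save/restore pattern the issue asks for, as a minimal Java sketch. ScopedNamespaceParser and parseNamed are hypothetical stand-ins for the parser's namespace handling, not Avro's Schema API. The point is that the finally clause restores the saved namespace even when a nested parse throws. That is the reuse-after-SchemaParseException case described in the comment above.

```java
public class ScopedNamespaceParser {
  // Hypothetical parser state: the namespace inherited by nested names.
  private String defaultNamespace = null;

  public String getDefaultNamespace() {
    return defaultNamespace;
  }

  /**
   * Parses a (hypothetical) named schema whose body may contain nested
   * named schemas. The caller's namespace is saved in a local and restored
   * in finally, so a failure inside parseBody cannot leak the nested one.
   */
  public void parseNamed(String namespace, Runnable parseBody) {
    String saved = defaultNamespace;
    defaultNamespace = namespace;
    try {
      parseBody.run(); // may throw, e.g. on a malformed nested schema
    } finally {
      defaultNamespace = saved;
    }
  }
}
```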

[jira] [Commented] (AVRO-999) NPE in Java, RecordBuilderBase.defaultValue

2012-02-22 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214103#comment-13214103
 ] 

Doug Cutting commented on AVRO-999:
---

I think this was already fixed in AVRO-1007.  Can I close it as a duplicate?

 NPE in Java, RecordBuilderBase.defaultValue
 ---

 Key: AVRO-999
 URL: https://issues.apache.org/jira/browse/AVRO-999
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.1
 Environment: Java
Reporter: Jay Rutten
Assignee: James Baldassari
 Attachments: AVRO-999.patch


 If you have a union with a default of null, the code in 
 RecordBuilderBase.defaultValue will cause an NPE in ConcurrentHashMap, since 
 it is trying to add a null to the map.
 Sample union:
 {code}
 record Sample {
 union{null, string} value = null;
 }
 {code}
 Code:
 {code}
 // If not cached, get the default Java value by encoding the default JSON
 // value and then decoding it:
 if (defaultValue == null) {
   ByteArrayOutputStream baos = new ByteArrayOutputStream();
   encoder = EncoderFactory.get().binaryEncoder(baos, encoder);
   ResolvingGrammarGenerator.encode(encoder, field.schema(), defaultJsonValue);
   encoder.flush();
   decoder = DecoderFactory.get().binaryDecoder(baos.toByteArray(), decoder);
   defaultValue = new GenericDatumReader(field.schema()).read(null, decoder);
   defaultSchemaValues.putIfAbsent(field.pos(), defaultValue); // -- NPE from here
 }
 {code}

[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

2012-02-21 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212819#comment-13212819
 ] 

Doug Cutting commented on AVRO-593:
---

I see a few choices:

1. org.apache.avro.{mapred,mapreduce,io,file,util}.  This is what the code on 
github does.  This would make the avro-mapred module contain things outside the 
org.apache.avro.mapred package, and splits Avro's io, file and util packages 
across multiple modules.

2. org.apache.avro.mapred.{mapreduce,io,file,util}.  This is what my patch 
does.  This is back-compatible and consistent with the module name, but places 
mapreduce under mapred, which is different than the Hadoop layout.

3. org.apache.avro.hadoop.{mapred,mapreduce,io,file,util}.  We'd rename the 
module to be avro-hadoop.  This would be incompatible but consistent with 
Hadoop.  For back-compatibility we might leave the mapred classes in their 
current package.

4. org.apache.avro.{mapred,mapreduce,mapred.io,mapred.file,mapred.util}.  This 
is back-compatible but includes a package that's not under the package of the 
module name.

Tom, are you advocating for (4)?  I'd be okay with that, I guess.

I'm also leaning towards moving AvroSequenceFile under org.apache.avro and 
adding just a shim base class into org.apache.hadoop.io that subclasses 
SequenceFile and makes public the bits we need.  That way if we get Hadoop to 
expose these bits the Avro API would not change.




 Avro mapreduce apis incompatible with hadoop 0.20.2
 ---

 Key: AVRO-593
 URL: https://issues.apache.org/jira/browse/AVRO-593
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.2, 1.3.3
 Environment: Avro 1.3.3, Hadoop 0.20.2
Reporter: Steve Severance
Assignee: Garrett Wu
 Attachments: AVRO-593.patch, AVRO-593.patch


 The Avro APIs for Hadoop use the Hadoop MapReduce API that has been 
 deprecated. A new Avro MapReduce API should be implemented for Hadoop 0.20 
 and higher.

[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

2012-02-21 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213071#comment-13213071
]

Doug Cutting commented on AVRO-593:
---

 Is it possible to move AvroSequenceFile under o.a.a?

I discussed that above. We could move it, but we'd still need a shim in
o.a.h.io, since the subclass accesses package-private bits.

 if we need to produce two otherwise identical modules in a build – one 0.23.x+
 compatible and one for the 0.20 / 0.22 / 1.0 users

The nested Context classes in mapreduce's Mapper and Reducer went from abstract
classes to interfaces (MAPREDUCE-954), requiring re-compilation of code that
references these. But the mapreduce support added here does not reference
these. So I think we're spared.

Avro mapreduce apis incompatible with hadoop 0.20.2
---

Key: AVRO-593
URL: https://issues.apache.org/jira/browse/AVRO-593
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.3.2, 1.3.3
Environment: Avro 1.3.3, Hadoop 0.20.2
Reporter: Steve Severance
Assignee: Garrett Wu
Attachments: AVRO-593.patch, AVRO-593.patch

The Avro APIs for Hadoop use the Hadoop MapReduce API that has been
deprecated. A new Avro MapReduce API should be implemented for Hadoop 0.20
and higher.

[jira] [Commented] (AVRO-672) Convert JSON Text Input to Avro Tool

2012-02-17 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210586#comment-13210586
 ] 

Doug Cutting commented on AVRO-672:
---

Leith, is the tool that Ron provided here the one you need?  If so, then we can 
probably resuscitate this patch and get it committed.  If not, is there a 
specific tool you need (e.g., CSV or TSV)?  Thanks!

 Convert JSON Text Input to Avro Tool
 

 Key: AVRO-672
 URL: https://issues.apache.org/jira/browse/AVRO-672
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Ron Bodkin
 Attachments: AVRO-672.patch, AVRO-672.patch


 The attached patch allows reading a JSON-formatted text file in, converting 
 to a conforming Avro text file, emitting one record per line, e.g., it can 
 read this input file:
 {"intval":12}
 {"intval":-73,"strval":"hello, there!!"}
 with this schema:
 { "type":"record", "name":"TestRecord", "fields": [ 
 {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
 returning valid Avro. This is different than the DataFileWriteTool, which 
 would read in the following internal encoding:
 {"intval":12,"strval":null}
 {"intval":-73,"strval":{"string":"hello, there!!"}}
 In general, the internal encodings used by Avro aren't natural when reading 
 in JSON text that appears in the wild. Likewise, this utility allows changing 
 invalid Avro identifier characters into an underscore, again to tolerate JSON 
 that wasn't designed to be readable by Avro.
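
The underscore replacement mentioned above could look like the following hypothetical helper, which is not the patch's code. It is based on the Avro specification's naming rule: a name starts with [A-Za-z_] and continues with only [A-Za-z0-9_]. Each illegal UTF-16 character, including a leading digit, becomes '_'.

```java
public class AvroNameSanitizer {
  /**
   * Hypothetical helper: replaces every character that is not legal in an
   * Avro name with '_'. Works per UTF-16 char, so a non-BMP character
   * (a surrogate pair) becomes two underscores.
   */
  public static String sanitize(String name) {
    if (name.isEmpty()) {
      return "_"; // an empty name is never valid
    }
    StringBuilder sb = new StringBuilder(name.length());
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean letterOrUnderscore =
          (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
      boolean digit = c >= '0' && c <= '9';
      // Digits are legal everywhere except the first position.
      boolean legal = letterOrUnderscore || (i > 0 && digit);
      sb.append(legal ? c : '_');
    }
    return sb.toString();
  }
}
```

This mapping is lossy: distinct inputs such as "a-b" and "a.b" both become "a_b".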

[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

2012-02-17 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210813#comment-13210813
 ] 

Doug Cutting commented on AVRO-593:
---

Garrett, this code looks great!  Thanks for contributing it.

I renamed all of the packages to reside under org.apache.avro.mapred.  So that 
package now has subpackages named io, file, util and mapreduce.  That's 
consistent with other Avro modules, where classes are under 
org.apache.avro.module.

The only exception is org.apache.hadoop.io.AvroSequenceFile.  This is in a 
Hadoop package so that it can access some package-private parts of 
SequenceFile.  This is fragile, as SequenceFile could change these non-public 
APIs.  We should probably file an issue with Hadoop to make these items 
protected so that SequenceFile can be subclassed in a supported way.

I plan to improve the javadoc a bit (adding package.html files to new packages) 
and move versions for new dependencies from mapred/pom.xml into the parent pom. 
Then I think this should be ready to commit.

 Avro mapreduce apis incompatible with hadoop 0.20.2
 ---

 Key: AVRO-593
 URL: https://issues.apache.org/jira/browse/AVRO-593
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.2, 1.3.3
 Environment: Avro 1.3.3, Hadoop 0.20.2
Reporter: Steve Severance
Assignee: Garrett Wu
 Attachments: AVRO-593.patch, AVRO-593.patch


 The Avro APIs for Hadoop use the Hadoop MapReduce API that has been 
 deprecated. A new Avro MapReduce API should be implemented for Hadoop 0.20 
 and higher.

[jira] [Commented] (AVRO-1025) migrate website & dist to svnpubsub

2012-02-16 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209735#comment-13209735
 ] 

Doug Cutting commented on AVRO-1025:


I committed the docs to subversion and asked Infrastructure to switch the 
website to automatically update from there in INFRA-4443.


 migrate website & dist to svnpubsub
 ---

 Key: AVRO-1025
 URL: https://issues.apache.org/jira/browse/AVRO-1025
 Project: Avro
  Issue Type: Improvement
Reporter: Doug Cutting
Assignee: Doug Cutting

 ASF infrastructure has requested that all projects migrate to svnpubsub for 
 their websites and release distributions.

[jira] [Commented] (AVRO-1022) Error in validate name

2012-02-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207029#comment-13207029
 ] 

Doug Cutting commented on AVRO-1022:


 If names are restricted, then consuming schemas from other systems will be 
 difficult.

Good point.  The question is where the escaping burden lies: either with 
adapter layers (e.g., in Pig or Hive) or in the code generation layer.  I'd 
argue that the code generation layer already has to handle reserved words, so 
adding character escaping is not a significant burden there.  It's also safer, 
and more tolerant, not to assume that other implementations have correctly 
escaped all names.  Finally, escaping as late as possible maximizes legibility 
throughout the system.
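
To make the "escape late, in the code generator" argument concrete, here is a hypothetical escaping scheme in Java. Avro does not define this scheme. ASCII letters, digits and '_' pass through. Every other UTF-16 character, and a leading digit, is replaced by '$' followed by four hex digits. Because '$' itself is always escaped, distinct schema names map to distinct identifiers. Reserved-word handling, which a real generator also needs, is omitted.

```java
public class IdentifierEscaper {
  private static boolean isAsciiIdentChar(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9') || c == '_';
  }

  /**
   * Hypothetical escaping for generated Java identifiers: safe ASCII chars
   * pass through; anything else (including '$' and a leading digit) becomes
   * "$" plus its four-digit hex code, which keeps the mapping reversible.
   */
  public static String escape(String name) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean leadingDigit = i == 0 && c >= '0' && c <= '9';
      if (isAsciiIdentChar(c) && !leadingDigit) {
        sb.append(c);
      } else {
        sb.append(String.format("$%04x", (int) c));
      }
    }
    return sb.toString();
  }
}
```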


 Error in validate name
 --

 Key: AVRO-1022
 URL: https://issues.apache.org/jira/browse/AVRO-1022
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Raymie Stata
Priority: Minor
 Attachments: AVRO-1022.patch, AVRO-1022.patch


 Fix schema.validateName to allow only ASCII letters, not Unicode letters.
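
 Java's Character.isLetter and Character.isLetterOrDigit accept Unicode letters such as 'é', so an ASCII-only rule needs explicit range checks. Below is a minimal sketch of such a check, following the [A-Za-z_][A-Za-z0-9_]* rule from the Avro specification. AsciiNameValidator is a hypothetical name, and this is not the attached patch.

```java
public class AsciiNameValidator {
  private static boolean isAsciiLetter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
  }

  /** Accepts only names matching [A-Za-z_][A-Za-z0-9_]*. */
  public static boolean isValidName(String name) {
    if (name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    if (!(isAsciiLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(isAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_')) {
        return false;
      }
    }
    return true;
  }
}
```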

[jira] [Commented] (AVRO-1028) IPC transceiver doesn't gracefully handle server connection resets.

2012-02-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207313#comment-13207313
 ] 

Doug Cutting commented on AVRO-1028:


Patch works for me, but with the addition of the urllib3 requirement, perhaps 
we should push this to 1.7.0?

 IPC transceiver doesn't gracefully handle server connection resets.
 ---

 Key: AVRO-1028
 URL: https://issues.apache.org/jira/browse/AVRO-1028
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.6.2
Reporter: Bo Shi
Assignee: Bo Shi
 Fix For: 1.6.2

 Attachments: AVRO-1028.patch


 The current Python HTTPTransceiver class forces users to handle connection 
 resets.
 I've refactored the class using urllib3 and incorporated some features we get 
 for free from said library into the transceiver.  Added a test case for 
 test_ipc.py that uses the twisted server implementation.

[jira] [Commented] (AVRO-1022) Error in validate name

2012-02-10 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205616#comment-13205616
]

Doug Cutting commented on AVRO-1022:

An implementation would be naive to trust that other implementations have
validated all names in schemas it receives. Java currently disables validation
when reading a schema from a data file, since it's more important to be able to
read the data. With Generic APIs name validation isn't required and many
applications use only generic APIs.

This would not require support for Unicode identifiers in programming
languages. A code generator should escape any character in a name that's not
easy for it to represent in an identifier. We'd just be permitting code
generators to take advantage of Unicode identifiers when a programming
language supports them.

 If we went the other way (change the spec), we'd have to answer a bunch of
 design questions (decide what is a letter, decide on normalization, figure
 out how to mangle names in various languages, etc.), and then implement
 validation in each language [ ... ]

I disagree. Even if we removed all restrictions on naming I don't think we'd
add much burden to implementations. Most implementations don't do code
generation. Code generators already need to mangle names. A code generator
should already escape rather than die when it sees an unexpected character in a
name. (The alternative is an inability to generate code for schemas that
someone else controls, a poor choice.)

So I don't see a new interoperability problem this would create. We already
have schemas in the wild whose names are invalid.

Perhaps we should change the spec to recommend that names be restricted to
ASCII for ease of programming with generated APIs in all languages. And we
might check that in the compiler, forcing folks to specify --escape-non-ASCII-names
if they really want to generate code for a schema whose names contain non-ASCII
characters, to discourage the use of non-ASCII in schemas that you do control.
In general, we could encourage implementations both to not trust that
identifiers are all-ASCII and to encourage all-ASCII identifiers.

Error in validate name
--

Key: AVRO-1022
URL: https://issues.apache.org/jira/browse/AVRO-1022
Project: Avro
Issue Type: Bug
Components: java
Reporter: Raymie Stata
Priority: Minor
Attachments: AVRO-1022.patch

Fix schema.validateName to allow only ASCII letters, not Unicode letters.

[jira] [Commented] (AVRO-973) Union behavior not consistent

2012-02-10 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205678#comment-13205678
 ] 

Doug Cutting commented on AVRO-973:
---

 There is no change you can make to the current validation-based mechanic that 
 guarantees correctness for record types 

That's right.

Full recursive field validation is not required for unions.  See AVRO-654 and 
http://avro.apache.org/docs/current/spec.html#Unions

The object's schema name should be checked against the names of the schemas in 
the union.  If the fields don't match but the names are the same then a runtime 
error should be generated.

This is a longstanding misfeature of the Python implementation.


 Union behavior not consistent
 -

 Key: AVRO-973
 URL: https://issues.apache.org/jira/browse/AVRO-973
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.6.1, 1.6.2
Reporter: Gaurav Nanda
  Labels: patch
 Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, 
 AVRO-973-patch-3.patch, AVRO-973-wrapper.patch, AVRO-973-wrapper.patch, 
 test_unions.py

   Original Estimate: 0.25h
  Remaining Estimate: 0.25h

 Python's union does not respect the order in which types are specified.
 For the following schema: 
 {"type": "map", "values": ["int", "long", "float", "double", "string", "boolean"]}, 
 an integer value is written as a double, but it should respect the order in 
 which the types have been specified.
 Fixed code (io.py):
   def write_union(self, writers_schema, datum, encoder):
     """
     A union is encoded by first writing a long value indicating
     the zero-based position within the union of the schema of its value.
     The value is then encoded per the indicated schema within the union.
     """
     # resolve union: pick the FIRST schema that validates the datum
     index_of_schema = -1
     for i, candidate_schema in enumerate(writers_schema.schemas):
       if validate(candidate_schema, datum):
         index_of_schema = i
         break  # XXX: add break statement here XXX
     if index_of_schema < 0:
       raise AvroTypeException(writers_schema, datum)
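To see why the missing break matters on the wire, here is a self-contained Python sketch of the union encoding the docstring describes: the branch index is written as an Avro long (zig-zag, then variable-length), followed by the value. Only primitive branches are handled, int range checks are omitted, and the helper names are hypothetical, not functions from avro.io.

```python
import struct


def zigzag_varint(n: int) -> bytes:
    """Encode a long as Avro does: zig-zag, then 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # zig-zag for 64-bit values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def matches(schema: str, datum) -> bool:
    if schema == "boolean":
        return isinstance(datum, bool)
    if isinstance(datum, bool):  # bool is an int subclass in Python
        return False
    if schema in ("int", "long"):
        return isinstance(datum, int)
    if schema in ("float", "double"):
        return isinstance(datum, (int, float))
    return schema == "string" and isinstance(datum, str)


def encode_value(schema: str, datum) -> bytes:
    if schema in ("int", "long"):
        return zigzag_varint(datum)
    if schema == "float":
        return struct.pack("<f", datum)
    if schema == "double":
        return struct.pack("<d", datum)
    if schema == "boolean":
        return b"\x01" if datum else b"\x00"
    data = datum.encode("utf-8")  # string: length, then UTF-8 bytes
    return zigzag_varint(len(data)) + data


def encode_union(branches, datum, first_match: bool = True) -> bytes:
    """Write the branch index, then the value. With first_match=False the
    loop keeps the LAST matching branch, reproducing the reported bug."""
    index = -1
    for i, schema in enumerate(branches):
        if matches(schema, datum):
            index = i
            if first_match:
                break
    if index < 0:
        raise ValueError(f"{datum!r} matches no branch")
    return zigzag_varint(index) + encode_value(branches[index], datum)
```

With the schema from the report, the integer 1 encodes as two bytes (branch 0, then int 1) when the first match wins, but as branch 3 plus an 8-byte double without the break.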

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-973) Union behavior not consistent

2012-02-10 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205715#comment-13205715
 ] 

Doug Cutting commented on AVRO-973:
---

Ultimately I think the fix is AVRO-283.  Python would use objects to represent 
records, not dictionaries.  This was recently discussed on the dev list where 
Marcio expressed interest in working on it (http://s.apache.org/deA).  Perhaps 
you can collaborate there?

[jira] [Commented] (AVRO-1022) Error in validate name

2012-02-09 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204844#comment-13204844
]

Doug Cutting commented on AVRO-1022:

> And doing Unicode right is a lot of work; doing it poorly will just create a
> nasty source of interop problems.

I don't see this. Avro already requires that JSON parsers do Unicode right.
Permitting non-ASCII in identifiers only creates problems when generating code.
The potential interoperability problem is that some implementations, when
given a schema, might be unable to generate valid code in their programming
language for that schema, making it unusable from generated code (although it
would still be readable by generic code). That would be a bug in that
implementation.

Code generators already have to mangle names that are reserved words in the
generated programming language. If we permit non-ASCII characters in
identifiers then implementations might also need to escape non-ASCII characters
when generating code. This doesn't seem a huge burden.

It's important that the specification is clear about what characters
implementations might expect to see in identifiers so that they know what
characters need to be escaped. A conservative implementation might simply
escape anything that's not permitted in their programming language.

If the spec is changed we should specify precisely what characters are
permitted. Unicode characters have properties. We can use these properties to
make the specification precise. One property is 'letter', another is 'number'.
Java's isLetterOrDigit() includes these two sets.
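A minimal Python sketch of the property-based rule suggested above, modeled on Java's Character.isLetterOrDigit(), which accepts Unicode letters (general categories starting with L) and decimal digits (Nd). The _uXXXX escaping scheme and the trailing-underscore mangling of reserved words are hypothetical choices for a Python code generator, not anything the Avro spec defines.

```python
import keyword
import unicodedata


def is_letter(ch: str) -> bool:
    return unicodedata.category(ch).startswith("L")


def is_letter_or_digit(ch: str) -> bool:
    # Mirrors Java's Character.isLetterOrDigit(): letters plus decimal digits (Nd).
    return is_letter(ch) or unicodedata.category(ch) == "Nd"


def validate_name(name: str) -> bool:
    """Unicode-aware variant of [A-Za-z_][A-Za-z0-9_]*."""
    if not name or not (is_letter(name[0]) or name[0] == "_"):
        return False
    return all(is_letter_or_digit(c) or c == "_" for c in name[1:])


def to_python_identifier(name: str) -> str:
    """Escape non-ASCII characters and mangle reserved words (hypothetical scheme).

    Caveat: an ASCII name such as "a_u00e9" collides with the escaped form
    of "aé"; a real generator would need to detect that.
    """
    escaped = "".join(c if c.isascii() else f"_u{ord(c):04x}" for c in name)
    return (escaped + "_") if keyword.iskeyword(escaped) else escaped
```

A conservative generator can escape everything outside ASCII this way, so it never depends on what its target language happens to accept.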

Stepping back, it would be good if folks could use their own languages when
writing Avro schemas. It should be possible to use, e.g., column names that
are in Japanese, Chinese, Hindi, etc.

[jira] [Commented] (AVRO-990) ruby impl failed when the local_protocol not same with remote_protocol

2012-02-09 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204914#comment-13204914
 ] 

Doug Cutting commented on AVRO-990:
---

It would also be good to have a test case included in the patch, if possible.

 ruby impl failed when the local_protocol not same with remote_protocol
 --

 Key: AVRO-990
 URL: https://issues.apache.org/jira/browse/AVRO-990
 Project: Avro
  Issue Type: Bug
  Components: ruby
Affects Versions: 1.6.1
Reporter: kafka0102
 Fix For: 1.7.0

 Attachments: ipc.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the Requestor class, when local_protocol differs from remote_protocol, 
 Requestor stores a value in REMOTE_HASHES[transport.remote_name]. The next 
 new Requestor object then skips self.remote_protocol = local_protocol in the 
 write_handshake_request method, so its @remote_protocol is always nil.
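A hypothetical Python sketch of one way to avoid the problem, in the spirit of the test case requested above: cache the remote protocol alongside its hash, so a new requestor that finds a cached hash can also restore remote_protocol. The class, the REMOTE_PROTOCOLS cache, and the simplified handshake are illustrations only (the real handshake exchanges more fields); MD5 is used because Avro handshakes identify protocols by MD5 hash.

```python
import hashlib
from typing import Dict, Optional

# Process-wide caches keyed by remote name (hypothetical, modeled on the
# REMOTE_HASHES table named in the report).
REMOTE_HASHES: Dict[str, bytes] = {}
REMOTE_PROTOCOLS: Dict[str, str] = {}


def md5(protocol: str) -> bytes:
    return hashlib.md5(protocol.encode("utf-8")).digest()


class Requestor:
    def __init__(self, local_protocol: str, remote_name: str):
        self.local_protocol = local_protocol
        self.remote_name = remote_name
        self.remote_protocol: Optional[str] = None

    def write_handshake_request(self) -> bytes:
        """Return the server hash this client will send."""
        remote_hash = REMOTE_HASHES.get(self.remote_name)
        if remote_hash is None:
            # First contact: guess that the server speaks our protocol.
            remote_hash = md5(self.local_protocol)
            self.remote_protocol = self.local_protocol
        else:
            # The fix: restore the protocol that goes with the cached hash
            # instead of leaving remote_protocol unset (nil in Ruby).
            self.remote_protocol = REMOTE_PROTOCOLS[self.remote_name]
        return remote_hash

    def read_handshake_response(self, server_protocol: str) -> None:
        """The server answered with its own protocol: remember both pieces."""
        self.remote_protocol = server_protocol
        REMOTE_HASHES[self.remote_name] = md5(server_protocol)
        REMOTE_PROTOCOLS[self.remote_name] = server_protocol
```

The key point is that the hash cache and the protocol cache are always written together, so neither can be consulted without the other.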

[jira] [Commented] (AVRO-1021) Fix a few name-related imperfections in Avro spec

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203788#comment-13203788
 ] 

Doug Cutting commented on AVRO-1021:


> if there is an Avro Data File or old schema that used a name before defining 
> it that currently works

I don't think any implementation currently supports use-before-define, does it?

A left-to-right traversal of JSON only makes sense for array elements.  The 
only schemas that include multiple types, and for which traversal order 
matters, are unions and records, but these use JSON arrays, so left-to-right 
works.  The types array in a protocol definition could come textually after 
the messages, but the types must be processed before the messages, and in 
order.  Should we clarify that too?
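A minimal Python sketch of the left-to-right rule discussed here, assuming schemas are already parsed JSON (Python dicts, lists, and strings) and ignoring namespaces for brevity. It walks array elements in order, requires each named type to be defined before it is referenced, rejects redefinitions, compares names case-sensitively, and processes a protocol's types before its messages regardless of where they appear in the text. The function names are hypothetical.

```python
from typing import Any, Set

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_schema(schema: Any, defined: Set[str]) -> None:
    """Raise ValueError on use-before-define or on a duplicate definition."""
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in defined:  # case-sensitive
            raise ValueError(f"{schema!r} used before it is defined")
    elif isinstance(schema, list):  # a union: left to right
        for branch in schema:
            check_schema(branch, defined)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in ("record", "error", "enum", "fixed"):
            name = schema["name"]
            if name in defined:
                raise ValueError(f"{name!r} defined more than once")
            defined.add(name)  # defined before its fields, so self-reference works
            for f in schema.get("fields", []):  # fields: left to right
                check_schema(f["type"], defined)
        elif kind == "array":
            check_schema(schema["items"], defined)
        elif kind == "map":
            check_schema(schema["values"], defined)
        else:
            check_schema(kind, defined)  # e.g. {"type": "int"}


def check_protocol(protocol: dict) -> None:
    defined: Set[str] = set()
    for t in protocol.get("types", []):  # types first, in order...
        check_schema(t, defined)
    for msg in protocol.get("messages", {}).values():  # ...then messages
        for param in msg.get("request", []):
            check_schema(param["type"], defined)
        check_schema(msg.get("response", "null"), defined)
```

Because the protocol check reads the types key before the messages key, a protocol whose messages appear first in the JSON text is still accepted.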

 Fix a few name-related imperfections in Avro spec
 -

 Key: AVRO-1021
 URL: https://issues.apache.org/jira/browse/AVRO-1021
 Project: Avro
  Issue Type: Bug
  Components: spec
Reporter: Raymie Stata
Assignee: Raymie Stata
Priority: Minor
 Attachments: AVRO-1021.patch, AVRO-1021.patch


 Require that names are defined before they are used; disallow multiple 
 definitions of names; clarify that name equality is case-sensitive (for 
 type names, field names, and enum symbols).

[jira] [Commented] (AVRO-1022) Error in validate name

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203818#comment-13203818
 ] 

Doug Cutting commented on AVRO-1022:


Every language that currently implements Avro supports Unicode identifiers.  So 
I wonder if we should instead amend the specification to permit non-ASCII 
characters?

[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203823#comment-13203823
 ] 

Doug Cutting commented on AVRO-995:
---

+1

 Java: Update Dependencies for 1.6.2
 ---

 Key: AVRO-995
 URL: https://issues.apache.org/jira/browse/AVRO-995
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.6.2

 Attachments: AVRO-995.2.patch, AVRO-995.patch


 A few of our dependencies need upgrading.  In particular, I have been hit by 
 a bug in Jackson that is fixed by the latest release 
 (http://jira.codehaus.org/browse/JACKSON-462).
 Summary:  I will submit a patch that updates everything to the next bugfix or 
 very minor release, other than paranamer, thrift, and hadoop.
 Details:
 (using maven versions plugin)
 On the dependency side:
 [INFO]   com.thoughtworks.paranamer:paranamer 2.3 -> 2.4-debug
   could not find info about what is new in 2.4.  I do not think we should 
   upgrade until we have more info
 [INFO]   net.sf.jopt-simple:jopt-simple 4.1 -> 4.3
   minor extra features (http://pholser.github.com/jopt-simple/changes.html)
 [INFO]   org.apache.hadoop:hadoop-core 0.20.205.0 -> 1.0.0
   renamed 0.20.205, no need to update yet.
 [INFO]   org.codehaus.jackson:jackson-mapper-asl 1.8.6 -> 1.9.3
   I suggest we upgrade to 1.8.7.
 [INFO]   org.jboss.netty:netty 3.2.6.Final -> 3.2.7.Final
   bugfix release
 [INFO]   org.apache.thrift:libthrift 0.7.0 -> 0.8.0
   Is this a minor / bugfix release?  If so we should update, otherwise wait 
   until Avro 1.7.x
 [INFO]   org.slf4j:slf4j-api 1.6.3 -> 1.6.4
 [INFO]   org.slf4j:slf4j-simple 1.6.3 -> 1.6.4
   Minor bugfixes (http://www.slf4j.org/news.html)
 On the plugin side: (mvn versions:display-plugin-updates)
 [INFO]   maven-antrun-plugin 1.6 -> 1.7
   minor, looks safe: 
   (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWke1=w6nv2u85nkgqm0zxo3khyzdc8hazkhhvywjbuv...@mail.gmail.com%3E)
 [INFO]   maven-gpg-plugin 1.3 -> 1.4
   minor update 
   (http://mail-archives.apache.org/mod_mbox/maven-announce/201108.mbox/%3CCA+nPnMw_3zQQCpzybQvo-QZFMCogvH31WEhxQnZ=cdzgxsr...@mail.gmail.com%3E)
 [INFO]   maven-checkstyle-plugin 2.6 -> 2.8
   we avoided 2.7 before for some reason: 
   (http://mail-archives.apache.org/mod_mbox/maven-dev/201108.mbox/%3ccapoybqsvu+kup5vuce8rc6mjb9rykr2cpig+rvbe5o8teo6...@mail.gmail.com%3E)
   useful new feature: 
   (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C15365449.01320142181746.JavaMail.mark@MARK%3E)
 [INFO]   maven-surefire-plugin 2.10 -> 2.11
   lots of bug fixes and long requested new features: 
   (http://mail-archives.apache.org/mod_mbox/maven-announce/201112.mbox/%3CCA+jQputH_uA2Ue6JqiHp1YeNo=qqxgcpdtgq9vv1aw_psqk...@mail.gmail.com%3E)
 [INFO]   maven-shade-plugin 1.4 -> 1.5
   minor, looks safe: 
   (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C1076639049.01320107865464.JavaMail.benson@tinfoilhat.local%3E)
 [INFO]   maven-archetype-plugin 2.1 -> 2.2
   http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWkeLyc-tA2NCh3xYR06W+eGiWd46fS=R=ngjon14zrd...@mail.gmail.com%3E

[jira] [Commented] (AVRO-1022) Error in validate name

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203876#comment-13203876
 ] 

Doug Cutting commented on AVRO-1022:


I doubt it's well tested in implementations, so there are probably bugs there.

[jira] [Commented] (AVRO-1007) Insufficient validation in generated specific record builder implementations

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203928#comment-13203928
 ] 

Doug Cutting commented on AVRO-1007:


According to the spec, that schema is malformed.  The default value of a union 
field should always be interpreted as a value of the first type in the union.

I don't think this schema currently works when reading records that lack the 
field f.  See line 348 of ResolvingGrammarGenerator, where the default value 
is encoded using the first type in the union.

I suppose we could change the spec to permit a null default value if any 
element of the union is null, but I don't see why we should.  It makes the spec 
more complex and doesn't provide any additional expressive power.

This patch would make the builder API enforce the spec, consistent with 
ResolvingDecoder.
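A minimal Python sketch of the rule being enforced: a union field's default must be a valid value of the union's first branch. Only primitive branches are checked, and check_union_default is a hypothetical helper, not the Java builder or ResolvingGrammarGenerator code.

```python
from typing import Any, List


def _is_int(d: Any) -> bool:
    return isinstance(d, int) and not isinstance(d, bool)


# JSON default values as Python objects, per primitive type (simplified).
DEFAULT_CHECKS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": _is_int,
    "long": _is_int,
    "float": lambda d: _is_int(d) or isinstance(d, float),
    "double": lambda d: _is_int(d) or isinstance(d, float),
    "string": lambda d: isinstance(d, str),
}


def check_union_default(branches: List[str], default: Any) -> None:
    """Raise ValueError unless default matches the FIRST branch."""
    first = branches[0]
    if not DEFAULT_CHECKS[first](default):
        raise ValueError(
            f"default {default!r} is not a valid {first!r}; a union default "
            f"must match the first branch (reorder the union to change it)"
        )
```

A null default is therefore written as ["null", "string"], not ["string", "null"]; reordering the union expresses the same thing, which is why a looser rule adds no expressive power.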

 Insufficient validation in generated specific record builder implementations
 

 Key: AVRO-1007
 URL: https://issues.apache.org/jira/browse/AVRO-1007
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.6.1
Reporter: James Baldassari
Assignee: James Baldassari
  Labels: java
 Fix For: 1.6.2

 Attachments: AVRO-1007-v2.patch, AVRO-1007-v3.patch, 
 AVRO-1007-v4.patch, AVRO-1007.patch, AVRO-1007.patch, AVRO-1007.patch


 There are two main problems with the generated build() method in specific 
 record builders:
 * For non-primitive types, if there is no default value and the user does not 
 set the value, build() will execute successfully without throwing an exception
 ** Instead, an AvroRuntimeException should be thrown with an exception 
 message indicating the name of the required field that was not set
 * For primitive types, if there is no default value and the user does not set 
 the value, an AvroRuntimeException is thrown with the 'cause' set to a 
 NullPointerException, which is not very helpful
 ** The NPE comes from attempting to set the primitive field to the result of 
 defaultValue(), which is null

[jira] [Commented] (AVRO-1021) Fix a few name-related imperfections in Avro spec

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204034#comment-13204034
 ] 

Doug Cutting commented on AVRO-1021:


> where the types attribute of a protocol is always deemed to come before the 
> messages attribute

Works for me.

[jira] [Commented] (AVRO-850) Python protocol parsing doesn't set message error union to ['string'] when no errors declared

2012-02-08 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204165#comment-13204165
]

Doug Cutting commented on AVRO-850:
---

> This is a duplicate of AVRO-748 as far as I can tell

Yes, but this one has a patch!

I'll commit this tomorrow unless there are objections.

Python protocol parsing doesn't set message error union to ['string'] when no
errors declared
-

Key: AVRO-850
URL: https://issues.apache.org/jira/browse/AVRO-850
Project: Avro
Issue Type: Bug
Components: python
Affects Versions: 1.5.0
Reporter: Jeremy Lewi
Assignee: Jeremy Lewi
Attachments: AVRO-850.patch

This bug applies to the python module.
According to the protocol specification
(http://avro.apache.org/docs/current/spec.html#Messages) when no errors are
declared in the protocol for a message, the effective error union is
['string']. The behavior of avro.protocol is not consistent with this
specification. In particular, if no errors are declared, the errors property
of Message is None rather than an instance of ErrorUnionSchema.
Consequently, if a message returns an error, an exception is thrown.
Patch to follow shortly.
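A minimal sketch of the spec rule in question: the effective error union of a message is "string" followed by any declared errors, so it is never empty, even when the errors attribute is absent. effective_error_union and message_errors are hypothetical helpers; in avro.protocol the result would be an ErrorUnionSchema rather than a list of type names.

```python
from typing import List, Optional


def effective_error_union(declared: Optional[List[str]]) -> List[str]:
    """Prepend "string" (for undeclared system errors) to the declared errors."""
    return ["string"] + list(declared or [])


def message_errors(message_json: dict) -> List[str]:
    # A message without an "errors" attribute still gets ["string"], not None.
    return effective_error_union(message_json.get("errors"))
```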

[jira] [Commented] (AVRO-973) Union behavior not consistent

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204169#comment-13204169
 ] 

Doug Cutting commented on AVRO-973:
---

I'll commit this tomorrow unless there are objections.

 Union behavior not consistent
 -

 Key: AVRO-973
 URL: https://issues.apache.org/jira/browse/AVRO-973
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.6.1, 1.6.2
Reporter: Gaurav Nanda
  Labels: patch
 Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch, 
 test_unions.py

   Original Estimate: 0.25h
  Remaining Estimate: 0.25h

 Python's union does not respect the order in which types are specified.
 For the following schema: 
 {"type": "map", "values": ["int", "long", "float", "double", "string", "boolean"]}, 
 an integer value is written as a double, but it should respect the order in 
 which the types have been specified.
 Fixed code (io.py):
   def write_union(self, writers_schema, datum, encoder):
     """
     A union is encoded by first writing a long value indicating
     the zero-based position within the union of the schema of its value.
     The value is then encoded per the indicated schema within the union.
     """
     # resolve union: pick the FIRST schema that validates the datum
     index_of_schema = -1
     for i, candidate_schema in enumerate(writers_schema.schemas):
       if validate(candidate_schema, datum):
         index_of_schema = i
         break  # XXX: add break statement here XXX
     if index_of_schema < 0:
       raise AvroTypeException(writers_schema, datum)

[jira] [Commented] (AVRO-301) Handle non-reserved properties appropriately in the Python implementation

2012-02-08 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204171#comment-13204171
 ] 

Doug Cutting commented on AVRO-301:
---

I'll commit this tomorrow unless there are objections.

 Handle non-reserved properties appropriately in the Python implementation
 -

 Key: AVRO-301
 URL: https://issues.apache.org/jira/browse/AVRO-301
 Project: Avro
  Issue Type: New Feature
  Components: python
Reporter: Jeff Hammerbacher
Assignee: Marcio Silva
 Attachments: AVRO-301-patch-1.patch, AVRO-301-patch-2.patch




[jira] [Commented] (AVRO-1013) NettyTransceiver can hang after server restart

2012-02-07 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202679#comment-13202679
 ] 

Doug Cutting commented on AVRO-1013:


The new test passes for me even without the changes to NettyTransceiver.java.

 NettyTransceiver can hang after server restart
 --

 Key: AVRO-1013
 URL: https://issues.apache.org/jira/browse/AVRO-1013
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.6.1
Reporter: James Baldassari
Priority: Blocker
 Attachments: AVRO-1013.patch


 I ran into a very specific scenario today which can lead to NettyTransceiver 
 hanging indefinitely:
 # Start up a NettyServer
 # Initialize a NettyTransceiver and SpecificRequestor
 # Execute an RPC to establish the connection/handshake with the server
 # Shut down the server
 # Immediately execute another RPC
 After Step 4, NettyTransceiver will detect that the connection has been 
 closed and call NettyTransceiver#disconnect(boolean, boolean, Throwable), 
 which sets 'remote' to null, indicating to Requestor that the 
 NettyTransceiver is now disconnected.  However, if an RPC is executed just 
 after the server has closed its socket (Step 5) and before disconnect() has 
 been called, NettyTransceiver may still try to send this RPC because 'remote' 
 has not yet been set to null.  This race condition is normally ok because 
 NettyTransceiver#getChannel() will detect that the socket has been closed and 
 then try to reestablish the connection.  Unfortunately, in this scenario 
 getChannel() blocks forever when it attempts to acquire the write lock 
 because the read lock has been acquired twice rather than once as 
 getChannel() expects.  The read lock is acquired once by 
 transceive(List<ByteBuffer>, Callback<List<ByteBuffer>>) and again by 
 writeDataPack(NettyDataPack).
 The fix is fairly simple.  The writeDataPack(NettyDataPack) method (which is 
 private) no longer acquires the read lock but specifies in its contract that 
 the read lock must be acquired before calling this method.  This change 
 prevents the read lock from being acquired more than once by any single thread.  
 Another change is to have NettyTransceiver#isConnected() perform two checks 
 instead of one: remote != null && isChannelReady(channel).  This second 
 change should allow NettyTransceiver to detect disconnect events more quickly.
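The lock problem can be reproduced in a single thread with a Python sketch. Python's standard library has no readers-writer lock, so ReadWriteLock below is a minimal hypothetical one (it does not model Java's ReentrantReadWriteLock exactly), and reconnect stands in for getChannel(): it assumes the caller holds the read lock exactly once, drops it, and then waits for the write lock. A timeout replaces the indefinite hang so the sketch terminates.

```python
import threading


class ReadWriteLock:
    """Minimal, hypothetical readers-writer lock: many readers or one writer.
    Read holds are simply counted, so one thread may hold the read lock twice."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout: float) -> bool:
        """Wait until there are no readers and no writer; False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def reconnect(lock: ReadWriteLock, timeout: float) -> bool:
    """Stand-in for getChannel(): expects the caller to hold ONE read lock."""
    lock.release_read()
    got_write = lock.acquire_write(timeout)
    if got_write:
        # ... re-open the channel here ...
        lock.release_write()
    lock.acquire_read()  # give the caller back its read lock
    return got_write


def send_buggy(lock: ReadWriteLock, timeout: float = 0.1) -> bool:
    lock.acquire_read()  # transceive() takes the read lock
    lock.acquire_read()  # writeDataPack() takes it again
    try:
        return reconnect(lock, timeout)  # our own second hold blocks the writer
    finally:
        lock.release_read()
        lock.release_read()


def send_fixed(lock: ReadWriteLock, timeout: float = 0.1) -> bool:
    lock.acquire_read()  # only transceive() locks; the helper requires the caller to hold it
    try:
        return reconnect(lock, timeout)
    finally:
        lock.release_read()
```

send_buggy times out because the thread's own remaining read hold keeps the reader count above zero; send_fixed obtains the write lock immediately.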

[jira] [Commented] (AVRO-975) Support RPC in C#

2012-02-07 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202978#comment-13202978
 ] 

Doug Cutting commented on AVRO-975:
---

Andrew, is this the complete implementation?  If so, can anyone review it?

The patches don't apply cleanly for me on Linux, perhaps due to EOL differences.

Also, it would be great to be able to add this to the tests in 
share/test/interop/test_rpc_interop.sh.  These use HTTP, though.



 Support RPC in C#
 -

 Key: AVRO-975
 URL: https://issues.apache.org/jira/browse/AVRO-975
 Project: Avro
  Issue Type: New Feature
  Components: csharp
Affects Versions: 1.6.1
Reporter: Jeff Hammerbacher
 Attachments: 975.patch, buildtask.patch




[jira] [Commented] (AVRO-1006) Fingerprints for Avro Schemas

2012-02-07 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203082#comment-13203082
]

Doug Cutting commented on AVRO-1006:

Some notes:

- Primitive types may have attributes, e.g., {"type":"int",
"java-class":"java.lang.Short"}, so only primitives without any attributes may
be represented by their name alone.

- Attributes within JSON objects are not ordered. A correct JSON parser need
not preserve ordering. Relying on order-preservation may require some
implementations to write their own JSON libraries.

- With multiple Avro implementations, the chance of an inconsistent
canonicalization implementation is significant. Creating an adequate test
suite and validating all implementations would require significant effort.
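A hypothetical Python sketch of the canonicalization pitfalls in these notes. Keys are sorted because JSON object order is not significant, a {"type": ...} object collapses to its bare name only when it has no other attributes, and the 64-bit fingerprint is simply the first 8 bytes of a SHA-256 digest. This is an illustration, not the scheme from the attached proposal.

```python
import hashlib
import json
from typing import Any

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def canonicalize(schema: Any) -> Any:
    """Return an equivalent schema with a single, stable representation."""
    if isinstance(schema, list):
        return [canonicalize(s) for s in schema]  # union order is significant
    if isinstance(schema, dict):
        if set(schema) == {"type"} and schema["type"] in PRIMITIVES:
            return schema["type"]  # {"type": "int"} -> "int", only without attributes
        # Default values are data, not schemas, so leave them untouched.
        return {k: (v if k == "default" else canonicalize(v)) for k, v in schema.items()}
    return schema


def fingerprint64(schema: Any) -> int:
    # sort_keys removes any dependence on attribute order in the source JSON.
    text = json.dumps(canonicalize(schema), sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
```

Any second implementation would have to reproduce every one of these choices byte for byte, which is the consistency risk the notes describe.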

Given the above, I'd be hesitant to build a system that depends on consistent
canonical schemas for correct operation. Folks who build systems that use Avro
would thus be wise to design them to gracefully handle inconsistent
canonicalization. For example, Avro's RPC handshake currently uses a
fingerprint-like approach without requiring canonicalization. Two
implementations that represent a schema using the same string will have more
efficient handshakes, but implementations that produce different strings for
equivalent schemas will still interoperate correctly. So a standard,
recommended canonical form could be useful, but folks should perhaps not assume
that every implementation is correct.

I like the idea of a schema repository. A related idea I've had is to use
something like a URL shortener. Instead of mapping url->url, it could map
url->schema. One would register one's schema with the shortener, then hand out
references. A shortener would, as an optimization, return the same ID for
equivalent schemas. The shortener would need to rely on only a single
canonicalization implementation, its own.
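The shortener idea can be sketched as a small registry that hands out IDs. It relies on a single canonicalization, its own (here simply json.dumps with sorted keys, an assumption), so equivalent schemas written with different attribute order get the same ID, while equivalent schemas it fails to recognize merely get a second ID rather than causing an error. SchemaRegistry is hypothetical.

```python
import json
from typing import Any, Dict


class SchemaRegistry:
    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}      # canonical text -> id
        self._schemas: Dict[int, Any] = {}  # id -> schema as registered

    @staticmethod
    def _canonical(schema: Any) -> str:
        return json.dumps(schema, sort_keys=True, separators=(",", ":"))

    def register(self, schema: Any) -> int:
        key = self._canonical(schema)
        if key not in self._ids:  # optimization: reuse IDs for equivalent schemas
            new_id = len(self._ids) + 1
            self._ids[key] = new_id
            self._schemas[new_id] = schema
        return self._ids[key]

    def lookup(self, schema_id: int) -> Any:
        return self._schemas[schema_id]
```

Correctness never depends on canonicalization here: every ID resolves to a usable schema, and a better canonical form only reduces the number of IDs.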

Fingerprints for Avro Schemas
-

Key: AVRO-1006
URL: https://issues.apache.org/jira/browse/AVRO-1006
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Raymie Stata
Assignee: Raymie Stata
Labels: features
Attachments: schema-fingerprinting.html, schema-fingerprinting.html,
schema-fingerprinting.html

Add function that returns a standardized, 64-bit fingerprint for schemas.
Fingerprints are designed such that the chance of collisions is very, very
low.

[jira] [Commented] (AVRO-986) Avro files generated from avro-c dont work with the Java mapred implementation.

2012-01-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192387#comment-13192387
 ] 

Doug Cutting commented on AVRO-986:
---

share/test/data/syncInMeta.avro was just where I placed the quickstop.db file 
already attached to this issue.

 Avro files generated from avro-c dont work with the Java mapred 
 implementation.
 ---

 Key: AVRO-986
 URL: https://issues.apache.org/jira/browse/AVRO-986
 Project: Avro
  Issue Type: Bug
  Components: c, java
 Environment: avro-c 1.6.2-SNAPSHOT
 avro-java 1.6.2-SNAPSHOT
 hadoop 0.20.2
Reporter: Michael Cooper
Priority: Critical
  Labels: c, hadoop, java, mapreduce
 Fix For: 1.6.2

 Attachments: 0001-Remove-sync-marker-from-metadata-in-header.patch, 
 0001-avromod-utility.patch, AVRO-986-java.patch, AVRO-986-java.patch, 
 quickstop.db


 When a file generated from the Avro-C implementation is fed into Hadoop, it 
 will fail with Block size invalid or too large for this implementation: -49.
 This is caused by the sync marker, namely the one that Avro-C puts into the 
 header...
 The org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work 
 out where it should read from, but this class is not particularly smart; it 
 just divides the file into equal-size chunks, the first starting at 
 position 0.
 So org.apache.avro.mapred.AvroRecordReader gets 0 as the start of its chunk, 
 and calls
 {code:title=AvroRecordReader.java}reader.sync(split.getStart());   // sync to 
 start{code}
 Then the org.apache.avro.file.DataFileReader::seek() goes to 0, then searches 
 for a sync marker
 It encounters one at position 32, the one in the header metadata map, 
 avro.sync
 No other implementations add the sync marker in the metadata map, and none 
 read it from there, not even the C version.
 I suggest we remove this from the header as the simplest solution.
 Another solution would be to create an AvroFileSplit class in mapred that 
 knows where the blocks are, and provides the correct locations in the first 
 place.
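The failure mode can be reproduced with bytes alone. In this Python sketch, sync_from mimics seeking to a split's start and scanning forward for the 16-byte sync marker; when a copy of the marker also sits in the header metadata, a split starting at 0 lands inside the header. Passing the header length maps such positions to the first block instead. That is a third option shown only as an assumption, not what either suggestion above or the attached patches do, and the file layout is simplified and hypothetical, not the real container format.

```python
from typing import Optional

SYNC_SIZE = 16  # Avro sync markers are 16 bytes


def sync_from(data: bytes, marker: bytes, position: int,
              header_end: Optional[int] = None) -> Optional[int]:
    """Return the offset where reading should resume, or None if no marker follows.

    Without header_end this just scans forward for the marker, which breaks
    when the marker is also stored in the header metadata. With header_end,
    positions inside the header map to the start of the first block.
    """
    if header_end is not None and position <= header_end:
        return header_end  # the first block starts right after the header
    i = data.find(marker, position)
    return None if i < 0 else i + SYNC_SIZE


# Hypothetical layout: metadata holding a copy of the marker, then the
# header's own trailing marker, then one data block ending in a marker.
marker = bytes(range(16))  # fixed stand-in for a random 16-byte marker
header = b"Obj\x01" + b"avro.sync" + marker + b"avro.codec=null" + marker
data = header + b"BLOCK-1" + marker

naive = sync_from(data, marker, 0)                         # lands in the header
fixed = sync_from(data, marker, 0, header_end=len(header))  # lands on BLOCK-1
```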

[jira] [Commented] (AVRO-981) Python Avro library does not build/install on OS X

2012-01-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192539#comment-13192539
 ] 

Doug Cutting commented on AVRO-981:
---

 snappy and python-snappy are dependencies that are not included in the 
 project, and should be

How do you mean they should be included?  Do you mean we should mention 
them in the top-level BUILD.txt or something else?

Do you still feel we should remove them from setup.py?

Thanks!

 Python Avro library does not build/install on OS X
 --

 Key: AVRO-981
 URL: https://issues.apache.org/jira/browse/AVRO-981
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.5.4, 1.6.1
 Environment: Mac OS X 10.6.8, Python 2.5, 2.6, 2.7
Reporter: Russell Jurney
Priority: Blocker
  Labels: avro, fun, happy, pants, python
 Fix For: 1.5.4, 1.6.1

 Attachments: AVRO-981.patch


 russell-jurneys-macbook-pro:py rjurney$ sudo python2.5 setup.py install
 Password:
 running install
 running bdist_egg
 running egg_info
 writing requirements to avro.egg-info/requires.txt
 writing avro.egg-info/PKG-INFO
 writing top-level names to avro.egg-info/top_level.txt
 writing dependency_links to avro.egg-info/dependency_links.txt
 reading manifest file 'avro.egg-info/SOURCES.txt'
 writing manifest file 'avro.egg-info/SOURCES.txt'
 installing library code to build/bdist.macosx-10.6-i386/egg
 running install_lib
 running build_py
 creating build/bdist.macosx-10.6-i386
 creating build/bdist.macosx-10.6-i386/egg
 creating build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/__init__.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/datafile.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/io.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/ipc.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/protocol.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/schema.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/tool.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/avro/txipc.py -> build/bdist.macosx-10.6-i386/egg/avro
 copying build/lib/pyAntTasks-1.3-LICENSE.txt -> build/bdist.macosx-10.6-i386/egg
 copying build/lib/pyAntTasks-1.3.jar -> build/bdist.macosx-10.6-i386/egg
 creating build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/__init__.py -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/_speedups.c -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/decoder.py -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/encoder.py -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/LICENSE.txt -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/scanner.py -> build/bdist.macosx-10.6-i386/egg/simplejson
 copying build/lib/simplejson/tool.py -> build/bdist.macosx-10.6-i386/egg/simplejson
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/__init__.py to __init__.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/datafile.py to datafile.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/io.py to io.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/ipc.py to ipc.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/protocol.py to protocol.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/schema.py to schema.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/tool.py to tool.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/avro/txipc.py to txipc.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/__init__.py to __init__.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/decoder.py to decoder.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/encoder.py to encoder.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/scanner.py to scanner.pyc
 byte-compiling build/bdist.macosx-10.6-i386/egg/simplejson/tool.py to tool.pyc
 creating build/bdist.macosx-10.6-i386/egg/EGG-INFO
 installing scripts to build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts
 running install_scripts
 running build_scripts
 creating build/scripts-2.5
 copying and adjusting ./scripts/avro -> build/scripts-2.5
 changing mode of build/scripts-2.5/avro from 644 to 755
 creating build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts
 copying build/scripts-2.5/avro -> build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts
 changing mode of build/bdist.macosx-10.6-i386/egg/EGG-INFO/scripts/avro to 755
 copying avro.egg-info/PKG-INFO -> build/bdist.macosx-10.6-i386/egg/EGG-INFO
 copying avro.egg-info/SOURCES.txt -> build/bdist.macosx-10.6-i386/egg/EGG-INFO
 copying

[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2

2012-01-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192567#comment-13192567
 ] 

Doug Cutting commented on AVRO-995:
---

Should we be concerned about updating versions of dependencies for non-bugfix 
reasons in a bugfix release?  It's possible that updating one of our 
dependencies could break a project that depends on Avro.  So it might be safer 
to simply update Jackson in 1.6.2 and save the rest of these for 1.7.0.  Am I 
being too paranoid?

 Java: Update Dependencies for 1.6.2
 ---

 Key: AVRO-995
 URL: https://issues.apache.org/jira/browse/AVRO-995
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.6.2

 Attachments: AVRO-995.patch


 A few of our dependencies need upgrading.  In particular, I have been hit by 
 a bug in Jackson that is fixed by the latest release 
 (http://jira.codehaus.org/browse/JACKSON-462).
 Summary:  I will submit a patch that updates everything to the next bugfix or 
 very minor release, other than paranamer, thrift, and hadoop.
 Details:
 (using maven versions plugin)
 On the dependency side:
 [INFO]   com.thoughtworks.paranamer:paranamer ... 2.3 -> 2.4-debug
   could not find info about what is new in 2.4.  I do not think we should 
 upgrade until we have more info
 [INFO]   net.sf.jopt-simple:jopt-simple ... 4.1 -> 4.3
   minor extra features (http://pholser.github.com/jopt-simple/changes.html)
 [INFO]   org.apache.hadoop:hadoop-core ... 0.20.205.0 -> 1.0.0
   renamed 0.20.205, no need to update yet.
 [INFO]   org.codehaus.jackson:jackson-mapper-asl ... 1.8.6 -> 1.9.3
   I suggest we upgrade to 1.8.7.
 [INFO]   org.jboss.netty:netty ... 3.2.6.Final -> 3.2.7.Final
   bugfix release
 [INFO]   org.apache.thrift:libthrift ... 0.7.0 -> 0.8.0
   Is this a minor / bugfix release?  If so we should update, otherwise wait 
 until Avro 1.7.x
 [INFO]   org.slf4j:slf4j-api ... 1.6.3 -> 1.6.4
 [INFO]   org.slf4j:slf4j-simple ... 1.6.3 -> 1.6.4
   Minor bugfixes (http://www.slf4j.org/news.html)
 On the plugin side: (mvn versions:display-plugin-updates)
 [INFO]   maven-antrun-plugin ... 1.6 -> 1.7
   minor, looks safe: 
 (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWke1=w6nv2u85nkgqm0zxo3khyzdc8hazkhhvywjbuv...@mail.gmail.com%3E)
 [INFO]   maven-gpg-plugin ... 1.3 -> 1.4
   minor update 
 (http://mail-archives.apache.org/mod_mbox/maven-announce/201108.mbox/%3CCA+nPnMw_3zQQCpzybQvo-QZFMCogvH31WEhxQnZ=cdzgxsr...@mail.gmail.com%3E)
 [INFO]   maven-checkstyle-plugin ... 2.6 -> 2.8
   we avoided 2.7 before for some reason: 
 (http://mail-archives.apache.org/mod_mbox/maven-dev/201108.mbox/%3ccapoybqsvu+kup5vuce8rc6mjb9rykr2cpig+rvbe5o8teo6...@mail.gmail.com%3E)
   useful new feature: 
 (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C15365449.01320142181746.JavaMail.mark@MARK%3E)
 [INFO]   maven-surefire-plugin ... 2.10 -> 2.11
   lots of bug fixes and long-requested new features: 
 (http://mail-archives.apache.org/mod_mbox/maven-announce/201112.mbox/%3CCA+jQputH_uA2Ue6JqiHp1YeNo=qqxgcpdtgq9vv1aw_psqk...@mail.gmail.com%3E)
 [INFO]   maven-shade-plugin ... 1.4 -> 1.5
   minor, looks safe: 
 (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3C1076639049.01320107865464.JavaMail.benson@tinfoilhat.local%3E)
 [INFO]   maven-archetype-plugin ... 2.1 -> 2.2
 (http://mail-archives.apache.org/mod_mbox/maven-announce/20.mbox/%3CCALhtWkeLyc-tA2NCh3xYR06W+eGiWd46fS=R=ngjon14zrd...@mail.gmail.com%3E)
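The bugfix-release question could be checked mechanically. A minimal sketch, assuming versions follow major.minor.patch ordering, which not every project above does (hadoop-core 0.20.205 was renamed 1.0.0); qualifiers such as -debug or .Final are ignored. UpgradeKind is a hypothetical name, not part of Maven or its versions plugin.

```java
// Hypothetical helper: classifies a dependency upgrade by comparing the first
// three numeric version components; non-numeric qualifiers stop parsing.
final class UpgradeKind {
  enum Kind { BUGFIX, MINOR, MAJOR }

  static int[] parse(String version) {
    int[] nums = new int[3]; // missing components default to 0
    int n = 0;
    for (String part : version.split("[.-]")) {
      if (n == 3 || !part.matches("\\d+")) {
        break; // stop at a qualifier such as "debug" or "Final"
      }
      nums[n++] = Integer.parseInt(part);
    }
    return nums;
  }

  static Kind classify(String from, String to) {
    int[] a = parse(from);
    int[] b = parse(to);
    if (a[0] != b[0]) return Kind.MAJOR;
    if (a[1] != b[1]) return Kind.MINOR;
    return Kind.BUGFIX;
  }
}
```

For example, jackson-mapper-asl 1.8.6 -> 1.8.7 classifies as BUGFIX, while 1.8.6 -> 1.9.3 is MINOR, which matches the suggestion to take only 1.8.7 in a bugfix release.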

[jira] [Commented] (AVRO-989) Java: Improve Builder performance in Specific API

2012-01-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192608#comment-13192608
 ] 

Doug Cutting commented on AVRO-989:
---

+1 for reducing the coupling.  It's probably worthy of a separate issue, which 
would likely subsume this one.

In addition to making record fields private we might also change the default 
representation for strings to String (AVRO-803).

 Java:  Improve Builder performance in Specific API
 --

 Key: AVRO-989
 URL: https://issues.apache.org/jira/browse/AVRO-989
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
 Attachments: AVRO-989-v2.patch, AVRO-989.patch


 The Specific API generates Builder objects for each record.  This builder 
 uses a boolean[] to store flags for each field to indicate whether the field 
 is set or not.
 This is not space-efficient: a boolean[] takes 16 bytes plus one byte per 
 field, rounded up to the nearest 8-byte boundary. 
 This can be improved on by using a BitSet for large records, and bitmasks on an 
 int for records with fewer than 32 fields.
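A minimal sketch of the proposed flag storage, assuming fields are addressed by index. FieldFlags is a hypothetical class, not generated Avro code. Since an int has 32 bits it can hold flags for up to 32 fields; larger records fall back to java.util.BitSet.

```java
import java.util.BitSet;

// Hypothetical sketch (not Avro's generated Builder): tracks which fields have
// been set, using an int bitmask for small records and a BitSet otherwise.
final class FieldFlags {
  private final int numFields;
  private int mask;          // used when numFields <= 32
  private final BitSet bits; // used for larger records, otherwise null

  FieldFlags(int numFields) {
    this.numFields = numFields;
    this.bits = numFields > 32 ? new BitSet(numFields) : null;
  }

  void markSet(int field) {
    check(field);
    if (bits == null) mask |= 1 << field; else bits.set(field);
  }

  boolean isSet(int field) {
    check(field);
    return bits == null ? (mask & (1 << field)) != 0 : bits.get(field);
  }

  void clear(int field) {
    check(field);
    if (bits == null) mask &= ~(1 << field); else bits.clear(field);
  }

  private void check(int field) {
    if (field < 0 || field >= numFields) {
      throw new IndexOutOfBoundsException("field " + field);
    }
  }
}
```

The int path costs nothing beyond the builder's own object header, while the BitSet path still needs only one bit per field instead of one byte.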

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

2012-01-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192672#comment-13192672
 ] 

Doug Cutting commented on AVRO-997:
---

I think in current trunk you'd get "Unknown datum type: M", which is somewhat 
more informative than the 1.5.1 message.  Maybe this could be improved to 
something like "Unknown datum for GenericData: M" to hint that the specific, 
reflect, thrift, protobuf or some other data model might be better?

 Union of enum and null cannot be serialized
 ---

 Key: AVRO-997
 URL: https://issues.apache.org/jira/browse/AVRO-997
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.5.1
Reporter: Aaron Kimball

 I have a schema like:
 {code}
 [
 {
   "type": "enum",
   "name": "Gender",
   "symbols": ["M", "F"]
 },
 {
   "type" : "record",
   "name" : "Foo",
   "fields" : [
     { "type" : ["Gender", "null"], "name" : "gender" },
 ...
   ]
 }
 ]
 {code}
 I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
 When I go to serialize this, I get:
 {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
   at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
 {code}

[jira] [Commented] (AVRO-995) Java: Update Dependencies for 1.6.2

2012-01-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191624#comment-13191624
 ] 

Doug Cutting commented on AVRO-995:
---

Should we include this in 1.6.2 or not?

[jira] [Commented] (AVRO-968) Avro C - avro_value_cmp_fast() may return garbage value for AVRO_STRING comparison

2012-01-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191626#comment-13191626
 ] 

Doug Cutting commented on AVRO-968:
---

This issue can be resolved as fixed, right?

 Avro C - avro_value_cmp_fast() may return garbage value for AVRO_STRING 
 comparison
 --

 Key: AVRO-968
 URL: https://issues.apache.org/jira/browse/AVRO-968
 Project: Avro
  Issue Type: Bug
  Components: c
Affects Versions: 1.6.1, 1.6.2, 1.7.0
 Environment: All. Currently using gcc 4.6.1 on Ubuntu 11.10.
Reporter: Vivek Nadkarni
Priority: Minor
 Fix For: 1.6.2, 1.7.0

 Attachments: 
 0001-AVRO-968.-C-Fixed-avro_value_cmp-on-string-values.patch, AVRO-968.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 The compiler warns that variables may be used uninitialized in 
 avro_value_cmp_fast():
 /home/user/avro-trunk/lang/c/src/value.c: In function 'avro_value_cmp_fast':
 /home/user/avro-trunk/lang/c/src/value.c:387:13: warning: 'size2' may be used uninitialized in this function [-Wuninitialized]
 /home/user/avro-trunk/lang/c/src/value.c:387:13: warning: 'size1' may be used uninitialized in this function [-Wuninitialized]
 /home/user/avro-trunk/lang/c/src/value.c:388:11: warning: 'buf1' may be used uninitialized in this function [-Wuninitialized]
 /home/user/avro-trunk/lang/c/src/value.c:388:11: warning: 'buf2' may be used uninitialized in this function [-Wuninitialized]
 Examining the file shows that the warnings are real, and the variables size1, 
 buf1, size2, buf2 should be loaded before they are used. The simple fix is to 
 copy matching code from the function avro_value_equal_fast(). I will attach 
 that code in an upcoming patch. 
 Cheers,
 Vivek
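For comparison, a Java sketch of the same string comparison. Java's definite-assignment rules would reject reading size1 or buf1 before assigning them, so this class of bug is a compile error there. The sketch loads both buffers and sizes first, then compares bytes as unsigned values and finally by length, which matches the Avro spec's ordering for strings (UTF-8 byte order equals code-point order). Utf8Compare is a hypothetical name; that the C function follows exactly these steps is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: compares two strings by their UTF-8 bytes.
final class Utf8Compare {
  static int compare(String s1, String s2) {
    // Load both buffers and sizes before any comparison (the step the C bug skipped).
    byte[] buf1 = s1.getBytes(StandardCharsets.UTF_8);
    byte[] buf2 = s2.getBytes(StandardCharsets.UTF_8);
    int size1 = buf1.length;
    int size2 = buf2.length;
    int common = Math.min(size1, size2);
    for (int i = 0; i < common; i++) {
      int b1 = buf1[i] & 0xff; // treat bytes as unsigned
      int b2 = buf2[i] & 0xff;
      if (b1 != b2) {
        return b1 < b2 ? -1 : 1;
      }
    }
    return Integer.compare(size1, size2); // a proper prefix sorts first
  }
}
```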

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

2012-01-19 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189289#comment-13189289
 ] 

Doug Cutting commented on AVRO-997:
---

SpecificData#isEnum() overrides GenericData#isEnum() and returns true for 
instances of Enum.  So this should work.  Also, 
TestSpecificDatumWriter#testResolveUnion writes and reads an instance of a 
record with a field whose type is a union of null and an enum using 
generated, specific code.  So this appears to be tested.

Aaron, can you provide a simple, complete test that fails?  Also, what version 
of Avro are you using?
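A heavily simplified model of why the same enum datum can resolve under one data model and fail under another. GenericModel, SpecificModel, and the string-named union branches are hypothetical stand-ins, not Avro's classes or signatures; the only idea borrowed from the comment above is that the specific model overrides isEnum() to also accept java.lang.Enum instances.

```java
import java.util.List;

// Hypothetical, simplified model of union resolution; not Avro's GenericData.
class GenericModel {
  /** Stand-in for a generic enum symbol. */
  static final class EnumSymbol {
    final String symbol;
    EnumSymbol(String symbol) { this.symbol = symbol; }
  }

  boolean isEnum(Object datum) {
    return datum instanceof EnumSymbol;
  }

  /** Returns the index of the first branch ("enum" or "null") the datum matches. */
  int resolveUnion(List<String> branches, Object datum) {
    for (int i = 0; i < branches.size(); i++) {
      String branch = branches.get(i);
      if (branch.equals("null") && datum == null) return i;
      if (branch.equals("enum") && isEnum(datum)) return i;
    }
    throw new IllegalArgumentException("Not in union " + branches + ": " + datum);
  }
}

// Hypothetical specific model: generated code represents enums as java.lang.Enum.
class SpecificModel extends GenericModel {
  @Override
  boolean isEnum(Object datum) {
    return datum instanceof Enum || super.isEnum(datum);
  }
}

enum Gender { M, F }
```

Resolving Gender.M against the union succeeds with SpecificModel but throws a "Not in union" error with GenericModel, which mirrors the reported trace coming from GenericData.resolveUnion under GenericDatumWriter.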

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

2012-01-19 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189446#comment-13189446
 ] 

Doug Cutting commented on AVRO-997:
---

SpecificDatumWriter can write both generic and specific instances, while 
GenericDatumWriter is only intended to correctly write generic instances.

[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)

2012-01-18 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188830#comment-13188830
 ] 

Doug Cutting commented on AVRO-991:
---

 +1 for user-specified sync markers.

That should probably be a separate issue from the appended-stream tool.
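A minimal sketch of what a user-specified sync marker option might look like: by default each new file gets a fresh random 16-byte marker, and a caller may pass a fixed marker instead. SyncMarkers and chooseMarker are hypothetical names; SecureRandom is a stand-in, and Avro's actual DataFileWriter may generate its marker differently.

```java
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper: picks the 16-byte sync marker for a new container file.
final class SyncMarkers {
  static final int SYNC_SIZE = 16;
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Default behavior in this sketch: a fresh random marker per file. */
  static byte[] randomMarker() {
    byte[] sync = new byte[SYNC_SIZE];
    RANDOM.nextBytes(sync);
    return sync;
  }

  /** Uses the caller's marker if given (copied defensively), else a random one. */
  static byte[] chooseMarker(byte[] userSpecified) {
    if (userSpecified == null) {
      return randomMarker();
    }
    if (userSpecified.length != SYNC_SIZE) {
      throw new IllegalArgumentException("sync marker must be " + SYNC_SIZE + " bytes");
    }
    return Arrays.copyOf(userSpecified, SYNC_SIZE);
  }
}
```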

 Allow combining multiple Avro files within a stream. (no files on disk)
 ---

 Key: AVRO-991
 URL: https://issues.apache.org/jira/browse/AVRO-991
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.1
Reporter: Frank Grimes

 It would be nice to be able to do as follows:
   cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
 or similarly
   hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
 See the following thread for details: 
 http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%3cc08f1de9-97a8-4d28-b0ad-5e4a7f32f...@gmail.com%3e

[jira] [Commented] (AVRO-994) TestFileSpanStorage.testTonsOfSpans() fails on my slow VM

2012-01-16 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187219#comment-13187219
 ] 

Doug Cutting commented on AVRO-994:
---

+1 That's a better way to fix such things, rather than just increasing the 
sleep time as we have before.  Thanks!

 TestFileSpanStorage.testTonsOfSpans() fails on my slow VM
 -

 Key: AVRO-994
 URL: https://issues.apache.org/jira/browse/AVRO-994
 Project: Avro
  Issue Type: Bug
Reporter: James Baldassari
Priority: Minor
 Attachments: AVRO-994.patch


 {noformat}
 Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.554 sec <<< FAILURE!
 testTonsOfSpans(org.apache.avro.ipc.trace.TestFileSpanStorage)  Time elapsed: 3.853 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<5> but was:<42356>
 at org.junit.Assert.fail(Assert.java:93)
 at org.junit.Assert.failNotEquals(Assert.java:647)
 at org.junit.Assert.assertEquals(Assert.java:128)
 at org.junit.Assert.assertEquals(Assert.java:472)
 at org.junit.Assert.assertEquals(Assert.java:456)
 at org.apache.avro.ipc.trace.TestFileSpanStorage.testTonsOfSpans(TestFileSpanStorage.java:70)
 {noformat}
 The issue seems to be the {{Thread.sleep(2000)}} on line 66.  Doubling this 
 to 4000ms causes the test to pass.  In general it might be better to make 
 this sleep event-based rather than using a fixed sleep time.  If that isn't 
 possible, then maybe using some sort of a retry loop would work.
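A minimal sketch of the retry-loop alternative, assuming the test can observe the condition it is waiting for (for example, the number of stored spans). WaitFor is a hypothetical helper, not part of JUnit or Avro. If the code under test can signal completion, a java.util.concurrent.CountDownLatch with await(timeout, unit) would be the event-based variant.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: polls a condition until it holds or a timeout
// expires, instead of sleeping for a fixed time and hoping it was long enough.
final class WaitFor {
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller's assertion reports the failure
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return condition.getAsBoolean();
      }
    }
  }
}
```

In the failing test, the fixed Thread.sleep(2000) would become a call like waitFor(condition, 10000, 50), where the condition checks the expected span count. The timeout only bounds the worst case, so a fast machine returns as soon as the condition holds and a slow VM gets more time.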

[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)

2012-01-16 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187227#comment-13187227
 ] 

Doug Cutting commented on AVRO-991:
---

For the record, the thinking behind the varied sync marker is that it makes 
collisions less likely.  In theory this is not true, but in practice my concern 
was that, once a value was fixed and known, there'd be a significantly higher 
probability that someone would include it in some data.  Perhaps that's not 
correct, though.

As for expanding the spec, as I mentioned above, we can do that at present, 
since the file's magic number can never be the start of a valid block.  So if a 
block ever starts with the magic number then a reader could assume that it's an 
appended file.  It's perhaps not the way one would design an appendable format 
from scratch, but I think it's workable.

[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)

2012-01-16 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187346#comment-13187346
 ] 

Doug Cutting commented on AVRO-991:
---

On second thought, I don't think we ought to add this to the spec.  I think a 
tool that can read appended streams and write a single file would be useful, 
but I don't think we should require every implementation to be able to parse 
appended files.  That would be an incompatible change, and, as Scott points 
out, would also create files that are difficult to split.

I also think Scott's idea of permitting user-specified sync markers could be 
useful.

[jira] [Commented] (AVRO-991) Allow combining multiple Avro files within a stream. (no files on disk)

2012-01-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185746#comment-13185746
 ] 

Doug Cutting commented on AVRO-991:
---

I think this would work.  We'd need to be able to distinguish the start of a 
block from the start of the next file.  A block starts with the count of items 
in it, encoded as a variable-length zig-zag-encoded long.  A file starts with 
ASCII 'O'.  Interpreted as a variable-length zig-zag encoded long, this is -40, 
which is an invalid item count.  So a DataFileStream would need to, when the 
item count is -40, try to read a file header, and if its schema is compatible, 
update its sync and codec and keep reading.
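A sketch of the check described above. It decodes a block's item count as a zig-zag varint long and tests whether the bytes at that position are the container-file magic ('O', 'b', 'j', 1). AppendedHeaderCheck is a hypothetical name, and a real reader would still need to parse the new header and check schema compatibility, as the comment says.

```java
// Hypothetical sketch: detects the start of an appended file where a block
// (which begins with a zig-zag varint item count) was expected.
final class AppendedHeaderCheck {
  static final byte[] MAGIC = {'O', 'b', 'j', 1};

  /** Decodes a zig-zag varint long starting at buf[pos] (no overflow checks). */
  static long readZigZagLong(byte[] buf, int pos) {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      b = buf[pos++] & 0xff;
      raw |= (long) (b & 0x7f) << shift; // low 7 bits carry data
      shift += 7;
    } while ((b & 0x80) != 0);          // high bit means "more bytes follow"
    return (raw >>> 1) ^ -(raw & 1);    // undo the zig-zag mapping
  }

  /** True if the bytes at pos equal the container-file magic. */
  static boolean startsAppendedFile(byte[] buf, int pos) {
    if (pos + MAGIC.length > buf.length) {
      return false;
    }
    for (int i = 0; i < MAGIC.length; i++) {
      if (buf[pos + i] != MAGIC[i]) {
        return false;
      }
    }
    return true;
  }
}
```

'O' is 0x4F (79), a single varint byte since its high bit is clear; zig-zag decoding gives (79 >>> 1) ^ -(79 & 1) = 39 ^ -1 = -40, the invalid item count mentioned above.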


[jira] [Commented] (AVRO-839) Implement builder pattern in generated record classes that sets default values when omitted

2012-01-10 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13183595#comment-13183595
 ] 

Doug Cutting commented on AVRO-839:
---

 we could add the methods back as deprecated, with a default implementation 
 and some javadoc so that other folks won't be confused by the change and code 
 with or without an override on these will still work. Then in 1.7.x we can 
 drop it and there will be a better paper trail for others to follow on where 
 the new non-deprecated versions are.

+1  We should adopt this as a standard process for incompatible changes to 
public APIs.
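A minimal sketch of that process, with hypothetical class and method names: the old method survives one release marked @Deprecated, its default implementation delegates to the replacement, and the javadoc points readers at the new name. Subclasses that still override the old method keep compiling (with a deprecation warning).

```java
// Hypothetical example class; names are illustrative only.
class RecordUtil {
  /**
   * @deprecated use {@link #newName(String)} instead; planned for removal in
   *     the next major release (1.7.x per the proposal above).
   */
  @Deprecated
  public String oldName(String s) {
    return newName(s); // default implementation delegates to the replacement
  }

  /** The replacement method. */
  public String newName(String s) {
    return s.trim();
  }
}
```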


 Implement builder pattern in generated record classes that sets default 
 values when omitted
 ---

 Key: AVRO-839
 URL: https://issues.apache.org/jira/browse/AVRO-839
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: James Baldassari
Assignee: James Baldassari
 Fix For: 1.6.0

 Attachments: AVRO-839-v2.patch, AVRO-839-v3.patch, AVRO-839-v4.patch, 
 AVRO-839-v4.patch, AVRO-839-v5.patch, AVRO-839.patch, AVRO-839.patch, 
 AVRO-839.patch


 This is an idea for an improvement to the SpecificCompiler-generated record 
 classes.  There are two main issues to address:
 # Default values specified in schemas are only used at read time, not when 
 writing/serializing records.  For example, a NullPointerException is thrown 
 when attempting to write a record that has an uninitialized array or string 
 type.  I'm sure this was done for good reasons, like giving users maximum 
 control and preventing unnecessary garbage collection, but I think it's also 
 somewhat confusing and unintuitive for new users (myself included).
 # Users have to create their own factory classes/methods for every record 
 type, both to ensure that all non-primitive members are initialized and to 
 facilitate the construction and initialization of record instances (i.e. 
 constructing and setting values in a single statement).
 These issues have been discussed previously here:
 * [http://search-hadoop.com/m/iDVTn1JVeSR1]
 * AVRO-726
 * AVRO-770
 * [http://search-hadoop.com/m/JuY1V16pwxh1]
 I'd like to propose a solution that is used by at least one other messaging 
 framework.  For each generated record class there will be a public static 
 inner class called Builder.  The Builder inner class has the same fields as 
 the record class, as well as accessors and mutators for each of these fields. 
  Whenever a mutator method is called, the Builder sets a boolean flag 
 indicating that the field has been set.  All mutators return a reference to 
 'this', so it's possible to chain a series of setter invocations, which makes 
 it really easy to construct records in a single statement.  The Builder also 
 has a build() method which constructs a record instance using the values that 
 were set in the Builder.  When the build() method is invoked, if there are 
 any fields that have not been set but have default values as defined in the 
 schema, the Builder will set the values of these fields using their defaults.
 One nice thing about implementing the builder pattern in a static inner 
 Builder class rather than in the record itself is that this enhancement will 
 be completely backwards-compatible with existing code.  The record class 
 itself would not change, and the public fields would still be there, so 
 existing code would still work.  Users would have the option to use the 
 Builder or continue constructing records manually.  Eventually the public 
 fields could be phased out, and the record would be made immutable.  All 
 changes would have to be done through the Builder.
 Here is an example of what this might look like:
 {code}
 // Person.newBuilder() returns a new Person.Builder instance.
 // All Person.Builder setters return 'this', allowing us to chain set calls
 // together for convenience.
 // Person.Builder.build() returns a Person instance after setting any
 // uninitialized values that have defaults.
 Person me =
     Person.newBuilder().setName("James").setCountry("US").setState("MA").build();
 // We still have direct access to Person's members, so the records are
 // backwards-compatible.
 me.state = "CA";
 // Person has accessor methods now so that the public fields can be phased
 // out later.
 System.out.println(me.getState());
 // No NPE here because the array<Person> field that stores this person's
 // friends has been automatically initialized by the Builder to a new
 // java.util.ArrayList<Person> due to a @java_class annotation in the IDL.
 System.out.println(me.getFriends().size());
 {code}
 What do people think about this approach?  Any other ideas?
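The following hand-written sketch shows what a Builder along these lines might look like. `Person`, its fields, and the assumed schema defaults are hypothetical: `country` defaults to "US", `friends` defaults to an empty array, and `name` and `state` have no default. Real SpecificCompiler output would differ. Each setter records a per-field flag. `build()` then fills any unset field from its default, or fails if the field has none.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written record; real SpecificCompiler output would differ. */
final class Person {
  public String name;
  public String country;
  public String state;
  public List<Person> friends;

  public String getState() { return state; }
  public List<Person> getFriends() { return friends; }

  public static Builder newBuilder() { return new Builder(); }

  public static final class Builder {
    private String name;
    private String country;
    private String state;
    private List<Person> friends;
    // Records which fields the caller set explicitly.
    private final boolean[] fieldSetFlags = new boolean[4];

    // Every setter marks its field as set and returns this, so calls can be chained.
    public Builder setName(String v) { name = v; fieldSetFlags[0] = true; return this; }
    public Builder setCountry(String v) { country = v; fieldSetFlags[1] = true; return this; }
    public Builder setState(String v) { state = v; fieldSetFlags[2] = true; return this; }
    public Builder setFriends(List<Person> v) { friends = v; fieldSetFlags[3] = true; return this; }

    /** Fills unset fields from the (assumed) schema defaults, then builds the record. */
    public Person build() {
      Person p = new Person();
      p.name = orDefault(fieldSetFlags[0], name, null, "name");          // no default
      p.country = orDefault(fieldSetFlags[1], country, "US", "country"); // assumed default "US"
      p.state = orDefault(fieldSetFlags[2], state, null, "state");       // no default
      p.friends = orDefault(fieldSetFlags[3], friends, new ArrayList<Person>(), "friends");
      return p;
    }

    // In this sketch a null defaultValue means "the schema declares no default".
    private static <T> T orDefault(boolean set, T value, T defaultValue, String field) {
      if (set) return value;
      if (defaultValue == null) {
        throw new IllegalStateException("Field " + field + " is not set and has no default");
      }
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    Person me = Person.newBuilder().setName("James").setState("MA").build();
    System.out.println(me.country);             // US, filled from the default
    System.out.println(me.getFriends().size()); // 0, no NullPointerException
  }
}
```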

[jira] [Commented] (AVRO-986) Avro files generated from avro-c dont work with the Java mapred implementation.

2011-12-22 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174936#comment-13174936
 ] 

Doug Cutting commented on AVRO-986:
---

+1 This patch sounds like the right way to fix this to me.

If we were to fix this in Java instead, then I don't think we should try to make 
the splitter smarter, since splitting is single-threaded and that's not 
scalable.  Rather, we should make sync(0) skip over the metadata.  But there 
probably shouldn't be any sync markers in the metadata anyway...

 Avro files generated from avro-c dont work with the Java mapred 
 implementation.
 ---

 Key: AVRO-986
 URL: https://issues.apache.org/jira/browse/AVRO-986
 Project: Avro
  Issue Type: Bug
  Components: c, java
 Environment: avro-c 1.6.2-SNAPSHOT
 avro-java 1.6.2-SNAPSHOT
 hadoop 0.20.2
Reporter: Michael Cooper
Priority: Critical
  Labels: c, hadoop, java, mapreduce
 Attachments: 0001-Remove-sync-marker-from-metadata-in-header.patch


 When a file generated by the Avro-C implementation is fed into Hadoop, the job 
 fails with "Block size invalid or too large for this implementation: -49".
 This is caused by the sync marker, namely the copy that Avro-C puts into the 
 header.
 The org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work 
 out where it should read from, but this class is not particularly smart: it 
 just divides the file into equal-size chunks, the first starting at 
 position 0.
 So org.apache.avro.mapred.AvroRecordReader gets 0 as the start of its chunk, 
 and calls:
 {code:title=AvroRecordReader.java}
 reader.sync(split.getStart());   // sync to start
 {code}
 Then org.apache.avro.file.DataFileReader::seek() goes to 0 and searches 
 for a sync marker. It encounters one at position 32: the copy in the header 
 metadata map, stored under "avro.sync".
 No other implementation adds the sync marker to the metadata map, and none 
 reads it from there, not even the C version.
 I suggest we remove it from the header, as the simplest solution.
 Another solution would be to create an AvroFileSplit class in mapred that 
 knows where the blocks are and provides the correct locations in the first 
 place.
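The failure mode can be reproduced in miniature. The sketch below is a hypothetical illustration, not Avro's `DataFileReader`. It builds a toy file whose header metadata contains a copy of the 16-byte sync marker. A scan starting at offset 0 stops inside the metadata. A scan that never starts before the header's own trailing sync marker lands at the first block instead, which is one way to make `sync(0)` skip over the metadata. The offsets belong to the toy layout, not to the position 32 in the report.

```java
import java.util.Arrays;

/** Hypothetical illustration of sync-marker scanning; this is not Avro's DataFileReader. */
final class SyncScan {
  static final byte[] MARKER = new byte[16];
  static { Arrays.fill(MARKER, (byte) 0xAB); }

  /** Returns the offset just past the first marker found at or after start, or -1 if none. */
  static int syncFrom(byte[] file, byte[] marker, int start) {
    for (int i = start; i + marker.length <= file.length; i++) {
      if (Arrays.equals(Arrays.copyOfRange(file, i, i + marker.length), marker)) {
        return i + marker.length;
      }
    }
    return -1;
  }

  /** Alternative: never start scanning before the header's own trailing sync marker. */
  static int syncSkippingHeader(byte[] file, byte[] marker, int start, int headerSyncOffset) {
    return syncFrom(file, marker, Math.max(start, headerSyncOffset));
  }

  /** Toy layout: magic, metadata holding a copy of the marker, header sync, one block. */
  static byte[] demoFile() {
    byte[] file = new byte[60];
    file[0] = 'O'; file[1] = 'b'; file[2] = 'j'; file[3] = 1; // bytes 0-3: magic
    System.arraycopy(MARKER, 0, file, 14, 16);                // bytes 14-29: "avro.sync" metadata value
    System.arraycopy(MARKER, 0, file, 36, 16);                // bytes 36-51: header's sync marker
    Arrays.fill(file, 52, 60, (byte) 1);                      // bytes 52-59: first data block
    return file;
  }

  public static void main(String[] args) {
    byte[] file = demoFile();
    System.out.println(syncFrom(file, MARKER, 0));               // 30: lands inside the metadata
    System.out.println(syncSkippingHeader(file, MARKER, 0, 36)); // 52: start of the first block
  }
}
```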

[jira] [Commented] (AVRO-570) python implementation of mapreduce connector

2011-12-20 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173450#comment-13173450
 ] 

Doug Cutting commented on AVRO-570:
---

I don't seem to have avro installed in /usr/lib/python.  The tests you describe 
above find the version in my build directory, as expected.  The scripts in /tmp 
look fine.  Yet I still see:

{code}
  [py-test] 
./home/cutting/src/avro/trunk/lang/py/build/src/avro/tether/tether_task_runner.py:24:
 RuntimeWarning: Parent module 'avro.tether' not found while handling absolute 
import
  [py-test]   from avro import tether
{code}

 python implementation of mapreduce connector
 

 Key: AVRO-570
 URL: https://issues.apache.org/jira/browse/AVRO-570
 Project: Avro
  Issue Type: New Feature
  Components: python
Affects Versions: 1.6.0
Reporter: Doug Cutting
Assignee: Jeremy Lewi
Priority: Critical
  Labels: hadoop
 Fix For: 1.7.0

 Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, 
 AVRO-570.patch, AVRO-570.patch, AVRO-570.patch


 AVRO-512 defines protocols for implementing mapreduce tasks.  It would be 
 good to have a Python implementation of this.

[jira] [Commented] (AVRO-982) NettyTransceiver: can hang on connection interruption

2011-12-15 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170626#comment-13170626
 ] 

Doug Cutting commented on AVRO-982:
---

It would be great to have a test that this fixes.  I tried some simple changes 
to TestNettyServerWithCallbacks to reproduce the problem and could not.  Can 
you devise a test?

 NettyTransceiver: can hang on connection interruption
 -

 Key: AVRO-982
 URL: https://issues.apache.org/jira/browse/AVRO-982
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Bruno Dumon
Priority: Minor
 Attachments: AVRO-982.patch


 When stopping my avro server, I noticed that my avro client was hanging. This 
 makes it impossible for my client to retry the operation, as it hangs inside 
 the avro code:
 {noformat}
 "pool-2-thread-1" prio=10 tid=0x7fc66840e800 nid=0x75fc waiting on condition [0x7fc674176000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x0007d7471bd0> (a java.util.concurrent.CountDownLatch$Sync)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
     at org.apache.avro.ipc.CallFuture.get(CallFuture.java:116)
     at org.apache.avro.ipc.Requestor.request(Requestor.java:106)
     at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
 {noformat}
 In a similar situation elsewhere in the NettyTransceiver (method 
 exceptionCaught), the pending requests are canceled. It seems appropriate to 
 do that also on closed connections. I'll attach a patch.
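The fix described above can be sketched as follows: when the connection closes, fail every outstanding request, just as exceptionCaught already does. This is a hypothetical, self-contained illustration. `CompletableFuture` stands in for Avro's `CallFuture`, and the class and method names are made up rather than taken from `NettyTransceiver`.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical request bookkeeping; CompletableFuture stands in for Avro's CallFuture. */
final class PendingRequests {
  private final Map<Integer, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();
  private final AtomicInteger nextSerial = new AtomicInteger();

  /** Registers an outgoing call; the caller blocks on the returned future. */
  CompletableFuture<byte[]> send() {
    CompletableFuture<byte[]> future = new CompletableFuture<>();
    pending.put(nextSerial.getAndIncrement(), future);
    return future;
  }

  /** Completes the matching call when its response arrives. */
  void onResponse(int serial, byte[] body) {
    CompletableFuture<byte[]> future = pending.remove(serial);
    if (future != null) {
      future.complete(body);
    }
  }

  /** On close (as on exceptionCaught), fail every pending call so no thread waits forever. */
  void onConnectionClosed() {
    IOException cause = new IOException("Connection closed");
    for (Integer serial : pending.keySet()) {
      CompletableFuture<byte[]> future = pending.remove(serial);
      if (future != null) {
        future.completeExceptionally(cause);
      }
    }
  }

  int pendingCount() { return pending.size(); }

  public static void main(String[] args) throws InterruptedException {
    PendingRequests requests = new PendingRequests();
    CompletableFuture<byte[]> call = requests.send();
    requests.onConnectionClosed();
    try {
      call.get(); // returns at once with an exception instead of hanging
    } catch (ExecutionException e) {
      System.out.println("call failed: " + e.getCause().getMessage());
    }
  }
}
```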

[jira] [Commented] (AVRO-724) C implementation does not write datum values that are larger than the memory write buffer (currently 16K)

2011-12-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168536#comment-13168536
 ] 

Doug Cutting commented on AVRO-724:
---

 The ideal solution would be to have fixed length block header fields but that 
 would require a change to the spec.

This makes sense.  To do this we'd probably want to increment the file format's 
magic number, i.e., from {'O','b','j',1} to {'O','b','j',2}.  And it would be 
best to update all implementations to read the new format before making it the 
default for any implementation.
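Here is a minimal sketch of what that could look like, under stated assumptions. The version-2 layout is hypothetical and not part of the Avro spec: two big-endian 8-byte longs hold the object count and byte size. The reader side recognizes both magic numbers, which every implementation would need before any writer switches to version 2. The writer side shows why fixed-width fields help: their slots can be reserved up front and patched once the block's size is known.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch: the version-2 layout below is an assumption, not the Avro spec. */
final class BlockFormat {
  static final byte[] MAGIC_V1 = {(byte) 'O', (byte) 'b', (byte) 'j', 1};
  static final byte[] MAGIC_V2 = {(byte) 'O', (byte) 'b', (byte) 'j', 2};

  /** Returns the format version a reader must handle, based on the 4-byte magic. */
  static int formatVersion(byte[] header) {
    byte[] magic = Arrays.copyOf(header, 4);
    if (Arrays.equals(magic, MAGIC_V1)) return 1;
    if (Arrays.equals(magic, MAGIC_V2)) return 2;
    throw new IllegalArgumentException("Not an Avro data file");
  }

  /**
   * Writes one version-2 block: fixed 8-byte slots for the object count and byte
   * size are reserved first, the data follows, and the slots are patched afterwards.
   * Varint-encoded fields cannot be patched in place because their width depends
   * on the value, which is not known until the data has been written.
   */
  static void writeBlock(ByteBuffer buf, byte[][] data) {
    int headerPos = buf.position();
    buf.putLong(0L).putLong(0L); // placeholders for count and size
    int start = buf.position();
    for (byte[] datum : data) {
      buf.put(datum);
    }
    buf.putLong(headerPos, data.length);                // patch object count
    buf.putLong(headerPos + 8, buf.position() - start); // patch byte size
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(64);
    writeBlock(buf, new byte[][] {{1, 2, 3}, {4, 5}});
    System.out.println(buf.getLong(0) + " objects, " + buf.getLong(8) + " bytes"); // 2 objects, 5 bytes
    System.out.println(formatVersion(MAGIC_V2)); // 2
  }
}
```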

 C implementation does not write datum values that are larger than the memory 
 write buffer (currently 16K)
 -

 Key: AVRO-724
 URL: https://issues.apache.org/jira/browse/AVRO-724
 Project: Avro
  Issue Type: Bug
  Components: c
Affects Versions: 1.4.1
Reporter: Jeremy Hinegardner

 The current C implementation does not allow datum values greater than 16K.
 {{avro_file_writer_append}} flushes blocks to disk over time, but does 
 not handle the case of a single datum being larger than 
 {{avro_file_writer_t.datum_buffer}}.  This is noted in the source code:
 {code:title=datafile.c:294-313}
 int avro_file_writer_append(avro_file_writer_t w, avro_datum_t datum)
 {
     int rval;
     if (!w || !datum) {
         return EINVAL;
     }
     rval = avro_write_data(w->datum_writer, w->writers_schema, datum);
     if (rval) {
         check(rval, file_write_block(w));
         rval =
             avro_write_data(w->datum_writer, w->writers_schema, datum);
         if (rval) {
             /* TODO: if the datum encoder larger than our buffer,
                just write a single large datum */
             return rval;
         }
     }
     w->block_count++;
     return 0;
 }
 {code}
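The logic the TODO asks for might look like the following: flush the current block, retry, and if the datum still does not fit, write it as a block of its own. This is a hypothetical Java rendering for illustration, not the Avro C API. It records block counts and sizes instead of writing to a file. Because its input is already encoded, it checks the length up front rather than detecting an encoder failure as the C code does.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java rendering of the TODO; not the Avro C API. */
final class BlockingAppender {
  /** Records a flushed block; a real writer would write the bytes to the file. */
  static final class Block {
    final int count;
    final int size;
    Block(int count, int size) { this.count = count; this.size = size; }
  }

  final List<Block> blocks = new ArrayList<>();
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final int bufferSize;
  private int blockCount;

  BlockingAppender(int bufferSize) { this.bufferSize = bufferSize; }

  void append(byte[] encodedDatum) {
    if (buffer.size() + encodedDatum.length > bufferSize) {
      flush(); // first attempt failed: write out the current block to make room
    }
    if (encodedDatum.length > bufferSize) {
      // Still too large for an empty buffer: emit it as a single-datum block.
      blocks.add(new Block(1, encodedDatum.length));
      return;
    }
    buffer.write(encodedDatum, 0, encodedDatum.length);
    blockCount++;
  }

  void flush() {
    if (blockCount > 0) {
      blocks.add(new Block(blockCount, buffer.size()));
      buffer.reset();
      blockCount = 0;
    }
  }

  public static void main(String[] args) {
    BlockingAppender w = new BlockingAppender(16);
    w.append(new byte[10]);
    w.append(new byte[20]); // larger than the 16-byte buffer
    w.append(new byte[4]);
    w.flush();
    for (Block b : w.blocks) {
      System.out.println(b.count + " object(s), " + b.size + " bytes");
    }
  }
}
```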

[jira] [Commented] (AVRO-593) Avro mapreduce apis incompatible with hadoop 0.20.2

2011-12-12 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167925#comment-13167925
]

Doug Cutting commented on AVRO-593:
---

Garrett, I just glanced at this and it looks great! You've factored things so
that much of the code is shared between the 'mapred' and 'mapreduce'
implementations.

The stuff in the 'file' and 'io' packages should probably be renamed.
Currently the 'io' and 'file' packages are in the main avro jar, which does not
require Hadoop. I think it's best not to split packages across multiple jars,
and these classes depend on Hadoop, so they probably belong in the avro-mapred jar.
Perhaps they should be renamed 'org.apache.avro.mapred.{io,file}'?

Also, do you intend this code to be contributed to Apache Avro? (I ask as a
legal formality.)

Avro mapreduce apis incompatible with hadoop 0.20.2
---

The Avro APIs for Hadoop use the older Hadoop MapReduce API
(org.apache.hadoop.mapred), which has been deprecated. A new Avro mapreduce
API should be implemented for Hadoop 0.20 and higher.

[jira] [Commented] (AVRO-971) IDL Import from project classpath

2011-12-07 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164881#comment-13164881
 ] 

Doug Cutting commented on AVRO-971:
---

In idl.jj, shouldn't we update ImportSchema() too?

Also, it'd be good to add a test for this.

 IDL Import from project classpath
 -

 Key: AVRO-971
 URL: https://issues.apache.org/jira/browse/AVRO-971
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.1
 Environment: Maven java projects
Reporter: Victor Chau
Priority: Minor
  Labels: patch
 Attachments: ImportFromClassPath.patch


 Currently, it looks like the only option for importing another schema in IDL 
 is to place the imported file in the same directory as the 
 importing avdl.  In a setup where avdl files are spread among 
 several Maven projects owned by different teams, this is 
 logistically difficult to manage.
 When using the avro-maven-plugin, I would like to be able to just create a 
 dependency from my project on another jar that contains the avdl I want to 
 import, and have Avro be smart enough to look for it on the classpath of the 
 project containing the avdl when compiling my avdl.
 Attached is a working patch that will:
   1. Change the IDLProtocolMojo class to lookup the current project's 
 classpath and create a new ClassLoader.
   2. Give the Idl compiler class the ClassLoader before parsing the avdl.
   3. If the Idl class encounters an import that it cannot resolve to the 
 local directory while parsing, it will try to use the ClassLoader to load up 
 the file being imported.
 The patch spans the Avro 1.6.1 tag of the Java avro, avro-compiler, and 
 avro-maven-plugin projects.
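Step 3 of the patch could be sketched as below: try the local directory first, then fall back to the project's ClassLoader. `ImportResolver` and `resolve` are hypothetical names, not the `Idl` class's real API. The `main` method uses a temporary directory to simulate a dependency's classpath entry.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating step 3 of the patch; not the Idl class's actual code. */
final class ImportResolver {
  /**
   * Resolves an IDL import: first relative to the importing file's directory,
   * then as a resource on the project classpath. Returns null if neither finds it.
   */
  static URL resolve(File importingDir, String importFile, ClassLoader projectLoader)
      throws MalformedURLException {
    File local = new File(importingDir, importFile);
    if (local.exists()) {
      return local.toURI().toURL();
    }
    return projectLoader == null ? null : projectLoader.getResource(importFile);
  }

  public static void main(String[] args) throws IOException {
    // Simulate a dependency's classpath entry that contains shared.avdl.
    Path dependency = Files.createTempDirectory("dependency");
    Files.write(dependency.resolve("shared.avdl"), "protocol Shared {}".getBytes(StandardCharsets.UTF_8));
    File sourceDir = Files.createTempDirectory("src").toFile(); // no local copy here
    try (URLClassLoader loader =
        new URLClassLoader(new URL[] {dependency.toUri().toURL()}, null)) {
      System.out.println(resolve(sourceDir, "shared.avdl", loader));  // found via the classpath
      System.out.println(resolve(sourceDir, "missing.avdl", loader)); // null
    }
  }
}
```

In a Maven plugin, the `ClassLoader` would presumably be a `URLClassLoader` built from the project's classpath elements, as step 1 describes.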

[jira] [Commented] (AVRO-969) Make possible usage of SpecificDatumWriter in avro-mapred

2011-12-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163776#comment-13163776
 ] 

Doug Cutting commented on AVRO-969:
---

AVRO-966 is a bug.  I supplied a patch there.

SpecificDatumWriter might be a bit more efficient.  Reflection is not, in 
general, used for specific objects, though, even when ReflectDatumReader is used. 
 Rather, ReflectDatumReader detects the specific object and uses 
SpecificDatumReader, but that detection adds a small cost.

 Make possible usage of SpecificDatumWriter in avro-mapred
 -

 Key: AVRO-969
 URL: https://issues.apache.org/jira/browse/AVRO-969
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.1
Reporter: Vyacheslav Zholudev

 I realized that ReflectDatumWriter is always used when running a mapred job (in 
 AvroOutputFormat.java). Sometimes this leads to bugs like the one in AVRO-966.
 Why not just provide a property like {{WRITER_IS_REFLECT = 
 "avro.map.writer.is.reflect";}} to decide which DatumWriter should 
 be used? 
 I created a small patch to solve this:
 {code:title=avro-mapred.patch|borderStyle=solid}
 Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
 ===
 --- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java  (revision 1209417)
 +++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java  (revision )
 @@ -53,6 +53,8 @@
    /** The configuration key for reflection-based map output representation. */
    public static final String MAP_OUTPUT_IS_REFLECT = "avro.map.output.is.reflect";
  
 +  public static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";
 +
    /** Configure a job's map input schema. */
    public static void setInputSchema(JobConf job, Schema s) {
      job.set(INPUT_SCHEMA, s.toString());
 Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
 ===
 --- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java  (revision 1209417)
 +++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java  (revision )
 @@ -23,6 +23,7 @@
  import java.util.Map;
  import java.net.URLDecoder;
  
 +import org.apache.avro.specific.SpecificDatumWriter;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
 @@ -102,8 +103,9 @@
        ? AvroJob.getMapOutputSchema(job)
        : AvroJob.getOutputSchema(job);
  
 -    final DataFileWriter<T> writer =
 -      new DataFileWriter<T>(new ReflectDatumWriter<T>());
 +    final DataFileWriter<T> writer = job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) ?
 +      new DataFileWriter<T>(new ReflectDatumWriter<T>()) :
 +      new DataFileWriter<T>(new SpecificDatumWriter<T>());
  
      configureDataFileWriter(writer, job);
 {code}
 Does it make sense? 

[jira] [Commented] (AVRO-965) Enhance the IDL parser to allow properties for protocols and messages

2011-12-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163820#comment-13163820
 ] 

Doug Cutting commented on AVRO-965:
---

The properties might instead be parsed in ProtocolBody.  For example, that might look 
something like:

{code}
{
  <IMPORT> ... {   }
|
  ( SchemaProperty(props) )*
  ( NamedSchemaDeclaration(props) { ... } | MessageDeclaration(props) { ... } )
}
{code}

Does that make sense?

 Enhance the IDL parser to allow properties for protocols and messages
 -

 Key: AVRO-965
 URL: https://issues.apache.org/jira/browse/AVRO-965
 Project: Avro
  Issue Type: Improvement
Reporter: George Fletcher
Priority: Minor

 Enhance the IDL parser to support arbitrary properties for protocol and 
 message types. This will allow for attaching metadata to a protocol or 
 message and can be used for versioning and in some cases language annotations.
 This was partly discussed in AVRO-886:
 https://issues.apache.org/jira/browse/AVRO-886

[jira] [Commented] (AVRO-969) Make possible usage of SpecificDatumWriter in avro-mapred

2011-12-05 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163006#comment-13163006
 ] 

Doug Cutting commented on AVRO-969:
---

ReflectDatumWriter should be able to correctly write a superset of the types 
that SpecificDatumWriter can write, so this property should not be needed.  
That said, it might be good to be able to override the DatumWriter and/or 
DatumReader classes used by Avro's mapred API.  This might permit, e.g., 
ThriftDatumWriter to be used.  So the best patch might be to switch to 
using avro.input.datumReader, avro.map_output.datumWriter, and 
avro.output.datumWriter properties that name classes whose constructors accept 
a schema parameter.
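Here is a rough sketch of the property-driven approach: read a class name from configuration and invoke its single-argument constructor reflectively. The `Writer` interface and `LoggingWriter` class are hypothetical stand-ins. A real version would use Avro's `DatumWriter` and pass a `Schema` rather than its JSON text. Only the property name `avro.output.datumWriter` comes from the comment above.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of choosing a writer class by configuration property. */
final class ConfiguredWriters {
  /** Stand-in for Avro's DatumWriter interface. */
  interface Writer {
    String describe();
  }

  /** Example writer; its constructor takes the schema (here simply its JSON text). */
  public static final class LoggingWriter implements Writer {
    private final String schemaJson;
    public LoggingWriter(String schemaJson) { this.schemaJson = schemaJson; }
    @Override public String describe() { return "LoggingWriter for " + schemaJson; }
  }

  /** Instantiates the class named by conf[key] (or defaultClass) via its schema constructor. */
  static Writer newWriter(Map<String, String> conf, String key, String defaultClass,
      String schemaJson) throws ReflectiveOperationException {
    String className = conf.getOrDefault(key, defaultClass);
    Constructor<? extends Writer> ctor =
        Class.forName(className).asSubclass(Writer.class).getConstructor(String.class);
    return ctor.newInstance(schemaJson);
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    Map<String, String> conf = new HashMap<>();
    conf.put("avro.output.datumWriter", LoggingWriter.class.getName());
    Writer w = newWriter(conf, "avro.output.datumWriter", LoggingWriter.class.getName(), "\"string\"");
    System.out.println(w.describe()); // LoggingWriter for "string"
  }
}
```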

[jira] [Commented] (AVRO-970) (Java) Allow users to implement their own Codecs

2011-12-05 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163140#comment-13163140
 ] 

Doug Cutting commented on AVRO-970:
---

I think this was kept private originally out of concern that the API might not 
be stable.  But CodecFactory is public, so it doesn't make much sense for Codec 
to be private.  I'm +1 for making this change.  Would you like to provide a 
patch?  Should we add a test that implements a Codec in a different package, 
perhaps one that just performs a bitwise-NOT of the data?

 (Java) Allow users to implement their own Codecs
 

 Key: AVRO-970
 URL: https://issues.apache.org/jira/browse/AVRO-970
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Peter Nimmervoll

 Currently the base class for all codecs (Codec) is not public, which makes it
 impossible to write your own codecs.
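The suggested test codec is easy to sketch. Because `Codec` is not public at this point, the example declares its own hypothetical `SimpleCodec` base class rather than extending Avro's. Once `Codec` is public, a real test would subclass it from another package instead. Bitwise NOT is its own inverse, so the round trip is simple to check.

```java
import java.nio.ByteBuffer;

/** Hypothetical minimal codec contract; not Avro's org.apache.avro.file.Codec. */
abstract class SimpleCodec {
  abstract String getName();
  abstract ByteBuffer compress(ByteBuffer data);
  abstract ByteBuffer decompress(ByteBuffer data);
}

/** Test codec that inverts every bit; applying it twice restores the input. */
final class NotCodec extends SimpleCodec {
  @Override String getName() { return "not"; }
  @Override ByteBuffer compress(ByteBuffer data) { return invert(data); }
  @Override ByteBuffer decompress(ByteBuffer data) { return invert(data); }

  private static ByteBuffer invert(ByteBuffer in) {
    ByteBuffer src = in.duplicate(); // leave the caller's position untouched
    ByteBuffer out = ByteBuffer.allocate(src.remaining());
    while (src.hasRemaining()) {
      out.put((byte) ~src.get());
    }
    out.flip();
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer original = ByteBuffer.wrap(new byte[] {0, 1, (byte) 0xFF});
    SimpleCodec codec = new NotCodec();
    ByteBuffer encoded = codec.compress(original);
    System.out.println(encoded.get(0));                             // -1, i.e. 0xFF
    System.out.println(codec.decompress(encoded).equals(original)); // true
  }
}
```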

[jira] [Commented] (AVRO-953) python ipc with path

2011-11-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156379#comment-13156379
 ] 

Doug Cutting commented on AVRO-953:
---

How hard would it be to add a test of the new functionality?

 python ipc with path
 

 Key: AVRO-953
 URL: https://issues.apache.org/jira/browse/AVRO-953
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Craig Landry
Priority: Minor
  Labels: patch
 Attachments: req_resource.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Currently the ipc.HTTPTransceiver class has a hardcoded path of '/'.  This 
 improvement request is to allow users to provide an override path.

[jira] [Commented] (AVRO-570) python implementation of mapreduce connector

2011-11-23 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156391#comment-13156391
 ] 

Doug Cutting commented on AVRO-570:
---

Finally looking at this.  The Java changes look reasonable and all Java tests 
pass for me.

Python tests fail with:

{code}

[py-test] 
./home/cutting/src/avro/trunk/lang/py/build/src/avro/tether/tether_task_runner.py:24:
 RuntimeWarning: Parent module 'avro.tether' not found while handling absolute 
import
  [py-test]   from avro import tether
  [py-test] INFO:root:tether_task_runner.__main__: Task: 
word_count_task.WordCountTask
  [py-test] INFO:TetherTask:TetherTask.open: Opening connection to parent 
server on port=42343
  [py-test] MockParentResponder: Recieved 'configure': inputPort=59800
  [py-test] localhost.localdomain - - [23/Nov/2011 15:15:17] POST / HTTP/1.1 
200 -
  [py-test] .E
  [py-test] 
==
  [py-test] space, 
  [py-test] ERROR: test1 (test_tether_word_count.TestTetherWordCount)
  [py-test]fields: [ {name: foo, type: string} ] },
  [py-test] 
--
  [py-test]  {name: ReferencedRecord, type: record, 
  [py-test] Traceback (most recent call last):
  [py-test]fields: [ {name: bar, type: double} ] },
  [py-test]   File 
/home/cutting/src/avro/trunk/lang/py/build/test/test_tether_word_count.py, 
line 187, in test1
  [py-test]  {name: TestError,
  [py-test] proc=subprocess.Popen(args)
  [py-test]   type: error, fields: [ {name: message, type: 
string} ]
  [py-test]   File /usr/lib/python2.6/subprocess.py, line 623, in __init__
  [py-test]  }
  [py-test] errread, errwrite)
  [py-test]  ],
  [py-test]   File /usr/lib/python2.6/subprocess.py, line 1141, in 
_execute_child
  [py-test] 
  [py-test] raise child_exception
  [py-test]  messages: {
  [py-test] OSError: [Errno 2] No such file or directory
  [py-test]  echo: {
  [py-test] 
  [py-test]  request: [{name: qualified, 
  [py-test] 
--
  [py-test]  type: ReferencedRecord}],
  [py-test] Ran 45 tests in 8.061s
{code}

Any idea what's causing that?

[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler

2011-11-04 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144097#comment-13144097
 ] 

Doug Cutting commented on AVRO-951:
---

Oops.  Looks like we both worked on this in parallel and with slightly 
different approaches.  Does anyone have a preference?

 Records with field named data collide with new builder code from specific 
 compiler
 

 Key: AVRO-951
 URL: https://issues.apache.org/jira/browse/AVRO-951
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.0
Reporter: Alex Miller
Assignee: Doug Cutting
Priority: Blocker
 Fix For: 1.6.1

 Attachments: AVRO-951.patch, AVRO-951.patch


 When I updated my dependencies from 1.5.x to 1.6.0 I found that one of my 
 generated specific data classes failed to compile.  The schema definition is:
 {code}
   record DataResponse {
     string queryId;
     int startRow;
     boolean more;
     array<array<union {IRI, BNode, PlainLiteral, TypedLiteral,
                        string, boolean, int, long, float, double,
                        null}>> data;
   }
 {code}
 which I'm using to create: 
 {code}
 {
   "type" : "record",
   "name" : "DataResponse",
   "fields" : [ {
     "name" : "queryId",
     "type" : "string"
   }, {
     "name" : "startRow",
     "type" : "int"
   }, {
     "name" : "more",
     "type" : "boolean"
   }, {
     "name" : "data",
     "type" : {
       "type" : "array",
       "items" : {
         "type" : "array",
         "items" : [ "IRI", "BNode", "PlainLiteral", "TypedLiteral",
                     "string", "boolean", "int", "long", "float", "double", "null" ]
       }
     }
   } ]
 }
 {code}
 which generates this code in the specific compiler: 
 {code}
   public static class Builder
       extends org.apache.avro.specific.SpecificRecordBuilderBase<DataResponse>
       implements org.apache.avro.data.RecordBuilder<DataResponse> {

     private java.lang.CharSequence queryId;
     private int startRow;
     private boolean more;
     // *** local field named data
     private java.util.List<java.util.List<java.lang.Object>> data;
 // snipped some

     /** Creates a Builder by copying an existing DataResponse instance */
     private Builder(sherpa.protocol.DataResponse other) {
       super(sherpa.protocol.DataResponse.SCHEMA$);
       if (isValidValue(fields[0], other.queryId)) {
         // *** Call intended to go to super class data field
         queryId = (java.lang.CharSequence) data.deepCopy(fields[0].schema(), other.queryId);
         fieldSetFlags[0] = true;
       }
       if (isValidValue(fields[1], other.startRow)) {
         startRow = (java.lang.Integer) data.deepCopy(fields[1].schema(), other.startRow);
         fieldSetFlags[1] = true;
       }
       if (isValidValue(fields[2], other.more)) {
         more = (java.lang.Boolean) data.deepCopy(fields[2].schema(), other.more);
         fieldSetFlags[2] = true;
       }
       if (isValidValue(fields[3], other.data)) {
         data = (java.util.List<java.util.List<java.lang.Object>>)
             data.deepCopy(fields[3].schema(), other.data);
         fieldSetFlags[3] = true;
       }
     }
 {code}
 In the two ***'ed comments above, the first marks the locally generated 
 "data" field.  The second is a reference to a super-class field, also named 
 "data" (although it's shadowed by the local "data" field).  The super class is 
 org.apache.avro.data.RecordBuilderBase.  
 It seems any of the protected fields at that point could potentially 
 collide with actual record field names ("schema", "fields", and "fieldSetFlags" 
 would all have the same problem).  Maybe if those fields were accessed via 
 getters in the generated code, the local fields could shadow the super-class 
 fields without issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler

2011-11-04 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144158#comment-13144158
 ] 

Doug Cutting commented on AVRO-951:
---

The class did not exist before 1.6.0.  It has created a regression against 
1.5.x, so I think it's probably fair to change its API in 1.6.1.

 Records with field named data collide with new builder code from specific 
 compiler
 

 Key: AVRO-951
 URL: https://issues.apache.org/jira/browse/AVRO-951
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.0
Reporter: Alex Miller
Assignee: Doug Cutting
Priority: Blocker
 Fix For: 1.6.1

 Attachments: AVRO-951.patch, AVRO-951.patch



[jira] [Commented] (AVRO-951) Records with field named data collide with new builder code from specific compiler

2011-11-03 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13143550#comment-13143550
 ] 

Doug Cutting commented on AVRO-951:
---

 Maybe if those fields were accessed via getters in the generated code [ ... ]

But then the base class getter couldn't be called getData() as that would 
conflict with the generated getter.  I think it would be better to rename base 
class fields to include a trailing dollar-sign, e.g., data$.  This should also 
be done for method parameters: 'other' above should be 'other$'.
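A minimal, self-contained sketch of the trailing-dollar-sign idea follows. It uses simplified stand-ins rather than Avro's real classes: `BuilderBase`, `Copier`, and `DataResponseBuilder` are hypothetical names. Avro names may contain only letters, digits, and underscores, so a `$`-suffixed base-class field or parameter can never collide with a generated field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the builder base class; its fields carry a trailing '$'. */
abstract class BuilderBase {
  // Renamed from "data" to "data$": '$' is legal in Java identifiers but not in
  // Avro names, so no generated record field can shadow it.
  protected final Copier data$ = new Copier();
  protected final boolean[] fieldSetFlags$ = new boolean[1];
}

/** Hypothetical helper standing in for a deep-copy utility; copies one list level. */
final class Copier {
  <T> List<T> copyList(List<T> value) {
    return new ArrayList<>(value);
  }
}

/** What generated code could look like for a record with a field named "data". */
final class DataResponseBuilder extends BuilderBase {
  private List<List<Object>> data;

  // The parameter is also '$'-suffixed so it cannot collide with a field name.
  DataResponseBuilder(List<List<Object>> other$) {
    data = new ArrayList<>();
    for (List<Object> row : other$) {
      // "data$" unambiguously names the inherited helper; "data" is the record field.
      data.add(data$.copyList(row));
    }
    fieldSetFlags$[0] = true;
  }

  List<List<Object>> getData() {
    return data;
  }
}
```

Because the generated field keeps the plain name `data`, the public getter can still be called `getData()`, which is the conflict a getter-based approach in the base class would have run into.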

 Records with field named data collide with new builder code from specific 
 compiler
 

 Key: AVRO-951
 URL: https://issues.apache.org/jira/browse/AVRO-951
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.0
Reporter: Alex Miller
 Fix For: 1.6.1



[jira] [Commented] (AVRO-946) GenericData.resolveUnion() performance improvement

2011-11-02 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142216#comment-13142216
 ] 

Doug Cutting commented on AVRO-946:
---

Hernan, that sounds like a good plan to me.  Would you like to update the patch 
or should I?

 GenericData.resolveUnion() performance improvement
 --

 Key: AVRO-946
 URL: https://issues.apache.org/jira/browse/AVRO-946
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.0
Reporter: Hernan Otero
 Attachments: AVRO-946.patch, AVRO-946.patch


 Due to the sequential nature of today's implementation of 
 GenericData.resolveUnion() (used when serializing an object):
 {code}
   public int resolveUnion(Schema union, Object datum) {
 int i = 0;
 for (Schema type : union.getTypes()) {
   if (instanceOf(type, datum))
 return i;
   i++;
 }
 throw new UnresolvedUnionException(union, datum);
   }
 {code}
 it showed up when we were doing some serialization performance analysis.  A 
 simple optimization can be implemented by keeping a map within the 
 UnionSchema object (in fact, this could actually be a perfect hash map given 
 the potential values in the map are known in advance).  The optimization is 
 obviously most notable when a Union within the schema contains many types (in 
 our particular use case, more than 40 in some cases).  In this scenario, we 
 observed a 25% improvement by using an identity hash map.
 Even though using an identity map provides a significant boost, we have 
 observed an even further improvement (and removed some of the restrictions of 
 relying on object identity) by using a perfect hash map on the schema names 
 (an extra 15% on top of that in some cases).  This implementation, 
 unfortunately, is not something we could contribute at this point, but we 
 thought it'd be a good idea to allow users to provide alternative 
 implementations of the indexing behavior, such as adding the following static 
 method to Schema:
 {code}
 public static void setUnionTypeIndexCacheFactory(UnionIndexCacheFactory factory)
 {
   unionIndexCacheFactory = factory;
 }
 {code}
 This is what the interface and identity hash map-based implementation would 
 look like:
 {code}
   /**
    * A factory interface for creating UnionIndexCache instances.
    */
   public static interface UnionIndexCacheFactory
   {
     UnionIndexCache createUnionIndexCache(List<Schema> types);

     /**
      * Used for caching schema indices within a union.
      */
     public static interface UnionIndexCache
     {
       void setTypeIndex(Schema schema, int index);
       int getTypeIndex(Schema schema);
     }
   }

   private static class IdentityMapUnionIndexCacheFactory
       implements UnionIndexCacheFactory
   {
     @Override
     public UnionIndexCache createUnionIndexCache(List<Schema> types)
     {
       return new UnionIndexCache()
       {
         private final IdentityHashMap<Schema, Integer> schemaToIndex =
             new IdentityHashMap<Schema, Integer>();

         @Override
         public void setTypeIndex(Schema schema, int index)
         {
           schemaToIndex.put(schema, index);
         }

         @Override
         public int getTypeIndex(Schema schema)
         {
           Integer index = schemaToIndex.get(schema);
           return index == null ? -1 : index;
         }
       };
     }
   }
 {code}
 I will attach a patch later today or early tomorrow.
 Thanks in advance,
 Hernan Otero


[jira] [Commented] (AVRO-946) GenericData.resolveUnion() performance improvement

2011-10-31 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140260#comment-13140260
 ] 

Doug Cutting commented on AVRO-946:
---

Identity equality may result in multiple entries for a given schema but the 
cache should still work correctly.  It would perform poorly if every instance 
had a different schema, but that's not likely.

Also note that Schema now caches hash codes.  So even using equals hashing 
would usually only result in a single call to equals, to verify the hash entry. 
 Equals is fast for identical objects, so, if you used equals hashing, the slow 
case would be when the cached key is equal but not identical.

I think identity hashing with weak keys is probably preferable.
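Identity hashing with weak keys can be sketched as a small cache. The sketch uses a type parameter `S` in place of Avro's `Schema`, so no Avro classes are involved. `WeakIdentityIndexCache` is a hypothetical name, not the API from the attached patch.

Lookups hit by object identity. A miss falls back to an `equals()`-based scan of the union's branches. An equal-but-distinct schema therefore still resolves correctly and simply gets its own identity entry, matching the behavior described above. The JDK has no weak identity map, so keys are wrapped in `WeakReference`s and purged through a `ReferenceQueue`.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Sketch: union branch-index cache keyed by object identity, with weakly held keys. */
final class WeakIdentityIndexCache<S> {
  /** Weakly held key: hashed by identity, equal only to a key with the same referent. */
  private static final class Key extends WeakReference<Object> {
    private final int hash;

    Key(Object referent, ReferenceQueue<Object> queue) {
      super(referent, queue);
      this.hash = System.identityHashCode(referent);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
      if (this == o) return true; // lets a cleared key still remove itself
      if (!(o instanceof Key)) return false;
      Object referent = get();
      return referent != null && referent == ((Key) o).get();
    }
  }

  private final Map<Key, Integer> indices = new HashMap<>();
  private final ReferenceQueue<Object> cleared = new ReferenceQueue<>();

  /** Returns the branch index of schema in unionTypes, or -1 if no branch is equal. */
  synchronized int indexOf(S schema, List<S> unionTypes) {
    Objects.requireNonNull(schema, "schema");
    expungeCleared();
    Integer cached = indices.get(new Key(schema, null)); // identity lookup
    if (cached != null) return cached;
    int index = unionTypes.indexOf(schema); // equals()-based fallback scan
    if (index >= 0) indices.put(new Key(schema, cleared), index);
    return index;
  }

  /** Number of live identity entries (equal schemas may occupy several). */
  synchronized int size() {
    expungeCleared();
    return indices.size();
  }

  /** Drops entries whose schema objects have been garbage collected. */
  private void expungeCleared() {
    Reference<?> ref;
    while ((ref = cleared.poll()) != null) indices.remove(ref);
  }
}
```

The sketch assumes one cache instance per union, since a cached index is only meaningful for the branch list it was computed against.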

 GenericData.resolveUnion() performance improvement
 --

 Key: AVRO-946
 URL: https://issues.apache.org/jira/browse/AVRO-946
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.6.0
Reporter: Hernan Otero


[jira] [Commented] (AVRO-821) PHP protocol support

2011-10-31 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140641#comment-13140641
 ] 

Doug Cutting commented on AVRO-821:
---

A test would be great!  Thanks!

 PHP protocol support
 

 Key: AVRO-821
 URL: https://issues.apache.org/jira/browse/AVRO-821
 Project: Avro
  Issue Type: New Feature
  Components: php
Affects Versions: 1.5.1
 Environment: all
Reporter: Andy Wick
 Fix For: 1.5.1

 Attachments: AVRO-821-fixed.patch, avro.patch


 The PHP implementation doesn't support the protocol format.


[jira] [Commented] (AVRO-949) NettyTransiever doesn't call RPCPlugin.clientReceiveResponse on the same thread as clientSendRequest

2011-10-29 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139311#comment-13139311
 ] 

Doug Cutting commented on AVRO-949:
---

RPCContext, I mean.

 NettyTransiever doesn't call RPCPlugin.clientReceiveResponse on the same 
 thread as clientSendRequest
 

 Key: AVRO-949
 URL: https://issues.apache.org/jira/browse/AVRO-949
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.5.4, 1.6.0
Reporter: Philip Zeyliger

 RPCPlugin.clientReceiveResponse() is called in the Netty I/O thread when using 
 a NettyTransceiver.  This is quite different from how HTTPTransceiver does it.
 Users can use RPCPlugin for things like tracing and timing. It's bizarre 
 that clientSendRequest() runs in the caller's thread while 
 clientReceiveResponse() runs in a different one, because thread locals are 
 one of the easiest ways to pass information between the two.  There's no other 
 easy way, since RPCContext, which is passed along, has no way to associate 
 arbitrary data with itself.
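The sketch below shows why a thread local set in the send callback is invisible in a receive callback that runs on another thread. It also shows one workaround: correlating the two callbacks by the identity of the per-call context object. `CallContext` (standing in for RPCContext) and `CallTimer` (standing in for a timing plugin) are hypothetical. The sketch assumes the same context instance reaches both callbacks, which is an assumption rather than something stated above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-call context object such as RPCContext. */
final class CallContext {}

/** Sketch of a timing plugin that does not depend on which thread runs each callback. */
final class CallTimer {
  // Keyed by context identity and synchronized, so send and receive may run on different threads.
  private final Map<CallContext, Long> startNanos =
      Collections.synchronizedMap(new IdentityHashMap<>());
  // Thread-local variant for comparison: only visible on the thread that set it.
  private final ThreadLocal<Long> threadLocalStart = new ThreadLocal<>();

  void onSend(CallContext ctx) {
    long now = System.nanoTime();
    startNanos.put(ctx, now);
    threadLocalStart.set(now);
  }

  /** Returns elapsed nanoseconds, or -1 if no matching send was recorded. */
  long onReceive(CallContext ctx) {
    Long start = startNanos.remove(ctx);
    return start == null ? -1 : System.nanoTime() - start;
  }

  /** What a thread-local-based plugin would see on the calling thread. */
  Long threadLocalStartOnThisThread() {
    return threadLocalStart.get();
  }
}
```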


[jira] [Commented] (AVRO-941) Avro should support the Apache Maven Shade plugin class relocation feature

2011-10-28 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138658#comment-13138658
 ] 

Doug Cutting commented on AVRO-941:
---

I'll commit this soon unless someone objects.  It's not perfect but it's better 
than nothing.

 Avro should support the Apache Maven Shade plugin class relocation feature
 --

 Key: AVRO-941
 URL: https://issues.apache.org/jira/browse/AVRO-941
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.5.4
Reporter: Matt Massie
 Attachments: shade.patch


 The Apache Maven Shade plugin allows Maven builds to create an uber jar that 
 contains the project's dependencies.  In addition, the Shade plugin allows 
 you to relocate dependencies into a private namespace to prevent class 
 conflicts on shared class paths.  Avro does not support relocation.
 All generated Avro objects contain a string field named SCHEMA$ which serves 
 as the authority for the class namespace.  When the Shade plugin updates the 
 byte code to relocate the class, it doesn't alter the SCHEMA$ string.  This 
 breaks Avro's use of reflection, since the namespace in SCHEMA$ points to an 
 incorrect location.
 I spoke with Doug about the issue and he was kind enough to provide a quick 
 hack in order to fix this issue.  The hack is to check for mismatches between 
 the byte code and the SCHEMA$ and, when they don't match, to defer to the 
 byte code.  I'll attach Doug's patch to this Jira.  


[jira] [Commented] (AVRO-943) TestNettyServerWithCallbacks sometimes hangs

2011-10-25 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135194#comment-13135194
 ] 

Doug Cutting commented on AVRO-943:
---

Here's the thread dump.

{code}
Full thread dump Java HotSpot(TM) Server VM (17.1-b03 mixed mode):

"New I/O client boss #2" prio=10 tid=0x6e8eec00 nid=0x4222 waiting on condition [0x6e5ad000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x9edf6c58> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
	at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:191)
	at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:180)
	at org.apache.avro.ipc.NettyTransceiver.access$200(NettyTransceiver.java:59)
	at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:361)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:344)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:232)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:98)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:404)
	at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:602)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:91)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
	at org.jboss.netty.channel.Channels.close(Channels.java:720)
	at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:200)
	at org.jboss.netty.channel.ChannelFutureListener$2.operationComplete(ChannelFutureListener.java:57)
	at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
	at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
	at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:334)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:389)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

"New I/O client worker #1-1" prio=10 tid=0x6eaedc00 nid=0x4220 runnable [0x6e75c000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x9e8fdbc0> (a sun.nio.ch.Util$1)
	- locked <0x9e8fdbb0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x9e8fd9c8> (a
{code}

[jira] [Commented] (AVRO-943) TestNettyServerWithCallbacks sometimes hangs

2011-10-25 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135210#comment-13135210
 ] 

Doug Cutting commented on AVRO-943:
---

It hangs around one time in 10.

I can reproduce this by running TestNettyServerWithCallbacks in a loop, e.g.:

{code}
while true; do mvn test -Dtest=TestNettyServerWithCallbacks; done
{code}


 TestNettyServerWithCallbacks sometimes hangs
 

 Key: AVRO-943
 URL: https://issues.apache.org/jira/browse/AVRO-943
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Doug Cutting

 I'm periodically seeing tests hang in TestNettyServerWithCallbacks.


[jira] [Commented] (AVRO-941) Avro should support the Apache Maven Shade plugin class relocation feature

2011-10-24 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134406#comment-13134406
 ] 

Doug Cutting commented on AVRO-941:
---

 the patch should probably only substitute the namespace field specifically

That's a flaw, but probably not the most critical one.  Avro identifiers cannot 
contain dots except in namespaces.  So as long as the package name contains a 
dot (which most do) then the global replace should not harm the schema.  Fixing 
this requires a fair amount of code, walking the schema and creating a copy 
with things renamed.  (We already do this in a few places, so probably we 
should create a SchemaVisitor API to simplify this, but that's a separate 
issue.)

Note that this approach will always be flawed, since it won't always be able to 
perfectly reconstruct the relocations used when shading.  However replacement 
is only attempted when things are already broken, so it does no harm and 
imperfections are thus tolerable.

Probably the biggest flaw of the current patch is that it will fail if nested 
schemas are not all in the same namespace.  To address this we might look for a 
common suffix or prefix in the new and old package and then replace the 
differing text.  For example, if the current class is 
com.baz.hidden.org.foo.Bar and the schema is org.foo.Bar then the replacement 
should be to prefix all namespaces with com.baz.hidden.
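The prefix-detection idea can be sketched as follows, assuming the relocation only prepended a package prefix. This is the common-suffix case; the common-prefix case mentioned above is not handled. `ShadeRelocation` and its methods are hypothetical names, not Avro API.

```java
import java.util.Optional;

/** Hypothetical helper sketching prefix detection for shaded (relocated) classes. */
final class ShadeRelocation {
  private ShadeRelocation() {}

  /**
   * If actualClassName equals prefix + schemaFullName (e.g. "com.baz.hidden." + "org.foo.Bar"),
   * returns that prefix. Returns "" when the names already match, and empty when no
   * prepended prefix explains the mismatch.
   */
  static Optional<String> relocationPrefix(String actualClassName, String schemaFullName) {
    if (actualClassName.equals(schemaFullName)) return Optional.of("");
    // Require a dot boundary so that "xorg.foo.Bar" is not mistaken for a relocation.
    if (!actualClassName.endsWith("." + schemaFullName)) return Optional.empty();
    return Optional.of(
        actualClassName.substring(0, actualClassName.length() - schemaFullName.length()));
  }

  /** Applies a detected prefix to another namespace found while walking the schema. */
  static String relocateNamespace(String namespace, String prefix) {
    // Assumption: names without a namespace (default package) are left unchanged.
    if (namespace == null || namespace.isEmpty()) return namespace;
    return prefix + namespace;
  }
}
```

Because the prefix is applied to each namespace separately rather than by a global string replace, nested schemas in other namespaces (for example `org.other`) are relocated consistently.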

I'd be happy to see such improvements to this patch, but I'd not object to it 
being committed more-or-less as-is since it does no harm.


 Avro should support the Apache Maven Shade plugin class relocation feature
 --

 Key: AVRO-941
 URL: https://issues.apache.org/jira/browse/AVRO-941
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.5.4
Reporter: Matt Massie
 Attachments: shade.patch



[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0

2011-10-21 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133185#comment-13133185
 ] 

Doug Cutting commented on AVRO-935:
---

Scott, do you want to update this and commit it, or should I?

 Update Java dependencies for 1.6.0
 --

 Key: AVRO-935
 URL: https://issues.apache.org/jira/browse/AVRO-935
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.6.0

 Attachments: AVRO-935.patch


 Update Java dependencies to the latest version where appropriate.


[jira] [Commented] (AVRO-936) Avro Java does not build with Maven 2.

2011-10-19 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130979#comment-13130979
 ] 

Doug Cutting commented on AVRO-936:
---

+1 This looks fine to me.

 Avro Java does not build with Maven 2.
 --

 Key: AVRO-936
 URL: https://issues.apache.org/jira/browse/AVRO-936
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.0
Reporter: Thiruvalluvan M. G.
Assignee: Thiruvalluvan M. G.
 Attachments: AVRO-936.patch


 This is because we use the Maven 3 feature "Support Enum-type parameters in 
 mojos":
 http://jira.codehaus.org/browse/MNG-4292.
 The forthcoming patch fixes it.
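One common way to stay compatible with Maven 2 is to declare the mojo parameter as a `String` and convert it to the enum inside the plugin. The sketch uses a hypothetical `Visibility` enum and a hypothetical `MojoParams` holder. It is not necessarily what the attached patch does.

```java
import java.util.Locale;

/** Hypothetical enum that a Maven 3-only mojo might accept directly as a parameter. */
enum Visibility { PUBLIC, PRIVATE }

/** Hypothetical holder for mojo configuration. */
final class MojoParams {
  // Declared as a String so that Maven 2 can inject it from the POM configuration.
  String visibility = "public";

  /** Converts the configured value to the enum; throws IllegalArgumentException if unknown. */
  Visibility visibilityEnum() {
    return Visibility.valueOf(visibility.trim().toUpperCase(Locale.ROOT));
  }
}
```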


[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0

2011-10-18 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130064#comment-13130064
 ] 

Doug Cutting commented on AVRO-935:
---

Hmm.  'mvn test' passes for me, but running individual tests with, e.g., 'mvn 
-Dtest=TestSchema' now fails.  That's not critical for the 1.6.0 release, but 
it's nice if it works for developers.  Any idea why this now fails?

 Update Java dependencies for 1.6.0
 --

 Key: AVRO-935
 URL: https://issues.apache.org/jira/browse/AVRO-935
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.6.0

 Attachments: AVRO-935.patch


 Update Java dependencies to the latest version where appropriate.


[jira] [Commented] (AVRO-935) Update Java dependencies for 1.6.0

2011-10-18 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130120#comment-13130120
 ] 

Doug Cutting commented on AVRO-935:
---

+1 for adding that to the surefire plugin configuration.

 Update Java dependencies for 1.6.0
 --

 Key: AVRO-935
 URL: https://issues.apache.org/jira/browse/AVRO-935
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.6.0

 Attachments: AVRO-935.patch


 Update Java dependencies to the latest version where appropriate.


[jira] [Commented] (AVRO-467) CMake: Complete CMake build system and remove autotools build system

2011-10-17 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129209#comment-13129209
 ] 

Doug Cutting commented on AVRO-467:
---

lang/c/build.sh seems to work correctly for me after applying the patch.

cmake is already required by the C++ build and is listed in the top-level 
BUILD.txt as a requirement.  We should probably add it to the C requirements 
there too.  I have traditionally used Ubuntu package names in that file, since 
that's what I install.

 CMake: Complete CMake build system and remove autotools build system
 

 Key: AVRO-467
 URL: https://issues.apache.org/jira/browse/AVRO-467
 Project: Avro
  Issue Type: Improvement
  Components: c
Affects Versions: 1.3.0
Reporter: Bruce Mitchener
Assignee: Bruce Mitchener
  Labels: cmake
 Attachments: 0001-AVRO-467.-C-Switch-from-autotools-to-CMake.patch, 
 0001-AVRO-467.-C-Switch-from-autotools-to-CMake.patch


 Placeholder bug to serve as a parent for all of the various remaining tasks 
 for the CMake build system.


[jira] [Commented] (AVRO-570) python implementation of mapreduce connector

2011-10-13 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126715#comment-13126715
 ] 

Doug Cutting commented on AVRO-570:
---

I'm hoping to release 1.6.0 in the next week or two.  As long as this doesn't 
make any incompatible changes it could go into 1.6.1 which will likely follow 
in a month or so.  If it makes incompatible changes and doesn't make 1.6.0 then 
it wouldn't go out until 1.7.0, probably sometime in the first half of 2012.

 python implementation of mapreduce connector
 

 Key: AVRO-570
 URL: https://issues.apache.org/jira/browse/AVRO-570
 Project: Avro
  Issue Type: New Feature
  Components: python
Affects Versions: 1.6.0
Reporter: Doug Cutting
Assignee: Jeremy Lewi
Priority: Critical
  Labels: hadoop
 Fix For: 1.6.0

 Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, 
 AVRO-570.patch


 AVRO-512 defines protocols for implementing mapreduce tasks.  It would be 
 good to have a Python implementation of this.


[jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration

2011-10-12 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125964#comment-13125964
]

Doug Cutting commented on AVRO-923:
---

> It seems to me this risk is already taken for other parameters such as
> avro.mapper. For the case of schemas, though, there is a second check that
> occurs when the input file schema does not match the compiled schema.

I was less concerned about the input schema than about the map output
schema. If different tasks somehow got a different map output schema, it would
result in strange, hard-to-debug I/O exceptions. We require that the map output
schema be constant across all tasks in a job for things to work correctly. Of
course, it's not always possible to prevent folks from creating erroneous
situations; we should try to discourage that without overly limiting
functionality in the process.

> It can also be described with xml files

What I meant was that the XML files can be programmatically constructed. They
should ideally not be constructed with cut and paste, but should use the same
source for schemas as the Java code that's getting re-generated to build the
new version of the jar file. Perhaps you can refer to the schemas with an
external entity definition in the XML that fetches the appropriate version?

{code}
<!DOCTYPE job [
<!ENTITY schemaX SYSTEM "http://svn.foo.com/project/trunk/schemas/x.avsc">
]>
<job>
... &schemaX; ...
</job>
{code}
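As a minimal sketch of constructing the job XML programmatically, so the schema text comes from the same source the build uses instead of cut and paste: it assumes Hadoop's <configuration>/<property>/<name>/<value> file layout and the avro.input.schema key. The JobXmlWriter class and its methods are hypothetical, and in practice the schema string would be read from the project's .avsc file rather than written inline.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that writes Hadoop-style job XML from in-memory values. */
class JobXmlWriter {

  /** Builds configuration/property/name/value elements for each key-value pair. */
  static String toJobXml(String[][] properties) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element conf = doc.createElement("configuration");
      doc.appendChild(conf);
      for (String[] kv : properties) {
        Element prop = doc.createElement("property");
        Element name = doc.createElement("name");
        name.setTextContent(kv[0]);
        Element value = doc.createElement("value");
        value.setTextContent(kv[1]); // the serializer escapes markup characters
        prop.appendChild(name);
        prop.appendChild(value);
        conf.appendChild(prop);
      }
      StringWriter out = new StringWriter();
      TransformerFactory.newInstance().newTransformer()
          .transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("could not build job XML", e);
    }
  }

  /** Parses the XML back and returns the value for a key, or null if absent. */
  static String readValue(String xml, String key) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList props = doc.getElementsByTagName("property");
      for (int i = 0; i < props.getLength(); i++) {
        Element p = (Element) props.item(i);
        if (key.equals(p.getElementsByTagName("name").item(0).getTextContent())) {
          return p.getElementsByTagName("value").item(0).getTextContent();
        }
      }
      return null;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse job XML", e);
    }
  }
}
```

Because the DOM serializer escapes the schema text, the JSON never needs to be quoted by hand, and regenerating the file whenever the .avsc changes keeps it in step with the compiled classes.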

Avro-MapRed: Provide a fallback using avro beans instead of schema in job
configuration
---

Key: AVRO-923
URL: https://issues.apache.org/jira/browse/AVRO-923
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.5.4
Environment: any
Reporter: Julien Muller
Fix For: 1.6.0

Original Estimate: 2h
Remaining Estimate: 2h

The current implementation of Avro MapRed is designed to use JobConf. While
it is possible to use a job.xml file, it is pretty painful since you have to
copy/paste all the schemas for input and output. This is error-prone and
time-consuming. Also, any update to a bean requires re-copying/re-pasting the
schema (with JobConf, a simple recompile would be enough).
A proposal to improve this while staying backward compatible would be to
introduce new keys in AvroJob that reference the actual Avro bean used. This
can be implemented as a fallback.
New keys would be created:
- avro.input.class (fallback for avro.input.schema)
- avro.map.output.class (fallback for avro.map.output.schema)
- avro.output.class (fallback for avro.output.schema)
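Only 3 methods would be impacted in AvroJob:
- getInputSchema(Configuration job), implementing a fallback like:
{code}
public static Schema getInputSchema(Configuration job) {
  String s = job.get(INPUT_SCHEMA);
  if (s != null) return Schema.parse(s);
  // Fallback: read the static SCHEMA$ field (a Schema, not a String) of the
  // generated class named by the proposed INPUT_CLASS key
  try {
    return (Schema) Class.forName(job.get(INPUT_CLASS))
        .getDeclaredField("SCHEMA$").get(null);
  } catch (Exception e) {
    throw new RuntimeException("Cannot load schema via " + INPUT_CLASS, e);
  }
}
{code}
- getMapOutputSchema()
- getOutputSchema()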
Also, it would be more consistent to add new setters. These are not mandatory,
since in this use case the new keys are set directly in the job rather than
through AvroJob.

[jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration

2011-10-11 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125213#comment-13125213
]

Doug Cutting commented on AVRO-923:
---

It's slightly riskier to get the schema from the runtime than from the job, in
particular the map output schema. If different versions of code are somehow
run on different nodes, then different map output schemas could be used, which
would create havoc, since the schema does not travel with the map output data.
When the schema is in the job.xml, there's very little chance of a lack of
coordination, since the framework distributes the same job.xml to every task.
If the schema comes from the runtime, there's some chance that different
versions of classes could be installed on different nodes.

Another concern is that not all schemas have a class that defines them. For
example, one might have jobs whose inputs or outputs are bytes or string or
Pair<string,bytes>, etc.

These are the reasons that schema-in-job.xml is the required and preferred
means of specification. However there may be cases where it's preferable to
additionally support specification of schemas via a specific class, as
suggested in this issue.

A JobConf can be programmatically constructed. Why is it so painful to insert
the schema there as a part of your job creation/submission pipeline? I'd like
to better understand why that's so difficult before we add a new mechanism,
since any added mechanism has the potential to create bugs and user confusion.

Avro-MapRed: Provide a fallback using avro beans instead of schema in job
configuration
---

Original Estimate: 2h
Remaining Estimate: 2h

[jira] [Commented] (AVRO-878) TestWordCount.testProjection is broken

2011-10-11 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125216#comment-13125216
 ] 

Doug Cutting commented on AVRO-878:
---

So should we resolve this as "Not a Problem", or should we keep it open as a 
problem when running under Java 7?  Is Java 7 a platform that Avro needs to 
support at this point?

 TestWordCount.testProjection is broken
 --

 Key: AVRO-878
 URL: https://issues.apache.org/jira/browse/AVRO-878
 Project: Avro
  Issue Type: Test
Affects Versions: 1.6.0
Reporter: Jeremy Lewi
Assignee: Jeremy Lewi
 Fix For: 1.6.0

 Attachments: AVRO-878.patch, 
 TEST-org.apache.avro.mapred.TestWordCount.xml, 
 TEST-org.apache.avro.mapred.TestWordCount.xml, lewi-ipc-reports.tar.gz

   Original Estimate: 1h
  Remaining Estimate: 1h

 TestWordCount.testProjection in avro/mapred/TestWordCount.java is broken. It
 appears to be using the wrong schema to read the output of the map reduce job.


[jira] [Commented] (AVRO-803) Java generated Avro classes make using Avro painful and surprising

2011-10-10 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124299#comment-13124299
]

Doug Cutting commented on AVRO-803:
---

I'd prefer not to break back-compatibility this time. It makes it impossible
for folks to upgrade one project without making source code changes to other
projects. If you specify <stringType>String</stringType> in your pom.xml, then
all your Map keys become java.lang.String.

Java generated Avro classes make using Avro painful and surprising
--

Key: AVRO-803
URL: https://issues.apache.org/jira/browse/AVRO-803
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.5.0
Environment: Any
Reporter: Sam Pullara
Assignee: Doug Cutting
Fix For: 1.6.0

Attachments: AVRO-803.patch, AVRO-803.patch, Foo.java

Currently the Avro generated Java classes expose CharSequence in their API.
However, you cannot use any old CharSequence when interacting with them. In
fact, you have to use the Utf8 class if you want to get consistent results. I
think that Avro should work with any CharSequence if that is the API. Here is
an example where this happens:
https://github.com/spullara/avro-generated-code/blob/master/src/test/java/AnnoyingTest.java
That prints out 'false' three times unexpectedly. If you can't get it to
print 'true' three times then you should probably change it back to Utf8.
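The surprise above comes from equals() being defined only within each CharSequence implementation. A small sketch, using StringBuilder as a stand-in for Avro's Utf8 (an assumption for illustration: both are CharSequences that never compare equal to a String), shows the asymmetry and a content-based comparison that works across implementations. The CharSequences helper is hypothetical; the JDK's String.contentEquals(CharSequence) performs the same check when one side is a String.

```java
/** Hypothetical helper: compares CharSequences by content, whatever their class. */
class CharSequences {

  /** True when both sequences hold the same chars in the same order. */
  static boolean contentEquals(CharSequence a, CharSequence b) {
    if (a.length() != b.length()) return false;
    for (int i = 0; i < a.length(); i++) {
      if (a.charAt(i) != b.charAt(i)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    CharSequence fromString = "foo";
    CharSequence fromBuilder = new StringBuilder("foo"); // stand-in for Utf8
    System.out.println(fromString.equals(fromBuilder)); // false: String.equals only accepts Strings
    System.out.println(fromBuilder.equals(fromString)); // false: StringBuilder uses identity
    System.out.println(contentEquals(fromString, fromBuilder)); // true
  }
}
```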

[jira] [Commented] (AVRO-897) Map lookup behavior is ill-defined in Java

2011-10-07 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123118#comment-13123118
 ] 

Doug Cutting commented on AVRO-897:
---

This is addressed by the patch for AVRO-803, described in the comment at 
http://s.apache.org/VJC.

GenericDatumReader will now use java.lang.String everywhere when string schemas 
are annotated with "avro.java.string":"String".  There's a GenericData method 
to add this annotation.  This is perhaps not ideal but it is back-compatible 
which is important.

Can we close this issue as a duplicate of AVRO-803?

 Map lookup behavior is ill-defined in Java
 --

 Key: AVRO-897
 URL: https://issues.apache.org/jira/browse/AVRO-897
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.5.1
Reporter: Garrett Wu
 Attachments: avro-charsequence-map-test.tar.gz


 In Java, an Avro {{map}} is a Java {{Map}}.  The map keys are type 
 {{string}}, which maps to a Java {{CharSequence}}.
 Clients must know to use {{Utf8}} objects when calling {{get()}} or 
 {{containsKey()}}.  Instead, {{GenericDatumReader}} should instantiate a 
 {{Map}} instance with a {{Comparator}} suitable for comparing any type of 
 {{CharSequence}}.
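A sketch of the Comparator-based map proposed above, assuming a sorted TreeMap is acceptable for map values (lookups become O(log n) rather than hashed). StringBuilder again stands in for Utf8, and CharSequenceMaps and BY_CONTENT are hypothetical names. One caveat: keys must not be mutated after insertion, so a reused Utf8 instance would need to be copied before being stored as a key.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of maps keyed by CharSequence content rather than by class. */
class CharSequenceMaps {

  /** Orders CharSequences char by char, then by length, regardless of implementation. */
  static final Comparator<CharSequence> BY_CONTENT = (a, b) -> {
    int n = Math.min(a.length(), b.length());
    for (int i = 0; i < n; i++) {
      int c = Character.compare(a.charAt(i), b.charAt(i));
      if (c != 0) return c;
    }
    return Integer.compare(a.length(), b.length());
  };

  /** A map whose get() and containsKey() accept any CharSequence. */
  static <V> Map<CharSequence, V> newMap() {
    return new TreeMap<>(BY_CONTENT);
  }

  public static void main(String[] args) {
    Map<CharSequence, Integer> hashed = new HashMap<>();
    hashed.put(new StringBuilder("key"), 1); // stand-in for a Utf8 key
    System.out.println(hashed.containsKey("key")); // false

    Map<CharSequence, Integer> ordered = newMap();
    ordered.put(new StringBuilder("key"), 1);
    System.out.println(ordered.containsKey("key")); // true
  }
}
```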


[jira] [Commented] (AVRO-883) Avro should come with a simple Java example

2011-10-07 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123183#comment-13123183
 ] 

Doug Cutting commented on AVRO-883:
---

Perhaps this should go into lang/java/archetypes/avro-data-archetype?  If you 
agree, would you be willing to convert this into a patch that goes there?  
Thanks!

 Avro should come with a simple Java example
 ---

 Key: AVRO-883
 URL: https://issues.apache.org/jira/browse/AVRO-883
 Project: Avro
  Issue Type: Task
  Components: java
Reporter: William McNeill
 Fix For: 1.6.0


 The Avro distribution should have a simple Java example of how to serialize 
 and deserialize data. As discussed on the mailing list, the example at 
 https://github.com/wpm/AvroExample can serve as a basis.


[jira] [Commented] (AVRO-912) Mapreduce tether test fails on Windows

2011-10-06 Thread Doug Cutting (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122121#comment-13122121
 ] 

Doug Cutting commented on AVRO-912:
---

+1 Changes look good to me and tests pass on Linux.

 Mapreduce tether test fails on Windows
 --

 Key: AVRO-912
 URL: https://issues.apache.org/jira/browse/AVRO-912
 Project: Avro
  Issue Type: Bug
Reporter: Thiruvalluvan M. G.
Assignee: Thiruvalluvan M. G.
 Attachments: AVRO-912.patch


 The problems are:
 1. The executable filename is passed around as a URL. Windows filenames are 
 valid URLs.
 2. A typical Windows user's home directory is {{c:\Documents and 
 Settings\username}}. Maven puts the downloaded jar files under {{$HOME/.m2}}. 
 So the classpath has several directories with spaces in their names. Splitting 
 command line arguments on spaces generates an invalid classpath.
 3. Hadoop's {{TaskLog.captureOutAndError()}} generates a command line for Unix 
 systems using bash. 
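Problem 2 can be avoided by never flattening the command into one string. A hypothetical sketch (TetherCommand is an illustrative name, not the AVRO-912 patch) contrasts splitting on spaces with building the argument list directly, which ProcessBuilder accepts as-is, leaving any platform-specific quoting to the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the AVRO-912 patch itself. */
class TetherCommand {

  /** Fragile: a space inside any argument (e.g. "Documents and Settings") splits it. */
  static List<String> naiveSplit(String commandLine) {
    return Arrays.asList(commandLine.split(" "));
  }

  /** Robust: each argument stays a single list element from start to finish. */
  static List<String> asArgs(String java, String classpath, String mainClass) {
    List<String> cmd = new ArrayList<>();
    cmd.add(java);
    cmd.add("-classpath");
    cmd.add(classpath);
    cmd.add(mainClass);
    return cmd;
  }

  public static void main(String[] args) {
    String cp = "C:\\Documents and Settings\\user\\.m2\\repository\\avro.jar";
    System.out.println(naiveSplit("java -classpath " + cp + " Main").size()); // 6, not 4
    // ProcessBuilder keeps the list boundaries; no process is started here.
    ProcessBuilder pb = new ProcessBuilder(asArgs("java", cp, "Main"));
    System.out.println(pb.command().get(2).equals(cp)); // true
  }
}
```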


[jira] [Commented] (AVRO-803) Java generated Avro classes make using Avro painful and surprising

2011-10-06 Thread Doug Cutting (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/AVRO-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122154#comment-13122154
]

Doug Cutting commented on AVRO-803:
---

Here's a new proposal:
- add a new Decoder method, 'String readString()', implemented to avoid
allocating new intermediate byte arrays for each call, as is currently done when
Utf8s are not reused.
- change generated specific code to optionally use String everywhere instead
of CharSequence. (We could also add an option to emit Utf8 everywhere.) When
String is used, we add a property to the string schemas in the generated code so
they become {"type":"string", "java":"String"}.
- GenericData#readString() would call the new Decoder method when
"java":"String" is present in the string's schema.

This is totally back-compatible.
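The first bullet can be illustrated against Avro's binary encoding of a string: a zig-zag varint length followed by that many UTF-8 bytes. This is a hypothetical in-memory sketch, not Avro's BinaryDecoder (which streams its input): BufferDecoder decodes each string straight out of its input buffer with new String(buf, offset, length, UTF_8), so no intermediate byte array or Utf8 is allocated per call.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory decoder; not Avro's BinaryDecoder. */
class BufferDecoder {
  private final byte[] buf;
  private int pos;

  BufferDecoder(byte[] buf) {
    this.buf = buf;
  }

  /** Reads an Avro long: a variable-length, zig-zag encoded integer. */
  long readLong() {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      if (shift > 63) throw new IllegalArgumentException("malformed varint");
      b = buf[pos++] & 0xff;
      raw |= (long) (b & 0x7f) << shift; // low 7 bits carry data
      shift += 7;
    } while ((b & 0x80) != 0); // high bit set means more bytes follow
    return (raw >>> 1) ^ -(raw & 1); // undo the zig-zag mapping
  }

  /** Reads a string by decoding its UTF-8 bytes directly out of the input buffer. */
  String readString() {
    int len = (int) readLong();
    String s = new String(buf, pos, len, StandardCharsets.UTF_8);
    pos += len;
    return s;
  }

  public static void main(String[] args) {
    // 0x0A is zig-zag(5), followed by the five bytes of "hello"
    byte[] data = {0x0A, 'h', 'e', 'l', 'l', 'o'};
    System.out.println(new BufferDecoder(data).readString()); // hello
  }
}
```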

Java generated Avro classes make using Avro painful and surprising
--

Key: AVRO-803
URL: https://issues.apache.org/jira/browse/AVRO-803
Project: Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.5.0
Environment: Any
Reporter: Sam Pullara
Fix For: 1.6.0

Attachments: Foo.java
