[jira] [Updated] (AVRO-1299) SpecificRecordBase implements GenericRecord

2013-04-16 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated AVRO-1299:
---

Attachment: AVRO-1299.20130416-25.patch

This patch updates SpecificRecordBase to implement GenericRecord.
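The sketch below shows what this enables. It uses simplified stand-ins, not Avro's real classes:

- `GenericRecord` and `SpecificRecordBase` are hypothetical cut-down versions in which a list of field names replaces Avro's `Schema`.
- `User` plays the role of a generated specific class.
- The base class derives by-name access from positional access, so `describe`, a hypothetical helper written only against `GenericRecord`, accepts the specific record unchanged.

```java
import java.util.Arrays;
import java.util.List;

public class SpecificAsGeneric {

  // Simplified stand-in for Avro's GenericRecord: positional and by-name access.
  interface GenericRecord {
    List<String> fieldNames(); // stands in for the record's schema fields
    Object get(int pos);
    void put(int pos, Object value);
    Object get(String name);
    void put(String name, Object value);
  }

  // Simplified stand-in for SpecificRecordBase: subclasses implement only
  // positional access; by-name access is derived from the field position.
  abstract static class SpecificRecordBase implements GenericRecord {
    private int pos(String name) {
      int p = fieldNames().indexOf(name);
      if (p < 0) throw new IllegalArgumentException("Not a field: " + name);
      return p;
    }
    @Override public Object get(String name) { return get(pos(name)); }
    @Override public void put(String name, Object value) { put(pos(name), value); }
  }

  // Hypothetical "generated" specific record with two fields.
  static class User extends SpecificRecordBase {
    private String name;
    private int age;

    User(String name, int age) { this.name = name; this.age = age; }

    @Override public List<String> fieldNames() { return Arrays.asList("name", "age"); }

    @Override public Object get(int pos) {
      switch (pos) {
        case 0: return name;
        case 1: return age;
        default: throw new IndexOutOfBoundsException("Bad field index: " + pos);
      }
    }

    @Override public void put(int pos, Object value) {
      switch (pos) {
        case 0: name = (String) value; break;
        case 1: age = (Integer) value; break;
        default: throw new IndexOutOfBoundsException("Bad field index: " + pos);
      }
    }
  }

  // Generic code: written only against GenericRecord, so it also accepts User.
  static String describe(GenericRecord r) {
    StringBuilder sb = new StringBuilder();
    for (String f : r.fieldNames()) {
      if (sb.length() > 0) sb.append(", ");
      sb.append(f).append('=').append(r.get(f));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    User u = new User("ada", 36);
    u.put("age", 37);                // by-name write routed to put(int, Object)
    System.out.println(describe(u)); // name=ada, age=37
  }
}
```

In Avro itself the name-to-position lookup would presumably go through the record's schema. The sketch uses `List.indexOf` only to stay self-contained.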

> SpecificRecordBase implements GenericRecord
> ---
>
> Key: AVRO-1299
> URL: https://issues.apache.org/jira/browse/AVRO-1299
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Christophe Taton
>Priority: Minor
> Fix For: 1.7.5
>
> Attachments: AVRO-1299.20130416-25.patch
>
>
> Code written for generic records should be directly applicable to equivalent 
> specific records.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (AVRO-1299) SpecificRecordBase implements GenericRecord

2013-04-16 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated AVRO-1299:
---

Fix Version/s: 1.7.5
   Status: Patch Available  (was: Open)

> SpecificRecordBase implements GenericRecord
> ---
>
> Key: AVRO-1299
> URL: https://issues.apache.org/jira/browse/AVRO-1299
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Christophe Taton
>Priority: Minor
> Fix For: 1.7.5
>
>
> Code written for generic records should be directly applicable to equivalent 
> specific records.



[jira] [Created] (AVRO-1299) SpecificRecordBase implements GenericRecord

2013-04-16 Thread Christophe Taton (JIRA)
Christophe Taton created AVRO-1299:
--

 Summary: SpecificRecordBase implements GenericRecord
 Key: AVRO-1299
 URL: https://issues.apache.org/jira/browse/AVRO-1299
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Christophe Taton
Priority: Minor


Code written for generic records should be directly applicable to equivalent 
specific records.



[jira] [Commented] (AVRO-1295) null namespace within non-null is not print/parse consistent

2013-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633775#comment-13633775
 ] 

Hudson commented on AVRO-1295:
--

Integrated in AvroJava #362 (See [https://builds.apache.org/job/AvroJava/362/])
AVRO-1295. Java: Fix printing of a non-null namespace within a null 
namespace. (Revision 1468677)

 Result = SUCCESS
cutting : 
Files : 
* /avro/trunk/CHANGES.txt
* /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java
* /avro/trunk/lang/java/ipc/src/test/java/org/apache/avro/TestSchema.java


> null namespace within non-null is not print/parse consistent
> 
>
> Key: AVRO-1295
> URL: https://issues.apache.org/jira/browse/AVRO-1295
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Doug Cutting
>Assignee: Doug Cutting
> Fix For: 1.7.5
>
> Attachments: AVRO-1295.patch
>
>
> If a record with a null namespace is nested within a record with a non-null 
> namespace then, when the outer schema is printed and re-parsed, the inner 
> schema's namespace becomes the outer one's, rather than null as it should be.
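The rule behind this bug can be sketched with plain strings. Under Avro's naming rules, a simple name with no `namespace` attribute inherits the enclosing namespace. An empty-string namespace stands for the null namespace. A printer that simply omits a null namespace therefore loses it on re-parse.

The helpers below (`namespaceToWrite`, `resolveNamespace`) are hypothetical. They model only that rule, for names without dots, and are not the code in the committed patch.

```java
public class NullNamespaceRoundTrip {

  // Printer side: the "namespace" value to emit for a named schema, given the
  // namespace in effect in the enclosing schema. Returns null to mean "omit".
  static String namespaceToWrite(String space, String enclosing) {
    if (space != null) {
      return space.equals(enclosing) ? null : space; // omit when it would be inherited anyway
    }
    // Null namespace nested in a non-null one: without an explicit empty
    // string, a parser would make the inner schema inherit `enclosing`.
    return enclosing != null ? "" : null;
  }

  // Parser side: the namespace a schema ends up with, given the attribute as
  // written (null = absent) and the enclosing namespace.
  static String resolveNamespace(String written, String enclosing) {
    if (written == null) return enclosing;     // absent: inherit
    return written.isEmpty() ? null : written; // "" denotes the null namespace
  }

  public static void main(String[] args) {
    String outer = "com.example"; // hypothetical outer namespace
    String inner = null;          // inner record deliberately has no namespace
    String printed = namespaceToWrite(inner, outer);
    String reparsed = resolveNamespace(printed, outer);
    System.out.println("printed=\"" + printed + "\" reparsed=" + reparsed);
  }
}
```

Omitting the attribute instead of writing `""` would make `resolveNamespace` return `com.example`, which is the inconsistency described in this issue.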



[jira] [Updated] (AVRO-1295) null namespace within non-null is not print/parse consistent

2013-04-16 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated AVRO-1295:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed this.

> null namespace within non-null is not print/parse consistent
> 
>
> Key: AVRO-1295
> URL: https://issues.apache.org/jira/browse/AVRO-1295
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Doug Cutting
>Assignee: Doug Cutting
> Fix For: 1.7.5
>
> Attachments: AVRO-1295.patch
>
>
> If a record with a null namespace is nested within a record with a non-null 
> namespace then, when the outer schema is printed and re-parsed, the inner 
> schema's namespace becomes the outer one's, rather than null as it should be.



[jira] [Updated] (AVRO-867) Allow tools to read files via hadoop FileSystem class

2013-04-16 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated AVRO-867:
--

Attachment: AVRO-867.patch

This looks like a good contribution!

Here's a version of the patch with the following minor modifications:
 - diff is from root, rather than lang/java/tools
 - removed some spurious whitespace changes

I'll commit this soon unless someone objects.

> Allow tools to read files via hadoop FileSystem class
> -
>
> Key: AVRO-867
> URL: https://issues.apache.org/jira/browse/AVRO-867
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.5
>Reporter: Joe Crobak
>Assignee: Joe Crobak
> Attachments: addedHadoopFileSupport.diff, AVRO-867.patch
>
>
> It would be great if I could use the various tools to read/parse files that 
> are in HDFS, S3, etc., via the 
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
>  API. We could retain backwards compatibility by assuming that unqualified 
> URLs are "file://" but allow reading of files from fully qualified URLs such 
> as hdfs://. The required APIs are already part of the avro-tools uber jar to 
> support the TetherTool.



[jira] [Commented] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-16 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633308#comment-13633308
 ] 

Jeremy Kahn commented on AVRO-1296:
---

Commenting to nudge this issue.

Can somebody review these Python patches?  It's a small change but it fixes a 
fairly serious obstacle to using Avro files as a Java/Python interlingua for  
on-disk storage.

> Python: schemas retrieved from protocol types ignore namespace
> --
>
> Key: AVRO-1296
> URL: https://issues.apache.org/jira/browse/AVRO-1296
> Project: Avro
>  Issue Type: Bug
>  Components: python
>Affects Versions: 1.7.4
>Reporter: Jeremy Kahn
>Assignee: Jeremy Kahn
> Fix For: 1.7.5
>
> Attachments: AVRO-1296a.patch, AVRO-1296b.patch
>
>
> If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
> {{"namespace": "ns"}}, and then retrieve a child schema {{s}} from the 
> protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
> have its namespace set (to {{ns}}), even if {{p}} has a namespace.
> This is particularly problematic if I'm using {{s}} to write out an Avro file 
> intended to be read by a specific-type reader, because the file header will 
> claim that it contains objects of type {{s}} rather than the expected {{ns.s}}.
> I've attached two patches. The first makes sure that the {{namespace}} 
> property of protocol types is set to the default namespace of the protocol 
> when not otherwise set.
> The second ensures that the {{namespace}} is *not* rendered into JSON 
> when the protocol's default namespace already specifies the right value.
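The name resolution these patches aim for is sketched below in Java; the patches themselves are Python.

- `fullName` applies the usual Avro fallback: a dotted name is already a full name; otherwise the type's own namespace is used, else the protocol's.
- `mustWriteNamespace` models the second patch's rule of omitting a namespace that equals the protocol default.

Both helpers are hypothetical illustrations, not the `avro.protocol` code.

```java
import java.util.Objects;

public class ProtocolTypeNames {

  // Full name of a named type declared inside a protocol. `typeNamespace` is
  // the type's own "namespace" attribute, or null if it has none.
  static String fullName(String name, String typeNamespace, String protocolNamespace) {
    if (name.indexOf('.') >= 0) {
      return name; // a dotted name is already a full name
    }
    String space = (typeNamespace != null) ? typeNamespace : protocolNamespace;
    return (space == null || space.isEmpty()) ? name : space + "." + name;
  }

  // When rendering a type inside the protocol's JSON, its namespace can be
  // omitted only if it equals the protocol's default namespace.
  static boolean mustWriteNamespace(String typeNamespace, String protocolNamespace) {
    return !Objects.equals(typeNamespace, protocolNamespace);
  }

  public static void main(String[] args) {
    System.out.println(fullName("s", null, "ns"));      // ns.s
    System.out.println(fullName("s", "other", "ns"));   // other.s
    System.out.println(mustWriteNamespace("ns", "ns")); // false
  }
}
```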



[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-16 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633048#comment-13633048
 ] 

Doug Cutting commented on AVRO-1282:


I don't think we should add an 'accessor' field to the Schema, nor should 
GenericDatumReader have reflect logic in it.  Rather I think we can accomplish 
this by implementing ReflectData#getRecordState(Object,Schema) to return an 
array of accessors from a cache keyed by the schema.  Then 
ReflectData#getField(Object r, String f, int p, Object state) can find the 
accessor as state[p].  Does that make sense?
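The suggested design can be sketched as follows, under these simplifications:

- `RecordSchema` is a hypothetical stand-in for `Schema`: a class plus field names in position order.
- `FieldAccessor` is a hypothetical interface. The accessor shown uses `java.lang.reflect.Field`; an Unsafe-based accessor could implement the same interface.
- The two static methods only mirror the roles of `getRecordState` and `getField`. Their signatures are simplified, and this is not `ReflectData` code.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AccessorCacheSketch {

  // Hypothetical accessor abstraction; an Unsafe-based accessor could implement it too.
  interface FieldAccessor {
    Object get(Object record);
  }

  // Reflection-backed accessor, i.e. the Field.get baseline.
  static final class ReflectAccessor implements FieldAccessor {
    private final Field field;

    ReflectAccessor(Field field) {
      field.setAccessible(true);
      this.field = field;
    }

    @Override
    public Object get(Object record) {
      try {
        return field.get(record);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  // Hypothetical stand-in for a Schema: record class plus field names in
  // position order. Default identity equality is enough for a cache key.
  static final class RecordSchema {
    final Class<?> type;
    final String[] fieldNames;

    RecordSchema(Class<?> type, String... fieldNames) {
      this.type = type;
      this.fieldNames = fieldNames;
    }
  }

  private static final ConcurrentMap<RecordSchema, FieldAccessor[]> ACCESSORS =
      new ConcurrentHashMap<>();

  // Plays the role of getRecordState(Object, Schema): the "state" is the
  // accessor array for the schema, built on first use and cached afterwards.
  static Object getRecordState(Object record, RecordSchema schema) {
    return ACCESSORS.computeIfAbsent(schema, s -> {
      FieldAccessor[] accessors = new FieldAccessor[s.fieldNames.length];
      for (int i = 0; i < accessors.length; i++) {
        try {
          accessors[i] = new ReflectAccessor(s.type.getDeclaredField(s.fieldNames[i]));
        } catch (NoSuchFieldException e) {
          throw new IllegalArgumentException("No field " + s.fieldNames[i], e);
        }
      }
      return accessors;
    });
  }

  // Plays the role of getField(Object r, String f, int p, Object state):
  // the accessor is simply state[p]; the name is not needed for the lookup.
  static Object getField(Object record, String name, int pos, Object state) {
    return ((FieldAccessor[]) state)[pos].get(record);
  }

  static class Point {
    int x = 3;
    int y = 4;
  }

  public static void main(String[] args) {
    Point p = new Point();
    RecordSchema schema = new RecordSchema(Point.class, "x", "y");
    Object state = getRecordState(p, schema);
    System.out.println(getField(p, "x", 0, state) + "," + getField(p, "y", 1, state));
  }
}
```

The per-schema work (finding fields and building accessors) happens once. Each `getField` call is then an array index plus the accessor's own read, and nothing is added to `Schema` or to `GenericDatumReader`.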

> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> ---
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Leo Romanoff
>Priority: Minor
> Attachments: avro-1282-v1.patch, avro-1282-v2.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a JDK 
> implementation properly supports sun.misc.Unsafe. Most JDKs running on PCs 
> support it. Some platforms, such as Android, do not yet properly support Unsafe.
> There are three ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is much faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.



[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-16 Thread Leo Romanoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Romanoff updated AVRO-1282:
---

Attachment: avro-1282-v2.patch

Minor fixes for problems with UNIONs and GenericData.Record-based structs

> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> ---
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Leo Romanoff
>Priority: Minor
> Attachments: avro-1282-v1.patch, avro-1282-v2.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a JDK 
> implementation properly supports sun.misc.Unsafe. Most JDKs running on PCs 
> support it. Some platforms, such as Android, do not yet properly support Unsafe.
> There are three ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is much faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.



[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-16 Thread Leo Romanoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Romanoff updated AVRO-1282:
---

Status: Patch Available  (was: Open)

I just submitted a patch that implements Unsafe-based 
serialization/deserialization for the reflection-based (de)serializers. It 
also includes a performance test, which you can run using

 mvn exec:java -Dexec.mainClass="org.apache.avro.io.Perf" -Dexec.classpathScope=test -Dexec.args="-REFr"

or

 mvn exec:java -Dexec.mainClass="org.apache.avro.io.Perf" -Dexec.classpathScope=test -Dexec.args="-reflect"

On my machine I get a 5x performance boost using the Unsafe approach.
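The patch itself is not reproduced here. As a rough illustration of direct field access (the first technique in the description), the sketch below computes field offsets once with `objectFieldOffset`, then reads and writes through `getInt`/`putInt`.

Some caveats:

- `sun.misc.Unsafe` is an unsupported internal API.
- The instance is obtained through the reflective `theUnsafe` idiom, which is not a public contract.
- Recent JDKs have deprecated these memory-access methods for removal and may print warnings when they are used.

The `Point` class is hypothetical, and no performance figures are implied.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeFieldAccess {

  private static final Unsafe UNSAFE = loadUnsafe();

  // Unsafe has no public factory for application code; the common
  // (unsupported) idiom is to read its private static "theUnsafe" field.
  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  static class Point {
    int x;
    int y;
  }

  // Field offsets are looked up once and reused for every object.
  private static final long X_OFFSET;
  private static final long Y_OFFSET;

  static {
    try {
      X_OFFSET = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("x"));
      Y_OFFSET = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("y"));
    } catch (NoSuchFieldException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // Reads both fields by offset, with no Field.get call and no boxing.
  static int sumWithUnsafe(Point p) {
    return UNSAFE.getInt(p, X_OFFSET) + UNSAFE.getInt(p, Y_OFFSET);
  }

  // Writes a field by offset.
  static void setXWithUnsafe(Point p, int value) {
    UNSAFE.putInt(p, X_OFFSET, value);
  }

  public static void main(String[] args) {
    Point p = new Point();
    p.x = 3;
    p.y = 4;
    setXWithUnsafe(p, 10);
    System.out.println(p.x + " " + sumWithUnsafe(p)); // 10 14
  }
}
```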

> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> ---
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Leo Romanoff
>Priority: Minor
> Attachments: avro-1282-v1.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a JDK 
> implementation properly supports sun.misc.Unsafe. Most JDKs running on PCs 
> support it. Some platforms, such as Android, do not yet properly support Unsafe.
> There are three ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is much faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.



[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-16 Thread Leo Romanoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Romanoff updated AVRO-1282:
---

Attachment: avro-1282-v1.patch

Unsafe serialization patch

> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> ---
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4
>Reporter: Leo Romanoff
>Priority: Minor
> Attachments: avro-1282-v1.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a JDK 
> implementation properly supports sun.misc.Unsafe. Most JDKs running on PCs 
> support it. Some platforms, such as Android, do not yet properly support Unsafe.
> There are three ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is much faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.



[jira] [Updated] (AVRO-867) Allow tools to read files via hadoop FileSystem class

2013-04-16 Thread Vincenz Priesnitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincenz Priesnitz updated AVRO-867:
---

Attachment: addedHadoopFileSupport.diff

> Allow tools to read files via hadoop FileSystem class
> -
>
> Key: AVRO-867
> URL: https://issues.apache.org/jira/browse/AVRO-867
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.5
>Reporter: Joe Crobak
>Assignee: Joe Crobak
> Attachments: addedHadoopFileSupport.diff
>
>
> It would be great if I could use the various tools to read/parse files that 
> are in HDFS, S3, etc., via the 
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
>  API. We could retain backwards compatibility by assuming that unqualified 
> URLs are "file://" but allow reading of files from fully qualified URLs such 
> as hdfs://. The required APIs are already part of the avro-tools uber jar to 
> support the TetherTool.



[jira] [Updated] (AVRO-867) Allow tools to read files via hadoop FileSystem class

2013-04-16 Thread Vincenz Priesnitz (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincenz Priesnitz updated AVRO-867:
---

Affects Version/s: 1.7.5
 Release Note: avro-tools can now access Hadoop supported filesystem 
when started via hadoop jar.
   Status: Patch Available  (was: Open)

Attached is a patch that changes the Utils class to use the Hadoop 
FileSystem class. More tools can now use any supported filesystem for their 
input or output files.

Without any configuration, the tools behave as before:
{noformat}
# reads from local file system by default
# supports relative paths
java -jar avro-tools-1.7.5.jar tojson ~/myDir/myData.avro
{noformat}

If invoked via hadoop jar, the tools support more filesystems. Different 
filesystems can be used in a single call. Furthermore, any default filesystem 
that might be specified in core-site.xml is respected.
{noformat}
# combines an ftp file and a local file and writes the result file
# combinedData.avro directly to the default hdfs server.
hadoop jar avro-tools-1.7.5.jar concat ftp://myFtpServer/data1.avro file:///home/user/data2.avro combinedData.avro
{noformat}
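The patch delegates resolution to Hadoop's `FileSystem` and the configured default filesystem. The following sketch, using only `java.net.URI`, illustrates the rule described above and in the issue:

- A fully qualified argument keeps its own scheme.
- An unqualified argument falls back to the default filesystem: `file://` under plain `java -jar`, or the `core-site.xml` default under `hadoop jar`.

`schemeFor` and the `hdfs://namenode:8020` address are hypothetical. The sketch assumes POSIX-style paths containing no characters that are illegal in URIs.

```java
import java.net.URI;

public class DefaultFsResolution {

  // Scheme of the filesystem an argument would be read from: its own scheme
  // if it is fully qualified, otherwise the scheme of the default filesystem.
  static String schemeFor(String arg, String defaultFs) {
    String scheme = URI.create(arg).getScheme();
    if (scheme != null) {
      return scheme; // e.g. hdfs, ftp, file
    }
    return URI.create(defaultFs).getScheme(); // unqualified: use the default
  }

  public static void main(String[] args) {
    // Hypothetical default filesystem, as core-site.xml might configure it.
    String defaultFs = "hdfs://namenode:8020";
    String[] inputs = {
      "ftp://myFtpServer/data1.avro", "file:///home/user/data2.avro", "combinedData.avro"
    };
    for (String in : inputs) {
      System.out.println(in + " -> " + schemeFor(in, defaultFs));
    }
  }
}
```

With these inputs the three arguments resolve to `ftp`, `file` and `hdfs`, matching the concat example above.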

Remote files can now be inspected more quickly, e.g.:
{noformat}
hadoop jar avro-tools-1.7.5.jar getschema Data_on_hdfs.avro
hadoop jar avro-tools-1.7.5.jar tojson ftp://server-address/Data_on_ftp.avro
{noformat}

The following tools now use Utils for accessing files: concat, fragtojson, 
fromjson, fromtext, getmeta, getschema, jsontofrag, recodec, tojson, totext.

> Allow tools to read files via hadoop FileSystem class
> -
>
> Key: AVRO-867
> URL: https://issues.apache.org/jira/browse/AVRO-867
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.5
>Reporter: Joe Crobak
>Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that 
> are in HDFS, S3, etc., via the 
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
>  API. We could retain backwards compatibility by assuming that unqualified 
> URLs are "file://" but allow reading of files from fully qualified URLs such 
> as hdfs://. The required APIs are already part of the avro-tools uber jar to 
> support the TetherTool.
