[jira] [Commented] (AVRO-1414) Compression with C++ DataFile

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850730#comment-13850730
 ] 

Doug Cutting commented on AVRO-1414:


This would be great to have.  We can test compatibility against other 
implementations by putting more compressed files in share/test/data and having 
unit tests that validate against those.

 Compression with C++ DataFile
 -

 Key: AVRO-1414
 URL: https://issues.apache.org/jira/browse/AVRO-1414
 Project: Avro
  Issue Type: Improvement
  Components: c++
Reporter: Daniel Russel

 There is no way to use compression with the C++ DataFileReader and C++ 
 DataFileWriter, from what I can tell. Adding compression of the written 
 blocks using Boost streams is relatively straightforward, and I can provide a 
 patch if people are interested. 
 However, there are a couple of caveats:
 - The Windows builds of Boost don't currently include zlib support (required 
 for compression) by default. You have to do extra work to get it.
 - I don't know if doing it that way is compatible with other Avro 
 implementations.
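On the compatibility caveat: the Avro specification defines the "deflate" codec as raw RFC 1951 data, with no zlib (RFC 1950) header or checksum, so output that carries the zlib wrapper would likely not be readable by other implementations. The following is a minimal Python sketch of the difference using only the standard library. It is not the proposed C++ patch, and the helper names avro_deflate and avro_inflate are hypothetical.

```python
import zlib


def avro_deflate(block: bytes) -> bytes:
    """Compress a block as raw DEFLATE (RFC 1951): no zlib header, no checksum."""
    # A negative wbits value asks zlib for a raw stream without the wrapper.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(block) + compressor.flush()


def avro_inflate(data: bytes) -> bytes:
    """Decompress a raw DEFLATE block produced by avro_deflate."""
    return zlib.decompress(data, -15)


block = b"avro " * 1000
raw = avro_deflate(block)        # raw DEFLATE, as the Avro "deflate" codec expects
wrapped = zlib.compress(block)   # zlib format: header + DEFLATE data + Adler-32

assert avro_inflate(raw) == block
assert raw != wrapped            # the two byte streams are not interchangeable
```

A compatibility test along the lines suggested in the comment would read a file written by one implementation with another; the sketch only shows why the choice of stream format matters.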



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (AVRO-1413) Provide static abstract getClassSchema on SpecificRecordBase (as per AVRO-1223)

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850737#comment-13850737
 ] 

Doug Cutting commented on AVRO-1413:


Adding public getRecordClass() and getSchema() methods to RecordBuilderBase 
seems reasonable to me.

Note that there is already the method SpecificData#getSchema(Type).  So if you 
have a class that you know is a specific record then you can use this to get 
its schema.

 Provide static abstract getClassSchema on SpecificRecordBase (as per 
 AVRO-1223)
 ---

 Key: AVRO-1413
 URL: https://issues.apache.org/jira/browse/AVRO-1413
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.5
Reporter: Tristan Stevens
Priority: Trivial
   Original Estimate: 10m
  Remaining Estimate: 10m

 AVRO-1223 added support for a static getClassSchema() to generated Record 
 classes. This should also be added to the abstract superclass 
 SpecificRecordBase and to the interface SpecificRecord so that the method can 
 be used on any generated Record.





[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850898#comment-13850898
 ] 

Doug Cutting commented on AVRO-1348:


Here are benchmarks run on my 64-bit Linux laptop.

Before:
{code}
java version 1.6.0_45
  test name     time    M entries/sec   M bytes/sec  bytes/cycle
 StringRead:   5495 ms       7.278       259.239       1780910
StringWrite:   5774 ms       6.927       246.725       1780910

java version 1.7.0_45
  test name     time    M entries/sec   M bytes/sec  bytes/cycle
 StringRead:   3541 ms      11.296       402.326       1780910
StringWrite:   4378 ms       9.136       325.421       1780910
{code}

After:
{code}
java version 1.6.0_45
  test name     time    M entries/sec   M bytes/sec  bytes/cycle
 StringRead:   4882 ms       8.193       291.829       1780910
StringWrite:   5844 ms       6.844       243.754       1780910

java version 1.7.0_45
  test name     time    M entries/sec   M bytes/sec  bytes/cycle
 StringRead:   3535 ms      11.315       403.020       1780910
StringWrite:   4136 ms       9.670       344.411       1780910
{code}

The speedup of the write benchmark under Java 7 is consistent and inexplicable.

This appears to achieve good performance in both Java 6 and 7.  I'll commit this 
unless someone objects.
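
For a quick comparison, the relative change in elapsed time can be computed from the timings reported above. This is a small Python sketch: the dictionary layout and the percent_faster helper are illustrative only, and the numbers are copied from the benchmark output in this comment.

```python
# Elapsed times in milliseconds (before, after), copied from the output above.
timings = {
    ("1.6.0_45", "StringRead"): (5495, 4882),
    ("1.6.0_45", "StringWrite"): (5774, 5844),
    ("1.7.0_45", "StringRead"): (3541, 3535),
    ("1.7.0_45", "StringWrite"): (4378, 4136),
}


def percent_faster(before_ms: int, after_ms: int) -> float:
    """Relative reduction in elapsed time; positive means the patched run was faster."""
    return 100.0 * (before_ms - after_ms) / before_ms


for (jvm, test), (before, after) in timings.items():
    print(f"{jvm} {test}: {percent_faster(before, after):+.1f}%")
```

These are single runs on one machine, so small differences in either direction may be noise.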

 Improve Utf8 to String conversion
 -

 Key: AVRO-1348
 URL: https://issues.apache.org/jira/browse/AVRO-1348
 Project: Avro
  Issue Type: Bug
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: AVRO-1348.patch, AVRO-1348v2.patch, AVRO1348v1.patch


 AVRO-1241 found that the existing method of creating Strings from Utf8 byte 
 arrays could be made faster. The same method is used in Utf8.toString(), 
 which could likely be sped up in the same way.





[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion

2013-12-17 Thread Rob Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850949#comment-13850949
 ] 

Rob Turner commented on AVRO-1348:
--

Here are the results on my old 32-bit Linux laptop. Doug's patch looks good!

||jdk||change||test name||time||M entries/sec||M bytes/sec||bytes/cycle||
|jdk1.6.0_45|UTF-8|StringRead:|29532 ms|1.354|48.243|1780910|
|jdk1.7.0_40|UTF-8|StringRead:|22897 ms|1.747|62.222|1780910|
|jdk1.6.0_45|Doug's Patch|StringRead:|29628 ms|1.350|48.086|1780910|
|jdk1.7.0_40|Doug's Patch|StringRead:|22902 ms|1.747|62.209|1780910|
|jdk1.6.0_45|Charset|StringRead:|37476 ms|1.067|38.016|1780910|
|jdk1.7.0_40|Charset|StringRead:|23376 ms|1.711|60.948|1780910|




[jira] [Commented] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851097#comment-13851097
 ] 

Doug Cutting commented on AVRO-1063:


I'll commit this soon unless there are objections.  Sean?

 Ruby client should use multi_json rather than being locked down to yajl
 ---

 Key: AVRO-1063
 URL: https://issues.apache.org/jira/browse/AVRO-1063
 Project: Avro
  Issue Type: Improvement
  Components: ruby
Reporter: Paul Dlug
Priority: Minor
 Fix For: 1.7.6

 Attachments: AVRO-1063.diff


 The avro ruby client uses yajl for JSON serialization which is just one of 
 many suitable JSON implementations for ruby. The multi_json gem provides a 
 wrapper for JSON serialization selecting the fastest library available (Oj is 
 now even faster than Yajl) and falling back to a pure ruby implementation 
 bundled with multi_json. Requiring yajl also precludes the ruby gem from 
 being used under jruby since it requires a C extension.
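
The selection-and-fallback behavior described above can be illustrated with a rough Python analogue. This is not how multi_json itself is implemented: the candidate list, the preference order, and the pick_json_backend helper are assumptions made for the example ("ujson" and "simplejson" are optional third-party packages; "json" ships with Python and plays the role of the bundled pure-Ruby fallback).

```python
import importlib

# Candidate backends in preference order; the last one is always available.
CANDIDATES = ("ujson", "simplejson", "json")


def pick_json_backend(candidates=CANDIDATES):
    """Return the first importable module from `candidates`."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # backend not installed, try the next one
    raise ImportError("no JSON backend available")


backend = pick_json_backend()
schema = backend.loads('{"type": "record", "name": "Person", "fields": []}')
assert schema["name"] == "Person"
```

Callers only see the common loads/dumps interface, which is what lets the library run where a C extension cannot be installed.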





[jira] [Commented] (AVRO-1405) Avro-c may not handle eof correctly if avro data file contains multiple sync markers

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851108#comment-13851108
 ] 

Doug Cutting commented on AVRO-1405:


I get a bunch of errors compiling the new test when I apply this.

{code}
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: In function ‘write_data’:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:89:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:89:5: note: use option -std=c99 or -std=gnu99 to compile your code
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: At top level:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:147:13: error: redefinition of ‘PERSON_SCHEMA’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:6:13: note: previous definition of ‘PERSON_SCHEMA’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:157:13: error: redefinition of ‘file’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:16:13: note: previous definition of ‘file’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:159:6: error: redefinition of ‘print_avro_value’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:18:6: note: previous definition of ‘print_avro_value’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:167:5: error: redefinition of ‘read_data’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:26:5: note: previous definition of ‘read_data’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:209:5: error: redefinition of ‘write_data’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:68:5: note: previous definition of ‘write_data’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: In function ‘write_data’:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:230:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: At top level:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:268:5: error: redefinition of ‘main’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:127:5: note: previous definition of ‘main’ was here
make[2]: *** [tests/CMakeFiles/test_avro_1405.dir/test_avro_1405.o] Error 1
{code}

 Avro-c may not handle eof correctly if avro data file contains multiple sync 
 markers
 

 Key: AVRO-1405
 URL: https://issues.apache.org/jira/browse/AVRO-1405
 Project: Avro
  Issue Type: Bug
  Components: c
Affects Versions: 1.7.5
Reporter: Mika Ristimaki
Assignee: Mika Ristimaki
Priority: Minor
 Fix For: 1.7.6

 Attachments: AVRO-1405.patch


 I encountered a bug in the Avro C API. If the following is done, the Avro 
 data file reader cannot read the file correctly:
 {code}
 while (has values to write) {
   Open file for writing
   Write a value to the file
   Close the writer
 }
 {code}
 Reading this file with the Avro data file reader fails with EOF after only the 
 first item has been read from the file.





[jira] [Commented] (AVRO-1379) C: Add avro_file_writer_append_encoded() to API

2013-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851110#comment-13851110
 ] 

Doug Cutting commented on AVRO-1379:


This looks reasonable to me.  Can you please add a test?  Thanks!

 C: Add avro_file_writer_append_encoded() to API
 ---

 Key: AVRO-1379
 URL: https://issues.apache.org/jira/browse/AVRO-1379
 Project: Avro
  Issue Type: Improvement
  Components: c
Affects Versions: 1.7.5
Reporter: Mark Teodoro
  Labels: patch
 Fix For: 1.7.6

 Attachments: AVRO-1379.patch


 Java's org.apache.avro.file.DataFileWriter.appendEncoded() allows you to 
 append a pre-encoded datum to a file.  This patch adds the same functionality 
 to Avro-C.
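
For context, a "pre-encoded datum" here means bytes that are already in Avro binary encoding. The Python sketch below hand-encodes a small record the way the Avro specification describes (zigzag variable-length longs, length-prefixed UTF-8 strings, record fields concatenated in schema order). The record layout and the helper names are made up for illustration, and the sketch does not call either the Java or the C API.

```python
def encode_long(n: int) -> bytes:
    """Avro binary encoding of an int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro binary encoding of a string: byte length as a long, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_person(name: str, age: int) -> bytes:
    """Hypothetical record {name: string, age: long}; fields in schema order."""
    return encode_string(name) + encode_long(age)


datum = encode_person("Ann", 30)
assert datum == b"\x06Ann\x3c"
```

Bytes like `datum` are what an append-encoded call would take as input, with the caller responsible for making sure they match the file's schema.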





[jira] [Issue Comment Deleted] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl

2013-12-17 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated AVRO-1063:
---

Comment: was deleted

(was: I'll commit this soon unless there are objections.  Sean?)

 Ruby client should use multi_json rather than being locked down to yajl
 ---

 Key: AVRO-1063
 URL: https://issues.apache.org/jira/browse/AVRO-1063
 Project: Avro
  Issue Type: Improvement
  Components: ruby
Reporter: Paul Dlug
Priority: Minor
 Fix For: 1.8.0

 Attachments: AVRO-1063.diff


 The avro ruby client uses yajl for JSON serialization which is just one of 
 many suitable JSON implementations for ruby. The multi_json gem provides a 
 wrapper for JSON serialization selecting the fastest library available (Oj is 
 now even faster than Yajl) and falling back to a pure ruby implementation 
 bundled with multi_json. Requiring yajl also precludes the ruby gem from 
 being used under jruby since it requires a C extension.





[jira] [Updated] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl

2013-12-17 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated AVRO-1063:
---

Fix Version/s: (was: 1.7.6)
   1.8.0

This is actually incompatible.  It requires that folks have the multi_json gem 
installed.  Folks could not upgrade to a version of Avro that includes this 
change without also installing multi_json.  So it needs to be done as part of 
Avro 1.8.



[jira] [Updated] (AVRO-1414) Compression with C++ DataFile

2013-12-17 Thread Daniel Russel (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Russel updated AVRO-1414:


Attachment: patch

 Compression with C++ DataFile
 -

 Key: AVRO-1414
 URL: https://issues.apache.org/jira/browse/AVRO-1414
 Project: Avro
  Issue Type: Improvement
  Components: c++
Reporter: Daniel Russel
 Attachments: patch


 There is no way to use compression with the C++ DataFileReader and C++ 
 DataFileWriter, from what I can tell. Adding compression of the written 
 blocks using Boost streams is relatively straightforward, and I can provide a 
 patch if people are interested. 
 However, there are a couple of caveats:
 - The Windows builds of Boost don't currently include zlib support (required 
 for compression) by default. You have to do extra work to get it.
 - I don't know if doing it that way is compatible with other Avro 
 implementations.





[jira] [Commented] (AVRO-1414) Compression with C++ DataFile

2013-12-17 Thread Daniel Russel (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851207#comment-13851207
 ] 

Daniel Russel commented on AVRO-1414:
-

The patch can be viewed at 
https://github.com/salilab/avrocpp/compare/compression. I don't see an easy 
way of downloading it from there, though, so it is attached too.



[jira] [Commented] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl

2013-12-17 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851347#comment-13851347
 ] 

Sean Busbey commented on AVRO-1063:
---

I tested this with a few different JSON backends on MRI and JRuby 1.7.3. All 
tests passed.

I thought the dependencies for this new version of the gem properly specified 
multi_json as a dependency that would be pulled in by the Gem utils when a user 
upgraded. If that's the case, would it be sufficiently compatible? Or is a 
dependency change itself incompatible?



[jira] [Updated] (AVRO-1379) C: Add avro_file_writer_append_encoded() to API

2013-12-17 Thread Mark Teodoro (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Teodoro updated AVRO-1379:
---

Attachment: AVRO-1379-test.patch

No problem, here's another patch with a test.

 C: Add avro_file_writer_append_encoded() to API
 ---

 Key: AVRO-1379
 URL: https://issues.apache.org/jira/browse/AVRO-1379
 Project: Avro
  Issue Type: Improvement
  Components: c
Affects Versions: 1.7.5
Reporter: Mark Teodoro
  Labels: patch
 Fix For: 1.7.6

 Attachments: AVRO-1379-test.patch, AVRO-1379.patch


 Java's org.apache.avro.file.DataFileWriter.appendEncoded() allows you to 
 append a pre-encoded datum to a file.  This patch adds the same functionality 
 to Avro-C.


