[jira] [Commented] (AVRO-1414) Compression with C++ DataFile
[ https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850730#comment-13850730 ]

Doug Cutting commented on AVRO-1414:

This would be great to have. We can test compatibility against other implementations by putting more compressed files in share/test/data and having unit tests that validate against those.

Compression with C++ DataFile
    Key: AVRO-1414
    URL: https://issues.apache.org/jira/browse/AVRO-1414
    Project: Avro
    Issue Type: Improvement
    Components: c++
    Reporter: Daniel Russel

There is no way to use compression with the C++ DataFileReader and C++ DataFileWriter, from what I can tell. Adding compression of the written blocks using boost streams is relatively straightforward and I can provide a patch if people are interested. However, there are a couple of caveats:
- the Windows builds of boost don't currently include zlib support (required for compression) by default. You have to do extra work to get it.
- I don't know if doing it that way is compatible with other Avro implementations

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
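On the compatibility caveat: the Avro specification defines the "deflate" codec as raw RFC 1951 DEFLATE data, without the zlib (RFC 1950) header or checksum, so a block written in the zlib-wrapped format would probably not be readable by other implementations. The sketch below uses only the Python standard library to show the difference between the two framings. The function names are made up for illustration and are not part of any Avro API, and nothing here is taken from the attached patch.

```python
import zlib


def deflate_block(data: bytes) -> bytes:
    """Compress one block as raw DEFLATE (negative wbits: no zlib header or trailer)."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_block(data: bytes) -> bytes:
    """Decompress a raw DEFLATE block."""
    return zlib.decompress(data, -15)


block = b"avro " * 100
raw = deflate_block(block)      # what the Avro "deflate" codec expects
wrapped = zlib.compress(block)  # zlib format: header plus Adler-32 checksum

assert inflate_block(raw) == block

# A reader expecting the zlib format rejects the raw block
try:
    zlib.decompress(raw)
    raw_accepted_as_zlib = True
except zlib.error:
    raw_accepted_as_zlib = False
print("raw block accepted by a zlib-format reader:", raw_accepted_as_zlib)
```

Whichever library performs the compression, a cross-implementation test along the lines Doug suggests would catch a framing mismatch like this one.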
[jira] [Commented] (AVRO-1413) Provide static abstract getClassSchema on SpecificRecordBase (as per AVRO-1223)
[ https://issues.apache.org/jira/browse/AVRO-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850737#comment-13850737 ]

Doug Cutting commented on AVRO-1413:

Adding public getRecordClass() and getSchema() methods to RecordBuilderBase seems reasonable to me.

Note that there is already the method SpecificData#getSchema(Type). So if you have a class that you know is a specific record then you can use this to get its schema.

Provide static abstract getClassSchema on SpecificRecordBase (as per AVRO-1223)
    Key: AVRO-1413
    URL: https://issues.apache.org/jira/browse/AVRO-1413
    Project: Avro
    Issue Type: Improvement
    Components: java
    Affects Versions: 1.7.5
    Reporter: Tristan Stevens
    Priority: Trivial
    Original Estimate: 10m
    Remaining Estimate: 10m

AVRO-1223 added support for static getClassSchema() to generated Record classes. This should be added to the abstract superclass SpecificRecordBase and also to the interface SpecificRecord in order to enable this method to be used on any generated Records.
[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850898#comment-13850898 ]

Doug Cutting commented on AVRO-1348:

Here are benchmarks run on my 64-bit Linux laptop.

Before:
{code}
java version 1.6.0_45
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    5495 ms           7.278       259.239       1780910
StringWrite:   5774 ms           6.927       246.725       1780910

java version 1.7.0_45
StringRead:    3541 ms          11.296       402.326       1780910
StringWrite:   4378 ms           9.136       325.421       1780910
{code}

After:
{code}
java version 1.6.0_45
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    4882 ms           8.193       291.829       1780910
StringWrite:   5844 ms           6.844       243.754       1780910

java version 1.7.0_45
test name         time   M entries/sec   M bytes/sec   bytes/cycle
StringRead:    3535 ms          11.315       403.020       1780910
StringWrite:   4136 ms           9.670       344.411       1780910
{code}

The speedup of the write benchmark under Java 7 is consistent and inexplicable.

This appears to achieve good performance in both Java 6 & 7. I'll commit this unless someone objects.

Improve Utf8 to String conversion
    Key: AVRO-1348
    URL: https://issues.apache.org/jira/browse/AVRO-1348
    Project: Avro
    Issue Type: Bug
    Reporter: Mark Wagner
    Assignee: Mohammad Kamrul Islam
    Attachments: AVRO-1348.patch, AVRO-1348v2.patch, AVRO1348v1.patch

AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could be made faster. The same method is being used in the Utf8.toString(), and could likely be sped up by doing the same thing.
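To put the before and after numbers side by side, the relative change in elapsed time can be computed directly from the benchmark output in the message above. The following is a small Python sketch for illustration only: the dictionary layout and the helper name `percent_faster` are illustrative, and only the millisecond timings come from the benchmark output.

```python
# Elapsed times in ms, copied from the benchmark output: (before, after)
timings = {
    ("1.6.0_45", "StringRead"): (5495, 4882),
    ("1.6.0_45", "StringWrite"): (5774, 5844),
    ("1.7.0_45", "StringRead"): (3541, 3535),
    ("1.7.0_45", "StringWrite"): (4378, 4136),
}


def percent_faster(before_ms: int, after_ms: int) -> float:
    """Reduction in elapsed time, as a percentage of the 'before' time."""
    return 100.0 * (before_ms - after_ms) / before_ms


for (jvm, test), (before, after) in timings.items():
    # Positive values mean the patched run took less time
    print(f"{jvm} {test}: {percent_faster(before, after):+.1f}%")
```

By this measure StringRead on Java 6 takes roughly 11% less time and StringWrite on Java 7 roughly 5.5% less, while the other two cases move by little more than 1% in either direction.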
[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850949#comment-13850949 ]

Rob Turner commented on AVRO-1348:

Here are the results on my old 32-bit Linux laptop. Doug's patch looks good!

||jdk||change||test name||time||M entries/sec||M bytes/sec||bytes/cycle||
|jdk1.6.0_45|UTF-8|StringRead:|29532 ms|1.354|48.243|1780910|
|jdk1.7.0_40|UTF-8|StringRead:|22897 ms|1.747|62.222|1780910|
|jdk1.6.0_45|Doug's Patch|StringRead:|29628 ms|1.350|48.086|1780910|
|jdk1.7.0_40|Doug's Patch|StringRead:|22902 ms|1.747|62.209|1780910|
|jdk1.6.0_45|Charset|StringRead:|37476 ms|1.067|38.016|1780910|
|jdk1.7.0_40|Charset|StringRead:|23376 ms|1.711|60.948|1780910|
[jira] [Commented] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl
[ https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851097#comment-13851097 ]

Doug Cutting commented on AVRO-1063:

I'll commit this soon unless there are objections. Sean?

Ruby client should use multi_json rather than being locked down to yajl
    Key: AVRO-1063
    URL: https://issues.apache.org/jira/browse/AVRO-1063
    Project: Avro
    Issue Type: Improvement
    Components: ruby
    Reporter: Paul Dlug
    Priority: Minor
    Fix For: 1.7.6
    Attachments: AVRO-1063.diff

The avro ruby client uses yajl for JSON serialization which is just one of many suitable JSON implementations for ruby. The multi_json gem provides a wrapper for JSON serialization selecting the fastest library available (Oj is now even faster than Yajl) and falling back to a pure ruby implementation bundled with multi_json. Requiring yajl also precludes the ruby gem from being used under jruby since it requires a C extension.
[jira] [Commented] (AVRO-1405) Avro-c may not handle eof correctly if avro data file contains multiple sync markers
[ https://issues.apache.org/jira/browse/AVRO-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851108#comment-13851108 ]

Doug Cutting commented on AVRO-1405:

I get a bunch of errors compiling the new test when I apply this.

{code}
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: In function ‘write_data’:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:89:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:89:5: note: use option -std=c99 or -std=gnu99 to compile your code
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: At top level:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:147:13: error: redefinition of ‘PERSON_SCHEMA’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:6:13: note: previous definition of ‘PERSON_SCHEMA’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:157:13: error: redefinition of ‘file’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:16:13: note: previous definition of ‘file’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:159:6: error: redefinition of ‘print_avro_value’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:18:6: note: previous definition of ‘print_avro_value’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:167:5: error: redefinition of ‘read_data’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:26:5: note: previous definition of ‘read_data’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:209:5: error: redefinition of ‘write_data’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:68:5: note: previous definition of ‘write_data’ was here
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: In function ‘write_data’:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:230:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c: At top level:
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:268:5: error: redefinition of ‘main’
/home/cutting/src/avro/trunk/lang/c/tests/test_avro_1405.c:127:5: note: previous definition of ‘main’ was here
make[2]: *** [tests/CMakeFiles/test_avro_1405.dir/test_avro_1405.o] Error 1
{code}

Avro-c may not handle eof correctly if avro data file contains multiple sync markers
    Key: AVRO-1405
    URL: https://issues.apache.org/jira/browse/AVRO-1405
    Project: Avro
    Issue Type: Bug
    Components: c
    Affects Versions: 1.7.5
    Reporter: Mika Ristimaki
    Assignee: Mika Ristimaki
    Priority: Minor
    Fix For: 1.7.6
    Attachments: AVRO-1405.patch

I encountered a bug in the Avro C API. If the following is done, it seems that the Avro data file reader cannot read the file correctly:

{code}
while (has values to write) {
    Open file for writing
    Write a value to the file
    Close the writer
}
{code}

Reading this file with Avro data file reader fails with EOF after only the first item has been read from the file.
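For context, a file written with the open, write, close loop described in the report plausibly ends up with many one-record blocks, each followed by the file's 16-byte sync marker, since each close presumably flushes a block. Per the Avro specification, a block in an object container file consists of a record count (long), a byte size (long), the serialized records, and the sync marker. The Python sketch below writes and reads such a block sequence in memory to show the one place where a reader can legitimately hit end of file. It is an illustration under those assumptions, not the avro-c code and not the fix in the attached patch; all function names are hypothetical, the file header is omitted, and the "records" are plain bytes rather than Avro-encoded data.

```python
import io
import os

SYNC_SIZE = 16


def write_long(out, n: int) -> None:
    """Write an Avro long: zigzag encoding, then variable-length base-128."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(inp) -> int:
    """Read an Avro long; raise EOFError if the stream ends mid-value."""
    shift = 0
    result = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("unexpected end of stream")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_block(out, records: list, sync: bytes) -> None:
    """Append one block: count, size, payload, sync marker."""
    payload = b"".join(records)
    write_long(out, len(records))
    write_long(out, len(payload))
    out.write(payload)
    out.write(sync)


def read_blocks(inp, sync: bytes):
    """Yield (count, payload) for each block until a clean end of file."""
    while True:
        # EOF is only valid here, at a block boundary right after a sync marker
        if not inp.read(1):
            return
        inp.seek(-1, io.SEEK_CUR)
        count = read_long(inp)
        size = read_long(inp)
        payload = inp.read(size)
        if len(payload) != size or inp.read(SYNC_SIZE) != sync:
            raise ValueError("truncated block or sync marker mismatch")
        yield count, payload


sync = os.urandom(SYNC_SIZE)
buf = io.BytesIO()
for value in (b"a", b"b", b"c"):
    # One block per "open, write one value, close" cycle
    write_block(buf, [value], sync)

buf.seek(0)
blocks = list(read_blocks(buf, sync))
print(len(blocks))
```

A reader that treats the first sync marker as the end of the data, rather than looping back to look for another block, would stop after one record, which matches the symptom described in the report.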
[jira] [Commented] (AVRO-1379) C: Add avro_file_writer_append_encoded() to API
[ https://issues.apache.org/jira/browse/AVRO-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851110#comment-13851110 ]

Doug Cutting commented on AVRO-1379:

This looks reasonable to me. Can you please add a test? Thanks!

C: Add avro_file_writer_append_encoded() to API
    Key: AVRO-1379
    URL: https://issues.apache.org/jira/browse/AVRO-1379
    Project: Avro
    Issue Type: Improvement
    Components: c
    Affects Versions: 1.7.5
    Reporter: Mark Teodoro
    Labels: patch
    Fix For: 1.7.6
    Attachments: AVRO-1379.patch

Java's org.apache.avro.file.DataFileWriter.appendEncoded() allows you to append a pre-encoded datum to a file. This patch adds the same functionality to Avro-C.
[jira] [Issue Comment Deleted] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl
[ https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-1063:
    Comment: was deleted

(was: I'll commit this soon unless there are objections. Sean?)
[jira] [Updated] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl
[ https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-1063:
    Fix Version/s: (was: 1.7.6)
                   1.8.0

This is actually incompatible. It requires that folks have the multi_json gem installed. Folks could not upgrade to a version of Avro that includes this change without also installing multi_json. So it needs to be done as a part of Avro 1.8.
[jira] [Updated] (AVRO-1414) Compression with C++ DataFile
[ https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Russel updated AVRO-1414:
    Attachment: patch
[jira] [Commented] (AVRO-1414) Compression with C++ DataFile
[ https://issues.apache.org/jira/browse/AVRO-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851207#comment-13851207 ]

Daniel Russel commented on AVRO-1414:

You can inspect the patch at https://github.com/salilab/avrocpp/compare/compression. I don't see an easy way of downloading it from there, though, so it is attached too.
[jira] [Commented] (AVRO-1063) Ruby client should use multi_json rather than being locked down to yajl
[ https://issues.apache.org/jira/browse/AVRO-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851347#comment-13851347 ]

Sean Busbey commented on AVRO-1063:

I tested this with a few different JSON-providing backends on MRI and JRuby 1.7.3. All tests passed.

I thought the dependencies for this new version of the gem properly specified multi_json as a dependency that would be pulled in by the Gem utils when a user upgraded. If that's the case, would it be sufficiently compatible? Or is a dependency change itself incompatible?
[jira] [Updated] (AVRO-1379) C: Add avro_file_writer_append_encoded() to API
[ https://issues.apache.org/jira/browse/AVRO-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Teodoro updated AVRO-1379:
    Attachment: AVRO-1379-test.patch

No problem, here's another patch with a test.