The Jira is: https://issues.apache.org/jira/browse/AVRO-1035
It is possible to append to an existing Avro file:

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

Should we close that issue as "fixed"?

Doug

On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[email protected]> wrote:
> Was a JIRA ticket ever created regarding appending to an existing Avro file
> on HDFS?
>
> What is the status of such a capability, a year out from when the issue below
> was raised?
>
> On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
> <[email protected]> wrote:
>
>> Thanks for your reply, I suspected this.
>>
>> I will create a JIRA ticket.
>>
>> Vyacheslav
>>
>> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>
>>>
>>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[email protected]>
>>> wrote:
>>>
>>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>>> interested how to append to a file on the arbitrary file system, not
>>>> only on the local one.
>>>>
>>>> I want to get an OutputStream based on the Path and the FileSystem
>>>> implementation and then pass it for appending to avro methods.
>>>>
>>>> Is that possible?
>>>
>>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>> ticket.
>>>
>>> It could not simply append to an OutputStream, since it must either:
>>> * Seek to the start to validate the schemas match and find the sync
>>> marker, or
>>> * Trust that the schemas match and find the sync marker from the last
>>> block
>>>
>>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>> could add something to the mapred module that takes a Path and
>>> FileSystem and returns something that implements an interface that
>>> DataFileWriter can append to. This would be something that is both a
>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> and an OutputStream, or has both an InputStream from the start of the
>>> existing file and an OutputStream at the end.
>>>
>>>> Thanks,
>>>> Vyacheslav
>>>>
>>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Use the appendTo feature of the DataFileWriter. See
>>>>>
>>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>>
>>>>> For a quick setup example, read also:
>>>>>
>>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>>
>>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>>> <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> is it possible to append to an already existing avro file when it was
>>>>>> written and closed before?
>>>>>>
>>>>>> If I use
>>>>>> outputStream = fs.append(avroFilePath);
>>>>>>
>>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>>
>>>>>> Probably because the schema is written twice and some other issues.
>>>>>>
>>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>>> gets
>>>>>> overwritten.
>>>>>>
>>>>>> Thanks,
>>>>>> Vyacheslav
>>>>>
>>>>> --
>>>>> Harsh J
>>>>> Customer Ops. Engineer
>>>>> Cloudera | http://tiny.cloudera.com/about
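
For anyone reading this thread in the archive: the first option Scott lists (seek to the start, check the schema, pick up the sync marker, then write at the end) can be sketched with plain java.io. This is only a toy container, not Avro's file format and not Avro's API. The class ToyContainerAppend, its methods, and the on-disk layout (schema length, schema bytes, 16-byte marker, then blocks each followed by the marker) are all made up for illustration. The one thing it is meant to show is why an appender needs read access to the header before it can write, which a bare OutputStream does not give it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical toy container, NOT Avro's format or API.
// Layout: [int schemaLen][schema bytes][16-byte sync] then blocks of
//         [int payloadLen][payload bytes][16-byte sync].
class ToyContainerAppend {
    static final int SYNC_SIZE = 16;

    /** Writes a fresh header with a newly generated random sync marker. */
    static void create(Path file, String schema) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        byte[] schemaBytes = schema.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0); // start from an empty file
            raf.writeInt(schemaBytes.length);
            raf.write(schemaBytes);
            raf.write(sync);
        }
    }

    /** Appends one block, reusing the marker already stored in the header. */
    static void append(Path file, String expectedSchema, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Seek to the start: validate the schema and find the sync marker.
            raf.seek(0);
            byte[] schemaBytes = new byte[raf.readInt()];
            raf.readFully(schemaBytes);
            if (!expectedSchema.equals(new String(schemaBytes, StandardCharsets.UTF_8))) {
                throw new IOException("Schema mismatch");
            }
            byte[] sync = new byte[SYNC_SIZE];
            raf.readFully(sync);
            // Only now move to the end and write the new block.
            raf.seek(raf.length());
            raf.writeInt(payload.length);
            raf.write(payload);
            raf.write(sync);
        }
    }

    /** Walks the file and checks that every block ends with the header's marker. */
    static int countBlocks(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] schemaBytes = new byte[raf.readInt()];
            raf.readFully(schemaBytes);
            byte[] sync = new byte[SYNC_SIZE];
            raf.readFully(sync);
            byte[] marker = new byte[SYNC_SIZE];
            int blocks = 0;
            while (raf.getFilePointer() < raf.length()) {
                int payloadLength = raf.readInt();
                raf.seek(raf.getFilePointer() + payloadLength); // skip the payload
                raf.readFully(marker);
                if (!Arrays.equals(sync, marker)) {
                    throw new IOException("Invalid sync!");
                }
                blocks++;
            }
            return blocks;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("toy-container", ".bin");
        create(file, "schema-v1");
        append(file, "schema-v1", "first".getBytes(StandardCharsets.UTF_8));
        append(file, "schema-v1", "second".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBlocks(file)); // prints 2
        Files.delete(file);
    }
}
```

In this toy, writing a second header (with a fresh marker) at the end of the file, which is roughly what happens when a new writer is pointed at the stream returned by fs.append(...), leaves the reader holding a marker that later blocks do not match.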
