It will work on an OutputStream that supports append.

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)

So it depends on how well HDFS implements FileSystem#append(), not on
any changes in Avro.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)

I have no recent personal experience with append in HDFS. Does anyone
else here?

Doug

On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
> My understanding is that will append to a file on the local filesystem, but
> not to a file on HDFS.
>
> --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>
>> From: Doug Cutting <cutt...@apache.org>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Tuesday, February 5, 2013, 5:08 PM
>> The Jira is:
>>
>> https://issues.apache.org/jira/browse/AVRO-1035
>>
>> It is possible to append to an existing Avro file:
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>
>> Should we close that issue as "fixed"?
>>
>> Doug
>>
>> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>> >
>> > What is the status of such a capability, a year out from when the issue below was raised?
>> >
>> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
>> >
>> >> Thanks for your reply, I suspected this.
>> >>
>> >> I will create a JIRA ticket.
>> >>
>> >> Vyacheslav
>> >>
>> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >>
>> >>>
>> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> >>>> interested how to append to a file on the arbitrary file system, not
>> >>>> only on the local one.
>> >>>>
>> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >>>> implementation and then pass it for appending to avro methods.
>> >>>>
>> >>>> Is that possible?
>> >>>
>> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >>> ticket.
>> >>>
>> >>> It could not simply append to an OutputStream, since it must either:
>> >>> * Seek to the start to validate the schemas match and find the sync
>> >>> marker, or
>> >>> * Trust that the schemas match and find the sync marker from the last
>> >>> block
>> >>>
>> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >>> could add something to the mapred module that takes a Path and
>> >>> FileSystem and returns something that implements an interface that
>> >>> DataFileWriter can append to. This would be something that is both a
>> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >>> and an OutputStream, or has both an InputStream from the start of the
>> >>> existing file and an OutputStream at the end.
>> >>>
>> >>>> Thanks,
>> >>>> Vyacheslav
>> >>>>
>> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >>>>>
>> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >>>>>
>> >>>>> For a quick setup example, read also:
>> >>>>>
>> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >>>>>
>> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> is it possible to append to an already existing avro file when it was
>> >>>>>> written and closed before?
>> >>>>>>
>> >>>>>> If I use
>> >>>>>> outputStream = fs.append(avroFilePath);
>> >>>>>>
>> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >>>>>>
>> >>>>>> Probably because the schema is written twice and some other issues.
>> >>>>>>
>> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >>>>>> gets overwritten.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Vyacheslav
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>>>> Customer Ops. Engineer
>> >>>>> Cloudera | http://tiny.cloudera.com/about
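
To illustrate Scott Carey's point quoted above (a writer cannot blindly append to an OutputStream, because it first has to find the existing file's sync marker), here is a minimal Java sketch. It is only a toy under stated assumptions: the class ToyContainer and all of its method names are hypothetical, and the layout (fixed-width lengths, a bare schema string) is a simplification, not Avro's real encoding and not what DataFileWriter does internally. It only mirrors the general idea that a header carries a 16-byte sync marker and every block must end with that same marker.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy container (hypothetical, NOT Avro's real format):
//   header = schema length (int), schema bytes, 16-byte sync marker
//   block  = payload length (int), payload bytes, the same 16-byte sync marker
final class ToyContainer {
    static final int SYNC_SIZE = 16;

    // Concatenates byte arrays; stands in for writing to an output stream.
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts) buf.put(p);
        return buf.array();
    }

    static byte[] header(String schema, byte[] sync) {
        byte[] s = schema.getBytes(StandardCharsets.UTF_8);
        return concat(ByteBuffer.allocate(4).putInt(s.length).array(), s, sync);
    }

    static byte[] block(String record, byte[] sync) {
        byte[] r = record.getBytes(StandardCharsets.UTF_8);
        return concat(ByteBuffer.allocate(4).putInt(r.length).array(), r, sync);
    }

    // "Seek to the start" step: recover the sync marker from the existing header.
    static byte[] readSync(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int schemaLen = buf.getInt();
        buf.position(buf.position() + schemaLen); // skip the schema bytes
        byte[] sync = new byte[SYNC_SIZE];
        buf.get(sync);
        return sync;
    }

    // Reads every record; fails if a block does not end with the header's marker.
    static List<String> readAll(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        byte[] schema = new byte[buf.getInt()];
        buf.get(schema);
        byte[] sync = new byte[SYNC_SIZE];
        buf.get(sync);
        List<String> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] payload = new byte[buf.getInt()];
            buf.get(payload);
            byte[] marker = new byte[SYNC_SIZE];
            buf.get(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IllegalStateException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] syncA = new byte[SYNC_SIZE];
        Arrays.fill(syncA, (byte) 1);
        byte[] syncB = new byte[SYNC_SIZE];
        Arrays.fill(syncB, (byte) 2);
        byte[] existing = concat(header("string", syncA), block("r1", syncA));

        // Append that first reads the existing marker: the file stays readable.
        byte[] good = concat(existing, block("r2", readSync(existing)));
        System.out.println(readAll(good));

        // Blind append of a fresh header and marker: the reader rejects the file.
        byte[] bad = concat(existing, header("string", syncB), block("r2", syncB));
        try {
            readAll(bad);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second case in main is the toy analogue of opening a plain fs.append() stream and starting a brand-new writer on it: a second header with a different marker lands in the middle of the file, which is consistent with the "Invalid sync!" error reported at the bottom of the thread. An append that reads the header first avoids this, which is why an append API needs read access to the existing file as well as an OutputStream positioned at its end.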