Re: Is it possible to append to an already existing avro file

2013-07-09 Thread Doug Cutting
Since the exception is thrown from java.io.FileInputStream#open, it's
trying to append to a local file, not one in HDFS.

You're passing 'new File(...)' to appendTo, when you should probably
be passing 'new FsInput(...)'.

Doug
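
A minimal sketch of that change, assuming the avro-mapred and Hadoop client jars are on the classpath, that append is enabled on the cluster, and that the path resolves to the same file system as the FileSystem handle. The class name HdfsAvroAppend and the method openWriter are hypothetical names used only for illustration; the Avro and Hadoop calls themselves (FsInput, appendTo(SeekableInput, OutputStream), FileSystem#append) are the ones mentioned in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Avro or Hadoop.
public class HdfsAvroAppend {

    /** Opens a writer on dataFilePath: appends if the file exists, creates it otherwise. */
    public static DataFileWriter<GenericRecord> openWriter(
            FileSystem fileSystem, Configuration conf, Path dataFilePath, Schema schema)
            throws IOException {
        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema))
                        .setSyncInterval(1 << 20);
        if (!fileSystem.exists(dataFilePath)) {
            // New file: the header (schema and sync marker) is written here.
            return writer.create(schema, fileSystem.create(dataFilePath));
        }
        // Existing file: FsInput lets Avro read the header through the Hadoop
        // FileSystem API instead of java.io.File, which only sees local disk.
        try (FsInput existing = new FsInput(dataFilePath, conf)) {
            OutputStream out = fileSystem.append(dataFilePath);
            return writer.appendTo(existing, out);
        }
    }
}
```

The FsInput is only needed while appendTo reads the existing header, so the sketch closes it as soon as appendTo returns; the returned writer keeps the append stream open until the caller closes the writer.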

On Mon, Jul 8, 2013 at 9:29 AM, TrevniUser dipti.de...@cerner.com wrote:
 I was following this thread for a problem I am facing while using
 SortedKeyValueFiles.

 Below is the piece of code that tries to obtain the appropriate writer based
 on whether I am appending or creating a new file:

 OutputStream dataOutputStream;
 if (!fileSystem.exists(dataFilePath)) {
     dataOutputStream = fileSystem.create(dataFilePath);
     mDataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)
         .setSyncInterval(1 << 20)
         .create(mRecordSchema, dataOutputStream);
 } else {
     dataOutputStream = fileSystem.append(dataFilePath);
     mDataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)
         .setSyncInterval(1 << 20)
         .appendTo(new File(options.getPath() + DATA_FILENAME));
 }

 but it fails with this:

 java.io.FileNotFoundException: /CHANGELOG/data (No such file or directory)
     at java.io.FileInputStream.open(Native Method)
     at java.io.FileInputStream.<init>(FileInputStream.java:120)
     at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
     at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:149)
     at com.abc.kepler.datasink.hdfs.util.SortedKeyValueFile$Writer.<init>(SortedKeyValueFile.java:597)
     at com.abc.kepler.datasink.hdfs.util.ChangeLogUtil.getChangeLogWriter(ChangeLogUtil.java:84)
     at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangeLog.append(HDFSDataSinkChangeLog.java:219)
     at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangesTest.writeDataSingleEntityKeyDefaultLocation(HDFSDataSinkChangesTest.java:1036)
     at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangesTest.javadocExampleTest(HDFSDataSinkChangesTest.java:645)

 So, is the Avro writer not able to locate the file on HDFS? Could you
 please share some pointers on what could be leading to this?



 --
 View this message in context: 
 http://apache-avro.679487.n3.nabble.com/Is-it-possible-to-append-to-an-already-existing-avro-file-tp3762049p4027785.html
 Sent from the Avro - Users mailing list archive at Nabble.com.


Re: Is it possible to append to an already existing avro file

2013-07-09 Thread TrevniUser
Thanks for replying. You are correct. I followed this example
https://gist.github.com/QwertyManiac/4724582





Re: Is it possible to append to an already existing avro file

2013-02-07 Thread Harsh J
I assume by non-trivial you meant the extra Seekable stuff I needed to
wrap around the DFS output streams to let Avro take it as append-able?
I don't think it's possible for Avro to carry it since Avro (core) does
not reverse-depend on Hadoop. Should we document it somewhere though?
Do you have any ideas on the best place to do that?

On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak michaelma...@yahoo.com wrote:
 Thanks so much for the code -- it works great!

 Since it is a non-trivial amount of code required to achieve append, I 
 suggest attaching that code to AVRO-1035, in the hopes that someone will come 
 up with an interface that requires just one line of user code to achieve 
 append.


Re: Is it possible to append to an already existing avro file

2013-02-07 Thread Michael Malak
I confess to being a user of rather than a developer of open source, but 
perhaps you could elaborate on what "depends on" means and what the 
consequences are?

Isn't it -- or couldn't it be made -- a run-time binding, so that only those 
who try to use the HDFS append functionality would be required to also include 
the HDFS Jars in their classpath?

Or is the issue more of a bookkeeping one, whereby every update to HDFS will 
require an Avro regression test?

Now that Hive supports Avro as of the Jan. 11 release of Hive 0.10, the use 
case of ingesting data into Avro on HDFS is only going to get more popular, and 
appending is very handy for ingesting, especially for live real-time or 
near-real-time data.  So it seems to me that if the inconveniences are minor or 
can be worked around, that Avro indeed should perhaps depend on HDFS.

--- On Thu, 2/7/13, Harsh J ha...@cloudera.com wrote:

 From: Harsh J ha...@cloudera.com
 Subject: Re: Is it possible to append to an already existing avro file
 To: user@avro.apache.org
 Date: Thursday, February 7, 2013, 9:28 AM
 I assume by non-trivial you meant the extra Seekable stuff I needed to
 wrap around the DFS output streams to let Avro take it as append-able?
 I don't think it's possible for Avro to carry it since Avro (core) does
 not reverse-depend on Hadoop. Should we document it somewhere though?
 Do you have any ideas on the best place to do that?
 



Re: Is it possible to append to an already existing avro file

2013-02-07 Thread Doug Cutting
The avro-mapred module includes a Seekable implementation that works
with HDFS called FsInput:

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html

With this, your example can be made considerably smaller.

Doug



On Thu, Feb 7, 2013 at 8:28 AM, Harsh J ha...@cloudera.com wrote:
 I assume by non-trivial you meant the extra Seekable stuff I needed to
 wrap around the DFS output streams to let Avro take it as append-able?
 I don't think it's possible for Avro to carry it since Avro (core) does
 not reverse-depend on Hadoop. Should we document it somewhere though?
 Do you have any ideas on the best place to do that?


Re: Is it possible to append to an already existing avro file

2013-02-06 Thread Michael Malak
Thanks so much for the code -- it works great!

Since it is a non-trivial amount of code required to achieve append, I suggest 
attaching that code to AVRO-1035, in the hopes that someone will come up with 
an interface that requires just one line of user code to achieve append.

--- On Wed, 2/6/13, Harsh J ha...@cloudera.com wrote:

 From: Harsh J ha...@cloudera.com
 Subject: Re: Is it possible to append to an already existing avro file
 To: user@avro.apache.org
 Date: Wednesday, February 6, 2013, 11:17 AM
 Hey Michael,
 
 It does implement the regular Java OutputStream interface, as seen in
 the API:
 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html

 Here's a sample program that works on Hadoop 2.x in my tests:
 https://gist.github.com/QwertyManiac/4724582
 

Re: Is it possible to append to an already existing avro file

2013-02-05 Thread Doug Cutting
It will work on an OutputStream that supports append.

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)

So it depends on how well HDFS implements FileSystem#append(), not on
any changes in Avro.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)

I have no recent personal experience with append in HDFS.  Does anyone
else here?

Doug

On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak michaelma...@yahoo.com wrote:
 My understanding is that will append to a file on the local filesystem, but 
 not to a file on HDFS.

 --- On Tue, 2/5/13, Doug Cutting cutt...@apache.org wrote:

 From: Doug Cutting cutt...@apache.org
 Subject: Re: Is it possible to append to an already existing avro file
 To: user@avro.apache.org
 Date: Tuesday, February 5, 2013, 5:08 PM
 The Jira is:

 https://issues.apache.org/jira/browse/AVRO-1035

 It is possible to append to an existing Avro file:

 http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

 Should we close that issue as fixed?

 Doug

 On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak michaelma...@yahoo.com wrote:
  Was a JIRA ticket ever created regarding appending to an existing Avro
  file on HDFS?

  What is the status of such a capability, a year out from when the issue
  below was raised?

  On Wed, 22 Feb 2012 10:57:48 +0100, Vyacheslav Zholudev
  vyacheslav.zholu...@gmail.com wrote:

  Thanks for your reply, I suspected this.

  I will create a JIRA ticket.

  Vyacheslav

  On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:

  On 2/21/12 7:29 AM, Vyacheslav Zholudev vyacheslav.zholu...@gmail.com
  wrote:

  Yep, I saw that method as well as the stackoverflow post. However, I'm
  interested in how to append to a file on an arbitrary file system, not
  only on the local one.

  I want to get an OutputStream based on the Path and the FileSystem
  implementation and then pass it to the Avro methods for appending.

  Is that possible?

  It is not possible without modifying DataFileWriter. Please open a JIRA
  ticket.
 
  It could not simply append to an OutputStream, since it must either:
  * Seek to the start to validate that the schemas match and find the sync
    marker, or
  * Trust that the schemas match and find the sync marker from the last
    block

  DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
  could add something to the mapred module that takes a Path and
  FileSystem and returns something that implements an interface that
  DataFileWriter can append to.  This would be something that is both a
  SeekableInput
  (http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
  and an OutputStream, or has both an InputStream from the start of the
  existing file and an OutputStream at the end.
 
  Thanks,
  Vyacheslav

  On Feb 21, 2012, at 5:29 AM, Harsh J wrote:

  Hi,

  Use the appendTo feature of the DataFileWriter. See

  http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

  For a quick setup example, read also:

  http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file

  On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
  vyacheslav.zholu...@gmail.com wrote:
  Hi,

  is it possible to append to an already existing avro file when it was
  written and closed before?

  If I use
  outputStream = fs.append(avroFilePath);

  then later on I get: java.io.IOException: Invalid sync!

  Probably because the schema is written twice and some other issues.

  If I use outputStream = fs.create(avroFilePath); then the avro file gets
  overwritten.

  Thanks,
  Vyacheslav

  --
  Harsh J
  Customer Ops. Engineer
  Cloudera | http://tiny.cloudera.com/about
 


Re: Is it possible to append to an already existing avro file

2013-02-05 Thread Michael Malak
I don't believe a Hadoop FileSystem is a Java OutputStream?

--- On Tue, 2/5/13, Doug Cutting cutt...@apache.org wrote:

 From: Doug Cutting cutt...@apache.org
 Subject: Re: Is it possible to append to an already existing avro file
 To: user@avro.apache.org
 Date: Tuesday, February 5, 2013, 5:27 PM
 It will work on an OutputStream that supports append.

 http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)

 So it depends on how well HDFS implements FileSystem#append(), not on
 any changes in Avro.

 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)

 I have no recent personal experience with append in HDFS.  Does anyone
 else here?
 
 Doug
 

Is it possible to append to an already existing avro file

2012-02-20 Thread Vyacheslav Zholudev
Hi, 

is it possible to append to an already existing avro file when it was written 
and closed before?

If I use
outputStream = fs.append(avroFilePath);

then later on I get: java.io.IOException: Invalid sync!

Probably because the schema is written twice and some other issues. 

If I use outputStream = fs.create(avroFilePath); then the avro file gets 
overwritten. 

Thanks,
Vyacheslav
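
One way to see why the fs.append() attempt ends in "Invalid sync!": an Avro data file starts with a header that carries the schema and a 16-byte sync marker, and every block is followed by that same marker. Creating a new writer on an append stream writes a second header with a different marker in the middle of the file. The sketch below imitates this with a deliberately simplified toy layout, not the real Avro encoding; the class SyncMarkerDemo and all of its methods are hypothetical, and the markers are fixed fill bytes rather than random so the outcome is deterministic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy container, NOT the real Avro layout: [sync] then repeated [8-byte long][sync]. */
public class SyncMarkerDemo {
    static final int SYNC_SIZE = 16;

    /** Real Avro files use 16 random bytes; a fixed fill keeps this demo deterministic. */
    public static byte[] marker(int fill) {
        byte[] sync = new byte[SYNC_SIZE];
        Arrays.fill(sync, (byte) fill);
        return sync;
    }

    /** Writes one block: a long value followed by the sync marker. */
    public static void writeBlock(ByteArrayOutputStream file, byte[] sync, long value) {
        byte[] encoded = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        file.write(encoded, 0, encoded.length);
        file.write(sync, 0, sync.length);
    }

    /** Creates a new file: a header (here just the marker) plus one block. */
    public static byte[] newFile(long value) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        byte[] sync = marker(1);
        file.write(sync, 0, sync.length);
        writeBlock(file, sync, value);
        return file.toByteArray();
    }

    /** Naive append: acts like creating a fresh file on an append stream, so a second header is written. */
    public static byte[] naiveAppend(byte[] existing, long value) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(existing, 0, existing.length);
        byte[] freshSync = marker(2);
        file.write(freshSync, 0, freshSync.length);
        writeBlock(file, freshSync, value);
        return file.toByteArray();
    }

    /** Proper append: reads the marker from the existing header and reuses it, writing no new header. */
    public static byte[] properAppend(byte[] existing, long value) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(existing, 0, existing.length);
        writeBlock(file, Arrays.copyOfRange(existing, 0, SYNC_SIZE), value);
        return file.toByteArray();
    }

    /** Counts blocks, rejecting any block whose trailing marker differs from the header's marker. */
    public static int countBlocks(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        byte[] headerSync = new byte[SYNC_SIZE];
        in.get(headerSync);
        byte[] blockSync = new byte[SYNC_SIZE];
        int blocks = 0;
        while (in.hasRemaining()) {
            in.getLong();
            in.get(blockSync);
            if (!Arrays.equals(headerSync, blockSync)) {
                throw new UncheckedIOException(new IOException("Invalid sync!"));
            }
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] original = newFile(10L);
        System.out.println("proper append, blocks read: " + countBlocks(properAppend(original, 20L)));
        try {
            countBlocks(naiveAppend(original, 20L));
        } catch (UncheckedIOException e) {
            System.out.println("naive append failed: " + e.getCause().getMessage());
        }
    }
}
```

In the toy, the proper append has to read the start of the existing file before it can write anything, which is the same reason an appender needs a seekable view of the existing file in addition to an OutputStream positioned at its end.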