Re: AVRO Append HDFS using saveAsNewAPIHadoopFile

2017-01-09 Thread Santosh.B
Yes, Avro provides it, but from what I have seen it is a record-by-record append.
Please see the link below:
https://gist.github.com/QwertyManiac/4724582

This is very slow because of the Avro append. I am thinking of doing something like
what we normally do for text files, where we buffer the data up to a certain size
and then flush the buffer.
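
Something along these lines is what I have in mind. It is a rough sketch only,
assuming DataFileWriter.appendTo over FileSystem.append as in the gist above; the
class name BufferedAvroAppender, the bufferSize default and the flush policy are
made up for illustration:

import scala.collection.mutable.ArrayBuffer

import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}
import org.apache.avro.mapred.FsInput
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: collect records in memory and append them to the
// existing Avro container in one open/append/close cycle per batch,
// instead of re-opening the file for every single record.
class BufferedAvroAppender(path: Path, schema: Schema, conf: Configuration,
                           bufferSize: Int = 1000) {
  private val buffer = new ArrayBuffer[GenericRecord](bufferSize)

  def add(record: GenericRecord): Unit = {
    buffer += record
    if (buffer.size >= bufferSize) flush()
  }

  def flush(): Unit = if (buffer.nonEmpty) {
    val fs  = path.getFileSystem(conf)
    val in  = new FsInput(path, conf)   // existing file: header, codec, sync marker
    val out = fs.append(path)           // needs HDFS append enabled on the cluster
    val writer = new DataFileWriter(new GenericDatumWriter[GenericRecord](schema))
      .appendTo(in, out)
    try buffer.foreach(r => writer.append(r))
    finally { writer.close(); in.close() }
    buffer.clear()
  }
}

Of course this only helps if appends are allowed on the target file and only one
writer touches it at a time.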

On Mon, Jan 9, 2017 at 3:17 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Avro itself supports it, but I am not sure whether that functionality is
> exposed through the Spark API. Just out of curiosity: if your use case is
> only writing to HDFS, you might simply use Flume.
>
> On 9 Jan 2017, at 09:58, awkysam <contactsanto...@gmail.com> wrote:
>
> Currently for our project we are collecting data and pushing it into Kafka,
> with the messages in Avro format. We need to push this data into HDFS; we are
> using Spark Streaming, and in HDFS the data is also stored in Avro format. We
> partition the data per day, so when we write data into HDFS we need to append
> to the same file. Currently we are using GenericRecordWriter and we will be
> using saveAsNewAPIHadoopFile for writing into HDFS. Is there a way to append
> data to a file in HDFS in Avro format using saveAsNewAPIHadoopFile?
> Thanks, Santosh B


Re: AVRO Append HDFS using saveAsNewAPIHadoopFile

2017-01-09 Thread Jörn Franke
Avro itself supports it, but I am not sure whether that functionality is exposed
through the Spark API. Just out of curiosity: if your use case is only writing to
HDFS, you might simply use Flume.
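
Concretely, the append support in Avro itself is the DataFileWriter.appendTo API.
A minimal sketch (the method name appendRecords and its arguments are only for
illustration):

import java.io.File

import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericDatumWriter, GenericRecord}

// Append records to an existing Avro container file. appendTo() reads the
// schema, codec and sync marker from the file header, so the new blocks stay
// compatible with what is already in the file.
def appendRecords(existing: File, records: Seq[GenericRecord]): Unit = {
  val writer = new DataFileWriter(new GenericDatumWriter[GenericRecord]())
    .appendTo(existing)
  try records.foreach(r => writer.append(r))
  finally writer.close()
}

On HDFS the same idea needs the appendTo(SeekableInput, OutputStream) overload
together with FileSystem.append, since there is no local File to hand over.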

> On 9 Jan 2017, at 09:58, awkysam <contactsanto...@gmail.com> wrote:
> 
> Currently for our project we are collecting data and pushing it into Kafka,
> with the messages in Avro format. We need to push this data into HDFS; we are
> using Spark Streaming, and in HDFS the data is also stored in Avro format. We
> partition the data per day, so when we write data into HDFS we need to append
> to the same file. Currently we are using GenericRecordWriter and we will be
> using saveAsNewAPIHadoopFile for writing into HDFS. Is there a way to append
> data to a file in HDFS in Avro format using saveAsNewAPIHadoopFile?
> Thanks, Santosh B


AVRO Append HDFS using saveAsNewAPIHadoopFile

2017-01-09 Thread awkysam
Currently for our project we are collecting data and pushing it into Kafka,
with the messages in Avro format. We need to push this data into HDFS; we are
using Spark Streaming, and in HDFS the data is also stored in Avro format. We
partition the data per day, so when we write data into HDFS we need to append
to the same file. Currently we are using GenericRecordWriter and we will be
using saveAsNewAPIHadoopFile for writing into HDFS. Is there a way to append
data to a file in HDFS in Avro format using saveAsNewAPIHadoopFile?

Thanks,
Santosh B
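
P.S. For reference, a typical saveAsNewAPIHadoopFile + AvroKeyOutputFormat call
looks roughly like the sketch below (not our exact code; the names writeAvroBatch,
records and outputPath are illustrative, and as far as I can tell each call writes
a fresh set of part files rather than appending to an existing one):

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

// Write one batch of GenericRecords as Avro via the new Hadoop API.
// Each call produces new part-r-* files under outputPath; it does not append.
def writeAvroBatch(records: RDD[GenericRecord], schema: Schema, outputPath: String): Unit = {
  val job = Job.getInstance(records.sparkContext.hadoopConfiguration)
  AvroJob.setOutputKeySchema(job, schema)   // the writer schema travels via the job config

  records
    .map(r => (new AvroKey[GenericRecord](r), NullWritable.get()))
    .saveAsNewAPIHadoopFile(
      outputPath,                           // e.g. one directory per day / per batch
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      classOf[AvroKeyOutputFormat[GenericRecord]],
      job.getConfiguration)
}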



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/AVRO-Append-HDFS-using-saveAsNewAPIHadoopFile-tp28292.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.