Hi,
I managed to write to GS from Flume [1], but it is not working 100% yet:
- files are created in the expected directories, but they are empty
- Flume throws a java.lang.OutOfMemoryError: Java heap space:
java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:96)
(complete stack trace here: http://pastebin.com/i5iSgCM3)
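One thing I noticed: the OOM happens in BufferedOutputStream's constructor, which allocates its whole byte[] buffer up front, so I guess a very large configured write buffer size (or a too-small heap in my Docker container) would fail at exactly that frame. A tiny standalone repro of that behavior (the 64 MB size and the -Xmx16m heap mentioned below are just illustrative values, not my actual settings):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;

// Illustration only (not the Flume code): BufferedOutputStream
// allocates its entire byte[] buffer in the constructor, so an
// oversized buffer or an undersized heap fails at <init> -- the
// frame at the top of the stack trace above.
public class BufferAllocDemo {
    // Returns true if a BufferedOutputStream with a buffer of
    // `size` bytes can be constructed under the current heap.
    static boolean canAllocate(int size) {
        try {
            new BufferedOutputStream(new ByteArrayOutputStream(), size);
            return true;
        } catch (OutOfMemoryError e) {
            // "Java heap space", thrown before a single byte is written
            return false;
        }
    }

    public static void main(String[] args) {
        // With e.g. -Xmx16m, the 64 MB buffer fails while 8 KB succeeds.
        System.out.println("8 KB buffer:  " + canAllocate(8 * 1024));
        System.out.println("64 MB buffer: " + canAllocate(64 * 1024 * 1024));
    }
}
```

Is there a connector or Flume setting that controls this buffer size, or should I just raise the container's heap?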
Has anyone experienced this before?
Is it a bug in Google's gcs-connector-latest-hadoop2.jar?
Where should I look to find out what's wrong?
My configuration looks like this:
a1.sinks.hdfs_sink.hdfs.path = gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d
I am running Flume from Docker.
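For completeness, here is the whole sink block, sketched from memory; only hdfs.path is verbatim from above, the channel name "c1" and the fileType/rollInterval values are illustrative (fileType = SequenceFile is consistent with the HDFSSequenceFile frame in the trace):

```properties
# Sketch of the sink configuration; only hdfs.path is verbatim,
# "c1" and the other values are illustrative.
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.channel = c1
a1.sinks.hdfs_sink.hdfs.path = gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d
a1.sinks.hdfs_sink.hdfs.fileType = SequenceFile
a1.sinks.hdfs_sink.hdfs.rollInterval = 300
```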
[1]
http://stackoverflow.com/questions/27174033/what-is-the-minimal-setup-needed-to-write-to-hdfs-gs-on-google-cloud-storage-wit
Thanks.
On 26/11/2014 at 17:05, Jean-Philippe Caruana wrote:
> Hi,
>
> I am a total newbie with Hadoop, so sorry if my questions sound
> stupid (please give me pointers).
>
> I would like to use Flume to send data to HDFS on Google Cloud:
> - does GS (Google Storage) support exist? It would be great to use a
> path like gs://some_path
> - where does the Flume agent need to be? When I see
> hdfs://some_path/ I wonder why there is no server address in the path
>
> In fact, I am looking for feedback about sending data to a Google
> Cloud Hadoop cluster from my own (on-premises) servers.
>
> Thanks
> --
> Jean-Philippe Caruana
> http://www.barreverte.fr
--
Jean-Philippe Caruana
http://www.barreverte.fr