Hey guys,

I've made a decent amount of progress, and now have the settings correct. For completeness, the settings look like this:
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.hdfs.path = s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@BUCKET-NAME/

You can see the full setup at this gist: https://gist.github.com/crowdmatt/5256881

However, I've run into the following problem:

2013-03-29 19:05:28,954 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:460)] process failed
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/FlumeData.1364583927762.tmp' on Host 'mybucket.s3.amazonaws.com' @ 'Fri, 29 Mar 2013 19:05:28 GMT' -- ResponseCode: 404, ResponseStatus: Not Found, RequestId: 00864FE1DCD5AD95, HostId: 68AuSUe/XsP9zUiwe4yqhhDjETjVEnXVuTdZjYKQfj6VBKyACLH++MD1i8xgrEE4
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:122)

Does anyone have any pointers on how I can start debugging?

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting? Invite me via Google Calendar!
[email protected]


On Fri, Mar 29, 2013 at 8:47 AM, Matthew Moore <[email protected]> wrote:

> Hey,
>
> Thanks for the links to the JIRAs. It seems like someone implemented
> an S3BufferedWriter, which might be helpful in the future.
>
> However, I'm still not sure what to set in the configuration (flume.conf) to
> use S3 as a sink. Has anyone done that?
>
> Best,
> Matt
> --
> Matthew Moore
> Co-Founder & CTO, CrowdMob Inc.
> Mobile: (650) 888-5962
>
> Need to schedule a meeting? Invite me via Google Calendar!
> [email protected]
>
>
> On Fri, Mar 29, 2013 at 7:49 AM, Brock Noland <[email protected]> wrote:
>
>> Sorry, I don't know much about this, but here are two relevant JIRAs:
>>
>> https://issues.apache.org/jira/browse/FLUME-1228
>> https://issues.apache.org/jira/browse/FLUME-951
>>
>>
>> On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <[email protected]> wrote:
>>
>>> Hey there,
>>>
>>> I know this is a really newbish question, but I'm hoping to get a little
>>> assistance here so I'm not stuck guess-and-checking.
>>>
>>> I'm trying to figure out how to configure Flume NG (1.3.1), but I
>>> couldn't figure out how to set up the HDFS sink to use the S3
>>> implementations.
>>>
>>> I'm keeping track of my progress on this gist I made:
>>> https://gist.github.com/crowdmatt/5256881
>>>
>>> From what I've gathered, I should be using the hdfs type, which I'm
>>> setting up like this:
>>>
>>> agent.sinks = s3Sink
>>> agent.sinks.s3Sink.type = hdfs
>>> agent.sinks.s3Sink.channel = recoverableMemoryChannel
>>>
>>> ... but that's where I end up hitting my head against the wall. I know
>>> I should be specifying my S3 access key, secret, and bucket in this format:
>>> s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/
>>>
>>> However, I don't know where to specify that, or what dot notation to use.
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Best,
>>> Matt
>>> --
>>> Matthew Moore
>>> Co-Founder & CTO, CrowdMob Inc.
>>> Mobile: (650) 888-5962
>>>
>>> Need to schedule a meeting? Invite me via Google Calendar!
>>> [email protected]
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
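[Editor's note] For readers hitting the same wall, a fuller sink definition might look like the sketch below. This is an untested sketch assembled from the snippets in the thread, not a verified working config: the bucket path, prefix, and channel name are placeholders, and it uses the s3n:// (native S3 filesystem) scheme mentioned in the original question rather than the block-based s3:// scheme in the later message. The hdfs.* parameters shown (fileType, rollInterval, rollSize, rollCount) are standard Flume 1.3 HDFS-sink settings.

```
# flume.conf sketch -- placeholder names throughout
agent.sinks = s3Sink
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.channel = recoverableMemoryChannel

# s3n:// is Hadoop's native S3 filesystem; s3:// (as in the config at the
# top of the thread) is the older block-based store. The
# Jets3tNativeFileSystemStore frame in the stack trace suggests the
# native store is what's actually in play.
agent.sinks.s3Sink.hdfs.path = s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@BUCKET-NAME/flume/

# Write raw events rather than the default SequenceFile, and roll on a
# timer instead of the small default size/count thresholds.
agent.sinks.s3Sink.hdfs.fileType = DataStream
agent.sinks.s3Sink.hdfs.rollInterval = 300
agent.sinks.s3Sink.hdfs.rollSize = 0
agent.sinks.s3Sink.hdfs.rollCount = 0
```

Instead of embedding credentials in the URI (which leaks them into logs and process listings), Hadoop of this era also accepts fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey in core-site.xml on Flume's classpath. As a debugging step, running `hadoop fs -ls s3n://...` against the same path outside Flume is a quick way to confirm the credentials and URI are valid before blaming the sink.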
