Awesome, thanks Wyatt!

On Thu, Jan 12, 2017 at 10:08 AM, Wyatt Frelot <[email protected]> wrote:
> Thanks for the feedback and "points in the right direction". I will create
> a JIRA ticket and coordinate status from that point. Additionally, if I
> have any more questions...will submit to the mailing list.
>
> Again, thanks all! I definitely feel welcome!
>
> Wyatt
>
> On Thu, Jan 12, 2017 at 12:45 PM Stephen Sisk <[email protected]> wrote:
>
>> Hi Wyatt!
>>
>> some other info you might find useful:
>> * You might be tempted to implement a Sink - it's the obvious thing in
>> the API for writing to external data stores. However, we're finding it less
>> useful these days and generally discouraging its use unless you're writing
>> to files (which you're not). Instead, if you can, just implement a DoFn
>> that does the write. As Davor mentioned, BigTableIO is a good example of
>> this.
>> * It's useful to understand the lifecycle of DoFns
>> (setup/startbundle/finishbundle/teardown).
>> For example, you'll likely want to batch writes for efficiency - BigTableIO
>> does this by flushing writes stored locally in finishBundle.
>> * BigTableIO uses a separate "service" class - that's useful for making
>> your tests simpler by abstracting out the network retry/etc logic
>>
>> As you'll have noticed by the multiple replies to your message, people
>> are eager to answer questions you might have - feel free to pipe up on the
>> mailing list (dev@ might be more appropriate in that case.)
>>
>> S
>>
>> On Wed, Jan 11, 2017 at 9:14 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>
>> Welcome and fully agree with Davor.
>>
>> You can count on me to do the review !
>>
>> Regards
>> JB
>> On Jan 12, 2017, at 06:12, Davor Bonaci <[email protected]> wrote:
>>
>> Hi Wyatt -- welcome!
>>
>> If you'd like to write a PCollection to Apache Accumulo's key/value
>> store, writing a new IO connector would be the best path forward. Accumulo
>> has somewhat similar concepts to BigTable, so you can use the existing
>> BigTableIO as an inspiration.
>>
>> You are thinking it exactly right -- a connector written in Beam would be
>> runner-independent, and thus can run anywhere.
>>
>> I'm not aware that anybody has started on this one yet -- feel free to
>> file a JIRA to have a place to coordinate if someone else is interested.
>> And, if you get stuck or need help in any way, there are plenty of people
>> on the Beam mailing lists happy to help!
>>
>> Once again, welcome!
>>
>> Davor
>>
>> On Wed, Jan 11, 2017 at 6:04 PM, Wyatt Frelot <[email protected]> wrote:
>>
>> All,
>>
>> Being new to Apache Beam...I want to ensure that I approach things the
>> "right way".
>>
>> My goal:
>>
>> I want to be able to write a PCollection to Apache Accumulo. Something
>> like this:
>>
>> PCollection.apply( AccumuloIO.Write.to("AccumuloTable"));
>>
>> While I am sure I can create a custom class to do so, it has me thinking
>> about identifying the best way forward.
>>
>> I want to use the Apex Runner to run my applications. Apex has Malhar
>> libraries that are already written that would be really useful. But, I
>> don't think that is the point. The goal is to develop IO Connectors that
>> are able to be applied to any runner. Am I thinking about this correctly?
>>
>> Is there any work being done to develop an IO Connector for Apache
>> Accumulo?
>>
>> Wyatt
>>
>> wa
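
For anyone following the thread, here is a rough sketch of the pattern Stephen describes: buffer records while a bundle is processed, flush them in finishBundle, and keep the network logic behind a separate "service" class so tests can swap in a fake. This is plain Java with no Beam or Accumulo dependency, so it only models the shape of the idea. Every name in it (KeyValueWriterService, InMemoryWriterService, BatchingWriteFn, BatchingWriteDemo) is hypothetical, the size-based early flush is my own assumption rather than something stated in the thread, and it is not how BigTableIO is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical "service" abstraction that would hide network and retry logic. */
interface KeyValueWriterService {
    void writeBatch(List<String> records);
}

/** In-memory fake, standing in for a real Accumulo connection in tests. */
class InMemoryWriterService implements KeyValueWriterService {
    final List<List<String>> batches = new ArrayList<>();

    @Override
    public void writeBatch(List<String> records) {
        // Copy the list so that clearing the caller's buffer does not affect it.
        batches.add(new ArrayList<>(records));
    }
}

/** Plain-Java model of a write DoFn; method names mirror the DoFn lifecycle. */
class BatchingWriteFn {
    private final KeyValueWriterService service;
    private final int maxBatchSize; // assumption: cap on locally buffered records
    private List<String> buffer = new ArrayList<>();

    BatchingWriteFn(KeyValueWriterService service, int maxBatchSize) {
        this.service = service;
        this.maxBatchSize = maxBatchSize;
    }

    /** Called once before the elements of a bundle are processed. */
    void startBundle() {
        buffer = new ArrayList<>();
    }

    /** Called once per element: buffer it, flushing early if the cap is reached. */
    void processElement(String record) {
        buffer.add(record);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Called once after the bundle: write out whatever is still buffered. */
    void finishBundle() {
        flush();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            service.writeBatch(buffer);
            buffer.clear();
        }
    }
}

class BatchingWriteDemo {
    public static void main(String[] args) {
        InMemoryWriterService service = new InMemoryWriterService();
        BatchingWriteFn fn = new BatchingWriteFn(service, 2);

        fn.startBundle();
        for (String record : new String[] {"row1", "row2", "row3"}) {
            fn.processElement(record);
        }
        fn.finishBundle();

        System.out.println(service.batches); // prints [[row1, row2], [row3]]
    }
}
```

In a real connector the three lifecycle methods would live on a DoFn subclass, and the service implementation would wrap the Accumulo client. The fake service is what keeps the unit test free of any network setup.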
