Hi Wyatt!

Some other info you might find useful:
* You might be tempted to implement a Sink - it's the obvious thing in the
API for writing to external data stores. However, we're finding it less
useful these days and generally discouraging its use unless you're writing
to files (which you're not). Instead, if you can, just implement a DoFn
that does the write. As Davor mentioned, BigTableIO is a good example of
this.
* It's useful to understand the lifecycle of DoFns
(setup/startBundle/finishBundle/teardown). For example, you'll likely want
to batch writes for efficiency - BigTableIO does this by buffering writes
locally and flushing them in finishBundle.
* BigTableIO uses a separate "service" class - that's useful for making
your tests simpler by abstracting out the network, retry, and similar logic.
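
To make the last two points concrete, here is a rough sketch in plain Java
(deliberately no Beam dependency, so it runs on its own). Every name in it
(WriteService, RecordingWriteService, BatchingWriter, maxBatchSize) is made up
for illustration and is not taken from BigTableIO. In a real connector the
writer would be a DoFn, with these methods carrying the @StartBundle,
@ProcessElement, @FinishBundle and @Teardown annotations, and the service
would more likely be created in @Setup than passed in. Flushing early when the
buffer fills up is my own assumption to keep memory bounded.

```java
import java.util.ArrayList;
import java.util.List;

// Small driver that walks through one bundle by hand.
class BatchingWriteSketch {
    public static void main(String[] args) throws Exception {
        RecordingWriteService service = new RecordingWriteService();
        BatchingWriter writer = new BatchingWriter(service, 2);

        writer.startBundle();
        writer.processElement("a");
        writer.processElement("b"); // buffer reaches maxBatchSize, so it is flushed
        writer.processElement("c");
        writer.finishBundle();      // flushes the remaining record
        writer.teardown();

        System.out.println(service.batches); // prints [[a, b], [c]]
    }
}

// Hypothetical "service" abstraction: connection handling and retries would
// live behind this interface, out of sight of the write logic.
interface WriteService {
    void writeBatch(List<String> records) throws Exception;
    void close() throws Exception;
}

// In-memory fake used in place of a real client, which is what keeps tests simple.
class RecordingWriteService implements WriteService {
    final List<List<String>> batches = new ArrayList<>();
    boolean closed = false;

    @Override
    public void writeBatch(List<String> records) {
        batches.add(records);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Plain-Java stand-in for a DoFn that buffers records and writes them in batches.
class BatchingWriter {
    private final WriteService service;
    private final int maxBatchSize;
    private List<String> buffer = new ArrayList<>();

    BatchingWriter(WriteService service, int maxBatchSize) {
        this.service = service;
        this.maxBatchSize = maxBatchSize;
    }

    // Would be @StartBundle: start each bundle with an empty buffer.
    void startBundle() {
        buffer = new ArrayList<>();
    }

    // Would be @ProcessElement: buffer the record instead of writing it right away.
    void processElement(String record) throws Exception {
        buffer.add(record);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Would be @FinishBundle: nothing may stay buffered once the bundle ends.
    void finishBundle() throws Exception {
        flush();
    }

    // Would be @Teardown: release the connection held by the service.
    void teardown() throws Exception {
        service.close();
    }

    private void flush() throws Exception {
        if (buffer.isEmpty()) {
            return;
        }
        service.writeBatch(buffer);
        buffer = new ArrayList<>(); // hand the old list to the service, start a new one
    }
}
```

Because the writer only talks to the WriteService interface, a test can pass
in the in-memory fake and check the recorded batches without any network.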

As you'll have noticed from the multiple replies to your message, people are
eager to answer questions you might have - feel free to pipe up on the
mailing list (dev@ might be more appropriate in that case).

S

On Wed, Jan 11, 2017 at 9:14 PM Jean-Baptiste Onofré <[email protected]>
wrote:

> Welcome and fully agree with Davor.
>
> You can count on me to do the review!
>
> Regards
> JB
> On Jan 12, 2017, at 06:12, Davor Bonaci <[email protected]> wrote:
>
> Hi Wyatt -- welcome!
>
> If you'd like to write a PCollection to Apache Accumulo's key/value
> store, writing a new IO connector would be the best path forward. Accumulo
> has concepts somewhat similar to BigTable's, so you can use the existing
> BigTableIO as an inspiration.
>
> You are thinking about it exactly right -- a connector written in Beam
> would be runner-independent, and thus can run anywhere.
>
> I'm not aware that anybody has started on this one yet -- feel free to
> file a JIRA to have a place to coordinate if someone else is interested.
> And, if you get stuck or need help in any way, there are plenty of people
> on the Beam mailing lists happy to help!
>
> Once again, welcome!
>
> Davor
>
> On Wed, Jan 11, 2017 at 6:04 PM, Wyatt Frelot <[email protected]> wrote:
>
> All,
>
> Being new to Apache Beam...I want to ensure that I approach things the
> "right way".
>
> My goal:
>
> I want to be able to write a PCollection to Apache Accumulo. Something
> like this:
>
>               PCollection.apply( AccumuloIO.Write.to("AccumuloTable"));
>
>
> While I am sure I can create a custom class to do so, it has me thinking
> about identifying the best way forward.
>
> I want to use the Apex Runner to run my applications. Apex has Malhar
> libraries that are already written that would be really useful. But, I
> don't think that is the point. The goal is to develop IO Connectors that
> can be used with any runner. Am I thinking about this correctly?
>
> Is there any work being done to develop an IO Connector for Apache
> Accumulo?
>
> Wyatt
