[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2018-01-07 Thread nielsbasjes
Github user nielsbasjes commented on the issue: https://github.com/apache/flink/pull/2332 FYI: In close relation to this issue I submitted an enhancement at the HBase side to support these kinds of usecases much better: https://issues.apache.org/jira/browse/HBASE-19486

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Indeed, I'm also not aware of how users use it. I would take a similiar approach than c*, already in made for streaming, seems to use a existing async api from datastax. But again, I couldn't find

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 So I think at a first level let us have put/delete mutations alone for Streaming ? Since am not aware of how flink users are currently interacting with HBase not sure on what HBase ops should be

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 No, only puts. Aggregation is coming from a reduce which by itself aggregates, keeping recent values on hbase, more like snapshots I think. During our analysis sending data to hbase was only reliable

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 @nragon I agree. But your use case does it have increments/appends? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-29 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 From my point of view, my sample works fine (my use case) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-29 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 I can take this up and come up with a design doc. Reading thro the comments here and the final decision points I think only puts/deletes can be considered idempotent. But increments/decrements

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-19 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 I've made a custom solution which works for my use cases. Notice that the code attached is not working because it's only a skeleton. This prototype uses asynchbase and tries to manage throttling

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-19 Thread fpompermaier
Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/2332 anyone working on this? HBase streaming sink would be a very nice addition... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-17 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Currently trying to fix some throttling issues, reported here https://github.com/OpenTSDB/asynchbase/issues/162 --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , sorry I don't have bandwidth working on it.. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 @delding Are you still working on this? @nragon Let me know how can be of help here? I can work on this PR too since I have some context on the existing PR though it is stale. ---

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Hi, I'm testing one my self with a third party library, http://opentsdb.github.io/asynchbase. I'm following a simliar approach as cassandra sink. Testing it as we speak. Thanks --- If your

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 Hi @nragon, the PR seems to be stale. Are you interested in contributing a streaming HBase sink? --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Any updates on this sink? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-03-01 Thread nragon
Github user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Hi :) I needed to use hbase as sink so i decided to take a look at this pull a use it. Current changes that might be interesting: Like hbase connector for dataset, it's possible to

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-27 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 Thanks for these discussions @delding and @ramkrish86. I think this is getting a bit hard to follow. What do you think about writing a design doc that describes the supported HBase features and

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-27 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 Ok. Then it should be clearly documented now. that the sink supports only Puts/Deletes. So in future can the sink be updated with new APIs? I don't know the procedure here in case new APIs have

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 I agree it might be an overkill. But in case of having an sink that only supports Put/Delete, it would be better to have ordered execution than not, after all HBase has this API so there could be

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 With in a single row you need guarantee of order of execution? I agree Append/Increment or non-idempotent in certain failure cases but there is a Nonce generator for that. I should say I have not

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , I'm thinking replace batch() with mutateRow() because it provides atomic ordered mutations for a single row, but it only supports Put and Delete which should be fine since only Put

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 @delding Do you have uses cases with Append/Increment? I think with the batch() API we are not sure of the order of execution of the batch() in the hbase server but still it would help

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 @nielsbasjes I want to avoid to exclude a version that is still in use by existing Flink users. I do not have in insight in which HBase versions currently in use. If (basically) everybody is on

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread nielsbasjes
Github user nielsbasjes commented on the issue: https://github.com/apache/flink/pull/2332 @fhueske: What is "older" ? I would like a clear statement about the (minimal) supported versions of HBase. I would see 1.1.x as old enough, or do you see 0.98 still required? --- If

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 @nielsbasjes We would break compatibility with older HBase versions. Before we do that, we should start a discussion on the user and dev mailing lists. --- If your project is set up for it, you can

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread nielsbasjes
Github user nielsbasjes commented on the issue: https://github.com/apache/flink/pull/2332 @fhueske Should the TableInputFormat be updated to use the HBase 1.1.2 api also? It would make things a bit cleaner. --- If your project is set up for it, you can reply to this email and have

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread zentol
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2332 @ramkrish86 that is correct. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-26 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 > This scheme does not provide exactly-once delivery guarantees, however at any given point in time the table would be in a state as if the updates were only sent once. This is the same guarantee

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-24 Thread nielsbasjes
Github user nielsbasjes commented on the issue: https://github.com/apache/flink/pull/2332 Yes, as long as everything is compatible with HBase 1.1.2 its fine for me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-23 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Thanks for very detailed and helpful comments, let me work on it :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-23 Thread zentol
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2332 Don't you loose any guarantees regarding order of mutations the moment you use asynchronous updates anyway? The WriteAheadSink should only be used if you want to deal with non-deterministic

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-23 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 Thanks for looking into the Cassandra connector. Maybe @zentol (who implemented the Cassandra connector) can comment how an HBase sink can be made fault-tolerant and what code could be reused for

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-22 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , CassandraTupleWriteAheadSink extends GenericWriteAheadSink that deals with the checkpoint mechanism --- If your project is set up for it, you can reply to this email and have your

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 I agree with @delding here. >* Method that does a batch call on Deletes, Gets, Puts, Increments and Appends. * The ordering of execution of the actions is not defined. Meaning

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 ` #2330 updates the version of the batch TableInputFormat to HBase 1.1.2. I think we should use the same version here.` Valid point. But is it possible to upgrade to even newer version like

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 I can change back to HBase 1.1.2. The reason of using 2.0.0-SNAPSHOT is because this bug (https://issues.apache.org/jira/browse/HBASE-14963 ). It's only fixed for version 1.3.0+ and 2.0.0 and my

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @fhueske , in HBase writes to a single row have ACID guarantee. The exactly once semantic can be implemented the way CassandraSink did, storing input records in a state backend and flushing to

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 I also noticed that the PR pulls in an HBase SNAPSHOT dependency. This is not possible. We cannot compile against a development version. #2330 updates the version of the batch TableInputFormat to

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-21 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2332 Hi @delding, I'm sorry that I did not mention this earlier, but I just noticed that the `HBaseSink` does not implement any logic for checkpointing and fault-tolerance. The checkpointing

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-20 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @fhueske , I have updated this PR to address your comments. In this change, only one MutationActions and two ArralyList are needed for entire stream, MutationActions is now resettable. But I

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-02 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , I squashed my recent commit into the one you reviewed before, then merged the upstream and pushed --force the commits which is why you only see two commits. The first commit

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-09-01 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 @delding Am not able to see the recent changes at all. Under 'commit' tab it shows 2 commits. If I open the merge commit I am not able to see the changes that you are saying here. Can you

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-31 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , I squashed the commits before pushing.. Sorry about making it unable to see those recent changes. If you could still review this PR when you have time? --- If your project is set

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-31 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 How should I view your latest commit ? Am not able to see the changes that you have mentioned above. Thanks @delding --- If your project is set up for it, you can reply to this email and have

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-25 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi, I have updated the PR based on @ramkrish86 's comments. 1. rename MutationActionList to MutationActions which has a public method createMutations that return HBase Mutations 2.

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-23 Thread rmetzger
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2332 Cool. Let me know if you have any questions or if you need any help. I'll keep an eye on this PR and the Bahir-flink repository. --- If your project is set up for it, you can reply to this

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-22 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @rmetzger , Thanks for your comment. I am totally ok with contributing the connector to Apache Bahir :-) --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-22 Thread rmetzger
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2332 @delding: Thanks a lot for opening a pull request for a streaming hbase connector to Flink. Would you be okay with contributing the connector to Apache Bahir, instead of Apache Flink? The

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-14 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @ramkrish86 , thanks for your reviews and detailed comments :-) I will update this PR soon. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-11 Thread ramkrish86
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2332 Thanks for the update @delding . I can check this PR more closely tomorrow my time. Had a glance at it. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-10 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 BTW, the example runs well on my machine when I set hbase-client version to 2.0.0-SNAPSHOT, but travis-ci build failed due to "Failed to collect dependencies at

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-10 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi, I have updated this PR, adding javadoc and an example as @tzulitai advised and created Enums for PUT, DELETE, INCREMENT and APPEND as @ramkrish86 advised. --- If your project is set up for it,

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi @tzulitai Thank you for all the comments :-) I will update the PR and add an example in next a few days --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread tzulitai
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2332 @delding Thank you for working on this contribution. Since I'm not that familiar with the HBase client API, I've only skimmed through the code for now. Will try to review in detail later.

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-05 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 BTW, failing tests are results of this PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2016-08-04 Thread delding
Github user delding commented on the issue: https://github.com/apache/flink/pull/2332 Hi, this is my second PR to Flink, any comments or suggestions will be very appreciated :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as