[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Indeed, I'm also not aware of how users use it. I would take an approach similar to the C* (Cassandra) sink, which is already built for streaming and seems to use an existing async API from DataStax. But again, I couldn't find accurate feedback regarding backpressure. If this is an issue, then working with Flink state is a must, IMO. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 No, only puts. The aggregation comes from a reduce, which by itself aggregates, keeping only recent values in HBase, more like snapshots, I think. During our analysis, sending data to HBase was only reliable when working with aggregates; otherwise a huge amount of backpressure comes from the region servers. Full dumps are more likely to work on Elasticsearch or HDFS than on HBase. So, since Flink does an amazing job at keeping state, why bother overloading the region servers with so many requests? Again, that's our point of view, of course.
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 From my point of view, my sample works fine (for my use case).
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 I've made a custom solution which works for my use cases. Note that the attached code does not run as-is because it's only a skeleton. This prototype uses asynchbase and tries to manage the throttling issues mentioned above. The way I do this is by limiting pending requests per client to 1000 (configurable, if you want, depending on HBase capacity and response times) and skipping records once that threshold is reached. Every skipped record is stamped with the system timestamp, always keeping only the most recent skipped record for later updates. In my use case I always use a keyBy -> reduce before the sink, which keeps the aggregation state, meaning that every record the HBase sink is invoked with carries the latest aggregated value from the previous operators. When all requests are done (`pending == 0`), I compare the last skipped record with the last requested record; if the skipped timestamp is less than the requested timestamp, HBase already has the latest aggregation. There is plenty of room for improvement, I just didn't have the time. [HBaseSink.txt](https://github.com/apache/flink/files/1014991/HBaseSink.txt)
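The throttle-and-skip scheme described above can be sketched in plain Java. This is not the attached HBaseSink.txt and it does not use asynchbase; the class `ThrottledSinkSketch`, its methods, and the `sent` list (standing in for asynchronous puts) are hypothetical names for illustration. It also assumes a single row key and assumes that a skipped record newer than the last request is re-sent once nothing is pending, which the comment implies ("for later updates") but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a single-key model of the skip-and-catch-up throttling idea. */
final class ThrottledSinkSketch {

    /** Hypothetical record: an aggregated value plus the system time the sink saw it. */
    static final class Stamped {
        final String value;
        final long timestamp;

        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final int maxPending;   // e.g. 1000 in the comment above
    private int pending = 0;        // requests sent but not yet acknowledged
    private Stamped lastRequested;  // last record actually sent
    private Stamped lastSkipped;    // most recent record dropped by the throttle

    /** Stands in for the asynchronous puts a real client would issue. */
    final List<String> sent = new ArrayList<>();

    ThrottledSinkSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Called once per incoming aggregate; `now` is the system timestamp. */
    void invoke(String value, long now) {
        Stamped record = new Stamped(value, now);
        if (pending >= maxPending) {
            lastSkipped = record; // keep only the newest skipped record
            return;
        }
        send(record);
    }

    /** Called when one outstanding request completes. */
    void onRequestDone() {
        if (pending > 0) {
            pending--;
        }
        if (pending == 0 && lastSkipped != null) {
            Stamped skipped = lastSkipped;
            lastSkipped = null;
            // If the skipped record is older than the last request, the store
            // already holds the latest aggregate; otherwise catch up now.
            if (skipped.timestamp >= lastRequested.timestamp) {
                send(skipped);
            }
        }
    }

    private void send(Stamped record) {
        pending++;
        lastRequested = record;
        sent.add(record.value);
    }
}
```

Because the upstream reduce always emits the full aggregate, dropping intermediate records loses nothing as long as the newest one eventually reaches the store. A real sink would likely track the skipped record per row key rather than globally.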
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Currently trying to fix some throttling issues, reported here: https://github.com/OpenTSDB/asynchbase/issues/162
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Hi, I'm testing one myself with a third-party library, http://opentsdb.github.io/asynchbase. I'm following a similar approach to the Cassandra sink. Testing it as we speak. Thanks
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Any updates on this sink?
[GitHub] flink pull request #3642: SignalWindow
GitHub user nragon closed the pull request at: https://github.com/apache/flink/pull/3642
[GitHub] flink issue #3642: SignalWindow
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3642 Also, related to this: do you think keeping a value state within the trigger would be better than using a window attribute to save the signal event timestamp?
[GitHub] flink issue #3642: SignalWindow
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3642 @StephanEwen would this implementation work? In your opinion, at first sight :)
[GitHub] flink issue #3642: SignalWindow
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3642 @StephanEwen I'm doing some tests with this type of window. I'll try to find some other way (hash-related, as you said).
[GitHub] flink pull request #3639: Update TimeWindow.java
GitHub user nragon closed the pull request at: https://github.com/apache/flink/pull/3639
[GitHub] flink issue #3639: Update TimeWindow.java
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3639 Closing due to #3642
[GitHub] flink issue #3642: SignalWindow
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3642 But the hash code and equals remain the same, just like in TimeWindow. The only difference is that we can merge and control signaled events.
[GitHub] flink pull request #3642: SignalWindow
GitHub user nragon opened a pull request: https://github.com/apache/flink/pull/3642 SignalWindow Related to https://github.com/apache/flink/pull/3639, but this might be better as a new window. The concept can also be represented with state on the trigger that keeps the signaled event, but I think with this window we can get better throughput. Also, this window is almost the same as TimeWindow. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nragon/flink patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3642.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3642
[GitHub] flink pull request #3639: Update TimeWindow.java
GitHub user nragon opened a pull request: https://github.com/apache/flink/pull/3639 Update TimeWindow.java A method to update the time window max timestamp would be useful when we need to control the max timestamp based on events. For instance, a session time window may end after the gap or when an event with a given record arrives. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nragon/flink patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3639 commit 10c534278dabd08a899eb4723c4cd236c148619e Author: nragon <nuno...@hotmail.com> Date: 2017-03-29T09:34:16Z Update TimeWindow.java A method to update the time window max timestamp would be useful when we need to control the max timestamp based on events. For instance, a session time window may end after the gap or when an event with a given record arrives.
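The idea of a window whose max timestamp can be pulled in by an event can be illustrated with a small stand-alone class. This is a sketch, not the code from the pull request: `SignalWindowSketch` and `signal` are hypothetical names, and the only convention borrowed from Flink's `TimeWindow` is that the end is exclusive, so the max timestamp is `end - 1`.

```java
/** Illustrative only: a time window that a "signal" event can close early. */
final class SignalWindowSketch {
    private final long start; // inclusive
    private long end;         // exclusive, may shrink when a signal arrives

    SignalWindowSketch(long start, long end) {
        if (end <= start) {
            throw new IllegalArgumentException("end must be greater than start");
        }
        this.start = start;
        this.end = end;
    }

    /** Largest timestamp that still belongs to this window. */
    long maxTimestamp() {
        return end - 1;
    }

    /**
     * Ends the window at the signal event if that event falls inside it.
     * Returns true when the window was shortened.
     */
    boolean signal(long eventTimestamp) {
        if (eventTimestamp < start || eventTimestamp >= end) {
            return false; // the signal lies outside this window; ignore it
        }
        end = eventTimestamp + 1; // the signal event becomes the last element
        return true;
    }
}
```

Mutating a window after it has been assigned raises questions about hashing and equality, so treat this as an illustration of the intent rather than a recommended design.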
[GitHub] flink issue #3554: Detached (Remote)StreamEnvironment execution
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/3554 I guess for backward compatibility this last commit would do.
[GitHub] flink pull request #3554: Detached (Remote)StreamEnvironment execution
GitHub user nragon opened a pull request: https://github.com/apache/flink/pull/3554

Detached (Remote)StreamEnvironment execution

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

You can merge this pull request into a Git repository by running: $ git pull https://github.com/nragon/flink master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3554.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3554

commit 14592e6d3c1217537ed85c86cbebe168f5f05dcc Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:32:54Z Update Flip6LocalStreamEnvironment.java
commit a3830b8b044cd9b4a0cb1455cd71d50e5b79b438 Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:33:12Z Update LocalStreamEnvironment.java
commit 1712b19b4126aea18742b6c6489486f13e9ee3e4 Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:33:31Z Update RemoteStreamEnvironment.java
commit 690969c0bc4fd2ac5828cff01d621f86e48de4d1 Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:33:45Z Update StreamContextEnvironment.java
commit 143a712df865c7c2de9137f1199c0723a4419e23 Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:34:01Z Update StreamExecutionEnvironment.java
commit 15a245e1fbd8c60d019677818aa6e4dd4a4ca0cb Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:34:18Z Update StreamPlanEnvironment.java
commit 2753c5ba17fd4e91036b5d5b2f806b39763b8232 Author: nragon <nuno...@hotmail.com> Date: 2017-03-16T10:35:59Z Update Flip6LocalStreamEnvironment.java
[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
GitHub user nragon commented on the issue: https://github.com/apache/flink/pull/2332 Hi :) I needed to use HBase as a sink, so I decided to take a look at this pull request and use it. A change that might be interesting: as with the HBase connector for DataSet, it should be possible to define the configuration in order to connect, for instance, to a remote HBase master. In [HBaseSink](https://github.com/delding/flink/blob/16ad4b2ac13567ceba7bacf15e4698fb4ce17c53/flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java) the `HBaseConfiguration.create()` method should receive the configuration set through the constructor (or another method). Hope this makes sense. Thanks
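One way to read this suggestion is that the sink accepts user-supplied settings in its constructor and layers them over the defaults when it connects. The sketch below models that with plain maps so it stays self-contained; `HBaseSinkConfigSketch` and `effectiveConfig` are hypothetical names, not part of the pull request, and the property keys are only examples. A serializable map is used on the assumption that the sink itself must be serializable to be shipped to workers.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: carries connection overrides from the constructor to connect time. */
final class HBaseSinkConfigSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // User-supplied overrides, e.g. the ZooKeeper quorum of a remote cluster.
    private final HashMap<String, String> overrides;

    HBaseSinkConfigSketch(Map<String, String> overrides) {
        this.overrides = new HashMap<>(overrides); // defensive copy
    }

    /**
     * Layers the overrides on top of the defaults. In a real sink the defaults
     * would come from HBaseConfiguration.create(), and each override would be
     * applied with Configuration.set(key, value) before opening the connection.
     */
    Map<String, String> effectiveConfig(Map<String, String> defaults) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(overrides);
        return merged;
    }
}
```

Settings that the user does not override keep their default values, so a caller only needs to pass what differs from the local setup.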