[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
Indeed, I'm also not aware of how users use it. I would take a similiar 
approach than c*, already in made for streaming, seems to use a existing async 
api from datastax. But again, I couldn't find accurate feedback regarding 
backpressure. If this is an issue, then working with Flink states is a must, 
imo.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-30 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
No, only puts. Aggregation is coming from a reduce which by itself 
aggregates, keeping recent values on hbase, more like snapshots I think. During 
our analysis sending data to hbase was only reliable when working with 
aggregates otherwise a huge amount of backpressure will come from region 
servers. Full dumps are more likely to work on elasticsearch or hdfs than 
hbase. So, because Flink does an amazing job at keeping states why bother to 
overload region servers with so many requests? Again, our point of view, of 
course.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-29 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
From my point of view, my sample works fine (my use case)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-19 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
I've made a custom solution which works for my use cases. Notice that the 
code attached is not working because it's only a skeleton.
This prototype uses asynchbase and tries to manage throttling issues as 
mentioned above. The way I do this is by limiting requests per client by 1000 
(also configurable, if you want, depending on hbase capacity and response), and 
skipping records after reaching that threshold. Every record skipped is updated 
according with system timestamp, always keeping the most recent skipped record 
for later updates.
Now, in my use case I always use a keyby -> reduce before sink, which keeps 
the aggregation state, meaning that every record invoked by hbase sink will 
have the last aggregated value from your previous operators. When all requests 
are done `pending == 0` I compare the last skipped record with the last 
requested record, if the skipped timestamp is less than the requested timestamp 
means that hbase has the last aggregation.
There is plenty of room for improvments, i just did'nt have the time.

[HBaseSink.txt](https://github.com/apache/flink/files/1014991/HBaseSink.txt)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-17 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
Currently trying to fix some throttling issues, reported here 
https://github.com/OpenTSDB/asynchbase/issues/162


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
Hi, I'm testing one my self with a third party library, 
http://opentsdb.github.io/asynchbase.
I'm following a simliar approach as cassandra sink. Testing it as we speak.
Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-05-16 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
Any updates on this sink?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3642: SignalWindow

2017-03-30 Thread nragon
Github user nragon closed the pull request at:

https://github.com/apache/flink/pull/3642


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3642: SignalWindow

2017-03-30 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3642
  
Also, relating this. You think keeping a value state within the trigger 
would be better instead of using a window attribute to save the signal event 
timestamp?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3642: SignalWindow

2017-03-30 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3642
  
@StephanEwen would this implementation work? In you opinion, at first sight 
:) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3642: SignalWindow

2017-03-29 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3642
  
@StephanEwen I'm doing some test with this type of window. I'll try to see 
some other way (hash related as you said)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3639: Update TimeWindow.java

2017-03-29 Thread nragon
Github user nragon closed the pull request at:

https://github.com/apache/flink/pull/3639


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3639: Update TimeWindow.java

2017-03-29 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3639
  
Closing due to #3642


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3642: SignalWindow

2017-03-29 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3642
  
But the hash code and equals remais the same, just like in TimeWindow. The 
only difference is that we can merge and control signaled events.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3642: SignalWindow

2017-03-29 Thread nragon
GitHub user nragon opened a pull request:

https://github.com/apache/flink/pull/3642

SignalWindow

Related with https://github.com/apache/flink/pull/3639
but might be better with a new window.
This concept can also be represented with a state on trigger keeping the 
signaled event, but I think with this window we are able to get better 
throughput.
Also, this window is almost the same as TimeWindow

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nragon/flink patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3642






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3639: Update TimeWindow.java

2017-03-29 Thread nragon
GitHub user nragon opened a pull request:

https://github.com/apache/flink/pull/3639

Update TimeWindow.java

A method to update the time window max timestamp would be useful when we 
need control the max timestamp based on events. For instance a session time 
window may end after the gap or when an event with a given record arives.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nragon/flink patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3639


commit 10c534278dabd08a899eb4723c4cd236c148619e
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-29T09:34:16Z

Update TimeWindow.java

A method to update the time window max timestamp would be useful when we 
need control the max timestamp based on events. For instance a session time 
window may end after the gap or when an event with a given record arives.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3554: Detached (Remote)StreamEnvironment execution

2017-03-16 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/3554
  
I guess for backward compatibility this last commig would do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3554: Detached (Remote)StreamEnvironment execution

2017-03-16 Thread nragon
GitHub user nragon opened a pull request:

https://github.com/apache/flink/pull/3554

Detached (Remote)StreamEnvironment execution

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nragon/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3554


commit 14592e6d3c1217537ed85c86cbebe168f5f05dcc
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:32:54Z

Update Flip6LocalStreamEnvironment.java

commit a3830b8b044cd9b4a0cb1455cd71d50e5b79b438
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:33:12Z

Update LocalStreamEnvironment.java

commit 1712b19b4126aea18742b6c6489486f13e9ee3e4
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:33:31Z

Update RemoteStreamEnvironment.java

commit 690969c0bc4fd2ac5828cff01d621f86e48de4d1
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:33:45Z

Update StreamContextEnvironment.java

commit 143a712df865c7c2de9137f1199c0723a4419e23
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:34:01Z

Update StreamExecutionEnvironment.java

commit 15a245e1fbd8c60d019677818aa6e4dd4a4ca0cb
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:34:18Z

Update StreamPlanEnvironment.java

commit 2753c5ba17fd4e91036b5d5b2f806b39763b8232
Author: nragon <nuno...@hotmail.com>
Date:   2017-03-16T10:35:59Z

Update Flip6LocalStreamEnvironment.java




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink

2017-03-01 Thread nragon
Github user nragon commented on the issue:

https://github.com/apache/flink/pull/2332
  
Hi :)

I needed to use hbase as sink so i decided to take a look at this pull a 
use it.
Current changes that might be interesting: Like hbase connector for 
dataset, it's possible to define the configuration in other to connect, for 
instance, to a remote hbase master. In 
[HBaseSink](https://github.com/delding/flink/blob/16ad4b2ac13567ceba7bacf15e4698fb4ce17c53/flink-streaming-connectors/flink-connector-hbase/src/main/java/org/apache/flink/streaming/connectors/hbase/HBaseSink.java)
 the `HBaseConfiguration.create()` method should receive this configurations 
set by constructor (or other method)
Hope this makes sense.

Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---