Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Elias Levy
If you want to ensure you see all changes to a Cassandra table, you need to
make use of the Change Data Capture
 feature.  For
that, you'll need code running on the Cassandra nodes to read the commit
log segments from the Cassandra CDC directory.  Given that you need to read
those files on the Cassandra nodes, and that Cassandra will stop accepting
writes to the tables if the CDC directory fills up beyond a configurable
size as a result of segments not being read, you probably don't want to
implement the segment reader within Flink.

The folks at DataMountainer have developed a Kafka Connect
 connector for Cassandra's
CDC .  That
would allow you to stream Cassandra table changes out of Cassandra and into
Kafka, where they can be consumed by Flink.  The connector is part of their
stream-reactor 
repo.



On Thu, Sep 7, 2017 at 1:34 AM, kant kodali  wrote:

> Yes I can indeed create them but I wonder if that is even possible? I
> haven't see any framework doing this as of today. Flink has something
> called AsyncDataStream? and I wonder if this can be leveraged to create a
> Stream out of Cassandra source?
>
> Thanks!
>
> On Thu, Sep 7, 2017 at 1:16 AM, Tzu-Li (Gordon) Tai 
> wrote:
>
>> Ah, I see. I’m not aware of any existing work / JIRAs on streaming
>> sources for Cassandra or HBase, only sinks.
>> If you are interested in one, could you open JIRAs for them?
>>
>>
>> On 7 September 2017 at 4:11:05 PM, kant kodali (kanth...@gmail.com)
>> wrote:
>>
>> Hi Gordon,
>>
>> Thanks for the response, I did go over the links for sources and sinks
>> prior to posting my question. Maybe, I didn't get my question across
>> correctly so let me rephrase it. Can I get data out of data stores like
>> Cassandra, Hbase in a streaming manner? coz, currently more or less all the
>> sources are of message queue family.
>>
>> Thanks,
>> Kant
>>
>> On Thu, Sep 7, 2017 at 1:04 AM, Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi!
>>>
>>>
>>> I am wondering if Flink can do streaming from data sources other than
>>> Kafka. For example can Flink do streaming from a database like Cassandra,
>>> HBase, MongoDb to sinks like says Elastic search or Kafka.
>>>
>>>
>>> Yes, Flink currently supports various connectors for different sources
>>> and sinks. For an overview you can check out this documentation [1]
>>> Apache Bahir [2] also maintains some Flink connectors and is released
>>> separately.
>>>
>>> Also for out of core stateful streaming. Is RocksDB the only option?
>>>
>>> Currently, RocksDB is the only option for out-of-core state. There was
>>> some previous discussion for a Cassandra state backend, though [3].
>>>
>>> - Gordon
>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-
>>> 1.3/dev/connectors/index.html
>>> [2] http://bahir.apache.org/
>>> [3] https://issues.apache.org/jira/browse/FLINK-4266
>>>
>>> On 7 September 2017 at 2:58:38 PM, kant kodali (kanth...@gmail.com)
>>> wrote:
>>>
>>> Hi All,
>>>
>>> I am wondering if Flink can do streaming from data sources other than
>>> Kafka. For example can Flink do streaming from a database like Cassandra,
>>> HBase, MongoDb to sinks like says Elastic search or Kafka.
>>>
>>> Also for out of core stateful streaming. Is RocksDB the only option? Can
>>> I use some other key value store that has SQL interface (since RocksDB
>>> doesn't)?
>>>
>>> Thanks,
>>> kant
>>>
>>>
>>
>


Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Yes I can indeed create them but I wonder if that is even possible? I
haven't see any framework doing this as of today. Flink has something
called AsyncDataStream? and I wonder if this can be leveraged to create a
Stream out of Cassandra source?

Thanks!

On Thu, Sep 7, 2017 at 1:16 AM, Tzu-Li (Gordon) Tai 
wrote:

> Ah, I see. I’m not aware of any existing work / JIRAs on streaming sources
> for Cassandra or HBase, only sinks.
> If you are interested in one, could you open JIRAs for them?
>
>
> On 7 September 2017 at 4:11:05 PM, kant kodali (kanth...@gmail.com) wrote:
>
> Hi Gordon,
>
> Thanks for the response, I did go over the links for sources and sinks
> prior to posting my question. Maybe, I didn't get my question across
> correctly so let me rephrase it. Can I get data out of data stores like
> Cassandra, Hbase in a streaming manner? coz, currently more or less all the
> sources are of message queue family.
>
> Thanks,
> Kant
>
> On Thu, Sep 7, 2017 at 1:04 AM, Tzu-Li (Gordon) Tai 
> wrote:
>
>> Hi!
>>
>>
>> I am wondering if Flink can do streaming from data sources other than
>> Kafka. For example can Flink do streaming from a database like Cassandra,
>> HBase, MongoDb to sinks like says Elastic search or Kafka.
>>
>>
>> Yes, Flink currently supports various connectors for different sources
>> and sinks. For an overview you can check out this documentation [1]
>> Apache Bahir [2] also maintains some Flink connectors and is released
>> separately.
>>
>> Also for out of core stateful streaming. Is RocksDB the only option?
>>
>> Currently, RocksDB is the only option for out-of-core state. There was
>> some previous discussion for a Cassandra state backend, though [3].
>>
>> - Gordon
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-
>> 1.3/dev/connectors/index.html
>> [2] http://bahir.apache.org/
>> [3] https://issues.apache.org/jira/browse/FLINK-4266
>>
>> On 7 September 2017 at 2:58:38 PM, kant kodali (kanth...@gmail.com)
>> wrote:
>>
>> Hi All,
>>
>> I am wondering if Flink can do streaming from data sources other than
>> Kafka. For example can Flink do streaming from a database like Cassandra,
>> HBase, MongoDb to sinks like says Elastic search or Kafka.
>>
>> Also for out of core stateful streaming. Is RocksDB the only option? Can
>> I use some other key value store that has SQL interface (since RocksDB
>> doesn't)?
>>
>> Thanks,
>> kant
>>
>>
>


Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
Ah, I see. I’m not aware of any existing work / JIRAs on streaming sources for 
Cassandra or HBase, only sinks.
If you are interested in one, could you open JIRAs for them?


On 7 September 2017 at 4:11:05 PM, kant kodali (kanth...@gmail.com) wrote:

Hi Gordon,

Thanks for the response, I did go over the links for sources and sinks prior to 
posting my question. Maybe, I didn't get my question across correctly so let me 
rephrase it. Can I get data out of data stores like Cassandra, Hbase in a 
streaming manner? coz, currently more or less all the sources are of message 
queue family.

Thanks,
Kant

On Thu, Sep 7, 2017 at 1:04 AM, Tzu-Li (Gordon) Tai  wrote:
Hi!


I am wondering if Flink can do streaming from data sources other than Kafka. 
For example can Flink do streaming from a database like Cassandra, HBase, 
MongoDb to sinks like says Elastic search or Kafka.

Yes, Flink currently supports various connectors for different sources and 
sinks. For an overview you can check out this documentation [1]
Apache Bahir [2] also maintains some Flink connectors and is released 
separately.

Also for out of core stateful streaming. Is RocksDB the only option?
Currently, RocksDB is the only option for out-of-core state. There was some 
previous discussion for a Cassandra state backend, though [3].

- Gordon

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html
[2] http://bahir.apache.org/
[3] https://issues.apache.org/jira/browse/FLINK-4266

On 7 September 2017 at 2:58:38 PM, kant kodali (kanth...@gmail.com) wrote:

Hi All,

I am wondering if Flink can do streaming from data sources other than Kafka. 
For example can Flink do streaming from a database like Cassandra, HBase, 
MongoDb to sinks like says Elastic search or Kafka.

Also for out of core stateful streaming. Is RocksDB the only option? Can I use 
some other key value store that has SQL interface (since RocksDB doesn't)?

Thanks,
kant



Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Hi Gordon,

Thanks for the response, I did go over the links for sources and sinks
prior to posting my question. Maybe, I didn't get my question across
correctly so let me rephrase it. Can I get data out of data stores like
Cassandra, Hbase in a streaming manner? coz, currently more or less all the
sources are of message queue family.

Thanks,
Kant

On Thu, Sep 7, 2017 at 1:04 AM, Tzu-Li (Gordon) Tai 
wrote:

> Hi!
>
>
> I am wondering if Flink can do streaming from data sources other than
> Kafka. For example can Flink do streaming from a database like Cassandra,
> HBase, MongoDb to sinks like says Elastic search or Kafka.
>
>
> Yes, Flink currently supports various connectors for different sources and
> sinks. For an overview you can check out this documentation [1]
> Apache Bahir [2] also maintains some Flink connectors and is released
> separately.
>
> Also for out of core stateful streaming. Is RocksDB the only option?
>
> Currently, RocksDB is the only option for out-of-core state. There was
> some previous discussion for a Cassandra state backend, though [3].
>
> - Gordon
> [1] https://ci.apache.org/projects/flink/flink-docs-
> release-1.3/dev/connectors/index.html
> [2] http://bahir.apache.org/
> [3] https://issues.apache.org/jira/browse/FLINK-4266
>
> On 7 September 2017 at 2:58:38 PM, kant kodali (kanth...@gmail.com) wrote:
>
> Hi All,
>
> I am wondering if Flink can do streaming from data sources other than
> Kafka. For example can Flink do streaming from a database like Cassandra,
> HBase, MongoDb to sinks like says Elastic search or Kafka.
>
> Also for out of core stateful streaming. Is RocksDB the only option? Can I
> use some other key value store that has SQL interface (since RocksDB
> doesn't)?
>
> Thanks,
> kant
>
>


Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
Hi!


I am wondering if Flink can do streaming from data sources other than Kafka. 
For example can Flink do streaming from a database like Cassandra, HBase, 
MongoDb to sinks like says Elastic search or Kafka.

Yes, Flink currently supports various connectors for different sources and 
sinks. For an overview you can check out this documentation [1]
Apache Bahir [2] also maintains some Flink connectors and is released 
separately.

Also for out of core stateful streaming. Is RocksDB the only option?
Currently, RocksDB is the only option for out-of-core state. There was some 
previous discussion for a Cassandra state backend, though [3].

- Gordon

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/index.html
[2] http://bahir.apache.org/
[3] https://issues.apache.org/jira/browse/FLINK-4266

On 7 September 2017 at 2:58:38 PM, kant kodali (kanth...@gmail.com) wrote:

Hi All,

I am wondering if Flink can do streaming from data sources other than Kafka. 
For example can Flink do streaming from a database like Cassandra, HBase, 
MongoDb to sinks like says Elastic search or Kafka.

Also for out of core stateful streaming. Is RocksDB the only option? Can I use 
some other key value store that has SQL interface (since RocksDB doesn't)?

Thanks,
kant

can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Hi All,

I am wondering if Flink can do streaming from data sources other than
Kafka. For example can Flink do streaming from a database like Cassandra,
HBase, MongoDb to sinks like says Elastic search or Kafka.

Also for out of core stateful streaming. Is RocksDB the only option? Can I
use some other key value store that has SQL interface (since RocksDB
doesn't)?

Thanks,
kant