Re: Cassandra store questions

2016-10-13 Thread Igor Rudyak
Ok, thanks.

Igor

On Oct 13, 2016 4:37 PM, "Valentin Kulichenko" <
valentin.kuliche...@gmail.com> wrote:

> Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075
>
> -Val

Re: Cassandra store questions

2016-10-13 Thread Valentin Kulichenko
Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075

-Val

On Wed, Oct 12, 2016 at 6:45 PM, Igor Rudyak  wrote:

> Hi Val,
>
> I don't have any objections. Please create a ticket and link it to the
> root ticket https://issues.apache.org/jira/browse/IGNITE-1371
>
> Igor

Re: Cassandra store questions

2016-10-12 Thread Igor Rudyak
Hi Val,

I don't have any objections. Please create a ticket and link it to the
root ticket https://issues.apache.org/jira/browse/IGNITE-1371

Igor

On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> 1. I still think we should do this. Loading nothing is very
> counterintuitive and prevents new users from getting started quickly. For
> large tables, when only part of the dataset is needed, the user will
> explicitly specify the query, of course. Do you have any objections? If
> not, I will create a ticket.
>
> 2. Got it, thanks.
>
> -Val

Re: Cassandra store questions

2016-10-12 Thread Valentin Kulichenko
Hi Igor,

1. I still think we should do this. Loading nothing is very
counterintuitive and prevents new users from getting started quickly. For
large tables, when only part of the dataset is needed, the user will
explicitly specify the query, of course. Do you have any objections? If
not, I will create a ticket.

2. Got it, thanks.
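The default behaviour proposed here can be sketched as a small query-resolution step. With no query, the store loads the whole table. An explicit query still selects a subset. The stricter alternative raised earlier in the thread is to throw. This is a minimal, hypothetical sketch: `QueryResolver`, `Mode`, `resolve` and the keyspace/table parameters are illustrative names, not part of the Ignite Cassandra module. It also assumes, as described in this thread, that queries arrive as CQL `SELECT` strings among the `loadCache()` arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: decide which CQL queries a store runs for loadCache(null, args...). */
public class QueryResolver {
    /** LOAD_ALL = proposed default (whole table); FAIL = throw when no query is given. */
    public enum Mode { LOAD_ALL, FAIL }

    public static List<String> resolve(String keyspace, String table, Mode mode, Object... args) {
        List<String> queries = new ArrayList<>();
        if (args != null) {
            for (Object arg : args) {
                // Only non-null String arguments that look like CQL SELECT statements count as queries.
                if (arg instanceof String) {
                    String cql = ((String) arg).trim();
                    if (cql.toLowerCase(Locale.ROOT).startsWith("select"))
                        queries.add(cql);
                }
            }
        }
        if (!queries.isEmpty())
            return queries; // explicit queries win: load only the requested subset

        if (mode == Mode.LOAD_ALL)
            return List.of("SELECT * FROM " + keyspace + "." + table); // proposed default

        // Stricter alternative: fail loudly instead of silently loading nothing.
        throw new IllegalArgumentException("loadCache() requires at least one CQL SELECT query");
    }

    public static void main(String[] args) {
        // No query given: the proposed default loads the whole table.
        System.out.println(resolve("ks", "person", Mode.LOAD_ALL));
        // Explicit query given: only that subset is loaded.
        System.out.println(resolve("ks", "person", Mode.LOAD_ALL, "select * from ks.person where id < 100"));
    }
}
```

Keeping explicit queries ahead of the default means users with large tables keep full control. The default only applies when nothing is specified.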

-Val

On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak  wrote:

> Hi Val,
>
> 1) Well, it's not a problem to implement such default behavior, but there
> is one concern. In most cases, when you use Cassandra as a persistent
> store, you are going to store a large amount of data, significantly more
> than the amount of RAM in your Ignite cluster. In such a case it doesn't
> make sense to launch a CQL query like "select * from my_table", because:
>    a) You still will not be able to keep all the data from the Cassandra
>    table in the Ignite cache.
>    b) All the data will be pulled from the Cassandra table using only one
>    thread, which is very slow.
>
> 2) Unfortunately, that's not possible in Cassandra. For JDBC, you split the
> table into chunks of 512 rows each, using sub-queries and ordering by
> primary keys. Such things are not supported in Cassandra. Probably the
> only way to load data from a Cassandra table in parallel is to load it
> from specific partitions (in parallel for each partition).
>
>
> Igor Rudyak

Re: Cassandra store questions

2016-10-07 Thread Valentin Kulichenko
Hi Igor,

Thanks for the response!

1. This is a bit inconsistent with the other store implementations we have in
the product, and I actually find it counterintuitive. Why don't we just load
all the data available in the table? An explicit query is useful when you want
to customize this and load a subset of the data based on some criteria. If this
is not possible for some reason, then I would at least throw an exception
when no query is specified.

2. Is it possible to automatically split the data into batches and load them in
parallel? We do this in the JDBC store, for example.
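This thread describes the JDBC store's approach as splitting a table into chunks of 512 rows ordered by primary key. The idea can be illustrated with an in-memory sketch. Every 512th key becomes a boundary, and each half-open key range becomes one query that a separate worker could run. This is not the JDBC store's actual code: `ChunkPlanner`, `Range` and the `person`/`id` names are hypothetical. The boundary keys are computed from a Java list here; a real store would obtain them from the database.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of key-range chunking over an ordered primary key. */
public class ChunkPlanner {
    /** Half-open key range [lower, upper); a null bound means "unbounded". */
    public static final class Range {
        final Long lower;
        final Long upper;

        Range(Long lower, Long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        public boolean contains(long key) {
            return (lower == null || key >= lower) && (upper == null || key < upper);
        }

        /** Renders the range as a SQL predicate on the given key column. */
        public String where(String keyCol) {
            if (lower == null && upper == null) return "";
            if (lower == null) return " WHERE " + keyCol + " < " + upper;
            if (upper == null) return " WHERE " + keyCol + " >= " + lower;
            return " WHERE " + keyCol + " >= " + lower + " AND " + keyCol + " < " + upper;
        }
    }

    /** Uses every chunkSize-th key (in ascending order) as a boundary between chunks. */
    public static List<Range> plan(List<Long> sortedKeys, int chunkSize) {
        List<Range> ranges = new ArrayList<>();
        Long lower = null;
        for (int i = chunkSize; i < sortedKeys.size(); i += chunkSize) {
            Long bound = sortedKeys.get(i);
            ranges.add(new Range(lower, bound));
            lower = bound;
        }
        ranges.add(new Range(lower, null)); // the last chunk is open-ended
        return ranges;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < 1200; k++) keys.add(k);
        // Each generated query could be handed to its own worker thread.
        for (Range r : plan(keys, 512))
            System.out.println("SELECT * FROM person" + r.where("id"));
    }
}
```

The scheme depends on ordered range predicates over the key. Igor's reply in this thread explains why Cassandra does not support this kind of splitting.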

-Val

On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak  wrote:

> Hi Val,
>
> 1) If you call loadCache(null), it will do nothing. You need to provide
> at least one CQL query.
>
> 2) It depends. If you provide more than one CQL query, the store uses a
> separate thread for each query (the maximum number of threads is limited
> to the number of CPU cores). However, each CQL query is still loaded by a
> single thread, which reads all the data the query returns. It will also run
> the same CQL query from ALL Ignite nodes to load the same data, which is bad,
> because the loadCache method is executed on each Ignite node. As you can see,
> simply specifying a CQL query is not a very efficient way to load data from
> Cassandra. The ticket I created is all about loading data from one table (or
> from multiple tables) in parallel by partitioning it. This way, each Ignite
> node will be responsible for loading data from a specific partition range of
> the Cassandra table, which is much more efficient. To support this kind of
> cache warm-up, you need to design your Cassandra table in a specific way:
> there should be a mapping from each Ignite partition to a set of Cassandra
> partitions. Yes, I plan to implement this.
>
> Igor Rudyak

Re: Cassandra store questions

2016-10-07 Thread Igor Rudyak
Hi Val,

1) If you call loadCache(null), it will do nothing. You need to provide
at least one CQL query.

2) It depends. If you provide more than one CQL query, the store uses a
separate thread for each query (the maximum number of threads is limited
to the number of CPU cores). However, each CQL query is still loaded by a
single thread, which reads all the data the query returns. It will also run
the same CQL query from ALL Ignite nodes to load the same data, which is bad,
because the loadCache method is executed on each Ignite node. As you can see,
simply specifying a CQL query is not a very efficient way to load data from
Cassandra. The ticket I created is all about loading data from one table (or
from multiple tables) in parallel by partitioning it. This way, each Ignite
node will be responsible for loading data from a specific partition range of
the Cassandra table, which is much more efficient. To support this kind of
cache warm-up, you need to design your Cassandra table in a specific way:
there should be a mapping from each Ignite partition to a set of Cassandra
partitions. Yes, I plan to implement this.
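The partition-based warm-up described above can be sketched under several explicitly hypothetical assumptions:

- The Cassandra table carries an `ignite_part` column as its partition key (for example `PRIMARY KEY ((ignite_part), id)`). Each row's Ignite partition number is written into that column.
- `partitionOf()` is a simple modulo stand-in, not Ignite's affinity function.
- The `fetch` callback stands in for executing the per-partition CQL query.
- On a real node, the owned partitions would come from Ignite's affinity API, for example `Affinity.primaryPartitions(ClusterNode)`.

The thread pool is capped at the number of CPU cores, mirroring the limit mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

/** Hypothetical sketch: each node loads only the Cassandra partitions mapped to its own Ignite partitions. */
public class PartitionLoader {
    public static final int PARTITIONS = 16; // demo partition count (assumption)

    /** Stand-in key-to-partition mapping; NOT Ignite's real affinity function. */
    public static int partitionOf(long key) {
        return Math.floorMod(Long.hashCode(key), PARTITIONS);
    }

    /** CQL a node would issue for one owned partition (assumes ignite_part is the partition key). */
    public static String queryFor(int part) {
        return "SELECT id, name FROM ks.person WHERE ignite_part = " + part;
    }

    /** Loads all owned partitions in parallel; fetch simulates running queryFor(p) against Cassandra. */
    public static Map<Long, String> load(int[] ownedParts, IntFunction<Map<Long, String>> fetch) {
        int threads = Math.max(1, Math.min(ownedParts.length, Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<Long, String> cache = new ConcurrentHashMap<>();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int p : ownedParts)
                futures.add(pool.submit(() -> cache.putAll(fetch.apply(p)))); // one task per partition
            for (Future<?> f : futures)
                f.get(); // surface any loading failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return cache;
    }

    /** Builds an in-memory "table" grouped by ignite_part, standing in for Cassandra. */
    public static Map<Integer, Map<Long, String>> demoTable(int rows) {
        Map<Integer, Map<Long, String>> table = new HashMap<>();
        for (long id = 0; id < rows; id++)
            table.computeIfAbsent(partitionOf(id), k -> new HashMap<>()).put(id, "name-" + id);
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Long, String>> table = demoTable(100);
        int[] nodeA = {0, 1, 2, 3, 4, 5, 6, 7}; // partitions this node is assumed to own
        Map<Long, String> loaded = load(nodeA, p -> table.getOrDefault(p, Map.of()));
        System.out.println(queryFor(nodeA[0]) + " ... loaded " + loaded.size() + " rows");
    }
}
```

Because every row is tagged with its Ignite partition at write time, two nodes with disjoint partition sets load disjoint rows. No node repeats another node's query, unlike the current behaviour where every node runs the same CQL.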

Igor Rudyak


On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> I've got a couple of quick questions about the Cassandra store.
>
>    1. In [1] you suggested providing an explicit query as a parameter
>    for the loadCache() method, because otherwise users always got an
>    empty result. Is providing the query a requirement? What if I just
>    call loadCache(null)?
>    2. There is a ticket [2] about parallel load in the Cassandra store.
>    Does it mean that it currently loads data in a single-threaded fashion
>    only? If so, do you have any plans to implement this improvement?
>
> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>
> Thanks,
> Val
>