Re: Cassandra store questions
Ok, thanks.

Igor

On Oct 13, 2016 4:37 PM, "Valentin Kulichenko" <valentin.kuliche...@gmail.com> wrote:

> Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075
>
> -Val
Re: Cassandra store questions
Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075

-Val

On Wed, Oct 12, 2016 at 6:45 PM, Igor Rudyak wrote:

> Hi Val,
>
> I don't have any objections - please create a ticket and link it to the
> root ticket https://issues.apache.org/jira/browse/IGNITE-1371
>
> Igor
Re: Cassandra store questions
Hi Val,

I don't have any objections - please create a ticket and link it to the
root ticket https://issues.apache.org/jira/browse/IGNITE-1371

Igor

On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> 1. I still think we should do this. Loading nothing is very
> counterintuitive and prevents a new user from getting started quickly.
> For large tables, when only part of the dataset is needed, the user will
> explicitly specify the query, of course. Do you have any objections? If
> not, I will create a ticket.
>
> 2. Got it, thanks.
>
> -Val
Re: Cassandra store questions
Hi Igor,

1. I still think we should do this. Loading nothing is very
counterintuitive and prevents a new user from getting started quickly.
For large tables, when only part of the dataset is needed, the user will
explicitly specify the query, of course. Do you have any objections? If
not, I will create a ticket.

2. Got it, thanks.

-Val

On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak wrote:

> Hi Val,
>
> 1) Well, it's not a problem to implement such default behavior, but
> there is one concern. In most cases, when you use Cassandra as a
> persistent store, you are going to store a large amount of data,
> significantly more than the amount of RAM in your Ignite cluster. In
> that case it doesn't make sense to launch a CQL query like
> "select * from my_table" because:
>    a) You still will not be able to keep all the data from the
>       Cassandra table in the Ignite cache
>    b) All the data will be pulled from the Cassandra table using only
>       one thread, which is very slow
>
> 2) Unfortunately it's not possible in Cassandra. For JDBC you split the
> table into chunks of 512 rows each, using sub-queries and ordering by
> primary keys. Such things are not supported in Cassandra. Probably the
> only way to load data from a Cassandra table in parallel is to load it
> from some specified partitions (in parallel for each partition).
>
> Igor Rudyak
Re: Cassandra store questions
Hi Val,

1) Well, it's not a problem to implement such default behavior, but
there is one concern. In most cases, when you use Cassandra as a
persistent store, you are going to store a large amount of data,
significantly more than the amount of RAM in your Ignite cluster. In
that case it doesn't make sense to launch a CQL query like
"select * from my_table" because:
   a) You still will not be able to keep all the data from the
      Cassandra table in the Ignite cache
   b) All the data will be pulled from the Cassandra table using only
      one thread, which is very slow

2) Unfortunately it's not possible in Cassandra. For JDBC you split the
table into chunks of 512 rows each, using sub-queries and ordering by
primary keys. Such things are not supported in Cassandra. Probably the
only way to load data from a Cassandra table in parallel is to load it
from some specified partitions (in parallel for each partition).

Igor Rudyak

On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> Thanks for the response!
>
> 1. It's a bit inconsistent with the other store implementations we have
> in the product, and I actually find it counterintuitive. Why don't we
> just load all the data available in the table? An explicit query is
> useful when you want to customize this and load a subset of the data
> based on some criteria. If this is not possible for some reason, then I
> would at least throw an exception when the query is not specified.
>
> 2. Is it possible to automatically split the data into batches and load
> them in parallel? We do this in the JDBC store, for example.
>
> -Val
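To make the "load from specified partitions in parallel" idea in point 2 a bit more concrete, here is a minimal sketch of one way to carve a Cassandra table into independent slices: split the token ring into contiguous ranges and build one CQL query per range. It assumes the Murmur3Partitioner token range (-2^63 to 2^63-1); the table name `my_table`, the partition key column `id`, and the helper names are hypothetical. It is not how the Ignite Cassandra store is implemented, just an illustration of the technique.

```python
# Token bounds for Cassandra's Murmur3Partitioner (assumed partitioner).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(parts):
    """Split the full token ring into `parts` contiguous (start, end] ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the bounds exact for 64-bit tokens.
    bounds = [MIN_TOKEN + total * i // parts for i in range(parts)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(table, pk, parts):
    """Build one CQL query per token range (table and pk are hypothetical names)."""
    queries = []
    for i, (start, end) in enumerate(split_token_ranges(parts)):
        # The first range uses >= so the lowest token is not skipped.
        op = ">=" if i == 0 else ">"
        queries.append(
            f"select * from {table} "
            f"where token({pk}) {op} {start} and token({pk}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    for q in range_queries("my_table", "id", 4):
        print(q)
```

Each generated query covers a disjoint slice of the ring, so the slices could be handed to different threads or different nodes without loading the same rows twice.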
Re: Cassandra store questions
Hi Igor,

Thanks for the response!

1. It's a bit inconsistent with the other store implementations we have
in the product, and I actually find it counterintuitive. Why don't we
just load all the data available in the table? An explicit query is
useful when you want to customize this and load a subset of the data
based on some criteria. If this is not possible for some reason, then I
would at least throw an exception when the query is not specified.

2. Is it possible to automatically split the data into batches and load
them in parallel? We do this in the JDBC store, for example.

-Val

On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak wrote:

> Hi Val,
>
> 1) If you call loadCache(null), it will do nothing. You need to provide
> at least one CQL query.
>
> 2) It depends. If you provide more than one CQL query, it will use a
> separate thread for each of the queries (the maximum number of threads
> is limited to the number of CPU cores). But for each provided CQL query
> it will use only one thread to load all the data returned by the query.
> It will also run the same CQL query from ALL Ignite nodes to load the
> same data, which is bad. That's because the loadCache method is
> executed on each Ignite node. As you can see, loading data from
> Cassandra just by specifying a CQL query is not very efficient. The
> ticket I created is all about how to load data from one table (or from
> multiple tables) in parallel by partitioning it. That way each Ignite
> node will be responsible for loading data from a specific partition
> range of the Cassandra table, which is much more efficient. To support
> this kind of cache warm-up you should design your Cassandra table in a
> specific way: there should be some mapping from an Ignite partition to
> the set of Cassandra partitions. Yes, I have plans to implement this.
>
> Igor Rudyak
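The two options raised in point 1 (load the whole table by default, or fail loudly when no query is given) can be sketched roughly as follows. This is only an illustration of the proposed behavior, not the store's code; `resolve_queries`, its `strict` flag, and the table name are hypothetical.

```python
def resolve_queries(table, queries, strict=False):
    """Decide which CQL queries to run when the caller may have passed none.

    Hypothetical helper: with strict=False it falls back to a full-table
    select, with strict=True it raises instead of silently loading nothing.
    """
    if queries:
        return list(queries)
    if strict:
        raise ValueError("at least one CQL query must be specified")
    return [f"select * from {table}"]


if __name__ == "__main__":
    print(resolve_queries("my_table", None))
    print(resolve_queries("my_table", ["select * from my_table where id = 1"]))
```

Either branch avoids the surprising case where a call with no query completes successfully but leaves the cache empty.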
Re: Cassandra store questions
Hi Val,

1) If you call loadCache(null), it will do nothing. You need to provide
at least one CQL query.

2) It depends. If you provide more than one CQL query, it will use a
separate thread for each of the queries (the maximum number of threads
is limited to the number of CPU cores). But for each provided CQL query
it will use only one thread to load all the data returned by the query.
It will also run the same CQL query from ALL Ignite nodes to load the
same data, which is bad. That's because the loadCache method is
executed on each Ignite node. As you can see, loading data from
Cassandra just by specifying a CQL query is not very efficient. The
ticket I created is all about how to load data from one table (or from
multiple tables) in parallel by partitioning it. That way each Ignite
node will be responsible for loading data from a specific partition
range of the Cassandra table, which is much more efficient. To support
this kind of cache warm-up you should design your Cassandra table in a
specific way: there should be some mapping from an Ignite partition to
the set of Cassandra partitions. Yes, I have plans to implement this.

Igor Rudyak

On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> I've got a couple of quick questions about the Cassandra store.
>
>    1. In [1] you suggested providing an explicit query as a parameter
>    for the loadCache() method, because otherwise the user was always
>    getting an empty result. Is it a requirement to provide the query?
>    What if I just call loadCache(null)?
>    2. There is a ticket [2] about parallel load in the Cassandra store.
>    Does it mean that currently it loads only in a single-threaded
>    fashion? If so, do you have any plans to implement this improvement?
>
> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>
> Thanks,
> Val
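The threading behavior described in point 2 (one thread per CQL query, capped at the number of CPU cores, and nothing loaded when no query is passed) can be modeled with a small sketch. It does not use Ignite or Cassandra: `run_query` is a hypothetical stand-in that returns fake rows, and `load_cache` only mimics the scheduling described above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_query(cql):
    """Hypothetical stand-in for executing one CQL query; returns fake rows."""
    return [(cql, i) for i in range(3)]


def load_cache(*queries):
    """Run each query on its own worker thread, capped at the CPU core count."""
    if not queries:
        # Mirrors the behavior described above: no query, nothing is loaded.
        return []
    workers = min(len(queries), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each query's whole result is produced by a single worker thread.
        results = pool.map(run_query, queries)
        return [row for rows in results for row in rows]


if __name__ == "__main__":
    rows = load_cache("select * from t1", "select * from t2")
    print(len(rows))
```

Note that the parallelism here is bounded by the number of queries: a single large query still runs on one thread, which is the limitation the partition-based loading ticket is meant to address.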