[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496579#comment-14496579 ]

Piotr Kołaczkowski commented on CASSANDRA-6348:
-----------------------------------------------

@Alex maybe a simple solution would be to allow disabling predicate pushdown for cases where ALLOW FILTERING would be needed?

TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
--------------------------------------------------------------------------------------------------------------------------------------

Key: CASSANDRA-6348
URL: https://issues.apache.org/jira/browse/CASSANDRA-6348
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Alex Liu
Assignee: Alex Liu
Attachments: 6348.txt

If the index row is too big and filtering can't find a matching CQL row in the base CF, the query keeps scanning the index row and retrieving from the base CF until the index row has been scanned completely. This may take too long, and the Thrift server returns a TimeoutException. This is one of the reasons why we shouldn't index a column if the index is too big.

Merging multiple indexes can resolve the case where there are only EQUAL clauses (CASSANDRA-6048 addresses it). If the query has non-EQUAL clauses, we still need to do data filtering, which might lead to a timeout exception.

We can either disable those kinds of queries or WARN the user that data filtering might lead to a timeout exception or OOM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496805#comment-14496805 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

Hive has a setting to enable pushdown; it is disabled by default. Users can enable it if the table has only one indexed column.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863159#comment-13863159 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

Adding @bcoverston. This issue hits customers hard when Hadoop uses multiple indexes.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830401#comment-13830401 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

rowsPerQuery is only used as the page size for the index CF during a 2i search. maxColumns is the value of the LIMIT clause. If meanColumns is a big number, then filter.maxColumns()/meanColumns is less than 1 and rowsPerQuery becomes 2. The resulting page size for the index CF is 2, which is too small: we end up with too many random seeks between the index CF and the base CF, and that is why a 2i search is sometimes so slow.

We need to avoid making the index CF page size too small. The goal is to set the page size to a number that is large enough to reduce random seeks between the index CF and the base CF, but not so large that it causes OOM.

If data filtering is involved and many base CF columns don't match the filter, a small page size makes the issue even worse, because we need to page through more pages of the index CF.

{code}
public int maxRows()
{
    return countCQL3Rows ? Integer.MAX_VALUE : maxResults;
}

public int maxColumns()
{
    return countCQL3Rows ? maxResults : Integer.MAX_VALUE;
}
{code}

For a non-CQL query:
{code}
rowsPerQuery = Math.max(Math.min(filter.maxResults, Integer.MAX_VALUE / meanColumns), 2);
most likely becomes
rowsPerQuery = Math.max(filter.maxResults, 2);
most likely becomes
rowsPerQuery = filter.maxResults
which is the same as the number of rows to fetch
{code}

For a CQL query:
{code}
rowsPerQuery = Math.max(Math.min(Integer.MAX_VALUE, filter.maxResults / meanColumns), 2);
most likely becomes
rowsPerQuery = Math.max(filter.maxResults / meanColumns, 2);
most likely becomes
rowsPerQuery = filter.maxResults / meanColumns
if meanColumns is too big, the quotient can be less than 1 (0 after integer division).
if there is no LIMIT clause in the CQL query, it becomes Integer.MAX_VALUE / meanColumns, which is a big number.
{code}

So the question is how to calculate the page size for the index CF so that we don't have too many random seeks between the index CF and the base CF, while also avoiding fetching so many index columns that we risk OOM.
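The arithmetic in this comment can be checked with a small standalone sketch. It is not the Cassandra source: the class name, the flattened method signatures, and the combined formula are reconstructed from the expressions quoted in the comment, and the meanColumns value of 55535 is a hypothetical figure chosen only because it reproduces the 38669 reported elsewhere in this thread.

```java
public class RowsPerQueryDemo {
    // Mirrors the quoted maxRows(): unbounded for CQL3 row counting, else the limit.
    static int maxRows(boolean countCQL3Rows, int maxResults) {
        return countCQL3Rows ? Integer.MAX_VALUE : maxResults;
    }

    // Mirrors the quoted maxColumns(): the limit for CQL3 row counting, else unbounded.
    static int maxColumns(boolean countCQL3Rows, int maxResults) {
        return countCQL3Rows ? maxResults : Integer.MAX_VALUE;
    }

    // Assumed shape of the page-size formula, reconstructed from the comment.
    static int rowsPerQuery(boolean countCQL3Rows, int maxResults, int meanColumns) {
        int rows = maxRows(countCQL3Rows, maxResults);
        int columns = maxColumns(countCQL3Rows, maxResults);
        return Math.max(Math.min(rows, columns / meanColumns), 2);
    }

    public static void main(String[] args) {
        int meanColumns = 55535; // hypothetical mean column count per row

        // CQL query with LIMIT 1: 1 / meanColumns is 0, so the floor of 2 applies.
        System.out.println(rowsPerQuery(true, 1, meanColumns));
        // CQL query without LIMIT: Integer.MAX_VALUE / meanColumns.
        System.out.println(rowsPerQuery(true, Integer.MAX_VALUE, meanColumns));
        // Non-CQL query asking for 100 rows: the requested row count is used.
        System.out.println(rowsPerQuery(false, 100, meanColumns));
    }
}
```

Under these assumptions a small LIMIT on a wide table always collapses the index page size to the floor of 2, independent of how many index entries must be examined before a match is found.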
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830415#comment-13830415 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

If there is data filtering, then for a CQL query the total number of index columns needed is unknown and is not directly related to the LIMIT clause, so we can't calculate it from the LIMIT clause. Setting it to a magic number that is large enough but not too large is a viable solution.
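One way to read the "magic number" idea is as a clamp on the index page size. The sketch below only illustrates that reading: the class name and both bounds are hypothetical placeholders, not values taken from Cassandra or from the attached patch.

```java
public class BoundedIndexPageSize {
    // Hypothetical bounds; real values would need tuning against seek cost and heap usage.
    static final int MIN_PAGE_SIZE = 1_000;
    static final int MAX_PAGE_SIZE = 10_000;

    // Clamp a limit-derived estimate into [MIN_PAGE_SIZE, MAX_PAGE_SIZE].
    static int pageSize(int limitBasedEstimate) {
        return Math.max(MIN_PAGE_SIZE, Math.min(limitBasedEstimate, MAX_PAGE_SIZE));
    }

    public static void main(String[] args) {
        System.out.println(pageSize(2));      // too small, raised to the lower bound
        System.out.println(pageSize(5_000));  // already within bounds, unchanged
        System.out.println(pageSize(38_669)); // too large, capped at the upper bound
    }
}
```

The lower bound addresses the random-seek problem and the upper bound addresses the OOM concern; where each bound should sit is exactly the open question raised in the comment.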
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829445#comment-13829445 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

Interestingly, the query times out when it has a LIMIT clause; without one it works fine.

{code}
cqlsh:cql3ks> select key, qty, size from cf where qty > 498 and color='red' and size = 'P' allow filtering;

 key        | qty | size
------------+-----+------
 key_910500 | 499 | P
 key_35500  | 499 | P
 key_945500 | 499 | P
 key_420500 | 499 | P
 key_140500 | 499 | P
 key_630500 | 499 | P
 key_210500 | 499 | P
 key_805500 | 499 | P
 key_700500 | 499 | P
 key_735500 | 499 | P
 key_385500 | 499 | P
 key_175500 | 499 | P
 key_455500 | 499 | P
 key_245500 | 499 | P
 key_770500 | 499 | P
 key_875500 | 499 | P
 key_70500  | 499 | P
 key_980500 | 499 | P
 key_280500 | 499 | P
 key_105500 | 499 | P
 key_525500 | 499 | P
 key_665500 | 499 | P
 key_595500 | 499 | P
 key_315500 | 499 | P
 key_490500 | 499 | P
 key_350500 | 499 | P
 key_500    | 499 | P
 key_840500 | 499 | P
 key_560500 | 499 | P

cqlsh:cql3ks> select key, qty, size from cf where qty > 498 and color='red' and size = 'P' limit 1 allow filtering;
Request did not complete within rpc_timeout.
{code}
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829534#comment-13829534 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

rowsPerQuery is 2 for the query with LIMIT 1, but it is 38669 for the query without a LIMIT.
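To see why those two page sizes behave so differently, here is a rough back-of-the-envelope model, not a measurement. It assumes the index row is read one page at a time and that the first entry surviving the filter sits at a given position; the position of 1,000,000 and the class and method names are hypothetical.

```java
public class IndexPageRoundTrips {
    // Number of index pages read before reaching the entry at the given
    // 1-based position (ceiling division).
    static long pagesRead(long firstMatchPosition, int pageSize) {
        return (firstMatchPosition + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long firstMatchPosition = 1_000_000L; // hypothetical position of the first match

        // Page size 2 (the LIMIT 1 case) versus 38669 (the no-LIMIT case).
        System.out.println(pagesRead(firstMatchPosition, 2));
        System.out.println(pagesRead(firstMatchPosition, 38669));
    }
}
```

In this model the number of base CF rows checked is the same in both cases; what changes is the number of separate index page reads, each followed by its own batch of base CF lookups, which is the random-seek overhead described in this thread.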
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826335#comment-13826335 ]

Sylvain Lebresne commented on CASSANDRA-6348:
---------------------------------------------

bq. Other than hadoop queries, It's common for user to query on multiple indexes

I sure hope you're wrong, and it certainly shouldn't be common, because Cassandra sucks at it. I personally have almost never seen anyone use it (on the mailing list, for instance). ALLOW FILTERING is really meant as a "don't do this unless you're just having fun with cqlsh on a toy database". Using ALLOW FILTERING on real production queries is wrong (at least for CQL queries; I'm not talking about Hadoop, which is a different problem). I'm more than happy to make the documentation/message clearer about that fact if it isn't.

bq. Hadoop Cql query uses ALLOW FILTERING

Which is kind of a problem, in the sense that it's not what ALLOW FILTERING was intended for and, more generally, CQL has never been designed with Hadoop in mind; it's a strictly real-time oriented language. So maybe we should re-purpose ALLOW FILTERING as the "hadoop mode" somehow, but if we do, we should be explicit about it and think about how to do that best. But trying to shove Hadoop into something it hasn't been made for feels wrong to me.

That being said, I wonder if the "Hadoop wants to use the 2ndary indexes" problem couldn't be better solved, and more simply overall, by letting Hadoop query the 2ndary index CFS directly. That is, allow selects on the index itself (which would obviously require a special flag to unlock). That way, Hadoop would get paging over the index for free (which at the end of the day is the problem that needs solving, if I understand it correctly) and would get control over that paging. And it would allow Hadoop to do things like merging indexes, which probably makes more sense on the Hadoop side than on the realtime side (i.e. we keep Cassandra focused on realtime queries with as little processing as possible, which is what it is good at).
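As an illustration of the index merging mentioned in this thread for EQUAL clauses, the sketch below intersects two lists of base-table keys, one per index lookup. It is a generic sorted-merge intersection, not code from Cassandra or CASSANDRA-6048; the class name is hypothetical, and it assumes both lists arrive sorted in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexMergeSketch {
    // Intersect two key lists that are sorted in the same (natural String) order.
    static List<String> intersectSorted(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {        // key present in both index results
                out.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {  // advance whichever side is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical keys returned by two separate index lookups.
        List<String> redKeys = Arrays.asList("k1", "k3", "k5");
        List<String> sizePKeys = Arrays.asList("k2", "k3", "k5", "k9");
        System.out.println(intersectSorted(redKeys, sizePKeys));
    }
}
```

Only keys present in every index result need a base CF read, which is why merging helps when all clauses are equalities and does not help when a non-EQUAL clause still has to be checked against the base row.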
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826809#comment-13826809 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

C* is not alone: PostgreSQL has a similar issue with index filter predicates.

http://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

{code}
Note

Index filter predicates give a false sense of safety; even though an index is used, the performance degrades rapidly on a growing data volume or system load.
{code}
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826860#comment-13826860 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

One solution is to implement a query execution planner that can optimize the execution path, either to avoid bad performance or to look for the best performance. For data filtering, if the index size is above a certain threshold, we disable index scanning and do a full table scan or index merging instead.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825401#comment-13825401 ]

Sylvain Lebresne commented on CASSANDRA-6348:
---------------------------------------------

Hum, I can't really reproduce this on the cassandra-1.2 branch:
{noformat}
Connected to test at 127.0.0.1:9160.
[cqlsh 3.1.8 | Cassandra 1.2.11-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.1]
Use HELP for help.
cqlsh> create KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use ks;
cqlsh:ks> create table test ( key1 int, key2 int , col1 int, col2 int, primary key (key1, key2));
cqlsh:ks> create index col1 on test(col1);
cqlsh:ks> create index col2 on test(col2);
cqlsh:ks> select * from test where col1=100 and col2 =1;
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
{noformat}
I.e. ALLOW FILTERING is required.

bq. We can either disable those kind of queries or WARN the user that data filtering might lead to timeout exception or OOM.

Just to make sure we agree, that's *exactly* what requiring ALLOW FILTERING is about: warning the user that C* does not execute the query smartly and that the performance will suck. You should *never* use ALLOW FILTERING in production unless you know very well what you are doing.

bq. We should be able to auto page through 2i CF (for native protocol), so if the auto-paging ends in the middle of an index scanning

This is not really what native protocol paging is about. If you ask for pages of 1000 results, native protocol paging will return pages of 1000 results until you're done paging. In that case, the point is that it takes a long time to find any results at all, because the way we handle the query is dumb. But I'll note that we do page the index scan internally (which is why you can get a timeout but, in theory, not an OOM).

Note that I'm not saying we shouldn't improve the way we handle such queries, but that's a whole separate issue (CASSANDRA-6048).
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825496#comment-13825496 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

I forgot to put ALLOW FILTERING in the queries. The issue was raised during Hadoop performance testing on indexed columns (the test case indexes columns in a way that results in a very big index).

The Hadoop CQL query uses ALLOW FILTERING, and the user can provide user-defined WHERE clauses that might involve data filtering on multiple columns. But the Hadoop user may not fully understand how data filtering works under the hood.

Other than Hadoop queries, it's common for users to query on multiple indexes, so we should explain in more detail when ALLOW FILTERING results in bad performance, and which cases lead to a timeout exception, in the following error message.

{code}
Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
{code}

For most cases, ALLOW FILTERING improves performance. We can't assume that users fully understand ALLOW FILTERING under the hood. I even spent quite some time on CASSANDRA-6048 to understand more about data filtering.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823528#comment-13823528 ]

Sylvain Lebresne commented on CASSANDRA-6348:
---------------------------------------------

What version is that test case against? Because requiring ALLOW FILTERING is definitely the intent of the following code from SelectStatement:
{noformat}
// Make sure this queries is allowed (note: only key range can involve filtering underneath)
if (!parameters.allowFiltering && stmt.isKeyRange)
{
    // We will potentially filter data if either:
    //  - Have more than one IndexExpression
    //  - Have no index expression and the column filter is not the identity
    if (stmt.metadataRestrictions.size() > 1 || (stmt.metadataRestrictions.isEmpty() && !stmt.columnFilterIsIdentity()))
        throw new InvalidRequestException("Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. "
                                        + "If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING");
}
{noformat}
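The condition in that SelectStatement excerpt can be restated as a standalone predicate, which makes it easier to see which query shapes are rejected without ALLOW FILTERING. This is a sketch only: the class, method, and parameter names are hypothetical stand-ins for the fields in the excerpt, and indexExpressionCount plays the role of stmt.metadataRestrictions.size().

```java
public class AllowFilteringCheck {
    // True when the query would be rejected unless ALLOW FILTERING is specified.
    static boolean requiresAllowFiltering(boolean isKeyRange,
                                          int indexExpressionCount,
                                          boolean columnFilterIsIdentity) {
        if (!isKeyRange) {
            return false; // only key-range queries can involve filtering underneath
        }
        // Filtering may happen with more than one index expression, or with
        // no index expression and a column filter that is not the identity.
        return indexExpressionCount > 1
            || (indexExpressionCount == 0 && !columnFilterIsIdentity);
    }

    public static void main(String[] args) {
        // Two indexed columns in the WHERE clause, as in "col1=100 and col2=1".
        System.out.println(requiresAllowFiltering(true, 2, true));
        // A single index expression does not trigger the check.
        System.out.println(requiresAllowFiltering(true, 1, true));
    }
}
```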
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823819#comment-13823819 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

It was tested against the 1.2.11 release.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823830#comment-13823830 ]

Alex Liu commented on CASSANDRA-6348:
-------------------------------------

We should be able to auto-page through the 2i CF (for the native protocol), so that if auto-paging ends in the middle of an index scan, the next page starts from where the index scan ended in the previous page.
[jira] [Commented] (CASSANDRA-6348) TimeoutException throws if Cql query allows data filtering and index is too big and it can't find the data in base CF after filtering
[ https://issues.apache.org/jira/browse/CASSANDRA-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823034#comment-13823034 ]

Jonathan Ellis commented on CASSANDRA-6348:
-------------------------------------------

Guess we need to require ALLOW FILTERING for these.