Even though aggregation that forces a full table scan across partitions is
not recommended, the message/exception does seem unrelated to partitioning:
cqlsh:flightdata> select late_flights(uniquecarrier, depdel15) from
flightsbydate in ('2015-09-15', '2015-09-16',
'2015-09-17', '2015-09-18', '2015-09-19', '2015-09-20', '2015-09-21');
Traceback (most recent call last):
File "CassandraInstall-3.1/bin/cqlsh.py", line 1258, in
perform_simple_statement
result = future.result()
File
"/home/wpl/CassandraInstall-3.1/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py",
line 3122, in result
raise self._final_exception
FunctionFailure: code=1400 [User Defined Function failure]
message="execution of 'flightdata.state_late_flights[map<text,
frozen<tuple<int, int>>>, text, decimal]' failed:
java.security.AccessControlException: access denied
("java.io.FilePermission"
"/home/wpl/CassandraInstall-3.1/conf/logback.xml" "read")"
Is that right?
And note that this same aggregation query (on a subset of the month's
days) does complete successfully sometimes.
The behavior is similar with Cassandra 3.0 as well: on the same set of
days, the query sometimes succeeds but fails most of the time. Would trying
the DataStax distribution offer any better chances?
Thanks,
Dinesh.
On 12/24/2015 2:59 AM, DuyHai Doan wrote:
Thanks for the pointer on internal paging Tyler, I missed this one.
But then it raises some questions:
1. Is it possible to "tune" the page size, or is it hard-coded internally?
2. Is read-repair performed on EACH page, or is it done on the whole set
of requested rows once they are fetched?
Question 2 is relevant in some particular scenarios where the user is
using CL QUORUM (or higher) and some replicas are out of sync. Even in
the case of aggregation over a single partition, if this partition is
wide and spans many fetch pages, the coordinator has to perform
read-repair and reconciliation across the QUORUM replicas for all of
them, so the query may time out very quickly.
On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:
On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
Cassandra will perform a full table scan and fetch all the
data in memory to apply the aggregate function.
Just to clarify for others on the list: when executing aggregation
functions, Cassandra /will/ use paging internally, so at most one
page worth of data will be held in memory at a time. However, if
your aggregation function retains a large amount of data, this may
contribute to heap pressure.
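
To illustrate the point about paging versus retained state, here is a minimal Python sketch of the idea, not Cassandra's actual implementation. It assumes the aggregate state is a map from carrier to a (total flights, late flights) pair, which is only a guess based on the `map<text, frozen<tuple<int, int>>>` type in the error message. The helper names `pages` and `aggregate`, the page size, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]              # (uniquecarrier, depdel15)
State = Dict[str, Tuple[int, int]]   # carrier -> (total, late); assumed meaning


def pages(rows: Iterable[Row], page_size: int) -> Iterator[List[Row]]:
    """Yield rows in fixed-size pages so only one page is held at a time."""
    page: List[Row] = []
    for row in rows:
        page.append(row)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        yield page


def state_late_flights(state: State, carrier: str, depdel15: float) -> State:
    """Hypothetical state function: count flights and late flights per carrier."""
    total, late = state.get(carrier, (0, 0))
    state[carrier] = (total + 1, late + (1 if depdel15 == 1 else 0))
    return state


def aggregate(rows: Iterable[Row], page_size: int = 2) -> State:
    """Fold the state function over the rows, one page at a time."""
    state: State = {}
    for page in pages(rows, page_size):
        # Only this page of rows is in memory; the state persists across pages.
        for carrier, depdel15 in page:
            state = state_late_flights(state, carrier, depdel15)
    return state


# Made-up sample rows, purely for illustration.
rows = [("AA", 1), ("AA", 0), ("DL", 1), ("UA", 0), ("DL", 1)]
result = aggregate(iter(rows), page_size=2)
```

In this sketch the row buffer never grows beyond `page_size`, while `state` grows with the number of distinct carriers; that accumulated state is the part that can add heap pressure if the aggregate retains a lot of data.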
--
Tyler Hobbs
DataStax <http://datastax.com/>