Even if aggregation that forces a full table scan across partitions is not recommended, the exception message seems unrelated to partitioning:

   cqlsh:flightdata> select late_flights(uniquecarrier, depdel15) from
   flightsbydate in ('2015-09-15', '2015-09-16',
   '2015-09-17', '2015-09-18', '2015-09-19', '2015-09-20', '2015-09-21');

   Traceback (most recent call last):
     File "CassandraInstall-3.1/bin/cqlsh.py", line 1258, in perform_simple_statement
       result = future.result()
     File "/home/wpl/CassandraInstall-3.1/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py", line 3122, in result
       raise self._final_exception
   FunctionFailure: code=1400 [User Defined Function failure]
   message="execution of 'flightdata.state_late_flights[map<text, frozen<tuple<int, int>>>, text, decimal]' failed:
   java.security.AccessControlException: access denied
   ("java.io.FilePermission" "/home/wpl/CassandraInstall-3.1/conf/logback.xml" "read")"

Is that right?

And note that this same aggregation query (on a subset of the month's days) does complete successfully sometimes.

The behavior is similar with Cassandra 3.0: on the same set of days, the query sometimes succeeds but fails most of the time. Would trying the DataStax distribution offer any better chances?

Thanks,
Dinesh.


On 12/24/2015 2:59 AM, DuyHai Doan wrote:
Thanks for the pointer on internal paging, Tyler; I missed this one. But it raises some questions:

1. Is it possible to "tune" the page size, or is it hard-coded internally?
2. Is read repair performed on EACH page, or is it done on the whole set of requested rows once they are fetched?

Question 2 is relevant in scenarios where the user reads at CL QUORUM (or higher) and some replicas are out of sync. Even when aggregating over a single partition, if that partition is wide and spans many fetch pages, the query may easily time out by the time the coordinator has performed read repair and reconciliation across the QUORUM replicas for all of those pages.
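
To make the concern concrete, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that any repair and reconciliation cost is paid once per page and that pages are fetched one after another; whether Cassandra actually behaves this way is exactly what question 2 asks. The function name and all numbers (row count, page size, per-page cost, timeout) are hypothetical.

```python
import math


def estimated_query_ms(partition_rows, page_size, per_page_ms):
    """Estimate total time if every page pays the same fixed cost, sequentially."""
    n_pages = math.ceil(partition_rows / page_size)
    return n_pages, n_pages * per_page_ms


# Hypothetical wide partition: 1M rows, 5000 rows per page, 60 ms per page.
n_pages, total_ms = estimated_query_ms(
    partition_rows=1_000_000, page_size=5_000, per_page_ms=60
)
timeout_ms = 10_000  # hypothetical request timeout

print(n_pages, total_ms, total_ms > timeout_ms)
```

Under these assumed numbers a modest per-page cost adds up to more than the timeout, even though no single page is slow.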


On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:


    On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

        Cassandra will perform a full table scan and fetch all the
        data in memory to apply the aggregate function.


    Just to clarify for others on the list: when executing aggregation
    functions, Cassandra /will/ use paging internally, so at most one
    page worth of data will be held in memory at a time.  However, if
    your aggregation function retains a large amount of data, this may
    contribute to heap pressure.
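
A minimal sketch of the point above, in plain Python rather than Cassandra code: rows are consumed one page at a time, but the aggregate's state lives across pages and can grow. The names `pages`, `aggregate`, and `state_late_flights`, and the (total, late) tuple layout, are assumptions for illustration, loosely modeled on the signature in the error message; they are not the poster's actual UDF.

```python
from decimal import Decimal
from itertools import islice


def pages(rows, page_size):
    """Yield successive lists of at most page_size rows (one 'page' at a time)."""
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def state_late_flights(state, carrier, depdel15):
    """Hypothetical state function: carrier -> (total flights, late flights)."""
    total, late = state.get(carrier, (0, 0))
    is_late = 1 if depdel15 is not None and depdel15 >= 1 else 0
    state[carrier] = (total + 1, late + is_late)
    return state


def aggregate(rows, page_size):
    """Fold the state function over all rows, holding one page at a time."""
    state = {}
    max_page_rows = 0
    for page in pages(rows, page_size):
        max_page_rows = max(max_page_rows, len(page))
        for carrier, depdel15 in page:
            state = state_late_flights(state, carrier, depdel15)
    return state, max_page_rows


# Made-up sample rows: (uniquecarrier, depdel15)
rows = [("AA", Decimal(1)), ("AA", Decimal(0)), ("DL", Decimal(1)),
        ("DL", None), ("UA", Decimal(0))]
result, max_page_rows = aggregate(rows, page_size=2)
print(result)
print(max_page_rows)
```

Here the state stays small because it holds one entry per carrier. A state that kept every row it saw would grow with the table no matter how small the page size is.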


-- Tyler Hobbs
    DataStax <http://datastax.com/>


