[ https://issues.apache.org/jira/browse/ASTERIXDB-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Murtadha Hubail resolved ASTERIXDB-2326.
----------------------------------------
    Resolution: Fixed
    Fix Version/s: 0.9.4

> Cannot run aggregation functions when the external dataset size grows too large
> -------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2326
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2326
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: EXT - External data, FUN - Functions
>            Reporter: James Fang
>            Assignee: Murtadha Hubail
>            Priority: Major
>              Labels: triaged
>             Fix For: 0.9.4
>
>
> I was testing aggregation functions on external data and found that they do not work at all at 100 million tuples: the query fails with "HYR0024: No result set for job" (full log below). At 10 million tuples, the aggregates work. None of the existing aggregates, nor the aggregates I am adding, work at 100 million tuples.
>
> DDL:
> DROP DATAVERSE AGG_TEST IF EXISTS;
> CREATE DATAVERSE AGG_TEST;
> USE AGG_TEST;
> CREATE TYPE Data AS {
>   id: int,
>   val: double
> };
> create external dataset dataval(Data) using
> localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
>
> Query:
> USE AGG_TEST;
> {"average":coll_avg((select element x.val from dataval as x))};
>
> Error:
> 11:55:25.603 [Executor-3:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now ACTIVE
> 11:55:30.447 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: GetDatasetDirectoryServiceInfo
> 11:55:30.917 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: GetNodeControllersInfo
> 11:55:31.345 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
> 11:55:31.379 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.dataset.DatasetDirectoryService - DatasetDirectoryService notified of new job JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO
> org.apache.asterix.app.active.ActiveNotificationHandler - notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called with jobId = JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type active job. property found to be: null
> 11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for org.apache.hyracks.api.job.ActivityCluster@1264c6ff
> 11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task Clusters
> 11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: [TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
> 11:55:31.394 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
> 11:55:31.412 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: WaitForJobCompletion
> 11:55:31.412 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
> 11:55:31.423 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - Initializing TAID:TID:ANID:ODID:0:0:0:0 -> [org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0, AlgebricksMeta [assign [1] := [org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5], stream-project [1], assign [org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]] for JID:0.1
> 11:55:31.450 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.453 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - Initializing
> TAID:TID:ANID:ODID:2:0:0:0 -> [org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102, AlgebricksMeta [assign [org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc], assign [1] := [org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b], stream-project [1]]] for JID:0.1
> 11:55:31.480 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.517 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
> 12:00:57.342 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
> 12:00:57.351 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
> 12:00:57.365 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0 NPartitions@1 [ResultPartitionLocation@127.0.0.1:49695] OrderedResult@true EmptyResult@false
> 12:00:57.368 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
> 12:00:57.373 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
> 12:00:57.377 [Worker:ClusterController] WARN org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork - Failed to register partition location
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
>     at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
>     at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) [classes/:?]
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
> 12:00:57.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
> 12:00:57.394 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: [TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
> 12:00:57.400 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing uncommitted partitions: []
> 12:00:57.405 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing partition requests: []
> 12:00:57.407 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
> 12:00:57.407 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
> 12:00:57.407 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks: JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.407 [Worker:ClusterController] WARN org.apache.hyracks.control.common.work.WorkQueue - Exception while executing ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
> java.lang.RuntimeException:
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
>     at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49) ~[classes/:?]
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
>     at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141) ~[classes/:?]
>     at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47) ~[classes/:?]
>     ... 1 more
> 12:00:57.408 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.409 [Worker:ClusterController] WARN org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
> 12:00:57.409 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup: JobId@JID:0.1 Status@FAILURE Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1]
> 12:00:57.409 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with id: JID:0.1
> 12:00:57.412 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
> 12:00:57.413 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job: JID:0.1
> 12:00:57.416 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.Joblet - Freeing leaked 294912 bytes
> 12:00:57.421 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobletCleanupNotification
> 12:00:57.421 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of job finish for JobId: JID:0.1
> 12:00:57.421 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY JOB FINISH!
> 12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
>     at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
>     at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
> 12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024: No result set for job JID:0.1
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
>     at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
>     at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
>     at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
>     at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
> 12:00:57.442 [Worker:ClusterController] WARN org.apache.hyracks.control.common.work.WorkQueue - Work JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)
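Reproducing the report requires an input file with 100 million records of the Data type at the path named in the DDL. The following is a minimal Java sketch of such a generator; it is not part of AsterixDB or of the original report. The class name GenerateAggTestData, the default output file name, the value range [0, 1000), the fixed seed, and writing one JSON-style object per line (which the ADM format is assumed to accept for the id and val fields) are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.SplittableRandom;

// Hypothetical test-data generator (not part of AsterixDB): writes one record of the
// reporter's Data type { id: int, val: double } per line, using JSON object syntax,
// which is assumed here to be accepted by the "adm" format of the localfs adapter.
public class GenerateAggTestData {

    // Formats a single record. Locale.ROOT guarantees '.' as the decimal separator,
    // and fixed-point "%.6f" keeps exponent notation out of the val field.
    public static String record(long id, double val) {
        return String.format(Locale.ROOT, "{\"id\": %d, \"val\": %.6f}", id, val);
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the report: 100 million tuples written to 100000000.txt.
        long count = args.length > 0 ? Long.parseLong(args[0]) : 100_000_000L;
        Path out = Paths.get(args.length > 1 ? args[1] : count + ".txt");

        // Fixed seed (an arbitrary choice) so repeated runs produce the same file.
        SplittableRandom random = new SplittableRandom(42L);
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (long id = 0; id < count; id++) {
                // val is uniform in [0, 1000), so coll_avg over val should be close to 500
                // for large inputs.
                writer.write(record(id, random.nextDouble() * 1000.0));
                writer.newLine();
            }
        }
    }
}
```

Running it with no arguments writes 100000000.txt (roughly 3.5 GB at about 36 bytes per line), which can then be placed at the path used in the create external dataset statement. Running "java GenerateAggTestData 10000000 10000000.txt" writes the 10-million-tuple variant for which the report says the aggregates still work, which gives a direct comparison point.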