Hi Odin,

Yes, you can make your query faster. First, you can increase the disk resource for Tajo workers by setting 'tajo.worker.resource.disks'. This disk resource is related to the number of tasks that are executed in parallel: a higher value lets more tasks run at the same time. For example, given 10 tasks, each of which reads data from HDFS, a Tajo worker with a disk resource of 1 (your current setting) will execute those tasks one by one. With a disk resource of 2, two tasks can be executed simultaneously, which can improve performance.

However, as you may know, if too many tasks access a single disk at the same time, there will be a lot of random accesses, which make query performance worse. So I recommend setting this value to the actual number of physical disks. Alternatively, if you have already configured multiple disks for HDFS, Tajo can detect them automatically and use them as the worker's disk resource when you set 'tajo.worker.resource.dfs-dir-aware' to true. Please refer to http://tajo.apache.org/docs/devel/configuration/worker_configuration.html for more information. After changing configuration values, you need to restart your Tajo cluster.
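As a rough sketch of the two options above, the worker configuration might look like the following. I am assuming these properties go into the same tajo-site.xml file that holds your current worker settings, and the value 3 is only a placeholder, not a recommendation; use the number of physical disks your worker node really has.

```xml
<!-- Worker settings, assumed to live in tajo-site.xml on each worker -->
<configuration>
  <!-- Option 1: set the disk resource by hand.
       The value 3 is a placeholder; use the real number of physical disks. -->
  <property>
    <name>tajo.worker.resource.disks</name>
    <value>3</value>
  </property>

  <!-- Option 2 (instead of Option 1): let Tajo derive the disk resource
       from the disks already configured for HDFS.
  <property>
    <name>tajo.worker.resource.dfs-dir-aware</name>
    <value>true</value>
  </property>
  -->
</configuration>
```

Option 2 is commented out here so that only one of the two approaches is active at a time; swap the comments if you prefer automatic detection.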
In addition, I *strongly recommend* enabling 'dfs.datanode.hdfs-blocks-metadata.enabled' for your HDFS. With this configuration, Tajo can achieve higher data locality when assigning its tasks to workers, which will improve Tajo's performance significantly. You need to restart HDFS after changing this setting, too.

Best regards,
Jihoon

On Fri, Oct 9, 2015 at 11:43 PM, Odin Guillermo Caudillo Gallegos <[email protected]> wrote:

> Hi.
> I did a select count from HDFS, which returned a total of almost
> 17 million records.
> The count was done in 2 minutes.
> I have the current config for the worker:
>
> <property>
>   <name>tajo.worker.resource.memory-mb</name>
>   <value>4096</value>
>   <description>Available memory size (MB)</description>
> </property>
>
> <property>
>   <name>tajo.worker.resource.disks</name>
>   <value>1</value>
>   <description>Available disk capacity (usually number of disks)</description>
> </property>
>
> <property>
>   <name>tajo.worker.tmpdir.locations</name>
>   <value>/tmp/tajo-11/tmpdir,/tmp/tajo-11/tmpdir1,/tmp/tajo-11/tmpdir2</value>
>   <description>A base for other temporary directories.</description>
> </property>
>
> Is there any way to give the query more power to make it faster?
> Do I need to do another configuration?
