Not yet. We are currently working to support authentication (https://issues.apache.org/jira/browse/TAJO-600). I expect that the 0.12 release will include the basic authentication feature.
Thanks!
Jihoon

On Sat, Oct 10, 2015 at 2:43 AM, Odin Guillermo Caudillo Gallegos <[email protected]> wrote:

> Good. Thank you for the tips about cleaning.
> Is there any way to configure Tajo with Kerberos or some security tool like
> Sentry already?
>
> 2015-10-09 11:13 GMT-05:00 Jihoon Son <[email protected]>:
>
>> You mean, dfs-dir-aware doesn't work, so you set resource.disks to some
>> value by yourself, right? If so, I'll check the dfs-dir-aware configuration.
>>
>> Regarding space cleaning, you can delete any directories. Some system
>> directories and files will be automatically created by Tajo if they are
>> necessary.
>> In contrast, deleting data means that Tajo works normally but you cannot
>> see the deleted data anymore. For example, if you delete the query detail
>> directory, you cannot see query details on the web UI anymore. These query
>> detail directories are automatically deleted as time goes by, so you don't
>> need to clean up unless you are suffering from low available space.
>>
>> In addition, you may want to delete Tajo's temporary data, which is stored
>> during query execution. The default temporary directory is created at
>> /tmp/tajo-${user.name}/tmpdir. So you can delete it yourself, or set
>> 'tajo.worker.tmpdir.cleanup-at-startup' for automatic cleanup.
>>
>> Jihoon
>>
>> On Sat, Oct 10, 2015 at 12:50 AM, Odin Guillermo Caudillo Gallegos <
>> [email protected]> wrote:
>>
>>> Hi.
>>> I set dfs-dir-aware to true, but the performance wasn't what I
>>> expected. So for test purposes, I left it with resource.disks.
>>> About the HDFS space cleaning, which directories can I delete from my
>>> Hadoop?
>>> For example, is there a problem if I delete the query detail? Can I delete
>>> another folder?
>>> Thanks
>>>
>>> 2015-10-09 10:15 GMT-05:00 Jihoon Son <[email protected]>:
>>>
>>>> Hi Odin, yes, you can make your query faster.
>>>>
>>>> First of all, you can increase the disk resource for Tajo workers by
>>>> setting 'tajo.worker.resource.disks'. This disk resource is
>>>> related to the number of tasks which are executed in parallel. A higher
>>>> disk resource increases the number of tasks which are executed in
>>>> parallel. For example, given 10 tasks each of which reads data from HDFS,
>>>> a Tajo worker with a disk resource of 1 will execute those tasks one by
>>>> one. With a disk resource of 2, two tasks can be executed simultaneously.
>>>> So, it can improve the performance.
>>>> However, as you may know, if too many tasks access a single disk at the
>>>> same time, there will be a lot of random accesses, which make the query
>>>> performance worse.
>>>> So, I recommend using the real number of physical disks for this
>>>> configuration. Or, if you have already configured multiple disks for HDFS,
>>>> Tajo can automatically detect them and use them for the Tajo worker's disk
>>>> resource if you set 'tajo.worker.resource.dfs-dir-aware' to true. Please
>>>> refer to
>>>> http://tajo.apache.org/docs/devel/configuration/worker_configuration.html
>>>> for more information.
>>>> After changing configuration values, you need to restart your Tajo
>>>> cluster.
>>>>
>>>> In addition, I *strongly recommend* enabling
>>>> 'dfs.datanode.hdfs-blocks-metadata.enabled' for your HDFS. With this
>>>> configuration, Tajo can achieve higher data locality when assigning its
>>>> tasks to workers. This will improve Tajo's performance significantly. You
>>>> need to restart your HDFS after configuring this, too.
>>>>
>>>> Best regards,
>>>> Jihoon
>>>>
>>>> On Fri, Oct 9, 2015 at 11:43 PM, Odin Guillermo Caudillo Gallegos <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi.
>>>>> I did a select count on an HDFS table which returns a total of
>>>>> almost 17 million records.
>>>>> The count was done in 2 minutes.
>>>>> I have the current config for the worker:
>>>>>
>>>>> <property>
>>>>>   <name>tajo.worker.resource.memory-mb</name>
>>>>>   <value>4096</value>
>>>>>   <description>Available memory size (MB)</description>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>   <name>tajo.worker.resource.disks</name>
>>>>>   <value>1</value>
>>>>>   <description>Available disk capacity (usually number of
>>>>>   disks)</description>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>   <name>tajo.worker.tmpdir.locations</name>
>>>>>   <value>/tmp/tajo-11/tmpdir,/tmp/tajo-11/tmpdir1,/tmp/tajo-11/tmpdir2</value>
>>>>>   <description>A base for other temporary directories.</description>
>>>>> </property>
>>>>>
>>>>> Is there any way to give the query more power to make it faster?
>>>>> Do I need to do another configuration?
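
For reference, the settings suggested in this thread could be collected roughly as follows. This is only a sketch, not a tested configuration: the property names come from the messages above, while the values (true for each flag, and the choice of dfs-dir-aware instead of a fixed disk count) are assumptions to adapt to your own cluster.

```xml
<!-- tajo-site.xml (sketch; values are assumptions, not tested settings) -->
<configuration>
  <!-- Let Tajo derive the worker's disk resource from the HDFS data directories.
       Alternatively, set tajo.worker.resource.disks to the number of physical disks. -->
  <property>
    <name>tajo.worker.resource.dfs-dir-aware</name>
    <value>true</value>
  </property>

  <!-- Clean the worker's temporary directory at startup (assumed to be a boolean flag). -->
  <property>
    <name>tajo.worker.tmpdir.cleanup-at-startup</name>
    <value>true</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml (sketch) -->
<configuration>
  <!-- Expose block metadata so that Tajo can assign tasks with better data locality. -->
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

As noted in the thread, the Tajo cluster needs a restart after changing the first file, and HDFS needs a restart after changing the second.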
