[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...
Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/844 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-155031450 +1 The patch looks good to me.
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-155017340 @eminency and @hyunsik, thank you guys for your review! I fixed the test failure.
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-154697044 Thanks, it looks good. +1
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-153982075 You can see the updated document here. http://people.apache.org/~jihoonson/tajo-docs/
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/844#discussion_r44100331
--- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
@@ -2,23 +2,455 @@
 The tajo-site.xml File
 **************************
-To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
+You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited.
+If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
+Also, catalog configurations are found here :doc:`catalog_configuration`.
+
+=========================
+Join Query Settings
+=========================
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.auto-broadcast`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+A flag to enable or disable the use of broadcast join.
+
+  * Property value: Boolean
+  * Default value: true
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.auto-broadcast</name>
+    <value>true</value>
+  </property>
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.broadcast.non-cross-join.threshold-kb`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
+
+  * Property value: Integer
+  * Unit: KB
+  * Default value: 5120
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
+    <value>5120</value>
+  </property>
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.broadcast.cross-join.threshold-kb`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
+
+  * Property value: Integer
+  * Unit: KB
+  * Default value: 1024
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.broadcast.cross-join.threshold-kb</name>
+    <value>1024</value>
+  </property>
+
+.. warning::
+  In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.task-volume-mb`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of the parallel processing of the join query.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 64
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.task-volume-mb</name>
+    <value>64</value>
+  </property>
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.partition-volume-mb`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join,
+this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 128
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.partition-volume-mb</name>
+    <value>128</value>
+  </property>
+
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.executor.join.common.in-memory-hash-threshold-mb`
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+This value provides the criterion to decide the algorithm to perform a join in a task.
+If the input data is smaller than this value, join is performed with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 64
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
+    <value>64</value>
+  </property>
+
+.. warning::
+  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
+  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
+  This value should be tuned carefully.
+
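The join-related rules in the quoted documentation can be summarized with a short Python sketch. It only illustrates the behaviour the text describes and is not Tajo's planner code (Tajo itself is written in Java). The helper names `fits_broadcast_threshold`, `second_stage_tasks` and `local_join_algorithm` are hypothetical, sizes are passed in as plain numbers in the units the documentation lists (KB for the broadcast thresholds, MB otherwise), the `auto-broadcast` flag and all other planner decisions are ignored, and the task-count formula is an assumed rough estimate derived from the description of `tajo.dist-query.join.task-volume-mb`.

```python
import math

# Default values as listed in the quoted tajo-site.xml documentation.
# Only the properties used by the hypothetical helpers below are included.
DEFAULTS = {
    "tajo.dist-query.broadcast.non-cross-join.threshold-kb": 5120,
    "tajo.dist-query.broadcast.cross-join.threshold-kb": 1024,
    "tajo.dist-query.join.task-volume-mb": 64,
    "tajo.executor.join.common.in-memory-hash-threshold-mb": 64,
}


def fits_broadcast_threshold(broadcast_sizes_kb, is_cross_join, conf=DEFAULTS):
    """Hypothetical helper: True if the total size of the tables to be
    broadcast stays within the threshold that applies to this join type."""
    key = ("tajo.dist-query.broadcast.cross-join.threshold-kb" if is_cross_join
           else "tajo.dist-query.broadcast.non-cross-join.threshold-kb")
    return sum(broadcast_sizes_kb) <= conf[key]


def second_stage_tasks(join_input_mb, conf=DEFAULTS):
    """Hypothetical helper: rough number of second-stage tasks of a
    repartition join, assuming each task processes about task-volume-mb
    of the shuffled input (always at least one task)."""
    volume_mb = conf["tajo.dist-query.join.task-volume-mb"]
    return max(1, math.ceil(join_input_mb / volume_mb))


def local_join_algorithm(input_mb, conf=DEFAULTS):
    """Hypothetical helper: per-task algorithm choice. Input smaller than
    the threshold uses the in-memory hash join; otherwise sort-merge."""
    threshold_mb = conf["tajo.executor.join.common.in-memory-hash-threshold-mb"]
    if input_mb < threshold_mb:
        return "in-memory hash join"
    return "sort-merge join"


if __name__ == "__main__":
    # Two 2 MB tables: within the non-cross-join limit, over the cross-join limit.
    print(fits_broadcast_threshold([2048, 2048], is_cross_join=False))  # True
    print(fits_broadcast_threshold([2048, 2048], is_cross_join=True))   # False
    # About 1000 MB of second-stage input with the default 64 MB per task.
    print(second_stage_tasks(1000))  # 16
    # Overriding a property, as an entry in tajo-site.xml would.
    conf = {**DEFAULTS, "tajo.dist-query.join.task-volume-mb": 128}
    print(second_stage_tasks(1000, conf))  # 8
    print(local_join_algorithm(10))  # in-memory hash join
    print(local_join_algorithm(64))  # sort-merge join
```

Under these assumptions, lowering `tajo.dist-query.join.task-volume-mb` raises the parallelism of the second stage. The warning on the hash threshold still applies: the comparison is against the on-disk size, while the in-heap size is usually much larger.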
Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-154281531 Thanks for your comment. I addressed your comments.
Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/844#issuecomment-154274813 I leave some comments.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/844#discussion_r44100456 --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/844#discussion_r44103172 --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst --- @@ -2,23 +2,455 @@ ... +  * Property value: Boolean --- End diff -- Thank you for the good comment.
Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/844#discussion_r44098591 --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst --- @@ -2,23 +2,455 @@ ... +  * Property value: Boolean --- End diff -- IMO, 'property value type' looks clearer.
GitHub user jihoonson opened a pull request: https://github.com/apache/tajo/pull/844

TAJO-1963: Add more configuration descriptions to document

I also fixed a wrong configuration name.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-1963

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/844.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #844

commit 0b9bd167440b5e872f7ef02bae366d24e30e475d
Author: Jihoon Son
Date: 2015-11-05T07:43:47Z

    Add a document and fixed a wrong configuration name