[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/tajo/pull/844


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-09 Thread hyunsik
Github user hyunsik commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-155031450
  
+1
The patch looks good to me. 



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-09 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-155017340
  
@eminency and @hyunsik, thank you guys for your review!
I fixed the test failure.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-07 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-154697044
  
Thanks, it looks good. +1



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-153982075
  
You can see the updated document here.
http://people.apache.org/~jihoonson/tajo-docs/



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/844#discussion_r44100331
  
--- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
@@ -2,23 +2,455 @@
 The tajo-site.xml File
 **
 
-To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
+You can add more configurations in the ``tajo-site.xml`` file. Note that once you edit this file, you should replicate it to all hosts in your cluster.
+If you are looking for the master and worker configurations, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
+Catalog configurations are described in :doc:`catalog_configuration`.
+
+===================
+Join Query Settings
+===================
+
+"""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.auto-broadcast`
+"""""""""""""""""""""""""""""""""""""
+
+A flag to enable or disable the use of broadcast join.
+
+  * Property value: Boolean
+  * Default value: true
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.auto-broadcast</name>
+    <value>true</value>
+  </property>
+
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.broadcast.non-cross-join.threshold-kb`
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the total size of the broadcast tables will not exceed this threshold.
+
+  * Property value: Integer
+  * Unit: KB
+  * Default value: 5120
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
+    <value>5120</value>
+  </property>
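+
+"""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.broadcast.cross-join.threshold-kb`
+"""""""""""""""""""""""""""""""""""""""""""""""""""
+
+A threshold for cross joins. When a cross join query is executed, the total size of the broadcast tables will not exceed this threshold.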
+
+  * Property value: Integer
+  * Unit: KB
+  * Default value: 1024
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.broadcast.cross-join.threshold-kb</name>
+    <value>1024</value>
+  </property>
+
+.. warning::
+  In Tajo, the broadcast join is the only way to perform cross joins. Since the cross join is a very expensive operation, this value needs to be tuned carefully.
+
+"""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.task-volume-mb`
+"""""""""""""""""""""""""""""""""""""
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of parallelism of the join query.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 64
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.task-volume-mb</name>
+    <value>64</value>
+  </property>
+
+""""""""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.partition-volume-mb`
+""""""""""""""""""""""""""""""""""""""""""
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join,
+this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 128
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.dist-query.join.partition-volume-mb</name>
+    <value>128</value>
+  </property>
+
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""
+`tajo.executor.join.common.in-memory-hash-threshold-mb`
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+This value provides the criterion for choosing the algorithm that performs a join in a task.
+If the input data is smaller than this value, the join is performed with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+  * Property value: Integer
+  * Unit: MB
+  * Default value: 64
+  * Example
+
+.. code-block:: xml
+
+  <property>
+    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
+    <value>64</value>
+  </property>
+
+.. warning::
+  This value is the size of the input as stored on the file system. When the input data is loaded into the JVM heap,
+  its actual size is usually much larger than the configured value, which means that an overly large threshold can cause unexpected OutOfMemory errors.
+  This value should be tuned carefully.
+
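
The join settings in this diff all reduce to size-based decisions. Below is a minimal Java sketch of those decisions, assuming the documented defaults are hard-coded rather than read from `tajo-site.xml`. The class `JoinSettingsSketch` and its methods are hypothetical and are not Tajo's planner or executor code; the second-stage task count is also an assumption (a ceiling division of the join input by `tajo.dist-query.join.task-volume-mb`).

```java
// Hypothetical sketch: NOT Tajo's planner or executor code. It only restates
// the documented meaning of the join settings, using their default values.
final class JoinSettingsSketch {

    private static final long KB = 1024L;
    private static final long MB = 1024L * KB;

    // Defaults from the documentation diff (hard-coded here for illustration).
    static final long NON_CROSS_JOIN_THRESHOLD_KB = 5120; // tajo.dist-query.broadcast.non-cross-join.threshold-kb
    static final long CROSS_JOIN_THRESHOLD_KB = 1024;     // tajo.dist-query.broadcast.cross-join.threshold-kb
    static final long TASK_VOLUME_MB = 64;                // tajo.dist-query.join.task-volume-mb
    static final long IN_MEMORY_HASH_THRESHOLD_MB = 64;   // tajo.executor.join.common.in-memory-hash-threshold-mb

    /** True if broadcast join is enabled and the total broadcast size is within the applicable threshold. */
    static boolean canBroadcast(boolean autoBroadcast, boolean crossJoin, long... broadcastTableBytes) {
        if (!autoBroadcast) { // tajo.dist-query.join.auto-broadcast
            return false;
        }
        long totalBytes = 0;
        for (long bytes : broadcastTableBytes) {
            totalBytes += bytes;
        }
        long thresholdKb = crossJoin ? CROSS_JOIN_THRESHOLD_KB : NON_CROSS_JOIN_THRESHOLD_KB;
        return totalBytes <= thresholdKb * KB;
    }

    /** Assumed rule: ceil(join input / task volume) second-stage tasks, at least one. */
    static long secondStageTasks(long joinInputBytes) {
        long volumeBytes = TASK_VOLUME_MB * MB;
        return Math.max(1L, (joinInputBytes + volumeBytes - 1) / volumeBytes);
    }

    /** Per-task algorithm choice, based on the input size as stored on the file system. */
    static String taskJoinAlgorithm(long inputBytesOnDisk) {
        return inputBytesOnDisk < IN_MEMORY_HASH_THRESHOLD_MB * MB
                ? "in-memory hash join"
                : "sort-merge join";
    }

    public static void main(String[] args) {
        System.out.println(canBroadcast(true, false, 3 * MB, 1 * MB)); // true: 4096 KB <= 5120 KB
        System.out.println(canBroadcast(true, true, 2 * MB));          // false: 2048 KB > 1024 KB
        System.out.println(secondStageTasks(1024 * MB));               // 16: 1024 MB / 64 MB per task
        System.out.println(taskJoinAlgorithm(32 * MB));                // in-memory hash join
        System.out.println(taskJoinAlgorithm(100 * MB));               // sort-merge join
    }
}
```

Because `taskJoinAlgorithm` compares sizes as stored on the file system, an input just under the 64 MB default can still occupy considerably more JVM heap once loaded. This is the situation the warning on `tajo.executor.join.common.in-memory-hash-threshold-mb` describes.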

[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread jihoonson
Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-154281531
  
Thanks for your comment. I addressed your comments.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread eminency
Github user eminency commented on the pull request:

https://github.com/apache/tajo/pull/844#issuecomment-154274813
  
I left some comments.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/844#discussion_r44100456
  

[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread jihoonson
Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/844#discussion_r44103172
  
--- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
@@ -2,23 +2,455 @@
 The tajo-site.xml File
 **
 
-To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
+You can add more configurations in the ``tajo-site.xml`` file. Note that once you edit this file, you should replicate it to all hosts in your cluster.
+If you are looking for the master and worker configurations, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
+Catalog configurations are described in :doc:`catalog_configuration`.
+
+===================
+Join Query Settings
+===================
+
+"""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.auto-broadcast`
+"""""""""""""""""""""""""""""""""""""
+
+A flag to enable or disable the use of broadcast join.
+
+  * Property value: Boolean
--- End diff --

Thank you for the good comment.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-05 Thread eminency
Github user eminency commented on a diff in the pull request:

https://github.com/apache/tajo/pull/844#discussion_r44098591
  
--- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
@@ -2,23 +2,455 @@
 The tajo-site.xml File
 **
 
-To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
+You can add more configurations in the ``tajo-site.xml`` file. Note that once you edit this file, you should replicate it to all hosts in your cluster.
+If you are looking for the master and worker configurations, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
+Catalog configurations are described in :doc:`catalog_configuration`.
+
+===================
+Join Query Settings
+===================
+
+"""""""""""""""""""""""""""""""""""""
+`tajo.dist-query.join.auto-broadcast`
+"""""""""""""""""""""""""""""""""""""
+
+A flag to enable or disable the use of broadcast join.
+
+  * Property value: Boolean
--- End diff --

IMO, 'property value type' looks clearer.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

2015-11-04 Thread jihoonson
GitHub user jihoonson opened a pull request:

https://github.com/apache/tajo/pull/844

TAJO-1963: Add more configuration descriptions to document

I also fixed a wrong configuration name.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jihoonson/tajo-2 TAJO-1963

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tajo/pull/844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #844


commit 0b9bd167440b5e872f7ef02bae366d24e30e475d
Author: Jihoon Son 
Date:   2015-11-05T07:43:47Z

Add a document and fixed a wrong configuration name



