[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-03-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1831


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-03 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165810447
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1264,18 +1231,7 @@
   /**
* default property of unsafe processing
*/
-  public static final String ENABLE_UNSAFE_IN_QUERY_EXECUTION_DEFAULTVALUE 
= "false";
-
-  /**
-   * property for offheap based processing
-   */
-  @CarbonProperty
-  public static final String USE_OFFHEAP_IN_QUERY_PROCSSING = 
"use.offheap.in.query.processing";
-
-  /**
-   * default value of offheap based processing
-   */
-  public static final String USE_OFFHEAP_IN_QUERY_PROCSSING_DEFAULT = 
"true";
+  public static final String ENABLE_UNSAFE_IN_QUERY_EXECUTION_DEFAULTVALUE 
= "true";
--- End diff --

Change default value for "enable.unsafe.columnpage" to true


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-03 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165809183
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1264,18 +1231,7 @@
   /**
* default property of unsafe processing
*/
-  public static final String ENABLE_UNSAFE_IN_QUERY_EXECUTION_DEFAULTVALUE 
= "false";
-
-  /**
-   * property for offheap based processing
-   */
-  @CarbonProperty
-  public static final String USE_OFFHEAP_IN_QUERY_PROCSSING = 
"use.offheap.in.query.processing";
-
-  /**
-   * default value of offheap based processing
-   */
-  public static final String USE_OFFHEAP_IN_QUERY_PROCSSING_DEFAULT = 
"true";
+  public static final String ENABLE_UNSAFE_IN_QUERY_EXECUTION_DEFAULTVALUE 
= "true";
--- End diff --

Please also make ENABLE_UNSAFE_COLUMN_PAGE_LOADING ("enable.unsafe.columnpage") 
"true" by default, as it is the common configuration for query as well.


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-02 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165743860
  
--- Diff: conf/carbon.properties.template ---
@@ -17,29 +17,25 @@
 #
 
  System Configuration ##
-#Mandatory. Carbon Store path
-carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
+#Optional. Carbon Store path
--- End diff --

Added


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-02 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165743651
  
--- Diff: conf/carbon.properties.template ---
@@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
 #carbon.block.meta.size.reserved.percentage=10
 ##csv reading buffer size.
 #carbon.csv.read.buffersize.byte=1048576
-##To identify and apply compression for non-high cardinality columns
-#high.cardinality.value=10
 ##maximum no of threads used for reading intermediate files for final 
merging.
 #carbon.merge.sort.reader.thread=3
 ##Carbon blocklet size. Note: this configuration cannot be change once 
store is generated
 #carbon.blocklet.size=12
-##number of retries to get the metadata lock for loading data to table
-#carbon.load.metadata.lock.retries=3
 ##Minimum blocklets needed for distribution.
 #carbon.blockletdistribution.min.blocklet.size=10
 ##Interval between the retries to get the lock
 #carbon.load.metadata.lock.retry.timeout.sec=5
 ##Temporary store location, By default it will take 
System.getProperty("java.io.tmpdir")
-#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
-##data loading records count logger
-#carbon.load.log.counter=50
+#carbon.tempstore.location
--- End diff --

This is used in CarbonAlterTableCompactionCommand, but I think we can use the 
Java tmp dir there as well, so I removed both the property and its usage.
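
The template comment in the diff above says the temporary store location defaults to System.getProperty("java.io.tmpdir"). A minimal sketch of that fallback, assuming a plain java.util.Properties lookup; the class TempStoreLocation and its resolve method are hypothetical names for illustration and are not part of CarbonData:

```java
import java.util.Properties;

// Hypothetical helper (not CarbonData code): illustrates the fallback only.
final class TempStoreLocation {
  // Property key as it appears in conf/carbon.properties.template.
  static final String KEY = "carbon.tempstore.location";

  // Returns the configured location, or the JVM temp dir when the
  // property is absent or blank.
  static String resolve(Properties props) {
    String configured = props.getProperty(KEY);
    if (configured == null || configured.trim().isEmpty()) {
      return System.getProperty("java.io.tmpdir");
    }
    return configured.trim();
  }

  public static void main(String[] args) {
    // No property set, so the JVM temp dir is printed.
    System.out.println(resolve(new Properties()));
  }
}
```

If the fallback is the only path ever taken in practice, removing the property, as described above, leaves behaviour unchanged for users who never set it.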


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-02 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165743745
  
--- Diff: docs/configuration-parameters.md ---
@@ -32,10 +32,10 @@ This section provides the details of all the 
configurations required for the Car
 
 | Property | Default Value | Description |
 
||-|--|
-| carbon.storelocation | /user/hive/warehouse/carbon.store | Location 
where CarbonData will create the store, and write the data in its own format. 
NOTE: Store location should be in HDFS. |
-| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is 
used to configure the HDFS relative path, the path configured in 
carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in 
fs.defaultFS. If this path is configured, then user need not pass the complete 
path while dataload. For example: If absolute path of the csv file is 
hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path 
"hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can 
configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user 
can specify the csv path as /2016/xyz.csv. |
-| carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where 
the bad records are stored. |
-| carbon.data.file.version | 3 | If this parameter value is set to 1, then 
CarbonData will support the data load which is in old format(0.x version). If 
the value is set to 2(1.x onwards version), then CarbonData will support the 
data load of new format only. The default value for this parameter is 3(latest 
version is set as default version). It improves the query performance by ~20% 
to 50%. For configuring V3 format explicitly, add carbon.data.file.version = V3 
in carbon.properties file. |
+| carbon.storelocation |  | Location where CarbonData will create the 
store, and write the data in its own format. NOTE: Store location should be in 
HDFS. |
--- End diff --

Added


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-02 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165742856
  
--- Diff: conf/carbon.properties.template ---
@@ -110,7 +100,7 @@ carbon.enable.quick.filter=false
 ##Percentage to identify whether column cardinality is more than 
configured percent of total row count
 #high.cardinality.row.count.percentage=80
--- End diff --

Not used; removed.


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-02 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165741342
  
--- Diff: conf/carbon.properties.template ---
@@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
 #carbon.block.meta.size.reserved.percentage=10
 ##csv reading buffer size.
 #carbon.csv.read.buffersize.byte=1048576
-##To identify and apply compression for non-high cardinality columns
-#high.cardinality.value=10
 ##maximum no of threads used for reading intermediate files for final 
merging.
 #carbon.merge.sort.reader.thread=3
 ##Carbon blocklet size. Note: this configuration cannot be change once 
store is generated
 #carbon.blocklet.size=12
-##number of retries to get the metadata lock for loading data to table
-#carbon.load.metadata.lock.retries=3
 ##Minimum blocklets needed for distribution.
 #carbon.blockletdistribution.min.blocklet.size=10
 ##Interval between the retries to get the lock
 #carbon.load.metadata.lock.retry.timeout.sec=5
 ##Temporary store location, By default it will take 
System.getProperty("java.io.tmpdir")
-#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
-##data loading records count logger
-#carbon.load.log.counter=50
+#carbon.tempstore.location
 ##To dissable/enable carbon block distribution
 #carbon.custom.block.distribution=false
--- End diff --

The property is still in use:
    val useCustomDistribution =
      CarbonProperties.getInstance().getProperty(
        CarbonCommonConstants.CARBON_CUSTOM_BLOCK_DISTRIBUTION,
        "false").toBoolean ||
      carbonDistribution.equalsIgnoreCase(CarbonCommonConstants.CARBON_TASK_DISTRIBUTION_CUSTOM)
    if (useCustomDistribution) 
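
The check quoted above can be sketched in self-contained Java as follows. This is an illustration under assumptions, not the CarbonData implementation: TaskDistributionCheck is a hypothetical class, the key "carbon.task.distribution" and default "block" are taken from the review comments in this thread, and the value "custom" for CARBON_TASK_DISTRIBUTION_CUSTOM is an assumption.

```java
import java.util.Properties;

// Hypothetical stand-in for the quoted check; not CarbonData code.
final class TaskDistributionCheck {
  // Legacy boolean property shown in conf/carbon.properties.template.
  static final String LEGACY_KEY = "carbon.custom.block.distribution";
  // Newer property and default, as named in the review comments.
  static final String TASK_DISTRIBUTION_KEY = "carbon.task.distribution";
  static final String TASK_DISTRIBUTION_DEFAULT = "block";
  // ASSUMPTION: literal value behind CARBON_TASK_DISTRIBUTION_CUSTOM.
  static final String TASK_DISTRIBUTION_CUSTOM = "custom";

  // True when either the legacy flag is "true" or the newer property
  // selects the custom distribution (case-insensitive).
  static boolean useCustomDistribution(Properties props) {
    boolean legacy =
        Boolean.parseBoolean(props.getProperty(LEGACY_KEY, "false"));
    String distribution =
        props.getProperty(TASK_DISTRIBUTION_KEY, TASK_DISTRIBUTION_DEFAULT);
    return legacy || distribution.equalsIgnoreCase(TASK_DISTRIBUTION_CUSTOM);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(LEGACY_KEY, "true");
    System.out.println(useCustomDistribution(props));
  }
}
```

Because the legacy flag is OR-ed with the newer setting, dropping it from the template alone would not change behaviour for deployments that still set it.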


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165545325
  
--- Diff: docs/configuration-parameters.md ---
@@ -32,10 +32,10 @@ This section provides the details of all the 
configurations required for the Car
 
 | Property | Default Value | Description |
 
||-|--|
-| carbon.storelocation | /user/hive/warehouse/carbon.store | Location 
where CarbonData will create the store, and write the data in its own format. 
NOTE: Store location should be in HDFS. |
-| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is 
used to configure the HDFS relative path, the path configured in 
carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in 
fs.defaultFS. If this path is configured, then user need not pass the complete 
path while dataload. For example: If absolute path of the csv file is 
hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path 
"hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can 
configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user 
can specify the csv path as /2016/xyz.csv. |
-| carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where 
the bad records are stored. |
-| carbon.data.file.version | 3 | If this parameter value is set to 1, then 
CarbonData will support the data load which is in old format(0.x version). If 
the value is set to 2(1.x onwards version), then CarbonData will support the 
data load of new format only. The default value for this parameter is 3(latest 
version is set as default version). It improves the query performance by ~20% 
to 50%. For configuring V3 format explicitly, add carbon.data.file.version = V3 
in carbon.properties file. |
+| carbon.storelocation |  | Location where CarbonData will create the 
store, and write the data in its own format. NOTE: Store location should be in 
HDFS. |
--- End diff --

Also mention here that, if it is not specified, it takes the Spark warehouse 
path.
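
The behaviour requested for the documentation could be sketched as below. This is a hedged illustration only: StoreLocationResolver is a hypothetical class, and the Spark warehouse path is passed in as a plain string rather than read from a Spark session.

```java
import java.util.Properties;

// Hypothetical helper (not CarbonData code): shows the documented fallback.
final class StoreLocationResolver {
  static final String STORE_LOCATION_KEY = "carbon.storelocation";

  // Returns carbon.storelocation when set and non-blank; otherwise the
  // Spark warehouse path supplied by the caller.
  static String resolve(Properties carbonProps, String sparkWarehousePath) {
    String configured = carbonProps.getProperty(STORE_LOCATION_KEY);
    if (configured == null || configured.trim().isEmpty()) {
      return sparkWarehousePath;
    }
    return configured.trim();
  }

  public static void main(String[] args) {
    // Example warehouse path chosen for illustration only.
    System.out.println(resolve(new Properties(), "/user/hive/warehouse"));
  }
}
```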


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165544641
  
--- Diff: conf/carbon.properties.template ---
@@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
 #carbon.block.meta.size.reserved.percentage=10
 ##csv reading buffer size.
 #carbon.csv.read.buffersize.byte=1048576
-##To identify and apply compression for non-high cardinality columns
-#high.cardinality.value=10
 ##maximum no of threads used for reading intermediate files for final 
merging.
 #carbon.merge.sort.reader.thread=3
 ##Carbon blocklet size. Note: this configuration cannot be change once 
store is generated
 #carbon.blocklet.size=12
-##number of retries to get the metadata lock for loading data to table
-#carbon.load.metadata.lock.retries=3
 ##Minimum blocklets needed for distribution.
 #carbon.blockletdistribution.min.blocklet.size=10
 ##Interval between the retries to get the lock
 #carbon.load.metadata.lock.retry.timeout.sec=5
 ##Temporary store location, By default it will take 
System.getProperty("java.io.tmpdir")
-#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
-##data loading records count logger
-#carbon.load.log.counter=50
+#carbon.tempstore.location
 ##To dissable/enable carbon block distribution
 #carbon.custom.block.distribution=false
--- End diff --

This property has now been changed to `carbon.task.distribution`, and its 
default value is `block`.


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165544424
  
--- Diff: conf/carbon.properties.template ---
@@ -110,7 +100,7 @@ carbon.enable.quick.filter=false
 ##Percentage to identify whether column cardinality is more than 
configured percent of total row count
 #high.cardinality.row.count.percentage=80
--- End diff --

I guess this is not used either; please check and remove it.


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165544265
  
--- Diff: conf/carbon.properties.template ---
@@ -76,22 +72,16 @@ carbon.enable.quick.filter=false
 #carbon.block.meta.size.reserved.percentage=10
 ##csv reading buffer size.
 #carbon.csv.read.buffersize.byte=1048576
-##To identify and apply compression for non-high cardinality columns
-#high.cardinality.value=10
 ##maximum no of threads used for reading intermediate files for final 
merging.
 #carbon.merge.sort.reader.thread=3
 ##Carbon blocklet size. Note: this configuration cannot be change once 
store is generated
 #carbon.blocklet.size=12
-##number of retries to get the metadata lock for loading data to table
-#carbon.load.metadata.lock.retries=3
 ##Minimum blocklets needed for distribution.
 #carbon.blockletdistribution.min.blocklet.size=10
 ##Interval between the retries to get the lock
 #carbon.load.metadata.lock.retry.timeout.sec=5
 ##Temporary store location, By default it will take 
System.getProperty("java.io.tmpdir")
-#carbon.tempstore.location=/opt/Carbon/TempStoreLoc
-##data loading records count logger
-#carbon.load.log.counter=50
+#carbon.tempstore.location
--- End diff --

Are we really using this? I think we always depend on either the Java tmp dir 
or the tmp directories from Spark/YARN. Please re-verify and remove it if it is 
not used.


---


[GitHub] carbondata pull request #1831: [CARBONDATA-1993] Carbon properties default v...

2018-02-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1831#discussion_r165543880
  
--- Diff: conf/carbon.properties.template ---
@@ -17,29 +17,25 @@
 #
 
  System Configuration ##
-#Mandatory. Carbon Store path
-carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
+#Optional. Carbon Store path
--- End diff --

Mention that, if it is not specified, it takes the Spark warehouse path.


---