[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361787554
 
 

 ##
 File path: docs/ddl-of-carbondata.md
 ##
 @@ -83,7 +83,6 @@ CarbonData DDL statements are documented here,which includes:
 
 | Property | Description   
   |
 |  | 
 |
-| [DICTIONARY_INCLUDE](#dictionary-encoding-configuration) | Columns for 
which dictionary needs to be generated   |
 | [NO_INVERTED_INDEX](#inverted-index-configuration)   | Columns to 
exclude from inverted index generation|
 | [INVERTED_INDEX](#inverted-index-configuration)  | Columns to 
include for inverted index generation |
 
 Review comment:
   no need


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361788681
 
 

 ##
 File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java
 ##
 @@ -98,70 +82,21 @@ public void initialize() throws IOException {
 
configuration.getDataLoadProperty(DataLoadProcessorConstants.IS_EMPTY_DATA_BAD_RECORD)
 .toString());
 List fieldConverterList = new ArrayList<>();
-localCaches = new Map[fields.length];
 long lruCacheStartTime = System.currentTimeMillis();
-DictionaryClient client = createDictionaryClient();
-dictClients.add(client);
 
 for (int i = 0; i < fields.length; i++) {
-  localCaches[i] = new ConcurrentHashMap<>();
   FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
-  .createFieldEncoder(fields[i], configuration.getTableIdentifier(), 
i, nullFormat, client,
-  configuration.getUseOnePass(), localCaches[i], isEmptyBadRecord,
-  configuration.getParentTablePath(), isConvertToBinary,
+  .createFieldEncoder(fields[i], i, nullFormat, isEmptyBadRecord, 
isConvertToBinary,
   (String) configuration.getDataLoadProperty(
   CarbonLoadOptionConstants.CARBON_OPTIONS_BINARY_DECODER));
   fieldConverterList.add(fieldConverter);
 }
 CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
 .recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
-fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+fieldConverters = fieldConverterList.toArray(new FieldConverter[0]);
 
 Review comment:
   why size is zero?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361787085
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
 ##
 @@ -704,8 +704,6 @@ public boolean isHivePartitionTable() {
   public AbsoluteTableIdentifier getAbsoluteTableIdentifier() {
 AbsoluteTableIdentifier absoluteTableIdentifier =
 tableInfo.getOrCreateAbsoluteTableIdentifier();
-absoluteTableIdentifier.setDictionaryPath(
-
tableInfo.getFactTable().getTableProperties().get(CarbonCommonConstants.DICTIONARY_PATH));
 return absoluteTableIdentifier;
 
 Review comment:
   use "return tableInfo.getOrCreateAbsoluteTableIdentifier()"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361788336
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ##
 @@ -767,78 +766,19 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
   }
 }
 
-// All excluded cols should be there in create table cols
 if 
(tableProperties.get(CarbonCommonConstants.DICTIONARY_EXCLUDE).isDefined) {
-  LOGGER.warn("dictionary_exclude option was deprecated, " +
-  "by default string column does not use global dictionary.")
-  dictExcludeCols =
-
tableProperties.get(CarbonCommonConstants.DICTIONARY_EXCLUDE).get.split(',').map(_.trim)
-  dictExcludeCols
-.foreach { dictExcludeCol =>
-  if (!fields.exists(x => x.column.equalsIgnoreCase(dictExcludeCol))) {
-val errorMsg = "DICTIONARY_EXCLUDE column: " + dictExcludeCol +
-   " does not exist in table or unsupported for 
complex child column. " +
-   "Please check the create table statement."
-throw new MalformedCarbonCommandException(errorMsg)
-  } else {
-val dataType = fields.find(x =>
-  x.column.equalsIgnoreCase(dictExcludeCol)).get.dataType.get
-if (!isDataTypeSupportedForDictionary_Exclude(dataType)) {
-  val errorMsg = "DICTIONARY_EXCLUDE is unsupported for " + 
dataType.toLowerCase() +
- " data type column: " + dictExcludeCol
-  throw new MalformedCarbonCommandException(errorMsg)
-} else if (varcharCols.exists(x => 
x.equalsIgnoreCase(dictExcludeCol))) {
-  throw new MalformedCarbonCommandException(
-"DICTIONARY_EXCLUDE is unsupported for long string datatype 
column: " +
-dictExcludeCol)
-}
-  }
-}
+  // dictionary_exclude is not supported since 2.0
+  throw new DeprecatedFeatureException("dictionary_exclude")
 }
-// All included cols should be there in create table cols
 if 
(tableProperties.get(CarbonCommonConstants.DICTIONARY_INCLUDE).isDefined) {
-  dictIncludeCols =
-
tableProperties(CarbonCommonConstants.DICTIONARY_INCLUDE).split(",").map(_.trim)
-  dictIncludeCols.foreach { distIncludeCol =>
-if (!fields.exists(x => 
x.column.equalsIgnoreCase(distIncludeCol.trim))) {
-  val errorMsg = "DICTIONARY_INCLUDE column: " + distIncludeCol.trim +
- " does not exist in table or unsupported for complex 
child column. " +
- "Please check the create table statement."
-  throw new MalformedCarbonCommandException(errorMsg)
-}
-val rangeField = 
fields.find(_.column.equalsIgnoreCase(distIncludeCol.trim))
-if ("binary".equalsIgnoreCase(rangeField.get.dataType.get)) {
-  throw new MalformedCarbonCommandException(
-"DICTIONARY_INCLUDE is unsupported for binary data type column: " +
-distIncludeCol.trim)
-}
-if (varcharCols.exists(x => x.equalsIgnoreCase(distIncludeCol.trim))) {
-  throw new MalformedCarbonCommandException(
-"DICTIONARY_INCLUDE is unsupported for long string datatype 
column: " +
-distIncludeCol.trim)
-}
-  }
-}
-
-// include cols should not contain exclude cols
-dictExcludeCols.foreach { dicExcludeCol =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(dicExcludeCol))) {
-val errorMsg = "DICTIONARY_EXCLUDE can not contain the same column: " 
+ dicExcludeCol +
-   " with DICTIONARY_INCLUDE. Please check the create 
table statement."
-throw new MalformedCarbonCommandException(errorMsg)
-  }
+  // dictionary_include is not supported since 2.0
+  throw new DeprecatedFeatureException("dictionary_include")
 }
 
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
-noDictionaryDims :+= field.column
-dimFields += field
-  } else if (dictIncludeCols.exists(x => 
x.equalsIgnoreCase(field.column))) {
-dimFields += field
-  } else if (field.dataType.get.toUpperCase.equals("TIMESTAMP") &&
 
 Review comment:
   need check this change, maybe exclude timestamp column, it will be non-dict 
column


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific 

[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361787853
 
 

 ##
 File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/readsupport/impl/DictionaryDecodeReadSupport.java
 ##
 @@ -60,11 +59,9 @@ public void initialize(CarbonColumn[] carbonColumns,
 Cache 
forwardDictionaryCache = cacheProvider
 .createCache(CacheType.FORWARD_DICTIONARY);
 dataTypes[i] = carbonColumns[i].getDataType();
-String dictionaryPath = 
carbonTable.getTableInfo().getFactTable().getTableProperties()
-.get(CarbonCommonConstants.DICTIONARY_PATH);
 dictionaries[i] = forwardDictionaryCache.get(new 
DictionaryColumnUniqueIdentifier(
 carbonTable.getAbsoluteTableIdentifier(),
-carbonColumns[i].getColumnIdentifier(), dataTypes[i], 
dictionaryPath));
+carbonColumns[i].getColumnIdentifier(), dataTypes[i]));
 
 Review comment:
   use CarbonRowReadSupport instead of this class


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361786417
 
 

 ##
 File path: conf/carbon.properties.template
 ##
 @@ -85,7 +85,6 @@ carbon.major.compaction.size=1024
 ##Min max feature is added to enhance query performance. To disable this 
feature, make it false.
 #carbon.enableMinMax=true
 
- Global Dictionary Configurations 
 
 Review comment:
   how about "Timestamp/Date encoding Configurations"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361786862
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ##
 @@ -77,7 +76,6 @@ public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec 
columnSpec, ColumnPa
   private ColumnPageEncoder createEncoderForDimension(TableSpec.DimensionSpec 
columnSpec,
   ColumnPage inputPage) {
 switch (columnSpec.getColumnType()) {
-  case GLOBAL_DICTIONARY:
   case DIRECT_DICTIONARY:
 
 Review comment:
   can we use a new name instead of DIRECT_DICTIONARY?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-28 Thread GitBox
QiangCai commented on a change in pull request #3502: [CARBONATA-3605] Remove 
global dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#discussion_r361787474
 
 

 ##
 File path: docs/configuration-parameters.md
 ##
 @@ -90,7 +90,6 @@ This section provides the details of all the configurations 
required for the Car
 | enable.data.loading.statistics | false | CarbonData has extensive logging 
which would be useful for debugging issues related to performance or hard to 
locate issues. This configuration when made ***true*** would log additional 
data loading statistics information to more accurately locate the issues being 
debugged. **NOTE:** Enabling this would log more debug information to log 
files, there by increasing the log files size significantly in short span of 
time. It is advised to configure the log files size, retention of log files 
parameters in log4j properties appropriately. Also extensive logging is an 
increased IO operation and hence over all data loading performance might get 
reduced. Therefore it is recommended to enable this configuration only for the 
duration of debugging. |
 | carbon.dictionary.chunk.size | 1 | CarbonData generates dictionary keys 
and writes them to separate dictionary file during data loading. To optimize 
the IO, this configuration determines the number of dictionary keys to be 
persisted to dictionary file at a time. **NOTE:** Writing to file also serves 
as a commit point to the dictionary generated. Increasing more values in memory 
causes more data loss during system or application failure. It is advised to 
alter this configuration judiciously. |
 | dictionary.worker.threads | 1 | CarbonData supports Optimized data loading 
by relying on a dictionary server. Dictionary server helps to maintain 
dictionary values independent of the data loading and there by avoids reading 
the same input data multiples times. This configuration determines the number 
of concurrent dictionary generation or request that needs to be served by the 
dictionary server. **NOTE:** This configuration takes effect when 
***carbon.options.single.pass*** is configured as true. Please refer to 
*carbon.options.single.pass*to understand how dictionary server optimizes data 
loading. |
 
 Review comment:
   remove this


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services