[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856981 ## File path: core/src/main/java/org/apache/carbondata/core/cache/CacheProvider.java ## @@ -135,11 +131,11 @@ public static CacheProvider getInstance() { private void createDictionaryCacheForGivenType(CacheType cacheType) { Review comment: renamed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856763 ## File path: core/src/main/java/org/apache/carbondata/core/scan/filter/resolver/resolverinfo/visitor/DictionaryColumnVisitor.java ## @@ -55,26 +54,22 @@ public void populateFilterResolvedInfo(ColumnResolvedFilterInfo visitableObj, } catch (FilterIllegalMemberException e) { throw new FilterUnsupportedException(e); } - try { -resolvedFilterObject = FilterUtil -.getFilterValues(metadata.getTableIdentifier(), metadata.getColumnExpression(), -evaluateResultListFinal, metadata.isIncludeFilter()); -if (!metadata.isIncludeFilter() && null != resolvedFilterObject) { - // Adding default surrogate key of null member inorder to not display the same while - // displaying the report as per hive compatibility. - // first check of surrogate key for null value is already added then - // no need to add again otherwise result will be wrong in case of exclude filter - // this is because two times it will flip the same bit - if (!resolvedFilterObject.getExcludeFilterList() - .contains(CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY)) { -resolvedFilterObject.getExcludeFilterList() -.add(CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY); - } - Collections.sort(resolvedFilterObject.getExcludeFilterList()); + resolvedFilterObject = FilterUtil + .getFilterValues(metadata.getTableIdentifier(), metadata.getColumnExpression(), + evaluateResultListFinal, metadata.isIncludeFilter()); + if (!metadata.isIncludeFilter() && null != resolvedFilterObject) { +// Adding default surrogate key of null member inorder to not display the same while +// displaying the report as per hive compatibility. +// first check of surrogate key for null value is already added then +// no need to add again otherwise result will be wrong in case of exclude filter +// this is because two times it will flip the same bit +if (!resolvedFilterObject.getExcludeFilterList() +.contains(CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY)) { + resolvedFilterObject.getExcludeFilterList() + .add(CarbonCommonConstants.MEMBER_DEFAULT_VAL_SURROGATE_KEY); } - } catch (QueryExecutionException e) { -throw new FilterUnsupportedException(e); Review comment: Because I removed `throw QueryExecutionException` in getFilterValues, so it is not needed to catch it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856697 ## File path: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonObjectInspector.java ## @@ -60,14 +60,16 @@ public CarbonObjectInspector(final StructTypeInfo rowTypeInfo) { } private ObjectInspector getObjectInspector(final TypeInfo typeInfo) { -if (typeInfo.equals(TypeInfoFactory.doubleTypeInfo)) { +if (typeInfo.equals(TypeInfoFactory.stringTypeInfo)) { Review comment: yes, string should be compared first This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856633 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonLoadStatisticsImpl.java ## @@ -314,9 +299,7 @@ private void printDicGenStatisticsInfo() { double loadCsvfilesToDfTime = getLoadCsvfilesToDfTime(); LOGGER.info("STAGE 1 ->Load csv to DataFrame and generate" + " block distinct values: " + loadCsvfilesToDfTime + "(s)"); -double dicShuffleAndWriteFileTotalTime = getDicShuffleAndWriteFileTotalTime(); -LOGGER.info("STAGE 2 ->Global dict shuffle and write dict file: " + -dicShuffleAndWriteFileTotalTime + "(s)"); +LOGGER.info("STAGE 2 ->Global dict shuffle and write dict file (deprecated): 0(s)"); Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856665 ## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/readsupport/impl/DirectDictionaryReadSupport.java ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.hadoop.readsupport.impl; + +import java.io.IOException; +import java.sql.Date; +import java.sql.Timestamp; +import java.util.Calendar; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; + +/** + * A read support implementation to return Object array + * Only does direct dictionary (date/timestamp) conversion + */ +public class DirectDictionaryReadSupport implements CarbonReadSupport { Review comment: DirectDictionaryReadSupport is removed now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856523 ## File path: core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/DictionaryByteArrayWrapper.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.localdictionary.dictionaryholder; + +import java.util.Arrays; + +import org.apache.carbondata.core.util.ByteUtil; + +import net.jpountz.xxhash.XXHash32; + +/** + * class that holds the byte array and overrides equals and hashcode method which + * will be useful for object comparison + */ +public class DictionaryByteArrayWrapper { Review comment: removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361856257 ## File path: core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractColumnDictionaryInfo.java ## @@ -17,7 +17,6 @@ package org.apache.carbondata.core.cache.dictionary; -import java.nio.charset.Charset; Review comment: It is used in query flow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361851813 ## File path: conf/carbon.properties.template ## @@ -85,7 +85,6 @@ carbon.major.compaction.size=1024 ##Min max feature is added to enhance query performance. To disable this feature, make it false. #carbon.enableMinMax=true - Global Dictionary Configurations Review comment: The dictionary read path is not removed, I will do it in another PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361824649 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java ## @@ -77,7 +76,6 @@ public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, ColumnPa private ColumnPageEncoder createEncoderForDimension(TableSpec.DimensionSpec columnSpec, ColumnPage inputPage) { switch (columnSpec.getColumnType()) { - case GLOBAL_DICTIONARY: case DIRECT_DICTIONARY: Review comment: Yes, if we can think of a good name, we can consider it in another PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361798058 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -98,70 +82,21 @@ public void initialize() throws IOException { configuration.getDataLoadProperty(DataLoadProcessorConstants.IS_EMPTY_DATA_BAD_RECORD) .toString()); List fieldConverterList = new ArrayList<>(); -localCaches = new Map[fields.length]; long lruCacheStartTime = System.currentTimeMillis(); -DictionaryClient client = createDictionaryClient(); -dictClients.add(client); for (int i = 0; i < fields.length; i++) { - localCaches[i] = new ConcurrentHashMap<>(); FieldConverter fieldConverter = FieldEncoderFactory.getInstance() - .createFieldEncoder(fields[i], configuration.getTableIdentifier(), i, nullFormat, client, - configuration.getUseOnePass(), localCaches[i], isEmptyBadRecord, - configuration.getParentTablePath(), isConvertToBinary, + .createFieldEncoder(fields[i], i, nullFormat, isEmptyBadRecord, isConvertToBinary, (String) configuration.getDataLoadProperty( CarbonLoadOptionConstants.CARBON_OPTIONS_BINARY_DECODER)); fieldConverterList.add(fieldConverter); } CarbonTimeStatisticsFactory.getLoadStatisticsInstance() .recordLruCacheLoadTime((System.currentTimeMillis() - lruCacheStartTime) / 1000.0); -fieldConverters = fieldConverterList.toArray(new FieldConverter[fieldConverterList.size()]); +fieldConverters = fieldConverterList.toArray(new FieldConverter[0]); Review comment: It is recommended to give 0 size array to `.toArray` func in java This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361797805 ## File path: docs/ddl-of-carbondata.md ## @@ -83,7 +83,6 @@ CarbonData DDL statements are documented here,which includes: | Property | Description | | | | -| [DICTIONARY_INCLUDE](#dictionary-encoding-configuration) | Columns for which dictionary needs to be generated | | [NO_INVERTED_INDEX](#inverted-index-configuration) | Columns to exclude from inverted index generation| | [INVERTED_INDEX](#inverted-index-configuration) | Columns to include for inverted index generation | Review comment: inverted index is not removed, it is required This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361797763 ## File path: docs/configuration-parameters.md ## @@ -90,7 +90,6 @@ This section provides the details of all the configurations required for the Car | enable.data.loading.statistics | false | CarbonData has extensive logging which would be useful for debugging issues related to performance or hard to locate issues. This configuration when made ***true*** would log additional data loading statistics information to more accurately locate the issues being debugged. **NOTE:** Enabling this would log more debug information to log files, there by increasing the log files size significantly in short span of time. It is advised to configure the log files size, retention of log files parameters in log4j properties appropriately. Also extensive logging is an increased IO operation and hence over all data loading performance might get reduced. Therefore it is recommended to enable this configuration only for the duration of debugging. | | carbon.dictionary.chunk.size | 1 | CarbonData generates dictionary keys and writes them to separate dictionary file during data loading. To optimize the IO, this configuration determines the number of dictionary keys to be persisted to dictionary file at a time. **NOTE:** Writing to file also serves as a commit point to the dictionary generated. Increasing more values in memory causes more data loss during system or application failure. It is advised to alter this configuration judiciously. | | dictionary.worker.threads | 1 | CarbonData supports Optimized data loading by relying on a dictionary server. Dictionary server helps to maintain dictionary values independent of the data loading and there by avoids reading the same input data multiples times. This configuration determines the number of concurrent dictionary generation or request that needs to be served by the dictionary server. **NOTE:** This configuration takes effect when ***carbon.options.single.pass*** is configured as true. Please refer to *carbon.options.single.pass*to understand how dictionary server optimizes data loading. | Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361797761 ## File path: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ## @@ -704,8 +704,6 @@ public boolean isHivePartitionTable() { public AbsoluteTableIdentifier getAbsoluteTableIdentifier() { AbsoluteTableIdentifier absoluteTableIdentifier = tableInfo.getOrCreateAbsoluteTableIdentifier(); -absoluteTableIdentifier.setDictionaryPath( - tableInfo.getFactTable().getTableProperties().get(CarbonCommonConstants.DICTIONARY_PATH)); return absoluteTableIdentifier; Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature
jackylk commented on a change in pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#discussion_r361797717 ## File path: conf/carbon.properties.template ## @@ -85,7 +85,6 @@ carbon.major.compaction.size=1024 ##Min max feature is added to enhance query performance. To disable this feature, make it false. #carbon.enableMinMax=true - Global Dictionary Configurations Review comment: they are still required This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services