[GitHub] carbondata pull request #3066: [CARBONDATA-3244] Add benchmark for Change Da...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/3066 [CARBONDATA-3244] Add benchmark for Change Data Capture scenario CDC (change data capture) is a common scenario for analyzing slowly changed table in data warehouse. It is good to add benchmark test comparing two update methods: 1. hive_solution, which uses INSERT OVERWRITE. This is a popular method for hive warehouse. 2. carbon_solution, which uses CarbonData's update syntax to update the history table directly. This test simulates updates to history table using CDC table. When running in a 8-cores laptop, the benchmark shows: 1. test one History table 1M records, update 10K records everyday and insert 10K records everyday, simulated 3 days. hive_solution: total process time takes 13,516 ms carbon_solution: total process time takes 7,521 ms 2. test two History table 10M records, update 10K records everyday and insert 10K records everyday, simulated 3 days. hive_solution: total process time takes 104,250 ms carbon_solution: total process time takes 17,384 ms - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Only example is added - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata cdc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3066.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3066 commit ebb5ef79ac85a6c736496fe19f719bfed74902c1 Author: Jacky Li Date: 2019-01-10T16:44:58Z add benchmark for Change Data Capture scenario ---
[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2963#discussion_r245013088 --- Diff: pom.xml --- @@ -527,6 +526,7 @@ examples/spark2 datamap/lucene datamap/bloom +datamap/example --- End diff -- I think it is better not to add this, since it will make the assembling bigger ---
[GitHub] carbondata issue #3019: [CARBONDATA-3194] Integrating Carbon with Presto usi...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3019 LGTM ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244269584 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala --- @@ -157,7 +157,7 @@ case class CarbonCreateTableCommand( | tableName "$tableName", | dbName "$dbName", | tablePath "$tablePath", - | path "$tablePath", + | path "${FileFactory.addSchemeIfNotExists(tablePath)}", --- End diff -- Can you update the comment from line 147 to 150. I feel it need to be rephrased ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244269478 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -369,6 +369,24 @@ public static boolean createNewLockFile(String filePath, FileType fileType) thro LOCAL, HDFS, ALLUXIO, VIEWFS, S3 } + public static String addSchemeIfNotExists(String filePath) { --- End diff -- add comment ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244269435 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataModule.java --- @@ -17,62 +17,150 @@ package org.apache.carbondata.presto; -import javax.inject.Inject; +import java.util.function.Supplier; import static java.util.Objects.requireNonNull; -import org.apache.carbondata.presto.impl.CarbonTableConfig; import org.apache.carbondata.presto.impl.CarbonTableReader; +import com.facebook.presto.hive.CoercionPolicy; +import com.facebook.presto.hive.DirectoryLister; +import com.facebook.presto.hive.FileFormatDataSourceStats; +import com.facebook.presto.hive.GenericHiveRecordCursorProvider; +import com.facebook.presto.hive.HadoopDirectoryLister; +import com.facebook.presto.hive.HdfsConfiguration; +import com.facebook.presto.hive.HdfsConfigurationUpdater; +import com.facebook.presto.hive.HdfsEnvironment; +import com.facebook.presto.hive.HiveClientConfig; +import com.facebook.presto.hive.HiveClientModule; +import com.facebook.presto.hive.HiveCoercionPolicy; +import com.facebook.presto.hive.HiveConnectorId; +import com.facebook.presto.hive.HiveEventClient; +import com.facebook.presto.hive.HiveFileWriterFactory; +import com.facebook.presto.hive.HiveHdfsConfiguration; +import com.facebook.presto.hive.HiveLocationService; +import com.facebook.presto.hive.HiveMetadataFactory; +import com.facebook.presto.hive.HiveNodePartitioningProvider; +import com.facebook.presto.hive.HivePageSinkProvider; +import com.facebook.presto.hive.HivePageSourceFactory; +import com.facebook.presto.hive.HivePartitionManager; +import com.facebook.presto.hive.HiveRecordCursorProvider; +import com.facebook.presto.hive.HiveSessionProperties; +import com.facebook.presto.hive.HiveSplitManager; +import com.facebook.presto.hive.HiveTableProperties; +import com.facebook.presto.hive.HiveTransactionManager; +import com.facebook.presto.hive.HiveTypeTranslator; +import com.facebook.presto.hive.HiveWriterStats; +import com.facebook.presto.hive.LocationService; +import com.facebook.presto.hive.NamenodeStats; +import com.facebook.presto.hive.OrcFileWriterConfig; +import com.facebook.presto.hive.OrcFileWriterFactory; +import com.facebook.presto.hive.PartitionUpdate; +import com.facebook.presto.hive.RcFileFileWriterFactory; +import com.facebook.presto.hive.TableParameterCodec; +import com.facebook.presto.hive.TransactionalMetadata; +import com.facebook.presto.hive.TypeTranslator; +import com.facebook.presto.hive.orc.DwrfPageSourceFactory; +import com.facebook.presto.hive.orc.OrcPageSourceFactory; +import com.facebook.presto.hive.parquet.ParquetPageSourceFactory; +import com.facebook.presto.hive.parquet.ParquetRecordCursorProvider; +import com.facebook.presto.hive.rcfile.RcFilePageSourceFactory; +import com.facebook.presto.spi.connector.ConnectorNodePartitioningProvider; +import com.facebook.presto.spi.connector.ConnectorPageSinkProvider; import com.facebook.presto.spi.connector.ConnectorPageSourceProvider; import com.facebook.presto.spi.connector.ConnectorSplitManager; -import com.facebook.presto.spi.type.Type; -import com.facebook.presto.spi.type.TypeManager; -import com.fasterxml.jackson.databind.DeserializationContext; -import com.fasterxml.jackson.databind.deser.std.FromStringDeserializer; import com.google.inject.Binder; -import com.google.inject.Module; import com.google.inject.Scopes; +import com.google.inject.TypeLiteral; +import com.google.inject.multibindings.Multibinder; +import io.airlift.event.client.EventClient; -import static com.facebook.presto.spi.type.TypeSignature.parseTypeSignature; -import static com.google.common.base.Preconditions.checkArgument; +import static com.google.inject.multibindings.Multibinder.newSetBinder; import static io.airlift.configuration.ConfigBinder.configBinder; +import static io.airlift.json.JsonCodecBinder.jsonCodecBinder; -public class CarbondataModule implements Module { +import static org.weakref.jmx.ObjectNames.generatedNameOf; +import static org.weakref.jmx.guice.ExportBinder.newExporter; + +public class CarbondataModule extends HiveClientModule { --- End diff -- Please add comment for this class ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244254924 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -78,32 +72,33 @@ private static final String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; /** - * @param carbondataColumnHandle + * @param columnHandle * @return */ - private static DataType spi2CarbondataTypeMapper(CarbondataColumnHandle carbondataColumnHandle) { -Type colType = carbondataColumnHandle.getColumnType(); -if (colType == BooleanType.BOOLEAN) { + private static DataType spi2CarbondataTypeMapper(HiveColumnHandle columnHandle) { +HiveType colType = columnHandle.getHiveType(); +if (colType.equals(HiveType.HIVE_BOOLEAN)) { return DataTypes.BOOLEAN; -} else if (colType == SmallintType.SMALLINT) { +} else if (colType.equals(HiveType.HIVE_SHORT)) { return DataTypes.SHORT; -} else if (colType == IntegerType.INTEGER) { +} else if (colType.equals(HiveType.HIVE_INT)) { return DataTypes.INT; -} else if (colType == BigintType.BIGINT) { +} else if (colType.equals(HiveType.HIVE_LONG)) { return DataTypes.LONG; -} else if (colType == DoubleType.DOUBLE) { +} else if (colType.equals(HiveType.HIVE_DOUBLE)) { return DataTypes.DOUBLE; -} else if (colType == VarcharType.VARCHAR) { +} else if (colType.equals(HiveType.HIVE_STRING)) { return DataTypes.STRING; -} else if (colType == DateType.DATE) { +} else if (colType.equals(HiveType.HIVE_DATE)) { return DataTypes.DATE; -} else if (colType == TimestampType.TIMESTAMP) { +} else if (colType.equals(HiveType.HIVE_TIMESTAMP)) { return DataTypes.TIMESTAMP; -} else if (colType.equals(DecimalType.createDecimalType(carbondataColumnHandle.getPrecision(), -carbondataColumnHandle.getScale( { - return DataTypes.createDecimalType(carbondataColumnHandle.getPrecision(), - carbondataColumnHandle.getScale()); -} else { +} +else if (colType.getTypeInfo() instanceof DecimalTypeInfo) { + DecimalTypeInfo typeInfo = (DecimalTypeInfo) colType.getTypeInfo(); + return DataTypes.createDecimalType(typeInfo.getPrecision(),typeInfo.getScale()); +} +else { --- End diff -- move up ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244254915 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -78,32 +72,33 @@ private static final String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; /** - * @param carbondataColumnHandle + * @param columnHandle * @return */ - private static DataType spi2CarbondataTypeMapper(CarbondataColumnHandle carbondataColumnHandle) { -Type colType = carbondataColumnHandle.getColumnType(); -if (colType == BooleanType.BOOLEAN) { + private static DataType spi2CarbondataTypeMapper(HiveColumnHandle columnHandle) { +HiveType colType = columnHandle.getHiveType(); +if (colType.equals(HiveType.HIVE_BOOLEAN)) { return DataTypes.BOOLEAN; -} else if (colType == SmallintType.SMALLINT) { +} else if (colType.equals(HiveType.HIVE_SHORT)) { return DataTypes.SHORT; -} else if (colType == IntegerType.INTEGER) { +} else if (colType.equals(HiveType.HIVE_INT)) { return DataTypes.INT; -} else if (colType == BigintType.BIGINT) { +} else if (colType.equals(HiveType.HIVE_LONG)) { return DataTypes.LONG; -} else if (colType == DoubleType.DOUBLE) { +} else if (colType.equals(HiveType.HIVE_DOUBLE)) { return DataTypes.DOUBLE; -} else if (colType == VarcharType.VARCHAR) { +} else if (colType.equals(HiveType.HIVE_STRING)) { return DataTypes.STRING; -} else if (colType == DateType.DATE) { +} else if (colType.equals(HiveType.HIVE_DATE)) { return DataTypes.DATE; -} else if (colType == TimestampType.TIMESTAMP) { +} else if (colType.equals(HiveType.HIVE_TIMESTAMP)) { return DataTypes.TIMESTAMP; -} else if (colType.equals(DecimalType.createDecimalType(carbondataColumnHandle.getPrecision(), -carbondataColumnHandle.getScale( { - return DataTypes.createDecimalType(carbondataColumnHandle.getPrecision(), - carbondataColumnHandle.getScale()); -} else { +} +else if (colType.getTypeInfo() instanceof DecimalTypeInfo) { --- End diff -- move up ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244254822 --- Diff: integration/presto/src/test/scala/org/apache/carbondata/presto/server/PrestoServer.scala --- @@ -129,6 +130,21 @@ class PrestoServer { } } + def execute(query: String) = { + +Try { + val conn: Connection = createJdbcConnection(dbName) + logger.info(s"* executing the query * \n $query") --- End diff -- rename logger to LOGGER ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244254647 --- Diff: integration/presto/src/test/scala/org/apache/carbondata/presto/util/CarbonDataStoreCreator.scala --- @@ -80,7 +80,7 @@ object CarbonDataStoreCreator { UUID.randomUUID().toString)) // val factFilePath: String = new File(dataFilePath).getCanonicalPath val storeDir: File = new File(absoluteTableIdentifier.getTablePath) - CarbonUtil.deleteFoldersAndFiles(storeDir) +// CarbonUtil.deleteFoldersAndFiles(storeDir) --- End diff -- delete it if not required, same for line 81 ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2161 LGTM. Merging into master branch ---
[GitHub] carbondata issue #3018: [HOTFIX] rename field "thread_pool_size" to match ca...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3018 LGTM ---
[GitHub] carbondata issue #2984: [CARBONDATA-3165]Protection of Bloom Null Exception
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2984 LGTM ---
[GitHub] carbondata pull request #3019: [CARBONDATA-3194] Integrating Carbon with Pre...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3019#discussion_r244172111 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataConnectorFactory.java --- @@ -17,69 +17,177 @@ package org.apache.carbondata.presto; +import java.lang.management.ManagementFactory; +import java.lang.reflect.*; import java.util.Map; +import java.util.Optional; +import java.util.Set; import static java.util.Objects.requireNonNull; -import com.facebook.presto.spi.ConnectorHandleResolver; +import org.apache.carbondata.presto.impl.CarbonTableConfig; + +import com.facebook.presto.hive.HiveConnector; +import com.facebook.presto.hive.HiveConnectorFactory; +import com.facebook.presto.hive.HiveMetadataFactory; +import com.facebook.presto.hive.HiveProcedureModule; +import com.facebook.presto.hive.HiveSchemaProperties; +import com.facebook.presto.hive.HiveSessionProperties; +import com.facebook.presto.hive.HiveStorageFormat; +import com.facebook.presto.hive.HiveTableProperties; +import com.facebook.presto.hive.HiveTransactionManager; +import com.facebook.presto.hive.NodeVersion; +import com.facebook.presto.hive.RebindSafeMBeanServer; +import com.facebook.presto.hive.authentication.HiveAuthenticationModule; +import com.facebook.presto.hive.metastore.HiveMetastoreModule; +import com.facebook.presto.hive.s3.HiveS3Module; +import com.facebook.presto.hive.security.HiveSecurityModule; +import com.facebook.presto.hive.security.PartitionsAwareAccessControl; +import com.facebook.presto.spi.NodeManager; +import com.facebook.presto.spi.PageIndexerFactory; +import com.facebook.presto.spi.PageSorter; import com.facebook.presto.spi.classloader.ThreadContextClassLoader; -import com.facebook.presto.spi.connector.*; -import com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata; +import com.facebook.presto.spi.connector.Connector; +import com.facebook.presto.spi.connector.ConnectorAccessControl; +import com.facebook.presto.spi.connector.ConnectorContext; +import com.facebook.presto.spi.connector.ConnectorNodePartitioningProvider; +import com.facebook.presto.spi.connector.ConnectorPageSinkProvider; +import com.facebook.presto.spi.connector.ConnectorPageSourceProvider; +import com.facebook.presto.spi.connector.ConnectorSplitManager; +import com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSinkProvider; import com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider; import com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorSplitManager; -import com.google.common.base.Throwables; +import com.facebook.presto.spi.connector.classloader.ClassLoaderSafeNodePartitioningProvider; +import com.facebook.presto.spi.procedure.Procedure; +import com.facebook.presto.spi.type.TypeManager; +import com.google.common.collect.ImmutableSet; import com.google.inject.Injector; +import com.google.inject.Key; +import com.google.inject.TypeLiteral; import io.airlift.bootstrap.Bootstrap; import io.airlift.bootstrap.LifeCycleManager; +import io.airlift.event.client.EventModule; import io.airlift.json.JsonModule; +import io.airlift.units.DataSize; +import org.weakref.jmx.guice.MBeanModule; +import sun.reflect.ConstructorAccessor; +import static com.google.common.base.Throwables.throwIfUnchecked; +import static io.airlift.configuration.ConfigBinder.configBinder; /** * Build Carbondata Connector * It will be called by CarbondataPlugin */ -public class CarbondataConnectorFactory implements ConnectorFactory { +public class CarbondataConnectorFactory extends HiveConnectorFactory { - private final String name; private final ClassLoader classLoader; public CarbondataConnectorFactory(String connectorName, ClassLoader classLoader) { -this.name = connectorName; +super(connectorName, classLoader, null); this.classLoader = requireNonNull(classLoader, "classLoader is null"); } - @Override public String getName() { -return name; - } - - @Override public ConnectorHandleResolver getHandleResolver() { -return new CarbondataHandleResolver(); - } - - @Override public Connector create(String connectorId, Map config, + @Override public Connector create(String catalogName, Map config, ConnectorContext context) { requireNonNull(config, "config is null"); try (ThreadContextClassLoader ignored = new ThreadContextClassLoader(classLoader)) { - Bootstrap app = new Bootstrap(new JsonModule(), -
[GitHub] carbondata issue #3004: [CARBONDATA-3188] Create carbon table as hive unders...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3004 LGTM ---
[GitHub] carbondata pull request #3017: [HOTFIX] remove this useless assignment
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3017#discussion_r244065800 --- Diff: core/src/main/java/org/apache/carbondata/core/locks/LocalFileLock.java --- @@ -112,7 +112,7 @@ public LocalFileLock(String lockFileLocation, String lockFile) { status = true; } } catch (IOException e) { - status = false; + // status = false; --- End diff -- Yes, I think HdfsFileLock has similar problem. Actually there is a PR #2878 trying to fix all exception code style problem, by @kevinjmh . I suggest we can have it handled in that PR. Then for this PR, I think you can remove the assignment directly, same for HdfsFileLock ---
[GitHub] carbondata pull request #3026: [WIP] Added support to compile carbon CDH spa...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3026#discussion_r244065240 --- Diff: integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonDictionaryUtil.java --- @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql; + +import java.lang.reflect.Array; +import java.lang.reflect.Field; +import java.lang.reflect.Method; + +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; + +import org.apache.spark.sql.execution.vectorized.ColumnVector; + +/** + * This class uses the java reflection to create parquet dictionary class as CDH distribution uses + * twitter parquet instead of apache parquet. + */ +public class CarbonDictionaryUtil { --- End diff -- It it better to make it as `ReflectionUtil` And please add InterfaceAudience annotation ---
[GitHub] carbondata pull request #2963: [CARBONDATA-3139] Fix bugs in MinMaxDataMap e...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2963#discussion_r244007049 --- Diff: datamap/example/src/main/java/org/apache/carbondata/datamap/minmax/MinMaxDataMapFactory.java --- @@ -0,0 +1,353 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.datamap.minmax; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.exceptions.sql.MalformedDataMapCommandException; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.cache.Cache; +import org.apache.carbondata.core.cache.CacheProvider; +import org.apache.carbondata.core.cache.CacheType; +import org.apache.carbondata.core.datamap.DataMapDistributable; +import org.apache.carbondata.core.datamap.DataMapLevel; +import org.apache.carbondata.core.datamap.DataMapMeta; +import org.apache.carbondata.core.datamap.DataMapStoreManager; +import org.apache.carbondata.core.datamap.Segment; +import org.apache.carbondata.core.datamap.TableDataMap; +import org.apache.carbondata.core.datamap.dev.DataMapBuilder; +import org.apache.carbondata.core.datamap.dev.DataMapWriter; +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap; +import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMapFactory; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.features.TableOperation; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.DataMapSchema; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.statusmanager.SegmentStatusManager; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.path.CarbonTablePath; +import org.apache.carbondata.events.Event; + +import org.apache.log4j.Logger; + +/** + * Min Max DataMap Factory + */ +@InterfaceAudience.Internal +public class MinMaxDataMapFactory extends CoarseGrainDataMapFactory { + private static final Logger LOGGER = + LogServiceFactory.getLogService(MinMaxDataMapFactory.class.getName()); + private DataMapMeta dataMapMeta; + private String dataMapName; + // segmentId -> list of index files + private Map> segmentMap = new ConcurrentHashMap<>(); + private Cache cache; + + public MinMaxDataMapFactory(CarbonTable carbonTable, DataMapSchema dataMapSchema) + throws MalformedDataMapCommandException { +super(carbonTable, dataMapSchema); + +// this is an example for datamap, we can choose the columns and operations that +// will be supported by this datamap. Furthermore, we can add cache-support for this datamap. + +this.dataMapName = dataMapSchema.getDataMapName(); +List indexedColumns = carbonTable.getIndexedColumns(dataMapSchema); + +// operations that will be supported on the indexed columns +List optOperations = new ArrayList<>(); +optOperations.add(ExpressionType.NOT); +optOperations.add(ExpressionType.EQUALS); +optOperations.add(ExpressionType.NOT_EQUALS); +optOperations.add(ExpressionType.GREATERTHAN); +optOperations.add(ExpressionType.GREATERTHAN_EQUALTO); +opt
[GitHub] carbondata issue #3001: [Presto][Streaming] support presto read streaming ta...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3001 @QiangCai Can you explain the dependency of readers after this PR? Ideally, presto module should depends on streaming reader in core or hadoop modules. ---
[GitHub] carbondata pull request #2970: [CARBONDATA-3142]Add timestamp with thread na...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2970#discussion_r244005751 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -144,14 +144,14 @@ private void initParameters(CarbonFactDataHandlerModel model) { blockletProcessingCount = new AtomicInteger(0); producerExecutorService = Executors.newFixedThreadPool(model.getNumberOfCores(), new CarbonThreadFactory( -"ProducerPool_" + System.nanoTime() + ":" + model.getTableName() + ", range: " + model -.getBucketId())); +"ProducerPool:" + model.getTableName() + ", range: " + model --- End diff -- better to use String.format ---
[GitHub] carbondata pull request #2970: [CARBONDATA-3142]Add timestamp with thread na...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2970#discussion_r244005776 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -144,14 +144,14 @@ private void initParameters(CarbonFactDataHandlerModel model) { blockletProcessingCount = new AtomicInteger(0); producerExecutorService = Executors.newFixedThreadPool(model.getNumberOfCores(), new CarbonThreadFactory( -"ProducerPool_" + System.nanoTime() + ":" + model.getTableName() + ", range: " + model -.getBucketId())); +"ProducerPool:" + model.getTableName() + ", range: " + model +.getBucketId(), true)); producerExecutorServiceTaskList = new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE); LOGGER.debug("Initializing writer executors"); consumerExecutorService = Executors.newFixedThreadPool(1, new CarbonThreadFactory( -"ConsumerPool_" + System.nanoTime() + ":" + model.getTableName() + ", range: " + model -.getBucketId())); +"ConsumerPool:" + model.getTableName() + ", range: " + model --- End diff -- better to use String.format() ---
[GitHub] carbondata pull request #2970: [CARBONDATA-3142]Add timestamp with thread na...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2970#discussion_r244005707 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonThreadFactory.java --- @@ -34,14 +34,26 @@ */ private String name; + private boolean withTime = false; + public CarbonThreadFactory(String name) { this.defaultFactory = Executors.defaultThreadFactory(); this.name = name; } + public CarbonThreadFactory(String name, boolean withTime) { +this(name); +this.withTime = withTime; + } + @Override public Thread newThread(Runnable r) { final Thread thread = defaultFactory.newThread(r); -thread.setName(name); +if (withTime) { + thread.setName(name + "_" + System.currentTimeMillis()); +} +else { --- End diff -- move to previous line ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2161 @chandrasaripaka thanks for contributing this, I have only last 1 commend. It can be merged after that is resolved, if it is really an issue ---
[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2161#discussion_r244005006 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AlluxioCarbonFile.java --- @@ -90,21 +97,82 @@ public CarbonFile getParentFile() { return null == parent ? null : new AlluxioCarbonFile(parent); } + /** + * RenameForce of the fileName for the AlluxioFileSystem Implementation. + * Involves by opening a {@link FSDataInputStream} from the existing path and copy + * bytes to {@link FSDataOutputStream}. + * + * Close the output and input streams only after the files have been written + * Also check for the existence of the changed path and then delete the previous Path. + * The No of Bytes that can be read is controlled by {@literal io.file.buffer.size}, + * where the default value is 4096. + * @param changeToName + * @return + */ @Override public boolean renameForce(String changeToName) { -FileSystem fs; +FileSystem fs = null; +FSDataOutputStream fsdos = null; +FSDataInputStream fsdis = null; try { - fs = fileStatus.getPath().getFileSystem(FileFactory.getConfiguration()); - if (fs instanceof DistributedFileSystem) { -((DistributedFileSystem) fs).rename(fileStatus.getPath(), new Path(changeToName), -org.apache.hadoop.fs.Options.Rename.OVERWRITE); + Path actualPath = fileStatus.getPath(); + Path changedPath = new Path(changeToName); + fs = actualPath.getFileSystem(hadoopConf); + fsdos = fs.create(changedPath, true); + fsdis = fs.open(actualPath); + if (null != fsdis && null != fsdos) { +IOUtils.copyBytes(fsdis, fsdos, hadoopConf, true); return true; - } else { -return false; } + return false; } catch (IOException e) { LOGGER.error("Exception occured: " + e.getMessage()); return false; +} finally { + try { +if (null != fsdis && null != fsdos) { + if (fs.exists(new Path(changeToName))) { +fs.delete(fileStatus.getPath(), true); --- End diff -- Here the fileStatus is not updated, I suspect there will be potential issue ---
[GitHub] carbondata pull request #2161: [CARBONDATA-2218] AlluxioCarbonFile while try...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2161#discussion_r244004960 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AlluxioCarbonFile.java --- @@ -90,21 +97,82 @@ public CarbonFile getParentFile() { return null == parent ? null : new AlluxioCarbonFile(parent); } + /** + * RenameForce of the fileName for the AlluxioFileSystem Implementation. + * Involves by opening a {@link FSDataInputStream} from the existing path and copy + * bytes to {@link FSDataOutputStream}. + * + * Close the output and input streams only after the files have been written + * Also check for the existence of the changed path and then delete the previous Path. + * The No of Bytes that can be read is controlled by {@literal io.file.buffer.size}, + * where the default value is 4096. + * @param changeToName + * @return + */ @Override public boolean renameForce(String changeToName) { -FileSystem fs; +FileSystem fs = null; +FSDataOutputStream fsdos = null; +FSDataInputStream fsdis = null; try { - fs = fileStatus.getPath().getFileSystem(FileFactory.getConfiguration()); - if (fs instanceof DistributedFileSystem) { -((DistributedFileSystem) fs).rename(fileStatus.getPath(), new Path(changeToName), -org.apache.hadoop.fs.Options.Rename.OVERWRITE); + Path actualPath = fileStatus.getPath(); + Path changedPath = new Path(changeToName); + fs = actualPath.getFileSystem(hadoopConf); + fsdos = fs.create(changedPath, true); + fsdis = fs.open(actualPath); + if (null != fsdis && null != fsdos) { +IOUtils.copyBytes(fsdis, fsdos, hadoopConf, true); return true; --- End diff -- I think what you do in the finally block should move to here and fileStatus should be assigned to point to the newly created file. ---
[GitHub] carbondata pull request #3017: [HOTFIX] remove this useless assignment
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3017#discussion_r244003686 --- Diff: core/src/main/java/org/apache/carbondata/core/locks/LocalFileLock.java --- @@ -112,7 +112,7 @@ public LocalFileLock(String lockFileLocation, String lockFile) { status = true; } } catch (IOException e) { - status = false; + // status = false; --- End diff -- I suggest we print a warning message, and remove this assignment @ravipesala @QiangCai ---
[GitHub] carbondata issue #3004: [CARBONDATA-3188] Create carbon table as hive unders...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3004 @SteNicholas Let's first merge this PR and then you can start refactory on carbondata-hive module ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244002248 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) +if (SparkUtil.isSparkVersionEqualTo("2.2")) { + assert(table.storage.locationUri.get.equals(new Path(s"file:$storeLocation/source").toUri)) +} + assert(table.storage.inputFormat.get.equals(classOf[CarbonFileInputFormat[_]].getName)) --- End diff -- Shouldn't it be CarbonTableInputFormat class? ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244002150 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) +if (SparkUtil.isSparkVersionEqualTo("2.2")) { + assert(table.storage.locationUri.get.equals(new Path(s"file:$storeLocation/source").toUri)) +} + assert(table.storage.inputFormat.get.equals(classOf[CarbonFileInputFormat[_]].getName)) --- End diff -- please assert the serde class and outputformat class also ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001857 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) --- End diff -- same for line 61, 63 ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001900 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) +if (SparkUtil.isSparkVersionEqualTo("2.2")) { + assert(table.storage.locationUri.get.equals(new Path(s"file:$storeLocation/source").toUri)) +} + assert(table.storage.inputFormat.get.equals(classOf[CarbonFileInputFormat[_]].getName)) + } + + test("test create table and verify the hive table correctness with using carbondata") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |using carbondata + """.stripMargin) + +verifyTable + + +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with using carbon") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |using carbon + """.stripMargin) + +verifyTable + --- End diff -- remove extra line ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001878 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) +if (SparkUtil.isSparkVersionEqualTo("2.2")) { + assert(table.storage.locationUri.get.equals(new Path(s"file:$storeLocation/source").toUri)) +} + assert(table.storage.inputFormat.get.equals(classOf[CarbonFileInputFormat[_]].getName)) + } + + test("test create table and verify the hive table correctness with using carbondata") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |using carbondata + """.stripMargin) + +verifyTable + + --- End diff -- remove extra line ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001830 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' + """.stripMargin) + +verifyTable + +sql("DROP TABLE IF EXISTS source") + } + + private def verifyTable = { +val table = sqlContext.sparkSession.asInstanceOf[CarbonSession].sessionState.catalog.asInstanceOf[CarbonSessionCatalog].getClient().getTable("default", "source") +assert(table.schema.fields.length == 3) --- End diff -- use assertResult so that it can show the actual value when failed ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001671 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' --- End diff -- ```suggestion |STORED AS carbondata ``` ---
[GitHub] carbondata pull request #3004: [CARBONDATA-3188] Create carbon table as hive...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3004#discussion_r244001631 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateHiveTableWithCarbonDS.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.testsuite.createTable + +import java.io.File + +import org.apache.hadoop.fs.Path +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.sql.test.util.QueryTest +import org.apache.spark.sql.{AnalysisException, CarbonEnv, CarbonSession} +import org.apache.spark.util.SparkUtil +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.hadoop.api.CarbonFileInputFormat + +class TestCreateHiveTableWithCarbonDS extends QueryTest with BeforeAndAfterAll { + + override def beforeAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + override def afterAll(): Unit = { +sql("DROP TABLE IF EXISTS source") + } + + test("test create table and verify the hive table correctness with stored by") { +sql("DROP TABLE IF EXISTS source") +sql( + s""" + |CREATE TABLE source (key INT, value string, col1 double) + |STORED BY 'carbondata' --- End diff -- please use STORED AS carbondata ---
[GitHub] carbondata issue #3012: [CARBONDATA-3127]Fix the HiveExample & TestCarbonSer...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3012 Thanks for contributing this fix ---
[GitHub] carbondata issue #3012: [CARBONDATA-3127]Fix the HiveExample & TestCarbonSer...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3012 LGTM ---
[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2990 LGTM. Thanks for working on this ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2161 Since user normally uses Alluxio as a read cache, I think we can firstly verify carbon on alluxio for the query scenario. As I am still not very sure what is the correct way to implement rename for Alluxio, in the meantime, we can merge this PR first. So please rebase it. @chandrasaripaka ---
[GitHub] carbondata issue #2969: [CARBONDATA-3127]Fix the HiveExample & TestCarbonSer...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2969 LGTM, can you squash your commits and rebase to master. I could not squash it ---
[GitHub] carbondata issue #3010: Fix PreAggregate Datamap Issue
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/3010 Please create a JIRA issue and put it in the PR title ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243250514 --- Diff: integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.carbondata.restructure.vectorreader + +import org.apache.spark.sql.common.util.Spark2QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.metadata.CarbonMetadata +import org.apache.carbondata.spark.exception.ProcessMetaDataException + +class AlterTableColumnRenameTestCase extends Spark2QueryTest with BeforeAndAfterAll { + + --- End diff -- remove extra empty line ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243250452 --- Diff: integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala --- @@ -548,11 +548,16 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll with | USING 'bloomfilter' | DMProperties( 'INDEX_COLUMNS'='city,id', 'BLOOM_SIZE'='64') """.stripMargin) -val exception: MalformedCarbonCommandException = intercept[MalformedCarbonCommandException] { +val exception1: MalformedCarbonCommandException = intercept[MalformedCarbonCommandException] { --- End diff -- use a proper name ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243248566 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala --- @@ -269,12 +269,50 @@ object AlterTableUtil { } } } - metastore + metaStore .revertTableSchemaInAlterFailure(carbonTable.getCarbonTableIdentifier, thriftTable, carbonTable.getAbsoluteTableIdentifier)(sparkSession) } } + /** + * This method modifies the table properties if column rename happened + * + * @param tableProperties --- End diff -- add comment for all parameter ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243248408 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/StreamingTableStrategy.scala --- @@ -54,10 +54,10 @@ private[sql] class StreamingTableStrategy(sparkSession: SparkSession) extends Sp new TableIdentifier(model.tableName, model.databaseName), "Alter table drop column") Nil - case CarbonAlterTableDataTypeChangeCommand(model) => + case CarbonAlterTableColRenameDataTypeChangeCommand(model, _) => rejectIfStreamingTable( new TableIdentifier(model.tableName, model.databaseName), - "Alter table change datatype") + "Alter table change datatype or column rename") --- End diff -- I think you can know whether it is rename by `CarbonAlterTableColRenameDataTypeChangeCommand(model, rename)`, so you can use different message ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243247751 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.schema + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.execution.command.{AlterTableDataTypeChangeModel, DataTypeInfo, + MetadataCommand} +import org.apache.spark.sql.hive.CarbonSessionCatalog +import org.apache.spark.util.AlterTableUtil + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.features.TableOperation +import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl +import org.apache.carbondata.core.metadata.datatype.DecimalType +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn +import org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, + AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, OperationListenerBus} +import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, TableInfo} +import org.apache.carbondata.spark.util.DataTypeConverterUtil + +abstract class CarbonAlterTableColumnRenameCommand(oldColumnName: String, newColumnName: String) + extends MetadataCommand { + + protected def validColumnsForRenaming(carbonColumns: mutable.Buffer[CarbonColumn], + oldCarbonColumn: CarbonColumn, + carbonTable: CarbonTable): Unit = { +// check whether new column name is already an existing column name +if (carbonColumns.exists(_.getColName.equalsIgnoreCase(newColumnName))) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. New " + +s"column name $newColumnName already exists" + +s" in table ${ carbonTable.getTableName }") +} + +// if the column rename is for complex column, block the operation +if (oldCarbonColumn.isComplex) { + throw new MalformedCarbonCommandException(s"Column Rename Operation failed. Rename " + +s"column is unsupported for complex datatype " + +s"column ${ oldCarbonColumn.getColName }") +} + +// if column rename operation is on partition column, then fail the rename operation +if (null != carbonTable.getPartitionInfo) { + val partitionColumns = carbonTable.getPartitionInfo.getColumnSchemaList + partitionColumns.asScala.foreach { +col => + if (col.getColumnName.equalsIgnoreCase(oldColumnName)) { +throw new MalformedCarbonCommandException( + s"Column Rename Operation failed. Renaming " + + s"the partition column $newColumnName is not " + + s"allowed") + } + } +} + + } +} + +private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand( +alterTableColRenameAndDataTypeChangeModel: AlterTableDataTypeChangeModel, +childTableColumnRename: Boolean = false) + extends CarbonAlterTableColumnRenameCommand(alterTableColRenameAndDataTypeChangeModel.columnName, +alterTableColRenameAndDataTypeChangeModel.newColu
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243245093 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1511,7 +1514,16 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } DataTypeInfo("decimal", precision, scale) case _ => -throw new MalformedCarbonCommandException("Data type provided is invalid.") +if (isColumnRename) { + dataType match { --- End diff -- why here match `dataType` again which is already match in line 1496. Seems this func need to be refactored ---
[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2990#discussion_r243242285 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -1487,16 +1487,19 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { * @param values * @return */ - def parseDataType(dataType: String, values: Option[List[(Int, Int)]]): DataTypeInfo = { + def parseDataType( + dataType: String, + values: Option[List[(Int, Int)]], + isColumnRename: Boolean): DataTypeInfo = { --- End diff -- complete the comment start from line 1486 ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r243150967 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -22,52 +22,49 @@ import java.sql.{DriverManager, ResultSet, Statement} import org.apache.spark.sql.SparkSession --- End diff -- Please move this example to example folder, I suggest to put all example in one module, so instead of adding another example module for hive, I think it is better add it in existing example module and rename it to carbondata-example from carbondata-example-spark2 ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r243149546 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -85,25 +82,35 @@ object HiveExample { logger.info(s"HIVE CLI IS STARTED ON PORT $port ==") -statement.execute("CREATE TABLE IF NOT EXISTS " + "HIVE_CARBON_EXAMPLE " + - " (ID int, NAME string,SALARY double)") -statement - .execute( -"ALTER TABLE HIVE_CARBON_EXAMPLE SET FILEFORMAT INPUTFORMAT \"org.apache.carbondata." + -"hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." + -"MapredCarbonOutputFormat\"SERDE \"org.apache.carbondata.hive." + -"CarbonHiveSerDe\" ") +statement.execute( + s""" + | CREATE TABLE IF NOT EXISTS HIVE_CARBON_EXAMPLE + | (ID int, NAME string,SALARY double) + | ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' + | WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default', + | 'mapreduce.input.carboninputformat.tableName'='HIVE_CARBON_EXAMPLE') + """.stripMargin) + +statement.execute( + s""" + | ALTER TABLE HIVE_CARBON_EXAMPLE + | SET FILEFORMAT + | INPUTFORMAT \"org.apache.carbondata.hive.MapredCarbonInputFormat\" --- End diff -- This is fine for this PR, and after merging #3004 this ALTER TABLE will not be required ---
[GitHub] carbondata pull request #2969: [CARBONDATA-3127]Fix the TestCarbonSerde exce...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2969#discussion_r243149331 --- Diff: integration/hive/src/main/scala/org/apache/carbondata/hiveexample/HiveExample.scala --- @@ -49,17 +49,19 @@ object HiveExample { sparkSession.sql("""DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE""".stripMargin) -sparkSession - .sql( -"CREATE TABLE HIVE_CARBON_EXAMPLE (ID int,NAME string,SALARY double) " + - "STORED BY 'CARBONDATA' ") +sparkSession.sql( + s""" + | CREATE TABLE HIVE_CARBON_EXAMPLE + | (ID int,NAME string,SALARY double) + | STORED BY 'CARBONDATA' --- End diff -- ```suggestion | STORED AS CARBONDATA ``` ---
[GitHub] carbondata issue #2925: [CARBONDATA-3102] Fix NoClassDefFoundError when use ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2925 LGTM ---
[GitHub] carbondata pull request #2984: [CARBONDATA-3165]Protection of Bloom Null Exc...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2984#discussion_r241938979 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java --- @@ -227,6 +231,12 @@ private String getAncestorTablePath(CarbonTable currentTable) { } } } +if (hitBlocklets == null) { + LOGGER.warn(String.format("HitBlocklets is empty in bloom filter prune method. " + --- End diff -- can you make it non-null, such as making it an empty set ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2925#discussion_r241774500 --- Diff: integration/spark2/pom.xml --- @@ -134,6 +134,11 @@ + --- End diff -- I see `org.apache.httpcomponents` is already introduced in other modules, please unify the version for this jar in all places ---
[GitHub] carbondata issue #2973: [WIP][CARBONDATA-3144] CarbonData support spark-2.4....
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2973 Please close this if you are not working on it. Thanks ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2925#discussion_r241736440 --- Diff: integration/spark2/pom.xml --- @@ -134,6 +134,11 @@ + --- End diff -- I have never encounter this problem before, can you describe the scenario in more detail, in what case NoClassDefFoundError will be thrown? ---
[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2925#discussion_r241736214 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala --- @@ -48,8 +48,13 @@ object CarbonThriftServer { System.exit(0) } +val master = Option(System.getProperty("spark.master")) --- End diff -- I think better to let user give --master local, user can control this. ---
[GitHub] carbondata pull request #2890: [CARBONDATA-3002] Fix some spell error
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2890#discussion_r241735498 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -569,25 +569,25 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden nodes.add(relation.getNode()); } -int noofNodes = (-1 == noOfNodesInput) ? nodes.size() : noOfNodesInput; +int noOfNodes = (-1 == noOfNodesInput) ? nodes.size() : noOfNodesInput; --- End diff -- ```suggestion int numNodes = (-1 == noOfNodesInput) ? nodes.size() : numNodesInput; ``` ---
[GitHub] carbondata pull request #2914: [CARBONDATA-3093] Provide property builder fo...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2914#discussion_r241285417 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -40,1372 +41,1024 @@ private CarbonCommonConstants() { @CarbonProperty public static final String STORE_LOCATION = "carbon.storelocation"; - /** - * blocklet size in carbon file - */ - @CarbonProperty - public static final String BLOCKLET_SIZE = "carbon.blocklet.size"; - - /** - * default blocklet size - */ - public static final String BLOCKLET_SIZE_DEFAULT_VAL = "12"; - - /** - * min blocklet size - */ - public static final int BLOCKLET_SIZE_MIN_VAL = 2000; - - /** - * max blocklet size - */ - public static final int BLOCKLET_SIZE_MAX_VAL = 1200; - - /** - * min block size in MB - */ - public static final int BLOCK_SIZE_MIN_VAL = 1; - - /** - * max block size in MB - */ - public static final int BLOCK_SIZE_MAX_VAL = 2048; - - /** - * carbon properties file path - */ - @CarbonProperty - public static final String CARBON_PROPERTIES_FILE_PATH = "carbon.properties.filepath"; - - /** - * default carbon properties file path - */ - public static final String CARBON_PROPERTIES_FILE_PATH_DEFAULT = - "../../../conf/carbon.properties"; - - /** - * CARBON_DDL_BASE_HDFS_URL - */ - @CarbonProperty - public static final String CARBON_DDL_BASE_HDFS_URL = "carbon.ddl.base.hdfs.url"; - - /** - * CARBON_BADRECORDS_LOCATION - */ - @CarbonProperty - public static final String CARBON_BADRECORDS_LOC = "carbon.badRecords.location"; - - /** - * CARBON_BADRECORDS_LOCATION_DEFAULT - */ - public static final String CARBON_BADRECORDS_LOC_DEFAULT_VAL = ""; - - /** - * Property for specifying the format of TIMESTAMP data type column. - * e.g. /MM/dd HH:mm:ss, or using default value - */ - @CarbonProperty - public static final String CARBON_TIMESTAMP_FORMAT = "carbon.timestamp.format"; - - /** - * default value - */ - public static final String CARBON_TIMESTAMP_DEFAULT_FORMAT = "-MM-dd HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP = "dd-MM- HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM- HH:mm:ss:SSS"; - - /** - * Property for specifying the format of DATE data type column. - * e.g. /MM/dd , or using default value - */ - @CarbonProperty - public static final String CARBON_DATE_FORMAT = "carbon.date.format"; - - /** - * default value - */ - public static final String CARBON_DATE_DEFAULT_FORMAT = "-MM-dd"; - - /** - * compressor for writing/reading CarbonData file - */ - @CarbonProperty - public static final String COMPRESSOR = "carbon.column.compressor"; - - /** - * default compressor is snappy - */ - public static final String DEFAULT_COMPRESSOR = "snappy"; - - /** - * ZOOKEEPER_ENABLE_LOCK if this is set to true then zookeeper - * will be used to handle locking - * mechanism of carbon - */ - @CarbonProperty - public static final String LOCK_TYPE = "carbon.lock.type"; - - /** - * ZOOKEEPER_ENABLE_DEFAULT the default value for zookeeper will be true for carbon - */ - public static final String LOCK_TYPE_DEFAULT = "LOCALLOCK"; - - /** - * Specifies the path where the lock files have to be created. - * By default, lock files are created in table path. - */ - @CarbonProperty - public static final String LOCK_PATH = "carbon.lock.path"; - - public static final String LOCK_PATH_DEFAULT = ""; - - /** - * ZOOKEEPER_LOCATION this is the location in zookeeper file system where locks are created. - * mechanism of carbon - */ - public static final String ZOOKEEPER_LOCATION = "/CarbonLocks"; - - /** - * xxhash algorithm property for hashmap - */ - @CarbonProperty - public static final String ENABLE_XXHASH = "carbon.enableXXHash"; - - /** - * xxhash algorithm property for hashmap Default value false - */ - public static final String ENABLE_XXHASH_DEFAULT = "true&qu
[GitHub] carbondata pull request #2914: [CARBONDATA-3093] Provide property builder fo...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2914#discussion_r241285278 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -40,1372 +41,1024 @@ private CarbonCommonConstants() { @CarbonProperty public static final String STORE_LOCATION = "carbon.storelocation"; - /** - * blocklet size in carbon file - */ - @CarbonProperty - public static final String BLOCKLET_SIZE = "carbon.blocklet.size"; - - /** - * default blocklet size - */ - public static final String BLOCKLET_SIZE_DEFAULT_VAL = "12"; - - /** - * min blocklet size - */ - public static final int BLOCKLET_SIZE_MIN_VAL = 2000; - - /** - * max blocklet size - */ - public static final int BLOCKLET_SIZE_MAX_VAL = 1200; - - /** - * min block size in MB - */ - public static final int BLOCK_SIZE_MIN_VAL = 1; - - /** - * max block size in MB - */ - public static final int BLOCK_SIZE_MAX_VAL = 2048; - - /** - * carbon properties file path - */ - @CarbonProperty - public static final String CARBON_PROPERTIES_FILE_PATH = "carbon.properties.filepath"; - - /** - * default carbon properties file path - */ - public static final String CARBON_PROPERTIES_FILE_PATH_DEFAULT = - "../../../conf/carbon.properties"; - - /** - * CARBON_DDL_BASE_HDFS_URL - */ - @CarbonProperty - public static final String CARBON_DDL_BASE_HDFS_URL = "carbon.ddl.base.hdfs.url"; - - /** - * CARBON_BADRECORDS_LOCATION - */ - @CarbonProperty - public static final String CARBON_BADRECORDS_LOC = "carbon.badRecords.location"; - - /** - * CARBON_BADRECORDS_LOCATION_DEFAULT - */ - public static final String CARBON_BADRECORDS_LOC_DEFAULT_VAL = ""; - - /** - * Property for specifying the format of TIMESTAMP data type column. - * e.g. /MM/dd HH:mm:ss, or using default value - */ - @CarbonProperty - public static final String CARBON_TIMESTAMP_FORMAT = "carbon.timestamp.format"; - - /** - * default value - */ - public static final String CARBON_TIMESTAMP_DEFAULT_FORMAT = "-MM-dd HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP = "dd-MM- HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM- HH:mm:ss:SSS"; - - /** - * Property for specifying the format of DATE data type column. - * e.g. /MM/dd , or using default value - */ - @CarbonProperty - public static final String CARBON_DATE_FORMAT = "carbon.date.format"; - - /** - * default value - */ - public static final String CARBON_DATE_DEFAULT_FORMAT = "-MM-dd"; - - /** - * compressor for writing/reading CarbonData file - */ - @CarbonProperty - public static final String COMPRESSOR = "carbon.column.compressor"; - - /** - * default compressor is snappy - */ - public static final String DEFAULT_COMPRESSOR = "snappy"; - - /** - * ZOOKEEPER_ENABLE_LOCK if this is set to true then zookeeper - * will be used to handle locking - * mechanism of carbon - */ - @CarbonProperty - public static final String LOCK_TYPE = "carbon.lock.type"; - - /** - * ZOOKEEPER_ENABLE_DEFAULT the default value for zookeeper will be true for carbon - */ - public static final String LOCK_TYPE_DEFAULT = "LOCALLOCK"; - - /** - * Specifies the path where the lock files have to be created. - * By default, lock files are created in table path. - */ - @CarbonProperty - public static final String LOCK_PATH = "carbon.lock.path"; - - public static final String LOCK_PATH_DEFAULT = ""; - - /** - * ZOOKEEPER_LOCATION this is the location in zookeeper file system where locks are created. - * mechanism of carbon - */ - public static final String ZOOKEEPER_LOCATION = "/CarbonLocks"; - - /** - * xxhash algorithm property for hashmap - */ - @CarbonProperty - public static final String ENABLE_XXHASH = "carbon.enableXXHash"; - - /** - * xxhash algorithm property for hashmap Default value false - */ - public static final String ENABLE_XXHASH_DEFAULT = "true&qu
[GitHub] carbondata pull request #2914: [CARBONDATA-3093] Provide property builder fo...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2914#discussion_r241285056 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -40,1372 +41,1024 @@ private CarbonCommonConstants() { @CarbonProperty public static final String STORE_LOCATION = "carbon.storelocation"; - /** - * blocklet size in carbon file - */ - @CarbonProperty - public static final String BLOCKLET_SIZE = "carbon.blocklet.size"; - - /** - * default blocklet size - */ - public static final String BLOCKLET_SIZE_DEFAULT_VAL = "12"; - - /** - * min blocklet size - */ - public static final int BLOCKLET_SIZE_MIN_VAL = 2000; - - /** - * max blocklet size - */ - public static final int BLOCKLET_SIZE_MAX_VAL = 1200; - - /** - * min block size in MB - */ - public static final int BLOCK_SIZE_MIN_VAL = 1; - - /** - * max block size in MB - */ - public static final int BLOCK_SIZE_MAX_VAL = 2048; - - /** - * carbon properties file path - */ - @CarbonProperty - public static final String CARBON_PROPERTIES_FILE_PATH = "carbon.properties.filepath"; - - /** - * default carbon properties file path - */ - public static final String CARBON_PROPERTIES_FILE_PATH_DEFAULT = - "../../../conf/carbon.properties"; - - /** - * CARBON_DDL_BASE_HDFS_URL - */ - @CarbonProperty - public static final String CARBON_DDL_BASE_HDFS_URL = "carbon.ddl.base.hdfs.url"; - - /** - * CARBON_BADRECORDS_LOCATION - */ - @CarbonProperty - public static final String CARBON_BADRECORDS_LOC = "carbon.badRecords.location"; - - /** - * CARBON_BADRECORDS_LOCATION_DEFAULT - */ - public static final String CARBON_BADRECORDS_LOC_DEFAULT_VAL = ""; - - /** - * Property for specifying the format of TIMESTAMP data type column. - * e.g. /MM/dd HH:mm:ss, or using default value - */ - @CarbonProperty - public static final String CARBON_TIMESTAMP_FORMAT = "carbon.timestamp.format"; - - /** - * default value - */ - public static final String CARBON_TIMESTAMP_DEFAULT_FORMAT = "-MM-dd HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP = "dd-MM- HH:mm:ss"; - - /** - * CARBON_TIMESTAMP - */ - public static final String CARBON_TIMESTAMP_MILLIS = "dd-MM- HH:mm:ss:SSS"; - - /** - * Property for specifying the format of DATE data type column. - * e.g. /MM/dd , or using default value - */ - @CarbonProperty - public static final String CARBON_DATE_FORMAT = "carbon.date.format"; - - /** - * default value - */ - public static final String CARBON_DATE_DEFAULT_FORMAT = "-MM-dd"; - - /** - * compressor for writing/reading CarbonData file - */ - @CarbonProperty - public static final String COMPRESSOR = "carbon.column.compressor"; - - /** - * default compressor is snappy - */ - public static final String DEFAULT_COMPRESSOR = "snappy"; - - /** - * ZOOKEEPER_ENABLE_LOCK if this is set to true then zookeeper - * will be used to handle locking - * mechanism of carbon - */ - @CarbonProperty - public static final String LOCK_TYPE = "carbon.lock.type"; - - /** - * ZOOKEEPER_ENABLE_DEFAULT the default value for zookeeper will be true for carbon - */ - public static final String LOCK_TYPE_DEFAULT = "LOCALLOCK"; - - /** - * Specifies the path where the lock files have to be created. - * By default, lock files are created in table path. - */ - @CarbonProperty - public static final String LOCK_PATH = "carbon.lock.path"; - - public static final String LOCK_PATH_DEFAULT = ""; - - /** - * ZOOKEEPER_LOCATION this is the location in zookeeper file system where locks are created. - * mechanism of carbon - */ - public static final String ZOOKEEPER_LOCATION = "/CarbonLocks"; - - /** - * xxhash algorithm property for hashmap - */ - @CarbonProperty - public static final String ENABLE_XXHASH = "carbon.enableXXHash"; - - /** - * xxhash algorithm property for hashmap Default value false - */ - public static final String ENABLE_XXHASH_DEFAULT = "true&qu
[GitHub] carbondata issue #2978: [CARBONDATA-3157] Added lazy load and direct vector ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2978 @chenliang613 column vector code (CarbonColumnVector interface) and a base implementation (CarbonColumnVectorImpl class) is in carbon-core module, but still for every engine integration layer, they need to be adapt to the compute engine. ---
[GitHub] carbondata issue #2978: [CARBONDATA-3157] Added lazy load and direct vector ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2978 LGTM ---
[GitHub] carbondata issue #2978: [CARBONDATA-3157] Added lazy load and direct vector ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2978 LGTM ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240530952 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + + /** + * It is adapter class of complete ColumnarBatch. + */ + protected CarbonColumnVectorImpl columnVector; + + private DataType blockDataType; + + private CarbonColumnVector dictionaryVector; + + private BitSet nullBitset; + + ColumnarVectorWrapperDirect(CarbonColumnVectorImpl columnVector) { +this.columnVector = columnVector; +this.dictionaryVector = columnVector.getDictionaryVector(); +this.nullBitset = new BitSet(); + } + + @Override public void setNullBits(BitSet nullBits) { +this.nullBitset = nullBits; + } + + @Override public void putBoolean(int rowId, boolean value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putBoolean(rowId, value); +} + } + + @Override public void putFloat(int rowId, float value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putFloat(rowId, value); +} + } + + @Override public void putShort(int rowId, short value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putShort(rowId, value); +} + } + + @Override public void putShorts(int rowId, int count, short value) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putShort(rowId, value); + } + rowId++; +} + + } + + @Override public void putInt(int rowId, int value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putInt(rowId, value); +} + } + + @Override public void putInts(int rowId, int count, int value) { +columnVector.putInts(rowId, count, value); + } + + @Override public void putLong(int rowId, long value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putLong(rowId, value); +} + } + + @Override public void putLongs(int rowId, int count, long value) { +columnVector.putLongs(rowId, count, value); + } + + @Override public void putDecimal(int rowId, BigDecimal value, int precision) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putDecimal(rowId, value, precision); +} + } + + @Override public void putDecimals(int rowId, int count, BigDecimal value, int precision) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putDecimal(rowId, value, precision);
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240530928 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + + /** + * It is adapter class of complete ColumnarBatch. + */ + protected CarbonColumnVectorImpl columnVector; + + private DataType blockDataType; + + private CarbonColumnVector dictionaryVector; + + private BitSet nullBitset; + + ColumnarVectorWrapperDirect(CarbonColumnVectorImpl columnVector) { +this.columnVector = columnVector; +this.dictionaryVector = columnVector.getDictionaryVector(); +this.nullBitset = new BitSet(); + } + + @Override public void setNullBits(BitSet nullBits) { +this.nullBitset = nullBits; + } + + @Override public void putBoolean(int rowId, boolean value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putBoolean(rowId, value); +} + } + + @Override public void putFloat(int rowId, float value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putFloat(rowId, value); +} + } + + @Override public void putShort(int rowId, short value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putShort(rowId, value); +} + } + + @Override public void putShorts(int rowId, int count, short value) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putShort(rowId, value); + } + rowId++; +} + + } + + @Override public void putInt(int rowId, int value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putInt(rowId, value); +} + } + + @Override public void putInts(int rowId, int count, int value) { +columnVector.putInts(rowId, count, value); + } + + @Override public void putLong(int rowId, long value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putLong(rowId, value); +} + } + + @Override public void putLongs(int rowId, int count, long value) { +columnVector.putLongs(rowId, count, value); + } + + @Override public void putDecimal(int rowId, BigDecimal value, int precision) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putDecimal(rowId, value, precision); +} + } + + @Override public void putDecimals(int rowId, int count, BigDecimal value, int precision) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putDecimal(rowId, value, precision);
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240531041 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + + /** + * It is adapter class of complete ColumnarBatch. + */ + protected CarbonColumnVectorImpl columnVector; + + private DataType blockDataType; + + private CarbonColumnVector dictionaryVector; + + private BitSet nullBitset; + + ColumnarVectorWrapperDirect(CarbonColumnVectorImpl columnVector) { +this.columnVector = columnVector; +this.dictionaryVector = columnVector.getDictionaryVector(); +this.nullBitset = new BitSet(); + } + + @Override public void setNullBits(BitSet nullBits) { +this.nullBitset = nullBits; + } + + @Override public void putBoolean(int rowId, boolean value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putBoolean(rowId, value); +} + } + + @Override public void putFloat(int rowId, float value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putFloat(rowId, value); +} + } + + @Override public void putShort(int rowId, short value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putShort(rowId, value); +} + } + + @Override public void putShorts(int rowId, int count, short value) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putShort(rowId, value); + } + rowId++; +} + + } + + @Override public void putInt(int rowId, int value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putInt(rowId, value); +} + } + + @Override public void putInts(int rowId, int count, int value) { +columnVector.putInts(rowId, count, value); + } + + @Override public void putLong(int rowId, long value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putLong(rowId, value); +} + } + + @Override public void putLongs(int rowId, int count, long value) { +columnVector.putLongs(rowId, count, value); + } + + @Override public void putDecimal(int rowId, BigDecimal value, int precision) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putDecimal(rowId, value, precision); +} + } + + @Override public void putDecimals(int rowId, int count, BigDecimal value, int precision) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putDecimal(rowId, value, precision);
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240530272 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + + /** + * It is adapter class of complete ColumnarBatch. + */ + protected CarbonColumnVectorImpl columnVector; + + private DataType blockDataType; + + private CarbonColumnVector dictionaryVector; + + private BitSet nullBitset; + + ColumnarVectorWrapperDirect(CarbonColumnVectorImpl columnVector) { +this.columnVector = columnVector; +this.dictionaryVector = columnVector.getDictionaryVector(); +this.nullBitset = new BitSet(); + } + + @Override public void setNullBits(BitSet nullBits) { +this.nullBitset = nullBits; + } + + @Override public void putBoolean(int rowId, boolean value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putBoolean(rowId, value); +} + } + + @Override public void putFloat(int rowId, float value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putFloat(rowId, value); +} + } + + @Override public void putShort(int rowId, short value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putShort(rowId, value); +} + } + + @Override public void putShorts(int rowId, int count, short value) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putShort(rowId, value); + } + rowId++; +} + + } + + @Override public void putInt(int rowId, int value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putInt(rowId, value); +} + } + + @Override public void putInts(int rowId, int count, int value) { +columnVector.putInts(rowId, count, value); + } + + @Override public void putLong(int rowId, long value) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putLong(rowId, value); +} + } + + @Override public void putLongs(int rowId, int count, long value) { +columnVector.putLongs(rowId, count, value); + } + + @Override public void putDecimal(int rowId, BigDecimal value, int precision) { +if (nullBitset.get(rowId)) { + columnVector.putNull(rowId); +} else { + columnVector.putDecimal(rowId, value, precision); +} + } + + @Override public void putDecimals(int rowId, int count, BigDecimal value, int precision) { +for (int i = 0; i < count; i++) { + if (nullBitset.get(rowId)) { +columnVector.putNull(rowId); + } else { +columnVector.putDecimal(rowId, value, precision);
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240528243 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + --- End diff -- remove extra empty line ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240528010 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { + + + /** + * It is adapter class of complete ColumnarBatch. + */ + protected CarbonColumnVectorImpl columnVector; + + private DataType blockDataType; --- End diff -- it is column's datatype? ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240528169 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/ColumnarVectorWrapperDirect.java --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.presto; + +import java.math.BigDecimal; +import java.util.BitSet; + +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.scan.result.vector.impl.directread.SequentialFill; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; + +/** + * Fills the vector directly with out considering any deleted rows. + */ +class ColumnarVectorWrapperDirect implements CarbonColumnVector,SequentialFill { --- End diff -- add space after ',' ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240526032 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java --- @@ -367,7 +378,14 @@ public void setBlockDataType(DataType blockDataType) { } @Override public void setLazyPage(LazyPageLoader lazyPage) { -lazyPage.loadPage(); +this.lazyPage = lazyPage; + } + + public void loadPage() { +if (lazyPage != null) { + lazyPage.loadPage(); +} +loaded = true; --- End diff -- if lazyPath is null, still set to true? ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240524512 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -83,20 +84,37 @@ public StringVectorFiller(int numberOfRows, int actualDataLength) { @Override public void fillVector(byte[] data, CarbonColumnVector vector) { // start position will be used to store the current data position +boolean invertedIndex = vector instanceof ColumnarVectorWrapperDirectWithInvertedIndex +|| vector instanceof SequentialFill; + int localOffset = 0; ByteUtil.UnsafeComparer comparator = ByteUtil.UnsafeComparer.INSTANCE; -for (int i = 0; i < numberOfRows; i++) { - int length = (((data[localOffset] & 0xFF) << 8) | (data[localOffset + 1] & 0xFF)); - localOffset += 2; - if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, - CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY.length, data, localOffset, length)) { -vector.putNull(i); - } else { -vector.putArray(i, localOffset, length); +if (invertedIndex) { --- End diff -- can you add comment for this if-else branch ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240524219 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -83,20 +84,37 @@ public StringVectorFiller(int numberOfRows, int actualDataLength) { @Override public void fillVector(byte[] data, CarbonColumnVector vector) { // start position will be used to store the current data position +boolean invertedIndex = vector instanceof ColumnarVectorWrapperDirectWithInvertedIndex +|| vector instanceof SequentialFill; --- End diff -- It seems SequentialFill is not subclass of CarbonColumnVector? ---
[GitHub] carbondata pull request #2978: [CARBONDATA-3157] Added lazy load and direct ...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2978#discussion_r240519199 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/directread/SequentialFill.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.scan.result.vector.impl.directread; + +import java.util.BitSet; + +/** + * It is sort of a marker interface to let execution engine know that it is appendable/sequential + * data adding vector. It means we cannot add random rowids to it. + */ +public interface SequentialFill { --- End diff -- Please add @InterfaceAudience annotation ---
[GitHub] carbondata issue #2940: [CARBONDATA-3116] Support set carbon.query.directQue...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2940 LGTM ---
[GitHub] carbondata pull request #2941: [WIP] Support CarbonSession from pyspark shel...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2941#discussion_r236035510 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/__init__.py --- @@ -0,0 +1,6 @@ +from __future__ import absolute_import + + +__all__ = [ --- End diff -- I think it is better to add example and doc to describe how to use it in the same PR ---
[GitHub] carbondata pull request #2941: [WIP] Support CarbonSession from pyspark shel...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2941#discussion_r236035500 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.py --- @@ -0,0 +1,122 @@ +from __future__ import print_function +import sys +from threading import RLock + +if sys.version >= '3': +basestring = unicode = str +xrange = range +else: +from itertools import izip as zip, imap as map + +from pyspark.sql.conf import RuntimeConfig + +__all__ = ["CarbonSession"] + + +class CarbonSession(object): --- End diff -- please add comment to this file ---
[GitHub] carbondata pull request #2941: [WIP] Support CarbonSession from pyspark shel...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2941#discussion_r236035495 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.py --- @@ -0,0 +1,122 @@ +from __future__ import print_function --- End diff -- license header is required ---
[GitHub] carbondata pull request #2940: [CARBONDATA-3116] Support set carbon.query.di...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2940#discussion_r235494866 --- Diff: integration/spark2/pom.xml --- @@ -105,6 +105,11 @@ + + org.apache.httpcomponents --- End diff -- Is this required in this PR? ---
[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2936 please describe this PR ---
[GitHub] carbondata issue #2926: [HOTFIX] Reduce blocklet minimum configurable size
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2926 LGTM ---
[GitHub] carbondata issue #2932: [HOTFIX]Fix Describe Formatted Testcases
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2932 LGTM can be merged after CI passed ---
[GitHub] carbondata issue #2872: [WIP] Added reusable buffer code
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2872 LGTM please change the title to incluce jira ticket number ---
[GitHub] carbondata issue #2863: [WIP] Optimise decompressing while filling the vecto...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2863 LGTM ---
[GitHub] carbondata issue #2917: [WIP]Show load/insert/update/delete row number
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2917 This is a good feature, will it show the number of rows got deleted or updated? hope it can get into next version. ---
[GitHub] carbondata pull request #2926: [HOTFIX] Reduce blocklet minimum configurable...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2926#discussion_r234399060 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonV3DataFormatConstants.java --- @@ -35,9 +35,9 @@ String BLOCKLET_SIZE_IN_MB_DEFAULT_VALUE = "64"; /** - * blocklet group size min value + * blocklet size min value --- End diff -- since you are modifying this, can you please modify line 33, 27 also ---
[GitHub] carbondata issue #2926: [HOTFIX] Reduce blocklet minimum configurable size
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2926 `table_blocklet_size` is a table property that can be used in create table ---
[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2872#discussion_r234209828 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/ReusableDataBuffer.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore; + +public class ReusableDataBuffer { + + private byte[] dataBuffer; + + private int size; + + public byte[] getDataBuffer(int requestedSize) { --- End diff -- add comment ---
[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2872#discussion_r234209700 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/ReusableDataBuffer.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore; + +public class ReusableDataBuffer { --- End diff -- add interface annotation also ---
[GitHub] carbondata issue #2922: [WIP]s3 lock file fix
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2922 Is this for CARBONDATA-3103? ---
[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2920#discussion_r233470872 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -438,13 +438,13 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { Objects.requireNonNull(path, "path should not be null"); if (this.writerType == null) { throw new IOException( - "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput() " + "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput() " + "API based on input"); } if (this.writtenByApp == null || this.writtenByApp.isEmpty()) { throw new RuntimeException( - "AppName is not set, please use writtenBy() API to set the App Name" - + "which is using SDK"); + "'writtenBy' must be set when writting carbon files, use writtenBy() API to " + + "set it, it can be the application name which using the SDK"); --- End diff -- ```suggestion + "set it, it can be the name of the application which using the SDK"); ``` ---
[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2920#discussion_r233437690 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -438,13 +438,13 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { Objects.requireNonNull(path, "path should not be null"); if (this.writerType == null) { throw new IOException( - "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput() " + "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput() " + "API based on input"); } if (this.writtenByApp == null || this.writtenByApp.isEmpty()) { throw new RuntimeException( - "AppName is not set, please use writtenBy() API to set the App Name" - + "which is using SDK"); + "'writtenBy' must be set when writting carbon files, use writtenBy() API to " --- End diff -- ```suggestion "'writtenBy' must be set when writting carbon files, use writtenBy() API to " ``` ---
[GitHub] carbondata pull request #2920: [HOTFIX] Improve log message in CarbonWriterB...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/2920 [HOTFIX] Improve log message in CarbonWriterBuilder In master the log message is not proper: `AppName is not set, please use writtenBy() API to set the App Namewhich is using SDK` This PR improves log message in CarbonWriterBuilder - [X] Any interfaces changed? No - [X] Any backward compatibility impacted? No - [X] Document update required? No - [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. rerun all test - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata patch-17 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2920.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2920 commit b7fd6a6a9469feb99fdbc6b82907bc5c57381792 Author: Jacky Li Date: 2018-11-14T12:54:33Z Update CarbonWriterBuilder.java ---
[GitHub] carbondata issue #2909: [CARBONDATA-3089] Change task distribution for NO_SO...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2909 Since it is using CSV split in this PR, there are more tasks and each task is smaller, thus we can get better parallelism and require less resource to run in each task ---
[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2863#discussion_r233341050 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java --- @@ -53,6 +53,12 @@ private DataType blockDataType; + private int[] lengths; --- End diff -- please add comment for these three newly added variables ---
[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2863#discussion_r233340004 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -224,130 +238,134 @@ public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorIn } } -private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, -DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) { +private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vectorDataType, +DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo, BitSet nullBits) { + int rowId = 0; if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) { -byte[] byteData = columnPage.getBytePage(); if (vectorDataType == DataTypes.SHORT) { for (int i = 0; i < pageSize; i++) { -vector.putShort(i, (short) byteData[i]); +vector.putShort(i, (short) pageData[i]); } } else if (vectorDataType == DataTypes.INT) { for (int i = 0; i < pageSize; i++) { -vector.putInt(i, (int) byteData[i]); +vector.putInt(i, (int) pageData[i]); } } else if (vectorDataType == DataTypes.LONG) { for (int i = 0; i < pageSize; i++) { -vector.putLong(i, byteData[i]); +vector.putLong(i, pageData[i]); } } else if (vectorDataType == DataTypes.TIMESTAMP) { for (int i = 0; i < pageSize; i++) { -vector.putLong(i, (long) byteData[i] * 1000); +vector.putLong(i, (long) pageData[i] * 1000); } } else if (vectorDataType == DataTypes.BOOLEAN || vectorDataType == DataTypes.BYTE) { - vector.putBytes(0, pageSize, byteData, 0); + vector.putBytes(0, pageSize, pageData, 0); } else if (DataTypes.isDecimal(vectorDataType)) { DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; - decimalConverter.fillVector(byteData, pageSize, vectorInfo, columnPage.getNullBits()); + decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } else { for (int i = 0; i < pageSize; i++) { -vector.putDouble(i, byteData[i]); +vector.putDouble(i, pageData[i]); } } } else if (pageDataType == DataTypes.SHORT) { -short[] shortData = columnPage.getShortPage(); +int size = pageSize * DataTypes.SHORT.getSizeInBytes(); if (vectorDataType == DataTypes.SHORT) { - vector.putShorts(0, pageSize, shortData, 0); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putShort(rowId++, (ByteUtil.toShortLittleEndian(pageData, i))); + } } else if (vectorDataType == DataTypes.INT) { - for (int i = 0; i < pageSize; i++) { -vector.putInt(i, (int) shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putInt(rowId++, ByteUtil.toShortLittleEndian(pageData, i)); } } else if (vectorDataType == DataTypes.LONG) { - for (int i = 0; i < pageSize; i++) { -vector.putLong(i, shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putLong(rowId++, ByteUtil.toShortLittleEndian(pageData, i)); } } else if (vectorDataType == DataTypes.TIMESTAMP) { - for (int i = 0; i < pageSize; i++) { -vector.putLong(i, (long) shortData[i] * 1000); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putLong(rowId++, (long) ByteUtil.toShortLittleEndian(pageData, i) * 1000); } } else if (DataTypes.isDecimal(vectorDataType)) { DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; - decimalConverter.fillVector(shortData, pageSize, vectorInfo, columnPage.getNullBits()); + decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } else { - for (int i = 0; i < pageSize; i++) { -vector.putDouble(i, shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putDouble(rowId++,
[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2863#discussion_r29949 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -224,130 +238,134 @@ public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorIn } } -private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, -DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) { +private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vectorDataType, +DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo, BitSet nullBits) { + int rowId = 0; if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) { -byte[] byteData = columnPage.getBytePage(); if (vectorDataType == DataTypes.SHORT) { for (int i = 0; i < pageSize; i++) { -vector.putShort(i, (short) byteData[i]); +vector.putShort(i, (short) pageData[i]); } } else if (vectorDataType == DataTypes.INT) { for (int i = 0; i < pageSize; i++) { -vector.putInt(i, (int) byteData[i]); +vector.putInt(i, (int) pageData[i]); } } else if (vectorDataType == DataTypes.LONG) { for (int i = 0; i < pageSize; i++) { -vector.putLong(i, byteData[i]); +vector.putLong(i, pageData[i]); } } else if (vectorDataType == DataTypes.TIMESTAMP) { for (int i = 0; i < pageSize; i++) { -vector.putLong(i, (long) byteData[i] * 1000); +vector.putLong(i, (long) pageData[i] * 1000); } } else if (vectorDataType == DataTypes.BOOLEAN || vectorDataType == DataTypes.BYTE) { - vector.putBytes(0, pageSize, byteData, 0); + vector.putBytes(0, pageSize, pageData, 0); } else if (DataTypes.isDecimal(vectorDataType)) { DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; - decimalConverter.fillVector(byteData, pageSize, vectorInfo, columnPage.getNullBits()); + decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } else { for (int i = 0; i < pageSize; i++) { -vector.putDouble(i, byteData[i]); +vector.putDouble(i, pageData[i]); } } } else if (pageDataType == DataTypes.SHORT) { -short[] shortData = columnPage.getShortPage(); +int size = pageSize * DataTypes.SHORT.getSizeInBytes(); if (vectorDataType == DataTypes.SHORT) { - vector.putShorts(0, pageSize, shortData, 0); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putShort(rowId++, (ByteUtil.toShortLittleEndian(pageData, i))); + } } else if (vectorDataType == DataTypes.INT) { - for (int i = 0; i < pageSize; i++) { -vector.putInt(i, (int) shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putInt(rowId++, ByteUtil.toShortLittleEndian(pageData, i)); } } else if (vectorDataType == DataTypes.LONG) { - for (int i = 0; i < pageSize; i++) { -vector.putLong(i, shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putLong(rowId++, ByteUtil.toShortLittleEndian(pageData, i)); } } else if (vectorDataType == DataTypes.TIMESTAMP) { - for (int i = 0; i < pageSize; i++) { -vector.putLong(i, (long) shortData[i] * 1000); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putLong(rowId++, (long) ByteUtil.toShortLittleEndian(pageData, i) * 1000); } } else if (DataTypes.isDecimal(vectorDataType)) { DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; - decimalConverter.fillVector(shortData, pageSize, vectorInfo, columnPage.getNullBits()); + decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } else { - for (int i = 0; i < pageSize; i++) { -vector.putDouble(i, shortData[i]); + for (int i = 0; i < size; i += DataTypes.SHORT.getSizeInBytes()) { +vector.putDouble(rowId++,
[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2863#discussion_r28236 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -2006,6 +2006,12 @@ private CarbonCommonConstants() { */ public static final String CARBON_WRITTEN_BY_APPNAME = "carbon.writtenby.app.name"; + /** + * When more global dictionary columns are there then there is issue in generating codegen to them --- End diff -- Is it only valid for table with global dictionary, or for normal table also? ---
[GitHub] carbondata pull request #2863: [WIP] Optimise decompressing while filling th...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2863#discussion_r27530 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPageValueConverter.java --- @@ -37,5 +40,6 @@ double decodeDouble(long value); double decodeDouble(float value); double decodeDouble(double value); - void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo); + void decodeAndFillVector(byte[] pageData, ColumnVectorInfo vectorInfo, BitSet nullBits, + DataType pageDataType, int pageSize); --- End diff -- can you provide comment for this func ---