[jira] [Created] (CARBONDATA-3435) Show Metacache on table displays different output in Driver and Index-Server cache.
Naman Rastogi created CARBONDATA-3435: - Summary: Show Metacache on table displays different output in Driver and Index-Server cache. Key: CARBONDATA-3435 URL: https://issues.apache.org/jira/browse/CARBONDATA-3435 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi The behaviour of SHOW METACACHE is not the same for the Driver and the Index-Server; both need the same behaviour. New behaviour: do not show any entry with size 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3429) CarbonCli on wrong segment path wrong error message is displayed
Naman Rastogi created CARBONDATA-3429: - Summary: CarbonCli on wrong segment path wrong error message is displayed Key: CARBONDATA-3429 URL: https://issues.apache.org/jira/browse/CARBONDATA-3429 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi When the user executes the CLI command to view the sort columns present in a table segment but inputs an incorrect segment path, the correct error message should be displayed, stating that the "segment does not exist".
[jira] [Created] (CARBONDATA-3420) Concurrent Datamap Creation is not synchronised across different sessions
Naman Rastogi created CARBONDATA-3420: - Summary: Concurrent Datamap Creation is not synchronised across different sessions Key: CARBONDATA-3420 URL: https://issues.apache.org/jira/browse/CARBONDATA-3420 Project: CarbonData Issue Type: Bug Components: spark-integration Reporter: Naman Rastogi Create a (preaggregate) datamap from two or more concurrent sessions. Only one datamap creation succeeds; the others fail. Now run {{SHOW DATAMAP ON TABLE tableName}}. Some sessions do not display the newly created datamap.
[jira] [Created] (CARBONDATA-3372) Migrate CarbonData to support PrestoSQL
Naman Rastogi created CARBONDATA-3372: - Summary: Migrate CarbonData to support PrestoSQL Key: CARBONDATA-3372 URL: https://issues.apache.org/jira/browse/CARBONDATA-3372 Project: CarbonData Issue Type: New Feature Components: presto-integration Affects Versions: 1.6.0 Reporter: Naman Rastogi As we all know, the *Presto Software Foundation (PrestoSQL)* was formed recently, in January 2019, and is currently very active in taking up and implementing many open-source features, such as support for Hive 3.0 and Hadoop 3.2.0; support for Hive 3.1 is also in progress. The old PrestoDB is now used only by Facebook, so it is better for CarbonData to support PrestoSQL instead of PrestoDB. Why PrestoSQL? https://www.starburstdata.com/technical-blog/the-presto-software-foundation/
[jira] [Updated] (CARBONDATA-3331) Database index size is more than overall index size in SHOW METADATA command
[ https://issues.apache.org/jira/browse/CARBONDATA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3331: Summary: Database index size is more than overall index size in SHOW METADATA command (was: Database index size is more than overall index size) Key: CARBONDATA-3331 URL: https://issues.apache.org/jira/browse/CARBONDATA-3331 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Priority: Major Time Spent: 10m Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3331) Database index size is more than overall index size
Naman Rastogi created CARBONDATA-3331: - Summary: Database index size is more than overall index size Key: CARBONDATA-3331 URL: https://issues.apache.org/jira/browse/CARBONDATA-3331 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi
[jira] [Created] (CARBONDATA-3323) Output is null when cache is empty
Naman Rastogi created CARBONDATA-3323: - Summary: Output is null when cache is empty Key: CARBONDATA-3323 URL: https://issues.apache.org/jira/browse/CARBONDATA-3323 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi *Problem*: When "SHOW METACACHE ON TABLE" is executed and carbonLRUCache is null, the output is an empty sequence, which is not standard. *Fix*: Return standard output even when carbonLRUCache is not initialised (null), with the index and dictionary sizes reported as 0.
[jira] [Created] (CARBONDATA-3322) After renaming table, "SHOW METACACHE ON TABLE" still works for old table
Naman Rastogi created CARBONDATA-3322: - Summary: After renaming table, "SHOW METACACHE ON TABLE" still works for old table Key: CARBONDATA-3322 URL: https://issues.apache.org/jira/browse/CARBONDATA-3322 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi *Problem*: After we alter the table name from t1 to t2, "SHOW METACACHE ON TABLE" works for both the old table name "t1" and the new table name "t2". *Fix*: Added a check for the table.
[jira] [Created] (CARBONDATA-3318) Decoupling of Cache Commands
Naman Rastogi created CARBONDATA-3318: - Summary: Decoupling of Cache Commands Key: CARBONDATA-3318 URL: https://issues.apache.org/jira/browse/CARBONDATA-3318 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Decoupling of CarbonDropCacheCommand and CarbonShowCacheCommands for Bloom filters and Pre-Aggregate tables.
[jira] [Created] (CARBONDATA-3305) DDLs to Operate on CarbonLRUCache
Naman Rastogi created CARBONDATA-3305: - Summary: DDLs to Operate on CarbonLRUCache Key: CARBONDATA-3305 URL: https://issues.apache.org/jira/browse/CARBONDATA-3305 Project: CarbonData Issue Type: New Feature Reporter: Naman Rastogi New DDLs: # SHOW METACACHE # SHOW METACACHE FOR TABLE tableName # DROP METACACHE FOR TABLE tableName
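Conceptually, the three DDLs map to show/filter/evict operations over the LRU cache. A toy Python sketch of those semantics follows; the key layout and function names are illustrative only, not the CarbonLRUCache implementation:

```python
# Toy model of the cache: entries keyed by (table, entry kind) with a
# size in bytes. Both the layout and the helpers are hypothetical.
cache = {("db1.t1", "index"): 1024,
         ("db1.t1", "dictionary"): 256,
         ("db1.t2", "index"): 512}

def show_metacache():
    """SHOW METACACHE: report every cached entry."""
    return dict(cache)

def show_metacache_for_table(table):
    """SHOW METACACHE FOR TABLE t: report entries for one table."""
    return {k: v for k, v in cache.items() if k[0] == table}

def drop_metacache_for_table(table):
    """DROP METACACHE FOR TABLE t: evict that table's entries."""
    for key in [k for k in cache if k[0] == table]:
        del cache[key]

drop_metacache_for_table("db1.t1")
remaining = show_metacache()  # only db1.t2's entry is left
```

The point of the sketch is only that SHOW is a read over the cache while DROP is a targeted eviction; the real commands operate on driver (and, later, index-server) cache state.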
[jira] [Created] (CARBONDATA-3284) Workaround for Create-PreAgg Datamap Fail
Naman Rastogi created CARBONDATA-3284: - Summary: Workaround for Create-PreAgg Datamap Fail Key: CARBONDATA-3284 URL: https://issues.apache.org/jira/browse/CARBONDATA-3284 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi If, for some reason^*[1]*^, creating a PreAgg datamap failed and dropping it also failed, the datamap can no longer be dropped: it was never registered in the parent table schema file but did get registered in spark-hive, so it shows up as a table, yet Carbon throws an error if we try to drop it as a table. Workaround: after this change, we can at least drop it as a hive folder with the command {{drop table table_datamap;}} *[1]* - The reason could be something like an HDFS quota set on the database folder, so that the parent table schema file could not be modified.
[jira] [Created] (CARBONDATA-3274) On table with some SORT_COLUMNS and SORT_SCOPE not specified, SORT_SCOPE was not considering CARBON.OPTIONS.SORT.SCOPE for SORT_SCOPE
Naman Rastogi created CARBONDATA-3274: - Summary: On table with some SORT_COLUMNS and SORT_SCOPE not specified, SORT_SCOPE was not considering CARBON.OPTIONS.SORT.SCOPE for SORT_SCOPE Key: CARBONDATA-3274 URL: https://issues.apache.org/jira/browse/CARBONDATA-3274 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi
[jira] [Updated] (CARBONDATA-3273) For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT instead of NO_SORT
[ https://issues.apache.org/jira/browse/CARBONDATA-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3273: Summary: For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT instead of NO_SORT (was: For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT and not NO_SORT) Key: CARBONDATA-3273 URL: https://issues.apache.org/jira/browse/CARBONDATA-3273 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi Priority: Minor Time Spent: 1h Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3273) For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT and not NO_SORT
Naman Rastogi created CARBONDATA-3273: - Summary: For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT and not NO_SORT Key: CARBONDATA-3273 URL: https://issues.apache.org/jira/browse/CARBONDATA-3273 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi
[jira] [Created] (CARBONDATA-3264) Support SORT_SCOPE in ALTER TABLE SET Command
Naman Rastogi created CARBONDATA-3264: - Summary: Support SORT_SCOPE in ALTER TABLE SET Command Key: CARBONDATA-3264 URL: https://issues.apache.org/jira/browse/CARBONDATA-3264 Project: CarbonData Issue Type: New Feature Reporter: Naman Rastogi Assignee: Naman Rastogi
[jira] [Created] (CARBONDATA-3243) CarbonTable.getSortScope() is not considering session property CARBON.TABLE.LOAD.SORT.SCOPE
Naman Rastogi created CARBONDATA-3243: - Summary: CarbonTable.getSortScope() is not considering session property CARBON.TABLE.LOAD.SORT.SCOPE Key: CARBONDATA-3243 URL: https://issues.apache.org/jira/browse/CARBONDATA-3243 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi
[jira] [Updated] (CARBONDATA-3235) AlterTableRename and PreAgg Datamap Fail Issue
[ https://issues.apache.org/jira/browse/CARBONDATA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3235: Summary: AlterTableRename and PreAgg Datamap Fail Issue (was: HDFS Quota Issue) Key: CARBONDATA-3235 URL: https://issues.apache.org/jira/browse/CARBONDATA-3235 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi Priority: Minor h3. Alter Table Rename Table Fail * When a table rename succeeds in hive but fails in the carbon data store, an exception is thrown, but the rename in hive is not undone. h3. Create-Preaggregate-Datamap Fail * When the (preaggregate) datamap schema is written but the table update fails -> CarbonDropDataMapCommand.processMetadata() is called -> dropDataMapFromSystemFolder() is called -> this is supposed to delete the folder on disk, but doesn't, as the datamap is not yet updated in the table, and it throws NoSuchDataMapException
[jira] [Created] (CARBONDATA-3235) HDFS Quota Issue
Naman Rastogi created CARBONDATA-3235: - Summary: HDFS Quota Issue Key: CARBONDATA-3235 URL: https://issues.apache.org/jira/browse/CARBONDATA-3235 Project: CarbonData Issue Type: Bug Reporter: Naman Rastogi Assignee: Naman Rastogi h3. Alter Table Rename Table Fail * When a table rename succeeds in hive but fails in the carbon data store, an exception is thrown, but the rename in hive is not undone. h3. Create-Preaggregate-Datamap Fail * When the (preaggregate) datamap schema is written but the table update fails -> CarbonDropDataMapCommand.processMetadata() is called -> dropDataMapFromSystemFolder() is called -> this is supposed to delete the folder on disk, but doesn't, as the datamap is not yet updated in the table, and it throws NoSuchDataMapException
[jira] [Updated] (CARBONDATA-3201) SORT_SCOPE in LOAD_OPTIONS
[ https://issues.apache.org/jira/browse/CARBONDATA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3201: Description: Prerequisite: [CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200] If compaction always sorts the data, then we can take advantage of the faster loading speed: by providing SORT_COLUMNS in the CREATE TABLE command, we can load some data with SORT_SCOPE as NO_SORT, which gives faster loading, and during off-peak time the user can COMPACT the data, thus improving subsequent query performance. (was: the same description, plus the PR link [Github/pull/3014|https://github.com/apache/carbondata/pull/3014]) Key: CARBONDATA-3201 URL: https://issues.apache.org/jira/browse/CARBONDATA-3201 Project: CarbonData Issue Type: New Feature Reporter: Naman Rastogi Priority: Major
[jira] [Created] (CARBONDATA-3201) SORT_SCOPE in LOAD_OPTIONS
Naman Rastogi created CARBONDATA-3201: - Summary: SORT_SCOPE in LOAD_OPTIONS Key: CARBONDATA-3201 URL: https://issues.apache.org/jira/browse/CARBONDATA-3201 Project: CarbonData Issue Type: New Feature Reporter: Naman Rastogi Prerequisite: [CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200] If compaction always sorts the data, then we can take advantage of the faster loading speed. If we provide SORT_COLUMNS in the CREATE TABLE command, we can load some data with SORT_SCOPE as NO_SORT, which gives faster loading. During off-peak time, the user can then COMPACT the data, thus improving subsequent query performance. PR Link: [Github/pull/3014|https://github.com/apache/carbondata/pull/3014]
[jira] [Created] (CARBONDATA-3200) No-Sort Compaction
Naman Rastogi created CARBONDATA-3200: - Summary: No-Sort Compaction Key: CARBONDATA-3200 URL: https://issues.apache.org/jira/browse/CARBONDATA-3200 Project: CarbonData Issue Type: New Feature Components: core Reporter: Naman Rastogi Assignee: Naman Rastogi When data is loaded with SORT_SCOPE as NO_SORT and then compacted, the data still remains unsorted. This does not affect queries much, but the major purpose of compaction is to pack the data better and improve query performance. The expected behaviour of compaction is therefore to sort the data, so that query performance improves after compaction. The columns to sort on are provided by SORT_COLUMNS.
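The requested behaviour amounts to sorting the merged segments on SORT_COLUMNS during compaction. A minimal Python sketch of the idea, where rows and "segments" are hypothetical stand-in structures rather than CarbonData internals:

```python
# Illustrative sketch (not CarbonData code): a compaction step that
# sorts the merged segments on the configured SORT_COLUMNS. Segments
# are lists of dict rows; both structures are stand-ins.

def compact(segments, sort_columns):
    """Merge several (possibly unsorted) segments into one segment
    sorted on the given sort columns."""
    merged = [row for segment in segments for row in segment]
    merged.sort(key=lambda row: tuple(row[c] for c in sort_columns))
    return merged

# Two NO_SORT loads: each segment is unsorted on column "a".
seg1 = [{"a": 3, "b": "x"}, {"a": 1, "b": "y"}]
seg2 = [{"a": 2, "b": "z"}]
compacted = compact([seg1, seg2], sort_columns=["a"])
```

After such a compaction the merged segment is globally sorted on the sort columns, which is what lets subsequent queries prune and scan it more efficiently even though the individual loads skipped sorting.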
[jira] [Created] (CARBONDATA-3121) CarbonReader build time is huge
Naman Rastogi created CARBONDATA-3121: - Summary: CarbonReader build time is huge Key: CARBONDATA-3121 URL: https://issues.apache.org/jira/browse/CARBONDATA-3121 Project: CarbonData Issue Type: Improvement Components: core Reporter: Naman Rastogi Assignee: Naman Rastogi CarbonReader build fetches data and triggers I/O operations instead of only initializing the iterator, hence the large build time.
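The improvement described amounts to moving the I/O out of the build step and into iteration. A small Python sketch of that lazy pattern, with hypothetical class names rather than the CarbonData SDK API:

```python
# Illustrative sketch: deferring I/O from build time to iteration
# time. LazyReader only records what to read at construction and
# performs the expensive work when iterated. Names are hypothetical.

class LazyReader:
    def __init__(self, files):
        self.files = files      # cheap: just remember the inputs

    def __iter__(self):
        for f in self.files:    # expensive work happens here, per file
            yield from self._read(f)

    def _read(self, f):
        # Stand-in for actual file I/O.
        return [(f, i) for i in range(2)]

reader = LazyReader(["part-0", "part-1"])   # "build" is now trivial
rows = list(reader)                          # I/O happens on iteration
```

The design point is that constructing the reader should cost almost nothing; callers that never iterate (or iterate only partially) then never pay for reads they do not need.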
[jira] [Created] (CARBONDATA-3104) Extra Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry
Naman Rastogi created CARBONDATA-3104: - Summary: Extra Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry Key: CARBONDATA-3104 URL: https://issues.apache.org/jira/browse/CARBONDATA-3104 Project: CarbonData Issue Type: Improvement Components: core Affects Versions: 1.5.1 Reporter: Naman Rastogi Assignee: Naman Rastogi Fix For: 1.5.1
[jira] [Updated] (CARBONDATA-3056) Implement concurrent reading through CarbonReader
[ https://issues.apache.org/jira/browse/CARBONDATA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3056: Summary: Implement concurrent reading through CarbonReader (was: Implement Concurrent SDK Reader) Key: CARBONDATA-3056 URL: https://issues.apache.org/jira/browse/CARBONDATA-3056 Project: CarbonData Issue Type: Sub-task Reporter: Naman Rastogi Priority: Minor The current reading through the SDK is slow, as in CarbonReader we read the carbondata files sequentially even though we have an individual CarbonRecordReader for each file. We can parallelize this by adding an API in the CarbonReader class, *List<CarbonReader> readers = CarbonReader.split(numSplits)*, which returns a list of CarbonReaders that can be used to read in parallel, since reading each file is independent of the other files. This enables the SDK user to read the files as-is, or in a multithreaded environment.
[jira] [Updated] (CARBONDATA-3055) Improve SDK Reader Performance
[ https://issues.apache.org/jira/browse/CARBONDATA-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-3055: Summary: Improve SDK Reader Performance (was: Improve CarbonReader performance) Key: CARBONDATA-3055 URL: https://issues.apache.org/jira/browse/CARBONDATA-3055 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Priority: Minor
[jira] [Created] (CARBONDATA-3057) Implement Vectorized CarbonReader for SDK
Naman Rastogi created CARBONDATA-3057: - Summary: Implement Vectorized CarbonReader for SDK Key: CARBONDATA-3057 URL: https://issues.apache.org/jira/browse/CARBONDATA-3057 Project: CarbonData Issue Type: Sub-task Reporter: Naman Rastogi Implement a Vectorized Reader and expose an API for the user to switch between the CarbonReader and the vectorized reader. Additionally, an API would be provided for the user to extract the columnar batch instead of rows, allowing a deeper integration with carbon. The reduction in method calls for the vector reader would also improve the read time.
[jira] [Created] (CARBONDATA-3056) Implement Concurrent SDK Reader
Naman Rastogi created CARBONDATA-3056: - Summary: Implement Concurrent SDK Reader Key: CARBONDATA-3056 URL: https://issues.apache.org/jira/browse/CARBONDATA-3056 Project: CarbonData Issue Type: Sub-task Reporter: Naman Rastogi The current reading through the SDK is slow, as in CarbonReader we read the carbondata files sequentially even though we have an individual CarbonRecordReader for each file. We can parallelize this by adding an API in the CarbonReader class, *List<CarbonReader> readers = CarbonReader.split(numSplits)*, which returns a list of CarbonReaders that can be used to read in parallel, since reading each file is independent of the other files. This enables the SDK user to read the files as-is, or in a multithreaded environment.
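The proposed split API would let callers fan the sub-readers out over a thread pool, since each sub-reader owns an independent set of files. A Python stand-in for the pattern (class and method names are illustrative, not the actual SDK):

```python
# Illustrative sketch of the proposed split-and-read pattern (a Python
# stand-in, not the CarbonData SDK): split() hands each worker an
# independent sub-reader, and the sub-readers are consumed in parallel.
from concurrent.futures import ThreadPoolExecutor

class Reader:
    def __init__(self, files):
        self.files = files

    def split(self, num_splits):
        # Round-robin the files into independent sub-readers.
        return [Reader(self.files[i::num_splits]) for i in range(num_splits)]

    def read_all(self):
        # Stand-in for sequentially reading this reader's files.
        return [f"row-from-{f}" for f in self.files]

reader = Reader(["f0", "f1", "f2", "f3"])
with ThreadPoolExecutor(max_workers=2) as pool:
    chunks = list(pool.map(Reader.read_all, reader.split(2)))
rows = [row for chunk in chunks for row in chunk]
```

Because the sub-readers share no mutable state, no locking is needed; the caller simply merges the per-split results.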
[jira] [Created] (CARBONDATA-3055) Improve CarbonReader performance
Naman Rastogi created CARBONDATA-3055: - Summary: Improve CarbonReader performance Key: CARBONDATA-3055 URL: https://issues.apache.org/jira/browse/CARBONDATA-3055 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi
[jira] [Updated] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties
[ https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-2959: Description: Validation required when LONG_STRING_COLUMNS are included in some TBL properties like DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, NO_INVERTED_INDEX, and in scenarios where duplicate columns, invalid columns, or a partition column are provided in long_string_columns while creating a table. (was: Validation required when long_string_columns are included in some TBL properties like dictionary_include, dictionary_exclude, no_inverted_index, and scenarios when duplicate columns, invalid columns, and partition column are provided in long_string_columns while creating a table. [Precondition]: NA [Test steps]: CREATE TABLE lsc1(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,note'); CREATE TABLE lsc2(id int, name string, description string, address string, note string) using carbon options('long_string_columns'=''); CREATE TABLE lsc3(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='abc'); CREATE TABLE lsc4(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='id'); CREATE TABLE lsc5(id int, name string, description string, address string, note string) using carbon options('dictionary_include'='note','long_string_columns'='note,description'); CREATE TABLE lsc6(id int, name string, description string, address string, note string) using carbon options('dictionary_exclude'='note','long_string_columns'='note,description'); CREATE TABLE lsc8(id int, name string, description string, address string, note string) using carbon options('no_inverted_index'='note','long_string_columns'='note,description'); CREATE TABLE lcs9(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,description') partitioned by (note);) Key: CARBONDATA-2959 URL: https://issues.apache.org/jira/browse/CARBONDATA-2959 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Assignee: Naman Rastogi Priority: Minor
[jira] [Updated] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties
[ https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi updated CARBONDATA-2959: Description (wording unchanged): Validation required when LONG_STRING_COLUMNS are included in some TBL properties like DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, NO_INVERTED_INDEX, and in scenarios when duplicate columns, invalid columns, or a partition column are provided in long_string_columns while creating a table. Key: CARBONDATA-2959 URL: https://issues.apache.org/jira/browse/CARBONDATA-2959 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Assignee: Naman Rastogi Priority: Minor
[jira] [Assigned] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties
[ https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi reassigned CARBONDATA-2959: Assignee: Naman Rastogi Key: CARBONDATA-2959 URL: https://issues.apache.org/jira/browse/CARBONDATA-2959 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Priority: Minor Description: Validation required when long_string_columns are included in some TBL properties like dictionary_include, dictionary_exclude, no_inverted_index, and in scenarios when duplicate columns, invalid columns, or a partition column are provided in long_string_columns while creating a table. [Precondition]: NA [Test steps]: CREATE TABLE lsc1(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,note'); CREATE TABLE lsc2(id int, name string, description string, address string, note string) using carbon options('long_string_columns'=''); CREATE TABLE lsc3(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='abc'); CREATE TABLE lsc4(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='id'); CREATE TABLE lsc5(id int, name string, description string, address string, note string) using carbon options('dictionary_include'='note','long_string_columns'='note,description'); CREATE TABLE lsc6(id int, name string, description string, address string, note string) using carbon options('dictionary_exclude'='note','long_string_columns'='note,description'); CREATE TABLE lsc8(id int, name string, description string, address string, note string) using carbon options('no_inverted_index'='note','long_string_columns'='note,description'); CREATE TABLE lcs9(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,description') partitioned by (note);
[jira] [Created] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties
Naman Rastogi created CARBONDATA-2959: - Summary: Validation required when Long_string_columns are included for some TBL properties Key: CARBONDATA-2959 URL: https://issues.apache.org/jira/browse/CARBONDATA-2959 Project: CarbonData Issue Type: Improvement Reporter: Naman Rastogi Validation required when long_string_columns are included in some TBL properties like dictionary_include, dictionary_exclude, no_inverted_index, and in scenarios when duplicate columns, invalid columns, or a partition column are provided in long_string_columns while creating a table. [Precondition]: NA [Test steps]: CREATE TABLE lsc1(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,note'); CREATE TABLE lsc2(id int, name string, description string, address string, note string) using carbon options('long_string_columns'=''); CREATE TABLE lsc3(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='abc'); CREATE TABLE lsc4(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='id'); CREATE TABLE lsc5(id int, name string, description string, address string, note string) using carbon options('dictionary_include'='note','long_string_columns'='note,description'); CREATE TABLE lsc6(id int, name string, description string, address string, note string) using carbon options('dictionary_exclude'='note','long_string_columns'='note,description'); CREATE TABLE lsc8(id int, name string, description string, address string, note string) using carbon options('no_inverted_index'='note','long_string_columns'='note,description'); CREATE TABLE lcs9(id int, name string, description string, address string, note string) using carbon options('long_string_columns'='note,description') partitioned by (note);
[jira] [Commented] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit
[ https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617521#comment-16617521 ] Naman Rastogi commented on CARBONDATA-2877: Data loading from large files requires a large "Unsafe Working Memory", a lot more than the default 512 MB, so changing it to something like 10 GB should fix this problem. Please make this change, and it should work just fine. Key: CARBONDATA-2877 URL: https://issues.apache.org/jira/browse/CARBONDATA-2877 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.4.1 Environment: Spark 2.1 Reporter: Chetan Bhat Assignee: Naman Rastogi Priority: Major Steps: From Spark-Submit, the user creates a table with a large number of columns (around 100) and tries to load around 3 lakh records into the table. Spark-submit command: spark-submit --master yarn --num-executors 3 --executor-memory 75g --driver-memory 10g --executor-cores 12 --class Actual Issue: Data loading fails with CarbonDataWriterException. Executor yarn UI log: org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException Previous exception in task: Error while initializing data handler : org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141) org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51) org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221) org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197) org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) org.apache.spark.rdd.RDD.iterator(RDD.scala:288) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) org.apache.spark.scheduler.Task.run(Task.scala:99) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Expected: The data loading should be successful from Spark-submit, similar to that in Beeline.
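The comment above refers to CarbonData's unsafe working memory setting. Assuming the standard property name from the CarbonData configuration reference applies here, the suggested change would be a carbon.properties entry along these lines (10240 MB is roughly the 10 GB suggested in the comment):

```properties
# Hypothetical sketch of the suggested tuning; verify the property name
# against the CarbonData configuration docs for your version.
carbon.unsafe.working.memory.in.mb=10240
```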
[jira] [Assigned] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit
[ https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naman Rastogi reassigned CARBONDATA-2877: Assignee: Naman Rastogi (was: Brijoo Bopanna) Key: CARBONDATA-2877 URL: https://issues.apache.org/jira/browse/CARBONDATA-2877 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.4.1 Environment: Spark 2.1 Reporter: Chetan Bhat Assignee: Naman Rastogi Priority: Major Steps: From Spark-Submit, the user creates a table with a large number of columns (around 100) and tries to load around 3 lakh records into the table. Spark-submit command: spark-submit --master yarn --num-executors 3 --executor-memory 75g --driver-memory 10g --executor-cores 12 --class Actual Issue: Data loading fails with CarbonDataWriterException. Executor yarn UI log: org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException Previous exception in task: Error while initializing data handler : org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141) org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51) org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221) org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197) org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) org.apache.spark.rdd.RDD.iterator(RDD.scala:288) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) org.apache.spark.scheduler.Task.run(Task.scala:99) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Expected: The data loading should be successful from Spark-submit, similar to that in Beeline.