[jira] [Created] (CARBONDATA-3435) Show Metacache on table displays different output in Driver and Index-Server cache.

2019-06-13 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3435:
-

 Summary: Show Metacache on table displays different output in 
Driver and Index-Server cache.
 Key: CARBONDATA-3435
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3435
 Project: CarbonData
  Issue Type: Improvement
Reporter: Naman Rastogi


The behaviour of SHOW METACACHE is not the same for the Driver and the
Index Server. Both need to behave identically.

 

New behaviour: Don't show any entry with size 0.
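The new behaviour can be sketched as a simple filter over the cache listing (illustrative Python, not the actual Scala command; the `(name, size)` entry shape is an assumption):

```python
def filter_cache_report(entries):
    """Drop zero-size entries so the Driver and the Index Server
    report only caches that actually hold data."""
    return [(name, size) for name, size in entries if size > 0]

# Hypothetical listings: the driver reports a 0-byte placeholder
# that the index server omits; after filtering, both agree.
driver_view = [("db1.t1:index", 4096), ("db1.t1:dictionary", 0)]
server_view = [("db1.t1:index", 4096)]

assert filter_cache_report(driver_view) == filter_cache_report(server_view)
```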



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-3429) CarbonCli on wrong segment path wrong error message is displayed

2019-06-12 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3429:
-

 Summary: CarbonCli on wrong segment path wrong error message is 
displayed
 Key: CARBONDATA-3429
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3429
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi


The user executes the CLI command to view the sort columns present in a table 
segment but inputs an incorrect segment path.

A correct error message should be displayed, stating that the "segment does 
not exist".





[jira] [Created] (CARBONDATA-3420) Concurrent Datamap Creation is not synchronised across different sessions

2019-06-07 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3420:
-

 Summary: Concurrent Datamap Creation is not synchronised across 
different sessions
 Key: CARBONDATA-3420
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3420
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Reporter: Naman Rastogi


Create a (preaggregate) datamap from two or more concurrent sessions. Only one 
datamap creation succeeds; the others fail. Now run {{SHOW DATAMAP ON 
TABLE tableName}}: some sessions do not display the newly created datamap.
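The issue can be illustrated with a toy model (plain Python, not CarbonData code): each session keeps a private snapshot of the table schema, so a datamap registered through one session stays invisible to another until that session re-reads the shared store. All names below are hypothetical:

```python
import copy

# Stand-in for the on-disk schema shared by all sessions.
shared_store = {"t1": {"datamaps": []}}

class Session:
    def __init__(self):
        # each session snapshots the schema when it opens
        self.cache = copy.deepcopy(shared_store)

    def create_datamap(self, table, name):
        shared_store[table]["datamaps"].append(name)
        self.cache[table] = copy.deepcopy(shared_store[table])

    def show_datamaps(self, table, refresh=False):
        if refresh:  # the fix: re-read the shared schema before SHOW
            self.cache[table] = copy.deepcopy(shared_store[table])
        return self.cache[table]["datamaps"]

s1, s2 = Session(), Session()
s1.create_datamap("t1", "agg1")
assert s1.show_datamaps("t1") == ["agg1"]          # creator sees it
assert s2.show_datamaps("t1") == []                # stale session misses it
assert s2.show_datamaps("t1", refresh=True) == ["agg1"]
```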





[jira] [Created] (CARBONDATA-3372) Migrate CarbonData to support PrestoSQL

2019-05-06 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3372:
-

 Summary: Migrate CarbonData to support PrestoSQL
 Key: CARBONDATA-3372
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3372
 Project: CarbonData
  Issue Type: New Feature
  Components: presto-integration
Affects Versions: 1.6.0
Reporter: Naman Rastogi


As we all know, the *Presto Software Foundation (PrestoSQL)* was formed 
recently, in January 2019, and is currently very active in taking up and 
implementing many open-source features, such as support for Hive 3.0 and 
Hadoop 3.2.0. Support for Hive 3.1 is also in progress.
Now, the old PrestoDB is used only by Facebook.


So, it is better for CarbonData to support PrestoSQL instead of PrestoDB.
Why PrestoSQL?
https://www.starburstdata.com/technical-blog/the-presto-software-foundation/





[jira] [Updated] (CARBONDATA-3331) Database index size is more than overall index size in SHOW METADATA command

2019-03-26 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3331:
--
Summary: Database index size is more than overall index size in SHOW 
METADATA command  (was: Database index size is more than overall index size)

> Database index size is more than overall index size in SHOW METADATA command
> 
>
> Key: CARBONDATA-3331
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3331
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Naman Rastogi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3331) Database index size is more than overall index size

2019-03-26 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3331:
-

 Summary: Database index size is more than overall index size
 Key: CARBONDATA-3331
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3331
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi








[jira] [Created] (CARBONDATA-3323) Output is null when cache is empty

2019-03-22 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3323:
-

 Summary: Output is null when cache is empty
 Key: CARBONDATA-3323
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3323
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi


*Problem*:

When "SHOW METACACHE ON TABLE" is executed and carbonLRUCache is null, the 
output is an empty sequence, which is not standard.

 

*Fix*:

Return standard output even when carbonLRUCache is not initialised (null), 
with the sizes for index and dictionary reported as 0.
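The fix can be sketched as follows (illustrative Python, not the actual Scala command; the two-row index/dictionary output shape is an assumption based on the description):

```python
def metacache_rows(lru_cache):
    """Return a standard (cache_type, size) listing even when the
    LRU cache was never initialised (None)."""
    if lru_cache is None:
        # standard shape with zero sizes instead of an empty result
        return [("index", 0), ("dictionary", 0)]
    return [(kind, lru_cache.get(kind, 0)) for kind in ("index", "dictionary")]

assert metacache_rows(None) == [("index", 0), ("dictionary", 0)]
assert metacache_rows({"index": 2048}) == [("index", 2048), ("dictionary", 0)]
```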

 





[jira] [Created] (CARBONDATA-3322) After renaming table, "SHOW METACACHE ON TABLE" still works for old table

2019-03-22 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3322:
-

 Summary: After renaming table, "SHOW METACACHE ON TABLE" still 
works for old table
 Key: CARBONDATA-3322
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3322
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi


*Problem*:

After we alter the table name from t1 to t2, "SHOW METACACHE ON TABLE" works 
for both the old table name "t1" and the new table name "t2".

 

*Fix*:

Added a check for the table name.
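The added check can be sketched like this (hypothetical names; the real command is implemented in Scala):

```python
def show_metacache(table_name, catalog, cache):
    """Refuse to report cache for a name that no longer resolves to an
    existing table (e.g. the old name after ALTER TABLE ... RENAME)."""
    if table_name not in catalog:
        raise ValueError(f"Table {table_name} does not exist")
    return cache.get(table_name, [])

catalog = {"t2"}                      # t1 was renamed to t2
cache = {"t1": [("index", 4096)]}     # stale entry under the old name

try:
    show_metacache("t1", catalog, cache)
    raised = False
except ValueError:
    raised = True
assert raised                         # old name is now rejected
```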





[jira] [Created] (CARBONDATA-3318) Decoupling of Cache Commands

2019-03-20 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3318:
-

 Summary: Decoupling of Cache Commands
 Key: CARBONDATA-3318
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3318
 Project: CarbonData
  Issue Type: Improvement
Reporter: Naman Rastogi


Decoupling of CarbonDropCacheCommand and CarbonShowCacheCommands for Bloom 
filters and Pre-Aggregate tables.





[jira] [Created] (CARBONDATA-3305) DDLs to Operate on CarbonLRUCache

2019-02-27 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3305:
-

 Summary: DDLs to Operate on CarbonLRUCache
 Key: CARBONDATA-3305
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3305
 Project: CarbonData
  Issue Type: New Feature
Reporter: Naman Rastogi


New DDLs
 # SHOW METACACHE
 # SHOW METACACHE FOR TABLE tableName
 # DROP METACACHE FOR TABLE tableName





[jira] [Created] (CARBONDATA-3284) Workaround for Create-PreAgg Datamap Fail

2019-01-30 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3284:
-

 Summary: Workaround for Create-PreAgg Datamap Fail
 Key: CARBONDATA-3284
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3284
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi
Assignee: Naman Rastogi


If, for some reason^*[1]*^, creating a PreAgg datamap fails and dropping it 
also fails, then the datamap cannot be dropped at all: it was not registered 
in the parent table's schema file, but it did get registered in spark-hive, so 
Hive shows it as a table, yet Carbon throws an error if we try to drop it as a 
table.

Workaround:
After this change, we can at least drop it as a Hive folder with the command

{{drop table table_datamap;}}

*[1]* - The reason could be something like an HDFS quota set on the database 
folder, so that the parent table's schema file could not be modified.





[jira] [Created] (CARBONDATA-3274) On table with some SORT_COLUMNS and SORT_SCOPE not specified, SORT_SCOPE was not considering CARBON.OPTIONS.SORT.SCOPE for SORT_SCOPE

2019-01-25 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3274:
-

 Summary: On table with some SORT_COLUMNS and SORT_SCOPE not 
specified, SORT_SCOPE was not considering CARBON.OPTIONS.SORT.SCOPE for 
SORT_SCOPE
 Key: CARBONDATA-3274
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3274
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi
Assignee: Naman Rastogi








[jira] [Updated] (CARBONDATA-3273) For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT instead of NO_SORT

2019-01-25 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3273:
--
Summary: For table without SORT_COLUMNS, Loading data is showing 
SORT_SCOPE=LOCAL_SORT instead of NO_SORT  (was: For table without SORT_COLUMNS, 
Loading data is showing SORT_SCOPE=LOCAL_SORT and not NO_SORT)

> For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT 
> instead of NO_SORT
> 
>
> Key: CARBONDATA-3273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3273
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3273) For table without SORT_COLUMNS, Loading data is showing SORT_SCOPE=LOCAL_SORT and not NO_SORT

2019-01-25 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3273:
-

 Summary: For table without SORT_COLUMNS, Loading data is showing 
SORT_SCOPE=LOCAL_SORT and not NO_SORT
 Key: CARBONDATA-3273
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3273
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi
Assignee: Naman Rastogi








[jira] [Created] (CARBONDATA-3264) Support SORT_SCOPE in ALTER TABLE SET Command

2019-01-21 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3264:
-

 Summary: Support SORT_SCOPE in ALTER TABLE SET Command
 Key: CARBONDATA-3264
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3264
 Project: CarbonData
  Issue Type: New Feature
Reporter: Naman Rastogi
Assignee: Naman Rastogi








[jira] [Created] (CARBONDATA-3243) CarbonTable.getSortScope() is not considering session property CARBON.TABLE.LOAD.SORT.SCOPE

2019-01-10 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3243:
-

 Summary: CarbonTable.getSortScope() is not considering session 
property CARBON.TABLE.LOAD.SORT.SCOPE
 Key: CARBONDATA-3243
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3243
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi
Assignee: Naman Rastogi








[jira] [Updated] (CARBONDATA-3235) AlterTableRename and PreAgg Datamap Fail Issue

2019-01-08 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3235:
--
Summary: AlterTableRename and PreAgg Datamap Fail Issue  (was: HDFS Quota 
Issue)

> AlterTableRename and PreAgg Datamap Fail Issue
> --
>
> Key: CARBONDATA-3235
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3235
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>
> h3. Alter Table Rename Table Fail
>  * When a table rename succeeds in Hive but fails in the Carbon data store, 
> an exception is thrown, but the rename in Hive is not rolled back.
> h3. Create-Preaggregate-Datamap Fail
>  * When the (preaggregate) datamap schema is written but updating the table 
> fails:
> -> CarbonDropDataMapCommand.processMetadata() is called
> -> dropDataMapFromSystemFolder() is called; this is supposed to delete the 
> folder on disk, but doesn't, as the datamap is not yet updated in the table, 
> and it throws NoSuchDataMapException





[jira] [Created] (CARBONDATA-3235) HDFS Quota Issue

2019-01-08 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3235:
-

 Summary: HDFS Quota Issue
 Key: CARBONDATA-3235
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3235
 Project: CarbonData
  Issue Type: Bug
Reporter: Naman Rastogi
Assignee: Naman Rastogi


h3. Alter Table Rename Table Fail
 * When a table rename succeeds in Hive but fails in the Carbon data store, an 
exception is thrown, but the rename in Hive is not rolled back.

h3. Create-Preaggregate-Datamap Fail
 * When the (preaggregate) datamap schema is written but updating the table 
fails:
-> CarbonDropDataMapCommand.processMetadata() is called
-> dropDataMapFromSystemFolder() is called; this is supposed to delete the 
folder on disk, but doesn't, as the datamap is not yet updated in the table, 
and it throws NoSuchDataMapException





[jira] [Updated] (CARBONDATA-3201) SORT_SCOPE in LOAD_OPTIONS

2018-12-26 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3201:
--
Description: 
Prerequisite: 
[CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200]

 

If compaction always sorts the data, we can take advantage of faster loading. 
If we provide SORT_COLUMNS in the CREATE TABLE command, we can load some data 
with SORT_SCOPE as NO_SORT, which gives faster loading. Then, during off-peak 
time, the user can COMPACT the data, improving the subsequent query 
performance.

 

  was:
Prerequisite: 
[CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200]

 

If compaction always sorts the data, we can take advantage of faster loading. 
If we provide SORT_COLUMNS in the CREATE TABLE command, we can load some data 
with SORT_SCOPE as NO_SORT, which gives faster loading. Then, during off-peak 
time, the user can COMPACT the data, improving the subsequent query 
performance.

 

PR Link: 
[Github/pull/3014|https://github.com/apache/carbondata/pull/3014]

 


> SORT_SCOPE in LOAD_OPTIONS
> --
>
> Key: CARBONDATA-3201
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3201
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Naman Rastogi
>Priority: Major
>
> Prerequisite: 
> [CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200]
>  
> If compaction always sorts the data, we can take advantage of faster 
> loading. If we provide SORT_COLUMNS in the CREATE TABLE command, we can load 
> some data with SORT_SCOPE as NO_SORT, which gives faster loading. Then, 
> during off-peak time, the user can COMPACT the data, improving the 
> subsequent query performance.
>  





[jira] [Created] (CARBONDATA-3201) SORT_SCOPE in LOAD_OPTIONS

2018-12-26 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3201:
-

 Summary: SORT_SCOPE in LOAD_OPTIONS
 Key: CARBONDATA-3201
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3201
 Project: CarbonData
  Issue Type: New Feature
Reporter: Naman Rastogi


Prerequisite: 
[CARBONDATA-3200|https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-3200]

 

If compaction always sorts the data, we can take advantage of faster loading. 
If we provide SORT_COLUMNS in the CREATE TABLE command, we can load some data 
with SORT_SCOPE as NO_SORT, which gives faster loading. Then, during off-peak 
time, the user can COMPACT the data, improving the subsequent query 
performance.
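The resolution of the effective SORT_SCOPE can be sketched as follows. The precedence order (LOAD option, then session property, then table property, then NO_SORT) is an assumption drawn from this ticket together with CARBONDATA-3243 and CARBONDATA-3274; the function is illustrative Python, not the actual implementation:

```python
def effective_sort_scope(load_option, session_property, table_property):
    """Resolve SORT_SCOPE by precedence: the LOAD option wins, then the
    session property, then the table property, then the NO_SORT default."""
    for value in (load_option, session_property, table_property):
        if value:
            return value.upper()
    return "NO_SORT"

assert effective_sort_scope("NO_SORT", "LOCAL_SORT", "GLOBAL_SORT") == "NO_SORT"
assert effective_sort_scope(None, "local_sort", None) == "LOCAL_SORT"
assert effective_sort_scope(None, None, None) == "NO_SORT"
```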

 

PR Link: 
[Github/pull/3014|https://github.com/apache/carbondata/pull/3014]

 





[jira] [Created] (CARBONDATA-3200) No-Sort Compaction

2018-12-26 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3200:
-

 Summary: No-Sort Compaction
 Key: CARBONDATA-3200
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3200
 Project: CarbonData
  Issue Type: New Feature
  Components: core
Reporter: Naman Rastogi
Assignee: Naman Rastogi


When data is loaded with SORT_SCOPE as NO_SORT and then compacted, the data 
still remains unsorted. This does not affect queries much, but the major 
purpose of compaction is to pack the data better and improve query 
performance.

Now, the expected behaviour of compaction is to sort the data, so that after 
compaction the query performance becomes better. The columns to sort on are 
provided by SORT_COLUMNS.
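The expected behaviour can be sketched as a compaction that sorts the merged rows on SORT_COLUMNS (illustrative Python, not the actual merge implementation):

```python
from operator import itemgetter

def compact(segments, sort_columns):
    """Merge unsorted segments into one segment whose rows are sorted
    on the table's SORT_COLUMNS."""
    merged = [row for seg in segments for row in seg]
    return sorted(merged, key=itemgetter(*sort_columns))

# Two unsorted NO_SORT segments; compaction produces one sorted segment.
seg1 = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]
seg2 = [{"id": 2, "name": "b"}]
out = compact([seg1, seg2], sort_columns=["id"])
assert [r["id"] for r in out] == [1, 2, 3]
```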





[jira] [Created] (CARBONDATA-3121) CarbonReader build time is huge

2018-11-22 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3121:
-

 Summary: CarbonReader build time is huge
 Key: CARBONDATA-3121
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3121
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Naman Rastogi
Assignee: Naman Rastogi


The CarbonReader build fetches data and triggers I/O operations instead of 
only initializing the iterator, resulting in a large build time.
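The intended improvement can be sketched as a lazy reader (illustrative Python, not the SDK code): construction only records the file list, and the actual reading happens on iteration.

```python
class LazyReader:
    """Sketch of the improvement: building the reader only records the
    file list; I/O is deferred until iteration, so build returns fast."""

    def __init__(self, files):
        self.files = list(files)       # no I/O at build time

    def _open(self, path):
        # stands in for opening and decoding one carbondata file
        yield "row-from-" + path

    def __iter__(self):
        for path in self.files:        # files are opened lazily, one by one
            yield from self._open(path)

reader = LazyReader(["part-0", "part-1"])  # cheap: returns immediately
rows = list(reader)                        # the actual reading happens here
assert rows == ["row-from-part-0", "row-from-part-1"]
```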





[jira] [Created] (CARBONDATA-3104) Extra Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry

2018-11-15 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3104:
-

 Summary: Extra Unnecessary Hadoop Conf is getting stored in LRU 
(~100K) for each LRU entry
 Key: CARBONDATA-3104
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3104
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Affects Versions: 1.5.1
Reporter: Naman Rastogi
Assignee: Naman Rastogi
 Fix For: 1.5.1








[jira] [Updated] (CARBONDATA-3056) Implement concurrent reading through CarbonReader

2018-10-29 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3056:
--
Summary: Implement concurrent reading through CarbonReader  (was: Implement 
Concurrent SDK Reader)

> Implement concurrent reading through CarbonReader
> -
>
> Key: CARBONDATA-3056
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3056
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Naman Rastogi
>Priority: Minor
>
> The current reading through the SDK is slow: in CarbonReader, we read the 
> carbondata files sequentially, even though we have an individual 
> CarbonRecordReader for each file. We can parallelize this by adding an API 
> in the CarbonReader class
> *List readers = CarbonReader.split(numSplits)*
> which returns a list of CarbonReaders that can be used to read in parallel, 
> as reading each file is independent of the other files.
>  
> This enables the SDK user to read the files as-is, or in a multithreaded 
> environment.





[jira] [Updated] (CARBONDATA-3055) Improve SDK Reader Performance

2018-10-29 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-3055:
--
Summary: Improve SDK Reader Performance  (was: Improve CarbonReader 
performance)

> Improve SDK Reader Performance
> --
>
> Key: CARBONDATA-3055
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3055
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Naman Rastogi
>Priority: Minor
>






[jira] [Created] (CARBONDATA-3057) Implement Vectorized CarbonReader for SDK

2018-10-29 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3057:
-

 Summary: Implement Vectorized CarbonReader for SDK
 Key: CARBONDATA-3057
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3057
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Naman Rastogi


Implement a vectorized reader and expose an API for the user to switch
between the CarbonReader and the vectorized reader. Additionally, an API would
be provided for the user to extract the columnar batch instead of rows. This
would allow the user to have a deeper integration with Carbon. Moreover, the
reduction in method calls for the vector reader would improve the read time.
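The batch-versus-row trade-off can be sketched like this (illustrative Python; the batch shape is an assumption, not the actual columnar vector format): the consumer makes one call per batch of column arrays instead of one call per row.

```python
def read_batches(rows, batch_size):
    """Columnar-batch sketch: hand back column arrays per batch, so the
    consumer makes one call per batch instead of one per row."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # pivot row tuples into per-column lists
        yield tuple(list(col) for col in zip(*chunk))

rows = [(1, "a"), (2, "b"), (3, "c")]
batches = list(read_batches(rows, batch_size=2))
assert batches[0] == ([1, 2], ["a", "b"])   # first batch: two rows, columnar
assert batches[1] == ([3], ["c"])           # final, partial batch
```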





[jira] [Created] (CARBONDATA-3056) Implement Concurrent SDK Reader

2018-10-29 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3056:
-

 Summary: Implement Concurrent SDK Reader
 Key: CARBONDATA-3056
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3056
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Naman Rastogi


The current reading through the SDK is slow: in CarbonReader, we read the 
carbondata files sequentially, even though we have an individual 
CarbonRecordReader for each file. We can parallelize this by adding an API in 
the CarbonReader class
*List readers = CarbonReader.split(numSplits)*
which returns a list of CarbonReaders that can be used to read in parallel, as 
reading each file is independent of the other files.
 
This enables the SDK user to read the files as-is, or in a multithreaded 
environment.
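The proposed split-and-read pattern might look like this in a Python sketch (the round-robin partitioning and the `read_all` helper are illustrative assumptions, not the SDK API):

```python
from concurrent.futures import ThreadPoolExecutor

def split(files, num_splits):
    """Partition the file list round-robin into num_splits groups,
    mirroring the proposed CarbonReader.split(numSplits) API."""
    return [files[i::num_splits] for i in range(num_splits)]

def read_all(files):
    # stands in for one reader draining its share of the files
    return [f + ":rows" for f in files]

files = ["f0", "f1", "f2", "f3"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(read_all, split(files, 2)))

# every file is read exactly once, across the two parallel readers
assert sorted(r for part in results for r in part) == \
    sorted(f + ":rows" for f in files)
```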





[jira] [Created] (CARBONDATA-3055) Improve CarbonReader performance

2018-10-29 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-3055:
-

 Summary: Improve CarbonReader performance
 Key: CARBONDATA-3055
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3055
 Project: CarbonData
  Issue Type: Improvement
Reporter: Naman Rastogi








[jira] [Updated] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties

2018-09-21 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-2959:
--
Description: 
Validation required when LONG_STRING_COLUMNS are included in some TBL 
properties like DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE and NO_INVERTED_INDEX, 
and in scenarios when duplicate columns, invalid columns or the partition 
column are provided in LONG_STRING_COLUMNS while creating a table.

  was:
:Validation required when Long_string_columns are included in some TBL 
properties like dictionary_include,dictionary_exclude,no_inverted_index and 
scenarios when duplicate columns,invalid columns and partition column are 
provided in long_string_columns while creating table
【Precondition] :NA
【Test step】: 
CREATE TABLE lsc1(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='note,note');

 CREATE TABLE lsc2(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='');

CREATE TABLE lsc3(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='abc');

CREATE TABLE lsc4(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='id');

CREATE TABLE lsc5(id int, name string, description string,address string, note 
string) using carbon 
options('dictionary_include'='note','long_string_columns'='note,description');

CREATE TABLE lsc6(id int, name string, description string,address string, note 
string) using carbon 
options('dictionary_exclude'='note','long_string_columns'='note,description');

CREATE TABLE lsc8(id int, name string, description string,address string, note 
string) using carbon 
options('no_inverted_index'='note','long_string_columns'='note,description');

CREATE TABLE lcs9(id int,name string, description string,address string,note 
string) using carbon options('long_string_columns'='note,description') 
partitioned by (note);


> Validation required when Long_string_columns are included for some TBL 
> properties
> -
>
> Key: CARBONDATA-2959
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2959
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>
> Validation required when LONG_STRING_COLUMNS are included in some TBL 
> properties like
> DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, NO_INVERTED_INDEX and scenarios when 
> duplicate columns, invalid columns and partition column are provided in 
> long_string_columns while creating a table.





[jira] [Updated] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties

2018-09-21 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi updated CARBONDATA-2959:
--
Description: Validation required when LONG_STRING_COLUMNS are included in 
some TBL properties like DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, 
NO_INVERTED_INDEX and scenarios when duplicate columns, invalid columns and 
partition column are provided in long_string_columns while creating a table.  
(was: Validation required when LONG_STRING_COLUMNS are included in some TBL 
properties like

DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, NO_INVERTED_INDEX and scenarios when 
duplicate columns, invalid columns and partition column are provided in 
long_string_columns while creating a table.)

> Validation required when Long_string_columns are included for some TBL 
> properties
> -
>
> Key: CARBONDATA-2959
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2959
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>
> Validation required when LONG_STRING_COLUMNS are included in some TBL 
> properties like DICTIONARY_INCLUDE, DICTIONARY_EXCLUDE, NO_INVERTED_INDEX and 
> scenarios when duplicate columns, invalid columns and partition column are 
> provided in long_string_columns while creating a table.





[jira] [Assigned] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties

2018-09-21 Thread Naman Rastogi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naman Rastogi reassigned CARBONDATA-2959:
-

Assignee: Naman Rastogi

> Validation required when Long_string_columns are included for some TBL 
> properties
> -
>
> Key: CARBONDATA-2959
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2959
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Naman Rastogi
>Assignee: Naman Rastogi
>Priority: Minor
>
> Validation required when Long_string_columns are included in some TBL 
> properties like dictionary_include, dictionary_exclude and no_inverted_index, 
> and in scenarios when duplicate columns, invalid columns or the partition 
> column are provided in long_string_columns while creating a table.
> Precondition: NA
> Test steps: 
> CREATE TABLE lsc1(id int, name string, description string,address string, 
> note string) using carbon options('long_string_columns'='note,note');
>  CREATE TABLE lsc2(id int, name string, description string,address string, 
> note string) using carbon options('long_string_columns'='');
> CREATE TABLE lsc3(id int, name string, description string,address string, 
> note string) using carbon options('long_string_columns'='abc');
> CREATE TABLE lsc4(id int, name string, description string,address string, 
> note string) using carbon options('long_string_columns'='id');
> CREATE TABLE lsc5(id int, name string, description string,address string, 
> note string) using carbon 
> options('dictionary_include'='note','long_string_columns'='note,description');
> CREATE TABLE lsc6(id int, name string, description string,address string, 
> note string) using carbon 
> options('dictionary_exclude'='note','long_string_columns'='note,description');
> CREATE TABLE lsc8(id int, name string, description string,address string, 
> note string) using carbon 
> options('no_inverted_index'='note','long_string_columns'='note,description');
> CREATE TABLE lcs9(id int,name string, description string,address string,note 
> string) using carbon options('long_string_columns'='note,description') 
> partitioned by (note);





[jira] [Created] (CARBONDATA-2959) Validation required when Long_string_columns are included for some TBL properties

2018-09-21 Thread Naman Rastogi (JIRA)
Naman Rastogi created CARBONDATA-2959:
-

 Summary: Validation required when Long_string_columns are included 
for some TBL properties
 Key: CARBONDATA-2959
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2959
 Project: CarbonData
  Issue Type: Improvement
Reporter: Naman Rastogi


Validation required when Long_string_columns are included in some TBL 
properties like dictionary_include, dictionary_exclude and no_inverted_index, 
and in scenarios when duplicate columns, invalid columns or the partition 
column are provided in long_string_columns while creating a table.
Precondition: NA
Test steps: 
CREATE TABLE lsc1(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='note,note');

 CREATE TABLE lsc2(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='');

CREATE TABLE lsc3(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='abc');

CREATE TABLE lsc4(id int, name string, description string,address string, note 
string) using carbon options('long_string_columns'='id');

CREATE TABLE lsc5(id int, name string, description string,address string, note 
string) using carbon 
options('dictionary_include'='note','long_string_columns'='note,description');

CREATE TABLE lsc6(id int, name string, description string,address string, note 
string) using carbon 
options('dictionary_exclude'='note','long_string_columns'='note,description');

CREATE TABLE lsc8(id int, name string, description string,address string, note 
string) using carbon 
options('no_inverted_index'='note','long_string_columns'='note,description');

CREATE TABLE lcs9(id int,name string, description string,address string,note 
string) using carbon options('long_string_columns'='note,description') 
partitioned by (note);
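A sketch of the kind of validation the test steps above exercise (a hypothetical helper in Python, not the actual CarbonData parser code):

```python
def validate_long_string_columns(lsc, schema, dictionary_include=(),
                                 partition_columns=()):
    """Reject the invalid LONG_STRING_COLUMNS cases listed above:
    duplicates, unknown columns, non-string columns, and overlap with
    DICTIONARY_INCLUDE or the partition columns."""
    cols = [c.strip() for c in lsc.split(",") if c.strip()]
    if len(cols) != len(set(cols)):
        raise ValueError("duplicate column in long_string_columns")
    for c in cols:
        if c not in schema:
            raise ValueError(f"column {c} does not exist")
        if schema[c] != "string":
            raise ValueError(f"column {c} is not a string column")
        if c in dictionary_include or c in partition_columns:
            raise ValueError(f"column {c} not allowed in long_string_columns")
    return cols

schema = {"id": "int", "name": "string", "note": "string"}
assert validate_long_string_columns("note", schema) == ["note"]
# the lsc1/lsc3/lsc4 cases above should all be rejected:
for bad in ("note,note", "abc", "id"):
    try:
        validate_long_string_columns(bad, schema)
        assert False, "should have raised"
    except ValueError:
        pass
```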





[jira] [Commented] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit

2018-09-17 Thread Naman Rastogi (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617521#comment-16617521
 ] 

Naman Rastogi commented on CARBONDATA-2877:
---

Data loading from large files requires a large "Unsafe Working Memory", much 
more than the default 512 MB. Changing it to something like 10 GB should fix 
this problem. Please make this change, and it should work fine.
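A carbon.properties fragment for that change might look like the following; the property name below is the one commonly used for unsafe working memory, but verify it against your CarbonData version's configuration reference:

```
# carbon.properties: raise unsafe working memory from the 512 MB default
carbon.unsafe.working.memory.in.mb=10240
```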

> CarbonDataWriterException when loading data to carbon table with large number 
> of rows/columns from Spark-Submit
> ---
>
> Key: CARBONDATA-2877
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.4.1
> Environment: Spark 2.1
>Reporter: Chetan Bhat
>Assignee: Naman Rastogi
>Priority: Major
>
> Steps:
> From spark-submit, the user creates a table with a large number of columns 
> (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command - spark-submit --master yarn --num-executors 3 
> --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual Issue : Data loading fails with CarbonDataWriterException.
> Executor yarn UI log-
> org.apache.spark.util.TaskCompletionListenerException: 
> org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler : 
>  
> org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>  
> org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>  org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  org.apache.spark.scheduler.Task.run(Task.scala:99)
>  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748)
>  at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
>  at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Expected : The dataloading should be successful from Spark-submit similar to 
> that in Beeline.





[jira] [Assigned] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit

2018-09-17 Thread Naman Rastogi (JIRA)


 [ https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naman Rastogi reassigned CARBONDATA-2877:
-

Assignee: Naman Rastogi  (was: Brijoo Bopanna)



