[jira] [Created] (CARBONDATA-3992) Drop Index is throwing null pointer exception.

2020-09-16 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3992:


 Summary: Drop Index is throwing null pointer exception.
 Key: CARBONDATA-3992
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3992
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


The index server is enabled but the index server process is not running.
Create an index as 'carbondata' and try to drop the index -> a null pointer 
exception is thrown.

IndexStoreManager.java -> line 98
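A minimal SQL sketch of the scenario (table and index names are hypothetical; it assumes carbon.enable.index.server=true while the index server process is down):

```sql
-- Index server enabled in carbon.properties but the process not started:
-- carbon.enable.index.server=true
CREATE TABLE maintable (a STRING, b STRING) STORED AS carbondata;
CREATE INDEX idx_b ON TABLE maintable(b) AS 'carbondata';
DROP INDEX idx_b ON maintable;   -- throws NullPointerException
```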



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3912) Clean file requests are failing in case of multiple load due to concurrent locking.

2020-10-06 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha resolved CARBONDATA-3912.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

This issue was handled in 
PR: https://github.com/apache/carbondata/pull/3871

> Clean file requests are failing in case of multiple load due to concurrent 
> locking.
> ---
>
> Key: CARBONDATA-3912
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3912
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> If multiple loads are fired at the same time, clean file requests fail 
> because they cannot acquire the required lock.





[jira] [Commented] (CARBONDATA-3806) Create bloom datamap fails with null pointer exception

2020-10-06 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208575#comment-17208575
 ] 

Nihal kumar ojha commented on CARBONDATA-3806:
--

This issue was handled in
PR: https://github.com/apache/carbondata/pull/3775

> Create bloom datamap fails with null pointer exception
> --
>
> Key: CARBONDATA-3806
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3806
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.6.1
> Environment: Spark 2.3.2
>Reporter: Chetan Bhat
>Priority: Major
>
> Create bloom datamap fails with null pointer exception
> create table brinjal_bloom (imei string,AMSize string,channelsId 
> string,ActiveCountry string, Activecity string,gamePointId 
> double,deviceInformationId double,productionDate Timestamp,deliveryDate 
> timestamp,deliverycharge double) STORED BY 'carbondata' 
> TBLPROPERTIES('table_blocksize'='1');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE 
> brinjal_bloom OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
> '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
> 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
> 0: jdbc:hive2://10.20.255.171:23040/default> CREATE DATAMAP dm_brinjal4 ON 
> TABLE brinjal_bloom USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 
> 'AMSize', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 210.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 210.0 (TID 1477, vm2, executor 2): java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.datamap.Segment.getCommittedIndexFile(Segment.java:150)
>  at 
> org.apache.carbondata.core.util.BlockletDataMapUtil.getTableBlockUniqueIdentifiers(BlockletDataMapUtil.java:198)
>  at 
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getTableBlockIndexUniqueIdentifiers(BlockletDataMapFactory.java:176)
>  at 
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:154)
>  at 
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:425)
>  at 
> org.apache.carbondata.datamap.IndexDataMapRebuildRDD.internalCompute(IndexDataMapRebuildRDD.scala:359)
>  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)





[jira] [Commented] (CARBONDATA-3880) How to start JDBC service in distributed index

2020-10-14 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213825#comment-17213825
 ] 

Nihal kumar ojha commented on CARBONDATA-3880:
--

Hi, please follow the steps below to configure the distributed index server 
with JDBC.

1. Add these properties in spark-defaults.conf
 spark.yarn.keytab=
 spark.carbon.indexserver.keytab=
 spark.carbon.indexserver.principal=spark2x/hadoop.hadoop@hadoop.com
 spark.yarn.principal=spark2x/hadoop.hadoop@hadoop.com

2. Add the following configuration in carbon.properties (ensure 
carbon.properties is configured via the driver extra Java options in 
spark-defaults.conf):
 carbon.enable.index.server=true
 carbon.indexserver.enable.prepriming=true
 carbon.indexserver.HA.enabled=true
 carbon.max.executor.lru.cache.size=-1
 carbon.disable.index.server.fallback=false
 carbon.indexserver.zookeeper.dir=/indexserver2x
 carbon.index.server.port=

Then run the below spark-submit command from the $spark_home location:
bin/spark-submit  --num-executors 2 --master yarn --class 
org.apache.carbondata.indexserver.IndexServer  


Then start the Spark JDBC server as usual.
Queries should be reflected in the YARN UI, in both the index server and the 
Spark JDBC applications.
 

>  How to start JDBC service in distributed index
> ---
>
> Key: CARBONDATA-3880
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3880
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: li
>Priority: Major
> Fix For: 2.1.0
>
>
> How to start JDBC service in distributed index





[jira] [Commented] (CARBONDATA-3892) An exception occurred when modifying the table name using SparkSession

2020-10-14 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213785#comment-17213785
 ] 

Nihal kumar ojha commented on CARBONDATA-3892:
--

Hi, I tried to reproduce this issue but could not reproduce it.
I am using the query "ALTER TABLE oldTable RENAME TO newTable".
Please correct me if I am wrong, or if some other configuration is needed, 
please add it here.

> An exception occurred when modifying the table name using SparkSession
> --
>
> Key: CARBONDATA-3892
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3892
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: li
>Priority: Blocker
>
> Exception in thread "main" java.lang.LinkageError: ClassCastException: 
> attempting to 
> castjar:file:/usr/hdp/2.6.5.0-292/spark2/carbonlib/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar!/javax/ws/rs/ext/RuntimeDelegate.classtojar:file:/usr/hdp/2.6.5.0-292/spark2/carbonlib/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar!/javax/ws/rs/ext/RuntimeDelegate.class





[jira] [Updated] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.

2020-08-27 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-3964:
-
Priority: Minor  (was: Major)

> Select * from table or select count(*) without filter is throwing null 
> pointer exception.
> -
>
> Key: CARBONDATA-3964
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3964
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce.
> 1. Create a table.
> 2. Load around 500 segments and more than 1 million records.
> 3. Running the query select * or select count(*) without a filter throws a 
> null pointer exception.
> File: TableIndex.java
> Method: pruneWithMultiThread
> line: 447
> Reason: filter.getResolver() is null.





[jira] [Created] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.

2020-08-27 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3964:


 Summary: Select * from table or select count(*) without filter is 
throwing null pointer exception.
 Key: CARBONDATA-3964
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3964
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Steps to reproduce.
1. Create a table.
2. Load around 500 segments and more than 1 million records.
3. Running the query select * or select count(*) without a filter throws a null 
pointer exception.

File: TableIndex.java
Method: pruneWithMultiThread
line: 447
Reason: filter.getResolver() is null.
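The steps above can be sketched as follows (the table name and data volume are illustrative, not from the report):

```sql
CREATE TABLE big_table (a STRING, b INT) STORED AS carbondata;
-- load around 500 segments with more than 1 million records in total, then:
SELECT * FROM big_table;          -- no filter
SELECT COUNT(*) FROM big_table;   -- no filter; throws NullPointerException
```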






[jira] [Resolved] (CARBONDATA-3992) Drop Index is throwing null pointer exception.

2020-10-01 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha resolved CARBONDATA-3992.
--
Resolution: Fixed

Fixed in PR:
https://github.com/apache/carbondata/pull/3928

> Drop Index is throwing null pointer exception.
> --
>
> Key: CARBONDATA-3992
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3992
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The index server is enabled but the index server process is not running.
> Create an index as 'carbondata' and try to drop the index -> a null pointer 
> exception is thrown.
> IndexStoreManager.java -> line 98





[jira] [Created] (CARBONDATA-3912) Clean file requests are failing in case of multiple load due to concurrent locking.

2020-07-17 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3912:


 Summary: Clean file requests are failing in case of multiple load 
due to concurrent locking.
 Key: CARBONDATA-3912
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3912
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


If multiple loads are fired at the same time, clean file requests fail because 
they cannot acquire the required lock.





[jira] [Created] (CARBONDATA-3947) Insert Into Select Operation is throwing exception for hive read/write operation in carbon.

2020-08-09 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3947:


 Summary: Insert Into Select Operation is throwing exception for 
hive read/write operation in carbon.
 Key: CARBONDATA-3947
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3947
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


CREATE TABLE hive_carbon_table1(id INT, name STRING, scale DECIMAL, country 
STRING, salary DOUBLE) stored by 
'org.apache.carbondata.hive.CarbonStorageHandler';

INSERT INTO hive_carbon_table1 SELECT 1, 'RAM', '2.3', 'INDIA', 3500;
   
CREATE TABLE hive_carbon_table2(id INT, name STRING, scale DECIMAL, country 
STRING, salary DOUBLE) stored by 
'org.apache.carbondata.hive.CarbonStorageHandler';

INSERT INTO hive_carbon_table2 SELECT * FROM hive_carbon_table1;   -> throws 
the exception "CarbonData file is not present in the table location"






[jira] [Updated] (CARBONDATA-3855) Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.

2020-07-08 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-3855:
-
Attachment: (was: CarbonData SDK support load from file.pdf)

> Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.
> --
>
> Key: CARBONDATA-3855
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3855
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Attachments: CarbonData SDK support load from file.pdf
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Please find the solution document attached.





[jira] [Updated] (CARBONDATA-3855) Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.

2020-07-08 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-3855:
-
Attachment: CarbonData SDK support load from file.pdf

> Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.
> --
>
> Key: CARBONDATA-3855
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3855
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Attachments: CarbonData SDK support load from file.pdf
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Please find the solution document attached.





[jira] [Updated] (CARBONDATA-3855) Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.

2020-06-24 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-3855:
-
Attachment: CarbonData SDK support load from file.pdf

> Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.
> --
>
> Key: CARBONDATA-3855
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3855
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Attachments: CarbonData SDK support load from file.pdf
>
>
> Please find the solution document attached.





[jira] [Updated] (CARBONDATA-3855) Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.

2020-06-24 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-3855:
-
Attachment: (was: CarbonData SDK support load from file .pdf)

> Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.
> --
>
> Key: CARBONDATA-3855
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3855
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
>
> Please find the solution document attached.





[jira] [Created] (CARBONDATA-3855) Support Carbondata SDK to load data from parquet, ORC, CSV, Avro and JSON.

2020-06-12 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3855:


 Summary: Support Carbondata SDK to load data from parquet, ORC, 
CSV, Avro and JSON.
 Key: CARBONDATA-3855
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3855
 Project: CarbonData
  Issue Type: New Feature
Reporter: Nihal kumar ojha
 Attachments: CarbonData SDK support load from file .pdf

Please find the solution document attached.





[jira] [Created] (CARBONDATA-3928) Handle the Strings which length is greater than 32000 as a bad record.

2020-07-27 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3928:


 Summary: Handle the Strings which length is greater than 32000 as 
a bad record.
 Key: CARBONDATA-3928
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3928
 Project: CarbonData
  Issue Type: Task
Reporter: Nihal kumar ojha


Currently, when the string length exceeds 32000, the load fails.
Suggestions:
1. Strings longer than 32000 characters can be handled as bad records, so that 
the load does not fail when only a few records exceed the limit.
2. Include more information in the log message, such as which record and 
column have the problem.





[jira] [Created] (CARBONDATA-4102) Add UT and FT to improve coverage of SI module.

2021-01-07 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4102:


 Summary: Add UT and FT to improve coverage of SI module.
 Key: CARBONDATA-4102
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4102
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Add UT and FT to improve coverage of the SI module, and also remove dead or 
unused code if it exists.





[jira] [Created] (CARBONDATA-4059) Block compaction on SI table.

2020-11-26 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4059:


 Summary: Block compaction on SI table.
 Key: CARBONDATA-4059
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4059
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Currently, compaction is allowed on the SI table. Because of this, if only the 
SI table is compacted, then running a filter query on the main table causes more 
data to be scanned from the SI table, which degrades performance.





[jira] [Created] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4070:


 Summary: Handle the scenario mentioned in description for SI.
 Key: CARBONDATA-4070
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4070
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# SI creation should not be allowed on an SI table.
 # The SI table should not be scanned for a LIKE filter on the main table.
 # Drop column should not be allowed on an SI table.

Add FTs for all the above scenarios and the sort-column-related scenarios.





[jira] [Created] (CARBONDATA-4068) Alter table set long string should not allowed on SI column.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4068:


 Summary: Alter table set long string should not allowed on SI 
column.
 Key: CARBONDATA-4068
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4068
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# Create a table and create an SI on it.
 # Now try to set the long string data type on the column on which the SI is 
created.

The operation should not be allowed because SI is not supported on long string 
columns.

create table maintable (a string,b string,c int) STORED AS carbondata;

create index indextable on table maintable(b) AS 'carbondata';

insert into maintable values('k','x',2);

ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b');





[jira] [Created] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4069:


 Summary: Alter table set streaming=true should not be allowed on 
SI table or table having SI.
 Key: CARBONDATA-4069
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4069
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# Create a carbon table and an SI.
 # Now set streaming=true on either the SI table or the main table.

Both operations should not be allowed because SI is not supported on streaming 
tables.

 

create table maintable2 (a string,b string,c int) STORED AS carbondata;

insert into maintable2 values('k','x',2);

create index m_indextable on table maintable2(b) AS 'carbondata';

ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true');  => operation 
should not be allowed.

ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true') => operation 
should not be allowed.





[jira] [Created] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.

2020-11-22 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4052:


 Summary: Select query on SI table after insert overwrite is giving 
wrong result.
 Key: CARBONDATA-4052
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4052
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# Create a carbon table.
 # Create an SI table on the same carbon table.
 # Do a load or insert operation.
 # Run an insert overwrite query on the main table.
 # Now a select query on the SI table shows the old as well as the new data, 
when it should show only the new data.
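A hedged SQL sketch of the steps above (table, index, and row values are hypothetical):

```sql
CREATE TABLE maintable (a STRING, b STRING, c INT) STORED AS carbondata;
CREATE INDEX idx_b ON TABLE maintable(b) AS 'carbondata';
INSERT INTO maintable VALUES ('k', 'x', 2);
INSERT OVERWRITE TABLE maintable SELECT 'k2', 'y', 3;
-- The SI table should now contain only the overwritten data, but it still
-- shows the old rows as well:
SELECT * FROM idx_b;
```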





[jira] [Created] (CARBONDATA-4046) Select count(*) fails on partition table.

2020-11-02 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4046:


 Summary: Select count(*) fails on partition table.
 Key: CARBONDATA-4046
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4046
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Steps to reproduce

1. Set the property `carbon.read.partition.hive.direct=false`.

2. Create a table that contains more than one partition column.

3. Run the query select count(*).

It fails with the exception `Key not found`.





[jira] [Updated] (CARBONDATA-4046) Select count(*) fails on partition table.

2020-11-02 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-4046:
-
Description: 
Steps to reproduce

1. Set the property `carbon.read.partition.hive.direct=false`.

2. Create a table that contains more than one partition column.

3. Run the query select count(*).

It fails with the exception `Key not found`.

 

create table partition_cache(a string) partitioned by(b int, c String) stored 
as carbondata;

insert into partition_cache select 'k',1,'nihal';

select count(*) from partition_cache where b = 1;

  was:
Steps to reproduce

1. set property `carbon.read.partition.hive.direct=false`

2. Create table which contain more than one partition column.

3. run query select count (*)

 

It fails with exception as `Key not found`.


> Select count(*) fails on partition table.
> -
>
> Key: CARBONDATA-4046
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4046
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>
> Steps to reproduce
> 1. Set the property `carbon.read.partition.hive.direct=false`.
> 2. Create a table that contains more than one partition column.
> 3. Run the query select count(*).
>  
> It fails with the exception `Key not found`.
>  
> create table partition_cache(a string) partitioned by(b int, c String) stored 
> as carbondata;
> insert into partition_cache select 'k',1,'nihal';
> select count(*) from partition_cache where b = 1;





[jira] [Commented] (CARBONDATA-4101) Carbondata Connectivity via JDBC driver

2021-01-18 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267164#comment-17267164
 ] 

Nihal kumar ojha commented on CARBONDATA-4101:
--

Hi Rohit,

    We can connect to carbondata using the JDBC connector.

Please follow [https://carbondata.apache.org/quick-start-guide.html] to 
understand the integration of carbondata with different engines like spark, 
presto, hive, and flink, and let us know if you have any doubt.

 

Regards,

Nihal kumar ojha

> Carbondata Connectivity via JDBC driver
> ---
>
> Key: CARBONDATA-4101
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4101
> Project: CarbonData
>  Issue Type: Task
>  Components: other
>Reporter: Rohit Paranjape
>Priority: Blocker
>
> Hello Team,
>  
> We are working on one POC in which we wanted to connect to carbondata via our 
> third party application using JDBC connector.
>  
> Can we connect to carbondata using JDBC ? If yes, what would be the procedure 
> to do the same and if not, then what would be the possible options to connect 
> to carbondata using 
> third party application.
>  
> Please share your inputs on the same.
>  
> Thanks & Regards,
> Rohit Paranjape





[jira] [Created] (CARBONDATA-4114) Select query is returning empty result when carbon.read.partition.hive.direct = false

2021-01-29 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4114:


 Summary: Select query is returning empty result when 
carbon.read.partition.hive.direct = false
 Key: CARBONDATA-4114
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4114
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Currently, when {{carbon.read.partition.hive.direct = false}}, a select query 
after a load command for a CSV file that contains multiple rows returns an 
empty result.

 

set carbon.read.partition.hive.direct=false;

drop table if exists sourceTable;
CREATE TABLE sourceTable (empno int, empname String, designation String, doj 
Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, 
deptname String, projectcode int, projectjoindate Timestamp, projectenddate 
Timestamp) partitioned by(attendance int, utilization int, salary int) STORED 
AS carbondata;

LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE sourceTable 
OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"');
select * from sourceTable;





[jira] [Commented] (CARBONDATA-4114) Select query is returning empty result when carbon.read.partition.hive.direct = false

2021-06-10 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360633#comment-17360633
 ] 

Nihal kumar ojha commented on CARBONDATA-4114:
--

Duplicate of CARBONDATA-4113

> Select query is returning empty result when carbon.read.partition.hive.direct 
> = false
> -
>
> Key: CARBONDATA-4114
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4114
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, when {{carbon.read.partition.hive.direct = false}}, a select 
> query after a load command for a CSV file that contains multiple rows 
> returns an empty result.
>  
> set carbon.read.partition.hive.direct=false;
> drop table if exists sourceTable;
> CREATE TABLE sourceTable (empno int, empname String, designation String, doj 
> Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, 
> deptname String, projectcode int, projectjoindate Timestamp, projectenddate 
> Timestamp) partitioned by(attendance int, utilization int, salary int) STORED 
> AS carbondata;
> LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE sourceTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"');
> select * from sourceTable;





[jira] [Closed] (CARBONDATA-4114) Select query is returning empty result when carbon.read.partition.hive.direct = false

2021-06-10 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha closed CARBONDATA-4114.

Resolution: Duplicate

> Select query is returning empty result when carbon.read.partition.hive.direct 
> = false
> -
>
> Key: CARBONDATA-4114
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4114
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, when {{carbon.read.partition.hive.direct = false}}, a select 
> query after a load command for a CSV file that contains multiple rows 
> returns an empty result.
>  
> set carbon.read.partition.hive.direct=false;
> drop table if exists sourceTable;
> CREATE TABLE sourceTable (empno int, empname String, designation String, doj 
> Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, 
> deptname String, projectcode int, projectjoindate Timestamp, projectenddate 
> Timestamp) partitioned by(attendance int, utilization int, salary int) STORED 
> AS carbondata;
> LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE sourceTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"');
> select * from sourceTable;





[jira] [Created] (CARBONDATA-4188) Select query fails for longstring data with small table page size after alter add columns

2021-05-17 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4188:


 Summary: Select query fails for longstring data with small table 
page size after alter add columns
 Key: CARBONDATA-4188
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4188
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Steps to reproduce:
 # Create a table with a small page size and a longstring data type.
 # Load a large amount of data (more than one page should be created).
 # Alter the table to add an int column.
 # A select query with a filter on the newly added column fails with 
ArrayIndexOutOfBoundsException.
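A possible reproduction, assuming the table name and page-size property value below (they are illustrative, not from the report):

```sql
CREATE TABLE long_str_table (id INT, notes STRING)
STORED AS carbondata
TBLPROPERTIES ('long_string_columns'='notes', 'table_page_size_inmb'='1');
-- load enough long-string rows that more than one page is created, then:
ALTER TABLE long_str_table ADD COLUMNS (new_col INT);
SELECT * FROM long_str_table WHERE new_col = 1;  -- ArrayIndexOutOfBoundsException
```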





[jira] [Created] (CARBONDATA-4186) Insert query is failing when partition column is part of local sort scope.

2021-05-12 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4186:


 Summary: Insert query is failing when partition column is part of 
local sort scope.
 Key: CARBONDATA-4186
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4186
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Currently, when we create a table with a partition column and include the same 
column in the local sort scope, the insert query fails with an 
ArrayIndexOutOfBoundsException.
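A minimal sketch of the failing scenario (table and column names are assumptions):

```sql
CREATE TABLE part_sort_table (a STRING)
PARTITIONED BY (b INT)
STORED AS carbondata
TBLPROPERTIES ('sort_scope'='local_sort', 'sort_columns'='b');
INSERT INTO part_sort_table SELECT 'k', 1;  -- ArrayIndexOutOfBoundsException
```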





[jira] [Created] (CARBONDATA-4196) Allow zero or more white spaces in geo UDFs

2021-06-04 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4196:


 Summary: Allow zero or more white spaces in geo UDFs
 Key: CARBONDATA-4196
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4196
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Currently, the regex for geo UDFs does not allow zero spaces between the UDF 
name and the parenthesis; it always expects a single space in between, for 
example: {{linestring (120.184179 30.327465)}}. Because of this, using the 
UDFs without a space sometimes does not give the expected result.





[jira] [Created] (CARBONDATA-4232) Add missing doc change for secondary index.

2021-06-24 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4232:


 Summary: Add missing doc change for secondary index.
 Key: CARBONDATA-4232
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4232
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Doc changes were not handled for PR-4116, which leverages the secondary index 
till segment level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4162) Leverage Secondary Index till segment level with SI as datamap and SI with plan rewrite

2021-04-22 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-4162:
-
Summary: Leverage Secondary Index till segment level with SI as datamap and 
SI with plan rewrite  (was: Leverage Secondary Index till segment level with 
Spark plan rewrite)

> Leverage Secondary Index till segment level with SI as datamap and SI with 
> plan rewrite
> ---
>
> Key: CARBONDATA-4162
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4162
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Attachments: Support SI at segment level.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *Background:*
> Secondary index tables are created as indexes and managed internally by 
> CarbonData as child tables. In the existing architecture, if the parent 
> (main) table and the SI table do not have the same valid segments, we 
> disable the SI table. From the next query onwards, we scan and prune only 
> the parent table until the next load or a REINDEX command is triggered 
> (these commands bring the parent and SI table segments back in sync). 
> Because of this, queries take more time when the SI is disabled.
> *Proposed Solution:*
> We plan to leverage the SI till the segment level. That is, instead of 
> disabling the SI table when the parent and child table segments are not in 
> sync, we will prune on the SI table for all of its valid segments 
> (segments with status success, marked for update, and load partial 
> success), and the remaining segments will be pruned by the parent table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4162) Leverage Secondary Index till segment level with Spark plan rewrite

2021-04-05 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4162:


 Summary: Leverage Secondary Index till segment level with Spark 
plan rewrite
 Key: CARBONDATA-4162
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4162
 Project: CarbonData
  Issue Type: New Feature
Reporter: Nihal kumar ojha
 Attachments: Support SI at segment level.pdf

*Background:*

Secondary index tables are created as indexes and managed internally by 
CarbonData as child tables. In the existing architecture, if the parent 
(main) table and the SI table do not have the same valid segments, we 
disable the SI table. From the next query onwards, we scan and prune only 
the parent table until the next load or a REINDEX command is triggered 
(these commands bring the parent and SI table segments back in sync). 
Because of this, queries take more time when the SI is disabled.

*Proposed Solution:*
We plan to leverage the SI till the segment level. That is, instead of 
disabling the SI table when the parent and child table segments are not in 
sync, we will prune on the SI table for all of its valid segments 
(segments with status success, marked for update, and load partial 
success), and the remaining segments will be pruned by the parent table.
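The proposed split can be sketched as a partition of the parent table's segments. The status names and helper below are illustrative, not CarbonData's actual internals:

```python
# Hypothetical sketch of the proposal: segments that are present and valid
# in the SI table are pruned via the SI; all remaining segments fall back
# to parent-table pruning, instead of disabling the SI entirely.
VALID_STATUSES = {"SUCCESS", "MARKED_FOR_UPDATE", "LOAD_PARTIAL_SUCCESS"}

def split_segments(parent_segments, si_segments):
    si_pruned, parent_pruned = [], []
    for seg_id in parent_segments:
        if si_segments.get(seg_id) in VALID_STATUSES:
            si_pruned.append(seg_id)      # SI can serve this segment
        else:
            parent_pruned.append(seg_id)  # not yet synced to the SI
    return sorted(si_pruned), sorted(parent_pruned)

parent = {"0": "SUCCESS", "1": "SUCCESS", "2": "SUCCESS"}
si = {"0": "SUCCESS", "1": "LOAD_PARTIAL_SUCCESS"}  # segment 2 not yet loaded to SI
print(split_segments(parent, si))  # → (['0', '1'], ['2'])
```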



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4144) After the alter table xxx compact command is executed, the index size of the segment is 0, and an error is reported while quering

2021-03-09 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297969#comment-17297969
 ] 

Nihal kumar ojha commented on CARBONDATA-4144:
--

Hi, can you please give clear steps to reproduce this issue? The images you 
have uploaded are not rendering.

> After the alter table xxx compact command is executed, the index size of the 
> segment is 0, and an error is reported while quering
> -
>
> Key: CARBONDATA-4144
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4144
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.6.1, 2.0.0, 2.1.0
>Reporter: liuhe0702
>Priority: Major
>
> 1、In the tablestatus of the second index table, the value of indexSize is 0 
> and segmentFile is xxx_null.segment.
> 2、query failed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound

2021-02-16 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4131:


 Summary: Concurrent load on table with flat folder structure fails 
with FileNotFound
 Key: CARBONDATA-4131
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4131
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound

2021-02-17 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285791#comment-17285791
 ] 

Nihal kumar ojha commented on CARBONDATA-4131:
--

Duplicate of CARBONDATA-3962

> Concurrent load on table with flat folder structure fails with FileNotFound
> ---
>
> Key: CARBONDATA-4131
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4131
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound

2021-02-17 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha resolved CARBONDATA-4131.
--
Resolution: Duplicate

> Concurrent load on table with flat folder structure fails with FileNotFound
> ---
>
> Key: CARBONDATA-4131
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4131
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4195) Materialized view loading time increased due to full refresh

2021-08-23 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403524#comment-17403524
 ] 

Nihal kumar ojha commented on CARBONDATA-4195:
--

Hi, can you please provide the create MV command?

The MV will be created with either incremental or full refresh based on that 
command. If the query contains the avg() aggregate function or an expression 
like sum(col1) + sum(col2), the MV is created with full refresh. Once we have 
the command, we can conclude.
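To make the distinction concrete, here is a small sketch (illustrative only, not CarbonData code) of why sum() is mergeable per segment while avg() is not, which is what forces a full refresh:

```python
# sum() can be refreshed incrementally: the stored partial for existing
# segments merges with the new segment's partial. avg() cannot be merged
# from per-segment averages, so an MV containing avg() needs full refresh.
seg1 = [10, 20, 30]   # existing segment's values
seg2 = [40]           # newly loaded segment's values

# Incremental refresh works for sum: merge partials instead of rescanning.
assert sum(seg1) + sum(seg2) == sum(seg1 + seg2)

# Averaging per-segment averages gives the wrong answer.
avg = lambda xs: sum(xs) / len(xs)
merged_wrong = (avg(seg1) + avg(seg2)) / 2   # 30.0
true_avg = avg(seg1 + seg2)                  # 25.0
assert merged_wrong != true_avg

# Workaround: store the mergeable sum and count, derive avg at query time.
s, c = sum(seg1) + sum(seg2), len(seg1) + len(seg2)
assert s / c == true_avg
```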

> Materialized view loading time increased due to full refresh
> 
>
> Key: CARBONDATA-4195
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Mayuri Patole
>Priority: Major
> Fix For: 2.1.0
>
>
> Hi Team,
> We are using carbon 2.1.0 in our project where parallel data loading is 
> happening.
> We are working on getting optimal performance for aggregated queries using 
> materialized views.
> We observed that continuous data loading and full refresh of MV is causing 
> increased load time and high memory usage which doesn't have to be this way.
> Can you suggest a way to perform incremental refresh because we do not need 
> to calculate old data again while loading ? 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4195) Materialized view loading time increased due to full refresh

2021-08-23 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403524#comment-17403524
 ] 

Nihal kumar ojha edited comment on CARBONDATA-4195 at 8/24/21, 5:14 AM:


Hi, can you please provide the create MV command?

The MV will be created with either incremental or full refresh based on that 
command. If the query contains the avg() aggregate function or an expression 
like sum(col1) + sum(col2), the MV is created with full refresh. Once we have 
the command, we can conclude.

Or, if this is a duplicate of 
[CARBONDATA-4239|https://issues.apache.org/jira/browse/CARBONDATA-4239], then 
please close this issue, as we are already tracking that one.


was (Author: nihal):
 Hi, can you please provide the create MV command? 

Based on that only MV will be created with incremental or full refresh. If your 
query contains avg() aggregate function or some expression like sum(col1) + 
sum(col2) then MV will be created with full refresh. So once we have that 
command then we can conclude.

> Materialized view loading time increased due to full refresh
> 
>
> Key: CARBONDATA-4195
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Mayuri Patole
>Priority: Major
> Fix For: 2.1.0
>
>
> Hi Team,
> We are using carbon 2.1.0 in our project where parallel data loading is 
> happening.
> We are working on getting optimal performance for aggregated queries using 
> materialized views.
> We observed that continuous data loading and full refresh of MV is causing 
> increased load time and high memory usage which doesn't have to be this way.
> Can you suggest a way to perform incremental refresh because we do not need 
> to calculate old data again while loading ? 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4286) Select query with and filter is giving empty result

2021-09-15 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4286:


 Summary: Select query with and filter is giving empty result
 Key: CARBONDATA-4286
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4286
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


A select query with an AND filter condition returns an empty result even 
though valid data is present in the table.

Root cause: currently, when building the block-level min-max index, we use 
the unsafe byte comparator for both dimension and measure columns, and it 
returns incorrect results for measure columns.

We should use different comparators for dimension and measure columns, as we 
already do when writing the blocklet-level min-max index.
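To make the root cause concrete, here is a small sketch (illustrative only; CarbonData's actual comparators are Java classes) of why an unsigned byte-wise comparison misorders signed measure values:

```python
import struct

# Comparing signed values by their raw big-endian bytes (as an unsafe
# byte comparator does) inverts the order whenever signs differ, which
# corrupts min/max values and makes filter pruning drop valid blocks.
def byte_compare(a, b):
    pa, pb = struct.pack(">q", a), struct.pack(">q", b)
    return (pa > pb) - (pa < pb)

a, b = -1, 1
print(byte_compare(a, b))   # → 1: bytes of -1 (0xff...) sort above bytes of 1
print((a > b) - (a < b))    # → -1: numerically -1 < 1
```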



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4256) SI creation on a complex column that includes child column with a dot(.) fails with parse exception.

2021-07-29 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4256:


 Summary: SI creation on a complex column that includes child 
column with a dot(.) fails with parse exception.
 Key: CARBONDATA-4256
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4256
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


sql("create table complextable (country struct, name string, id 
Map, arr1 array, arr2 array) stored as 
carbondata");

sql("create index index_1 on table complextable(country.b) as 'carbondata'");

 

The above query fails with a parsing exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4177) performence issue with Query

2021-08-03 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392172#comment-17392172
 ] 

Nihal kumar ojha commented on CARBONDATA-4177:
--

Hi,

    Currently, CarbonData doesn't support limit pushdown for either the main 
table or an MV table; it is only supported in the case of a secondary index 
(SI). Because of this, even when we select rows with LIMIT 10, Carbon first 
fills a vector of 4096 rows and sends it to Spark; Spark then applies the 
limit and returns the result. So even for 10 rows we fetch 4096 rows, and 
that takes time. In the future we may support limit pushdown for the main 
table and MVs, so that only 10 rows are fetched in place of 4096, and such 
queries would benefit.
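The cost difference can be sketched as follows (a toy model of the scan, not CarbonData's actual reader; the 4096 batch size is the vector size mentioned above):

```python
# Toy model: without limit pushdown the scan fills a whole 4096-row
# vector before the engine applies LIMIT; with pushdown the scan could
# stop after exactly the requested number of rows.
BATCH = 4096

def rows_read_without_pushdown(total_rows, limit):
    fetched = 0
    while fetched < total_rows and fetched < limit:
        fetched += min(BATCH, total_rows - fetched)  # always a full batch
    return fetched  # rows actually read from storage

def rows_read_with_pushdown(total_rows, limit):
    return min(total_rows, limit)  # stop exactly at the limit

print(rows_read_without_pushdown(2_592_000, 10))  # → 4096 rows read for LIMIT 10
print(rows_read_with_pushdown(2_592_000, 10))     # → 10
```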

> performence issue with Query
> 
>
> Key: CARBONDATA-4177
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4177
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.1
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We are working on a POC using CarbonData 2.0.1 and have come across a 
> performance issue. Below are the details:
> 1.Table creation query:
> ==
> spark.sql("create table Flow_TS_1day_stats_16042021(start_time 
> timestamp,end_time timestamp,source_ip_address string,destintion_ip_address 
> string,appname string,protocol_name string,source_tos smallint,in_interface 
> smallint,out_interface smallint,src_as bigint,dst_as bigint,source_mask 
> smallint,destination_mask smallint, dst_tos smallint,input_pkt 
> bigint,input_byt bigint,output_pkt bigint,output_byt bigint,source_port 
> int,destination_port int) stored as carbondata TBLPROPERTIES 
> ('local_dictionary_enable'='false')").show()
> Two MVs are present on this table. Below are the queries for those MVs:
> 1. Network MV
> 
> spark.sql("create materialized view 
> Network_Level_Agg_10min_MV_with_ip_15042021_again as select 
> timeseries(end_time,'ten_minute') as end_time,source_ip_address, 
> destintion_ip_address,appname,protocol_name,source_port,destination_port,source_tos,src_as,dst_as,sum(input_pkt)
>  as input_pkt,sum(input_byt) as input_byt,sum(output_pkt) as 
> output_pkt,sum(output_byt) as output_byt from 
> Flow_TS_1day_stats_15042021_again group by 
> timeseries(end_time,'ten_minute'),source_ip_address,destintion_ip_address, 
> appname,protocol_name,source_port,destination_port,source_tos,src_as,dst_as 
> order by input_pkt,input_byt,output_pkt,output_byt desc").show(false)
> 2. Interface MV:
> spark.sql("create materialized view Interface_Level_Agg_10min_MV_16042021 as 
> select timeseries(end_time,'ten_minute') as end_time, 
> source_ip_address,destintion_ip_address,appname,protocol_name,source_port,destination_port,source_tos,src_as,dst_as,in_interface,out_interface,sum(input_pkt)
>  as input_pkt,sum(input_byt) as input_byt,sum(output_pkt) as 
> output_pkt,sum(output_byt) as output_byt from Flow_TS_1day_stats_16042021 
> group by timeseries(end_time,'ten_minute'), 
> source_ip_address,destintion_ip_address,appname,protocol_name,source_port,destination_port,source_tos,src_as,dst_as,in_interface,out_interface
>  order by input_pkt,input_byt,output_pkt,output_byt desc").show(false)
> +*We are firing the below query to fetch data, which is taking almost 10 
> seconds:*+
> *Select appname,input_byt from Flow_TS_1day_stats_16042021 where end_time >= 
> '2021-03-02 00:00:00' and end_time < '2021-03-03 00:00:00' group by 
> appname,input_byt order by input_byt desc LIMIT 10*
>  
> The above query is only fetching 10 records but it is taking almost 10 
> seconds to complete.
> Could you please review the above schemas and help us understand how we can 
> improve the query execution time? We are expecting the response to be in 
> subseconds.
> Table Name: RAW Table (1 Day - 300K/Sec), #Records: 2592000
> Regards, Suyash Yadav



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path

2021-08-04 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392772#comment-17392772
 ] 

Nihal kumar ojha commented on CARBONDATA-4227:
--

Hi,

   Based on the current Carbon SDK implementation, executing 
`CarbonWriter.builder()` creates a CarbonWriterBuilder instance, and the 
other exposed APIs keep updating properties on that same instance. When 
build() is triggered, all of the instance's properties are treated as final 
and used to create the CarbonWriter. If, after a build(), we allowed 
properties of the same instance to be changed and build() to be called 
again, it would overwrite the previous build() result, which again leads to 
confusion.

    So I suggest that if you want to build another writer, please create a 
new builder() instance first; otherwise it won't behave as expected.

Please post a reply if you have any other questions about this.
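The pitfall can be illustrated with a toy builder (a model of the reported behavior only, not the real SDK internals): configuration is effectively frozen at the first build(), so later calls on the same instance are ignored.

```python
# Toy model: the builder freezes its configuration at the first build(),
# so a second output_path()+build() on the same instance still yields the
# first path. A fresh builder per writer avoids the problem.
class ToyWriterBuilder:
    def __init__(self):
        self._path = None
        self._frozen = None

    def output_path(self, path):
        self._path = path
        return self

    def build(self):
        if self._frozen is None:          # properties treated as final here
            self._frozen = {"path": self._path}
        return self._frozen               # later build() reuses the first

builder = ToyWriterBuilder()
w1 = builder.output_path("/tmp/path1").build()
w2 = builder.output_path("/tmp/path2").build()
print(w2["path"])                         # → /tmp/path1, not /tmp/path2

# Recommended pattern: one builder per writer.
w3 = ToyWriterBuilder().output_path("/tmp/path3").build()
print(w3["path"])                         # → /tmp/path3
```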

> SDK CarbonWriterBuilder cannot execute `build()` several times with different 
> output path
> -
>
> Key: CARBONDATA-4227
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: ChenKai
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes we want to reuse CarbonWriterBuilder object to build CarbonWriter 
> with different output paths, but it does not work.
> For example: 
> {code:scala}
> val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
> // 1. first writing with path1
> val writer1 = builder.outputPath(path1).build()
> // write data, it works 
> // 2. second writing with path2
> val writer2 = builder.outputPath(path2).build()
> // write data, it does not work. It still writes data to path1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4248) Explain query with upper case column is throwing key not found exception.

2021-07-19 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4248:


 Summary: Explain query with upper case column is throwing key not 
found exception.
 Key: CARBONDATA-4248
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4248
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Steps to reproduce:

sql("drop table if exists carbon_table")
sql("drop table if exists parquet_table")
sql("create table IF NOT EXISTS carbon_table(`BEGIN_TIME` BIGINT," +
  " `SAI_CGI_ECGI` STRING) stored as carbondata")
sql("create table IF NOT EXISTS parquet_table(CELL_NAME string, CGISAI string)" +
  " stored as parquet")
sql("explain extended with grpMainDatathroughput as (select" +
  " from_unixtime(begin_time, 'MMdd') as data_time, SAI_CGI_ECGI from carbon_table)," +
  " grpMainData as (select * from grpMainDatathroughput a JOIN(select CELL_NAME," +
  " CGISAI from parquet_table) b ON b.CGISAI=a.SAI_CGI_ECGI) " +
  "select * from grpMainData a left join grpMainData b on a.cell_name=b.cell_name")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)