[jira] [Assigned] (CARBONDATA-3830) Presto read support for complex columns
[ https://issues.apache.org/jira/browse/CARBONDATA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kumar Vishal reassigned CARBONDATA-3830:

Assignee: Ajantha Bhat

> Presto read support for complex columns
> ---
>
> Key: CARBONDATA-3830
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3830
> Project: CarbonData
> Issue Type: New Feature
> Components: core, presto-integration
> Reporter: Akshay
> Assignee: Ajantha Bhat
> Priority: Minor
> Attachments: Presto Read Support.pdf
>
> Time Spent: 33h 40m
> Remaining Estimate: 0h
>
> This feature enables Presto to read complex columns from CarbonData files.
> Complex columns include array, map and struct.
> The design document covers only the array type; map and struct types will be handled later.
>
> PR - [https://github.com/apache/carbondata/pull/3773]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3830) Presto read support for complex columns
[ https://issues.apache.org/jira/browse/CARBONDATA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kumar Vishal resolved CARBONDATA-3830.

Fix Version/s: 2.1.0
Resolution: Fixed

> Presto read support for complex columns
> ---
>
> Key: CARBONDATA-3830
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3830
> Project: CarbonData
> Issue Type: New Feature
> Components: core, presto-integration
> Reporter: Akshay
> Assignee: Ajantha Bhat
> Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Presto Read Support.pdf
>
> Time Spent: 33h 40m
> Remaining Estimate: 0h
>
> This feature enables Presto to read complex columns from CarbonData files.
> Complex columns include array, map and struct.
> The design document covers only the array type; map and struct types will be handled later.
>
> PR - [https://github.com/apache/carbondata/pull/3773]
[jira] [Updated] (CARBONDATA-1328) Refactoring unsafe code and added new property
[ https://issues.apache.org/jira/browse/CARBONDATA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal updated CARBONDATA-1328:

Description:
Refactored the unsafe memory manager and unsafe sort:
Removed the sort memory manager and added a single memory manager.
Deprecated old properties: sort.inmemory.size.inmb, enable.offheap.sort
Added new properties: carbon.sort.storage.inmemory.size.inmb, carbon.unsafe.working.memory.in.mb, enable.offheap
Removed copying of blocks from working memory to sort storage memory.
All unsafe flows must now use CarbonUnsafeMemoryManager for allocating/freeing unsafe memory.
If the user has configured an old property, it is converted internally to the new properties. For example, if sort.inmemory.size.inmb is configured, 20% of that memory is used as working memory and the rest as storage memory.

was: Add unsafe property validation + handle old unsafe parameter for backward compatibility

Issue Type: Improvement (was: Bug)
Summary: Refactoring unsafe code and added new property (was: Carbon Unsafe Property validation)

> Refactoring unsafe code and added new property
> --
>
> Key: CARBONDATA-1328
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1328
> Project: CarbonData
> Issue Type: Improvement
> Reporter: kumar vishal
> Assignee: kumar vishal
>
> Refactored the unsafe memory manager and unsafe sort:
> Removed the sort memory manager and added a single memory manager.
> Deprecated old properties: sort.inmemory.size.inmb, enable.offheap.sort
> Added new properties: carbon.sort.storage.inmemory.size.inmb, carbon.unsafe.working.memory.in.mb, enable.offheap
> Removed copying of blocks from working memory to sort storage memory.
> All unsafe flows must now use CarbonUnsafeMemoryManager for allocating/freeing unsafe memory.
> If the user has configured an old property, it is converted internally to the new properties. For example, if sort.inmemory.size.inmb is configured, 20% of that memory is used as working memory and the rest as storage memory.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
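The old-to-new property conversion described in the issue can be sketched as follows. The property names and the 20% working / 80% storage split come from the issue description; the class and method names below are hypothetical illustrations, not CarbonData code.

```java
import java.util.Properties;

public class UnsafePropertyCompat {
    static final String OLD_SORT_MEMORY = "sort.inmemory.size.inmb";
    static final String WORKING_MEMORY = "carbon.unsafe.working.memory.in.mb";
    static final String STORAGE_MEMORY = "carbon.sort.storage.inmemory.size.inmb";

    /**
     * If only the deprecated property is configured, derive the new ones:
     * 20% of the old value becomes working memory, the remaining 80%
     * becomes sort storage memory.
     */
    static void applyCompat(Properties props) {
        String old = props.getProperty(OLD_SORT_MEMORY);
        if (old == null || props.containsKey(WORKING_MEMORY)) {
            return; // nothing to convert, or the new property is already set
        }
        int totalMb = Integer.parseInt(old.trim());
        int workingMb = (int) (totalMb * 0.20);
        props.setProperty(WORKING_MEMORY, String.valueOf(workingMb));
        props.setProperty(STORAGE_MEMORY, String.valueOf(totalMb - workingMb));
    }
}
```

For example, a user who had configured sort.inmemory.size.inmb=1000 would end up with 200 MB of working memory and 800 MB of sort storage memory under this scheme.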
[jira] [Created] (CARBONDATA-1370) Query is failing with array index out of bound exception
kumar vishal created CARBONDATA-1370:

Summary: Query is failing with array index out of bound exception
Key: CARBONDATA-1370
URL: https://issues.apache.org/jira/browse/CARBONDATA-1370
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

The query fails with an array index out of bound exception when the table has sort columns and a less-than filter is applied on a timestamp column that is not among the sort columns.
[jira] [Created] (CARBONDATA-1402) JVM crashes when data loading is done with unsafe column page=true
kumar vishal created CARBONDATA-1402:

Summary: JVM crashes when data loading is done with unsafe column page=true
Key: CARBONDATA-1402
URL: https://issues.apache.org/jira/browse/CARBONDATA-1402
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

JVM crashes when data loading is done with unsafe column page=true
[jira] [Created] (CARBONDATA-1410) Thread leak issue in case of data loading failure
kumar vishal created CARBONDATA-1410:

Summary: Thread leak issue in case of data loading failure
Key: CARBONDATA-1410
URL: https://issues.apache.org/jira/browse/CARBONDATA-1410
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

Thread leak issue in case of data loading failure
[jira] [Created] (CARBONDATA-1449) GC issue in case of date filter if it is going to rowlevel executor
kumar vishal created CARBONDATA-1449:

Summary: GC issue in case of date filter if it is going to rowlevel executor
Key: CARBONDATA-1449
URL: https://issues.apache.org/jira/browse/CARBONDATA-1449
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

GC issue in case of date filter if it is going to rowlevel executor
[jira] [Created] (CARBONDATA-1474) Memory leak issue in case of vector reader
kumar vishal created CARBONDATA-1474:

Summary: Memory leak issue in case of vector reader
Key: CARBONDATA-1474
URL: https://issues.apache.org/jira/browse/CARBONDATA-1474
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

Memory leak issue in case of vector reader
[jira] [Commented] (CARBONDATA-1474) Memory leak issue in case of vector reader
[ https://issues.apache.org/jira/browse/CARBONDATA-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178909#comment-16178909 ]

kumar vishal commented on CARBONDATA-1474:

Fixed as part of CARBONDATA-1488, so closing this issue

> Memory leak issue in case of vector reader
> --
>
> Key: CARBONDATA-1474
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1474
> Project: CarbonData
> Issue Type: Bug
> Reporter: kumar vishal
> Assignee: kumar vishal
>
> Memory leak issue in case of vector reader
[jira] [Closed] (CARBONDATA-1474) Memory leak issue in case of vector reader
[ https://issues.apache.org/jira/browse/CARBONDATA-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal closed CARBONDATA-1474.

Resolution: Fixed

> Memory leak issue in case of vector reader
> --
>
> Key: CARBONDATA-1474
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1474
> Project: CarbonData
> Issue Type: Bug
> Reporter: kumar vishal
> Assignee: kumar vishal
>
> Memory leak issue in case of vector reader
[jira] [Created] (CARBONDATA-1514) Sort Column Property is not getting added in case of alter operation
kumar vishal created CARBONDATA-1514:

Summary: Sort Column Property is not getting added in case of alter operation
Key: CARBONDATA-1514
URL: https://issues.apache.org/jira/browse/CARBONDATA-1514
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

Sort Column Property is not getting added in case of alter operation
[jira] [Created] (CARBONDATA-1515) Fixed NPE in Data loading
kumar vishal created CARBONDATA-1515:

Summary: Fixed NPE in Data loading
Key: CARBONDATA-1515
URL: https://issues.apache.org/jira/browse/CARBONDATA-1515
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

Scenario:
Data size: 3.5 billion rows (4.1 TB of data), 3 node cluster, 12 cores used while data loading, 100 loads.

Problem: DataConverterProcessorStepImpl uses an ArrayList to collect all the local converters. With multiple threads this creates holes (null values) in the list, since ArrayList is not synchronized, and closing the converters then throws an NPE.

Solution: Add the local converter inside a synchronized block.
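The race and the fix can be illustrated with a minimal sketch: concurrent add() calls on a plain ArrayList can corrupt its internal state and leave null "holes", while guarding the add with a synchronized block keeps the list consistent. The class and method names here (ConverterRegistry, register, closeAll) are illustrative, not CarbonData APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterRegistry {
    private final List<Object> converters = new ArrayList<>();

    /** Thread-safe registration, as in the fix: add inside a synchronized block. */
    public void register(Object converter) {
        synchronized (converters) {
            converters.add(converter);
        }
    }

    /** Closing iterates the list; with synchronized adds there are no null holes. */
    public int closeAll() {
        synchronized (converters) {
            int closed = 0;
            for (Object c : converters) {
                if (c != null) { // defensive; nulls no longer occur with synchronized adds
                    closed++;
                }
            }
            return closed;
        }
    }
}
```

An alternative with the same effect is wrapping the list with Collections.synchronizedList; the synchronized block shown above matches the solution stated in the issue.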
[jira] [Commented] (CARBONDATA-1516) Support pre-aggregate tables and timeseries in carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205427#comment-16205427 ]

kumar vishal commented on CARBONDATA-1516:

[~chenliang613] If a column is dropped and any aggregate table contains that column, that aggregate table becomes invalid. In that case the user needs to rebuild the aggregate table.

> Support pre-aggregate tables and timeseries in carbondata
> -
>
> Key: CARBONDATA-1516
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1516
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Ravindra Pesala
> Attachments: CarbonData Pre-aggregation Table.pdf
>
> Currently CarbonData has standard SQL capability on distributed data sets. CarbonData should support pre-aggregating tables for timeseries and improve query performance.
[jira] [Created] (CARBONDATA-1589) Support Desc table and desc formatted table for Pre Aggregate table
kumar vishal created CARBONDATA-1589:

Summary: Support Desc table and desc formatted table for Pre Aggregate table
Key: CARBONDATA-1589
URL: https://issues.apache.org/jira/browse/CARBONDATA-1589
Project: CarbonData
Issue Type: Sub-task
Reporter: kumar vishal

Support Desc table and desc formatted table for Pre Aggregate table
[jira] [Created] (CARBONDATA-1609) Update thrift to support Pre Aggregate support
kumar vishal created CARBONDATA-1609:

Summary: Update thrift to support Pre Aggregate support
Key: CARBONDATA-1609
URL: https://issues.apache.org/jira/browse/CARBONDATA-1609
Project: CarbonData
Issue Type: Sub-task
Reporter: kumar vishal

Update thrift to support Pre Aggregate support
[jira] [Created] (CARBONDATA-1658) Thread Leak Issue in No Sort
kumar vishal created CARBONDATA-1658:

Summary: Thread Leak Issue in No Sort
Key: CARBONDATA-1658
URL: https://issues.apache.org/jira/browse/CARBONDATA-1658
Project: CarbonData
Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal

Threads are not getting closed in case of no sort
[jira] [Assigned] (CARBONDATA-1713) Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1713:

Assignee: kumar vishal

> Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table
> ---
>
> Key: CARBONDATA-1713
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1713
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.3.0
> Environment: ANT Test cluster - 3 node
> Reporter: Ramakrishna S
> Assignee: kumar vishal
> Priority: Blocker
> Labels: sanity
> Fix For: 1.3.0
>
> 0: jdbc:hive2://10.18.98.34:23040> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'lineitem' not found in database 'default'; (state=,code=0)
> 0: jdbc:hive2://10.18.98.34:23040> create table if not exists lineitem(
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPMODE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPINSTRUCT string,
> 0: jdbc:hive2://10.18.98.34:23040> L_RETURNFLAG string,
> 0: jdbc:hive2://10.18.98.34:23040> L_RECEIPTDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_ORDERKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_PARTKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SUPPKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_LINENUMBER int,
> 0: jdbc:hive2://10.18.98.34:23040> L_QUANTITY double,
> 0: jdbc:hive2://10.18.98.34:23040> L_EXTENDEDPRICE double,
> 0: jdbc:hive2://10.18.98.34:23040> L_DISCOUNT double,
> 0: jdbc:hive2://10.18.98.34:23040> L_TAX double,
> 0: jdbc:hive2://10.18.98.34:23040> L_LINESTATUS string,
> 0: jdbc:hive2://10.18.98.34:23040> L_COMMITDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_COMMENT string
> 0: jdbc:hive2://10.18.98.34:23040> ) STORED BY 'org.apache.carbondata.format'
> 0: jdbc:hive2://10.18.98.34:23040> TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.338 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (48.634 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem group by L_RETURNFLAG, L_LINESTATUS;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (16.552 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem group by L_RETURNFLAG, L_LINESTATUS;
> Error: org.apache.spark.sql.AnalysisException: Column doesnot exists in Pre Aggregate table; (state=,code=0)
[jira] [Commented] (CARBONDATA-1713) Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253143#comment-16253143 ]

kumar vishal commented on CARBONDATA-1713:

This is failing because the column names in the select statement are in upper case, so aggregate table selection fails.

> Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table
> ---
>
> Key: CARBONDATA-1713
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1713
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Affects Versions: 1.3.0
> Environment: ANT Test cluster - 3 node
> Reporter: Ramakrishna S
> Assignee: kumar vishal
> Priority: Blocker
> Labels: sanity
> Fix For: 1.3.0
>
> 0: jdbc:hive2://10.18.98.34:23040> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> Error: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'lineitem' not found in database 'default'; (state=,code=0)
> 0: jdbc:hive2://10.18.98.34:23040> create table if not exists lineitem(
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPMODE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SHIPINSTRUCT string,
> 0: jdbc:hive2://10.18.98.34:23040> L_RETURNFLAG string,
> 0: jdbc:hive2://10.18.98.34:23040> L_RECEIPTDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_ORDERKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_PARTKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_SUPPKEY string,
> 0: jdbc:hive2://10.18.98.34:23040> L_LINENUMBER int,
> 0: jdbc:hive2://10.18.98.34:23040> L_QUANTITY double,
> 0: jdbc:hive2://10.18.98.34:23040> L_EXTENDEDPRICE double,
> 0: jdbc:hive2://10.18.98.34:23040> L_DISCOUNT double,
> 0: jdbc:hive2://10.18.98.34:23040> L_TAX double,
> 0: jdbc:hive2://10.18.98.34:23040> L_LINESTATUS string,
> 0: jdbc:hive2://10.18.98.34:23040> L_COMMITDATE string,
> 0: jdbc:hive2://10.18.98.34:23040> L_COMMENT string
> 0: jdbc:hive2://10.18.98.34:23040> ) STORED BY 'org.apache.carbondata.format'
> 0: jdbc:hive2://10.18.98.34:23040> TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.338 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (48.634 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> create datamap agr_lineitem ON TABLE lineitem USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem group by L_RETURNFLAG, L_LINESTATUS;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (16.552 seconds)
> 0: jdbc:hive2://10.18.98.34:23040> select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem group by L_RETURNFLAG, L_LINESTATUS;
> Error: org.apache.spark.sql.AnalysisException: Column doesnot exists in Pre Aggregate table; (state=,code=0)
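The matching failure described in the comment can be sketched as a lookup that normalizes case: the aggregate-table column names are stored in lower case, so comparing them with the user's upper-case SELECT columns needs a case-insensitive comparison. The ColumnMatcher class and its methods are illustrative, not CarbonData APIs.

```java
import java.util.Set;
import java.util.TreeSet;

public class ColumnMatcher {
    // Case-insensitive set of the columns available in the aggregate table
    private final Set<String> aggColumns =
        new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    public ColumnMatcher(Set<String> columns) {
        aggColumns.addAll(columns);
    }

    /** True if every queried column exists in the aggregate table, ignoring case. */
    public boolean covers(Set<String> queriedColumns) {
        for (String c : queriedColumns) {
            if (!aggColumns.contains(c)) {
                return false;
            }
        }
        return true;
    }
}
```

With a plain equals() comparison the query's L_RETURNFLAG would not match the stored l_returnflag and aggregate table selection would fail, exactly as reported.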
[jira] [Commented] (CARBONDATA-1740) Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255622#comment-16255622 ]

kumar vishal commented on CARBONDATA-1740:

This is failing because of the order by in the query; the order by scenario is not handled in the PreAggregate rules.

> Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
> -
>
> Key: CARBONDATA-1740
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1740
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Labels: DFX
> Fix For: 1.3.0
>
> lineitem3: has a pre-aggregate table
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
> Error: org.apache.spark.sql.AnalysisException: expression '`lineitem3_l_returnflag`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
> Project [l_returnflag#2356, l_linestatus#2366, sum(l_quantity)#2791, sum(l_extendedprice)#2792]
> +- Sort [aggOrder#2795 ASC NULLS FIRST, aggOrder#2796 ASC NULLS FIRST], true
>    +- !Aggregate [l_returnflag#2356, l_linestatus#2366], [l_returnflag#2356, l_linestatus#2366, sum(l_quantity#2362) AS sum(l_quantity)#2791, sum(l_extendedprice#2363) AS sum(l_extendedprice)#2792, lineitem3_l_returnflag#2341 AS aggOrder#2795, lineitem3_l_linestatus#2342 AS aggOrder#2796]
>       +- SubqueryAlias lineitem3
>          +- Relation[L_SHIPDATE#2353,L_SHIPMODE#2354,L_SHIPINSTRUCT#2355,L_RETURNFLAG#2356,L_RECEIPTDATE#2357,L_ORDERKEY#2358,L_PARTKEY#2359,L_SUPPKEY#2360,L_LINENUMBER#2361,L_QUANTITY#2362,L_EXTENDEDPRICE#2363,L_DISCOUNT#2364,L_TAX#2365,L_LINESTATUS#2366,L_COMMITDATE#2367,L_COMMENT#2368] CarbonDatasourceHadoopRelation [ Database name :test_db1, Table name :lineitem3, Schema :Some(StructType(StructField(L_SHIPDATE,StringType,true), StructField(L_SHIPMODE,StringType,true), StructField(L_SHIPINSTRUCT,StringType,true), StructField(L_RETURNFLAG,StringType,true), StructField(L_RECEIPTDATE,StringType,true), StructField(L_ORDERKEY,StringType,true), StructField(L_PARTKEY,StringType,true), StructField(L_SUPPKEY,StringType,true), StructField(L_LINENUMBER,IntegerType,true), StructField(L_QUANTITY,DoubleType,true), StructField(L_EXTENDEDPRICE,DoubleType,true), StructField(L_DISCOUNT,DoubleType,true), StructField(L_TAX,DoubleType,true), StructField(L_LINESTATUS,StringType,true), StructField(L_COMMITDATE,StringType,true), StructField(L_COMMENT,StringType,true))) ] (state=,code=0)
>
> lineitem4: no pre-aggregate table created
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | A             | F             | 1.263625E7       | 1.8938515425239815E10  |
> | N             | F             | 327800.0         | 4.91387677622E8        |
> | N             | O             | 2.5398626E7      | 3.810981608977963E10   |
> | R             | F             | 1.2643878E7      | 1.8948524305619884E10  |
> +---------------+---------------+------------------+------------------------+
> *+Expected:+* aggregate query with order by should run fine
> *+Actual:+* aggregate query with order by failed
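A toy illustration of the gap the comment describes: when the rewrite rule redirects an aggregate query to the pre-aggregate table, the ORDER BY columns must also be remapped to the pre-aggregate table's column names (e.g. l_returnflag to lineitem3_l_returnflag); otherwise the Sort node references an attribute the rewritten Aggregate no longer produces, which is the AnalysisException seen in the plan. The helper below is a simplified sketch over plain strings, not the actual Catalyst rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SortRewrite {
    /**
     * Remap each ORDER BY column through the main-table -> pre-aggregate-table
     * name mapping; columns without a mapping are kept as-is.
     */
    static List<String> remapSortColumns(List<String> sortCols, Map<String, String> mapping) {
        List<String> out = new ArrayList<>();
        for (String c : sortCols) {
            out.add(mapping.getOrDefault(c.toLowerCase(Locale.ROOT), c));
        }
        return out;
    }
}
```

The real fix lives in Spark Catalyst plan rewriting, where the same remapping is applied to the Sort operator's expressions rather than to bare strings.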
[jira] [Assigned] (CARBONDATA-1740) Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal reassigned CARBONDATA-1740:

Assignee: kumar vishal

> Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
> -
>
> Key: CARBONDATA-1740
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1740
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Assignee: kumar vishal
> Labels: DFX
> Fix For: 1.3.0
>
> lineitem3: has a pre-aggregate table
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
> Error: org.apache.spark.sql.AnalysisException: expression '`lineitem3_l_returnflag`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
> Project [l_returnflag#2356, l_linestatus#2366, sum(l_quantity)#2791, sum(l_extendedprice)#2792]
> +- Sort [aggOrder#2795 ASC NULLS FIRST, aggOrder#2796 ASC NULLS FIRST], true
>    +- !Aggregate [l_returnflag#2356, l_linestatus#2366], [l_returnflag#2356, l_linestatus#2366, sum(l_quantity#2362) AS sum(l_quantity)#2791, sum(l_extendedprice#2363) AS sum(l_extendedprice)#2792, lineitem3_l_returnflag#2341 AS aggOrder#2795, lineitem3_l_linestatus#2342 AS aggOrder#2796]
>       +- SubqueryAlias lineitem3
>          +- Relation[L_SHIPDATE#2353,L_SHIPMODE#2354,L_SHIPINSTRUCT#2355,L_RETURNFLAG#2356,L_RECEIPTDATE#2357,L_ORDERKEY#2358,L_PARTKEY#2359,L_SUPPKEY#2360,L_LINENUMBER#2361,L_QUANTITY#2362,L_EXTENDEDPRICE#2363,L_DISCOUNT#2364,L_TAX#2365,L_LINESTATUS#2366,L_COMMITDATE#2367,L_COMMENT#2368] CarbonDatasourceHadoopRelation [ Database name :test_db1, Table name :lineitem3, Schema :Some(StructType(StructField(L_SHIPDATE,StringType,true), StructField(L_SHIPMODE,StringType,true), StructField(L_SHIPINSTRUCT,StringType,true), StructField(L_RETURNFLAG,StringType,true), StructField(L_RECEIPTDATE,StringType,true), StructField(L_ORDERKEY,StringType,true), StructField(L_PARTKEY,StringType,true), StructField(L_SUPPKEY,StringType,true), StructField(L_LINENUMBER,IntegerType,true), StructField(L_QUANTITY,DoubleType,true), StructField(L_EXTENDEDPRICE,DoubleType,true), StructField(L_DISCOUNT,DoubleType,true), StructField(L_TAX,DoubleType,true), StructField(L_LINESTATUS,StringType,true), StructField(L_COMMITDATE,StringType,true), StructField(L_COMMENT,StringType,true))) ] (state=,code=0)
>
> lineitem4: no pre-aggregate table created
> select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
> +---------------+---------------+------------------+------------------------+
> | l_returnflag  | l_linestatus  | sum(l_quantity)  | sum(l_extendedprice)   |
> +---------------+---------------+------------------+------------------------+
> | A             | F             | 1.263625E7       | 1.8938515425239815E10  |
> | N             | F             | 327800.0         | 4.91387677622E8        |
> | N             | O             | 2.5398626E7      | 3.810981608977963E10   |
> | R             | F             | 1.2643878E7      | 1.8948524305619884E10  |
> +---------------+---------------+------------------+------------------------+
> *+Expected:+* aggregate query with order by should run fine
> *+Actual:+* aggregate query with order by failed
[jira] [Comment Edited] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ]

kumar vishal edited comment on CARBONDATA-1777 at 11/20/17 8:59 AM:

[~Ram@huawei] Please check the executor log; there you will find the detail "Query will be executed on table:". You can also check in the query plan which table is being hit to execute the query.

was (Author: kumarvishal09):
[~Ram@huawei] Please check the executor log; there you will find the detail "Query will be executed on table:".

> Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
> -
>
> Key: CARBONDATA-1777
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1777
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Assignee: Kunal Kapoor
> Labels: DFX
> Fix For: 1.3.0
>
> Steps:
> Beeline:
> 1. Create table and load with data
> Spark-shell:
> 1. Create a pre-aggregate table
> Beeline:
> 1. Run aggregate query
> *+Expected:+* Pre-aggregate table should be used in the aggregate query
> *+Actual:+* Pre-aggregate table is not used
> 1.
> create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> 2.
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show();
> 3.
> select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus;
> Actual:
> 0: jdbc:hive2://10.18.98.136:23040> show tables;
> +-----------+---------------------------+--------------+
> | database  | tableName                 | isTemporary  |
> +-----------+---------------------------+--------------+
> | test_db2  | lineitem1                 | false        |
> | test_db2  | lineitem1_agr1_lineitem1  | false        |
> +-----------+---------------------------+--------------+
> 2 rows selected (0.047 seconds)
> Logs:
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371)
> 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589)
> 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, initialize called | org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289)
> 2017-11-20 15:46:48,
[jira] [Commented] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258966#comment-16258966 ]

kumar vishal commented on CARBONDATA-1777:

[~Ram@huawei] Please check the executor log; there you will find the detail "Query will be executed on table:".

> Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
> -
>
> Key: CARBONDATA-1777
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1777
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.0
> Environment: Test - 3 node ant cluster
> Reporter: Ramakrishna S
> Assignee: Kunal Kapoor
> Labels: DFX
> Fix For: 1.3.0
>
> Steps:
> Beeline:
> 1. Create table and load with data
> Spark-shell:
> 1. Create a pre-aggregate table
> Beeline:
> 1. Run aggregate query
> *+Expected:+* Pre-aggregate table should be used in the aggregate query
> *+Actual:+* Pre-aggregate table is not used
> 1.
> create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
> load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
> 2.
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 group by l_returnflag, l_linestatus").show();
> 3.
> select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus;
> Actual:
> 0: jdbc:hive2://10.18.98.136:23040> show tables;
> +-----------+---------------------------+--------------+
> | database  | tableName                 | isTemporary  |
> +-----------+---------------------------+--------------+
> | test_db2  | lineitem1                 | false        |
> | test_db2  | lineitem1_agr1_lineitem1  | false        |
> +-----------+---------------------------+--------------+
> 2 rows selected (0.047 seconds)
> Logs:
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: select l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746)
> 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1 | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371)
> 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589)
> 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, initialize called | org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289)
> 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77)
> 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using di
[jira] [Assigned] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1777: Assignee: kumar vishal (was: Kunal Kapoor) > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: kumar vishal >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. 
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, > underlying DB is MYSQL | > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.
[jira] [Assigned] (CARBONDATA-1760) Carbon 1.3.0- Pre_aggregate: Proper Error message should be displayed, when parent table name is not correct while creating datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1760: Assignee: Kunal Kapoor > Carbon 1.3.0- Pre_aggregate: Proper Error message should be displayed, when > parent table name is not correct while creating datamap. > > > Key: CARBONDATA-1760 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1760 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 1.3.0 >Reporter: Ayushi Sharma >Assignee: Kunal Kapoor >Priority: Minor > Labels: dfx > > Steps: > 1. CREATE DATAMAP tt3 ON TABLE cust_2 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" AS SELECT c_custkey, > c_name, sum(c_acctbal), avg(c_acctbal), count(c_acctbal) FROM tstcust GROUP > BY c_custkey, c_name; > Issue: > Proper error message is not displayed. It throws "assertion failed" error. > Expected: > Proper error message should be displayed, if parent table name has any > ambiguity. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1737) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate table loads partially when segment filter is set on the main table
[ https://issues.apache.org/jira/browse/CARBONDATA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1737: Assignee: Kunal Kapoor > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate table loads partially when > segment filter is set on the main table > - > > Key: CARBONDATA-1737 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1737 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > > 1. Create a table > create table if not exists lineitem2(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Load 2 times to create 2 segments > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem2 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 3. 
Check the table content without setting any filter: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem2 group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 327800.0 | 4.91387677624E8| > | A | F | 1.263625E7 | 1.893851542524009E10 | > | N | O | 2.5398626E7 | 3.810981608977967E10 | > | R | F | 1.2643878E7 | 1.8948524305619976E10 | > +---+---+--++--+ > 4. Set segment filter on the main table: > set carbon.input.segments.test_db1.lineitem2=1; > +---++--+ > |key| value | > +---++--+ > | carbon.input.segments.test_db1.lineitem2 | 1 | > +---++--+ > 5. Create pre-aggregate table > create datamap agr_lineitem2 ON TABLE lineitem2 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem2 > group by L_RETURNFLAG, L_LINESTATUS; > 6. Check table content: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem2 group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 163900.0 | 2.456938388124E8 | > | A | F | 6318125.0| 9.469257712620043E9| > | N | O | 1.2699313E7 | 1.9054908044889835E10 | > | R | F | 6321939.0| 9.474262152809986E9| > +---+---+--++--+ > 7. remove the filter on segment > 0: jdbc:hive2://10.18.98.48:23040> reset; > 8. Check the table conent: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem2 group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 163900.0 | 2.456938388124E8 | > | A | F | 6318125.0| 9.469257712620043E9| > | N | O | 1.2699313E7 | 1.9054908044889835E10 | > | R | F | 6321939.0| 9.474262152809986E9| > +---+---+
[jira] [Assigned] (CARBONDATA-1736) Carbon1.3.0-Pre-AggregateTable -Query from segment set is not effective when pre-aggregate table is present
[ https://issues.apache.org/jira/browse/CARBONDATA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1736: Assignee: kumar vishal > Carbon1.3.0-Pre-AggregateTable -Query from segment set is not effective when > pre-aggregate table is present > - > > Key: CARBONDATA-1736 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1736 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: kumar vishal > Labels: DFX > Fix For: 1.3.0 > > > 1. Create a table > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Run load : > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 3. create pre-agg table > create datamap agr_lineitem3 ON TABLE lineitem3 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 4913382.0| 7.369901176949993E9| > | A | F | 1.88818373E8 | 2.8310705145736383E11 | > | N | O | 3.82400594E8 | 5.734650756707479E11 | > | R | F | 1.88960009E8 | 2.833523780876951E11 | > +---+---+--++--+ > 4 rows selected (1.568 seconds) > 4. Load one more time: > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 5. Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 9826764.0| 1.4739802353899986E10 | > | A | F | 3.77636746E8 | 5.662141029147278E11 | > | N | O | 7.64801188E8 | 1.1469301513414958E12 | > | R | F | 3.77920018E8 | 5.667047561753901E11 | > +---+---+--++--+ > 6. Set query from segment 1: > 0: jdbc:hive2://10.18.98.48:23040> set > carbon.input.segments.test_db1.lilneitem1=1; > +++--+ > |key | value | > +++--+ > | carbon.input.segments.test_db1.lilneitem1 | 1 | > +++--+ > 7. Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > *+Expected+*: It should return the values from segment 1 alone. > *+Actual :+* : It returns values from both segments > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice)
[jira] [Assigned] (CARBONDATA-1518) 2. Support creating timeseries while creating main table.
[ https://issues.apache.org/jira/browse/CARBONDATA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1518: Assignee: kumar vishal > 2. Support creating timeseries while creating main table. > - > > Key: CARBONDATA-1518 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1518 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: kumar vishal > > User can give timeseries option while creating the main table itself and > carbon will create aggregate tables automatically. > {code} > CREATE TABLE agg_sales > STORED BY 'carbondata' > TBLPROPERTIES ('parent_table'='sales', ‘timeseries_column’=’order_time’, > ‘granualarity’=’hour’, ‘rollup’ =’quantity:sum, max # user_id: count # price: > sum, max, min, avg’) > {code} > In the above case, user choose timeseries_column, granularity and aggregation > types for measures, so carbon generates the aggregation tables automatically > for year, month, day and hour level aggregation tables (totally 4 tables, > their table name will be prefixed with agg_sales). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1519) 3. Create UDF for timestamp to extract year,month,day,hour and minute from timestamp and date
[ https://issues.apache.org/jira/browse/CARBONDATA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1519: Assignee: kumar vishal > 3. Create UDF for timestamp to extract year,month,day,hour and minute from > timestamp and date > - > > Key: CARBONDATA-1519 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1519 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: kumar vishal > > Create UDF for timestamp to extract year,month,day,hour and minute from > timestamp and date -- This message was sent by Atlassian JIRA (v6.4.14#64029)
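[Editor's note] CARBONDATA-1519 asks for UDFs extracting year, month, day, hour, and minute; the issue does not name the UDFs. As a hedged illustration of the intended behaviour only, the same extraction can already be expressed with Spark SQL's standard datetime functions (the `sales`/`order_time` names are taken from the CARBONDATA-1518 example above):

```sql
-- Illustrative only: standard Spark SQL extraction functions, not the
-- CarbonData UDFs proposed in this issue (their names are unspecified here).
SELECT year(order_time)       AS yr,
       month(order_time)      AS mth,
       dayofmonth(order_time) AS dy,
       hour(order_time)       AS hr,
       minute(order_time)     AS mn
FROM sales;
```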
[jira] [Assigned] (CARBONDATA-1526) 10. Handle compaction in aggregation tables.
[ https://issues.apache.org/jira/browse/CARBONDATA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1526: Assignee: Kunal Kapoor > 10. Handle compaction in aggregation tables. > > > Key: CARBONDATA-1526 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1526 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Kunal Kapoor > > User can trigger compaction on pre-aggregate table directly, it will further > merge the segments inside pre-aggregation table. To do that, use ALTER TABLE > COMPACT command on the pre-aggregate table just like the main table. > For implementation, there are two kinds of implementation for compaction. > 1. Mergable pre-aggregate tables: if aggregate functions are count, max, min, > sum, avg, the pre-aggregate table segments can be merged directly without > re-computing it. > 2. Non-mergable pre-aggregate tables: if aggregate function include > distinct_count, it needs to re-compute when doing compaction on pre-aggregate > table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
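[Editor's note] The ALTER TABLE COMPACT command described in CARBONDATA-1526 would be issued against the pre-aggregate table just like a main table; a minimal sketch (the table name `agg_sales_hour` is hypothetical):

```sql
-- Trigger compaction directly on a pre-aggregate table, same syntax as for
-- a main table. 'MINOR' merges recent small segments; 'MAJOR' merges more
-- aggressively by size. Table name is illustrative.
ALTER TABLE agg_sales_hour COMPACT 'MINOR';
```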
[jira] [Reopened] (CARBONDATA-1841) Data is not being loaded into pre-aggregation table after creation
[ https://issues.apache.org/jira/browse/CARBONDATA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reopened CARBONDATA-1841: -- > Data is not being loaded into pre-aggregation table after creation > -- > > Key: CARBONDATA-1841 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1841 > Project: CarbonData > Issue Type: Bug >Reporter: Kunal Kapoor >Assignee: Kunal Kapoor > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1877) Rollup in query on timeseries table is not working
kumar vishal created CARBONDATA-1877: Summary: Rollup in query on timeseries table is not working Key: CARBONDATA-1877 URL: https://issues.apache.org/jira/browse/CARBONDATA-1877 Project: CarbonData Issue Type: Bug Reporter: kumar vishal *Problem:* When an hour-level timeseries table is present and the user fires a year-level query, the query hits the main table instead of rolling up from the hour-level table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1877) Rollup on query in case of timeseries table is not working
[ https://issues.apache.org/jira/browse/CARBONDATA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-1877: - Summary: Rollup on query in case of timeseries table is not working (was: Rollup in query on timeseries table is not working) > Rollup on query in case of timeseries table is not working > -- > > Key: CARBONDATA-1877 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1877 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal > > *Problem: *When hour level timeseries table is present and user is firing > query for year level it is hitting the maintable . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1877) Rollup in query on timeseries table is not working
[ https://issues.apache.org/jira/browse/CARBONDATA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1877: Assignee: kumar vishal > Rollup in query on timeseries table is not working > -- > > Key: CARBONDATA-1877 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1877 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal > > *Problem: *When hour level timeseries table is present and user is firing > query for year level it is hitting the maintable . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1516) Support pre-aggregate tables and timeseries in carbondata
[ https://issues.apache.org/jira/browse/CARBONDATA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-1516: - Attachment: CarbonData Pre-aggregation Table_v1.2.pdf > Support pre-aggregate tables and timeseries in carbondata > - > > Key: CARBONDATA-1516 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1516 > Project: CarbonData > Issue Type: New Feature >Reporter: Ravindra Pesala > Attachments: CarbonData Pre-aggregation Table.pdf, CarbonData > Pre-aggregation Table_v1.1.pdf, CarbonData Pre-aggregation Table_v1.2.pdf > > > Currently Carbondata has standard SQL capability on distributed data > sets.Carbondata should support pre-aggregating tables for timeseries and > improve query performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1888) Compaction is failing in case of timeseries
kumar vishal created CARBONDATA-1888: Summary: Compaction is failing in case of timeseries Key: CARBONDATA-1888 URL: https://issues.apache.org/jira/browse/CARBONDATA-1888 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Compaction is failing in case of timeseries -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1881) insert overwrite not working properly for pre-aggregate tables
[ https://issues.apache.org/jira/browse/CARBONDATA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1881. -- Resolution: Fixed Fix Version/s: 1.3.0 > insert overwrite not working properly for pre-aggregate tables > -- > > Key: CARBONDATA-1881 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1881 > Project: CarbonData > Issue Type: Bug >Reporter: Kunal Kapoor >Assignee: Kunal Kapoor >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > when insert overwrite if fired on the main table then the pre-aggregate > tables are not overwritten with the new values instead the values are > appended to the table like a normal insert -- This message was sent by Atlassian JIRA (v6.4.14#64029)
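[Editor's note] A sketch of the CARBONDATA-1881 scenario, assuming illustrative table names: after the fix, overwriting the main table should also overwrite (not append to) its pre-aggregate tables.

```sql
-- Before the fix: this overwrote maintable but APPENDED to its
-- pre-aggregate tables; after the fix both are overwritten.
-- 'maintable' and 'staging_table' are hypothetical names.
INSERT OVERWRITE TABLE maintable SELECT * FROM staging_table;
```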
[jira] [Resolved] (CARBONDATA-1891) None.get when creating timeseries table after loading data into main table
[ https://issues.apache.org/jira/browse/CARBONDATA-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1891. -- Resolution: Fixed Fix Version/s: 1.3.0 > None.get when creating timeseries table after loading data into main table > -- > > Key: CARBONDATA-1891 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1891 > Project: CarbonData > Issue Type: Bug >Reporter: Kunal Kapoor >Assignee: Kunal Kapoor >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > *Steps to reproduce* > 1. CREATE TABLE mainTable(mytime timestamp, name string, age int) STORED BY > 'org.apache.carbondata.format' > 2. LOAD DATA LOCAL INPATH 'timeseriestest.csv' into table mainTable > 3. create datamap agg0 on table mainTable using 'preaggregate' DMPROPERTIES > ('timeseries.eventTime'='mytime', > 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1') as > select mytime, sum(age) from mainTable group by mytime -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1887) block pruning not happening in carbon for ShortType and SmallIntType columns
[ https://issues.apache.org/jira/browse/CARBONDATA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1887. -- Resolution: Fixed Fix Version/s: 1.3.0 > block pruning not happening is carbon for ShortType and SmallIntType columns > > > Key: CARBONDATA-1887 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1887 > Project: CarbonData > Issue Type: Bug >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > Fix For: 1.3.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > spark.sql( > s""" > | create table test_numeric_type(c1 int, c2 long, c3 smallint, c4 > bigint, c5 short) stored by 'carbondata' >""".stripMargin).show() > spark.sql( > s""" > | insert into test_numeric_type select > 1,111,111,11,'2019-01-03 12:12:12' >""".stripMargin).show() > spark.sql( > s""" > | insert into test_numeric_type select > 2,222,222,22,'2020-01-03 12:12:12' >""".stripMargin).show() > spark.sql( > s""" > | insert into test_numeric_type select > 3,333,333,33,'2021-01-03 12:12:12' >""".stripMargin).show() > spark.sql( > s""" > | select * from test_numeric_type where c5> >""".stripMargin).show() > Only two blocks should be selected but all blocks are selected during query > execution. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1898) Like, Contains, Ends With query optimization in case of or filter
kumar vishal created CARBONDATA-1898: Summary: Like, Contains, Ends With query optimization in case of or filter Key: CARBONDATA-1898 URL: https://issues.apache.org/jira/browse/CARBONDATA-1898 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal *Problem:* Queries that combine LIKE, CONTAINS, and ENDS WITH filters through OR conditions take more time in Carbon. *Solution:* For this type of query, avoid filter push-down and let Spark handle those filters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
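[Editor's note] The affected CARBONDATA-1898 pattern can be sketched against the `lineitem1` table used elsewhere in this digest; with the proposed change, these predicates are not pushed down to Carbon and are instead evaluated by Spark:

```sql
-- Multiple substring filters joined by OR: the case this issue optimizes.
SELECT * FROM lineitem1
WHERE l_comment LIKE '%quick%'    -- contains
   OR l_comment LIKE 'fast%'      -- starts with
   OR l_comment LIKE '%deliver';  -- ends with
```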
[jira] [Created] (CARBONDATA-1901) Fix Pre aggregate data map creation and query parsing issue
kumar vishal created CARBONDATA-1901: Summary: Fix Pre aggregate data map creation and query parsing issue Key: CARBONDATA-1901 URL: https://issues.apache.org/jira/browse/CARBONDATA-1901 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal *Problem:* The following issues occur with pre-aggregate data maps: 1. The pre-aggregate data map table's column order does not match the query given by the user, so data is loaded into the wrong columns. 2. When an aggregate function contains an expression, the query fails with a match error. 3. The encoders of the pre-aggregate data map columns and the parent table's columns do not match. *Solution:* 1. Do not consider group-by columns in the pre-aggregate columns. 2. When an aggregate function contains an expression, hit the main table. 3. Get the encoder from the main table and add it to the pre-aggregate table column; when the aggregation type is sum or avg, create a measure column. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
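[Editor's note] Issue 2 in CARBONDATA-1901 (aggregate over an expression) can be illustrated with the `lineitem1` columns used earlier in this digest; such a query cannot be answered from a pre-aggregate table built on bare-column aggregates, so per the fix it should hit the main table:

```sql
-- sum() over an EXPRESSION, not a bare column: previously failed with a
-- match error; with the fix it is answered from the main table.
SELECT l_returnflag, sum(l_quantity * (1 - l_discount))
FROM lineitem1
GROUP BY l_returnflag;
```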
[jira] [Assigned] (CARBONDATA-1901) Fixed Pre aggregate data map creation and query parsing
[ https://issues.apache.org/jira/browse/CARBONDATA-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1901: Assignee: kumar vishal > Fixed Pre aggregate data map creation and query parsing > --- > > Key: CARBONDATA-1901 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1901 > Project: CarbonData > Issue Type: Improvement >Reporter: kumar vishal >Assignee: kumar vishal > Time Spent: 10m > Remaining Estimate: 0h > > *Problem:*Fixed below issues in case of pre aggregate > 1. Pre aggregate data map table column order is not as per query given by > user because of which while data is loaded to wrong column > 2. when aggregate function contains any expression query is failing with > match error > 3. pre aggregate data map columns and parent tables columns encoder is not > matching > *Solution:* > 1. Do not consider group columns in pre aggregate > 2. when aggregate function contains any expression hit the maintable > 3. Get encoder from main table and add in pre aggregate table column > When aggregation type is sum or avg create measure column -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1901) Fixed Pre aggregate data map creation and query parsing
[ https://issues.apache.org/jira/browse/CARBONDATA-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-1901: - Summary: Fixed Pre aggregate data map creation and query parsing (was: Fix Pre aggregate data map creation and query parsing issue) > Fixed Pre aggregate data map creation and query parsing > --- > > Key: CARBONDATA-1901 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1901 > Project: CarbonData > Issue Type: Improvement >Reporter: kumar vishal > Time Spent: 10m > Remaining Estimate: 0h > > *Problem:*Fixed below issues in case of pre aggregate > 1. Pre aggregate data map table column order is not as per query given by > user because of which while data is loaded to wrong column > 2. when aggregate function contains any expression query is failing with > match error > 3. pre aggregate data map columns and parent tables columns encoder is not > matching > *Solution:* > 1. Do not consider group columns in pre aggregate > 2. when aggregate function contains any expression hit the maintable > 3. Get encoder from main table and add in pre aggregate table column > When aggregation type is sum or avg create measure column -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1743) Carbon1.3.0-Pre-AggregateTable - Query returns no value if run at the time of pre-aggregate table creation
[ https://issues.apache.org/jira/browse/CARBONDATA-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1743. -- Resolution: Fixed > Carbon1.3.0-Pre-AggregateTable - Query returns no value if run at the time of > pre-aggregate table creation > -- > > Key: CARBONDATA-1743 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1743 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor > Labels: DFX > Fix For: 1.3.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Steps: > 1. Create table and load with large data > create table if not exists lineitem4(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem4 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. Create a pre-aggregate table > create datamap agr_lineitem4 ON TABLE lineitem4 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem4 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Run aggregate query at the same time > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem4 group by l_returnflag, l_linestatus; > *+Expected:+*: aggregate query should fetch data either from main table or > pre-aggregate table. > *+Actual:+* aggregate query does not return data until the pre-aggregate > table is created > 0: jdbc:hive2://10.18.98.48:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (1.74 seconds) > 0: jdbc:hive2://10.18.98.48:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (0.746 seconds) > 0: jdbc:hive2://10.18.98.48:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 > group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 2.9808092E7 | 4.471079473931997E10 | > | A | F | 1.145546488E9| 1.717580824169429E12 | > | N | O | 2.31980219E9 | 3.4789002701143467E12 | > | R | F | 1.146403932E9| 1.7190627928317903E12 | > +---+---+--++--+ > 4 rows selected (0.8 seconds) > 0: jdbc:hive2://10.18.98.48:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem4 > group by l_returnflag, l_linestatus; > +---+---+--++--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--++--+ > | N | F | 2.9808092E7 | 4.471079473931997E10 | > | A | F | 1.145546488E9| 1.717580824169429E12 | > | N
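The expected behavior quoted above — fall back to the main table while the pre-aggregate table is still being created — can be sketched as follows. This is a hypothetical illustration of the routing decision, not CarbonData's actual planner code; the function and state names are invented for the example.

```python
# Hypothetical sketch: route an aggregate query to a pre-aggregate table only
# when that table has at least one fully loaded segment; otherwise fall back
# to the main table so the query never silently returns zero rows.

def choose_table(main_table, pre_agg_table, segment_status):
    """segment_status maps table name -> list of segment states."""
    states = segment_status.get(pre_agg_table, [])
    # A creation still in progress shows up as missing or IN_PROGRESS segments.
    if states and all(s == "SUCCESS" for s in states):
        return pre_agg_table
    return main_table

# While the datamap is still loading, queries should hit the main table.
print(choose_table("lineitem4", "lineitem4_agr", {"lineitem4_agr": ["IN_PROGRESS"]}))  # lineitem4
# Once the load has succeeded, the rewritten plan may use the pre-aggregate table.
print(choose_table("lineitem4", "lineitem4_agr", {"lineitem4_agr": ["SUCCESS"]}))      # lineitem4_agr
```

The bug report amounts to the first branch being missed: during creation the query was routed to the (still empty) aggregate table instead of the main table.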
[jira] [Resolved] (CARBONDATA-1907) Avoid unnecessary logging to improve query performance for no dictionary non string columns
[ https://issues.apache.org/jira/browse/CARBONDATA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1907. -- Resolution: Fixed Fix Version/s: 1.3.0 > Avoid unnecessary logging to improve query performance for no dictionary non > string columns > --- > > Key: CARBONDATA-1907 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1907 > Project: CarbonData > Issue Type: Bug >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > For no-dictionary columns of non-string data types, an exception is thrown > while parsing empty data. The resulting excessive logging degrades query > performance. > Log message printed: > "Problem while converting data type" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1777) Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
[ https://issues.apache.org/jira/browse/CARBONDATA-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1777: Assignee: Kunal Kapoor (was: kumar vishal) > Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell > sessions are not used in the beeline session > - > > Key: CARBONDATA-1777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1777 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Kunal Kapoor >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create table and load with data > Spark-shell: > 1. create a pre-aggregate table > Beeline: > 1. Run aggregate query > *+Expected:+* Pre-aggregate table should be used in the aggregate query > *+Actual:+* Pre-aggregate table is not used > 1. > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.5" into table > lineitem1 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 2. 
> carbon.sql("create datamap agr1_lineitem1 ON TABLE lineitem1 USING > 'org.apache.carbondata.datamap.AggregateDataMapHandler' as select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 group by l_returnflag, l_linestatus").show(); > 3. > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus; > Actual: > 0: jdbc:hive2://10.18.98.136:23040> show tables; > +---+---+--+--+ > | database | tableName | isTemporary | > +---+---+--+--+ > | test_db2 | lineitem1 | false| > | test_db2 | lineitem1_agr1_lineitem1 | false| > +---+---+--+--+ > 2 rows selected (0.047 seconds) > Logs: > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Running query 'select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus' > with 7f3091a8-4d7b-40ac-840f-9db6f564c9cf | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,314 | INFO | [pool-23-thread-53] | Parsing command: > select > l_returnflag,l_linestatus,sum(l_quantity),avg(l_quantity),count(l_quantity) > from lineitem1 where l_returnflag = 'R' group by l_returnflag, l_linestatus | > org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | 55: get_table : > db=test_db2 tbl=lineitem1 | > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:746) > 2017-11-20 15:46:48,353 | INFO | [pool-23-thread-53] | ugi=anonymous > ip=unknown-ip-addr cmd=get_table : db=test_db2 tbl=lineitem1| > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:371) > 2017-11-20 15:46:48,354 | INFO | [pool-23-thread-53] | 55: Opening raw store > with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore | > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:589) > 2017-11-20 15:46:48,355 | INFO | [pool-23-thread-53] | ObjectStore, > initialize called | > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:289) > 2017-11-20 15:46:48,360 | INFO | [pool-23-thread-53] | Reading in results > for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection > used is closing | org.datanucleus.util.Log4JLogger.info(Log4JLogger.java:77) > 2017-11-20 15:46:48,362 | INFO | [pool-23-thread-53] | Using direct SQL, > underlying DB is MYSQL | > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.
[jira] [Resolved] (CARBONDATA-1913) Global Sort data load fails for big data with RPC timeout exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1913. -- Resolution: Fixed Fix Version/s: 1.3.0 > Global Sort data load fails for big data with RPC timeout exception > --- > > Key: CARBONDATA-1913 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1913 > Project: CarbonData > Issue Type: Bug >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > Fix For: 1.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When the global sort option is used for big data, the load sometimes fails with > an RPC timeout after 120s. > This happens because the driver is not able to unpersist the RDD cache within > 120s: RDD unpersist is a blocking call, and Spark is sometimes unable > to unpersist the RDD within the default "spark.rpc.askTimeout" or > "spark.network.timeout" time.
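Why a blocking cleanup call trips an ask timeout can be shown with a small Python sketch. This is an analogy, not Spark code: the thread pool stands in for executors freeing cached blocks, and the short timeout stands in for `spark.rpc.askTimeout`; the fix described in the report corresponds to the non-blocking path that returns without waiting for acknowledgement.

```python
# Analogy for rdd.unpersist(blocking = true) exceeding the RPC ask timeout:
# the blocking variant waits for all "executors" to finish freeing blocks,
# so a slow cleanup surfaces as a timeout in the driver.

import concurrent.futures
import time

def unpersist(blocking, work_seconds=0.2, timeout=0.05):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = pool.submit(time.sleep, work_seconds)  # stands in for block cleanup
    if not blocking:
        return "returned immediately"            # non-blocking: fire and forget
    try:
        fut.result(timeout=timeout)              # blocking: wait, may time out
        return "completed"
    except concurrent.futures.TimeoutError:
        return "timed out"

print(unpersist(blocking=True))   # cleanup (0.2s) outlives the timeout (0.05s)
print(unpersist(blocking=False))  # driver is not held up by cleanup
```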
[jira] [Resolved] (CARBONDATA-1714) Carbon1.3.0-Alter Table - Select columns with is null and limit throws ArrayIndexOutOfBoundsException after multiple alter
[ https://issues.apache.org/jira/browse/CARBONDATA-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1714. -- Resolution: Fixed Fix Version/s: 1.3.0 > Carbon1.3.0-Alter Table - Select columns with is null and limit throws > ArrayIndexOutOfBoundsException after multiple alter > -- > > Key: CARBONDATA-1714 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1714 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 > Environment: 3 node ant cluster- SUSE 11 SP4 >Reporter: Chetan Bhat >Assignee: Jatin > Labels: DFX > Fix For: 1.3.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Steps - > Execute the below queries in sequence. > create database test; > use test; > CREATE TABLE uniqdata111785 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='INTEGER_COLUMN1,CUST_ID'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table > uniqdata111785 OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > alter table test.uniqdata111785 RENAME TO uniqdata1117856; > select * from test.uniqdata1117856 limit 100; > ALTER TABLE test.uniqdata1117856 ADD COLUMNS (cust_name1 int); > select * from test.uniqdata1117856 where cust_name1 is null limit 100; > ALTER TABLE test.uniqdata1117856 DROP COLUMNS (cust_name1); > select * from test.uniqdata1117856 where cust_name1 is null limit 100; > ALTER TABLE test.uniqdata1117856 CHANGE CUST_ID CUST_ID BIGINT; > select * from test.uniqdata1117856 where 
CUST_ID in (10013,10011,1,10019) > limit 10; > ALTER TABLE test.uniqdata1117856 ADD COLUMNS (a1 INT, b1 STRING) > TBLPROPERTIES('DICTIONARY_EXCLUDE'='b1'); > select a1,b1 from test.uniqdata1117856 where a1 is null and b1 is null limit > 100; > Actual Issue : Select columns with is null and limit throws > ArrayIndexOutOfBoundsException after multiple alter operations. > 0: jdbc:hive2://10.18.98.34:23040> select a1,b1 from test.uniqdata1117856 > where a1 is null and b1 is null limit 100; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 9.0 (TID 14, BLR114269, executor 2): > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.carbondata.core.scan.model.QueryModel.setDimAndMsrColumnNode(QueryModel.java:223) > at > org.apache.carbondata.core.scan.model.QueryModel.processFilterExpression(QueryModel.java:172) > at > org.apache.carbondata.core.scan.model.QueryModel.processFilterExpression(QueryModel.java:181) > at > org.apache.carbondata.hadoop.util.CarbonInputFormatUtil.processFilterExpression(CarbonInputFormatUtil.java:118) > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:791) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:250) > at > org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) > Expected : The select query should be successful after m
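The `ArrayIndexOutOfBoundsException: 7` in the trace above comes from resolving a filter column inside `QueryModel.setDimAndMsrColumnNode` after the schema has been altered several times. A minimal, hypothetical Python illustration of the failure mode (the schemas and ordinals are invented to mirror the reproduction steps, not taken from CarbonData internals):

```python
# After ADD/DROP COLUMN operations, a filter that still carries the ordinal a
# column had in an older schema overflows the current (shorter) column array --
# Python's IndexError is the analogue of the AIOOBE: 7 in the stack trace.

old_schema = ["CUST_ID", "CUST_NAME", "ACTIVE_EMUI_VERSION", "DOB", "DOJ",
              "BIGINT_COLUMN1", "DECIMAL_COLUMN1", "cust_name1"]   # ordinal 7
new_schema = old_schema[:-1]                                       # column dropped

stale_ordinal = 7
try:
    new_schema[stale_ordinal]
    outcome = "ok"
except IndexError:
    outcome = "index out of bounds"

def resolve(schema, name):
    """Name-based lookup survives schema evolution: a dropped column resolves
    to None (treated as null) instead of an out-of-range ordinal."""
    return schema.index(name) if name in schema else None

print(outcome, resolve(new_schema, "cust_name1"), resolve(new_schema, "DOB"))
```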
[jira] [Created] (CARBONDATA-1925) Support expression inside aggregate expression in create and load data on Pre aggregate table
kumar vishal created CARBONDATA-1925: Summary: Support expression inside aggregate expression in create and load data on Pre aggregate table Key: CARBONDATA-1925 URL: https://issues.apache.org/jira/browse/CARBONDATA-1925 Project: CarbonData Issue Type: Sub-task Reporter: kumar vishal Assignee: kumar vishal Support expression inside aggregate expression in create and load data on Pre aggregate table
[jira] [Created] (CARBONDATA-1926) Support expression inside aggregate expression during query on Pre Aggregate table
kumar vishal created CARBONDATA-1926: Summary: Support expression inside aggregate expression during query on Pre Aggregate table Key: CARBONDATA-1926 URL: https://issues.apache.org/jira/browse/CARBONDATA-1926 Project: CarbonData Issue Type: Sub-task Reporter: kumar vishal Assignee: kumar vishal Support expression inside aggregate expression during query on Pre Aggregate table
[jira] [Created] (CARBONDATA-1927) Support sub query on Pre Aggregate table
kumar vishal created CARBONDATA-1927: Summary: Support sub query on Pre Aggregate table Key: CARBONDATA-1927 URL: https://issues.apache.org/jira/browse/CARBONDATA-1927 Project: CarbonData Issue Type: Sub-task Reporter: kumar vishal Assignee: kumar vishal Currently a sub-query does not hit the pre-aggregate table. This Jira is to handle sub-queries on the pre-aggregate table.
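The matching step these sub-tasks extend can be sketched roughly: a query (including one arising from a sub-query, once CARBONDATA-1927 unwraps it) can be served by a pre-aggregate table when its group-by columns and aggregate expressions are a subset of what the datamap stores. This is a simplified, hypothetical model of the rewrite decision, not the actual CarbonData matcher; in practice rolling `sum` up over a coarser group-by also requires re-aggregation, and `avg` requires stored `sum` and `count`.

```python
# Hypothetical subset check for pre-aggregate matching.

def can_use_datamap(query_groupby, query_aggs, dm_groupby, dm_aggs):
    return (set(query_groupby) <= set(dm_groupby)
            and set(query_aggs) <= set(dm_aggs))

dm_gb = ["l_returnflag", "l_linestatus"]
dm_ag = ["sum(l_quantity)", "avg(l_quantity)", "count(l_quantity)"]

# Coarser group-by over stored aggregates: servable (with re-aggregation).
print(can_use_datamap(["l_returnflag"], ["sum(l_quantity)"], dm_gb, dm_ag))  # True
# Group-by column the datamap never stored: must fall back to the main table.
print(can_use_datamap(["l_shipmode"], ["sum(l_quantity)"], dm_gb, dm_ag))    # False
```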
[jira] [Resolved] (CARBONDATA-1930) Dictionary not found exception is thrown when filter expression is given in aggregate table query
[ https://issues.apache.org/jira/browse/CARBONDATA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1930. -- Resolution: Fixed Fix Version/s: 1.3.0 > Dictionary not found exception is thrown when filter expression is given in > aggregate table query > - > > Key: CARBONDATA-1930 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1930 > Project: CarbonData > Issue Type: Bug >Reporter: Kunal Kapoor >Assignee: Kunal Kapoor >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Steps to reproduce: > 1. CREATE TABLE filtertable(id int, name string, city string, age string) > STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('dictionary_include'='name,age') > 2. LOAD DATA LOCAL INPATH > 3. create datamap agg9 on table filtertable using 'preaggregate' as select > name, age, sum(age) from filtertable group by name, age > 4. select name, sum(age) from filtertable where age = '29' group by name, age
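The lookup that fails in step 4 can be sketched in miniature. A filter on a dictionary-encoded column has to translate the literal (`'29'`) into its surrogate key before scanning; the "dictionary not found" error suggests the lookup was attempted against the pre-aggregate table, which has no dictionary of its own, instead of the parent's. The names below are invented for the illustration:

```python
# Sketch of surrogate-key translation for a dictionary-encoded filter column.
# value -> surrogate key, as a dictionary encoding would store it
parent_dictionary = {"age": {"27": 1, "29": 2, "31": 3}}

def encode_filter(column, literal, dictionary):
    col_dict = dictionary.get(column)
    if col_dict is None:
        # The failure mode in the bug: no dictionary available for the column.
        raise LookupError(f"dictionary not found for column {column}")
    return col_dict[literal]

# Resolving against the parent table's dictionary succeeds:
print(encode_filter("age", "29", parent_dictionary))  # -> 2
```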
[jira] [Assigned] (CARBONDATA-1931) DataLoad failed for Aggregate table when measure is used for groupby
[ https://issues.apache.org/jira/browse/CARBONDATA-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1931: Assignee: Babulal (was: kumar vishal) > DataLoad failed for Aggregate table when measure is used for groupby > > > Key: CARBONDATA-1931 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1931 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Run the commands in sequence: > > spark.sql( > "create table y(year int,month int,name string,salary int) stored by > 'carbondata'" > ) > spark.sql( > s"insert into y select 10,11,'x',12" > ) > spark.sql("create datamap y1_sum1 on table y using 'preaggregate' as select > year,name,sum(salary) from y group by year,name") > Result: aggregate table creation fails.
[jira] [Resolved] (CARBONDATA-1931) DataLoad failed for Aggregate table when measure is used for groupby
[ https://issues.apache.org/jira/browse/CARBONDATA-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1931. -- Resolution: Fixed Fix Version/s: 1.3.0 > DataLoad failed for Aggregate table when measure is used for groupby > > > Key: CARBONDATA-1931 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1931 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Run the commands in sequence: > > spark.sql( > "create table y(year int,month int,name string,salary int) stored by > 'carbondata'" > ) > spark.sql( > s"insert into y select 10,11,'x',12" > ) > spark.sql("create datamap y1_sum1 on table y using 'preaggregate' as select > year,name,sum(salary) from y group by year,name") > Result: aggregate table creation fails.
[jira] [Assigned] (CARBONDATA-1931) DataLoad failed for Aggregate table when measure is used for groupby
[ https://issues.apache.org/jira/browse/CARBONDATA-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1931: Assignee: kumar vishal > DataLoad failed for Aggregate table when measure is used for groupby > > > Key: CARBONDATA-1931 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1931 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: kumar vishal > Fix For: 1.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Run the commands in sequence: > > spark.sql( > "create table y(year int,month int,name string,salary int) stored by > 'carbondata'" > ) > spark.sql( > s"insert into y select 10,11,'x',12" > ) > spark.sql("create datamap y1_sum1 on table y using 'preaggregate' as select > year,name,sum(salary) from y group by year,name") > Result: aggregate table creation fails.
[jira] [Resolved] (CARBONDATA-1953) Pre-aggregate Should inherit sort column,sort_scope,dictionary encoding from main table
[ https://issues.apache.org/jira/browse/CARBONDATA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1953. -- Resolution: Fixed Fix Version/s: 1.3.0 > Pre-aggregate Should inherit sort column,sort_scope,dictionary encoding from > main table > --- > > Key: CARBONDATA-1953 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1953 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal > Fix For: 1.3.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Pre-aggregate Should inherit sort column,sort_scope,dictionary encoding from > main table > spark.sql("drop table if exists y ") > spark.sql("create table y(year int,month int,name string,salary int) > stored by 'carbondata' > tblproperties('NO_INVERTED_INDEX'='name','sort_scope'='Global_sort','table_blocksize'='23','Dictionary_include'='month','Dictionary_exclude'='year,name','sort_columns'='month,year,name')") > spark.sql("insert into y select 10,11,'babu',12") > spark.sql("create datamap y1_sum1 on table y using 'preaggregate' as select > year,month,name,sum(salary) from y group by year,month,name") > spark.sql("desc formatted y").show(100,false) > spark.sql("desc formatted y_y1_sum1").show(100,false) > -- > |col_name|data_type > |comment >| > ++++ > |y_year |int > |KEY COLUMN,NOINVERTEDINDEX,null >| > |y_month |int > |DICTIONARY, KEY > COLUMN,NOINVERTEDINDEX,null | > |y_name |string > |KEY COLUMN,null >| > |y_salary_sum|bigint > |MEASURE,null >| > || > | >| > |##Detailed Table Information| > | >| > |Database Name |default > | >| > |Table Name |y_y1_sum1 > | >| > |CARBON Store Path > |D:\code\carbondata\myfork\incubator-carbondata/examples/spark2/target/store >|| > |Comment | > | >| > |Table Block Size|1024 MB > | >| > |Table Data Size |1297 > | >| > |Table Index Size|1076 > | >| > |Last Update Time|1514546841061 > | >| > |SORT_SCOPE |LOCAL_SORT > |LOCAL_SORT >| > |Streaming |false
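The inheritance rule this issue asks for can be sketched simply: copy the parent's sort/dictionary properties to the pre-aggregate table, keeping only entries whose columns actually exist in the child. This is a hypothetical model using the property names from the reproduction above, not CarbonData's actual implementation (child column names are also prefixed with the parent table name in practice):

```python
# Hypothetical sketch of property inheritance from main table to pre-agg table.

def inherit(parent_props, child_columns):
    child = {}
    for key in ("sort_columns", "dictionary_include", "dictionary_exclude"):
        # Keep only the parent's columns that survive into the child schema.
        cols = [c for c in parent_props.get(key, "").split(",")
                if c and c in child_columns]
        if cols:
            child[key] = ",".join(cols)
    child["sort_scope"] = parent_props.get("sort_scope", "LOCAL_SORT")
    return child

parent = {"sort_columns": "month,year,name", "dictionary_include": "month",
          "dictionary_exclude": "year,name", "sort_scope": "Global_sort"}
print(inherit(parent, {"year", "month", "name", "salary_sum"}))
```

In the bug, the child instead fell back to defaults (e.g. SORT_SCOPE showing LOCAL_SORT despite the parent's Global_sort).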
[jira] [Assigned] (CARBONDATA-1719) Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table created in parallel with table load, aggregate query returns no data
[ https://issues.apache.org/jira/browse/CARBONDATA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1719: Assignee: Jatin (was: kumar vishal) > Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table > created in parallel with table load, aggregate query returns no data > -- > > Key: CARBONDATA-1719 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1719 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Jatin > Labels: DFX > Fix For: 1.3.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > 1. Create a table > create table if not exists lineitem3(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Run load queries and create pre-agg table queries in diff console: > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > create datamap agr_lineitem3 ON TABLE lineitem3 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > 0: jdbc:hive2://10.18.98.34:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (1.258 seconds) > HDFS data: > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3_agr_lineitem3/Fact/Part0/Segment_0 > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3/Fact/Part0/Segment_0 > Found 27 items > -rw-r--r-- 2 root users 22148 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/1510740293106.carbonindexmerge > -rw-r--r-- 2 root users 58353052 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58351680 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58364823 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58356303 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58342246 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58353186 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58352964 2017-11-15 18:05 > 
/carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58357183 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58345739 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-2-0_batchno0-0-1510740300247.carbondata > Yarn job stages: > 29 > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PAR
[jira] [Assigned] (CARBONDATA-1719) Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table created in parallel with table load, aggregate query returns no data
[ https://issues.apache.org/jira/browse/CARBONDATA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1719: Assignee: kumar vishal (was: Jatin) > Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table > created in parallel with table load, aggregate query returns no data > -- > > Key: CARBONDATA-1719 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1719 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: kumar vishal > Labels: DFX > Fix For: 1.3.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > 1. Create a table > create table if not exists lineitem3(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Run load queries and create pre-agg table queries in diff console: > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > create datamap agr_lineitem3 ON TABLE lineitem3 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > 0: jdbc:hive2://10.18.98.34:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (1.258 seconds) > HDFS data: > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3_agr_lineitem3/Fact/Part0/Segment_0 > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3/Fact/Part0/Segment_0 > Found 27 items > -rw-r--r-- 2 root users 22148 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/1510740293106.carbonindexmerge > -rw-r--r-- 2 root users 58353052 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58351680 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58364823 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58356303 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58342246 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58353186 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58352964 2017-11-15 18:05 > 
/carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58357183 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58345739 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-2-0_batchno0-0-1510740300247.carbondata > Yarn job stages: > 29 > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKE
[jira] [Resolved] (CARBONDATA-1719) Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table created in parallel with table load, aggregate query returns no data
[ https://issues.apache.org/jira/browse/CARBONDATA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1719. -- Resolution: Fixed > Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table > created in parallel with table load, aggregate query returns no data > -- > > Key: CARBONDATA-1719 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1719 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: Jatin > Labels: DFX > Fix For: 1.3.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > 1. Create a table > create table if not exists lineitem3(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'=''); > 2. Run load queries and create pre-agg table queries in diff console: > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > create datamap agr_lineitem3 ON TABLE lineitem3 USING > "org.apache.carbondata.datamap.AggregateDataMapHandler" as select > L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 > group by L_RETURNFLAG, L_LINESTATUS; > 3. 
Check table content using aggregate query: > select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from > lineitem3 group by l_returnflag, l_linestatus; > 0: jdbc:hive2://10.18.98.34:23040> select > l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 > group by l_returnflag, l_linestatus; > +---+---+--+---+--+ > | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | > +---+---+--+---+--+ > +---+---+--+---+--+ > No rows selected (1.258 seconds) > HDFS data: > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3_agr_lineitem3/Fact/Part0/Segment_0 > BLR114307:/srv/spark2.2Bigdata/install/hadoop/datanode # bin/hadoop fs > -ls /carbonstore/default/lineitem3/Fact/Part0/Segment_0 > Found 27 items > -rw-r--r-- 2 root users 22148 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/1510740293106.carbonindexmerge > -rw-r--r-- 2 root users 58353052 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58351680 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58364823 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58356303 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-0-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58342246 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58353186 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-0_batchno1-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58352964 2017-11-15 18:05 > 
/carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-1_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58357183 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-1-2_batchno0-0-1510740300247.carbondata > -rw-r--r-- 2 root users 58345739 2017-11-15 18:05 > /carbonstore/default/lineitem3/Fact/Part0/Segment_0/part-2-0_batchno0-0-1510740300247.carbondata > Yarn job stages: > 29 > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem3 > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUM
[jira] [Created] (CARBONDATA-2022) Query With table alias is not hitting pre aggregate table
kumar vishal created CARBONDATA-2022: Summary: Query With table alias is not hitting pre aggregate table Key: CARBONDATA-2022 URL: https://issues.apache.org/jira/browse/CARBONDATA-2022 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: Babulal **Problem:** A query with a table alias is not hitting the pre-aggregate table. **Solution:** For an aliased query, the plan comes in as SubqueryAlias(alias, SubqueryAlias(...)), and this nested case is not handled in the pre-aggregate query-plan transformation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (CARBONDATA-1927) Support sub query on Pre Aggregate table
[ https://issues.apache.org/jira/browse/CARBONDATA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1927. -- Resolution: Fixed Fix Version/s: 1.3.0 Fixed as a part of https://github.com/apache/carbondata/pull/1728 > Support sub query on Pre Aggregate table > > > Key: CARBONDATA-1927 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1927 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > Fix For: 1.3.0 > > > Currently sub query is not hitting the pre aggregate table. This Jira is to > handle the sub query on pre Aggregate table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2029) Query with expression is giving wrong result
kumar vishal created CARBONDATA-2029: Summary: Query with expression is giving wrong result Key: CARBONDATA-2029 URL: https://issues.apache.org/jira/browse/CARBONDATA-2029 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Create main table: CREATE TABLE mainTable(id int, name string, city string, age string) STORED BY 'org.apache.carbondata.format' Create datamap: create datamap agg1 on table mainTable using 'preaggregate' as select name,sum(id) from mainTable group by name Load data, then run the query: select sum(id)+count(id) from maintable gives a wrong result. Problem: When the query contains an expression, the planner does not check which aggregate functions are applied inside the expression; it selects the aggregate table based on the table alone. Solution: While extracting the aggregate expressions from the query plan, if an expression is present, extract the aggregate functions applied on each column and use them to select the aggregate table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
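The selection rule described in the solution above can be sketched as follows. This is an illustrative example only; the class and method names are invented, not CarbonData's actual planner code:

```java
import java.util.Set;

// Illustrative sketch (not CarbonData's real planner code): a pre-aggregate
// table may only be selected when every aggregate function appearing inside
// the query expression is covered by the functions the datamap was built with.
public class AggTableSelector {
    // aggsInExpression: aggregate calls extracted from the expression,
    // e.g. "sum(id)" and "count(id)" for the query sum(id)+count(id).
    public static boolean canUseDataMap(Set<String> aggsInExpression,
                                        Set<String> aggsInDataMap) {
        return aggsInDataMap.containsAll(aggsInExpression);
    }
}
```

For the query above, agg1 only stores sum(id), so a plan containing sum(id)+count(id) must fall back to the main table rather than read an incomplete result from the aggregate table.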
[jira] [Created] (CARBONDATA-2034) Improve query performance
kumar vishal created CARBONDATA-2034: Summary: Improve query performance Key: CARBONDATA-2034 URL: https://issues.apache.org/jira/browse/CARBONDATA-2034 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal Assignee: kumar vishal *Problem: Dictionary loading takes more time on the executor side when the number of nodes is high.* *Solution: During query there is no need to load the dictionary for non-complex dimensions; the dictionary decoder will take care of loading and decoding the dictionary columns.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2042) Data Mismatch issue in case of Timeseries Year, Month and Day level table
kumar vishal created CARBONDATA-2042: Summary: Data Mismatch issue in case of Timeseries Year, Month and Day level table Key: CARBONDATA-2042 URL: https://issues.apache.org/jira/browse/CARBONDATA-2042 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal Assignee: kumar vishal Attachments: data_sort.csv sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table mainTable") sql("CREATE TABLE table_03 (imei string,age int,mac string,productdate timestamp,updatedate timestamp,gamePointId double,contractid double ) STORED BY 'org.apache.carbondata.format'") sql(s"LOAD DATA inpath '$resourcesPath/data_sort.csv' INTO table table_03 options ('DELIMITER'=',', 'QUOTECHAR'='','FILEHEADER'='imei,age,mac,productdate,updatedate,gamePointId,contractid')") sql("create datamap ag1 on table table_03 using 'preaggregate' DMPROPERTIES ( 'timeseries.eventtime'='productdate','timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')as select productdate,mac,sum(age) from table_03 group by productdate,mac") -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2045) Query from segment set is not effective when pre-aggregate table is present
kumar vishal created CARBONDATA-2045: Summary: Query from segment set is not effective when pre-aggregate table is present Key: CARBONDATA-2045 URL: https://issues.apache.org/jira/browse/CARBONDATA-2045 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal
1. Create a table:
create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('table_blocksize'='128','NO_INVERTED_INDEX'='L_SHIPDATE,L_SHIPMODE,L_SHIPINSTRUCT,L_RETURNFLAG,L_RECEIPTDATE,L_ORDERKEY,L_PARTKEY,L_SUPPKEY','sort_columns'='');
2. Run load:
load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
3. Create the pre-agg table:
create datamap agr_lineitem3 ON TABLE lineitem3 USING "org.apache.carbondata.datamap.AggregateDataMapHandler" as select L_RETURNFLAG,L_LINESTATUS,sum(L_QUANTITY),sum(L_EXTENDEDPRICE) from lineitem3 group by L_RETURNFLAG, L_LINESTATUS;
4. Check the table content using the aggregate query:
select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 group by l_returnflag, l_linestatus;
+--------------+--------------+-----------------+------------------------+
| l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice)   |
+--------------+--------------+-----------------+------------------------+
| N            | F            | 4913382.0       | 7.369901176949993E9    |
| A            | F            | 1.88818373E8    | 2.8310705145736383E11  |
| N            | O            | 3.82400594E8    | 5.734650756707479E11   |
| R            | F            | 1.88960009E8    | 2.833523780876951E11   |
+--------------+--------------+-----------------+------------------------+
4 rows selected (1.568 seconds)
5. Load one more time:
load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table lineitem1 options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
6. Check the table content using the aggregate query:
select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 group by l_returnflag, l_linestatus;
+--------------+--------------+-----------------+------------------------+
| l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice)   |
+--------------+--------------+-----------------+------------------------+
| N            | F            | 9826764.0       | 1.4739802353899986E10  |
| A            | F            | 3.77636746E8    | 5.662141029147278E11   |
| N            | O            | 7.64801188E8    | 1.1469301513414958E12  |
| R            | F            | 3.77920018E8    | 5.667047561753901E11   |
+--------------+--------------+-----------------+------------------------+
7. Set query from segment 1:
0: jdbc:hive2://10.18.98.48:23040> set carbon.input.segments.test_db1.lilneitem1=1;
+--------------------------------------------+-------+
| key                                        | value |
+--------------------------------------------+-------+
| carbon.input.segments.test_db1.lilneitem1  | 1     |
+--------------------------------------------+-------+
8. Check the table content using the aggregate query:
select l_returnflag,l_linestatus,sum(l_quantity),sum(l_extendedprice) from lineitem3 group by l_returnflag, l_linestatus;
*+Expected+*: It should return the values from segment 1 alone.
*+Actual:+* It returns values from both segments:
+--------------+--------------+-----------------+------------------------+
| l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice)   |
+--------------+--------------+-----------------+------------------------+
| N            | F            | 9826764.0       | 1.4739802353899986E10  |
| A            | F            | 3.77636746E8    | 5.662141029147278E11   |
| N            | O            | 7.64801188E8    | 1.1469301513414958E12  |
| R            | F            | 3.77920018E8    | 5.667047561753901E11   |
+--------------+--------------+-----------------+------------------------+
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2042) Data Mismatch issue in case of Timeseries Year, Month and Day level table
[ https://issues.apache.org/jira/browse/CARBONDATA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2042: - Issue Type: Bug (was: Improvement) > Data Mismatch issue in case of Timeseries Year, Month and Day level table > - > > Key: CARBONDATA-2042 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2042 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > Attachments: data_sort.csv > > Time Spent: 0.5h > Remaining Estimate: 0h > > sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into table > mainTable") > sql("CREATE TABLE table_03 (imei string,age int,mac string,productdate > timestamp,updatedate timestamp,gamePointId double,contractid double ) STORED > BY 'org.apache.carbondata.format'") > sql(s"LOAD DATA inpath '$resourcesPath/data_sort.csv' INTO table table_03 > options ('DELIMITER'=',', > 'QUOTECHAR'='','FILEHEADER'='imei,age,mac,productdate,updatedate,gamePointId,contractid')") > sql("create datamap ag1 on table table_03 using 'preaggregate' DMPROPERTIES ( > 'timeseries.eventtime'='productdate','timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1')as > select productdate,mac,sum(age) from table_03 group by productdate,mac") -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2031) Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/CARBONDATA-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2031. -- Resolution: Fixed Fix Version/s: 1.3.0 > Select column with is null for no_inverted_index column throws > java.lang.ArrayIndexOutOfBoundsException > --- > > Key: CARBONDATA-2031 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2031 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Assignee: Akash R Nilugal >Priority: Minor > Fix For: 1.3.0 > > Attachments: dest.csv > > Time Spent: 2h 50m > Remaining Estimate: 0h > > steps: > {color:#33}1) create table zerorows_part (c1 string,c2 int,c3 string,c5 > string) STORED BY 'carbondata' > TBLPROPERTIES('DICTIONARY_INCLUDE'='C2','NO_INVERTED_INDEX'='C2'){color} > {color:#33}2){color}{color:#33}LOAD DATA LOCAL INPATH > '$filepath/dest.csv' INTO table zerorows_part > OPTIONS('delimiter'=',','fileheader'='c1,c2,c3,c5'){color} > {color:#33}3){color}{color:#33}select c2 from zerorows_part where c2 > is null{color} > > *Previous exception in task: java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 0* > > *org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)* > > *org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)* > > *org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)* > > *org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)* > > *org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)* > > *org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:370)* > > 
*org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown > Source)* > > *org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source)* > > *org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)* > > *org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)* > > *org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)* > > *org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)* > > *org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)* > > *org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)* > *org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)* > *org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)* > *org.apache.spark.rdd.RDD.iterator(RDD.scala:287)* > *org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)* > *org.apache.spark.scheduler.Task.run(Task.scala:108)* > *org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)* > > *java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)* > > *java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)* > *java.lang.Thread.run(Thread.java:748)* > *at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)* > *at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)* > *at org.apache.spark.scheduler.Task.run(Task.scala:118)* > *at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)* > *at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)* > *at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)* > *at java.lang.Thread.run(Thread.java:748)* > > > {color:#33}[^dest.csv]{color} > 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2028) Select Query failed with preagg having timeseries and normal agg table together
[ https://issues.apache.org/jira/browse/CARBONDATA-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2028. -- Resolution: Fixed Fix Version/s: 1.3.0 > Select Query failed with preagg having timeseries and normal agg table > together > --- > > Key: CARBONDATA-2028 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2028 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal >Priority: Major > Fix For: 1.3.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > sql("drop table if exists maintabletime") > sql("create table maintabletime(year int,month int,name string,salary > int,dob timestamp) stored by 'carbondata' > tblproperties('sort_scope'='Global_sort','table_blocksize'='23','sort_columns'='month,year,name')") > sql("insert into maintabletime select 10,11,'babu',12,'2014-01-01 00:00:00'") > sql("create datamap agg0 on table maintabletime using 'preaggregate' as > select dob,name from maintabletime group by dob,name") > sql("create datamap agg1 on table maintabletime using 'preaggregate' > DMPROPERTIES ('timeseries.eventTime'='dob', > 'timeseries.hierarchy'='hour=1,day=1,month=1,year=1') as select dob,name from > maintabletime group by dob,name") > val df = sql("select timeseries(dob,'year') from maintabletime group by > timeseries(dob,'year')") > > > *Exception* > Exception in thread "main" org.apache.spark.sql.AnalysisException: Column > does not exists in Pre Aggregate table; > at > org.apache.spark.sql.hive.CarbonPreAggregateQueryRules.getChildAttributeReference(CarbonPreAggregateRules.scala:719) > at > org.apache.spark.sql.hive.CarbonPreAggregateQueryRules$$anonfun$19$$anonfun$4.applyOrElse(CarbonPreAggregateRules.scala:855) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2030) avg with Aggregate table for double data type is failed.
[ https://issues.apache.org/jira/browse/CARBONDATA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2030. -- Resolution: Fixed Assignee: Babulal Fix Version/s: 1.3.0 > avg with Aggregate table for double data type is failed. > - > > Key: CARBONDATA-2030 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2030 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal >Priority: Major > Fix For: 1.3.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > spark.sql("drop table if exists y ") > spark.sql("create table y(year int,month int,name string,salary double) > stored by 'carbondata' > tblproperties('sort_scope'='Global_sort','table_blocksize'='23','sort_columns'='month,year,name')") > spark.sql("insert into y select 10,11,'babu',12.89") > spark.sql("insert into y select 10,11,'babu',12.89") > spark.sql("create datamap y1_sum1 on table y using 'preaggregate' as select > name,avg(salary) from y group by name") > spark.sql("select name,avg(salary) from y group by name").show(false) > > > Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot > resolve '(sum(y_y1_sum1.`y_salary_sum`) / sum(y_y1_sum1.`y_salary_count`))' > due to data type mismatch: differing types in '(sum(y_y1_sum1.`y_salary_sum`) > / sum(y_y1_sum1.`y_salary_count`))' (double and bigint).;; > 'Aggregate [y_name#25], [y_name#25 AS name#41, (sum(y_salary_sum#26) / > sum(y_salary_count#27L)) AS avg(salary)#46] > +- Relation[y_name#25,y_salary_sum#26,y_salary_count#27L] > CarbonDatasourceHadoopRelation [ Database name :default, Table name > :y_y1_sum1, Schema :Some(StructType(StructField(y_name,StringType,true), > StructField(y_salary_sum,DoubleType,true), > StructField(y_salary_count,LongType,true))) ] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2036) Insert overwrite on static partition cannot work properly
[ https://issues.apache.org/jira/browse/CARBONDATA-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2036. -- Resolution: Fixed Fix Version/s: 1.3.0 > Insert overwrite on static partition cannot work properly > - > > Key: CARBONDATA-2036 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2036 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Insert overwrite on a static partition whose int-column value starts with 0 > has an issue. > Example: > create table test(d1 string) partitioned by (c1 int, c2 int, c3 int) > And use: insert overwrite table test partition(c1=01, c2=02, c3=03) select "s1" > > The problem is that 01 is not converted to the actual integer before being > written to the partition map file. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
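The conversion the fix needs can be sketched like this. The class and method names are hypothetical; only the behaviour (normalizing "01" to the integer 1 before it reaches the partition map file, so "01" and "1" don't become distinct partitions) comes from the issue:

```java
// Hypothetical sketch, not CarbonData's actual API: a static partition value
// such as "01" on an int column is normalized to its numeric form before it
// is written to the partition map file; otherwise "01" and "1" would be
// treated as different partition values.
public class PartitionValueNormalizer {
    public static String normalize(String rawValue, String dataType) {
        switch (dataType) {
            case "int":
                return String.valueOf(Integer.parseInt(rawValue)); // "01" -> "1"
            case "bigint":
                return String.valueOf(Long.parseLong(rawValue));
            default:
                return rawValue; // string values are kept verbatim
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("01", "int"));   // prints "1"
        System.out.println(normalize("s1", "string")); // prints "s1"
    }
}
```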
[jira] [Commented] (CARBONDATA-2068) Drop datamap should work for timeseries
[ https://issues.apache.org/jira/browse/CARBONDATA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335758#comment-16335758 ] kumar vishal commented on CARBONDATA-2068: -- [~xubo245] can u please add your testcase for the above scenario > Drop datamap should work for timeseries > - > > Key: CARBONDATA-2068 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2068 > Project: CarbonData > Issue Type: Bug > Components: core, spark-integration >Affects Versions: 1.3.0 >Reporter: xubo245 >Priority: Major > Fix For: 1.3.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Drop datamap is not work after creating timeseries datamap for preaggregate > table, > but it should work. > refer: > https://issues.apache.org/jira/browse/CARBONDATA-1516 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2085) It's different between load twice and create datamap with load again after load data and create datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343347#comment-16343347 ] kumar vishal commented on CARBONDATA-2085: -- [~xubo245] +*First case:*+ *create table, load data, create data map, load data. Here the data map ends up with two segments, so querying it directly returns 2 rows.* +*Second case:*+ *create table, load data, load data, create data map. Here the data map has only 1 segment: the data of the 2 main-table segments is aggregated during data map creation and a single data map segment is created, so the query returns 1 row containing the fully aggregated data.* *Note: while creating a data map, if main-table data is already loaded, only one segment is created and the complete aggregated data is placed in that segment. When data is loaded after the data map is created, a new data map segment is created containing only that load's data. To validate whether the data map result is correct, please run the query on the main table.* > It's different between load twice and create datamap with load again after > load data and create datamap > --- > > Key: CARBONDATA-2085 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2085 > Project: CarbonData > Issue Type: Bug > Components: core, spark-integration >Affects Versions: 1.3.0 >Reporter: xubo245 >Priority: Major > Fix For: 1.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > It's different between two test case > test case 1: load twice and create datamap , and then query > test case 2:load once , create datamap and load again, and then query > {code:java} > + test("load data into mainTable after create timeseries datamap on table > 1") { > +sql("drop table if exists mainTable") > +sql( > + """ > +| CREATE TABLE mainTable( > +| mytime timestamp, > +| name string, > +| age int) > +| STORED BY 'org.apache.carbondata.format' > + """.stripMargin) > + > +sql(s"LOAD DATA LOCAL INPATH 
'$resourcesPath/timeseriestest.csv' into > table mainTable") > + > +sql( > + """ > +| create datamap agg0 on table mainTable > +| using 'preaggregate' > +| DMPROPERTIES ( > +| 'timeseries.eventTime'='mytime', > +| > 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1') > +| as select mytime, sum(age) > +| from mainTable > +| group by mytime""".stripMargin) > + > +sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into > table mainTable") > +val df = sql( > + """ > +| select > +| timeseries(mytime,'minute') as minuteLevel, > +| sum(age) as sum > +| from mainTable > +| where timeseries(mytime,'minute')>='2016-02-23 01:01:00' > +| group by > +| timeseries(mytime,'minute') > +| order by > +| timeseries(mytime,'minute') > + """.stripMargin) > + > +// only for test, it need remove before merge > +df.show() > +sql("select * from maintable_agg0_minute").show(100) > + > +checkAnswer(df, > + Seq(Row(Timestamp.valueOf("2016-02-23 01:01:00"), 120), > +Row(Timestamp.valueOf("2016-02-23 01:02:00"), 280))) > + > + } > + > + test("load data into mainTable after create timeseries datamap on table > 2") { > +sql("drop table if exists mainTable") > +sql( > + """ > +| CREATE TABLE mainTable( > +| mytime timestamp, > +| name string, > +| age int) > +| STORED BY 'org.apache.carbondata.format' > + """.stripMargin) > + > +sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into > table mainTable") > +sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/timeseriestest.csv' into > table mainTable") > +sql( > + """ > +| create datamap agg0 on table mainTable > +| using 'preaggregate' > +| DMPROPERTIES ( > +| 'timeseries.eventTime'='mytime', > +| > 'timeseries.hierarchy'='second=1,minute=1,hour=1,day=1,month=1,year=1') > +| as select mytime, sum(age) > +| from mainTable > +| group by mytime""".stripMargin) > + > + > +val df = sql( > + """ > +| select > +| timeseries(mytime,'minute') as minuteLevel, > +| sum(age) as sum > +| from mainTable > +| where 
timeseries(mytime,'minute')>='2016-02-23 01:01:00' > +| group by > +| timeseries(mytime,'minute') > +| order by > +
[jira] [Created] (CARBONDATA-2101) Restrict Direct query on aggregation and timeseries data map
kumar vishal created CARBONDATA-2101: Summary: Restrict Direct query on aggregation and timeseries data map Key: CARBONDATA-2101 URL: https://issues.apache.org/jira/browse/CARBONDATA-2101 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal Restrict direct query on timeseries and pre-aggregate data map -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2101) Restrict Direct query on aggregation and timeseries data map
[ https://issues.apache.org/jira/browse/CARBONDATA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-2101: Assignee: kumar vishal > Restrict Direct query on aggregation and timeseries data map > > > Key: CARBONDATA-2101 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2101 > Project: CarbonData > Issue Type: Improvement >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > > Restrict direct query on timeseries and pre-aggregate data map > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-1224) Going out of memory if more segments are compacted at once in V3 format
[ https://issues.apache.org/jira/browse/CARBONDATA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1224. -- Resolution: Fixed Fix Version/s: 1.3.0 > Going out of memory if more segments are compacted at once in V3 format > --- > > Key: CARBONDATA-1224 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1224 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala >Priority: Major > Fix For: 1.3.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > In V3 format we read the whole blocklet at once to memory in order save IO > time. But it turns out to be costlier in case of parallel reading of more > carbondata files. > For example if we need to compact 50 segments then compactor need to open the > readers on all the 50 segments to do merge sort. But the memory consumption > is too high if each reader reads whole blocklet to the memory and there is > high chances of going out of memory. > Solution: > In this type of scenarios we can introduce new readers for V3 to read the > data page by page instead of reading whole blocklet at once to reduce the > memory footprint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
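The memory argument in the solution above can be put in back-of-envelope form. The numbers (page size, pages per blocklet) and all names here are illustrative assumptions, not CarbonData internals; the point is that with N readers open for the merge sort, peak memory scales with what each reader keeps resident:

```java
// Back-of-envelope sketch (illustrative numbers, not CarbonData internals):
// during compaction of N segments, N readers are open at once for merge sort,
// so peak memory is N * (bytes each reader keeps resident). Reading page by
// page keeps one page resident instead of a whole blocklet of pages.
public class CompactionMemorySketch {
    public static long peakMemory(int openReaders, long pageSizeBytes,
                                  int pagesPerBlocklet, boolean pageByPage) {
        long perReader = pageByPage
            ? pageSizeBytes                       // one page resident at a time
            : pageSizeBytes * pagesPerBlocklet;   // whole blocklet materialized
        return openReaders * perReader;
    }

    public static void main(String[] args) {
        // 50 segments, 1 MiB pages, 32 pages per blocklet (assumed figures):
        System.out.println(peakMemory(50, 1 << 20, 32, false)); // 1677721600 (~1.6 GB)
        System.out.println(peakMemory(50, 1 << 20, 32, true));  // 52428800 (~50 MB)
    }
}
```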
[jira] [Created] (CARBONDATA-2107) Average query is failing when data map has both sum(column) and avg(column) of big int, int type
kumar vishal created CARBONDATA-2107: Summary: Average query is failing when data map has both sum(column) and avg(column) of big int, int type Key: CARBONDATA-2107 URL: https://issues.apache.org/jira/browse/CARBONDATA-2107 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Average query is failing when data map has both sum(column) and avg(column) of big int, int type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2108) Refactor unsafe sort property
kumar vishal created CARBONDATA-2108: Summary: Refactor unsafe sort property Key: CARBONDATA-2108 URL: https://issues.apache.org/jira/browse/CARBONDATA-2108 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal Assignee: kumar vishal Deprecate the old property sort.inmemory.size.inmb and add a new property carbon.sort.storage.inmemory.size.inmb. If the user has configured the old property (sort.inmemory.size.inmb), it is converted internally to the new property. For example, if sort.inmemory.size.inmb is configured, 20% of that memory is used as working memory and the rest as storage memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
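The described conversion can be sketched as follows. The property names come from the issue text, while the helper name and the exact 20/80 split illustration are assumptions:

```java
// Hedged sketch of the property conversion described above; property names
// are from the issue, the helper name and split arithmetic are illustrative.
public class SortMemoryConfig {
    static final String OLD_PROP = "sort.inmemory.size.inmb";                 // deprecated
    static final String NEW_PROP = "carbon.sort.storage.inmemory.size.inmb";  // replacement

    // If only the deprecated total is configured, 20% of it is treated as
    // working memory and the remaining 80% as storage memory.
    public static int[] splitDeprecatedTotal(int totalMb) {
        int workingMb = (int) (totalMb * 0.2);
        return new int[] { workingMb, totalMb - workingMb };
    }

    public static void main(String[] args) {
        int[] split = splitDeprecatedTotal(100);
        System.out.println("working=" + split[0] + "MB storage=" + split[1] + "MB");
        // prints working=20MB storage=80MB
    }
}
```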
[jira] [Resolved] (CARBONDATA-2094) Filter DataMap Tables in "Show Table Command"
[ https://issues.apache.org/jira/browse/CARBONDATA-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2094. -- Resolution: Fixed Fix Version/s: 1.3.0 > Filter DataMap Tables in "Show Table Command" > - > > Key: CARBONDATA-2094 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2094 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Priority: Major > Fix For: 1.3.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Currently the Show Tables command lists datamap tables (agg tables), but it > should not show aggregate tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2117) Fix synchronization issue while creating multiple carbon sessions
kumar vishal created CARBONDATA-2117: Summary: Fix synchronization issue while creating multiple carbon sessions Key: CARBONDATA-2117 URL: https://issues.apache.org/jira/browse/CARBONDATA-2117 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal +*Problem:*+ When creating many sessions (100), session initialisation fails with the error: java.lang.IllegalArgumentException: requirement failed: Config entry enable.unsafe.sort already registered! +*Solution:*+ Currently CarbonEnv updates the global (shared) configuration and the location configuration inside a class-level synchronized block. With multiple sessions a class-level lock is not sufficient; a global-level lock is needed so that only one thread at a time updates the global configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
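A minimal sketch of the locking change, with invented names (the real CarbonEnv code differs): the point is that the check-and-register step on shared configuration must happen under one JVM-global lock rather than a narrower per-class one.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch, not CarbonEnv's actual code: registering a shared config
// entry must be an atomic check-then-add under a single JVM-global lock,
// otherwise two sessions can both decide the entry is unregistered and the
// second registration fails with
// "requirement failed: Config entry enable.unsafe.sort already registered!".
public class GlobalConfigRegistry {
    private static final Object GLOBAL_LOCK = new Object();
    private static final Set<String> registered = new HashSet<>();

    // Returns true only for the first session that registers the key;
    // later sessions see it as already present instead of throwing.
    public static boolean registerIfAbsent(String key) {
        synchronized (GLOBAL_LOCK) {
            return registered.add(key);
        }
    }
}
```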
[jira] [Resolved] (CARBONDATA-2082) Timeseries pre-aggregate table should support the blank space
[ https://issues.apache.org/jira/browse/CARBONDATA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2082. -- Resolution: Fixed > Timeseries pre-aggregate table should support the blank space > - > > Key: CARBONDATA-2082 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2082 > Project: CarbonData > Issue Type: Bug > Components: core, spark-integration >Affects Versions: 1.3.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > timeseries pre-aggregate table should support the blank space > 1.scenario 1 > {code:java} >test("test timeseries create table 35: support event_time and granularity > key with space") { > sql("DROP DATAMAP IF EXISTS agg1_month ON TABLE maintable") > sql( > s"""CREATE DATAMAP agg1_month ON TABLE mainTable >|USING '$timeSeries' >|DMPROPERTIES ( >| 'event_time '=' dataTime', >| 'MONTH_GRANULARITY '='1') >|AS SELECT dataTime, SUM(age) FROM mainTable >|GROUP BY dataTime > """.stripMargin) > checkExistence(sql("SHOW TABLES"), true, "maintable_agg1_month") > } > {code} > problem: NPE > {code:java} > java.lang.NullPointerException was thrown. 
> java.lang.NullPointerException > at > org.apache.spark.sql.execution.command.timeseries.TimeSeriesUtil$.validateTimeSeriesEventTime(TimeSeriesUtil.scala:50) > at > org.apache.spark.sql.execution.command.preaaggregate.CreatePreAggregateTableCommand.processMetadata(CreatePreAggregateTableCommand.scala:104) > at > org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:75) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:84) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > {code} > 2.scenario 2 > {code:java} > test("test timeseries create table 36: support event_time and > granularity key with space") { > sql("DROP DATAMAP IF EXISTS agg1_month ON TABLE maintable") > sql( > s"""CREATE DATAMAP agg1_month ON TABLE mainTable >|USING '$timeSeries' >|DMPROPERTIES ( >| 'event_time '='dataTime', >| 'MONTH_GRANULARITY '=' 1') >|AS SELECT dataTime, SUM(age) FROM mainTable >|GROUP BY dataTime > """.stripMargin) > checkExistence(sql("SHOW TABLES"), true, > "maintable_agg1_month") > } > > {code} > problem: > {code:java} > Granularity only support 1 > org.apache.carbondata.spark.exception.MalformedDataMapCommandException: > Granularity only support 1 > at > org.apache.spark.sql.execution.command.timeseries.TimeSeriesUtil$.getTimeSeriesGranularityDetails(TimeSeriesUtil.scala:118) > at > org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:58) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:84) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at 
org.apache.spark.sql.Dataset.(Dataset.scala:183) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2120) Fixed data mismatch for No dictionary numeric data type
kumar vishal created CARBONDATA-2120: Summary: Fixed data mismatch for No dictionary numeric data type Key: CARBONDATA-2120 URL: https://issues.apache.org/jira/browse/CARBONDATA-2120 Project: CarbonData Issue Type: Bug Reporter: kumar vishal *Problem:* The "is null" filter fails for numeric data types (no-dictionary columns). *Root cause:* The min/max calculation is wrong when the no-dictionary column is not the first column. Because it is not the first column, a null value can appear in any row, but the min/max for null values was updated only when the first row was null. *Solution:* Update the min/max in every case, whether the value is null or not, for all types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
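The corrected statistics update can be sketched like this (illustrative names, not the CarbonData writer code): every row updates the statistics, with nulls only flipping a null-present flag, so correctness no longer depends on a null arriving in the first row.

```java
// Minimal sketch of the min/max fix: update statistics on every row, null or
// not, so min/max stays correct when nulls appear mid-stream.
public class MinMaxStats {
    private int min = Integer.MAX_VALUE;
    private int max = Integer.MIN_VALUE;
    private boolean hasNull = false;

    public void update(Integer value) {
        if (value == null) {      // a null row only records that a null exists
            hasNull = true;
            return;               // ...and never disturbs min/max
        }
        if (value < min) min = value;
        if (value > max) max = value;
    }

    public int min() { return min; }
    public int max() { return max; }
    public boolean hasNull() { return hasNull; }
}
```

An "is null" filter can then consult `hasNull()` for pruning while numeric range filters keep using an uncorrupted min/max.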
[jira] [Resolved] (CARBONDATA-1918) Incorrect data is displayed when String is updated using Sentences
[ https://issues.apache.org/jira/browse/CARBONDATA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-1918. -- Resolution: Fixed Fix Version/s: 1.3.0 > Incorrect data is displayed when String is updated using Sentences > -- > > Key: CARBONDATA-1918 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1918 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > update t_carbn01 set (active_status)= (sentences('Hello there! How are > you?')); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (2.784 seconds) > select active_status from t_carbn01; > +-+--+ > | active_status | > +-+--+ > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > +-+–+ > > The issue for sentences function also occurs when the below update is > performed. > update t_carbn01 set (active_status)= (split('ab', 'a')); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2105) Incorrect result displays after creating data map
[ https://issues.apache.org/jira/browse/CARBONDATA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2105. -- Resolution: Fixed Fix Version/s: 1.3.0 > Incorrect result displays after creating data map > - > > Key: CARBONDATA-2105 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2105 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.3.0 >Reporter: Vandana Yadav >Assignee: anubhav tarar >Priority: Major > Fix For: 1.3.0 > > Attachments: 2000_UniqData.csv > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Incorrect result displays after creating data map > Steps to Reproduce: > 1. Create a TAble: > CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > 2. 
Load Data > a) LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into > table uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > b) LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into > table uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > c) LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into > table uniqdata OPTIONS('DELIMITER'=',', > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1') > Execute Query: > a) select avg(cust_id) from uniqdata group by cust_id > output: > | 9460.0 | > | 9671.0 | > | 10403.0 | > | 10725.0 | > | 10867.0 | > +---+--+ > | avg(cust_id) | > +---+--+ > | 9067.0 | > | 9901.0 | > +---+--+ > 2,002 rows selected (1.718 seconds) > b) create data map: > create datamap uniqdata_agg on table uniqdata using 'preaggregate' as select > avg(cust_id) from uniqdata group by cust_id; > c) select avg(cust_id) from uniqdata group by cust_id; > output: > | NULL | > | NULL | > | NULL | > +---+--+ > | avg(cust_id) | > +---+--+ > | NULL | > | NULL | > +---+--+ > 2,002 rows selected (0.895 seconds) > Expected result: it should display similar result as before creating datamap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2142) Fixed aggregate data map creation issue in case of hive metastore
kumar vishal created CARBONDATA-2142: Summary: Fixed aggregate data map creation issue in case of hive metastore Key: CARBONDATA-2142 URL: https://issues.apache.org/jira/browse/CARBONDATA-2142 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Fixed aggregate data map creation issue in case of hive metastore -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2208) Pre aggregate datamap creation is failing when count(*) present in query
kumar vishal created CARBONDATA-2208: Summary: Pre aggregate datamap creation is failing when count(*) present in query Key: CARBONDATA-2208 URL: https://issues.apache.org/jira/browse/CARBONDATA-2208 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Pre-aggregate data map creation fails with a parsing error: create datamap agg9 on table maintable using 'preaggregate' as select name, count(*) from maintable group by name -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2208) Pre aggregate datamap creation is failing when count(*) present in query
[ https://issues.apache.org/jira/browse/CARBONDATA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2208: - Description: Pre aggregate data map creation is failing with parsing error create datamap agg on table maintable using 'preaggregate' as select name, count(*) from maintable group by name was: Pre aggregate data map creation is failing with parsing error create datamap agg9 on table maintable using 'preaggregate' as select name, count(*) from maintable group by name > Pre aggregate datamap creation is failing when count(*) present in query > > > Key: CARBONDATA-2208 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2208 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Pre aggregate data map creation is failing with parsing error > create datamap agg on table maintable using 'preaggregate' as select name, > count(*) from maintable group by name > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2248) Removing parsers thread local objects after parsing of carbon query
kumar vishal created CARBONDATA-2248: Summary: Removing parsers thread local objects after parsing of carbon query Key: CARBONDATA-2248 URL: https://issues.apache.org/jira/browse/CARBONDATA-2248 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal In scenarios where many sessions are created, parser objects from failed parses accumulate in memory inside thread locals. Solution: Remove the parser object from the thread local after the query has been parsed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
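The cleanup pattern described above can be sketched as follows, with a stand-in `Parser` type rather than the real CarbonData parser: the per-thread slot is cleared in a `finally` block, so a parser object never outlives the query that created it, even when parsing fails.

```java
// Minimal sketch of the leak fix: clear the parser's ThreadLocal slot in a
// finally block once parsing completes, so failed-parse objects cannot
// accumulate per thread. Parser here is a stand-in, not the real class.
public class ParserPool {
    static class Parser {
        String parse(String sql) { return sql.trim().toLowerCase(); }
    }

    private static final ThreadLocal<Parser> PARSER = new ThreadLocal<>();

    public static String parse(String sql) {
        PARSER.set(new Parser());
        try {
            return PARSER.get().parse(sql);
        } finally {
            PARSER.remove(); // release the per-thread reference after parsing
        }
    }

    // For inspection: true when the current thread holds no parser.
    static boolean slotIsEmpty() { return PARSER.get() == null; }
}
```

`ThreadLocal.remove()` matters in thread-pooled servers: pooled threads live for the process lifetime, so anything left in a thread local is effectively leaked.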
[jira] [Assigned] (CARBONDATA-1522) 6. Loading aggregation tables for streaming data tables.
[ https://issues.apache.org/jira/browse/CARBONDATA-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal reassigned CARBONDATA-1522: Assignee: Kunal Kapoor > 6. Loading aggregation tables for streaming data tables. > > > Key: CARBONDATA-1522 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1522 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Kunal Kapoor >Priority: Major > > We can finish the segment load after a configurable amount of data is received > and create a new segment to load new streaming data. While finishing the > segment we can trigger the agg table loads on it. So there would be no agg > tables on the ongoing streaming segment, but queries can still be served from the > streaming segment of the actual table. > For example, if the user configures the stream_segment size as 1 GB, then for every 1 GB of > stream data received a new segment is created and the current segment is finished. > While finishing the current segment we can trigger agg table loading > and compaction of segments. > While querying data we change the query plan to apply a union of the agg table > and the streaming segment of the actual table to get the current data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2269) Support Query on Pre Aggregate on streaming table
kumar vishal created CARBONDATA-2269: Summary: Support Query on Pre Aggregate on streaming table Key: CARBONDATA-2269 URL: https://issues.apache.org/jira/browse/CARBONDATA-2269 Project: CarbonData Issue Type: Sub-task Reporter: kumar vishal Assignee: kumar vishal For querying the data on PreAggregate table on streaming table change the query plan to apply union of agg table and streaming segment of actual table to get the current data. For more detail, see the streaming ingest design document Query Example for streaming table: +User Query:+ SELECT name, sum(Salary) as totalSalary FROM maintable +Updated Query:+ SELECT name, sum(totalSalary) FROM( SELECT name, sum(Salary) as totalSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as totalSalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) +User Query:+ SELECT name, AVG(Salary) as avgSalary FROM maintable. +Updated Query:+ SELECT name, Divide(sum(sumSalary)/sum(countsalary)) FROM( SELECT name, sum(Salary) as sumSalary,count(salary) countsalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as sumSalary, count(maintable_salary) countsalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) +User Query:+ SELECT name, count(Salary) as countSalary FROM maintable. +Updated Query:+ SELECT name, sum(countsalary) FROM( SELECT name, count(Salary) as countSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_count) FROM maintable_agg GROUP BY maintable_name) GROUP BY name) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2269) Support Query on Pre Aggregate on streaming table
[ https://issues.apache.org/jira/browse/CARBONDATA-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2269: - Description: Support Query On Pre Aggregate table created on Streaming table For querying the data on PreAggregate table on streaming table change the query plan to apply union of agg table and streaming segment of actual table to get the current data. Query Example for streaming table: **User Query:** SELECT name, sum(Salary) as totalSalary FROM maintable **Updated Query:** SELECT name, sum(totalSalary) FROM( SELECT name, sum(Salary) as totalSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as totalSalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) **User Query:** SELECT name, AVG(Salary) as avgSalary FROM maintable. **Updated Query:** SELECT name, Divide(sum(sumSalary)/sum(countsalary)) FROM( SELECT name, sum(Salary) as sumSalary,count(salary) countsalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as sumSalary, count(maintable_salary) countsalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) **User Query:** SELECT name, count(Salary) as countSalary FROM maintable. **Updated Query:** SELECT name, sum(countsalary) FROM( SELECT name, count(Salary) as countSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_count) FROM maintable_agg GROUP BY maintable_name) GROUP BY name) was: For querying the data on PreAggregate table on streaming table change the query plan to apply union of agg table and streaming segment of actual table to get the current data. 
For more detail, see the streaming ingest design document Query Example for streaming table: +User Query:+ SELECT name, sum(Salary) as totalSalary FROM maintable +Updated Query:+ SELECT name, sum(totalSalary) FROM( SELECT name, sum(Salary) as totalSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as totalSalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) +User Query:+ SELECT name, AVG(Salary) as avgSalary FROM maintable. +Updated Query:+ SELECT name, Divide(sum(sumSalary)/sum(countsalary)) FROM( SELECT name, sum(Salary) as sumSalary,count(salary) countsalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_salary) as sumSalary, count(maintable_salary) countsalary FROM maintable_agg GROUP BY maintable_name) GROUP BY name) +User Query:+ SELECT name, count(Salary) as countSalary FROM maintable. +Updated Query:+ SELECT name, sum(countsalary) FROM( SELECT name, count(Salary) as countSalary FROM maintable GROUP BY name UNION ALL SELECT maintable_name,sum(maintable_count) FROM maintable_agg GROUP BY maintable_name) GROUP BY name) > Support Query on Pre Aggregate on streaming table > - > > Key: CARBONDATA-2269 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2269 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > > Support Query On Pre Aggregate table created on Streaming table > For querying the data on PreAggregate table on streaming table change the > query plan to apply union of agg table and streaming segment of actual table > to get the current data. 
> Query Example for streaming table: > **User Query:** > SELECT name, sum(Salary) as totalSalary > FROM maintable > **Updated Query:** > SELECT name, sum(totalSalary) FROM( > SELECT name, sum(Salary) as totalSalary > FROM maintable > GROUP BY name > UNION ALL > SELECT maintable_name,sum(maintable_salary) as totalSalary > FROM maintable_agg > GROUP BY maintable_name) > GROUP BY name) > **User Query:** > SELECT name, AVG(Salary) as avgSalary > FROM maintable. > **Updated Query:** > SELECT name, Divide(sum(sumSalary)/sum(countsalary)) > FROM( > SELECT name, sum(Salary) as sumSalary,count(salary) countsalary > FROM maintable > GROUP BY name > UNION ALL > SELECT maintable_name,sum(maintable_salary) as sumSalary, > count(maintable_salary) countsalary > FROM maintable_agg > GROUP BY maintable_name) > GROUP BY name) > **User Query:** > SELECT name, count(Salary) as countSalary > FROM maintable. > **Updated Query:** > SELECT name, sum(countsalary) > FROM( > SELECT name, count(Salary) as countSalary > FROM maintable > GROUP BY name > UNION ALL > SELECT maintable_name,sum(maintable_count) > FROM maintable_agg > GROUP BY maintable_name) > GROUP BY name) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2312) Support In Memory catalog
kumar vishal created CARBONDATA-2312: Summary: Support In Memory catalog Key: CARBONDATA-2312 URL: https://issues.apache.org/jira/browse/CARBONDATA-2312 Project: CarbonData Issue Type: New Feature Reporter: kumar vishal Assignee: kumar vishal Support storing the catalog in memory (not in Hive) for each session; after a session restart the user can create an external table and run select queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2325) Page level uncompress and Query performance improvement for Unsafe No Dictionary
kumar vishal created CARBONDATA-2325: Summary: Page level uncompress and Query performance improvement for Unsafe No Dictionary Key: CARBONDATA-2325 URL: https://issues.apache.org/jira/browse/CARBONDATA-2325 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal *Page Level Decoder for query* Add page-level on-demand decoding. In the current code all pages of a blocklet are uncompressed up front; because of this the memory footprint is too high, causing OOM. Code is now added to support page-level decoding: one page is decoded at a time, and once all its records are processed the next page's data is decoded. This improves query performance, for example for limit queries. *Unsafe No Dictionary (Unsafe variable length)* Optimized the getRow (for vector processing) and putArray methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
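The on-demand decoding idea can be sketched like this (an illustrative reader, not the CarbonData blocklet code): pages are decoded only when the row cursor reaches them, so a query that stops early, such as a LIMIT query, never pays for the remaining pages.

```java
import java.util.List;

// Minimal sketch of page-level on-demand decoding: a page is "uncompressed"
// only when rows from it are actually consumed.
public class LazyBlockletReader {
    private final List<int[]> pages; // each int[] stands in for a compressed page
    private int decodeCalls = 0;

    LazyBlockletReader(List<int[]> pages) { this.pages = pages; }

    private int[] decode(int idx) {
        decodeCalls++;               // stands in for real page uncompression
        return pages.get(idx);
    }

    // Read up to 'limit' rows, decoding only the pages actually touched.
    int readRows(int limit) {
        int read = 0;
        for (int p = 0; p < pages.size() && read < limit; p++) {
            for (int v : decode(p)) {   // page decoded only when reached
                if (read == limit) break;
                read++;
            }
        }
        return read;
    }

    // Fixture: three pages of three rows each; returns how many pages a
    // LIMIT-style read of 'limit' rows had to decode.
    public static int pagesDecodedForLimit(int limit) {
        LazyBlockletReader r = new LazyBlockletReader(
            List.of(new int[]{1, 2, 3}, new int[]{4, 5, 6}, new int[]{7, 8, 9}));
        r.readRows(limit);
        return r.decodeCalls;
    }
}
```

Besides the early-exit win, decoding one page at a time bounds the amount of uncompressed data held in memory to a single page per column, which is what relieves the OOM described above.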
[jira] [Created] (CARBONDATA-2328) Fixed Table Alias Issue in Pre Aggregate
kumar vishal created CARBONDATA-2328: Summary: Fixed Table Alias Issue in Pre Aggregate Key: CARBONDATA-2328 URL: https://issues.apache.org/jira/browse/CARBONDATA-2328 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal *Issue:* A query with a table alias does not fetch data from the pre-aggregate table. *Problem:* When the table has an alias, all the attribute references' qualifiers carry the alias name, but the data map is created without the alias, so the expression comparison fails. *Solution:* While comparing, remove the alias qualifiers and then compare the expressions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
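The comparison fix reduces to a small idea, sketched here on plain strings (purely illustrative, not the Spark Catalyst attribute machinery): strip the qualifier, i.e. the table alias prefix, from both references before matching column names.

```java
// Minimal sketch of qualifier-insensitive attribute matching: "t1.name" from
// an aliased user query and "name" from the datamap plan refer to the same
// column once the alias qualifier is removed.
public class AttributeMatcher {
    public static boolean sameColumn(String a, String b) {
        return stripQualifier(a).equalsIgnoreCase(stripQualifier(b));
    }

    private static String stripQualifier(String ref) {
        int dot = ref.lastIndexOf('.');
        return dot < 0 ? ref : ref.substring(dot + 1);
    }
}
```

In the real planner the same principle applies to attribute-reference objects rather than strings: matching on the column identity while ignoring the per-query alias qualifier.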
[jira] [Resolved] (CARBONDATA-2346) Dropping partition failing with null error for Partition table with Pre-Aggregate tables
[ https://issues.apache.org/jira/browse/CARBONDATA-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2346. -- Resolution: Fixed Fix Version/s: 1.4.0 > Dropping partition failing with null error for Partition table with > Pre-Aggregate tables > > > Key: CARBONDATA-2346 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2346 > Project: CarbonData > Issue Type: Bug >Reporter: Praveen M P >Assignee: Praveen M P >Priority: Major > Fix For: 1.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2335) Autohandoff is failing when preaggregate is created on streaming table
[ https://issues.apache.org/jira/browse/CARBONDATA-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2335. -- Resolution: Fixed Fix Version/s: 1.4.0 > Autohandoff is failing when preaggregate is created on streaming table > -- > > Key: CARBONDATA-2335 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2335 > Project: CarbonData > Issue Type: Bug >Reporter: Kunal Kapoor >Assignee: Kunal Kapoor >Priority: Major > Fix For: 1.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Auto hand off is failing with NullPointerException when preaggregate table is > present in the streaming table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2386) Query on Pre-Aggregate table is slower
kumar vishal created CARBONDATA-2386: Summary: Query on Pre-Aggregate table is slower Key: CARBONDATA-2386 URL: https://issues.apache.org/jira/browse/CARBONDATA-2386 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal Fix For: 1.4.0 *Problem:* Queries on the pre-aggregate table consume too much time. *Root cause:* Calculating the sizes used to select the smallest pre-aggregate table takes approximately 76 seconds, because the index file is read even when the segment file is present. *Solution:* Read the table status and get the data file and index file sizes for the valid segments. For older segments where datasize and indexsize are not present, calculate the size of the store folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
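The solution can be sketched as follows; `SegmentDetail` is a hypothetical stand-in for a table-status entry, not the real CarbonData type. Sizes come from the recorded datasize/indexsize fields when present, and only older segments that lack them fall back to the expensive store-folder measurement.

```java
import java.util.List;

// Minimal sketch of the size-selection fix: prefer the cheap sizes recorded
// in the table status; fall back to folder scanning only for old segments.
public class PreAggSizeCalculator {
    static class SegmentDetail {
        final Long dataSize;   // null for older segments without the field
        final Long indexSize;  // null for older segments without the field
        final long folderSize; // expensive fallback measurement
        SegmentDetail(Long d, Long i, long f) { dataSize = d; indexSize = i; folderSize = f; }
    }

    public static long totalSize(List<SegmentDetail> validSegments) {
        long total = 0;
        for (SegmentDetail s : validSegments) {
            if (s.dataSize != null && s.indexSize != null) {
                total += s.dataSize + s.indexSize; // cheap: from table status
            } else {
                total += s.folderSize;             // fallback: scan store folder
            }
        }
        return total;
    }
}
```

The planner can then compare these totals across candidate pre-aggregate tables and pick the smallest without touching any index files.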
[jira] [Resolved] (CARBONDATA-2322) Data mismatch in Aggregate query after compaction on Pre-Agg table on Partition table
[ https://issues.apache.org/jira/browse/CARBONDATA-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2322. -- Resolution: Fixed > Data mismatch in Aggregate query after compaction on Pre-Agg table on > Partition table > - > > Key: CARBONDATA-2322 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2322 > Project: CarbonData > Issue Type: Bug >Reporter: Praveen M P >Assignee: Praveen M P >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2407) Removed All Unused code
kumar vishal created CARBONDATA-2407: Summary: Removed All Unused code Key: CARBONDATA-2407 URL: https://issues.apache.org/jira/browse/CARBONDATA-2407 Project: CarbonData Issue Type: Improvement Reporter: kumar vishal Assignee: kumar vishal After adding the datamap, the executor BTree is no longer used, since the driver loads the blocklet information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2407) Removed All Unused Executor BTree code
[ https://issues.apache.org/jira/browse/CARBONDATA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal updated CARBONDATA-2407: - Summary: Removed All Unused Executor BTree code (was: Removed All Unused code ) > Removed All Unused Executor BTree code > --- > > Key: CARBONDATA-2407 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2407 > Project: CarbonData > Issue Type: Improvement >Reporter: kumar vishal >Assignee: kumar vishal >Priority: Major > > After adding datamap, executor btree is not used as driver is loading > blocklet information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2381) Improve compaction performance by filling batch result in columnar format and performing IO at blocklet level
[ https://issues.apache.org/jira/browse/CARBONDATA-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2381. -- Resolution: Fixed Fix Version/s: 1.4.0 > Improve compaction performance by filling batch result in columnar format and > performing IO at blocklet level > - > > Key: CARBONDATA-2381 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2381 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 1.3.1 >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Major > Fix For: 1.4.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Problem: Compaction performance is slow as compared to data load. If > compaction threshold is set to 6,6 then on minor compaction after 6 loads > compaction performance is almost 6-7 times of the total load performance for > 6 loads. > Analysis: > # During compaction result filling is done in row format. Due to this as the > number of columns increases the dimension and measure data filling time > increases. This happens because in row filling we are not able to take > advantage of OS cacheable buffers as we continuously read data for next > column. > # As compaction uses a page level reader flow wherein both IO and > uncompression is done at page level, the IO and uncompression time increases > in this model. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2411) infinite loop when sdk writer throws Exception
[ https://issues.apache.org/jira/browse/CARBONDATA-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2411. -- Resolution: Fixed > infinite loop when sdk writer throws Exception > -- > > Key: CARBONDATA-2411 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2411 > Project: CarbonData > Issue Type: Bug >Reporter: Babulal >Assignee: Babulal >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > When the SDK CSVWriter throws an error (a String cast exception), the application > hangs and the data loading threads remain in the waiting state. Use the code below > to reproduce the issue. > > val fields: Array[Field] = new Array[Field](2) > fields(0) = new Field("stringField", DataTypes.STRING) > fields(1) = new Field("intField", DataTypes.INT) > val builder: CarbonWriterBuilder = CarbonWriter.builder.withSchema(new > Schema(fields)) > > .isTransactionalTable(false).outputPath("D:/data/yyy").taskNo("5").isTransactionalTable(false) > val writer: CarbonWriter = builder.buildWriterForCSVInput > writer.write(Array("xyz",1)) > writer.close() > -- This message was sent by Atlassian JIRA (v7.6.3#76005)