[GitHub] carbondata issue #1901: [wip]compatibility fix for v2
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1901 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3396/ ---
[GitHub] carbondata issue #1458: [CARBONDATA-1663] Decouple spark and core modules
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1458 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3395/ ---
[GitHub] carbondata issue #1899: [CARBONDATA-2109]make configs of dataframe load with...
Github user qiuchenjian commented on the issue: https://github.com/apache/carbondata/pull/1899 @xuchuanyin , I agree, this 'tempCSV' option has several bugs to fix ---
[GitHub] carbondata issue #1860: [CARBONDATA-2080] [S3-Implementation] Propagated had...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1860 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2158/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1768 Merged into carbonstore branch ---
[GitHub] carbondata pull request #1768: [CARBONDATA-2025] Unify all path construction...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1768 ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1768 LGTM ---
[jira] [Created] (CARBONDATA-2112) Data getting garbled after datamap creation when table is created with GLOBAL SORT
Sangeeta Gulia created CARBONDATA-2112:
--
Summary: Data getting garbled after datamap creation when table is created with GLOBAL SORT
Key: CARBONDATA-2112
URL: https://issues.apache.org/jira/browse/CARBONDATA-2112
Project: CarbonData
Issue Type: Bug
Components: data-query
Environment: spark-2.1
Reporter: Sangeeta Gulia
Attachments: 2000_UniqData.csv

Data is getting garbled after datamap creation when the table is created with BATCH_SORT/GLOBAL_SORT.

Steps to reproduce:

{code:java}
spark.sql("drop table if exists uniqdata_batchsort_compact3")

spark.sql("CREATE TABLE uniqdata_batchsort_compact3 (CUST_ID int,CUST_NAME String," +
  "ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint," +
  "BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10)," +
  "Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'carbondata' " +
  "TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')").show()

// The following LOAD is executed three times in total:
spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id").show(50)
{code}

Output before datamap creation:

{code}
+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|   9376|      9376.0|
|   9427|      9427.0|
|   9465|      9465.0|
|   9852|      9852.0|
|   9900|      9900.0|
|  10206|     10206.0|
|  10362|     10362.0|
|  10623|     10623.0|
|  10817|     10817.0|
|   9182|      9182.0|
|   9564|      9564.0|
|   9879|      9879.0|
|  10081|     10081.0|
|  10121|     10121.0|
|  10230|     10230.0|
|  10462|     10462.0|
|  10703|     10703.0|
|  10914|     10914.0|
|   9162|      9162.0|
|   9383|      9383.0|
|   9454|      9454.0|
|   9517|      9517.0|
|   9558|      9558.0|
|  10708|     10708.0|
|  10798|     10798.0|
|  10862|     10862.0|
|   9071|      9071.0|
|   9169|      9169.0|
|   9946|      9946.0|
|  10468|     10468.0|
|  10745|     10745.0|
|  10768|     10768.0|
|   9153|      9153.0|
|   9206|      9206.0|
|   9403|      9403.0|
|   9597|      9597.0|
|   9647|      9647.0|
|   9775|      9775.0|
|  10032|     10032.0|
|  10395|     10395.0|
|  10527|     10527.0|
|  10567|     10567.0|
|  10632|     10632.0|
|  10788|     10788.0|
|  10815|     10815.0|
|  10840|     10840.0|
|   9181|      9181.0|
|   9344|      9344.0|
|   9575|      9575.0|
|   9675|      9675.0|
+-------+------------+
only showing top 50 rows
{code}

Note: here cust_id is correct.

{code:java}
spark.sql("create datamap uniqdata_agg on table uniqdata_batchsort_compact3 using " +
  "'preaggregate' as select avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id").show(50)
{code}

Output after datamap creation:

{code}
+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|  27651|      9217.0|
|  31944|     10648.0|
|  32667|     10889.0|
|  28242|      9414.0|
|  29841|      9947.0|
|  28728|      9576.0|
|  27255|      9085.0|
|  32571|     10857.0|
|  30276|     10092.0|
|  27276|      9092.0|
|  31503|     10501.0|
|  27687|      9229.0|
|  27183|      9061.0|
|  29334|      9778.0|
|  29913|      9971.0|
|  28683|      9561.0|
|  31545|     10515.0|
|  30405|     10135.0|
|  27693|      9231.0|
|  29649|      9883.0|
|  30537|     10179.0|
|  32709|     10903.0|
|  29586|      9862.0|
|  32895|     10965.0|
|  32415|     10805.0|
|  31644|     10548.0|
|  30030|     10010.0|
|  31713|     10571.0|
|  28083|      9361.0|
|  27813|      9271.0|
|  27171|      9057.0|
|  27189|      9063.0|
|  30444|     10148.0|
|  28623|      9541.0|
|  28566|      9522.0|
|  32655|     10885.0|
|  31164|     10388.0|
|  30321|     10107.0|
|  31452|     10484.0|
|  29829|      9943.0|
|  27468|      9156.0|
|  31212|     10404.0|
|  32154|     10718.0|
|  27531|      9177.0|
|  27654|      9218.0|
|  27105|      9035.0|
|  31113|     10371.0|
|  28479|      9493.0|
|  29094|      9698.0|
|  31551|     10517.0|
+-------+------------+
only showing top 50 rows
{code}

Note: after datamap creation, cust_id is incorrect: it comes back as three times its original value (equal to the number of loads), while avg(cust_id) is still correct.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
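The symptom above (the group key tripled after three identical loads, while the averages stay correct) is consistent with a rollup that returns the *aggregated* group-by column instead of the group key when the query is answered from the pre-aggregate table. A minimal, hypothetical Python sketch of that failure mode (the names and structures are illustrative, not CarbonData's actual code):

```python
from collections import defaultdict

# Three identical loads of the same rows, as in the reproduction above.
rows = [(9376,), (9427,), (9465,)] * 3

# Pre-aggregate table keyed on cust_id: per group keep (sum, count)
# so avg can be derived exactly.
preagg = defaultdict(lambda: [0, 0])  # cust_id -> [sum, count]
for (cust_id,) in rows:
    preagg[cust_id][0] += cust_id
    preagg[cust_id][1] += 1

# Correct read: emit the group key as-is, derive avg = sum / count.
correct = {k: (k, s / c) for k, (s, c) in preagg.items()}

# Buggy read: emit the *summed* key column instead of the key itself,
# which after 3 loads is exactly 3x the original value.
buggy = {k: (s, s / c) for k, (s, c) in preagg.items()}

assert correct[9376] == (9376, 9376.0)
assert buggy[9376] == (28128, 9376.0)  # 9376 * 3; avg still right
```

This reproduces the pattern in the reported output, e.g. 27651 = 3 × 9217 while avg(cust_id) = 9217.0.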
[GitHub] carbondata pull request #1902: [CARBONDATA-2082][CARBONDATA-1516] Timeseries...
GitHub user xubo245 opened a pull request: https://github.com/apache/carbondata/pull/1902 [CARBONDATA-2082][CARBONDATA-1516] Timeseries pre-aggregate table should support the blank space Timeseries pre-aggregate table should support the blank space, including: - event_time - different granularity Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NO You can merge this pull request into a Git repository by running: $ git pull https://github.com/xubo245/carbondata checkPreaggAndSupportBlankSpace Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1902.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1902 commit a7408477e3038e296182ec4b56ff7f6bf132e0ea Author: xubo245 <601450868@...> Date: 2018-02-01T07:32:36Z [CARBONDATA-2082][CARBONDATA-1516] Timeseries pre-aggregate table should support the blank space ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2157/ ---
[GitHub] carbondata issue #1571: [CARBONDATA-1811] Use StructType as schema when crea...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1571 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3394/ ---
[GitHub] carbondata issue #1458: [CARBONDATA-1663] Decouple spark and core modules
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1458 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2156/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3392/ ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1104 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3391/ ---
[jira] [Resolved] (CARBONDATA-2111) TPCH query which has multiple joins inside does not return any rows.
[ https://issues.apache.org/jira/browse/CARBONDATA-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manish Gupta resolved CARBONDATA-2111.
--
Resolution: Fixed
Assignee: Ravindra Pesala
Fix Version/s: 1.3.0

> TPCH query which has multiple joins inside does not return any rows.
>
> Key: CARBONDATA-2111
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2111
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
> Assignee: Ravindra Pesala
> Priority: Major
> Fix For: 1.3.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The below TPCH query, which has multiple joins, does not return any rows.
> {code}
> sql(
>   "select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from " +
>   "part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = " +
>   "ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and " +
>   "n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min" +
>   "(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and " +
>   "s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and " +
>   "r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100")
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #1901: [wip]compatibility fix for v2
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/1901 [wip]compatibility fix for v2 Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata compatibility_v2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1901.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1901 commit b4e6885dab8f01d5e2980d30e98375db551be6f2 Author: akashrn5 Date: 2018-02-01T07:12:49Z compatibility fix for v2 ---
[GitHub] carbondata pull request #1895: [CARBONDATA-2111] Fix the decoder issue when ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1895 ---
[GitHub] carbondata issue #1895: [CARBONDATA-2111] Fix the decoder issue when multipl...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/1895 LGTM ---
[GitHub] carbondata issue #1896: [CARBONDATA-2108]Updated unsafe sort memory configur...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1896 retest this please ---
[jira] [Updated] (CARBONDATA-2082) Timeseries pre-aggregate table should support the blank space
[ https://issues.apache.org/jira/browse/CARBONDATA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-2082: Description: timeseries pre-aggregate table should support the blank space 1.scenario 1 {code:java} test("test timeseries create table 35: support event_time and granularity key with space") { sql("DROP DATAMAP IF EXISTS agg1_month ON TABLE maintable") sql( s"""CREATE DATAMAP agg1_month ON TABLE mainTable |USING '$timeSeries' |DMPROPERTIES ( | 'event_time '=' dataTime', | 'MONTH_GRANULARITY '='1') |AS SELECT dataTime, SUM(age) FROM mainTable |GROUP BY dataTime """.stripMargin) checkExistence(sql("SHOW TABLES"), true, "maintable_agg1_month") } {code} problem: NPE {code:java} java.lang.NullPointerException was thrown. java.lang.NullPointerException at org.apache.spark.sql.execution.command.timeseries.TimeSeriesUtil$.validateTimeSeriesEventTime(TimeSeriesUtil.scala:50) at org.apache.spark.sql.execution.command.preaaggregate.CreatePreAggregateTableCommand.processMetadata(CreatePreAggregateTableCommand.scala:104) at org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:75) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:84) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) {code} 2.scenario 2 {code:java} test("test timeseries create table 36: support event_time and granularity key with space") { sql("DROP DATAMAP IF EXISTS agg1_month ON TABLE maintable") sql( s"""CREATE DATAMAP agg1_month ON TABLE mainTable |USING '$timeSeries' |DMPROPERTIES ( | 'event_time '='dataTime', | 'MONTH_GRANULARITY '=' 1') |AS SELECT dataTime, SUM(age) FROM mainTable |GROUP BY dataTime """.stripMargin) checkExistence(sql("SHOW TABLES"), true, "maintable_agg1_month") } {code} problem: {code:java} Granularity only support 1 
org.apache.carbondata.spark.exception.MalformedDataMapCommandException: Granularity only support 1 at org.apache.spark.sql.execution.command.timeseries.TimeSeriesUtil$.getTimeSeriesGranularityDetails(TimeSeriesUtil.scala:118) at org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:58) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:84) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) at org.apache.spark.sql.Dataset.(Dataset.scala:183) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632) {code} was: timeseries pre-aggregate table should support the blank space 1.scenario 1 {code:java} test("test timeseries create table 12: hierarchy type with space") { try { sql( """create datamap agg1 on table mainTable using 'preaggregate' |DMPROPERTIES ( | 'timeseries.eventTime'='dataTime', | 'timeseries.hierarchy'='second= 1,hour=1,day=1,month=1,year=1') |as select dataTime, sum(age) from mainTable |group by dataTime """.stripMargin) assert(false) } catch { case e: MalformedCarbonCommandException => assert(e.getMessage.contains("Unsupported Value for hierarchy:second= 1")) assert(true) } } {code} 2.scenario 2 second=1, hour=1,day=1,month=1,year=1 > Timeseries pre-aggregate table should support the blank space > - > > Key: CARBONDATA-2082 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2082 > Project: CarbonData > Issue Type: Bug > Components: core, spark-integration >Affects Versions: 1.3.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Minor > Fix For: 1.3.0 > > Time
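Both failures above stem from DMPROPERTIES keys and values being matched without trimming surrounding whitespace: 'event_time ' does not match 'event_time' (hence the NPE), and ' 1' does not match '1' (hence "Granularity only support 1"). A hedged sketch of the fix idea, trimming keys and values before validation (illustrative Python, not CarbonData's actual implementation):

```python
def normalize_dmproperties(props):
    """Trim whitespace around DMPROPERTIES keys and values so that
    'event_time ' matches 'event_time' and ' 1' matches '1' when the
    datamap command validates its properties. Keys are also lowercased,
    mirroring case-insensitive property lookup."""
    return {k.strip().lower(): v.strip() for k, v in props.items()}

# Properties as written in the failing test cases above.
props = {"event_time ": " dataTime", "MONTH_GRANULARITY ": " 1"}
normalized = normalize_dmproperties(props)

assert normalized["event_time"] == "dataTime"      # scenario 1: no more NPE lookup miss
assert normalized["month_granularity"] == "1"       # scenario 2: granularity parses as 1
```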
[GitHub] carbondata pull request #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if no...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1861#discussion_r165272350 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -49,10 +52,22 @@ case class CarbonCreateDataMapCommand( throw new MalformedCarbonCommandException("Streaming table does not support creating datamap") } val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val dbName = tableIdentifier.database.getOrElse("default") +val tableName = tableIdentifier.table + "_" + dataMapName -if (dmClassName.equalsIgnoreCase(PREAGGREGATE.toString) || +if (sparkSession.sessionState.catalog.listTables(dbName) --- End diff -- sparkSession.sessionState.catalog.tableExists(tableIdentifier) ---
[jira] [Created] (CARBONDATA-2111) TPCH query which has multiple joins inside does not return any rows.
Ravindra Pesala created CARBONDATA-2111:
---
Summary: TPCH query which has multiple joins inside does not return any rows.
Key: CARBONDATA-2111
URL: https://issues.apache.org/jira/browse/CARBONDATA-2111
Project: CarbonData
Issue Type: Improvement
Reporter: Ravindra Pesala

The below TPCH query, which has multiple joins, does not return any rows.

{code}
sql(
  "select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from " +
  "part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = " +
  "ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and " +
  "n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min" +
  "(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and " +
  "s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and " +
  "r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100")
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1571: [CARBONDATA-1811] Use StructType as schema when crea...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1571 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2155/ ---
[jira] [Assigned] (CARBONDATA-2082) Timeseries pre-aggregate table should support the blank space
[ https://issues.apache.org/jira/browse/CARBONDATA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 reassigned CARBONDATA-2082: --- Assignee: xubo245 > Timeseries pre-aggregate table should support the blank space > - > > Key: CARBONDATA-2082 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2082 > Project: CarbonData > Issue Type: Bug > Components: core, spark-integration >Affects Versions: 1.3.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Minor > Fix For: 1.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > timeseries pre-aggregate table should support the blank space > 1.scenario 1 > {code:java} > test("test timeseries create table 12: hierarchy type with space") { > try { > sql( > """create datamap agg1 on table mainTable using 'preaggregate' > |DMPROPERTIES ( > | 'timeseries.eventTime'='dataTime', > | 'timeseries.hierarchy'='second= 1,hour=1,day=1,month=1,year=1') > |as select dataTime, sum(age) from mainTable > |group by dataTime > """.stripMargin) > assert(false) > } catch { > case e: MalformedCarbonCommandException => > assert(e.getMessage.contains("Unsupported Value for hierarchy:second= > 1")) > assert(true) > } > } > {code} > 2.scenario 2 > second=1, hour=1,day=1,month=1,year=1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if no...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1861#discussion_r165271557 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala --- @@ -319,6 +326,53 @@ class TestTimeSeriesCreateTable extends QueryTest with BeforeAndAfterAll { assert(e.getMessage.equals(s"$timeSeries should define time granularity")) } + test("test timeseries create table 19: should support if not exists") { +sql("DROP DATAMAP IF EXISTS agg1 ON TABLE mainTable") +try { --- End diff -- no need for try block. If any exception if thrown the test case will fail ---
[GitHub] carbondata pull request #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if no...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1861#discussion_r165271343 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala --- @@ -310,5 +310,99 @@ test("check load and select for avg double datatype") { checkAnswer(sql("select name,avg(salary) from maintbl group by name"), rows) } + test("create datamap with 'if not exists' after load data into mainTable and create datamap") { --- End diff -- I think no need to add test cases in all the files. One test case in TestPreAggregateCreateCommand and one in TestTimeseriesCreateTable would be enough. ---
[jira] [Updated] (CARBONDATA-2110) option of TempCsv should be removed since the default delimiter may conflicts with field value
[ https://issues.apache.org/jira/browse/CARBONDATA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2110: --- Description: Currently in carbondata, an option named ‘tempCSV’ is available during loading dataframe. After enabling this option, Carbondata will write the dataframe to a *standard* csv file at first and then load the data files. The delimiters of the standard csv file, such as field delimiter / escape char/ quote char/ multi-line/ line separator and so on may conflict with the actual field value. For example, if a field contains ',', then it will cause problem in further data loading if we save the tempCSV using ',' as field separator. Since we are not sure about the content of dataframe, I think it's better to deprecate this option. To make forward compatible, user can still use this option but will get warning about it. was: Currently in carbondata, an option named ‘tempCSV’ is available during loading dataframe. After enabling this option, Carbondata will write the dataframe to a *standard* csv file at first and then load the data files. The delimiters of the standard csv file, such as field delimiter / escape char/ quote char/ multi-line/ line separator and so on may conflict with the actual field value. For example, if a field contains ',', then it will cause problem to save the tempCSV using ',' as field separator. So I think it's better to deprecate this option. To make forward compatible, user can still use this option but will get warning about it. > option of TempCsv should be removed since the default delimiter may conflicts > with field value > -- > > Key: CARBONDATA-2110 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2110 > Project: CarbonData > Issue Type: Bug > Components: data-load >Reporter: xuchuanyin >Priority: Major > > Currently in carbondata, an option named ‘tempCSV’ is available during > loading dataframe. 
> > After enabling this option, Carbondata will write the dataframe to a > *standard* csv file at first and then load the data files. > > The delimiters of the standard csv file, such as field delimiter / escape > char/ quote char/ multi-line/ line separator and so on may conflict with the > actual field value. For example, if a field contains ',', then it will cause > problem in further data loading if we save the tempCSV using ',' as field > separator. > > Since we are not sure about the content of dataframe, I think it's better to > deprecate this option. To make forward compatible, user can still use this > option but will get warning about it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2154/ ---
[jira] [Updated] (CARBONDATA-2110) option of TempCsv should be removed since the default delimiter may conflicts with field value
[ https://issues.apache.org/jira/browse/CARBONDATA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2110: --- Description: Currently in carbondata, an option named ‘tempCSV’ is available during loading dataframe. After enabling this option, Carbondata will write the dataframe to a *standard* csv file at first and then load the data files. The delimiters of the standard csv file, such as field delimiter / escape char/ quote char/ multi-line/ line separator and so on may conflict with the actual field value. For example, if a field contains ',', then it will cause problem to save the tempCSV using ',' as field separator. So I think it's better to deprecate this option. To make forward compatible, user can still use this option but will get warning about it. was: Currently in carbondata, an option named ‘tempCSV’ is available during loading dataframe. After enabling this option, Carbondata will write the dataframe to a **standard** csv file at first and then load the data files. The delimiters of the standard csv file, such as field delimiter / escape char/ quote char/ multi-line/ line separator and so on may conflict with the actual field value. For example, if a field contains ',', then it will cause problem to save the tempCSV using ',' as field separator. So I think it's better to deprecate this option. To make forward compatible, user can still use this option but will get warning about it. > option of TempCsv should be removed since the default delimiter may conflicts > with field value > -- > > Key: CARBONDATA-2110 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2110 > Project: CarbonData > Issue Type: Bug > Components: data-load >Reporter: xuchuanyin >Priority: Major > > Currently in carbondata, an option named ‘tempCSV’ is available during > loading dataframe. > > After enabling this option, Carbondata will write the dataframe to a > *standard* csv file at first and then load the data files. 
> > The delimiters of the standard csv file, such as field delimiter / escape > char/ quote char/ multi-line/ line separator and so on may conflict with the > actual field value. For example, if a field contains ',', then it will cause > problem to save the tempCSV using ',' as field separator. > > So I think it's better to deprecate this option. To make forward compatible, > user can still use this option but will get warning about it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1822: [CARBONDATA-2043] Configurable wait time for request...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1822 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2153/ ---
[jira] [Created] (CARBONDATA-2110) option of TempCsv should be removed since the default delimiter may conflicts with field value
xuchuanyin created CARBONDATA-2110: -- Summary: option of TempCsv should be removed since the default delimiter may conflicts with field value Key: CARBONDATA-2110 URL: https://issues.apache.org/jira/browse/CARBONDATA-2110 Project: CarbonData Issue Type: Bug Components: data-load Reporter: xuchuanyin Currently in carbondata, an option named ‘tempCSV’ is available during loading dataframe. After enabling this option, Carbondata will write the dataframe to a **standard** csv file at first and then load the data files. The delimiters of the standard csv file, such as field delimiter / escape char/ quote char/ multi-line/ line separator and so on may conflict with the actual field value. For example, if a field contains ',', then it will cause problem to save the tempCSV using ',' as field separator. So I think it's better to deprecate this option. To make forward compatible, user can still use this option but will get warning about it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
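The delimiter conflict described in this issue is easy to reproduce: joining fields with a bare ',' produces a line that parses back into the wrong number of columns, whereas a proper CSV writer quotes the field. A small Python illustration (not CarbonData code) of why an intermediate CSV with unescaped delimiters can corrupt a dataframe load:

```python
import csv
import io

row = ["1", "hello, world", "3"]  # the second field contains the delimiter

# Naive temp-CSV writing: the embedded comma splits the field in two,
# so a downstream load sees 4 columns instead of 3.
naive_line = ",".join(row)
assert naive_line.split(",") == ["1", "hello", " world", "3"]

# Standard CSV quoting round-trips the row intact.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == row
```

Even with quoting, the same conflict can recur for quote chars, escape chars, and line separators embedded in field values, which is the issue's argument for deprecating the intermediate-CSV path entirely rather than tuning delimiters.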
[GitHub] carbondata pull request #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if no...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1861#discussion_r165270563 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala --- @@ -81,29 +82,29 @@ class TestTimeSeriesCreateTable extends QueryTest with BeforeAndAfterAll { """.stripMargin) } - test("test timeseries create table Zero") { + test("test timeseries create table 1") { --- End diff -- Please remove unnecessary changes like this ---
[GitHub] carbondata issue #1822: [CARBONDATA-2043] Configurable wait time for request...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1822 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3390/ ---
[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1831 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3389/ ---
[jira] [Resolved] (CARBONDATA-2092) Fix compaction bug to prevent the compaction flow from going through the restructure compaction flow
[ https://issues.apache.org/jira/browse/CARBONDATA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala resolved CARBONDATA-2092. - Resolution: Fixed Fix Version/s: 1.3.0 > Fix compaction bug to prevent the compaction flow from going through the > restructure compaction flow > > > Key: CARBONDATA-2092 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2092 > Project: CarbonData > Issue Type: Bug >Reporter: Manish Gupta >Assignee: Manish Gupta >Priority: Major > Fix For: 1.3.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Problem and analysis: > > During data load the current schema timestamp is written to the carbondata > fileHeader. This is used during compaction to decide whether the block is a > restructured block or the block is according to the latest schema. > As the blocklet information is now stored in the index file, while loading it > in memory the carbondata file header is not read and due to this the schema > timestamp is not getting set to the blocklet information. Due to this during > compaction flow there is a mismatch on comparing the current schema time > stamp with the timestamp stored in the block and the flow goes through the > restructure compaction flow instead of normal compaction flow. > Impact: > - > Compaction performance degradation as restructure compaction flow involves > sorting of data again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
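The decision described in the issue reduces to a timestamp comparison: if the schema timestamp carried by a block differs from the table's current schema timestamp, the block is treated as restructured and compaction takes the slower re-sorting path. A simplified, hypothetical illustration of that selection logic (not CarbonData's actual classes or field names):

```python
def choose_compaction_flow(block_schema_ts, current_schema_ts):
    """Pick the compaction path by comparing schema timestamps.

    A block written under an older schema must go through the restructure
    flow (which re-sorts data); otherwise the cheaper normal flow applies.
    The bug described above: blocklet info loaded from the index file never
    had its schema timestamp set (effectively 0), so the comparison always
    mismatched and every compaction took the restructure path.
    """
    if block_schema_ts != current_schema_ts:
        return "restructure"  # slow: involves sorting the data again
    return "normal"

current = 1517472000  # hypothetical current schema timestamp
assert choose_compaction_flow(current, current) == "normal"
assert choose_compaction_flow(0, current) == "restructure"  # unset timestamp
```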
[GitHub] carbondata issue #1781: [CARBONDATA-2012] Add support to load pre-aggregate ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1781 retest this please ---
[GitHub] carbondata pull request #1875: [CARBONDATA-2092] Fix compaction bug to preve...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1875 ---
[GitHub] carbondata issue #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if not exist...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1861 CI pass @kumarvishal09 ---
[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1831 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2152/ ---
[GitHub] carbondata issue #1847: [WIP][CARBONDATA-2064] Add compaction listener
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1847 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2151/ ---
[GitHub] carbondata issue #1847: [WIP][CARBONDATA-2064] Add compaction listener
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1847 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3388/ ---
[GitHub] carbondata issue #1875: [CARBONDATA-2092] Fix compaction bug to prevent the ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1875 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3278/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3387/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2150/ ---
[GitHub] carbondata issue #1856: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1856 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2149/ ---
[GitHub] carbondata issue #1856: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1856 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3386/ ---
[GitHub] carbondata pull request #1860: [CARBONDATA-2080] [S3-Implementation] Propaga...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1860#discussion_r165261635

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala ---
@@ -59,6 +76,33 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext,
       map(f => CarbonProperties.getInstance().addProperty(f._1, f._2))
     internalCompute(split, context)
   }
+
+  private def getConf: Configuration = {
+    val configuration = new Configuration(false)
+    val bai = new ByteArrayInputStream(CompressorFactory.getInstance().getCompressor
+      .unCompressByte(confBytes))
+    val ois = new ObjectInputStream(bai)
+    configuration.readFields(ois)
+    ois.close()
+    configuration
+  }
+
+  private def setS3Configurations(hadoopConf: Configuration): Unit = {
--- End diff --

Can you use the same function in CarbonInputFormatUtil? It was added when I rebased yesterday. ---
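The pattern under review — serialize and compress the driver-side Hadoop configuration, then uncompress and rebuild it on the executor, as `getConf` does — can be sketched with JDK-only stand-ins. This is an analogy, not the actual CarbonRDD code: `java.util.Properties` stands in for Hadoop's `Configuration`, and GZIP stands in for Carbon's `CompressorFactory`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConfRoundTrip {
    // Driver side: serialize and compress the configuration once,
    // so it can be shipped to executors as a compact byte array.
    static byte[] toBytes(Properties conf) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            conf.store(gz, null);
        }
        return bos.toByteArray();
    }

    // Executor side: uncompress and rebuild the configuration,
    // analogous to getConf's unCompressByte + readFields.
    static Properties fromBytes(byte[] confBytes) throws IOException {
        Properties conf = new Properties();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(confBytes))) {
            conf.load(gz);
        }
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Properties driverConf = new Properties();
        driverConf.setProperty("fs.s3a.access.key", "dummy");
        Properties executorConf = fromBytes(toBytes(driverConf));
        System.out.println(executorConf.getProperty("fs.s3a.access.key"));
    }
}
```

The real code uses Hadoop's `Configuration.readFields` over an `ObjectInputStream`; the point of the round trip is the same — executors get a faithful copy of the driver's settings (such as S3 credentials) without Spark broadcasting the `Configuration` object itself.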
[jira] [Closed] (CARBONDATA-1974) Exception when to load data using static partition for uniqdata table
[ https://issues.apache.org/jira/browse/CARBONDATA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anubhav tarar closed CARBONDATA-1974. Resolution: Fixed

> Exception when loading data using static partition for uniqdata table
>
> Key: CARBONDATA-1974
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1974
> Project: CarbonData
> Issue Type: Bug
> Components: spark-integration
> Affects Versions: 1.3.0
> Environment: spark2.1, hadoop2.7
> Reporter: anubhav tarar
> Priority: Major
> Fix For: 1.3.0
>
> 1. CREATE TABLE uniqdata_string(CUST_ID int,CUST_NAME String,DOB timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) PARTITIONED BY(ACTIVE_EMUI_VERSION string) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB');
> 2. jdbc:hive2://localhost:1/default> LOAD DATA INPATH 'hdfs://localhost:54311/2000_UniqData.csv' into table uniqdata_string partition(ACTIVE_EMUI_VERSION='abc') OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ, BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
>
> Error: org.apache.spark.sql.AnalysisException: Cannot insert into table `default`.`uniqdata_string` because the number of columns are different: need 11 columns, but query has 12 columns.; (state=,code=0)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1713: [CARBONDATA-1899] Optimize CarbonData concurrency te...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1713 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3385/ ---
[jira] [Closed] (CARBONDATA-1084) Add documentation for V3 Data Format
[ https://issues.apache.org/jira/browse/CARBONDATA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Yadav closed CARBONDATA-1084. Resolution: Fixed Resolved with PR https://github.com/apache/carbondata/pull/1883

> Add documentation for V3 Data Format
>
> Key: CARBONDATA-1084
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1084
> Project: CarbonData
> Issue Type: Improvement
> Components: docs
> Reporter: Vandana Yadav
> Assignee: Vandana Yadav
> Priority: Minor
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Benefits to be added in the documentation, along with commands to set this format, specifying that this is the default format.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2147/ ---
[GitHub] carbondata issue #1713: [CARBONDATA-1899] Optimize CarbonData concurrency te...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1713 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2148/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1768 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3384/ ---
[GitHub] carbondata issue #1822: [CARBONDATA-2043] Configurable wait time for request...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1822 retest this please ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1104 retest this please ---
[GitHub] carbondata issue #1847: [WIP][CARBONDATA-2064] Add compaction listener
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1847 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3383/ ---
[GitHub] carbondata issue #1899: [CARBONDATA-2109]make configs of dataframe load with...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1899 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3276/ ---
[GitHub] carbondata issue #1847: [WIP][CARBONDATA-2064] Add compaction listener
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1847 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2146/ ---
[GitHub] carbondata issue #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if not exist...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1861 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2145/ ---
[GitHub] carbondata issue #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if not exist...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1861 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3382/ ---
[GitHub] carbondata issue #1768: [CARBONDATA-2025] Unify all path construction throug...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1768 retest this please ---
[GitHub] carbondata issue #1894: [CARBONDATA-2107]Fixed query failure in case if aver...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1894 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3380/ ---
[GitHub] carbondata issue #1894: [CARBONDATA-2107]Fixed query failure in case if aver...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1894 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2143/ ---
[GitHub] carbondata pull request #1868: [CARBONDATA-2082][CARBONDATA-1516] Timeseries...
Github user xubo245 closed the pull request at: https://github.com/apache/carbondata/pull/1868 ---
[GitHub] carbondata pull request #1868: [CARBONDATA-2082][CARBONDATA-1516] Timeseries...
GitHub user xubo245 reopened a pull request: https://github.com/apache/carbondata/pull/1868 [CARBONDATA-2082][CARBONDATA-1516] Timeseries pre-aggregate table should support the blank space

Timeseries pre-aggregate table should support the blank space. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed? No
- [ ] Any backward compatibility impacted? No
- [ ] Document update required? No
- [ ] Testing done: added some new test cases
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. No

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xubo245/carbondata supportBlankSpace
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1868.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1868

commit 1a106cbe212099f407b6ce5872b698970d7dd073
Author: xubo245 <601450868@...>
Date: 2018-01-27T03:35:38Z
[CARBONDATA-2082][CARBONDATA-1516] Timeseries pre-aggregate table should support the blank space ---
[GitHub] carbondata issue #1891: [CARBONDATA-2104] Add testcase for concurrent execut...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1891 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3381/ ---
[GitHub] carbondata issue #1891: [CARBONDATA-2104] Add testcase for concurrent execut...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1891 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2144/ ---
[GitHub] carbondata issue #1831: [CARBONDATA-1993] Carbon properties default values f...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1831 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3275/ ---
[GitHub] carbondata issue #1899: [CARBONDATA-2109]make configs of dataframe load with...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1899 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3378/ ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1874 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2142/ ---
[GitHub] carbondata issue #1899: [CARBONDATA-2109]make configs of dataframe load with...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1899 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2141/ ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1874 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3379/ ---
[GitHub] carbondata issue #1886: [CARBONDATA-2098]Add Documentation for Pre-Aggregate...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1886 @sraghunandan please add a pre-agg example; it would be better to have the performance comparison inside the example. ---
[GitHub] carbondata issue #1104: [CARBONDATA-1239] Add validation for set command par...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1104 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3274/ ---
[GitHub] carbondata pull request #1883: [CARBONDATA-1840]Updated configuration-parame...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1883 ---
[jira] [Resolved] (CARBONDATA-1840) carbon.data.file.version default value is not correct in http://carbondata.apache.org/configuration-parameters.html
[ https://issues.apache.org/jira/browse/CARBONDATA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-1840. Resolution: Fixed Fix Version/s: 1.3.0

> carbon.data.file.version default value is not correct in http://carbondata.apache.org/configuration-parameters.html
>
> Key: CARBONDATA-1840
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1840
> Project: CarbonData
> Issue Type: Bug
> Components: docs
> Affects Versions: 1.3.0
> Environment: http://carbondata.apache.org/configuration-parameters.html
> Reporter: Chetan Bhat
> Assignee: Vandana Yadav
> Priority: Minor
> Labels: Docs
> Fix For: 1.3.0
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Steps:
> User checks the http://carbondata.apache.org/configuration-parameters.html link for the default value of carbon.data.file.version.
> Issue: the carbon.data.file.version default value is listed as 2, with 1 as the old format value.
> Expected: the carbon.data.file.version default value should be 3, with 1 and 2 as the old format values.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #1886: [CARBONDATA-2098]Add Documentation for Pre-Ag...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1886#discussion_r165249189

--- Diff: docs/data-management-on-carbondata.md ---
@@ -703,6 +704,194 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns to do the efficient sorting.
 * When writing SQL on a partition table, try to use filters on the partition column.
+## PRE-AGGREGATE TABLES
--- End diff --

Please add some examples to show the plan matching mechanism, like which query will hit which datamap. ---
[GitHub] carbondata issue #1713: [CARBONDATA-1899] Optimize CarbonData concurrency te...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1713 retest this please ---
[jira] [Resolved] (CARBONDATA-1616) Add document for streaming ingestion usage
[ https://issues.apache.org/jira/browse/CARBONDATA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-1616. Resolution: Fixed Assignee: QiangCai (was: Gururaj Shetty)

> Add document for streaming ingestion usage
>
> Key: CARBONDATA-1616
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1616
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Jacky Li
> Assignee: QiangCai
> Priority: Major
> Fix For: 1.3.0
> Time Spent: 2h 40m
> Remaining Estimate: 0h

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #1880: [CARBONDATA-1616] Add CarbonData Streaming In...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1880 ---
[GitHub] carbondata issue #1880: [CARBONDATA-1616] Add CarbonData Streaming Ingestion...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1880 LGTM ---
[GitHub] carbondata issue #1847: [WIP][CARBONDATA-2064] Add compaction listener
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/1847 Retest this please ---
[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1898#discussion_r165247013

--- Diff: docs/configuration-parameters.md ---
@@ -60,6 +60,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
 | carbon.options.bad.record.path | | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must be configured by the user if the bad record logger is enabled or the bad record action is redirect. | |
 | carbon.enable.vector.reader | true | This parameter increases the performance of select queries as it fetches a columnar batch of 4*1024 rows instead of fetching data row by row. | |
+| carbon.task.distribution | merge_small_files | Setting this parameter value to *merge_small_files* will merge all the small files to a size of (128 MB). During data loading, all the small CSV files are combined into a map task to reduce the number of read tasks. This enhances the performance. | |
--- End diff --

1. carbon.task.distribution is only for queries and is not used by data loading. Global_Sort loading will always merge small CSV files and does not require this configuration.
2. Better to list all values of carbon.task.distribution: custom, block (default), blocklet, merge_small_files ---
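Following the reviewer's list, a hypothetical carbon.properties fragment documenting all four values might look like this (value names per the review comment above; descriptions limited to what the comment and the diffed table row state, not verified against the official docs):

```properties
# carbon.task.distribution: query-time task distribution strategy
# (per the review comment, not used by data loading).
# Allowed values: custom, block (default), blocklet, merge_small_files
# merge_small_files merges all the small files to a size of 128 MB.
carbon.task.distribution=merge_small_files
```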
[GitHub] carbondata pull request #1900: [HOTFIX] Correct the order of dropping pre-ag...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1900 ---
[GitHub] carbondata issue #1900: [HOTFIX] Correct the order of dropping pre-aggregate...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1900 LGTM, I will merge this now; otherwise all CI will fail ---
[GitHub] carbondata pull request #1874: [CARBONDATA-2099] Refactor query scan process...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1874 ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1874 Merged into carbonstore branch ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1874 LGTM ---
[GitHub] carbondata issue #1834: [CARBONDATA-2056] Hadoop Configuration with access k...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1834 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3273/ ---
[GitHub] carbondata issue #1894: [CARBONDATA-2107]Fixed query failure in case if aver...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1894 retest this please ---
[GitHub] carbondata pull request #1900: [HOTFIX] Correct the order of dropping pre-ag...
GitHub user sraghunandan opened a pull request: https://github.com/apache/carbondata/pull/1900 [HOTFIX] Correct the order of dropping pre-aggregate tables. Pre-aggregate tables are to be dropped before the main table is dropped.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? No
- [x] Any backward compatibility impacted? No
- [x] Document update required? No
- [x] Testing done. Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Test case corrected
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sraghunandan/carbondata-1 correct_ut_pre-agg_drop
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1900.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1900

commit 1ec2d07222f722e79ae8b5c73738e5b0b8854ca6
Author: Raghunandan S
Date: 2018-02-01T02:12:09Z
[HOTFIX] Correct the order of dropping pre-aggregate tables. Pre-aggregate tables to be dropped before main table is dropped ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1874 retest sdv please ---
[GitHub] carbondata issue #1896: [CARBONDATA-2108]Updated unsafe sort memory configur...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1896 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3272/ ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1874 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3377/ ---
[GitHub] carbondata issue #1891: [CARBONDATA-2104] Add testcase for concurrent execut...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1891 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2139/ ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1874 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2140/ ---
[GitHub] carbondata issue #1891: [CARBONDATA-2104] Add testcase for concurrent execut...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1891 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3376/ ---
[GitHub] carbondata issue #1895: [WIP] Fix the decoder issue when joins are present i...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1895 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3271/ ---
[GitHub] carbondata issue #1874: [CARBONDATA-2099] Refactor query scan process to imp...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1874 retest this please ---
[GitHub] carbondata issue #1861: [CARBONDATA-2078][CARBONDATA-1516] Add 'if not exist...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1861 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3270/ ---