[GitHub] carbondata issue #1948: [CARBONDATA-2143] Fixed query memory leak issue for ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1948 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3425/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1952 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3580/ ---
[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1825 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3582/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1952 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2342/ ---
[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1825 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2344/ ---
[GitHub] carbondata pull request #1953: [CARBONDATA-2091][DataLoad] Support specifyin...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/1953 [CARBONDATA-2091][DataLoad] Support specifying sort column bounds in data loading Enhance data loading performance by specifying sort column bounds 1. Add row range number during convert-process-step 2. Dispatch rows to each sorter by range number 3. Sort/Write process steps can be done concurrently in each range Tests added and docs updated After implementing this feature, data load performance improved by about 25% (80MB/s/Node -> 102MB/s/Node) in my scenario with only 1 bound provided. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `Only internally used interfaces are changed` - [x] Any backward compatibility impacted? `No` - [x] Document update required? `Yes, added the usage of this feature to the documents` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `Yes` - How it is tested? Please attach test report. `Tested on a 3-node cluster and a local machine` - Is it a performance related change? Please attach the performance test report. `Yes. After implementing this feature, data load performance improved by about 25% (80MB/s/Node -> 102MB/s/Node) in my scenario with only 1 bound provided.` - Any additional information to help reviewers in testing this change. `I refactored the bucket-related feature and treated range and bucket with similar logic` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
`Not related` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0208_support_specifying_sort_column_bounds Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1953.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1953 commit 11463dd22db17f2e1858e0a1f3ebfeb07e3ec0e9 Author: xuchuanyin Date: 2018-02-08T08:30:09Z Support specifying sort column bounds in data loading Enhance data loading performance by specifying sort column bounds 1. Add row range number during convert-process-step 2. Dispatch rows to each sorter by range number 3. Sort/Write process step can be done concurrently in each range Tests added and docs updated ---
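The dispatch step described in the PR (assign each row a range number from the user-supplied sort column bounds, then hand each range to its own sorter) can be sketched as follows. All names here are hypothetical illustrations, not CarbonData's actual implementation:

```python
import bisect

# Hypothetical user-supplied sort column bounds: one bound splits the
# sort-key space into two ranges (keys < "m" -> range 0, else range 1).
bounds = ["m"]

def range_of(sort_key):
    """Assign a row to a range by binary-searching the bound list."""
    return bisect.bisect_right(bounds, sort_key)

def dispatch(rows, n_ranges):
    """Group rows into per-range buckets; each bucket can then be
    sorted and written by an independent sorter, concurrently."""
    buckets = [[] for _ in range(n_ranges)]
    for row in rows:
        buckets[range_of(row[0])].append(row)  # row[0] is the sort column
    return buckets

buckets = dispatch([("apple", 1), ("zebra", 2), ("kiwi", 3)], len(bounds) + 1)
```

With well-chosen bounds the per-range sorters receive similar row counts, which is where the concurrency gain the author reports comes from.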
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1951 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3581/ ---
[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1953 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2346/ ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1951 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2343/ ---
[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1808 this PR depends on #1952 ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1792 this PR depends on #1952 ---
[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1825 this PR depends on #1952 ---
[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1953 this PR depends on #1952 ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1792 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3426/ ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1951 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2345/ ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1951 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3583/ ---
[GitHub] carbondata issue #1949: [CARBONDATA2144] Optimize preaggregate table documen...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1949 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3584/ ---
[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1953 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3586/ ---
[GitHub] carbondata issue #1949: [CARBONDATA2144] Optimize preaggregate table documen...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1949 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2347/ ---
[GitHub] carbondata issue #1935: [CARBONDATA-2134] Prevent implicit column filter lis...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1935 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3585/ ---
[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1808 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3427/ ---
[GitHub] carbondata issue #1935: [CARBONDATA-2134] Prevent implicit column filter lis...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1935 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2348/ ---
[GitHub] carbondata pull request #1857: [WIP][CARBONDATA-2073][CARBONDATA-1516][Tests...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1857#discussion_r166269467 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala --- @@ -412,8 +430,467 @@ test("check load and select for avg double datatype") { sql(s"LOAD DATA LOCAL INPATH '$testData' into table maintable") sql(s"LOAD DATA LOCAL INPATH '$testData' into table maintable") val rows = sql("select age,avg(age) from maintable group by age").collect() -sql("create datamap maintbl_douoble on table maintable using 'preaggregate' as select avg(age) from maintable group by age") +sql("create datamap maintbl_double on table maintable using 'preaggregate' as select avg(age) from maintable group by age") checkAnswer(sql("select age,avg(age) from maintable group by age"), rows) +sql("drop table if exists maintable ") + } + + def testFunction(): Unit = { +// check answer +checkAnswer(sql(s"SELECT * FROM main_table_preagg_sum"), + Seq(Row(1, 31), Row(2, 27), Row(3, 70), Row(4, 55))) +checkAnswer(sql(s"SELECT * FROM main_table_preagg_avg"), + Seq(Row(1, 31, 1), Row(2, 27, 1), Row(3, 70, 2), Row(4, 55, 2))) +checkAnswer(sql(s"SELECT * FROM main_table_preagg_count"), + Seq(Row(1, 1), Row(2, 1), Row(3, 2), Row(4, 2))) +checkAnswer(sql(s"SELECT * FROM main_table_preagg_min"), + Seq(Row(1, 31), Row(2, 27), Row(3, 35), Row(4, 26))) +checkAnswer(sql(s"SELECT * FROM main_table_preagg_max"), + Seq(Row(1, 31), Row(2, 27), Row(3, 35), Row(4, 29))) + +// check select and match or not match pre-aggregate table +checkPreAggTable(sql("SELECT id, SUM(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_sum") +checkPreAggTable(sql("SELECT id, SUM(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_avg", "main_table") + +checkPreAggTable(sql("SELECT id, AVG(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_avg") +checkPreAggTable(sql("SELECT id, AVG(age) 
from main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT id, COUNT(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_count") +checkPreAggTable(sql("SELECT id, COUNT(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT id, MIN(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_min") +checkPreAggTable(sql("SELECT id, MIN(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT id, MAX(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_max") +checkPreAggTable(sql("SELECT id, MAX(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +// sub query should match pre-aggregate table +checkPreAggTable(sql("SELECT SUM(age) FROM main_table"), + true, "main_table_preagg_sum") +checkPreAggTable(sql("SELECT SUM(age) FROM main_table"), + false, "main_table_preagg_avg", "main_table") + +checkPreAggTable(sql("SELECT AVG(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_avg") +checkPreAggTable(sql("SELECT AVG(age) from main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT COUNT(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_count") +checkPreAggTable(sql("SELECT COUNT(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT MIN(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_min") +checkPreAggTable(sql("SELECT MIN(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + +checkPreAggTable(sql("SELECT MAX(age) FROM main_table GROUP BY id"), + true, "main_table_preagg_max") +checkPreAggTable(sql("SELECT MAX(age) FROM main_table GROUP BY id"), + false, "main_table_preagg_sum", "main_table") + } + + test("test load into main table with pre-aggregate table: double") { 
+sql( + """ +| CREATE TABLE main_table( +| id INT, +| name STRING, +| city STRING, +| age DOUBLE) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) + +createAllAggregateTables("main_table") +sql(s"LOAD DATA LOCAL INPATH '$testData' INTO TABLE ma
[GitHub] carbondata issue #1937: [CARBONDATA-2137] Delete query performance improved
Github user rahulforallp commented on the issue: https://github.com/apache/carbondata/pull/1937 @sraghunandan Performance report is added in comment section. ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/1951 @ravipesala Build success ---
[GitHub] carbondata issue #1867: [CARBONDATA-2055][Streaming][WIP]Support integrating...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1867 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3588/ ---
[GitHub] carbondata issue #1867: [CARBONDATA-2055][Streaming][WIP]Support integrating...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1867 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2350/ ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2351/ ---
[GitHub] carbondata issue #1941: [CARBONDATA1506] fix SDV error in PushUP_FILTER_uniq...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1941 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3428/ ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3589/ ---
[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1825 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3429/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1952 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3430/ ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1951 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3431/ ---
[jira] [Created] (CARBONDATA-2147) Exception displays while loading data with streaming
Vandana Yadav created CARBONDATA-2147: - Summary: Exception displays while loading data with streaming Key: CARBONDATA-2147 URL: https://issues.apache.org/jira/browse/CARBONDATA-2147 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.3.0 Environment: spark 2.1, spark 2.2.1 Reporter: Vandana Yadav Exception displays while loading data with streaming Steps to reproduce: 1) start spark-shell: ./spark-shell --jars /opt/spark/spark-2.2.1/carbonlib/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar 2) Execute following script: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ import org.apache.carbondata.core.util.CarbonProperties import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} val carbon = SparkSession.builder().config(sc.getConf) .getOrCreateCarbonSession("hdfs://localhost:54310/newCarbonStore","/tmp") import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE") carbon.sql("drop table if exists uniqdata_stream") carbon.sql("create table uniqdata_stream(CUST_ID int,CUST_NAME String,DOB timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB', 'streaming'='true')"); import carbon.sqlContext.implicits._ import org.apache.spark.sql.types._ val uniqdataSch = StructType( Array(StructField("CUST_ID", IntegerType),StructField("CUST_NAME", StringType),StructField("DOB", TimestampType), StructField("DOJ", TimestampType), StructField("BIGINT_COLUMN1", LongType), StructField("BIGINT_COLUMN2", LongType), StructField("DECIMAL_COLUMN1", org.apache.spark.sql.types.DecimalType(30, 10)), StructField("DECIMAL_COLUMN2", 
org.apache.spark.sql.types.DecimalType(36,10)), StructField("Double_COLUMN1", DoubleType), StructField("Double_COLUMN2", DoubleType), StructField("INTEGER_COLUMN1", IntegerType))) val streamDf = carbon.readStream .schema(uniqdataSch) .option("sep", ",") .csv("file:///home/knoldus/Documents/uniqdata") val qry = streamDf.writeStream.format("carbondata").trigger(ProcessingTime("5 seconds")) .option("checkpointLocation","/stream/uniq") .option("dbName", "default") .option("tableName", "uniqdata_stream") .start() 3) Error logs: warning: there was one deprecation warning; re-run with -deprecation for details uniqdataSch: org.apache.spark.sql.types.StructType = StructType(StructField(CUST_ID,IntegerType,true), StructField(CUST_NAME,StringType,true), StructField(DOB,TimestampType,true), StructField(DOJ,TimestampType,true), StructField(BIGINT_COLUMN1,LongType,true), StructField(BIGINT_COLUMN2,LongType,true), StructField(DECIMAL_COLUMN1,DecimalType(30,10),true), StructField(DECIMAL_COLUMN2,DecimalType(36,10),true), StructField(Double_COLUMN1,DoubleType,true), StructField(Double_COLUMN2,DoubleType,true), StructField(INTEGER_COLUMN1,IntegerType,true)) streamDf: org.apache.spark.sql.DataFrame = [CUST_ID: int, CUST_NAME: string ... 
9 more fields] qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@d0e155c scala> 18/02/08 16:38:53 ERROR StreamSegment: Executor task launch worker for task 5 Failed to append batch data to stream segment: hdfs://localhost:54310/newCarbonStore/default/uniqdata_stream1/Fact/Part0/Segment_0 java.lang.NullPointerException at org.apache.spark.sql.catalyst.InternalRow.getString(InternalRow.scala:32) at org.apache.carbondata.streaming.parser.CSVStreamParserImp.parserRow(CSVStreamParserImp.java:40) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$InputIterator.next(CarbonAppendableStreamSink.scala:337) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$InputIterator.next(CarbonAppendableStreamSink.scala:331) at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:315) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:305) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:305) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendable
[GitHub] carbondata pull request #1954: [Documentation] Formatting issue fixed
GitHub user jatin9896 opened a pull request: https://github.com/apache/carbondata/pull/1954 [Documentation] Formatting issue fixed Updated document syntax of which pdf generation was failing Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jatin9896/incubator-carbondata DocumentUpdate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1954 commit e4cb62cedc6a48a6a2b10f11635ea561a4453ca1 Author: Jatin Date: 2018-02-08T10:55:14Z updated data-management for pdf generation ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/1951 retest sdv please ---
[GitHub] carbondata issue #1904: [CARBONDATA-2059] - Changes to support compaction fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1904 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2352/ ---
[GitHub] carbondata issue #1904: [CARBONDATA-2059] - Changes to support compaction fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1904 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3590/ ---
[jira] [Commented] (CARBONDATA-2147) Exception displays while loading data with streaming
[ https://issues.apache.org/jira/browse/CARBONDATA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356870#comment-16356870 ] Zhichao Zhang commented on CARBONDATA-2147: [~Vandana7] I can resolve this issue, the default parser 'CSVStreamParserImp' will cause this problem. ---
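The stack trace reported for CARBONDATA-2147 points at `CSVStreamParserImp.parserRow` calling `InternalRow.getString` on a field that is null. A minimal Python sketch of that failure pattern and a null-safe variant (hypothetical names, not CarbonData or Spark code):

```python
def parse_row_unsafe(row):
    # Mirrors the failing pattern: every field is converted
    # unconditionally, so a null field raises instead of
    # being propagated as null.
    return [v.strip() for v in row]

def parse_row_safe(row):
    # Null-safe variant: pass null fields through untouched.
    return [None if v is None else v.strip() for v in row]
```

This is only an illustration of the bug class; the commenter's point is that the default stream parser does the unsafe conversion.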
[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1825 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3432/ ---
[GitHub] carbondata issue #1954: [Documentation] Formatting issue fixed
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1954 LGTM ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1952 retest this please ---
[GitHub] carbondata issue #1954: [Documentation] Formatting issue fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1954 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3591/ ---
[GitHub] carbondata issue #1954: [Documentation] Formatting issue fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1954 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2353/ ---
[GitHub] carbondata issue #1941: [CARBONDATA1506] fix SDV error in PushUP_FILTER_uniq...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1941 retest sdv please ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1951 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3433/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1952 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3592/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1952 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2354/ ---
[GitHub] carbondata issue #1935: [CARBONDATA-2134] Prevent implicit column filter lis...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1935 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3434/ ---
[GitHub] carbondata issue #1953: [CARBONDATA-2091][DataLoad] Support specifying sort ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1953 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3435/ ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3593/ ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1952 LGTM ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2355/ ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1792 retest this please ---
[GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1808#discussion_r166936712 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java --- @@ -114,4 +114,14 @@ */ public static final int MAX_EXTERNAL_DICTIONARY_SIZE = 1000; + /** + * enable block size based block allocation while loading data. By default, carbondata assigns + * blocks to node based on block number. If this option is set to `true`, carbondata will + * consider block size first and make sure that all the nodes will process almost equal size of + * data. This option is especially useful when you encounter skewed data. + */ + @CarbonProperty + public static final String ENABLE_CARBON_LOAD_SKEWED_DATA_OPTIMIZATION + = "carbon.load.skewed.data.optimization"; --- End diff -- change to `carbon.load.skewedDataOptimization.enabled` ---
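The option discussed in this review enables size-based (rather than count-based) block-to-node assignment for skewed data. A small sketch of the two strategies (hypothetical names; not CarbonData's actual allocator), with the size-based one using a greedy least-loaded heap:

```python
import heapq

def assign_by_count(blocks, nodes):
    """Count-based sketch: round-robin by block number; block counts per
    node are equal, but total bytes can be very skewed."""
    out = {n: [] for n in nodes}
    for i, (name, _size) in enumerate(blocks):
        out[nodes[i % len(nodes)]].append(name)
    return out

def assign_by_size(blocks, nodes):
    """Size-based sketch: give the next-largest block to the currently
    least-loaded node, so total bytes per node stay close."""
    heap = [(0, n) for n in nodes]  # (assigned bytes, node)
    heapq.heapify(heap)
    out = {n: [] for n in nodes}
    for name, size in sorted(blocks, key=lambda b: -b[1]):
        load, node = heapq.heappop(heap)
        out[node].append(name)
        heapq.heappush(heap, (load + size, node))
    return out
```

For blocks of sizes 100, 10, 10, 10 over two nodes, round-robin gives one node 110 bytes and the other 20, while the greedy size-based assignment puts the large block alone on one node; that is the skew the option is meant to address.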
[GitHub] carbondata issue #1941: [CARBONDATA1506] fix SDV error in PushUP_FILTER_uniq...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/1941 retest sdv please ---
[GitHub] carbondata issue #1949: [CARBONDATA2144] Optimize preaggregate table documen...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1949 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3436/ ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1792 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3594/ ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1792 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2356/ ---
[GitHub] carbondata issue #1943: [CARBONDATA-2142] Fixed Pre-Aggregate datamap creati...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1943 LGTM ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3595/ ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1857 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2357/ ---
[GitHub] carbondata pull request #1943: [CARBONDATA-2142] Fixed Pre-Aggregate datamap...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1943 ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1857 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3437/ ---
[GitHub] carbondata issue #1867: [CARBONDATA-2055][Streaming][WIP]Support integrating...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1867 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3438/ ---
[GitHub] carbondata pull request #1955: [HOTFIX] Fix documentation errors.
GitHub user sraghunandan opened a pull request: https://github.com/apache/carbondata/pull/1955 [HOTFIX] Fix documentation errors. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? No - [x] Any backward compatibility impacted? No - [x] Document update required? Yes - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. NA - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/sraghunandan/carbondata-1 make_doc_example_simple Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1955.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1955 commit 47e2396707c861774feb0a5f993038ad79ddc933 Author: Raghunandan S Date: 2018-02-08T16:00:03Z [HOTFIX] Fix documentation errors. ---
[GitHub] carbondata issue #1857: [CARBONDATA-2073][CARBONDATA-1516][Tests] Add test c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1857 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3439/ ---
[GitHub] carbondata pull request #1955: [HOTFIX] Fix documentation errors.
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1955#discussion_r166984885 --- Diff: docs/data-management-on-carbondata.md --- @@ -955,7 +947,7 @@ roll-up for the queries on these hierarchies. USING "timeseries" DMPROPERTIES ( 'event_time'='order_time', - 'year_granualrity'='1', + 'year_granularity'='1', --- End diff -- please remove "," ---
[GitHub] carbondata issue #1955: [HOTFIX] Fix documentation errors.
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1955 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3597/ ---
[GitHub] carbondata issue #1904: [CARBONDATA-2059] - Changes to support compaction fo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1904 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3440/ ---
[GitHub] carbondata issue #1955: [HOTFIX] Fix documentation errors.
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1955 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2359/ ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1792 LGTM ---
[GitHub] carbondata issue #1792: [CARBONDATA-2018][DataLoad] Optimization in reading/...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1792 merged into carbonstore branch ---
[GitHub] carbondata issue #1952: [HotFix][CheckStyle] Fix import related checkstyle
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1952 merged into carbonstore branch ---
[GitHub] carbondata issue #1928: [MINOR]Remove dependency of Java 1.8
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1928 LGTM ---
[GitHub] carbondata pull request #1928: [MINOR]Remove dependency of Java 1.8
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1928 ---
[GitHub] carbondata issue #1808: [CARBONDATA-2023][DataLoad] Add size base block allo...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1808 retest this please ---
[GitHub] carbondata issue #1947: [CARBONDATA-2119]deserialization issue for carbonloa...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1947 LGTM ---
[GitHub] carbondata pull request #1947: [CARBONDATA-2119]deserialization issue for ca...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1947 ---
[GitHub] carbondata issue #1948: [CARBONDATA-2143] Fixed query memory leak issue for ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1948 LGTM ---
[jira] [Created] (CARBONDATA-2148) Use Row parser to replace current default parser:CSVStreamParserImp
Zhichao Zhang created CARBONDATA-2148: -- Summary: Use Row parser to replace current default parser:CSVStreamParserImp Key: CARBONDATA-2148 URL: https://issues.apache.org/jira/browse/CARBONDATA-2148 Project: CarbonData Issue Type: Improvement Components: data-load, spark-integration Affects Versions: 1.3.0 Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: 1.3.0 Currently the default value of 'carbon.stream.parser' is CSVStreamParserImp, which transforms InternalRow(0) to Array[Object]; InternalRow(0) holds the value of one line received from a socket. When data is received from Kafka, the schema of the InternalRow changes: either the fields of the Kafka data Row must be assembled into a String and stored as InternalRow(0), or a new parser must be defined to convert the Kafka data Row to Array[Object]. The same work is needed for every table. *Solution:* Use a new parser, RowStreamParserImpl, as the default parser instead of CSVStreamParserImp; this parser automatically converts an InternalRow to Array[Object] according to the schema. In general, source data is transformed into a structured Row object, so with this approach there is no need to define a parser for every table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
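The difference between the two parsing strategies described in CARBONDATA-2148 can be sketched with plain Java (illustrative interfaces only; these are not CarbonData's actual parser classes or Spark's InternalRow API):

```java
import java.util.*;

// Illustrative contrast between the two strategies, with rows modeled as
// plain Object[] instead of Spark's InternalRow.
interface StreamParserSketch {
  Object[] parse(Object[] row, String[] schema);
}

// CSV-style: the whole input line arrives as a single String in field 0
// and must be tokenized into the schema's columns.
class CsvStreamParserSketch implements StreamParserSketch {
  public Object[] parse(Object[] row, String[] schema) {
    return ((String) row[0]).split(",");
  }
}

// Row-style: each field already occupies its own position (as with a
// structured Kafka source), so the parser just maps fields by schema
// position, with no per-table string assembly needed.
class RowStreamParserSketch implements StreamParserSketch {
  public Object[] parse(Object[] row, String[] schema) {
    Object[] out = new Object[schema.length];
    for (int i = 0; i < schema.length; i++) out[i] = row[i];
    return out;
  }
}
```

The row-style parser works unchanged for any table whose row layout matches its schema, which is the motivation given in the issue.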
[jira] [Created] (CARBONDATA-2149) Displayed complex type data is wrong when using DataFrame to write complex type data.
Zhichao Zhang created CARBONDATA-2149: -- Summary: Displayed complex type data is wrong when using DataFrame to write complex type data. Key: CARBONDATA-2149 URL: https://issues.apache.org/jira/browse/CARBONDATA-2149 Project: CarbonData Issue Type: Bug Components: data-load, spark-integration Affects Versions: 1.3.0 Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: 1.3.0 The default values of 'complex_delimiter_level_1' and 'complex_delimiter_level_2' are wrong: they must be '$' and ':', not '\\$' and '\\:'. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
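The delimiter bug in CARBONDATA-2149 is easy to see in a small sketch: the configured values should be the plain characters '$' and ':', and any regex escaping belongs in the parsing code, not in the stored defaults. This is an illustrative sketch (hypothetical helper, not CarbonData's actual complex-type parser):

```java
import java.util.*;

// Illustrative sketch of the two complex-type delimiter levels: level 1
// ('$') separates collection elements, level 2 (':') separates nested
// fields. Java's String.split takes a regex, so '$' must be escaped here,
// in the code; the configured default value itself stays the plain '$'.
class ComplexDelimiterSketch {
  static String[][] parseArrayOfStructs(String value) {
    String[] elements = value.split("\\$");  // escape '$' only for the regex
    String[][] result = new String[elements.length][];
    for (int i = 0; i < elements.length; i++) {
      result[i] = elements[i].split(":");
    }
    return result;
  }
}
```

Storing the escaped form '\\$' as the default means the literal two-character sequence is looked for in the data, which is why the displayed complex-type values come out wrong.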
[GitHub] carbondata pull request #1948: [CARBONDATA-2143] Fixed query memory leak iss...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1948 ---
[GitHub] carbondata issue #1951: [CARBONDATA-1763] Dropped table if exception thrown ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1951 @kunal642 Please rebase ---
[GitHub] carbondata issue #1928: [MINOR]Remove dependency of Java 1.8
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1928 @jackylk should this pr be merged into branch-1.3? ---
[GitHub] carbondata issue #1954: [Documentation] Formatting issue fixed
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1954 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3441/ ---