[GitHub] carbondata issue #2121: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2121 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5013/ ---
[GitHub] carbondata issue #2121: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2121 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3797/ ---
[GitHub] carbondata issue #2121: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2121 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3796/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5012/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2171 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4446/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 @manishgupta88 is it ok to be merged? ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181439768 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -173,6 +174,10 @@ object CarbonEnv { .addListener(classOf[AlterTableDropPartitionPostStatusEvent], AlterTableDropPartitionPostStatusListener) .addListener(classOf[AlterTableDropPartitionMetaEvent], AlterTableDropPartitionMetaListener) + .addListener(classOf[AlterTableRenamePreEvent], LuceneRenameTablePreListener) --- End diff -- This listener class is added to block the alter operation on the lucene datamap. If we are blocking the alter operation for all the datamaps, then this may not be required. ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3795/ ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181438240 --- Diff: datamap/lucene/pom.xml --- @@ -141,6 +141,34 @@ + --- End diff -- this was added to include test suite in main CI ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5011/ ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181437595 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1642,6 +1642,16 @@ public static final String CARBON_SEARCH_MODE_THREAD_DEFAULT = "3"; + /** + * compression mode used by lucene for index writing + */ + public static final String CARBON_LUCENE_COMPRESSION_MODE = "carbon.lucene.compression.mode"; --- End diff -- SPEED and COMPRESSION, by default the property value will be SPEED ---
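The two values named above (SPEED and COMPRESSION, defaulting to SPEED) suggest a small resolver helper. A minimal sketch, assuming a hypothetical `LuceneCompressionMode` class — only the property key comes from the diff; the class and method names are illustrative, not CarbonData code:

```java
// Hypothetical sketch: resolving carbon.lucene.compression.mode to one of
// the two values mentioned in the review. Not the actual CarbonData class.
public class LuceneCompressionMode {
    public static final String CARBON_LUCENE_COMPRESSION_MODE = "carbon.lucene.compression.mode";
    public static final String DEFAULT_MODE = "SPEED";

    // Returns "SPEED" or "COMPRESSION"; unset or unrecognized values
    // fall back to the default ("SPEED").
    public static String resolve(String configuredValue) {
        if (configuredValue == null) {
            return DEFAULT_MODE;
        }
        String v = configuredValue.trim().toUpperCase();
        return (v.equals("SPEED") || v.equals("COMPRESSION")) ? v : DEFAULT_MODE;
    }
}
```

Falling back to SPEED on an invalid value (rather than throwing) matches the stated behavior that the property value will be SPEED by default.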
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2170 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4445/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3794/ ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2168 ---
[jira] [Resolved] (CARBONDATA-2343) Improper filter resolver cause more filter scan on data that could be skipped
[ https://issues.apache.org/jira/browse/CARBONDATA-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2343. -- Resolution: Fixed Fix Version/s: 1.4.0 > Improper filter resolver cause more filter scan on data that could be skipped > - > > Key: CARBONDATA-2343 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2343 > Project: CarbonData > Issue Type: Bug > Components: data-query > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > Fix For: 1.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > In DataMapChooser, Carbondata tries to choose and combine datamaps for > expressions. In some scenarios, it will generate `TrueConditionalResolverImpl` > to wrap the sub-expression, which will cause a data scan on blocklets that > could be skipped (for `TrueConditionalResolverImpl`, the `TrueFilterExecutor` > will always cause scanning of the data even if it simply wraps a range expression). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
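The pruning loss described in the issue can be illustrated with a toy model — these are hypothetical classes, not CarbonData's actual `FilterExecutor` hierarchy: an always-true filter must scan every blocklet, while a range filter can use blocklet min/max metadata to skip most of them.

```java
import java.util.stream.IntStream;

// Blocklet-level pruning decision based on column min/max metadata
// (illustrative model, not CarbonData's actual interfaces).
interface FilterExecutor {
    // true => the blocklet covering [min, max] may match and must be scanned
    boolean isScanRequired(int min, int max);
}

// Mirrors the behavior described for TrueFilterExecutor: never prunes.
class TrueFilterExecutor implements FilterExecutor {
    public boolean isScanRequired(int min, int max) { return true; }
}

// A range filter can skip any blocklet whose [min, max] misses [lo, hi].
class RangeFilterExecutor implements FilterExecutor {
    final int lo, hi;
    RangeFilterExecutor(int lo, int hi) { this.lo = lo; this.hi = hi; }
    public boolean isScanRequired(int min, int max) { return max >= lo && min <= hi; }
}

public class PruningDemo {
    // Count blocklets that must be scanned; blocklet i covers [i*10, i*10+9].
    static long blockletsScanned(FilterExecutor f, int n) {
        return IntStream.range(0, n)
            .filter(i -> f.isScanRequired(i * 10, i * 10 + 9))
            .count();
    }
}
```

With 10 blocklets and a range filter on [25, 34], only 2 blocklets need scanning; wrapping the same predicate in the always-true executor forces all 10 to be scanned, which is the regression the fix removes.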
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2168 LGTM ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5010/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437420#comment-16437420 ] Zhichao Zhang commented on CARBONDATA-2345: [~oceaneast], you need to add the below option into the 'writeStream' block: {code:java} .option(CarbonStreamParser.CARBON_STREAM_PARSER, CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER) {code} for example: {code:java}
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .option(CarbonStreamParser.CARBON_STREAM_PARSER, CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER)
  .outputMode("append")
  .start()
{code} Please try again. > "Task failed while writing rows" error occurs when streaming ingest into > carbondata table > -- > > Key: CARBONDATA-2345 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.1 > Reporter: ocean > Priority: Major > > carbondata version: 1.3.1, spark: 2.2.1 > When using spark structured streaming to ingest data into a carbondata table, > such an error occurs: > warning: there was one deprecation warning; re-run with -deprecation for > details > qry: org.apache.spark.sql.streaming.StreamingQuery = > org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a > [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 > in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor > 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at > 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) > at > org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) > at > org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) > ... 
8 more > [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: > Task 0 in stage 1.0 failed 4 times; aborting job > 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread > for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = > 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 > (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): > org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at >
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2136 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3792/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2171 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests// ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5009/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3793/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3791/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2170 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4443/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest this please ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181399425 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -69,11 +69,33 @@ case class CarbonCreateDataMapCommand( } dataMapSchema = new DataMapSchema(dataMapName, dmClassName) -if (mainTable != null && -mainTable.isStreamingTable && - !(dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString) - || dataMapSchema.getProviderName -.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) { +if (dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.LUCENEFG.toString) || --- End diff -- I think we should abstract an interface for it. We cannot add an if-check for every new datamap added ---
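The reviewer's suggestion could look roughly like this — hypothetical names, not the actual CarbonData interface: each datamap provider declares its own capability, so the create-datamap command needs one generic check instead of per-provider if-conditions.

```java
// Sketch of the suggested abstraction (illustrative only): providers opt in
// to streaming-table support instead of the command hard-coding each name.
interface DataMapProvider {
    String name();
    // New providers do not support streaming tables unless they opt in.
    default boolean supportsStreamingTable() { return false; }
}

class PreAggregateProvider implements DataMapProvider {
    public String name() { return "preaggregate"; }
    @Override public boolean supportsStreamingTable() { return true; }
}

class LuceneProvider implements DataMapProvider {
    public String name() { return "lucene"; }
    // Inherits supportsStreamingTable() == false.
}

public class DataMapValidation {
    // One generic check replaces the growing if-chain in the create command.
    static void validateForStreamingTable(DataMapProvider p) {
        if (!p.supportsStreamingTable()) {
            throw new UnsupportedOperationException(
                "Datamap '" + p.name() + "' is not supported on streaming tables");
        }
    }
}
```

Adding a new datamap then only requires implementing the interface; the command code stays untouched.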
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181397988 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -173,6 +174,10 @@ object CarbonEnv { .addListener(classOf[AlterTableDropPartitionPostStatusEvent], AlterTableDropPartitionPostStatusListener) .addListener(classOf[AlterTableDropPartitionMetaEvent], AlterTableDropPartitionMetaListener) + .addListener(classOf[AlterTableRenamePreEvent], LuceneRenameTablePreListener) --- End diff -- Is this required? Ideally, lucene datamap is a separate module which should not have intrusive modification in other modules ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181396888 --- Diff: datamap/lucene/pom.xml --- @@ -141,6 +141,34 @@ + --- End diff -- I realize that in this pom, it should not depend on carbon-spark2, please modify the dependency in this pom ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181394048 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1642,6 +1642,16 @@ public static final String CARBON_SEARCH_MODE_THREAD_DEFAULT = "3"; + /** + * compression mode used by lucene for index writing + */ + public static final String CARBON_LUCENE_COMPRESSION_MODE = "carbon.lucene.compression.mode"; --- End diff -- what are the options available for this property? ---
[GitHub] carbondata issue #2172: [CARBONDATA-2333] Block insert overwrite if all part...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2172 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5005/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3790/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5006/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2148 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4442/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4441/ ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437283#comment-16437283 ] anubhav tarar commented on CARBONDATA-2318: --- Hi, I tried again using the same steps that you provided but wasn't able to replicate the issue.
Step 1: create a CarbonSession using spark-shell
val carbon = SparkSession.builder().config(sc.getConf) .getOrCreateCarbonSession("/home/anubhav/Documents/prestostore/data","/home/anubhav/Documents/prestostore/metadata")
Step 2: copy the old carbondata store from /home/anubhav/Documents/carbondata/carbondata/examples/spark2/target/store/default to the new store location /home/anubhav/Documents/prestostore/data
Step 3: refresh the table
scala> carbon.sql("refresh table carbonsession_table").show
18/04/13 13:43:48 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Creating Table with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbonsession_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
18/04/13 13:43:49 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table created with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 AUDIT RefreshCarbonTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table registration with Database name [default] and Table name [carbonsession_table] is successful.
step4:query the new store from presto ./presto-cli-0.187-executable.jar --server localhost:9000 --catalog carbondata presto> show tables from default; Table - carbonsession_table (1 row) Query 20180413_080021_0_vev2q, FINISHED, 1 node Splits: 18 total, 18 done (100.00%) 0:02 [1 rows, 36B] [0 rows/s, 22B/s] > Remove invalid table name(.ds_store) of presto integration > --- > > Key: CARBONDATA-2318 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2318 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration >Reporter: Liang Chen >Priority: Minor > > For presto integration , will get the invalid table name via "show tables > from default" > As below. > presto:default> show tables from default; > Table > > .ds_store > carbon_table > carbontable > partition_bigtable > partition_table > (5 rows) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2166: [CARBONDATA-2341] Added Clean up of files for Pre-Ag...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2166 retest SDV please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3788/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5004/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2169 retest this please ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4440/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5003/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3787/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3785/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5001/ ---
[GitHub] carbondata pull request #2172: [CARBONDATA-2333] Block insert overwrite if a...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2172 [CARBONDATA-2333] Block insert overwrite if all partition columns are not present in any one of the datamaps
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata preagg_partition_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2172.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2172 commit 5c53f555d3a46b1a0961b7eb82e7ba5df628e994 Author: kunal642 Date: 2018-04-11T11:22:08Z block insert overwrite if all partition columns are not present in any one of the datamaps ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4439/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2168 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4437/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4999/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3783/ ---
[jira] [Updated] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
[ https://issues.apache.org/jira/browse/CARBONDATA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2347: Description: 1) The index write location for lucene is the same for all tasks, and IndexWriter takes a lock file called write.lock in the write location while writing the index files. In carbon loading the writer tasks are launched in parallel and that many writers are opened. Since the write.lock file is acquired by one writer, all other tasks will fail and data loading will fail. 2) On the query side, the read index path for lucene was a single path, but after the load fix there will be multiple index directories. 3) Functional issues in drop table, drop datamap, show datamap. > Fix Functional issues in LuceneDatamap in load and query and make stable > > > Key: CARBONDATA-2347 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2347 > Project: CarbonData > Issue Type: Bug > Components: data-load, data-query > Reporter: Akash R Nilugal > Assignee: Akash R Nilugal > Priority: Major > > 1) The index write location for lucene is the same for all tasks, and IndexWriter takes a lock file > called write.lock in the write location while writing the index files. In carbon loading the writer > tasks are launched in parallel and that many writers are opened. Since the write.lock file is > acquired by one writer, all other tasks will fail and data loading will fail. > 2) On the query side, the read index path for lucene was a single path, but after the load fix > there will be multiple index directories. > 3) Functional issues in drop table, drop datamap, show datamap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
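The fix described above amounts to deriving a distinct index directory per writer task, so each IndexWriter acquires its write.lock in its own location and the query side enumerates all task directories instead of a single path. A minimal sketch of such a layout — the path scheme is illustrative, not the exact one used by the PR:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical per-task index layout: parallel load tasks each get their own
// directory, so their IndexWriters never contend for the same write.lock.
public class LuceneIndexLayout {
    static Path taskIndexDir(String tablePath, String segmentId, String taskId) {
        return Paths.get(tablePath, "lucene_index", "Segment_" + segmentId, taskId);
    }
}
```

The query side then needs to list every subdirectory under the segment's index folder rather than opening one fixed path, which is the second half of the fix.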
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3784/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5000/ ---
[jira] [Created] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
Akash R Nilugal created CARBONDATA-2347: --- Summary: Fix Functional issues in LuceneDatamap in load and query and make stable Key: CARBONDATA-2347 URL: https://issues.apache.org/jira/browse/CARBONDATA-2347 Project: CarbonData Issue Type: Bug Components: data-load, data-query Reporter: Akash R Nilugal Assignee: Akash R Nilugal -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2171: [wip]test lucene sdv and UT in CI
GitHub user Indhumathi27 opened a pull request: https://github.com/apache/carbondata/pull/2171 [wip]test lucene sdv and UT in CI Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Indhumathi27/carbondata test_ci_luc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2171 commit 46b29dd2103156a5096a04cc72960dd5170e2d9a Author: Indhumathi27 Date: 2018-04-13T06:29:22Z Added UT & SDV Testcases for LuceneDataMap commit 1be3dfd26a96cfa123de403512d4d04121340aed Author: akashrn5 Date: 2018-03-29T14:29:36Z load issue in lucene datamap, make multiple directory based on taskId make the datamap distributable object based on lucene index path written during load Added Lucene Listener and Fixed Show Datamap ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4998/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2136 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4438/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3782/ ---
[GitHub] carbondata pull request #2170: [CARBONDATA-2346] Added fix for NULL error wh...
GitHub user praveenmeenakshi56 opened a pull request: https://github.com/apache/carbondata/pull/2170 [CARBONDATA-2346] Added fix for NULL error while dropping partition with multiple Pre-Aggregate tables Fixed null value issue for child column - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/praveenmeenakshi56/carbondata defect_part Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2170 commit dd3d3d1181847a1930048144740bfa053c878dd8 Author: praveenmeenakshi56 Date: 2018-04-13T10:31:35Z Added fix for error while dropping partition with multiple Pre-Aggregate tables ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user chandrasaripaka commented on the issue: https://github.com/apache/carbondata/pull/2161 @CarbonDataQA May I know if this has to be fixed from my side as a part of the pull request? Kindly advise. @xubo245 Also, I don't have access to resolve the conflicts and recommit. Please advise. ---
[jira] [Created] (CARBONDATA-2346) Dropping partition failing with null error for Partition table with Pre-Aggregate tables
Praveen M P created CARBONDATA-2346: --- Summary: Dropping partition failing with null error for Partition table with Pre-Aggregate tables Key: CARBONDATA-2346 URL: https://issues.apache.org/jira/browse/CARBONDATA-2346 Project: CarbonData Issue Type: Bug Reporter: Praveen M P Assignee: Praveen M P -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4995/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4997/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437140#comment-16437140 ] ocean commented on CARBONDATA-2345: --- The stream source is a parquet file. To reproduce, the following code can be used:

val tableName = "profile_carbondata_stream2"
val pqtpath = "/test/stream"
val warehouse = new File("./warehouse").getCanonicalPath
val metastore = new File("./metastore").getCanonicalPath
val spark = SparkSession
  .builder()
  .appName("StreamExample")
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreateCarbonSession(warehouse, metastore)
val carbonTable = CarbonEnv.getCarbonTable(Some("default"), tableName)(spark)
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
var qry: StreamingQuery = null
val userSchema = spark.read.parquet(pqtpath).schema
val readSocketDF = spark.readStream.schema(userSchema).parquet(pqtpath)
// Write data from socket stream to carbondata file
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .outputMode("append")
  .start()
qry.awaitTermination()

> "Task failed while writing rows" error occurs when streaming ingest into
> carbondata table
> --
>
> Key: CARBONDATA-2345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2345
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.1
> Reporter: ocean
>
> carbondata version: 1.3.1, spark: 2.2.1
> When using spark structured streaming to ingest data into a carbondata
> table, the following error occurs:
> warning: there was one deprecation warning; re-run with -deprecation for
> details
> qry: org.apache.spark.sql.streaming.StreamingQuery =
> org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a
> [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0
> in stage
1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor > 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) > at > org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) > at > org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) > ... 8 more > [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: > Task 0 in stage 1.0 failed 4 times; aborting job > 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread > for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = > 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 4 times, most recent failure:
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4996/ ---
[jira] [Created] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
ocean created CARBONDATA-2345: - Summary: "Task failed while writing rows" error occurs when streaming ingest into carbondata table Key: CARBONDATA-2345 URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.3.1 Reporter: ocean carbondata version: 1.3.1, spark: 2.2.1 When using spark structured streaming to ingest data into a carbondata table, the following error occurs: warning: there was one deprecation warning; re-run with -deprecation for details qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at 
org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) ... 8 more [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3781/ ---
[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2149#discussion_r181342780
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/unsafe/UnsafeVariableLengthDimensionDataChunkStore.java ---
@@ -78,70 +88,96 @@ public UnsafeVariableLengthDimensionDataChunkStore(long totalSize, boolean isInv
     // start position will be used to store the current data position
     int startOffset = 0;
-    // position from where offsets will start
-    long pointerOffsets = this.dataPointersOffsets;
     // as first position will be start from 2 byte as data is stored first in the memory block
     // we need to skip first two bytes this is because first two bytes will be length of the data
     // which we have to skip
-    CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-        dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-        CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // incrementing the pointers as first value is already filled and as we are storing as int
-    // we need to increment the 4 bytes to set the position of the next value to set
-    pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+    int[] dataOffsets = new int[numberOfRows];
+    dataOffsets[0] = CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     // creating a byte buffer which will wrap the length of the row
-    // using byte buffer as unsafe will return bytes in little-endian encoding
-    ByteBuffer buffer = ByteBuffer.allocate(CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // store length of data
-    byte[] length = new byte[CarbonCommonConstants.SHORT_SIZE_IN_BYTE];
-    // as first offset is already stored, we need to start from the 2nd row in data array
+    ByteBuffer buffer = ByteBuffer.wrap(data);
     for (int i = 1; i < numberOfRows; i++) {
-      // first copy the length of previous row
-      CarbonUnsafe.getUnsafe().copyMemory(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + startOffset, length, CarbonUnsafe.BYTE_ARRAY_OFFSET,
-          CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      buffer.put(length);
-      buffer.flip();
+      buffer.position(startOffset);
       // so current row position will be
       // previous row length + 2 bytes used for storing previous row data
-      startOffset += CarbonCommonConstants.SHORT_SIZE_IN_BYTE + buffer.getShort();
+      startOffset += buffer.getShort() + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
       // as same byte buffer is used to avoid creating many byte buffer for each row
       // we need to clear the byte buffer
-      buffer.clear();
-      // now put the offset of current row, here we need to add 2 more bytes as current will
-      // also have length part so we have to skip length
-      CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-          startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      // incrementing the pointers as first value is already filled and as we are storing as int
-      // we need to increment the 4 bytes to set the position of the next value to set
-      pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+      dataOffsets[i] = startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     }
-
+    CarbonUnsafe.getUnsafe().copyMemory(dataOffsets, CarbonUnsafe.INT_ARRAY_OFFSET,
+        dataPageMemoryBlock.getBaseObject(),
+        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets,
+        dataOffsets.length * CarbonCommonConstants.INT_SIZE_IN_BYTE);
   }

   /**
    * Below method will be used to get the row based on row id passed
-   *
+   * Getting the row from unsafe works in below logic
+   * 1. if inverted index is present then get the row id based on reverse inverted index
+   * 2. get the current row id data offset
+   * 3. if it's not a last row - get the next row offset
+   *    Subtract the current row offset + 2 bytes (to skip the data length) with next row offset
+   * 4. if it's last row
+   *    subtract the current row offset + 2 bytes (to skip the data length) with complete data length
    * @param rowId
    * @return row
    */
   @Override public byte[] getRow(int rowId) {
+    // get the actual row id
+    rowId = getRowId(rowId);
+    // get offset of data in unsafe
+    int currentDataOffset = getOffSet(rowId);
+    // get the data length
+    short length = getLength(rowId, currentDataOffset);
+    // create data array
+    byte[] data = new byte[length];
+    // fill the row data
+
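The diff under review replaces per-row unsafe writes with a single int[] of data offsets computed from a page of rows laid out as [2-byte length][payload]. A standalone sketch of that offset walk follows; the class and method names here are illustrative, not CarbonData's actual API:

```java
import java.nio.ByteBuffer;

public class VarLenOffsets {
  // Compute per-row payload offsets for a page laid out as
  // [2-byte length][payload][2-byte length][payload]...
  // Each offset points just past a row's length prefix, mirroring
  // the dataOffsets computation discussed in the review.
  static int[] computeOffsets(byte[] page, int numberOfRows) {
    int[] dataOffsets = new int[numberOfRows];
    ByteBuffer buffer = ByteBuffer.wrap(page);
    int startOffset = 0;
    // first payload starts right after its own 2-byte length prefix
    dataOffsets[0] = Short.BYTES;
    for (int i = 1; i < numberOfRows; i++) {
      buffer.position(startOffset);
      // jump over the previous row: its length prefix plus its payload
      startOffset += buffer.getShort() + Short.BYTES;
      // skip this row's own length prefix to land on its payload
      dataOffsets[i] = startOffset + Short.BYTES;
    }
    return dataOffsets;
  }

  public static void main(String[] args) {
    // two rows: "ab" (length 2) and "xyz" (length 3)
    ByteBuffer page = ByteBuffer.allocate(2 + 2 + 2 + 3);
    page.putShort((short) 2).put(new byte[]{'a', 'b'});
    page.putShort((short) 3).put(new byte[]{'x', 'y', 'z'});
    int[] offsets = computeOffsets(page.array(), 2);
    // row 0 payload at offset 2, row 1 payload at offset 6
    if (offsets[0] != 2 || offsets[1] != 6) throw new AssertionError();
    System.out.println("offsets ok");
  }
}
```

With all offsets known up front, a single bulk copy into the unsafe memory block can replace one putInt call per row, which is the performance point of the change.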
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3780/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3778/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4994/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3777/ ---
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h5. Support unmanaged carbon table (was: h1. Support unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2113 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4436/ ---
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h1. Support unmanaged carbon table > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h1. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2169 [CARBONDATA-2344][DataMap] Fix bugs in mapping blocklet to UnsafeDMStore rows

In BlockletDataMap, carbondata stores the DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? `NO, only internal interfaces have been changed`
- [x] Any backward compatibility impacted? `NO`
- [x] Document update required? `NO`
- [x] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required? `NO`
  - How it is tested? Please attach test report. `Tested in local`
  - Is it a performance related change? Please attach the performance test report. `No`
  - Any additional information to help reviewers in testing this change. `NO`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `Not related`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata 0413_bug_blocklet_dm_unsafe_row

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2169.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2169

commit dd010297c7f7428dc8f42ec1a292b8cdddcc09aa
Author: xuchuanyin Date: 2018-04-13T08:18:23Z

Fix bugs in mapping blocklet to UnsafeDMStore

In BlockletDataMap, carbondata stores the DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId. ---
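The fix this PR describes can be illustrated with a minimal sketch. The class and method names below are hypothetical, not CarbonData's actual BlockletDataMap API; the point is that a composite blockId#blockletId key avoids the collisions that a plain blockletId key causes when two blocks both contain a blocklet 0:

```java
import java.util.HashMap;
import java.util.Map;

public class BlockletIndexMap {
  // Maps the composite key "blockId#blockletId" to the index of the
  // corresponding DMRow in the backing array. Keying by blockletId alone
  // would collide across blocks, which is the bug being fixed.
  private final Map<String, Integer> rowIndex = new HashMap<>();
  private int nextIndex = 0;

  // Register a blocklet and return the array index assigned to its row.
  int addBlocklet(String blockId, int blockletId) {
    return rowIndex.computeIfAbsent(blockId + "#" + blockletId, k -> nextIndex++);
  }

  // Look up the array index for a given block and blocklet.
  Integer indexOf(String blockId, int blockletId) {
    return rowIndex.get(blockId + "#" + blockletId);
  }

  public static void main(String[] args) {
    BlockletIndexMap m = new BlockletIndexMap();
    // two different blocks, both containing blocklet 0 -> distinct rows
    m.addBlocklet("part-0-block-1", 0);
    m.addBlocklet("part-0-block-2", 0);
    System.out.println(m.indexOf("part-0-block-1", 0)); // first registered row
    System.out.println(m.indexOf("part-0-block-2", 0)); // second registered row
  }
}
```

With only blockletId as the key, the second addBlocklet call would silently reuse the first row's slot; the composite key keeps every blocklet addressable.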
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unmanaged table desgin doc_V1.0.pdf > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table desgin doc_V1.0.pdf) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2344) Fix bugs in BlockletDataMap
xuchuanyin created CARBONDATA-2344: -- Summary: Fix bugs in BlockletDataMap Key: CARBONDATA-2344 URL: https://issues.apache.org/jira/browse/CARBONDATA-2344 Project: CarbonData Issue Type: Bug Components: data-query Reporter: xuchuanyin Assignee: xuchuanyin DMStore stores DataMapRows for each blocklet. Currently carbondata accesses the DMStore by blockletId, which is not unique and will cause problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4991/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3775/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4993/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4992/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3776/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3774/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2113 retest this please ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2148 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3771/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4990/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4435/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3772/ ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436954#comment-16436954 ] Liang Chen commented on CARBONDATA-2318: I tested it in spark-shell:

Step 1: ./bin/spark-shell --master local --jars ${carbon_jar} --driver-memory 4G

Step 2:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/Users/apple/DEMO/presto_test/data", "/Users/apple/DEMO/presto_test/metadata")

Step 3: reuse the old carbondata
1) copy all data "default/carbon_table/.." to the new location: /Users/apple/DEMO/presto_test/data
2) run carbon.sql("refresh table carbon_table")

> Remove invalid table name(.ds_store) of presto integration
> ---
>
> Key: CARBONDATA-2318
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2318
> Project: CarbonData
> Issue Type: Improvement
> Components: presto-integration
> Reporter: Liang Chen
> Priority: Minor
>
> For presto integration, we get the invalid table name via "show tables
> from default" as below.
> presto:default> show tables from default;
> Table
>
> .ds_store
> carbon_table
> carbontable
> partition_bigtable
> partition_table
> (5 rows)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4989/ ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2168 [CARBONDATA-2343][DataMap] Improper filter resolver causes more filter scans on data that could be skipped

Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` then always scans blocklets that could otherwise be skipped.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [x] Any interfaces changed? `NO, only an internal interface has been changed`
- [x] Any backward compatibility impacted? `NO`
- [x] Document update required? `NO`
- [x] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata 0413_bug_dm

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2168.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2168

commit 2e2c0683f867a7ecda7a4e7f80b2c7030220cd4a
Author: xuchuanyin Date: 2018-04-13T07:14:25Z

Fix bugs in datamap chooser

Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` then always scans blocklets that could otherwise be skipped. ---
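The effect described in this PR can be shown abstractly. This is a hypothetical sketch, not CarbonData's actual filter executor interface: if the executor chosen for a pruned expression answers "scan required" for every blocklet, the datamap's pruning is lost and every blocklet is read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class PruningSketch {
  // Scan only the blocklets for which the executor says a scan is required,
  // returning the ids of the blocklets actually read.
  static List<Integer> scan(int totalBlocklets, IntPredicate isScanRequired) {
    List<Integer> scanned = new ArrayList<>();
    for (int b = 0; b < totalBlocklets; b++) {
      if (isScanRequired.test(b)) {
        scanned.add(b);
      }
    }
    return scanned;
  }

  public static void main(String[] args) {
    // a datamap that prunes down to a single matching blocklet
    IntPredicate pruned = b -> b == 3;
    // an always-true executor, analogous to the TrueFilterExecutor behaviour
    IntPredicate alwaysTrue = b -> true;
    System.out.println(scan(10, pruned).size());     // only the matching blocklet
    System.out.println(scan(10, alwaysTrue).size()); // every blocklet is scanned
  }
}
```

The fix is to stop wrapping the already-pruned expression in an always-true resolver, so the pruned predicate, not the constant-true one, drives the scan decision.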
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest sdv please ---