[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3818/ ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181624234

--- Diff: store/search/src/main/java/org/apache/carbondata/store/worker/SearchRequestHandler.java ---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.store.worker;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Queue;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datamap.DataMapChooser;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.datastore.row.CarbonRow;
+import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+import org.apache.carbondata.core.readcommitter.LatestFilesReadCommittedScope;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.scan.model.QueryModelBuilder;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
+import org.apache.carbondata.hadoop.CarbonRecordReader;
+import org.apache.carbondata.hadoop.readsupport.impl.CarbonRowReadSupport;
+import org.apache.carbondata.store.protocol.SearchRequest;
+import org.apache.carbondata.store.protocol.SearchResult;
+import org.apache.carbondata.store.util.GrpcSerdes;
+
+import com.google.protobuf.ByteString;
+
+/**
+ * Thread runnable for handling SearchRequest from master.
+ */
+@InterfaceAudience.Internal
+class SearchRequestHandler implements Runnable {
+
+  private static final LogService LOG =
+      LogServiceFactory.getLogService(SearchRequestHandler.class.getName());
+  private boolean running = true;
+  private Queue<SearchService.SearchRequestContext> requestQueue;
+
+  SearchRequestHandler(Queue<SearchService.SearchRequestContext> requestQueue) {
+    this.requestQueue = requestQueue;
+  }
+
+  public void run() {
+    while (running) {
+      SearchService.SearchRequestContext requestContext = requestQueue.poll();
+      if (requestContext == null) {
+        try {
+          Thread.sleep(10);
+        } catch (InterruptedException e) {
+          LOG.error(e);
+        }
+      } else {
+        try {
+          List<CarbonRow> rows = handleRequest(requestContext);
+          sendSuccessResponse(requestContext, rows);
+        } catch (IOException | InterruptedException e) {
+          LOG.error(e);
+          sendFailureResponse(requestContext, e);
+        }
+      }
+    }
+  }
+
+  public void stop() {
+    running = false;
+  }
+
+  /**
+   * Builds {@link QueryModel} and read data from files
+   */
+  private List<CarbonRow> handleRequest(SearchService.SearchRequestContext requestContext)
+      throws IOException, InterruptedException {
+    SearchRequest request = requestContext.getRequest();
+    TableInfo tableInfo = GrpcSerdes.deserialize(request.getTableInfo());
+    CarbonTable table = CarbonTable.buildFromTableInfo(tableInfo);
+    QueryModel queryModel = createQueryModel(table, request);
+
+    // the request contains CarbonMultiBlockSplit and reader will read multiple blocks
+    // by using a thread pool
+    CarbonMultiBlockSplit mbSplit = getMultiBlockSplit(request);
+
+    // If there is
[GitHub] carbondata issue #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue in lucen...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2113 retest sdv please ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181620436

--- Diff: store/search/src/main/java/org/apache/carbondata/store/master/Master.java ---
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.store.master;
+
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutput;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Random;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ExecutionException;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.block.Distributable;
+import org.apache.carbondata.core.datastore.row.CarbonRow;
+import org.apache.carbondata.core.exception.InvalidConfigurationException;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
+import org.apache.carbondata.hadoop.api.CarbonTableInputFormat;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.processing.util.CarbonLoaderUtil;
+import org.apache.carbondata.store.protocol.EchoRequest;
+import org.apache.carbondata.store.protocol.EchoResponse;
+import org.apache.carbondata.store.protocol.SearchRequest;
+import org.apache.carbondata.store.protocol.SearchResult;
+import org.apache.carbondata.store.protocol.ShutdownRequest;
+import org.apache.carbondata.store.protocol.ShutdownResponse;
+import org.apache.carbondata.store.protocol.WorkerGrpc;
+import org.apache.carbondata.store.util.GrpcSerdes;
+
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.protobuf.ByteString;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import io.grpc.Server;
+import io.grpc.ServerBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.Job;
+
+/**
+ * Master of CarbonSearch.
+ * It listens to {@link Master#DEFAULT_PORT} to wait for worker to register.
+ * And it provides search API to fire RPC call to workers.
+ */
+@InterfaceAudience.Internal
+public class Master {
+
+  private static final LogService LOG = LogServiceFactory.getLogService(Master.class.getName());
+
+  public static final int DEFAULT_PORT = 10020;
+
+  private Server registryServer;
+
+  private int port;
+
+  private Random random = new Random();
+
+  /** mapping of worker hostname to rpc stub */
+  private Map workers;
+
+  public Master() {
+    this(DEFAULT_PORT);
+  }
+
+  public Master(int port) {
+    this.port = port;
+    this.workers = new ConcurrentHashMap<>();
+  }
+
+  /** start service and listen on port passed in constructor */
+  public void startService() throws IOException {
+    if (registryServer == null) {
+      /* The port on which the registryServer should run */
+      registryServer = ServerBuilder.forPort(port)
+          .addService(new RegistryService(this))
+          .build()
+          .start();
+      LOG.info("Master started, listening on " + port);
+      Runtime.getRuntime().addShutdownHook(new Thread() {
+        @Override public void run() {
+          // Use stderr here since the logger may have
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2161 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5033/ ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2161 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3817/ ---
[GitHub] carbondata issue #2173: [WIP][TEST]Carbondata rpc
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2173 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3816/ ---
[jira] [Comment Edited] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438969#comment-16438969 ] Zhichao Zhang edited comment on CARBONDATA-2345 at 4/16/18 4:11 AM: - [~oceaneast], you can see the doc [Stream data parser|https://github.com/apache/carbondata/blob/branch-1.3/docs/streaming-guide.md#stream-data-parser]. There is also an [example|https://github.com/apache/carbondata/blob/branch-1.3/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonStructuredStreamingWithRowParser.scala] showing how to use Stream Data Parser. was (Author: zzcclp): [~oceaneast], you can see the doc [Stream data parser|https://github.com/apache/carbondata/blob/branch-1.3/docs/streaming-guide.md#stream-data-parser] > "Task failed while writing rows" error occuers when streaming ingest into > carbondata table > -- > > Key: CARBONDATA-2345 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.1 >Reporter: ocean >Priority: Major > > carbondata version:1.3.1。spark:2.2.1 > When using spark structured streaming ingest data into carbondata table , > such error occurs: > warning: there was one deprecation warning; re-run with -deprecation for > details > qry: org.apache.spark.sql.streaming.StreamingQuery = > org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a > [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 > in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor > 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) > at > 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) > at > org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) > at > org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) > ... 
8 more > [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: > Task 0 in stage 1.0 failed 4 times; aborting job > 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread > for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = > 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 > (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): > org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at >
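The fix discussed in this thread is to select the Stream Data Parser on the streaming sink, per the streaming guide and example linked above. Below is a minimal, hedged sketch (not code from the thread): the option key `carbon.stream.parser` and the `RowStreamParserImp` class name are taken from the linked streaming guide, while the database, table, and checkpoint values are hypothetical placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

public class RowParserIngestSketch {
  // Start a structured-streaming ingest into a CarbonData table using the
  // row stream parser instead of the default CSV parser. dbName, tableName,
  // and the checkpoint location below are placeholder assumptions.
  public static StreamingQuery start(Dataset<Row> source) throws Exception {
    return source.writeStream()
        .format("carbondata")
        .option("checkpointLocation", "/tmp/stream_table/.streaming/checkpoint")
        .option("dbName", "default")
        .option("tableName", "stream_table")
        // per the streaming guide, this selects the row parser so a typed
        // Dataset can be ingested without first flattening rows to CSV text
        .option("carbon.stream.parser",
            "org.apache.carbondata.streaming.parser.RowStreamParserImp")
        .start();
  }
}
```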
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438969#comment-16438969 ]

Zhichao Zhang commented on CARBONDATA-2345:

[~oceaneast], you can see the doc [Stream data parser|https://github.com/apache/carbondata/blob/branch-1.3/docs/streaming-guide.md#stream-data-parser]
[GitHub] carbondata issue #2174: [CARBONDATA-2350][DataMap] Fix bugs in minmax datama...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2174 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5031/ ---
[GitHub] carbondata issue #2173: [WIP][TEST]Carbondata rpc
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2173 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5032/ ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181617779

--- Diff: pom.xml ---
@@ -121,6 +122,8 @@
     org.apache.carbondata.cluster.sdv.suite.SDVSuites
     .sh
     false
+    1.10.0
+    4.0.43.Final

--- End diff --

Better to use the same netty version as Spark uses. Otherwise a lot of exclusions need to be done.

---
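The two bare version strings in the diff above (`1.10.0`, `4.0.43.Final`) are Maven property values whose enclosing XML tags were stripped by the archive. As a hedged sketch only, with the property names being assumptions rather than the actual ones from the PR, the review suggestion amounts to something like:

```xml
<!-- Hypothetical property names; the archive stripped the real tags. -->
<properties>
  <grpc.version>1.10.0</grpc.version>
  <!-- per the review: pin this to the netty version bundled by the active
       Spark profile, so the new store/search module needs no per-dependency
       exclusions to avoid classpath conflicts -->
  <netty.version>4.0.43.Final</netty.version>
</properties>
```

Modules would then reference `${netty.version}` in their `io.netty` dependencies so a single property controls the version project-wide.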
[GitHub] carbondata issue #2173: [WIP][TEST]Carbondata rpc
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2173 retest this please ---
[GitHub] carbondata issue #2174: [CARBONDATA-2350][DataMap] Fix bugs in minmax datama...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2174 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3814/ ---
[GitHub] carbondata issue #2173: [WIP][TEST]Carbondata rpc
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2173 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3815/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3813/ ---
[GitHub] carbondata issue #2173: [WIP][TEST]Carbondata rpc
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2173 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5030/ ---
[GitHub] carbondata pull request #2174: [CARBONDATA-2350][DataMap] Fix bugs in minmax...
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2174

[CARBONDATA-2350][DataMap] Fix bugs in minmax datamap example

Fix bugs in minmax datamap example

Previous implementation of minmax datamap example doesn't work as intended and has the following problems:
+ The example cannot run functionally.
+ The minmax datamap doesn't prune blocklet as intended.

This PR fixed the above problems.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [x] Any interfaces changed? `NO`
- [x] Any backward compatibility impacted? `NO`
- [x] Document update required? `NO`
- [x] Testing done. Please provide details on
      - Whether new unit test cases have been added or why no new tests are required? `Updated the test case`
      - How it is tested? Please attach test report. `Tested in local machine`
      - Is it a performance related change? Please attach the performance test report. `NO`
      - Any additional information to help reviewers in testing this change. `NO`
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `Not related`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0416_minmax_dm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2174.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2174

commit 77a8b2114857914fd07f4df417b4c99d574abf7b
Author: xuchuanyin
Date: 2018-04-16T02:55:10Z

    Fix bugs in minmax datamap example

---
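The "prune blocklet as intended" behavior the PR above fixes can be illustrated with a generic min/max pruning sketch. This is an illustrative model only, not the project's actual DataMap interfaces: a blocklet is kept for scanning only if its stored [min, max] range for the filtered column could contain the filter value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinMaxPruneSketch {

  // Simplified per-blocklet min/max statistics for a single column.
  // Hypothetical model for illustration; not CarbonData's real classes.
  public static class BlockletStats {
    public final int id;
    public final long min;
    public final long max;
    public BlockletStats(int id, long min, long max) {
      this.id = id;
      this.min = min;
      this.max = max;
    }
  }

  // Keep only the blocklets whose [min, max] range can contain the equality
  // filter value; all other blocklets are pruned without reading any data.
  public static List<Integer> prune(List<BlockletStats> stats, long filterValue) {
    List<Integer> hits = new ArrayList<>();
    for (BlockletStats s : stats) {
      if (filterValue >= s.min && filterValue <= s.max) {
        hits.add(s.id);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<BlockletStats> stats = Arrays.asList(
        new BlockletStats(0, 1, 100),
        new BlockletStats(1, 101, 200),
        new BlockletStats(2, 150, 300));
    // value 180 falls inside the ranges of blocklets 1 and 2 only
    System.out.println(prune(stats, 180));  // prints [1, 2]
  }
}
```

A broken example of this kind typically returns every blocklet (no pruning) or none; the JIRA's test-case update presumably asserts that only the matching blocklets survive.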
[GitHub] carbondata pull request #2173: [WIP][TEST]Carbondata rpc
GitHub user xubo245 opened a pull request:

    https://github.com/apache/carbondata/pull/2173

[WIP][TEST]Carbondata rpc

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xubo245/carbondata carbondata-rpc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2173.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2173

commit 5c1b6c08a20a314149928412224f3143b5006b67
Author: Jacky Li
Date: 2018-04-15T01:06:03Z

    support search mode by gRPC

commit 52faa528c14580dc71884bb240ea8c4c1a3edfa7
Author: Jacky Li
Date: 2018-04-15T15:54:23Z

    fix comment

commit 07c3b56e16b6fd1d15e0050076943bf0f6cdd129
Author: xubo245 <601450868@...>
Date: 2018-04-16T02:59:18Z

    remove unused import

---
[jira] [Created] (CARBONDATA-2350) Fix bugs in minmax datamap example
xuchuanyin created CARBONDATA-2350:
--------------------------------------

             Summary: Fix bugs in minmax datamap example
                 Key: CARBONDATA-2350
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2350
             Project: CarbonData
          Issue Type: Bug
            Reporter: xuchuanyin
            Assignee: xuchuanyin

Current implementation of minmax datamap example doesn't work as intended and has the following problems:
# The example cannot run functionally.
# The minmax datamap doesn't prune blocklet as intended.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181613222

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java ---
@@ -26,7 +26,6 @@
 import org.apache.carbondata.common.logging.LogService;
 import org.apache.carbondata.common.logging.LogServiceFactory;
 import org.apache.carbondata.core.cache.dictionary.Dictionary;
-import org.apache.carbondata.core.constants.CarbonCommonConstants;

--- End diff --

Please also remove the unused imports on lines 38 and 45.

---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181613087

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -59,6 +60,8 @@
 import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport
 import org.apache.carbondata.processing.util.CarbonLoaderUtil
 import org.apache.carbondata.spark.InitInputMetrics
 import org.apache.carbondata.spark.util.{SparkDataTypeConverterImpl, Util}
+import org.apache.carbondata.store.master.Master
+import org.apache.carbondata.store.worker.Worker

--- End diff --

All of the newly added imports above are unused; please remove them.

---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438942#comment-16438942 ]

ocean commented on CARBONDATA-2345:

Hi Zhichao Zhang, after I added this option it works. But I think we should add this option to the documents and examples.
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181612979 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -18,6 +18,7 @@ package org.apache.carbondata.spark.rdd import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream} +import java.net.InetAddress --- End diff -- Unused import, please remove ---
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2161
Refresh your local code: git fetch --all
Rebase onto the latest master branch: git rebase -i origin/master
Then fix the conflicts and push again. ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5029/ ---
[jira] [Issue Comment Deleted] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ocean updated CARBONDATA-2345: -- Comment: was deleted (was: hi zhichao zhang, I add this option, but this error still happen. There must be other problems) > "Task failed while writing rows" error occuers when streaming ingest into > carbondata table > -- > > Key: CARBONDATA-2345 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.1 >Reporter: ocean >Priority: Major > > carbondata version:1.3.1。spark:2.2.1 > When using spark structured streaming ingest data into carbondata table , > such error occurs: > warning: there was one deprecation warning; re-run with -deprecation for > details > qry: org.apache.spark.sql.streaming.StreamingQuery = > org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a > [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 > in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor > 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) > at > org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) > at > org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) > ... 8 more > [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: > Task 0 in stage 1.0 failed 4 times; aborting job > 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread > for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = > 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 > (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): > org.apache.carbondata.streaming.CarbonStreamException: Task failed while > writing rows > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) > at > org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438922#comment-16438922 ] ocean commented on CARBONDATA-2345: --- Hi Zhichao Zhang, I added this option, but this error still happens. There must be other problems.
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181589448 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/store/SparkCarbonStore.scala --- @@ -68,19 +78,16 @@ private[store] class SparkCarbonStore extends MetaCachedCarbonStore { filter: Expression): java.util.Iterator[CarbonRow] = { require(path != null) require(projectColumns != null) -val table = getTable(path) -val rdd = new CarbonScanRDD[CarbonRow]( - spark = session, - columnProjection = new CarbonProjection(projectColumns), - filterExpression = filter, - identifier = table.getAbsoluteTableIdentifier, - serializedTableInfo = table.getTableInfo.serialize, - tableInfo = table.getTableInfo, - inputMetricsStats = new CarbonInputMetrics, - partitionNames = null, - dataTypeConverterClz = null, - readSupportClz = classOf[CarbonRowReadSupport]) -rdd.collect +scan(getTable(path), projectColumns, filter) + } + + def scan( + carbonTable: CarbonTable, + projectColumns: Array[String], --- End diff -- Now I removed this function and use `scan` interface only ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181589099 --- Diff: store/search/src/main/java/org/apache/carbondata/store/worker/SearchRequestHandler.java --- @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.store.worker; + +import java.io.ByteArrayInputStream; +import java.io.DataInputStream; +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; +import java.util.Queue; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datamap.DataMapChooser; +import org.apache.carbondata.core.datamap.DataMapLevel; +import org.apache.carbondata.core.datamap.Segment; +import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper; +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.indexstore.ExtendedBlocklet; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.readcommitter.LatestFilesReadCommittedScope; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.model.QueryModelBuilder; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.CarbonRecordReader; +import org.apache.carbondata.hadoop.readsupport.impl.CarbonRowReadSupport; +import org.apache.carbondata.store.protocol.SearchRequest; +import org.apache.carbondata.store.protocol.SearchResult; +import org.apache.carbondata.store.util.GrpcSerdes; + +import com.google.protobuf.ByteString; + +/** + * Thread runnable for handling SearchRequest from master. 
+ */
+@InterfaceAudience.Internal
+class SearchRequestHandler implements Runnable {
+
+  private static final LogService LOG =
+      LogServiceFactory.getLogService(SearchRequestHandler.class.getName());
+  private boolean running = true;
+  private Queue<SearchService.SearchRequestContext> requestQueue;
+
+  SearchRequestHandler(Queue<SearchService.SearchRequestContext> requestQueue) {
+    this.requestQueue = requestQueue;
+  }
+
+  public void run() {
+    while (running) {
+      SearchService.SearchRequestContext requestContext = requestQueue.poll();
+      if (requestContext == null) {
+        try {
+          Thread.sleep(10);
+        } catch (InterruptedException e) {
+          LOG.error(e);
+        }
+      } else {
+        try {
+          List<CarbonRow> rows = handleRequest(requestContext);
+          sendSuccessResponse(requestContext, rows);
+        } catch (IOException | InterruptedException e) {
+          LOG.error(e);
+          sendFailureResponse(requestContext, e);
+        }
+      }
+    }
+  }
+
+  public void stop() {
+    running = false;
+  }
+
+  /**
+   * Builds {@link QueryModel} and reads data from files
+   */
+  private List<CarbonRow> handleRequest(SearchService.SearchRequestContext requestContext)
+      throws IOException, InterruptedException {
+    SearchRequest request = requestContext.getRequest();
+    TableInfo tableInfo = GrpcSerdes.deserialize(request.getTableInfo());
+    CarbonTable table = CarbonTable.buildFromTableInfo(tableInfo);
+    QueryModel queryModel = createQueryModel(table, request);
+
+    // the request contains CarbonMultiBlockSplit and the reader will read multiple blocks
+    // by using a thread pool
+    CarbonMultiBlockSplit mbSplit = getMultiBlockSplit(request);
+
+    // If there is
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181588502 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/store/SparkCarbonStore.scala --- @@ -68,19 +78,16 @@ private[store] class SparkCarbonStore extends MetaCachedCarbonStore { filter: Expression): java.util.Iterator[CarbonRow] = { require(path != null) require(projectColumns != null) -val table = getTable(path) -val rdd = new CarbonScanRDD[CarbonRow]( - spark = session, - columnProjection = new CarbonProjection(projectColumns), - filterExpression = filter, - identifier = table.getAbsoluteTableIdentifier, - serializedTableInfo = table.getTableInfo.serialize, - tableInfo = table.getTableInfo, - inputMetricsStats = new CarbonInputMetrics, - partitionNames = null, - dataTypeConverterClz = null, - readSupportClz = classOf[CarbonRowReadSupport]) -rdd.collect +scan(getTable(path), projectColumns, filter) + } + + def scan( + carbonTable: CarbonTable, + projectColumns: Array[String], --- End diff -- why not use `QueryProjection` instead of array ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323]Distributed search mode using gRPC
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3812/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323]Distributed search mode using gRPC
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5028/ ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181587662 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java --- @@ -23,51 +23,101 @@ import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.SingleTableProvider; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.util.DataTypeConverter; public class QueryModelBuilder { - private CarbonTable carbonTable; + private CarbonTable table; + private QueryProjection projection; + private Expression filterExpression; + private DataTypeConverter dataTypeConverter; + private boolean forcedDetailRawQuery; + private boolean readPageByPage; - public QueryModelBuilder(CarbonTable carbonTable) { -this.carbonTable = carbonTable; + public QueryModelBuilder(CarbonTable table) { +this.table = table; } - public QueryModel build(String[] projectionColumnNames, Expression filterExpression) { -QueryModel queryModel = QueryModel.newInstance(carbonTable); -QueryProjection projection = carbonTable.createProjection(projectionColumnNames); -queryModel.setProjection(projection); -boolean[] isFilterDimensions = new boolean[carbonTable.getDimensionOrdinalMax()]; -boolean[] isFilterMeasures = new boolean[carbonTable.getAllMeasures().size()]; -carbonTable.processFilterExpression(filterExpression, isFilterDimensions, isFilterMeasures); -queryModel.setIsFilterDimensions(isFilterDimensions); -queryModel.setIsFilterMeasures(isFilterMeasures); -FilterResolverIntf filterIntf = carbonTable.resolveFilter(filterExpression, null); -queryModel.setFilterExpressionResolverTree(filterIntf); -return queryModel; + public QueryModelBuilder projectColumns(String[] projectionColumns) { +String factTableName = table.getTableName(); +QueryProjection projection = new QueryProjection(); +// fill dimensions +// If columns are null, set all dimensions and measures --- End diff -- fixed ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181587611 --- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java --- @@ -134,9 +133,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont queryModel.setTableBlockInfos(tableBlockInfoList); queryModel.setVectorReader(true); try { - if (CarbonProperties.getInstance().getProperty( - CarbonCommonConstants.CARBON_SEARCH_MODE_ENABLE, - CarbonCommonConstants.CARBON_SEARCH_MODE_ENABLE_DEFAULT).equals("true")) { + if (CarbonProperties.isSearchModeEnabled()) { --- End diff -- fixed ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181587533 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModelBuilder.java --- @@ -23,51 +23,101 @@ import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.SingleTableProvider; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.util.DataTypeConverter; public class QueryModelBuilder { - private CarbonTable carbonTable; + private CarbonTable table; + private QueryProjection projection; + private Expression filterExpression; + private DataTypeConverter dataTypeConverter; + private boolean forcedDetailRawQuery; + private boolean readPageByPage; - public QueryModelBuilder(CarbonTable carbonTable) { -this.carbonTable = carbonTable; + public QueryModelBuilder(CarbonTable table) { +this.table = table; } - public QueryModel build(String[] projectionColumnNames, Expression filterExpression) { -QueryModel queryModel = QueryModel.newInstance(carbonTable); -QueryProjection projection = carbonTable.createProjection(projectionColumnNames); -queryModel.setProjection(projection); -boolean[] isFilterDimensions = new boolean[carbonTable.getDimensionOrdinalMax()]; -boolean[] isFilterMeasures = new boolean[carbonTable.getAllMeasures().size()]; -carbonTable.processFilterExpression(filterExpression, isFilterDimensions, isFilterMeasures); -queryModel.setIsFilterDimensions(isFilterDimensions); -queryModel.setIsFilterMeasures(isFilterMeasures); -FilterResolverIntf filterIntf = carbonTable.resolveFilter(filterExpression, null); -queryModel.setFilterExpressionResolverTree(filterIntf); -return queryModel; + public QueryModelBuilder projectColumns(String[] projectionColumns) { +String factTableName = table.getTableName(); +QueryProjection projection = new QueryProjection(); +// fill dimensions +// If columns are null, set all dimensions and measures --- End diff -- I think it is better to throw an exception if the projection is null, since there is already a `projectAllColumns` method for users who want all columns. ---
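The fail-fast behavior suggested in this review — rejecting a null projection and reserving `projectAllColumns` for the select-all case — can be sketched roughly as below. This is a hypothetical, stripped-down stand-in, not CarbonData's actual `QueryModelBuilder`; only the method names `projectColumns` and `projectAllColumns` come from the diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for QueryModelBuilder; only the method
// names (projectColumns, projectAllColumns) come from the diff under review.
class QueryModelBuilderSketch {
  private List<String> projection;

  QueryModelBuilderSketch projectColumns(String[] columns) {
    // Fail fast instead of silently falling back to "all columns",
    // as the reviewer suggests.
    if (columns == null) {
      throw new IllegalArgumentException(
          "projection must not be null; call projectAllColumns() to select every column");
    }
    this.projection = Arrays.asList(columns);
    return this;
  }

  QueryModelBuilderSketch projectAllColumns() {
    this.projection = null; // null internally marks "all dimensions and measures"
    return this;
  }

  List<String> getProjection() {
    return projection;
  }
}
```

With this split, a null argument signals a caller bug immediately, while the intent to read every column is spelled out explicitly at the call site.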
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181587500 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SearchModeVectorDetailQueryExecutor.java --- @@ -56,6 +59,11 @@ } } + public static void shutdown() { --- End diff -- This is called in Worker.shutdown ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181587317 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/datatype/DataType.java --- @@ -91,6 +91,7 @@ public static char convertType(DataType dataType) { return STRING_CHAR; } else if (dataType == DataTypes.TIMESTAMP) { return TIMESTAMP_CHAR; + --- End diff -- fixed ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181586769 --- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java --- @@ -134,9 +133,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont queryModel.setTableBlockInfos(tableBlockInfoList); queryModel.setVectorReader(true); try { - if (CarbonProperties.getInstance().getProperty( - CarbonCommonConstants.CARBON_SEARCH_MODE_ENABLE, - CarbonCommonConstants.CARBON_SEARCH_MODE_ENABLE_DEFAULT).equals("true")) { + if (CarbonProperties.isSearchModeEnabled()) { --- End diff -- I think there is no need to do this if-check here; `QueryExecutorFactory` is already available, so move the code there. ---
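The refactor suggested here — letting the factory, rather than each caller, branch on search mode — can be sketched as follows. `QueryExecutorFactory` is named in the comment, but this toy selection logic and the returned class-name strings are assumptions for illustration only, not verified against CarbonData's code.

```java
// Illustrative-only sketch: the search-mode branch lives in one factory method
// rather than in callers like VectorizedCarbonRecordReader. The executor class
// names returned here are assumptions for illustration.
class QueryExecutorFactorySketch {
  static String chooseExecutor(boolean searchModeEnabled, boolean vectorReader) {
    if (searchModeEnabled) {
      return vectorReader ? "SearchModeVectorDetailQueryExecutor"
                          : "SearchModeDetailQueryExecutor";
    }
    return vectorReader ? "VectorDetailQueryExecutor" : "DetailQueryExecutor";
  }
}
```

Centralizing the branch means a caller asks the factory for an executor and never inspects the search-mode property itself, which is the point of the review comment.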
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181586088 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SearchModeVectorDetailQueryExecutor.java --- @@ -56,6 +59,11 @@ } } + public static void shutdown() { --- End diff -- Better to override the finish method to call the ExecutorService shutdown. ---
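One way to realize this suggestion — shutting the pool down from an overridden `finish` rather than a separate static `shutdown()` entry point — is sketched below. This is a minimal stand-in assuming a static `ExecutorService` as in the diff; it is not the actual `SearchModeVectorDetailQueryExecutor`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal stand-in for an executor that owns a static thread pool, as the
// diff suggests SearchModeVectorDetailQueryExecutor does. Putting teardown in
// finish() ties pool shutdown to the query lifecycle instead of requiring a
// separate static shutdown() call.
class SearchModeExecutorSketch {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  // Analogue of overriding the executor's finish(), called when the query ends.
  public void finish() {
    POOL.shutdown(); // no new tasks accepted; already-submitted tasks drain
  }

  static boolean poolShutDown() {
    return POOL.isShutdown();
  }
}
```

This matches the reviewer's point: whoever tears down the query already calls `finish`, so no extra caller needs to know about the static pool.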
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181586050 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SearchModeVectorDetailQueryExecutor.java --- @@ -56,6 +59,11 @@ } } + public static void shutdown() { --- End diff -- I don't find the caller method of this method. ---
[GitHub] carbondata pull request #2148: [CARBONDATA-2323]Distributed search mode usin...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2148#discussion_r181585854 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/datatype/DataType.java --- @@ -91,6 +91,7 @@ public static char convertType(DataType dataType) { return STRING_CHAR; } else if (dataType == DataTypes.TIMESTAMP) { return TIMESTAMP_CHAR; + --- End diff -- remove unnecessary line ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323]Distributed search mode using gRPC
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3811/ ---
[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2149 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3810/ ---
[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2149 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5026/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323]Distributed search mode using gRPC
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5027/ ---
[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2149 retest this please ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5025/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3809/ ---