[jira] [Created] (CARBONDATA-3911) NullPointerException is thrown when clean files is executed after two updates

2020-07-16 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3911:
---

 Summary: NullPointerException is thrown when clean files is 
executed after two updates
 Key: CARBONDATA-3911
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3911
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


* create table
* load data
* load one more data
* update1
* update2
* clean files

Clean files fails with a NullPointerException; a minimal reproduction sketch follows.
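
A minimal reproduction sketch of the steps above, assuming a Spark session with CarbonData where sql(...) is available; the table name, columns, and data are illustrative only:

{code:scala}
// Hypothetical reproduction of the reported flow (names and values are illustrative).
sql("create table t_clean (id int, name string) stored as carbondata")
sql("insert into t_clean select 1, 'a'")                     // load data
sql("insert into t_clean select 2, 'b'")                     // load one more data
sql("update t_clean set (name) = ('x') where id = 1")        // update1
sql("update t_clean set (name) = ('y') where id = 2")        // update2
sql("clean files for table t_clean")                         // reported to throw a NullPointerException
{code}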



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] akashrn5 commented on pull request #3838: [CARBONDATA-3910]Fix load failure in cluster when csv present in local file system in case of global sort

2020-07-16 Thread GitBox


akashrn5 commented on pull request #3838:
URL: https://github.com/apache/carbondata/pull/3838#issuecomment-659863917


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3910) load fails when csv file present in local and loading to cluster

2020-07-16 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3910:
---

 Summary: load fails when csv file present in local and loading to 
cluster
 Key: CARBONDATA-3910
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3910
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


Load fails when the CSV file is present in the local file system and the load is run on a cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] QiangCai commented on a change in pull request #3842: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3842:
URL: https://github.com/apache/carbondata/pull/3842#discussion_r456206506



##
File path: 
integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
##
@@ -157,21 +157,21 @@ object CarbonMergeFilesRDD {
 if (carbonTable.isHivePartitionTable && 
!StringUtils.isEmpty(tempFolderPath)) {
   // remove all tmp folder of index files
   val startDelete = System.currentTimeMillis()
-  val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
-  val executorService = Executors.newFixedThreadPool(numThreads)
-  val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo
-  partitionInfo
-.asScala
-.map { partitionPath =>
-  executorService.submit(new Runnable {
-override def run(): Unit = {
-  ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
-  FileFactory.deleteAllCarbonFilesOfDir(
-FileFactory.getCarbonFile(partitionPath + "/" + 
tempFolderPath))
-}
-  })
+  val allTmpDirs = partitionInfo
+.asScala.map { partitionPath =>
+  partitionPath + CarbonCommonConstants.FILE_SEPARATOR + tempFolderPath
 }
-.map(_.get())
+  val allTmpFiles = allTmpDirs.map { partitionDir =>
+  FileFactory.getCarbonFile(partitionDir).listFiles()

Review comment:
   If loading creates too many partitions, this listFiles call will also take a long time.
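
   A minimal sketch of one way to address this, reusing the variables visible in the diff (partitionInfo, tempFolderPath, CarbonCommonConstants, FileFactory): keep a bounded pool, but do both the listing and the deletion of each partition's temp folder inside the submitted task, so that neither step is serialized on the driver. Illustrative only, not the actual fix.

       import java.util.concurrent.Executors
       import scala.collection.JavaConverters._

       // List and delete each partition's temp folder inside a worker thread.
       val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
       val executorService = Executors.newFixedThreadPool(numThreads)
       val futures = partitionInfo.asScala.map { partitionPath =>
         executorService.submit(new Runnable {
           override def run(): Unit = {
             val tmpDir = partitionPath + CarbonCommonConstants.FILE_SEPARATOR + tempFolderPath
             FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(tmpDir))
           }
         })
       }
       futures.foreach(_.get())
       executorService.shutdown()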





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3842: [CARBONDATA-3702] Clean temp index files in parallel in merge index flow

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3842:
URL: https://github.com/apache/carbondata/pull/3842#discussion_r456206506



##
File path: 
integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
##
@@ -157,21 +157,21 @@ object CarbonMergeFilesRDD {
 if (carbonTable.isHivePartitionTable && 
!StringUtils.isEmpty(tempFolderPath)) {
   // remove all tmp folder of index files
   val startDelete = System.currentTimeMillis()
-  val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
-  val executorService = Executors.newFixedThreadPool(numThreads)
-  val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo
-  partitionInfo
-.asScala
-.map { partitionPath =>
-  executorService.submit(new Runnable {
-override def run(): Unit = {
-  ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
-  FileFactory.deleteAllCarbonFilesOfDir(
-FileFactory.getCarbonFile(partitionPath + "/" + 
tempFolderPath))
-}
-  })
+  val allTmpDirs = partitionInfo
+.asScala.map { partitionPath =>
+  partitionPath + CarbonCommonConstants.FILE_SEPARATOR + tempFolderPath
 }
-.map(_.get())
+  val allTmpFiles = allTmpDirs.map { partitionDir =>
+  FileFactory.getCarbonFile(partitionDir).listFiles()

Review comment:
   If loading creates too many partitions, this file listing will also take a long time.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3778: [WIP] Support array with SI

2020-07-16 Thread GitBox


QiangCai commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-659830702


   please describe the PR and fix the failure.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


QiangCai commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659829122


   @ajantha-bhat 
   1. Remove the CarbonData jars from your local Maven repo first.
   2. Build with -o; you will get a dependency error (cannot find the dependency: carbondata-spark_2.3 and carbondata-spark_2.4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3810: [CARBONDATA-3900] [CARBONDATA-3882] [CARBONDATA-3881] Fix multiple concurrent issues in table status lock and segment lock f

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#discussion_r456203328



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
##
@@ -440,14 +448,22 @@ object SecondaryIndexUtil {
 val loadFolderDetailsArray = 
SegmentStatusManager.readLoadMetadata(indexTable

Review comment:
   Reading should be done inside the lock.
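
   A minimal sketch of the suggested ordering, reusing identifiers from the surrounding code (lock, SegmentStatusManager, indexTable); the body is illustrative only, not the actual fix:

       // Acquire the lock first, then read the table status while holding it.
       if (lock.lockWithRetries()) {
         try {
           val loadFolderDetailsArray =
             SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
           // ... modify and write back the table status while still holding the lock ...
         } finally {
           lock.unlock()
         }
       }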





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193999



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java
##
@@ -37,6 +40,13 @@
 import org.apache.log4j.Logger;
 
 public class SortParameters implements Serializable {
+  
+  private ExecutorService writeService = Executors.newFixedThreadPool(5,

Review comment:
   Suggest making the core pool size of the thread pool configurable.
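
   A minimal sketch of the suggestion (the property key is hypothetical, and the enclosing class is Java; Scala is used here for brevity):

       import java.util.concurrent.Executors
       import org.apache.carbondata.core.util.CarbonProperties

       // Hypothetical property name; fall back to the current hard-coded value of 5.
       val poolSize = CarbonProperties.getInstance()
         .getProperty("carbon.sorttemp.write.pool.size", "5").toInt
       val writeService = Executors.newFixedThreadPool(poolSize)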





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on a change in pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


Zhangshunyu commented on a change in pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#discussion_r456193818



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
##
@@ -200,25 +203,44 @@ public void startSorting() {
* @param file file
* @throws CarbonSortKeyAndGroupByException
*/
-  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file)
-  throws CarbonSortKeyAndGroupByException {
-DataOutputStream stream = null;
-try {
-  // open stream
-  stream = FileFactory.getDataOutputStream(file.getPath(),
-  parameters.getFileWriteBufferSize(), 
parameters.getSortTempCompressorName());
-  int actualSize = rowPage.getBuffer().getActualSize();
-  // write number of entries to the file
-  stream.writeInt(actualSize);
-  for (int i = 0; i < actualSize; i++) {
-rowPage.writeRow(
-rowPage.getBuffer().get(i) + 
rowPage.getDataBlock().getBaseOffset(), stream);
+  private void writeDataToFile(UnsafeCarbonRowPage rowPage, File file) {
+writeService.submit(new WriteThread(rowPage, file));
+  }
+
+  public class WriteThread implements Runnable {
+private File file;
+private UnsafeCarbonRowPage rowPage;
+
+public WriteThread(UnsafeCarbonRowPage rowPage, File file) {
+  this.rowPage = rowPage;
+  this.file = file;
+
+}
+
+@Override
+public void run() {
+  DataOutputStream stream = null;
+  try {
+// open stream
+stream = FileFactory.getDataOutputStream(this.file.getPath(),
+parameters.getFileWriteBufferSize(), 
parameters.getSortTempCompressorName());
+int actualSize = rowPage.getBuffer().getActualSize();
+// write number of entries to the file
+stream.writeInt(actualSize);
+for (int i = 0; i < actualSize; i++) {
+  rowPage.writeRow(
+  rowPage.getBuffer().get(i) + 
rowPage.getDataBlock().getBaseOffset(), stream);
+}
+// add sort temp filename to and arrayList. When the list size reaches 
20 then
+// intermediate merging of sort temp files will be triggered
+unsafeInMemoryIntermediateFileMerger.addFileToMerge(file);
+  } catch (IOException | MemoryException e) {
+e.printStackTrace();

Review comment:
   Use log4j instead of printStackTrace.
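
   A minimal sketch of the suggestion (the enclosing class is Java; Scala is used here for brevity, and the logger name and message are illustrative only):

       import org.apache.log4j.Logger

       // Route the failure through log4j instead of printing the stack trace to stderr.
       object SortTempWriteLogging {
         private val LOGGER: Logger = Logger.getLogger("UnsafeSortDataRows.WriteThread")

         def handleWriteFailure(file: java.io.File, e: Exception): Unit = {
           LOGGER.error("Failed to write sort temp file " + file.getPath, e)
         }
       }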





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


Zhangshunyu commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659810429


   please check the build failure info



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] tianlileer closed pull request #710: [CARBONDATA-833]load data from dataframe,generater data row may be error when delimiter…

2020-07-16 Thread GitBox


tianlileer closed pull request #710:
URL: https://github.com/apache/carbondata/pull/710


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3785: [CARBONDATA-3843] Fix merge index issue in streaming table

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3785:
URL: https://github.com/apache/carbondata/pull/3785#discussion_r456179554



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala
##
@@ -104,73 +104,80 @@ class MergeIndexEventListener extends 
OperationEventListener with Logging {
   case alterTableMergeIndexEvent: AlterTableMergeIndexEvent =>
 val carbonMainTable = alterTableMergeIndexEvent.carbonTable
 val sparkSession = alterTableMergeIndexEvent.sparkSession
-if (!carbonMainTable.isStreamingSink) {
-  LOGGER.info(s"Merge Index request received for table " +
-  s"${ carbonMainTable.getDatabaseName }.${ 
carbonMainTable.getTableName }")
-  val lock = CarbonLockFactory.getCarbonLockObj(
-carbonMainTable.getAbsoluteTableIdentifier,
-LockUsage.COMPACTION_LOCK)
+LOGGER.info(s"Merge Index request received for table " +
+s"${ carbonMainTable.getDatabaseName }.${ 
carbonMainTable.getTableName }")
+val lock = CarbonLockFactory.getCarbonLockObj(
+  carbonMainTable.getAbsoluteTableIdentifier,
+  LockUsage.COMPACTION_LOCK)
 
-  try {
-if (lock.lockWithRetries()) {
-  LOGGER.info("Acquired the compaction lock for table" +
-  s" ${ carbonMainTable.getDatabaseName }.${
-carbonMainTable
-  .getTableName
-  }")
-  val segmentsToMerge =
-if 
(alterTableMergeIndexEvent.alterTableModel.customSegmentIds.isEmpty) {
-  val validSegments =
-
CarbonDataMergerUtil.getValidSegmentList(carbonMainTable).asScala
-  val validSegmentIds: mutable.Buffer[String] = 
mutable.Buffer[String]()
-  validSegments.foreach { segment =>
+try {
+  if (lock.lockWithRetries()) {
+LOGGER.info("Acquired the compaction lock for table" +
+s" ${ carbonMainTable.getDatabaseName }.${
+  carbonMainTable
+.getTableName
+}")

Review comment:
   Combine these lines into one line.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala
##
@@ -104,73 +104,80 @@ class MergeIndexEventListener extends 
OperationEventListener with Logging {
   case alterTableMergeIndexEvent: AlterTableMergeIndexEvent =>
 val carbonMainTable = alterTableMergeIndexEvent.carbonTable
 val sparkSession = alterTableMergeIndexEvent.sparkSession
-if (!carbonMainTable.isStreamingSink) {
-  LOGGER.info(s"Merge Index request received for table " +
-  s"${ carbonMainTable.getDatabaseName }.${ 
carbonMainTable.getTableName }")
-  val lock = CarbonLockFactory.getCarbonLockObj(
-carbonMainTable.getAbsoluteTableIdentifier,
-LockUsage.COMPACTION_LOCK)
+LOGGER.info(s"Merge Index request received for table " +
+s"${ carbonMainTable.getDatabaseName }.${ 
carbonMainTable.getTableName }")
+val lock = CarbonLockFactory.getCarbonLockObj(
+  carbonMainTable.getAbsoluteTableIdentifier,
+  LockUsage.COMPACTION_LOCK)
 
-  try {
-if (lock.lockWithRetries()) {
-  LOGGER.info("Acquired the compaction lock for table" +
-  s" ${ carbonMainTable.getDatabaseName }.${
-carbonMainTable
-  .getTableName
-  }")
-  val segmentsToMerge =
-if 
(alterTableMergeIndexEvent.alterTableModel.customSegmentIds.isEmpty) {
-  val validSegments =
-
CarbonDataMergerUtil.getValidSegmentList(carbonMainTable).asScala
-  val validSegmentIds: mutable.Buffer[String] = 
mutable.Buffer[String]()
-  validSegments.foreach { segment =>
+try {
+  if (lock.lockWithRetries()) {
+LOGGER.info("Acquired the compaction lock for table" +
+s" ${ carbonMainTable.getDatabaseName }.${
+  carbonMainTable
+.getTableName
+}")
+val loadFolderDetailsArray = SegmentStatusManager
+  .readLoadMetadata(carbonMainTable.getMetadataPath)
+val segmentFileNameMap: java.util.Map[String, String] = new 
util.HashMap[String,
+  String]()
+var streamingSegment: Set[String] = Set[String]()
+loadFolderDetailsArray.foreach(loadMetadataDetails => {
+  if (loadMetadataDetails.getFileFormat.equals(FileFormat.ROW_V1)) 
{
+streamingSegment += 

[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456176045



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelFilterExecuterImpl.java
##
@@ -222,49 +228,103 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   }
 }
 BitSetGroup bitSetGroup = new BitSetGroup(pageNumbers);
-for (int i = 0; i < pageNumbers; i++) {
-  BitSet set = new BitSet(numberOfRows[i]);
-  RowIntf row = new RowImpl();
-  BitSet prvBitset = null;
-  // if bitset pipe line is enabled then use rowid from previous bitset
-  // otherwise use older flow
-  if (!useBitsetPipeLine ||
-  null == rawBlockletColumnChunks.getBitSetGroup() ||
-  null == bitSetGroup.getBitSet(i) ||
-  rawBlockletColumnChunks.getBitSetGroup().getBitSet(i).isEmpty()) {
+if (isDimensionPresentInCurrentBlock.length == 1 && 
isDimensionPresentInCurrentBlock[0]

Review comment:
   It will be hard to read the code after we add more if conditions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456175101



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -865,7 +870,33 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 Some(CarbonContainsWith(c))
   case c@Literal(v, t) if (v == null) =>
 Some(FalseExpr())
-  case others => None
+  case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+a.dataType match {
+  case arrayType: ArrayType =>
+arrayType.elementType match {

Review comment:
   How about extracting the match code block into a method, isPrimitiveDataType, and moving it into a util class?
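
   A minimal sketch of such a helper, assuming Spark's DataType hierarchy; the object name and the exact set of supported types are illustrative only:

       import org.apache.spark.sql.types._

       object CarbonFilterUtil {
         // True for element types whose array_contains filter could be pushed down.
         def isPrimitiveDataType(dataType: DataType): Boolean = dataType match {
           case BooleanType | ByteType | ShortType | IntegerType | LongType |
                FloatType | DoubleType | StringType | DateType | TimestampType => true
           case _: DecimalType => true
           case _ => false
         }
       }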





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


Zhangshunyu commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659774957


   Great! This is a useful feature.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-16 Thread GitBox


QiangCai commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r456167613



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestArrayContainsPushDown.scala
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.sql.{Date, Timestamp}
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestArrayContainsPushDown extends QueryTest with BeforeAndAfterAll {
+
+  override protected def afterAll(): Unit = {
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
+CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)
+sql("DROP TABLE IF EXISTS compactComplex")
+  }
+
+  test("test array contains pushdown for array of string") {
+sql("drop table if exists complex1")
+sql("create table complex1 (arr array) stored as carbondata")
+sql("insert into complex1 select array('as') union all " +
+"select array('sd','df','gh') union all " +
+"select array('rt','ew','rtyu','jk',null) union all " +
+"select array('ghsf','dbv','','ty') union all " +
+"select array('hjsd','fggb','nhj','sd','asd')")
+
+checkExistence(sql(" explain select * from complex1 where 
array_contains(arr,'sd')"),
+  true,
+  "PushedFilters: [*EqualTo(arr,sd)]")
+
+checkExistence(sql(" explain select count(*) from complex1 where 
array_contains(arr,'sd')"),
+  true,
+  "PushedFilters: [*EqualTo(arr,sd)]")
+
+checkAnswer(sql(" select * from complex1 where array_contains(arr,'sd')"),

Review comment:
   Can you add a test case like the query below?
   
   select * from complex1 where arr[0] = 'sd'
   
   Can we push down this filter too?
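
   A sketch of the requested test, assuming the same complex1 table and QueryTest helpers used in the surrounding file; whether this predicate is actually pushed down is exactly the open question in the review:

       // Illustrative only: query an array element by index on the existing complex1 table.
       test("test filter on array index access") {
         checkAnswer(
           sql("select * from complex1 where arr[0] = 'sd'"),
           Seq(Row(mutable.WrappedArray.make(Array("sd", "df", "gh")))))
       }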





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851#issuecomment-659592635


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3409/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851#issuecomment-659591221


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1667/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL

2020-07-16 Thread GitBox


akashrn5 commented on a change in pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851#discussion_r455977271



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -234,9 +234,25 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
   CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)
   }
 
+  test("test decimal value as null with global sort load") {

Review comment:
   @kunal642 had already fixed one issue regarding a null value for string; now we have the same for decimal. He added a test case with string, int, double, and bigint columns having null values, and now decimal is added. It would be better to also add a test case covering all complex types once.
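
   A possible shape for such a test, assuming the QueryTest helpers from the surrounding file; the table name, schema, and data are illustrative only:

       // Illustrative only: load a null element of a complex column with global sort enabled.
       test("test complex type value as null with global sort load") {
         sql("drop table if exists complex_null_gs")
         sql("create table complex_null_gs (id int, arr array<string>) stored as carbondata " +
             "tblproperties('sort_scope'='global_sort', 'sort_columns'='id')")
         sql("insert into complex_null_gs select 1, array('a', null)")
         checkAnswer(sql("select id from complex_null_gs"), Seq(Row(1)))
       }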





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutio

2020-07-16 Thread GitBox


akashrn5 commented on a change in pull request #3850:
URL: https://github.com/apache/carbondata/pull/3850#discussion_r455974523



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##
@@ -228,24 +228,17 @@ case class CarbonAddLoadCommand(
 model.setTableName(carbonTable.getTableName)
 val operationContext = new OperationContext
 operationContext.setProperty("isLoadOrCompaction", false)
-val loadTablePreExecutionEvent: LoadTablePreExecutionEvent =
-  new LoadTablePreExecutionEvent(
-carbonTable.getCarbonTableIdentifier,
-model)
-operationContext.setProperty("isOverwrite", false)
-OperationListenerBus.getInstance.fireEvent(loadTablePreExecutionEvent, 
operationContext)
-// Add pre event listener for index indexSchema
-val tableIndexes = 
IndexStoreManager.getInstance().getAllCGAndFGIndexes(carbonTable)
-val indexOperationContext = new OperationContext()
-if (tableIndexes.size() > 0) {
-  val indexNames: mutable.Buffer[String] =
-tableIndexes.asScala.map(index => index.getIndexSchema.getIndexName)
-  val buildIndexPreExecutionEvent: BuildIndexPreExecutionEvent =
-BuildIndexPreExecutionEvent(
-  sparkSession, carbonTable.getAbsoluteTableIdentifier, indexNames)
-  OperationListenerBus.getInstance().fireEvent(buildIndexPreExecutionEvent,
-indexOperationContext)
-}
+val (tableIndexes, indexOperationContext) = 
CommonLoadUtils.firePreLoadEvents(

Review comment:
   @VenuReddy2103 can you please confirm whether the same function is being used everywhere it applies; if not, please change it there as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659554650


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1666/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659550711


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3408/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3909) Insert into select fails after insert decimal value as null and set sort scope to global sort

2020-07-16 Thread Chetan Bhat (Jira)
Chetan Bhat created CARBONDATA-3909:
---

 Summary: Insert into select fails after insert decimal value as 
null and set sort scope to global sort
 Key: CARBONDATA-3909
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3909
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 2.0.1
 Environment: Spark 2.3.2, 2.4.5
Reporter: Chetan Bhat


Steps -

Insert a decimal value as null, set the sort scope to global sort, and then do an insert into select.

 

Issue : - Insert into select fails.

 

Expected : - Insert into select should succeed.
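
A minimal sketch of the reported scenario, assuming sql(...) is available as in the CarbonData test suites; the table names, schema, and data are illustrative only:

{code:scala}
// Illustrative only: a decimal null plus a global-sort target table, then insert into select.
sql("create table dec_src (id int, amount decimal(10,2)) stored as carbondata")
sql("insert into dec_src select 1, cast(null as decimal(10,2))")
sql("create table dec_tgt (id int, amount decimal(10,2)) stored as carbondata " +
    "tblproperties('sort_scope'='global_sort')")
sql("insert into dec_tgt select * from dec_src")  // reported to fail before the fix
{code}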

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] Indhumathi27 opened a new pull request #3851: [WIP]Fix Global sort data load failure issue with Decimal value as NULL

2020-07-16 Thread GitBox


Indhumathi27 opened a new pull request #3851:
URL: https://github.com/apache/carbondata/pull/3851


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


ShreelekhyaG commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659467768


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659447560


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1665/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3850:
URL: https://github.com/apache/carbondata/pull/3850#issuecomment-659422935







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849#issuecomment-659392712


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3407/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.

2020-07-16 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3908:
-

 Summary: When a carbon segment is added through the alter add 
segments query, then it is not accounting the added carbon segment values.
 Key: CARBONDATA-3908
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: FI cluster and opensource cluster.
Reporter: Prasanna Ravichandran


When a carbon segment is added through the alter table add segment query, the records of the added carbon segment are not accounted for. If we do count(*) on the added segment, it always shows 0.

Test queries:

drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

--hdfs dfs -mkdir /uniqdata-carbon-segment;
--hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* 
/uniqdata-carbon-segment/
Alter table uniqdata add segment options 
('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');

select count(*) from uniqdata;--4000 expected as one load of 2000 records 
happened and same segment is added again;

set carbon.input.segments.default.uniqdata=1;
select count(*) from uniqdata;--2000 expected - it should just show the records 
count of added segments;

CONSOLE:

/> set carbon.input.segments.default.uniqdata=1;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 1      |
+------------------------------------------+--------+
1 row selected (0.192 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1734
+-----------+
| count(1)  |
+-----------+
| 2000      |
+-----------+
1 row selected (4.036 seconds)
/> set carbon.input.segments.default.uniqdata=2;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 2      |
+------------------------------------------+--------+
1 row selected (0.088 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1745
+-----------+
| count(1)  |
+-----------+
| 2000      |
+-----------+
1 row selected (6.056 seconds)
/> set carbon.input.segments.default.uniqdata=3;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 3      |
+------------------------------------------+--------+
1 row selected (0.161 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1753
+-----------+
| count(1)  |
+-----------+
| 0         |
+-----------+
1 row selected (4.875 seconds)
/> show segments for table uniqdata;
+-----+----------+---------------------------+------------------+------------+------------+-------------+--------------+
| ID  | Status   | Load Start Time           | Load Time Taken  | Partition  | Data Size  | Index Size  | File Format  |
+-----+----------+---------------------------+------------------+------------+------------+-------------+--------------+
| 4   | Success  | 2020-07-17 16:01:53.673   | 5.579S           | {}         | 269.10KB   | 7.21KB      | columnar_v3  |
| 3   | Success  | 2020-07-17 16:00:24.866   | 0.578S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
| 2   | Success  | 2020-07-17 15:07:54.273   | 0.642S           | {}         | 36.72KB    | NA          | orc          |
| 1   | Success  | 2020-07-17 15:03:59.767   | 0.564S           | {}         | 89.26KB    | NA          | parquet      |
| 0   | Success  | 2020-07-16 12:44:32.095   | 4.484S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
+-----+----------+---------------------------+------------------+------------+------------+-------------+--------------+

Expected result: Records added by adding carbon segment should be considered.

Actual result: Records added by adding carbon segment is not considered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3848:
URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659380516


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3404/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3848:
URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659376770


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1662/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3907) Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alt

2020-07-16 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3907:
--
Description: 
*[Issue]*

Currently we have 2 different ways of firing LoadTablePreExecutionEvent and 
LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and 
firePostLoadEvents methods from CommonLoadUtils to trigger 
LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in 
alter table add segment flow as well. 

*[Suggestion]*

Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to 
trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively 
in alter table add segment flow.

  was:
*[Issue]*

Currently we have 2 different ways of firing LoadTablePreExecutionEvent and 
LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and 
firePostLoadEvents methods from CommonLoadUtils to trigger 
LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in 
alter table add segment flow as well. So that we can have single flow to fire 
these events

 

*[Suggestion]*

Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to 
trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively 
in alter table add segment flow.


> Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils 
> to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent 
> respectively in alter table add segment flow
> --
>
> Key: CARBONDATA-3907
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3907
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Venugopal Reddy K
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *[Issue]*
> Currently we have 2 different ways of firing LoadTablePreExecutionEvent and 
> LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and 
> firePostLoadEvents methods from CommonLoadUtils to trigger 
> LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in 
> alter table add segment flow as well. 
> *[Suggestion]*
> Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils 
> to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent 
> respectively in alter table add segment flow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] VenuReddy2103 opened a new pull request #3850: [CARBONDATA-3907]Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent

2020-07-16 Thread GitBox


VenuReddy2103 opened a new pull request #3850:
URL: https://github.com/apache/carbondata/pull/3850


### Why is this PR needed?
   Currently we have 2 different ways of firing LoadTablePreExecutionEvent and 
LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and 
firePostLoadEvents methods from CommonLoadUtils to trigger 
LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in 
alter table add segment flow as well. So that we can have single flow to fire 
these events

### What changes were proposed in this PR?
   Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils 
to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent 
respectively in alter table add segment flow.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3907) Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in alt

2020-07-16 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created CARBONDATA-3907:
-

 Summary: Reuse firePreLoadEvents and firePostLoadEvents methods 
from CommonLoadUtils to trigger LoadTablePreExecutionEvent and 
LoadTablePostExecutionEvent respectively in alter table add segment flow
 Key: CARBONDATA-3907
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3907
 Project: CarbonData
  Issue Type: Improvement
  Components: spark-integration
Affects Versions: 2.0.0
Reporter: Venugopal Reddy K
 Fix For: 2.1.0


*[Issue]*

Currently we have 2 different ways of firing LoadTablePreExecutionEvent and 
LoadTablePostExecutionEvent. We can reuse firePreLoadEvents and 
firePostLoadEvents methods from CommonLoadUtils to trigger 
LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively in 
alter table add segment flow as well. So that we can have single flow to fire 
these events

 

*[Suggestion]*

Reuse firePreLoadEvents and firePostLoadEvents methods from CommonLoadUtils to 
trigger LoadTablePreExecutionEvent and LoadTablePostExecutionEvent respectively 
in alter table add segment flow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659339600


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3403/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659338632


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1660/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3849: [WIP] table level timestampformat

2020-07-16 Thread GitBox


ShreelekhyaG opened a new pull request #3849:
URL: https://github.com/apache/carbondata/pull/3849


### Why is this PR needed?
To support the timestamp format at table level.

### What changes were proposed in this PR?
   Made the priority of the timestamp format as follows (a resolution sketch follows the list):
1) Load command options
2) Table-level properties
3) Configurable property (carbon.timestamp.format)
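
   A minimal sketch of the resolution order described above; the load option and table property keys other than carbon.timestamp.format are illustrative, not necessarily the actual keys used by the PR:

       import org.apache.carbondata.core.constants.CarbonCommonConstants
       import org.apache.carbondata.core.util.CarbonProperties

       // Load option first, then table property, then the system-level carbon property.
       def resolveTimestampFormat(loadOptions: Map[String, String],
           tableProperties: Map[String, String]): String = {
         loadOptions.get("timestampformat")
           .orElse(tableProperties.get("timestampformat"))
           .getOrElse(CarbonProperties.getInstance()
             .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
               CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT))
       }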
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3787: [WIP] support sort_scope for index creation

2020-07-16 Thread GitBox


ajantha-bhat commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-659312911


   @QiangCai : yes, it is in WIP. I will support SI global sort in this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] IceMimosa commented on pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp

2020-07-16 Thread GitBox


IceMimosa commented on pull request #3848:
URL: https://github.com/apache/carbondata/pull/3848#issuecomment-659312101


   reset please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3787: [WIP] support sort_scope for index creation

2020-07-16 Thread GitBox


QiangCai commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-659312148


   during SI loading, it should use this sort_scope.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] IceMimosa opened a new pull request #3848: [CARBONDATA-3891] Fix loading data will update all segments updateDeltaEndTimestamp

2020-07-16 Thread GitBox


IceMimosa opened a new pull request #3848:
URL: https://github.com/apache/carbondata/pull/3848


### Why is this PR needed?
Loading data into a partitioned table will update updateDeltaEndTimestamp of all segments, which causes the driver to clear the cache of all segments when doing a query.

### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes TODO

   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659311646


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1661/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659309836


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3402/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


ajantha-bhat commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307892


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


ajantha-bhat commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659307713


   Add to whitelist



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


ajantha-bhat commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659307068


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


ajantha-bhat edited a comment on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659305724


   @QiangCai : Developers should not have to manually modify the pom to make it work for spark2.4.
   After this PR both 2.4 and 2.5 work without any manual change, and the jar names will also carry the binary version.
   
   So I fixed it as above. 
   
   But some test cases failed to find the CSV file after this change, so I stopped it. I need to analyze why the CSV files cannot be found because of my change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


ajantha-bhat commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659305724


   @QiangCai : Developers should not have to manually modify the pom to make it work for spark2.4.
   
   So I fixed it as above. 
   
   But some test cases failed to find the CSV file after this change, so I stopped it. I need to analyze why the CSV files cannot be found because of my change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3807: [HOTFIX] Fix module problems of mv and spark with spark binary version

2020-07-16 Thread GitBox


QiangCai commented on pull request #3807:
URL: https://github.com/apache/carbondata/pull/3807#issuecomment-659304203


   finalName is ${artifactId}-${version} by default, so this change will not impact the artifactId and version. 
   Other modules will not be able to find the dependencies carbondata-spark_2.3 and carbondata-spark_2.4.
   
   Actually, if you change spark.binary.version to 2.4 in the pom.xml of the parent module, IDEA will work again for Spark 2.4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shunlean opened a new pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


shunlean opened a new pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847


### Why is this PR needed?
   
   Currently the sort temp file can only be written after the in-memory sort finishes.
   For better performance, we want to run the writeDataToFile and SortDataRows operations in parallel.

### What changes were proposed in this PR?
   
   In (Unsafe)SortDataRows, we add new threads to run the file write operation.
   About 10% of the time is saved by the parallel operation in one case.
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3847: [CARBONDATA-3906] Optimize sort performance in writting file

2020-07-16 Thread GitBox


CarbonDataQA1 commented on pull request #3847:
URL: https://github.com/apache/carbondata/pull/3847#issuecomment-659300018


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3906) Optimize sort performance in writting file

2020-07-16 Thread bishunli (Jira)
bishunli created CARBONDATA-3906:


 Summary:  Optimize sort performance in writting file
 Key: CARBONDATA-3906
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3906
 Project: CarbonData
  Issue Type: Improvement
Reporter: bishunli






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3904) insert into data got Failed to create directory path /d

2020-07-16 Thread Kunal Kapoor (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158972#comment-17158972
 ] 

Kunal Kapoor commented on CARBONDATA-3904:
--

What is the warehouse location? HDFS/S3?

> insert into data got Failed to create directory path /d
> ---
>
> Key: CARBONDATA-3904
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3904
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
> Environment: spark-2.4.5
> hadoop 2.7.3
> carbondata2.0.1
>Reporter: XiaoWen
>Priority: Minor
>
> insert data
> {code:java}
> spark.sql("INSERT OVERWRITE TABLE ods.test_table SELECT * FROM 
> ods.socol_cmdinfo")
> {code}
>  check logs from spark application on yarn
> $ yarn logs -applicationId application_1592787941917_4116
> found a lot this error messages
> {code:java}
> 20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
> 20/07/15 17:12:45 ERROR FileFactory:  Failed to create directory path /d
> {code}
> {code:java}
> core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java
> {code}
> {code:java}
> public static void createDirectoryAndSetPermission(String directoryPath, 
> FsPermission permission)
>   throws IOException {
> FileFactory.FileType fileType = FileFactory.getFileType(directoryPath);
> switch (fileType) {
>   case S3:
>   case HDFS:
>   case ALLUXIO:
>   case VIEWFS:
>   case CUSTOM:
>   case HDFS_LOCAL:
> try {
>   Path path = new Path(directoryPath);
>   FileSystem fs = path.getFileSystem(getConfiguration());
>   if (!fs.exists(path)) {
> fs.mkdirs(path);
> fs.setPermission(path, permission);
>   }
> } catch (IOException e) {
>   LOGGER.error("Exception occurred : " + e.getMessage(), e);
>   throw e;
> }
> return;
>   case LOCAL:
>   default:
> directoryPath = FileFactory.getUpdatedFilePath(directoryPath);
> File file = new File(directoryPath);
> if (!file.mkdirs()) {
>   LOGGER.error(" Failed to create directory path " + directoryPath);
> }
> }
>   }
> {code}
>  
> I logged the variables directoryPath and fileType:
> {code:java}
> if (!file.mkdirs()) {
>   //  check variables
>   LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" 
> + fileType.toString() + "]");
>   LOGGER.error(" Failed to create directory path " + directoryPath);
> }
> {code}
> After adding this line:
> LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" + 
> fileType.toString() + "]");
> the YARN logs show:
> 2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
>  2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
>  2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
>  2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
>  2020-07-15 10:48:56 
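
One possible reading of the output above (a hedged sketch, not a confirmed root cause): since "/d" has no URI scheme, it falls into the LOCAL/default branch of createDirectoryAndSetPermission quoted above and is created with java.io.File on each executor, where a non-root YARN container user typically cannot create a directory directly under "/". A minimal standalone illustration:

{code:java}
// Standalone sketch of the LOCAL branch behaviour quoted above.
// "/d" is the literal path from the logs; everything else is illustrative.
import java.io.File;

public class MkdirsCheck {
  public static void main(String[] args) {
    File dir = new File("/d");
    // mkdirs() returns false when the directory cannot be created, which would
    // produce the repeated "Failed to create directory path /d" errors above.
    if (!dir.exists() && !dir.mkdirs()) {
      System.err.println(" Failed to create directory path " + dir.getAbsolutePath());
    }
  }
}
{code}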

[jira] [Updated] (CARBONDATA-3905) When there are many segment files presto query fail

2020-07-16 Thread XiaoWen (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaoWen updated CARBONDATA-3905:

Description: 
test case 1

insert data with:
{code:java}
df.writeStream.foreachBatch{ (batchDF: DataFrame, batchId: Long) => {
...
val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*)
target.as("A")
  .merge(df.as("B"), "A.id = B.id")
  .whenMatched(cond)
  .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
  .whenNotMatched(cond)
  .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", 
"age" -> "B.age"))
  .execute()
 ...
}).outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start()
{code}
A lot of segment files are generated after a few hours.
 When I try to query with Presto, a single-condition query works, but queries with multiple 
conditions fail.

select name from test_table // ok
 select name from test_table where name = 'joe' // ok
 select name from test_table where name = 'joe' AND age > 25; // query failed
 select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; // query failed

I have also tried a 'major' compaction to reduce the number of segment files, but the 
queries still fail.
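
For reference, a minimal sketch of the major compaction plus clean-files sequence (assumptions: a CarbonData-enabled SparkSession and the table name test_table from this report):

{code:java}
import org.apache.spark.sql.SparkSession;

public class CompactAndClean {
  public static void main(String[] args) {
    // Assumes CarbonData's Spark extensions are already configured for this session.
    SparkSession spark = SparkSession.builder()
        .appName("compact-test-table")
        .enableHiveSupport()
        .getOrCreate();
    // Merge the existing segments into one with a major compaction.
    spark.sql("ALTER TABLE test_table COMPACT 'MAJOR'");
    // Remove the segments that were merged so they are no longer scanned.
    spark.sql("CLEAN FILES FOR TABLE test_table");
    spark.stop();
  }
}
{code}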

presto server logs

java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions
 at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62)
 at 
io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160)
 at 
io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154)
 at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248)
 at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
 at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
 at 
io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
 at 
io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254)
 at 
io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246)
 at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at 
io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278)
 at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at 
io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
 at 
io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
 at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
 at 
io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
 at io.prestosql.operator.Driver.processInternal(Driver.java:379)
 at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
 at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
 at io.prestosql.operator.Driver.processFor(Driver.java:276)
 at 
io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
 at 
io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
 at 
io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
 at io.prestosql.$gen.Presto_31620200623_163219_1.run(Unknown Source)
 at 

[jira] [Created] (CARBONDATA-3905) When there are many segment files presto query fail

2020-07-16 Thread XiaoWen (Jira)
XiaoWen created CARBONDATA-3905:
---

 Summary: When there are many segment files presto query fail
 Key: CARBONDATA-3905
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3905
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Affects Versions: 2.0.0
Reporter: XiaoWen


test case 1

insert data with:
df.writeStream.foreachBatch{ (batchDF: DataFrame, batchId: Long) => {
 ...

val cond = $"B.id".isin(df.select(col = "id").as[Int].collect: _*)
 target.as("A")
 .merge(df.as("B"), "A.id = B.id")
 .whenMatched(cond)
 .updateExpr(Map("name" -> "B.name", "city" -> "B.city", "age" -> "B.age"))
 .whenNotMatched(cond)
 .insertExpr(Map("id" -> "B.id", "name" -> "B.name", "city" -> "B.city", "age" 
-> "B.age"))
 .execute()

...
}).outputMode("update").trigger(Trigger.ProcessingTime("3600 seconds")).start()

A lot of segment files are generated after a few hours.
When I try to query with Presto, a single-condition query works, but queries with multiple 
conditions fail.

select name from test_table // ok
select name from test_table where name = 'joe' // ok
select name from test_table where name = 'joe' AND age > 25; // query failed
select name from test_table where name = 'joe' AND age > 25 AND city = 'shenzhen'; // query failed

I have also tried a 'major' compaction to reduce the number of segment files, but the 
queries still fail.


presto server logs

java.lang.IllegalArgumentException: Invalid position 0 in block with 0 positions
at io.prestosql.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:62)
at 
io.prestosql.spi.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:160)
at 
io.prestosql.spi.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:154)
at io.prestosql.spi.block.LazyBlock.isNull(LazyBlock.java:248)
at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
at io.prestosql.$gen.PageFilter_20200703_084817_965.filter(Unknown Source)
at 
io.prestosql.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
at 
io.prestosql.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:254)
at 
io.prestosql.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:246)
at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at 
io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:278)
at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:320)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:307)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at 
io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at 
io.prestosql.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
at 
io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:373)
at 
io.prestosql.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
at io.prestosql.operator.Driver.processInternal(Driver.java:379)
at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
at io.prestosql.operator.Driver.processFor(Driver.java:276)
at 
io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at 
io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at 
io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at