[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-659176745


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3401/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-659176346


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1659/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3904) insert into data got Failed to create directory path /d

2020-07-15 Thread XiaoWen (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaoWen updated CARBONDATA-3904:

Description: 
Insert data:
{code:java}
spark.sql("INSERT OVERWRITE TABLE ods.test_table SELECT * FROM ods.socol_cmdinfo")
{code}
Check the logs of the Spark application on YARN:

$ yarn logs -applicationId application_1592787941917_4116

They contain many occurrences of this error message:
{code:java}
20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:12:45 ERROR FileFactory:  Failed to create directory path /d
{code}
The error is logged from core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java:
{code:java}
public static void createDirectoryAndSetPermission(String directoryPath, FsPermission permission)
    throws IOException {
  FileFactory.FileType fileType = FileFactory.getFileType(directoryPath);
  switch (fileType) {
    case S3:
    case HDFS:
    case ALLUXIO:
    case VIEWFS:
    case CUSTOM:
    case HDFS_LOCAL:
      try {
        Path path = new Path(directoryPath);
        FileSystem fs = path.getFileSystem(getConfiguration());
        if (!fs.exists(path)) {
          fs.mkdirs(path);
          fs.setPermission(path, permission);
        }
      } catch (IOException e) {
        LOGGER.error("Exception occurred : " + e.getMessage(), e);
        throw e;
      }
      return;
    case LOCAL:
    default:
      directoryPath = FileFactory.getUpdatedFilePath(directoryPath);
      File file = new File(directoryPath);
      if (!file.mkdirs()) {
        LOGGER.error(" Failed to create directory path " + directoryPath);
      }
  }
}
{code}
 

To investigate, I logged the variables directoryPath and fileType:
{code:java}
if (!file.mkdirs()) {
  // check variables
  LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" + fileType.toString() + "]");
  LOGGER.error(" Failed to create directory path " + directoryPath);
}
{code}
After adding that line, the YARN logs show:

2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]

Why is fileType LOCAL?

I never configured this path anywhere, so where does directoryPath = [/d] come from?

But the data is inserted normally and can be queried normally.
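A plausible explanation (my assumption, sketched below; guessFileType is an illustrative helper, not CarbonData's actual code): the file type is presumably derived from the path's URI scheme, and a bare path like /d carries no scheme, so it falls back to LOCAL and each executor then tries, and fails, to create /d on its local disk:
{code:java}
import java.net.URI

// Hypothetical sketch of scheme-based classification.
def guessFileType(path: String): String =
  Option(URI.create(path).getScheme) match {
    case None            => "LOCAL"   // a bare path like "/d" lands here
    case Some("hdfs")    => "HDFS"
    case Some("s3") | Some("s3a") | Some("s3n") => "S3"
    case Some("alluxio") => "ALLUXIO"
    case Some("viewfs")  => "VIEWFS"
    case Some(_)         => "LOCAL"
  }

guessFileType("/d")                    // LOCAL
guessFileType("hdfs://nameservice/d")  // HDFS
{code}
If that is what happens, writing store/table paths with an explicit scheme (for example hdfs://...) should avoid the LOCAL fallback.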

  was:
insert data

 
{code:java}
spark.sql("INSERT OVERWRITE TABLE ods.test_table SELECT * FROM 
ods.socol_cmdinfo")
{code}
 

 check logs from spark application on yarn

$ yarn logs -applicationId application_1592787941917_4116

found a lot this error messages
{code:java}

[jira] [Created] (CARBONDATA-3904) insert into data got Failed to create directory path /d

2020-07-15 Thread XiaoWen (Jira)
XiaoWen created CARBONDATA-3904:
---

 Summary: insert into data got Failed to create directory path /d
 Key: CARBONDATA-3904
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3904
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Affects Versions: 2.0.0
 Environment: spark-2.4.5
hadoop 2.7.3
carbondata2.0.1
Reporter: XiaoWen


insert data

 
{code:java}
spark.sql("INSERT OVERWRITE TABLE ods.test_table SELECT * FROM 
ods.socol_cmdinfo")
{code}
 

 check logs from spark application on yarn

$ yarn logs -applicationId application_1592787941917_4116

found a lot this error messages
{code:java}
20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:45 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
20/07/15 16:59:51 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:00 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:00:35 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:02:47 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:03:36 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:09:55 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:10:05 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:11:08 ERROR FileFactory:  Failed to create directory path /d
20/07/15 17:12:45 ERROR FileFactory:  Failed to create directory path /d
{code}
 
{code:java}
core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java
{code}
{code:java}
public static void createDirectoryAndSetPermission(String directoryPath, 
FsPermission permission)
  throws IOException {
FileFactory.FileType fileType = FileFactory.getFileType(directoryPath);
switch (fileType) {
  case S3:
  case HDFS:
  case ALLUXIO:
  case VIEWFS:
  case CUSTOM:
  case HDFS_LOCAL:
try {
  Path path = new Path(directoryPath);
  FileSystem fs = path.getFileSystem(getConfiguration());
  if (!fs.exists(path)) {
fs.mkdirs(path);
fs.setPermission(path, permission);
  }
} catch (IOException e) {
  LOGGER.error("Exception occurred : " + e.getMessage(), e);
  throw e;
}
return;
  case LOCAL:
  default:
directoryPath = FileFactory.getUpdatedFilePath(directoryPath);
File file = new File(directoryPath);
if (!file.mkdirs()) {
  LOGGER.error(" Failed to create directory path " + directoryPath);
}}
  }
{code}
x

 
{code:java}
if (!file.mkdirs()) {
  //  check variables
  LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" + 
fileType.toString() + "]");
  LOGGER.error(" Failed to create directory path " + directoryPath);
}
{code}
add line 

LOGGER.info("directoryPath = [" + directoryPath + "], fileType = [" + 
fileType.toString() + "]");

got echo on yarn logs

2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]
2020-07-15 10:48:56 INFO directoryPath = [/d], fileType = [LOCAL]



 

Why fileType is LOCAL?

I have never set this value: directoryPath = [/d]?

But the data is inserted normally and can be queried normally

 

 

 

 

x



--
This message was sent by Atlassian Jira

[jira] [Created] (CARBONDATA-3903) Documentation Issue in Github Docs Link https://github.com/apache/carbondata/tree/master/docs

2020-07-15 Thread PURUJIT CHAUGULE (Jira)
PURUJIT CHAUGULE created CARBONDATA-3903:


 Summary: Documentation Issue in Github  Docs Link 
https://github.com/apache/carbondata/tree/master/docs
 Key: CARBONDATA-3903
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3903
 Project: CarbonData
  Issue Type: Bug
  Components: docs
Affects Versions: 2.0.1
 Environment: https://github.com/apache/carbondata/tree/master/docs
Reporter: PURUJIT CHAUGULE


dml-of-carbondata.md

LOAD DATA:
 * Mention that each load is treated as a Segment.
 * List all possible options for SORT_SCOPE (GLOBAL_SORT, LOCAL_SORT, NO_SORT), with an explanation of the difference between each type.
 * Add an example of a complete LOAD query, with and without OPTIONS (see the sketch after this list).
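For illustration, such an example could look like the following (a sketch only: the table name, CSV path, and option values are placeholders, and the exact option list should be checked against the DML documentation):
{code:java}
// Hypothetical load with explicit OPTIONS; carbontable and the path are placeholders.
spark.sql("""
  LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv'
  INTO TABLE carbontable
  OPTIONS('DELIMITER'=',', 'HEADER'='true', 'SORT_SCOPE'='GLOBAL_SORT')
""")

// The same load without OPTIONS, relying on defaults.
spark.sql("LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv' INTO TABLE carbontable")
{code}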

INSERT DATA:
 * Mention that each insert creates a Segment.

LOAD Using Static/Dynamic Partitioning:
 * Add a hyperlink to the Static/Dynamic Partitioning documentation.

UPDATE/DELETE:
 * Explain the delta-file concept used by UPDATE and DELETE.

DELETE:
 * Add an example of deleting all records from a table (DELETE FROM tablename).

COMPACTION:
 * Mention that Minor compaction has two modes, Auto and Manual (carbon.auto.load.merge=true/false), and that when carbon.auto.load.merge=false the compaction must be triggered manually.
 * Add a hyperlink to the configurable properties of Compaction.
 * Mention that compacted segments are not cleaned up automatically and must be removed by manually triggering clean files (see the sketch after this list).
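For example, the manual trigger and the follow-up cleanup could be shown as below (a sketch; tablename is a placeholder):
{code:java}
// Manually trigger minor compaction (needed when carbon.auto.load.merge=false).
spark.sql("ALTER TABLE tablename COMPACT 'MINOR'")

// Compacted segments are not removed automatically; clean them explicitly.
spark.sql("CLEAN FILES FOR TABLE tablename")
{code}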

 

flink-integration-guide.md
 * Explain what stages are and how they are used.
 * Describe the process of inserting and deleting stages in a carbon table, and how they are stored in the table.

 

language-manual.md
 * Add a hyperlink to Compaction in the DML section.

 

spatial-index-guide.md
 * Mention which TBLPROPERTIES are supported / not supported for a Geo table.
 * Mention that the Spatial Index does not create a new column.
 * Mention that CTAS from one geo table to another does not create a Geo table.
 * Mention that a specific combination of Spatial Index table properties must be supplied in CREATE TABLE, without which a geo table is not created (see the sketch after this list).
 * Mention that columns referenced in spatial_index cannot be altered (datatype change, rename, drop).
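For the required property combination, a create-table sketch along these lines could accompany the guide (property names follow the spatial-index guide; the table name and numeric values are illustrative only):
{code:java}
// Illustrative geo table; the SPATIAL_INDEX.* values are example numbers, not recommendations.
spark.sql("""
  CREATE TABLE geo_sample(id BIGINT, latitude LONG, longitude LONG)
  STORED AS carbondata
  TBLPROPERTIES(
    'SPATIAL_INDEX'='mygeohash',
    'SPATIAL_INDEX.mygeohash.type'='geohash',
    'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
    'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
    'SPATIAL_INDEX.mygeohash.gridSize'='50',
    'SPATIAL_INDEX.mygeohash.conversionRatio'='1000000')
""")
{code}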

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658880484


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1658/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [CARBONDATA-3902] Fix CDC delete data issue on partition table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658879364


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3400/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
  spark.createDataFrame(Seq(
    Row("a", "10", false, 0),
    Row("a", null, true, 1),
    Row("b", null, true, 2),
    Row("c", null, true, 3),
    Row("c", "20", false, 4),
    Row("c", "200", false, 5),
    Row("e", "100", false, 6)
  ).asJava,
  StructType(Seq(StructField("key", StringType),
    StructField("newValue", StringType),
    StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()
{code}
 

After this delete operation, the rows in partitions 0, 1, and 2 should have been deleted.

Actual:

{color:#067d17}select * from target order by key;{color}

{color:#067d17}+---+-----+
|key|value|
+---+-----+
|a  |0    |
|b  |1    |
|c  |2    |
|d  |3    |
+---+-----+{color}

{color:#067d17}Expected:{color}

{color:#067d17}+---+-----+
|key|value|
+---+-----+
|d  |3    |
+---+-----+{color}

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters.
> import java.sql.Date
> import org.apache.spark.sql._
> import 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters.
> import java.sql.Date
> import org.apache.spark.sql._
> import 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Attachment: (was: issue.scala)

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> [^issue.scala]
> {code:java}
> // code placeholder
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import 
> org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
>  DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
> MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
> WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
> StringType, StructField, StructType}
> import spark.implicits._
> sql("drop table if exists target").show()
> val initframe = spark.createDataFrame(Seq(
> Row("a", "0"),
> Row("b", "1"),
> Row("c", "2"),
> Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
> 
> initframe.write
> .format("carbondata")
> .option("tableName", "target")
> .option("partitionColumns", "value")
> .mode(SaveMode.Overwrite)
> .save()
> val target = spark.read.format("carbondata").option("tableName", 
> "target").load()
> var ccd =
> spark.createDataFrame(Seq(
> Row("a", "10", false, 0),
> Row("a", null, true, 1),
> Row("b", null, true, 2),
> Row("c", null, true, 3),
> Row("c", "20", false, 4),
> Row("c", "200", false, 5),
> Row("e", "100", false, 6)
> ).asJava,
> StructType(Seq(StructField("key", StringType),
> StructField("newValue", StringType),
> StructField("deleted", BooleanType), StructField("time", IntegerType))))
> ccd.createOrReplaceTempView("changes")
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
> FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM 
> changes GROUP BY key)")
> val updateMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> target.as("A").merge(ccd.as("B"), "A.key=B.key").
> whenMatched("B.deleted=true").
> delete().execute()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}

  was:
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters.
> import java.sql.Date
> import org.apache.spark.sql._
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}

  was:
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
  spark.createDataFrame(Seq(
    Row("a", "10", false, 0),
    Row("a", null, true, 1),
    Row("b", null, true, 2),
    Row("c", null, true, 3),
    Row("c", "20", false, 4),
    Row("c", "200", false, 5),
    Row("e", "100", false, 6)
  ).asJava,
  StructType(Seq(StructField("key", StringType),
    StructField("newValue", StringType),
    StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> [^issue.scala]
> {code:java}
> // code placeholder

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Attachment: issue.scala

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
> import spark.implicits._
> 
> sql("drop table if exists target").show()
> 
> val initframe = spark.createDataFrame(Seq(
>   Row("a", "0"),
>   Row("b", "1"),
>   Row("c", "2"),
>   Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
> 
> initframe.write
>   .format("carbondata")
>   .option("tableName", "target")
>   .option("partitionColumns", "value")
>   .mode(SaveMode.Overwrite)
>   .save()
> 
> val target = spark.read.format("carbondata").option("tableName", "target").load()
> 
> var ccd =
>   spark.createDataFrame(Seq(
>     Row("a", "10", false, 0),
>     Row("a", null, true, 1),
>     Row("b", null, true, 2),
>     Row("c", null, true, 3),
>     Row("c", "20", false, 4),
>     Row("c", "200", false, 5),
>     Row("e", "100", false, 6)
>   ).asJava,
>   StructType(Seq(StructField("key", StringType),
>     StructField("newValue", StringType),
>     StructField("deleted", BooleanType), StructField("time", IntegerType))))
> 
> ccd.createOrReplaceTempView("changes")
> 
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")
> 
> val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
> 
> target.as("A").merge(ccd.as("B"), "A.key=B.key").
>   whenMatched("B.deleted=true").
>   delete().execute()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
  spark.createDataFrame(Seq(
    Row("a", "10", false, 0),
    Row("a", null, true, 1),
    Row("b", null, true, 2),
    Row("c", null, true, 3),
    Row("c", "20", false, 4),
    Row("c", "200", false, 5),
    Row("e", "100", false, 6)
  ).asJava,
  StructType(Seq(StructField("key", StringType),
    StructField("newValue", StringType),
    StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()
{code}

  was:
Steps to Reproduce Issue :
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
  spark.createDataFrame(Seq(
    Row("a", "10", false, 0),
    Row("a", null, true, 1),
    Row("b", null, true, 2),
    Row("c", null, true, 3),
    Row("c", "20", false, 4),
    Row("c", "200", false, 5),
    Row("e", "100", false, 6)
  ).asJava,
  StructType(Seq(StructField("key", StringType),
    StructField("newValue", StringType),
    StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
  spark.createDataFrame(Seq(
    Row("a", "10", false, 0),
    Row("a", null, true, 1),
    Row("b", null, true, 2),
    Row("c", null, true, 3),
    Row("c", "20", false, 4),
    Row("c", "200", false, 5),
    Row("e", "100", false, 6)
  ).asJava,
  StructType(Seq(StructField("key", StringType),
    StructField("newValue", StringType),
    StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
> import spark.implicits._
> 
> sql("drop table if exists target").show()
> 
> val initframe = spark.createDataFrame(Seq(
>   Row("a", "0"),
>   Row("b", "1"),
>   Row("c", "2"),
>   Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))
> 
> initframe.write
>   .format("carbondata")
>   .option("tableName", "target")
>   .option("partitionColumns", "value")
>   .mode(SaveMode.Overwrite)
>   .save()
> 
> val target = spark.read.format("carbondata").option("tableName", "target").load()
> 
> var ccd =
>   spark.createDataFrame(Seq(
>     Row("a", "10", false, 0),
>     Row("a", null, true, 1),
>     Row("b", null, true, 2),
>     Row("c", null, true, 3),
>     Row("c", "20", false, 4),
>     Row("c", "200", false, 5),
>     Row("e", "100", false, 6)
>   ).asJava,
>   StructType(Seq(StructField("key", StringType),
>     StructField("newValue", StringType),
>     StructField("deleted", BooleanType), StructField("time", IntegerType))))
> 
> ccd.createOrReplaceTempView("changes")
> 
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")
> 
> val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
> 

[jira] [Created] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3902:


 Summary: Query on partition table gives incorrect results after 
Delete records using CDC
 Key: CARBONDATA-3902
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658797923


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3399/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658797618


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1657/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [WIP] Fix CDC delete data on partition table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658780719


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3398/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3846: [WIP] Fix CDC delete data on partition table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846#issuecomment-658779577


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1656/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3901) Documentation issues in https://github.com/apache/carbondata/tree/master/docs

2020-07-15 Thread Chetan Bhat (Jira)
Chetan Bhat created CARBONDATA-3901:
---

 Summary: Documentation issues in 
https://github.com/apache/carbondata/tree/master/docs
 Key: CARBONDATA-3901
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3901
 Project: CarbonData
  Issue Type: Bug
  Components: docs
Affects Versions: 2.0.1
 Environment: https://github.com/apache/carbondata/tree/master/docs
Reporter: Chetan Bhat


*Issue 1 :* In 
https://github.com/apache/carbondata/blob/master/docs/alluxio-guide.md, 
getOrCreateCarbonSession is not used in the Carbon 2.0 version and should be 
removed. The guide's "Testing use alluxio by CarbonSession" example currently reads:

import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.SparkSession

val carbon = SparkSession.builder().master("local").appName("test")
  .getOrCreateCarbonSession("alluxio://localhost:19998/carbondata")
carbon.sql("CREATE TABLE carbon_alluxio(id String, name String, city String, age Int) STORED as carbondata")
carbon.sql(s"LOAD DATA LOCAL INPATH '${CARBONDATA_PATH}/integration/spark/src/test/resources/sample.csv' into table carbon_alluxio")
carbon.sql("select * from carbon_alluxio").show

(a sketch of the 2.0-style replacement follows after this issue list)

*Issue 2 -* In 
https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md, the 
SORT_SCOPE description "Sort scope of the load. Options include no sort, local 
sort, batch sort and global sort" should have batch sort removed, as it is not 
supported.

*Issue 3 -* In 
https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md#close-stream, 
the CLOSE STREAM link is not working.
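As a hedged illustration of what the alluxio guide could show instead of getOrCreateCarbonSession (assuming the CarbonExtensions-based bootstrap that replaced CarbonSession in Carbon 2.0; the Alluxio address and table are the guide's own example values):

    import org.apache.spark.sql.SparkSession

    // Carbon 2.x style: a plain SparkSession with CarbonExtensions enabled,
    // no CarbonSession or getOrCreateCarbonSession involved.
    val spark = SparkSession.builder()
      .master("local")
      .appName("test")
      .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
      .getOrCreate()

    spark.sql("CREATE TABLE carbon_alluxio(id String, name String, city String, age Int) " +
      "STORED AS carbondata LOCATION 'alluxio://localhost:19998/carbondata'")
    spark.sql("select * from carbon_alluxio").show()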



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3845: [CARBONDATA-3871][FOLLOW-UP] Fix memory issue when get data from row page

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3845:
URL: https://github.com/apache/carbondata/pull/3845#issuecomment-658737910


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3397/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3845: [CARBONDATA-3871][FOLLOW-UP] Fix memory issue when get data from row page

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3845:
URL: https://github.com/apache/carbondata/pull/3845#issuecomment-658737476


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1655/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [CARBONDATA-3900] [CARBONDATA-3882] [CARBONDATA-3881] Fix multiple concurrent issues in table status lock and segment lock for SI a

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#issuecomment-658733550


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3395/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658725720


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 opened a new pull request #3846: [WIP] Fix CDC delete data on partition table

2020-07-15 Thread GitBox


Indhumathi27 opened a new pull request #3846:
URL: https://github.com/apache/carbondata/pull/3846


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


asfgit closed pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


kunal642 commented on pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844#issuecomment-658708250


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658688900


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1652/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [CARBONDATA-3900] [CARBONDATA-3882] [CARBONDATA-3881] Fix multiple concurrent issues in table status lock and segment lock for SI a

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#issuecomment-658688342


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1654/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3785: [CARBONDATA-3843] Fix merge index issue in streaming table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3785:
URL: https://github.com/apache/carbondata/pull/3785#issuecomment-658686549


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1651/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3785: [CARBONDATA-3843] Fix merge index issue in streaming table

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3785:
URL: https://github.com/apache/carbondata/pull/3785#issuecomment-658685680


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3394/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658685529


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1653/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658682568


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3392/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658681740


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3393/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #710: [CARBONDATA-833]load data from dataframe,generater data row may be error when delimiter…

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #710:
URL: https://github.com/apache/carbondata/pull/710#issuecomment-658678637


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3396/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844#issuecomment-658678123


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3391/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


CarbonDataQA1 commented on pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844#issuecomment-658674995


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1650/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kevinjmh commented on a change in pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-07-15 Thread GitBox


kevinjmh commented on a change in pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#discussion_r454913685



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/merger/UnsafeInMemoryIntermediateDataMerger.java
##
@@ -141,25 +140,26 @@ private UnsafeCarbonRowForMerge getSortedRecordFromMemory() {
 // the heap is ordered based on the comparator we passed in
 // when we call poll it will always delete the root of the tree and then
 // do a trickle-down operation; complexity is log(n)
-UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.poll();
+UnsafeInmemoryMergeHolder poll = this.recordHolderHeap.peek();
 
 // get the row from chunk
 row = poll.getRow();
 
 // check if there no entry present
 if (!poll.hasNext()) {
+  this.recordHolderHeap.poll();

Review comment:
   @QiangCai Please check #3845
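   For background on the peek-vs-poll change above, here is a minimal standalone Scala sketch of the pattern (illustrative names, not the project's classes): in a k-way merge, the top holder should only be removed from the heap once its rows are exhausted. java.util.PriorityQueue has no in-place sift, so the non-exhausted case is emulated below with poll plus add, whereas CarbonData's custom heap can re-sift the root directly.

    import java.util.PriorityQueue
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical holder: an iterator plus its current head row.
    final class Holder(it: Iterator[Int]) {
      var current: Int = it.next()
      def hasNext: Boolean = it.hasNext
      def advance(): Unit = { current = it.next() }
    }

    def kWayMerge(streams: Seq[Seq[Int]]): Seq[Int] = {
      val heap = new PriorityQueue[Holder](Ordering.by[Holder, Int](_.current))
      streams.filter(_.nonEmpty).foreach(s => heap.add(new Holder(s.iterator)))
      val out = ArrayBuffer[Int]()
      while (!heap.isEmpty) {
        val top = heap.peek()        // inspect the smallest head without removing it
        out += top.current
        if (top.hasNext) {           // holder still has rows: advance and re-position
          heap.poll(); top.advance(); heap.add(top)
        } else {
          heap.poll()                // exhausted: now it is safe to remove for good
        }
      }
      out.toSeq
    }

    // kWayMerge(Seq(Seq(1, 4), Seq(2, 3))) returns Seq(1, 2, 3, 4)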





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kevinjmh opened a new pull request #3845: [CARBONDATA-3871][FOLLOW-UP] Fix memory issue when get data from row page

2020-07-15 Thread GitBox


kevinjmh opened a new pull request #3845:
URL: https://github.com/apache/carbondata/pull/3845


### Why is this PR needed?
   PR #3804 frees the row page after running the in-memory intermediate merge, but 
the data will still be used in the final sort, because that merge only acquires the 
memory address and row page index instead of moving the rows from one place to 
another. 

### What changes were proposed in this PR?
   Remove the code that frees the row page in the case of an in-memory intermediate merge.
   As this is a special case, add a comment for explanation.
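
   A hedged Scala sketch of the lifetime problem described above (illustrative names, not CarbonData's actual classes): the intermediate merge only produces (pageId, rowId) references into the backing pages, so freeing a page between the merge and the final sort turns the later reads into use-after-free.

    // Hypothetical page store standing in for the unsafe row pages.
    final case class RowRef(pageId: Int, rowId: Int)

    final class PageStore {
      private val pages = scala.collection.mutable.Map[Int, Array[String]]()
      def addPage(id: Int, rows: Array[String]): Unit = pages(id) = rows
      def read(ref: RowRef): String =
        pages.getOrElse(ref.pageId, sys.error(s"page ${ref.pageId} already freed"))(ref.rowId)
      def free(id: Int): Unit = pages.remove(id)
    }

    val store = new PageStore
    store.addPage(0, Array("b", "a"))
    // "Intermediate merge": emits references only, no rows are copied.
    val merged = Seq(RowRef(0, 1), RowRef(0, 0))
    // store.free(0)  // the removed code: freeing here breaks the final sort below
    val finalSorted = merged.sortBy(store.read)  // "final sort" still reads the page
    finalSorted.foreach(r => println(store.read(r)))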
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat closed pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction scenario

2020-07-15 Thread GitBox


ajantha-bhat closed pull request #3829:
URL: https://github.com/apache/carbondata/pull/3829


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction scenario

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3829:
URL: https://github.com/apache/carbondata/pull/3829#issuecomment-658620654


   combined in #3810



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat closed pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-15 Thread GitBox


ajantha-bhat closed pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809#issuecomment-658620378


   @kunal642 , @akashrn5 : combined all 3 in #3810 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3810: [CARBONDATA-3900] [CARBONDATA-3882] [CARBONDATA-3881] Fix multiple concurrent issues in table status lock and segment lock for SI an

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3810:
URL: https://github.com/apache/carbondata/pull/3810#issuecomment-658620028


   @kunal642 , @akashrn5 @QiangCai : please check it



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3900) maintable load failure in concurrent load and compaction scenario

2020-07-15 Thread Ajantha Bhat (Jira)
Ajantha Bhat created CARBONDATA-3900:


 Summary: maintable load failure in concurrent load and compaction 
scenario
 Key: CARBONDATA-3900
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3900
 Project: CarbonData
  Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat
 Fix For: 2.1.0


In the main table load flow, the segment lock is released before the table 
status is updated to success.

So a concurrent operation considered this segment a stale segment (as the 
segment lock was no longer held) and cleaned it up, leading to an "unable to 
get file status" exception.
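
A hedged Scala sketch of the ordering this implies (the lock and load steps below are illustrative stand-ins, not CarbonData's actual classes): commit the table status while the segment lock is still held, and release the lock only afterwards, so a concurrent cleaner never observes an in-flight segment with neither a lock nor a SUCCESS entry.

    import java.util.concurrent.locks.ReentrantLock

    // Illustrative segment lock; CarbonData has its own lock abstractions.
    final class SegmentLock {
      private val lock = new ReentrantLock()
      def acquire(): Unit = lock.lock()
      def release(): Unit = lock.unlock()
    }

    def loadSegment(segLock: SegmentLock)(writeData: () => Unit)(markSuccess: () => Unit): Unit = {
      segLock.acquire()
      try {
        writeData()
        markSuccess()  // table status committed BEFORE releasing the lock; releasing
                       // first opens the window in which cleanup sees the segment as
                       // stale (no lock held, no SUCCESS entry) and deletes its files
      } finally {
        segLock.release()
      }
    }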



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ajantha-bhat commented on pull request #3770: [CARBONDATA-3829] Support pagination in SDK reader

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3770:
URL: https://github.com/apache/carbondata/pull/3770#issuecomment-658610305


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3785: [CARBONDATA-3843] Fix merge index issue in streaming table

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3785:
URL: https://github.com/apache/carbondata/pull/3785#issuecomment-658610534


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#issuecomment-658610416


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


ajantha-bhat commented on pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844#issuecomment-658602758


   @kunal642 please check



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat opened a new pull request #3844: [HOTFIX] Fix wrong License header

2020-07-15 Thread GitBox


ajantha-bhat opened a new pull request #3844:
URL: https://github.com/apache/carbondata/pull/3844


### Why is this PR needed?
   Two test files in the pycarbon module have the wrong license header.
   pycarbon depends on Uber's open-source, Apache-licensed Petastorm project, 
and these two test-case files, imported from that project, carried the error over. 

### What changes were proposed in this PR?
   Fix the license header to match the other files.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org