[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in 
error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569880578
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1372/
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert 
array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569880512
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1371/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3550: [WIP] Remove global dictionary in query

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3550: [WIP] Remove global dictionary in query
URL: https://github.com/apache/carbondata/pull/3550#issuecomment-569879187
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1370/
   




[GitHub] [carbondata] jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() 
into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569872897
 
 
   please modify the PR description according to the template




[GitHub] [carbondata] jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() 
into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569872912
 
 
   retest this please




[jira] [Resolved] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map

2019-12-30 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3631.
--
Resolution: Fixed

> StringIndexOutOfBoundsException When Inserting Select From a Parquet Table 
> with Empty array/map
> ---
>
> Key: CARBONDATA-3631
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3631
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
> sql("insert into datatype_array_parquet values(array())")
> sql("insert into datatype_array_carbondata select f from 
> datatype_array_parquet")
>  
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935)
> at java.lang.StringBuilder.substring(StringBuilder.java:76)
> at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
> at 
> org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77)
> at 
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> {code}
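The stack trace points at a `substring` call inside `FieldConverter.objectToString`, but the thread does not show that code. The Scala sketch below assumes a common pattern that fails in exactly this way: each element is appended followed by a delimiter, and the trailing delimiter is then cut off with `substring(0, length - delimiter.length)`. For an empty array nothing is appended, so the end index becomes negative. The object and method names are hypothetical.

```scala
// Hypothetical names; this is not CarbonData's FieldConverter code.
object EmptyCollectionToString {

  // Assumed failing pattern: append "element + delimiter" for every element,
  // then drop the trailing delimiter with substring.
  def joinUnsafe(values: Seq[String], delimiter: String): String = {
    val builder = new StringBuilder
    values.foreach(v => builder.append(v).append(delimiter))
    // With an empty Seq the builder is empty, so the end index is negative
    // (-1 for a one-character delimiter) and substring throws
    // StringIndexOutOfBoundsException.
    builder.substring(0, builder.length - delimiter.length)
  }

  // Guarded variant: an empty collection becomes an empty string instead of
  // an exception. mkString never emits a trailing delimiter to strip.
  def joinSafe(values: Seq[String], delimiter: String): String =
    values.mkString(delimiter)
}
```

Whether the actual fix in #3545 serializes an empty array as an empty string is not stated in the thread; the sketch only shows why an empty input produces a negative index.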



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
asfgit closed pull request #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545
 
 
   




[GitHub] [carbondata] jackylk opened a new pull request #3550: [WIP] Remove global dictionary in query

2019-12-30 Thread GitBox
jackylk opened a new pull request #3550: [WIP] Remove global dictionary in query
URL: https://github.com/apache/carbondata/pull/3550
 
 
### Why is this PR needed?
The global dictionary feature is deprecated, so it should be removed from the query flow.

### What changes were proposed in this PR?
   Global-dictionary-related analyzer rules and the late-decode optimizer strategy are removed.
   Global-dictionary-related filter processing is removed from the read flow in the carbon-core module.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[GitHub] [carbondata] jackylk commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
jackylk commented on issue #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569872561
 
 
   LGTM




[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

2019-12-30 Thread GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add 
column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154321
 
 

 ##
 File path: 
streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
 ##
 @@ -50,7 +51,7 @@ object FieldConverter {
   value match {
 case s: String => if (!isVarcharType && !isComplexType &&
   s.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-  throw new Exception("Dataload failed, String length cannot exceed " +
+  throw new Exception( exceedErrorMsg +
 
 Review comment:
   done




[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

2019-12-30 Thread GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add 
column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154286
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ##
 @@ -297,7 +297,8 @@ class CarbonBlockDistinctValuesCombineRDD(
   val complexDelimiters = new util.ArrayList[String]
   model.delimiters.foreach(x => complexDelimiters.add(x))
   for (i <- 0 until dimNum) {
-
dimensionParsers(i).parseString(CarbonScalaUtil.getString(row.get(i),
+dimensionParsers(i).parseString(CarbonScalaUtil.getString(row,
 
 Review comment:
   done




[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

2019-12-30 Thread GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add 
column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154300
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
 
 Review comment:
   done




[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000

2019-12-30 Thread GitBox
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add 
column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154198
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
 
   private val LOGGER: Logger = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
-  def getString(value: Any,
+  def getString(row: Row,
+  idx: Int,
   serializationNullFormat: String,
   complexDelimiters: util.ArrayList[String],
   timeStampFormat: SimpleDateFormat,
   dateFormat: SimpleDateFormat,
   isVarcharType: Boolean = false,
   isComplexType: Boolean = false,
   level: Int = 0): String = {
-FieldConverter.objectToString(value, serializationNullFormat, 
complexDelimiters,
-  timeStampFormat, dateFormat, isVarcharType = isVarcharType, 
isComplexType = isComplexType,
-  level)
+try {
+  FieldConverter.objectToString(row.get(idx), serializationNullFormat, 
complexDelimiters,
+timeStampFormat, dateFormat, isVarcharType = isVarcharType, 
isComplexType = isComplexType,
+level)
+} catch {
+  case e: Exception =>
+if (e.getMessage.startsWith(FieldConverter.exceedErrorMsg)) {
+  throw new Exception("Column idx " + idx + " too long", e)
 
 Review comment:
   I want to add column idx into the error message. 
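The reviewed `getString(row, idx, ...)` change wraps the conversion in a `try` and rethrows the length error together with the column index. The Scala sketch below shows that pattern in isolation. It assumes, hypothetically, that `exceedErrorMsg` holds the prefix used by the removed line ("Dataload failed, String length cannot exceed "), uses a plain `Seq[Any]` in place of Spark's `Row`, and reduces `objectToString` to the length check. None of these stand-ins is CarbonData's actual code.

```scala
// Hypothetical stand-ins; not CarbonData's actual FieldConverter/CarbonScalaUtil.
object ColumnAwareConversion {
  // Assumed value of FieldConverter.exceedErrorMsg (taken from the removed line).
  val ExceedErrorMsg = "Dataload failed, String length cannot exceed "
  // 32000, the limit named in the PR title (MAX_CHARS_PER_COLUMN_DEFAULT in the diff).
  val MaxCharsPerColumn = 32000

  // Reduced objectToString: only the over-length check is modelled.
  def objectToString(value: Any): String = value match {
    case null => null
    case s: String if s.length > MaxCharsPerColumn =>
      throw new Exception(ExceedErrorMsg + MaxCharsPerColumn)
    case other => other.toString
  }

  // Convert column idx of the row; on a length failure, rethrow with the
  // column index and keep the original exception as the cause.
  def getString(row: Seq[Any], idx: Int): String =
    try objectToString(row(idx))
    catch {
      // Option(...) guards against exceptions whose getMessage is null.
      case e: Exception if Option(e.getMessage).exists(_.startsWith(ExceedErrorMsg)) =>
        throw new Exception(s"Column idx $idx too long", e)
    }
}
```

Two choices are worth noting. The original exception is passed as the cause, so its full message survives. `Option(e.getMessage)` avoids a `NullPointerException` when an unrelated exception carries no message, a case the quoted `e.getMessage.startsWith(...)` check does not cover. Exceptions that do not match the guard propagate unchanged.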




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert 
array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569869688
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1389/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert 
array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569866482
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1379/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert 
array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569861903
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1369/
   




[GitHub] [carbondata] marchpure opened a new pull request #3549: [Carbondata-3643] Insert array('')/array() into Struct column …

2019-12-30 Thread GitBox
marchpure opened a new pull request #3549: [Carbondata-3643] Insert 
array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549
 
 
   …will result in array(null), which is inconsistent with Parquet
   
   Modification reason: The result is incorrect when inserting via SELECT from a Parquet table with a struct containing array('')/array(). The result should not be array(null); Parquet returns array('') or array().
   
   Modification content: When the input value is Struct(""), StructParserImpl now handles the empty string.
   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
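The PR describes the fix only as making `StructParserImpl` handle the empty string. The Scala sketch below illustrates the distinction involved, under stated assumptions: an array child arrives as text, and SQL NULL is written as a separate null marker (`NullFormat`, assumed here to be `\N`). The helper names are hypothetical, and this is not the actual `StructParserImpl` change.

```scala
// Hypothetical helpers; not CarbonData's StructParserImpl.
object EmptyStringChild {
  // Assumed marker used for SQL NULL in the intermediate text form.
  val NullFormat = "\\N"

  // Models the reported symptom: treating an empty child value like null
  // turns array('') into array(null).
  def parseChildBuggy(raw: String): String =
    if (raw == null || raw.isEmpty || raw == NullFormat) null else raw

  // Intended behaviour: only a real null or the null marker becomes null,
  // so an empty string is kept as an empty string.
  def parseChildFixed(raw: String): String =
    if (raw == null || raw == NullFormat) null else raw
}
```

On its own this cannot tell array() from array('') if both serialize to an empty string; how the PR resolves that case is not shown in the thread.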
   
   
   




[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet

2019-12-30 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao updated CARBONDATA-3643:

Description: 
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//

sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 

  was:
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//
checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * 
FROM datatype_struct_parquet"))

!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 


> Insert array('')/array() into Struct column will result in 
> array(null), which is inconsist with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
>  
> {code:java}
> //
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
> checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))
> !== Correct Answer - 1 == == Spark Answer - 1 == 
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
>  





[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet

2019-12-30 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao updated CARBONDATA-3643:

Description: 
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//
checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * 
FROM datatype_struct_parquet"))

!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 

  was:
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//
!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 


> Insert array('')/array() into Struct column will result in 
> array(null), which is inconsist with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
>  
> {code:java}
> //
> checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * 
> FROM datatype_struct_parquet"))
> !== Correct Answer - 1 == == Spark Answer - 1 == 
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
>  





[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet

2019-12-30 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao updated CARBONDATA-3643:

Description: 
 
{code:java}
//
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 

  was:
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//

sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 


> Insert array('')/array() into Struct column will result in 
> array(null), which is inconsist with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
>  
> {code:java}
> //
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
> checkAnswer(sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * FROM datatype_struct_parquet"))
> !== Correct Answer - 1 == == Spark Answer - 1 == 
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
>  





[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet

2019-12-30 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao updated CARBONDATA-3643:

Fix Version/s: 2.0.0
Affects Version/s: 2.0.0
   1.6.1
  Description: 
sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")

 
{code:java}
//
!== Correct Answer - 1 == == Spark Answer - 1 == 
![[WrappedArray()]] [[WrappedArray(null)]]
{code}
 

> Insert array('')/array() into Struct column will result in 
> array(null), which is inconsist with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
>  
> {code:java}
> //
> !== Correct Answer - 1 == == Spark Answer - 1 == 
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
>  





[jira] [Created] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet

2019-12-30 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3643:
---

 Summary: Insert array('')/array() into Struct column will 
result in array(null), which is inconsist with Parquet
 Key: CARBONDATA-3643
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
 Project: CarbonData
  Issue Type: Bug
Reporter: Xingjun Hao








[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569858679
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1387/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory 
DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569857896
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1385/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569857629
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1377/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory 
DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569856676
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1374/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569854272
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1378/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569854185
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1388/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569852276
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1367/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569851775
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1368/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory 
DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569851035
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1364/
   




[GitHub] [carbondata] marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
marchpure commented on a change in pull request #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#discussion_r362137262
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ##
 @@ -300,6 +300,9 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_SKIP_EMPTY_LINE_DEFAULT = "false";
 
+
+  public static final String EMPTY_DATA_RETURN = "!EMPTY_DATA_RETURN!";
 
 Review comment:
   Modified




[GitHub] [carbondata] marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
marchpure commented on a change in pull request #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#discussion_r362137258
 
 

 ##
 File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/StructParserImpl.java
 ##
 @@ -59,6 +59,12 @@ public StructObject parse(Object data) {
   }
   return new StructObject(array);
 }
+  } else if (value.isEmpty()) {
+Object[] array = new Object[1];
 
 Review comment:
   Modified




[GitHub] [carbondata] marchpure commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…

2019-12-30 Thread GitBox
marchpure commented on issue #3545: [Carbondata-3631] 
StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569850428
 
 
   retest this please




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135805
 
 

 ##
 File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md
 ##
 @@ -0,0 +1,115 @@
+
+
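+## Performance Comparison: CarbonData vs. a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved over a commercial columnar DB during a comparison test, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Test Environment Comparison
+
+For queries, the commercial columnar DB used one query node with SSDs. CarbonData ran on 6 DataNodes with SATA disks, but its query queue was allocated 1/6 of the resources, which makes the test equivalent to comparing 1 commercial DB server against 1 CarbonData server. In addition, the CarbonData servers use SATA disks, which cost less than the commercial columnar DB servers.
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources |
+
+## 2. Query SQL Models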
 
 Review comment:
   done




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135688
 
 

 ##
 File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md
 ##
 @@ -0,0 +1,115 @@
+
+
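+## Performance Comparison: CarbonData vs. a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved over a commercial columnar DB during a comparison test, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Test Environment Comparison
+
+For queries, the commercial columnar DB used one query node with SSDs. CarbonData ran on 6 DataNodes with SATA disks, but its query queue was allocated 1/6 of the resources, which makes the test equivalent to comparing 1 commercial DB server against 1 CarbonData server. In addition, the CarbonData servers use SATA disks, which cost less than the commercial columnar DB servers.
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources |
+
+## 2. Query SQL Models
+
+The query SQL differs between the commercial columnar DB and CarbonData, so the SQL had to be modified before running the performance tests.
+
+```Query SQL model for the commercial columnar DB:```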
+
+SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS 
COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 
0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS 
COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 
0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * 
delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , 
SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) 
OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , 
COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS 
COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS 
"TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN 
"TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE 
WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE 
TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY 
TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC
 
 Review comment:
   done




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135714
 
 

 ##
 File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md
 ##
 @@ -0,0 +1,115 @@
+
+
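+## Performance Comparison: CarbonData vs. a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved over a commercial columnar DB during a comparison test, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Test Environment Comparison
+
+For queries, the commercial columnar DB used one query node with SSDs. CarbonData ran on 6 DataNodes with SATA disks, but its query queue was allocated 1/6 of the resources, which makes the test equivalent to comparing 1 commercial DB server against 1 CarbonData server. In addition, the CarbonData servers use SATA disks, which cost less than the commercial columnar DB servers.
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources |
+
+## 2. Query SQL Models
+
+The query SQL differs between the commercial columnar DB and CarbonData, so the SQL had to be modified before running the performance tests.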
 
 Review comment:
   done




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135553
 
 

 ##
 File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md
 ##
 @@ -0,0 +1,115 @@
+
+
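+## Performance Comparison: CarbonData vs. a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved over a commercial columnar DB during a comparison test, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Test Environment Comparison
+
+For queries, the commercial columnar DB used one query node with SSDs. CarbonData ran on 6 DataNodes with SATA disks, but its query queue was allocated 1/6 of the resources, which makes the test equivalent to comparing 1 commercial DB server against 1 CarbonData server. In addition, the CarbonData servers use SATA disks, which cost less than the commercial columnar DB servers.
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources |
+
+## 2. Query SQL Models
+
+The query SQL differs between the commercial columnar DB and CarbonData, so the SQL had to be modified before running the performance tests.
+
+```Query SQL model for the commercial columnar DB:```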
+
+SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS 
COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 
0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS 
COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 
0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * 
delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , 
SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) 
OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , 
COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS 
COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS 
"TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN 
"TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE 
WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE 
TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY 
TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC
+
+Hereafter, each SUM is referred to as a counter.
+
+```Query SQL model for Spark:```
+
+SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , 
COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS 
COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS 
COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 
0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * 
delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , 
COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( 
SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS 
COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT 
JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN 
`COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS 
NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 
THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A 
ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND 
`TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 
5000
 
 Review comment:
   done
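In the two query models quoted above, the commercial-DB SQL aggregates with `SUM(COALESCE(x, 0))`, while the Spark SQL uses `COALESCE(SUM(x), 0)` (and `LIMIT 5000` in place of `TOP 5000`). The following minimal Java sketch is our own illustration, not code from the PR: it models SQL's NULL-skipping `SUM` over `Long` values to show that the two forms differ only on input with no rows at all, which a `GROUP BY` group never has, so this part of the rewrite should not change the aggregated values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CoalesceSumSketch {

    /** Mimics SQL SUM: NULL inputs are skipped; no non-NULL input yields NULL. */
    static Long sqlSum(List<Long> values) {
        Long acc = null;
        for (Long v : values) {
            if (v != null) {
                acc = (acc == null ? 0L : acc) + v;
            }
        }
        return acc;
    }

    /** Mimics SQL COALESCE(v, fallback). */
    static Long coalesce(Long v, long fallback) {
        return v != null ? v : fallback;
    }

    /** Commercial-DB form: SUM(COALESCE(x, 0)). */
    static Long sumOfCoalesce(List<Long> values) {
        List<Long> replaced = new ArrayList<>();
        for (Long v : values) {
            replaced.add(coalesce(v, 0L));
        }
        return sqlSum(replaced);
    }

    /** Spark form: COALESCE(SUM(x), 0). */
    static Long coalesceOfSum(List<Long> values) {
        return coalesce(sqlSum(values), 0L);
    }

    public static void main(String[] args) {
        List<Long> mixed = Arrays.asList(3L, null, 4L);
        List<Long> allNull = Arrays.asList((Long) null, (Long) null);
        List<Long> empty = Collections.emptyList();

        // Non-empty groups (the only kind GROUP BY produces): both forms agree.
        System.out.println(sumOfCoalesce(mixed) + " " + coalesceOfSum(mixed));     // 7 7
        System.out.println(sumOfCoalesce(allNull) + " " + coalesceOfSum(allNull)); // 0 0
        // Empty input: the forms diverge (NULL vs 0).
        System.out.println(sumOfCoalesce(empty) + " " + coalesceOfSum(empty));     // null 0
    }
}
```

The Spark model also drops the `COUNT(1) OVER () AS countNum` column, so any consumer of that total row count would need to obtain it separately.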




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135432
 
 

 ##
 File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md
 ##
 @@ -0,0 +1,115 @@
+
+
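+## Performance Comparison: CarbonData vs. a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved over a commercial columnar DB during a comparison test, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Test Environment Comparison
+
+For queries, the commercial columnar DB used one query node with SSDs. CarbonData ran on 6 DataNodes with SATA disks, but its query queue was allocated 1/6 of the resources, which makes the test equivalent to comparing 1 commercial DB server against 1 CarbonData server. In addition, the CarbonData servers use SATA disks, which cost less than the commercial columnar DB servers.
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources |
+
+## 2. Query SQL Models
+
+The query SQL differs between the commercial columnar DB and CarbonData, so the SQL had to be modified before running the performance tests.
+
+```Query SQL model for the commercial columnar DB:```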
+
+SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS 
COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 
0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS 
COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 
0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * 
delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , 
SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) 
OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , 
COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS 
COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS 
"TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN 
"TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE 
WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN 
"COLUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE 
TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY 
TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC
+
+Hereafter, each SUM is referred to as a counter.
+
+```Query SQL model for Spark:```
+
+SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , 
COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS 
COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS 
COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 
0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * 
delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , 
COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( 
SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS 
COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT 
JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN 
`COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS 
NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 
THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A 
ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND 
`TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 
5000
+
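+## 3. Main CarbonData Configuration Parameters
+
+```Main configuration```
+
+| Main CarbonData settings | Value | Description |
+| ------------------------ | ----- | ----------- |
+| carbon.inmemory.record.size | 48 | Total number of rows of each table to load into memory for a query. |
+| carbon.number.of.cores | 4 | Number of threads used for parallel scanning during CarbonData queries. |
+| carbon.number.of.cores.while.loading | 15 | Number of threads used in parallel during CarbonData data loading. |
+| carbon.sort.file.buffer.size | 20 | Total buffer size, in MB, used for each temporary intermediate file during merge-sort (read/write) operations. |
+| carbon.sort.size | 50 | Number of records sorted at a time during data loading. |
+| Main Spark settings | | |
+| spark.sql.shuffle.partitions | 70 | |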
 
 Review comment:
   done




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory 
DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569846648
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1363/
   




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-30 Thread GitBox
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r362134167
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
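+## Query Performance Comparison: CarbonData Replacing a Commercial Columnar DB
+
+This document presents the query performance improvement CarbonData achieved while replacing a commercial columnar DB, along with CarbonData's own strengths and characteristics. The data here are only the results of SQL queries built around the query patterns of one particular domain, and represent the performance comparison under those specific query patterns only.
+
+
+
+
+
+## 1. Cluster Configuration Comparison
+
+| Cluster | Description |
+| ------- | ----------- |
+| Commercial columnar DB cluster | 3 nodes, SSDs |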
 
 Review comment:
   done




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569723184
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1372/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup 
failure issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569723185
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1381/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569713060
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1383/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569699604
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1382/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569695697
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1371/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup 
failure issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569694886
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1370/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569693908
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1362/
   




[jira] [Closed] (CARBONDATA-3625) Make stage files queryable

2019-12-30 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li closed CARBONDATA-3625.

Resolution: Won't Fix

> Make stage files queryable
> --
>
> Key: CARBONDATA-3625
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3625
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Stage files are data files written by external applications such as Flink.
> These files are committed but have not yet been loaded into the table.
> This PR adds a configuration to include them in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] jackylk closed pull request #3519: [CARBONDATA-3625] Make stage input queryable

2019-12-30 Thread GitBox
jackylk closed pull request #3519: [CARBONDATA-3625] Make stage input queryable
URL: https://github.com/apache/carbondata/pull/3519
 
 
   




[GitHub] [carbondata] akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569683859
 
 
   retest this please




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569682742
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1361/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup 
failure issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569682663
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1360/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is 
not hitting datamap if granularity in query is given case insensitive
URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569680532
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1380/
   




[GitHub] [carbondata] ajantha-bhat commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
ajantha-bhat commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569678506
 
 
   retest this please




[GitHub] [carbondata] akashrn5 opened a new pull request #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
akashrn5 opened a new pull request #3548: [CARBONDATA-3600]Fix the cleanup 
failure issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3548
 
 
### Why is this PR needed?
The code that cleans up stale datamap folders during session initialization 
causes a problem: if another user tries to access the datamap table, we may get 
a permission exception.

### What changes were proposed in this PR?
   We should clean up only if the table does not exist in the Hive metastore but 
its schema exists. If the table exists but belongs to another user, the lookup 
throws an exception; in that case there is no need to proceed, so we just catch 
the exception and continue. This PR makes that change.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No (not required: the fix addresses a cleanup issue and was tested in a cluster)
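A minimal Java sketch of the cleanup decision described above, under stated assumptions: `TableLookup`, `FakeLookup`, and `shouldCleanUp` are hypothetical names made up for illustration, not CarbonData classes, and the actual fix may be structured quite differently. The folder is treated as stale only when its schema exists but the Hive metastore has no matching table; any exception from the metastore lookup (for example, a permission error on another user's table) skips the cleanup instead of failing session initialization.

```java
/** Hypothetical sketch of the stale-datamap cleanup decision; not CarbonData's API. */
final class StaleDataMapCleanupSketch {

    /** Hypothetical abstraction over the metastore and schema checks. */
    interface TableLookup {
        /** May throw, e.g. when the table belongs to a user we cannot access. */
        boolean existsInMetastore(String tableName) throws Exception;

        boolean schemaExists(String tableName);
    }

    /** Returns true only when the datamap folder looks stale and safe to delete. */
    static boolean shouldCleanUp(TableLookup lookup, String tableName) {
        try {
            // Stale = schema still present, but the Hive metastore has no such table.
            // The metastore is only queried when the schema exists (short-circuit).
            return lookup.schemaExists(tableName) && !lookup.existsInMetastore(tableName);
        } catch (Exception e) {
            // Lookup failed (for example, permission denied on another user's table):
            // skip cleanup rather than fail session initialization.
            return false;
        }
    }

    /** In-memory TableLookup used only for the demo below. */
    static final class FakeLookup implements TableLookup {
        private final boolean inMetastore;
        private final boolean schemaOnDisk;
        private final boolean denyAccess;

        FakeLookup(boolean inMetastore, boolean schemaOnDisk, boolean denyAccess) {
            this.inMetastore = inMetastore;
            this.schemaOnDisk = schemaOnDisk;
            this.denyAccess = denyAccess;
        }

        @Override
        public boolean existsInMetastore(String tableName) throws Exception {
            if (denyAccess) {
                throw new SecurityException("Permission denied: " + tableName);
            }
            return inMetastore;
        }

        @Override
        public boolean schemaExists(String tableName) {
            return schemaOnDisk;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldCleanUp(new FakeLookup(false, true, false), "dm1")); // true: stale
        System.out.println(shouldCleanUp(new FakeLookup(true, true, false), "dm1"));  // false: table exists
        System.out.println(shouldCleanUp(new FakeLookup(true, true, true), "dm1"));   // false: access denied
    }
}
```

Catching a broad `Exception` mirrors the PR description; a production version would probably narrow this to the specific authorization or metastore exception so that unrelated failures are not silently ignored.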




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is 
not hitting datamap if granularity in query is given case insensitive
URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569677750
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1369/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569676801
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1379/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569674951
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1368/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569669234
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1378/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is 
not hitting datamap if granularity in query is given case insensitive
URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569666573
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1359/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569665240
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1366/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in 
case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569664577
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1367/
   




[GitHub] [carbondata] akashrn5 closed pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
akashrn5 closed pull request #3547: [CARBONDATA-3600]Fix the cleanup failure 
issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3547
 
 
   




[jira] [Updated] (CARBONDATA-3600) Fix creating mv timeseries UDF column as partition column

2019-12-30 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3600:
-
Description: 
Problem:
Issue 1:
Creating a datamap whose partition column is a timeseries UDF column throws an 
exception.
Issue 2:
If the JDBC application is killed while CREATE DATAMAP is in progress, a 
"datamap table not found" exception is thrown on restart.

> Fix creating mv timeseries UDF column as partition column
> -
>
> Key: CARBONDATA-3600
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3600
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Problem:
> Issue 1:
> Creating a datamap whose partition column is a timeseries UDF column throws an 
> exception.
> Issue 2:
> If the JDBC application is killed while CREATE DATAMAP is in progress, a 
> "datamap table not found" exception is thrown on restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] akashrn5 opened a new pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table

2019-12-30 Thread GitBox
akashrn5 opened a new pull request #3547: [CARBONDATA-3600]Fix the cleanup 
failure issue if user fails to access table
URL: https://github.com/apache/carbondata/pull/3547
 
 
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No (not required: the fix only affects cleanup and was tested in a cluster)
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569658559
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1358/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569654186
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1357/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in 
case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569651022
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1377/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with 
trailing task
URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569650060
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1355/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in 
case of multiple data files in one segment
URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569649738
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1356/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569649247
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1363/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD 
and CCD scenarios
URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569645767
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1376/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with 
trailing task
URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569644106
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1365/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD 
and CCD scenarios
URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569642383
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1364/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569641629
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1374/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD 
and CCD scenarios
URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569630481
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1354/
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361940701
 
 

 ##
 File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
 ##
 @@ -374,17 +365,32 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
   sparkSession: SparkSession,
   carbonLoadModel: CarbonLoadModel,
   carbonMergerMapping: CarbonMergerMapping): Array[(String, Boolean)] = {
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
 val splits = splitsOfSegments(
   sparkSession,
-  carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+  carbonTable,
   carbonMergerMapping.validSegments)
-val dataFrame = DataLoadProcessBuilderOnSpark.createInputDataFrame(
-  sparkSession,
-  carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
-  splits.asScala)
+val dataFrame = try {
+  // segments to be compacted are set in the threadset() in carbon session, and unset in the end
 
 Review comment:
   Please add this explanation as a comment in the code.




[GitHub] [carbondata] asfgit closed pull request #3514: [FAQ]add faq for how to deal with trailing task

2019-12-30 Thread GitBox
asfgit closed pull request #3514: [FAQ]add faq for how to deal with trailing 
task
URL: https://github.com/apache/carbondata/pull/3514
 
 
   




[GitHub] [carbondata] jackylk commented on issue #3514: [FAQ]add faq for how to deal with trailing task

2019-12-30 Thread GitBox
jackylk commented on issue #3514: [FAQ]add faq for how to deal with trailing 
task
URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569627377
 
 
   I canceled CI for this PR since it only modifies documentation.
   LGTM, merging this PR.




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with 
trailing task
URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569627248
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1375/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569626224
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1353/
   




[GitHub] [carbondata] asfgit closed pull request #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-30 Thread GitBox
asfgit closed pull request #3502: [CARBONATA-3605] Remove global dictionary 
feature
URL: https://github.com/apache/carbondata/pull/3502
 
 
   




[GitHub] [carbondata] QiangCai commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-30 Thread GitBox
QiangCai commented on issue #3502: [CARBONATA-3605] Remove global dictionary 
feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569618332
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569617569
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1362/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569617458
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1373/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-30 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569616794
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1352/
   




[jira] [Commented] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map

2019-12-30 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005201#comment-17005201
 ] 

Zhichao  Zhang commented on CARBONDATA-3631:


[~shenhong] Please raise a PR to fix this, thanks.

> StringIndexOutOfBoundsException When Inserting Select From a Parquet Table 
> with Empty array/map
> ---
>
> Key: CARBONDATA-3631
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3631
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>
> sql("insert into datatype_array_parquet values(array())")
> sql("insert into datatype_array_carbondata select f from 
> datatype_array_parquet")
>  
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935)
> at java.lang.StringBuilder.substring(StringBuilder.java:76)
> at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
> at 
> org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77)
> at 
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> {code}
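The trace ends in `substring` inside `FieldConverter.objectToString` (FieldConverter.scala:77). One plausible cause, which is an assumption since the converter's source is not shown here, is code that appends each element plus a delimiter and then drops the last delimiter with `substring(0, length - 1)`: for an empty array the builder is empty, so the end index becomes -1. A minimal Scala sketch of that pattern and a guarded version follows; `EmptyCollectionJoin`, `joinUnsafe`, `joinSafe` and the `$` delimiter are hypothetical.

```scala
import scala.collection.mutable

object EmptyCollectionJoin {
  // Fails on empty input: with nothing appended, the end index is negative
  // and substring throws StringIndexOutOfBoundsException.
  def joinUnsafe(values: Seq[Any], delimiter: String): String = {
    val builder = new mutable.StringBuilder
    values.foreach(v => builder.append(v).append(delimiter))
    builder.substring(0, builder.length - delimiter.length)
  }

  // Guarded version: an empty collection becomes an empty string,
  // and mkString never produces a trailing delimiter to strip.
  def joinSafe(values: Seq[Any], delimiter: String): String =
    if (values.isEmpty) "" else values.mkString(delimiter)
}
```

Both versions agree on non-empty input; only the empty case differs, which matches the `array()` reproduction above.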





[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361927270
 
 

 ##
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##
 @@ -442,7 +442,8 @@ object DataLoadProcessBuilderOnSpark {
   .map { row =>
 new GenericRow(row.getData.asInstanceOf[Array[Any]])
 
 Review comment:
   Because of the data type conflict between carbonScanRDD (long) and the 
schema (Timestamp), we cannot use the existing API with the RDD.
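The conflict described in this comment (a timestamp column arriving as a `long` from the scan while the schema declares `Timestamp`) can be illustrated with a minimal, Spark-free Scala sketch that converts such values before a row is built. It assumes the long holds microseconds since the epoch, which is Spark's internal `TimestampType` encoding; whether the compaction path uses that unit is not stated here. `ColumnType`, `TimestampCol`, `OtherCol`, `RowTypeFix` and `toExternal` are hypothetical names.

```scala
import java.sql.Timestamp

// Hypothetical column type tags standing in for the table schema.
sealed trait ColumnType
case object TimestampCol extends ColumnType
case object OtherCol extends ColumnType

object RowTypeFix {
  // Converts microseconds since the epoch into java.sql.Timestamp,
  // keeping sub-millisecond precision in the nanos field.
  def microsToTimestamp(micros: Long): Timestamp = {
    val ts = new Timestamp(Math.floorDiv(micros, 1000L))
    ts.setNanos((Math.floorMod(micros, 1000000L) * 1000L).toInt)
    ts
  }

  // Rewrites only timestamp columns whose value is still a Long;
  // every other value passes through unchanged.
  def toExternal(values: Array[Any], schema: Seq[ColumnType]): Array[Any] =
    values.zip(schema).map {
      case (micros: Long, TimestampCol) => microsToTimestamp(micros)
      case (value, _)                   => value
    }
}
```

Using `floorDiv`/`floorMod` keeps pre-1970 (negative) values correct, which plain `/` and `%` would not.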




[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-30 Thread GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361925792
 
 

 ##
 File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
 ##
 @@ -374,17 +365,32 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
   sparkSession: SparkSession,
   carbonLoadModel: CarbonLoadModel,
   carbonMergerMapping: CarbonMergerMapping): Array[(String, Boolean)] = {
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
 val splits = splitsOfSegments(
   sparkSession,
-  carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+  carbonTable,
   carbonMergerMapping.validSegments)
-val dataFrame = DataLoadProcessBuilderOnSpark.createInputDataFrame(
-  sparkSession,
-  carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
-  splits.asScala)
+val dataFrame = try {
+  // segments to be compacted are set in the threadset() in carbon session, and unset in the end
 
 Review comment:
   During custom compaction, all segments might otherwise be taken into 
consideration. To avoid this, the segments to be compacted are explicitly set 
here.
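The set-then-unset approach discussed in this review (segments to compact are set on the current thread and unset at the end) is essentially try/finally around a thread-local. A minimal Scala sketch under that assumption; `SegmentsToAccess`, `withSegments` and `currentSegments` are hypothetical and are not CarbonData's actual thread-set API.

```scala
object SegmentsToAccess {
  // Hypothetical per-thread holder for the segments a query may read.
  private val current = new ThreadLocal[Option[Seq[String]]] {
    override def initialValue(): Option[Seq[String]] = None
  }

  def currentSegments: Option[Seq[String]] = current.get()

  // Restricts `body` to `segments`, then restores the previous value even
  // if `body` throws, so later queries on this thread see all segments again.
  def withSegments[T](segments: Seq[String])(body: => T): T = {
    val previous = current.get()
    current.set(Some(segments))
    try body
    finally current.set(previous)
  }
}
```

Restoring the previous value, rather than clearing it, also keeps nested calls correct.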




[jira] [Commented] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map

2019-12-30 Thread Hong Shen (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005190#comment-17005190
 ] 

Hong Shen commented on CARBONDATA-3631:
---

I have fixed it in my local branch; if needed, I can submit a patch.



