[GitHub] [carbondata] CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
CarbonDataQA1 commented on issue #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#issuecomment-569880578

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1372/

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569880512

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1371/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3550: [WIP] Remove global dictionary in query
CarbonDataQA1 commented on issue #3550: [WIP] Remove global dictionary in query
URL: https://github.com/apache/carbondata/pull/3550#issuecomment-569879187

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1370/
[GitHub] [carbondata] jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569872897

please modify the PR description according to the template
[GitHub] [carbondata] jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
jackylk commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569872912

retest this please
[jira] [Resolved] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map
[ https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-3631.
--
Resolution: Fixed

> StringIndexOutOfBoundsException When Inserting Select From a Parquet Table
> with Empty array/map
> ---
>
> Key: CARBONDATA-3631
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3631
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 1.6.1, 2.0.0
> Reporter: Xingjun Hao
> Priority: Minor
> Fix For: 2.0.0
>
>
> sql("insert into datatype_array_parquet values(array())")
> sql("insert into datatype_array_carbondata select f from datatype_array_parquet")
>
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935)
> at java.lang.StringBuilder.substring(StringBuilder.java:76)
> at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
> at org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77)
> at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
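For context, the trace ends in the trailing `substring` call of `FieldConverter.objectToString`. The Java sketch below reproduces that failure mode under an assumption the trace does not confirm: that the converter appends each element followed by the complex delimiter and then drops the last delimiter with `substring(0, length - delimiter.length())`. With an empty array nothing is appended, so the end index becomes -1. `JoinSketch`, its methods and the `"$"` delimiter are hypothetical. The guarded variant returns an explicit marker for empty input; PR #3545 adds an `EMPTY_DATA_RETURN = "!EMPTY_DATA_RETURN!"` constant to CarbonCommonConstants, but how the actual patch uses it is not shown in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reproduction of the append-then-trim pattern suspected in FieldConverter.
class JoinSketch {

    // Appends every element followed by the delimiter, then trims the trailing delimiter.
    // For an empty list the builder is empty, so the end index is 0 - delimiter.length().
    static String joinUnguarded(List<String> elems, String delimiter) {
        StringBuilder builder = new StringBuilder();
        for (String e : elems) {
            builder.append(e).append(delimiter);
        }
        return builder.substring(0, builder.length() - delimiter.length());
    }

    // Same join, but an empty collection short-circuits to an explicit marker
    // instead of reaching the negative end index.
    static String joinGuarded(List<String> elems, String delimiter, String emptyMarker) {
        if (elems.isEmpty()) {
            return emptyMarker;
        }
        return joinUnguarded(elems, delimiter);
    }

    public static void main(String[] args) {
        System.out.println(joinGuarded(Arrays.asList("a", "b"), "$", "!EMPTY!")); // a$b
        try {
            joinUnguarded(Collections.<String>emptyList(), "$");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty array: " + e);
        }
    }
}
```

Checking for emptiness before trimming keeps the index arithmetic valid. An alternative with the same effect is `String.join(delimiter, elems)`, which never produces a trailing delimiter that needs stripping.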
[GitHub] [carbondata] asfgit closed pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
asfgit closed pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545
[GitHub] [carbondata] jackylk opened a new pull request #3550: [WIP] Remove global dictionary in query
jackylk opened a new pull request #3550: [WIP] Remove global dictionary in query
URL: https://github.com/apache/carbondata/pull/3550

### Why is this PR needed?
The global dictionary feature is deprecated, so it should be removed from the query flow.

### What changes were proposed in this PR?
Global-dictionary-related analyzer rules and the late decode optimizer strategy are removed.
Global-dictionary-related filter processing is removed from the read flow in the carbon-core module.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
[GitHub] [carbondata] jackylk commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
jackylk commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569872561

LGTM
[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154321

 ##
 File path: streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
 ##
 @@ -50,7 +51,7 @@ object FieldConverter {
    value match {
      case s: String => if (!isVarcharType && !isComplexType &&
          s.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
 -      throw new Exception("Dataload failed, String length cannot exceed " +
 +      throw new Exception( exceedErrorMsg +

 Review comment:
   done
[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154286

 ##
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ##
 @@ -297,7 +297,8 @@ class CarbonBlockDistinctValuesCombineRDD(
    val complexDelimiters = new util.ArrayList[String]
    model.delimiters.foreach(x => complexDelimiters.add(x))
    for (i <- 0 until dimNum) {
 -    dimensionParsers(i).parseString(CarbonScalaUtil.getString(row.get(i),
 +    dimensionParsers(i).parseString(CarbonScalaUtil.getString(row,

 Review comment:
   done
[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154300

 ##
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
    private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

 -  def getString(value: Any,
 +  def getString(row: Row,

 Review comment:
   done
[GitHub] [carbondata] shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
shenh062326 commented on a change in pull request #3546: [CARBONDATA-3642] Add column idx in error msg when string length exceed 32000
URL: https://github.com/apache/carbondata/pull/3546#discussion_r362154198

 ##
 File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ##
 @@ -60,17 +60,27 @@ object CarbonScalaUtil {
    private val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

 -  def getString(value: Any,
 +  def getString(row: Row,
 +      idx: Int,
        serializationNullFormat: String,
        complexDelimiters: util.ArrayList[String],
        timeStampFormat: SimpleDateFormat,
        dateFormat: SimpleDateFormat,
        isVarcharType: Boolean = false,
        isComplexType: Boolean = false,
        level: Int = 0): String = {
 -    FieldConverter.objectToString(value, serializationNullFormat, complexDelimiters,
 -      timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
 -      level)
 +    try {
 +      FieldConverter.objectToString(row.get(idx), serializationNullFormat, complexDelimiters,
 +        timeStampFormat, dateFormat, isVarcharType = isVarcharType, isComplexType = isComplexType,
 +        level)
 +    } catch {
 +      case e: Exception =>
 +        if (e.getMessage.startsWith(FieldConverter.exceedErrorMsg)) {
 +          throw new Exception("Column idx " + idx + " too long", e)

 Review comment:
   I want to add column idx into the error message.
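The change under review wraps the length failure so that the message names the offending column. A rough Java rendering of that pattern follows. `ColumnErrorSketch`, the `Object[]` row and the `" characters"` message suffix are assumptions for illustration; the message prefix, the 32000 limit and the `"Column idx " + idx + " too long"` wording come from the diffs and PR title. The sketch also checks for null before `startsWith`, because `getMessage()` can return null for exceptions raised without a message, and it rethrows any other exception unchanged (the diff is truncated before its else branch).

```java
import java.util.Arrays;

// Hypothetical Java rendering of the wrap-and-rethrow pattern from CarbonScalaUtil.getString.
class ColumnErrorSketch {

    // 32000, per the PR title (CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT in the diff).
    static final int MAX_CHARS_PER_COLUMN = 32000;

    // Prefix taken from the original FieldConverter message; the diff factors it out as exceedErrorMsg.
    static final String EXCEED_ERROR_MSG = "Dataload failed, String length cannot exceed ";

    // Simplified converter: rejects over-long strings, otherwise renders the value as text.
    static String objectToString(Object value) throws Exception {
        if (value instanceof String && ((String) value).length() > MAX_CHARS_PER_COLUMN) {
            // The " characters" suffix is an assumption; the diff cuts off after the prefix.
            throw new Exception(EXCEED_ERROR_MSG + MAX_CHARS_PER_COLUMN + " characters");
        }
        return String.valueOf(value);
    }

    // Converts row[idx] and, on a length failure, rethrows with the column index in the message.
    static String getString(Object[] row, int idx) throws Exception {
        try {
            return objectToString(row[idx]);
        } catch (Exception e) {
            String msg = e.getMessage();
            // getMessage() may be null, so guard before calling startsWith.
            if (msg != null && msg.startsWith(EXCEED_ERROR_MSG)) {
                throw new Exception("Column idx " + idx + " too long", e);
            }
            throw e; // any other failure propagates unchanged
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[MAX_CHARS_PER_COLUMN + 1];
        Arrays.fill(chars, 'x');
        Object[] row = {"short", new String(chars)};
        try {
            getString(row, 1);
        } catch (Exception e) {
            System.out.println(e.getMessage());            // Column idx 1 too long
            System.out.println(e.getCause().getMessage()); // the original length message
        }
    }
}
```

Passing the original exception as the cause keeps the underlying length message available through `getCause()`, so the column index is added without losing the limit that was exceeded.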
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569869688

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1389/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569866482

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1379/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
CarbonDataQA1 commented on issue #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549#issuecomment-569861903

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1369/
[GitHub] [carbondata] marchpure opened a new pull request #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
marchpure opened a new pull request #3549: [Carbondata-3643] Insert array('')/array() into Struct column …
URL: https://github.com/apache/carbondata/pull/3549

…will result in array(null), which is inconsistent with Parquet

Modification reason: The result is incorrect when running INSERT ... SELECT from a Parquet table that has a struct containing array('')/array(). The result should not be array(null); Parquet returns array('') or array().
Modification content: When the input value is Struct(""), StructParserImpl now handles the empty string.

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xingjun Hao updated CARBONDATA-3643:

Description:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}

was:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}

> Insert array('')/array() into Struct column will result in
> array(null), which is inconsist with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 1.6.1, 2.0.0
> Reporter: Xingjun Hao
> Priority: Minor
> Fix For: 2.0.0
>
>
> sql("create table datatype_struct_parquet(price struct>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
>
> {code:java}
> //
> sql("create table datatype_struct_parquet(price struct>) stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
> checkAnswer(
>   sql("SELECT * FROM datatype_struct_carbondata"),
>   sql("SELECT * FROM datatype_struct_parquet"))
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![[WrappedArray()]]         [[WrappedArray(null)]]
> {code}
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xingjun Hao updated CARBONDATA-3643:

Description:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}

was:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xingjun Hao updated CARBONDATA-3643:

Description:
{code:java}
//
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}

was:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
checkAnswer(
  sql("SELECT * FROM datatype_struct_carbondata"),
  sql("SELECT * FROM datatype_struct_parquet"))

!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
[ https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xingjun Hao updated CARBONDATA-3643:

Fix Version/s: 2.0.0
Affects Version/s: 2.0.0, 1.6.1
Description:
sql("create table datatype_struct_parquet(price struct>) stored as parquet")
sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
sql("create table datatype_struct_carbondata(price struct>) stored as carbondata")
sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
{code:java}
//
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![[WrappedArray()]]         [[WrappedArray(null)]]
{code}
[jira] [Created] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
Xingjun Hao created CARBONDATA-3643:
---
Summary: Insert array('')/array() into Struct column will result in array(null), which is inconsist with Parquet
Key: CARBONDATA-3643
URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
Project: CarbonData
Issue Type: Bug
Reporter: Xingjun Hao
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569858679

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1387/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569857896

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1385/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569857629

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1377/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569856676

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1374/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569854272

Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1378/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569854185

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1388/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
CarbonDataQA1 commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569852276

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1367/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569851775

Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1368/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569851035

Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1364/
[GitHub] [carbondata] marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele… URL: https://github.com/apache/carbondata/pull/3545#discussion_r362137262 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -300,6 +300,9 @@ private CarbonCommonConstants() { public static final String CARBON_SKIP_EMPTY_LINE_DEFAULT = "false"; + + public static final String EMPTY_DATA_RETURN = "!EMPTY_DATA_RETURN!"; Review comment: Modified
[GitHub] [carbondata] marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
marchpure commented on a change in pull request #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele… URL: https://github.com/apache/carbondata/pull/3545#discussion_r362137258 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/StructParserImpl.java ## @@ -59,6 +59,12 @@ public StructObject parse(Object data) { } return new StructObject(array); } + } else if (value.isEmpty()) { +Object[] array = new Object[1]; Review comment: Modified
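The `StructParserImpl` hunk adds a separate branch for an empty input value. A minimal sketch of that idea, assuming a simplified parser that splits a delimited string into child values; the class `SimpleStructParser` and the `"$"` delimiter used below are hypothetical and are not CarbonData's actual parser API. An empty string produces a single empty child instead of being handed to splitting or index arithmetic that was not written for empty input.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified struct parser (NOT CarbonData's StructParserImpl).
final class SimpleStructParser {
  private final String delimiter;

  SimpleStructParser(String delimiter) {
    this.delimiter = delimiter;
  }

  /** Splits a delimited struct value into its child values. */
  Object[] parse(String value) {
    if (value == null) {
      // Null struct: nothing to split.
      return null;
    }
    if (value.isEmpty()) {
      // Empty input: return exactly one (empty) child value, mirroring the
      // new Object[1] branch in the hunk, without touching string indexes.
      Object[] array = new Object[1];
      array[0] = value;
      return array;
    }
    // Limit -1 keeps trailing empty fields, e.g. "a$" -> ["a", ""].
    String[] parts = value.split(Pattern.quote(delimiter), -1);
    Object[] array = new Object[parts.length];
    System.arraycopy(parts, 0, array, 0, parts.length);
    return array;
  }
}
```

For example, `new SimpleStructParser("$").parse("a$b")` yields the two children `"a"` and `"b"`, while `parse("")` yields one empty child.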
[GitHub] [carbondata] marchpure commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele…
marchpure commented on issue #3545: [Carbondata-3631] StringIndexOutOfBoundsException When Inserting Sele… URL: https://github.com/apache/carbondata/pull/3545#issuecomment-569850428 retest this please
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135805 ## File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md ## @@ -0,0 +1,115 @@ + + +## Performance comparison between CarbonData and a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed over a commercial columnar DB during the comparison, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Test environment comparison + +During queries, the commercial columnar DB used one query node with SSD disks. CarbonData used 6 DataNodes with SATA disks, but the query queue was allocated 1/6 of the resources, which is equivalent to comparing the query performance of 1 commercial DB server against 1 CarbonData server. In addition, the servers used by CarbonData have SATA disks, which cost less than the commercial columnar DB servers. + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | +| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2.Query SQL model Review comment: done
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135688 ## File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md ## @@ -0,0 +1,115 @@ + + +## Performance comparison between CarbonData and a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed over a commercial columnar DB during the comparison, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Test environment comparison + +During queries, the commercial columnar DB used one query node with SSD disks. CarbonData used 6 DataNodes with SATA disks, but the query queue was allocated 1/6 of the resources, which is equivalent to comparing the query performance of 1 commercial DB server against 1 CarbonData server. In addition, the servers used by CarbonData have SATA disks, which cost less than the commercial columnar DB servers. + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | +| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2.Query SQL model + +The query SQL of the commercial columnar DB differs from that of CarbonData, so the SQL has to be modified before the performance test is run. + +```Query SQL model of the commercial columnar DB:``` + +SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS "TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC Review comment: done
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135714 ## File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md ## @@ -0,0 +1,115 @@ + + +## Performance comparison between CarbonData and a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed over a commercial columnar DB during the comparison, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Test environment comparison + +During queries, the commercial columnar DB used one query node with SSD disks. CarbonData used 6 DataNodes with SATA disks, but the query queue was allocated 1/6 of the resources, which is equivalent to comparing the query performance of 1 commercial DB server against 1 CarbonData server. In addition, the servers used by CarbonData have SATA disks, which cost less than the commercial columnar DB servers. + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | +| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2.Query SQL model + +The query SQL of the commercial columnar DB differs from that of CarbonData, so the SQL has to be modified before the performance test is run. Review comment: done
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135553 ## File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md ## @@ -0,0 +1,115 @@ + + +## Performance comparison between CarbonData and a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed over a commercial columnar DB during the comparison, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Test environment comparison + +During queries, the commercial columnar DB used one query node with SSD disks. CarbonData used 6 DataNodes with SATA disks, but the query queue was allocated 1/6 of the resources, which is equivalent to comparing the query performance of 1 commercial DB server against 1 CarbonData server. In addition, the servers used by CarbonData have SATA disks, which cost less than the commercial columnar DB servers. + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | +| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2.Query SQL model + +The query SQL of the commercial columnar DB differs from that of CarbonData, so the SQL has to be modified before the performance test is run. + +```Query SQL model of the commercial columnar DB:``` + +SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS "TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC + +Each SUM is referred to below as a counter. + +```Spark query SQL model:``` + +SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND `TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 5000 Review comment: done
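One visible difference between the two SQL models is the outer aggregate: the commercial DB query uses `SUM(COALESCE(x, 0))`, while the Spark query uses `COALESCE(SUM(x), 0)`. Under standard SQL NULL semantics (`SUM` skips NULLs and returns NULL when it sees no non-NULL value), the two forms agree for every non-empty group, which is all a `GROUP BY` produces; they differ only on empty input, where the first is NULL and the second is 0. The small Java model below illustrates this; the class and method names are illustrative only and do not come from either engine.

```java
import java.util.List;

// Illustrative model of SQL SUM semantics over a nullable numeric column.
final class CoalesceSum {
  /** SQL SUM(x): ignores NULLs; returns NULL if no non-NULL value was seen. */
  static Long sqlSum(List<Long> values) {
    Long total = null;
    for (Long v : values) {
      if (v != null) {
        total = (total == null) ? v : total + v;
      }
    }
    return total;
  }

  /** SUM(COALESCE(x, 0)): replace each NULL by 0, then aggregate. */
  static Long sumOfCoalesce(List<Long> values) {
    Long total = null;
    for (Long v : values) {
      long x = (v == null) ? 0L : v;
      total = (total == null) ? x : total + x;
    }
    return total; // still NULL when the input has no rows at all
  }

  /** COALESCE(SUM(x), 0): aggregate first, then replace a NULL result by 0. */
  static Long coalesceOfSum(List<Long> values) {
    Long s = sqlSum(values);
    return (s == null) ? 0L : s;
  }
}
```

For a group holding `3, NULL, 4` both forms give 7, and for an all-NULL group both give 0.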
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362135432 ## File path: docs/zh_cn/CarbonData与商业列存DB性能对比.md ## @@ -0,0 +1,115 @@ + + +## Performance comparison between CarbonData and a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed over a commercial columnar DB during the comparison, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Test environment comparison + +During queries, the commercial columnar DB used one query node with SSD disks. CarbonData used 6 DataNodes with SATA disks, but the query queue was allocated 1/6 of the resources, which is equivalent to comparing the query performance of 1 commercial DB server against 1 CarbonData server. In addition, the servers used by CarbonData have SATA disks, which cost less than the commercial columnar DB servers. + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | +| Hadoop cluster | 2 NameNodes, 6 DataNodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2.Query SQL model + +The query SQL of the commercial columnar DB differs from that of CarbonData, so the SQL has to be modified before the performance test is run. + +```Query SQL model of the commercial columnar DB:``` + +SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS "TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC + +Each SUM is referred to below as a counter. + +```Spark query SQL model:``` + +SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND `TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 5000 + +## 3.Main CarbonData configuration parameters + +```Main configuration``` + +| Main CarbonData configuration | Value | Description | +| --- | --- | --- | +| carbon.inmemory.record.size | 48 | Total number of rows of each table that a query needs to load into memory. | +| carbon.number.of.cores | 4 | Number of threads used for parallel scanning during a carbon query. | +| carbon.number.of.cores.while.loading | 15 | Number of threads used for parallel scanning during carbon data loading. | +| carbon.sort.file.buffer.size | 20 | Total buffer size used to store each temporary intermediate file during merge-sort (read/write) operations, in MB. | +| carbon.sort.size | 50 | Number of records sorted at a time during data loading. | +| Main Spark configuration | | | +| spark.sql.shuffle.partitions | 70 | | Review comment: done
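The tuning values in the configuration table above would normally live in CarbonData's `carbon.properties` file and in the Spark configuration (for `spark.sql.shuffle.partitions`). As a hedged illustration only, this Java sketch collects the same key/value pairs into `java.util.Properties` and renders them in `.properties` syntax. The keys and values are copied from the table; the class `BenchmarkTuning` and its methods are hypothetical, and where each file is placed depends on the deployment.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: gathers the tuning values quoted in the doc table.
final class BenchmarkTuning {
  /** CarbonData settings from the table (would typically go in carbon.properties). */
  static Properties carbonProperties() {
    Properties p = new Properties();
    p.setProperty("carbon.inmemory.record.size", "48");          // rows per table loaded into memory for a query
    p.setProperty("carbon.number.of.cores", "4");                // parallel scan threads during queries
    p.setProperty("carbon.number.of.cores.while.loading", "15"); // parallel threads during data loading
    p.setProperty("carbon.sort.file.buffer.size", "20");         // MB of buffer per temporary sort file
    p.setProperty("carbon.sort.size", "50");                     // records sorted per batch while loading
    return p;
  }

  /** Spark setting from the table (e.g. spark-defaults.conf or --conf). */
  static Properties sparkProperties() {
    Properties p = new Properties();
    p.setProperty("spark.sql.shuffle.partitions", "70");
    return p;
  }

  /** Renders properties in .properties file syntax (preceded by a date comment). */
  static String render(Properties p) {
    StringWriter out = new StringWriter();
    try {
      p.store(out, null);
    } catch (IOException e) {
      // StringWriter does not throw, but store() declares IOException.
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }
}
```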
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#issuecomment-569846648 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1363/
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r362134167 ## File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md ## @@ -0,0 +1,111 @@ + + +## Query performance comparison: CarbonData replacing a commercial columnar DB + +This document mainly presents to users the query performance improvement CarbonData showed while replacing a commercial columnar DB, together with CarbonData's own strengths and characteristics. The data in this document are only the query results of SQL based on the query characteristics of one particular domain, and represent a performance comparison under those specific query characteristics only. + + + + + +## 1.Cluster status comparison + +| Cluster | Description | +| --- | --- | +| Commercial columnar DB cluster | 3 nodes, SSD disks | Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569723184 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1372/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569723185 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1381/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569713060 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1383/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569699604 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1382/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569695697 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1371/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569694886 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1370/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569693908 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1362/
[jira] [Closed] (CARBONDATA-3625) Make stage files queryable
[ https://issues.apache.org/jira/browse/CARBONDATA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li closed CARBONDATA-3625. Resolution: Won't Fix > Make stage files queryable > Key: CARBONDATA-3625 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3625 > Project: CarbonData > Issue Type: New Feature > Reporter: Jacky Li > Priority: Major > Fix For: 2.0.0 > Time Spent: 20m > Remaining Estimate: 0h > Stage files are data files written by external applications such as Flink. > These files are committed but have not yet been loaded into the table. > This PR adds a configuration to include them in the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
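The issue describes an option to make committed-but-unloaded stage files visible to queries. A hypothetical Java sketch of that selection logic follows: `ScanInputs`, `filesToScan`, and the `includeStageFiles` flag are invented for illustration. The issue was closed as Won't Fix, so this should not be read as an existing CarbonData option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of choosing the input files for a table scan.
final class ScanInputs {
  /**
   * Returns the files a query should read: always the loaded segment files,
   * plus committed-but-not-yet-loaded stage files when the flag is enabled.
   */
  static List<String> filesToScan(List<String> segmentFiles,
                                  List<String> stageFiles,
                                  boolean includeStageFiles) {
    List<String> result = new ArrayList<>(segmentFiles);
    if (includeStageFiles) {
      result.addAll(stageFiles);
    }
    return Collections.unmodifiableList(result);
  }
}
```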
[GitHub] [carbondata] jackylk closed pull request #3519: [CARBONDATA-3625] Make stage input queryable
jackylk closed pull request #3519: [CARBONDATA-3625] Make stage input queryable URL: https://github.com/apache/carbondata/pull/3519
[GitHub] [carbondata] akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
akashrn5 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569683859 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569682742 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1361/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
CarbonDataQA1 commented on issue #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3548#issuecomment-569682663 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1360/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569680532 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1380/
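Going only by the title of #3541, the bug is that a timeseries query fails to match the datamap when its granularity keyword differs in case from the datamap definition. A hedged Java sketch of a case-insensitive match: `GranularityMatcher` and the example keywords are assumptions for illustration, not CarbonData's code.

```java
import java.util.Locale;

// Hypothetical helper: decides whether a query's granularity matches a datamap's.
final class GranularityMatcher {
  /** Normalizes a granularity keyword such as "HOUR", "Hour" or "hour". */
  static String normalize(String granularity) {
    // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotted/dotless i).
    return granularity.trim().toLowerCase(Locale.ROOT);
  }

  /** True when both granularities name the same unit, ignoring case and padding. */
  static boolean matches(String queryGranularity, String datamapGranularity) {
    return normalize(queryGranularity).equals(normalize(datamapGranularity));
  }
}
```

Using a fixed locale keeps the comparison stable regardless of the JVM's default locale.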
[GitHub] [carbondata] ajantha-bhat commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
ajantha-bhat commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569678506 retest this please
[GitHub] [carbondata] akashrn5 opened a new pull request #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
akashrn5 opened a new pull request #3548: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3548 ### Why is this PR needed? The code that cleans up stale datamap folders during session initialization causes a problem: if another user tries to access the datamap table, a permission exception may be thrown. ### What changes were proposed in this PR? Cleanup is needed only when the table does not exist in the Hive metastore but its schema still exists. If the table exists but belongs to another user, accessing it throws an exception; in that case there is nothing to clean up, so the exception is caught and initialization continues. This PR makes that change. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No (not required: the fix only concerns the cleanup issue and was tested in a cluster)
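The cleanup rule described in #3548 can be sketched as follows, assuming a simplified metastore abstraction. `Metastore`, `tableExists`, and `StaleFolderCleaner` are hypothetical names (not CarbonData or Hive APIs), and `SecurityException` stands in for whatever permission error the real metastore raises. A datamap folder is marked stale only when its table is missing from the metastore; an access failure is treated as "the table exists but belongs to another user", so that table is skipped instead of aborting the whole cleanup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metastore abstraction used only for this sketch.
interface Metastore {
  /** Returns whether the table exists; throws SecurityException if access is denied. */
  boolean tableExists(String tableName);
}

final class StaleFolderCleaner {
  /**
   * Returns the datamap tables whose folders should be cleaned: those with a
   * schema on disk but no entry in the metastore. Tables the current user
   * cannot access are skipped rather than failing the whole cleanup.
   */
  static List<String> tablesToClean(Metastore metastore, List<String> tablesWithSchema) {
    List<String> stale = new ArrayList<>();
    for (String table : tablesWithSchema) {
      try {
        if (!metastore.tableExists(table)) {
          stale.add(table);
        }
      } catch (SecurityException e) {
        // Table exists but belongs to another user: nothing to clean up.
      }
    }
    return stale;
  }
}
```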
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569677750 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1369/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569676801 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1379/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569674951 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1368/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569669234 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1378/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive
CarbonDataQA1 commented on issue #3541: [CARBONDATA-3636]Timeseries query is not hitting datamap if granularity in query is given case insensitive URL: https://github.com/apache/carbondata/pull/3541#issuecomment-569666573 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1359/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569665240 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1366/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569664577 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1367/
[GitHub] [carbondata] akashrn5 closed pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
akashrn5 closed pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3547
[jira] [Updated] (CARBONDATA-3600) Fix creating mv timeseries UDF column as partition column
[ https://issues.apache.org/jira/browse/CARBONDATA-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Indhumathi Muthumurugesh updated CARBONDATA-3600:
Description:
Problem:
Issue 1: Creating a datamap with a partition column in a timeseries UDF throws an exception.
Issue 2: If the JDBC application is killed while CREATE DATAMAP is in progress, a "datamap table not found" exception is thrown after restart.
> Fix creating mv timeseries UDF column as partition column
>
> Key: CARBONDATA-3600
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3600
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthumurugesh
> Priority: Minor
> Fix For: 2.0.0
> Time Spent: 4h 10m
> Remaining Estimate: 0h
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] akashrn5 opened a new pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table
akashrn5 opened a new pull request #3547: [CARBONDATA-3600]Fix the cleanup failure issue if user fails to access table URL: https://github.com/apache/carbondata/pull/3547
### Why is this PR needed?
### What changes were proposed in this PR?
### Does this PR introduce any user interface change?
- No
### Is any new testcase added?
- No (not required, as it was tested in a cluster and the fix is for a cleanup issue)
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569658559 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1358/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569654186 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1357/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569651022 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1377/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569650060 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1355/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment
CarbonDataQA1 commented on issue #3474: [CARBONDATA-3592] Fix query on bloom in case of multiple data files in one segment URL: https://github.com/apache/carbondata/pull/3474#issuecomment-569649738 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1356/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569649247 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1363/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569645767 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1376/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569644106 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1365/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569642383 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1364/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569641629 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1374/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios
CarbonDataQA1 commented on issue #3483: [CARBONDATA-3597] Support Merge for SCD and CCD scenarios URL: https://github.com/apache/carbondata/pull/3483#issuecomment-569630481 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1354/
[GitHub] [carbondata] jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#discussion_r361940701
## File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -374,17 +365,32 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
       sparkSession: SparkSession,
       carbonLoadModel: CarbonLoadModel,
       carbonMergerMapping: CarbonMergerMapping): Array[(String, Boolean)] = {
+    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
     val splits = splitsOfSegments(
       sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+      carbonTable,
       carbonMergerMapping.validSegments)
-    val dataFrame = DataLoadProcessBuilderOnSpark.createInputDataFrame(
-      sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
-      splits.asScala)
+    val dataFrame = try {
+      // segments to be compacted are set in the threadset() in carbon session, and unset in the end
Review comment:
please add it in the comment in code
[GitHub] [carbondata] asfgit closed pull request #3514: [FAQ]add faq for how to deal with trailing task
asfgit closed pull request #3514: [FAQ]add faq for how to deal with trailing task URL: https://github.com/apache/carbondata/pull/3514
[GitHub] [carbondata] jackylk commented on issue #3514: [FAQ]add faq for how to deal with trailing task
jackylk commented on issue #3514: [FAQ]add faq for how to deal with trailing task URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569627377 I canceled CI for this PR since it only modifies documentation. LGTM, merging this PR.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task
CarbonDataQA1 commented on issue #3514: [FAQ]add faq for how to deal with trailing task URL: https://github.com/apache/carbondata/pull/3514#issuecomment-569627248 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1375/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569626224 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1353/
[GitHub] [carbondata] asfgit closed pull request #3502: [CARBONATA-3605] Remove global dictionary feature
asfgit closed pull request #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502
[GitHub] [carbondata] QiangCai commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature
QiangCai commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569618332 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569617569 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1362/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569617458 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1373/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert. URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569616794 Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1352/
[jira] [Commented] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map
[ https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005201#comment-17005201 ]
Zhichao Zhang commented on CARBONDATA-3631:
[~shenhong] please raise a pr to fix this, thanks.
> StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map
>
> Key: CARBONDATA-3631
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3631
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 1.6.1, 2.0.0
> Reporter: Xingjun Hao
> Priority: Minor
> Fix For: 2.0.0
>
> sql("insert into datatype_array_parquet values(array())")
> sql("insert into datatype_array_carbondata select f from datatype_array_parquet")
>
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>   at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935)
>   at java.lang.StringBuilder.substring(StringBuilder.java:76)
>   at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
>   at org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77)
>   at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
> {code}
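The trace shows `FieldConverter.objectToString` calling `StringBuilder.substring` with an end index of -1. One plausible reading (an inference from the trace, not from the `FieldConverter` source) is that the converter appends each element followed by a delimiter and then trims the trailing delimiter, which underflows when the array is empty. Below is a minimal Scala sketch of that failure mode and one possible guard; `EmptyArrayToStringSketch`, `naiveJoin`, `guardedJoin`, the "," delimiter, and mapping an empty array to "" are all hypothetical.

```scala
object EmptyArrayToStringSketch {
  // Naive join (hypothetical): append each element plus the delimiter, then drop
  // the trailing delimiter. For an empty input the builder is empty, so the end
  // index is 0 - delimiter.length; with a one-character delimiter that is
  // substring(0, -1), which throws StringIndexOutOfBoundsException (-1).
  def naiveJoin(values: Seq[Any], delimiter: String): String = {
    val builder = new StringBuilder
    values.foreach(v => builder.append(v).append(delimiter))
    builder.substring(0, builder.length - delimiter.length)
  }

  // Guarded variant: short-circuit the empty case before trimming.
  // Assumption: an empty array/map is represented as an empty string.
  def guardedJoin(values: Seq[Any], delimiter: String): String = {
    if (values.isEmpty) {
      ""
    } else {
      val builder = new StringBuilder
      values.foreach(v => builder.append(v).append(delimiter))
      builder.substring(0, builder.length - delimiter.length)
    }
  }

  def main(args: Array[String]): Unit = {
    println(guardedJoin(Seq(1, 2, 3), ",")) // 1,2,3
    println(guardedJoin(Seq.empty, ","))    // prints an empty line
    // naiveJoin(Seq.empty, ",") would throw StringIndexOutOfBoundsException
  }
}
```

A simpler design is `values.mkString(delimiter)`, which never indexes into the buffer and already returns "" for an empty input.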
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#discussion_r361927270
## File path: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
##
@@ -442,7 +442,8 @@ object DataLoadProcessBuilderOnSpark {
       .map { row =>
         new GenericRow(row.getData.asInstanceOf[Array[Any]])
Review comment:
Due to the conflict between data types (long in carbonScanRDD and Timestamp in the schema), we cannot use the existing API with the RDD.
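The review comment describes a type mismatch: the scan yields timestamp columns as `long` values while the schema declares `Timestamp`. A minimal Scala sketch of aligning such a row to the schema before building rows, assuming the `long` holds microseconds since the epoch (Spark's internal timestamp encoding; the unit CarbonData actually returns at this point is not stated in the thread). `TimestampRowSketch`, `ColumnKind`, `alignRow`, and `microsToTimestamp` are hypothetical names, not CarbonData APIs, and this is not presented as the PR's fix.

```scala
import java.sql.Timestamp

object TimestampRowSketch {
  // Hypothetical column-type tags standing in for the table schema.
  sealed trait ColumnKind
  case object TimestampColumn extends ColumnKind
  case object OtherColumn extends ColumnKind

  // Assumption: the raw value is microseconds since the epoch. Split into whole
  // seconds plus the microsecond remainder (floor semantics, so negative values
  // work too) and store the remainder in the nanos field to keep full precision.
  def microsToTimestamp(micros: Long): Timestamp = {
    val seconds = Math.floorDiv(micros, 1000000L)
    val microsOfSecond = Math.floorMod(micros, 1000000L)
    val ts = new Timestamp(seconds * 1000L)
    ts.setNanos((microsOfSecond * 1000L).toInt)
    ts
  }

  // Convert Long values in timestamp columns to java.sql.Timestamp; every other
  // value (including nulls) is passed through unchanged.
  def alignRow(raw: Array[Any], schema: Array[ColumnKind]): Array[Any] = {
    require(raw.length == schema.length, "row and schema must have the same width")
    raw.zip(schema).map {
      case (micros: Long, TimestampColumn) => microsToTimestamp(micros)
      case (value, _)                      => value
    }
  }
}
```

Converting per value keeps the rest of the row untouched, so only columns the schema marks as timestamps change type.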
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column URL: https://github.com/apache/carbondata/pull/3515#discussion_r361925792
## File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -374,17 +365,32 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
       sparkSession: SparkSession,
       carbonLoadModel: CarbonLoadModel,
       carbonMergerMapping: CarbonMergerMapping): Array[(String, Boolean)] = {
+    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
     val splits = splitsOfSegments(
       sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+      carbonTable,
       carbonMergerMapping.validSegments)
-    val dataFrame = DataLoadProcessBuilderOnSpark.createInputDataFrame(
-      sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
-      splits.asScala)
+    val dataFrame = try {
+      // segments to be compacted are set in the threadset() in carbon session, and unset in the end
Review comment:
During custom compaction, all segments might otherwise be taken into consideration. To avoid this, the segments to be compacted are set explicitly here.
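The diff comment and the reply describe a set-then-unset pattern: the segments being compacted are placed in thread-level session state before the input DataFrame is built, so custom compaction reads only those segments, and the setting is removed at the end. A minimal Scala sketch of that pattern using `try`/`finally` over a plain `ThreadLocal`; `SegmentScopeSketch` and its methods are hypothetical and do not reproduce CarbonData's session API. The key in the usage note follows the `carbon.input.segments.<db>.<table>` naming CarbonData documents for segment-restricted queries; whether the compactor uses exactly that key is an assumption.

```scala
object SegmentScopeSketch {
  // Hypothetical per-thread parameter store standing in for the session's
  // thread-level settings.
  private val threadParams = new ThreadLocal[Map[String, String]] {
    override protected def initialValue(): Map[String, String] = Map.empty
  }

  def set(key: String, value: String): Unit =
    threadParams.set(threadParams.get + (key -> value))

  def unset(key: String): Unit =
    threadParams.set(threadParams.get - key)

  def get(key: String): Option[String] = threadParams.get.get(key)

  // Restrict reads to `segments` only while `body` runs; the finally block
  // removes the setting even if `body` throws, so later work on this thread
  // does not silently inherit the restriction.
  def withSegments[T](key: String, segments: Seq[String])(body: => T): T = {
    set(key, segments.mkString(","))
    try body
    finally unset(key)
  }
}
```

Usage, with a hypothetical table `default.t1`: `SegmentScopeSketch.withSegments("carbon.input.segments.default.t1", Seq("0", "1")) { /* build the input DataFrame here */ }`. Putting the unset in `finally` matters because a compaction failure would otherwise leave the segment filter set on a pooled thread.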
[jira] [Commented] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map
[ https://issues.apache.org/jira/browse/CARBONDATA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005190#comment-17005190 ]
Hong Shen commented on CARBONDATA-3631:
I have fixed it in my local branch. If needed, I can add a patch to fix it.