[jira] [Updated] (CARBONDATA-3610) Drop old timeseries datamap feature
[ https://issues.apache.org/jira/browse/CARBONDATA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3610: - Fix Version/s: 2.0.0 > Drop old timeseries datamap feature > --- > > Key: CARBONDATA-3610 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3610 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3610) Drop old timeseries datamap feature
[ https://issues.apache.org/jira/browse/CARBONDATA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3610. -- Resolution: Fixed > Drop old timeseries datamap feature > --- > > Key: CARBONDATA-3610 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3610 > Project: CarbonData > Issue Type: Sub-task > Reporter: Jacky Li > Priority: Major >
[jira] [Resolved] (CARBONDATA-3609) Drop preaggregate datamap feature
[ https://issues.apache.org/jira/browse/CARBONDATA-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3609. -- Resolution: Fixed > Drop preaggregate datamap feature > - > > Key: CARBONDATA-3609 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3609 > Project: CarbonData > Issue Type: Sub-task > Reporter: Jacky Li > Priority: Major > Fix For: 2.0.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h >
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3530: [WIP]Fix Select query failure on aggregation of same column on MV
Indhumathi27 opened a new pull request #3530: [WIP]Fix Select query failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [carbondata] asfgit closed pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
asfgit closed pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522
[GitHub] [carbondata] ajantha-bhat commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
ajantha-bhat commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568681736 LGTM
[jira] [Resolved] (CARBONDATA-3582) Support backup table status file before overwrite
[ https://issues.apache.org/jira/browse/CARBONDATA-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3582. -- Resolution: Fixed > Support backup table status file before overwrite > - > > Key: CARBONDATA-3582 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3582 > Project: CarbonData > Issue Type: Improvement > Reporter: Jacky Li > Priority: Major > Fix For: 2.0.0 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > When overwriting the table status file, if the process crashes, the table status file > will be left in a corrupted state. This can happen in an unstable environment, such as > in the cloud. To prevent table corruption, the user can enable a newly added > CarbonProperty to back up the table status file before overwriting it.
[GitHub] [carbondata] asfgit closed pull request #3459: [CARBONDATA-3582] support table status file backup
asfgit closed pull request #3459: [CARBONDATA-3582] support table status file backup URL: https://github.com/apache/carbondata/pull/3459
[GitHub] [carbondata] ajantha-bhat commented on issue #3459: [CARBONDATA-3582] support table status file backup
ajantha-bhat commented on issue #3459: [CARBONDATA-3582] support table status file backup URL: https://github.com/apache/carbondata/pull/3459#issuecomment-568675248 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568675312 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1280/
[GitHub] [carbondata] jackylk commented on a change in pull request #3459: [CARBONDATA-3582] support table status file backup
jackylk commented on a change in pull request #3459: [CARBONDATA-3582] support table status file backup URL: https://github.com/apache/carbondata/pull/3459#discussion_r361083964

## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
## @@ -525,45 +540,78 @@ private static Integer compareDateValues(Long loadValue, Long userValue) {
   }

   /**
-   * writes load details into a given file at @param dataLoadLocation
+   * Backup the table status file as 'tablestatus.backup' in the same path
    *
-   * @param dataLoadLocation
-   * @param listOfLoadFolderDetailsArray
-   * @throws IOException
+   * @param tableStatusPath table status file path
    */
-  public static void writeLoadDetailsIntoFile(String dataLoadLocation,
+  private static void backupTableStatus(String tableStatusPath) throws IOException {
+    CarbonFile file = FileFactory.getCarbonFile(tableStatusPath);
+    if (file.exists()) {
+      String backupPath = tableStatusPath + ".backup";
+      String currentContent = readFileAsString(tableStatusPath);
+      if (currentContent != null) {
+        writeStringIntoFile(backupPath, currentContent);
+      }
+    }
+  }
+
+  /**
+   * writes load details to specified path
+   *
+   * @param tableStatusPath path of the table status file
+   * @param listOfLoadFolderDetailsArray segment metadata
+   * @throws IOException if IO errors
+   */
+  public static void writeLoadDetailsIntoFile(
+      String tableStatusPath,
       LoadMetadataDetails[] listOfLoadFolderDetailsArray) throws IOException {
-    AtomicFileOperations fileWrite =
-        AtomicFileOperationFactory.getAtomicFileOperations(dataLoadLocation);
+    // When overwriting table status file, if process crashed, table status file
+    // will be in corrupted state. This can happen in an unstable environment,
+    // like in the cloud. To prevent the table corruption, user can enable following
+    // property to enable backup of the table status before overwriting it.
+    if (tableStatusPath.endsWith(CarbonTablePath.TABLE_STATUS_FILE) &&
+        CarbonProperties.isEnableTableStatusBackup()) {
+      backupTableStatus(tableStatusPath);
+    }
+    String content = new Gson().toJson(listOfLoadFolderDetailsArray);
+    mockForTest();
+    // make the table status file smaller by removing fields that are default value
+    for (LoadMetadataDetails loadMetadataDetails : listOfLoadFolderDetailsArray) {
+      loadMetadataDetails.removeUnnecessaryField();
+    }
+    // If process crashed during following write, table status file need to be
+    // manually recovered.
+    writeStringIntoFile(tableStatusPath, content);
+  }
+
+  // a dummy func for mocking in testcase, which simulates IOException
+  private static void mockForTest() throws IOException {

Review comment: There is one other place where we add a util for tests, like TestUtil.java in the SDK module.
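The backup-before-overwrite flow in the diff above can be sketched in plain Java. This is a minimal illustration of the pattern using `java.nio.file` instead of CarbonData's `CarbonFile`/`FileFactory` API; the class name `TableStatusWriter` and method `writeWithBackup` are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TableStatusWriter {
  /**
   * Copies the current status file to "<name>.backup" before overwriting it,
   * so that a crash during the write leaves a recoverable copy on disk.
   */
  public static void writeWithBackup(Path statusFile, String newContent,
      boolean backupEnabled) throws IOException {
    if (backupEnabled && Files.exists(statusFile)) {
      // back up the old content first; if the write below crashes,
      // the .backup file survives and can be restored manually
      Path backup = statusFile.resolveSibling(statusFile.getFileName() + ".backup");
      Files.copy(statusFile, backup, StandardCopyOption.REPLACE_EXISTING);
    }
    // a crash during this write corrupts statusFile, but not the backup
    Files.write(statusFile, newContent.getBytes());
  }
}
```

The design trade-off matches the PR discussion: the write itself is still not atomic, so recovery from `.backup` after a crash is a manual step.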
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568674344 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1269/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568663426 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1259/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type
CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type URL: https://github.com/apache/carbondata/pull/3529#issuecomment-568658809 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1258/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type
CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type URL: https://github.com/apache/carbondata/pull/3529#issuecomment-568654019 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1268/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type
CarbonDataQA1 commented on issue #3529: [CARBONDATA-3628] Support alter hive table add complex column type URL: https://github.com/apache/carbondata/pull/3529#issuecomment-568652322 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1279/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568642477 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1267/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568642476 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1278/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
CarbonDataQA1 commented on issue #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#issuecomment-568642475 Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1257/
[GitHub] [carbondata] IceMimosa opened a new pull request #3529: [CARBONDATA-3628] Support alter hive table add complex column type
IceMimosa opened a new pull request #3529: [CARBONDATA-3628] Support alter hive table add complex column type
URL: https://github.com/apache/carbondata/pull/3529

Tips:
* Complex type only supports default, see `DataTypeUtil#valueOf`
* Fix compile error

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach the test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [carbondata] jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#discussion_r361052278

## File path: README.md
## @@ -58,8 +58,6 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
 * [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
 * [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
 * [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
- * [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
- * [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)

Review comment: searched all places and fixed
[jira] [Created] (CARBONDATA-3628) Alter hive table add complex column type
ChenKai created CARBONDATA-3628:
---
Summary: Alter hive table add complex column type
Key: CARBONDATA-3628
URL: https://issues.apache.org/jira/browse/CARBONDATA-3628
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.6.0
Reporter: ChenKai

ERROR: NullPointerException
{code:java}
alter table alter_hive add columns (var map)
{code}
Tips: Complex type only supports default, see *DataTypeUtil#valueOf*
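For context on the NPE above: when ALTER TABLE ADD COLUMNS supplies no default value for the new column, the per-type default lookup must return something non-null for complex types too. A minimal sketch of such a lookup, assuming a hypothetical `DefaultValueUtil` class (this is not CarbonData's actual `DataTypeUtil#valueOf`; the type-name strings here are illustrative):

```java
import java.util.Collections;
import java.util.Locale;

public class DefaultValueUtil {
  /**
   * Returns a non-null default value for a newly added column of the given
   * type name, so ALTER TABLE ... ADD COLUMNS does not hit a
   * NullPointerException when no default is supplied. Complex types get an
   * empty default, matching the "only supports default" restriction.
   */
  public static Object valueOf(String dataType) {
    switch (dataType.toLowerCase(Locale.ROOT)) {
      case "int": return 0;
      case "string": return "";
      case "map": return Collections.emptyMap();     // e.g. map<string, string>
      case "array": return Collections.emptyList();  // e.g. array<int>
      default:
        throw new IllegalArgumentException("unsupported data type: " + dataType);
    }
  }
}
```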
[GitHub] [carbondata] jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#discussion_r361050278

## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDropTableCommand.scala
## @@ -69,7 +69,7 @@ case class CarbonDropTableCommand(
         CarbonLockUtil.getLockObject(identifier, lock)
       }
       // check for directly drop datamap table
-      if (carbonTable.isChildTable && !dropChildTable) {
+      if (carbonTable.isChildTableForMV && !dropChildTable) {

Review comment: fixed
[GitHub] [carbondata] jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#discussion_r361050057

## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
## @@ -33,7 +32,6 @@
 import org.apache.carbondata.core.datamap.{DataMapProvider, DataMapStoreManager}
 import org.apache.carbondata.core.datamap.status.DataMapStatusManager
 import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage}
 import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier
-import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl

Review comment: This will break compatibility; better not to delete them.
[GitHub] [carbondata] jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap
jackylk commented on a change in pull request #3522: [CARBONDATA-3609][CARBONDATA-3610] Remove preaggregate and timeseries datamap URL: https://github.com/apache/carbondata/pull/3522#discussion_r361050163

## File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
## @@ -116,54 +116,35 @@
     SegmentStatusManager segmentStatusManager =
         new SegmentStatusManager(identifier, readCommittedScope.getConfiguration());
     SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
-        .getValidAndInvalidSegments(carbonTable.isChildTable(), loadMetadataDetails,
+        .getValidAndInvalidSegments(carbonTable.isChildTableForMV(), loadMetadataDetails,
             this.readCommittedScope);
-    // to check whether only streaming segments access is enabled or not,
-    // if access streaming segment is true then data will be read from streaming segments
-    boolean accessStreamingSegments = getAccessStreamingSegments(job.getConfiguration());
-    if (getValidateSegmentsToAccess(job.getConfiguration())) {
-      if (!accessStreamingSegments) {
-        List validSegments = segments.getValidSegments();
-        streamSegments = segments.getStreamSegments();
-        streamSegments = getFilteredSegment(job, streamSegments, readCommittedScope);
-        if (validSegments.size() == 0) {
-          return getSplitsOfStreaming(job, streamSegments, carbonTable);
-        }
-        List filteredSegmentToAccess =
-            getFilteredSegment(job, segments.getValidSegments(), readCommittedScope);
-        if (filteredSegmentToAccess.size() == 0) {
-          return getSplitsOfStreaming(job, streamSegments, carbonTable);
-        } else {
-          setSegmentsToAccess(job.getConfiguration(), filteredSegmentToAccess);
-        }
-      } else {
-        List filteredNormalSegments =
-            getFilteredNormalSegments(segments.getValidSegments(),

Review comment: fixed
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568530905 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1277/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568529471 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1266/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568511481 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1256/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#issuecomment-568505672 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1264/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568471327 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1276/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568471258 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1265/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query
CarbonDataQA1 commented on issue #3528: [WIP] update should support limit 1 sub query URL: https://github.com/apache/carbondata/pull/3528#issuecomment-568467247 Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1255/
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3528: [WIP] update should support limit 1 sub query
ajantha-bhat opened a new pull request #3528: [WIP] update should support limit 1 sub query
URL: https://github.com/apache/carbondata/pull/3528

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#issuecomment-568454964 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1275/
[jira] [Updated] (CARBONDATA-3519) Optimizations in write step to avoid unnecessary memory blk allocation/free
[ https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venugopal Reddy K updated CARBONDATA-3519: -- Summary: Optimizations in write step to avoid unnecessary memory blk allocation/free (was: A new column page MemoryBlock is allocated at each row addition to table page if having string column with local dictionary enabled.) > Optimizations in write step to avoid unnecessary memory blk allocation/free > --- > > Key: CARBONDATA-3519 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3519 > Project: CarbonData > Issue Type: Improvement > Components: core > Reporter: Venugopal Reddy K > Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > >
> *Issue 1:*
> *Context:* For a string column with local dictionary enabled, a column page of `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created for the `encodedPage`, along with the regular `actualPage` of `UnsafeVarLengthColumnPage`. `UnsafeFixLengthColumnPage` has a `capacity` field that indicates the size of the `memoryBlock` allocated for the page. While rows are added, `ensureMemory()` checks whether `totalLength + requestSize > capacity`; if there is no room for the next row, it allocates a new memoryBlock, copies the previous rows into it, and frees the old memoryBlock.
> *Problem:* When the `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created for the `encodedPage`, the `capacity` field is never assigned the allocated memory block size. Hence, on every row added to the table page, the `ensureMemory()` check fails: a new column page memoryBlock is allocated, the previous rows are copied, and the old memoryBlock is freed. This allocation of a new memoryBlock and free of the old one happens on every single row addition for string columns with local dictionary.
>
> *Issue 2:*
> *Context:* In `VarLengthColumnPageBase`, a `rowOffset` column page of `UnsafeFixLengthColumnPage` with datatype `INT` maintains the data offset to each row of the variable-length columns. This `rowOffset` page is allocated to be the size of the page.
> *Problem:* If a page holds 10 rows, its rowOffset page needs 11 entries, because 0 is always stored as the offset of the first row, so one additional entry is required (see the code below). Otherwise the `ensureMemory()` check always fails for the last row of data (the 10th row in this case), allocating a new rowOffset memoryBlock, copying the previous rows, and freeing the old memoryBlock. This can happen for string columns with local dictionary, direct dictionary columns, and global dictionary columns.
>
> {code:java}
> public abstract class VarLengthColumnPageBase extends ColumnPage {
>   ...
>   @Override
>   public void putBytes(int rowId, byte[] bytes) {
>     ...
>     if (rowId == 0) {
>       rowOffset.putInt(0, 0); // the offset of the 1st row is 0
>     }
>     rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
>     putBytesAtRow(rowId, bytes);
>     totalLength += bytes.length;
>   }
>   ...
> }
> {code}
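Issue 1 above can be sketched with a toy model of the capacity/`ensureMemory()` interaction: if the `capacity` field is never assigned after the block is allocated, every `ensureMemory()` check sees "no room" and re-allocates on each row. The names below (`SimplePage`, `reallocCount`) are illustrative only, not CarbonData's actual API, and growth is exact-fit for simplicity.

```java
// Toy model of the bug in CARBONDATA-3519, issue 1. Hypothetical names;
// not CarbonData's real classes.
public class SimplePage {
    private long capacity;      // size of the allocated memory block
    private long totalLength;   // bytes written so far
    private int reallocCount;   // how many times the block was re-allocated

    SimplePage(long initialCapacity, boolean assignCapacity) {
        // The reported bug: the encoded page allocated a block but never
        // assigned `capacity`, so it stayed 0.
        this.capacity = assignCapacity ? initialCapacity : 0;
    }

    void ensureMemory(long requestSize) {
        if (totalLength + requestSize > capacity) {
            // real code: allocate a new block, copy previous rows, free old block
            capacity = totalLength + requestSize;  // exact-fit growth for the sketch
            reallocCount++;
        }
    }

    void putBytes(byte[] bytes) {
        ensureMemory(bytes.length);
        totalLength += bytes.length;
    }

    int getReallocCount() { return reallocCount; }

    public static void main(String[] args) {
        SimplePage buggy = new SimplePage(1024, false); // capacity left at 0
        SimplePage fixed = new SimplePage(1024, true);  // capacity assigned
        byte[] row = new byte[8];
        for (int i = 0; i < 100; i++) { buggy.putBytes(row); fixed.putBytes(row); }
        System.out.println("buggy reallocs: " + buggy.getReallocCount()); // 100
        System.out.println("fixed reallocs: " + fixed.getReallocCount()); // 0
    }
}
```

Running the sketch shows the buggy page re-allocating once per row (100 times for 100 rows) while the fixed page, whose `capacity` reflects the 1024-byte allocation, never re-allocates.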
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
CarbonDataQA1 commented on issue #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#issuecomment-568431434 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1254/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list URL: https://github.com/apache/carbondata/pull/3481#issuecomment-568430106 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1274/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list URL: https://github.com/apache/carbondata/pull/3481#issuecomment-568428532 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1263/
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r360832593 ## File path: docs/zh_cn/SybaseIQ和CarbonData查询性能对比.md ## @@ -0,0 +1,109 @@ + + +## Query performance comparison: CarbonData replacing Sybase IQ Review comment: done
[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata
MarvinLitt commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata URL: https://github.com/apache/carbondata/pull/3521#discussion_r360832602 ## File path: docs/zh_cn/SybaseIQ和CarbonData查询性能对比.md ## @@ -0,0 +1,109 @@ + + +## Query performance comparison: CarbonData replacing Sybase IQ + +This document presents CarbonData's query performance relative to Sybase IQ during a migration from Sybase IQ, together with CarbonData's own strengths and characteristics. The figures here are SQL query results from one particular domain's query patterns only, and represent a performance comparison for that specific query pattern alone. + +## 1. Cluster comparison + +| Cluster | Description | +| -- | -- | +| IQ cluster | 1 load node, 1 coordinator node, 1 query node, SSD disks, disk array | +| Hadoop cluster | 2 namenodes, 6 datanodes, SATA disks, query queue allocated 1/6 of the resources | + +## 2. Query SQL models + +The IQ and Carbon query SQL differ, so the SQL must be modified before running the performance tests. + +```IQ query SQL model:``` + +SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * 8 / 72000 / 1024 AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * 8 / 72000 / 1024 AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * 8 / 72000 / 1024 AS COLUMN_H , MT."202080101" AS "202080101", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."202080101" AS "202080101" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "202050101" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "202080101" , CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END AS NAME_202080101 FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."202050101" WHERE TABLE_A.NAME_202080101 IS NOT NULL AND "TIME" < 1576087200 AND "TIME" >= 1576015200 GROUP BY TABLE_A."202080101" ) MT GROUP BY MT."202080101" ORDER BY COLUMN_C DESC + +Each SUM expression above is referred to as one counter. + +```Spark query SQL model:``` + +SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * 8 / 72000 / 1024 AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * 8 / 72000 / 1024 AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * 8 / 72000 / 1024 AS COLUMN_H , MT.`202080101` AS `202080101` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`202080101` AS `202080101` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `202050101` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `202080101` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_202080101 FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`202050101` WHERE TABLE_A.NAME_202080101 IS NOT NULL AND `TIME` >= 1576015200 AND `TIME` < 1576087200 ) MT GROUP BY MT.`202080101` ORDER BY COLUMN_C DESC LIMIT 5000 + +## 3. Main Carbon configuration parameters + +```Main configuration``` + +| Main Carbon configuration | Value | Description | +| -- | -- | -- | +| carbon.inmemory.record.size | 48 | Total number of rows of each table to load into memory for queries. | +| carbon.number.of.cores | 4 | Number of threads that scan in parallel during Carbon queries. | +| carbon.number.of.cores.while.loading | 15 | Number of threads that scan in parallel during Carbon data loading. | +| carbon.sort.file.buffer.size | 20 | Total cache size used for each temporary intermediate file during merge sort (read/write) operations, in MB. | +| carbon.sort.size | 50 | Number of records sorted at a time during data loading. | +| Main Spark configuration | | | +| spark.sql.shuffle.partitions | 70 | | +| spark.executor.instances | 6 | | +| spark.executor.cores | 13 | | +| spark.locality.wait | 0 | | +| spark.executor.memory | 5G |
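The two SQL models above differ mostly mechanically: IQ's `SELECT TOP 5000` becomes a trailing `LIMIT 5000` in Spark, and IQ's double-quoted identifiers become Spark backtick identifiers (the `SUM(COALESCE(...))` vs. `COALESCE(SUM(...))` nesting also changes, which needs manual review and is not handled here). A toy sketch of the mechanical part only, assuming regex-friendly input; a real migration would use a proper SQL parser. The class name `IqToSparkRewrite` is illustrative, not part of any project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IqToSparkRewrite {
    // Rewrites the two mechanical differences between the IQ and Spark SQL
    // models shown above: TOP n -> trailing LIMIT n, and "ident" -> `ident`.
    static String rewrite(String iqSql) {
        String sql = iqSql;
        String limit = "";
        // SELECT TOP n ...  ->  SELECT ... LIMIT n
        Matcher m = Pattern.compile("(?i)SELECT\\s+TOP\\s+(\\d+)\\s+").matcher(iqSql);
        if (m.find()) {
            limit = " LIMIT " + m.group(1);
            sql = m.replaceFirst("SELECT ");
        }
        // IQ double-quoted identifiers -> Spark backtick identifiers
        sql = sql.replace('"', '`');
        return sql + limit;
    }

    public static void main(String[] args) {
        String iq = "SELECT TOP 5000 \"COLUMN_A\" FROM T ORDER BY \"COLUMN_A\" DESC";
        System.out.println(rewrite(iq));
        // SELECT `COLUMN_A` FROM T ORDER BY `COLUMN_A` DESC LIMIT 5000
    }
}
```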
[GitHub] [carbondata] VenuReddy2103 commented on issue #3524: [CARBONDATA-3519]Made optimizations in Jira link CARBONDATA-3519
VenuReddy2103 commented on issue #3524: [CARBONDATA-3519]Made optimizations in Jira link CARBONDATA-3519 URL: https://github.com/apache/carbondata/pull/3524#issuecomment-568425598 @jackylk This is for unsafe memory allocation (off-heap memory with native memory allocations). The fix avoids this unnecessary system memory alloc/free. I didn't measure the load performance after this change, though.
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list URL: https://github.com/apache/carbondata/pull/3481#issuecomment-568407788 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1253/