[jira] [Created] (CARBONDATA-1336) Add issue mailing list
xuchuanyin created CARBONDATA-1336:
--
Summary: Add issue mailing list
Key: CARBONDATA-1336
URL: https://issues.apache.org/jira/browse/CARBONDATA-1336
Project: CarbonData
Issue Type: Improvement
Components: docs
Reporter: xuchuanyin
Assignee: xuchuanyin
Priority: Trivial

CarbonData's issue-related mails are now sent to a new mailing list, separate from DEV. We need to add the related guidance.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1203: Rebase encoding_override branch onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1203 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3229/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata issue #1203: Rebase encoding_override branch onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1203 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/634/
[GitHub] carbondata issue #1203: Rebase encoding_override branch onto master
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1203 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1203: Rebase encoding_override branch onto master
GitHub user sraghunandan opened a pull request:

https://github.com/apache/carbondata/pull/1203

Rebase encoding_override branch onto master

Rebase encoding_override branch onto master

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sraghunandan/carbondata-1 rebase_encoding-override_onto_master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1203.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1203

commit bc3e6843ee83370b6b20e5c9eef92f10667edbae
Author: jackylk
Date: 2017-07-04T00:12:13Z

[CARBONDATA-1098] Change page statistics use exact type and use column page in writer

This PR changes writer in data load:
1. make statistics collection use exact data type in schema instead of generic type
2. change consumer and writer to use EncodedTablePage instead of NodeHolder. EncodedTablePage is the output of TablePage.encode

This closes #1102

commit a5af0ff238230bf64c8ac987bec9977d3f081ff2
Author: jackylk
Date: 2017-07-13T01:21:30Z

[CARBONDATA-1268] Support encoding strategy for dimension columns

In this PR, dimension encoding is changed to use EncodingStrategy instead of hard coding. In future, dimension encoding can be adjusted by extending EncodingStrategy

This closes #1136

commit 74226907990cdee41a6ccbd69e2a813077792f89
Author: Raghunandan S
Date: 2017-07-26T13:59:05Z

Resolve rebase conflicts when rebasing branch encoding_override onto master
[GitHub] carbondata issue #1134: [CARBONDATA-1262] Remove unnecessary LoadConfigurati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1134 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/633/
[GitHub] carbondata issue #1134: [CARBONDATA-1262] Remove unnecessary LoadConfigurati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1134 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3228/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3227/
[GitHub] carbondata issue #1134: [CARBONDATA-1262] Remove unnecessary LoadConfigurati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1134 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/632/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/631/
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1202 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/630/
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1202 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3226/
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1202 Can one of the admins verify this patch?
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1202 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/629/
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1202 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3225/
[GitHub] carbondata pull request #1202: [CARBONDATA-1326] Fixed normal/low priority f...
GitHub user mohammadshahidkhan opened a pull request:

https://github.com/apache/carbondata/pull/1202

[CARBONDATA-1326] Fixed normal/low priority findbug issues

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mohammadshahidkhan/incubator-carbondata findbugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1202.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1202

commit 5e78124091341daf02f5a04d472d7f0e5590d40c
Author: mohammadshahidkhan
Date: 2017-07-27T15:37:54Z

[CARBONDATA-1326] Fixed normal/low priority findbug issues
[GitHub] carbondata pull request #1134: [CARBONDATA-1262] Remove unnecessary LoadConf...
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1134#discussion_r129874324

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -465,6 +463,29 @@ public static String checkAndCreateCarbonStoreLocation(String factStoreLocation,
   }

   /**
+   * Return the sort scope enum.
+   */
+  public static SortScopeOptions.SortScope getSortScope(String sortScopeString) {
+    SortScopeOptions.SortScope sortScope;
+    try {
+      // first check whether user input it from ddl, otherwise get from carbon properties
--- End diff --

suggest changing "it" to "sort scope"
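The lookup order named in the reviewed comment (sort scope from the DDL first, otherwise from carbon properties) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the pull request: the `SortScope` values, the `sort.scope` property key, the default, and the `resolveSortScope` helper are all hypothetical names invented for the example.

```java
import java.util.Locale;
import java.util.Properties;

public class SortScopeExample {
  // Hypothetical enum; the real SortScopeOptions.SortScope may define other values.
  enum SortScope { NO_SORT, LOCAL_SORT, GLOBAL_SORT }

  // Hypothetical property key and default, used only in this sketch.
  static final String SORT_SCOPE_KEY = "sort.scope";
  static final SortScope DEFAULT_SCOPE = SortScope.LOCAL_SORT;

  static SortScope resolveSortScope(String ddlValue, Properties props) {
    // First check whether the user gave the sort scope in the DDL,
    // otherwise read it from the properties.
    String raw = (ddlValue != null && !ddlValue.trim().isEmpty())
        ? ddlValue
        : props.getProperty(SORT_SCOPE_KEY);
    if (raw == null) {
      return DEFAULT_SCOPE;
    }
    try {
      return SortScope.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // Unknown value: fall back to the default rather than failing.
      return DEFAULT_SCOPE;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(SORT_SCOPE_KEY, "global_sort");
    System.out.println(resolveSortScope("no_sort", props));       // DDL value wins
    System.out.println(resolveSortScope(null, props));            // property used
    System.out.println(resolveSortScope(null, new Properties())); // default used
  }
}
```

Falling back to a default on an unrecognized value is a design choice made for this sketch; rejecting the value with an error would be equally reasonable.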
[jira] [Resolved] (CARBONDATA-1287) Remove unnecessary MDK generation in loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-1287.
Resolution: Fixed
Assignee: Jacky Li

> Remove unnecessary MDK generation in loading
>
> Key: CARBONDATA-1287
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1287
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jacky Li
> Assignee: Jacky Li
> Fix For: 1.2.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When updating the MDK key in the data load write step, there is unnecessary MDK generation. It can be removed to improve loading performance.
[GitHub] carbondata pull request #1145: [CARBONDATA-1287] remove unnecessary MDK gene...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1145
[GitHub] carbondata issue #1145: [CARBONDATA-1287] remove unnecessary MDK generation
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1145 LGTM
[GitHub] carbondata pull request #1200: [Documentation] Fixed the syntax issue in Del...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1200
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1200 LGTM
[jira] [Resolved] (CARBONDATA-1313) Remove unnecessary statistics
[ https://issues.apache.org/jira/browse/CARBONDATA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-1313.
Resolution: Fixed
Assignee: Jacky Li

> Remove unnecessary statistics
> --
>
> Key: CARBONDATA-1313
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1313
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jacky Li
> Assignee: Jacky Li
> Fix For: 1.2.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Unique Value and Decimal Point are not used; remove them from the measure statistics.
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3224/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/628/
[GitHub] carbondata pull request #1181: [CARBONDATA-1313] Remove unnecessary measure ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1181
[GitHub] carbondata issue #1168: [CARBONDATA-1229] restrict drop when loading is in p...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1168 @kunal642 can you please rebase
[jira] [Resolved] (CARBONDATA-1335) Duplicated & time-consuming method call found in query
[ https://issues.apache.org/jira/browse/CARBONDATA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-1335.
--
Resolution: Fixed
Fix Version/s: 1.2.0

> Duplicated & time-consuming method call found in query
> --
>
> Key: CARBONDATA-1335
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1335
> Project: CarbonData
> Issue Type: Improvement
> Components: data-query
> Affects Versions: 1.1.1
> Reporter: xuchuanyin
> Priority: Minor
> Labels: performance
> Fix For: 1.2.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> # Scenario
> We ran 14 concurrent queries on CarbonData. The queries are the same, but run on different tables. We observed the following:
> + A single query took about 5s;
> + In the concurrent scenario, each query took about 15s.
> By adding checkpoints in the log, we found a large latency in starting the query jobs in Spark.
>
> # Analysis
> When we fire a query, CarbonData first does some work on the client side, including parsing/analyzing plans and preparing filtered blocks and inputSplits. Then CarbonData submits the query job to Spark.
> In the first step, CarbonData took about 7s in the concurrent scenario, but less than 1s in the single-query scenario.
> By studying the related code, we found that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` is called twice, and each call takes about 3s.
>
> # Solution
> CarbonData needs to call `super.lookupRelation` only once, so the duplicated call should be removed.
> I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (3s saved by the improvement).
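The kind of fix described in this issue (calling the expensive parent lookup once and reusing its result) can be illustrated with a small sketch. The real `CarbonSessionCatalog` is not shown in this message, so the classes below are hypothetical stand-ins, and the call counter only exists to make the difference visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LookupOnceExample {
  // Hypothetical stand-in for the parent catalog; counts expensive lookups.
  static class BaseCatalog {
    final AtomicInteger lookups = new AtomicInteger();

    String lookupRelation(String table) {
      lookups.incrementAndGet(); // imagine a slow metadata lookup here
      return "relation:" + table;
    }
  }

  // Before: the override invokes the parent lookup twice per call.
  static class DuplicatedCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
      boolean known = super.lookupRelation(table) != null; // result discarded
      return known ? super.lookupRelation(table) : null;   // redundant second call
    }
  }

  // After: the parent lookup runs once and its result is reused.
  static class SingleCallCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
      String relation = super.lookupRelation(table);
      boolean known = relation != null;
      return known ? relation : null;
    }
  }

  public static void main(String[] args) {
    DuplicatedCatalog before = new DuplicatedCatalog();
    SingleCallCatalog after = new SingleCallCatalog();
    before.lookupRelation("t1");
    after.lookupRelation("t1");
    System.out.println("parent lookups before: " + before.lookups.get());
    System.out.println("parent lookups after:  " + after.lookups.get());
  }
}
```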
[GitHub] carbondata issue #1181: [CARBONDATA-1313] Remove unnecessary measure statist...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1181 LGTM
[jira] [Resolved] (CARBONDATA-1281) Disk hotspot found during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-1281.
Resolution: Fixed
Fix Version/s: 1.2.0

> Disk hotspot found during data loading
> --
>
> Key: CARBONDATA-1281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1281
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load
> Affects Versions: 1.1.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Fix For: 1.2.0
> Time Spent: 17.5h
> Remaining Estimate: 0h
>
> # Scenario
> We have run a massive data load. The input data is about 71GB in CSV format and has about 88 million records. We do not use any dictionary encoding in CarbonData. Our testing environment has three nodes, and each of them has 11 disks used as YARN executor directories. We submit the loading command through JDBCServer. The JDBCServer instance has three executors in total, one on each node. The loading takes about 10 minutes (varying by about 3 minutes between runs).
> We observed the nmon information during the loading and found:
> 1. lots of CPU waits in the first half of the loading;
> 2. only one single disk has many writes and almost reaches its bottleneck (Avg. 80M/s, Max. 150M/s on SAS disk);
> 3. the other disks are quite idle.
>
> # Analysis
> During data loading, CarbonData reads and sorts data locally (default scope) and writes the temp files to local disk. In my case, there is only one executor per node, so CarbonData writes all the temp files to one disk (container directory or YARN local directory), which results in a single-disk hotspot.
>
> # Modification
> We should support multiple directories for writing temp files to avoid the disk hotspot.
> PS: I have made this improvement in my environment and the result is promising: the loading takes about 6 minutes (10 minutes before the improvement).
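One simple way to spread temp files over several directories, as the Modification section proposes, is to hand out directories in round-robin order. The sketch below assumes that policy purely for illustration; this message does not say how the actual change selects a directory, and the `TempDirChooser` class and the example paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class TempDirChooser {
  private final List<String> dirs;
  private final AtomicInteger next = new AtomicInteger();

  TempDirChooser(List<String> dirs) {
    if (dirs.isEmpty()) {
      throw new IllegalArgumentException("at least one temp dir is required");
    }
    this.dirs = dirs;
  }

  // Returns the next directory in round-robin order; safe for concurrent callers.
  String nextDir() {
    // floorMod keeps the index valid even after the counter overflows.
    int index = Math.floorMod(next.getAndIncrement(), dirs.size());
    return dirs.get(index);
  }

  public static void main(String[] args) {
    // Hypothetical local directories, one per physical disk.
    TempDirChooser chooser = new TempDirChooser(
        Arrays.asList("/data1/tmp", "/data2/tmp", "/data3/tmp"));
    for (int i = 0; i < 4; i++) {
      System.out.println("temp file " + i + " -> " + chooser.nextDir());
    }
  }
}
```

Round-robin keeps the write load even when temp files are of similar size; choosing a directory at random or by free space would be alternative policies.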
[jira] [Assigned] (CARBONDATA-1281) Disk hotspot found during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen reassigned CARBONDATA-1281:
--
Assignee: xuchuanyin (was: Liang Chen)

> Disk hotspot found during data loading
> --
>
> Key: CARBONDATA-1281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1281
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load
> Affects Versions: 1.1.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Fix For: 1.2.0
> Time Spent: 17.5h
> Remaining Estimate: 0h
[jira] [Assigned] (CARBONDATA-1281) Disk hotspot found during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen reassigned CARBONDATA-1281:
--
Assignee: Liang Chen

> Disk hotspot found during data loading
> --
>
> Key: CARBONDATA-1281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1281
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load
> Affects Versions: 1.1.0
> Reporter: xuchuanyin
> Assignee: Liang Chen
> Time Spent: 17.5h
> Remaining Estimate: 0h
[GitHub] carbondata issue #1134: [CARBONDATA-1262] Remove unnecessary LoadConfigurati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1134 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3223/
[GitHub] carbondata issue #1134: [CARBONDATA-1262] Remove unnecessary LoadConfigurati...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1134 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/627/
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1198
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1198 LGTM, very good PR! Thanks for your good contribution.
[GitHub] carbondata issue #1067: [CARBONDATA-1199] support dynamically enabling unsaf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1067 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/626/
[GitHub] carbondata issue #1067: [CARBONDATA-1199] support dynamically enabling unsaf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1067 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3222/
[GitHub] carbondata issue #1181: [CARBONDATA-1313] Remove unnecessary measure statist...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1181 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/625/
[GitHub] carbondata issue #1181: [CARBONDATA-1313] Remove unnecessary measure statist...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1181 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3221/
[jira] [Resolved] (CARBONDATA-1268) Add encoding selection strategy for columns
[ https://issues.apache.org/jira/browse/CARBONDATA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-1268.
    Resolution: Fixed

> Add encoding selection strategy for columns
>
>                 Key: CARBONDATA-1268
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1268
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 1.2.0
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> For each column, CarbonData should support an encoding strategy that chooses a suitable encoding method.
> The strategy should be extensible, so that developers can change its behavior easily.
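One way to picture the extensible per-column strategy described in CARBONDATA-1268 is a small strategy interface. The sketch below is only an illustration under stated assumptions: `Encoding`, `ColumnStats`, `EncodingStrategy`, and `DefaultEncodingStrategy` are hypothetical names, and the range-based rule is an assumed heuristic, not CarbonData's actual implementation.

```java
// Hypothetical encodings; CarbonData's real encoding set may differ.
enum Encoding { DIRECT, DELTA_BYTE }

// Minimal per-column statistics assumed to be available at write time.
final class ColumnStats {
    final long min;
    final long max;

    ColumnStats(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

// Extension point: developers plug in their own selection rule.
interface EncodingStrategy {
    Encoding choose(ColumnStats stats);
}

// Assumed default rule: use delta encoding when every value fits in
// one byte as an offset from the column minimum.
final class DefaultEncodingStrategy implements EncodingStrategy {
    @Override
    public Encoding choose(ColumnStats stats) {
        // An overflowing subtraction yields a negative range, which falls back to DIRECT.
        long range = stats.max - stats.min;
        return (range >= 0 && range <= 255) ? Encoding.DELTA_BYTE : Encoding.DIRECT;
    }
}

class EncodingStrategyDemo {
    public static void main(String[] args) {
        EncodingStrategy byDefault = new DefaultEncodingStrategy();
        // A custom strategy can be supplied as a lambda because the interface has one method.
        EncodingStrategy alwaysDirect = stats -> Encoding.DIRECT;

        ColumnStats narrow = new ColumnStats(1000, 1100);
        System.out.println(byDefault.choose(narrow));
        System.out.println(alwaysDirect.choose(narrow));
    }
}
```

Keeping the rule behind an interface means a different heuristic can be swapped in without touching the code that writes the column pages.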
[GitHub] carbondata pull request #1136: [CARBONDATA-1268] Support encoding strategy f...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1136
[GitHub] carbondata issue #1194: Rebase metadata onto master
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/1194 Merged onto master
[GitHub] carbondata pull request #1102: [CARBONDATA-1098] Change page statistics use ...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1102
[GitHub] carbondata pull request #1194: Rebase metadata onto master
Github user sraghunandan closed the pull request at: https://github.com/apache/carbondata/pull/1194
[GitHub] carbondata pull request #1099: [CARBONDATA-1232] Datamap implementation for ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1099
[GitHub] carbondata pull request #1196: Rebase datamap branch onto master
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1196
[GitHub] carbondata issue #1196: Rebase datamap branch onto master
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1196 LGTM
[GitHub] carbondata issue #1196: Rebase datamap branch onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1196 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/624/
[GitHub] carbondata issue #1196: Rebase datamap branch onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1196 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3220/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1198 @chenliang613 Can this PR be merged, or does it need more reviews?
[GitHub] carbondata issue #1099: [CARBONDATA-1232] Datamap implementation for Blockle...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1099 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/623/
[GitHub] carbondata issue #1099: [CARBONDATA-1232] Datamap implementation for Blockle...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1099 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3219/
[GitHub] carbondata issue #1179: [WIP] Added the blocklet info to index file and make...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1179 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/622/
[GitHub] carbondata issue #1179: [WIP] Added the blocklet info to index file and make...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1179 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3218/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3217/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/621/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3216/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/620/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3215/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1201 Can one of the admins verify this patch?
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/619/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3213/
[GitHub] carbondata pull request #1201: [CARBONDATA-1326] Fixed normal/low priority f...
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/1201

    [CARBONDATA-1326] Fixed normal/low priority findbug issues

    Fixed normal/low priority findbug issues in the code

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata findbugs_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1201

commit 3d17859d030772c6492f79ce025577fa98b60ac0
Author: kunal642
Date:   2017-07-27T10:15:24Z

    fixed findbugs issues
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3214/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1201 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/618/
[GitHub] carbondata issue #1201: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1201 Can one of the admins verify this patch?
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1200 LGTM @chenliang613 kindly review
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1200 Can one of the admins verify this patch?
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1200 Can one of the admins verify this patch?
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1200 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1200: [Documentation] Fixed the syntax issue in Del...
GitHub user siddhardhk opened a pull request:

    https://github.com/apache/carbondata/pull/1200

    [Documentation] Fixed the syntax issue in Delete by Segment ID

    In the Delete by Segment ID command the WHERE was misspelled as WERE

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/siddhardhk/carbondata master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1200

commit 7aba7caf042bec08282941db7f9470407781a07a
Author: siddhardhk
Date:   2017-07-27T09:58:43Z

    [Documentation] Fixed the syntax issue in Delete by Segment ID

    In the Delete by Segment ID command the WHERE was misspelled as WERE
[GitHub] carbondata issue #1200: [Documentation] Fixed the syntax issue in Delete by ...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1200 Can one of the admins verify this patch?
[GitHub] carbondata issue #1199: [CARBONDATA-1335] Remove duplicated time-consuming m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1199 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3212/
[GitHub] carbondata issue #1199: [CARBONDATA-1335] Remove duplicated time-consuming m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1199 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/617/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3211/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/616/
[jira] [Updated] (CARBONDATA-1335) Duplicated & time-consuming method call found in query
[ https://issues.apache.org/jira/browse/CARBONDATA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-1335:
    Description:

# Scenario
We ran 14 concurrent queries on CarbonData. The queries were identical but ran against different tables. We observed the following:
+ A single query took about 5s;
+ In the concurrent scenario, each query took about 15s.
By adding checkpoints to the log, we found a large latency in starting the query jobs in Spark.

# Analysis
When a query is fired, CarbonData first does some work on the client side, including parsing and analyzing plans and preparing the filtered blocks and inputSplits. CarbonData then submits the query job to Spark.
In the first step, CarbonData took about 7s in the concurrent scenario, but less than 1s in the single-query scenario.
By studying the related code, we found that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.

# Solution
CarbonData needs to call `super.lookupRelation` only once, so the redundant second call should be removed.
I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (3s saved by the improvement).

    was: the same description, with the second heading reading "# Analysts".

> Duplicated & time-consuming method call found in query
>
>                 Key: CARBONDATA-1335
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1335
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-query
>    Affects Versions: 1.1.1
>            Reporter: xuchuanyin
>            Priority: Minor
>              Labels: performance
>          Time Spent: 40m
>  Remaining Estimate: 0h
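The pattern described in CARBONDATA-1335 can be sketched in a few lines. The real code is Scala and lives in `CarbonSessionCatalog`; the Java classes below (`BaseCatalog`, `DuplicateLookupCatalog`, `SingleLookupCatalog`, `refreshIfStale`) are hypothetical stand-ins, and the sketch assumes the result of the first lookup can simply be reused. A counter takes the place of the roughly 3s cost of each parent lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the parent catalog; not Spark's real API.
class BaseCatalog {
    final AtomicInteger lookups = new AtomicInteger();

    String lookupRelation(String name) {
        lookups.incrementAndGet(); // models one expensive parent lookup
        return "relation:" + name;
    }
}

// Before: the first result is only inspected, then the parent is asked again.
class DuplicateLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String name) {
        String relation = super.lookupRelation(name);
        refreshIfStale(relation);
        return super.lookupRelation(name); // redundant second lookup
    }

    void refreshIfStale(String relation) {
        // placeholder for the cache check done on the first result
    }
}

// After: the first result is kept and returned, so the parent is called once.
class SingleLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String name) {
        String relation = super.lookupRelation(name);
        refreshIfStale(relation);
        return relation;
    }

    void refreshIfStale(String relation) {
        // placeholder for the cache check done on the first result
    }
}

class LookupDemo {
    public static void main(String[] args) {
        DuplicateLookupCatalog before = new DuplicateLookupCatalog();
        SingleLookupCatalog after = new SingleLookupCatalog();
        before.lookupRelation("t1");
        after.lookupRelation("t1");
        System.out.println("parent lookups before: " + before.lookups.get());
        System.out.println("parent lookups after:  " + after.lookups.get());
    }
}
```

Both variants return the same relation; only the number of parent lookups differs, which matches the reported saving of one call per query.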
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3210/
[GitHub] carbondata pull request #1199: [CARBONDATA-1335] Remove duplicated time-cons...
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1199#discussion_r129767934

    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonSessionState.scala ---
    @@ -106,7 +109,6 @@ class CarbonSessionCatalog(
             }
           case _ =>
         }
    -    super.lookupRelation(name, alias)
    --- End diff --

    This PR mainly focuses on removing this useless method call.
[GitHub] carbondata issue #1199: [CARBONDATA-1335] Remove duplicated time-consuming m...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1199 Can one of the admins verify this patch?
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/615/
[GitHub] carbondata issue #1199: [CARBONDATA-1335] Remove duplicated time-consuming m...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1199 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1199: [CARBONDATA-1335] Remove duplicated time-cons...
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1199

    [CARBONDATA-1335] Remove duplicated time-consuming method call

    # Scenario
    We ran 14 concurrent queries on CarbonData. The queries were identical but ran against different tables. We observed the following:
    + A single query took about 5s;
    + In the concurrent scenario, each query took about 15s.
    By adding checkpoints to the log, we found a large latency in starting the query jobs in Spark.

    # Analysis
    When a query is fired, CarbonData first does some work on the client side, including parsing and analyzing plans and preparing the filtered blocks and inputSplits. CarbonData then submits the query job to Spark.
    In the first step, CarbonData took about 7s in the concurrent scenario, but less than 1s in the single-query scenario.
    By studying the related code, we found that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.

    # Solution
    CarbonData needs to call `super.lookupRelation` only once, so the redundant second call should be removed.
    I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (3s saved by the improvement).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata remove_duplicated_lookup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1199

commit d05d1fadbe0f773a00ff1d0e96ff9fe90b7b7f06
Author: xuchuanyin
Date:   2017-07-27T07:07:25Z

    Remove duplicated time-consuming method call
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1198 All review comments solved
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1198#discussion_r129765977

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -145,21 +146,31 @@ public static void renameBadRecordsFromInProgressToNormal(
   /**
    * This method will be used to delete sort temp location is it is exites
    */
-  public static void deleteSortLocationIfExists(String tempFileLocation) {
-    // create new temp file location where this class
-    //will write all the temp files
-    File file = new File(tempFileLocation);
-
-    if (file.exists()) {
-      try {
-        CarbonUtil.deleteFoldersAndFiles(file);
-      } catch (IOException | InterruptedException e) {
-        LOGGER.error(e);
+  public static void deleteSortLocationIfExists(String[] locations) {
+    for (String loc : locations) {
+      File file = new File(loc);
+      if (file.exists()) {
+        try {
+          CarbonUtil.deleteFoldersAndFiles(file);
+        } catch (IOException | InterruptedException e) {
+          LOGGER.error(e, "Failed to delete " + loc);
+        }
       }
     }
   }

   /**
+   * This method will be used to create dirs
+   * @param locations locations to create
+   */
+  public static void createLocations(String[] locations) {
+    for (String loc : locations) {
+      if (new File(loc).mkdirs()) {
--- End diff --

:+1: nice
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1198#discussion_r129765796

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1296,6 +1296,18 @@
   public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL =
       "carbon.lease.recovery.retry.interval";

+  /**
+   * whether to use multi directories when loading data,
+   * the main purpose is to avoid single-disk-hot-spot
+   */
+  @CarbonProperty
+  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
+
+  /**
+   * default value for multi temp dir
+   */
+  public static final String CARBON_USING_MULTI_TEMP_DIR_DEFAULT = "false";
--- End diff --

:+1: fixed
[jira] [Created] (CARBONDATA-1335) Duplicated & time-consuming method call found in query
xuchuanyin created CARBONDATA-1335:

Summary: Duplicated & time-consuming method call found in query
Key: CARBONDATA-1335
URL: https://issues.apache.org/jira/browse/CARBONDATA-1335
Project: CarbonData
Issue Type: Improvement
Components: data-query
Affects Versions: 1.1.1
Reporter: xuchuanyin
Priority: Minor

# Scenario

We ran 14 concurrent queries on Carbondata. The queries were identical but ran against different tables. We observed the following:

+ A single query took about 5s;
+ In the concurrent scenario, each query took about 15s.

By adding checkpoints to the log, we found a large latency before the query jobs started in Spark.

# Analysis

When a query is fired, Carbondata first does some work on the client side, including parsing/analyzing plans and preparing the filtered blocks and inputSplits. Carbondata then submits the query job to Spark.

We found that the first step took about 7s in the concurrent scenario, but less than 1s in the single-query scenario. Studying the related code showed that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.

# Solution

Carbondata needs to call `super.lookupRelation` only once, so the redundant second call should be removed.

I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (the improvement saves 3s).
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3209/
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/614/
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1198#discussion_r129753971

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -145,21 +146,31 @@ public static void renameBadRecordsFromInProgressToNormal(
   /**
    * This method will be used to delete sort temp location is it is exites
    */
-  public static void deleteSortLocationIfExists(String tempFileLocation) {
-    // create new temp file location where this class
-    //will write all the temp files
-    File file = new File(tempFileLocation);
-
-    if (file.exists()) {
-      try {
-        CarbonUtil.deleteFoldersAndFiles(file);
-      } catch (IOException | InterruptedException e) {
-        LOGGER.error(e);
+  public static void deleteSortLocationIfExists(String[] locations) {
+    for (String loc : locations) {
+      File file = new File(loc);
+      if (file.exists()) {
+        try {
+          CarbonUtil.deleteFoldersAndFiles(file);
+        } catch (IOException | InterruptedException e) {
+          LOGGER.error(e, "Failed to delete " + loc);
+        }
       }
     }
   }

   /**
+   * This method will be used to create dirs
+   * @param locations locations to create
+   */
+  public static void createLocations(String[] locations) {
+    for (String loc : locations) {
+      if (new File(loc).mkdirs()) {
--- End diff --

should it not be !new File(loc).mkdirs()
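As background to the review question above: `File.mkdirs()` returns `true` only when the directory was actually created, and `false` both when creation fails and when the directory already exists. The following is a minimal sketch of one way to distinguish real failures, not the code merged in the pull request. The class name `CreateLocations` and the choice to return the failed paths are assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CreateLocations {
    // Creates each directory and returns the locations that could not be created.
    static List<String> createLocations(String[] locations) {
        List<String> failed = new ArrayList<>();
        for (String loc : locations) {
            File dir = new File(loc);
            // mkdirs() is false for "already exists" as well as for failure,
            // so treat the location as failed only if no directory is present.
            if (!dir.mkdirs() && !dir.isDirectory()) {
                failed.add(loc);
            }
        }
        return failed;
    }
}
```

A caller could log the returned list as an error, which keeps the failure check separate from the reporting.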
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1198#discussion_r129753676

--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1296,6 +1296,18 @@
   public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL =
       "carbon.lease.recovery.retry.interval";

+  /**
+   * whether to use multi directories when loading data,
+   * the main purpose is to avoid single-disk-hot-spot
+   */
+  @CarbonProperty
+  public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
+
+  /**
+   * default value for multi temp dir
+   */
+  public static final String CARBON_USING_MULTI_TEMP_DIR_DEFAULT = "false";
--- End diff --

change to match the above configuration
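A property key paired with a default constant, as in the diff above, is typically consumed along the following lines. This is a minimal sketch using plain `java.util.Properties`, not CarbonData's own property API. The key string is taken from the diff, while the class name `MultiTempDirConfig` and the default constant name `CARBON_USE_MULTI_TEMP_DIR_DEFAULT` (chosen to match the key constant, as the review asks) are assumptions.

```java
import java.util.Properties;

public class MultiTempDirConfig {
    // Key as it appears in the diff.
    static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";
    // Hypothetical default name that mirrors the key constant.
    static final String CARBON_USE_MULTI_TEMP_DIR_DEFAULT = "false";

    // Returns true only when the property is set to "true" (case-insensitive).
    static boolean useMultiTempDir(Properties props) {
        String value = props.getProperty(
            CARBON_USE_MULTI_TEMP_DIR, CARBON_USE_MULTI_TEMP_DIR_DEFAULT);
        return Boolean.parseBoolean(value.trim());
    }
}
```

Keeping the two constant names aligned makes it easier to find the default that belongs to a given key.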
[jira] [Created] (CARBONDATA-1334) Delete Operation Hung in large dataset
sounak chakraborty created CARBONDATA-1334:

Summary: Delete Operation Hung in large dataset
Key: CARBONDATA-1334
URL: https://issues.apache.org/jira/browse/CARBONDATA-1334
Project: CarbonData
Issue Type: Bug
Reporter: sounak chakraborty

The delete operation hangs on large datasets. Due to a wrong equals check in DeleteDeltaBlockletDetails.java, multiple DeleteDeltaBlockDetails objects are created (almost one object per delete offset). With so many objects, the search cost becomes very high, which causes the hang.
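The general failure mode described in this issue can be sketched in Java as follows. This is not the CarbonData code and does not reproduce the actual bug. The class `EqualityCheckSketch`, its methods, and the use of plain strings as block identifiers are assumptions made for illustration. The sketch shows how a find-or-create loop with a faulty equality check never finds an existing entry, so the collection grows with every input and each linear search becomes more expensive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class EqualityCheckSketch {
    // Groups block names using the given equality check and returns the
    // number of distinct groups that were created.
    static int countGroups(String[] blockNames, BiPredicate<String, String> sameBlock) {
        List<String> groups = new ArrayList<>();
        for (String name : blockNames) {
            boolean found = false;
            // Linear search: its cost grows with the number of groups.
            for (String existing : groups) {
                if (sameBlock.test(existing, name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                groups.add(name);
            }
        }
        return groups.size();
    }

    // Faulty check: compares references, so equal names in distinct
    // objects never match and every input creates a new group.
    static int countWithReferenceCheck(String[] names) {
        return countGroups(names, (a, b) -> a == b);
    }

    // Correct check: compares values, so equal names share one group.
    static int countWithEquals(String[] names) {
        return countGroups(names, String::equals);
    }
}
```

With the faulty check the number of groups equals the number of inputs, so the total search work grows quadratically, which is consistent with the hang reported on large datasets.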