[GitHub] incubator-carbondata issue #806: Docs for optimizing mass data loading
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/806 @sgururajshetty I have modified the paragraph. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #:
Github user allwefantasy commented on the pull request: https://github.com/apache/incubator-carbondata/commit/6c9194d97c54351434866f423ef44907b887ae5a#commitcomment-20151464

In integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala on line 439:

OK, I will fix this later.
[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...
Github user allwefantasy commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/368#discussion_r91218797

--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala ---
@@ -308,3 +310,155 @@ class NewCarbonDataLoadRDD[K, V](
 }
 }
 }
+
--- End diff --

Comments have been added for NewDataFrameLoaderRDD and NewRddIterator.
[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...
Github user allwefantasy closed the pull request at: https://github.com/apache/incubator-carbondata/pull/368
[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/368

Yes, it seems something is not OK. I will try to figure out how to resolve this.
[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/368

The commit log under Changes shows "allwefantasy and others added some commits 5 days ago". I guess there is no problem, @ravipesala.
[GitHub] incubator-carbondata issue #368: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/368

It's weird. In my local branch, git log shows:

```
commit acdf78a8cba4f7c18cbaaf0fcc1a9e9dc3189068
Merge: 8a21cb7 5ca7218
Author: WilliamZhu <allwefant...@gmail.com>
Date:   Mon Dec 5 11:30:35 2016 +0800

    Merge branch 'spark-streaming-dataframe-support2' of github.com:allwefantasy/incubator-carbondata into spark-streaming-dataframe-support2

commit 8a21cb715eac50c04b859530ab459ae9b6f226a3
Author: WilliamZhu <allwefant...@gmail.com>
Date:   Wed Nov 30 21:24:33 2016 +0800

    remove comments on createTableFromThrift and rais jira later

commit 06bc4239a2762a6f27da99982b47e880d6a1be4c
Author: WilliamZhu <allwefant...@gmail.com>
Date:   Wed Nov 30 00:12:24 2016 +0800

    reset maven-source-plugin

commit 0f042797f54143bd473296bc33650e84d071dd15
Author: WilliamZhu <allwefant...@gmail.com>
Date:   Tue Nov 29 23:46:42 2016 +0800

    spark streaming dataframe support

commit 70ae82045e461c740cf2ae80c2058160bc9855a9
Merge: e7958b6 fc3f6b3
Author: ravipesala <ravi.pes...@gmail.com>
```

I will try to figure it out.
[GitHub] incubator-carbondata pull request #368: [CARBONDATA-465] Spark streaming dat...
Github user allwefantasy commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/368#discussion_r90229415

--- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java ---
@@ -604,4 +606,33 @@ public static boolean isHeaderValid(String tableName, String header,
 }
 return dateformatsHashMap;
 }
+
--- End diff --

The carbon-processing module does not depend on Spark or any other computing engine. However, some of its classes load data with multiple threads while running inside a computing engine's task, and those threads need to get the TaskContext, which is done with a ThreadLocal.

Yes, my first PR merged in your PR333, but it has not been merged to master yet.
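
The ThreadLocal technique mentioned in this comment can be illustrated with a small, engine-agnostic sketch. It is not the code from the pull request: `TaskContextHolder`, `propagate`, and the `"task-42"` string standing in for the engine's task context object are all hypothetical names chosen for illustration. The idea is only that a value stored in a plain `ThreadLocal` on the task thread is invisible to worker threads unless it is captured and re-installed there.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder for illustration; not the actual CarbonData class. */
final class TaskContextHolder {
  // Each thread sees its own value; child threads do not inherit it.
  private static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

  private TaskContextHolder() {}

  static void set(Object context) { CONTEXT.set(context); }

  static Object get() { return CONTEXT.get(); }

  static void clear() { CONTEXT.remove(); }

  /** Wraps a job so that it runs with the calling thread's context installed. */
  static <T> Callable<T> propagate(Callable<T> job) {
    final Object captured = get(); // read on the task thread
    return () -> {
      set(captured);               // install on the worker thread
      try {
        return job.call();
      } finally {
        clear();                   // do not leak into pooled threads
      }
    };
  }

  public static void main(String[] args) throws Exception {
    set("task-42"); // stand-in for the engine's task context object
    Callable<Object> readContext = TaskContextHolder::get;
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      // The worker thread has no context of its own.
      System.out.println(pool.submit(readContext).get());            // null
      // With propagation, the worker sees the task thread's context.
      System.out.println(pool.submit(propagate(readContext)).get()); // task-42
    } finally {
      pool.shutdown();
      clear();
    }
  }
}
```

Because the holder only stores an `Object`, a module written this way needs no compile-time dependency on the computing engine; the engine-side integration code would be responsible for calling `set` with its own context.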
[GitHub] incubator-carbondata issue #369: [CARBONDATA-470][WIP]Add unsafe offheap and...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/369

Has this PR considered allocating memory from TaskMemoryManager? Many Spark applications run on YARN, and if you use off-heap memory it is easy to trigger YARN killing the container.
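
The concern about YARN can be made concrete with rough arithmetic. The sketch below assumes the Spark-on-YARN defaults of that era, where the container request is the executor heap plus `spark.yarn.executor.memoryOverhead`, and the overhead defaults to the larger of 384 MB and 10% of the executor memory. The class and method names are hypothetical, the sample sizes are made up, and the model ignores YARN rounding the request up to its allocation increment.

```java
/** Rough, illustrative model of executor memory versus the YARN container limit. */
final class YarnMemoryCheck {
  // Assumed defaults for spark.yarn.executor.memoryOverhead.
  static final long MIN_OVERHEAD_MB = 384;
  static final double OVERHEAD_FACTOR = 0.10;

  private YarnMemoryCheck() {}

  static long defaultOverheadMb(long executorMemoryMb) {
    return Math.max(MIN_OVERHEAD_MB, (long) (executorMemoryMb * OVERHEAD_FACTOR));
  }

  /** True if a fully used heap plus the other usage exceeds the container request. */
  static boolean exceedsContainer(long executorMemoryMb, long jvmNonHeapMb, long offHeapMb) {
    long limitMb = executorMemoryMb + defaultOverheadMb(executorMemoryMb);
    long usedMb = executorMemoryMb + jvmNonHeapMb + offHeapMb;
    return usedMb > limitMb;
  }

  public static void main(String[] args) {
    long heapMb = 4096;   // hypothetical executor heap
    long nonHeapMb = 300; // hypothetical JVM metaspace, thread stacks, etc.
    System.out.println(defaultOverheadMb(heapMb));                // 409
    System.out.println(exceedsContainer(heapMb, nonHeapMb, 0));   // false
    System.out.println(exceedsContainer(heapMb, nonHeapMb, 512)); // true
  }
}
```

In this model, off-heap buffers allocated outside Spark's accounting have to fit into the overhead allowance, which is why unmanaged off-heap use usually means raising the overhead setting or allocating through Spark's own memory manager.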
[GitHub] incubator-carbondata issue #367: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/367

OK, I will remove PR333.
[GitHub] incubator-carbondata pull request #367: [CARBONDATA-465] Spark streaming dat...
Github user allwefantasy closed the pull request at: https://github.com/apache/incubator-carbondata/pull/367
[GitHub] incubator-carbondata issue #367: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/367 Can we merge PR333 first then merge this PR?
[GitHub] incubator-carbondata issue #367: [CARBONDATA-465] Spark streaming dataframe ...
Github user allwefantasy commented on the issue: https://github.com/apache/incubator-carbondata/pull/367

I am not sure whether it depends on PR333. I tried to resolve the issues with CarbonData working with Spark Streaming by merging PR333; however, it does not work. Maybe I should remove PR333 later.
[GitHub] incubator-carbondata pull request #367: [CARBONDATA-465] Spark streaming dat...
GitHub user allwefantasy opened a pull request: https://github.com/apache/incubator-carbondata/pull/367

[CARBONDATA-465] Spark streaming dataframe support

1. mvn clean verify has already passed locally.
2. No new unit test cases have been added.
3. Tested in the streamingpro project.
4. Kettle has been removed cleanly.
5. @ravipesala's PR333 ( https://github.com/apache/incubator-carbondata/pull/333 ) has been merged into this PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/allwefantasy/incubator-carbondata spark-streaming-dataframe-support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/367.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #367

commit fb56ef0eeb3ed46ec5bd41c01648b139de93e000
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-19T04:24:56Z

    Optimized no kettle flow and fixed issues in cluster

commit fd0f7f2d9c11e1537cc79ac6a222dfaf55227365
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-21T05:29:42Z

    Updated code fix table loading bug

commit a4770bc4f3067a743745fb19de27cf4b35d51ff1
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-22T07:32:38Z

    Performance improve try1

commit 2b6fda46b3c7ee680ad340abd6e319ee87165973
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-22T12:48:40Z

    Test unsafe sort

commit cf120dae7b427e04fa49b9e2cc4db63daedf95b8
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-22T14:44:32Z

    test unsafe sort test2

commit 3f1f19a2fec3aa7343213f258f5f93a8c31c22fa
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-22T15:14:12Z

    disable unsafe sort

commit e977363f6b508efd72be439c0900fc18f4de3202
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-23T17:56:17Z

    Added unsafe offheap and heap sort

commit 4110a11bb0778af3af6e3fc02dd88551bbfdbc03
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-24T13:20:10Z

    unsafe1

commit 5403015f5baea55380ff4c8e0a306b77ecd7b49b
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-24T13:52:41Z

    unsafe2

commit f6642b3dcf3bab5a2699dc4304ba3a4102a8cc53
Author: ravipesala <ravi.pes...@gmail.com>
Date:   2016-11-24T13:59:29Z

    unsafe3

commit eb1e183ed42ab87d67369cab1e541565c10ad70b
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-26T02:46:34Z

    merge

commit 2b967f9e8c4b67275bd50d8f191f5da688380a90
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-26T03:02:45Z

    merge from no kettle optimize

commit b6e018884ef2f38c5ada3adf1dc52a430b4e9cba
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-26T03:04:46Z

    Fix SparkRowReadSupportImpl CONFLICT with master

commit 0b11b9b73609d914d0e469aafe00beebac1d5159
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-28T09:19:01Z

    DataFrame API witout kettle requirement

commit 1a462532bc5bea36b6e8c0bc290dcb499f100794
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-28T12:38:23Z

    fix NLP of loading data with new flow when sometimes spark executor will use TaskContext.get but the code run in another thread

commit a3c79c726743cc3a4ab69716cc3e0499de13e1f8
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-29T06:03:19Z

    spark streaming dataframe support

commit d4c20bb399889e778c90ee92d557d0b224dafdaa
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-29T06:04:04Z

    Merge branch 'master' of https://github.com/apache/incubator-carbondata

commit b3ef1258da247c183126889acc859934d9ad6a7c
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-29T06:30:58Z

    spark streaming support

commit 332131cc2d1c5f1c50edef1a63b4a5310e6bf8d2
Author: WilliamZhu <allwefant...@gmail.com>
Date:   2016-11-29T07:10:21Z

    fix check style