[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...
Github user foryou2030 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/82#discussion_r76726188

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
     }
     return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+    DataInputStream fileReader = null;
+    BufferedReader bufferedReader = null;
+    String readLine = null;
+
+    try {
+      fileReader =
+          FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+      bufferedReader =
+          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));

--- End diff --

ok, handled

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (CARBONDATA-191) Load data is null when quote char is single and there is no '\n' at the end.
Jay created CARBONDATA-191: -- Summary: Load data is null when quote char is single and there is no '\n' at the end. Key: CARBONDATA-191 URL: https://issues.apache.org/jira/browse/CARBONDATA-191 Project: CarbonData Issue Type: Bug Reporter: Jay Priority: Minor

When loading data as below:

CREATE TABLE Priyal11 (id int, name string) STORED BY 'org.apache.carbondata.format';
LOAD DATA inpath 'hdfs://hacluster/Priyal1/test34.csv' INTO table Priyal11 options ('DELIMITER'=',', 'QUOTECHAR'='\"', 'FILEHEADER'='id,name');

and test34.csv is as below (note: there is no '\n' at the end of the file):

1,"priyal\"
2,"hello\"

then the query result for name is null. Actually, because of the quote char, the expected result should be |priyal" 6,"hello"|, and if we add a newline at the end of the file, then the query result is correct.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
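The expected behavior can be checked against a plain line reader: a correct reader still surfaces the final record even when the file has no trailing '\n', which is what the loader's quote handling should match. A minimal illustrative sketch (class and method names are hypothetical, not CarbonData code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LastLineDemo {
    // Counts the records in the input. A correct reader must still return
    // the final record even when the input has NO trailing newline; the
    // JIRA reports that the loader's quote handling drops it instead.
    public static int countRecords(String content) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(content));
        int count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Same shape as test34.csv from the report: quoted fields, no final '\n'.
        String csv = "1,\"priyal\\\"\n2,\"hello\\\"";
        System.out.println(countRecords(csv)); // 2 - the last record is still visible
    }
}
```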
[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/104#discussion_r76723805 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala --- @@ -364,6 +364,7 @@ object GlobalDictionaryUtil extends Logging { .option("escape", carbonLoadModel.getEscapeChar) .option("ignoreLeadingWhiteSpace", "false") .option("ignoreTrailingWhiteSpace", "false") + .option("codec", "gzip") --- End diff -- Please check whether it is a compressed file.
[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/104#discussion_r76723773 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -657,6 +657,8 @@ object CarbonDataRDDFactory extends Logging { val filePaths = carbonLoadModel.getFactFilePath hadoopConfiguration.set("mapreduce.input.fileinputformat.inputdir", filePaths) hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true") + hadoopConfiguration.set("io.compression.codecs", +"org.apache.hadoop.io.compress.GzipCodec") --- End diff -- This configuration is only for compressed files. Please check whether the input is a compressed file.
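The reviewer's point in both comments is to gate the gzip-specific configuration on the input actually being compressed. A minimal sketch of such a check (helper name is hypothetical; an extension check is one simple heuristic, not necessarily how the PR resolved it):

```java
public class CompressionCheck {
    // Only register the gzip codec / force the "codec" option when the
    // input path actually looks like a gzip file, so plain CSVs keep the
    // default read path.
    public static boolean isGzipFile(String path) {
        return path != null && path.toLowerCase().endsWith(".gz");
    }

    public static void main(String[] args) {
        System.out.println(isGzipFile("/data/sample.csv.gz")); // true
        System.out.println(isGzipFile("/data/sample.csv"));    // false
    }
}
```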
Re: A warning when loading data
Thank you Ravi, but I've set hadoop.tmp.dir in hadoop's core-site.xml. I'll build master and try it again.

2016-08-30 0:11 GMT+08:00 Ravindra Pesala:
> Hi Zen,
>
> It seems this issue is related to the PR
> https://github.com/apache/incubator-carbondata/pull/89, and it is merged to
> master. Alternatively, please try adding hadoop.tmp.dir to the
> carbon.properties file to solve this issue.
>
> Thanks,
> Ravi

--
Best regards,
William Zen
[GitHub] incubator-carbondata pull request #105: [CARBONDATA-189] Drop database casca...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/105#discussion_r76657713 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala --- @@ -1342,4 +1345,9 @@ class CarbonSqlParser() } } + protected lazy val dropDatabaseCascade: Parser[LogicalPlan] = +DROP ~> (DATABASE|SCHEMA) ~> opt(IF ~> EXISTS) ~> ident ~> CASCADE <~ opt(";") ^^ { + case cascade => throw new MalformedCarbonCommandException( + "Unsupported cascade operation in drop database command") --- End diff -- Since the system supports both database and schema, it is better to provide a message that includes schema, like "Unsupported cascade operation in drop database/schema command"; otherwise, provide "Unsupported cascade operation in drop command".
Re: A warning when loading data
Hi Zen,

It seems this issue is related to the PR https://github.com/apache/incubator-carbondata/pull/89, and it is merged to master. Alternatively, please try adding hadoop.tmp.dir to the carbon.properties file to solve this issue.

Thanks,
Ravi
[jira] [Created] (CARBONDATA-190) Data mismatch issue
kumar vishal created CARBONDATA-190: --- Summary: Data mismatch issue Key: CARBONDATA-190 URL: https://issues.apache.org/jira/browse/CARBONDATA-190 Project: CarbonData Issue Type: Bug Reporter: kumar vishal Assignee: kumar vishal

Issue steps: 1. Create a table, then restart the server and then do a data load; in that case the filter query record count does not match.

Problem: When a user creates a table and has not explicitly disabled the inverted index for a key column, we set the inverted index to true in the ColumnSchema object. Since this information is not persisted in the schema file, after restarting the server the useInvertedIndex property is false in the ColumnSchema object; during data loading the column data is then not sorted, and during filter execution we do a binary search. As the data is not sorted, the binary search fails and skips some of the records.

Solution: In this PR the default value is set to true. One more PR will be raised to handle the inverted-index-disabled scenario. By default the inverted index will be enabled for all columns for better query performance.
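The failure mode described above can be reproduced with a plain Arrays.binarySearch: over unsorted data its result is undefined and it can miss values that are actually present, which is exactly the record skipping this issue describes (the arrays here are illustrative, not CarbonData data):

```java
import java.util.Arrays;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7};
        // When the load skips sorting (the useInvertedIndex flag was lost on
        // restart), the filter's binary search runs over unsorted data and
        // can miss values that are actually present.
        int[] unsorted = {5, 1, 7, 3};

        System.out.println(Arrays.binarySearch(sorted, 3) >= 0);   // true: value found
        System.out.println(Arrays.binarySearch(unsorted, 3) >= 0); // false for this array: search misses 3
    }
}
```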
[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/82#discussion_r76625697

--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
     }
     return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+    DataInputStream fileReader = null;
+    BufferedReader bufferedReader = null;
+    String readLine = null;
+
+    try {
+      fileReader =
+          FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+      bufferedReader =
+          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));

--- End diff --

@foryou2030 instead of using Charset.defaultCharset(), use the line of code below:

Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)
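The reviewer's point is to pin the charset explicitly rather than depend on the JVM's platform default. A minimal sketch of a header reader with an explicit charset (the constant's actual value lives in CarbonCommonConstants; UTF-8 is an assumption here, and the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderReaderSketch {
    // Assumed stand-in for CarbonCommonConstants.DEFAULT_CHARSET.
    static final String DEFAULT_CHARSET = "UTF-8";

    // Reads only the first line (the CSV header), decoding with a fixed
    // charset instead of Charset.defaultCharset().
    public static String readHeader(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(DEFAULT_CHARSET)))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readHeader(new ByteArrayInputStream(csv))); // id,name
    }
}
```

The try-with-resources block also closes the underlying stream, which the original null-initialized reader/finally pattern was working toward by hand.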
Re: A warning when loading data
I don't think it's raised by the lock file, because I've tried to recreate a new table with a totally different name. However, I'll check it tomorrow.

--
Best regards,
William Zen
Re: A warning when loading data
Hi,

Did you check if any locks are created under the system temp folder with //lockfile? If they exist, please delete them and try again.

Thanks,
Ravi.
Re: A warning when loading data
Hi Ravi,

After I upgraded carbon to 0.1.0, this problem occurs every time I try to load data, and I'm sure no other carbon instance is running because I use my personal dev spark cluster. I've also tried to recreate a new table, but the problem is still there.

--
Best regards,
William Zen
[GitHub] incubator-carbondata pull request #107: [WIP]quotechar is single without new...
GitHub user Jay357089 opened a pull request: https://github.com/apache/incubator-carbondata/pull/107 [WIP] quotechar is single without newLine You can merge this pull request into a Git repository by running: $ git pull https://github.com/Jay357089/incubator-carbondata quoteNewline Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/107.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #107 commit bf97e735a0e619e78e47611c1095939d6c6b92eb Author: Jay357089 Date: 2016-08-29T14:55:22Z quotechar and newLine
[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...
Github user ravikiran23 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/92#discussion_r76619356 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -449,6 +457,12 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation, for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) { Integer result = compareDateValues(loadMetadata.getLoadStartTimeAsLong(), loadStartTime); if (result < 0) { +if (CarbonCommonConstants.SEGMENT_COMPACTED --- End diff -- handled
[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/81
[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...
Github user ravikiran23 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/92#discussion_r76606724 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -410,18 +410,26 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation, for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) { if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) { + // if the segment is compacted then no need to delete that. + if (CarbonCommonConstants.SEGMENT_COMPACTED + .equalsIgnoreCase(loadMetadata.getLoadStatus())) { +LOG.error("Cannot delete the Segment which is compacted. Segment is " + loadId); +loadFound = true; +invalidLoadIds.add(loadId); --- End diff -- fixed.
[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...
Github user ManoharVanam commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/92#discussion_r76606495 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -410,18 +410,26 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation, for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) { if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) { + // if the segment is compacted then no need to delete that. + if (CarbonCommonConstants.SEGMENT_COMPACTED + .equalsIgnoreCase(loadMetadata.getLoadStatus())) { +LOG.error("Cannot delete the Segment which is compacted. Segment is " + loadId); +loadFound = true; +invalidLoadIds.add(loadId); --- End diff -- Above two lines are not required, as we are deleting all or none.
Re: A warning when loading data
Hi,

Are you getting this exception continuously for every load? Usually it occurs when you try to load data concurrently into the same table. So please make sure that no other instance of carbon is running and no data load on the same table is happening. Check if any locks are created under the system temp folder with //lockfile; if they exist, please delete them.

Thanks & Regards,
Ravi

On Mon, 29 Aug 2016 1:27 pm Zen Wellon wrote:
> Hi guys,
> When I tried to load some data into a carbondata table with carbon 0.1.0, I
> met the problem below.
>
> WARN 29-08 15:40:17,535 - Lost task 10.0 in stage 2.1 (TID 365,
> amlera-30-6.gtj): java.lang.RuntimeException: Dictionary file ***(sensitive
> column) is locked for updation. Please try after some time
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.(CarbonGlobalDictionaryRDD.scala:354)
>         at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:294)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> --
> Best regards,
> William Zen
[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...
Github user Zhangshunyu commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/104#discussion_r76580960 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -112,25 +116,29 @@ private void initializeReader() throws IOException { // if already one input stream is open first we need to close and then // open new stream close(); -// get the block offset -long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset(); -FileType fileType = FileFactory - .getFileType(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath()); -// calculate the end offset the block -long endOffset = - this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength() + startOffset; - -// create a input stream for the block -DataInputStream dataInputStream = FileFactory - .getDataInputStream(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(), -fileType, bufferSize, startOffset); -// if start offset is not 0 then reading then reading and ignoring the extra line -if (startOffset != 0) { - LineReader lineReader = new LineReader(dataInputStream, 1); - startOffset += lineReader.readLine(new Text(), 0); + +String path = this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(); +FileType fileType = FileFactory.getFileType(path); + +if (path.endsWith(".gz")) { + DataInputStream dataInputStream = + FileFactory.getCompressedDataInputStream(path, fileType, bufferSize); + inputStreamReader = new BufferedReader(new InputStreamReader(dataInputStream)); +} else { + long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset(); + long blockLength = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength(); + long endOffset = blockLength + startOffset; + + DataInputStream dataInputStream = FileFactory.getDataInputStream(path, fileType, bufferSize); + + // if start offset is not 0 
then reading and ignoring the extra line + if (startOffset != 0) { +LineReader lineReader = new LineReader(dataInputStream, 1); +startOffset += lineReader.readLine(new Text(), 0); + } + inputStreamReader = new BufferedReader(new InputStreamReader( + new BoundedDataStream(dataInputStream, endOffset - startOffset))); --- End diff -- Cannot find class BoundedDataStream
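The review notes that BoundedDataStream is missing from the patch. A minimal sketch of what such a class could look like, based on how the diff uses it: an InputStream wrapper that reports end-of-stream after a byte limit, so the CSV reader never runs past its block boundary. The name and behavior here are assumptions, not the class from the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedDataStreamSketch extends InputStream {
    private final InputStream in;
    private long remaining; // bytes still allowed to be read

    public BoundedDataStreamSketch(InputStream in, long limit) {
        this.in = in;
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // limit reached: report end of stream
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream("abcdefgh".getBytes());
        InputStream bounded = new BoundedDataStreamSketch(raw, 3);
        byte[] buf = new byte[16];
        int n = bounded.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n)); // abc
    }
}
```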
[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...
Github user ravikiran23 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/92#discussion_r76581081 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -410,6 +410,14 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation, for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) { if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) { + // if the segment is compacted then no need to delete that. + if (CarbonCommonConstants.SEGMENT_COMPACTED + .equalsIgnoreCase(loadMetadata.getLoadStatus())) { +LOG.error("Cannot delete the load which is compacted."); --- End diff -- These logs appear only when a user intentionally tries to delete compacted loads through the delete segment DDL, so it is fine to add the segment ID to the log message. Fixing that.
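The guard being discussed can be sketched independently of Carbon's SegmentStatusManager: a hypothetical markForDelete that refuses to delete compacted segments and names the offending segment in the log, as suggested in the review. The status strings and the String[] stand-in for LoadMetadataDetails are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentDeleteGuard {
  // Hypothetical status strings standing in for CarbonCommonConstants values.
  public static final String SEGMENT_COMPACTED = "Compacted";
  public static final String MARKED_FOR_DELETE = "Marked for Delete";

  // Each load is a {segmentId, status} pair. Returns the ids actually marked
  // for deletion; compacted segments are skipped with a log line that includes
  // the segment id, per the review comment.
  public static List<String> markForDelete(List<String[]> loads, List<String> idsToDelete) {
    List<String> deleted = new ArrayList<>();
    for (String[] load : loads) {
      if (idsToDelete.contains(load[0])) {
        if (SEGMENT_COMPACTED.equalsIgnoreCase(load[1])) {
          System.err.println("Cannot delete segment " + load[0] + ": it is already compacted.");
          continue;
        }
        load[1] = MARKED_FOR_DELETE;
        deleted.add(load[0]);
      }
    }
    return deleted;
  }
}
```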
[GitHub] incubator-carbondata pull request #103: Fix the bug that when using Decimal ...
GitHub user Zhangshunyu opened a pull request: https://github.com/apache/incubator-carbondata/pull/103 Fix the bug that when using Decimal type as dictionary, the generated surrogate key mismatches for the same values during incremental load.

## Why raise this pr?
**Fix bug: when using Decimal type as dictionary, the generated surrogate key mismatches for the same values during incremental load.**

For example, when we specify a Decimal-type column as dictionary, `DataTypeUtil.normalizeColumnValueForItsDataType` normalizes the value before it is written. For the decimal value 45, if we specify the scale of this column as 3, the parsedValue would be 45.000, and this 45.000 is what gets written into the dictionary file by writer.write(parsedValue). As a result, the second time we load the same value 45, dictionary.getSurrogateKey(value) compares the raw value with the stored dictionary value: the lookup key is 45, but the dictionary value is the string 45.000, so the dictionary concludes 45 is not present. This leads to repeated values in the dictionary, which is a mistake.

How to solve this? Before checking the surrogate key, if the data type is decimal, first normalize the value to its parsedValue and use that for the lookup, so 45 is no longer treated as a different value from 45.000.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Zhangshunyu/incubator-carbondata decimalDic
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/103.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #103

commit 0403b9fe4ed32b9cbc4727b5a541cfccb089422e
Author: Zhangshunyu
Date: 2016-08-29T08:29:54Z
Fix the bug that when Decimal type as dictionary gen surrogate key will mismatch for the same values
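The fix described in the PR amounts to normalizing the decimal the same way on the write path and on the lookup path. Below is a hedged sketch of that idea using a toy dictionary, not Carbon's actual Dictionary/CarbonDictionaryWriter API; BigDecimal.setScale stands in for DataTypeUtil.normalizeColumnValueForItsDataType:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

public class DecimalDictionary {
  private final Map<String, Integer> surrogates = new HashMap<>();
  private final int scale;

  public DecimalDictionary(int scale) {
    this.scale = scale;
  }

  // Normalize the raw value exactly the way the writer does before the
  // lookup, so "45" and "45.000" resolve to the same dictionary entry.
  private String normalize(String raw) {
    return new BigDecimal(raw).setScale(scale, RoundingMode.HALF_UP).toPlainString();
  }

  // Returns an existing surrogate key for the normalized value, or assigns
  // the next one. Without the normalize() step, "45" and "45.000" would get
  // two different keys -- the bug the PR describes.
  public int getOrAssignSurrogate(String raw) {
    String key = normalize(raw);
    return surrogates.computeIfAbsent(key, k -> surrogates.size() + 1);
  }
}
```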
A warning when loading data
Hi guys,

When I tried to load some data into a carbondata table with carbon 0.1.0, I met the problem below.

WARN 29-08 15:40:17,535 - Lost task 10.0 in stage 2.1 (TID 365, amlera-30-6.gtj): java.lang.RuntimeException: Dictionary file ***(sensitive column) is locked for updation. Please try after some time
        at scala.sys.package$.error(package.scala:27)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:354)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:294)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

--
Best regards,
William Zen
Re: [Exception] a thrift-related problem occurred when trying the 0.0.1 release version
yes, I resolved this problem by deleting the old carbon metastore.

2016-08-27 0:18 GMT+08:00 Ravindra Pesala:
> Hi William,
>
> It may be because you are using an old carbon store. Please try using a new
> store path. There were changes in thrift, so the old store won't work on this
> release.
>
> Thanks & Regards,
> Ravi
>
> On 26 August 2016 at 21:05, Zen Wellon wrote:
> > Hi, guys
> >
> > Congratulations on the first stable version!
> > Today I heard that 0.0.1 was released, and I built a fresh jar for my spark
> > cluster. But when I try to create a new table, an exception occurs; could
> > anyone help?
> >
> > Below is the full stack:
> >
> > INFO 26-08 23:23:46,062 - Parsing command: create table if not exists carbondata_001_release_test(..)
> > INFO 26-08 23:23:46,086 - Parse Completed
> > java.io.IOException: org.apache.thrift.protocol.TProtocolException: Required field 'fact_table' was not present! Struct: TableInfo(fact_table:null, aggregate_table_list:null)
> >         at org.apache.carbondata.core.reader.ThriftReader.read(ThriftReader.java:110)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1$$anonfun$apply$1.apply(CarbonMetastoreCatalog.scala:216)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1$$anonfun$apply$1.apply(CarbonMetastoreCatalog.scala:196)
> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> >         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1.apply(CarbonMetastoreCatalog.scala:196)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1.apply(CarbonMetastoreCatalog.scala:191)
> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> >         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.fillMetaData(CarbonMetastoreCatalog.scala:191)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.loadMetadata(CarbonMetastoreCatalog.scala:177)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.<init>(CarbonMetastoreCatalog.scala:112)
> >         at org.apache.spark.sql.CarbonContext$$anon$1.<init>(CarbonContext.scala:70)
> >         at org.apache.spark.sql.CarbonContext.catalog$lzycompute(CarbonContext.scala:70)
> >         at org.apache.spark.sql.CarbonContext.catalog(CarbonContext.scala:67)
> >         at org.apache.spark.sql.CarbonContext$$anon$2.<init>(CarbonContext.scala:75)
> >         at org.apache.spark.sql.CarbonContext.analyzer$lzycompute(CarbonContext.scala:75)
> >         at org.apache.spark.sql.CarbonContext.analyzer(CarbonContext.scala:74)
> >         at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> >         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> >         at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
> >         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:130)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
> >         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
> >         at $iwC$$iwC$$iwC.<init>(<console>:48)
> >         at $iwC$$iwC.<init>(<console>:50)
> >         at $iwC.<init>(<console>:52)
> >         at <init>(<console>:54)
> >         at .<init>(<console>:58)
> >         at .<clinit>(<console>)
> >         at .<init>(<console>:7)
> >         at .<clinit>(<console>)
> >         at $print(<console>)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> >         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
> >         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> >         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> >         at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> >         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> >         at
> >
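The failure in this thread is thrift's required-field check: metadata serialized before 'fact_table' became a required field deserializes with the field unset, and the reader rejects it rather than returning a half-populated struct — which is why deleting the old carbon store fixes it. A minimal hand-written illustration of that check (a stand-in, not the generated thrift code or Carbon's ThriftReader):

```java
public class TableInfoReader {
  // Hypothetical mirror of the generated thrift struct; 'factTable' is a
  // required field in the new schema.
  public static class TableInfo {
    public String factTable;
  }

  // Mimics thrift's required-field validation: metadata written with an
  // older schema arrives with factTable == null and must be rejected.
  public static TableInfo validate(TableInfo info) {
    if (info.factTable == null) {
      throw new IllegalStateException("Required field 'fact_table' was not present!");
    }
    return info;
  }
}
```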
[GitHub] incubator-carbondata pull request #102: [CARBONDATA-186] Except compaction a...
GitHub user nareshpr opened a pull request: https://github.com/apache/incubator-carbondata/pull/102 [CARBONDATA-186] Except compaction, all other alter operations on a carbon table will be unsupported.

Reason: As carbon tables do not support alter operations other than compaction, all other alter operations on a carbon table should be skipped and the error message "Unsupported alter operation on carbon table" should be displayed. If the alter operation targets a hive table, it should be forwarded to hive to perform the operation.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nareshpr/incubator-carbondata altertableunsupportedoperations
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/102.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #102

commit 8ef206511fd1b8d83de63308fc338f85688f6451
Author: nareshpr
Date: 2016-08-29T06:57:23Z
Alter operations on carbon table will be unsupported.
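The routing rule in the PR description can be sketched as a small dispatcher. The class, enum, and return strings below are invented for illustration and are not CarbonData's actual Spark integration code:

```java
public class AlterTableDispatcher {
  public enum Target { CARBON, HIVE }

  // Compaction is the only alter operation a carbon table accepts; every
  // other alter is an error for carbon, and any alter on a hive table is
  // passed through to hive unchanged.
  public static String dispatch(Target table, String alterOp) {
    if (table == Target.CARBON) {
      if ("COMPACT".equalsIgnoreCase(alterOp)) {
        return "run carbon compaction";
      }
      return "error: Unsupported alter operation on carbon table";
    }
    return "forward to hive";
  }
}
```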
[jira] [Created] (CARBONDATA-186) Except compaction, all other alter operations on carbon table should not be performed.
Naresh P R created CARBONDATA-186: - Summary: Except compaction, all other alter operations on carbon table should not be performed. Key: CARBONDATA-186 URL: https://issues.apache.org/jira/browse/CARBONDATA-186 Project: CarbonData Issue Type: Bug Reporter: Naresh P R Priority: Minor As carbon tables do not support alter operations except compaction, all other alter operations on a carbon table should be skipped and the error message "Unsupported alter operation on carbon table" should be displayed. If the alter operation is on a hive table, it should be forwarded to hive to perform the operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)