[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133377209 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java --- @@ -0,0 +1,103 @@ +package org.apache.carbondata.presto.readers; --- End diff -- license header is missing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata issue #1253: [CARBONDATA-1373] Enhance update performance by incr...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1253 A new PR #1261 has been raised to solve this problem in a better way.
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133376989 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/readers/BooleanStreamReader.java --- @@ -0,0 +1,31 @@ +package org.apache.carbondata.presto.readers; --- End diff -- The license header is missing.
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user anubhav100 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133371582 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/ExpressionResult.java --- @@ -183,7 +183,7 @@ public String getString() throws FilterIllegalMemberException { return parser.format((java.sql.Date) value); } else if (value instanceof Long) { if (isLiteral) { - return parser.format(new Timestamp((long) value / 1000)); + return parser.format(new Timestamp((long) value)); --- End diff -- @chenliang613 Earlier, the timestamp was multiplied by 1000 in CarbonData, which is why this line was written as `return parser.format(new Timestamp((long) value / 1000));`. The value is no longer multiplied by 1000 when the expression result is built, so there is no need to divide the timestamp by 1000 here.
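The unit question behind this exchange can be illustrated with a small sketch. `java.sql.Timestamp(long)` expects milliseconds since the epoch, so a value that was scaled by 1000 upstream must be scaled back before constructing the timestamp, while an unscaled value can be passed straight through. The class and method names below are hypothetical, and treating the scaled value as "microseconds" is an interpretation of the comment above, not something the PR states.

```java
import java.sql.Timestamp;

// Hypothetical helper, not part of CarbonData: shows why "/ 1000" is only
// needed when the stored long was multiplied by 1000 earlier.
class TimestampUnits {

  // Assumption: the value was scaled by 1000 upstream (millis * 1000).
  static Timestamp fromScaledValue(long scaled) {
    // floorDiv keeps the result correct for values before the epoch as well.
    return new Timestamp(Math.floorDiv(scaled, 1000L));
  }

  // The value is already in milliseconds, so no division is required.
  static Timestamp fromMillis(long millis) {
    return new Timestamp(millis);
  }
}
```

Both helpers yield the same `Timestamp` when given consistent inputs, which is the situation the reply describes: once the multiplication is removed upstream, the division has to go as well.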
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1236 @steven-qin PR #1257 has resolved this issue with more optimizations.
[GitHub] carbondata issue #1259: [Review][CARBONDATA-1381] Add test cases for missing...
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/1259 retest this please
[GitHub] carbondata pull request #1253: [CARBONDATA-1373] Enhance update performance ...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/1253
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133370557 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/ExpressionResult.java --- @@ -183,7 +183,7 @@ public String getString() throws FilterIllegalMemberException { return parser.format((java.sql.Date) value); } else if (value instanceof Long) { if (isLiteral) { - return parser.format(new Timestamp((long) value / 1000)); + return parser.format(new Timestamp((long) value)); --- End diff -- Why is this change (removing `/ 1000`) needed in getString()?
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133372855 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/constants/CarbonCommonConstants.java --- @@ -0,0 +1,1319 @@ +/* --- End diff -- Please keep only the CommonConstants that are used by the presto module; there is no need to copy all of them.
[GitHub] carbondata issue #1261: [CARBONDATA-1373] Enhance update performance by incr...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1261 The previous PR #1253 is closed; please use this new PR.
[GitHub] carbondata pull request #1261: [CARBONDATA-1373] Enhance update performance ...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/1261

[CARBONDATA-1373] Enhance update performance by increasing parallelism

# Scenario

Recently I tested the update feature provided in CarbonData and found that it performed poorly. I had a table containing about 14 million records with about 370 columns (no dictionary columns), and the data files were about 3.8 GB in total. All the data files were in one segment. I ran an update SQL statement that updates one column for all the records: `UPDATE myTable SET (col1)=(col1+1000) WHERE TRUE`. In my environment, the update job failed with 'executor lost' errors, and I found 'spill data' related messages in the container logs.

# Analysis

I've read about the implementation of update-delete in CarbonData in ISSUE#440. An update consists of a delete operation and an insert operation, and the error occurred during the insert operation. After studying the code, I found that during the insert the updated records are grouped by `segmentId`, which means all the records in one segment are processed in a single task. This causes task failure when the amount of input data is large.

# Solution

We should increase the parallelism when updating a segment. I append a random key to the `segmentId` to increase the number of partitions before the insertion stage, and then remove the suffix when doing the actual insertion.

# Modification

+ Increase parallelism while processing one segment in update; this is achieved by distributing records to different partitions using a customized partitioner.
+ Add a property to configure the parallelism.
+ Clean up local files after update (previous bugs)
+ Remove useless imports
+ Add tests
+ Add related documents

# Notes

I have tested this in my example and the job finished successfully in about 13 minutes. The records were updated as expected.

Compared to the previous implementation, update performance has improved:

Origin (Parallelism(1) + GroupBy): Update **FAILED**
Adding Parallelism(6) + GroupBy: Update **SUCCESSFULLY** using **13mins**
Parallelism(1) + PartitionBy: Update **SUCCESSFULLY** using **21mins**
Adding Parallelism(6) + PartitionBy: Update **SUCCESSFULLY** using **5mins**

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata enhance_update_perf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1261.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1261

commit ebfe1ca2a125c0b736e917d8a7956f5e39dedc50
Author: xuchuanyin
Date: 2017-08-11T15:00:20Z

Enhance update performance by increasing parallelism

+ Increase parallelism while processing one segment in update
+ Use partitionBy instead of groupby
+ Add a property to configure the parallelism
+ Clean up local files after update (previous bugs)
+ Remove useless imports
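The key-salting idea from the Solution section can be sketched in plain Java, independent of Spark. This is only an illustration under stated assumptions: the class name, the `#` separator, and the helper methods are hypothetical and are not taken from the PR; the actual change uses a customized partitioner inside the update flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: spread the records of one segment over several
// partitions by appending a random suffix to the segment id.
class SegmentKeySalting {

  // Assumed separator; it must not occur inside a real segment id.
  static final String SEPARATOR = "#";

  // Append a random suffix in [0, parallelism) to the segment id.
  static String salt(String segmentId, int parallelism) {
    int suffix = ThreadLocalRandom.current().nextInt(parallelism);
    return segmentId + SEPARATOR + suffix;
  }

  // Strip the suffix again before the real insertion.
  static String unsalt(String saltedKey) {
    return saltedKey.substring(0, saltedKey.lastIndexOf(SEPARATOR));
  }

  // Group record ids by salted key; each resulting entry stands for one task.
  static Map<String, List<Integer>> distribute(String segmentId, int recordCount,
      int parallelism) {
    Map<String, List<Integer>> partitions = new TreeMap<>();
    for (int record = 0; record < recordCount; record++) {
      partitions.computeIfAbsent(salt(segmentId, parallelism), k -> new ArrayList<>())
          .add(record);
    }
    return partitions;
  }
}
```

With a parallelism of 6, the records of a single segment end up in at most six groups instead of one, and `unsalt` recovers the original `segmentId` for each group, which mirrors the "append, then remove the suffix" step described above.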
[GitHub] carbondata pull request #1254: [CARBONDATA-1379] Fixed Date range filter wit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1254#discussion_r133364318 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/DateDirectDictionaryGenerator.java --- @@ -154,14 +147,14 @@ private int generateDirectSurrogateKeyForNonTimestampType(String memberStr) { } private int generateKey(long timeValue) { -long milli = timeValue + threadLocalLocalTimeZone.get().getOffset(timeValue); -return (int) Math.floor((double) milli / MILLIS_PER_DAY) + cutOffDate; +return (int) Math.floor((double) timeValue / MILLIS_PER_DAY) + cutOffDate; } public void initialize() { if (simpleDateFormatLocal.get() == null) { simpleDateFormatLocal.set(new SimpleDateFormat(dateFormat)); simpleDateFormatLocal.get().setLenient(false); + simpleDateFormatLocal.get().setTimeZone(TimeZone.getTimeZone("GMT")); --- End diff -- Does it mean carbon will process all Date/TimeStamp as GMT?
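To make the effect of the diff concrete, here is a minimal sketch of the day-number arithmetic it uses. When the date string is parsed with the formatter pinned to GMT, the resulting epoch milliseconds already fall on a day boundary, so no time zone offset has to be added before dividing by the number of milliseconds per day. The class name, the `yyyy-MM-dd` pattern, and the omission of `cutOffDate` are assumptions made for the illustration; only the `Math.floor` expression and the GMT setting come from the diff.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical sketch of the day-number computation shown in the diff.
class DateKeySketch {

  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Parse a date in GMT and return the number of whole days since the epoch.
  static int daysSinceEpoch(String date) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd"); // assumed pattern
    format.setLenient(false);
    format.setTimeZone(TimeZone.getTimeZone("GMT"));
    long timeValue;
    try {
      timeValue = format.parse(date).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("not a valid date: " + date, e);
    }
    // Math.floor rounds towards negative infinity, so dates before 1970
    // map to negative day numbers rather than being truncated towards zero.
    return (int) Math.floor((double) timeValue / MILLIS_PER_DAY);
  }
}
```

Because the formatter no longer depends on the JVM default time zone, the same date string yields the same day number on every machine, which appears to be the point of the reviewer's question.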
[GitHub] carbondata issue #1257: [CARBONDATA-1347] Implemented Columnar Reading Of Da...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1257 retest this please
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133375796 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/memory/AggregatedMemoryContext.java --- @@ -0,0 +1,62 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and --- End diff -- The license header is wrong.
[GitHub] carbondata issue #1235: fixed bug for IsNull and IsNotNull clause in presto
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1235 @steven-qin Can you please provide reproducible steps?
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133375914 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/memory/LocalMemoryContext.java --- @@ -0,0 +1,40 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --- End diff -- The license header is wrong.
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133375740 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/memory/AbstractAggregatedMemoryContext.java --- @@ -0,0 +1,37 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * --- End diff -- The license header is wrong.
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133377510 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java --- @@ -0,0 +1,103 @@ +package org.apache.carbondata.presto.readers; + +import java.io.IOException; +import java.math.BigDecimal; +import java.math.BigInteger; + +import com.facebook.presto.spi.block.Block; +import com.facebook.presto.spi.block.BlockBuilder; +import com.facebook.presto.spi.block.BlockBuilderStatus; +import com.facebook.presto.spi.type.DecimalType; +import com.facebook.presto.spi.type.Decimals; +import com.facebook.presto.spi.type.Type; +import io.airlift.slice.Slice; + +import static com.facebook.presto.spi.type.Decimals.encodeUnscaledValue; +import static com.facebook.presto.spi.type.Decimals.isShortDecimal; +import static com.facebook.presto.spi.type.Decimals.rescale; +import static com.google.common.base.Preconditions.checkArgument; +import static com.google.common.base.Preconditions.checkState; +import static io.airlift.slice.Slices.utf8Slice; +import static java.math.RoundingMode.HALF_UP; + +public class DecimalSliceStreamReader implements StreamReader { --- End diff -- This PR optimizes the column reader, so why does it need to add StreamReader? Please consider using separate PRs to implement different features.
[GitHub] carbondata pull request #1202: [CARBONDATA-1326] Fixed normal/low priority f...
Github user mohammadshahidkhan closed the pull request at: https://github.com/apache/carbondata/pull/1202
[GitHub] carbondata issue #1202: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user mohammadshahidkhan commented on the issue: https://github.com/apache/carbondata/pull/1202 not required
[GitHub] carbondata issue #1204: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/1204 retest this please
[GitHub] carbondata pull request #1257: [CARBONDATA-1347] Implemented Columnar Readin...
Github user anubhav100 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1257#discussion_r133407524 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/constants/CarbonCommonConstants.java --- @@ -0,0 +1,1319 @@ +/* --- End diff -- done
[GitHub] carbondata pull request #1126: [CARBONDATA-1258] CarbonData should not allow...
Github user mohammadshahidkhan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1126#discussion_r133404888 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/DateDirectDictionaryGenerator.java --- @@ -154,6 +162,12 @@ private int generateDirectSurrogateKeyForNonTimestampType(String memberStr) { } private int generateKey(long timeValue) { +if (timeValue < MIN_VALUE || timeValue > MAX_VALUE) { + if (LOGGER.isDebugEnabled()) { +LOGGER.debug("Value for date type column is not in valid range. Value considered as null."); + } + return 1; --- End diff -- If the value is not in the defined range, it will be stored as null when the bad records action is FORCE. The dictionary key for a null value is 1.
[GitHub] carbondata issue #1234: fix bug in Spi2CarbondataTypeMapper method, it will ...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1234 @linqer How can this PR be tested? Can you provide reproducible steps?
[GitHub] carbondata issue #1257: [CARBONDATA-1347] Implemented Columnar Reading Of Da...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1257 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/197/
[GitHub] carbondata issue #1204: [CARBONDATA-1326] Fixed normal/low priority findbug ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1204 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/198/
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1236 retest this please
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1236 @anubhav100 I prefer to use steven-qin's pull request, because steven-qin fixed it earlier.
[GitHub] carbondata issue #1258: [CARBONDATA-1325] Add partition guidance doc
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1258 retest this please
[GitHub] carbondata issue #1260: [CARBONDATA-1382] Add more test cases for bucket fea...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1260 retest this please
[GitHub] carbondata pull request #1252: [CARBONDATA-1372]Update the documentation
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1252
[jira] [Resolved] (CARBONDATA-1372) Fix some errors and update the examples in documentation
[ https://issues.apache.org/jira/browse/CARBONDATA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-1372.

Resolution: Fixed
Fix Version/s: 1.2.0

> Fix some errors and update the examples in documentation
>
> Key: CARBONDATA-1372
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1372
> Project: CarbonData
> Issue Type: Improvement
> Components: docs
> Affects Versions: 1.2.0
> Reporter: xubo245
> Assignee: xubo245
> Priority: Minor
> Fix For: 1.2.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> There are some errors in CarbonData docs, these should be fixed.
> Some examples in documentation should be updated.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1260: [CARBONDATA-1382] Add more test cases for bucket fea...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1260 SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/201/
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1236 @chenliang Sure, he has done a great job.
[GitHub] carbondata issue #1258: [CARBONDATA-1325] Add partition guidance doc
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1258 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/200/
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1236 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/202/
[GitHub] carbondata pull request #1240: [CARBONDATA-1365] add RLE codec implementatio...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1240#discussion_r133606402 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/RLECodec.java --- @@ -0,0 +1,417 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page.encoding; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.core.datastore.page.ColumnPage; +import org.apache.carbondata.core.datastore.page.ComplexColumnPage; +import org.apache.carbondata.core.datastore.page.statistics.SimpleStatsResult; +import org.apache.carbondata.core.memory.MemoryException; +import org.apache.carbondata.core.metadata.CodecMetaFactory; +import org.apache.carbondata.core.metadata.datatype.DataType; + +/** + * RLE encoding implementation for integral column page. + * This encoding keeps track of repeated-run and non-repeated-run, and make use + * of the highest bit of the length field to indicate the type of run. 
+ * The length field is encoded as 32bits value. + * + * For example: input data {5, 5, 1, 2, 3, 3, 3, 3, 3} will be encoded to + * {0x00, 0x00, 0x00, 0x02, 0x05, (repeated-run, 2 values of 5) + * 0x80, 0x00, 0x00, 0x03, 0x01, 0x02, 0x03, (non-repeated-run, 3 values: 1, 2, 3) + * 0x00, 0x00, 0x00, 0x04, 0x03} (repeated-run, 4 values of 3) + */ +public class RLECodec implements ColumnPageCodec { + + enum RUN_STATE { INIT, START, REPEATED_RUN, NONREPEATED_RUN } + + private DataType dataType; + private int pageSize; + + /** + * New RLECodec + * @param dataType data type of the raw column page before encode + * @param pageSize page size of the raw column page before encode + */ + RLECodec(DataType dataType, int pageSize) { +this.dataType = dataType; +this.pageSize = pageSize; + } + + @Override + public String getName() { +return "RLECodec"; + } + + @Override + public EncodedColumnPage encode(ColumnPage input) throws MemoryException, IOException { +Encoder encoder = new Encoder(); +return encoder.encode(input); + } + + @Override + public EncodedColumnPage[] encodeComplexColumn(ComplexColumnPage input) { +throw new UnsupportedOperationException("complex column does not support RLE encoding"); + } + + @Override + public ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, + IOException { +Decoder decoder = new Decoder(dataType, pageSize); +return decoder.decode(input, offset, length); + } + + // This codec supports integral type only + private void validateDataType(DataType dataType) { +switch (dataType) { + case BYTE: + case SHORT: + case INT: + case LONG: --- End diff -- I think we better make it for integral value only. For double and decimal, their distinct value are more in most of the case. Anyway, we can add it in future if it is required in future PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
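For readers following the review, the run layout described in the Javadoc of the quoted diff can be sketched as below. This is only a minimal illustration under stated assumptions, not the code from PR #1240: the class name `RleSketch` and its methods are hypothetical, values are assumed to be single bytes (the PR covers BYTE, SHORT, INT and LONG), and the encoder picks run boundaries greedily, so its output for the Javadoc input may split runs differently from the example bytes in the comment. The decoder reads either split back to the same values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch class; not part of CarbonData.
class RleSketch {

  // The highest bit of the 32-bit length field marks a non-repeated run.
  private static final int NON_REPEATED_FLAG = 0x80000000;

  /** Encodes single-byte values into repeated and non-repeated runs. */
  static byte[] encode(byte[] input) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    int i = 0;
    while (i < input.length) {
      int run = runLength(input, i);
      if (run >= 2) {
        // Repeated run: length with the flag bit clear, then the value once.
        out.writeInt(run);
        out.writeByte(input[i]);
        i += run;
      } else {
        // Non-repeated run: extend until the next repeated run starts.
        int end = i + 1;
        while (end < input.length && runLength(input, end) < 2) {
          end++;
        }
        out.writeInt(NON_REPEATED_FLAG | (end - i));
        out.write(input, i, end - i);
        i = end;
      }
    }
    out.flush();
    return buffer.toByteArray();
  }

  /** Decodes any byte sequence that follows the run layout above. */
  static byte[] decode(byte[] encoded) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (in.available() > 0) {
      int header = in.readInt();
      int length = header & ~NON_REPEATED_FLAG;
      if ((header & NON_REPEATED_FLAG) != 0) {
        // Non-repeated run: copy the next `length` values verbatim.
        byte[] values = new byte[length];
        in.readFully(values);
        out.write(values, 0, length);
      } else {
        // Repeated run: one value, emitted `length` times.
        byte value = in.readByte();
        for (int k = 0; k < length; k++) {
          out.write(value);
        }
      }
    }
    return out.toByteArray();
  }

  // Number of consecutive elements equal to input[start], counted from start.
  private static int runLength(byte[] input, int start) {
    int n = 1;
    while (start + n < input.length && input[start + n] == input[start]) {
      n++;
    }
    return n;
  }
}
```

Because the length field always takes four bytes, a repeated run costs five bytes regardless of its length. That overhead only pays off on columns with long runs, which fits the suggestion above to keep the codec to integral types for now.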
[GitHub] carbondata pull request #1258: [CARBONDATA-1325] Add partition guidance doc
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1258#discussion_r133619148 --- Diff: docs/partition-guide.md --- @@ -0,0 +1,124 @@ + + +### CarbonData Partition Table Guidance +This guidance illustrates how to create & use partition table in CarbonData. + +* [Create Partition Table](#create-partition-table) + - [Create Hash Partition Table](#create-hash-partition-table) + - [Create Range Partition Table](#create-range-partition-table) + - [Create List Partition Table](#create-list-partition-table) +* [Show Partitions](#show-partitions) +* [Maintain the Partitions](#maintain-the-partitions) +* [Partition Id](#partition-id) +* [Tips](#tips) + +### Create Partition Table + +# Create Hash Partition Table +``` + CREATE TABLE [IF NOT EXISTS] [db_name.]table_name +[(col_name data_type , ...)] + STORED BY 'carbondata' + PARTITIONED BY (partition_col_name data_type) --- End diff -- The 'PARTITIONED BY' clause must come before 'STORED BY', otherwise it will throw an error: `mismatched input 'PARTITIONED' expecting {, '(', 'SELECT', 'FROM', 'AS', 'WITH', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE', 'TBLPROPERTIES', 'LOCATION'}(line 11, pos 2)`
[GitHub] carbondata pull request #1249: [WIP] Use ColumnPage in reader for measure co...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/1249
[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1231 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/206/
[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1231 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/203/
[GitHub] carbondata issue #1240: [CARBONDATA-1365] add RLE codec implementation
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1240 SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/204/
[GitHub] carbondata pull request #1262: [BUGFIX] Fix ZERO_BYTE_ARRAY constant not fou...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/1262 [BUGFIX] Fix ZERO_BYTE_ARRAY constant not found in codegen CarbonCommonConstant.ZERO_BYTE_ARRAY is used in codegen, so it should not be deleted. This PR adds it back. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata zero Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1262.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1262 commit 53c9f24edf3f9cad1c2149d7b48197ed00e5522e Author: Jacky Li Date: 2017-08-17T03:45:24Z fix code gen
[GitHub] carbondata pull request #1258: [CARBONDATA-1325] Add partition guidance doc
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1258#discussion_r133619308 --- Diff: docs/partition-guide.md --- @@ -0,0 +1,124 @@ + + +### CarbonData Partition Table Guidance +This guidance illustrates how to create & use partition table in CarbonData. + +* [Create Partition Table](#create-partition-table) + - [Create Hash Partition Table](#create-hash-partition-table) + - [Create Range Partition Table](#create-range-partition-table) + - [Create List Partition Table](#create-list-partition-table) +* [Show Partitions](#show-partitions) +* [Maintain the Partitions](#maintain-the-partitions) +* [Partition Id](#partition-id) +* [Tips](#tips) + +### Create Partition Table + +# Create Hash Partition Table +``` + CREATE TABLE [IF NOT EXISTS] [db_name.]table_name +[(col_name data_type , ...)] + STORED BY 'carbondata' + PARTITIONED BY (partition_col_name data_type) + [TBLPROPERTIES ('PARTITION_TYPE'='HASH', + 'PARTITION_NUM'='N' ...)] --- End diff -- change 'PARTITION_NUM' to 'NUM_PARTITIONS'
[jira] [Created] (CARBONDATA-1385) Add Test cases for Hive Integration
anubhav tarar created CARBONDATA-1385: - Summary: Add Test cases for Hive Integration Key: CARBONDATA-1385 URL: https://issues.apache.org/jira/browse/CARBONDATA-1385 Project: CarbonData Issue Type: Test Reporter: anubhav tarar Assignee: anubhav tarar Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1231: [CARBONDATA-1359] Unable to use carbondata on hive
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1231 retest this please
[GitHub] carbondata issue #1255: [CARBONDATA-1375] clean hive pom
Github user cenyuhai commented on the issue: https://github.com/apache/carbondata/pull/1255
1. Use hadoop.version instead of 2.6.0
2. Use hive.version instead of 1.2.1
3. Remove thrift
4. Remove zookeeper
5. Remove spark-hive and spark-sql
[GitHub] carbondata issue #1234: fix bug in Spi2CarbondataTypeMapper method, it will ...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1234 @linqer please correct your PR title as per: https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md
[GitHub] carbondata issue #1236: fixed bug for fetching the error value of decimal ty...
Github user anubhav100 commented on the issue: https://github.com/apache/carbondata/pull/1236
@steven-qin
1. Please correct your PR title as per: https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md
2. There is still a problem with the decimal type. I ran this query in both CarbonData and Presto from your branch, and Presto gives wrong results.
In CarbonData:
carbon.sql("select l_tax from lineitem where l_tax=0.06").show()
carbon.stop()
+-----+
|l_tax|
+-----+
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
| 0.06|
+-----+
only showing top 20 rows
When I execute it in Presto:
select l_tax from lineitem where l_tax=0.06;
I get no result.
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/1192 LGTM