[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2139 ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r180789005 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -69,6 +72,13 @@ private static MapfilterMap = new HashMap<>(); + private final static String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; --- End diff -- ok ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user geetikagupta16 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r180672102 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -69,6 +72,13 @@ private static MapfilterMap = new HashMap<>(); + private final static String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; + private final static String PARTITION_VALUE_WILDCARD = ""; --- End diff -- In hive PARTITION_VALUE_WILDCARD is used to select all the partitions from the table when we don't have any filter conditions on the partition column in our query, but in our code we are already quering all the partitions for the above case. So we can remove this part. ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user geetikagupta16 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r180649360 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -69,6 +72,13 @@ private static MapfilterMap = new HashMap<>(); + private final static String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; --- End diff -- HIVE_DEFAULT_DYNAMIC_PARTITION is created when the partition value for that column is null so the data goes to __HIVE_DEFAULT_PARTITION__ partition. This check is added so that we can query the data from __HIVE_DEFAULT_PARTITION__ as well. ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r180554217 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -69,6 +72,13 @@ private static MapfilterMap = new HashMap<>(); + private final static String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; + private final static String PARTITION_VALUE_WILDCARD = ""; --- End diff -- Why need to add "wildcard" for partition filter? ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r180553823 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java --- @@ -69,6 +72,13 @@ private static MapfilterMap = new HashMap<>(); + private final static String HIVE_DEFAULT_DYNAMIC_PARTITION = "__HIVE_DEFAULT_PARTITION__"; --- End diff -- can you explain, why need to add "HIVE_DEFAULT_DYNAMIC_PARTITION" ? what purpose? if (value == null) { filter.add(carbonDataColumnHandle.getColumnName() + "=" + HIVE_DEFAULT_DYNAMIC_PARTITION); ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user bhavya411 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r179913482 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -415,14 +440,56 @@ public TBase create() { return result; } + /** Returns list of partition specs to query based on the domain constraints + * @param constraints + * @param carbonTable + * @throws IOException + */ + private List findRequiredPartitions(TupleDomain constraints, CarbonTable carbonTable, + LoadMetadataDetails[]loadMetadataDetails) { +Set partitionSpecs = new HashSet<>(); +List prunePartitions = new ArrayList(); + +for (LoadMetadataDetails loadMetadataDetail : loadMetadataDetails) { + SegmentFileStore segmentFileStore = null; + try { +segmentFileStore = +new SegmentFileStore(carbonTable.getTablePath(), loadMetadataDetail.getSegmentFile()); +partitionSpecs.addAll(segmentFileStore.getPartitionSpecs()); + + } catch (IOException e) { +e.printStackTrace(); + } +} + +List partitionValuesFromExpression = +PrestoFilterUtil.getPartitionFilters(carbonTable, constraints); + +ListpartitionSpecNamesList = partitionSpecs.stream().map( +PartitionSpec::getPartitions).collect(Collectors.toList()); + +List partitionSpecsList = new ArrayList(partitionSpecs); + +for (int i = 0; i < partitionSpecNamesList.size(); i++) { --- End diff -- I am not sure about this logic , you have a set and you create a specsNameList and then you create a list of partitionSpecs and then you check that if the list has all fiteredPartition you add it to prunedPartitiion, why not do it in the 1st map itself instead of complicating code ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user bhavya411 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r179913431 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -415,14 +440,56 @@ public TBase create() { return result; } + /** Returns list of partition specs to query based on the domain constraints + * @param constraints + * @param carbonTable + * @throws IOException + */ + private List findRequiredPartitions(TupleDomain constraints, CarbonTable carbonTable, + LoadMetadataDetails[]loadMetadataDetails) { +Set partitionSpecs = new HashSet<>(); +List prunePartitions = new ArrayList(); + +for (LoadMetadataDetails loadMetadataDetail : loadMetadataDetails) { + SegmentFileStore segmentFileStore = null; + try { +segmentFileStore = +new SegmentFileStore(carbonTable.getTablePath(), loadMetadataDetail.getSegmentFile()); +partitionSpecs.addAll(segmentFileStore.getPartitionSpecs()); + + } catch (IOException e) { +e.printStackTrace(); + } +} + +List partitionValuesFromExpression = +PrestoFilterUtil.getPartitionFilters(carbonTable, constraints); + +ListpartitionSpecNamesList = partitionSpecs.stream().map( +PartitionSpec::getPartitions).collect(Collectors.toList()); + +List partitionSpecsList = new ArrayList(partitionSpecs); + +for (int i = 0; i < partitionSpecNamesList.size(); i++) { + List partitionSpecNames = partitionSpecNamesList.get(i); + if (partitionSpecNames.containsAll(partitionValuesFromExpression)) { +prunePartitions +.add(partitionSpecsList.get(i)); + } +} +return prunePartitions; + } + private CarbonTableInputFormat createInputFormat( Configuration conf, - AbsoluteTableIdentifier identifier, Expression filterExpression) - throws IOException { + AbsoluteTableIdentifier identifier, Expression filterExpression, List filteredPartitions) + throws IOException { CarbonTableInputFormat format = new CarbonTableInputFormat(); CarbonTableInputFormat.setTablePath(conf, -identifier.appendWithLocalPrefix(identifier.getTablePath())); +identifier.appendWithLocalPrefix(identifier.getTablePath())); CarbonTableInputFormat.setFilterPredicates(conf, filterExpression); - +if(filteredPartitions.size() != 0) { + CarbonTableInputFormat.setPartitionsToPrune(conf, new ArrayList<>(filteredPartitions)); --- End diff -- Why Create a new ArrayList here when the calling method is already returning a List ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
Github user bhavya411 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2139#discussion_r179913191 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -387,11 +398,25 @@ public TBase create() { config.set(CarbonTableInputFormat.DATABASE_NAME, carbonTable.getDatabaseName()); config.set(CarbonTableInputFormat.TABLE_NAME, carbonTable.getTableName()); +JobConf jobConf = new JobConf(config); +List filteredPartitions = new ArrayList(); + +try { + loadMetadataDetails= SegmentStatusManager + .readTableStatusFile(CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath())); +} catch (IOException e) { + e.printStackTrace(); --- End diff -- The Error should be logged and if there is an IOException than it needs to be handled or thrown to make sure that error is propogated. ---
[GitHub] carbondata pull request #2139: [CARBONDATA-2267] [Presto] Support Reading Ca...
GitHub user anubhav100 opened a pull request: https://github.com/apache/carbondata/pull/2139 [CARBONDATA-2267] [Presto] Support Reading CarbonData Partition From Presto Integration Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/anubhav100/incubator-carbondata prestopartition Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2139.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2139 commit 9a8cf766e97882a3671cba8d566489f5918cc948 Author: anubhav100Date: 2018-04-04T08:18:15Z added logic for fetching the partition columns in presto commit 3faf8e0e32e175e89b43edaceef23da0a03927b6 Author: Geetika Gupta Date: 2018-04-04T08:37:36Z Refactored CarbonTableReader to add partition spec to configuration object ---