[GitHub] carbondata pull request #1019: [CARBONDATA-1156]Improve IUD performance and ...
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1019

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121399194

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
             }
           }
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
    --- End diff --

    OK, I missed it. :)
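The quoted `getDeleteDeltaDetails` begins with a cache-freshness check. If the block's stored delete-delta timestamp is at least the latest delete-delta file timestamp, the method returns the cached deleted-rows map without reading any file. Below is a minimal sketch of that check only. `CachedBlock` and `DeleteDeltaCache` are hypothetical stand-ins, not CarbonData's `AbstractIndex` API. A `Supplier` takes the place of the actual delta-file reader, and concurrency is deliberately left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a cached data block (not CarbonData's AbstractIndex).
class CachedBlock {
  // Timestamp of the delete delta that deletedRows reflects; -1 means "never loaded".
  private long deleteDeltaTimestamp = -1L;
  private Map<String, Set<Integer>> deletedRows = Collections.emptyMap();

  long getDeleteDeltaTimestamp() {
    return deleteDeltaTimestamp;
  }

  Map<String, Set<Integer>> getDeletedRowsMap() {
    return deletedRows;
  }

  void setDeletedRows(Map<String, Set<Integer>> rows, long timestamp) {
    this.deletedRows = rows;
    this.deleteDeltaTimestamp = timestamp;
  }
}

// Hypothetical helper illustrating the timestamp comparison from the quoted diff.
class DeleteDeltaCache {
  /**
   * Returns the block's deleted rows ("blockletId_pageId" -> row ids), re-reading the
   * delete delta files only when a newer delta exists than the one already cached.
   */
  static Map<String, Set<Integer>> deletedRowsFor(CachedBlock block, long latestDeltaTimestamp,
      Supplier<Map<String, Set<Integer>>> reader) {
    if (block.getDeleteDeltaTimestamp() >= latestDeltaTimestamp) {
      return block.getDeletedRowsMap();  // cache is up to date, no file I/O
    }
    Map<String, Set<Integer>> fresh = reader.get();  // stand-in for reading the delta files
    block.setDeletedRows(fresh, latestDeltaTimestamp);
    return fresh;
  }
}
```

Because the comparison is `>=`, a block that has already applied the newest delta never re-reads. Only a strictly newer delete delta file triggers I/O.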
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121395234

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
             }
           }
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
    --- End diff --

    Inside the synchronized block, call `deleteDeltaToLockObjectMap.get(deleteDeltaInfo);` again (the second half of the double check) to avoid a NullPointerException.
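The review comment points at the classic double-checked pattern. After entering the synchronized block, the map must be read a second time, because another task may have inserted the lock object while this one was waiting.

Below is a sketch of that pattern with a hypothetical `LockRegistry` class. Its key type `K` stands in for `DeleteDeltaInfo`, so it needs consistent `equals`/`hashCode`. The sketch assumes a `ConcurrentHashMap` so that the first, unsynchronized read is safe. The quoted diff does not show which map type `deleteDeltaToLockObjectMap` actually is.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: hands out exactly one lock object per key. K stands in for
// DeleteDeltaInfo and must implement equals/hashCode consistently.
class LockRegistry<K> {
  // ConcurrentHashMap keeps the first, unsynchronized get() safe under concurrent puts.
  private final Map<K, Object> lockObjects = new ConcurrentHashMap<>();
  private final Object registryLock = new Object();

  Object lockFor(K key) {
    Object lock = lockObjects.get(key);  // fast path: no synchronization
    if (lock == null) {
      synchronized (registryLock) {
        // Double check: read the map again, since another thread may have added the
        // lock while this one was waiting. Without this re-read the code would either
        // keep the stale null from the first read or create a second lock for the key.
        lock = lockObjects.get(key);
        if (lock == null) {
          lock = new Object();
          lockObjects.put(key, lock);
        }
      }
    }
    return lock;
  }
}
```

A caller would then wrap the delta read in `synchronized (registry.lockFor(info)) { ... }`. With a `ConcurrentHashMap`, `lockObjects.computeIfAbsent(key, k -> new Object())` gives the same one-lock-per-key guarantee in a single atomic call.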
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1019#discussion_r121390830

    --- Diff: core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java ---
    @@ -120,7 +122,53 @@ private void initThreadPoolSize() {
                 }
               }
             return pageIdDeleteRowsMap;
    +  }
    +  /**
    +   * Below method will be used to read the delete delta files
    +   * and get the map of blockletid and page id mapping to deleted
    +   * rows
    +   *
    +   * @param deltaFiles delete delta files array
    +   * @return map of blockletid_pageid to deleted rows
    +   */
    +  public Map getDeletedRowsDataVo(String[] deltaFiles) {
    +    List taskSubmitList = new ArrayList<>();
    +    ExecutorService executorService = Executors.newFixedThreadPool(thread_pool_size);
    +    for (final String deltaFile : deltaFiles) {
    +      taskSubmitList.add(executorService.submit(new Callable() {
    +        @Override public DeleteDeltaBlockDetails call() throws IOException {
    +          CarbonDeleteDeltaFileReaderImpl deltaFileReader =
    +              new CarbonDeleteDeltaFileReaderImpl(deltaFile, FileFactory.getFileType(deltaFile));
    +          return deltaFileReader.readJson();
    +        }
    +      }));
    +    }
    +    try {
    +      executorService.shutdown();
    +      executorService.awaitTermination(30, TimeUnit.MINUTES);
    +    } catch (InterruptedException e) {
    +      LOGGER.error("Error while reading the delete delta files : " + e.getMessage());
    +    }
    +    Map pageIdToBlockLetVo = new HashMap<>();
    +    List blockletDetails = null;
    +    for (int i = 0; i < taskSubmitList.size(); i++) {
    +      try {
    +        blockletDetails = taskSubmitList.get(i).get().getBlockletDetails();
    +      } catch (InterruptedException | ExecutionException e) {
    +        throw new RuntimeException(e);
    +      }
    +      for (DeleteDeltaBlockletDetails blockletDetail : blockletDetails) {
    +        DeleteDeltaVo deleteDeltaVo = pageIdToBlockLetVo.get(blockletDetail.getBlockletKey());
    +        if (null == deleteDeltaVo) {
    +          deleteDeltaVo = new DeleteDeltaVo();
    +          pageIdToBlockLetVo.put(blockletDetail.getBlockletKey(), deleteDeltaVo);
    +        }
    +        deleteDeltaVo.insertData(blockletDetail.getDeletedRows());
    +        ;
    --- End diff --

    Please remove this stray semicolon.
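The quoted `getDeletedRowsDataVo` reads all delete delta files in parallel, then merges the per-file results under a `blockletid_pageid` key. Below is a sketch of the same read-then-merge idea, resting on two assumptions:

- A hypothetical `reader` function stands in for `CarbonDeleteDeltaFileReaderImpl.readJson()`.
- Plain `Set<Integer>` row sets stand in for `DeleteDeltaVo`.

Two design choices in this sketch are not something the reviewers asked for:

- It uses `ExecutorService.invokeAll`, which blocks until every task has finished.
- It restores the thread's interrupt flag instead of only logging the interruption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper; not CarbonData's CarbonDeleteFilesDataReader.
class DeleteDeltaMerger {
  /**
   * Reads every delta file in parallel with the supplied reader and merges the per-file
   * maps of "blockletId_pageId" -> deleted row ids into a single map.
   */
  static Map<String, Set<Integer>> readAndMerge(List<String> deltaFiles,
      Function<String, Map<String, Set<Integer>>> reader, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Map<String, Set<Integer>>>> tasks = new ArrayList<>();
      for (String file : deltaFiles) {
        tasks.add(() -> reader.apply(file));  // one read task per delta file
      }
      Map<String, Set<Integer>> merged = new HashMap<>();
      // invokeAll blocks until every task has completed (normally or with an exception).
      for (Future<Map<String, Set<Integer>>> future : pool.invokeAll(tasks)) {
        for (Map.Entry<String, Set<Integer>> e : future.get().entrySet()) {
          // Same key from several files: union the deleted row ids.
          merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
      }
      return merged;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the caller
      throw new RuntimeException("Interrupted while reading delete delta files", e);
    } catch (ExecutionException e) {
      throw new RuntimeException("Failed to read a delete delta file", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The merge step plays the role of the `if (null == deleteDeltaVo) { ... put ... }` block in the diff. `computeIfAbsent` folds the lookup and the insert into a single call.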
GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/carbondata/pull/1019

    [CARBONDATA-1156]Improve IUD performance and fixed synchronization issue

    Loading delete delta files is slow because they are read at the blocklet level; this change reads them at the block level instead.
    In the current IUD (insert/update/delete) design, delete delta files are listed for each block at the executor level, so a query running in parallel with an IUD operation may return wrong results. This change passes the delete delta information from the driver to the executor instead.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata IUDPerformanceImprovement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1019.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1019

----
commit 60cfc66fe1f2de4cc3c2395a4dd479abb2a602f4
Author: kumarvishal
Date:   2017-06-12T10:36:24Z

    Fixed Syncronization issue and improve IUD performance

----
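The PR description says delete delta information is now passed from the driver to the executor, instead of being listed per block on the executor. One way to picture this, as a sketch only:

1. The driver lists each block's delete delta files once.
2. It wraps them in an immutable, serializable snapshot.
3. Every executor task works from that snapshot.

As we read the description, the intent is that all tasks of one query see the same set of files, even if an IUD operation commits meanwhile. The class `DeleteDeltaSnapshot` and its `fromListing` factory are hypothetical. The PR itself uses a `DeleteDeltaInfo` class, and the diffs above show only its `getLatestDeleteDeltaFileTimestamp()` accessor.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, serializable snapshot of one block's delete delta files, built once on the
 * driver and shipped to executors with the task, so executors never list files themselves.
 */
final class DeleteDeltaSnapshot implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String[] deltaFiles;   // sorted, so equal listings give equal snapshots
  private final long latestTimestamp;  // newest delete delta timestamp for the block

  private DeleteDeltaSnapshot(String[] deltaFiles, long latestTimestamp) {
    this.deltaFiles = deltaFiles;
    this.latestTimestamp = latestTimestamp;
  }

  /** Driver side: build the snapshot from a listing of file path -> timestamp. */
  static DeleteDeltaSnapshot fromListing(Map<String, Long> fileTimestamps) {
    TreeMap<String, Long> sorted = new TreeMap<>(fileTimestamps);
    long latest = -1L;
    for (long ts : sorted.values()) {
      latest = Math.max(latest, ts);
    }
    return new DeleteDeltaSnapshot(sorted.keySet().toArray(new String[0]), latest);
  }

  String[] getDeltaFiles() {
    return deltaFiles.clone();  // defensive copy keeps the snapshot immutable
  }

  long getLatestTimestamp() {
    return latestTimestamp;
  }

  // Value equality lets executors use the snapshot as a cache or lock key.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof DeleteDeltaSnapshot)) {
      return false;
    }
    DeleteDeltaSnapshot other = (DeleteDeltaSnapshot) o;
    return latestTimestamp == other.latestTimestamp && Arrays.equals(deltaFiles, other.deltaFiles);
  }

  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(deltaFiles) + Long.hashCode(latestTimestamp);
  }
}
```

Sorting the file names and defining value equality matter in this sketch for a specific reason. The diffs above key both a lock map and a timestamp check on the delete delta information, so two tasks holding equivalent snapshots should compare equal.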