[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2895 ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r231001948 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- done ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r230999455 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- Both the method are called in carbon flow for vector filling. One is direct fill case and the other one is old vector fill flow. Please cross check once ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r230997947 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- I have checked this while coding, fill row is our method. So, no issues. Only this vector is spark vector and dictionary needs to clear for it. ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2895#discussion_r230814030 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/LocalDictDimensionDataChunkStore.java --- @@ -61,10 +61,7 @@ public void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] d int columnValueSize = dimensionDataChunkStore.getColumnValueSize(); int rowsNum = data.length / columnValueSize; CarbonColumnVector vector = vectorInfo.vector; -if (!dictionary.isDictionaryUsed()) { - vector.setDictionary(dictionary); - dictionary.setDictionaryUsed(); -} +vector.setDictionary(dictionary); --- End diff -- Is the same handling required in `fillRow` method also in the same class?..If required then `isDictionaryUsed` and `setDictionaryUsed` API's will not be required and those can also be removed from the interface ---
[GitHub] carbondata pull request #2895: [HOTFIX] Fix NPE in spark, when same vector r...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2895 [HOTFIX] Fix NPE in spark, when same vector reads files with local dictionary and without local dictionary problem: NPE in spark, when same vector reads files with local dictionary and without local dictionary cause: when two carbondata files are present, one with local dictionary and one without local dictionary. If same vector is used to read this files [can happen if task is launched to group of files]. If local dictionary files are found first, dictionary is set for that vector. But it was never reset for another file reading. solution: reset dictionary once batch is processed,set only for local dictionary batch processing. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done yes, cluster testing done. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajantha-bhat/carbondata master_new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2895.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2895 commit 99c7621336e3cf180bfa0c3a326a2f1fafe51631 Author: ajantha-bhat Date: 2018-11-05T10:00:27Z Fix vectcor reading with local dictionary and without local dictionary ---