[GitHub] incubator-carbondata pull request #296: [CARBONDATA-382]Like Filter Query Op...
GitHub user sujith71955 opened a pull request: https://github.com/apache/incubator-carbondata/pull/296 [CARBONDATA-382]Like Filter Query Optimization for Dictionary Columns **Like Filter Query Optimization for Dictionary Columns** a) Added a pushdown mechanism for Like filters (startsWith, endsWith and contains) so that the respective filters are processed in the Carbon layer itself. b) This mechanism can provide a significant performance gain for Like filter queries on dictionary columns, since block- and blocklet-level pruning is done in the Carbon layer before the filters are applied on the dictionary columns. c) Since three new expressions have been added in the Carbon layer, Carbon applies the expression only once per startsWith/endsWith/contains filter query, which also makes the dictionary lookup for the Like expression happen only once. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sujith71955/incubator-carbondata master_filterstartswith Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/296.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #296 commit 608906a25afc2bbabecfdd3c7bfa1351e83de9a0 Author: sujith71955 <sujithchacko.2...@gmail.com> Date: 2016-11-04T17:36:18Z [CARBONDATA-382]Like Filter Query Optimization for Dictionary Columns --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
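The "dictionary lookup once" point in (c) can be illustrated with a small sketch. This is not the CarbonData implementation; the class and method names are illustrative. The idea is that a pushed-down startsWith/endsWith/contains predicate is evaluated once per dictionary entry, and the resulting surrogate keys are then reused for pruning, instead of re-evaluating the predicate per row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DictionaryLikeFilter {
    // Returns surrogate keys (1-based positions) of dictionary values that
    // match the Like predicate. The predicate runs once per dictionary entry,
    // before any blocklet is scanned; row matching then compares surrogates.
    static List<Integer> matchingSurrogates(List<String> dictionary,
                                            Predicate<String> like) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < dictionary.size(); i++) {
            if (like.test(dictionary.get(i))) {
                keys.add(i + 1);
            }
        }
        return keys;
    }
}
```

For example, with a dictionary ["apple", "banana", "apricot"], a `startsWith("ap")` predicate yields surrogates [1, 3] in a single pass.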
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957010 --- Diff: processing/src/main/java/org/apache/carbondata/processing/graphgenerator/GraphGenerator.java --- @@ -998,4 +1000,24 @@ private void prepareIsUseInvertedIndex(List dims, graphConfig.setIsUseInvertedIndex( isUseInvertedIndexList.toArray(new Boolean[isUseInvertedIndexList.size()])); } + + /** + * Preparing the boolean [] to map whether the dimension use trim or not. + * + * @param dims + * @param graphConfig + */ + private void prepareIsUseTrim(List dims, +GraphConfigurationInfo graphConfig) { +List isUseTrimList = new ArrayList(); +for (CarbonDimension dimension : dims) { + if (dimension.isUseTrim()) { --- End diff -- Can we add this trim option as a property of the column, i.e. inside the column properties rather than setting it directly as a CarbonDimension property? CarbonColumn already has a property map which holds column-related properties; I think we can use that. Please check the feasibility. ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +<<<<<<< HEAD --- End diff -- Does this file have a merge conflict? ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) { public TableOptionWrapper getTableOptionWrapper() { return tableOptionWrapper; } + + public String getIsUseTrim() { +return isUseTrim; + } + + public void setIsUseTrim(Boolean[] isUseTrim) { +for (Boolean flag: isUseTrim) { + if (flag) { +this.isUseTrim += "T"; --- End diff -- Use TRUE/FALSE for better readability ---
[GitHub] incubator-carbondata pull request #246: [CARBONDATA-321] If user changes the...
GitHub user sujith71955 opened a pull request: https://github.com/apache/incubator-carbondata/pull/246 [CARBONDATA-321] If user changes the blocklet size the queries will be… **Problem:** If the user changes the blocklet size (50), queries will fail. Currently the byte size calculation in the number compressor is: x = int array length * bit length; byte size = x/8 + x%8. When the cardinality is 50, the byte size will be 3, but the reverse calculation of the actual int array size of the inverted index comes out as 4: int arrayLength = (3*8)/6. **Solution:** We need to round the bit length up to the nearest value divisible by 8. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sujith71955/incubator-carbondata master_BlockletIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/246.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #246 commit e1d91197b4173625a1cb8c469cec95290bb93185 Author: sujith71955 <sujithchacko.2...@gmail.com> Date: 2016-10-17T11:12:16Z [CARBONDATA-321] If the user changes the blocklet size, queries will fail; this is happening since the inverted index compression has not been handled properly ---
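The proposed solution, rounding the bit length up to the nearest multiple of 8, can be sketched as follows. This is an illustrative sketch, not the actual CarbonData number-compressor code; the class and method names are assumptions. Once the bit length is a multiple of 8, the forward byte-size calculation and the reverse int-array-size calculation agree:

```java
public class BitLengthUtil {
    // Round a per-value bit length up to the next multiple of 8, so that
    // byte-size and reverse index-size calculations are consistent.
    static int roundToByteBoundary(int bitLength) {
        return ((bitLength + 7) / 8) * 8;
    }

    // Bytes needed to store 'count' values of 'bitLength' bits each.
    static int byteSize(int count, int bitLength) {
        int totalBits = count * bitLength;
        return (totalBits + 7) / 8;
    }
}
```

With cardinality 50 the raw bit length is 6; `roundToByteBoundary(6)` returns 8, so the reverse calculation `(byteSize * 8) / bitLength` yields the original value count exactly instead of over-counting.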
[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/194#discussion_r83547082 --- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java --- @@ -1426,4 +1423,25 @@ private static void getUnknownExpressionsList(Expression expression, getUnknownExpressionsList(child, lst); } } + /** + * This method will compare double values it will preserve + * the -0.0 and 0.0 equality as per == ,also preserve NaN equality check as per + * java.lang.Double.equals() + * + * @param d1 double value for equality check + * @param d2 double value for equality check + * @return boolean after comparing two double values. + */ + public static int compare(Double d1, Double d2) { +if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) && Double.isNaN(d2))) { + return 0; +} +if (d1 < d2) { --- End diff -- We can save one condition with an else; it's fine, I will update as per your comments. Thanks for reviewing. ---
[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/194#discussion_r83546963 --- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java --- @@ -1426,4 +1423,25 @@ private static void getUnknownExpressionsList(Expression expression, getUnknownExpressionsList(child, lst); } } + /** + * This method will compare double values it will preserve + * the -0.0 and 0.0 equality as per == ,also preserve NaN equality check as per + * java.lang.Double.equals() + * + * @param d1 double value for equality check + * @param d2 double value for equality check + * @return boolean after comparing two double values. + */ + public static int compare(Double d1, Double d2) { +if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) && Double.isNaN(d2))) { + return 0; +} +if (d1 < d2) { --- End diff -- Since we return as soon as any condition matches, I think using if vs. if-else makes no difference in this context. ---
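The comparison semantics discussed in this diff are worth spelling out. The following sketch completes the method from the diff (the trailing branch after `if (d1 < d2)` is filled in as an assumption; the class name is illustrative). It treats -0.0 and 0.0 as equal (as `==` does) while still treating NaN as equal to NaN (as `Double.equals` does), unlike `java.lang.Double.compare`, which orders -0.0 before 0.0:

```java
public class DoubleCompareUtil {
    // Compare two Doubles, preserving -0.0 == 0.0 (per the == operator) and
    // NaN == NaN (per java.lang.Double.equals). NaN compares greater than any
    // non-NaN value, because 'd1 < d2' is false whenever either side is NaN.
    public static int compare(Double d1, Double d2) {
        if ((d1.doubleValue() == d2.doubleValue())
                || (Double.isNaN(d1) && Double.isNaN(d2))) {
            return 0;
        }
        return (d1 < d2) ? -1 : 1;
    }
}
```

Note that `Double.compare(-0.0, 0.0)` returns a negative value, which is exactly the behavior this filter-comparison helper avoids.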
[GitHub] incubator-carbondata pull request #241: [CARBONDATA-319] Bad Records logging...
GitHub user sujith71955 opened a pull request: https://github.com/apache/incubator-carbondata/pull/241 [CARBONDATA-319] Bad Records logging for column data type is not proper **Problem:** Bad-records logging of the column data type is not correct in the case of long columns. While creating the table metadata, the Carbon system uses BIGINT instead of LONG; internally it converts BIGINT to the long type and processes it. While processing the data, if any long-typed value has an issue, it is logged into the bad record with data type Long, which is not correct, since as per the metadata the data type of the column is BIGINT. **Solution:** Pass a name for each data type ENUM constant in the constructor; while logging the bad record, get the column data type name from the data type ENUM constant. For LONG the name will be BIGINT. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sujith71955/incubator-carbondata master_BadRecord_Issue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/241.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #241 commit 2c08ecdbb50e57afcea26d25d80fe2b7f8208310 Author: sujith71955 <sujithchacko.2...@gmail.com> Date: 2016-10-15T15:39:27Z [CARBONDATA-319] Bad Records logging for the column data type is not correct in the case of long: the Carbon system, while creating the table metadata, uses BIGINT instead of LONG; internally it converts BIGINT to the long type and processes it, and while processing the data, if any long-typed value has an issue, it is logged into the bad record with data type Long, which is not correct since per the metadata the data type of the column is BIGINT. ---
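The described solution, carrying a display name on each data type enum constant, can be sketched as follows. This is an illustrative sketch, not the actual CarbonData `DataType` enum; the constant set is abbreviated and the accessor name is an assumption:

```java
// Each constant carries the name used in table metadata, so bad-record logs
// can report BIGINT for the internally-used LONG type.
public enum DataType {
    INT("INT"),
    LONG("BIGINT"),   // stored internally as long, exposed in metadata as BIGINT
    DOUBLE("DOUBLE");

    private final String name;

    DataType(String name) {
        this.name = name;
    }

    // Used when writing the bad-record log entry.
    public String getName() {
        return name;
    }
}
```

A bad-record logger would then call `column.getDataType().getName()` rather than `column.getDataType().toString()`, so the logged type matches the metadata.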
[GitHub] incubator-carbondata pull request #224: [CARBONDATA-239]Add scan_blocklet_nu...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/224#discussion_r82983904 --- Diff: core/src/main/java/org/apache/carbondata/scan/scanner/impl/FilterScanner.java --- @@ -78,10 +80,11 @@ public FilterScanner(BlockExecutionInfo blockExecutionInfo) { * @throws QueryExecutionException * @throws FilterUnsupportedException */ - @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder) + @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder, + QueryStatisticsModel queryStatisticsModel) throws QueryExecutionException { try { - fillScannedResult(blocksChunkHolder); + fillScannedResult(blocksChunkHolder, queryStatisticsModel); --- End diff -- Pass the model in the constructor so that there is no need to change all the APIs. ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- The problem with this approach: suppose in one load the user loaded dirty data, and then realizes the data needs to be trimmed, so in the next load he enables the option and loads the data. This will also increase the dictionary space, and the dictionary lookup overhead in queries will increase. ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968468 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- Guys, I think if during data loading we read from the configuration whether to trim or not, we need to do the same while processing filters too: decide based on the configuration value. For example, if I enabled the trim property while loading, the system trims and loads the data; now in a filter query too, if the user provides whitespace, it needs to be trimmed while creating the filter model. This will provide more system consistency. Likewise, if the user does not enable trim, then we won't trim either while loading or while querying. ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r82758567 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -102,8 +102,8 @@ public void initialize() throws IOException { parserSettings.setMaxColumns( getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), csvParserVo.getMaxColumns())); parserSettings.setNullValue(""); -parserSettings.setIgnoreLeadingWhitespaces(false); -parserSettings.setIgnoreTrailingWhitespaces(false); +parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim()); --- End diff -- You need to handle the same in CarbonFilters.scala as well, since the spaces are not trimmed while processing filter expressions; you need to take care of that there too, based on your feature. ---
[GitHub] incubator-carbondata pull request #148: [CARBONDATA-233] bad record logger s...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/148#discussion_r82339778 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java --- @@ -89,41 +95,55 @@ public static String hasBadRecord(String key) { return badRecordEntry.remove(key); } - public void addBadRecordsToBilder(Object[] row, int size, String reason, String valueComparer) { + public void addBadRecordsToBuilder(Object[] row, int size, String reason, String valueComparer, + boolean badRecordsLogRedirect, boolean badRecordLoggerEnable) { StringBuilder logStrings = new StringBuilder(); +size = row.length; --- End diff -- Better to validate whether we need to log the bad record at the start of the method, and only then continue with the log-string creation logic. ---
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/189#discussion_r81310297 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -419,6 +420,7 @@ class TableNewProcessor(cm: tableModel, sqlContext: SQLContext) { schemaEvol .setSchemaEvolutionEntryList(new util.ArrayList[SchemaEvolutionEntry]()) tableSchema.setTableId(UUID.randomUUID().toString) + tableSchema.setBlocksize(Integer.parseInt(cm.tableBlockSize.getOrElse(0).toString)) --- End diff -- What if the user provides a value like 1024M or 1MB? You will get an exception, and that case has not been handled. I think we need to handle it. ---
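One way to address the reviewer's point about values like "1024M" or "1MB" is to strip a recognized unit suffix before parsing. This is an illustrative sketch, not the actual CarbonData parser; the class name, method name, and supported suffixes are assumptions:

```java
public class TableBlockSizeParser {
    // Accept "1024", "1024M" or "1MB" instead of throwing
    // NumberFormatException on the suffix. Unknown suffixes still throw,
    // which callers can catch and map to a default.
    static int parseBlockSizeMb(String value) {
        String v = value.trim().toUpperCase();
        if (v.endsWith("MB")) {
            v = v.substring(0, v.length() - 2);
        } else if (v.endsWith("M")) {
            v = v.substring(0, v.length() - 1);
        }
        return Integer.parseInt(v.trim());
    }
}
```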
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281079 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java --- @@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus schmaModel, SchemaInfo info model.setEscapeCharacter(schmaModel.getEscapeCharacter()); model.setQuoteCharacter(schmaModel.getQuoteCharacter()); model.setCommentCharacter(schmaModel.getCommentCharacter()); +model.setTrim(schmaModel.getTrim()); --- End diff -- The use case is not clear; can you clarify it by providing more details? ---
[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281050 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java --- @@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus schmaModel, SchemaInfo info model.setEscapeCharacter(schmaModel.getEscapeCharacter()); model.setQuoteCharacter(schmaModel.getQuoteCharacter()); model.setCommentCharacter(schmaModel.getCommentCharacter()); +model.setTrim(schmaModel.getTrim()); --- End diff -- Why do we need to set this in the schema model? ---
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280420 --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/metadata/schema/table/TableSchema.java --- @@ -102,6 +107,14 @@ public void setSchemaEvalution(SchemaEvolution schemaEvalution) { this.schemaEvalution = schemaEvalution; } + public int getBlockszie() { +return blockszie; + } + + public void setBlockszie(int blockszie) { +this.blockszie = blockszie; --- End diff -- Provide a proper variable name: 'blocksize'. ---
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280323 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala --- @@ -546,6 +548,18 @@ class CarbonSqlParser() colGrpNames.toString() } + protected def getTableBlockSize(tableProperties: Map[String, String]): Integer = { +var tableBlockSize: Integer = 0 +if (tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).isDefined) { + val blockSizeStr: String = tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).get + try { +tableBlockSize = Integer.parseInt(blockSizeStr) + } catch { +case e: NumberFormatException => tableBlockSize = 0 --- End diff -- Better to assign the default value, so that the system is always in a consistent state. ---
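The reviewer's suggestion, falling back to a defined default instead of 0 when parsing fails, can be sketched as follows. This is an illustrative sketch; the class name and the default value are assumptions, not the actual CarbonData constants:

```java
public class BlockSizeParser {
    // Assumed default; in CarbonData this would come from a defined constant.
    static final int DEFAULT_BLOCK_SIZE_MB = 1024;

    // Parse the table_blocksize property; on a malformed value, fall back to
    // the default rather than 0, keeping the system in a consistent state.
    static int parseBlockSize(String value) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_BLOCK_SIZE_MB;
        }
    }
}
```

The point of the fallback is that a block size of 0 is never a valid configuration, whereas the default is always safe to run with.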
[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/194#discussion_r80221578 --- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java --- @@ -1401,8 +1401,7 @@ public static void logError(Throwable e, boolean invalidRowsPresent) { public static boolean nanSafeEqualsDoubles(Double d1, Double d2) { Boolean xIsNan = Double.isNaN(d1); Boolean yIsNan = Double.isNaN(d2); -if ((xIsNan && yIsNan) || (d1.doubleValue() == d2.doubleValue())) { - +if ((d1.doubleValue() == d2.doubleValue()) || (xIsNan && yIsNan)) { --- End diff -- Yes Ramana, right, I will fix it. ---
[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291710 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -287,12 +307,18 @@ private Integer compareDateValues(Long loadValue, Long userValue) { } } else { -LOG.error("Unable to acquire the metadata lock"); +String errorMsg = "Delete segment by id is failed for " + tableDetails ++ ". Not able to acquire the delete segment lock due to other operation running " --- End diff -- LGTM ---
[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291381 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -287,12 +307,18 @@ private Integer compareDateValues(Long loadValue, Long userValue) { } } else { -LOG.error("Unable to acquire the metadata lock"); +String errorMsg = "Delete segment by id is failed for " + tableDetails ++ ". Not able to acquire the delete segment lock due to other operation running " --- End diff -- Update the message; provide the operation name. ---
[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291346 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/locks/LockUsage.java --- @@ -28,5 +28,7 @@ public static String COMPACTION_LOCK = "compaction.lock"; public static String SYSTEMLEVEL_COMPACTION_LOCK = "system_level_compaction.lock"; public static String TABLE_STATUS_LOCK = "tablestatus.lock"; + public static String DELETE_SEGMENT_LOCK = "delete_segment.lock"; --- End diff -- Make it final, or use an enum. ---
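The enum alternative suggested in this comment would look roughly like the following. This is an illustrative sketch (the constant set is abbreviated from the diff); it replaces mutable `public static String` fields with constants that cannot be reassigned and are type-safe at call sites:

```java
// Lock usages modeled as an enum: each constant carries its lock-file name
// and the set of names is closed, unlike mutable public static Strings.
public enum LockUsage {
    COMPACTION_LOCK("compaction.lock"),
    TABLE_STATUS_LOCK("tablestatus.lock"),
    DELETE_SEGMENT_LOCK("delete_segment.lock");

    private final String lockName;

    LockUsage(String lockName) {
        this.lockName = lockName;
    }

    public String getLockName() {
        return lockName;
    }
}
```

Call sites then take a `LockUsage` parameter instead of a raw `String`, so an invalid lock name cannot be passed in.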
[GitHub] incubator-carbondata pull request #163: Problem:Column heading was missing i...
GitHub user sujith71955 opened a pull request: https://github.com/apache/incubator-carbondata/pull/163 Problem:Column heading was missing in driver statistics table and sca… Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- …n block time was always zero in query statistics table. Analysis: While creating the driver statistics table, the header was not created, and the scan block time was not initialized before recording in the statistics recorder. Fix: One more line is added to print the column header in the driver statistics table, and the scan block time is initialized before recording the statistics. 
Impact: Logs of all queries. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sujith71955/incubator-carbondata statistics_recorder_issue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/163.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #163 commit 665978e937603fd85c55a64c656655705ca20f0d Author: akashrn5 <akash.r.nilu...@huawei.com> Date: 2016-09-17T07:03:31Z Problem: Column heading was missing in the driver statistics table and scan block time was always zero in the query statistics table. Analysis: While creating the driver statistics table, the header was not created, and the scan block time was not initialized before recording in the statistics recorder. Fix: One more line is added to print the column header in the driver statistics table, and the scan block time is initialized before recording the statistics. Impact: Logs of all queries. ---
[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/156#discussion_r79204842 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -306,13 +330,18 @@ private Integer compareDateValues(Long loadValue, Long userValue) { * @return */ public List updateDeletionStatus(String loadDate, String tableFolderPath, - Long loadStartTime) { -ICarbonLock carbonLock = CarbonLockFactory - .getCarbonLockObj(absoluteTableIdentifier.getCarbonTableIdentifier(), -LockUsage.METADATA_LOCK); + Long loadStartTime) throws Exception { +CarbonTableIdentifier carbonTableIdentifier = +absoluteTableIdentifier.getCarbonTableIdentifier(); +ICarbonLock carbonMetadataLock = +CarbonLockFactory.getCarbonLockObj(carbonTableIdentifier, LockUsage.METADATA_LOCK); +ICarbonLock carbonTableStatusLock = +CarbonLockFactory.getCarbonLockObj(carbonTableIdentifier, LockUsage.TABLE_STATUS_LOCK); +String tableDetails = +carbonTableIdentifier.getDatabaseName() + "." + carbonTableIdentifier.getTableName(); List invalidLoadTimestamps = new ArrayList(0); try { - if (carbonMetadataLock.lockWithRetries()) { --- End diff -- Make it an operational log. ---
[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/156#discussion_r79187047

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -1077,19 +1077,38 @@ object CarbonDataRDDFactory extends Logging {
      val segmentStatusManager = new SegmentStatusManager(table.getAbsoluteTableIdentifier)
      val details = segmentStatusManager
        .readLoadMetadata(loadMetadataFilePath)
+     val carbonTableStatusLock = CarbonLockFactory
+       .getCarbonLockObj(table.getAbsoluteTableIdentifier.getCarbonTableIdentifier,
+         LockUsage.TABLE_STATUS_LOCK)
      // Delete marked loads
      val isUpdationRequired = DeleteLoadFolders
        .deleteLoadFoldersFromFileSystem(carbonLoadModel, hdfsStoreLocation,
          partitioner.partitionCount, isForceDeletion, details)
      if (isUpdationRequired) {
+       try {
          // Update load metadate file after cleaning deleted nodes
-         CarbonLoaderUtil.writeLoadMetadata(
-           carbonLoadModel.getCarbonDataLoadSchema,
-           carbonLoadModel.getDatabaseName,
-           carbonLoadModel.getTableName, details.toList.asJava
-         )
+         if (carbonTableStatusLock.lockWithRetries()) {
+           logger.info("Table status lock has been successfully acquired.")
+           CarbonLoaderUtil.writeLoadMetadata(
+             carbonLoadModel.getCarbonDataLoadSchema,
+             carbonLoadModel.getDatabaseName,
+             carbonLoadModel.getTableName, details.toList.asJava
+           )
+         }
+         else {
+           val errorMsg = "Clean files request is failed for " + carbonLoadModel.getDatabaseName +
+             "." + carbonLoadModel.getTableName +
+             ". Not able to acquire the table status lock."

--- End diff --

Also mention "due to another operation running in the background"; this message will give the user a clearer picture of the task failure.
[GitHub] incubator-carbondata pull request #159: [Issue Number] CARBONDATA-242
GitHub user sujith71955 opened a pull request: https://github.com/apache/incubator-carbondata/pull/159 [Issue Number] CARBONDATA-242

[Problem] The filter result was not compatible with the Hive result when a null filter member is present in a NOT IN filter model; as per Hive, no result shall be returned if a NOT IN filter model has a null object for comparison.
eg: select country from t3 where country not in (null,'china','france') group by country

[Description] When the user provides a null member inside a NOT IN filter condition, the result set is not compatible with the Hive result.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujith71955/incubator-carbondata master_StartsWithIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/159.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #159

commit 8709af0cebc07e8ac25f40f4a31dfdc95b4eb5e6
Author: sujith71955 <sujithchacko.2...@gmail.com>
Date: 2016-09-15T09:23:04Z

[Issue Number] CARBONDATA-242
[Description] When the user provides a null member inside a NOT IN filter condition, the result set is not compatible with the Hive result.
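The Hive behaviour the PR targets comes from SQL's three-valued logic: a comparison with NULL is UNKNOWN, so `x NOT IN (null, ...)` can never evaluate to TRUE and the filter returns no rows. The sketch below demonstrates that logic in plain Java; the method name notInKeepsRow is illustrative and not a CarbonData API.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of SQL three-valued logic for NOT IN, matching the Hive
// behaviour described in the PR. Illustrative only, not CarbonData code.
public class NotInNullSemantics {

    /** Returns true only when `value NOT IN (list)` is definitely TRUE. */
    static boolean notInKeepsRow(String value, List<String> inList) {
        if (value == null) {
            return false; // NULL NOT IN (...) is UNKNOWN, so the row is filtered out
        }
        boolean listHasNull = false;
        for (String member : inList) {
            if (member == null) {
                listHasNull = true;      // comparison with NULL is UNKNOWN
            } else if (member.equals(value)) {
                return false;            // definite match: NOT IN is FALSE
            }
        }
        // no definite match, but a NULL member makes the result UNKNOWN, not TRUE
        return !listHasNull;
    }

    public static void main(String[] args) {
        List<String> withNull = Arrays.asList(null, "china", "france");
        List<String> noNull = Arrays.asList("china", "france");
        System.out.println(notInKeepsRow("india", withNull)); // false: list has NULL
        System.out.println(notInKeepsRow("india", noNull));   // true: definite non-match
        System.out.println(notInKeepsRow("china", noNull));   // false: definite match
    }
}
```

So for the example query `where country not in (null,'china','france')`, every row is dropped, which is the Hive-compatible result the fix enforces.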
[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/122#discussion_r77335816

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---

@@ -791,16 +819,23 @@ object GlobalDictionaryUtil extends Logging {
    if (requireDimension.nonEmpty) {
      val model = createDictionaryLoadModel(carbonLoadModel, table, requireDimension,
        hdfsLocation, dictfolderPath, false)
+     // check if dictionary files contains bad record
+     val accumulator = sqlContext.sparkContext.accumulator(0)
      // read local dictionary file, and group by key
      val allDictionaryRdd = readAllDictionaryFiles(sqlContext, headers,
-       requireColumnNames, allDictionaryPath)
+       requireColumnNames, allDictionaryPath, accumulator)
      // read exist dictionary and combine
      val inputRDD = new CarbonAllDictionaryCombineRDD(allDictionaryRdd, model)
        .partitionBy(new ColumnPartitioner(model.primDimensions.length))
      // generate global dictionary files
      val statusList = new CarbonGlobalDictionaryGenerateRDD(inputRDD, model).collect()
      // check result status
      checkStatus(carbonLoadModel, sqlContext, model, statusList)
+     // if the dictionary contains wrong format record, throw ex
+     if (accumulator.value > 0) {
+       throw new DataLoadingException("Data Loading failure, the dictionary file " +

--- End diff --

I think the term "content" will be too generic; we can say "dictionary values". What's your opinion? It is better to use a more meaningful term related to that particular file, like "dictionary values".
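The pattern in the diff above is: count malformed records while parsing, then fail the whole load afterwards if the count is non-zero. A minimal Spark-free sketch of that pattern, with an AtomicInteger standing in for the Spark accumulator and a plain RuntimeException for DataLoadingException (both substitutions, plus the method and path names, are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "count bad records, then fail the load" from the diff above.
// AtomicInteger plays the accumulator's role; names are illustrative.
public class DictionaryValidation {

    static void loadDictionary(List<String> records, String dictionaryPath) {
        AtomicInteger badRecords = new AtomicInteger(0);
        for (String record : records) {
            // a valid record is "columnIndex,value"
            String[] tokens = record.split(",", -1);
            if (tokens.length != 2) {
                badRecords.incrementAndGet();
            }
        }
        // as in the diff: any malformed record fails the whole load, with the
        // reviewer's suggested "dictionary values" wording in the message
        if (badRecords.get() > 0) {
            throw new RuntimeException(
                "Data Loading failure, the dictionary file " + dictionaryPath
                    + " contains " + badRecords.get() + " bad dictionary values");
        }
    }

    public static void main(String[] args) {
        loadDictionary(Arrays.asList("0,china", "1,jone"), "/tmp/all.dictionary");
        System.out.println("clean file loaded");
        try {
            loadDictionary(Arrays.asList("0,china", "jone"), "/tmp/all.dictionary");
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Deferring the check until after the pass keeps parsing cheap while still reporting every bad record in one failure, instead of aborting on the first one.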
[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/122#discussion_r77332668

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---

@@ -588,30 +588,59 @@ object GlobalDictionaryUtil extends Logging {
      allDictionaryPath: String) = {
    var allDictionaryRdd: RDD[(String, Iterable[String])] = null
    try {
-     // read local dictionary file, and spilt (columnIndex, columnValue)
-     val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
-       .map(x => {
+     // parse record and validate record
+     def parseRecord(x: String, accum: Accumulator[Int]) : (String, String) = {
        val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-       if (tokens.size != 2) {
-         logError("Read a bad dictionary record: " + x)
-       }
-       var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
+       var columnName: String = ""
        var value: String = ""
-       try {
-         columnName = csvFileColumns(tokens(0).toInt)
-         value = tokens(1)
-       } catch {
-         case ex: Exception =>
-           logError("Reset bad dictionary record as default value")
+       // such as "," , "", throw ex
+       if (tokens.size == 0) {
+         logError("Read a bad dictionary record: " + x)
+         accum += 1
+       } else if (tokens.size == 1) {
+         // such as "1", "jone", throw ex
+         if (x.contains(",") == false) {
+           accum += 1
+         } else {
+           try {
+             columnName = csvFileColumns(tokens(0).toInt)
+           } catch {
+             case ex: Exception =>
+               logError("Read a bad dictionary record: " + x)
+               accum += 1
+           }
+         }
+       } else {
+         try {
+           columnName = csvFileColumns(tokens(0).toInt)
+           value = tokens(1)
+         } catch {
+           case ex: Exception =>
+             logError("Read a bad dictionary record: " + x)
+             accum += 1
+         }
+       }
        (columnName, value)
-     })
+     }
+     val accumulator = sqlContext.sparkContext.accumulator(0)
+     // read local dictionary file, and spilt (columnIndex, columnValue)
+     val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
+       .map(x => parseRecord(x, accumulator)).persist()
+     // for accumulator updates performed inside actions only
+     basicRdd.count()

--- End diff --

Again we are calling an action, which is not necessary.
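The point of contention above is Spark laziness: transformations like map() do nothing until an action runs, so an accumulator incremented inside map() only updates when some action executes, and an extra count() inserted just for that purpose forces a redundant pass over the data. Java streams have the same lazy semantics, which allows a small Spark-free illustration (this is an analogy, not Spark itself):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Lazy intermediate operations vs. eager terminal operations, as an analogue
// of Spark transformations vs. actions. Illustrative only.
public class LazyCounting {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);

        // Building the pipeline runs nothing: map() is lazy, like an RDD transformation.
        Stream<String> pipeline = Stream.of("a", "b", "c")
            .map(s -> { counter.incrementAndGet(); return s.toUpperCase(); });
        System.out.println("after map: " + counter.get());

        // A terminal operation (the analogue of an RDD action) forces execution,
        // and only then does the counter reflect the processed records.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("after terminal op: " + counter.get());
        System.out.println(result);
    }
}
```

This illustrates the reviewer's objection: if a later action (here the collect, or in the PR the downstream dictionary-generation job) will run the pipeline anyway, calling count() first only to flush the accumulator pays for an extra traversal.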
[GitHub] incubator-carbondata pull request #105: [CARBONDATA-189] Drop database casca...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/105#discussion_r76657713

--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---

@@ -1342,4 +1345,9 @@ class CarbonSqlParser()
    }
  }

+ protected lazy val dropDatabaseCascade: Parser[LogicalPlan] =
+   DROP ~> (DATABASE|SCHEMA) ~> opt(IF ~> EXISTS) ~> ident ~> CASCADE <~ opt(";") ^^ {
+     case cascade => throw new MalformedCarbonCommandException(
+       "Unsupported cascade operation in drop database command")

--- End diff --

Since the system supports both database and schema, it is better to provide a message that includes schema, e.g. "Unsupported cascade operation in drop database/schema command"; otherwise, provide "Unsupported cascade operation in drop command".
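The parser rule above recognizes the unsupported DROP ... CASCADE form purely to reject it with a clear error. The real code uses Scala parser combinators; the sketch below shows the same reject-early idea with a regex stand-in, so the pattern, the validate method, and the use of IllegalArgumentException are all illustrative assumptions:

```java
import java.util.regex.Pattern;

// Hedged sketch of rejecting unsupported syntax with a clear message,
// mirroring the dropDatabaseCascade rule above. Illustrative only.
public class DropCascadeCheck {
    // matches: DROP DATABASE|SCHEMA [IF EXISTS] <ident> CASCADE [;]
    static final Pattern DROP_CASCADE = Pattern.compile(
        "(?i)^\\s*DROP\\s+(DATABASE|SCHEMA)\\s+(IF\\s+EXISTS\\s+)?\\w+\\s+CASCADE\\s*;?\\s*$");

    static void validate(String sql) {
        if (DROP_CASCADE.matcher(sql).matches()) {
            // message covers both keywords, per the review suggestion
            throw new IllegalArgumentException(
                "Unsupported cascade operation in drop database/schema command");
        }
    }

    public static void main(String[] args) {
        validate("DROP DATABASE mydb"); // accepted: no CASCADE clause
        System.out.println("plain drop accepted");
        try {
            validate("DROP SCHEMA IF EXISTS mydb CASCADE");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Matching the unsupported form explicitly, instead of letting it fall through to a generic parse failure, is what makes the suggested "database/schema" wording reach the user.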
[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519992

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---

@@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
      // filepath regex, look like "/path/*.dictionary"
      if (filePath.getName.startsWith("*")) {
        val dictExt = filePath.getName.substring(1)
-       val listFiles = filePath.getParentFile.listFiles()
-       if (listFiles.exists(file =>
-         file.getName.endsWith(dictExt) && file.getSize > 0)) {
-         true
+       if (filePath.getParentFile.exists()) {
+         val listFiles = filePath.getParentFile.listFiles()

--- End diff --

What if filePath.getParentFile returns null?
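The reviewer's question points at a real java.io.File pitfall: getParentFile() returns null when the path has no parent component, so calling exists() on it directly can throw a NullPointerException. A small demonstration, with a null-safe guard (the parentExists helper is an illustrative name, not code from the PR):

```java
import java.io.File;

// Demonstrates that File.getParentFile() can return null, and a null-safe
// guard for it. The parentExists helper is illustrative only.
public class ParentFileCheck {

    static boolean parentExists(File filePath) {
        File parent = filePath.getParentFile();
        return parent != null && parent.exists(); // guard before dereferencing
    }

    public static void main(String[] args) {
        // a bare relative name like "*.dictionary" has no parent component
        File bare = new File("*.dictionary");
        System.out.println("parent of bare name: " + bare.getParentFile());

        // a nested path does have one
        File nested = new File("/tmp/dict/*.dictionary");
        System.out.println("parent of nested path: " + nested.getParentFile());

        // the guard turns the would-be NullPointerException into a clean false
        System.out.println("safe check on bare name: " + parentExists(bare));
    }
}
```

So the proposed `filePath.getParentFile.exists()` check is only safe if the all-dictionary path is guaranteed to contain a directory component; otherwise the null check the reviewer asks about is still needed.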
[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519971

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---

@@ -591,29 +591,31 @@ object GlobalDictionaryUtil extends Logging {
      val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
        .map(x => {
          val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
+         if (tokens.size != 2) {
+           logError("[ALL_DICTIONARY] Read a bad dictionary record: " + x)
+         }
-         var index: Int = 0
+         var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
          var value: String = ""
          try {
-           index = tokens(0).toInt
+           columnName = csvFileColumns(tokens(0).toInt)
            value = tokens(1)
          } catch {
            case ex: Exception =>
-             logError("read a bad dictionary record" + x)
+             logError("[ALL_DICTIONARY] Reset bad dictionary record as default value")
          }
-         (index, value)
+         (columnName, value)
        })
+     // group by column index, and filter required columns
      val requireColumnsList = requireColumns.toList
      allDictionaryRdd = basicRdd
        .groupByKey()
-       .map(x => (csvFileColumns(x._1), x._2))
        .filter(x => requireColumnsList.contains(x._1))
    } catch {
      case ex: Exception =>
-       logError("read local dictionary files failed")
+       logError("[ALL_DICTIONARY] Read dictionary files failed. Caused by" + ex.getMessage)

--- End diff --

Is the "ALL_DICTIONARY" term really required in this context?