[GitHub] [carbondata] jackylk commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt…
jackylk commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt… URL: https://github.com/apache/carbondata/pull/3476#issuecomment-557778576 @shenh062326 Thanks for fixing this, could you paste a comparison of the statistics printing before and after this change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1
jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1 URL: https://github.com/apache/carbondata/pull/3457#issuecomment-557778708 LGTM
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python SDK for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Summary: PyCarbon: provide python SDK for users to use CarbonData by python code (was: [WIP] ) > PyCarbon: provide python SDK for users to use CarbonData by python code > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, and deploy their models as services for inference. It is a big challenge to store, manage, and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video, and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms. It has many great features and high performance for storing, managing, and analyzing big data. Apache CarbonData already supports the String, Int, Double, Boolean, Char, Date, and Timestamp data types, and also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, can speed up S3 access by dozens or even hundreds of times, and can decrease the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use CarbonData through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. Goals: 1. Apache CarbonData should provide a Python interface to write and read structured and unstructured data in CarbonData, such as String, int, and binary data (image/voice/video). It should not depend on Apache Spark. 2. Apache CarbonData should provide a Python interface that lets deep learning frameworks such as TensorFlow, MXNet, and PyTorch read and write data from/to CarbonData. It should not depend on Apache Spark. 3. Apache CarbonData should provide a Python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python. > PyCarbon: provide python interface for users to use CarbonData by python code > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878336 ## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/geo/PolygonExpression.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.scan.expression.geo; + +import java.util.List; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.conditional.GreaterThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.conditional.LessThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.expression.logical.AndExpression; +import org.apache.carbondata.core.scan.expression.logical.OrExpression; +import org.apache.carbondata.core.scan.expression.logical.RangeExpression; +import org.apache.carbondata.core.scan.expression.logical.TrueExpression; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * InPolygon expression processor. It passes the InPolygon string to the GeoHash implementation's + * query method and gets the list of ranges of GeoHash IDs to filter as output. Multiple + * range expressions are then built from that list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonExpression extends Expression { + private String polygon; + private String columnName; + private CustomIndex<List<Long[]>> handler; + private List<Long[]> ranges; + + public PolygonExpression(String polygon, String columnName, CustomIndex<List<Long[]>> handler) { +this.polygon = polygon; +this.handler = handler; +this.columnName = columnName; + } + + /** + * This method builds the GeoHash range expressions from the list of ranges of GeoHash IDs. + */ + public void buildRangeExpression() { +try { + ranges = handler.query(polygon); +} catch (Exception e) { + throw new RuntimeException(e); +} + +// Convert these ranges into range expressions +Expression expression = null; +Expression prevExpression = null; +Expression rangeExpression; +for (Long[] range : ranges) { + assert (range.length == 2); Review comment: Yes, Modified to EqualToExpression now.
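The snippet above describes turning the index query's ID ranges into a combined set of range filters. A minimal, self-contained sketch of that idea, using plain `LongPredicate`s as stand-ins for CarbonData's Expression tree (names are illustrative, not the PR's actual API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical, simplified sketch: each [start, end] pair returned by the
// index query becomes a range predicate (id >= start AND id <= end), and
// the predicates are OR-ed together into one filter.
public class RangeFilterSketch {
  static LongPredicate buildFilter(List<long[]> ranges) {
    LongPredicate combined = id -> false;          // empty OR matches nothing
    for (long[] range : ranges) {
      assert range.length == 2;
      final long start = range[0], end = range[1];
      LongPredicate rangePredicate = id -> id >= start && id <= end;
      combined = combined.or(rangePredicate);      // OR the ranges together
    }
    return combined;
  }

  public static void main(String[] args) {
    List<long[]> ranges = Arrays.asList(new long[]{10, 20}, new long[]{40, 45});
    LongPredicate filter = buildFilter(ranges);
    System.out.println(filter.test(15));  // inside the first range -> true
    System.out.println(filter.test(30));  // between the ranges -> false
    System.out.println(filter.test(42));  // inside the second range -> true
  }
}
```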
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878271 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -161,11 +166,59 @@ public DictionaryClient call() throws Exception { return null; } + private int getDataFieldIndexByName(String column) { +for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getColName().equalsIgnoreCase(column)) { +return i; + } +} +return -1; + } + + private String generateNonSchemaColumnValue(DataField field, CarbonRow row) { +Map<String, String> properties = configuration.getTableSpec().getCarbonTable() +.getTableInfo().getFactTable().getTableProperties(); +String handler = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".instance"); +if (handler != null) { + try { +// TODO Need to check how to store the instance. This serialization may be incorrect. +ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(handler)); Review comment: Modified accordingly
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878261 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -183,6 +236,35 @@ public CarbonRow convert(CarbonRow row) throws CarbonDataLoadingException { } } } + +/* If non schema fields are present, generate the value for them and convert. */ +if (bNonSchemaPresent) { Review comment: Added a comment in the code. Also modified the converter to run the loop twice: the first iteration converts schema columns and the second iteration generates and converts the non-schema columns.
[jira] [Updated] (CARBONDATA-3283) Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in P
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3283: -- Description: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python. TODO: was:WIP Summary: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark; it should support the DDL, DML, and DataMap features in Python. (was: WIP) > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878730 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class GeoHashDefault implements CustomIndex<List<Long[]>> { +// Conversion factor between degrees and radians +private final static double CONVERT_FACTOR = 180.0; +// Earth radius +private final static double EARTH_RADIUS = 6371004.0; + +private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; // Ground distance covered by 1 degree of longitude at the equator (or 1 degree of latitude) + +private double oriLongitude = 0; // Longitude of the coordinate origin + +private double oriLatitude = 0; // Latitude of the coordinate origin + +private double userDefineMaxLongitude = 0; // User-defined maximum longitude of the map + +private double userDefineMaxLatitude = 0; // User-defined maximum latitude of the map + +private double userDefineMinLongitude = 0; // User-defined minimum longitude of the map + +private double userDefineMinLatitude = 0; // User-defined minimum latitude of the map + +private double CalculateMaxLongitude = 0; // Calculated maximum longitude of the padded map + +private double CalculateMaxLatitude = 0; // Calculated maximum latitude of the padded map + +private int gridSize = 0; // Grid size, in meters + +private double mCos; // Cosine of the latitude of the coordinate origin + +private double deltaY = 0; // Degrees on the Y axis covered by one gridSize length + +private double deltaX = 0; // Degrees on the X axis covered by one gridSize length + +private double deltaYByRatio = 0; // Degrees on the Y axis covered by one gridSize length * scale factor + +private double deltaXByRatio = 0; // Degrees on the X axis covered by one gridSize length * scale factor + +private int cutLevel = 0; // Number of cuts over the whole area (one horizontal plus one vertical counts as one cut), i.e. the depth of the quad tree + +private int totalRowNumber = 0; // Number of rows in the whole area, from top-left to bottom-right + +private int totalCloumnNumber = 0; // Number of columns in the whole area, from top-left to bottom-right + +private int udfRowStartNumber = 0; // Starting row of the user-defined area + +private int udfRowEndNumber = 0; // Ending row of the user-defined area + +private int udfCloumnStartNumber = 0; // Starting column of the user-defined area + +private int udfCloumnEndNumber = 0; // Ending column of the user-defined area + +private double lon0 = 0; // Longitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lat0 = 0; // Latitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lon0ByRation = 0; // Constant multiplied by the scale factor +private double lat0ByRation = 0; // Constant multiplied by the scale factor + +private int conversionRatio = 1; // Scale factor used to convert double longitude/latitude values to int for computation + + +@Override +public void validateOption(Map<String, String> properties) throws Exception { +String option = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (option == null || option.isEmpty()) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} + +String commonKey = "." + option + "."; Review comment: Ok. Removed blank lines in the complete PR.
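The fields above encode standard grid geometry: `transValue` = π/180 × EARTH_RADIUS is the ground distance of one degree at the equator, and the per-cell deltas divide the grid size by it, scaled by cos(latitude) along the X axis. A hedged sketch of that arithmetic, with illustrative names rather than CarbonData's actual implementation:

```java
// Hypothetical sketch of how deltaX/deltaY (degrees per grid cell) could be
// derived from the constants in the snippet above.
public class GridDeltaSketch {
  static final double EARTH_RADIUS = 6371004.0;                              // meters, as in the snippet
  static final double METERS_PER_DEGREE = Math.PI / 180.0 * EARTH_RADIUS;    // the snippet's "transValue"

  // Degrees of latitude spanned by one grid cell of gridSize meters.
  static double deltaY(double gridSizeMeters) {
    return gridSizeMeters / METERS_PER_DEGREE;
  }

  // Degrees of longitude spanned by one grid cell; a degree of longitude
  // shrinks by cos(latitude) away from the equator, so the delta widens.
  static double deltaX(double gridSizeMeters, double originLatDeg) {
    return gridSizeMeters / (METERS_PER_DEGREE * Math.cos(Math.toRadians(originLatDeg)));
  }

  public static void main(String[] args) {
    double dy = deltaY(50);          // 50 m grid
    double dx = deltaX(50, 39.9);    // illustrative origin latitude
    System.out.printf("deltaY=%.8f deg, deltaX=%.8f deg%n", dy, dx);
  }
}
```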
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878724 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDropColumnCommand.scala ## @@ -27,6 +27,7 @@ import org.apache.spark.util.{AlterTableUtil, SparkUtil} import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants Review comment: Yes. Reverted it.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878702 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ## @@ -53,6 +53,21 @@ private[sql] case class CarbonDescribeFormattedCommand( (field.name, field.dataType.simpleString, colComment) } +/* Append non-schema columns */ +val columns = relation.carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala +val implicitColumns = for (column <- columns if column.getSchemaOrdinal == -1) yield { + (column.getColumnName, column.getDataType.getName.toLowerCase, "") +} + +if (implicitColumns.nonEmpty) { + results ++= Seq( +("", "", ""), +("## Non-Schema Columns", "", "") + ) + Review comment: Ok. Removed blank lines in the complete PR.
[jira] [Updated] (CARBONDATA-3271) Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3271: -- Affects Version/s: (was: 1.5.1) Description: Apache CarbonData should provide a python interface to support deep learning frameworks such as TensorFlow, MXNet, and PyTorch to read and write data from/to CarbonData. It should not depend on Apache Spark. Goals: 1. CarbonData provides a python interface for TensorFlow to read data from CarbonData for training models 2. CarbonData provides a python interface for MXNet to read data from CarbonData for training models 3. CarbonData provides a python interface for PyTorch to read data from CarbonData for training models 4. CarbonData should support an epoch function 5. CarbonData should support caching to speed up performance. Summary: Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData (was: WIP) > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h
[GitHub] [carbondata] jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1
jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1 URL: https://github.com/apache/carbondata/pull/3457#issuecomment-557778633 LGTM
[jira] [Reopened] (CARBONDATA-3271) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3271: --- push pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877872 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -69,7 +69,12 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati DataField[] input = new DataField[fields.length]; inputMapping = new int[input.length]; int k = 0; +boolean isNonSchemaPresent = false; for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getSchemaOrdinal() == -1) { +isNonSchemaPresent = true; +continue; + } Review comment: This could have been the easiest thing to do for me :) There was a reason to keep the inputMapping for non-schema columns at the end: we can stop the row parse the moment a non-schema column is encountered. We have only 1 non-schema column at the moment though. Anyway, I have modified it as suggested. I believe your comment applies to the converter (RowConverterImpl.convert()) too. I have modified the converter to run the loop twice: the first iteration converts schema columns and the second iteration generates and converts the non-schema columns.
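The reply above describes a two-pass convert: the first pass handles schema columns, the second generates values for non-schema columns (`schemaOrdinal == -1`). A simplified, hypothetical sketch of that control flow (the `Field` class and the generated value are stand-ins, not CarbonData's real types):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-pass convert described above: pass 1 converts
// the columns that came from the input schema; pass 2 generates values for
// non-schema columns, which may depend on the pass-1 results.
public class TwoPassConvertSketch {
  static class Field {
    final String name;
    final int schemaOrdinal;   // -1 marks a generated, non-schema column
    Field(String name, int schemaOrdinal) {
      this.name = name;
      this.schemaOrdinal = schemaOrdinal;
    }
  }

  static String[] convert(List<Field> fields, String[] row) {
    String[] out = new String[fields.size()];
    // Pass 1: schema columns are converted from the parsed input row.
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).schemaOrdinal != -1) {
        out[i] = row[fields.get(i).schemaOrdinal].trim();
      }
    }
    // Pass 2: non-schema columns are generated (here a placeholder stands in
    // for the index-handler value derived from the other columns).
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).schemaOrdinal == -1) {
        out[i] = "generated(" + fields.get(i).name + ")";
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Field> fields = Arrays.asList(
        new Field("longitude", 0), new Field("latitude", 1), new Field("geohash", -1));
    System.out.println(Arrays.toString(convert(fields, new String[]{"116.3", " 39.9"})));
    // -> [116.3, 39.9, generated(geohash)]
  }
}
```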
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878336 ## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/geo/PolygonExpression.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.scan.expression.geo; + +import java.util.List; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.conditional.GreaterThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.conditional.LessThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.expression.logical.AndExpression; +import org.apache.carbondata.core.scan.expression.logical.OrExpression; +import org.apache.carbondata.core.scan.expression.logical.RangeExpression; +import org.apache.carbondata.core.scan.expression.logical.TrueExpression; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * InPolygon expression processor. It inputs the InPolygon string to the GeoHash implementation's + * query method, gets the list of ranges of GeoHash IDs to filter as an output. And then, multiple + * range expressions are build from those list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonExpression extends Expression { + private String polygon; + private String columnName; + private CustomIndex<List<Long[]>> handler; + private List<Long[]> ranges; + + public PolygonExpression(String polygon, String columnName, CustomIndex<List<Long[]>> handler) { +this.polygon = polygon; +this.handler = handler; +this.columnName = columnName; + } + + /** + * This method builds the GeoHash range expressions from the list of ranges of GeoHash IDs. + */ + public void buildRangeExpression() { +try { + ranges = handler.query(polygon); +} catch (Exception e) { + throw new RuntimeException(e); +} + +// Convert these ranges into range expressions +Expression expression = null; +Expression prevExpression = null; +Expression rangeExpression; +for (Long[] range : ranges) { + assert (range.length == 2); Review comment: Yes, Modified to EqualToExpression. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
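The combination logic quoted in the diff above — one range predicate per [min, max] pair of GeoHash IDs, with the pairs OR-ed together — can be sketched in plain Java. The `Range` and `PolygonFilterSketch` types below are hypothetical stand-ins for illustration, not CarbonData's `RangeExpression`/`OrExpression` classes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a polygon filter folded from GeoHash ID ranges:
 * each [min, max] pair becomes one range predicate, and a row matches
 * if its GeoHash ID falls in ANY of the ranges (logical OR).
 */
public class PolygonFilterSketch {
  static final class Range {
    final long min;
    final long max;
    Range(long min, long max) { this.min = min; this.max = max; }
    boolean contains(long id) { return id >= min && id <= max; }
  }

  private final List<Range> ranges = new ArrayList<>();

  PolygonFilterSketch(long[][] idRanges) {
    for (long[] r : idRanges) {
      // each entry must be a [min, max] pair, mirroring the assert in the diff
      assert r.length == 2;
      ranges.add(new Range(r[0], r[1]));
    }
  }

  /** True if the ID falls in any range (the OR of all range predicates). */
  boolean matches(long geoHashId) {
    for (Range r : ranges) {
      if (r.contains(geoHashId)) return true;
    }
    return false;
  }
}
```

A real implementation would emit one expression node per range and chain them with OR nodes; the loop here is the flattened equivalent of that expression tree.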
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878392 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + + +String sourceColumnsOption = properties.get(SOURCE_COLUMNS); +if (sourceColumnsOption == null) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property is not specified.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +if (sourceColumnsOption.split(",").length != 2) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. 
%s property must have 2 columns.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +String type = properties.get(TYPE); +if (type != null && !CarbonCommonConstants.GEOHASH.equalsIgnoreCase(type)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be %s for this class.", + CarbonCommonConstants.INDEX_HANDLER, TYPE, CarbonCommonConstants.GEOHASH)); +} + +properties.put(TYPE, CarbonCommonConstants.GEOHASH); + +String sourceDataTypes = properties.get(SOURCE_COLUMN_TYPES); +String[] srcTypes = sourceDataTypes.split(","); +for (String srcdataType : srcTypes) { + if (!"bigint".equalsIgnoreCase(srcdataType)) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s datatypes must be long.", +CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); + } +} + +String dataType = properties.get(TARGET_DATA_TYPE); +if (dataType != null && !"long".equalsIgnoreCase(dataType)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be long for this class.", + CarbonCommonConstants.INDEX_HANDLER,
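The validation flow in `init` above can be condensed into a small self-contained sketch. The property keys follow the `index_handler.<name>.<option>` pattern from the diff; the `validate` helper and the use of `IllegalArgumentException` (in place of CarbonData's `MalformedCarbonCommandException`) are assumptions made to keep the example standalone:

```java
import java.util.Map;

/**
 * Hedged sketch of the property checks performed by GeoHashImpl.init:
 * sourcecolumns must name exactly 2 columns, the type (if given) must be
 * geohash, and every declared source column type must be bigint.
 */
public class IndexHandlerValidation {
  static void validate(String handlerName, Map<String, String> props) {
    String prefix = "index_handler." + handlerName + ".";

    // sourcecolumns is mandatory and must list exactly two columns
    String sourceColumns = props.get(prefix + "sourcecolumns");
    if (sourceColumns == null || sourceColumns.split(",").length != 2) {
      throw new IllegalArgumentException(
          prefix + "sourcecolumns must name exactly 2 columns");
    }

    // type is optional, but if present it must be geohash for this handler
    String type = props.get(prefix + "type");
    if (type != null && !"geohash".equalsIgnoreCase(type)) {
      throw new IllegalArgumentException(prefix + "type must be geohash");
    }

    // both source columns must be bigint, matching the srcTypes loop above
    String columnTypes = props.get(prefix + "sourcecolumntypes");
    if (columnTypes != null) {
      for (String t : columnTypes.split(",")) {
        if (!"bigint".equalsIgnoreCase(t)) {
          throw new IllegalArgumentException(
              prefix + "sourcecolumns must be bigint columns");
        }
      }
    }
  }
}
```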
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878902 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * Review comment: Ok. Removed blank lines in the complete PR.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878838 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class GeoHashDefault implements CustomIndex<List<Long[]>> { +// conversion factor from degrees to radians +private final static double CONVERT_FACTOR = 180.0; +// earth radius in meters +private final static double EARTH_RADIUS = 6371004.0; + +private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; // geographic distance covered by 1 degree of longitude or latitude at the equator + +private double oriLongitude = 0; // longitude of the coordinate origin + +private double oriLatitude = 0; // latitude of the coordinate origin + +private double userDefineMaxLongitude = 0; // maximum longitude of the user-defined map + +private double userDefineMaxLatitude = 0; // maximum latitude of the user-defined map + +private double userDefineMinLongitude = 0; // minimum longitude of the user-defined map + +private double userDefineMinLatitude = 0; // minimum latitude of the user-defined map + +private double CalculateMaxLongitude = 0; // maximum longitude of the padded map, computed from the user-defined bounds + +private double CalculateMaxLatitude = 0; // maximum latitude of the padded map, computed from the user-defined bounds + +private int gridSize = 0; // grid size, in meters + +private double mCos; // cosine of the latitude of the coordinate origin + +private double deltaY = 0; // degrees on the Y axis covered by one gridSize length + +private double deltaX = 0; // degrees on the X axis covered by one gridSize length + +private double deltaYByRatio = 0; // degrees on the Y axis covered by one gridSize length * conversion ratio + +private double deltaXByRatio = 0; // degrees on the X axis covered by one gridSize length * conversion ratio + +private int cutLevel = 0; // number of cuts applied to the whole area (one horizontal plus one vertical cut counts as one cut), i.e. the depth of the quad tree + +private int totalRowNumber = 0; // total number of rows in the whole area, from top-left to bottom-right + +private int totalCloumnNumber = 0; // total number of columns in the whole area, from top-left to bottom-right + +private int udfRowStartNumber = 0; // start row of the user-defined area + +private int udfRowEndNumber = 0; // end row of the user-defined area + +private int udfCloumnStartNumber = 0; // start column of the user-defined area + +private int udfCloumnEndNumber = 0; // end column of the user-defined area + +private double lon0 = 0; // longitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lat0 = 0; // latitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lon0ByRation = 0; // the same constant multiplied by the conversion ratio +private double lat0ByRation = 0; // the same constant multiplied by the conversion ratio + +private int conversionRatio = 1; // ratio used to convert double longitude/latitude values to int for calculation + + +@Override +public void validateOption(Map<String, String> properties) throws Exception { +String option = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (option == null || option.isEmpty()) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} + +String commonKey = "." + option + "."; +String sourceColumnsOption = properties.get(CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns"); +if (sourceColumnsOption == null) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s property is not specified.", +CarbonCommonConstants.INDEX_HANDLER, +CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns")); +} + +if (sourceColumnsOption.split(",").length != 2) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s property must have 2 columns.", +CarbonCommonConstants.INDEX_HANDLER, +CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns")); +} + +String type = properties.get(CarbonCommonConstants.INDEX_HANDLER + commonKey + "type"); +if (type != null && !"geohash".equalsIgnoreCase(type)) { Review comment: Agreed. Modified.
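A hedged reading of the grid fields above: `transValue` is the number of meters per degree at the equator, so one grid step covers `gridSize / transValue` degrees on the Y axis, and the X step is widened by the cosine of the origin latitude (longitude degrees shrink away from the equator). The `toCell` mapping below is one plausible interpretation of the top-left `(lon0, lat0)` origin, a sketch rather than CarbonData's exact implementation:

```java
/**
 * Sketch of the equirectangular grid math implied by GeoHashDefault's
 * fields: constants match the diff (CONVERT_FACTOR = 180, EARTH_RADIUS =
 * 6371004 m); the cell mapping counts columns eastward and rows downward
 * from the top-left corner of the grid.
 */
public class GridSketch {
  private static final double CONVERT_FACTOR = 180.0;    // degrees-to-radians factor
  private static final double EARTH_RADIUS = 6371004.0;  // meters
  // meters covered by 1 degree of longitude/latitude at the equator
  private static final double METERS_PER_DEGREE =
      Math.PI / CONVERT_FACTOR * EARTH_RADIUS;

  final double deltaX;       // degrees of longitude per grid cell
  final double deltaY;       // degrees of latitude per grid cell
  private final double lon0; // top-left longitude of the grid
  private final double lat0; // top-left latitude of the grid

  GridSketch(double lon0, double lat0, double originLatitudeDeg, int gridSizeMeters) {
    this.lon0 = lon0;
    this.lat0 = lat0;
    this.deltaY = gridSizeMeters / METERS_PER_DEGREE;
    // one degree of longitude spans fewer meters at higher latitude,
    // so each X step covers more degrees by the factor 1/cos(latitude)
    this.deltaX = gridSizeMeters
        / (METERS_PER_DEGREE * Math.cos(Math.toRadians(originLatitudeDeg)));
  }

  /** Maps (lon, lat) to {column, row}, counting from the top-left corner. */
  int[] toCell(double lon, double lat) {
    int column = (int) Math.floor((lon - lon0) / deltaX);
    int row = (int) Math.floor((lat0 - lat) / deltaY);
    return new int[] {column, row};
  }
}
```

With a 1000 m grid, `deltaY` works out to roughly 0.009 degrees, which is consistent with the ~111 km-per-degree figure implied by `transValue`.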
[jira] [Updated] (CARBONDATA-3255) CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3255: -- Description: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it is better to provide a python interface to support writing and reading structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python code. Apache Spark uses py4j in PySpark to call java code from python, but py4j gives low performance when reading big data in CarbonData format from python code; py4j also reports low performance when reading big data in its own documentation: https://www.py4j.org/advanced_topics.html#performance. JPype is also a popular tool for calling java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius has high performance when reading big data by calling java code from python, so it is a good choice for us. We have already worked on this feature for several months in https://github.com/xubo245/pycarbon Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter was: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it is better to provide a python interface to support writing and reading structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python code. Apache Spark uses py4j in PySpark to call java code from python, but py4j gives low performance when reading big data in CarbonData format from python code; py4j also reports low performance when reading big data in its own documentation: https://www.py4j.org/advanced_topics.html#performance. JPype is also a popular tool for calling java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius has high performance when reading big data by calling java code from python, so it is a good choice for us. Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter > CarbonData provides python interface to support to write and read structured > and unstructured data in CarbonData > > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Apache CarbonData already provides Java/Scala/C++ interfaces for users, and > more and more people use python to manage and analyze big data, so it is > better to provide a python interface to support writing and reading structured > and unstructured data in CarbonData, like String, int and binary data: > image/voice/video. It should not depend on Apache Spark. We call it PYSDK. > PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code > from python code. Apache Spark uses py4j in PySpark to call java code from > python, but py4j gives low performance when reading big data in CarbonData > format from python code; py4j also reports low performance when reading big > data in its own documentation: > https://www.py4j.org/advanced_topics.html#performance. JPype is also a > popular tool for calling java code from python, but it stopped being updated > several years ago, so we cannot use it. In our tests, pyjnius has high > performance when reading big data by calling java code from python, so it is > a good choice for us. > We have already worked on this feature for several months in > https://github.com/xubo245/pycarbon > Goals: > 1. PYSDK should provide an interface to read data > 2. PYSDK should provide an interface to write data > 3. PYSDK should support basic data types > 4. PYSDK should support projection > 5. PYSDK should support filter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878815 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + Review comment: Added comment
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878765 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") +if (sources.distinct.length != sources.size) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property " + + s"have duplicate columns.") +} + +val sourceTypes = StringBuilder.newBuilder +sources.foreach { column => + tableFields.find(_.column.equalsIgnoreCase(column)) match { +case Some(field) => sourceTypes.append(field.dataType.get).append(",") +case None => + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"Source column : $column in property " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns " + + "is not a valid column in table.") + } +} + +tableProperties.put(CarbonCommonConstants.INDEX_HANDLER + + s".$e.sourcecolumntypes", sourceTypes.dropRight(1).toString()) + +val handlerType = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER + s".$e.type") +val handlerClass = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER + s".$e.class") + +val handlerClassName: String = handlerClass match { + case Some(className) => +className + case None => +/* use handler type to find the default implementation */ +if (handlerType.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"Both ${CarbonCommonConstants.INDEX_HANDLER}.$e.class and " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.type properties are not specified") +} else if (handlerType.get.equalsIgnoreCase("geohash")) { + /* Use geoHash default implementation */ + val className = classOf[org.apache.carbondata.core.util.GeoHashDefault].getName + tableProperties.put(s"${CarbonCommonConstants.INDEX_HANDLER}.$e.class", className) + className +} else { + throw new MalformedCarbonCommandException( +s"Carbon
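The class-resolution order in `processIndexProperty` above — an explicit `.class` property wins, otherwise the `.type` value selects a default implementation, and anything else is rejected — can be sketched as follows. The `resolve` helper and the use of `IllegalArgumentException` are illustrative assumptions; the default class name mirrors the `GeoHashDefault` reference in the Scala code:

```java
import java.util.Map;

/**
 * Sketch of handler-class resolution: prefer the user-supplied
 * "index_handler.<name>.class" property; otherwise map the
 * "index_handler.<name>.type" value to a default implementation.
 */
public class HandlerClassResolver {
  static String resolve(String handler, Map<String, String> props) {
    String explicit = props.get("index_handler." + handler + ".class");
    if (explicit != null) {
      return explicit;  // user-supplied implementation takes precedence
    }
    String type = props.get("index_handler." + handler + ".type");
    if (type == null) {
      // mirrors the parser error when both class and type are missing
      throw new IllegalArgumentException(
          "either class or type must be specified for " + handler);
    }
    if ("geohash".equalsIgnoreCase(type)) {
      return "org.apache.carbondata.core.util.GeoHashDefault";
    }
    throw new IllegalArgumentException("unsupported handler type: " + type);
  }
}
```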
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878796 ## File path: core/src/main/java/org/apache/carbondata/core/util/CustomIndex.java ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.util; + +import java.util.List; +import java.util.Map; + +public interface CustomIndex { Review comment: Added comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878782 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; Review comment: Added header This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879251 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + + +String sourceColumnsOption = properties.get(SOURCE_COLUMNS); +if (sourceColumnsOption == null) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property is not specified.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +if (sourceColumnsOption.split(",").length != 2) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. 
%s property must have 2 columns.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +String type = properties.get(TYPE); +if (type != null && !CarbonCommonConstants.GEOHASH.equalsIgnoreCase(type)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be %s for this class.", + CarbonCommonConstants.INDEX_HANDLER, TYPE, CarbonCommonConstants.GEOHASH)); +} + +properties.put(TYPE, CarbonCommonConstants.GEOHASH); Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Reopened] (CARBONDATA-3283) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3283: --- contribute pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > WIP
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze lots of structured and unstructured data, especially unstructured data like image, video, audio and so on. Many users use python to build their projects for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. But it is not easy for these users to use carbon via Java/Scala/C++. So it is better to provide a python interface for them to use CarbonData from python code was: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze lots of structured and unstructured data, especially unstructured data like image, video, audio and so on. Many users use python to build their projects for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon via Java/Scala/C++. So it is better to provide a python interface for them to use CarbonData from python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithms, train > their models, deploy their models as services and run inference on images. > It is a big challenge to store, manage and analyze lots of structured and > unstructured data, especially unstructured data like image, video, audio and > so on. Many users use python to build their projects for these scenarios. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platforms, with many great features and high > performance for storing, managing and analyzing big data. Apache CarbonData > not only already supports String, Int, Double, Boolean, Char, Date and > TimeStamp data types, but also supports Binary > [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. > But it is not easy for these users to use carbon via Java/Scala/C++. So it > is better to provide a python interface for them to use CarbonData from > python code
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878033 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -161,11 +166,59 @@ public DictionaryClient call() throws Exception { return null; } + private int getDataFieldIndexByName(String column) { +for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getColName().equalsIgnoreCase(column)) { +return i; + } +} +return -1; + } + + private String generateNonSchemaColumnValue(DataField field, CarbonRow row) { +Map<String, String> properties = configuration.getTableSpec().getCarbonTable() +.getTableInfo().getFactTable().getTableProperties(); +String handler = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".instance"); +if (handler != null) { + try { +// TODO Need to check how to store the instance. This serialization may be incorrect. +ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(handler)); +ObjectInputStream in = new ObjectInputStream(bis); +CustomIndex instance = (CustomIndex) in.readObject(); +String sourceColumns = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".sourcecolumns"); +assert (sourceColumns != null); +String[] sources = sourceColumns.split(","); +int srcFieldIndex; +List<Object> sourceValues = new ArrayList<>(); +for (String source : sources) { + srcFieldIndex = getDataFieldIndexByName(source); + assert (srcFieldIndex != -1); + sourceValues.add(row.getData()[srcFieldIndex]); +} +return instance.generate(sourceValues); Review comment: Modified accordingly.
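The generateNonSchemaColumnValue snippet above restores a serialized CustomIndex handler instance from a Base64-encoded table property and calls generate() on the values of its source columns. By analogy only — this is not Carbon's API; the class name, property keys and pickle-based serialization below are illustrative stand-ins for Java object serialization — a minimal Python sketch of that store-and-restore pattern:

```python
import base64
import pickle

class GeoHashIndex:
    """Stand-in for a CustomIndex implementation."""
    def generate(self, source_values):
        # Combine the source column values into one generated index value.
        return ",".join(str(v) for v in source_values)

# "Create table" side: serialize the handler instance into a table property.
properties = {
    "index_handler.mygeohash.instance":
        base64.b64encode(pickle.dumps(GeoHashIndex())).decode("ascii"),
    "index_handler.mygeohash.sourcecolumns": "longitude,latitude",
}

# "Load" side: restore the instance and generate the non-schema column value
# from the configured source columns of the row.
instance = pickle.loads(
    base64.b64decode(properties["index_handler.mygeohash.instance"]))
sources = properties["index_handler.mygeohash.sourcecolumns"].split(",")
row = {"longitude": 116.397, "latitude": 39.916}
value = instance.generate([row[s] for s in sources])
print(value)  # → 116.397,39.916
```

As the TODO in the diff notes, serializing a live handler instance into table metadata is fragile; the review discussion leaves the storage format open.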
[jira] [Updated] (CARBONDATA-3255) CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3255: -- Description: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it's better to provide a python interface to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python. Apache Spark uses py4j in PySpark to call java code from python, but py4j performs poorly when reading big data in CarbonData format from python code; py4j's own report also shows low performance when reading big data: https://www.py4j.org/advanced_topics.html#performance. JPype is another popular tool to call java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius read big data with high performance by calling java code from python, so it is a good choice for us. Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. 
PYSDK should support filter Summary: CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData (was: WIP) > CarbonData provides python interface to support to write and read structured > and unstructured data in CarbonData > > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878448 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. + * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { Review comment: Modified. 
[jira] [Reopened] (CARBONDATA-3254) [WIP]
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3254: --- Contribute Pycarbon to Apache CarbonData > [WIP] > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform has many great feature and high performance to storage, manage and analysis big data.But it's not easy for them to use carbon by Java/Scala/C++. 
So it's better to provide python interface for these users to use CarbonData by python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878186 ## File path: integration/spark2/src/main/scala/org/apache/carbondata/geo/InPolygonUDF.scala ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.geo + +import org.apache.spark.sql.sources.Filter + +import org.apache.carbondata.common.annotations.InterfaceAudience + +@InterfaceAudience.Internal +class InPolygonUDF extends (String => Boolean) with Serializable { + override def apply(v1: String): Boolean = { +v1.length > 0 Review comment: Added comment.
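The apply body above is a placeholder (it only checks v1.length > 0; the review added a comment noting that). A real point-in-polygon check is commonly done with the classic ray-casting algorithm. A minimal, self-contained sketch of that check — illustrative only, not the CarbonData implementation:

```python
def in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a rightward ray from the
    point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(in_polygon((0.5, 0.5), square))  # → True
print(in_polygon((2.0, 2.0), square))  # → False
```

In the PR the actual polygon evaluation is pushed down as a Filter; the UDF itself is only a marker that later resolves to the index-backed lookup.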
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878436 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map<String, String> properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + Review comment: Removed blank lines in the complete PR.
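The init() method above validates the index_handler table property and then composes per-handler sub-property keys from a common prefix. A small Python sketch of just that key composition — the literal prefix "index_handler" and the handler name "mygeohash" are assumed placeholders for CarbonCommonConstants.INDEX_HANDLER and the configured handler name:

```python
def handler_property_keys(index_handler_prefix, handler_name):
    """Compose the per-handler table property keys used by init()."""
    common = f"{index_handler_prefix}.{handler_name}."
    names = ("type", "sourcecolumns", "sourcecolumntypes", "datatype",
             "originlatitude", "minlongitude", "maxlongitude",
             "minlatitude", "maxlatitude", "gridsize", "conversionratio")
    return {n: common + n for n in names}

keys = handler_property_keys("index_handler", "mygeohash")
print(keys["gridsize"])  # → index_handler.mygeohash.gridsize
```

Each resulting key is then looked up in the table properties map, with a MalformedCarbonCommandException raised when a mandatory one is missing or invalid.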
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData already supports not only the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small binary files problem, can speed up S3 access performance by dozens or even hundreds of times, and can decrease the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. We have already worked on these features for several months in https://github.com/xubo245/pycarbon Goals: 1. Apache CarbonData should provide a python interface to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. 2. Apache CarbonData should provide a python interface for deep learning frameworks such as TensorFlow, MXNet and PyTorch to read and write data from/to CarbonData. It should not depend on Apache Spark. 3. Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char,Date, TImeStamp data types, but also supported Binay [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which can avoid small binary files problem and speed up S3 access performance reach dozens or even hundreds of times, also can decrease cost of accessing OBS by decreasing the number of calling S3 API. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code Goals: 1. Apache CarbonData should provides python interface to support to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not dependency Apache Spark. 2. Apache CarbonData should provides python interface to support deep learning framework to ready and write data from/to CarbonData, like TensorFlow , MXNet, PyTorch and so on. It should not dependency Apache Spark. 3. Apache CarbonData should provides python interface to manage and analysis data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in Python. 
> PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Summary: PyCarbon: provide python interface for users to use CarbonData by python code (was: PyCarbon: provide python SDK for user to use CarbonData by python code) > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Reopened] (CARBONDATA-3255) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3255: --- contribute pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878603 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + + +public class GeoHashDefault implements CustomIndex> { + // conversion factor from degrees to radians + private static final double CONVERT_FACTOR = 180.0; + // earth radius + private static final double EARTH_RADIUS = 6371004.0; + + private static final String GEOHASH = "geohash"; + // ground distance corresponding to one degree of longitude at the equator, or one degree of latitude + private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; + + // private double oriLongitude = 0; // longitude of the coordinate origin + + private double oriLatitude = 0; // latitude of the coordinate origin + + private double userDefineMaxLongitude = 0; // user-defined maximum longitude of the map + + private double userDefineMaxLatitude = 0; // user-defined maximum latitude of the map + + private double userDefineMinLongitude = 0; // user-defined minimum longitude of the map + + private double userDefineMinLatitude = 0; // user-defined minimum latitude of the map + + private double CalculateMaxLongitude = 0; // calculated maximum longitude of the padded map + + private double CalculateMaxLatitude = 0; // calculated maximum latitude of the padded map + + private int gridSize = 0; // grid size, in meters + + private double mCos; // cosine of the latitude of the coordinate origin + + private double deltaY = 0; // degrees on the Y axis covered by one gridSize length + + private double deltaX = 0; // degrees on the X axis covered by one gridSize length + + private double deltaYByRatio = 0; // degrees on the Y axis covered by one gridSize length * conversion ratio + + private double deltaXByRatio = 0; // degrees on the X axis covered by one gridSize length * conversion ratio + + private int cutLevel = 0; // number of cuts over the whole area (one horizontal plus one vertical counts as one cut), i.e. the depth of the quadtree + + //private int totalRowNumber = 0; // number of rows of the whole area, from top-left to bottom-right + + //private int totalCloumnNumber = 0; // number of columns of the whole area, from top-left to bottom-right + + //private int udfRowStartNumber = 0; // starting row of the user-defined area + + //private int udfRowEndNumber = 0; // ending row of the user-defined area + + //private int udfCloumnStartNumber = 0; // starting column of the user-defined area + + //private int udfCloumnEndNumber = 0; // ending column of the user-defined area + + //private double lon0 = 0; // longitude of the minimum grid value; the minimum grid coordinate is the top-left corner of the extended area + + //private double lat0 = 0; // latitude of the minimum grid value; the minimum grid coordinate is the top-left corner of the extended area Review comment: Agreed. This code is removed from this PR and will be raised by @MarvinLitt in a different PR as part of the algorithm.
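The fields above encode simple grid geometry: transValue is the ground distance of one degree (of latitude anywhere, or of longitude at the equator), and deltaX/deltaY are the degrees spanned by one grid cell, with longitude stretched by the cosine of the origin latitude. A sketch of that arithmetic, reusing the constants from the snippet (variable names otherwise mine):

```python
import math

CONVERT_FACTOR = 180.0    # degrees-to-radians factor, as in GeoHashDefault
EARTH_RADIUS = 6371004.0  # earth radius in meters, as in GeoHashDefault

# Ground distance of one degree of latitude (or longitude at the equator):
# transValue in the Java snippet, roughly 111 km.
METERS_PER_DEGREE = math.pi / CONVERT_FACTOR * EARTH_RADIUS

def grid_deltas(grid_size_m, origin_latitude_deg):
    """Degrees of (longitude, latitude) spanned by one grid cell."""
    delta_y = grid_size_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude), so a cell of the
    # same ground size spans more longitude degrees away from the equator.
    delta_x = delta_y / math.cos(math.radians(origin_latitude_deg))
    return delta_x, delta_y

dx, dy = grid_deltas(50.0, 39.9)  # 50 m cells around latitude 39.9°
```

At the equator the two deltas coincide; at higher latitudes delta_x grows while delta_y stays fixed, which is why the class keeps mCos around.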
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878610 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + + +public class GeoHashDefault implements CustomIndex> { + // 角度转弧度的转换因子 + private static final double CONVERT_FACTOR = 180.0; + // 地球半径 + private static final double EARTH_RADIUS = 6371004.0; + + private static final String GEOHASH = "geohash"; + // 赤道经度1度或者纬度1度对应的地理空间距离 + private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; + + // private double oriLongitude = 0; // 坐标原点的经度 + + private double oriLatitude = 0; // 坐标原点的纬度 + + private double userDefineMaxLongitude = 0; // 用户定义地图最大的经度 + + private double userDefineMaxLatitude = 0; // 用户定义地图最大的纬度 + + private double userDefineMinLongitude = 0; // 用户定义地图最小的经度 + + private double userDefineMinLatitude = 0; // 用户定义地图最小的纬度 + + private double CalculateMaxLongitude = 0; // 计算后得出的补齐地图最大的经度 + + private double CalculateMaxLatitude = 0; // 计算后得出的补齐地图最大的纬度 + + private int gridSize = 0; //栅格长度单位是米 + + private double mCos; // 坐标原点纬度的余玄数值 + + private double deltaY = 0;// 每一个gridSize长度对应Y轴的度数 + + private double deltaX = 0;// 每一个gridSize长度应X轴的度数 + + private double deltaYByRatio = 0; // 每一个gridSize长度对应Y轴的度数 * 系数 + + private double deltaXByRatio = 0; // 每一个gridSize长度应X轴的度数 * 系数 + + private int cutLevel = 0; // 对整个区域切的刀数(一横一竖为1刀),就是四叉树的深度 + + //private int totalRowNumber = 0;// 整个区域的行数,从左上开始到右下 + + //private int totalCloumnNumber = 0; // 整个区域的列数,从左上开始到右下 + + //private int udfRowStartNumber = 0; // 用户定义区域的开始行数 + + //private int udfRowEndNumber = 0; // 用户定义区域的结束的行数 + + //private int udfCloumnStartNumber = 0; // 用户定义区域的开始列数 + + //private int udfCloumnEndNumber = 0; // 用户定义区域的开始结束列数 + + //private double lon0 = 0; // 栅格最小数值的经度坐标,最小栅格坐标是扩展区域最左上角的经纬度坐标 + + //private double lat0 = 0; // 栅格最小数值的纬度坐标,最小栅格坐标是扩展区域最左上角的经纬度坐标 + + private double 
lon0ByRation = 0; // constant multiplied by the ratio + private double lat0ByRation = 0; // constant multiplied by the ratio + + private int conversionRatio = 1; // ratio used to convert double latitude/longitude values to int for computation + + /** + * Initialize the geohash index handler instance. + * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +// String ORIGIN_LONGITUDE = commonKey + "originlongitude"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +
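The `init` excerpt above first validates that the `index_handler` table property exists and names the handler before deriving the per-handler option keys from a shared prefix. A minimal standalone sketch of that validation flow follows; the class and method are illustrative, not CarbonData's actual API, and only the `index_handler` property name comes from the excerpt:

```java
import java.util.Map;

public class IndexHandlerProps {
    static final String INDEX_HANDLER = "index_handler";

    // Mirrors the checks in init(): the property must exist, be non-empty,
    // and mention the handler name; all per-handler options then share the
    // prefix "index_handler.<handlerName>.".
    static String commonKey(Map<String, String> properties, String handlerName) {
        String options = properties.get(INDEX_HANDLER);
        if (options == null || options.isEmpty()) {
            throw new IllegalArgumentException(INDEX_HANDLER + " property is invalid.");
        }
        if (!options.toLowerCase().contains(handlerName.toLowerCase())) {
            throw new IllegalArgumentException(handlerName + " is not present.");
        }
        return INDEX_HANDLER + "." + handlerName + ".";
    }
}
```

From this prefix the handler then derives keys such as `index_handler.<name>.sourcecolumns` and `index_handler.<name>.datatype`, as in the excerpt.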
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878645 ## File path: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ## @@ -934,4 +934,21 @@ object AlterTableUtil { CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB) } } + + def validateForIndexHandlerSources(carbonTable: CarbonTable, alterColumns: List[String]): Unit = { +// Do not allow index handler's source columns to be altered +val properties = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala +val indexProperty = properties.get(CarbonCommonConstants.INDEX_HANDLER) +if (indexProperty.isDefined) { + indexProperty.get.split(",") foreach { element => Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878691 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ## @@ -53,6 +53,21 @@ private[sql] case class CarbonDescribeFormattedCommand( (field.name, field.dataType.simpleString, colComment) } +/* Append non-schema columns */ +val columns = relation.carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala +val implicitColumns = for (column <- columns if column.getSchemaOrdinal == -1) yield { Review comment: Yes. Modified.
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, speeds up S3 access performance by dozens or even hundreds of times, and decreases the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use Carbon through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. 
Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char, Date, TimeStamp data types, but also supported Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithm, train > their model, deploy their model as service and inference image. It's big > challenge to storage, manage and analysis lots of structured and unstructured > data, especially unstructured data, like image, video, audio and so on. > Many users use python to install their project for these scenario. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platform. Apache CarbonData has > many great feature and high performance to storage, manage and analysis big > data. Apache CarbonData not only already supported String, Int, Double, > Boolean, Char, Date, TimeStamp data types, but also supported Binary > [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], > which can avoid small binary files problem and speed up S3 access performance > reach dozens or even hundreds of times, also can decrease cost of accessing > OBS by decreasing the number of calling S3 API. But it's not easy for them to > use carbon by Java/Scala/C++. 
So it's better to provide python interface for > these users to use CarbonData by python code > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877866 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -79,6 +84,17 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati } } } +/* If non schema fields are present, keep all of them at the end of input */ +if (isNonSchemaPresent) { + for (int i = 0; i < fields.length; i++) { +if (fields[i].getColumn().getSchemaOrdinal() != -1) { + continue; +} +input[k] = fields[i]; +inputMapping[k] = -1; +k++; + } +} Review comment: same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877865 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -95,6 +111,11 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati } Object[] out = new Object[genericParsers.length]; for (int i = 0; i < genericParsers.length; i++) { + if (inputMapping[i] == -1) { +/* All the non schema fields are placed at end. And input mapping for them are marked as -1. +Can break the loop when inputMapping[i] is -1. */ +break; + } Review comment: same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877872 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -69,7 +69,12 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati DataField[] input = new DataField[fields.length]; inputMapping = new int[input.length]; int k = 0; +Boolean isNonSchemaPresent = false; for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getSchemaOrdinal() == -1) { +isNonSchemaPresent = true; +continue; + } Review comment: This could have been the easiest thing to do for me :) There was a reason to keep inputMapping for non-schema columns at the end. We can stop the row parse the moment a non-schema column is encountered. We have only 1 non-schema column at the moment though. Anyway, have modified as suggested.
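The RowParserImpl review thread above concerns keeping non-schema columns (those with schema ordinal -1) at the end of the input mapping, so the parse loop can stop at the first -1 entry. A simplified sketch of that ordering idea, using plain ordinal arrays instead of CarbonData's DataField objects (class and method names are illustrative):

```java
public class NonSchemaOrdering {
    // Builds an input mapping where schema fields keep their positions first and
    // every non-schema field (schema ordinal -1) is appended at the end as -1,
    // mirroring the placement discussed in RowParserImpl.
    static int[] buildInputMapping(int[] schemaOrdinals) {
        int[] mapping = new int[schemaOrdinals.length];
        int k = 0;
        for (int i = 0; i < schemaOrdinals.length; i++) {
            if (schemaOrdinals[i] != -1) {
                mapping[k++] = i;   // schema column: record its input position
            }
        }
        for (int i = 0; i < schemaOrdinals.length; i++) {
            if (schemaOrdinals[i] == -1) {
                mapping[k++] = -1;  // non-schema column: trailing, marked -1
            }
        }
        return mapping;
    }

    // Because all -1 entries are trailing, parsing can break at the first one.
    static int countParsed(int[] mapping) {
        int n = 0;
        for (int m : mapping) {
            if (m == -1) {
                break;
            }
            n++;
        }
        return n;
    }
}
```

With this layout, the loop in `parseRow` only ever touches schema columns before breaking, which is the optimization the reviewer describes.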
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878568 ## File path: dev/scalastyle-config.xml ## @@ -228,7 +228,7 @@ This file is divided into 3 sections: ]]> - + Review comment: Agreed. Reverted it back.
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, speeds up S3 access performance by dozens or even hundreds of times, and decreases the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use Carbon through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. Goals: 1. Apache CarbonData should provide a Python interface to write and read structured and unstructured data in CarbonData, such as String, int and binary data: image/voice/video. It should not depend on Apache Spark. 2. Apache CarbonData should provide a Python interface that lets deep learning frameworks read and write data from/to CarbonData, such as TensorFlow, MXNet, PyTorch and so on. It should not depend on Apache Spark. 3. Apache CarbonData should provide a Python interface to manage and analyze data based on Apache Spark, supporting the DDL, DML and DataMap features in Python. 
was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char,Date, TImeStamp data types, but also supported Binay [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which can avoid small binary files problem and speed up S3 access performance reach dozens or even hundreds of times, also can decrease cost of accessing OBS by decreasing the number of calling S3 API. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code Goals: 1. Apache CarbonData should provides python interface to support to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not dependency Apache Spark. 2. Apache CarbonData should provides python interface to support deep learning framework to ready and write data from/to CarbonData, like TensorFlow , MXNet, PyTorch and so on. It should not dependency Apache Spark. 3. Apache CarbonData should provides python interface to manage and analysis data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in Python. 
> PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithm, train > their model, deploy their model as service and inference image. It's big > challenge to storage, manage and analysis lots of structured and unstructured > data, especially unstructured data, like image, video, audio and so on. > Many users use python to install their project for these scenario. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platform. Apache CarbonData has many great feature and > high performance to storage, manage and analysis big data. Apache CarbonData > not only already
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878535 ## File path: core/src/main/java/org/apache/carbondata/core/util/QuadTreeUtil.scala ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util + +import java.util + + +class QuadTreeUtil{ + + /** +* Binary search; the input is the list `arrays`, and `des` is the value to search for +* +* @param arrays the data region; the data is sorted +* @param des the value to search for +* @return >= 0: position of the found element in the list; < 0: not found +*/ + def binarySearch(arrays: util.List[Array[Long]], des: Long): (Int,(Int, Int) ) = { Review comment: This file is removed from this PR and it will be raised by @MarvinLitt in a different PR as part of the algorithm.
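For context on the removed utility: a binary search over a sorted list of `[start, end]` long ranges, as the translated comment describes, can be sketched as below. This is a generic illustration, not the removed file's actual implementation (which returned a nested tuple and will be contributed in a separate PR):

```java
import java.util.List;

public class RangeSearch {
    // Binary search over a sorted, non-overlapping list of [start, end] ranges.
    // Returns the index of the range containing `des`, or -1 if none contains it.
    static int binarySearch(List<long[]> ranges, long des) {
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // overflow-safe midpoint
            long[] r = ranges.get(mid);
            if (des < r[0]) {
                hi = mid - 1;            // target lies before this range
            } else if (des > r[1]) {
                lo = mid + 1;            // target lies after this range
            } else {
                return mid;              // r[0] <= des <= r[1]
            }
        }
        return -1;                       // not found
    }
}
```

Such a lookup is O(log n) over the sorted grid-ID ranges, which is why the quadtree algorithm keeps the range list ordered.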
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879032 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") +if (sources.distinct.length != sources.size) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property " + + s"have duplicate columns.") +} + +val sourceTypes = StringBuilder.newBuilder +sources.foreach { column => + tableFields.find(_.column.equalsIgnoreCase(column)) match { +case Some(field) => sourceTypes.append(field.dataType.get).append(",") +case None => + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"Source column : $column in property " + Review comment: Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879013 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879065 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => Review comment: Modified.
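The checks discussed in these review comments on `processIndexProperty` — a non-empty `sourcecolumns` list, no duplicate entries, and every entry matching an existing table column (case-insensitively) — can be sketched standalone as below. The class and method names are illustrative; the real validation lives in CarbonDDLSqlParser in Scala:

```java
import java.util.HashSet;
import java.util.Set;

public class SourceColumnsCheck {
    // Mirrors the parser's checks on "index_handler.<handler>.sourcecolumns":
    // the comma-separated list must be non-empty, free of duplicates, and each
    // entry must name an existing table column (case-insensitive).
    static void validate(String sourceColumns, Set<String> tableColumns) {
        if (sourceColumns == null || sourceColumns.isEmpty()) {
            throw new IllegalArgumentException("sourcecolumns property cannot be empty.");
        }
        String[] sources = sourceColumns.split(",");
        Set<String> seen = new HashSet<>();
        for (String column : sources) {
            String c = column.trim().toLowerCase();
            if (!seen.add(c)) {
                throw new IllegalArgumentException("sourcecolumns property has duplicate columns.");
            }
            if (!tableColumns.contains(c)) {
                throw new IllegalArgumentException("Source column " + column + " does not exist in the table.");
            }
        }
    }
}
```

Failing fast here at DDL time keeps a bad `index_handler` definition from ever reaching table creation, which matches the MalformedCarbonCommandException behavior in the excerpt.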
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557821381 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/984/
[GitHub] [carbondata] xubo245 opened a new pull request #3478: [CARBONDATA-3255] CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
xubo245 opened a new pull request #3478: [CARBONDATA-3255] CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478 Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use Python to manage and analyze big data, so it is better to also provide a Python interface for writing and reading structured and unstructured data in CarbonData, such as String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call Java code from Python. Although Apache Spark uses py4j in PySpark to call Java code from Python, py4j shows low performance when reading big data in CarbonData format from Python code; py4j's own documentation also reports low performance when reading big data: https://www.py4j.org/advanced_topics.html#performance. JPype is another popular tool for calling Java code from Python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius showed high performance when reading big data by calling Java code from Python, so it is a good choice for us. We have already worked on this feature for several months in https://github.com/xubo245/pycarbon Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. 
- Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557814939 Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/983/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557814895 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/992/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557827031 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/993/
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557841833 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/987/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557829689 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/985/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557829721 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/986/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557827588 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/996/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557834742 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/995/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557834680 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/994/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557836745 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/998/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557836744 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/997/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557815040 Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/995/
[jira] [Updated] (CARBONDATA-3283) Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, DataMap feature in P
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3283:
--
Description: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python.
Goals:
1). PyCarbon supports reading data from local/HDFS/S3 in python code via PySpark DataFrame
2). PyCarbon supports writing data from python code to local/HDFS/S3 via PySpark DataFrame
3). PyCarbon supports DDL in python in sql format
4). PyCarbon supports DML in python in sql format
5). PyCarbon supports DataMap in python in sql format
(was: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python. TODO:)
> Apache CarbonData should provide a python interface to manage and analyze
> data based on Apache Spark. Apache CarbonData should support the DDL, DML and
> DataMap features in Python.
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Bo Xu
> Assignee: Bo Xu
> Priority: Major
>
> Apache CarbonData should provide a python interface to manage and analyze
> data based on Apache Spark. Apache CarbonData should support the DDL, DML and
> DataMap features in Python.
> Goals:
> 1). PyCarbon supports reading data from local/HDFS/S3 in python code via PySpark DataFrame
> 2). PyCarbon supports writing data from python code to local/HDFS/S3 via PySpark DataFrame
> 3). PyCarbon supports DDL in python in sql format
> 4). PyCarbon supports DML in python in sql format
> 5). PyCarbon supports DataMap in python in sql format
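As a hedged illustration of the goals in this JIRA (not code from the issue): the `spark.sql.extensions` setting and the `carbondata` format name below are taken from CarbonData's Spark integration and may differ by version; table and column names are invented.

```python
# Illustrative sketch only -- requires a Spark deployment with the
# CarbonData integration on the classpath; not runnable standalone.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("pycarbon-sketch")
         .config("spark.sql.extensions",
                 "org.apache.spark.sql.CarbonExtensions")
         .getOrCreate())

# Goal 3): DDL in sql format
spark.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, name STRING) "
          "STORED AS carbondata")
# Goal 4): DML in sql format
spark.sql("INSERT INTO t1 VALUES (1, 'a')")
# Goal 1): read into a PySpark DataFrame
df = spark.sql("SELECT * FROM t1")
# Goal 2): write a DataFrame back in carbondata format
df.write.format("carbondata").mode("append").saveAsTable("t2")
```

Unlike the Spark-free PYSDK of PR #3478, this path deliberately rides on PySpark, which is why the JIRA scopes it to DataFrame and sql-format operations.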
[GitHub] [carbondata] xubo245 opened a new pull request #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data fro
xubo245 opened a new pull request #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479 Apache CarbonData should provide a python interface so that deep learning frameworks such as TensorFlow, MXNet and PyTorch can read and write data from/to CarbonData. It should not depend on Apache Spark.
Goals:
1. CarbonData provides a python interface for TensorFlow to read data from CarbonData for training models
2. CarbonData provides a python interface for MXNet to read data from CarbonData for training models
3. CarbonData provides a python interface for PyTorch to read data from CarbonData for training models
4. CarbonData should support an epoch function
5. CarbonData should support caching to speed up performance.
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
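Goals 4 and 5 above (epoch support and caching) are not spelled out in the PR text; the following is a hypothetical, framework-agnostic sketch of what those semantics could mean. `EpochReader` and its API are invented here for illustration and are not the real PyCarbon reader.

```python
# Hypothetical sketch: replay a dataset for several training epochs,
# caching rows in memory after the first pass so later epochs skip I/O.
from typing import Any, Callable, Iterable, Iterator, List


class EpochReader:
    """Iterates a dataset num_epochs times, optionally serving epochs
    after the first from an in-memory cache instead of storage."""

    def __init__(self, load_fn: Callable[[], Iterable[Any]],
                 num_epochs: int, cache: bool = True) -> None:
        self.load_fn = load_fn        # callable that reads rows from storage
        self.num_epochs = num_epochs
        self.cache_enabled = cache
        self._cache: List[Any] = []

    def __iter__(self) -> Iterator[Any]:
        for epoch in range(self.num_epochs):
            if epoch == 0 or not self.cache_enabled:
                self._cache.clear()
                for row in self.load_fn():
                    if self.cache_enabled:
                        self._cache.append(row)
                    yield row
            else:
                # Later epochs are served from memory, not re-read from disk.
                yield from self._cache
```

A training loop would then do `for row in EpochReader(read_rows, num_epochs=3): ...` and pay the storage cost only once, which is the point of combining the epoch function with a cache.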
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557845983 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/996/
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557850178 Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/999/
[GitHub] [carbondata] shenh062326 commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt…
shenh062326 commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt… URL: https://github.com/apache/carbondata/pull/3476#issuecomment-557853006 > @shenh062326 Thanks for fixing this, could you paste a comparison of the statistics printing before and after this change? ok, I have pasted the comparison in the comment.