[GitHub] [carbondata] jackylk commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt…
jackylk commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt… URL: https://github.com/apache/carbondata/pull/3476#issuecomment-557778576 @shenh062326 Thanks for fixing this, could you paste a comparison of the statistics printing before and after this change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1
jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1 URL: https://github.com/apache/carbondata/pull/3457#issuecomment-557778708 LGTM
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python SDK for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Summary: PyCarbon: provide python SDK for users to use CarbonData by python code (was: [WIP] ) > PyCarbon: provide python SDK for users to use CarbonData by python code > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, and deploy their models as services for inference. It is a big challenge to store, manage, and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video, and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms. It has many great features and high performance for storing, managing, and analyzing big data. Apache CarbonData already supports the String, Int, Double, Boolean, Char, Date, and Timestamp data types, and also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, can speed up S3 access by dozens or even hundreds of times, and can decrease the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use CarbonData through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. Goals: 1. Apache CarbonData should provide a Python interface to write and read structured and unstructured data in CarbonData, such as String, int, and binary data (image/voice/video). It should not depend on Apache Spark. 2. Apache CarbonData should provide a Python interface that lets deep learning frameworks such as TensorFlow, MXNet, and PyTorch read and write data from/to CarbonData. It should not depend on Apache Spark. 3. Apache CarbonData should provide a Python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python. > PyCarbon: provide python interface for users to use CarbonData by python code > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878336 ## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/geo/PolygonExpression.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.scan.expression.geo; + +import java.util.List; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.conditional.GreaterThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.conditional.LessThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.expression.logical.AndExpression; +import org.apache.carbondata.core.scan.expression.logical.OrExpression; +import org.apache.carbondata.core.scan.expression.logical.RangeExpression; +import org.apache.carbondata.core.scan.expression.logical.TrueExpression; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * InPolygon expression processor. It passes the InPolygon string to the GeoHash implementation's + * query method and gets the list of ranges of GeoHash IDs to filter as output. Multiple + * range expressions are then built from that list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonExpression extends Expression { + private String polygon; + private String columnName; + private CustomIndex<List<Long[]>> handler; + private List<Long[]> ranges; + + public PolygonExpression(String polygon, String columnName, CustomIndex<List<Long[]>> handler) { +this.polygon = polygon; +this.handler = handler; +this.columnName = columnName; + } + + /** + * This method builds the GeoHash range expressions from the list of ranges of GeoHash IDs. + */ + public void buildRangeExpression() { +try { + ranges = handler.query(polygon); +} catch (Exception e) { + throw new RuntimeException(e); +} + +// Convert these ranges into range expressions +Expression expression = null; +Expression prevExpression = null; +Expression rangeExpression; +for (Long[] range : ranges) { + assert (range.length == 2); Review comment: Yes, Modified to EqualToExpression now.
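The snippet above describes turning the index query's ID ranges into a combined set of range filters. A minimal, self-contained sketch of that idea, using plain `LongPredicate`s as stand-ins for CarbonData's Expression tree (names are illustrative, not the PR's actual API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical, simplified sketch: each [start, end] pair returned by the
// index query becomes a range predicate (id >= start AND id <= end), and
// the predicates are OR-ed together into one filter.
public class RangeFilterSketch {
  static LongPredicate buildFilter(List<long[]> ranges) {
    LongPredicate combined = id -> false;          // empty OR matches nothing
    for (long[] range : ranges) {
      assert range.length == 2;
      final long start = range[0], end = range[1];
      LongPredicate rangePredicate = id -> id >= start && id <= end;
      combined = combined.or(rangePredicate);      // OR the ranges together
    }
    return combined;
  }

  public static void main(String[] args) {
    List<long[]> ranges = Arrays.asList(new long[]{10, 20}, new long[]{40, 45});
    LongPredicate filter = buildFilter(ranges);
    System.out.println(filter.test(15));  // inside the first range -> true
    System.out.println(filter.test(30));  // between the ranges -> false
    System.out.println(filter.test(42));  // inside the second range -> true
  }
}
```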
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878271 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -161,11 +166,59 @@ public DictionaryClient call() throws Exception { return null; } + private int getDataFieldIndexByName(String column) { +for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getColName().equalsIgnoreCase(column)) { +return i; + } +} +return -1; + } + + private String generateNonSchemaColumnValue(DataField field, CarbonRow row) { +Map<String, String> properties = configuration.getTableSpec().getCarbonTable() +.getTableInfo().getFactTable().getTableProperties(); +String handler = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".instance"); +if (handler != null) { + try { +// TODO Need to check how to store the instance. This serialization may be incorrect. +ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(handler)); Review comment: Modified accordingly
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878261 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -183,6 +236,35 @@ public CarbonRow convert(CarbonRow row) throws CarbonDataLoadingException { } } } + +/* If non schema fields are present, generate the value for them and convert. */ +if (bNonSchemaPresent) { Review comment: Added a comment in the code. Also modified the converter to run the loop twice: the first iteration converts schema columns and the second iteration generates and converts the non-schema columns.
[jira] [Updated] (CARBONDATA-3283) Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in P
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3283: -- Description: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, and DataMap features in Python. TODO: was:WIP Summary: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark; it should support the DDL, DML, and DataMap features in Python. (was: WIP) > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878730 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class GeoHashDefault implements CustomIndex<List<Long[]>> { +// Conversion factor between degrees and radians +private final static double CONVERT_FACTOR = 180.0; +// Earth radius +private final static double EARTH_RADIUS = 6371004.0; + +private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; // Ground distance covered by 1 degree of longitude at the equator (or 1 degree of latitude) + +private double oriLongitude = 0; // Longitude of the coordinate origin + +private double oriLatitude = 0; // Latitude of the coordinate origin + +private double userDefineMaxLongitude = 0; // User-defined maximum longitude of the map + +private double userDefineMaxLatitude = 0; // User-defined maximum latitude of the map + +private double userDefineMinLongitude = 0; // User-defined minimum longitude of the map + +private double userDefineMinLatitude = 0; // User-defined minimum latitude of the map + +private double CalculateMaxLongitude = 0; // Calculated maximum longitude of the padded map + +private double CalculateMaxLatitude = 0; // Calculated maximum latitude of the padded map + +private int gridSize = 0; // Grid size, in meters + +private double mCos; // Cosine of the latitude of the coordinate origin + +private double deltaY = 0; // Degrees on the Y axis covered by one gridSize length + +private double deltaX = 0; // Degrees on the X axis covered by one gridSize length + +private double deltaYByRatio = 0; // Degrees on the Y axis covered by one gridSize length * scale factor + +private double deltaXByRatio = 0; // Degrees on the X axis covered by one gridSize length * scale factor + +private int cutLevel = 0; // Number of cuts over the whole area (one horizontal plus one vertical counts as one cut), i.e. the depth of the quad tree + +private int totalRowNumber = 0; // Number of rows in the whole area, from top-left to bottom-right + +private int totalCloumnNumber = 0; // Number of columns in the whole area, from top-left to bottom-right + +private int udfRowStartNumber = 0; // Starting row of the user-defined area + +private int udfRowEndNumber = 0; // Ending row of the user-defined area + +private int udfCloumnStartNumber = 0; // Starting column of the user-defined area + +private int udfCloumnEndNumber = 0; // Ending column of the user-defined area + +private double lon0 = 0; // Longitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lat0 = 0; // Latitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lon0ByRation = 0; // Constant multiplied by the scale factor +private double lat0ByRation = 0; // Constant multiplied by the scale factor + +private int conversionRatio = 1; // Scale factor used to convert double longitude/latitude values to int for computation + + +@Override +public void validateOption(Map<String, String> properties) throws Exception { +String option = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (option == null || option.isEmpty()) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} + +String commonKey = "." + option + "."; Review comment: Ok. Removed blank lines in the complete PR.
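The fields above encode standard grid geometry: `transValue` = π/180 × EARTH_RADIUS is the ground distance of one degree at the equator, and the per-cell deltas divide the grid size by it, scaled by cos(latitude) along the X axis. A hedged sketch of that arithmetic, with illustrative names rather than CarbonData's actual implementation:

```java
// Hypothetical sketch of how deltaX/deltaY (degrees per grid cell) could be
// derived from the constants in the snippet above.
public class GridDeltaSketch {
  static final double EARTH_RADIUS = 6371004.0;                              // meters, as in the snippet
  static final double METERS_PER_DEGREE = Math.PI / 180.0 * EARTH_RADIUS;    // the snippet's "transValue"

  // Degrees of latitude spanned by one grid cell of gridSize meters.
  static double deltaY(double gridSizeMeters) {
    return gridSizeMeters / METERS_PER_DEGREE;
  }

  // Degrees of longitude spanned by one grid cell; a degree of longitude
  // shrinks by cos(latitude) away from the equator, so the delta widens.
  static double deltaX(double gridSizeMeters, double originLatDeg) {
    return gridSizeMeters / (METERS_PER_DEGREE * Math.cos(Math.toRadians(originLatDeg)));
  }

  public static void main(String[] args) {
    double dy = deltaY(50);          // 50 m grid
    double dx = deltaX(50, 39.9);    // illustrative origin latitude
    System.out.printf("deltaY=%.8f deg, deltaX=%.8f deg%n", dy, dx);
  }
}
```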
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878724 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDropColumnCommand.scala ## @@ -27,6 +27,7 @@ import org.apache.spark.util.{AlterTableUtil, SparkUtil} import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants Review comment: Yes. Reverted it.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878702 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ## @@ -53,6 +53,21 @@ private[sql] case class CarbonDescribeFormattedCommand( (field.name, field.dataType.simpleString, colComment) } +/* Append non-schema columns */ +val columns = relation.carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala +val implicitColumns = for (column <- columns if column.getSchemaOrdinal == -1) yield { + (column.getColumnName, column.getDataType.getName.toLowerCase, "") +} + +if (implicitColumns.nonEmpty) { + results ++= Seq( +("", "", ""), +("## Non-Schema Columns", "", "") + ) + Review comment: Ok. Removed blank lines in the complete PR.
[jira] [Updated] (CARBONDATA-3271) Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3271: -- Affects Version/s: (was: 1.5.1) Description: Apache CarbonData should provide a python interface to support deep learning frameworks such as TensorFlow, MXNet, and PyTorch to read and write data from/to CarbonData. It should not depend on Apache Spark. Goals: 1. CarbonData provides a python interface for TensorFlow to read data from CarbonData for training models 2. CarbonData provides a python interface for MXNet to read data from CarbonData for training models 3. CarbonData provides a python interface for PyTorch to read data from CarbonData for training models 4. CarbonData should support an epoch function 5. CarbonData should support caching to speed up performance. Summary: Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData (was: WIP) > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task > Reporter: Bo Xu > Assignee: Bo Xu > Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h
[GitHub] [carbondata] jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1
jackylk commented on issue #3457: [HOTFIX] Ignore testcase for compatibility problem in spark 2.1 URL: https://github.com/apache/carbondata/pull/3457#issuecomment-557778633 LGTM
[jira] [Reopened] (CARBONDATA-3271) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3271: --- push pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877872 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -69,7 +69,12 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati DataField[] input = new DataField[fields.length]; inputMapping = new int[input.length]; int k = 0; +boolean isNonSchemaPresent = false; for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getSchemaOrdinal() == -1) { +isNonSchemaPresent = true; +continue; + } Review comment: This could have been the easiest thing to do for me :) There was a reason to keep the inputMapping for non-schema columns at the end: we can stop the row parse the moment a non-schema column is encountered. We have only 1 non-schema column at the moment though. Anyway, I have modified it as suggested. I believe your comment applies to the converter (RowConverterImpl.convert()) too. I have modified the converter to run the loop twice: the first iteration converts schema columns and the second iteration generates and converts the non-schema columns.
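The reply above describes a two-pass convert: the first pass handles schema columns, the second generates values for non-schema columns (`schemaOrdinal == -1`). A simplified, hypothetical sketch of that control flow (the `Field` class and the generated value are stand-ins, not CarbonData's real types):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-pass convert described above: pass 1 converts
// the columns that came from the input schema; pass 2 generates values for
// non-schema columns, which may depend on the pass-1 results.
public class TwoPassConvertSketch {
  static class Field {
    final String name;
    final int schemaOrdinal;   // -1 marks a generated, non-schema column
    Field(String name, int schemaOrdinal) {
      this.name = name;
      this.schemaOrdinal = schemaOrdinal;
    }
  }

  static String[] convert(List<Field> fields, String[] row) {
    String[] out = new String[fields.size()];
    // Pass 1: schema columns are converted from the parsed input row.
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).schemaOrdinal != -1) {
        out[i] = row[fields.get(i).schemaOrdinal].trim();
      }
    }
    // Pass 2: non-schema columns are generated (here a placeholder stands in
    // for the index-handler value derived from the other columns).
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).schemaOrdinal == -1) {
        out[i] = "generated(" + fields.get(i).name + ")";
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Field> fields = Arrays.asList(
        new Field("longitude", 0), new Field("latitude", 1), new Field("geohash", -1));
    System.out.println(Arrays.toString(convert(fields, new String[]{"116.3", " 39.9"})));
    // -> [116.3, 39.9, generated(geohash)]
  }
}
```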
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878336 ## File path: core/src/main/java/org/apache/carbondata/core/scan/expression/geo/PolygonExpression.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.scan.expression.geo; + +import java.util.List; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.expression.ColumnExpression; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.conditional.GreaterThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.conditional.LessThanEqualToExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.expression.logical.AndExpression; +import org.apache.carbondata.core.scan.expression.logical.OrExpression; +import org.apache.carbondata.core.scan.expression.logical.RangeExpression; +import org.apache.carbondata.core.scan.expression.logical.TrueExpression; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * InPolygon expression processor. It inputs the InPolygon string to the GeoHash implementation's + * query method, gets the list of ranges of GeoHash IDs to filter as an output. And then, multiple + * range expressions are build from those list of ranges. 
+ */ +@InterfaceAudience.Internal +public class PolygonExpression extends Expression { + private String polygon; + private String columnName; + private CustomIndex<List<Long[]>> handler; + private List<Long[]> ranges; + + public PolygonExpression(String polygon, String columnName, CustomIndex<List<Long[]>> handler) { +this.polygon = polygon; +this.handler = handler; +this.columnName = columnName; + } + + /** + * This method builds the GeoHash range expressions from the list of ranges of GeoHash IDs. + */ + public void buildRangeExpression() { +try { + ranges = handler.query(polygon); +} catch (Exception e) { + throw new RuntimeException(e); +} + +// Convert these ranges into range expressions +Expression expression = null; +Expression prevExpression = null; +Expression rangeExpression; +for (Long[] range : ranges) { + assert (range.length == 2); Review comment: Yes, Modified to EqualToExpression. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
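The combination logic quoted in the diff above — one range predicate per [min, max] pair of GeoHash IDs, with the pairs OR-ed together — can be sketched in plain Java. The `Range` and `PolygonFilterSketch` types below are hypothetical stand-ins for illustration, not CarbonData's `RangeExpression`/`OrExpression` classes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a polygon filter folded from GeoHash ID ranges:
 * each [min, max] pair becomes one range predicate, and a row matches
 * if its GeoHash ID falls in ANY of the ranges (logical OR).
 */
public class PolygonFilterSketch {
  static final class Range {
    final long min;
    final long max;
    Range(long min, long max) { this.min = min; this.max = max; }
    boolean contains(long id) { return id >= min && id <= max; }
  }

  private final List<Range> ranges = new ArrayList<>();

  PolygonFilterSketch(long[][] idRanges) {
    for (long[] r : idRanges) {
      // each entry must be a [min, max] pair, mirroring the assert in the diff
      assert r.length == 2;
      ranges.add(new Range(r[0], r[1]));
    }
  }

  /** True if the ID falls in any range (the OR of all range predicates). */
  boolean matches(long geoHashId) {
    for (Range r : ranges) {
      if (r.contains(geoHashId)) return true;
    }
    return false;
  }
}
```

A real implementation would emit one expression node per range and chain them with OR nodes; the loop here is the flattened equivalent of that expression tree.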
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878392 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + + +String sourceColumnsOption = properties.get(SOURCE_COLUMNS); +if (sourceColumnsOption == null) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property is not specified.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +if (sourceColumnsOption.split(",").length != 2) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. 
%s property must have 2 columns.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +String type = properties.get(TYPE); +if (type != null && !CarbonCommonConstants.GEOHASH.equalsIgnoreCase(type)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be %s for this class.", + CarbonCommonConstants.INDEX_HANDLER, TYPE, CarbonCommonConstants.GEOHASH)); +} + +properties.put(TYPE, CarbonCommonConstants.GEOHASH); + +String sourceDataTypes = properties.get(SOURCE_COLUMN_TYPES); +String[] srcTypes = sourceDataTypes.split(","); +for (String srcdataType : srcTypes) { + if (!"bigint".equalsIgnoreCase(srcdataType)) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s datatypes must be long.", +CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); + } +} + +String dataType = properties.get(TARGET_DATA_TYPE); +if (dataType != null && !"long".equalsIgnoreCase(dataType)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be long for this class.", + CarbonCommonConstants.INDEX_HANDLER,
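The validation flow in `init` above can be condensed into a small self-contained sketch. The property keys follow the `index_handler.<name>.<option>` pattern from the diff; the `validate` helper and the use of `IllegalArgumentException` (in place of CarbonData's `MalformedCarbonCommandException`) are assumptions made to keep the example standalone:

```java
import java.util.Map;

/**
 * Hedged sketch of the property checks performed by GeoHashImpl.init:
 * sourcecolumns must name exactly 2 columns, the type (if given) must be
 * geohash, and every declared source column type must be bigint.
 */
public class IndexHandlerValidation {
  static void validate(String handlerName, Map<String, String> props) {
    String prefix = "index_handler." + handlerName + ".";

    // sourcecolumns is mandatory and must list exactly two columns
    String sourceColumns = props.get(prefix + "sourcecolumns");
    if (sourceColumns == null || sourceColumns.split(",").length != 2) {
      throw new IllegalArgumentException(
          prefix + "sourcecolumns must name exactly 2 columns");
    }

    // type is optional, but if present it must be geohash for this handler
    String type = props.get(prefix + "type");
    if (type != null && !"geohash".equalsIgnoreCase(type)) {
      throw new IllegalArgumentException(prefix + "type must be geohash");
    }

    // both source columns must be bigint, matching the srcTypes loop above
    String columnTypes = props.get(prefix + "sourcecolumntypes");
    if (columnTypes != null) {
      for (String t : columnTypes.split(",")) {
        if (!"bigint".equalsIgnoreCase(t)) {
          throw new IllegalArgumentException(
              prefix + "sourcecolumns must be bigint columns");
        }
      }
    }
  }
}
```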
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878902 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * Review comment: Ok. Removed blank lines in the complete PR.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878838 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class GeoHashDefault implements CustomIndex<List<Long[]>> { +// conversion factor from degrees to radians +private final static double CONVERT_FACTOR = 180.0; +// earth radius in meters +private final static double EARTH_RADIUS = 6371004.0; + +private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; // geographic distance covered by 1 degree of longitude or latitude at the equator + +private double oriLongitude = 0; // longitude of the coordinate origin + +private double oriLatitude = 0; // latitude of the coordinate origin + +private double userDefineMaxLongitude = 0; // maximum longitude of the user-defined map + +private double userDefineMaxLatitude = 0; // maximum latitude of the user-defined map + +private double userDefineMinLongitude = 0; // minimum longitude of the user-defined map + +private double userDefineMinLatitude = 0; // minimum latitude of the user-defined map + +private double CalculateMaxLongitude = 0; // maximum longitude of the padded map, computed from the user-defined bounds + +private double CalculateMaxLatitude = 0; // maximum latitude of the padded map, computed from the user-defined bounds + +private int gridSize = 0; // grid size, in meters + +private double mCos; // cosine of the latitude of the coordinate origin + +private double deltaY = 0; // degrees on the Y axis covered by one gridSize length + +private double deltaX = 0; // degrees on the X axis covered by one gridSize length + +private double deltaYByRatio = 0; // degrees on the Y axis covered by one gridSize length * conversion ratio + +private double deltaXByRatio = 0; // degrees on the X axis covered by one gridSize length * conversion ratio + +private int cutLevel = 0; // number of cuts applied to the whole area (one horizontal plus one vertical cut counts as one cut), i.e. the depth of the quad tree + +private int totalRowNumber = 0; // total number of rows in the whole area, from top-left to bottom-right + +private int totalCloumnNumber = 0; // total number of columns in the whole area, from top-left to bottom-right + +private int udfRowStartNumber = 0; // start row of the user-defined area + +private int udfRowEndNumber = 0; // end row of the user-defined area + +private int udfCloumnStartNumber = 0; // start column of the user-defined area + +private int udfCloumnEndNumber = 0; // end column of the user-defined area + +private double lon0 = 0; // longitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lat0 = 0; // latitude of the minimum grid value; the minimum grid coordinate is at the top-left corner of the extended area + +private double lon0ByRation = 0; // the same constant multiplied by the conversion ratio +private double lat0ByRation = 0; // the same constant multiplied by the conversion ratio + +private int conversionRatio = 1; // ratio used to convert double longitude/latitude values to int for calculation + + +@Override +public void validateOption(Map<String, String> properties) throws Exception { +String option = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (option == null || option.isEmpty()) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} + +String commonKey = "." + option + "."; +String sourceColumnsOption = properties.get(CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns"); +if (sourceColumnsOption == null) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s property is not specified.", +CarbonCommonConstants.INDEX_HANDLER, +CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns")); +} + +if (sourceColumnsOption.split(",").length != 2) { +throw new MalformedCarbonCommandException( +String.format("%s property is invalid. %s property must have 2 columns.", +CarbonCommonConstants.INDEX_HANDLER, +CarbonCommonConstants.INDEX_HANDLER + commonKey + "sourcecolumns")); +} + +String type = properties.get(CarbonCommonConstants.INDEX_HANDLER + commonKey + "type"); +if (type != null && !"geohash".equalsIgnoreCase(type)) { Review comment: Agreed. Modified.
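A hedged reading of the grid fields above: `transValue` is the number of meters per degree at the equator, so one grid step covers `gridSize / transValue` degrees on the Y axis, and the X step is widened by the cosine of the origin latitude (longitude degrees shrink away from the equator). The `toCell` mapping below is one plausible interpretation of the top-left `(lon0, lat0)` origin, a sketch rather than CarbonData's exact implementation:

```java
/**
 * Sketch of the equirectangular grid math implied by GeoHashDefault's
 * fields: constants match the diff (CONVERT_FACTOR = 180, EARTH_RADIUS =
 * 6371004 m); the cell mapping counts columns eastward and rows downward
 * from the top-left corner of the grid.
 */
public class GridSketch {
  private static final double CONVERT_FACTOR = 180.0;    // degrees-to-radians factor
  private static final double EARTH_RADIUS = 6371004.0;  // meters
  // meters covered by 1 degree of longitude/latitude at the equator
  private static final double METERS_PER_DEGREE =
      Math.PI / CONVERT_FACTOR * EARTH_RADIUS;

  final double deltaX;       // degrees of longitude per grid cell
  final double deltaY;       // degrees of latitude per grid cell
  private final double lon0; // top-left longitude of the grid
  private final double lat0; // top-left latitude of the grid

  GridSketch(double lon0, double lat0, double originLatitudeDeg, int gridSizeMeters) {
    this.lon0 = lon0;
    this.lat0 = lat0;
    this.deltaY = gridSizeMeters / METERS_PER_DEGREE;
    // one degree of longitude spans fewer meters at higher latitude,
    // so each X step covers more degrees by the factor 1/cos(latitude)
    this.deltaX = gridSizeMeters
        / (METERS_PER_DEGREE * Math.cos(Math.toRadians(originLatitudeDeg)));
  }

  /** Maps (lon, lat) to {column, row}, counting from the top-left corner. */
  int[] toCell(double lon, double lat) {
    int column = (int) Math.floor((lon - lon0) / deltaX);
    int row = (int) Math.floor((lat0 - lat) / deltaY);
    return new int[] {column, row};
  }
}
```

With a 1000 m grid, `deltaY` works out to roughly 0.009 degrees, which is consistent with the ~111 km-per-degree figure implied by `transValue`.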
[jira] [Updated] (CARBONDATA-3255) CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3255: -- Description: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it is better to provide a python interface to support writing and reading structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python code. Apache Spark uses py4j in PySpark to call java code from python, but py4j gives low performance when reading big data in CarbonData format from python code; py4j also reports low performance when reading big data in its own documentation: https://www.py4j.org/advanced_topics.html#performance. JPype is also a popular tool for calling java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius has high performance when reading big data by calling java code from python, so it is a good choice for us. We have already worked on this feature for several months in https://github.com/xubo245/pycarbon Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter was: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it is better to provide a python interface to support writing and reading structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python code. Apache Spark uses py4j in PySpark to call java code from python, but py4j gives low performance when reading big data in CarbonData format from python code; py4j also reports low performance when reading big data in its own documentation: https://www.py4j.org/advanced_topics.html#performance. JPype is also a popular tool for calling java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius has high performance when reading big data by calling java code from python, so it is a good choice for us. Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter > CarbonData provides python interface to support to write and read structured > and unstructured data in CarbonData > > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Apache CarbonData already provides Java/Scala/C++ interfaces for users, and > more and more people use python to manage and analyze big data, so it is > better to provide a python interface to support writing and reading structured > and unstructured data in CarbonData, like String, int and binary data: > image/voice/video. It should not depend on Apache Spark. We call it PYSDK. > PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code > from python code. Apache Spark uses py4j in PySpark to call java code from > python, but py4j gives low performance when reading big data in CarbonData > format from python code; py4j also reports low performance when reading big > data in its own documentation: > https://www.py4j.org/advanced_topics.html#performance. JPype is also a > popular tool for calling java code from python, but it stopped being updated > several years ago, so we cannot use it. In our tests, pyjnius has high > performance when reading big data by calling java code from python, so it is > a good choice for us. > We have already worked on this feature for several months in > https://github.com/xubo245/pycarbon > Goals: > 1. PYSDK should provide an interface to read data > 2. PYSDK should provide an interface to write data > 3. PYSDK should support basic data types > 4. PYSDK should support projection > 5. PYSDK should support filter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878815 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + Review comment: Added comment
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878765 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") +if (sources.distinct.length != sources.size) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property " + + s"have duplicate columns.") +} + +val sourceTypes = StringBuilder.newBuilder +sources.foreach { column => + tableFields.find(_.column.equalsIgnoreCase(column)) match { +case Some(field) => sourceTypes.append(field.dataType.get).append(",") +case None => + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"Source column : $column in property " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns " + + "is not a valid column in table.") + } +} + +tableProperties.put(CarbonCommonConstants.INDEX_HANDLER + + s".$e.sourcecolumntypes", sourceTypes.dropRight(1).toString()) + +val handlerType = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER + s".$e.type") +val handlerClass = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER + s".$e.class") + +val handlerClassName: String = handlerClass match { + case Some(className) => +className + case None => +/* use handler type to find the default implementation */ +if (handlerType.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"Both ${CarbonCommonConstants.INDEX_HANDLER}.$e.class and " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.type properties are not specified") +} else if (handlerType.get.equalsIgnoreCase("geohash")) { + /* Use geoHash default implementation */ + val className = classOf[org.apache.carbondata.core.util.GeoHashDefault].getName + tableProperties.put(s"${CarbonCommonConstants.INDEX_HANDLER}.$e.class", className) + className +} else { + throw new MalformedCarbonCommandException( +s"Carbon
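The class-resolution order in `processIndexProperty` above — an explicit `.class` property wins, otherwise the `.type` value selects a default implementation, and anything else is rejected — can be sketched as follows. The `resolve` helper and the use of `IllegalArgumentException` are illustrative assumptions; the default class name mirrors the `GeoHashDefault` reference in the Scala code:

```java
import java.util.Map;

/**
 * Sketch of handler-class resolution: prefer the user-supplied
 * "index_handler.<name>.class" property; otherwise map the
 * "index_handler.<name>.type" value to a default implementation.
 */
public class HandlerClassResolver {
  static String resolve(String handler, Map<String, String> props) {
    String explicit = props.get("index_handler." + handler + ".class");
    if (explicit != null) {
      return explicit;  // user-supplied implementation takes precedence
    }
    String type = props.get("index_handler." + handler + ".type");
    if (type == null) {
      // mirrors the parser error when both class and type are missing
      throw new IllegalArgumentException(
          "either class or type must be specified for " + handler);
    }
    if ("geohash".equalsIgnoreCase(type)) {
      return "org.apache.carbondata.core.util.GeoHashDefault";
    }
    throw new IllegalArgumentException("unsupported handler type: " + type);
  }
}
```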
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878796 ## File path: core/src/main/java/org/apache/carbondata/core/util/CustomIndex.java ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.util; + +import java.util.List; +import java.util.Map; + +public interface CustomIndex { Review comment: Added comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878782 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,255 @@ +package org.apache.carbondata.core.util; Review comment: Added header This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879251 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + + +String sourceColumnsOption = properties.get(SOURCE_COLUMNS); +if (sourceColumnsOption == null) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property is not specified.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +if (sourceColumnsOption.split(",").length != 2) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. 
%s property must have 2 columns.", + CarbonCommonConstants.INDEX_HANDLER, SOURCE_COLUMNS)); +} + +String type = properties.get(TYPE); +if (type != null && !CarbonCommonConstants.GEOHASH.equalsIgnoreCase(type)) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s property must be %s for this class.", + CarbonCommonConstants.INDEX_HANDLER, TYPE, CarbonCommonConstants.GEOHASH)); +} + +properties.put(TYPE, CarbonCommonConstants.GEOHASH); Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Reopened] (CARBONDATA-3283) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3283: --- contribute pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > WIP
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze lots of structured and unstructured data, especially unstructured data like image, video, audio and so on. Many users use python to build their projects for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. But it is not easy for these users to use carbon via Java/Scala/C++. So it is better to provide a python interface for them to use CarbonData from python code was: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze lots of structured and unstructured data, especially unstructured data like image, video, audio and so on. Many users use python to build their projects for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon via Java/Scala/C++. So it is better to provide a python interface for them to use CarbonData from python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithms, train > their models, deploy their models as services and run inference on images. > It is a big challenge to store, manage and analyze lots of structured and > unstructured data, especially unstructured data like image, video, audio and > so on. Many users use python to build their projects for these scenarios. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platforms, with many great features and high > performance for storing, managing and analyzing big data. Apache CarbonData > not only already supports String, Int, Double, Boolean, Char, Date and > TimeStamp data types, but also supports Binary > [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. > But it is not easy for these users to use carbon via Java/Scala/C++. So it > is better to provide a python interface for them to use CarbonData from > python code
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878033 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java ## @@ -161,11 +166,59 @@ public DictionaryClient call() throws Exception { return null; } + private int getDataFieldIndexByName(String column) { +for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getColName().equalsIgnoreCase(column)) { +return i; + } +} +return -1; + } + + private String generateNonSchemaColumnValue(DataField field, CarbonRow row) { +Map<String, String> properties = configuration.getTableSpec().getCarbonTable() +.getTableInfo().getFactTable().getTableProperties(); +String handler = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".instance"); +if (handler != null) { + try { +// TODO Need to check how to store the instance. This serialization may be incorrect. +ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(handler)); +ObjectInputStream in = new ObjectInputStream(bis); +CustomIndex instance = (CustomIndex) in.readObject(); +String sourceColumns = properties.get(CarbonCommonConstants.INDEX_HANDLER ++ "." + field.getColumn().getColName() + ".sourcecolumns"); +assert (sourceColumns != null); +String[] sources = sourceColumns.split(","); +int srcFieldIndex; +List<Object> sourceValues = new ArrayList<>(); +for (String source : sources) { + srcFieldIndex = getDataFieldIndexByName(source); + assert (srcFieldIndex != -1); + sourceValues.add(row.getData()[srcFieldIndex]); +} +return instance.generate(sourceValues); Review comment: Modified accordingly.
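The generateNonSchemaColumnValue snippet above restores a serialized CustomIndex handler instance from a Base64-encoded table property and calls generate() on the values of its source columns. By analogy only — this is not Carbon's API; the class name, property keys and pickle-based serialization below are illustrative stand-ins for Java object serialization — a minimal Python sketch of that store-and-restore pattern:

```python
import base64
import pickle

class GeoHashIndex:
    """Stand-in for a CustomIndex implementation."""
    def generate(self, source_values):
        # Combine the source column values into one generated index value.
        return ",".join(str(v) for v in source_values)

# "Create table" side: serialize the handler instance into a table property.
properties = {
    "index_handler.mygeohash.instance":
        base64.b64encode(pickle.dumps(GeoHashIndex())).decode("ascii"),
    "index_handler.mygeohash.sourcecolumns": "longitude,latitude",
}

# "Load" side: restore the instance and generate the non-schema column value
# from the configured source columns of the row.
instance = pickle.loads(
    base64.b64decode(properties["index_handler.mygeohash.instance"]))
sources = properties["index_handler.mygeohash.sourcecolumns"].split(",")
row = {"longitude": 116.397, "latitude": 39.916}
value = instance.generate([row[s] for s in sources])
print(value)  # → 116.397,39.916
```

As the TODO in the diff notes, serializing a live handler instance into table metadata is fragile; the review discussion leaves the storage format open.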
[jira] [Updated] (CARBONDATA-3255) CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3255: -- Description: Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use python to manage and analyze big data, so it's better to provide a python interface to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call java code from python. Apache Spark uses py4j in PySpark to call java code from python, but py4j performs poorly when reading big data in CarbonData format from python code; py4j's own report also shows low performance when reading big data: https://www.py4j.org/advanced_topics.html#performance. JPype is another popular tool to call java code from python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius read big data with high performance by calling java code from python, so it is a good choice for us. Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. 
PYSDK should support filter Summary: CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData (was: WIP) > CarbonData provides python interface to support to write and read structured > and unstructured data in CarbonData > > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878448 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. + * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { Review comment: Modified. 
[jira] [Reopened] (CARBONDATA-3254) [WIP]
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3254: --- Contribute Pycarbon to Apache CarbonData > [WIP] > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform has many great feature and high performance to storage, manage and analysis big data.But it's not easy for them to use carbon by Java/Scala/C++. 
So it's better to provide python interface for these users to use CarbonData by python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878186 ## File path: integration/spark2/src/main/scala/org/apache/carbondata/geo/InPolygonUDF.scala ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.geo + +import org.apache.spark.sql.sources.Filter + +import org.apache.carbondata.common.annotations.InterfaceAudience + +@InterfaceAudience.Internal +class InPolygonUDF extends (String => Boolean) with Serializable { + override def apply(v1: String): Boolean = { +v1.length > 0 Review comment: Added comment.
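The apply body above is a placeholder (it only checks v1.length > 0; the review added a comment noting that). A real point-in-polygon check is commonly done with the classic ray-casting algorithm. A minimal, self-contained sketch of that check — illustrative only, not the CarbonData implementation:

```python
def in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a rightward ray from the
    point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(in_polygon((0.5, 0.5), square))  # → True
print(in_polygon((2.0, 2.0), square))  # → False
```

In the PR the actual polygon evaluation is pushed down as a Filter; the UDF itself is only a marker that later resolves to the index-backed lookup.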
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878436 ## File path: core/src/main/java/org/apache/carbondata/core/geo/GeoHashImpl.java ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.geo; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.util.CustomIndex; + +/** + * GeoHash default implementation + */ +public class GeoHashImpl implements CustomIndex> { + /** + * Initialize the geohash index handler instance. 
+ * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map<String, String> properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +String GRID_SIZE = commonKey + "gridsize"; +String CONVERSION_RATIO = commonKey + "conversionratio"; + Review comment: Removed blank lines in the complete PR.
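The init() method above validates the index_handler table property and then composes per-handler sub-property keys from a common prefix. A small Python sketch of just that key composition — the literal prefix "index_handler" and the handler name "mygeohash" are assumed placeholders for CarbonCommonConstants.INDEX_HANDLER and the configured handler name:

```python
def handler_property_keys(index_handler_prefix, handler_name):
    """Compose the per-handler table property keys used by init()."""
    common = f"{index_handler_prefix}.{handler_name}."
    names = ("type", "sourcecolumns", "sourcecolumntypes", "datatype",
             "originlatitude", "minlongitude", "maxlongitude",
             "minlatitude", "maxlatitude", "gridsize", "conversionratio")
    return {n: common + n for n in names}

keys = handler_property_keys("index_handler", "mygeohash")
print(keys["gridsize"])  # → index_handler.mygeohash.gridsize
```

Each resulting key is then looked up in the table properties map, with a MalformedCarbonCommandException raised when a mandatory one is missing or invalid.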
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects for these scenarios in python. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData already supports not only the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small binary files problem, can speed up S3 access performance by dozens or even hundreds of times, and can decrease the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use carbon through Java/Scala/C++, so it is better to provide a python interface for them to use CarbonData from python code. We have already worked on these features for several months in https://github.com/xubo245/pycarbon Goals: 1. Apache CarbonData should provide a python interface to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not depend on Apache Spark. 2. Apache CarbonData should provide a python interface for deep learning frameworks such as TensorFlow, MXNet and PyTorch to read and write data from/to CarbonData. It should not depend on Apache Spark. 3. Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char,Date, TImeStamp data types, but also supported Binay [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which can avoid small binary files problem and speed up S3 access performance reach dozens or even hundreds of times, also can decrease cost of accessing OBS by decreasing the number of calling S3 API. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code Goals: 1. Apache CarbonData should provides python interface to support to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not dependency Apache Spark. 2. Apache CarbonData should provides python interface to support deep learning framework to ready and write data from/to CarbonData, like TensorFlow , MXNet, PyTorch and so on. It should not dependency Apache Spark. 3. Apache CarbonData should provides python interface to manage and analysis data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in Python. 
> PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Summary: PyCarbon: provide python interface for users to use CarbonData by python code (was: PyCarbon: provide python SDK for user to use CarbonData by python code) > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major
[jira] [Reopened] (CARBONDATA-3255) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu reopened CARBONDATA-3255: --- contribute pycarbon to Apache CarbonData > WIP > --- > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878603 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + + +public class GeoHashDefault implements CustomIndex> { + // conversion factor from degrees to radians + private static final double CONVERT_FACTOR = 180.0; + // earth radius + private static final double EARTH_RADIUS = 6371004.0; + + private static final String GEOHASH = "geohash"; + // ground distance corresponding to one degree of longitude at the equator, or one degree of latitude + private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; + + // private double oriLongitude = 0; // longitude of the coordinate origin + + private double oriLatitude = 0; // latitude of the coordinate origin + + private double userDefineMaxLongitude = 0; // user-defined maximum longitude of the map + + private double userDefineMaxLatitude = 0; // user-defined maximum latitude of the map + + private double userDefineMinLongitude = 0; // user-defined minimum longitude of the map + + private double userDefineMinLatitude = 0; // user-defined minimum latitude of the map + + private double CalculateMaxLongitude = 0; // calculated maximum longitude of the padded map + + private double CalculateMaxLatitude = 0; // calculated maximum latitude of the padded map + + private int gridSize = 0; // grid size, in meters + + private double mCos; // cosine of the latitude of the coordinate origin + + private double deltaY = 0; // degrees on the Y axis covered by one gridSize length + + private double deltaX = 0; // degrees on the X axis covered by one gridSize length + + private double deltaYByRatio = 0; // degrees on the Y axis covered by one gridSize length * conversion ratio + + private double deltaXByRatio = 0; // degrees on the X axis covered by one gridSize length * conversion ratio + + private int cutLevel = 0; // number of cuts over the whole area (one horizontal plus one vertical counts as one cut), i.e. the depth of the quadtree + + //private int totalRowNumber = 0; // number of rows of the whole area, from top-left to bottom-right + + //private int totalCloumnNumber = 0; // number of columns of the whole area, from top-left to bottom-right + + //private int udfRowStartNumber = 0; // starting row of the user-defined area + + //private int udfRowEndNumber = 0; // ending row of the user-defined area + + //private int udfCloumnStartNumber = 0; // starting column of the user-defined area + + //private int udfCloumnEndNumber = 0; // ending column of the user-defined area + + //private double lon0 = 0; // longitude of the minimum grid value; the minimum grid coordinate is the top-left corner of the extended area + + //private double lat0 = 0; // latitude of the minimum grid value; the minimum grid coordinate is the top-left corner of the extended area Review comment: Agreed. This code is removed from this PR and will be raised by @MarvinLitt in a different PR as part of the algorithm.
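The fields above encode simple grid geometry: transValue is the ground distance of one degree (of latitude anywhere, or of longitude at the equator), and deltaX/deltaY are the degrees spanned by one grid cell, with longitude stretched by the cosine of the origin latitude. A sketch of that arithmetic, reusing the constants from the snippet (variable names otherwise mine):

```python
import math

CONVERT_FACTOR = 180.0    # degrees-to-radians factor, as in GeoHashDefault
EARTH_RADIUS = 6371004.0  # earth radius in meters, as in GeoHashDefault

# Ground distance of one degree of latitude (or longitude at the equator):
# transValue in the Java snippet, roughly 111 km.
METERS_PER_DEGREE = math.pi / CONVERT_FACTOR * EARTH_RADIUS

def grid_deltas(grid_size_m, origin_latitude_deg):
    """Degrees of (longitude, latitude) spanned by one grid cell."""
    delta_y = grid_size_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude), so a cell of the
    # same ground size spans more longitude degrees away from the equator.
    delta_x = delta_y / math.cos(math.radians(origin_latitude_deg))
    return delta_x, delta_y

dx, dy = grid_deltas(50.0, 39.9)  # 50 m cells around latitude 39.9°
```

At the equator the two deltas coincide; at higher latitudes delta_x grows while delta_y stays fixed, which is why the class keeps mCos around.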
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878610 ## File path: core/src/main/java/org/apache/carbondata/core/util/GeoHashDefault.java ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.util; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException; +import org.apache.carbondata.core.constants.CarbonCommonConstants; + + +public class GeoHashDefault implements CustomIndex> { + // 角度转弧度的转换因子 + private static final double CONVERT_FACTOR = 180.0; + // 地球半径 + private static final double EARTH_RADIUS = 6371004.0; + + private static final String GEOHASH = "geohash"; + // 赤道经度1度或者纬度1度对应的地理空间距离 + private static double transValue = Math.PI / CONVERT_FACTOR * EARTH_RADIUS; + + // private double oriLongitude = 0; // 坐标原点的经度 + + private double oriLatitude = 0; // 坐标原点的纬度 + + private double userDefineMaxLongitude = 0; // 用户定义地图最大的经度 + + private double userDefineMaxLatitude = 0; // 用户定义地图最大的纬度 + + private double userDefineMinLongitude = 0; // 用户定义地图最小的经度 + + private double userDefineMinLatitude = 0; // 用户定义地图最小的纬度 + + private double CalculateMaxLongitude = 0; // 计算后得出的补齐地图最大的经度 + + private double CalculateMaxLatitude = 0; // 计算后得出的补齐地图最大的纬度 + + private int gridSize = 0; //栅格长度单位是米 + + private double mCos; // 坐标原点纬度的余玄数值 + + private double deltaY = 0;// 每一个gridSize长度对应Y轴的度数 + + private double deltaX = 0;// 每一个gridSize长度应X轴的度数 + + private double deltaYByRatio = 0; // 每一个gridSize长度对应Y轴的度数 * 系数 + + private double deltaXByRatio = 0; // 每一个gridSize长度应X轴的度数 * 系数 + + private int cutLevel = 0; // 对整个区域切的刀数(一横一竖为1刀),就是四叉树的深度 + + //private int totalRowNumber = 0;// 整个区域的行数,从左上开始到右下 + + //private int totalCloumnNumber = 0; // 整个区域的列数,从左上开始到右下 + + //private int udfRowStartNumber = 0; // 用户定义区域的开始行数 + + //private int udfRowEndNumber = 0; // 用户定义区域的结束的行数 + + //private int udfCloumnStartNumber = 0; // 用户定义区域的开始列数 + + //private int udfCloumnEndNumber = 0; // 用户定义区域的开始结束列数 + + //private double lon0 = 0; // 栅格最小数值的经度坐标,最小栅格坐标是扩展区域最左上角的经纬度坐标 + + //private double lat0 = 0; // 栅格最小数值的纬度坐标,最小栅格坐标是扩展区域最左上角的经纬度坐标 + + private double 
lon0ByRation = 0; // constant multiplied by the ratio + private double lat0ByRation = 0; // constant multiplied by the ratio + + private int conversionRatio = 1; // ratio used to convert double latitude/longitude values to int for computation + + /** + * Initialize the geohash index handler instance. + * @param handlerName + * @param properties + * @throws Exception + */ + @Override + public void init(String handlerName, Map properties) throws Exception { +String options = properties.get(CarbonCommonConstants.INDEX_HANDLER); +if (options == null || options.isEmpty()) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid.", CarbonCommonConstants.INDEX_HANDLER)); +} +options = options.toLowerCase(); +if (!options.contains(handlerName.toLowerCase())) { + throw new MalformedCarbonCommandException( + String.format("%s property is invalid. %s is not present.", + CarbonCommonConstants.INDEX_HANDLER, handlerName)); +} + +String commonKey = CarbonCommonConstants.INDEX_HANDLER + "." + handlerName + "."; +String TYPE = commonKey + "type"; +String SOURCE_COLUMNS = commonKey + "sourcecolumns"; +String SOURCE_COLUMN_TYPES = commonKey + "sourcecolumntypes"; +String TARGET_DATA_TYPE = commonKey + "datatype"; +// String ORIGIN_LONGITUDE = commonKey + "originlongitude"; +String ORIGIN_LATITUDE = commonKey + "originlatitude"; +String MIN_LONGITUDE = commonKey + "minlongitude"; +String MAX_LONGITUDE = commonKey + "maxlongitude"; +String MIN_LATITUDE = commonKey + "minlatitude"; +String MAX_LATITUDE = commonKey + "maxlatitude"; +
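The `init` excerpt above first validates that the `index_handler` table property exists and names the handler before deriving the per-handler option keys from a shared prefix. A minimal standalone sketch of that validation flow follows; the class and method are illustrative, not CarbonData's actual API, and only the `index_handler` property name comes from the excerpt:

```java
import java.util.Map;

public class IndexHandlerProps {
    static final String INDEX_HANDLER = "index_handler";

    // Mirrors the checks in init(): the property must exist, be non-empty,
    // and mention the handler name; all per-handler options then share the
    // prefix "index_handler.<handlerName>.".
    static String commonKey(Map<String, String> properties, String handlerName) {
        String options = properties.get(INDEX_HANDLER);
        if (options == null || options.isEmpty()) {
            throw new IllegalArgumentException(INDEX_HANDLER + " property is invalid.");
        }
        if (!options.toLowerCase().contains(handlerName.toLowerCase())) {
            throw new IllegalArgumentException(handlerName + " is not present.");
        }
        return INDEX_HANDLER + "." + handlerName + ".";
    }
}
```

From this prefix the handler then derives keys such as `index_handler.<name>.sourcecolumns` and `index_handler.<name>.datatype`, as in the excerpt.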
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878645 ## File path: integration/spark2/src/main/scala/org/apache/spark/util/AlterTableUtil.scala ## @@ -934,4 +934,21 @@ object AlterTableUtil { CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB) } } + + def validateForIndexHandlerSources(carbonTable: CarbonTable, alterColumns: List[String]): Unit = { +// Do not allow index handler's source columns to be altered +val properties = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala +val indexProperty = properties.get(CarbonCommonConstants.INDEX_HANDLER) +if (indexProperty.isDefined) { + indexProperty.get.split(",") foreach { element => Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878691 ## File path: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala ## @@ -53,6 +53,21 @@ private[sql] case class CarbonDescribeFormattedCommand( (field.name, field.dataType.simpleString, colComment) } +/* Append non-schema columns */ +val columns = relation.carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala +val implicitColumns = for (column <- columns if column.getSchemaOrdinal == -1) yield { Review comment: Yes. Modified.
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, speeds up S3 access performance by dozens or even hundreds of times, and decreases the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use Carbon through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. 
Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char, Date, TimeStamp data types, but also supported Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336]. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code > PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithm, train > their model, deploy their model as service and inference image. It's big > challenge to storage, manage and analysis lots of structured and unstructured > data, especially unstructured data, like image, video, audio and so on. > Many users use python to install their project for these scenario. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platform. Apache CarbonData has > many great feature and high performance to storage, manage and analysis big > data. Apache CarbonData not only already supported String, Int, Double, > Boolean, Char, Date, TimeStamp data types, but also supported Binary > [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], > which can avoid small binary files problem and speed up S3 access performance > reach dozens or even hundreds of times, also can decrease cost of accessing > OBS by decreasing the number of calling S3 API. But it's not easy for them to > use carbon by Java/Scala/C++. 
So it's better to provide python interface for > these users to use CarbonData by python code > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877866 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -79,6 +84,17 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati } } } +/* If non schema fields are present, keep all of them at the end of input */ +if (isNonSchemaPresent) { + for (int i = 0; i < fields.length; i++) { +if (fields[i].getColumn().getSchemaOrdinal() != -1) { + continue; +} +input[k] = fields[i]; +inputMapping[k] = -1; +k++; + } +} Review comment: same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877865 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -95,6 +111,11 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati } Object[] out = new Object[genericParsers.length]; for (int i = 0; i < genericParsers.length; i++) { + if (inputMapping[i] == -1) { +/* All the non schema fields are placed at end. And input mapping for them are marked as -1. +Can break the loop when inputMapping[i] is -1. */ +break; + } Review comment: same as above.
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349877872 ## File path: processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/RowParserImpl.java ## @@ -69,7 +69,12 @@ public RowParserImpl(DataField[] output, CarbonDataLoadConfiguration configurati DataField[] input = new DataField[fields.length]; inputMapping = new int[input.length]; int k = 0; +Boolean isNonSchemaPresent = false; for (int i = 0; i < fields.length; i++) { + if (fields[i].getColumn().getSchemaOrdinal() == -1) { +isNonSchemaPresent = true; +continue; + } Review comment: This could have been the easiest thing to do for me :) There was a reason to keep inputMapping for non-schema columns at the end. We can stop the row parse the moment a non-schema column is encountered. We have only 1 non-schema column at the moment though. Anyway, have modified as suggested.
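The RowParserImpl review thread above concerns keeping non-schema columns (those with schema ordinal -1) at the end of the input mapping, so the parse loop can stop at the first -1 entry. A simplified sketch of that ordering idea, using plain ordinal arrays instead of CarbonData's DataField objects (class and method names are illustrative):

```java
public class NonSchemaOrdering {
    // Builds an input mapping where schema fields keep their positions first and
    // every non-schema field (schema ordinal -1) is appended at the end as -1,
    // mirroring the placement discussed in RowParserImpl.
    static int[] buildInputMapping(int[] schemaOrdinals) {
        int[] mapping = new int[schemaOrdinals.length];
        int k = 0;
        for (int i = 0; i < schemaOrdinals.length; i++) {
            if (schemaOrdinals[i] != -1) {
                mapping[k++] = i;   // schema column: record its input position
            }
        }
        for (int i = 0; i < schemaOrdinals.length; i++) {
            if (schemaOrdinals[i] == -1) {
                mapping[k++] = -1;  // non-schema column: trailing, marked -1
            }
        }
        return mapping;
    }

    // Because all -1 entries are trailing, parsing can break at the first one.
    static int countParsed(int[] mapping) {
        int n = 0;
        for (int m : mapping) {
            if (m == -1) {
                break;
            }
            n++;
        }
        return n;
    }
}
```

With this layout, the loop in `parseRow` only ever touches schema columns before breaking, which is the optimization the reviewer describes.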
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878568 ## File path: dev/scalastyle-config.xml ## @@ -228,7 +228,7 @@ This file is divided into 3 sections: ]]> - + Review comment: Agreed. Reverted it back.
[jira] [Updated] (CARBONDATA-3254) PyCarbon: provide python interface for users to use CarbonData by python code
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3254: -- Description: More and more people use big data to optimize their algorithms, train their models, deploy their models as services and run inference on images. It is a big challenge to store, manage and analyze large amounts of structured and unstructured data, especially unstructured data such as images, video and audio. Many users build their projects in Python for these scenarios. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, with many great features and high performance for storing, managing and analyzing big data. Apache CarbonData not only already supports the String, Int, Double, Boolean, Char, Date and TimeStamp data types, but also supports Binary [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which avoids the small-binary-files problem, speeds up S3 access performance by dozens or even hundreds of times, and decreases the cost of accessing OBS by reducing the number of S3 API calls. But it is not easy for these users to use Carbon through Java/Scala/C++, so it is better to provide a Python interface for them to use CarbonData from Python code. Goals: 1. Apache CarbonData should provide a Python interface to write and read structured and unstructured data in CarbonData, such as String, int and binary data: image/voice/video. It should not depend on Apache Spark. 2. Apache CarbonData should provide a Python interface that lets deep learning frameworks read and write data from/to CarbonData, such as TensorFlow, MXNet, PyTorch and so on. It should not depend on Apache Spark. 3. Apache CarbonData should provide a Python interface to manage and analyze data based on Apache Spark, supporting the DDL, DML and DataMap features in Python. 
was: More and more people use big data to optimize their algorithm, train their model, deploy their model as service and inference image. It's big challenge to storage, manage and analysis lots of structured and unstructured data, especially unstructured data, like image, video, audio and so on. Many users use python to install their project for these scenario. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platform. Apache CarbonData has many great feature and high performance to storage, manage and analysis big data. Apache CarbonData not only already supported String, Int, Double, Boolean, Char,Date, TImeStamp data types, but also supported Binay [(CARBONDATA-3336)|https://issues.apache.org/jira/browse/CARBONDATA-3336], which can avoid small binary files problem and speed up S3 access performance reach dozens or even hundreds of times, also can decrease cost of accessing OBS by decreasing the number of calling S3 API. But it's not easy for them to use carbon by Java/Scala/C++. So it's better to provide python interface for these users to use CarbonData by python code Goals: 1. Apache CarbonData should provides python interface to support to write and read structured and unstructured data in CarbonData, like String, int and binary data: image/voice/video. It should not dependency Apache Spark. 2. Apache CarbonData should provides python interface to support deep learning framework to ready and write data from/to CarbonData, like TensorFlow , MXNet, PyTorch and so on. It should not dependency Apache Spark. 3. Apache CarbonData should provides python interface to manage and analysis data based on Apache Spark. Apache CarbonData should support DDL, DML, DataMap feature in Python. 
> PyCarbon: provide python interface for users to use CarbonData by python code > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: Bo Xu >Assignee: Bo Xu >Priority: Major > > More and more people use big data to optimize their algorithm, train > their model, deploy their model as service and inference image. It's big > challenge to storage, manage and analysis lots of structured and unstructured > data, especially unstructured data, like image, video, audio and so on. > Many users use python to install their project for these scenario. > Apache CarbonData is an indexed columnar data store solution for fast > analytics on big data platform. Apache CarbonData has many great feature and > high performance to storage, manage and analysis big data. Apache CarbonData > not only already
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349878535 ## File path: core/src/main/java/org/apache/carbondata/core/util/QuadTreeUtil.scala ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.util + +import java.util + + +class QuadTreeUtil{ + + /** +* Binary search; the input is the list `arrays`, and `des` is the value to search for +* +* @param arrays the data region; the data is sorted +* @param des the value to search for +* @return >= 0: position of the found element in the list; < 0: not found +*/ + def binarySearch(arrays: util.List[Array[Long]], des: Long): (Int,(Int, Int) ) = { Review comment: This file is removed from this PR and it will be raised by @MarvinLitt in a different PR as part of the algorithm.
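For context on the removed utility: a binary search over a sorted list of `[start, end]` long ranges, as the translated comment describes, can be sketched as below. This is a generic illustration, not the removed file's actual implementation (which returned a nested tuple and will be contributed in a separate PR):

```java
import java.util.List;

public class RangeSearch {
    // Binary search over a sorted, non-overlapping list of [start, end] ranges.
    // Returns the index of the range containing `des`, or -1 if none contains it.
    static int binarySearch(List<long[]> ranges, long des) {
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // overflow-safe midpoint
            long[] r = ranges.get(mid);
            if (des < r[0]) {
                hi = mid - 1;            // target lies before this range
            } else if (des > r[1]) {
                lo = mid + 1;            // target lies after this range
            } else {
                return mid;              // r[0] <= des <= r[1]
            }
        }
        return -1;                       // not found
    }
}
```

Such a lookup is O(log n) over the sorted grid-ID ranges, which is why the quadtree algorithm keeps the range list ordered.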
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879032 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") +if (sources.distinct.length != sources.size) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property " + + s"have duplicate columns.") +} + +val sourceTypes = StringBuilder.newBuilder +sources.foreach { column => + tableFields.find(_.column.equalsIgnoreCase(column)) match { +case Some(field) => sourceTypes.append(field.dataType.get).append(",") +case None => + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"Source column : $column in property " + Review comment: Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879013 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => +/* Validate target column name */ +if (tableFields.exists(_.column.equalsIgnoreCase(e))) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"handler value : $e is not allowed. It matches with another column name in table. " + + s"Cannot create column with it.") +} + +val sourceColumnsOption = tableProperties.get( + CarbonCommonConstants.INDEX_HANDLER + s".$e.sourcecolumns") +if (sourceColumnsOption.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. 
" + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property is not specified.") +} else if (sourceColumnsOption.get.isEmpty) { + throw new MalformedCarbonCommandException( +s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + + s"${CarbonCommonConstants.INDEX_HANDLER}.$e.sourcecolumns property cannot be empty.") +} + +/* Validate source columns */ +val sources = sourceColumnsOption.get.split(",") Review comment: Agreed. Modified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And
VenuReddy2103 commented on a change in pull request #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#discussion_r349879065 ## File path: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ## @@ -268,6 +309,151 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { } } + /** + * The method parses, validates and processes the index_handler property. + * + * @param tableProperties Table properties + * @param tableFields Sequence of table fields + * @return Sequence of table fields + * + */ + private def processIndexProperty(tableProperties: mutable.Map[String, String], + tableFields: Seq[Field]): Seq[Field] = { +val option = tableProperties.get(CarbonCommonConstants.INDEX_HANDLER) +val fields = ListBuffer[Field]() +if (option.isDefined) { + if (option.get.isEmpty) { +throw new MalformedCarbonCommandException( + s"Carbon ${CarbonCommonConstants.INDEX_HANDLER} property is invalid. " + +s"Option value is empty.") + } + + val handlers = option.get.split(",") + handlers.foreach { e => Review comment: Modified.
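The checks discussed in these review comments on `processIndexProperty` — a non-empty `sourcecolumns` list, no duplicate entries, and every entry matching an existing table column (case-insensitively) — can be sketched standalone as below. The class and method names are illustrative; the real validation lives in CarbonDDLSqlParser in Scala:

```java
import java.util.HashSet;
import java.util.Set;

public class SourceColumnsCheck {
    // Mirrors the parser's checks on "index_handler.<handler>.sourcecolumns":
    // the comma-separated list must be non-empty, free of duplicates, and each
    // entry must name an existing table column (case-insensitive).
    static void validate(String sourceColumns, Set<String> tableColumns) {
        if (sourceColumns == null || sourceColumns.isEmpty()) {
            throw new IllegalArgumentException("sourcecolumns property cannot be empty.");
        }
        String[] sources = sourceColumns.split(",");
        Set<String> seen = new HashSet<>();
        for (String column : sources) {
            String c = column.trim().toLowerCase();
            if (!seen.add(c)) {
                throw new IllegalArgumentException("sourcecolumns property has duplicate columns.");
            }
            if (!tableColumns.contains(c)) {
                throw new IllegalArgumentException("Source column " + column + " does not exist in the table.");
            }
        }
    }
}
```

Failing fast here at DDL time keeps a bad `index_handler` definition from ever reaching table creation, which matches the MalformedCarbonCommandException behavior in the excerpt.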
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557821381 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/984/
[GitHub] [carbondata] xubo245 opened a new pull request #3478: [CARBONDATA-3255] CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData
xubo245 opened a new pull request #3478: [CARBONDATA-3255] CarbonData provides python interface to support to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478 Apache CarbonData already provides Java/Scala/C++ interfaces for users, and more and more people use Python to manage and analyze big data, so it is better to also provide a Python interface for writing and reading structured and unstructured data in CarbonData, such as String, int and binary data: image/voice/video. It should not depend on Apache Spark. We call it PYSDK. PYSDK is based on the CarbonData Java SDK and uses pyjnius to call Java code from Python. Although Apache Spark uses py4j in PySpark to call Java code from Python, py4j shows low performance when reading big data in CarbonData format from Python code; py4j's own documentation also reports low performance when reading big data: https://www.py4j.org/advanced_topics.html#performance. JPype is another popular tool for calling Java code from Python, but it stopped being updated several years ago, so we cannot use it. In our tests, pyjnius showed high performance when reading big data by calling Java code from Python, so it is a good choice for us. We have already worked on this feature for several months in https://github.com/xubo245/pycarbon Goals: 1. PYSDK should provide an interface to read data 2. PYSDK should provide an interface to write data 3. PYSDK should support basic data types 4. PYSDK should support projection 5. PYSDK should support filter Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. 
- Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557814939 Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/983/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557814895 Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/992/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557827031 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/993/
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557841833 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/987/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557829689 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/985/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557829721 Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/986/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557827588 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/996/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557834742 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/995/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557834680 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/994/
[GitHub] [carbondata] CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData
CarbonDataQA commented on issue #3478: [CARBONDATA-3255] CarbonData provides a python interface to write and read structured and unstructured data in CarbonData URL: https://github.com/apache/carbondata/pull/3478#issuecomment-557836745 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/998/
[GitHub] [carbondata] CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/
CarbonDataQA commented on issue #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479#issuecomment-557836744 Build Success with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/997/
[GitHub] [carbondata] CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF
CarbonDataQA commented on issue #3436: [CARBONDATA-3548]Geospatial Support: Modified to create and load the table with a nonschema dimension sort column. And added InPolygon UDF URL: https://github.com/apache/carbondata/pull/3436#issuecomment-557815040 Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/995/
[jira] [Updated] (CARBONDATA-3283) Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML, DataMap feature in P
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xu updated CARBONDATA-3283:
--
Description: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python.
Goals:
1). PyCarbon supports reading data from local/HDFS/S3 in python code via PySpark DataFrame
2). PyCarbon supports writing data from python code to local/HDFS/S3 via PySpark DataFrame
3). PyCarbon supports DDL in python in sql format
4). PyCarbon supports DML in python in sql format
5). PyCarbon supports DataMap in python in sql format
(was: Apache CarbonData should provide a python interface to manage and analyze data based on Apache Spark. Apache CarbonData should support the DDL, DML and DataMap features in Python. TODO:)
> Apache CarbonData should provide a python interface to manage and analyze
> data based on Apache Spark. Apache CarbonData should support the DDL, DML and
> DataMap features in Python.
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Bo Xu
> Assignee: Bo Xu
> Priority: Major
>
> Apache CarbonData should provide a python interface to manage and analyze
> data based on Apache Spark. Apache CarbonData should support the DDL, DML and
> DataMap features in Python.
> Goals:
> 1). PyCarbon supports reading data from local/HDFS/S3 in python code via PySpark DataFrame
> 2). PyCarbon supports writing data from python code to local/HDFS/S3 via PySpark DataFrame
> 3). PyCarbon supports DDL in python in sql format
> 4). PyCarbon supports DML in python in sql format
> 5). PyCarbon supports DataMap in python in sql format
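As a hedged illustration of the goals in this JIRA (not code from the issue): the `spark.sql.extensions` setting and the `carbondata` format name below are taken from CarbonData's Spark integration and may differ by version; table and column names are invented.

```python
# Illustrative sketch only -- requires a Spark deployment with the
# CarbonData integration on the classpath; not runnable standalone.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("pycarbon-sketch")
         .config("spark.sql.extensions",
                 "org.apache.spark.sql.CarbonExtensions")
         .getOrCreate())

# Goal 3): DDL in sql format
spark.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, name STRING) "
          "STORED AS carbondata")
# Goal 4): DML in sql format
spark.sql("INSERT INTO t1 VALUES (1, 'a')")
# Goal 1): read into a PySpark DataFrame
df = spark.sql("SELECT * FROM t1")
# Goal 2): write a DataFrame back in carbondata format
df.write.format("carbondata").mode("append").saveAsTable("t2")
```

Unlike the Spark-free PYSDK of PR #3478, this path deliberately rides on PySpark, which is why the JIRA scopes it to DataFrame and sql-format operations.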
[GitHub] [carbondata] xubo245 opened a new pull request #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data fro
xubo245 opened a new pull request #3479: [WIP][CARBONDATA-3271] Apache CarbonData should provide a python interface to support deep learning frameworks to read and write data from/to CarbonData URL: https://github.com/apache/carbondata/pull/3479 Apache CarbonData should provide a python interface so that deep learning frameworks such as TensorFlow, MXNet and PyTorch can read and write data from/to CarbonData. It should not depend on Apache Spark.
Goals:
1. CarbonData provides a python interface for TensorFlow to read data from CarbonData for training models
2. CarbonData provides a python interface for MXNet to read data from CarbonData for training models
3. CarbonData provides a python interface for PyTorch to read data from CarbonData for training models
4. CarbonData should support an epoch function
5. CarbonData should support caching to speed up performance.
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
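Goals 4 and 5 above (epoch support and caching) are not spelled out in the PR text; the following is a hypothetical, framework-agnostic sketch of what those semantics could mean. `EpochReader` and its API are invented here for illustration and are not the real PyCarbon reader.

```python
# Hypothetical sketch: replay a dataset for several training epochs,
# caching rows in memory after the first pass so later epochs skip I/O.
from typing import Any, Callable, Iterable, Iterator, List


class EpochReader:
    """Iterates a dataset num_epochs times, optionally serving epochs
    after the first from an in-memory cache instead of storage."""

    def __init__(self, load_fn: Callable[[], Iterable[Any]],
                 num_epochs: int, cache: bool = True) -> None:
        self.load_fn = load_fn        # callable that reads rows from storage
        self.num_epochs = num_epochs
        self.cache_enabled = cache
        self._cache: List[Any] = []

    def __iter__(self) -> Iterator[Any]:
        for epoch in range(self.num_epochs):
            if epoch == 0 or not self.cache_enabled:
                self._cache.clear()
                for row in self.load_fn():
                    if self.cache_enabled:
                        self._cache.append(row)
                    yield row
            else:
                # Later epochs are served from memory, not re-read from disk.
                yield from self._cache
```

A training loop would then do `for row in EpochReader(read_rows, num_epochs=3): ...` and pay the storage cost only once, which is the point of combining the epoch function with a cache.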
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557845983 Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/996/
[GitHub] [carbondata] CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface
CarbonDataQA commented on issue #2465: [CARBONDATA-2863] Refactored CarbonFile interface URL: https://github.com/apache/carbondata/pull/2465#issuecomment-557850178 Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/999/
[GitHub] [carbondata] shenh062326 commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt…
shenh062326 commented on issue #3476: [CARBONDATA-3593] Fix TOTAL_BLOCKLET_NUM not right when blocklet filt… URL: https://github.com/apache/carbondata/pull/3476#issuecomment-557853006 > @shenh062326 Thanks for fixing this, could you paste a comparison of the statistics printing before and after this change? ok, I have pasted the comparison in the comment.