[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3751: [CARBONDATA-3803] Mark CarbonSession as deprecated since 2.0
ajantha-bhat commented on a change in pull request #3751: URL: https://github.com/apache/carbondata/pull/3751#discussion_r421945890 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSession.scala ## @@ -40,12 +40,18 @@ import org.apache.carbondata.streaming.CarbonStreamingQueryListener * Session implementation for {org.apache.spark.sql.SparkSession} * Implemented this class only to use our own SQL DDL commands. * User needs to use {CarbonSession.getOrCreateCarbon} to create Carbon session. + * + * @deprecated Since 2.0, only use for backward compatibility, + * please switch to use {@link CarbonExtensions}. */ +@Deprecated class CarbonSession(@transient val sc: SparkContext, Review comment: Do we also need to deprecate the "**stored by**" syntax and update the documentation for it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
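For readers following this deprecation: the migration path the javadoc points to is a plain SparkSession with CarbonExtensions installed. A minimal sketch is below; the master URL, app name, and table schema are placeholder assumptions, while the `spark.sql.extensions` key and the old `getOrCreateCarbonSession` entry point come from the CarbonData docs and the diff above.

```scala
import org.apache.spark.sql.SparkSession

// Deprecated 1.x entry point (what this PR marks @Deprecated):
//   import org.apache.spark.sql.CarbonSession._
//   val carbon = SparkSession.builder()
//     .master("local[2]")                    // placeholder
//     .getOrCreateCarbonSession("/tmp/carbonstore")

// 2.0 replacement: a regular SparkSession with CarbonExtensions.
val spark = SparkSession.builder()
  .master("local[2]")                        // placeholder
  .appName("CarbonExtensionsExample")        // placeholder
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()

// Carbon DDL now runs through the standard SQL entry point.
spark.sql("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING) STORED AS carbondata")
```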
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3754: [CARBONDATA-3808] Added documentation for cdc and scd scenarios
ajantha-bhat commented on a change in pull request #3754: URL: https://github.com/apache/carbondata/pull/3754#discussion_r421944153 ## File path: docs/scd-and-cdc-guide.md ## @@ -0,0 +1,118 @@ + + +# Upsert into a Carbon DataSet using Merge + +## SCD and CDC Scenarios +Change Data Capture (CDC), is to apply all data changes generated from an external data set +into a target dataset. In other words, a set of changes (update/delete/insert) applied to an external +table needs to be applied to a target table. + +Slowly Changing Dimensions (SCD), are the dimensions in which the data changes slowly, rather +than changing regularly on a time basis. + +SCD and CDC data changes can be merged to a carbon dataset online using the data frame level `MERGE` API. + + MERGE API + +Below API merges the datasets online and applies the actions as per the conditions. + +``` + targetDS.merge(sourceDS, ) Review comment: Need to explain what each of these APIs does and when to use it, because they are newly added by Carbon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
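To make this review point concrete for readers of the thread, here is a rough sketch of a filled-in call chain. It is assembled only from the call names quoted in the hunk above; the SparkSession named spark, the table names, the join condition, and the element types of the expression maps are assumptions, to be verified against MergeTestCase.scala.

```scala
import org.apache.spark.sql.functions.col

// Hypothetical datasets: 'target' is the carbon table, 'source' carries the changes.
// Assumes a SparkSession named spark with the carbon merge API in scope.
val target = spark.table("target_table").as("A")
val source = spark.table("change_table").as("B")

// Expression maps consumed by updateExpr/insertExpr (assumed shapes).
val updateMap = Map(col("value") -> col("B.value"))
val insertMap = Map(col("id") -> col("B.id"), col("value") -> col("B.value"))

// Mirrors the chain in the quoted documentation: update matched rows,
// insert unmatched source rows, then run the merge online.
target.merge(source, col("A.id") === col("B.id"))
  .whenMatched()
  .updateExpr(updateMap)
  .whenNotMatched()
  .insertExpr(insertMap)
  .execute()
```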
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3754: [CARBONDATA-3808] Added documentation for cdc and scd scenarios
ajantha-bhat commented on a change in pull request #3754: URL: https://github.com/apache/carbondata/pull/3754#discussion_r421943365 ## File path: docs/scd-and-cdc-guide.md ## @@ -0,0 +1,118 @@ + + +# Upsert into a Carbon DataSet using Merge + +## SCD and CDC Scenarios +Change Data Capture (CDC), is to apply all data changes generated from an external data set +into a target dataset. In other words, a set of changes (update/delete/insert) applied to an external +table needs to be applied to a target table. + +Slowly Changing Dimensions (SCD), are the dimensions in which the data changes slowly, rather +than changing regularly on a time basis. + +SCD and CDC data changes can be merged to a carbon dataset online using the data frame level `MERGE` API. + + MERGE API + +Below API merges the datasets online and applies the actions as per the conditions. + +``` + targetDS.merge(sourceDS, ) + .whenMatched() + .updateExpr(updateMap) + .insertExpr(insertMap_u) + .whenNotMatched() + .insertExpr(insertMap) + .whenNotMatchedAndExistsOnlyOnTarget() + .delete() + .insertHistoryTableExpr(insertMap_d, ) + .execute() ``` +**NOTE:** SQL syntax for merge is not yet supported. + +# Example code sample to implement ccd scenario Review comment: Just give a link to `MergeTestCase.scala` and the test case name. No need to have the same thing here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
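Building on the quoted chain, the delete branch with the history-table hook might look like the sketch below; as in the previous example, the datasets, join condition, and maps are hypothetical, and the second argument of `insertHistoryTableExpr` is elided in the quoted doc, so it is not filled in here.

```scala
import org.apache.spark.sql.functions.col

// Hypothetical datasets, assuming a SparkSession named spark and the
// carbon merge API in scope (see MergeTestCase.scala for real usage).
val target = spark.table("target_table").as("A")
val source = spark.table("change_table").as("B")

// Rows that exist only on the target (i.e. deleted upstream) are removed.
// The quoted doc additionally chains .insertHistoryTableExpr(insertMap_d, ...)
// after .delete() to audit deletions into a history table; its second
// argument is elided in the hunk above, so it is omitted here.
target.merge(source, col("A.id") === col("B.id"))
  .whenNotMatchedAndExistsOnlyOnTarget()
  .delete()
  .execute()
```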
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421921906 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
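Since the quoted guide makes staged data visible only after the stage-insert command runs, a minimal scheduling sketch is shown below, assuming a SparkSession named spark; the table name and the 5-minute period are placeholders, while the SQL text is the command quoted above.

```scala
import java.util.concurrent.{Executors, TimeUnit}

val tableName = "test"  // placeholder target table
val scheduler = Executors.newSingleThreadScheduledExecutor()

// Fold staged flink output into the table periodically so it becomes queryable.
// Per the guide, shorten the period when visibility requirements are high or
// flink traffic is large.
scheduler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = spark.sql(s"INSERT INTO $tableName STAGE")
}, 0, 5, TimeUnit.MINUTES)
```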
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421921948 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. +val environment = StreamExecutionEnvironment.getExecutionEnvironment +// Set flink environment configuration here, such as parallelism, checkpointing, restart strategy, etc. + +// 2. Create flink data source, may be a kafka source, custom source, or others. +// The data type of source should be Array[AnyRef]. +// Array length should equals to table column count, and values order in array should matches table column order. +val source = ... +// 3. Create flink stream and set source.
+val stream = environment.addSource(source) +// 4. Add other flink operators here. +// ... +// 5. Set flink stream target (write data to carbon with a write sink). +stream.addSink(StreamingFileSink.forBulkFormat(new Path(ProxyFileSystem.DEFAULT_URI), writerFactory).build) +// 6. Run flink stream. +try { + environment.execute +} catch { + case exception: Exception => +// Handle execute exception here. +} + ``` + +###Writer properties + +Local Writer + + | Property | Name | Description | + |--|--|-| + | CarbonLocalProperty.DATA_TEMP_PATH | carbon.writer.local.data.temp.path | Usually is a local path, data will write to temp path first, and mv to target data path finally.| Review comment: OK
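To tie the property table at the end of this hunk back to the writerProperties stub in the pseudo code, a sketch like the following could be used; the package of CarbonLocalProperty is assumed from the hunk's other imports, and both values are placeholders.

```scala
import java.util.Properties
import org.apache.carbon.flink.CarbonLocalProperty  // assumed package, per the hunk's imports

val writerProperties = new Properties
// "carbon.writer.local.data.temp.path": data is written here first,
// then moved to the target data path, as the table above describes.
writerProperties.setProperty(CarbonLocalProperty.DATA_TEMP_PATH, "/data/temp/")
// "carbon.writer.local.commit.threshold": placeholder value; the row is
// truncated in the quote, so check the guide for its exact semantics.
writerProperties.setProperty(CarbonLocalProperty.COMMIT_THRESHOLD, "1024")
```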
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421921602 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421921350 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. +val environment = StreamExecutionEnvironment.getExecutionEnvironment +// Set flink environment configuration here, such as parallelism, checkpointing, restart strategy, etc. + +// 2. Create flink data source, may be a kafka source, custom source, or others. +// The data type of source should be Array[AnyRef]. +// Array length should equals to table column count, and values order in array should matches table column order. +val source = ... +// 3. Create flink stream and set source.
+val stream = environment.addSource(source) +// 4. Add other flink operators here. +// ... +// 5. Set flink stream target (write data to carbon with a write sink). +stream.addSink(StreamingFileSink.forBulkFormat(new Path(ProxyFileSystem.DEFAULT_URI), writerFactory).build) +// 6. Run flink stream. +try { + environment.execute +} catch { + case exception: Exception => +// Handle execute exception here. +} + ``` + +###Writer properties + +Local Writer + + | Property | Name | Description | + |--|--|-| + | CarbonLocalProperty.DATA_TEMP_PATH | carbon.writer.local.data.temp.path | Usually is a local path, data will write to temp path first, and mv to target data path finally.| + | CarbonLocalProperty.COMMIT_THRESHOLD | carbon.writer.local.commit.threshold
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421921411 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890991 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. +val environment = StreamExecutionEnvironment.getExecutionEnvironment +// Set flink environment configuration here, such as parallelism, checkpointing, restart strategy, etc. + +// 2. Create flink data source, may be a kafka source, custom source, or others. +// The data type of source should be Array[AnyRef]. +// Array length should equals to table column count, and values order in array should matches table column order. +val source = ... +// 3. Create flink stream and set source.
+val stream = environment.addSource(source) +// 4. Add other flink operators here. +// ... +// 5. Set flink stream target (write data to carbon with a write sink). +stream.addSink(StreamingFileSink.forBulkFormat(new Path(ProxyFileSystem.DEFAULT_URI), writerFactory).build) +// 6. Run flink stream. +try { + environment.execute +} catch { + case exception: Exception => +// Handle execute exception here. +} + ``` + +###Writer properties + +Local Writer + + | Property | Name | Description | + |--|--|-| + | CarbonLocalProperty.DATA_TEMP_PATH | carbon.writer.local.data.temp.path | Usually is a local path, data will write to temp path first, and mv to target data path finally.| + | CarbonLocalProperty.COMMIT_THRESHOLD | carbon.writer.local.commit.threshold
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890931 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. +val environment = StreamExecutionEnvironment.getExecutionEnvironment +// Set flink environment configuration here, such as parallelism, checkpointing, restart strategy, etc. + +// 2. Create flink data source, may be a kafka source, custom source, or others. +// The data type of source should be Array[AnyRef]. +// Array length should equals to table column count, and values order in array should matches table column order. +val source = ... +// 3. Create flink stream and set source.
+val stream = environment.addSource(source) +// 4. Add other flink operators here. +// ... +// 5. Set flink stream target (write data to carbon with a write sink). +stream.addSink(StreamingFileSink.forBulkFormat(new Path(ProxyFileSystem.DEFAULT_URI), writerFactory).build) +// 6. Run flink stream. +try { + environment.execute +} catch { + case exception: Exception => +// Handle execute exception here. +} + ``` + +###Writer properties + +Local Writer Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890603 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process + + Typical flink stream: Source -> Process -> Output(Carbon Writer Sink) + + Pseudo code and description: + + ```scala +// Import dependencies. +import java.util.Properties +import org.apache.carbon.flink.CarbonWriterFactory +import org.apache.carbon.flink.ProxyFileSystem +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.flink.api.common.restartstrategy.RestartStrategies +import org.apache.flink.core.fs.Path +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink + +// Specify database name. +val databaseName = "default" + +// Specify target table name. +val tableName = "test" +// Table path of the target table. +val tablePath = "/data/warehouse/test" +// Specify local temporary path. +val dataTempPath = "/data/temp/" + +val tableProperties = new Properties +// Set the table properties here. + +val writerProperties = new Properties +// Set the writer properties here, such as temp path, commit threshold, access key, secret key, endpoint, etc. + +val carbonProperties = new Properties +// Set the carbon properties here, such as date format, store location, etc. + +// Create carbon bulk writer factory. Two writer types are supported: 'Local' and 'S3'. +val writerFactory = CarbonWriterFactory.builder("Local").build( + databaseName, + tableName, + tablePath, + tableProperties, + writerProperties, + carbonProperties +) + +// Build a flink stream and run it. +// 1. New a flink execution environment. +val environment = StreamExecutionEnvironment.getExecutionEnvironment +// Set flink environment configuration here, such as parallelism, checkpointing, restart strategy, etc. + +// 2. Create flink data source, may be a kafka source, custom source, or others. +// The data type of source should be Array[AnyRef]. +// Array length should equals to table column count, and values order in array should matches table column order. +val source = ... +// 3. Create flink stream and set source.
+val stream = environment.addSource(source) +// 4. Add other flink operators here. +// ... +// 5. Set flink stream target (write data to carbon with a write sink). +stream.addSink(StreamingFileSink.forBulkFormat(new Path(ProxyFileSystem.DEFAULT_URI), writerFactory).build) +// 6. Run flink stream. +try { + environment.execute +} catch { + case exception: Exception => +// Handle execute exception here. +} + ``` + +###Writer properties Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890531 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description Review comment: OK ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios + + A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon, + for subsequent analysis and queries. + + The CarbonData flink integration module is used connect Flink and Carbon in the above scenario. + + The CarbonData flink integration module provides a set of Flink BulkWriter implementations + (CarbonLocalWriter and CarbonS3Writer). The data is processed by the Flink, and finally written into + the stage directory of the target table by the CarbonXXXWriter. + + By default, those data in table stage directory, can not be immediately queried, those data can be queried + after the "INSERT INTO $tableName STAGE" command is executed. + + Since the flink data written to carbon is endless, in order to ensure the visibility of data + and the controllable amount of data processed during the execution of each insert form stage command, + the user should execute the insert from stage command in a timely manner. + + The execution interval of the insert form stage command should take the data visibility requirements + of the actual business and the flink data traffic. When the data visibility requirements are high + or the data traffic is large, the execution interval should be appropriately shortened. + +##Usage description + +###Writing process Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] niuge01 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
niuge01 commented on a change in pull request #3752: URL: https://github.com/apache/carbondata/pull/3752#discussion_r421890386 ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios Review comment: OK ## File path: docs/flink-integration-guide.md ## @@ -0,0 +1,193 @@ +##Usage scenarios Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3755: [WIP]refactor MV events
CarbonDataQA1 commented on pull request #3755: URL: https://github.com/apache/carbondata/pull/3755#issuecomment-62519 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1254/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3755: [WIP]refactor MV events
CarbonDataQA1 commented on pull request #3755: URL: https://github.com/apache/carbondata/pull/3755#issuecomment-625443989 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2972/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3809) Refresh index command fails for secondary index as per syntax mentioned in https://github.com/apache/carbondata/blob/master/docs/index/secondary-index-guide.md
Chetan Bhat created CARBONDATA-3809:
---
Summary: Refresh index command fails for secondary index as per syntax mentioned in https://github.com/apache/carbondata/blob/master/docs/index/secondary-index-guide.md
Key: CARBONDATA-3809
URL: https://issues.apache.org/jira/browse/CARBONDATA-3809
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 2.0.0
Environment: Spark 2.3.2, Spark 2.4.5
Reporter: Chetan Bhat

Refresh index command fails for secondary index as per syntax mentioned in [https://github.com/apache/carbondata/blob/master/docs/index/secondary-index-guide.md]

0: jdbc:hive2://10.20.255.171:23040/default> create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED as carbondata TBLPROPERTIES('table_blocksize'='1');
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.218 seconds)
0: jdbc:hive2://10.20.255.171:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (2.31 seconds)
0: jdbc:hive2://10.20.255.171:23040/default> CREATE INDEX indextable2 ON TABLE brinjal (AMSize) AS 'carbondata';
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (2.379 seconds)
0: jdbc:hive2://10.20.255.171:23040/default> refresh index indextable2;
Error: org.apache.spark.sql.AnalysisException:
== Parser1: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
[1.26] failure: end of input
refresh index indextable2
^;
== Parser2: org.apache.spark.sql.execution.SparkSqlParser ==
REFRESH statements cannot contain ' ', '\n', '\r', '\t' inside unquoted resource paths(line 1, pos 0)
== SQL ==
refresh index indextable2
^^^;
(state=,code=0)
0: jdbc:hive2://10.20.255.171:23040/default> REFRESH INDEX indextable2 WHERE SEGMENT.ID IN(0);
Error: org.apache.spark.sql.AnalysisException:
== Parser1: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
[1.27] failure: identifier matching regex (?i)ON expected
REFRESH INDEX indextable2 WHERE SEGMENT.ID IN(0)
^;
== Parser2: org.apache.spark.sql.execution.SparkSqlParser ==
REFRESH statements cannot contain ' ', '\n', '\r', '\t' inside unquoted resource paths(line 1, pos 0)
== SQL ==
REFRESH INDEX indextable2 WHERE SEGMENT.ID IN(0)
^^^;
(state=,code=0)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
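The Parser1 failure above ("identifier matching regex (?i)ON expected") hints that the grammar expects an ON <table> clause after the index name, which both reported commands omit. Under that reading alone, and without verifying against the guide, the shape the parser appears to expect would be:

```scala
// Inferred only from the parser error in this report; unverified.
spark.sql("REFRESH INDEX indextable2 ON TABLE brinjal")
```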
[GitHub] [carbondata] akashrn5 opened a new pull request #3755: [WIP]refactor MV events
akashrn5 opened a new pull request #3755: URL: https://github.com/apache/carbondata/pull/3755 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3754: [CARBONDATA-3808] Added documentation for cdc and scd scenarios
CarbonDataQA1 commented on pull request #3754: URL: https://github.com/apache/carbondata/pull/3754#issuecomment-625357707 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1253/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3754: [CARBONDATA-3808] Added documentation for cdc and scd scenarios
CarbonDataQA1 commented on pull request #3754: URL: https://github.com/apache/carbondata/pull/3754#issuecomment-625350618 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2971/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CARBONDATA-3805) Drop index on bloom and lucene index fails
[ https://issues.apache.org/jira/browse/CARBONDATA-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-3805: Description: Drop index on bloom and lucene index fails 0: jdbc:hive2://10.20.255.35:23040/default> create table brinjal_bloom (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED as carbondata TBLPROPERTIES('table_blocksize'='1'); +---++ |Result| +---++ +---++ No rows selected (0.261 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal_bloom OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); +---++ |Result| +---++ +---++ No rows selected (2.196 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> CREATE INDEX dm_brinjal ON TABLE brinjal_bloom(AMSize) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); +---++ |Result| +---++ +---++ No rows selected (1.039 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> drop index dm_brinjal on TABLE brinjal_bloom; *Error: org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time: (state=,code=0)* 0: jdbc:hive2://10.20.255.171:23040/default> CREATE TABLE uniqdata_lucene(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED as carbondata; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.632 seconds) 0: jdbc:hive2://10.20.255.171:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_lucene OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); +-+--+ | Result | +-+--+ +-+--+ No rows selected (3.894 seconds) 0: jdbc:hive2://10.20.255.171:23040/default> CREATE INDEX dm4 ON TABLE uniqdata_lucene (CUST_NAME) AS 'lucene'; +-+--+ | Result | +-+--+ +-+--+ No rows selected (2.518 seconds) 0: jdbc:hive2://10.20.255.171:23040/default> drop index dm4 on table uniqdata_lucene; *Error: org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time: (state=,code=0)* *Exception -* 2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time: at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.setLastModifiedTime(AbstractDFSCarbonFile.java:192) at org.apache.spark.sql.secondaryindex.util.FileInternalUtil$.touchStoreTimeStamp(FileInternalUtil.scala:53) at org.apache.spark.sql.hive.CarbonHiveIndexMetadataUtil$.removeIndexInfoFromParentTable(CarbonHiveIndexMetadataUtil.scala:111)
at org.apache.spark.sql.execution.command.index.DropIndexCommand.removeIndexInfoFromParentTable(DropIndexCommand.scala:261) at org.apache.spark.sql.execution.command.index.DropIndexCommand.dropIndex(DropIndexCommand.scala:179) at org.apache.spark.sql.execution.command.index.DropIndexCommand.run(DropIndexCommand.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at
[jira] [Updated] (CARBONDATA-3805) Drop index on bloom index fails
[ https://issues.apache.org/jira/browse/CARBONDATA-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-3805: Description: Drop index on bloom index fails 0: jdbc:hive2://10.20.255.35:23040/default> create table brinjal_bloom (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED as carbondata TBLPROPERTIES('table_blocksize'='1'); +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.261 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal_bloom OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); +--+-+ |Result| +--+-+ +--+-+ No rows selected (2.196 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> CREATE INDEX dm_brinjal ON TABLE brinjal_bloom(AMSize) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); +--+-+ |Result| +--+-+ +--+-+ No rows selected (1.039 seconds) 0: jdbc:hive2://10.20.255.35:23040/default> drop index dm_brinjal on TABLE brinjal_bloom; Error: org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time: (state=,code=0) *Exception -* 2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time: at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.setLastModifiedTime(AbstractDFSCarbonFile.java:192) at org.apache.spark.sql.secondaryindex.util.FileInternalUtil$.touchStoreTimeStamp(FileInternalUtil.scala:53) at org.apache.spark.sql.hive.CarbonHiveIndexMetadataUtil$.removeIndexInfoFromParentTable(CarbonHiveIndexMetadataUtil.scala:111) at org.apache.spark.sql.execution.command.index.DropIndexCommand.removeIndexInfoFromParentTable(DropIndexCommand.scala:261) at org.apache.spark.sql.execution.command.index.DropIndexCommand.dropIndex(DropIndexCommand.scala:179) at org.apache.spark.sql.execution.command.index.DropIndexCommand.run(DropIndexCommand.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369) at org.apache.spark.sql.Dataset.(Dataset.scala:194) at 
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
[jira] [Updated] (CARBONDATA-3805) Drop index on bloom and lucene index fails
[ https://issues.apache.org/jira/browse/CARBONDATA-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-3805: Summary: Drop index on bloom and lucene index fails (was: Drop index on bloom index fails) > Drop index on bloom and lucene index fails > -- > > Key: CARBONDATA-3805 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3805 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 2.0.0 > Environment: Spark 2.3.2, Spark 2.4.5 >Reporter: Chetan Bhat >Priority: Major > > Drop index on bloom index fails > 0: jdbc:hive2://10.20.255.35:23040/default> create table brinjal_bloom (imei > string,AMSize string,channelsId string,ActiveCountry string, Activecity > string,gamePointId double,deviceInformationId double,productionDate > Timestamp,deliveryDate timestamp,deliverycharge double) STORED as carbondata > TBLPROPERTIES('table_blocksize'='1'); > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.261 seconds) > 0: jdbc:hive2://10.20.255.35:23040/default> LOAD DATA INPATH > 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal_bloom > OPTIONS('DELIMITER'=',', 'QUOTECHAR'= > '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= > 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (2.196 seconds) > 0: jdbc:hive2://10.20.255.35:23040/default> CREATE INDEX dm_brinjal ON TABLE > brinjal_bloom(AMSize) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', > 'BLOOM_FPP'='0.1'); > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (1.039 seconds) > 0: jdbc:hive2://10.20.255.35:23040/default> drop index dm_brinjal on TABLE > brinjal_bloom; > Error: org.apache.carbondata.core.exception.CarbonFileException: Error while > setting modified time: (state=,code=0) > *Exception -* > 2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | > Error executing query, currentState RUNNING, | > org.apache.spark.internal.Logging$class.logError(Logging.scala:91)2020-05-07 > 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error > executing query, currentState RUNNING, | > org.apache.spark.internal.Logging$class.logError(Logging.scala:91)org.apache.carbondata.core.exception.CarbonFileException: > Error while setting modified time: at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.setLastModifiedTime(AbstractDFSCarbonFile.java:192) > at > org.apache.spark.sql.secondaryindex.util.FileInternalUtil$.touchStoreTimeStamp(FileInternalUtil.scala:53) > at > org.apache.spark.sql.hive.CarbonHiveIndexMetadataUtil$.removeIndexInfoFromParentTable(CarbonHiveIndexMetadataUtil.scala:111) > at > org.apache.spark.sql.execution.command.index.DropIndexCommand.removeIndexInfoFromParentTable(DropIndexCommand.scala:261) > at > org.apache.spark.sql.execution.command.index.DropIndexCommand.dropIndex(DropIndexCommand.scala:179) > at > org.apache.spark.sql.execution.command.index.DropIndexCommand.run(DropIndexCommand.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at >
org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at > org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370) at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369) at > org.apache.spark.sql.Dataset.(Dataset.scala:194) at > org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at > org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) at > org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694) at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175) > at >
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results
CarbonDataQA1 commented on pull request #3748: URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625301427 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1252/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3748: [CARBONDATA-3801] Query on partition table with SI having multiple partition columns gives empty results
CarbonDataQA1 commented on pull request #3748:
URL: https://github.com/apache/carbondata/pull/3748#issuecomment-625300724

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2970/
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3754: [CARBONDATA-3808] Added documentation for cdc and scd scenarios
Indhumathi27 opened a new pull request #3754:
URL: https://github.com/apache/carbondata/pull/3754

### Why is this PR needed?
Added documentation for cdc and scd scenarios.

### What changes were proposed in this PR?
Added documentation for cdc and scd scenarios.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
[GitHub] [carbondata] Indhumathi27 commented on pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
Indhumathi27 commented on pull request #3752:
URL: https://github.com/apache/carbondata/pull/3752#issuecomment-625278288

@niuge01 Please add a link in readme.md file
[jira] [Created] (CARBONDATA-3808) Add documentation for cdc and scd scenarios
Indhumathi Muthumurugesh created CARBONDATA-3808:
-------------------------------------------------

             Summary: Add documentation for cdc and scd scenarios
                 Key: CARBONDATA-3808
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3808
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Indhumathi Muthumurugesh
[jira] [Created] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
Prasanna Ravichandran created CARBONDATA-3807:
----------------------------------------------

             Summary: Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
                 Key: CARBONDATA-3807
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
             Project: CarbonData
          Issue Type: Bug
         Environment: Ant cluster - opensource
            Reporter: Prasanna Ravichandran
         Attachments: bloom-filtercolumn-plan.png, bloom-show index.png

Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

Test queries:

drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int, cust_name String, active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint, bigint_column2 bigint, decimal_column1 decimal(30,10), decimal_column2 decimal(36,36), double_column1 double, double_column2 double, integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
show indexes on uniqdata;
explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; -- not hitting
explain select cust_name from uniqdata; -- not hitting
[jira] [Created] (CARBONDATA-3806) Create bloom datamap fails with null pointer exception
Chetan Bhat created CARBONDATA-3806:
------------------------------------

             Summary: Create bloom datamap fails with null pointer exception
                 Key: CARBONDATA-3806
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3806
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.6.1
         Environment: Spark 2.3.2
            Reporter: Chetan Bhat

Create bloom datamap fails with null pointer exception.

create table brinjal_bloom (imei string, AMSize string, channelsId string, ActiveCountry string, Activecity string, gamePointId double, deviceInformationId double, productionDate Timestamp, deliveryDate timestamp, deliverycharge double) STORED BY 'carbondata' TBLPROPERTIES('table_blocksize'='1');

LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal_bloom OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

0: jdbc:hive2://10.20.255.171:23040/default> CREATE DATAMAP dm_brinjal4 ON TABLE brinjal_bloom USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'AMSize', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 210.0 failed 4 times, most recent failure: Lost task 0.3 in stage 210.0 (TID 1477, vm2, executor 2): java.lang.NullPointerException
    at org.apache.carbondata.core.datamap.Segment.getCommittedIndexFile(Segment.java:150)
    at org.apache.carbondata.core.util.BlockletDataMapUtil.getTableBlockUniqueIdentifiers(BlockletDataMapUtil.java:198)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getTableBlockIndexUniqueIdentifiers(BlockletDataMapFactory.java:176)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:154)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getSegmentProperties(BlockletDataMapFactory.java:425)
    at org.apache.carbondata.datamap.IndexDataMapRebuildRDD.internalCompute(IndexDataMapRebuildRDD.scala:359)
    at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:84)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace: (state=,code=0)
[jira] [Resolved] (CARBONDATA-3792) remove the system folder location property as it is not used and refactor the getSystemFolderLocation code
[ https://issues.apache.org/jira/browse/CARBONDATA-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor resolved CARBONDATA-3792.
--------------------------------------
       Fix Version/s: 2.0.0
          Resolution: Fixed

> remove the system folder location property as it is not used and refactor the getSystemFolderLocation code
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3792
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3792
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Akash R Nilugal
>            Assignee: Akash R Nilugal
>            Priority: Minor
>             Fix For: 2.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> remove the system folder location property as it is not used and refactor the getSystemFolderLocation code
[GitHub] [carbondata] kunal642 commented on pull request #3743: [CARBONDATA-3792]Refactor system folder location and removed unwanted property
kunal642 commented on pull request #3743:
URL: https://github.com/apache/carbondata/pull/3743#issuecomment-625249259

LGTM
[GitHub] [carbondata] kunal642 edited a comment on pull request #3751: [CARBONDATA-3803] Mark CarbonSession as deprecated since 2.0
kunal642 edited a comment on pull request #3751:
URL: https://github.com/apache/carbondata/pull/3751#issuecomment-625229890

Please print a warn log in CarbonSession to indicate to the user that this is deprecated
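A minimal sketch of the kind of warning being requested (the logger wiring and message text below are assumptions for illustration, not the actual patch):

```
import org.apache.log4j.Logger

object CarbonSessionDeprecationWarning {
  // Hypothetical logger name; CarbonData's own logging utility may differ.
  private val LOGGER: Logger = Logger.getLogger("org.apache.spark.sql.CarbonSession")

  def main(args: Array[String]): Unit = {
    // The idea is a one-time warning emitted when a CarbonSession is created,
    // so users notice the deprecation even if they never read the Javadoc.
    LOGGER.warn("CarbonSession is deprecated since 2.0 and is kept only for " +
      "backward compatibility; please switch to CarbonExtensions.")
  }
}
```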
[GitHub] [carbondata] kunal642 commented on pull request #3751: [CARBONDATA-3803] Mark CarbonSession as deprecated since 2.0
kunal642 commented on pull request #3751:
URL: https://github.com/apache/carbondata/pull/3751#issuecomment-625229890

Please print a log in CarbonSession to indicate to the user that this is deprecated
[GitHub] [carbondata] kunal642 commented on pull request #3739: [CARBONDATA-3791] Index server doc changes
kunal642 commented on pull request #3739:
URL: https://github.com/apache/carbondata/pull/3739#issuecomment-625228432

LGTM
[GitHub] [carbondata] kunal642 commented on pull request #3746: [CARBONDATA-3799] Fix inverted index cannot work with adaptive encoding
kunal642 commented on pull request #3746:
URL: https://github.com/apache/carbondata/pull/3746#issuecomment-625228185

LGTM.. @jackylk Please review
[jira] [Created] (CARBONDATA-3805) Drop index on bloom index fails
Chetan Bhat created CARBONDATA-3805:
------------------------------------

             Summary: Drop index on bloom index fails
                 Key: CARBONDATA-3805
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3805
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 2.0.0
         Environment: Spark 2.3.2, Spark 2.4.5
            Reporter: Chetan Bhat

Drop index on bloom index fails.

0: jdbc:hive2://10.20.255.35:23040/default> create table brinjal_bloom (imei string, AMSize string, channelsId string, ActiveCountry string, Activecity string, gamePointId double, deviceInformationId double, productionDate Timestamp, deliveryDate timestamp, deliverycharge double) STORED as carbondata TBLPROPERTIES('table_blocksize'='1');
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.261 seconds)

0: jdbc:hive2://10.20.255.35:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE brinjal_bloom OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.196 seconds)

0: jdbc:hive2://10.20.255.35:23040/default> CREATE INDEX dm_brinjal ON TABLE brinjal_bloom(AMSize) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.039 seconds)

0: jdbc:hive2://10.20.255.35:23040/default> drop index dm_brinjal on TABLE brinjal_bloom;

*Exception -*
2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)
2020-05-07 20:10:13,865 | ERROR | [HiveServer2-Background-Pool: Thread-358] | Error executing query, currentState RUNNING, | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)
org.apache.carbondata.core.exception.CarbonFileException: Error while setting modified time:
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.setLastModifiedTime(AbstractDFSCarbonFile.java:192)
    at org.apache.spark.sql.secondaryindex.util.FileInternalUtil$.touchStoreTimeStamp(FileInternalUtil.scala:53)
    at org.apache.spark.sql.hive.CarbonHiveIndexMetadataUtil$.removeIndexInfoFromParentTable(CarbonHiveIndexMetadataUtil.scala:111)
    at org.apache.spark.sql.execution.command.index.DropIndexCommand.removeIndexInfoFromParentTable(DropIndexCommand.scala:261)
    at org.apache.spark.sql.execution.command.index.DropIndexCommand.dropIndex(DropIndexCommand.scala:179)
    at org.apache.spark.sql.execution.command.index.DropIndexCommand.run(DropIndexCommand.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:232)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3753: [WIP] Partition column name should be case insensitive
CarbonDataQA1 commented on pull request #3753:
URL: https://github.com/apache/carbondata/pull/3753#issuecomment-625184095

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2969/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3753: [WIP] Partition column name should be case insensitive
CarbonDataQA1 commented on pull request #3753:
URL: https://github.com/apache/carbondata/pull/3753#issuecomment-625177043

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1251/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3745: [CARBONDATA-3793]Data load with partition columns fail with InvalidLoadOptionException when load option 'header' is set to '
akashrn5 commented on a change in pull request #3745:
URL: https://github.com/apache/carbondata/pull/3745#discussion_r421406251

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
## @@ -126,6 +126,7 @@ with Serializable {
     optionsFinal.put(
       "fileheader",
       dataSchema.fields.map(_.name.toLowerCase).mkString(",") + "," + partitionStr)
+    optionsFinal.put("header", "false")

Review comment:
   I think you can remove it from where optionsLocal is filled.
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
Indhumathi27 commented on a change in pull request #3752:
URL: https://github.com/apache/carbondata/pull/3752#discussion_r421233356

## File path: docs/flink-integration-guide.md
## @@ -0,0 +1,193 @@
+##Usage scenarios

Review comment:
   Please add license header

## File path: docs/flink-integration-guide.md
## @@ -0,0 +1,193 @@
+##Usage scenarios

Review comment:
   ```suggestion
   ## Usage scenarios
   ```

## File path: docs/flink-integration-guide.md
## @@ -0,0 +1,193 @@
+##Usage scenarios
+
+  A typical scenario is that the data is cleaned and preprocessed by Flink, and then written to Carbon,
+  for subsequent analysis and queries.
+
+  The CarbonData flink integration module is used to connect Flink and Carbon in the above scenario.
+
+  The CarbonData flink integration module provides a set of Flink BulkWriter implementations
+  (CarbonLocalWriter and CarbonS3Writer). The data is processed by Flink, and finally written into
+  the stage directory of the target table by the CarbonXXXWriter.
+
+  By default, the data in the table stage directory cannot be queried immediately; it becomes queryable
+  after the "INSERT INTO $tableName STAGE" command is executed.
+
+  Since the flink data written to carbon is endless, in order to ensure the visibility of data
+  and a controllable amount of data processed during each execution of the insert from stage command,
+  the user should execute the insert from stage command in a timely manner.
+
+  The execution interval of the insert from stage command should take into account the data visibility
+  requirements of the actual business and the flink data traffic. When the data visibility requirements
+  are high or the data traffic is large, the execution interval should be appropriately shortened.
+
+##Usage description

Review comment:
   ```suggestion
   ## Usage description
   ```

## File path: docs/flink-integration-guide.md
## @@ -0,0 +1,193 @@
+##Usage description
+
+###Writing process

Review comment:
   ```suggestion
   ### Writing process
   ```

## File path: docs/flink-integration-guide.md
## @@ -0,0 +1,193 @@
+###Writing process
+
+  Typical flink stream: Source -> Process -> Output(Carbon Writer Sink)
+
+  Pseudo code and
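The quoted hunk above (cut off mid-sentence in this archive) recommends running the insert-from-stage command periodically so that staged Flink output becomes queryable. As a rough illustration of that operational step, and not the guide's own pseudo code, a driver-side scheduler might look like the following; the table name, five-minute interval, and SparkSession wiring are all assumptions:

```
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.sql.SparkSession

object InsertFromStageScheduler {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("insert-from-stage").getOrCreate()
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    // Fold staged Flink output into the target table on a fixed interval;
    // shorten the interval when data-visibility requirements are high or
    // the Flink traffic is heavy, as the guide advises.
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = spark.sql("INSERT INTO sink_table STAGE")
    }, 0, 5, TimeUnit.MINUTES)
  }
}
```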
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3745: [CARBONDATA-3793]Data load with partition columns fail with InvalidLoadOptionException when load option 'header' is set to '
akashrn5 commented on a change in pull request #3745:
URL: https://github.com/apache/carbondata/pull/3745#discussion_r421350945

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
## @@ -126,6 +126,7 @@ with Serializable {
     optionsFinal.put(
       "fileheader",
       dataSchema.fields.map(_.name.toLowerCase).mkString(",") + "," + partitionStr)
+    optionsFinal.put("header", "false")

Review comment:
   Please refer to line 77 in org.apache.carbondata.processing.loading.model.CarbonLoadModelBuilder#build(java.util.Map, long, java.lang.String); this can be handled like that.
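For context, a rough sketch of the pattern the reviewer is pointing at: default the 'header' option once while building the load model rather than hard-coding it in SparkCarbonTableFormat. The helper name and method shape are illustrative assumptions, not the actual CarbonLoadModelBuilder code:

```
import java.util.{HashMap => JHashMap, Map => JMap}

object HeaderOptionDefaulting {
  // When an explicit 'fileheader' is supplied (as partition loads do), the CSV
  // itself carries no header row, so 'header' can safely default to false.
  def applyHeaderDefault(options: JMap[String, String]): Unit = {
    if (!options.containsKey("header") && options.containsKey("fileheader")) {
      options.put("header", "false")
    }
  }

  def main(args: Array[String]): Unit = {
    val options = new JHashMap[String, String]()
    options.put("fileheader", "imei,amsize")
    applyHeaderDefault(options)
    println(options) // {fileheader=imei,amsize, header=false}
  }
}
```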
[GitHub] [carbondata] QiangCai opened a new pull request #3753: [WIP] Partition column name should be case insensitive
QiangCai opened a new pull request #3753:
URL: https://github.com/apache/carbondata/pull/3753

### Why is this PR needed?
When inserting into a static partition, the partition column name is currently case sensitive.

### What changes were proposed in this PR?
The partition column name should be case insensitive: convert all partition column names to lower case.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
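A minimal sketch of the normalization this PR describes; the partition spec literal below is illustrative, not CarbonData's internal representation:

```
object PartitionSpecNormalization {
  def main(args: Array[String]): Unit = {
    // e.g. INSERT INTO t PARTITION (Region='asia') should match column 'region'.
    val staticPartitionSpec = Map("Region" -> "asia", "City" -> "shenzhen")
    // Convert all partition column names to lower case before matching them
    // against the table's (lower-cased) partition columns.
    val normalized = staticPartitionSpec.map { case (col, value) =>
      col.toLowerCase -> value
    }
    println(normalized) // Map(region -> asia, city -> shenzhen)
  }
}
```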
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3739: [CARBONDATA-3791] Index server doc changes
CarbonDataQA1 commented on pull request #3739:
URL: https://github.com/apache/carbondata/pull/3739#issuecomment-625082581

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1250/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3750: [CARBONDATA-3548]Renamed index_handler to spatial_index property and indexColumn is changed to spatialColumn
CarbonDataQA1 commented on pull request #3750:
URL: https://github.com/apache/carbondata/pull/3750#issuecomment-625061586

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2967/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
CarbonDataQA1 commented on pull request #3752:
URL: https://github.com/apache/carbondata/pull/3752#issuecomment-625052127

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2966/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3752: [CARBONDATA-3804] Provide end-to-end flink integration guide
CarbonDataQA1 commented on pull request #3752:
URL: https://github.com/apache/carbondata/pull/3752#issuecomment-625050735

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1248/