[jira] [Closed] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path

2021-08-04 Thread ChenKai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai closed CARBONDATA-4227.
---
Resolution: Not A Bug

> SDK CarbonWriterBuilder cannot execute `build()` several times with different 
> output path
> -
>
> Key: CARBONDATA-4227
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: ChenKai
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriter 
> instances with different output paths, but it does not work.
> For example: 
> {code:scala}
> val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
> // 1. first writing with path1
> val writer1 = builder.outputPath(path1).build()
> // write data, it works 
> // 2. second writing with path2
> val writer2 = builder.outputPath(path2).build()
> // write data, it does not work. It still writes data to path1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path

2021-08-04 Thread ChenKai (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392791#comment-17392791
 ] 

ChenKai commented on CARBONDATA-4227:
-

Hi [~nihal], thanks for your reply. You have a point: reusing the `builder` 
object to build different CarbonWriters is really confusing. My original 
intention was to avoid repeatedly creating `builder` objects, but doing it as 
you suggest is also fine, so I will close this issue. Thanks.

> SDK CarbonWriterBuilder cannot execute `build()` several times with different 
> output path
> -
>
> Key: CARBONDATA-4227
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: ChenKai
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriter 
> instances with different output paths, but it does not work.
> For example: 
> {code:scala}
> val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
> // 1. first writing with path1
> val writer1 = builder.outputPath(path1).build()
> // write data, it works 
> // 2. second writing with path2
> val writer2 = builder.outputPath(path2).build()
> // write data, it does not work. It still writes data to path1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path

2021-06-22 Thread ChenKai (Jira)
ChenKai created CARBONDATA-4227:
---

 Summary: SDK CarbonWriterBuilder cannot execute `build()` several 
times with different output path
 Key: CARBONDATA-4227
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: ChenKai


Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriter 
instances with different output paths, but it does not work.

For example: 

{code:scala}
val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)

// 1. first writing with path1
val writer1 = builder.outputPath(path1).build()
// write data, it works 

// 2. second writing with path2
val writer2 = builder.outputPath(path2).build()
// write data, it does not work. It still writes data to path1

{code}
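The "Not A Bug" resolution amounts to creating a fresh builder per output path 
instead of mutating one shared builder between `build()` calls. A toy Java 
sketch of the safe per-path pattern (the `Builder`/`Writer` classes here are 
hypothetical stand-ins for illustration, not the real SDK API):

```java
public class PerPathBuilders {
    // Toy stand-in for CarbonWriter (hypothetical, for illustration only).
    static final class Writer {
        final String outputPath;
        Writer(String outputPath) { this.outputPath = outputPath; }
    }

    // Toy stand-in for CarbonWriterBuilder.
    static final class Builder {
        private String outputPath;
        Builder outputPath(String path) { this.outputPath = path; return this; }
        Writer build() { return new Writer(outputPath); }
    }

    static Writer writerFor(String path) {
        // Safe pattern: one fresh builder per output path, so no state from a
        // previous build() can leak into the next writer.
        return new Builder().outputPath(path).build();
    }

    public static void main(String[] args) {
        for (String p : new String[]{"/tmp/path1", "/tmp/path2"}) {
            System.out.println(writerFor(p).outputPath);
        }
    }
}
```

This trades a small amount of object churn for predictable writer state, which 
is the trade-off discussed in the comments below.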





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3953) Deadlock when doing dataframe persist and loading

2020-08-18 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3953:
---

 Summary: Deadlock when doing dataframe persist and loading
 Key: CARBONDATA-3953
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3953
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: ChenKai
 Attachments: image-2020-08-18-15-59-46-108.png, 
image-2020-08-18-16-03-33-370.png

Thread-1
 !image-2020-08-18-15-59-46-108.png! 

Thread-2
 !image-2020-08-18-16-03-33-370.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3942) Fix type cast when loading data into partitioned table

2020-08-06 Thread ChenKai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai updated CARBONDATA-3942:

Summary: Fix type cast when loading data into partitioned table  (was: Fix 
type cast when doing data load into partitioned table)

> Fix type cast when loading data into partitioned table
> --
>
> Key: CARBONDATA-3942
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.1.0
>Reporter: ChenKai
>Priority: Major
>
> When loading Int-type data into a carbondata double-type column, the values 
> are broken like this:
> +--------+----+----+
> |cnt     |name|time|
> +--------+----+----+
> |4.9E-323|a   |2020|
> |1.0E-322|b   |2020|
> +--------+----+----+
> The original cnt values are 10 and 20.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3942) Fix type cast when doing data load into partitioned table

2020-08-06 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3942:
---

 Summary: Fix type cast when doing data load into partitioned table
 Key: CARBONDATA-3942
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 2.1.0
Reporter: ChenKai


When loading Int-type data into a carbondata double-type column, the values 
are broken like this:

+--------+----+----+
|cnt     |name|time|
+--------+----+----+
|4.9E-323|a   |2020|
|1.0E-322|b   |2020|
+--------+----+----+

The original cnt values are 10 and 20.
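The corrupted values are consistent with the integer's raw bits being 
reinterpreted as IEEE-754 double bits instead of being numerically cast: 
reinterpreting the bit patterns 10 and 20 as doubles yields exactly the tiny 
subnormal values shown above. A minimal Java sketch (not CarbonData code) 
reproducing the effect:

```java
public class BitsVsCast {
    public static void main(String[] args) {
        for (long v : new long[]{10L, 20L}) {
            // Wrong: reinterpret the integer's bit pattern as a double.
            // For small integers this lands in the subnormal range.
            double broken = Double.longBitsToDouble(v);
            // Right: a numeric cast preserves the value.
            double casted = (double) v;
            System.out.println(v + " -> broken=" + broken + ", casted=" + casted);
        }
    }
}
// prints:
// 10 -> broken=4.9E-323, casted=10.0
// 20 -> broken=1.0E-322, casted=20.0
```

The printed "broken" values match the `cnt` column in the bug report, which 
points at a missing cast in the partition load path.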

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3891) Loading Data to the partitioned table will update all segments updateDeltaEndTimestamp

2020-07-08 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3891:
---

 Summary: Loading Data to the partitioned table will update all 
segments updateDeltaEndTimestamp
 Key: CARBONDATA-3891
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3891
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: ChenKai


Loading data into the partitioned table updates updateDeltaEndTimestamp for 
all segments, which causes the driver to clear the cache of all segments when 
executing a query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3657) [FOLLOW-UP] Support alter hive table add columns with complex types

2020-01-15 Thread ChenKai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai updated CARBONDATA-3657:

Description: 
FOLLOW-UP CARBONDATA-3628

Altering a hive table is not fully supported in carbon; the unsupported cases 
are as follows:
 * Map
 * Array
 * Struct
 * Decimal with precision and scale
 * Column with comments

  was:FOLLOW-UP CARBONDATA-3628


> [FOLLOW-UP] Support alter hive table add columns with complex types
> ---
>
> Key: CARBONDATA-3657
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.6.1
>Reporter: ChenKai
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> FOLLOW-UP CARBONDATA-3628
> Altering a hive table is not fully supported in carbon; the unsupported cases 
> are as follows:
>  * Map
>  * Array
>  * Struct
>  * Decimal with precision and scale
>  * Column with comments



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3657) [FOLLOW-UP] Support alter hive table add columns with complex types

2020-01-08 Thread ChenKai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai updated CARBONDATA-3657:

Summary: [FOLLOW-UP] Support alter hive table add columns with complex 
types  (was: [FOLLOW-UP] Alter table add columns support complex types)

> [FOLLOW-UP] Support alter hive table add columns with complex types
> ---
>
> Key: CARBONDATA-3657
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.6.1
>Reporter: ChenKai
>Priority: Major
>
> FOLLOW-UP CARBONDATA-3628



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3657) [FOLLOW-UP] Alter table add columns support complex types

2020-01-08 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3657:
---

 Summary: [FOLLOW-UP] Alter table add columns support complex types
 Key: CARBONDATA-3657
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.6.1
Reporter: ChenKai


FOLLOW-UP CARBONDATA-3628



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3628) Alter hive table add complex column type

2019-12-23 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3628:
---

 Summary: Alter hive table add complex column type
 Key: CARBONDATA-3628
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3628
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.6.0
Reporter: ChenKai


ERROR: NullPointerException
{code:java}
 alter table alter_hive add columns (var map)

{code}
Tip: complex types are only handled by the default case; see *DataTypeUtil#valueOf*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3469) CarbonData with 2.3.2 can not run on CDH spark 2.4

2019-11-06 Thread ChenKai (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969008#comment-16969008
 ] 

ChenKai commented on CARBONDATA-3469:
-

[~imperio] You can temporarily use this version 
[growingio/carbondata|https://github.com/growingio/carbondata]; it may need 
some small changes. :D

> CarbonData with 2.3.2 can not run on CDH spark 2.4
> --
>
> Key: CARBONDATA-3469
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3469
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.5.3
>Reporter: wxmimperio
>Priority: Major
>
> *{color:#33}spark2-shell --jars 
> [apache-carbondata-1.5.3-bin-spark2.3.2-hadoop2.7.2.jar|https://dist.apache.org/repos/dist/release/carbondata/1.5.3/apache-carbondata-1.5.3-bin-spark2.3.2-hadoop2.7.2.jar]{color}*
>  
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog;{code}
> {code:java}
> scala> carbon.sql(
> | s"""
> | | CREATE TABLE IF NOT EXISTS test_table(
> | | id string,
> | | name string,
> | | city string,
> | | age Int)
> | | STORED AS carbondata
> | """.stripMargin)
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog;
> at 
> org.apache.spark.sql.hive.CarbonSessionStateBuilder.externalCatalog(CarbonSessionState.scala:227)
> at 
> org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog$lzycompute(CarbonSessionState.scala:214)
> at 
> org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:212)
> at 
> org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:191)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
> at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
> at 
> org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:77)
> at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:77)
> at org.apache.spark.sql.CarbonEnv$.getInstance(CarbonEnv.scala:135)
> at 
> org.apache.spark.sql.CarbonSession$.updateSessionInfoToCurrentThread(CarbonSession.scala:326)
> at 
> org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:47)
> at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:125)
> at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:88)
> ... 59 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3565) Binary to string issue when loading dataframe data in NewRddIterator

2019-11-03 Thread ChenKai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai updated CARBONDATA-3565:

Description: 
* issue
When a Spark DataFrame (SQL) loads complex binary data into a hive table, the 
data is broken when read back. In NewRddIterator, the data is converted to a 
string and then converted back.

* test case
Binary data can be produced with *DataOutputStream#writeDouble* and similar 
methods.

* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging 
into the code from 2016, it was used by the kettle *CsvInput* (commit: 
0018756d). That code has since been removed, so this conversion seems 
redundant. (UPDATE: the follow-up GenericParser code still uses this 
string-conversion logic, which should be considered here.)

  was:
* issue
Spark DataFrame(SQL) load complex binary data to a hive table, the data will be 
broken when reading out. I see in RddIterator, the data will be converted to a 
string, and then be converted back.

* test case
Binary data can be *DataOutputStream#writeDouble* and so on.

* discussion
I think *CarbonScalaUtil#getString* operation can be removed now. I dig deep 
into the code in 2016, the code was used in kettle *CsvInput* (commit: 
0018756d). But the code has been removed now, I think this converting operation 
is a little redundant.


> Binary to string issue when loading dataframe data in NewRddIterator
> 
>
> Key: CARBONDATA-3565
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3565
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.6.0
>Reporter: ChenKai
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * issue
> When a Spark DataFrame (SQL) loads complex binary data into a hive table, 
> the data is broken when read back. In NewRddIterator, the data is converted 
> to a string and then converted back.
> * test case
> Binary data can be produced with *DataOutputStream#writeDouble* and similar 
> methods.
> * discussion
> I think the *CarbonScalaUtil#getString* operation can be removed now. 
> Digging into the code from 2016, it was used by the kettle *CsvInput* 
> (commit: 0018756d). That code has since been removed, so this conversion 
> seems redundant. (UPDATE: the follow-up GenericParser code still uses this 
> string-conversion logic, which should be considered here.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3565) Binary to string issue when loading dataframe data in NewRddIterator

2019-11-02 Thread ChenKai (Jira)
ChenKai created CARBONDATA-3565:
---

 Summary: Binary to string issue when loading dataframe data in 
NewRddIterator
 Key: CARBONDATA-3565
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3565
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.6.0
Reporter: ChenKai


* issue
When a Spark DataFrame (SQL) loads complex binary data into a hive table, the 
data is broken when read back. In NewRddIterator, the data is converted to a 
string and then converted back.

* test case
Binary data can be produced with *DataOutputStream#writeDouble* and similar 
methods.

* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging 
into the code from 2016, it was used by the kettle *CsvInput* (commit: 
0018756d). That code has since been removed, so this conversion seems 
redundant.
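The lossiness of the string round-trip is easy to demonstrate: decoding 
arbitrary binary data with a text charset replaces invalid byte sequences with 
U+FFFD, so encoding the string back does not recover the original bytes. A 
minimal Java sketch (not CarbonData code; UTF-8 as the intermediate charset is 
an assumption for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {
    public static void main(String[] args) throws IOException {
        // Produce binary data as the test case suggests, via writeDouble.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeDouble(3.14159);
        byte[] original = bos.toByteArray();

        // Converting to a String and back is lossy: bytes that are not valid
        // UTF-8 (e.g. 0xF9 in this payload) are replaced during decoding.
        String asString = new String(original, StandardCharsets.UTF_8);
        byte[] roundTripped = asString.getBytes(StandardCharsets.UTF_8);

        System.out.println("intact: " + Arrays.equals(original, roundTripped));
        // prints: intact: false
    }
}
```

This is why passing binary columns through a string conversion in the load 
path corrupts the data on read-back.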



--
This message was sent by Atlassian Jira
(v8.3.4#803005)