[jira] [Closed] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
[ https://issues.apache.org/jira/browse/CARBONDATA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai closed CARBONDATA-4227.
-------------------------------
    Resolution: Not A Bug

> SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
> -----------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4227
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.1
>            Reporter: ChenKai
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriters with different output paths, but it does not work.
> For example:
> {code:scala}
> val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
> // 1. first write, with path1
> val writer1 = builder.outputPath(path1).build()
> // write data: it works
> // 2. second write, with path2
> val writer2 = builder.outputPath(path2).build()
> // write data: it does not work; data is still written to path1
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
[ https://issues.apache.org/jira/browse/CARBONDATA-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392791#comment-17392791 ]

ChenKai commented on CARBONDATA-4227:
-------------------------------------

Hi [~nihal], thanks for your reply. I think you have a point; it is indeed confusing to reuse the `builder` object to build different CarbonWriters. My original intention was to reduce the number of `builder` objects created, but doing it as you suggest also works, so I will close this issue. Thanks.

> SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
> -----------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4227
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.1
>            Reporter: ChenKai
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriters with different output paths, but it does not work.
> For example:
> {code:scala}
> val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
> // 1. first write, with path1
> val writer1 = builder.outputPath(path1).build()
> // write data: it works
> // 2. second write, with path2
> val writer2 = builder.outputPath(path2).build()
> // write data: it does not work; data is still written to path1
> {code}
[jira] [Created] (CARBONDATA-4227) SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
ChenKai created CARBONDATA-4227:
-----------------------------------

             Summary: SDK CarbonWriterBuilder cannot execute `build()` several times with different output path
                 Key: CARBONDATA-4227
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4227
             Project: CarbonData
          Issue Type: Bug
          Components: core
    Affects Versions: 2.1.1
            Reporter: ChenKai


Sometimes we want to reuse a CarbonWriterBuilder object to build CarbonWriters with different output paths, but it does not work.
For example:
{code:scala}
val builder = CarbonWriter.builder().withCsvInput(...).writtenBy(...)
// 1. first write, with path1
val writer1 = builder.outputPath(path1).build()
// write data: it works
// 2. second write, with path2
val writer2 = builder.outputPath(path2).build()
// write data: it does not work; data is still written to path1
{code}
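The reported behaviour can be reproduced with a toy builder. This is a sketch, not the CarbonData SDK: it assumes, as the issue implies, that the real builder freezes some internal state (e.g. the load model) on the first `build()`, so later `outputPath()` calls are silently ignored.

```java
// Minimal sketch (NOT the CarbonData SDK) of a builder that freezes its
// state on the first build(), so reuse keeps writing to the old path.
public class BuilderReuse {
    static class ToyWriter {
        final String path;
        ToyWriter(String path) { this.path = path; }
    }

    static class ToyWriterBuilder {
        private String path;
        private String frozenPath; // simulates internal state cached on first build()

        ToyWriterBuilder outputPath(String p) { this.path = p; return this; }

        ToyWriter build() {
            if (frozenPath == null) {
                frozenPath = path; // later outputPath() calls are silently ignored
            }
            return new ToyWriter(frozenPath);
        }
    }

    public static void main(String[] args) {
        ToyWriterBuilder builder = new ToyWriterBuilder();
        ToyWriter writer1 = builder.outputPath("/tmp/path1").build();
        ToyWriter writer2 = builder.outputPath("/tmp/path2").build();
        System.out.println(writer1.path); // /tmp/path1
        System.out.println(writer2.path); // /tmp/path1 as well

        // The pattern suggested in the discussion: one fresh builder per writer.
        ToyWriter writer3 = new ToyWriterBuilder().outputPath("/tmp/path2").build();
        System.out.println(writer3.path); // /tmp/path2
    }
}
```

Since the issue was resolved as Not A Bug, the takeaway is the usage pattern in the last lines: create a new builder for each output path instead of reusing one.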
[jira] [Created] (CARBONDATA-3953) Dead lock when doing dataframe persist and loading
ChenKai created CARBONDATA-3953:
-----------------------------------

             Summary: Dead lock when doing dataframe persist and loading
                 Key: CARBONDATA-3953
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3953
             Project: CarbonData
          Issue Type: Bug
    Affects Versions: 2.1.0
            Reporter: ChenKai
         Attachments: image-2020-08-18-15-59-46-108.png, image-2020-08-18-16-03-33-370.png

Thread-1
!image-2020-08-18-15-59-46-108.png!

Thread-2
!image-2020-08-18-16-03-33-370.png!
[jira] [Updated] (CARBONDATA-3942) Fix type cast when loading data into partitioned table
[ https://issues.apache.org/jira/browse/CARBONDATA-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3942:
--------------------------------
    Summary: Fix type cast when loading data into partitioned table  (was: Fix type cast when doing data load into partitioned table)

> Fix type cast when loading data into partitioned table
> ------------------------------------------------------
>
>                 Key: CARBONDATA-3942
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 2.1.0
>            Reporter: ChenKai
>            Priority: Major
>
> Loading Int-type data into a carbondata double column breaks the values, like this:
> +--------+----+----+
> |cnt     |name|time|
> +--------+----+----+
> |4.9E-323|a   |2020|
> |1.0E-322|b   |2020|
> +--------+----+----+
> The original cnt values were 10 and 20.
[jira] [Created] (CARBONDATA-3942) Fix type cast when doing data load into partitioned table
ChenKai created CARBONDATA-3942:
-----------------------------------

             Summary: Fix type cast when doing data load into partitioned table
                 Key: CARBONDATA-3942
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3942
             Project: CarbonData
          Issue Type: Bug
          Components: spark-integration
    Affects Versions: 2.1.0
            Reporter: ChenKai


Loading Int-type data into a carbondata double column breaks the values, like this:
+--------+----+----+
|cnt     |name|time|
+--------+----+----+
|4.9E-323|a   |2020|
|1.0E-322|b   |2020|
+--------+----+----+
The original cnt values were 10 and 20.
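The broken values in the table above are consistent with the integer being reinterpreted as the raw bit pattern of a double rather than numerically cast: `Double.longBitsToDouble(10L)` and `Double.longBitsToDouble(20L)` are subnormal doubles near 4.9E-323 and 1.0E-322. This is an inference for illustration, not a trace through CarbonData's actual code path.

```java
public class TypeCastDemo {
    public static void main(String[] args) {
        // Reinterpreting the integer's bits as a double (the suspected bug):
        double broken10 = Double.longBitsToDouble(10L);
        double broken20 = Double.longBitsToDouble(20L);
        System.out.println(broken10); // a subnormal near 4.9E-323
        System.out.println(broken20); // a subnormal near 1.0E-322

        // The correct conversion is a numeric cast:
        double ok = (double) 10;
        System.out.println(ok); // 10.0
    }
}
```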
[jira] [Created] (CARBONDATA-3891) Loading Data to the partitioned table will update all segments updateDeltaEndTimestamp
ChenKai created CARBONDATA-3891:
-----------------------------------

             Summary: Loading Data to the partitioned table will update all segments updateDeltaEndTimestamp
                 Key: CARBONDATA-3891
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3891
             Project: CarbonData
          Issue Type: Bug
    Affects Versions: 2.1.0
            Reporter: ChenKai


Loading data into a partitioned table updates updateDeltaEndTimestamp for all segments, which causes the driver to clear the cache for all segments when executing a query.
[jira] [Updated] (CARBONDATA-3657) [FOLLOW-UP] Support alter hive table add columns with complex types
[ https://issues.apache.org/jira/browse/CARBONDATA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3657:
--------------------------------
    Description: 
FOLLOW-UP CARBONDATA-3628

Altering hive tables is not fully supported in carbon; the unsupported cases are as follows:
 * Map
 * Array
 * Struct
 * Decimal with precision and scale
 * Column with comments

  was: FOLLOW-UP CARBONDATA-3628

> [FOLLOW-UP] Support alter hive table add columns with complex types
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-3657
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.6.1
>            Reporter: ChenKai
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> FOLLOW-UP CARBONDATA-3628
> Altering hive tables is not fully supported in carbon; the unsupported cases are as follows:
>  * Map
>  * Array
>  * Struct
>  * Decimal with precision and scale
>  * Column with comments
[jira] [Updated] (CARBONDATA-3657) [FOLLOW-UP] Support alter hive table add columns with complex types
[ https://issues.apache.org/jira/browse/CARBONDATA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3657:
--------------------------------
    Summary: [FOLLOW-UP] Support alter hive table add columns with complex types  (was: [FOLLOW-UP] Alter table add columns support complex types)

> [FOLLOW-UP] Support alter hive table add columns with complex types
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-3657
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.6.1
>            Reporter: ChenKai
>            Priority: Major
>
> FOLLOW-UP CARBONDATA-3628
[jira] [Created] (CARBONDATA-3657) [FOLLOW-UP] Alter table add columns support complex types
ChenKai created CARBONDATA-3657:
-----------------------------------

             Summary: [FOLLOW-UP] Alter table add columns support complex types
                 Key: CARBONDATA-3657
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3657
             Project: CarbonData
          Issue Type: Bug
          Components: spark-integration
    Affects Versions: 1.6.1
            Reporter: ChenKai


FOLLOW-UP CARBONDATA-3628
[jira] [Created] (CARBONDATA-3628) Alter hive table add complex column type
ChenKai created CARBONDATA-3628:
-----------------------------------

             Summary: Alter hive table add complex column type
                 Key: CARBONDATA-3628
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3628
             Project: CarbonData
          Issue Type: Bug
          Components: spark-integration
    Affects Versions: 1.6.0
            Reporter: ChenKai


ERROR: NullPointerException
{code:java}
alter table alter_hive add columns (var map)
{code}
Tip: complex types only support the default value; see *DataTypeUtil#valueOf*
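The NullPointerException above can be pictured with a sketch of a valueOf-style converter. This is an assumption for illustration, not CarbonData's actual *DataTypeUtil#valueOf*: if only primitive types are handled, a complex type such as a map falls through, and the caller dereferences the missing result.

```java
public class ComplexDefaultDemo {
    // Hypothetical valueOf: handles primitives only; complex types
    // (map/array/struct) fall through and yield null.
    static Object valueOf(String value, String dataType) {
        switch (dataType) {
            case "int":    return Integer.parseInt(value);
            case "string": return value;
            default:       return null; // complex type: unsupported
        }
    }

    public static void main(String[] args) {
        Object parsed = valueOf("default", "map");
        System.out.println(parsed); // null
        // Any later dereference, e.g. parsed.toString(), throws
        // NullPointerException, matching the reported error for
        // ALTER TABLE ... ADD COLUMNS with a map column.
    }
}
```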
[jira] [Commented] (CARBONDATA-3469) CarbonData with 2.3.2 can not run on CDH spark 2.4
[ https://issues.apache.org/jira/browse/CARBONDATA-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969008#comment-16969008 ]

ChenKai commented on CARBONDATA-3469:
-------------------------------------

[~imperio] You can use this version [growingio/carbondata|https://github.com/growingio/carbondata] temporarily; it may need some small changes. :D

> CarbonData with 2.3.2 can not run on CDH spark 2.4
> --------------------------------------------------
>
>                 Key: CARBONDATA-3469
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3469
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.5.3
>            Reporter: wxmimperio
>            Priority: Major
>
> *{color:#33}spark2-shell --jars [apache-carbondata-1.5.3-bin-spark2.3.2-hadoop2.7.2.jar|https://dist.apache.org/repos/dist/release/carbondata/1.5.3/apache-carbondata-1.5.3-bin-spark2.3.2-hadoop2.7.2.jar]{color}*
>
> {code:java}
> java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog;{code}
> {code:java}
> scala> carbon.sql(
>      |   s"""
>      |      | CREATE TABLE IF NOT EXISTS test_table(
>      |      |   id string,
>      |      |   name string,
>      |      |   city string,
>      |      |   age Int)
>      |      | STORED AS carbondata
>      |    """.stripMargin)
> java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog;
>   at org.apache.spark.sql.hive.CarbonSessionStateBuilder.externalCatalog(CarbonSessionState.scala:227)
>   at org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog$lzycompute(CarbonSessionState.scala:214)
>   at org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:212)
>   at org.apache.spark.sql.hive.CarbonSessionStateBuilder.catalog(CarbonSessionState.scala:191)
>   at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
>   at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$1.apply(BaseSessionStateBuilder.scala:291)
>   at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:77)
>   at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:77)
>   at org.apache.spark.sql.CarbonEnv$.getInstance(CarbonEnv.scala:135)
>   at org.apache.spark.sql.CarbonSession$.updateSessionInfoToCurrentThread(CarbonSession.scala:326)
>   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:47)
>   at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:125)
>   at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:88)
>   ... 59 elided
> {code}
[jira] [Updated] (CARBONDATA-3565) Binary to string issue when loading dataframe data in NewRddIterator
[ https://issues.apache.org/jira/browse/CARBONDATA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChenKai updated CARBONDATA-3565:
--------------------------------
    Description: 
* issue
When Spark DataFrame (SQL) loads complex binary data into a hive table, the data is broken when read back. In RddIterator the data is converted to a string and then converted back.
* test case
The binary data can be produced by *DataOutputStream#writeDouble* and the like.
* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging into the 2016 code, it was used by the kettle *CsvInput* (commit: 0018756d). That code has since been removed, so this conversion looks redundant. (UPDATE: the follow-up GenericParser code uses this string-conversion logic, so it must be considered here.)

  was:
* issue
When Spark DataFrame (SQL) loads complex binary data into a hive table, the data is broken when read back. In RddIterator the data is converted to a string and then converted back.
* test case
The binary data can be produced by *DataOutputStream#writeDouble* and the like.
* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging into the 2016 code, it was used by the kettle *CsvInput* (commit: 0018756d). That code has since been removed, so this conversion looks redundant.

> Binary to string issue when loading dataframe data in NewRddIterator
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-3565
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3565
>             Project: CarbonData
>          Issue Type: Bug
>          Components: spark-integration
>    Affects Versions: 1.6.0
>            Reporter: ChenKai
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * issue
> When Spark DataFrame (SQL) loads complex binary data into a hive table, the data is broken when read back. In RddIterator the data is converted to a string and then converted back.
> * test case
> The binary data can be produced by *DataOutputStream#writeDouble* and the like.
> * discussion
> I think the *CarbonScalaUtil#getString* operation can be removed now. Digging into the 2016 code, it was used by the kettle *CsvInput* (commit: 0018756d). That code has since been removed, so this conversion looks redundant. (UPDATE: the follow-up GenericParser code uses this string-conversion logic, so it must be considered here.)
[jira] [Created] (CARBONDATA-3565) Binary to string issue when loading dataframe data in NewRddIterator
ChenKai created CARBONDATA-3565:
-----------------------------------

             Summary: Binary to string issue when loading dataframe data in NewRddIterator
                 Key: CARBONDATA-3565
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3565
             Project: CarbonData
          Issue Type: Bug
          Components: spark-integration
    Affects Versions: 1.6.0
            Reporter: ChenKai


* issue
When Spark DataFrame (SQL) loads complex binary data into a hive table, the data is broken when read back. In RddIterator the data is converted to a string and then converted back.
* test case
The binary data can be produced by *DataOutputStream#writeDouble* and the like.
* discussion
I think the *CarbonScalaUtil#getString* operation can be removed now. Digging into the 2016 code, it was used by the kettle *CsvInput* (commit: 0018756d). That code has since been removed, so this conversion looks redundant.
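The corruption described in this issue can be reproduced in isolation: arbitrary binary data (here the 8 bytes that *DataOutputStream#writeDouble* produces) does not survive a bytes-to-String-to-bytes round trip, because byte sequences that are not valid text are replaced during decoding. This sketch uses UTF-8 as the assumed charset; the exact charset in the original code path may differ, but any text decoding has the same failure mode.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {
    public static void main(String[] args) throws IOException {
        // The 8 raw bytes of a double written with DataOutputStream#writeDouble.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeDouble(Math.PI);
        dos.flush();
        byte[] original = bos.toByteArray();

        // Converting bytes -> String -> bytes, as the conversion in the
        // iterator effectively does, corrupts the data: invalid UTF-8
        // sequences are replaced with U+FFFD during decoding.
        byte[] roundTripped = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(original.length);                       // 8
        System.out.println(Arrays.equals(original, roundTripped)); // false
    }
}
```

This is why binary columns need a byte-preserving path rather than the generic string conversion discussed above.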