[jira] [Reopened] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reopened CARBONDATA-2369: -- Add documentation about Avro complex types > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2442) Reading two SDK writer outputs with different schemas should throw an exception
Ajantha Bhat created CARBONDATA-2442: Summary: Reading two SDK writer outputs with different schemas should throw an exception Key: CARBONDATA-2442 URL: https://issues.apache.org/jira/browse/CARBONDATA-2442 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2369: Assignee: Ajantha Bhat > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
[ https://issues.apache.org/jira/browse/CARBONDATA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat closed CARBONDATA-2369. Resolution: Fixed updated in PR #2198 > Add a document for Non Transactional table with SDK writer guide > > > Key: CARBONDATA-2369 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2479) Multiple issues in SDK writer and external table flow
Ajantha Bhat created CARBONDATA-2479: Summary: Multiple issues in SDK writer and external table flow Key: CARBONDATA-2479 URL: https://issues.apache.org/jira/browse/CARBONDATA-2479 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Multiple issues: fixed external table path display; fixed default value for array in Avro; fixed NPE when the folder is deleted before the second select query; fixed primitive timestamp SDK load issue -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2472) Refactor NonTransactional table code for Index file IO performance
[ https://issues.apache.org/jira/browse/CARBONDATA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2472: - Summary: Refactor NonTransactional table code for Index file IO performance (was: NonTransactional table performance degradation issue induced by PR #2273) > Refactor NonTransactional table code for Index file IO performance > -- > > Key: CARBONDATA-2472 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2472 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2472) NonTransactional table performance degradation issue induced by #2273
Ajantha Bhat created CARBONDATA-2472: Summary: NonTransactional table performance degradation issue induced by #2273 Key: CARBONDATA-2472 URL: https://issues.apache.org/jira/browse/CARBONDATA-2472 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2472) NonTransactional table performance degradation issue induced by PR #2273
[ https://issues.apache.org/jira/browse/CARBONDATA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2472: - Summary: NonTransactional table performance degradation issue induced by PR #2273 (was: NonTransactional table performance degradation issue induced by #2273) > NonTransactional table performance degradation issue induced by PR #2273 > > > Key: CARBONDATA-2472 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2472 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2472) NonTransactional table performance degradation issue induced by #2273
[ https://issues.apache.org/jira/browse/CARBONDATA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2472: Assignee: Ajantha Bhat > NonTransactional table performance degradation issue induced by #2273 > - > > Key: CARBONDATA-2472 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2472 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2457) Add converter to get Carbon SDK Schema from Avro schema directly.
[ https://issues.apache.org/jira/browse/CARBONDATA-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2457: - Description: In the current implementation, SDK users have to manually create the Carbon schema fields from the Avro schema. This is time-consuming and error-prone, and users should not have to worry about this logic. So, abstract the Carbon schema creation from the Avro schema by exposing a method to the user. > Add converter to get Carbon SDK Schema from Avro schema directly. > - > > Key: CARBONDATA-2457 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ajantha Bhat >Priority: Major > > In the current implementation, SDK users have to manually create the Carbon > schema fields from the Avro schema. This is time-consuming and error-prone, > and users should not have to worry about this logic. > So, abstract the Carbon schema creation from the Avro schema by exposing a method > to the user. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2457) Add converter to get Carbon sdk Schema from Avro schema directly.
Ajantha Bhat created CARBONDATA-2457: Summary: Add converter to get Carbon sdk Schema from Avro schema directly. Key: CARBONDATA-2457 URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2457) Add converter to get Carbon SDK Schema from Avro schema directly.
[ https://issues.apache.org/jira/browse/CARBONDATA-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2457: - Summary: Add converter to get Carbon SDK Schema from Avro schema directly. (was: Add converter to get Carbon sdk Schema from Avro schema directly.) > Add converter to get Carbon SDK Schema from Avro schema directly. > - > > Key: CARBONDATA-2457 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2457 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2493) DataType.equals() fails for complex types
Ajantha Bhat created CARBONDATA-2493: Summary: DataType.equals() fails for complex types Key: CARBONDATA-2493 URL: https://issues.apache.org/jira/browse/CARBONDATA-2493 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Only an object-identity comparison happens in DataType.equals(). Complex types are not singleton objects, so even when the data types are the same, the comparison returns false. One place I found this issue is in ColumnSchema.equals(): } else if (!dataType.equals(other.dataType)) { return false; } That check will fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
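A minimal sketch of the value-based equals() the issue calls for. The class and field names below are illustrative, not CarbonData's actual DataType hierarchy: since complex types are not singletons, equals() must compare the type's fields instead of relying on object identity.

```java
// Illustrative stand-in for a complex DataType; field names are assumptions.
public class ArrayTypeSketch {
  private final String name;
  private final int precedenceOrder;

  public ArrayTypeSketch(String name, int precedenceOrder) {
    this.name = name;
    this.precedenceOrder = precedenceOrder;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof ArrayTypeSketch)) {
      return false;
    }
    ArrayTypeSketch other = (ArrayTypeSketch) obj;
    // Field-by-field comparison: two distinct instances describing the
    // same complex type now compare equal, so ColumnSchema.equals() works.
    return precedenceOrder == other.precedenceOrder && name.equals(other.name);
  }

  @Override
  public int hashCode() {
    return 31 * precedenceOrder + name.hashCode();
  }
}
```

With this, a check like `!dataType.equals(other.dataType)` no longer rejects two separately constructed instances of the same complex type.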
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unamanged table_V1.0.pdf > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Assignee: sounak chakraborty >Priority: Major > Attachments: carbon unamanged table_V1.0.pdf > > Time Spent: 21h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2479) Multiple issue in sdk writer and external table flow
[ https://issues.apache.org/jira/browse/CARBONDATA-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2479: - Description: Multiple issues: fixed external table path display; fixed default value for array in Avro; fixed NPE when the folder is deleted before the second select query; fixed primitive timestamp SDK load issue was: Multiple issues: fixed external table path display; fixed default value for array in Avro; fixed NPE when the folder is deleted before the second select query; fixed primitive timestamp SDK load issue; fixed Avro float value precision change issue; added SDK support for NO_SORT > Multiple issues in SDK writer and external table flow > > > Key: CARBONDATA-2479 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2479 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Multiple issues: > fixed external table path display; > fixed default value for array in Avro; > fixed NPE when the folder is deleted before the second select query; > fixed primitive timestamp SDK load issue > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2479) Multiple issue in sdk writer and external table flow
[ https://issues.apache.org/jira/browse/CARBONDATA-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2479: - Description: Multiple issues: fixed external table path display; fixed default value for array in Avro; fixed NPE when the folder is deleted before the second select query; fixed primitive timestamp SDK load issue; fixed Avro float value precision change issue; added SDK support for NO_SORT was: Multiple issues: fixed external table path display; fixed default value for array in Avro; fixed NPE when the folder is deleted before the second select query; fixed primitive timestamp SDK load issue > Multiple issues in SDK writer and external table flow > > > Key: CARBONDATA-2479 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2479 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Multiple issues: > fixed external table path display; > fixed default value for array in Avro; > fixed NPE when the folder is deleted before the second select query; > fixed primitive timestamp SDK load issue; > fixed Avro float value precision change issue; > added SDK support for NO_SORT -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2608: - Summary: SDK Support JSON data loading directly without AVRO conversion (was: Support JSON data loading directly into Carbon table.) > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Priority: Major > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2557) Improve Carbon Reader Schema reading performance on S3
[ https://issues.apache.org/jira/browse/CARBONDATA-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2557: - Summary: Improve Carbon Reader Schema reading performance on S3 (was: SDK Reader performance is very slow in S3) > Improve Carbon Reader Schema reading performance on S3 > --- > > Key: CARBONDATA-2557 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2557 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Priority: Major > > On S3, reader performance is slow because the current schema-fetching API reads the > schema from the carbondata file. If the carbondata file is huge, this can be slow. > The index file should be used to get the schema instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2570) Carbon SDK Reader, second reader instance has an issue in cluster test
Ajantha Bhat created CARBONDATA-2570: Summary: Carbon SDK Reader, second reader instance has an issue in cluster test Key: CARBONDATA-2570 URL: https://issues.apache.org/jira/browse/CARBONDATA-2570 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat Debugged the issue; this happens only in a cluster, not locally. root cause: the old table's blocklet datamap is not cleared. solution: the API used in CarbonReader.close() for clearing the datamap does not clear all the datamaps in a cluster, so change DataMapStoreManager.getInstance().getDefaultDataMap(queryModel.getTable()).clear(); to DataMapStoreManager.getInstance().clearDataMaps(queryModel.getTable().getAbsoluteTableIdentifier()); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2570) Carbon SDK Reader, second reader instance has an issue in cluster test
[ https://issues.apache.org/jira/browse/CARBONDATA-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497763#comment-16497763 ] Ajantha Bhat commented on CARBONDATA-2570: -- Steps: # Take the SDK jars and dependent jars # Create an IntelliJ test project without the Spark cluster dependency # Create a carbon reader on the SDK writer's output. Read files and close the reader. # Create a reader on another set of SDK writer output (different schema) but with the same table name. Now observe that the read fails due to a schema mismatch. This is because the old blocklet datamap with the same table name is still present > Carbon SDK Reader, second reader instance has an issue in cluster test > > > Key: CARBONDATA-2570 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2570 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > > Debugged the issue; this happens only in a cluster, not locally. > root cause: the old table's blocklet datamap is not cleared. > > solution: the API used in CarbonReader.close() for clearing the datamap does not > clear all the datamaps in a cluster > so change > DataMapStoreManager.getInstance().getDefaultDataMap(queryModel.getTable()).clear(); > to > DataMapStoreManager.getInstance() > .clearDataMaps(queryModel.getTable().getAbsoluteTableIdentifier()); > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2577) Nested Array logical type of date, timestamp-millis, timestamp-micros is not working.
Ajantha Bhat created CARBONDATA-2577: Summary: Nested Array logical type of date, timestamp-millis, timestamp-micros is not working. Key: CARBONDATA-2577 URL: https://issues.apache.org/jira/browse/CARBONDATA-2577 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2548) When using presto to query a table on carbondata, the "Could not read blocklet details" exception is reported
[ https://issues.apache.org/jira/browse/CARBONDATA-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492594#comment-16492594 ] Ajantha Bhat commented on CARBONDATA-2548: -- [~kevintop]: Could you please attach the create table, load, and query statements to reproduce the issue? > When using presto to query a table on carbondata, the "Could not read > blocklet details" exception is reported > - > > Key: CARBONDATA-2548 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2548 > Project: CarbonData > Issue Type: Bug > Components: presto-integration >Affects Versions: 1.4.0 >Reporter: Kevin Kong >Priority: Blocker > > I think it may be line 373 in CarbonTableInputFormat.java. When the > CarbonInputSplit object was constructed, the detailInfo property was not > initialized > > while (((double) bytesRemaining) / splitSize > 1.1) { > int blkIndex = getBlockIndex(blkLocations, length - > bytesRemaining); > splits.add(makeSplit(segment.getSegmentNo(), path, length > - bytesRemaining, > splitSize, blkLocations[blkIndex].getHosts(), > blkLocations[blkIndex].getCachedHosts(), > FileFormat.ROW_V1)); > > > > The detailed exception is as follows: > > 2018-05-28T00:43:23.903+0800 DEBUG query-execution-11 > com.facebook.presto.execution.QueryStateMachine Query > 20180527_164323_8_2xnww failed > java.lang.RuntimeException: Could not read blocklet details > at > org.apache.carbondata.presto.impl.CarbonLocalInputSplit.convertSplit(CarbonLocalInputSplit.java:131) > at > org.apache.carbondata.presto.CarbondataPageSourceProvider.createQueryModel(CarbondataPageSourceProvider.java:139) > at > org.apache.carbondata.presto.CarbondataPageSourceProvider.createReader(CarbondataPageSourceProvider.java:98) > at > org.apache.carbondata.presto.CarbondataPageSourceProvider.createPageSource(CarbondataPageSourceProvider.java:79) > at > 
com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44) > at > com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56) > at > com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:259) > at com.facebook.presto.operator.Driver.processInternal(Driver.java:337) > at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:241) > at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:614) > at com.facebook.presto.operator.Driver.processFor(Driver.java:235) > at > com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622) > at > com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) > at > com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23) > at > com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:485) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table_V1.0.pdf) > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Assignee: sounak chakraborty >Priority: Major > Time Spent: 21h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon NonTranscational Table_v1.0.pdf > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Assignee: sounak chakraborty >Priority: Major > Attachments: carbon NonTranscational Table_v1.0.pdf > > Time Spent: 21h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2555) SDK Reader should have isTransactionalTable = false by default to be inline with SDK writer
Ajantha Bhat created CARBONDATA-2555: Summary: SDK Reader should have isTransactionalTable = false by default to be inline with SDK writer Key: CARBONDATA-2555 URL: https://issues.apache.org/jira/browse/CARBONDATA-2555 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2555) SDK Reader should have isTransactionalTable = false by default to be inline with SDK writer
[ https://issues.apache.org/jira/browse/CARBONDATA-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2555: Assignee: Ajantha Bhat > SDK Reader should have isTransactionalTable = false by default to be inline > with SDK writer > --- > > Key: CARBONDATA-2555 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2555 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2555) SDK Reader should have isTransactionalTable = false by default, to be inline with SDK writer
[ https://issues.apache.org/jira/browse/CARBONDATA-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2555: - Summary: SDK Reader should have isTransactionalTable = false by default, to be inline with SDK writer (was: SDK Reader should have isTransactionalTable = false by default to be inline with SDK writer) > SDK Reader should have isTransactionalTable = false by default, to be inline > with SDK writer > > > Key: CARBONDATA-2555 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2555 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2557) SDK Reader performance is very slow in S3
Ajantha Bhat created CARBONDATA-2557: Summary: SDK Reader performance is very slow in S3 Key: CARBONDATA-2557 URL: https://issues.apache.org/jira/browse/CARBONDATA-2557 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat On S3, reader performance is slow because the current schema-fetching API reads the schema from the carbondata file. If the carbondata file is huge, this can be slow. The index file should be used to get the schema instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2608: Assignee: Ajantha Bhat > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2417) SDK writer goes to infinite wait when consumer thread goes dead
Ajantha Bhat created CARBONDATA-2417: Summary: SDK writer goes to infinite wait when consumer thread goes dead Key: CARBONDATA-2417 URL: https://issues.apache.org/jira/browse/CARBONDATA-2417 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: SDK writer goes into an infinite wait when the consumer thread is dead. root cause: when an exception (due to a bad record) happens in the consumer thread during write, the failure is not communicated to the producer (SDK writer). So the SDK keeps writing data, assuming the consumer will consume it. But as the consumer is dead, the queue becomes full and queue.put() blocks forever. Solution: if a batch cannot be added to the queue, check every 10 seconds whether the consumer is still alive. If it is not alive, throw an exception; if it is alive, try again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
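A sketch of the solution described above, assuming a standard BlockingQueue between producer and consumer (the class, method, and timeout parameter are illustrative, not CarbonData's actual code): instead of a queue.put() that can block forever, retry offer() with a timeout and check between attempts whether the consumer thread is still alive.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical producer-side helper; the issue suggests a 10-second interval,
// parameterized here as timeoutMillis for illustration.
public class ProducerLivenessSketch {
  public static void putWithLivenessCheck(BlockingQueue<Object> queue,
      Object batch, Thread consumer, long timeoutMillis)
      throws InterruptedException {
    // offer() returns false if the queue stayed full for the whole timeout.
    while (!queue.offer(batch, timeoutMillis, TimeUnit.MILLISECONDS)) {
      if (!consumer.isAlive()) {
        // Consumer died (e.g. bad record); fail fast instead of hanging.
        throw new IllegalStateException(
            "Consumer thread is dead; aborting SDK write");
      }
      // Consumer alive but slow: loop and try the offer again.
    }
  }
}
```

If the consumer is healthy, the producer behaves like a normal blocking put; only when the queue stays full and the consumer has died does the writer surface an error.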
[jira] [Assigned] (CARBONDATA-2786) NPE when SDK writer tries to write a file
[ https://issues.apache.org/jira/browse/CARBONDATA-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2786: Assignee: Ajantha Bhat > NPE when SDK writer tries to write a file > - > > Key: CARBONDATA-2786 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2786 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > > In PR #2387, in > [CarbonProperties.java|https://github.com/apache/carbondata/pull/2387/files#diff-4888f978087a7a1843a22fe016ea6532], > after systemLocation = getStorePath(); the null validation is missing for > systemLocation, because this can be null in the SDK case, as the store location is > not applicable for the SDK. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2786) NPE when SDK writer tries to write a file
[ https://issues.apache.org/jira/browse/CARBONDATA-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558118#comment-16558118 ] Ajantha Bhat commented on CARBONDATA-2786: -- In PR #2387, in [CarbonProperties.java|https://github.com/apache/carbondata/pull/2387/files#diff-4888f978087a7a1843a22fe016ea6532], after systemLocation = getStorePath(); the null validation is missing for systemLocation, because this can be null in the SDK case, as the store location is not applicable for the SDK. > NPE when SDK writer tries to write a file > - > > Key: CARBONDATA-2786 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2786 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > > In PR #2387, in > [CarbonProperties.java|https://github.com/apache/carbondata/pull/2387/files#diff-4888f978087a7a1843a22fe016ea6532], > after systemLocation = getStorePath(); the null validation is missing for > systemLocation, because this can be null in the SDK case, as the store location is > not applicable for the SDK. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2786) NPE when SDK writer tries to write a file
Ajantha Bhat created CARBONDATA-2786: Summary: NPE when SDK writer tries to write a file Key: CARBONDATA-2786 URL: https://issues.apache.org/jira/browse/CARBONDATA-2786 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat In PR #2387, in [CarbonProperties.java|https://github.com/apache/carbondata/pull/2387/files#diff-4888f978087a7a1843a22fe016ea6532], after systemLocation = getStorePath(); the null validation is missing for systemLocation, because this can be null in the SDK case, as the store location is not applicable for the SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2830) Support Merge index files read from non transactional table.
Ajantha Bhat created CARBONDATA-2830: Summary: Support Merge index files read from non transactional table. Key: CARBONDATA-2830 URL: https://issues.apache.org/jira/browse/CARBONDATA-2830 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: currently, SDK read / non transactional table read from an external table gives null output when a carbonmergeindex file is present instead of carbonindex files. cause: in LatestFileReadCommitted, merge index files were not considered while taking the snapshot. solution: consider the merge index files while taking the snapshot -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2831) Support Merge index files read from non transactional table.
Ajantha Bhat created CARBONDATA-2831: Summary: Support Merge index files read from non transactional table. Key: CARBONDATA-2831 URL: https://issues.apache.org/jira/browse/CARBONDATA-2831 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: currently, SDK read / non transactional table read from an external table gives null output when a carbonmergeindex file is present instead of carbonindex files. cause: in LatestFileReadCommitted, merge index files were not considered while taking the snapshot. solution: consider the merge index files while taking the snapshot -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2895) [Batch-sort]Query result mismatch with Batch-sort in save to disk (sort temp files) scenario.
Ajantha Bhat created CARBONDATA-2895: Summary: [Batch-sort]Query result mismatch with Batch-sort in save to disk (sort temp files) scenario. Key: CARBONDATA-2895 URL: https://issues.apache.org/jira/browse/CARBONDATA-2895 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: query result mismatch with batch sort in the save-to-disk (sort temp files) scenario. scenario: a) Configure batch sort but give a batch size greater than UnsafeMemoryManager.INSTANCE.getUsableMemory(). b) Load data that is greater than the batch size. Observe that the unsafe memory manager's save-to-disk happened, as it cannot process one batch in memory. c) So the load happens in 2 batches. d) When querying, the result has more data rows than expected. root cause: for each batch, createSortDataRows() is called, and files saved to disk during sorting of the previous batch were considered for this batch. solution: files saved to disk during sorting of the previous batch should not be considered for the current batch. Hence use the batch ID as the rangeID field of the sort temp files, so getFilesToMergeSort() will select files of only this batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
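An illustrative sketch of the batch-scoping idea above (the class and method names are made up, not CarbonData's sort implementation): tag each spilled sort temp file with the batch that produced it, so the merge for batch N never picks up files left over from an earlier batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of spilled sort temp files, keyed by batch ID
// (the batch ID plays the role of the rangeID field mentioned in the issue).
public class SortTempFileRegistrySketch {
  private final Map<String, Integer> fileToBatch = new HashMap<>();

  public void register(String tempFile, int batchId) {
    fileToBatch.put(tempFile, batchId);
  }

  // Analogue of getFilesToMergeSort(): only files of the requested batch
  // participate in that batch's merge, so earlier spills cannot leak in.
  public List<String> filesToMergeSort(int batchId) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Integer> e : fileToBatch.entrySet()) {
      if (e.getValue() == batchId) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```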
[jira] [Created] (CARBONDATA-2874) Support SDK writer as thread safe api
Ajantha Bhat created CARBONDATA-2874: Summary: Support SDK writer as thread safe api Key: CARBONDATA-2874 URL: https://issues.apache.org/jira/browse/CARBONDATA-2874 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat h1. Support SDK writer as thread safe api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2775) Adaptive encoding fails for Unsafe OnHeap if the target data type is SHORT_INT
Ajantha Bhat created CARBONDATA-2775: Summary: Adaptive encoding fails for Unsafe OnHeap if the target data type is SHORT_INT Key: CARBONDATA-2775 URL: https://issues.apache.org/jira/browse/CARBONDATA-2775 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2775) Adaptive encoding fails for Unsafe OnHeap if the target data type is SHORT_INT
[ https://issues.apache.org/jira/browse/CARBONDATA-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2775: Assignee: Ajantha Bhat > Adaptive encoding fails for Unsafe OnHeap if the target data type is SHORT_INT > --- > > Key: CARBONDATA-2775 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2775 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2777) NonTransactional tables, Select count(*) is not giving latest results for incremental load with same segment ID (UUID)
Ajantha Bhat created CARBONDATA-2777: Summary: NonTransactional tables, Select count(*) is not giving latest results for incremental load with same segment ID (UUID) Key: CARBONDATA-2777 URL: https://issues.apache.org/jira/browse/CARBONDATA-2777 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2777) NonTransactional tables, Select count(*) is not giving latest results for incremental load with same segment ID (UUID)
[ https://issues.apache.org/jira/browse/CARBONDATA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2777: Assignee: Ajantha Bhat > NonTransactional tables, Select count(*) is not giving latest results for > incremental load with same segment ID (UUID) > -- > > Key: CARBONDATA-2777 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2777 > Project: CarbonData > Issue Type: Bug >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2784) [SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception
Ajantha Bhat created CARBONDATA-2784: Summary: [SDK writer] Forever blocking wait with more than 20 batch of data, when consumer is dead due to data loading exception Key: CARBONDATA-2784 URL: https://issues.apache.org/jira/browse/CARBONDATA-2784 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat
problem: [SDK writer] Forever blocking wait with more than 20 batches of data, when the consumer is dead due to a data loading exception (bad record / out of memory).
root cause: When the consumer dies due to a data loading exception, the writer is forcefully closed, but queue.clear() clears only a snapshot of entries (10 batches), and close is set to true after that. If, between clear() and close = true, more than 10 batches of data are again put into the queue, then for the 11th batch queue.put() blocks forever because the consumer is dead.
Solution: Set close = true before clearing the queue. This avoids adding more batches to the queue from write().
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
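The ordering bug can be illustrated with a small producer-side guard: mark the queue closed before draining it, and refuse puts once closed. This is a self-contained sketch under that assumption (class and method names are hypothetical, not the SDK writer's actual code):

```java
import java.util.concurrent.ArrayBlockingQueue;

class SafeRowQueue {
    private final ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
    private volatile boolean closed = false;

    // Producer side (write()): fail fast once the consumer is gone,
    // instead of blocking forever in queue.put() against a full queue.
    void putBatch(String batch) throws InterruptedException {
        if (closed) {
            throw new IllegalStateException("consumer dead; rejecting batch");
        }
        queue.put(batch);
    }

    // Consumer failure path: set closed = true FIRST, then clear.
    // Reversing these two lines re-creates the bug: a producer can
    // enqueue between clear() and closed = true and then block forever.
    void closeOnConsumerFailure() {
        closed = true;
        queue.clear();
    }
}
```

The design point is that `closed` is the gate and `clear()` is only cleanup; any producer that passes the gate after failure is rejected deterministically rather than parked on a queue nobody drains.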
[jira] [Created] (CARBONDATA-2926) ArrayIndexOutOfBoundException if varchar column is present before dictionary columns along with empty sort_columns.
Ajantha Bhat created CARBONDATA-2926: Summary: ArrayIndexOutOfBoundException if varchar column is present before dictionary columns along with empty sort_columns. Key: CARBONDATA-2926 URL: https://issues.apache.org/jira/browse/CARBONDATA-2926 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat
ArrayIndexOutOfBoundException if a varchar column is present before dictionary columns along with empty sort_columns.
cause: The CarbonFactDataHandlerColumnar.isVarcharColumnFull() method uses model.getVarcharDimIdxInNoDict(), and the index of the varchar column in the no-dictionary array became negative. Currently the index is calculated based on the ordinal number of dictionary columns; this can go negative in the no_sort column case.
solution: Take the varchar dimension index from the no-dictionary array at runtime, based on the schema.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2921) support long string columns with spark FileFormat and SDK with "long_string_columns" TableProperties
Ajantha Bhat created CARBONDATA-2921: Summary: support long string columns with spark FileFormat and SDK with "long_string_columns" TableProperties Key: CARBONDATA-2921 URL: https://issues.apache.org/jira/browse/CARBONDATA-2921 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2921) support long string columns with spark FileFormat and SDK with "long_string_columns" TableProperties
[ https://issues.apache.org/jira/browse/CARBONDATA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat closed CARBONDATA-2921. Resolution: Duplicate > support long string columns with spark FileFormat and SDK with > "long_string_columns" TableProperties > > > Key: CARBONDATA-2921 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2921 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2922) support long string columns with spark FileFormat and SDK with "long_string_columns" TableProperties
Ajantha Bhat created CARBONDATA-2922: Summary: support long string columns with spark FileFormat and SDK with "long_string_columns" TableProperties Key: CARBONDATA-2922 URL: https://issues.apache.org/jira/browse/CARBONDATA-2922 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat Assignee: Ajantha Bhat
The CSV and JSON SDK writers take a carbon Schema, so varchar can be given directly, but the AVRO writer needs this table property. The Spark file format also needs this table property to convert string columns to varchar columns.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2918) NPE If SDK sort column is not same case as schema
Ajantha Bhat created CARBONDATA-2918: Summary: NPE If SDK sort column is not same case as schema Key: CARBONDATA-2918 URL: https://issues.apache.org/jira/browse/CARBONDATA-2918 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat
problem: Currently in CarbonWriterBuilder, sortColumnsList.indexOf(field.getFieldName()) fails to find a match (returns -1) due to a case mismatch, so one of the entries in the sort columns array is left null. Hence one of the column schemas becomes null, and when that column schema is accessed we get an NPE.
solution: Make sort columns case insensitive. Change the parent column name (as sort columns are supported only for parent columns) to lower case when the schema is received from the user, and change the sort columns to lower case as well when they are received from the user.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
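The case-insensitive matching can be sketched as follows: lower-case both sides before indexOf, so a mismatch like "Name" vs "name" no longer yields a missing index (and, downstream, a null column schema). The names here are hypothetical, not CarbonWriterBuilder's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SortColumnResolver {
    // Resolve sort-column positions case-insensitively: lower-case the
    // schema field names and the user-supplied sort columns before matching,
    // mirroring the fix described above. A true "no such field" case is
    // reported immediately instead of surfacing later as an NPE.
    static int[] resolve(List<String> fieldNames, List<String> sortColumns) {
        List<String> lowered = new ArrayList<>();
        for (String f : fieldNames) {
            lowered.add(f.toLowerCase(Locale.ROOT));
        }
        int[] indices = new int[sortColumns.size()];
        for (int i = 0; i < sortColumns.size(); i++) {
            int idx = lowered.indexOf(sortColumns.get(i).toLowerCase(Locale.ROOT));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown sort column: " + sortColumns.get(i));
            }
            indices[i] = idx;
        }
        return indices;
    }
}
```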
[jira] [Commented] (CARBONDATA-2918) NPE If SDK sort column is not same case as schema
[ https://issues.apache.org/jira/browse/CARBONDATA-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604297#comment-16604297 ] Ajantha Bhat commented on CARBONDATA-2918: -- https://github.com/apache/carbondata/pull/2692
> NPE If SDK sort column is not same case as schema
> -
>
> Key: CARBONDATA-2918
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2918
> Project: CarbonData
> Issue Type: Bug
> Reporter: Ajantha Bhat
> Assignee: Ajantha Bhat
> Priority: Major
>
> problem: Currently in CarbonWriterBuilder, sortColumnsList.indexOf(field.getFieldName()) fails to find a match (returns -1) due to a case mismatch, so one of the entries in the sort columns array is left null. Hence one of the column schemas becomes null, and when that column schema is accessed we get an NPE.
> solution: Make sort columns case insensitive. Change the parent column name (as sort columns are supported only for parent columns) to lower case when the schema is received from the user, and change the sort columns to lower case as well when they are received from the user.
>
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2722) [SDK] [JsonWriter] Json writer is writing only first element of an array and discarding the rest of the elements
Ajantha Bhat created CARBONDATA-2722: Summary: [SDK] [JsonWriter] Json writer is writing only first element of an array and discarding the rest of the elements Key: CARBONDATA-2722 URL: https://issues.apache.org/jira/browse/CARBONDATA-2722 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2734) [BUG] support struct of date in create table
Ajantha Bhat created CARBONDATA-2734: Summary: [BUG] support struct of date in create table Key: CARBONDATA-2734 URL: https://issues.apache.org/jira/browse/CARBONDATA-2734 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat Assignee: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2734) [BUG] support struct of date in create table
[ https://issues.apache.org/jira/browse/CARBONDATA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2734: - Description: Currently, due to a code issue, a StringOutOfBound exception is thrown.
> [BUG] support struct of date in create table
>
> Key: CARBONDATA-2734
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2734
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Ajantha Bhat
> Assignee: Ajantha Bhat
> Priority: Major
>
> Currently, due to a code issue, a StringOutOfBound exception is thrown.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2927) Multiple issue fixes for varchar column and complex columns that grows more than 2MB
Ajantha Bhat created CARBONDATA-2927: Summary: Multiple issue fixes for varchar column and complex columns that grows more than 2MB Key: CARBONDATA-2927 URL: https://issues.apache.org/jira/browse/CARBONDATA-2927 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat
Fixed:
1. When varchar data length is more than 2 MB, a buffer overflow exception occurs (thread-local row buffer).
root cause: the thread-local buffer was hardcoded to 2 MB.
solution: grow it dynamically based on the row size.
2. Reading data from a carbon file having one row of varchar data of 150 MB length is very slow.
root cause: at UnsafeDMStore, ensure-memory grows by just 8 KB each time, so a lot of malloc and free happens before reaching 150 MB; hence the very slow performance.
solution: directly check and allocate the required size.
3. JVM crash when data size is more than 128 MB in the unsafe sort step.
root cause: unsafeCarbonRowPage is 128 MB, so if data is more than 128 MB for one row, we access memory beyond the allocated block, leading to a JVM crash.
solution: validate the size before access and prompt the user to increase unsafe memory (via a carbon property).
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
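Fix 1 above (grow the row buffer instead of hardcoding 2 MB) can be sketched with a plain ByteBuffer; this class is a hypothetical illustration, not CarbonData's actual thread-local buffer:

```java
import java.nio.ByteBuffer;

class GrowableRowBuffer {
    private ByteBuffer buffer = ByteBuffer.allocate(2 * 1024 * 1024); // start at 2 MB

    // Before writing a row, ensure capacity; grow (at least doubling)
    // instead of letting put() throw BufferOverflowException, as a
    // hardcoded 2 MB buffer did for large varchar rows.
    void putRow(byte[] row) {
        if (buffer.remaining() < row.length) {
            int required = buffer.position() + row.length;
            int newCapacity = Math.max(buffer.capacity() * 2, required);
            ByteBuffer grown = ByteBuffer.allocate(newCapacity);
            buffer.flip();        // switch old buffer to read mode
            grown.put(buffer);    // copy already-written bytes
            buffer = grown;
        }
        buffer.put(row);
    }

    int bytesWritten() {
        return buffer.position();
    }
}
```

Fix 2 is the same idea applied once: compute the required size up front and allocate it directly, rather than growing in fixed 8 KB steps.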
[jira] [Created] (CARBONDATA-2313) Support Reading unmanaged carbon table
Ajantha Bhat created CARBONDATA-2313: Summary: Support Reading unmanaged carbon table Key: CARBONDATA-2313 URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Reading unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unamanged table desgin doc_V1.0.pdf > Support Reading unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unamanged table desgin doc_V1.0.pdf > > Time Spent: 10h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Summary: Support unmanaged carbon table (was: Support Reading unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unamanged table desgin doc_V1.0.pdf > > Time Spent: 10h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h5. Support unmanaged carbon table (was: h1. Support unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table desgin doc_V1.0.pdf) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h1. Support unmanaged carbon table > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h1. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unmanaged table desgin doc_V1.0.pdf > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2359) Support applicable load options and table properties for unmanaged table
Ajantha Bhat created CARBONDATA-2359: Summary: Support applicable load options and table properties for unmanaged table Key: CARBONDATA-2359 URL: https://issues.apache.org/jira/browse/CARBONDATA-2359 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2369) Add a document for Non Transactional table with SDK writer guide
Ajantha Bhat created CARBONDATA-2369: Summary: Add a document for Non Transactional table with SDK writer guide Key: CARBONDATA-2369 URL: https://issues.apache.org/jira/browse/CARBONDATA-2369 Project: CarbonData Issue Type: Sub-task Reporter: Ajantha Bhat -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unmanaged table desgin doc_V1.0.pdf) > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon NonTranscational Table.pdf > > Time Spent: 18h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support Non Transactional carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon NonTranscational Table.pdf > Support Non Transactional carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon NonTranscational Table.pdf > > Time Spent: 18h 50m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3054) Dictionary file cannot be read in S3a with CarbonDictionaryDecoder.doConsume() codeGen
Ajantha Bhat created CARBONDATA-3054: Summary: Dictionary file cannot be read in S3a with CarbonDictionaryDecoder.doConsume() codeGen Key: CARBONDATA-3054 URL: https://issues.apache.org/jira/browse/CARBONDATA-3054 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat
problem: In an S3a environment, when data that has dictionary files is queried, the dictionary file cannot be read in the CarbonDictionaryDecoder.doConsume() codeGen path even though the file is present.
cause: CarbonDictionaryDecoder.doConsume() codeGen doesn't set the hadoop conf in the thread-local variable; only doExecute() sets it. Hence, when getDictionaryWrapper() is called from the doConsume() codeGen, AbstractDictionaryCache.getDictionaryMetaCarbonFile() returns false for the fileExists() operation.
solution: In the CarbonDictionaryDecoder.doConsume() codeGen, set the hadoop conf in the thread-local variable.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
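The pattern of the fix, publishing configuration in a thread-local before the lookup runs, can be shown in miniature. Here Map<String, String> is a stand-in for hadoop's Configuration and fileExists() for the dictionary-file check; all names are hypothetical:

```java
import java.util.Map;

class ThreadLocalConf {
    // Per-thread configuration, set by the caller before any code that
    // needs it runs on the same thread. The doConsume() codegen path
    // missed this step, so lookups ran without the conf.
    private static final ThreadLocal<Map<String, String>> CONF = new ThreadLocal<>();

    static void setConf(Map<String, String> conf) {
        CONF.set(conf);
    }

    // Stand-in for the dictionary-file existence check: without the
    // thread-local conf it reports "missing" even if the file exists.
    static boolean fileExists(String path) {
        Map<String, String> conf = CONF.get();
        return conf != null && conf.containsKey(path);
    }
}
```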
[jira] [Created] (CARBONDATA-2987) Data mismatch after compaction with measure sort columns
Ajantha Bhat created CARBONDATA-2987: Summary: Data mismatch after compaction with measure sort columns Key: CARBONDATA-2987 URL: https://issues.apache.org/jira/browse/CARBONDATA-2987 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat
problem: Data mismatch after compaction with measure sort columns.
root cause: In the compaction flow (DictionaryBasedResultCollector), the inverted index mapping is not handled in ColumnPageWrapper. Because of this, rows of no-dictionary dimension columns get data from other rows, hence the data mismatch.
solution: Handle the inverted index mapping for the DictionaryBasedResultCollector flow in ColumnPageWrapper.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
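Conceptually, an inverted index records, for each stored (sorted) position, the original row id; skipping that mapping on the read path returns another row's value, which matches the mismatch described. A minimal sketch with hypothetical names:

```java
class InvertedIndexPage {
    // values are stored in sorted order; invertedIndex[pos] gives the
    // original row id of the value stored at position pos. Restoring
    // row order is the mapping the compaction read path must apply.
    static int[] restoreRowOrder(int[] sortedValues, int[] invertedIndex) {
        int[] byRowId = new int[sortedValues.length];
        for (int pos = 0; pos < sortedValues.length; pos++) {
            byRowId[invertedIndex[pos]] = sortedValues[pos];
        }
        return byRowId;
    }
}
```

Reading `sortedValues` positionally without this step (the bug) hands row 0 the smallest value regardless of which row it actually belongs to.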
[jira] [Created] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)
Ajantha Bhat created CARBONDATA-3001: Summary: Propose configurable page size in MB (via carbon property) Key: CARBONDATA-3001 URL: https://issues.apache.org/jira/browse/CARBONDATA-3001 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat
For better in-memory processing of carbondata pages, I am proposing a configurable page size in MB (via a carbon property). Please find the attachment for more details.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)
[ https://issues.apache.org/jira/browse/CARBONDATA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3001: - Attachment: Propose configurable page size in MB (via carbon property).pdf > Propose configurable page size in MB (via carbon property) > -- > > Key: CARBONDATA-3001 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3001 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Attachments: Propose configurable page size in MB (via carbon > property).pdf > > > For better in-memory processing of carbondata pages, I am proposing > configurable page size in MB (via carbon property). > please find the attachment for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2901) Problem: Jvm crash in Load scenario when unsafe memory allocation is failed.
Ajantha Bhat created CARBONDATA-2901: Summary: Problem: Jvm crash in Load scenario when unsafe memory allocation is failed. Key: CARBONDATA-2901 URL: https://issues.apache.org/jira/browse/CARBONDATA-2901 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat
Problem: JVM crash in the load scenario when unsafe memory allocation fails.
scenario:
a) Use many cores while loading, suggested more than 10 [carbon.number.of.cores.while.loading].
b) Load huge data with local sort, more than 5 GB (keeping the default unsafe memory manager at 512 MB).
c) When the task fails due to not enough unsafe memory, the JVM crashes with SIGSEGV.
root cause: While sorting, all iterator threads wait at UnsafeSortDataRows.addRowBatch, as all iterators work on one row page. Only one iterator thread tries to allocate memory; before that, it has freed the current page in handlePreviousPage(). When the memory allocation fails, the row page still holds that old reference, and the next thread uses the same reference and calls handlePreviousPage() again. So the JVM crashes because freed memory is accessed.
solution: When allocation fails, set the row page reference to null, so the next thread will not do any operation on it.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
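The use-after-free sequence and the fix can be miniaturized: clear the shared page reference before attempting reallocation, so a failed allocation leaves null (a safe, checkable state) rather than a reference to freed memory. All names are hypothetical, and a plain byte[] stands in for an off-heap page:

```java
class RowPageHolder {
    // Shared reference that multiple sort iterator threads consult.
    private volatile byte[] currentPage = new byte[16];

    // Rotate to a fresh page. Null the reference BEFORE allocating:
    // if allocation fails, later threads see null (and can report
    // "out of unsafe memory") instead of touching a freed page.
    void rotatePage(boolean simulateAllocationFailure) {
        currentPage = null; // old page is conceptually freed here
        if (simulateAllocationFailure) {
            return; // allocation failed; reference stays null
        }
        currentPage = new byte[16];
    }

    boolean hasUsablePage() {
        return currentPage != null;
    }
}
```

With real off-heap memory the stale reference is not merely stale data but a dangling pointer, which is why the symptom is a SIGSEGV rather than a Java exception.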
[jira] [Commented] (CARBONDATA-2891) Job aborted while loading long string 32k data into carbon table from hive
[ https://issues.apache.org/jira/browse/CARBONDATA-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602759#comment-16602759 ] Ajantha Bhat commented on CARBONDATA-2891: -- java.lang.NegativeArraySizeException at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.readRowFromMemoryWithNoSortFieldConvert(SortStepRowHandler.java:447) at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.getRow(UnsafeCarbonRowPage.java:93) at org.apache.carbondata.processing.loading.sort.unsafe.holder.UnsafeInmemoryHolder.readRow(UnsafeInmemoryHolder.java:61) at org.apache.carbondata.processing.loading.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startSorting(UnsafeSingleThreadFinalSortFilesMerger.java:127) at org.apache.carbondata.processing.loading.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startFinalMerge(UnsafeSingleThreadFinalSortFilesMerger.java:94) at org.apache.carbondata.processing.loading.sort.impl.UnsafeParallelReadMergeSorterImpl.sort(UnsafeParallelReadMergeSorterImpl.java:110) at org.apache.carbondata.processing.loading.steps.SortProcessorStepImpl.execute(SortProcessorStepImpl.java:55) at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:112) at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51) at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:260) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) > Job aborted while loading long string 32k data into carbon table from hive > -- > > Key: CARBONDATA-2891 > URL: 
https://issues.apache.org/jira/browse/CARBONDATA-2891 > Project: CarbonData > Issue Type: Bug > Components: data-load >Reporter: Rahul Singha >Priority: Minor > > _*Steps:*_ > CREATE TABLE local1(id int, name string, description string,address string, > note string) using carbon options('long_string_columns'='description,note'); > CREATE TABLE local_hive(id int, name string, description string,address > string, note string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > LOAD DATA local INPATH '/opt/csv/longStringData_100rec.csv' overwrite into > table local_hive; > insert into local1 select * from local_hive; > _*Expected result:*_ > Data should get loaded. > _*Actual Result:*_ > Error: org.apache.spark.SparkException: Job aborted. (state=,code=0) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2913) BufferOverflowException when use varchar data of length 10000000
[ https://issues.apache.org/jira/browse/CARBONDATA-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602772#comment-16602772 ] Ajantha Bhat commented on CARBONDATA-2913: -- !image-2018-09-04-14-19-00-708.png! That could be the reason. Need to analyze more.
> BufferOverflowException when use varchar data of length 10000000
>
> Key: CARBONDATA-2913
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2913
> Project: CarbonData
> Issue Type: Bug
> Reporter: Ajantha Bhat
> Priority: Major
> Attachments: image-2018-09-04-14-19-00-708.png
>
> SDKwriterTestCase has a test case "Test sdk with longstring" using RandomStringUtils.randomAlphabetic(10000000).
> We get the below exception during sort:
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
> at java.nio.ByteBuffer.put(ByteBuffer.java:859)
> at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.packNoSortFieldsToBytes(SortStepRowHandler.java:585)
> at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.writeRawRowAsIntermediateSortTempRowToUnsafeMemory(SortStepRowHandler.java:548)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:82)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:68)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addBatch(UnsafeSortDataRows.java:203)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addRowBatch(UnsafeSortDataRows.java:179)
> at org.apache.carbondata.processing.loading.sort.impl.UnsafeParallelReadMergeSorterImpl$SortIteratorThread.run(UnsafeParallelReadMergeSorterImpl.java:205)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-09-04 14:06:55 ERROR CarbonTableOutputFormat:442 - Error while loading data
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException:
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2913) BufferOverflowException when use varchar data of length 10000000
[ https://issues.apache.org/jira/browse/CARBONDATA-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2913: - Attachment: image-2018-09-04-14-19-00-708.png
> BufferOverflowException when use varchar data of length 10000000
>
> Key: CARBONDATA-2913
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2913
> Project: CarbonData
> Issue Type: Bug
> Reporter: Ajantha Bhat
> Priority: Major
> Attachments: image-2018-09-04-14-19-00-708.png
>
> SDKwriterTestCase has a test case "Test sdk with longstring" using RandomStringUtils.randomAlphabetic(10000000).
> We get the below exception during sort:
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
> at java.nio.ByteBuffer.put(ByteBuffer.java:859)
> at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.packNoSortFieldsToBytes(SortStepRowHandler.java:585)
> at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.writeRawRowAsIntermediateSortTempRowToUnsafeMemory(SortStepRowHandler.java:548)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:82)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:68)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addBatch(UnsafeSortDataRows.java:203)
> at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addRowBatch(UnsafeSortDataRows.java:179)
> at org.apache.carbondata.processing.loading.sort.impl.UnsafeParallelReadMergeSorterImpl$SortIteratorThread.run(UnsafeParallelReadMergeSorterImpl.java:205)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-09-04 14:06:55 ERROR CarbonTableOutputFormat:442 - Error while loading data
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException:
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2913) BufferOverflowException when use varchar data of length 10000000
Ajantha Bhat created CARBONDATA-2913: Summary: BufferOverflowException when use varchar data of length 10000000 Key: CARBONDATA-2913 URL: https://issues.apache.org/jira/browse/CARBONDATA-2913 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat
SDKwriterTestCase has a test case "Test sdk with longstring" using RandomStringUtils.randomAlphabetic(10000000). We get the below exception during sort:
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
at java.nio.ByteBuffer.put(ByteBuffer.java:859)
at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.packNoSortFieldsToBytes(SortStepRowHandler.java:585)
at org.apache.carbondata.processing.loading.sort.SortStepRowHandler.writeRawRowAsIntermediateSortTempRowToUnsafeMemory(SortStepRowHandler.java:548)
at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:82)
at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeCarbonRowPage.addRow(UnsafeCarbonRowPage.java:68)
at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addBatch(UnsafeSortDataRows.java:203)
at org.apache.carbondata.processing.loading.sort.unsafe.UnsafeSortDataRows.addRowBatch(UnsafeSortDataRows.java:179)
at org.apache.carbondata.processing.loading.sort.impl.UnsafeParallelReadMergeSorterImpl$SortIteratorThread.run(UnsafeParallelReadMergeSorterImpl.java:205)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:745)
2018-09-04 14:06:55 ERROR CarbonTableOutputFormat:442 - Error while loading data
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException:
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2961) Simplify SDK API interfaces
[ https://issues.apache.org/jira/browse/CARBONDATA-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2961: - Description: Added: public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder withCsvInput(Schema schema) public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder withJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException Removed: public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) was: Added: public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder forCsvInput(Schema schema) public CarbonWriterBuilder forAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder forJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException Removed: public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter 
buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) > Simplify SDK API interfaces > --- > > Key: CARBONDATA-2961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2961 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Added: > public CarbonWriterBuilder withThreadSafe(short numOfThreads) > public CarbonWriterBuilder withHadoopConf(Configuration conf) > public CarbonWriterBuilder withCsvInput(Schema schema) > public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) > public CarbonWriterBuilder withJsonInput(Schema carbonSchema) > public CarbonWriter build() throws IOException, InvalidLoadOptionException > Removed: > public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) > setAccessKey > setAccessKey > setSecretKey > setSecretKey > setEndPoint > setEndPoint > public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration > configuration) > public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short > numOfThreads,Configuration configuration) > public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema > avroSchema,Configuration configuration) > public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema > avroSchema,short numOfThreads, Configuration configuration) > public 
JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, > Configuration configuration) > public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema > carbonSchema, short numOfThreads,Configuration configuration) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
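The renaming above (many `buildWriterFor*`/`buildThreadSafeWriterFor*` variants collapsed into `with*` configuration calls plus a single `build()`) can be sketched with a minimal mock. This is a hypothetical stand-in, not the real CarbonData SDK: `WriterBuilderMock` and its `String` parameters substitute for `CarbonWriterBuilder`, `Schema`, `Configuration`, and `CarbonWriter`.

```java
import java.util.ArrayList;
import java.util.List;

public class WriterBuilderMock {
    // Accumulated configuration; the real builder would hold schema, conf, etc.
    private final List<String> config = new ArrayList<>();

    public WriterBuilderMock withThreadSafe(short numOfThreads) {
        config.add("threads=" + numOfThreads);
        return this;
    }

    public WriterBuilderMock withHadoopConf(String conf) { // stand-in for Configuration
        config.add("conf=" + conf);
        return this;
    }

    public WriterBuilderMock withCsvInput(String schema) { // stand-in for Schema
        config.add("csv=" + schema);
        return this;
    }

    public String build() { // stand-in for CarbonWriter
        return String.join(",", config);
    }

    public static void main(String[] args) {
        // One fluent chain replaces the old per-format, per-thread-safety builds.
        String writer = new WriterBuilderMock()
            .withThreadSafe((short) 2)
            .withHadoopConf("defaultFS=hdfs://ns1")
            .withCsvInput("name string, age int")
            .build();
        System.out.println(writer);
    }
}
```

The design gain is that new options (e.g. thread safety, Hadoop conf) become additional `with*` calls instead of new `build*` overloads.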
[jira] [Updated] (CARBONDATA-2961) Simplify SDK API interfaces
[ https://issues.apache.org/jira/browse/CARBONDATA-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2961: - Description: [CARBONDATA-2961] Simplify SDK API interfaces problem: current SDK API interfaces are not simpler and don't follow builder pattern. If new features are added, it will become more complex. Solution: Simplify the SDK interfaces as per builder pattern. *Refer the latest sdk-guide.* *Added:* *changes in Carbon Writer:* public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder withCsvInput(Schema schema) public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder withJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Changes in carbon Reader* public CarbonReaderBuilder withHadoopConf(Configuration conf) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Removed:* *changes in Carbon Writer:* public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) *Changes in carbon Reader* public CarbonReaderBuilder isTransactionalTable(boolean 
isTransactionalTable) public CarbonWriter build(Configuration conf) throws IOException, InvalidLoadOptionException was: Added: public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder withCsvInput(Schema schema) public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder withJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException Removed: public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) > Simplify SDK API interfaces > --- > > Key: CARBONDATA-2961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2961 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > [CARBONDATA-2961] Simplify SDK API interfaces > problem: current SDK API interfaces are not simpler and don't follow builder > pattern. > If new features are added, it will become more complex. > Solution: Simplify the SDK interfaces as per builder pattern. 
> *Refer the latest sdk-guide.* > *Added:* > *changes in Carbon Writer:* > public CarbonWriterBuilder withThreadSafe(short numOfThreads) > public CarbonWriterBuilder withHadoopConf(Configuration conf) > public CarbonWriterBuilder withCsvInput(Schema schema) > public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) > public CarbonWriterBuilder withJsonInput(Schema carbonSchema) > public CarbonWriter build() throws IOException, InvalidLoadOptionException > *Changes in carbon Reader* > public CarbonReaderBuilder withHadoopConf(Configuration conf) > public CarbonWriter build() throws IOException, InvalidLoadOptionException > *Removed:* > *changes in Carbon Writer:* > public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) > setAccessKey > setAccessKey > setSecretKey > setSecretKey > setEndPoint > setEndPoint > public
[jira] [Updated] (CARBONDATA-2961) Simplify SDK API interfaces
[ https://issues.apache.org/jira/browse/CARBONDATA-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2961: - Description: CARBONDATA-2961 Simplify SDK API interfaces problem: current SDK API interfaces are not simpler and don't follow builder pattern. If new features are added, it will become more complex. Solution: Simplify the SDK interfaces as per builder pattern. *Refer the latest sdk-guide.* *Added:* *changes in Carbon Writer:* public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder withCsvInput(Schema schema) public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder withJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Changes in carbon Reader* public CarbonReaderBuilder withHadoopConf(Configuration conf) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Removed:* *changes in Carbon Writer:* public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) {{public CarbonWriterBuilder persistSchemaFile(boolean persist);}} setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) *Changes in carbon 
Reader* public CarbonReaderBuilder isTransactionalTable(boolean isTransactionalTable) public CarbonWriter build(Configuration conf) throws IOException, InvalidLoadOptionException was: [CARBONDATA-2961] Simplify SDK API interfaces problem: current SDK API interfaces are not simpler and don't follow builder pattern. If new features are added, it will become more complex. Solution: Simplify the SDK interfaces as per builder pattern. *Refer the latest sdk-guide.* *Added:* *changes in Carbon Writer:* public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder withCsvInput(Schema schema) public CarbonWriterBuilder withAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder withJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Changes in carbon Reader* public CarbonReaderBuilder withHadoopConf(Configuration conf) public CarbonWriter build() throws IOException, InvalidLoadOptionException *Removed:* *changes in Carbon Writer:* public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) *Changes in carbon Reader* public CarbonReaderBuilder 
isTransactionalTable(boolean isTransactionalTable) public CarbonWriter build(Configuration conf) throws IOException, InvalidLoadOptionException > Simplify SDK API interfaces > --- > > Key: CARBONDATA-2961 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2961 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > CARBONDATA-2961 Simplify SDK API interfaces > problem: current SDK API interfaces are not simpler and don't follow builder > pattern. > If new features are added, it will become more complex. > Solution: Simplify the SDK interfaces as per builder pattern. > *Refer the latest sdk-guide.* > *Added:* > *changes in Carbon Writer:* > public CarbonWriterBuilder
[jira] [Created] (CARBONDATA-2961) Simplify SDK API interfaces
Ajantha Bhat created CARBONDATA-2961: Summary: Simplify SDK API interfaces Key: CARBONDATA-2961 URL: https://issues.apache.org/jira/browse/CARBONDATA-2961 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat Added: public CarbonWriterBuilder withThreadSafe(short numOfThreads) public CarbonWriterBuilder withHadoopConf(Configuration conf) public CarbonWriterBuilder forCsvInput(Schema schema) public CarbonWriterBuilder forAvroInput(org.apache.avro.Schema avroSchema) public CarbonWriterBuilder forJsonInput(Schema carbonSchema) public CarbonWriter build() throws IOException, InvalidLoadOptionException Removed: public CarbonWriterBuilder isTransactionalTable(boolean isTransactionalTable) setAccessKey setAccessKey setSecretKey setSecretKey setEndPoint setEndPoint public CarbonWriter buildWriterForCSVInput(Schema schema, Configuration configuration) public CarbonWriter buildThreadSafeWriterForCSVInput(Schema schema, short numOfThreads,Configuration configuration) public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema avroSchema,Configuration configuration) public CarbonWriter buildThreadSafeWriterForAvroInput(org.apache.avro.Schema avroSchema,short numOfThreads, Configuration configuration) public JsonCarbonWriter buildWriterForJsonInput(Schema carbonSchema, Configuration configuration) public JsonCarbonWriter buildThreadSafeWriterForJsonInput(Schema carbonSchema, short numOfThreads,Configuration configuration) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3261) support float and byte reading from presto
Ajantha Bhat created CARBONDATA-3261: Summary: support float and byte reading from presto Key: CARBONDATA-3261 URL: https://issues.apache.org/jira/browse/CARBONDATA-3261 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: support float and byte reading from Presto. cause: currently float and byte cannot be read in Presto due to a code issue: values were treated as the double data type, so an array-out-of-bounds error occurred when float/byte values were read through the double stream reader. solution: implement a new stream reader for float and byte.
[jira] [Created] (CARBONDATA-3164) During no_sort, an exception at the converter step is not propagated to the user; the same problem exists in the SDK and Spark file format flows.
Ajantha Bhat created CARBONDATA-3164: Summary: During no_sort, an exception at the converter step is not propagated to the user; the same problem exists in the SDK and Spark file format flows. Key: CARBONDATA-3164 URL: https://issues.apache.org/jira/browse/CARBONDATA-3164 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat During no_sort, an exception at the converter step is not propagated to the user; the same problem exists in the SDK and Spark file format flows. TestLoadDataGeneral.test("test load / insert / update with data more than 32000 bytes - dictionary_exclude")
[jira] [Created] (CARBONDATA-3162) Range filters don't remove null values for no_sort direct dictionary dimension columns.
Ajantha Bhat created CARBONDATA-3162: Summary: Range filters don't remove null values for no_sort direct dictionary dimension columns. Key: CARBONDATA-3162 URL: https://issues.apache.org/jira/browse/CARBONDATA-3162 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat Range filters don't remove null values for no_sort direct dictionary dimension columns. TimestampDataTypeDirectDictionaryTest.test("test timestamp with dictionary include and no_inverted index")
[jira] [Created] (CARBONDATA-3163) If tables have different time formats, data in no_sort columns goes as bad records (null) for the second table when it is loaded after the first.
Ajantha Bhat created CARBONDATA-3163: Summary: If tables have different time formats, data in no_sort columns goes as bad records (null) for the second table when it is loaded after the first. Key: CARBONDATA-3163 URL: https://issues.apache.org/jira/browse/CARBONDATA-3163 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat If tables have different time formats, data in no_sort columns goes as bad records (null) for the second table when it is loaded after the first. FilterProcessorTestCase.test("Between filter")
[jira] [Created] (CARBONDATA-3158) support presto-carbon to read SDK carbon files
Ajantha Bhat created CARBONDATA-3158: Summary: support presto-carbon to read SDK carbon files Key: CARBONDATA-3158 URL: https://issues.apache.org/jira/browse/CARBONDATA-3158 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat Currently, carbon SDK output files (files without the Metadata folder and its contents) are read by Spark using an external table with a carbon session. But the presto-carbon integration doesn't support that; it can currently read only transactional table output files. Hence we can enhance Presto to read SDK output files, which will increase the use cases for the presto-carbon integration. The above scenario can be achieved by inferring the schema when the Metadata folder does not exist and setting the read committed scope to LatestFilesReadCommittedScope when non-transactional table output files are present.
[jira] [Updated] (CARBONDATA-3136) JVM crash with preaggregate datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3136: - Description: JVM crash with preaggregate datamap. callstack: Stack: [0x7efebd49a000,0x7efebd59b000], sp=0x7efebd598dc8, free space=1019k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x7b2b50] J 7620 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x7eff4a3479e1 [0x7eff4a347900+0xe1] j org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+34 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(I)[B+54 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getDecimal(III)Lorg/apache/spark/sql/types/Decimal;+30 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;+36 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 3104 C1 scala.collection.Iterator$$anon$11.next()Ljava/lang/Object; (19 bytes) @ 0x7eff49154724 [0x7eff49154560+0x1c4] j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator;+78 j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 14007 C1 org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (17 bytes) @ 0x7eff4a6ed204 [0x7eff4a6ecc40+0x5c4] J 11684 C1 org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (36 bytes) @ 0x7eff4ad11274 [0x7eff4ad10f60+0x314] J 13771 C1 
org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (46 bytes) @ 0x7eff4b39dd3c [0x7eff4b39d160+0xbdc] test({color:#008000}"Test Pre_aggregate with decimal column with order by"{color}) { sql({color:#008000}"drop table if exists maintable"{color}) sql({color:#008000}"create table maintable(name string, decimal_col decimal(30,16)) stored by 'carbondata'"{color}) sql({color:#008000}"insert into table maintable select 'abc',452.564"{color}) sql( {color:#008000}"create datamap ag1 on table maintable using 'preaggregate' as select name,avg(decimal_col)" {color}+ {color:#008000}" from maintable group by name"{color}) checkAnswer(sql({color:#008000}"select avg(decimal_col) from maintable group by name order by name"{color}), {color:#660e7a}Seq{color}(Row({color:#ff}452.5640{color}))) } was: JVM crash with preaggregate datamap. callstack: Stack: [0x7efebd49a000,0x7efebd59b000], sp=0x7efebd598dc8, free space=1019k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x7b2b50] J 7620 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x7eff4a3479e1 [0x7eff4a347900+0xe1] j org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+34 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(I)[B+54 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getDecimal(III)Lorg/apache/spark/sql/types/Decimal;+30 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;+36 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 3104 C1 scala.collection.Iterator$$anon$11.next()Ljava/lang/Object; (19 bytes) @ 0x7eff49154724 [0x7eff49154560+0x1c4] j 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator;+78 j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 14007 C1 org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (17 bytes) @ 0x7eff4a6ed204 [0x7eff4a6ecc40+0x5c4] J 11684 C1 org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (36 bytes) @ 0x7eff4ad11274 [0x7eff4ad10f60+0x314] J 13771 C1 org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (46 bytes) @ 0x7eff4b39dd3c [0x7eff4b39d160+0xbdc] > JVM crash with preaggregate datamap > --- > >
[jira] [Created] (CARBONDATA-3118) Parallelize block pruning of default datamap in driver for filter query processing
Ajantha Bhat created CARBONDATA-3118: Summary: Parallelize block pruning of default datamap in driver for filter query processing Key: CARBONDATA-3118 URL: https://issues.apache.org/jira/browse/CARBONDATA-3118 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat *"Parallelize block pruning of default datamap in driver for filter query processing"* *Background:* We do block pruning for filter queries at the driver side. In a real-world big data scenario, one carbon table can have millions of carbon files. It is currently observed that for 1 million carbon files, block pruning takes around 5 seconds, as each carbon file takes around 0.005 ms to prune (with only one filter column set in the 'column_meta_cache' tblproperty). With more files, block pruning takes even longer. Also, the Spark job is not launched until block pruning completes, so the user does not know what is happening at that time or why the job is not launching. Block pruning currently takes this long because each segment is processed sequentially; we can reduce the time by parallelizing it. *Solution:* Keep the default number of threads for block pruning at 4. The user can reduce this via the carbon property "carbon.max.driver.threads.for.pruning", which accepts values from 1 to 4. In TableDataMap.prune(), group the segments per thread by distributing an equal number of carbon files to each thread, then launch the threads to handle block pruning for each group of segments.
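The grouping-and-launch scheme described above can be sketched with plain `java.util.concurrent` primitives. This is a simplified stand-in, not the actual `TableDataMap.prune()` code: file names are plain strings and the per-group "pruning" work is mocked out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPruneSketch {
    // Distribute files round-robin into thread groups, capped at 4 (the default
    // described above; "carbon.max.driver.threads.for.pruning" is the property
    // named in the issue).
    static List<List<String>> group(List<String> files, int numThreads) {
        int n = Math.min(Math.max(numThreads, 1), 4);
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < n; i++) groups.add(new ArrayList<>());
        for (int i = 0; i < files.size(); i++) groups.get(i % n).add(files.get(i));
        return groups;
    }

    // Launch one task per group; each task prunes its own group of files.
    static int pruneAll(List<String> files, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(Math.max(numThreads, 1), 4));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> g : group(files, numThreads)) {
                futures.add(pool.submit(() -> g.size())); // stand-in for real per-group pruning
            }
            int pruned = 0;
            for (Future<Integer> f : futures) pruned += f.get();
            return pruned;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) files.add("part-" + i + ".carbondata");
        System.out.println(pruneAll(files, 4)); // all 10 files visited across 4 threads
    }
}
```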
[jira] [Created] (CARBONDATA-3136) JVM crash with preaggregate datamap
Ajantha Bhat created CARBONDATA-3136: Summary: JVM crash with preaggregate datamap Key: CARBONDATA-3136 URL: https://issues.apache.org/jira/browse/CARBONDATA-3136 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Assignee: Ajantha Bhat JVM crash with preaggregate datamap. callstack: Stack: [0x7efebd49a000,0x7efebd59b000], sp=0x7efebd598dc8, free space=1019k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x7b2b50] J 7620 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x7eff4a3479e1 [0x7eff4a347900+0xe1] j org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+34 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(I)[B+54 j org.apache.spark.sql.catalyst.expressions.UnsafeRow.getDecimal(III)Lorg/apache/spark/sql/types/Decimal;+30 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;+36 j org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 3104 C1 scala.collection.Iterator$$anon$11.next()Ljava/lang/Object; (19 bytes) @ 0x7eff49154724 [0x7eff49154560+0x1c4] j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Lscala/collection/Iterator;)Lscala/collection/Iterator;+78 j org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 J 14007 C1 org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (17 bytes) @ 0x7eff4a6ed204 [0x7eff4a6ecc40+0x5c4] J 11684 C1 org.apache.spark.rdd.MapPartitionsRDD.compute(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (36 bytes) @ 0x7eff4ad11274 [0x7eff4ad10f60+0x314] J 13771 C1 
org.apache.spark.rdd.RDD.iterator(Lorg/apache/spark/Partition;Lorg/apache/spark/TaskContext;)Lscala/collection/Iterator; (46 bytes) @ 0x7eff4b39dd3c [0x7eff4b39d160+0xbdc]
[jira] [Created] (CARBONDATA-3138) Random count mismatch in query in multi-thread block-pruning scenario
Ajantha Bhat created CARBONDATA-3138: Summary: Random count mismatch in query in multi-thread block-pruning scenario Key: CARBONDATA-3138 URL: https://issues.apache.org/jira/browse/CARBONDATA-3138 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: Random count mismatch in query in the multi-thread block-pruning scenario. cause: The existing prune method was not meant for multi-threading, as synchronization was missing: only in the implicit filter scenario, while preparing the block ID list, synchronization was missing, so pruning gave wrong results. solution: synchronize the implicit filter preparation, as prune is now called from multiple threads.
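The fix described above amounts to guarding the shared block ID list so concurrent prune threads cannot corrupt it. A minimal sketch, with hypothetical names (`ImplicitFilterSync`, `addBlockId`) standing in for the real implicit filter preparation code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ImplicitFilterSync {
    private final List<String> blockIds = new ArrayList<>();

    // Synchronizing the shared-list preparation; without this, concurrent
    // prune threads can interleave ArrayList internals and produce wrong
    // (random) counts, as described in the issue.
    synchronized void addBlockId(String id) {
        blockIds.add(id);
    }

    synchronized int size() {
        return blockIds.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ImplicitFilterSync filter = new ImplicitFilterSync();
        int threads = 4, perThread = 1000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int tid = t;
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) filter.addBlockId(tid + "-" + i);
                done.countDown();
            }).start();
        }
        done.await();
        System.out.println(filter.size()); // 4000: no entries lost with synchronization
    }
}
```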
[jira] [Created] (CARBONDATA-3237) optimize presto query time for dictionary include string column
Ajantha Bhat created CARBONDATA-3237: Summary: optimize presto query time for dictionary include string column Key: CARBONDATA-3237 URL: https://issues.apache.org/jira/browse/CARBONDATA-3237 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat optimize presto query time for dictionary include string column. problem: currently, for each query, presto-carbon creates a dictionary block for string columns. This happens for every query, and if the cardinality is high, building it takes more time. This is not required; we can use a normal dictionary lookup instead.
[jira] [Created] (CARBONDATA-3186) NPE when all the records in a file is badrecord with action redirect/ignore
Ajantha Bhat created CARBONDATA-3186: Summary: NPE when all the records in a file is badrecord with action redirect/ignore Key: CARBONDATA-3186 URL: https://issues.apache.org/jira/browse/CARBONDATA-3186 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat *problem:* In the no_sort flow, the writer is opened early because there is no blocking sort step. So when all the records go as bad records with the redirect action at the converter step, the writer closes an empty .carbondata file. When this empty carbondata file is queried, we get multiple issues, including an NPE. *solution:* When the file size is 0 bytes, do the following: a) If there is one data file and one index file, delete the carbondata file and avoid creating the index file. b) If there are multiple data files and one index file (where some data files contain only bad records), delete those carbondata files and remove them from blockIndexInfoList, so the index file will not carry info for the empty carbon files. c) In case direct write to the store path is enabled, delete the data file from there and avoid writing the index file with that carbondata file's info.
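Steps (a) and (b) above boil down to filtering 0-byte data files out of the index bookkeeping. A pure sketch under stated assumptions: `keepNonEmpty` and its size map are hypothetical stand-ins for the real file handling and `blockIndexInfoList`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyFileCleanup {
    // Given data-file sizes, keep only non-empty .carbondata files in the
    // index info (blockIndexInfoList stand-in); the 0-byte files would be
    // deleted on disk and excluded from the index file.
    static List<String> keepNonEmpty(Map<String, Long> fileSizes) {
        List<String> blockIndexInfoList = new ArrayList<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            if (e.getValue() > 0) {
                blockIndexInfoList.add(e.getKey());
            }
            // else: delete the empty carbondata file and skip its index entry
        }
        return blockIndexInfoList;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("part-0.carbondata", 0L);    // all records were bad records
        sizes.put("part-1.carbondata", 1024L);
        System.out.println(keepNonEmpty(sizes)); // [part-1.carbondata]
    }
}
```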
[jira] [Created] (CARBONDATA-3330) Fix Invalid exception when SDK reader is trying to clear the datamap
Ajantha Bhat created CARBONDATA-3330: Summary: Fix Invalid exception when SDK reader is trying to clear the datamap Key: CARBONDATA-3330 URL: https://issues.apache.org/jira/browse/CARBONDATA-3330 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat java.io.IOException: File does not exist: /opt/csdk/out/cmplx_Schema/Metadata/schema at org.apache.carbondata.core.metadata.schema.SchemaReader.readCarbonTableFromStore(SchemaReader.java:60) at org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildFromTablePath(CarbonTable.java:302) at org.apache.carbondata.core.datamap.DataMapStoreManager.getCarbonTable(DataMapStoreManager.java:512) at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:476) at org.apache.carbondata.hadoop.CarbonRecordReader.close(CarbonRecordReader.java:164) at org.apache.carbondata.sdk.file.CarbonReader.close(CarbonReader.java:219) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3311) Support latest presto [0.217] in carbon
Ajantha Bhat created CARBONDATA-3311: Summary: Support latest presto [0.217] in carbon Key: CARBONDATA-3311 URL: https://issues.apache.org/jira/browse/CARBONDATA-3311 Project: CarbonData Issue Type: Improvement Reporter: Ajantha Bhat Support the latest version of Presto. Please refer to the Presto release notes for more details; there is a change in the presto-hive interfaces, and a Hive analyser is added.
[jira] [Created] (CARBONDATA-3282) presto carbon doesn't work with Hadoop conf in cluster.
Ajantha Bhat created CARBONDATA-3282: Summary: presto carbon doesn't work with Hadoop conf in cluster. Key: CARBONDATA-3282 URL: https://issues.apache.org/jira/browse/CARBONDATA-3282 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat problem: when a datamap path is given, presto-carbon throws a 'hacluster' unknown-host exception even when the HDFS configuration is present. solution: from the HDFS environment, set the Hadoop configuration into a thread local, so that FileFactory can use this configuration.
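The thread-local pattern described above can be sketched without any Hadoop dependency. This is a stand-in: a plain key/value map substitutes for `org.apache.hadoop.conf.Configuration`, and `resolve` mimics a FileFactory-like helper reading the calling thread's configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalConfSketch {
    // Each thread gets its own configuration map; the fix sets the Hadoop
    // configuration here so downstream code can pick it up (e.g. to resolve
    // the 'hacluster' HDFS nameservice) without it being passed explicitly.
    private static final ThreadLocal<Map<String, String>> CONF =
        ThreadLocal.withInitial(HashMap::new);

    static void setConf(String key, String value) {
        CONF.get().put(key, value);
    }

    // FileFactory stand-in: reads the configuration of the calling thread.
    static String resolve(String key) {
        return CONF.get().getOrDefault(key, "unknown-host");
    }

    public static void main(String[] args) {
        setConf("dfs.nameservices", "hacluster");
        System.out.println(resolve("dfs.nameservices")); // hacluster
    }
}
```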
[jira] [Comment Edited] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)
[ https://issues.apache.org/jira/browse/CARBONDATA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816145#comment-16816145 ] Ajantha Bhat edited comment on CARBONDATA-3001 at 4/12/19 10:10 AM: Future scope: # child tables support? check about inherit # store size increases ? # CLI tool or total log summary # Impact of many pages on page wise creation of tools. was (Author: ajantha_bhat): 1. child tables support? check about inherit 2. store size increases ? 3. CLI tool or total log summary 4. impact of many pages on page wise creation of tools. > Propose configurable page size in MB (via carbon property) > -- > > Key: CARBONDATA-3001 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3001 > Project: CarbonData > Issue Type: Improvement >Reporter: Ajantha Bhat >Assignee: Ajantha Bhat >Priority: Minor > Attachments: Propose configurable page size in MB (via carbon > property).pdf > > Time Spent: 16h 20m > Remaining Estimate: 0h > > For better in-memory processing of carbondata pages, I am proposing > configurable page size in MB (via carbon property). > please find the attachment for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)
[ https://issues.apache.org/jira/browse/CARBONDATA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816145#comment-16816145 ] Ajantha Bhat commented on CARBONDATA-3001: -- 1. Child tables support? Check about inherit. 2. Store size increases? 3. CLI tool or total log summary. 4. Impact of many pages on page-wise creation of tools. > Propose configurable page size in MB (via carbon property) > -- > > Key: CARBONDATA-3001 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3001 > Project: CarbonData > Issue Type: Improvement > Reporter: Ajantha Bhat > Assignee: Ajantha Bhat > Priority: Minor > Attachments: Propose configurable page size in MB (via carbon property).pdf > > Time Spent: 16h 20m > Remaining Estimate: 0h > > For better in-memory processing of carbondata pages, I am proposing a configurable page size in MB (via a carbon property). > Please find the attachment for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3395) When same split object is passed to concurrent readers, build() fails randomly with Exception.
Ajantha Bhat created CARBONDATA-3395: Summary: When same split object is passed to concurrent readers, build() fails randomly with Exception. Key: CARBONDATA-3395 URL: https://issues.apache.org/jira/browse/CARBONDATA-3395 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat When the same split object is passed to concurrent readers, build() fails randomly with an exception: 2019-05-24 13:51:55 ERROR CarbonVectorizedRecordReader:116 - java.lang.ArrayIndexOutOfBoundsException: 4 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3413) Arrow allocators gives OutOfMemory error when test with hugedata
Ajantha Bhat created CARBONDATA-3413: Summary: Arrow allocators gives OutOfMemory error when test with hugedata Key: CARBONDATA-3413 URL: https://issues.apache.org/jira/browse/CARBONDATA-3413 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Arrow allocators give an OutOfMemory error when tested with huge data. Problem: OOM exception in Arrow with huge data. Cause: in ArrowConverter, the allocator is not closed. Solution: close the allocator in ArrowConverter. Also fix the problems in the test utility API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
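The essence of the fix can be sketched with a stand-in allocator. Arrow's real BufferAllocator is AutoCloseable, but the fake type below keeps the example self-contained and is not the actual ArrowConverter code: try-with-resources guarantees close() runs even when the conversion step throws, which is what prevents the allocator's memory reservation from leaking.

```java
// Illustrative sketch: a fake AutoCloseable allocator, showing that
// try-with-resources closes it even when conversion fails mid-way.
public class AllocatorCloseSketch {

    static class FakeAllocator implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Returns whether the allocator ended up closed despite the failure.
    static boolean convertWithFailure() {
        FakeAllocator allocator = new FakeAllocator();
        try (FakeAllocator a = allocator) {
            // ... convert carbon rows into Arrow vectors using 'a' ...
            throw new RuntimeException("conversion failed mid-way");
        } catch (RuntimeException ignored) {
            // failure handled; allocator.close() already ran by this point
        }
        return allocator.closed;
    }

    public static void main(String[] args) {
        System.out.println("allocator closed: " + convertWithFailure());
    }
}
```

With a manually managed allocator, an exception thrown before an explicit close() call leaks the reservation; the structured form above removes that window entirely.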
[jira] [Created] (CARBONDATA-3405) SDK reader getSplits() must clear the cache.
Ajantha Bhat created CARBONDATA-3405: Summary: SDK reader getSplits() must clear the cache. Key: CARBONDATA-3405 URL: https://issues.apache.org/jira/browse/CARBONDATA-3405 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat a. The cache key is not filled during an SDK read; it always has a null table name. Fill this. b. Clear the cache after splits are obtained in getSplits(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3411) ClearDatamaps logs an exception in SDK
Ajantha Bhat created CARBONDATA-3411: Summary: ClearDatamaps logs an exception in SDK Key: CARBONDATA-3411 URL: https://issues.apache.org/jira/browse/CARBONDATA-3411 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Problem: in the SDK, when datamaps are cleared, the exception below is logged:
java.io.IOException: File does not exist: /home/root1/Documents/ab/workspace/carbonFile/carbondata/store/sdk/testWriteFiles/771604793030370/Metadata/schema
at org.apache.carbondata.core.metadata.schema.SchemaReader.readCarbonTableFromStore(SchemaReader.java:60)
at org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildFromTablePath(CarbonTable.java:272)
at org.apache.carbondata.core.datamap.DataMapStoreManager.getCarbonTable(DataMapStoreManager.java:566)
at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:514)
at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:504)
at org.apache.carbondata.sdk.file.CarbonReaderBuilder.getSplits(CarbonReaderBuilder.java:419)
at org.apache.carbondata.sdk.file.CarbonReaderTest.testGetSplits(CarbonReaderTest.java:2605)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Cause: a CarbonTable is required only for launching a job, and the SDK does not need to launch a job, so there is no need to build a CarbonTable. Solution: build the CarbonTable only when a job needs to be launched. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
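The "build only when launching a job" fix can be sketched with stdlib types only. The names below are hypothetical stand-ins, not the actual CarbonReaderBuilder internals: the expensive table construction sits behind a Supplier and is invoked only on the code path that actually launches a job, so the SDK path never touches the (missing) schema file.

```java
import java.util.function.Supplier;

// Illustrative sketch: lazy construction of an expensive object via Supplier.
public class LazyTableSketch {

    static int buildCount = 0;

    // Stand-in for the expensive CarbonTable construction that reads the
    // schema file from disk (and fails for schemaless SDK output).
    static String buildCarbonTable() {
        buildCount++;
        return "carbonTable";
    }

    // The supplier is only invoked when a job actually has to be launched;
    // the SDK path computes splits without ever building the table.
    static void getSplits(boolean launchJob, Supplier<String> table) {
        if (launchJob) {
            table.get(); // table built only here
        }
        // ... compute splits without the table otherwise ...
    }

    public static void main(String[] args) {
        getSplits(false, LazyTableSketch::buildCarbonTable); // SDK path
        System.out.println("builds after SDK path: " + buildCount);
        getSplits(true, LazyTableSketch::buildCarbonTable);  // job path
        System.out.println("builds after job path: " + buildCount);
    }
}
```

Passing a Supplier instead of a pre-built object is what turns an unconditional (and here failing) construction into a conditional one.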
[jira] [Created] (CARBONDATA-3414) when Insert into partition table fails exception doesn't print reason.
Ajantha Bhat created CARBONDATA-3414: Summary: when Insert into partition table fails exception doesn't print reason. Key: CARBONDATA-3414 URL: https://issues.apache.org/jira/browse/CARBONDATA-3414 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Problem: when an insert into a partition table fails, the exception doesn't print the reason. Cause: the exception was caught, but the error message was not taken from that exception. Solution: throw the exception directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
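The buggy pattern and the fix can be contrasted in a few lines. This is an illustrative sketch, not the actual CarbonData load code; the method names and the "lock not available" message are made up for the example:

```java
// Illustrative sketch: catching an exception but raising a new one with a
// fixed message loses the root cause; the fix propagates (and chains) it.
public class RethrowSketch {

    static void loadBuggy() {
        try {
            throw new IllegalStateException("lock not available even after retry");
        } catch (Exception e) {
            // bug: message not taken from 'e', so the reason is lost
            throw new RuntimeException("DataLoadFailure: ");
        }
    }

    static void loadFixed() {
        try {
            throw new IllegalStateException("lock not available even after retry");
        } catch (Exception e) {
            // fix: propagate the original reason and chain the cause
            throw new RuntimeException("DataLoadFailure: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try { loadBuggy(); } catch (RuntimeException e) { System.out.println(e.getMessage()); }
        try { loadFixed(); } catch (RuntimeException e) { System.out.println(e.getMessage()); }
    }
}
```

Chaining the cause (the second constructor argument) also preserves the original stack trace, which a message-only copy would not.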
[jira] [Updated] (CARBONDATA-3414) when Insert into partition table fails exception doesn't print reason.
[ https://issues.apache.org/jira/browse/CARBONDATA-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3414: - Description: Problem: when an insert into a partition table fails, the exception doesn't print the reason. Cause: the exception was caught, but the error message was not taken from that exception. Solution: throw the exception directly. Steps to reproduce: # Open multiple Spark beelines (say 10). # Create a carbon table with a partition. # Insert overwrite into the carbon table from all 10 beelines concurrently. # Some insert overwrites will succeed and some will fail due to non-availability of the lock even after retry. # For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed. Need to print a valid error reason for the failure. was: Problem: when an insert into a partition table fails, the exception doesn't print the reason. Cause: the exception was caught, but the error message was not taken from that exception. Solution: throw the exception directly. Steps to reproduce: # Open multiple Spark beelines (say 10). # Create a carbon table with a partition. # Insert overwrite into the carbon table from all 10 beelines concurrently. # Some insert overwrites will succeed and some will fail due to availability of the lock even after retry. # For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed. Need to print a valid error reason for the failure. > when Insert into partition table fails exception doesn't print reason. > -- > > Key: CARBONDATA-3414 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3414 > Project: CarbonData > Issue Type: Bug > Reporter: Ajantha Bhat > Priority: Minor > > Problem: when an insert into a partition table fails, the exception doesn't print the reason. > Cause: the exception was caught, but the error message was not taken from that exception. > Solution: throw the exception directly. > > Steps to reproduce: > # Open multiple Spark beelines (say 10). > # Create a carbon table with a partition. > # Insert overwrite into the carbon table from all 10 beelines concurrently. > # Some insert overwrites will succeed and some will fail due to non-availability of the lock even after retry. > # For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed. > Need to print a valid error reason for the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3414) when Insert into partition table fails exception doesn't print reason.
[ https://issues.apache.org/jira/browse/CARBONDATA-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-3414: - Description: Problem: when an insert into a partition table fails, the exception doesn't print the reason. Cause: the exception was caught, but the error message was not taken from that exception. Solution: throw the exception directly. Steps to reproduce: # Open multiple Spark beelines (say 10). # Create a carbon table with a partition. # Insert overwrite into the carbon table from all 10 beelines concurrently. # Some insert overwrites will succeed and some will fail due to availability of the lock even after retry. # For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed. Need to print a valid error reason for the failure. was: Problem: when an insert into a partition table fails, the exception doesn't print the reason. Cause: the exception was caught, but the error message was not taken from that exception. Solution: throw the exception directly. > when Insert into partition table fails exception doesn't print reason. > -- > > Key: CARBONDATA-3414 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3414 > Project: CarbonData > Issue Type: Bug > Reporter: Ajantha Bhat > Priority: Minor > > Problem: when an insert into a partition table fails, the exception doesn't print the reason. > Cause: the exception was caught, but the error message was not taken from that exception. > Solution: throw the exception directly. > > Steps to reproduce: > # Open multiple Spark beelines (say 10). > # Create a carbon table with a partition. > # Insert overwrite into the carbon table from all 10 beelines concurrently. > # Some insert overwrites will succeed and some will fail due to availability of the lock even after retry. > # For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed. > Need to print a valid error reason for the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3426) Fix Load performance degrade by fixing task distribution
Ajantha Bhat created CARBONDATA-3426: Summary: Fix Load performance degrade by fixing task distribution Key: CARBONDATA-3426 URL: https://issues.apache.org/jira/browse/CARBONDATA-3426 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Problem: load performance degrades due to a task-distribution issue. Cause: consider a 3-node cluster (host names a, b, c with IP1, IP2, IP3 as IP addresses). To launch a load task, the host name is required by NewCarbonDataLoadRDD in getPreferredLocations(). But if the driver is a (IP1), the result is IP1, b, c instead of a, b, c; hence the task was not launched on the executor that has the same IP as the driver. getLocalhostIPs was modified recently and was returning the address instead of the IP, so the local hostname was removed instead of the address. Solution: revert the change in getLocalhostIPs, as it is not used in any other flow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
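The name-versus-address distinction at the heart of this bug can be shown with the stdlib alone (no CarbonData APIs involved; this is an illustration, not the getLocalhostIPs code): Spark's locality matching for preferred locations compares host strings, so a node reported by address while its executors register by name is never matched.

```java
import java.net.InetAddress;

// Illustrative sketch: the same local host viewed as a name and as an
// address. If preferred locations mix the two forms, string comparison
// fails and the node loses data-local task placement.
public class HostNameSketch {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("name: " + local.getHostName());
        System.out.println("address: " + local.getHostAddress());
    }
}
```

The two printed values generally differ (e.g. a hostname versus a dotted-quad IP), which is exactly why one node dropped out of the preferred-locations list.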