[jira] [Commented] (CARBONDATA-2340) Load data exceeding 32000 bytes
[ https://issues.apache.org/jira/browse/CARBONDATA-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436926#comment-16436926 ] xuchuanyin commented on CARBONDATA-2340: [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. > Load data exceeding 32000 bytes > - > > Key: CARBONDATA-2340 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2340 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.0 > Reporter: niaoshu > Priority: Blocker > Original Estimate: 12h > Remaining Estimate: 12h > > INFO storage.BlockManagerMasterEndpoint: Registering block manager > spark1:12603 with 5.2 GB RAM, BlockManagerId(1, spark1, 12603, None) > 18/04/11 14:24:23 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in > memory on spark1:12603 (size: 34.9 KB, free: 5.2 GB) > 18/04/11 14:24:34 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 > (TID 0, spark1, executor 1): > org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: > Dataload failed, String size cannot exceed 32000 bytes > at > org.apache.carbondata.processing.loading.converter.impl.NonDictionaryFieldConverterImpl.convert(NonDictionaryFieldConverterImpl.java:75) > at > org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:162) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl.processRowBatch(DataConverterProcessorStepImpl.java:104) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:91) > at > org.apache.carbondata.processing.loading.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:77) > at > org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:214) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
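As the comment notes, the 32000-byte cap follows from storing each string's length in a 2-byte signed `short` (maximum 32767). A minimal illustrative sketch of such a check (hypothetical class and method names, not the actual NonDictionaryFieldConverterImpl code):

```java
// Sketch only: a 2-byte signed short tops out at 32767, so a store format that
// prefixes each string with a short length must reject longer values. CarbonData
// enforces a slightly lower cap of 32000 bytes, matching the error message above.
public class ShortLengthLimit {
    static final int MAX_STRING_BYTES = 32000;

    // Returns the short length prefix that would be written before the data,
    // rejecting values that would overflow the 2-byte length field.
    static short lengthPrefix(byte[] value) {
        if (value.length > MAX_STRING_BYTES) {
            throw new IllegalArgumentException(
                "Dataload failed, String size cannot exceed " + MAX_STRING_BYTES + " bytes");
        }
        return (short) value.length;  // safe: 32000 < Short.MAX_VALUE (32767)
    }
}
```

This also explains why the suggested fix in the follow-up comment is a new datatype (e.g. `TEXT`) rather than a configuration change: the limit is baked into the storage layout.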
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[jira] [Comment Edited] (CARBONDATA-2340) Load data exceeding 32000 bytes
[ https://issues.apache.org/jira/browse/CARBONDATA-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436926#comment-16436926 ] xuchuanyin edited comment on CARBONDATA-2340 at 4/13/18 7:02 AM: - [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. If we want to solve this problem, maybe we can add a new datatype called `TEXT` in carbondata to support this scenario. was (Author: xuchuanyin): [~niaoshu] This is a known issue/restriction in carbondata. The reason is that carbondata stores the length of a string using a `short`. > Load data exceeding 32000 bytes > - > > Key: CARBONDATA-2340 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2340 > Project: CarbonData > Issue Type: Bug > Components: data-load > Affects Versions: 1.3.0 > Reporter: niaoshu > Priority: Blocker > Original Estimate: 12h > Remaining Estimate: 12h
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 @jackylk & @gvramana : please review this PR. ---
[jira] [Created] (CARBONDATA-2343) Improper filter resolver causes more filter scan on data that could be skipped
xuchuanyin created CARBONDATA-2343: -- Summary: Improper filter resolver causes more filter scan on data that could be skipped Key: CARBONDATA-2343 URL: https://issues.apache.org/jira/browse/CARBONDATA-2343 Project: CarbonData Issue Type: Bug Components: data-query Reporter: xuchuanyin Assignee: xuchuanyin In DataMapChooser, Carbondata tries to choose and combine datamaps for expressions. In some scenarios, it generates a `TrueConditionalResolverImpl` to wrap the sub-expression, which causes a data scan on blocklets that could be skipped (for `TrueConditionalResolverImpl`, the `TrueFilterExecutor` always scans the data, even if it simply wraps a range expression).
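The pruning loss described in CARBONDATA-2343 can be illustrated with a toy sketch (simplified, hypothetical interfaces, not the real CarbonData DataMap classes): a "true" executor reports every blocklet as requiring a scan, while a range-aware executor can skip any blocklet whose min/max statistics miss the predicate range.

```java
// Illustrative sketch only: shows why wrapping a range predicate in an
// always-true resolver throws away blocklet-level min/max pruning.
public class FilterPruningSketch {
    interface FilterExecutor {
        // Decides, from a blocklet's min/max statistics, whether it must be scanned.
        boolean isScanRequired(int blockletMin, int blockletMax);
    }

    // Analogous to TrueFilterExecutor: admits every blocklet, so nothing is pruned.
    static final FilterExecutor TRUE_FILTER = (min, max) -> true;

    // A range-aware executor: skips blocklets whose [min, max] misses [lo, hi].
    static FilterExecutor rangeFilter(int lo, int hi) {
        return (min, max) -> !(max < lo || min > hi);
    }
}
```

For a blocklet with values in [0, 10] and a predicate on [100, 200], the range executor skips the blocklet entirely, while the always-true executor still forces a scan.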
[jira] [Assigned] (CARBONDATA-2327) Invalid schema name _system shows when executing show schemas in presto
[ https://issues.apache.org/jira/browse/CARBONDATA-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anubhav tarar reassigned CARBONDATA-2327: - Assignee: anubhav tarar > Invalid schema name _system shows when executing show schemas in presto > --- > > Key: CARBONDATA-2327 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2327 > Project: CarbonData > Issue Type: Bug > Components: presto-integration > Affects Versions: 1.4.0 > Reporter: anubhav tarar > Assignee: anubhav tarar > Priority: Trivial > Time Spent: 1h 50m > Remaining Estimate: 0h > > presto> show schemas; > Schema > > _system > default > information_schema > (3 rows) > Query 20180410_101915_00010_sidw4, FINISHED, 1 node > Splits: 18 total, 18 done (100.00%) > 0:00 [3 rows, 47B] [25 rows/s, 395B/s]
[jira] [Resolved] (CARBONDATA-2023) Optimization in data loading for skewed data
[ https://issues.apache.org/jira/browse/CARBONDATA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2023. Resolution: Fixed > Optimization in data loading for skewed data > > > Key: CARBONDATA-2023 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2023 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Affects Versions: 1.3.0 > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > In one of my cases, carbondata has to load skewed data files. The sizes of the > data files range from 1KB to about 5GB. > In the current implementation, carbondata distributes the file blocks (splits) > among the nodes to maximize data locality and distribute data evenly; we call > this `block-node-assignment` for short. > However, the current implementation has some problems. > The assignment is based on block count. The goal is to make sure that all the > nodes handle the same number of blocks. In the skewed-data scenario described > above, a block of a small file and a block of a big file differ greatly in > size (1KB vs. 64MB). As a result, the difference in total data size assigned > to each data node is very large. > In order to solve this problem, the size of each block should be considered > during block-node-assignment. One node can handle more blocks than another as > long as the total sizes of the blocks are almost the same.
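The size-aware assignment described in CARBONDATA-2023 could look roughly like the following greedy sketch (illustrative only; the class name and the greedy heuristic are assumptions, not the actual CarbonData block-node-assignment code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: assign each block to the node with the smallest total assigned size so
// far, so per-node totals stay balanced even when block sizes are heavily skewed.
public class SizeBasedAssignment {
    static Map<String, List<Long>> assign(List<Long> blockSizes, List<String> nodes) {
        Map<String, Long> totals = new HashMap<>();
        Map<String, List<Long>> result = new HashMap<>();
        for (String n : nodes) {
            totals.put(n, 0L);
            result.put(n, new ArrayList<>());
        }
        // Placing the largest blocks first gives the greedy heuristic a tighter balance.
        List<Long> sorted = new ArrayList<>(blockSizes);
        sorted.sort(Collections.reverseOrder());
        for (long size : sorted) {
            // Pick the node with the smallest running total and give it this block.
            String target = Collections.min(totals.entrySet(),
                Map.Entry.comparingByValue()).getKey();
            totals.merge(target, size, Long::sum);
            result.get(target).add(size);
        }
        return result;
    }
}
```

With blocks of 100, 60, and 40 units over two nodes, each node ends up with exactly 100 units, whereas count-based assignment could give one node 160 units and the other 40. (Note this sketch ignores data locality, which the real assignment also has to weigh.)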
[jira] [Resolved] (CARBONDATA-2288) Compaction should be able to run concurrently with data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2288. Resolution: Fixed > Compaction should be able to run concurrently with data loading > --- > > Key: CARBONDATA-2288 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2288 > Project: CarbonData > Issue Type: Improvement > Components: data-load > Reporter: xuchuanyin > Assignee: xuchuanyin > Priority: Major > > Currently in carbondata, compaction can be triggered in two ways: > 1. Manually trigger compaction using an ALTER statement. > 2. Automatically trigger compaction when doing data loading. > In both ways, compaction and data loading cannot run concurrently. In way 1, > compaction will fail if a data load is in progress. In way 2, the compaction > only starts after the main data load has finished, and the user has to wait > until the compaction is finished. > In my opinion, data loading works on a new segment, whereas compaction works > on the existing segments, so we can let them run concurrently. > For the 1st way, compaction will succeed even if data loading is in progress; > for the 2nd way, compaction will run concurrently with the data loading, or > after it (we can configure this), and the user will not have to wait for the > compaction to finish.
[GitHub] carbondata issue #2157: [CARBONDATA-2334] Added Property enabling user to bl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2157 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4986/ ---
[GitHub] carbondata pull request #2167: [CARBONDATA-2337][BACKPORT-1.3] Fix duplicate...
Github user zzcclp closed the pull request at: https://github.com/apache/carbondata/pull/2167 ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest sdv please ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2168 [CARBONDATA-2343][DataMap] Improper filter resolver causes more filter scan on data that could be skipped Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` always scans blocklets that could be skipped. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO, only internal interface has been changed` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0413_bug_dm Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2168.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2168 commit 2e2c0683f867a7ecda7a4e7f80b2c7030220cd4a Author: xuchuanyin Date: 2018-04-13T07:14:25Z Fix bugs in datamap chooser Currently DataMapChooser chooses and combines datamaps for expressions and wraps the expression in a `TrueConditionalResolverImpl`. However, the executor `TrueFilterExecutor` always scans blocklets that could be skipped. ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4989/ ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436954#comment-16436954 ] Liang Chen commented on CARBONDATA-2318: I tested it in spark-shell: Step1: ./bin/spark-shell --master local --jars ${carbon_jar} --driver-memory 4G Step2: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/Users/apple/DEMO/presto_test/data","/Users/apple/DEMO/presto_test/metadata") Step3: reuse the old carbondata 1): copy all data "default/carbon_table/.." to the new location: /Users/apple/DEMO/presto_test/data 2): run carbon.sql("refresh table carbon_table") > Remove invalid table name (.ds_store) of presto integration > --- > > Key: CARBONDATA-2318 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2318 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration > Reporter: Liang Chen > Priority: Minor > > For presto integration, we get an invalid table name via "show tables from default", > as below. > presto:default> show tables from default; > Table > > .ds_store > carbon_table > carbontable > partition_bigtable > partition_table > (5 rows)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3772/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4435/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4990/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3771/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2148 retest this please ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/2113 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3774/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3776/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2168 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4992/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4993/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3775/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4991/ ---
[jira] [Created] (CARBONDATA-2344) Fix bugs in BlockletDataMap
xuchuanyin created CARBONDATA-2344: -- Summary: Fix bugs in BlockletDataMap Key: CARBONDATA-2344 URL: https://issues.apache.org/jira/browse/CARBONDATA-2344 Project: CarbonData Issue Type: Bug Components: data-query Reporter: xuchuanyin Assignee: xuchuanyin DMStore stores DataMapRows for each blocklet. Currently carbondata accesses the DMStore by blockletId, which is not unique and will cause problems.
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: (was: carbon unamanged table desgin doc_V1.0.pdf) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Attachment: carbon unmanaged table desgin doc_V1.0.pdf > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2169 [CARBONDATA-2344][DataMap] Fix bugs in mapping blocklet to UnsafeDMStore rows In BlockletDataMap, carbondata stores a DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO, only internal interfaces have been changed` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `NO` - How it is tested? Please attach test report. `Tested in local` - Is it a performance related change? Please attach the performance test report. `No` - Any additional information to help reviewers in testing this change. `NO` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `Not related` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0413_bug_blocklet_dm_unsafe_row Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2169 commit dd010297c7f7428dc8f42ec1a292b8cdddcc09aa Author: xuchuanyin Date: 2018-04-13T08:18:23Z Fix bugs in mapping blocklet to UnsafeDMStore In BlockletDataMap, carbondata stores a DMRow in an array for each blocklet. But currently carbondata accesses the DMRow only by blockletId (0, 1, etc.), which causes problems since different blocks can have the same blockletId. This PR adds a map from blockId#blockletId to the array index, so carbondata can access the DMRow by blockId and blockletId. ---
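The fix idea in PR #2169 — keying the lookup by block plus blocklet rather than blocklet alone — can be sketched as follows (hypothetical simplified class, not the real BlockletDataMap/UnsafeDMStore code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: blockletId alone is not unique across blocks (every block numbers its
// blocklets 0, 1, ...), so the lookup key must combine blockId with blockletId.
public class BlockletIndexSketch {
    private final Map<String, Integer> rowIndex = new HashMap<>();

    // Record which array slot holds the DMRow for (blockId, blockletId).
    void put(String blockId, int blockletId, int arrayIndex) {
        rowIndex.put(blockId + "#" + blockletId, arrayIndex);
    }

    // Resolve the array slot for (blockId, blockletId), or null if unknown.
    Integer lookup(String blockId, int blockletId) {
        return rowIndex.get(blockId + "#" + blockletId);
    }
}
```

Keying by blockletId alone would make two blocks' blocklet 0 collide on the same entry; the composite key keeps them distinct.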
[jira] [Commented] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437058#comment-16437058 ] Ajantha Bhat commented on CARBONDATA-2313: -- Attached the design document. > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h1. Support unmanaged carbon table > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h1. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2313) Support unmanaged carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2313: - Description: h5. Support unmanaged carbon table (was: h1. Support unmanaged carbon table) > Support unmanaged carbon table > -- > > Key: CARBONDATA-2313 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2313 > Project: CarbonData > Issue Type: New Feature >Reporter: Ajantha Bhat >Priority: Major > Attachments: carbon unmanaged table desgin doc_V1.0.pdf > > Time Spent: 11h 40m > Remaining Estimate: 0h > > h5. Support unmanaged carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2113 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4436/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3777/ ---
[GitHub] carbondata issue #2113: [WIP][LUCENE_DATAMAP]load issue in lucene datamap, m...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2113 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4994/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3778/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3780/ ---
[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2149#discussion_r181342780

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/unsafe/UnsafeVariableLengthDimensionDataChunkStore.java ---
@@ -78,70 +88,96 @@ public UnsafeVariableLengthDimensionDataChunkStore(long totalSize, boolean isInv
     // start position will be used to store the current data position
     int startOffset = 0;
-    // position from where offsets will start
-    long pointerOffsets = this.dataPointersOffsets;
     // as first position will be start from 2 byte as data is stored first in the memory block
     // we need to skip first two bytes this is because first two bytes will be length of the data
     // which we have to skip
-    CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-        dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-        CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // incrementing the pointers as first value is already filled and as we are storing as int
-    // we need to increment the 4 bytes to set the position of the next value to set
-    pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+    int[] dataOffsets = new int[numberOfRows];
+    dataOffsets[0] = CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     // creating a byte buffer which will wrap the length of the row
-    // using byte buffer as unsafe will return bytes in little-endian encoding
-    ByteBuffer buffer = ByteBuffer.allocate(CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-    // store length of data
-    byte[] length = new byte[CarbonCommonConstants.SHORT_SIZE_IN_BYTE];
-    // as first offset is already stored, we need to start from the 2nd row in data array
+    ByteBuffer buffer = ByteBuffer.wrap(data);
     for (int i = 1; i < numberOfRows; i++) {
-      // first copy the length of previous row
-      CarbonUnsafe.getUnsafe().copyMemory(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + startOffset, length, CarbonUnsafe.BYTE_ARRAY_OFFSET,
-          CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      buffer.put(length);
-      buffer.flip();
+      buffer.position(startOffset);
      // so current row position will be
      // previous row length + 2 bytes used for storing previous row data
-      startOffset += CarbonCommonConstants.SHORT_SIZE_IN_BYTE + buffer.getShort();
+      startOffset += buffer.getShort() + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
      // as same byte buffer is used to avoid creating many byte buffer for each row
      // we need to clear the byte buffer
-      buffer.clear();
-      // now put the offset of current row, here we need to add 2 more bytes as current will
-      // also have length part so we have to skip length
-      CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
-          dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
-          startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
-      // incrementing the pointers as first value is already filled and as we are storing as int
-      // we need to increment the 4 bytes to set the position of the next value to set
-      pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
+      dataOffsets[i] = startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
     }
-
+    CarbonUnsafe.getUnsafe().copyMemory(dataOffsets, CarbonUnsafe.INT_ARRAY_OFFSET,
+        dataPageMemoryBlock.getBaseObject(),
+        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets,
+        dataOffsets.length * CarbonCommonConstants.INT_SIZE_IN_BYTE);
   }

   /**
    * Below method will be used to get the row based on row id passed
-   *
+   * Getting the row from unsafe works in below logic
+   * 1. if inverted index is present then get the row id based on reverse inverted index
+   * 2. get the current row id data offset
+   * 3. if it's not a last row - get the next row offset
+   *    Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
+   * 4. if it's last row
+   *    subtract the current row offset + 2 bytes(to skip the data length) with complete data length
    * @param rowId
    * @return row
    */
   @Override public byte[] getRow(int rowId) {
+    // get the actual row id
+    rowId = getRowId(rowId);
+    // get offset of data in unsafe
+    int currentDataOffset = getOffSet(rowId);
+    // get the data length
+    short length = getLength(rowId, currentDataOffset);
+    // create data array
+    byte[] data = new byte[length];
+    // fill the row data
+    fillRowInternal(length,
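Assuming the page layout implied by the new code in the PR #2149 diff — each row stored as a 2-byte length prefix followed by its bytes — the offset computation can be reproduced in a standalone sketch (hypothetical class name; `Short.BYTES` stands in for CarbonCommonConstants.SHORT_SIZE_IN_BYTE):

```java
import java.nio.ByteBuffer;

// Sketch of the offset computation discussed in the review above: the page is
// laid out as [short length][data][short length][data]..., and offsets[i] points
// at row i's data, just past its 2-byte length prefix.
public class VarLengthOffsets {
    static int[] computeOffsets(byte[] page, int numberOfRows) {
        int[] offsets = new int[numberOfRows];
        offsets[0] = Short.BYTES;          // first row's data starts after its prefix
        // ByteBuffer reads big-endian by default, matching the {0, 2} style prefixes here
        ByteBuffer buffer = ByteBuffer.wrap(page);
        int startOffset = 0;               // position of the current row's length prefix
        for (int i = 1; i < numberOfRows; i++) {
            buffer.position(startOffset);
            // advance past the previous row: its data length plus the 2-byte prefix
            startOffset += buffer.getShort() + Short.BYTES;
            offsets[i] = startOffset + Short.BYTES;
        }
        return offsets;
    }
}
```

For a page holding "ab" then "cde" (bytes 00 02 61 62 00 03 63 64 65), the offsets come out as 2 and 6, i.e. the positions of 'a' and 'c'.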
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3781/ ---
[jira] [Created] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
ocean created CARBONDATA-2345: - Summary: "Task failed while writing rows" error occurs when streaming ingest into carbondata table Key: CARBONDATA-2345 URL: https://issues.apache.org/jira/browse/CARBONDATA-2345 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.3.1 Reporter: ocean carbondata version: 1.3.1; spark: 2.2.1. When using Spark Structured Streaming to ingest data into a carbondata table, the following error occurs: warning: there was one deprecation warning; re-run with -deprecation for details qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at 
org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338) ... 8 more [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_. 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247) at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126) at org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164) at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186) at org.apache.carbondata.st
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4996/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occurs when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437140#comment-16437140 ] ocean commented on CARBONDATA-2345:
---
The stream source is a parquet file. The issue can be reproduced with this code:

val tableName = "profile_carbondata_stream2"
val pqtpath = "/test/stream"
val warehouse = new File("./warehouse").getCanonicalPath
val metastore = new File("./metastore").getCanonicalPath
val spark = SparkSession
  .builder()
  .appName("StreamExample")
  .config("spark.sql.warehouse.dir", warehouse)
  .getOrCreateCarbonSession(warehouse, metastore)
val carbonTable = CarbonEnv.getCarbonTable(Some("default"), tableName)(spark)
val tablePath = CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
var qry: StreamingQuery = null
val userSchema = spark.read.parquet(pqtpath).schema
val readSocketDF = spark.readStream.schema(userSchema).parquet(pqtpath)
// Write data from socket stream to carbondata file
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .outputMode("append")
  .start()
qry.awaitTermination()

> "Task failed while writing rows" error occurs when streaming ingest into carbondata table
> --
> Key: CARBONDATA-2345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2345
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.1
> Reporter: ocean
> Priority: Major
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2148 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4997/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4995/ ---
[jira] [Created] (CARBONDATA-2346) Dropping partition failing with null error for Partition table with Pre-Aggregate tables
Praveen M P created CARBONDATA-2346: --- Summary: Dropping partition failing with null error for Partition table with Pre-Aggregate tables Key: CARBONDATA-2346 URL: https://issues.apache.org/jira/browse/CARBONDATA-2346 Project: CarbonData Issue Type: Bug Reporter: Praveen M P Assignee: Praveen M P -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2161: [CARBONDATA-2218] AlluxioCarbonFile while trying to ...
Github user chandrasaripaka commented on the issue: https://github.com/apache/carbondata/pull/2161 @CarbonDataQA May I know if this has to be fixed from my side, as part of the pull request? Kindly advise. @xubo245 Also, I don't have access to resolve the conflicts and recommit. Please advise. ---
[GitHub] carbondata pull request #2170: [CARBONDATA-2346] Added fix for NULL error wh...
GitHub user praveenmeenakshi56 opened a pull request: https://github.com/apache/carbondata/pull/2170 [CARBONDATA-2346] Added fix for NULL error while dropping partition with multiple Pre-Aggregate tables Fixed null value issue for childcolumn - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. NA - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/praveenmeenakshi56/carbondata defect_part Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2170 commit dd3d3d1181847a1930048144740bfa053c878dd8 Author: praveenmeenakshi56 Date: 2018-04-13T10:31:35Z Added fix for error while dropping partition with multiple Pre-Aggregate tables ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3782/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2136 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4438/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4998/ ---
[GitHub] carbondata pull request #2171: [wip]test lucene sdv and UT in CI
GitHub user Indhumathi27 opened a pull request: https://github.com/apache/carbondata/pull/2171 [wip]test lucene sdv and UT in CI Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Indhumathi27/carbondata test_ci_luc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2171 commit 46b29dd2103156a5096a04cc72960dd5170e2d9a Author: Indhumathi27 Date: 2018-04-13T06:29:22Z Added UT & SDV Testcases for LuceneDataMap commit 1be3dfd26a96cfa123de403512d4d04121340aed Author: akashrn5 Date: 2018-03-29T14:29:36Z load issue in lucene datamap, make multiple directory based on taskId make the datamap distributable object based on lucene index path written during load Added Lucene Listener and Fixed Show Datamap ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5000/ ---
[jira] [Created] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
Akash R Nilugal created CARBONDATA-2347: --- Summary: Fix Functional issues in LuceneDatamap in load and query and make stable Key: CARBONDATA-2347 URL: https://issues.apache.org/jira/browse/CARBONDATA-2347 Project: CarbonData Issue Type: Bug Components: data-load, data-query Reporter: Akash R Nilugal Assignee: Akash R Nilugal -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3784/ ---
[jira] [Updated] (CARBONDATA-2347) Fix Functional issues in LuceneDatamap in load and query and make stable
[ https://issues.apache.org/jira/browse/CARBONDATA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2347:
Description:
1) The index write location for Lucene is the same for all tasks, and IndexWriter takes a lock file called write.lock in the write location while writing the index files. During a Carbon load, the writer tasks are launched in parallel, so that many writers are opened. Since the write.lock file is acquired by one writer, all other tasks fail and the data load fails.
2) On the query side, the Lucene index was read from a single path, but after the load fix there will be multiple index directories after load.
3) Functional issues in drop table, drop datamap, show datamap.
> Fix Functional issues in LuceneDatamap in load and query and make stable
> Key: CARBONDATA-2347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2347
> Project: CarbonData
> Issue Type: Bug
> Components: data-load, data-query
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Major
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
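The write.lock contention described in point 1 can be reproduced with plain JDK file locks, independent of Lucene or CarbonData. This is a minimal sketch, not the project's actual code: `tryAcquire` stands in for what Lucene's IndexWriter does when it grabs write.lock, and the per-task sub-directories stand in for the taskId-based directories that the fix introduces.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockDemo {

    // Stand-in for Lucene's IndexWriter lock: try to take an exclusive
    // lock on <dir>/write.lock; returns false if another writer holds it.
    public static boolean tryAcquire(Path dir) {
        try {
            Files.createDirectories(dir);
            FileChannel ch = FileChannel.open(dir.resolve("write.lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                FileLock lock = ch.tryLock();
                if (lock == null) { ch.close(); return false; }
                return true; // channel kept open on purpose: it holds the lock
            } catch (OverlappingFileLockException e) {
                ch.close();
                return false; // lock already held within this JVM
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static Path tempDir() {
        try { return Files.createTempDirectory("lucene-index"); }
        catch (IOException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        // Two "loader tasks" writing to the SAME index directory: only the
        // first wins the lock, so the second task fails -- the bug above.
        Path shared = tempDir();
        System.out.println("task1=" + tryAcquire(shared));  // true
        System.out.println("task2=" + tryAcquire(shared));  // false

        // The fix sketched in the issue: one index directory per task.
        System.out.println("perTask1=" + tryAcquire(shared.resolve("task_0")));  // true
        System.out.println("perTask2=" + tryAcquire(shared.resolve("task_1")));  // true
    }
}
```

Giving every writer its own directory sidesteps the lock entirely, which is why the multi-directory layout in point 2 then has to be handled on the query side.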
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3783/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4999/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2168 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4437/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2141 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2141 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4439/ ---
[GitHub] carbondata pull request #2172: [CARBONDATA-2333] Block insert overwrite if a...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2172 [CARBONDATA-2333] Block insert overwrite if all partition columns are not present in any one of the datamaps Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata preagg_partition_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2172.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2172 commit 5c53f555d3a46b1a0961b7eb82e7ba5df628e994 Author: kunal642 Date: 2018-04-11T11:22:08Z block insert overwrite if all partition columns are not present in any one of the datamaps ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5001/ ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2170 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3785/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3787/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5003/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4440/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2169 retest this please ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5004/ ---
[GitHub] carbondata issue #2141: [CARBONDATA-2313] Fixed SDK writer issues and added ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2141 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3788/ ---
[GitHub] carbondata issue #2166: [CARBONDATA-2341] Added Clean up of files for Pre-Ag...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2166 retest SDV please ---
[jira] [Commented] (CARBONDATA-2318) Remove invalid table name(.ds_store) of presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437283#comment-16437283 ] anubhav tarar commented on CARBONDATA-2318:
---
Hi, I tried again using the same steps you provided but was not able to replicate the issue.

Step 1: create a CarbonSession using spark-shell

val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("/home/anubhav/Documents/prestostore/data", "/home/anubhav/Documents/prestostore/metadata")

Step 2: copy the old carbondata store from /home/anubhav/Documents/carbondata/carbondata/examples/spark2/target/store/default to the new store location /home/anubhav/Documents/prestostore/data

Step 3: refresh the table

scala> carbon.sql("refresh table carbonsession_table").show
18/04/13 13:43:48 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Creating Table with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`carbonsession_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
18/04/13 13:43:49 AUDIT CarbonCreateTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table created with Database name [default] and Table name [carbonsession_table]
18/04/13 13:43:49 AUDIT RefreshCarbonTableCommand: [anubhav-Vostro-3559][anubhav][Thread-1]Table registration with Database name [default] and Table name [carbonsession_table] is successful.

Step 4: query the new store from Presto

./presto-cli-0.187-executable.jar --server localhost:9000 --catalog carbondata
presto> show tables from default;
Table
-
carbonsession_table
(1 row)
Query 20180413_080021_0_vev2q, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [1 rows, 36B] [0 rows/s, 22B/s]

> Remove invalid table name(.ds_store) of presto integration
> ---
> Key: CARBONDATA-2318
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2318
> Project: CarbonData
> Issue Type: Improvement
> Components: presto-integration
> Reporter: Liang Chen
> Priority: Minor
>
> For presto integration, you will get the invalid table name via "show tables from default", as below:
> presto:default> show tables from default;
> Table
>
> .ds_store
> carbon_table
> carbontable
> partition_bigtable
> partition_table
> (5 rows)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2169 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4441/ ---
[GitHub] carbondata issue #2148: [CARBONDATA-2323][WIP] Distributed search mode using...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2148 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4442/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5006/ ---
[GitHub] carbondata issue #2172: [CARBONDATA-2333] Block insert overwrite if all part...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2172 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3789/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2171 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3790/ ---
[GitHub] carbondata issue #2172: [CARBONDATA-2333] Block insert overwrite if all part...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2172 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5005/ ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181394048 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1642,6 +1642,16 @@ public static final String CARBON_SEARCH_MODE_THREAD_DEFAULT = "3"; + /** + * compression mode used by lucene for index writing + */ + public static final String CARBON_LUCENE_COMPRESSION_MODE = "carbon.lucene.compression.mode"; --- End diff -- what are the options available for this property? ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181396888 --- Diff: datamap/lucene/pom.xml --- @@ -141,6 +141,34 @@ + --- End diff -- I realize that in this pom, it should not depend on carbon-spark2, please modify the dependency in this pom ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181397988 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -173,6 +174,10 @@ object CarbonEnv { .addListener(classOf[AlterTableDropPartitionPostStatusEvent], AlterTableDropPartitionPostStatusListener) .addListener(classOf[AlterTableDropPartitionMetaEvent], AlterTableDropPartitionMetaListener) + .addListener(classOf[AlterTableRenamePreEvent], LuceneRenameTablePreListener) --- End diff -- Is this required? Ideally, lucene datamap is a separate module which should not have intrusive modification in other modules ---
[GitHub] carbondata pull request #2113: [CARBONDATA-2347][LUCENE_DATAMAP]load issue i...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2113#discussion_r181399425 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -69,11 +69,33 @@ case class CarbonCreateDataMapCommand( } dataMapSchema = new DataMapSchema(dataMapName, dmClassName) -if (mainTable != null && -mainTable.isStreamingTable && - !(dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString) - || dataMapSchema.getProviderName -.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) { +if (dataMapSchema.getProviderName.equalsIgnoreCase(DataMapClassProvider.LUCENEFG.toString) || --- End diff -- I think we should abstract an interface for this. We cannot add an if-check for every new datamap that is added. ---
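The abstraction being asked for could look like the following sketch. All names here (DataMapProvider, supportsStreamingTable, the two provider classes) are illustrative, not CarbonData's actual API: the point is that each provider declares its own capabilities, so the create-datamap command can validate generically instead of growing a provider-specific if-chain.

```java
// Hypothetical capability interface: each datamap provider answers for itself.
interface DataMapProvider {
    String name();
    boolean supportsStreamingTable();  // replaces the PREAGGREGATE/TIMESERIES/LUCENE if-checks
}

final class PreAggregateProvider implements DataMapProvider {
    public String name() { return "preaggregate"; }
    public boolean supportsStreamingTable() { return true; }
}

final class LuceneProvider implements DataMapProvider {
    public String name() { return "lucene"; }
    public boolean supportsStreamingTable() { return false; }
}

public class DataMapCheck {
    // Generic validation: no provider-specific branching needed here,
    // so adding a new datamap type never touches this code.
    public static void validate(DataMapProvider p, boolean streamingTable) {
        if (streamingTable && !p.supportsStreamingTable()) {
            throw new UnsupportedOperationException(
                "Datamap '" + p.name() + "' is not supported on streaming tables");
        }
    }

    public static void main(String[] args) {
        validate(new PreAggregateProvider(), true);   // accepted
        try {
            validate(new LuceneProvider(), true);     // rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```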
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2136 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2170: [CARBONDATA-2346] Added fix for NULL error while dro...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2170 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4443/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5007/ ---
[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2169 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3791/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3793/ ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/2097 retest this please ---
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5009/ ---
[GitHub] carbondata issue #2171: [wip]test lucene sdv and UT in CI
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2171 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests// ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2136 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3792/ ---
[GitHub] carbondata issue #2136: [CARBONDATA-2307] Fix OOM issue when using DataFrame...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2136 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5008/ ---
[jira] [Commented] (CARBONDATA-2345) "Task failed while writing rows" error occuers when streaming ingest into carbondata table
[ https://issues.apache.org/jira/browse/CARBONDATA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437420#comment-16437420 ] Zhichao Zhang commented on CARBONDATA-2345:

[~oceaneast], you need to add the option below to the 'writeStream' block:

{code:java}
.option(CarbonStreamParser.CARBON_STREAM_PARSER,
  CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER)
{code}

For example:

{code:java}
qry = readSocketDF.writeStream
  .format("carbondata")
  .trigger(ProcessingTime("20 seconds"))
  .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
  .option("dbName", "default")
  .option("tableName", tableName)
  .option(CarbonStreamParser.CARBON_STREAM_PARSER,
    CarbonStreamParser.CARBON_STREAM_PARSER_ROW_PARSER)
  .outputMode("append")
  .start()
{code}

Please try again.

> "Task failed while writing rows" error occurs when streaming ingest into
> carbondata table
> --
>
> Key: CARBONDATA-2345
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2345
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.3.1
> Reporter: ocean
> Priority: Major
>
> carbondata version: 1.3.1, spark: 2.2.1
> When using Spark structured streaming to ingest data into a carbondata table, the following error occurs:
> warning: there was one deprecation warning; re-run with -deprecation for details
> qry: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@7ddf193a
> [Stage 1:> (0 + 2) / 5]18/04/13 18:03:56 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:247)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply$mcV$sp$1.apply(CarbonAppendableStreamSink.scala:246)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.processing.loading.BadRecordsLogger.addBadRecordsToBuilder(BadRecordsLogger.java:126)
> at org.apache.carbondata.processing.loading.converter.impl.RowConverterImpl.convert(RowConverterImpl.java:164)
> at org.apache.carbondata.hadoop.streaming.CarbonStreamRecordWriter.write(CarbonStreamRecordWriter.java:186)
> at org.apache.carbondata.streaming.segment.StreamSegment.appendBatchData(StreamSegment.java:244)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:336)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileTask$1.apply(CarbonAppendableStreamSink.scala:326)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:338)
> ... 8 more
> [Stage 1:===> (1 + 2) / 5]18/04/13 18:03:57 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
> 18/04/13 18:03:57 ERROR CarbonAppendableStreamSink$: stream execution thread for [id = 3abdadea-65f6-4d94-8686-306fccae4559, runId = 689adf7e-a617-41d9-96bc-de075ce4dd73] Aborting job job_20180413180354_.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, sz-pg-entanalytics-research-004.tendcloud.com, executor 1): org.apache.carbondata.streaming.CarbonStreamException: Task failed while writing rows
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileTask(CarbonAppendableStreamSink.scala:345)
> at org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1$$anonfun$apply
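The `Caused by: java.lang.NullPointerException` at `BadRecordsLogger.addBadRecordsToBuilder` in the trace above is consistent with a row that trips the bad-records path while that path is not properly set up for the streaming writer, which is why switching to the row parser avoids it. A minimal Python sketch of that general failure mode (all class and method names here are illustrative analogues, not CarbonData's actual code):

```python
class BadRecordsLogger:
    """Toy stand-in for a bad-records logger that collects rejected rows."""
    def __init__(self):
        self.bad_rows = []

    def add(self, row, reason):
        self.bad_rows.append((row, reason))


class RowConverter:
    """Toy converter: routes rows with null fields to the bad-records path."""
    def __init__(self, bad_records_logger=None):
        # In the broken path, nothing wires a logger in, so this stays
        # None and the bad-records branch fails instead of logging.
        self.logger = bad_records_logger

    def convert(self, row):
        if any(v is None for v in row):
            if self.logger is None:
                # Python analogue of the Java NPE in the stack trace above.
                raise RuntimeError("bad-records logger was never initialized")
            self.logger.add(row, "null field")
            return None  # bad row is dropped, not written
        return row


# A properly wired converter survives a bad row; an unwired one crashes.
logger = BadRecordsLogger()
wired = RowConverter(logger)
assert wired.convert(("bob", None)) is None
assert logger.bad_rows == [(("bob", None), "null field")]
assert wired.convert(("bob", "21")) == ("bob", "21")
```

The sketch only illustrates why a parser mismatch surfaces as an NPE far downstream: the bad data itself is survivable, but the unconfigured logging path is not.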
[GitHub] carbondata issue #2097: [CARBONDATA-2275]Query Failed for 0 byte deletedelta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2097 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5010/ ---
[GitHub] carbondata issue #2168: [CARBONDATA-2343][DataMap]Improper filter resolver c...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2168 LGTM ---
[GitHub] carbondata pull request #2168: [CARBONDATA-2343][DataMap]Improper filter res...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2168 ---
[jira] [Resolved] (CARBONDATA-2343) Improper filter resolver causes more filter scan on data that could be skipped
[ https://issues.apache.org/jira/browse/CARBONDATA-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-2343.
--
Resolution: Fixed
Fix Version/s: 1.4.0

> Improper filter resolver causes more filter scan on data that could be skipped
> --
>
> Key: CARBONDATA-2343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2343
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In DataMapChooser, CarbonData tries to choose and combine datamaps for expressions. In some scenarios it generates a `TrueConditionalResolverImpl` to wrap a sub-expression, which causes a data scan on blocklets that could otherwise be skipped (for `TrueConditionalResolverImpl`, the `TrueFilterExecutor` always scans the data, even when it merely wraps a range expression).
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
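The pruning loss described in CARBONDATA-2343 can be illustrated with a toy model of blocklet min/max skipping. This is a hedged sketch in Python, not CarbonData's actual executor code; all names (`Blocklet`, `RangeFilter`, `TrueFilter`) are hypothetical. A range filter can skip any blocklet whose [min, max] column statistics fall entirely outside the predicate, while a "true" wrapper that always reports a possible match forces a scan of every blocklet, even though it merely delegates to the same range filter:

```python
class Blocklet:
    """Toy blocklet: holds rows plus min/max stats for one column."""
    def __init__(self, values):
        self.values = values
        self.min, self.max = min(values), max(values)

class RangeFilter:
    """Skips blocklets whose stats prove no row can match lo <= v <= hi."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def might_match(self, b):
        return b.max >= self.lo and b.min <= self.hi
    def scan(self, b):
        return [v for v in b.values if self.lo <= v <= self.hi]

class TrueFilter:
    """Wrapper that always claims a match, so no blocklet is ever skipped."""
    def __init__(self, inner):
        self.inner = inner
    def might_match(self, b):
        return True                 # the pruning loss the issue describes
    def scan(self, b):
        return self.inner.scan(b)

def query(filt, blocklets):
    """Returns (number of blocklets scanned, matching rows)."""
    scanned, rows = 0, []
    for b in blocklets:
        if filt.might_match(b):
            scanned += 1
            rows += filt.scan(b)
    return scanned, rows

data = [Blocklet([1, 5]), Blocklet([10, 20]), Blocklet([30, 40])]
rf = RangeFilter(12, 25)
assert query(rf, data) == (1, [20])               # two blocklets pruned
assert query(TrueFilter(rf), data) == (3, [20])   # same result, full scan
```

Both filters return identical rows; the wrapper only changes how much data is touched, which is exactly why the resolver choice matters for performance rather than correctness.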