[GitHub] incubator-carbondata issue #799: [CARBONDATA-929] adding optional field numb...

2017-04-16 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on the issue:

https://github.com/apache/incubator-carbondata/pull/799
  
The manual PR build for Spark 2.1 passed.
http://136.243.101.176:8080/job/ManualApacheCarbonPRBuilder2.1/115/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata issue #806: Docs for optimizing mass data loading

2017-04-16 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/806
  
Build failed with Spark 1.6.2. Please check CI:
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1653/





[GitHub] incubator-carbondata issue #805: [WIP] Cast Push Down Optimization

2017-04-16 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/805
  
Build succeeded with Spark 1.6.2. Please check CI:
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1652/





[GitHub] incubator-carbondata pull request #805: [WIP] Cast Push Down Optimization

2017-04-16 Thread sounakr
GitHub user sounakr opened a pull request:

https://github.com/apache/incubator-carbondata/pull/805

[WIP] Cast Push Down Optimization

Problem: When a filter expression contains casts, pushdown to the Carbon
layer was not happening.

Analysis: Cast expressions are handled in Spark, so whenever cast
expressions are present in the filter predicates, Carbon scans all of its
data and moves it to Spark for evaluation. This makes query processing slow.

Fix: Handle cast expressions in the Carbon layer. As of now this is handled
for Spark 2.1. A sketch of the kind of query affected follows below.
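
As a minimal sketch of the problem (the table and column names are
hypothetical, not taken from this PR): when the two sides of a comparison
have different types, Spark widens the narrower side, which can leave a
Cast wrapped around the column itself:

    // Hypothetical Spark 2.1 example; carbon_table and small_col are
    // made-up names, and `spark` is assumed to be a Spark session with
    // CarbonData support. small_col is stored as SMALLINT, so Spark
    // widens it to match the INT literal and plans the predicate as
    // Cast(small_col AS INT) = 5. A filter of the form
    // Cast(column) = literal was not translated into a Carbon filter,
    // so Carbon scanned the full table and handed every row to Spark
    // for evaluation.
    spark.sql("SELECT * FROM carbon_table WHERE small_col = 5").show()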

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata casts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/805.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #805


commit 2c65a98b5c181f144f3d9ff1033d01bbb604e567
Author: sounakr 
Date:   2017-04-04T09:46:46Z

Cast Push Down Optimization






[jira] [Commented] (CARBONDATA-906) Always OOM error when importing a large dataset (100 million rows)

2017-04-16 Thread Liang Chen (JIRA)

[ https://issues.apache.org/jira/browse/CARBONDATA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970356#comment-15970356 ]

Liang Chen commented on CARBONDATA-906:
---

[~crabo]

In total, did you use 20 nodes to load the data?
Please take the following two actions:

1. Configure the three parameters below in carbon.properties (note: copy the
latest carbon.properties to all nodes):

carbon.graph.rowset.size=10000   (the default is 100000; setting it to 1/10
reduces the rowset size exchanged within the data load graph)

carbon.number.of.cores.while.loading=5   (if each machine has 5 cores, set
this to 5; if each machine has 6 cores, set it to 6)

carbon.sort.size=50000   (the default is 500000; setting it to 1/10 reduces
the temp intermediate files)

2. For high-cardinality string columns, use DICTIONARY_EXCLUDE; excluding
such columns from dictionary encoding avoids building the large
reverse-dictionary cache where the reported OOM occurs (see the stack trace
in the quoted issue below). A DDL sketch follows after this list; you can
also refer to:
https://github.com/apache/incubator-carbondata/blob/master/docs/useful-tips-on-carbondata.md
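
As a minimal sketch of action 2 (the table and column names are
hypothetical, and `spark` is assumed to be a Spark session with CarbonData
support), the high-cardinality column is listed in the DICTIONARY_EXCLUDE
table property at creation time:

    // Hypothetical example: user_id is a high-cardinality string column.
    // Listing it in DICTIONARY_EXCLUDE stores it without dictionary
    // encoding, so no reverse-dictionary cache is built for it during
    // data load.
    spark.sql(
      """CREATE TABLE sample_table (user_id STRING, city STRING, sales INT)
        |STORED BY 'carbondata'
        |TBLPROPERTIES ('DICTIONARY_EXCLUDE'='user_id')""".stripMargin)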




> Always OOM error when importing a large dataset (100 million rows)
> ---
>
> Key: CARBONDATA-906
> URL: https://issues.apache.org/jira/browse/CARBONDATA-906
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.0.0-incubating
>Reporter: Crabo Yang
> Attachments: carbon.properties
>
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at 
> java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:457)
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>   at 
> org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.addDataToDictionaryMap(ColumnReverseDictionaryInfo.java:101)
>   at 
> org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.addDictionaryChunk(ColumnReverseDictionaryInfo.java:88)
>   at 
> org.apache.carbondata.core.cache.dictionary.DictionaryCacheLoaderImpl.fillDictionaryValuesAndAddToDictionaryChunks(DictionaryCacheLoaderImpl.java:113)
>   at 
> org.apache.carbondata.core.cache.dictionary.DictionaryCacheLoaderImpl.load(DictionaryCacheLoaderImpl.java:81)
>   at 
> org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.loadDictionaryData(AbstractDictionaryCache.java:236)
>   at 
> org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:186)
>   at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:174)
>   at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:67)
>   at 
> org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:38)
>   at 
> org.apache.carbondata.processing.newflow.converter.impl.DictionaryFieldConverterImpl.<init>(DictionaryFieldConverterImpl.java:92)
>   at 
> org.apache.carbondata.processing.newflow.converter.impl.FieldEncoderFactory.createFieldEncoder(FieldEncoderFactory.java:77)
>   at 
> org.apache.carbondata.processing.newflow.converter.impl.RowConverterImpl.initialize(RowConverterImpl.java:102)
>   at 
> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl.initialize(DataConverterProcessorStepImpl.java:69)
>   at 
> org.apache.carbondata.processing.newflow.steps.SortProcessorStepImpl.initialize(SortProcessorStepImpl.java:57)
>   at 
> org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.initialize(DataWriterProcessorStepImpl.java:79)
>   at 
> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:45)
>   at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:425)
>   at 
> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:383)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)