[jira] [Updated] (CARBONDATA-322) Integration with spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

[ https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-322:
-
Issue Type: New Feature (was: Improvement)

> Integration with spark 2.x
> 
>
> Key: CARBONDATA-322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-322
> Project: CarbonData
> Issue Type: New Feature
> Components: spark-integration
> Affects Versions: 0.2.0-incubating
> Reporter: Fei Wang
> Assignee: Fei Wang
> Fix For: 1.0.0-incubating
>
>
> Spark 2.0 has been released with many useful features, such as a more efficient 
> parser, vectorized execution, and adaptive execution, 
> so it would be good to integrate with Spark 2.x.
> The current integration, which supports up to Spark v1.6, is tightly coupled with Spark. We would 
> like to clean up the interface with the following design points in mind: 
> 1. Decouple from Spark; base the integration on Spark's v2 datasource API.
> 2. Enable a vectorized carbon reader.
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
> format.
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-322) Integration with spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

[ https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-322:
-
Description: 
Spark 2.0 has been released with many useful features, such as a more efficient 
parser, vectorized execution, and adaptive execution, 
so it would be good to integrate with Spark 2.x.

The current integration, which supports up to Spark v1.6, is tightly coupled with Spark. We would 
like to clean up the interface with the following design points in mind: 

1. Decouple from Spark; base the integration on Spark's v2 datasource API.
2. Enable a vectorized carbon reader.
3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
format.
...


  was:
Now that Spark 2.0 has been released, there are many nice features such as a more efficient 
parser, vectorized execution, and adaptive execution. 
It is good to integrate with Spark 2.x.

On the other hand, the Spark integration in CarbonData is currently heavily coupled with Spark 
code, and the code needs cleaning up. We should redesign the Spark integration so that it 
satisfies the following requirements:

1. Decoupled from Spark, integrating according to the Spark datasource API (V2)
2. This integration should support a vectorized carbon reader
3. Support writing to CarbonData from a DataFrame
...


Issue Type: Improvement (was: Bug)
Summary: Integration with spark 2.x (was: integrate spark 2.x)

> Integration with spark 2.x
> 
>
> Key: CARBONDATA-322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-322
> Project: CarbonData
> Issue Type: Improvement
> Components: spark-integration
> Affects Versions: 0.2.0-incubating
> Reporter: Fei Wang
> Assignee: Fei Wang
> Fix For: 1.0.0-incubating
>
>
> Spark 2.0 has been released with many useful features, such as a more efficient 
> parser, vectorized execution, and adaptive execution, 
> so it would be good to integrate with Spark 2.x.
> The current integration, which supports up to Spark v1.6, is tightly coupled with Spark. We would 
> like to clean up the interface with the following design points in mind: 
> 1. Decouple from Spark; base the integration on Spark's v2 datasource API.
> 2. Enable a vectorized carbon reader.
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
> format.
> ...


