[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610810#comment-16610810
 ] 

Reynold Xin commented on SPARK-19489:
-

We can close this now.




> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Priority: Major
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610798#comment-16610798
 ] 

Wes McKinney commented on SPARK-19489:
--

Since there's a native Rust library for Arrow in development it would be 
interesting to see a POC for UDFs written in Rust at some point

> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Priority: Major
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610668#comment-16610668
 ] 

Wenchen Fan commented on SPARK-19489:
-

[~rxin] do we still need this as we have pandas UDF now?

> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Priority: Major
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2017-02-08 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858577#comment-15858577
 ] 

Wes McKinney commented on SPARK-19489:
--

I'm really glad to see this is becoming a priority in 2017. 

In the same way that Google internally standardized on "RecordIO" and 
"ColumnIO" record-oriented and column-oriented serialization formats (and 
protocol buffers for anything not fitting those models), it may make sense to 
support both orientation-styles in a binary protocol to support different kinds 
of native code plugins. Spark SQL is internally record-oriented (but maybe 
in-memory columnar someday?), but native code plugins may be column-oriented. 

I've been helping lead efforts in Apache Arrow to have a stable 
lightweight/zero-copy column-oriented binary format for Python, R, and Java 
applications -- some of the results on integration with pandas and Parquet are 
encouraging:

* http://wesmckinney.com/blog/high-perf-arrow-to-pandas/
* http://wesmckinney.com/blog/arrow-streaming-columnar/
* http://wesmckinney.com/blog/python-parquet-update/

The initial Spark-Arrow work in SPARK-13534 is also promising, but having these 
kinds of fast IPC tools more deeply integrated into the Spark SQL execution 
engine (especially being able to collect results from task executors as 
serialized column batches) would unlock significantly higher performance. 

I'll be interested to learn more about the broader requirements of external 
serialization formats and the different types of use cases. 

> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2017-02-08 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858160#comment-15858160
 ] 

Takeshi Yamamuro commented on SPARK-19489:
--

Aha, I see.

> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2017-02-08 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858156#comment-15858156
 ] 

Reynold Xin commented on SPARK-19489:
-

The ticket doesn't dictate either.


> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2017-02-08 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858129#comment-15858129
 ] 

Takeshi Yamamuro commented on SPARK-19489:
--

This ticket means we need to make new formats for that purpose? Or, do we need 
to integrate Spark with existing efficient formats? If you mean the existing 
ones, do you have any candidate for now?

> Stable serialization format for external & native code integration
> --
>
> Key: SPARK-19489
> URL: https://issues.apache.org/jira/browse/SPARK-19489
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>
> As a Spark user, I want access to a (semi) stable serialization format that 
> is high performance so I can integrate Spark with my application written in 
> native code (C, C++, Rust, etc).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org