Re: Next Releases

2016-10-13 Thread Ted Yu
Spark 2.0.1 is in maven:

https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.0.1/

Time to roll a Bahir release for Spark 2.0.1?

On Mon, Sep 26, 2016 at 11:48 AM, Ted Yu  wrote:

> +1
>
> > On Sep 26, 2016, at 11:33 AM, Jean-Baptiste Onofré  wrote:
> >
> > +1
> >
> > Regards
> > JB
> >
> >> On 09/26/2016 08:32 PM, Luciano Resende wrote:
> >> Looks like Spark is working on 2.0.1 release (RC3 being voted now), and I
> >> was wondering if we should use this opportunity to create a Bahir for Spark
> >> 2.0.1 release and also a Bahir for Flink 1.0 release.
> >>
> >> Thoughts?
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>


[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster

2016-10-13 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573088#comment-15573088 ]

Steve Loughran commented on BAHIR-67:
-

This is very much a sibling of the SPARK-7481 patch, where I've been trying to 
add a module for dependencies and tests. Ignoring the problem of getting a 
webhdfs JAR into SPARK_HOME/jars, the tests in that module should cover what's 
needed, both in terms of operations (basic IO) and the more minimal 
classpath/config checking.

I think you could bring up a MiniDFS cluster in webhdfs mode, and so have a 
functional test of things.
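A functional test along the lines suggested above would normally spin up Hadoop's MiniDFSCluster (a Java API) with webhdfs enabled. As a language-neutral sketch of the same idea, the snippet below stands up a stub HTTP endpoint that answers a WebHDFS-style `GETFILESTATUS` call, so client code can be exercised without a real cluster; the path, filename, and response fields are illustrative, not taken from the issue.

```python
# Minimal stub of a WebHDFS endpoint for a functional test (sketch only;
# a real Bahir test would use Hadoop's MiniDFSCluster instead).
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubWebHdfsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Real WebHDFS serves JSON responses under /webhdfs/v1/<path>?op=...
        body = json.dumps({"FileStatus": {"type": "FILE", "length": 42}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep test output quiet

# Bind to an ephemeral port and serve from a background thread.
server = HTTPServer(("127.0.0.1", 0), StubWebHdfsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

url = f"http://127.0.0.1:{port}/webhdfs/v1/tmp/f.txt?op=GETFILESTATUS"
with urllib.request.urlopen(url) as resp:
    status = json.load(resp)["FileStatus"]
server.shutdown()
```

The point of such a test is only classpath/IO plumbing; correctness against a real HDFS still needs the minicluster.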

> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> ---
>
> Key: BAHIR-67
> URL: https://issues.apache.org/jira/browse/BAHIR-67
> Project: Bahir
>  Issue Type: Improvement
>  Components: Spark SQL Data Sources
>Reporter: Sourav Mazumder
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In today's world of analytics, many use cases need the capability to access 
> data from multiple remote data sources in Spark. Though Spark integrates well 
> with a local Hadoop cluster, it largely lacks the capability to connect to a 
> remote Hadoop cluster. In reality, not all enterprise data is in Hadoop, and 
> running the Spark cluster co-located with the Hadoop cluster is not always a 
> solution.
> In this improvement we propose to create a connector for reading and writing 
> data from/to HDFS of a remote Hadoop cluster from Spark, using the webhdfs 
> API.
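To make the proposal concrete: the WebHDFS REST API addresses files under `/webhdfs/v1/<path>` with an `op` query parameter (reads via `op=OPEN`, writes via `op=CREATE`, with the namenode redirecting the client to a datanode). A minimal sketch of the URLs such a connector would build, with hostname, port, and paths as placeholders:

```python
# Sketch of WebHDFS REST URL construction (host/port/paths are placeholders).
from urllib.parse import urlencode

WEBHDFS_PREFIX = "/webhdfs/v1"

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given file-system operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}{WEBHDFS_PREFIX}{path}?{query}"

read_url = webhdfs_url("remote-nn.example.com", 50070, "/data/in.csv", "OPEN")
write_url = webhdfs_url("remote-nn.example.com", 50070, "/data/out.csv",
                        "CREATE", overwrite="true")
```

The connector's real work would be wiring these calls into Spark's data source interfaces, plus handling the datanode redirects and authentication.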



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster

2016-10-13 Thread Luciano Resende (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572953#comment-15572953 ]

Luciano Resende commented on BAHIR-67:
--

Thanks [~sourav-mazumder], it would be great to enable high-level SQL APIs to 
go over remote webhdfs, which would also help in multi-cluster environments or 
cloud/hybrid-cloud environments. Are you planning to submit a PR for this?
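The point about high-level SQL APIs suggests the connector would surface remote files through a URI scheme that Spark's readers could consume. Purely as an illustration (the `webhdfs://` scheme, hostname, and path here are hypothetical, not specified in the issue), such a URI would decompose into the pieces the data source needs:

```python
# Hypothetical: splitting a webhdfs:// URI into connector inputs.
from urllib.parse import urlparse

def parse_webhdfs_uri(uri):
    """Extract host, port, and HDFS path from a webhdfs:// URI (sketch)."""
    parsed = urlparse(uri)
    if parsed.scheme != "webhdfs":
        raise ValueError(f"expected webhdfs:// URI, got {uri!r}")
    return {"host": parsed.hostname, "port": parsed.port, "path": parsed.path}

parts = parse_webhdfs_uri("webhdfs://remote-nn.example.com:50070/data/events")
```

With that in place, a Spark SQL user could point `spark.read` at such a URI and never touch the REST layer directly.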

> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> ---
>
> Key: BAHIR-67
> URL: https://issues.apache.org/jira/browse/BAHIR-67
> Project: Bahir
>  Issue Type: Improvement
>  Components: Spark SQL Data Sources
>Affects Versions: Not Applicable
>Reporter: Sourav Mazumder
> Fix For: Spark-2.0.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In today's world of analytics, many use cases need the capability to access 
> data from multiple remote data sources in Spark. Though Spark integrates well 
> with a local Hadoop cluster, it largely lacks the capability to connect to a 
> remote Hadoop cluster. In reality, not all enterprise data is in Hadoop, and 
> running the Spark cluster co-located with the Hadoop cluster is not always a 
> solution.
> In this improvement we propose to create a connector for reading and writing 
> data from/to HDFS of a remote Hadoop cluster from Spark, using the webhdfs 
> API.





[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster

2016-10-13 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572371#comment-15572371 ]

Steve Loughran commented on BAHIR-67:
-

Is this really just a matter of getting Hadoop webhdfs on the classpath?

> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> ---
>
> Key: BAHIR-67
> URL: https://issues.apache.org/jira/browse/BAHIR-67
> Project: Bahir
>  Issue Type: Improvement
>  Components: Spark SQL Data Sources
>Affects Versions: Not Applicable
>Reporter: Sourav Mazumder
> Fix For: Spark-2.0.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In today's world of analytics, many use cases need the capability to access 
> data from multiple remote data sources in Spark. Though Spark integrates well 
> with a local Hadoop cluster, it largely lacks the capability to connect to a 
> remote Hadoop cluster. In reality, not all enterprise data is in Hadoop, and 
> running the Spark cluster co-located with the Hadoop cluster is not always a 
> solution.
> In this improvement we propose to create a connector for reading and writing 
> data from/to HDFS of a remote Hadoop cluster from Spark, using the webhdfs 
> API.


