[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573088#comment-15573088 ]

Steve Loughran commented on BAHIR-67:
-------------------------------------

This is very much a sibling of the SPARK-7481 patch, where I've been trying to add a module for dependencies and tests. Ignoring the problem of getting a webhdfs JAR into SPARK_HOME/jars, the tests in that module should cover what's needed, both in terms of operations (basic IO) and the more minimal classpath/config checking. I think you could bring up a MiniDFS cluster in webhdfs mode, and so have a functional test of things.

> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> ---------------------------------------------------------------------------
>
>                 Key: BAHIR-67
>                 URL: https://issues.apache.org/jira/browse/BAHIR-67
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark SQL Data Sources
>    Affects Versions: Not Applicable
>            Reporter: Sourav Mazumder
>             Fix For: Spark-2.0.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In today's world of analytics, many use cases need the capability to access
> data from multiple remote data sources in Spark. Though Spark integrates
> well with its local Hadoop cluster, it lacks the capability to connect to a
> remote Hadoop cluster. In reality, however, not all enterprise data resides
> in Hadoop, and running the Spark cluster co-located with the Hadoop cluster
> is not always a solution.
> In this improvement we propose to create a connector for accessing data
> (read and write) from/to HDFS of a remote Hadoop cluster from Spark using
> the webhdfs API.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
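[Editor's note] The webhdfs API the proposal refers to is a plain HTTP REST protocol, so the shape of such a connector can be sketched without any Hadoop client jars. A minimal Python sketch of the URL construction (the namenode host, port, and file paths below are made up; the URL shapes follow the WebHDFS v1 REST API, e.g. `?op=OPEN` for reads and the two-step `?op=CREATE` redirect for writes):

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS v1 REST URL for the given operation.

    WebHDFS exposes HDFS over HTTP: a GET with ?op=OPEN reads a file,
    a PUT with ?op=CREATE starts a write (the namenode answers with a
    307 redirect to a datanode, where the file bytes are then PUT).
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# Read: a single GET against the namenode, which redirects to a datanode.
read_url = webhdfs_url("namenode.example.com", 50070, "/data/events.csv", "OPEN")

# Write: step 1 of the two-step CREATE; the redirect target receives the bytes.
write_url = webhdfs_url("namenode.example.com", 50070, "/data/out.csv",
                        "CREATE", overwrite="true")
```

A connector built this way needs only HTTP connectivity to the remote cluster's namenode and datanodes, which is exactly what makes it attractive for the remote-cluster case described above.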
[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572953#comment-15572953 ]

Luciano Resende commented on BAHIR-67:
--------------------------------------

Thanks [~sourav-mazumder], it would be great to enable high-level SQL APIs to go over remote webhdfs, which can also help in multi-cluster or cloud/hybrid-cloud environments. Are you planning to submit a PR for this?
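[Editor's note] For the high-level SQL APIs mentioned above, Spark itself needs nothing more than a `webhdfs://` URI once a WebHdfsFileSystem implementation is on the classpath. A small Python sketch of that path plumbing (the remote namenode host, HTTP port, and file path are hypothetical; the `spark.read` call in the comment assumes a standard pyspark session):

```python
def webhdfs_uri(host, http_port, path):
    """Form the webhdfs:// URI that Spark's data source APIs would be handed.

    The webhdfs scheme resolves to Hadoop's WebHdfsFileSystem when the
    relevant hadoop-hdfs jar is on the driver/executor classpath, so the
    SQL-level read/write code is unchanged from the local-HDFS case.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS paths are absolute")
    return f"webhdfs://{host}:{http_port}{path}"

uri = webhdfs_uri("remote-nn.example.com", 50070, "/warehouse/events.parquet")
# With pyspark available, the SQL APIs then work as usual, e.g.:
#   df = spark.read.parquet(uri)
#   df.createOrReplaceTempView("events")
```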
[jira] [Commented] (BAHIR-67) Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
[ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572371#comment-15572371 ]

Steve Loughran commented on BAHIR-67:
-------------------------------------

Is this really just a matter of getting hadoop webhdfs on the CP?