Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
Hi Michael,

Regarding this new data sources API: what types of data sources will it support?
Does the source necessarily have to be an RDBMS?

Cheers

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44


Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Michael Armbrust
No, it should support any data source that has a schema and can produce
rows.
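
As an illustration of "has a schema and can produce rows", here is a minimal sketch of a relation backed by nothing more than an in-memory list, written against the Spark 1.2 data sources API. The class name and sample data are hypothetical, and the import locations for the SQL types moved in later releases, so treat this as a sketch rather than the exact surface of the API:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._                  // SQLContext, Row, StructType, StructField, StringType, IntegerType in Spark 1.2
import org.apache.spark.sql.sources.TableScan  // data sources API added in Spark 1.2

// Hypothetical relation over an in-memory list; no RDBMS anywhere.
// The API only asks for a schema and a way to produce rows.
class InMemoryPeopleRelation(val sqlContext: SQLContext) extends TableScan {

  // The schema of the rows this source produces.
  // (In releases after 1.2 these types live in org.apache.spark.sql.types.)
  override def schema: StructType = StructType(Seq(
    StructField("name", StringType, false),
    StructField("age", IntegerType, false)))

  // Produce one Row per record, matching the schema above. Any backing
  // store or remote API could stand in for the hard-coded list.
  override def buildScan(): RDD[Row] = {
    val people = Seq(("alice", 30), ("bob", 25))
    sqlContext.sparkContext.parallelize(people).map { case (name, age) => Row(name, age) }
  }
}
```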


Re: Creating a SchemaRDD from an existing API

2014-11-28 Thread Michael Armbrust
You probably don't need to create a new kind of SchemaRDD. Instead, I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2. There is not a ton of documentation, but the test cases show how to
implement the various interfaces
(https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources),
and there is an example library for reading Avro data
(https://github.com/databricks/spark-avro).
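
To make the shape of those interfaces a bit more concrete: a source is typically wired up through a small RelationProvider whose createRelation builds a BaseRelation (for example, a TableScan like the in-memory sketch earlier in this thread), and the result can then be registered and queried from SQL. The class name and comments below are a sketch against the Spark 1.2 form of the API, not a definitive implementation:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

// Hypothetical provider class. Spark instantiates it when the data source is
// referenced with USING, and passes the OPTIONS clause in as a string map.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // parameters would normally carry connection details for the backing API;
    // the in-memory sketch above simply ignores them.
    new InMemoryPeopleRelation(sqlContext)
  }
}
```

Registering and querying it then looks roughly like this, assuming DefaultSource lives in a hypothetical package named org.example.inmemory:

```scala
// Spark 1.2-style DDL: USING points at the package containing DefaultSource,
// and each OPTIONS pair shows up in the provider's parameters map.
sqlContext.sql(
  """CREATE TEMPORARY TABLE people
    |USING org.example.inmemory
    |OPTIONS (someKey "someValue")""".stripMargin)

sqlContext.sql("SELECT name FROM people WHERE age > 26").collect()
```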


Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi,

I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.

So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].

This API exposes the elements of a database as data sources. Using the methods
provided by such a data source, we can access and edit the data.

So, I want to create a custom SchemaRDD using the methods and provisions of
this API. I tried going through the Spark documentation and the Javadocs, but
unfortunately I was unable to reach a firm conclusion on whether this is
actually possible.

I would like to ask the Spark devs:
1. As of the current Spark release, can we make a custom SchemaRDD? (A sketch
of the closest existing mechanism follows after this list.)
2. What is the extension point for a custom SchemaRDD? Are there particular
interfaces to implement?
3. Could you please point me to the specific docs regarding this matter?
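
For reference on question 1: the closest thing to a hand-built SchemaRDD in the 1.1/1.2 releases is applySchema, which pairs an ordinary RDD of Rows with a StructType. The names and values below are made up, so take it as a sketch; it does not answer the extension-point question, which the data sources API discussed in the replies above is meant to cover.

```scala
import org.apache.spark.sql._  // SQLContext, Row, StructType, StructField, StringType, IntegerType in Spark 1.2

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

// Rows pulled from any backing API; the values here are made up.
val rowRDD = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

// Schema matching those rows.
val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// applySchema (Spark 1.1/1.2) yields a SchemaRDD that can be registered and queried.
val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").collect()
```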

Your help in this regard is highly appreciated.

Cheers

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44