Re: redshift spark

2015-06-17 Thread Xiangrui Meng
Hi Hafiz,

As Ewan mentioned, the path is the path to the S3 files unloaded from
Redshift. This is a more scalable way to get a large amount of data
from Redshift than via JDBC. I'd recommend using the SQL API instead
of the Hadoop API (https://github.com/databricks/spark-redshift).

Best,
Xiangrui
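
[Editor's note: a minimal sketch of the SQL API usage Xiangrui recommends, based on the spark-redshift README of that era. The JDBC URL, table name, and tempdir bucket below are placeholders, not values from this thread, and running it requires a live Redshift cluster plus an S3 staging bucket:]

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`. All connection values below
// are placeholders; substitute your own cluster endpoint and bucket.
val sqlContext = new SQLContext(sc)

val df = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass") // placeholder
  .option("dbtable", "venue")                // table to read
  .option("tempdir", "s3n://my-bucket/tmp/") // S3 staging area for UNLOAD output
  .load()
```

[For reads, the library stages data by issuing an UNLOAD to tempdir and then loading the resulting S3 files, which is the scalable path described above.]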

On Fri, Jun 5, 2015 at 7:29 AM, Ewan Leith  wrote:
> That project is for reading data in from Redshift table exports stored in s3 
> by running commands in redshift like this:
>
> unload ('select * from venue')
> to 's3://mybucket/tickit/unload/'
>
> http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html
>
> The path in the parameters below is the s3 bucket path.
>
> Hope this helps,
> Ewan
>
> -Original Message-
> From: Hafiz Mujadid [mailto:hafizmujadi...@gmail.com]
> Sent: 05 June 2015 15:25
> To: user@spark.apache.org
> Subject: redshift spark
>
> Hi All,
>
> I want to read and write data to AWS Redshift. I found the spark-redshift 
> project at the following address:
> https://github.com/databricks/spark-redshift
>
> In its documentation, the following code is given:
> import com.databricks.spark.redshift.RedshiftInputFormat
>
> val records = sc.newAPIHadoopFile(
>   path,
>   classOf[RedshiftInputFormat],
>   classOf[java.lang.Long],
>   classOf[Array[String]])
>
> I am unable to understand its parameters. Can somebody explain how to use 
> this? What is meant by path in this case?
>
> thanks
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/redshift-spark-tp23175.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
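
[Editor's note: the S3 files produced by UNLOAD, as discussed above, are plain delimited text, with '|' as the default field delimiter unless DELIMITER is specified. A self-contained sketch of splitting one such line; the sample row is made up, loosely modeled on the "venue" table from the AWS docs example:]

```scala
object UnloadRecordParser {
  // Redshift UNLOAD writes plain delimited text to S3; '|' is the
  // default field delimiter unless DELIMITER is specified.
  val Delimiter = '|'

  // Split one unloaded line into its column values. This simple split
  // does not handle the ESCAPE option; it is only a sketch.
  def parse(line: String): Array[String] = line.split(Delimiter)
}

// Hypothetical unloaded row (made-up data).
val fields = UnloadRecordParser.parse("7|Ballpark|Durham|NC|29700")
// fields: Array(7, Ballpark, Durham, NC, 29700)
```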


