Hi guys,
We ultimately needed to add 8 EC2 xlarge instances to get better performance. As was
suspected, we could not fit all the data into RAM.
This worked great with files around 100-350MB in size, as produced by our initial
export task. Unfortunately, for the partition settings that we
were able to ge
I'll try this out and follow up with what I find.
On Fri, Nov 14, 2014 at 8:54 PM, Xiangrui Meng wrote:
> For each node, if the CSV reader is implemented efficiently, you should be
> able to hit at least half of the theoretical network bandwidth, which is
> about 60MB/second/node. So if you just
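(As a rough back-of-the-envelope using the figures elsewhere in this thread: at
60MB/second/node across 12 nodes, the aggregate is about 720MB/s, so a full scan
of 100GB of text should take on the order of 100,000MB / 720MB/s ≈ 140 seconds.)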
Hmm, we actually read the CSV data from S3 now and were looking to avoid
that. Unfortunately, we've experienced dreadful performance reading 100GB
of text data for a job directly from S3; our hope had been that connecting
directly to Redshift would provide some boost.
We had been using 12 m3.xlarges, b
I'd guess that it's an s3n://key:secret_key@bucket/path from the UNLOAD
command used to produce the data. Xiangrui can correct me if I'm wrong
though.
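In case it helps, a rough sketch of how that path would get fed to the reader.
The import path and the key/value classes below are my guesses from the
project's README and should be treated as assumptions to verify; only
sc.newAPIHadoopFile itself is standard Spark:

    import com.databricks.examples.redshift.input.RedshiftInputFormat

    // Path in the shape the UNLOAD produced; key, secret_key, bucket and path
    // are placeholders.
    val path = "s3n://key:secret_key@bucket/path"

    // Each record comes back keyed by its position in the file, with the row
    // already split into fields (key/value classes assumed from the README).
    val records = sc.newAPIHadoopFile(
      path,
      classOf[RedshiftInputFormat],
      classOf[java.lang.Long],
      classOf[Array[String]])

    println(records.count())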
On Fri, Nov 14, 2014 at 2:19 PM, Gary Malouf wrote:
> We have a bunch of data in RedShift tables that we'd like to pull in
> during job runs to S
We have a bunch of data in Redshift tables that we'd like to pull into Spark
during job runs. What is the path/URL format one uses to pull data from
there? (This is in reference to using
https://github.com/mengxr/redshift-input-format)
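For the other half of the workflow, here is a hedged sketch of the unload step
that produces those S3 files. The endpoint, database, table and credential
strings are placeholders, and the ESCAPE option is something I believe the
input format expects (worth confirming in the README):

    import java.sql.DriverManager

    // Redshift speaks the PostgreSQL wire protocol, so the stock Postgres JDBC
    // driver can issue the UNLOAD.
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://your-cluster.redshift.amazonaws.com:5439/yourdb",
      "user", "password")
    val stmt = conn.createStatement()

    // UNLOAD writes the query result to S3 as delimited text; ESCAPE keeps
    // embedded delimiters and newlines parseable downstream.
    stmt.execute("""
      UNLOAD ('select * from your_table')
      TO 's3://bucket/path/part_'
      CREDENTIALS 'aws_access_key_id=key;aws_secret_access_key=secret_key'
      ESCAPE
    """)

    conn.close()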