Re: Leveraging S3 select

2017-12-08 Thread Andrew Duffy
Hey Steve, Happen to have a link to the TPC-DS benchmark data w/random S3 reads? I've done a decent amount of digging, but all I've found is a reference in a slide deck and some JIRA tickets. From: Steve Loughran Date: Tuesday, December 5, 2017 at 09:44 To: "Lalwani, Jayesh" Cc: Apache Spark

master snapshots not publishing?

2016-07-21 Thread Andrew Duffy
I’m trying to use a Snapshot build off of master, and after looking through Jenkins it appears that the last commit where the snapshot was built is back on 757dc2c09d23400dacac22e51f52062bbe471136, 22 days ago: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-sna

Re: master snapshots not publishing?

2016-07-21 Thread Andrew Duffy
the 2.0.0-SNAPSHOTS which are > generated off of branch-2.0. > > I can go ahead and re-enable it later today. > > On Thu, Jul 21, 2016 at 11:10 AM Andrew Duffy wrote: > >> I’m trying to use a Snapshot build off of master, and after looking >> through Jenkins it appears tha

Re: What's the use of RangePartitioner.hashCode

2016-09-21 Thread Andrew Duffy
Pedantic note about hashCode and equals: the equality doesn't need to be bidirectional. You just need to ensure that a.hashCode == b.hashCode when a.equals(b); the bidirectional case is usually harder to satisfy due to the possibility of collisions. Good info: http://www.programcreek.com/2011/07/j
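
A minimal Java sketch of the one-directional contract described in this message. The `Range` class and the `equalImpliesSameHash` helper are hypothetical names made up for illustration; they are not taken from Spark or from RangePartitioner. The two strings are a well-known pair that collide under `String.hashCode` without being equal.

```java
import java.util.Objects;

public class HashContractSketch {
    // Hypothetical value class, for illustration only (not Spark code).
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Range)) return false;
            Range other = (Range) o;
            return lo == other.lo && hi == other.hi;
        }

        // Built from the same fields as equals, so equal objects hash alike.
        @Override
        public int hashCode() {
            return Objects.hash(lo, hi);
        }
    }

    // The required direction: a.equals(b) implies a.hashCode() == b.hashCode().
    static boolean equalImpliesSameHash(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Range a = new Range(0, 10);
        Range b = new Range(0, 10);
        System.out.println(equalImpliesSameHash(a, b));

        // The reverse direction is not required: these strings share a
        // hash code but are not equal.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println("Aa".equals("BB"));
    }
}
```

Deriving `hashCode` from exactly the fields that `equals` compares is one simple way to keep the required direction true; a collision such as the string pair above is allowed and only costs some hash-table efficiency.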

Re: Broadcast big dataset

2016-09-28 Thread Andrew Duffy
Have you tried upping executor memory? There's a separate Spark conf for that: spark.executor.memory. In general, driver configurations don't automatically apply to executors. On Wed, Sep 28, 2016 at 7:03 AM -0700, "WangJianfei" wrote: Hi Devs In my application, i just broadcast a

Re: Support for local disk columnar storage for DataFrames

2015-11-12 Thread Andrew Duffy
Relevant link: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files On Wed, Nov 11, 2015 at 7:31 PM, Reynold Xin wrote: > Thanks for the email. Can you explain what the difference is between this > and existing formats such as Parquet/ORC? > > > On Wed, Nov 11, 2015 at 4: