Re: [discuss] making SparkEnv private in Spark 2.0

2016-03-19 Thread Mridul Muralidharan
We use it in executors to get to: a) the Spark conf (for getting at the Hadoop config in a map doing custom writing of side-files), and b) the shuffle manager (to get a shuffle reader). Not sure if there are alternative ways to get to these. Regards, Mridul On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin wrote: > Any
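
A minimal sketch of that pattern, assuming Spark 1.x where SparkEnv is still public; the RDD, app name, and side-file logic are illustrative only:

    import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

    object SparkEnvOnExecutor {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sparkenv-demo"))
        sc.parallelize(1 to 4, 2).foreachPartition { _ =>
          val env = SparkEnv.get                  // executor-local environment
          val conf = env.conf                     // SparkConf, e.g. to rebuild a Hadoop Configuration
          val shuffleManager = env.shuffleManager // entry point for obtaining a shuffle reader
          println(s"app=${conf.get("spark.app.name")} shuffleManager=$shuffleManager")
        }
        sc.stop()
      }
    }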

Re: help coercing types

2016-03-19 Thread Jacek Laskowski
Hi, Just a side question: why do you convert the DataFrame to an RDD? It's like driving backwards (possible, but ineffective and dangerous at times). P.S. I'd even go for Dataset. Jacek 18.03.2016 5:20 PM "Bauer, Robert" wrote: > I have data that I pull in using a sql context and then I convert t
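
A small sketch of the suggestion, assuming a Spark 2.x SparkSession and a hypothetical Record schema and input file; the point is to stay on the typed Dataset path rather than dropping down to an RDD:

    import org.apache.spark.sql.SparkSession

    case class Record(id: Long, amount: Double)   // hypothetical schema

    object DatasetNotRdd {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("dataset-demo").getOrCreate()
        import spark.implicits._

        val df = spark.read.json("records.json")    // DataFrame, as from a sql context
        val ds = df.as[Record]                      // typed Dataset: no .rdd round-trip
        val total = ds.map(_.amount).reduce(_ + _)  // typed ops stay on the optimizer path
        println(s"total = $total")
        spark.stop()
      }
    }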

Re: sql timestamp timezone bug

2016-03-19 Thread Andy Davidson
For completeness: clearly Spark SQL returned a different data set. In [4]: rawDF.selectExpr("count(row_key) as num_samples", "sum(count) as total", "max(count) as max").show() +---++-+ |num_samples|total|max| +---

Re: sql timestamp timezone bug

2016-03-19 Thread Andy Davidson
Hi Davies, > > What's the type of `created`? TimestampType? The `created` column in Cassandra is a timestamp: https://docs.datastax.com/en/cql/3.0/cql/cql_reference/timestamp_type_r.html In the Spark data frame it is a timestamp. > > If yes, when created is compared to a string, it will be c
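
A sketch of the cast behaviour Davies is pointing at, with hypothetical column and file names and assuming a Spark 2.x SparkSession: when a TimestampType column is compared to a string, the string is parsed in the JVM's default time zone, which is where the apparent timezone shift comes from:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object TimestampVsString {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("timestamp-demo").getOrCreate()
        val rawDF = spark.read.parquet("samples.parquet")  // has a TimestampType column `created`

        // The string literal is cast to a timestamp using the JVM's default time zone:
        rawDF.filter(col("created") >= "2016-03-18 00:00:00").show()

        // Making the zone explicit removes the ambiguity:
        rawDF.selectExpr("to_utc_timestamp(created, 'America/Los_Angeles') AS created_utc").show()
        spark.stop()
      }
    }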

RE: Extra libs in executor classpath

2016-03-19 Thread Silvio Fiorito
Could you publish it as a library (to an internal repo)? Then you can simply use the "--packages" option. It will also help with versioning as you make changes; that way you're not having to manually ship JARs around to your machines and users. From: Леонид Поляков Sent:
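
A hedged sketch of what that looks like on the command line (the Maven coordinates, repo URL, and class name are made up); spark-submit resolves the artifact and its dependencies from the repository instead of you copying JARs by hand:

    spark-submit \
      --packages com.example:my-utils_2.10:1.0.0 \
      --repositories https://repo.example.com/releases \
      --class com.example.Main \
      app.jar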

Re: Apache Beam Spark runner

2016-03-19 Thread Jean-Baptiste Onofré
Hi Amit, well done ;) I'm reviewing it now (as I didn't have to do it before, sorry about that). Regards JB On 03/17/2016 06:26 PM, Sela, Amit wrote: Hi all, The Apache Beam Spark runner is now available at: https://github.com/apache/incubator-beam/tree/master/runners/spark Check it out! The

bug spark should not use java.sql.timestamp was: sql timestamp timezone bug

2016-03-19 Thread Andy Davidson
Here is a nice analysis of the issue from the Cassandra mailing list. (DataStax is the Databricks for Cassandra.) Should I file a bug? Kind regards, Andy http://stackoverflow.com/questions/2305973/java-util-date-vs-java-sql-date and this one On Fri, Mar 18, 2016 at 11:35 AM Russell Spitzer wrote:

Re: spark launching range is 10 mins

2016-03-19 Thread Jialin Liu
Hi, I have set the partitions to 6000 and requested 100 nodes with 32 cores each; the number of executors is 32 per node. spark-submit --master $SPARKURL --executor-cores 32 --driver-memory 20G --executor-memory 80G single-file-test.py And I'm reading 2.2 TB; the code just has simpl
