Quick question about hive-exec 1.2.1.spark2

2016-08-03 Thread Tao Li
Hi, The spark-hive module has a dependency on the hive-exec module (a custom-built module from the "Hive on Spark" project). Can someone point me to the source code repo of the hive-exec module? Thanks. Here is the Maven repo link:
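
For context, a minimal sbt sketch of the dependency being asked about; the coordinates below are assumed and should be verified against the Maven repo link referred to above:

    // build.sbt (sketch): the forked hive-exec artifact that spark-hive depends on.
    // groupId/artifactId/version are assumed; check the POM in the Maven repo above.
    libraryDependencies += "org.spark-project.hive" % "hive-exec" % "1.2.1.spark2"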

Re: AccumulatorV2 += operator

2016-08-03 Thread Holden Karau
Ah, in that case the programming guide's text is still talking about the deprecated accumulator API despite having an updated code sample (the way it suggests creating an accumulator is also deprecated). I think the fix is updating the programming guide rather than adding += to the API. On Wednesday,
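
For context, a minimal Spark 2.0 sketch of the accumulator usage the updated code sample shows (a named LongAccumulator updated with add(); there is no += operator in this API):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("accumulator-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Spark 2.0 style: create a LongAccumulator and update it with add(), not +=.
    val acc = sc.longAccumulator("My Accumulator")
    sc.parallelize(1 to 10).foreach(x => acc.add(x))
    println(acc.value)  // 55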

Re: How does MapWithStateRDD distribute the data

2016-08-03 Thread Cody Koeninger
Are you using KafkaUtils.createDirectStream? On Wed, Aug 3, 2016 at 9:42 AM, Soumitra Johri wrote: > Hi, > > I am running a streaming job with 4 executors and 16 cores so that each > executor has two cores to work with. The input Kafka topic has 4 partitions. > With
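
For context, a minimal sketch of the direct-stream setup being asked about (broker and topic names are placeholders); with createDirectStream each Kafka partition maps 1:1 to an RDD partition, so a 4-partition topic yields 4 tasks per batch:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Each Kafka partition becomes one RDD partition per batch.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))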

Re: AccumulatorV2 += operator

2016-08-03 Thread Bryan Cutler
No, I was referring to the programming guide section on accumulators; it says "Tasks running on a cluster can then add to it using the add method or the += operator (in Scala and Python)." On Aug 2, 2016 2:52 PM, "Holden Karau" wrote: > I believe it was intentional with

How does MapWithStateRDD distribute the data

2016-08-03 Thread Soumitra Johri
Hi, I am running a streaming job with 4 executors and 16 cores so that each executor has two cores to work with. The input Kafka topic has 4 partitions. With this given configuration I was expecting MapWithStateRDD to be evenly distributed across all executors; however, I see that it uses only two
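
For context, a minimal sketch of spreading mapWithState state over more partitions via StateSpec.numPartitions (the update function and partition count are illustrative):

    import org.apache.spark.streaming.{State, StateSpec}
    import org.apache.spark.streaming.dstream.DStream

    // Illustrative update function: running count per key.
    def trackCount(key: String, value: Option[Int], state: State[Long]): (String, Long) = {
      val newCount = state.getOption.getOrElse(0L) + value.getOrElse(0)
      state.update(newCount)
      (key, newCount)
    }

    def withState(pairs: DStream[(String, Int)]): DStream[(String, Long)] = {
      // numPartitions sets the partitioner used for the state RDD; a higher count
      // can spread the state across more executors.
      pairs.mapWithState(StateSpec.function(trackCount _).numPartitions(16))
    }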

Re: What happens in Dataset limit followed by rdd

2016-08-03 Thread Maciej Szymkiewicz
Pushing down across mapping would be great. If you're used to SQL or work frequently with lazy collections, this is a behavior you learn to expect. On 08/02/2016 02:12 PM, Sun Rui wrote: > Spark does optimise subsequent limits, for example: > > scala> df1.limit(3).limit(1).explain > ==
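
For context, a minimal sketch of the two cases under discussion (the physical plans printed depend on the Spark version):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("limit-sketch").getOrCreate()

    val df1 = spark.range(0, 1000000).toDF("id")

    // Chained limits, as in the quoted example: inspect how the optimizer collapses them.
    df1.limit(3).limit(1).explain()

    // The case the thread asks about: a limit followed by conversion to an RDD.
    val firstRow = df1.limit(1).rdd.collect()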

Spark SQL and Kryo registration

2016-08-03 Thread Olivier Girardot
Hi everyone, I'm currently trying to use Spark 2.0.0 and to make DataFrames work with kryo.registrationRequired=true. Is it even possible at all considering the codegen? Regards, Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94
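
For context, a minimal sketch of enabling required Kryo registration in a Spark 2.0 application (MyRecord is an illustrative user class; the open question in the thread is which generated/internal classes would also need registering):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    case class MyRecord(id: Long, name: String)  // illustrative user class

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")
      .registerKryoClasses(Array(classOf[MyRecord]))

    val spark = SparkSession.builder().config(conf).getOrCreate()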

Re: Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-03 Thread Saisai Shao
Using the dominant resource calculator instead of the default resource calculator will get you the expected vcores. Basically, by default YARN does not honor CPU cores as a resource, so you will always see vcores = 1 no matter what number of cores you set in Spark. On Wed, Aug 3, 2016 at 12:11 PM,
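
For context, a minimal sketch of the setting being suggested, assuming the CapacityScheduler is in use (goes in capacity-scheduler.xml on the ResourceManager):

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>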