Thanks for the fast reply. I am running CDH 4.4 with the Cloudera Parcel of Spark 0.9.0, in standalone mode.
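Regarding the avro jar not reaching the executors: in Spark 0.9 standalone mode, one way to ship extra jars from spark-shell is the ADD_JARS environment variable (which hands them to sc.addJar), optionally alongside SPARK_CLASSPATH for the local classpath. A sketch with hypothetical jar paths, not verified against this cluster:

```shell
# Hypothetical paths -- substitute the avro jars actually installed here.
# ADD_JARS ships the listed jars to executors (comma-separated);
# SPARK_CLASSPATH additionally puts them on the local classpath (colon-separated).
ADD_JARS=/opt/libs/avro-1.7.4.jar,/opt/libs/avro-mapred-1.7.4.jar \
SPARK_CLASSPATH=/opt/libs/avro-1.7.4.jar:/opt/libs/avro-mapred-1.7.4.jar \
  ./bin/spark-shell
```

The same effect can be had from inside an already-running shell with sc.addJar("/opt/libs/avro-1.7.4.jar") before submitting the job.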
On Saturday, May 31, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
> First issue was because your cluster was configured incorrectly. You could
> probably read 1 file because that was done on the driver node, but when it
> tried to run a job on the cluster, it failed.
>
> Second issue, it seems that the jar containing avro is not getting
> propagated to the Executors. What version of Spark are you running on? What
> deployment mode (YARN, standalone, Mesos)?
>
> On Sat, May 31, 2014 at 9:37 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
> > Now I get this:
> >
> > scala> rdd.first
> > 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 4 (first at <console>:41) with 1 output partitions (allowLocal=true)
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 4 (first at <console>:41)
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Computing the requested partition locally
> > 14/05/31 21:36:28 INFO rdd.HadoopRDD: Input split: hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
> > 14/05/31 21:36:28 INFO spark.SparkContext: Job finished: first at <console>:41, took 0.037371256 s
> > 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at <console>:41
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 5 (first at <console>:41) with 16 output partitions (allowLocal=true)
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 5 (first at <console>:41)
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37), which has no missing parents
> > 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting 16 missing tasks from Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37)
> > 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 16 tasks
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as TID 92 on executor 2: hivecluster3 (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as 1294 bytes in 1 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as TID 93 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as TID 94 on executor 4: hivecluster4 (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as 1294 bytes in 1 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as TID 95 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as TID 96 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as TID 97 on executor 2: hivecluster3 (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as TID 98 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as TID 99 on executor 4: hivecluster4 (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as TID 100 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as TID 101 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as TID 102 on executor 2: hivecluster3 (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as TID 103 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as 1294 bytes in 0 ms
> > 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as TID 104 on executor 4: hivecluster4 (N
> >
> > --
> > Russell Jurney
> > twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com