Re: Map tuple to case class in Dataset

2016-06-01 Thread Michael Armbrust
That error looks like it's caused by the class being defined in the REPL
itself.  $line29.$read$ is the name of the outer wrapper object that is used
to compile the line containing case class Test(a: Int).

Is this EMR or the Apache 1.6.1 release?
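
Roughly speaking, the Scala 2.10 REPL compiles each line into nested wrapper
objects named after the line number. A simplified sketch of what it generates
for case class Test(a: Int) (the real generated code differs in detail):

  object $line29 {
    object $read {
      object $iwC {
        case class Test(a: Int)
      }
    }
  }

A closure like t => Test(t) on a later line then reaches Test through
$line29.$read$, so every executor has to load and statically initialize that
wrapper object; if that initialization fails remotely, tasks die with
"Could not initialize class $line29.$read$", as in the trace below.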

On Wed, Jun 1, 2016 at 8:05 AM, Tim Gautier  wrote:

> I spun up another EC2 cluster today with Spark 1.6.1 and I still get the
> error.
>
> scala>   case class Test(a: Int)
> defined class Test
>
> scala>   Seq(1,2).toDS.map(t => Test(t)).show
> 16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 39.0 in stage
> 0.0 (TID 39, ip-10-2-2-203.us-west-2.compute.internal):
> java.lang.NoClassDefFoundError: Could not initialize class $line29.$read$
> at
> $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:35)
> at
> $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:35)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> 16/06/01 15:04:21 INFO scheduler.TaskSetManager: Starting task 39.1 in
> stage 0.0 (TID 40, ip-10-2-2-111.us-west-2.compute.internal, partition
> 39,PROCESS_LOCAL, 2386 bytes)
> 16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 19.0 in stage
> 0.0 (TID 19, ip-10-2-2-203.us-west-2.compute.internal):
> java.lang.ExceptionInInitializerError
> at $line29.$read$$iwC.(:7)
> at $line29.$read.(:24)
> at $line29.$read$.(:28)
> at $line29.$read$.()
> at
> $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:35)
> at
> $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:35)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at $line3.$read$$iwC$$iwC.(:15)
> at $line3.$read$$iwC.(:24)
> at $line3.$read.(:26)
> at $line3.$read$.(:30)
> at $line3.$read$.()
> ... 18 more
>
>
> On Tue, May 31, 2016 at 8:48 PM Tim Gautier  wrote:
>
>> That's really odd. I copied that code directly out of the shell and it
>> errored out on me, several times. I wonder if something I did previously
>> caused some instability. I'll see if it happens again tomorrow.
>>
>> On Tue, May 31, 2016, 8:37 PM Ted Yu  wrote:
>>
>>> Using spark-shell of 1.6.1 :
>>>
>>> scala> case class Test(a: Int)
>>> defined class Test
>>>
>>> scala> Seq(1,2).toDS.map(t => Test(t)).show
>>> +---+
>>> |  a|
>>> +---+
>>> |  1|
>>> |  2|
>>> +---+
>>>
>>> FYI
>>>
>>> On Tue, May 31, 2016 at 7:35 PM, Tim Gautier 
>>> wrote:
>>>
 1.6.1 The exception is a null pointer exception. I'll paste the whole
 thing after I fire my cluster up again tomorrow.

 I take it by the responses that this is supposed to work?

 Anyone know when the next version is coming out? I keep running into
 bugs with 1.6.1 that are hindering my progress.

 On Tue, May 31, 2016, 8:21 PM Saisai Shao 
 wrote:

> It works fine in my local test, I'm using latest master, maybe this
> bug is already fixed.
>
> On Wed, Jun 1, 2016 at 7:29 AM, Michael Armbrust <
> mich...@databricks.com> wrote:
>
>> Version of Spark? What is the exception?
>>
>> On Tue, May 31, 2016 at 4:17 PM, Tim Gautier 
>> wrote:
>>
>>> How 

Re: Map tuple to case class in Dataset

2016-06-01 Thread Tim Gautier
I was getting a warning about /tmp/hive not being writable whenever I
started spark-shell, but I was ignoring it. I decided to set the
permissions to 777 and restart the shell. After doing that, I now get the
same result as Ted Yu when running Seq(1,2).toDS.map(t => Test(t)).show.
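
As a hypothetical sanity check (plain JVM file APIs; the /tmp/hive path and
the 777 fix come from the report above), the scratch directory can be
inspected before starting the shell:

  import java.nio.file.{Files, Paths}

  // /tmp/hive is the Hive scratch directory the spark-shell warning pointed at;
  // if it exists but is not writable, context initialization can fail later on.
  val hiveScratch = Paths.get("/tmp/hive")
  if (Files.exists(hiveScratch) && !Files.isWritable(hiveScratch))
    println(s"$hiveScratch is not writable; consider chmod 777 /tmp/hive")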

Re: Map tuple to case class in Dataset

2016-06-01 Thread Tim Gautier
I spun up another EC2 cluster today with Spark 1.6.1 and I still get the
error.

scala>   case class Test(a: Int)
defined class Test

scala>   Seq(1,2).toDS.map(t => Test(t)).show
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 39.0 in stage 0.0 (TID 39, ip-10-2-2-203.us-west-2.compute.internal): java.lang.NoClassDefFoundError: Could not initialize class $line29.$read$
  at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
  at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

16/06/01 15:04:21 INFO scheduler.TaskSetManager: Starting task 39.1 in stage 0.0 (TID 40, ip-10-2-2-111.us-west-2.compute.internal, partition 39,PROCESS_LOCAL, 2386 bytes)
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 19.0 in stage 0.0 (TID 19, ip-10-2-2-203.us-west-2.compute.internal): java.lang.ExceptionInInitializerError
  at $line29.$read$$iwC.<init>(<console>:7)
  at $line29.$read.<init>(<console>:24)
  at $line29.$read$.<init>(<console>:28)
  at $line29.$read$.<clinit>(<console>)
  at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
  at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
  at $line3.$read$$iwC$$iwC.<init>(<console>:15)
  at $line3.$read$$iwC.<init>(<console>:24)
  at $line3.$read.<init>(<console>:26)
  at $line3.$read$.<init>(<console>:30)
  at $line3.$read$.<clinit>(<console>)
  ... 18 more


Re: Map tuple to case class in Dataset

2016-05-31 Thread Tim Gautier
That's really odd. I copied that code directly out of the shell and it
errored out on me, several times. I wonder if something I did previously
caused some instability. I'll see if it happens again tomorrow.


Re: Map tuple to case class in Dataset

2016-05-31 Thread Ted Yu
Using spark-shell of 1.6.1:

scala> case class Test(a: Int)
defined class Test

scala> Seq(1,2).toDS.map(t => Test(t)).show
+---+
|  a|
+---+
|  1|
|  2|
+---+

FYI


Re: Map tuple to case class in Dataset

2016-05-31 Thread Tim Gautier
1.6.1. The exception is a null pointer exception; I'll paste the whole thing
after I fire my cluster up again tomorrow.

I take it by the responses that this is supposed to work?

Anyone know when the next version is coming out? I keep running into bugs
with 1.6.1 that are hindering my progress.


Re: Map tuple to case class in Dataset

2016-05-31 Thread Saisai Shao
It works fine in my local test. I'm using the latest master, so maybe this
bug has already been fixed.


Re: Map tuple to case class in Dataset

2016-05-31 Thread Michael Armbrust
Version of Spark? What is the exception?


Map tuple to case class in Dataset

2016-05-31 Thread Tim Gautier
How should I go about mapping from, say, a Dataset[(Int,Int)] to a
Dataset[]?

I tried to use a map, but it throws exceptions:

case class Test(a: Int)
Seq(1,2).toDS.map(t => Test(t)).show

Thanks,
Tim
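
For reference, a minimal sketch of the mapping being asked about, assuming a
Spark 1.6.x spark-shell where sqlContext and import sqlContext.implicits._
are already in scope. Pair is an illustrative case class, since the target
type above was elided by the archive:

  case class Pair(a: Int, b: Int)

  // Dataset[(Int, Int)] -> Dataset[Pair] via an explicit map over the tuples.
  val ds = Seq((1, 2), (3, 4)).toDS()
  val mapped = ds.map { case (a, b) => Pair(a, b) }
  mapped.show()

  // Alternatively, name the columns to match the case class fields and re-type;
  // this avoids shipping a REPL-defined closure to the executors.
  val typed = Seq((1, 2), (3, 4)).toDF("a", "b").as[Pair]
  typed.show()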