+spark-user
---------- Forwarded message ----------
From: lonely Feb <[email protected]>
Date: 2015-01-16 19:09 GMT+08:00
Subject: Re: Problems with TeraValidate
To: Ewan Higgs <[email protected]>
Thanks a lot.
By the way, here is my output:
1. When the dataset is 1000g:
num records: 10000000000
checksum: 12aa5028310ea763e
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
part 1
lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
    at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
    at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2. When the dataset is 200m:
num records: 2000000
checksum: ca93e5d2fad40
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
part 1
lastMaxArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
    at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
    at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I suspect something is wrong with the function "clone": in both runs, part 0
and part 1 report exactly the same min and max values.
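To illustrate the kind of bug I have in mind, here is a minimal sketch. It is
not the TeraValidate source; AliasingDemo, less, summarize and outOfOrder are
hypothetical names I made up, and the keys are shortened to 2 bytes. It only
shows that if per-partition min/max results share one mutable buffer instead
of being cloned, every partition ends up reporting the same min and max, and
the "current partition min < last partition max" check fails even on
correctly sorted data.

```scala
// Hypothetical demo, not the TeraValidate code: shows how un-cloned,
// shared min/max buffers make every partition report identical values.
object AliasingDemo {
  type Key = Array[Byte]

  // Lexicographic comparison treating each byte as unsigned (0..255).
  // Assumes both keys have the same length.
  def less(a: Key, b: Key): Boolean =
    a.indices.find(j => a(j) != b(j)).exists(j => (a(j) & 0xff) < (b(j) & 0xff))

  // Returns one (min, max) pair per partition. The scratch buffers are
  // reused for every partition; with copy = false each returned pair is
  // just another reference to those same two buffers.
  // Assumes every partition is non-empty and all keys share one length.
  def summarize(parts: Seq[Seq[Key]], copy: Boolean): Seq[(Key, Key)] = {
    val len = parts.head.head.length
    val min = new Array[Byte](len)
    val max = new Array[Byte](len)
    parts.map { part =>
      System.arraycopy(part.head, 0, min, 0, len)
      System.arraycopy(part.head, 0, max, 0, len)
      for (k <- part) {
        if (less(k, min)) System.arraycopy(k, 0, min, 0, len)
        if (less(max, k)) System.arraycopy(k, 0, max, 0, len)
      }
      if (copy) (min.clone(), max.clone()) else (min, max)
    }
  }

  // True if some partition's min is below the previous partition's max,
  // i.e. the condition that triggers the assertion in the output above.
  def outOfOrder(summaries: Seq[(Key, Key)]): Boolean =
    summaries.zip(summaries.drop(1)).exists {
      case ((_, prevMax), (curMin, _)) => less(curMin, prevMax)
    }
}
```

With two correctly ordered partitions, summarize(parts, copy = false) yields
two pairs that both point at the last partition's min and max, so outOfOrder
reports a violation; with copy = true each pair keeps its own values and the
check passes. Whether this is what actually happens inside TeraValidate is
only my guess.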
2015-01-16 19:02 GMT+08:00 Ewan Higgs <[email protected]>:
> Hi lonely,
> I am looking at this now. If you need to validate a terasort benchmark as
> soon as possible, I would use Hadoop's TeraValidate.
>
> I'll let you know when I have a fix.
>
> Yours,
> Ewan Higgs
>
>
> On 16/01/15 09:47, lonely Feb wrote:
>
>> Hi, I ran your terasort program on my Spark cluster. When the dataset is
>> small (below 1000g) everything works fine, but when the dataset is over
>> 1000g, TeraValidate always fails with the assertion error:
>> current partition min < last partition max
>>
>> e.g. the output is:
>> num records: 10000000000
>> checksum: 12aa5028310ea763e
>> part 0
>> lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> part 1
>> lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
>>     at scala.Predef$.assert(Predef.scala:179)
>>     at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
>>     at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>     at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
>>     at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
>>     at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> What's the problem?
>>
>
>