Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
Is it because countByValue or toArray puts too much stress on the driver if
there are many unique words?
To me it is a typical word count problem, so you can solve it as follows
(correct me if I am wrong):

val textFile = sc.textFile("file")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)
// Anyway, you don't want to collect the results to the master; write them to a file instead.
counts.saveAsTextFile("file")
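(For contrast, a minimal sketch of the collect-to-driver pattern asked about in the first sentence -- countByValue computes the counts on the executors but returns the complete map to the driver, which is what stresses the driver when there are many unique words. The textFile RDD is the one from the snippet above:)

// Pulls a Map of every distinct word and its count back to the driver; fine for
// small vocabularies, but this is the step that can overwhelm the driver for large ones.
val wordCounts: scala.collection.Map[String, Long] =
  textFile.flatMap(line => line.split(" ")).countByValue()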

Thanks.

Zhan Zhang

On Aug 16, 2014, at 9:18 AM, Jerry Ye jerr...@gmail.com wrote:

 The job ended up running overnight with no progress. :-(
 
 
 On Sat, Aug 16, 2014 at 12:16 AM, Jerry Ye jerr...@gmail.com wrote:
 
 Hi Xiangrui,
 I actually tried branch-1.1 and master and it resulted in the job being
 stuck at the TaskSetManager:
 14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0
 with 2 tasks
 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:0 as
 TID 2 on executor 8: ip-10-226-199-225.us-west-2.compute.internal
 (PROCESS_LOCAL)
 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as
 28055875 bytes in 162 ms
 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:1 as
 TID 3 on executor 0: ip-10-249-53-62.us-west-2.compute.internal
 (PROCESS_LOCAL)
 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as
 28055875 bytes in 178 ms
 
 It's been 10 minutes with no progress on relatively small data. I'll let
 it run overnight and update in the morning. Is there some place that I
 should look to see what is happening? I tried to ssh into the executor and
 look at /root/spark/logs but there wasn't anything informative there.
 
 I'm sure using CountByValue works fine but my use of a HashMap is only an
 example. In my actual task, I'm loading a Trie data structure to perform
 efficient string matching between a dataset of locations and strings
 possibly containing mentions of locations.
 
 This seems like a common thing, to process input with a relatively memory
 intensive object like a Trie. I hope I'm not missing something obvious. Do
 you know of any example code like my use case?
 
 Thanks!
 
 - jerry
 


Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Jerry Ye
Hi Zhan,
Thanks for looking into this. I'm actually using the hash map as an example
of the simplest snippet of code that is failing for me. I know that this is
just the word count. In my actual problem I'm using a Trie data structure
to find substring matches.
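(On the thread subject itself: the TaskSetManager log above shows each task being serialized at roughly 28 MB, while spark.akka.frameSize defaults to 10 MB in Spark 1.x, so another workaround sometimes suggested is simply raising the frame size when creating the context. A minimal sketch, assuming the value is given in MB and using an arbitrary app name:)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("trie-matching")        // arbitrary name, not from the thread
  .set("spark.akka.frameSize", "64")  // in MB; must cover the ~28 MB serialized tasks seen in the log
val sc = new SparkContext(conf)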



Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
Not sure exactly how you use it. My understanding is that in Spark it is better
to keep the overhead on the driver as low as possible. Is it possible to
broadcast the trie to the executors, do the computation there, and then
aggregate the counters (??) in the reduce phase?
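(A minimal sketch of that broadcast approach, assuming a hypothetical Trie class with a findMatches method and made-up input/output paths -- none of these identifiers come from the thread:)

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the real trie; marked Serializable so it can be broadcast.
class Trie extends Serializable {
  def findMatches(line: String): Seq[String] = Seq.empty // placeholder matching logic
}

val sc = new SparkContext(new SparkConf().setAppName("trie-broadcast"))

// Build the trie once on the driver and broadcast it, so each executor gets one
// copy instead of shipping the large structure inside every task closure.
val trieBroadcast = sc.broadcast(new Trie())

val counts = sc.textFile("documents.txt")
  .flatMap(line => trieBroadcast.value.findMatches(line)) // match on executors against the broadcast trie
  .map(location => (location, 1))
  .reduceByKey(_ + _)                                      // aggregate the counters in the reduce phase

counts.saveAsTextFile("location-counts") // write out rather than collecting to the driver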

Thanks.

Zhan Zhang





Shuffle overlapping

2014-08-18 Thread zycodefish
Hi all,

I'm reading the implementation of the shuffle in Spark.
My understanding is that it does not overlap with the upstream stage.

Would it be helpful to overlap the computation of the upstream stage with the
shuffle (I mean the network copy, as in Hadoop)? If it is, is there any plan to
implement it in an upcoming version?

--Z






Re: Shuffle overlapping

2014-08-18 Thread Josh Rosen
I think there's some discussion of this at
https://issues.apache.org/jira/browse/SPARK-2387 and
https://github.com/apache/spark/pull/1328.

- Josh





Re: mvn test error

2014-08-18 Thread Cheng Lian
The exception indicates that the forked process didn't execute as expected,
thus the test case *should* fail.

Instead of replacing the exception with a logWarning, capturing and printing
the stdout/stderr of the forked process would be more helpful for diagnosis.
Currently the only information we have at hand is the process exit code, so
it's hard to determine why the forked process fails.


On Tue, Aug 19, 2014 at 1:27 PM, scwf wangf...@huawei.com wrote:

 hi, all
   I notice that jenkins may also throw this error when running tests
 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull).


 This is because in Utils.executeAndGetOutput our process exit code is not 0;
 maybe we should logWarning here rather than throw an exception?

 Utils.executeAndGetOutput {
   val exitCode = process.waitFor()
   stdoutThread.join()   // Wait for it to finish reading output
   if (exitCode != 0) {
     throw new SparkException("Process " + command + " exited with code " + exitCode)
   }
 }

 any idea?



 On 2014/8/15 11:01, scwf wrote:

 env: ubuntu 14.04 + spark master branch

 mvn -Pyarn -Phive -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
 package

 mvn -Pyarn -Phadoop-2.4 -Phive test

 test error:

 DriverSuite:
 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 - driver should exit after finishing *** FAILED ***
SparkException was thrown during property evaluation.
 (DriverSuite.scala:40)
  Message: Process List(./bin/spark-class, 
 org.apache.spark.DriverWithoutCleanup,
 local) exited with code 1
  Occurred at table row 0 (zero based, not counting headings), which
 had values (
master = local
  )

 SparkSubmitSuite:
 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 - launch simple application with spark-submit *** FAILED ***
org.apache.spark.SparkException: Process List(./bin/spark-submit,
 --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp,
 --master, local, file:/tmp/1408015655220-0/testJar-1408015655220.jar)
 exited with code 1

at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:810)
at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(
 SparkSubmitSuite.scala:311)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.
 apply$mcV$sp(SparkSubmitSuite.scala:291)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.
 apply(SparkSubmitSuite.scala:284)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.
 apply(SparkSubmitSuite.scala:284)
at org.scalatest.Transformer$$anonfun$apply$1.apply(
 Transformer.scala:22)
at org.scalatest.Transformer$$anonfun$apply$1.apply(
 Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
...
 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 - spark submit includes jars passed in through --jar *** FAILED ***
org.apache.spark.SparkException: Process List(./bin/spark-submit,
 --class, org.apache.spark.deploy.JarCreationTest, --name, testApp,
 --master, local-cluster[2,1,512], --jars, file:/tmp/1408015659416-0/
 testJar-1408015659471.jar,fi
 le:/tmp/1408015659472-0/testJar-1408015659513.jar,
 file:/tmp/1408015659415-0/testJar-1408015659416.jar) exited with code 1
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:810)
at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(
 SparkSubmitSuite.scala:311)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.
 apply$mcV$sp(SparkSubmitSuite.scala:305)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.
 apply(SparkSubmitSuite.scala:294)
at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.
 apply(SparkSubmitSuite.scala:294)
at org.scalatest.Transformer$$anonfun$apply$1.apply(
 Transformer.scala:22)
at org.scalatest.Transformer$$anonfun$apply$1.apply(
 Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
...


 But running only the specific suite as follows works fine:
 mvn -Pyarn -Phadoop-2.4 -Phive -DwildcardSuites=org.apache.spark.DriverSuite test

 It seems that when running with mvn -Pyarn -Phadoop-2.4 -Phive test, the process
 started by Utils.executeAndGetOutput cannot exit successfully (exit code is not zero).

 Does anyone have an idea about this?






 --

 Best Regards
 Fei Wang

 
 




Spark on YARN webui

2014-08-18 Thread Debasish Das
Hi,

We are running the snapshots (new Spark features) on YARN and I was wondering
if the web UI is available in YARN mode...

The deployment document does not mention the web UI in YARN mode...

Is it available?

Thanks.
Deb