Re: Error Compiling

2015-01-30 Thread Akhil Das
This is how I do it:

val tmp = test.map(x => (x, 1L)).reduceByWindow({ case ((word1, count1),
(word2, count2)) => (word1 + " " + word2, count1 + count2)}, Seconds(10),
Seconds(10))


In your case, you actually have a type mismatch:

[image: Inline image 1]
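
For reference, a minimal sketch of one way around it (not from the original mail): assuming the goal is the windowed word count from the stock KafkaWordCount example and the imports from that file, reduceByKeyAndWindow reduces the Long counts per key, so the parameter types are unambiguous:

// Sketch only: assumes `words` is the DStream[String] from KafkaWordCount and that
// ssc.checkpoint(...) has been set, which the inverse-reduce form requires.
val wordCounts = words.map(x => (x, 1L))
  .reduceByKeyAndWindow(
    (a: Long, b: Long) => a + b,   // counts entering the window
    (a: Long, b: Long) => a - b,   // counts leaving the window
    Minutes(1), Seconds(2), 2)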



Thanks
Best Regards

On Sat, Jan 31, 2015 at 5:30 AM, Eduardo Costa Alfaia <
e.costaalf...@unibs.it> wrote:

> Hi Guys,
>
> Any idea how to solve this error?
>
> [error]
> /sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76:
> missing parameter type for expanded function ((x$6, x$7) => x$6.$plus(x$7))
>
> [error] val wordCounts = words.map(x => (x, 1L)).reduceByWindow(_ +
> _, _ - _, Minutes(1), Seconds(2), 2)
>
> Thanks
>
> Privacy Notice: http://www.unibs.it/node/8155


Re: Long pauses after writing to sequence files

2015-01-30 Thread Akhil Das
Not quite sure, but it could be a GC pause if you are holding too many
objects in memory. You can check the tuning section of the docs if you haven't
already been through it.
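
A minimal sketch (not from the original mail) of one way to confirm the GC theory: turn on GC logging on the executors and see whether the pauses line up with collections.

import org.apache.spark.SparkConf

// Sketch only: standard HotSpot GC-logging flags passed through Spark's
// spark.executor.extraJavaOptions; adjust the flags to your JVM.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
       "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")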

Thanks
Best Regards

On Sat, Jan 31, 2015 at 7:22 AM, Corey Nolet  wrote:

> We have a series of spark jobs which run in succession over various cached
> datasets, do small groups and transforms, and then call
> saveAsSequenceFile() on them.
>
> Each call to save as a sequence file appears to have done its work, the
> task says it completed in "xxx.x seconds" but then it pauses and the
> pauses are quite significant- sometimes up to 2 minutes. We are trying to
> figure out what's going on during this pause- if the executors are really
> still writing to the sequence files or if maybe a race condition is
> happening on an executor which is causing timeouts.
>
> Any ideas? Anyone else seen this happening?
>
>
> We also tried running all the saveAsSequenceFile calls in separate futures
> to see if maybe the waiting would still only take 1-2 minutes but it looks
> like the waiting still takes the sum of the amount  of time it would have
> originally (several several minutes). Our job runs, in its entirety, 35
> minutes and we're estimating that we're spending at least 16 minutes in
> this paused state. What I haven't been able to do is figure out how to
> trace through all the executors. Is there a way to do this? The event logs
> in yarn don't seem to help much with this.
>


Re: measuring time taken in map, reduceByKey, filter, flatMap

2015-01-30 Thread Akhil Das
I believe you will get these measurements from the web UI (running on port 8080).
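
If you need the numbers programmatically rather than from the UI, a hedged sketch (not from the original mail) is to attach a SparkListener and log per-stage wall-clock time, which roughly maps back to the map/reduceByKey/filter/flatMap boundaries:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Sketch only: assumes `sc` is the SparkContext backing the streaming job.
sc.addSparkListener(new SparkListener {
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    val info = stage.stageInfo
    for (start <- info.submissionTime; end <- info.completionTime) {
      println(s"Stage ${info.stageId} (${info.name}) took ${end - start} ms")
    }
  }
})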

Thanks
Best Regards

On Sat, Jan 31, 2015 at 9:29 AM, Josh J  wrote:

> Hi,
>
> I have a stream pipeline which invokes map, reduceByKey, filter, and
> flatMap. How can I measure the time taken in each stage?
>
> Thanks,
> Josh
>


Re: Build error

2015-01-30 Thread Tathagata Das
That is a known issue uncovered last week. It fails in certain
environments, but not on Jenkins, which is our testing environment.
There is already a PR up to fix it. For now you can build using "mvn
package -DskipTests".
TD

On Fri, Jan 30, 2015 at 8:59 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Off master, got this error; is that typical?
>
> ---
>  T E S T S
> ---
> Running org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.495 sec
> - in org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite
>
> Results :
>
>
>
>
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
>
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (test) @
> spark-streaming-mqtt_2.10 ---
> Discovery starting.
> Discovery completed in 498 milliseconds.
> Run starting. Expected test count is: 1
> MQTTStreamSuite:
> - mqtt input stream *** FAILED ***
>   org.eclipse.paho.client.mqttv3.MqttException: Too many publishes in
> progress
>   at
> org.eclipse.paho.client.mqttv3.internal.ClientState.send(ClientState.java:432)
>   at
> org.eclipse.paho.client.mqttv3.internal.ClientComms.internalSend(ClientComms.java:121)
>   at
> org.eclipse.paho.client.mqttv3.internal.ClientComms.sendNoWait(ClientComms.java:139)
>   at org.eclipse.paho.client.mqttv3.MqttTopic.publish(MqttTopic.java:107)
>   at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$publishData$1.apply(MQTTStreamSuite.scala:125)
>   at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$publishData$1.apply(MQTTStreamSuite.scala:124)
>   at scala.collection.immutable.Range.foreach(Range.scala:141)
>   at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite.publishData(MQTTStreamSuite.scala:124)
>   at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$3.apply$mcV$sp(MQTTStreamSuite.scala:78)
>   at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$3.apply(MQTTStreamSuite.scala:66)
>   ...
> Exception in thread "Thread-20" org.apache.spark.SparkException: Job
> cancelled because SparkContext was shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:690)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:689)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:689)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1384)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
> at
> org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1319)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1250)
> at
> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:510)
> at
> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:485)
> at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply$mcV$sp(MQTTStreamSuite.scala:59)
> at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply(MQTTStreamSuite.scala:57)
> at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply(MQTTStreamSuite.scala:57)
> at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:210)
> at
> org.apache.spark.streaming.mqtt.MQTTStreamSuite.runTest(MQTTStreamSuite.scala:38)
> at
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> at
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> at
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
> at
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> at org.scalatest.SuperEngine.org
> $scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
> at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
> at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
> at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
> at org.scalatest.Suite$class.run(Suite.scala:1424)
> at org.scalatest.FunSuite.org
> $scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
> at
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> at
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
> at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
> at org.apache.spark.streaming.mqtt.MQTTStreamSuite.org
> $scalatest$BeforeAndAfter$$super$run(MQTTStreamSuite.scala:38)
> at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)

Re: Spark on YARN: java.lang.ClassCastException SerializedLambda to org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1

2015-01-30 Thread Milad khajavi
Here are the same issues:
[1] 
http://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser
[2] 
http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJUHuJoE7nP6MMOJJKTL6kZtamQ=qhym1aozmezbnetla1y...@mail.gmail.com%3E#archives

Could you please explain exactly what you are trying to do, and show the
code you are working on?

On Thu, Jan 22, 2015 at 12:29 PM, thanhtien522  wrote:
> Update: I deployed a standalone Spark on localhost, set the master to
> spark://localhost:7077, and hit the same issue.
> I don't know how to solve it.
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-YARN-java-lang-ClassCastException-SerializedLambda-to-org-apache-spark-api-java-function-Fu1-tp21261p21315.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>



-- 
Milād Khājavi
http://blog.khajavi.ir
Having the source means you can do it yourself.
I tried to change the world, but I couldn’t find the source code.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Build error

2015-01-30 Thread Andrew Musselman
Off master, got this error; is that typical?

---
 T E S T S
---
Running org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.495 sec -
in org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite

Results :




Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (test) @
spark-streaming-mqtt_2.10 ---
Discovery starting.
Discovery completed in 498 milliseconds.
Run starting. Expected test count is: 1
MQTTStreamSuite:
- mqtt input stream *** FAILED ***
  org.eclipse.paho.client.mqttv3.MqttException: Too many publishes in
progress
  at
org.eclipse.paho.client.mqttv3.internal.ClientState.send(ClientState.java:432)
  at
org.eclipse.paho.client.mqttv3.internal.ClientComms.internalSend(ClientComms.java:121)
  at
org.eclipse.paho.client.mqttv3.internal.ClientComms.sendNoWait(ClientComms.java:139)
  at org.eclipse.paho.client.mqttv3.MqttTopic.publish(MqttTopic.java:107)
  at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$publishData$1.apply(MQTTStreamSuite.scala:125)
  at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$publishData$1.apply(MQTTStreamSuite.scala:124)
  at scala.collection.immutable.Range.foreach(Range.scala:141)
  at
org.apache.spark.streaming.mqtt.MQTTStreamSuite.publishData(MQTTStreamSuite.scala:124)
  at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$3.apply$mcV$sp(MQTTStreamSuite.scala:78)
  at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$3.apply(MQTTStreamSuite.scala:66)
  ...
Exception in thread "Thread-20" org.apache.spark.SparkException: Job
cancelled because SparkContext was shut down
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:690)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:689)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:689)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1384)
at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1319)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1250)
at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:510)
at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:485)
at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply$mcV$sp(MQTTStreamSuite.scala:59)
at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply(MQTTStreamSuite.scala:57)
at
org.apache.spark.streaming.mqtt.MQTTStreamSuite$$anonfun$2.apply(MQTTStreamSuite.scala:57)
at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:210)
at
org.apache.spark.streaming.mqtt.MQTTStreamSuite.runTest(MQTTStreamSuite.scala:38)
at
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org
$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at org.scalatest.FunSuite.org
$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
at
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
at org.apache.spark.streaming.mqtt.MQTTStreamSuite.org
$scalatest$BeforeAndAfter$$super$run(MQTTStreamSuite.scala:38)
at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
at
org.apache.spark.streaming.mqtt.MQTTStreamSuite.run(MQTTStreamSuite.scala:38)
at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
at
org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
at
org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.scalatest.Suit

measuring time taken in map, reduceByKey, filter, flatMap

2015-01-30 Thread Josh J
Hi,

I have a stream pipeline which invokes map, reduceByKey, filter, and
flatMap. How can I measure the time taken in each stage?

Thanks,
Josh


Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-01-30 Thread Cheng Lian
Yeah, currently there isn't such a repo. However, the Spark team is 
working on this.


Cheng

On 1/30/15 8:19 AM, Ayoub wrote:

I am not personally aware of a repo for snapshot builds.
In my use case, I had to build spark 1.2.1-snapshot

see https://spark.apache.org/docs/latest/building-spark.html

2015-01-30 17:11 GMT+01:00 Debajyoti Roy <[hidden email]>:


Thanks Ayoub and Zhan,
I am new to spark and wanted to make sure i am not trying
something stupid or using a wrong API.

Is there a repo where i can pull the snapshot or nighly builds for
spark ?



On Fri, Jan 30, 2015 at 2:45 AM, Ayoub Benali <[hidden email]> wrote:

Hello,

I had the same issue, and then I found this JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-4825
So I switched to Spark 1.2.1-snapshot, which solved the problem.



2015-01-30 8:40 GMT+01:00 Zhan Zhang <[hidden email]>:

I think it is expected. Refer to the comment on
saveAsTable: "Note that this currently only works with
SchemaRDDs that are created from a HiveContext." If I
understand correctly, the SchemaRDD here means one
generated by HiveContext.sql, not by applySchema.
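
A hedged sketch of that reading (not from the original mail), assuming Spark 1.2; the file and table names are illustrative:

import org.apache.spark.sql.hive.HiveContext

// Sketch only: assumes `sc` is an existing SparkContext.
val hiveCtx = new HiveContext(sc)
hiveCtx.jsonFile("events.json").registerTempTable("events_tmp")

// A SchemaRDD produced by HiveContext.sql can be saved as a Hive table...
hiveCtx.sql("SELECT * FROM events_tmp").saveAsTable("events_persisted")
// ...whereas calling saveAsTable on a SchemaRDD built with applySchema is what fails above.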

Thanks.

Zhan Zhang



On Jan 29, 2015, at 9:38 PM, matroyd <[hidden email]> wrote:


Hi, I am trying saveAsTable on SchemaRDD created from
HiveContext and it fails. This is on Spark 1.2.0.
Following are details of the code, command and
exceptions:

http://stackoverflow.com/questions/28222496/how-to-enable-sql-on-schemardd-via-the-jdbc-interface-is-it-even-possible


Thanks in advance for any guidance


View this message in context: HiveContext created
SchemaRDD's saveAsTable is not working on 1.2.0

Sent from the Apache Spark User List mailing list archive at Nabble.com.






-- 
Thanks,


*Debajyoti Roy*
[hidden email] 
(646)561-0844
350 Madison Ave., FL 16,
New York, NY 10017.




View this message in context: Re: HiveContext created SchemaRDD's
saveAsTable is not working on 1.2.0
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: [hive context] Unable to query array once saved as parquet

2015-01-30 Thread Cheng Lian
According to the Gist Ayoub provided, the schema is fine. I reproduced 
this issue locally; it should be a bug, but I don't think it's related to 
SPARK-5236. I will investigate this soon.


Ayoub - would you mind filing a JIRA for this issue? Thanks!

Cheng

On 1/30/15 11:28 AM, Michael Armbrust wrote:
Is it possible that your schema contains duplicate columns or columns 
with spaces in the names? The Parquet library will often give 
confusing error messages in this case.


On Fri, Jan 30, 2015 at 10:33 AM, Ayoub wrote:


Hello,

I have a problem when querying, with a Hive context on Spark
1.2.1-snapshot, a column in my table which is a nested data
structure, like an array of structs.
The problem happens only on the table stored as Parquet; querying
the SchemaRDD saved as a temporary table doesn't lead to
any exception.

My steps are:
1) reading a JSON file
2) creating a SchemaRDD and saving it as a tmp table
3) creating an external table in the Hive metastore stored as a Parquet file
4) inserting the data from the tmp table into the persisted table
5) querying the persisted table, which leads to this exception:

"select data.field1 from persisted_table LATERAL VIEW
explode(data_array) nestedStuff AS data"

parquet.io.ParquetDecodingException: Can not read value at 0 in
block -1 in file hdfs://***/test_table/part-1
at

parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
at

parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
at

org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
at

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
at

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at
parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:99)
at parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:94)
at

parquet.io.RecordReaderImplementation.(RecordReaderImplementation.java:274)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
at
parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
at
parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
at

parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
   


Long pauses after writing to sequence files

2015-01-30 Thread Corey Nolet
We have a series of spark jobs which run in succession over various cached
datasets, do small groups and transforms, and then call
saveAsSequenceFile() on them.

Each call to save as a sequence file appears to have done its work, the
task says it completed in "xxx.x seconds" but then it pauses and the
pauses are quite significant- sometimes up to 2 minutes. We are trying to
figure out what's going on during this pause- if the executors are really
still writing to the sequence files or if maybe a race condition is
happening on an executor which is causing timeouts.

Any ideas? Anyone else seen this happening?


We also tried running all the saveAsSequenceFile calls in separate futures
to see if maybe the waiting would still only take 1-2 minutes but it looks
like the waiting still takes the sum of the amount  of time it would have
originally (several several minutes). Our job runs, in its entirety, 35
minutes and we're estimating that we're spending at least 16 minutes in
this paused state. What I haven't been able to do is figure out how to
trace through all the executors. Is there a way to do this? The event logs
in yarn don't seem to help much with this.


Re: Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Marcelo Vanzin
Hi Capitão,

Since you're using CDH, your question is probably more appropriate for
the cdh-u...@cloudera.org list.

The problem you're seeing is most probably an artifact of the way CDH
is currently packaged. You have to add the Hive jars manually to your Spark
app's classpath if you want to use the Hive support.

For example, if you're using CDH rpm/deb packages, you'd add all jars
in /usr/lib/hive/lib/ to `spark.driver.extraClassPath` and
`spark.executor.extraClassPath`, with the added caveat of excluding
the "guava" jar from that list.


On Fri, Jan 30, 2015 at 9:17 AM, Capitão  wrote:
> I've been trying to run HiveQL queries with UDFs in Spark SQL, but with no
> success. The problem occurs only when using functions, like the
> from_unixtime (represented by the Hive class UDFFromUnixTime).
>
> I'm using Spark 1.2 with CDH5.3.0. Running the queries in local mode works,
> but in YARN mode it doesn't. I'm creating an uber-jar with all the needed
> dependencies, excluding the ones provided by the cluster (Spark, Hadoop) and
> including the Hive ones. When I run the queries in Yarn I get the following
> exception:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
> Lost task 1.3 in stage 0.0 (TID 20, ):
> java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/UDF;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
> at java.lang.Class.getDeclaredField(Class.java:1951)
> at
> java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
> at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:468)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
> at
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
> at
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInp

Error Compiling

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

Any idea how to solve this error?

[error] 
/sata_disk/workspace/spark-1.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala:76:
 missing parameter type for expanded function ((x$6, x$7) => x$6.$plus(x$7))

[error] val wordCounts = words.map(x => (x, 1L)).reduceByWindow(_ + _, _ - 
_, Minutes(1), Seconds(2), 2)

Thanks
-- 
Privacy Notice: http://www.unibs.it/node/8155


Re: Cheapest way to materialize an RDD?

2015-01-30 Thread Sean Owen
Yeah, from an unscientific test, it looks like the time to cache the
blocks still dominates. Saving the count is probably a win, but not
big. Well, maybe good to know.
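
For reference, a minimal sketch of the two variants being compared, assuming `rdd` has already been marked with .cache():

rdd.count()                    // materializes the cache, but also computes and returns a count
rdd.foreachPartition(_ => ())  // materializes the cache with a no-op per partition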

On Fri, Jan 30, 2015 at 10:47 PM, Stephen Boesch  wrote:
> Theoretically your approach would require less overhead - i.e. a collect on
> the driver is not required as the last step.  But maybe the difference is
> small and that particular path may or may not have been properly optimized
> vs the count(). Do you have a biggish data set to compare the timings?
>
> 2015-01-30 14:42 GMT-08:00 Sean Owen :
>>
>> So far, the canonical way to materialize an RDD just to make sure it's
>> cached is to call count(). That's fine but incurs the overhead of
>> actually counting the elements.
>>
>> However, rdd.foreachPartition(p => None) for example also seems to
>> cause the RDD to be materialized, and is a no-op. Is that a better way
>> to do it or am I not thinking of why it's insufficient?
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Trouble deploying spark program because of soft link?

2015-01-30 Thread suhshekar52
-_- Sorry for the spam...I thought I could run spark apps on workers, but I
cloned it on my spark master and now it works. 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trouble-deploying-spark-program-because-of-soft-link-tp21450p21451.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Yifan LI
Yes, I think so, especially for a Pregel application… do you have any suggestions?

Best,
Yifan LI





> On 30 Jan 2015, at 22:25, Sonal Goyal  wrote:
> 
> Is your code hitting frequent garbage collection? 
> 
> Best Regards,
> Sonal
> Founder, Nube Technologies  
> 
>  
> 
> 
> 
> On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI  > wrote:
> 
>> 
>> 
>> Hi,
>> 
>> I am running my GraphX application on Spark 1.2.0 (an 11-node cluster), and have 
>> requested 30GB of memory per node and 100 cores for an input dataset of around 1GB 
>> (a 5-million-vertex graph).
>> 
>> But the error below always happens…
>> 
>> Is there anyone who could give me some pointers? 
>> 
>> (BTW, the overall edge/vertex RDDs will reach more than 100GB during graph 
>> computation, and another version of my application works well on the same 
>> dataset while needing much less memory during computation.)
>> 
>> Thanks in advance!!!
>> 
>> 
>> 15/01/29 18:05:08 ERROR ContextCleaner: Error cleaning broadcast 60
>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>  at 
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>  at 
>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>  at scala.concurrent.Await$.result(package.scala:107)
>>  at 
>> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
>>  at 
>> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
>>  at 
>> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>  at 
>> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>>  at 
>> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
>>  at 
>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
>>  at 
>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
>>  at scala.Option.foreach(Option.scala:236)
>>  at 
>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
>>  at 
>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>  at 
>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
>>  at org.apache.spark.ContextCleaner.org 
>> $apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
>>  at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
>> [Stage 91:===>  (2 + 4) 
>> / 6]15/01/29 18:08:15 ERROR SparkDeploySchedulerBackend: Asked to remove 
>> non-existent executor 0
>> [Stage 93:>  (29 + 20) / 
>> 49]15/01/29 23:47:03 ERROR TaskSchedulerImpl: Lost executor 9 on 
>> small11-tap1.common.lip6.fr : remote 
>> Akka client disassociated
>> [Stage 83:>   (1 + 0) / 6][Stage 86:>   (0 + 1) / 2][Stage 88:>   (0 + 2) / 
>> 8]15/01/29 23:47:06 ERROR SparkDeploySchedulerBackend: Asked to remove 
>> non-existent executor 9
>> [Stage 83:===>  (5 + 1) / 6][Stage 88:=>   (9 + 2) / 
>> 11]15/01/29 23:57:30 ERROR TaskSchedulerImpl: Lost executor 8 on 
>> small10-tap1.common.lip6.fr : remote 
>> Akka client disassociated
>> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove 
>> non-existent executor 8
>> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove 
>> non-existent executor 8
>> 
>> Best,
>> Yifan LI
>> 
>> 
>> 
>> 
>> 
> 
> 



Re: Cheapest way to materialize an RDD?

2015-01-30 Thread Stephen Boesch
Theoretically your approach would require less overhead - i.e. a collect on
the driver is not required as the last step.  But maybe the difference is
small and that particular path may or may not have been properly optimized
vs the count(). Do you have a biggish data set to compare the timings?

2015-01-30 14:42 GMT-08:00 Sean Owen :

> So far, the canonical way to materialize an RDD just to make sure it's
> cached is to call count(). That's fine but incurs the overhead of
> actually counting the elements.
>
> However, rdd.foreachPartition(p => None) for example also seems to
> cause the RDD to be materialized, and is a no-op. Is that a better way
> to do it or am I not thinking of why it's insufficient?
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Cheapest way to materialize an RDD?

2015-01-30 Thread Sean Owen
So far, the canonical way to materialize an RDD just to make sure it's
cached is to call count(). That's fine but incurs the overhead of
actually counting the elements.

However, rdd.foreachPartition(p => None) for example also seems to
cause the RDD to be materialized, and is a no-op. Is that a better way
to do it or am I not thinking of why it's insufficient?

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Trouble deploying spark program because of soft link?

2015-01-30 Thread suhshekar52
Sorry if this is a double post...I'm not sure if I can send from email or
have to come to the user list to create a new topic.

A bit confused on this one...I have set up the KafkaWordCount found here:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java

Everything runs fine when I run it on instance A using this repository:
https://github.com/aijibd/KafkaSpark

The word count works fine, etc. 

I literally clone that repository and try to run it on instance B and get
the following error:

I've gotten NoClassDefFoundErrors while building the pom file, but I don't
think missing dependencies is the issue here as it says
"/usr/lib/hadoop/bin/hadoop: No such file or directory" which is true,
hadoop is under usr/bin/ not usr/lib. I'm confused why it is looking for a
file there...I don't recall changing any settings regarding this. Thank you!

line 83: /usr/lib/hadoop/bin/hadoop: No such file or directory
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FSDataInputStream
at
org.apache.spark.deploy.SparkSubmitArguments.parse$1(SparkSubmitArguments.scala:307)
at
org.apache.spark.deploy.SparkSubmitArguments.parseOpts(SparkSubmitArguments.scala:220)
at
org.apache.spark.deploy.SparkSubmitArguments.(SparkSubmitArguments.scala:75)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trouble-deploying-spark-program-because-of-soft-link-tp21450.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why is DecimalType separate from DataType ?

2015-01-30 Thread Michael Armbrust
You are grabbing the singleton, not the class. You need to specify the
precision (e.g. DecimalType.Unlimited or DecimalType(precision, scale)).
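
For example, a minimal sketch assuming Spark 1.2's SQL types (field names are illustrative):

import org.apache.spark.sql._   // in 1.2 this exposes StructType, StructField, DecimalType

val schema = StructType(Seq(
  StructField("credit_amount", DecimalType.Unlimited, true),  // unlimited precision
  StructField("interest_rate", DecimalType(10, 2), true)      // precision 10, scale 2
))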

On Fri, Jan 30, 2015 at 2:23 PM, Manoj Samel 
wrote:

> Spark 1.2
>
> While building schemaRDD using StructType
>
> xxx = new StructField("credit_amount", DecimalType, true) gives error
> "type mismatch; found :
> org.apache.spark.sql.catalyst.types.DecimalType.type required:
> org.apache.spark.sql.catalyst.types.DataType"
>
> From
> https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.package,
> it seems DecimalType = sql.catalyst.types.DecimalType is separate from
> DataType = sql.catalyst.types.DataType
>
> Not sure why that is the case? How does one uses Decimal and other types
> in StructField?
>
> Thanks,
>
>
>
>


Why is DecimalType separate from DataType ?

2015-01-30 Thread Manoj Samel
Spark 1.2

While building schemaRDD using StructType

xxx = new StructField("credit_amount", DecimalType, true) gives error "type
mismatch; found : org.apache.spark.sql.catalyst.types.DecimalType.type
required: org.apache.spark.sql.catalyst.types.DataType"

From
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.package,
it seems DecimalType = sql.catalyst.types.DecimalType is separate from
DataType = sql.catalyst.types.DataType

Not sure why that is the case? How does one uses Decimal and other types in
StructField?

Thanks,


Trouble deploying spark program because of soft link?

2015-01-30 Thread Su She
Hello Everyone,

A bit confused on this one...I have set up the KafkaWordCount found here:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java

Everything runs fine when I run it on instance A using this repository:
https://github.com/aijibd/KafkaSpark

The word count works fine, etc.

I literally clone that repository and try to run it on instance B and get
the following error:

I've gotten NoClassDefFoundErrors while building the pom file, but I don't
think missing dependencies is the issue here as it says
"/usr/lib/hadoop/bin/hadoop: No such file or directory" which is true,
hadoop is under usr/bin/ not usr/lib. I'm confused why it is looking for a
file there...I don't recall changing any settings regarding this. Thank you!

line 83: /usr/lib/hadoop/bin/hadoop: No such file or directory
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FSDataInputStream
at
org.apache.spark.deploy.SparkSubmitArguments.parse$1(SparkSubmitArguments.scala:307)
at
org.apache.spark.deploy.SparkSubmitArguments.parseOpts(SparkSubmitArguments.scala:220)
at
org.apache.spark.deploy.SparkSubmitArguments.(SparkSubmitArguments.scala:75)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:70)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more


Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Sonal Goyal
Is your code hitting frequent garbage collection?

Best Regards,
Sonal
Founder, Nube Technologies 





On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI  wrote:

>
>
>
> Hi,
>
> I am running my GraphX application on Spark 1.2.0 (an 11-node cluster), and have
> requested 30GB of memory per node and 100 cores for an input dataset of around 1GB
> (a 5-million-vertex graph).
>
> But the error below always happens…
>
> Is there anyone who could give me some pointers?
>
> (BTW, the overall edge/vertex RDDs will reach more than 100GB during graph
> computation, and another version of my application works well on the
> same dataset while needing much less memory during computation.)
>
> Thanks in advance!!!
>
>
> 15/01/29 18:05:08 ERROR ContextCleaner: Error cleaning broadcast 60
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:107)
> at
> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
> at
> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
> at
> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> at
> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
> at
> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
> at org.apache.spark.ContextCleaner.org
> 
> $apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
> at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> [Stage 91:===>  (2 +
> 4) / 6]15/01/29 18:08:15 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 0
> [Stage 93:>  (29 + 20)
> / 49]15/01/29 23:47:03 ERROR TaskSchedulerImpl: Lost executor 9 on
> small11-tap1.common.lip6.fr: remote Akka client disassociated
> [Stage 83:>   (1 + 0) / 6][Stage 86:>   (0 + 1) / 2][Stage 88:>   (0 + 2)
> / 8]15/01/29 23:47:06 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 9
> [Stage 83:===>  (5 + 1) / 6][Stage 88:=>   (9 + 2)
> / 11]15/01/29 23:57:30 ERROR TaskSchedulerImpl: Lost executor 8 on
> small10-tap1.common.lip6.fr: remote Akka client disassociated
> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 8
> 15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 8
>
> Best,
> Yifan LI
>
>
>
>
>
>
>


Re: Serialized task result size exceeded

2015-01-30 Thread Charles Feduke
Are you using the default Java object serialization, or have you tried Kryo
yet? If you haven't tried Kryo please do, and let me know how much it
impacts the serialization size. (I know it's more efficient; I'm curious to
know how much more efficient, and I'm being lazy - I don't have ~6K 500MB
files on hand.)

You can call saveAsObjectFile on, say, a take(1) from an RDD and examine the
serialized output to see if a much larger object graph than you expect is
being output.
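
A hedged sketch (not from the original mail) of the configuration side, assuming Spark 1.2; the registered class is illustrative:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.driver.maxResultSize", "2g")   // only if the results legitimately need more room
  // optionally register the classes that dominate the task results:
  // .registerKryoClasses(Array(classOf[MyRecord]))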

On Fri Jan 30 2015 at 3:47:31 PM ankits  wrote:

> This is on spark 1.2
>
> I am loading ~6k parquet files, roughly 500 MB each into a schemaRDD, and
> calling count() on it.
>
> After loading about 2705 tasks (there is one per file), the job crashes
> with
> this error:
> Total size of serialized results of 2705 tasks (1024.0 MB) is bigger than
> spark.driver.maxResultSize (1024.0 MB)
>
> This indicates that the results of each task are about 1024/2705 ≈ 0.38 MB
> each. Is that normal? I don't know exactly what the result of each task
> would be, but 0.38 MB for each seems too high. Can anyone offer an
> explanation as to what the normal size should be if this is too high, or
> ways to reduce this?
>
> Thanks.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Serialized-task-result-size-exceeded-tp21449.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: groupByKey is not working

2015-01-30 Thread Amit Behera
Thank you very much Charles, I got it  :)



On Sat, Jan 31, 2015 at 2:20 AM, Charles Feduke 
wrote:

> You'll still need to:
>
> import org.apache.spark.SparkContext._
>
> Importing org.apache.spark._ does _not_ recurse into sub-objects or
> sub-packages, it only brings in whatever is at the level of the package or
> object imported.
>
> SparkContext._ has some implicits, one of them for adding groupByKey to an
> RDD[_] IIRC.
>
>
> On Fri Jan 30 2015 at 3:48:22 PM Stephen Boesch  wrote:
>
>> Amit - IJ will not find it until you add the import as Sean mentioned.
>> It includes implicits that intellij will not know about otherwise.
>>
>> 2015-01-30 12:44 GMT-08:00 Amit Behera :
>>
>> I am sorry Sean.
>>>
>>> I am developing code in intelliJ Idea. so with the above dependencies I
>>> am not able to find *groupByKey* when I am searching by ctrl+
>>>
>>>
>>> On Sat, Jan 31, 2015 at 2:04 AM, Sean Owen  wrote:
>>>
 When you post a question anywhere, and say "it's not working", you
 *really* need to say what that means.


 On Fri, Jan 30, 2015 at 8:20 PM, Amit Behera 
 wrote:
 > hi all,
 >
 > my sbt file is like this:
 >
 > name := "Spark"
 >
 > version := "1.0"
 >
 > scalaVersion := "2.10.4"
 >
 > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
 >
 > libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
 >
 >
 > code:
 >
 > object SparkJob
 > {
 >
 >   def pLines(lines:Iterator[String])={
 > val parser=new CSVParser()
 > lines.map(l=>{val vs=parser.parseLine(l)
 >   (vs(0),vs(1).toInt)})
 >   }
 >
 >   def main(args: Array[String]) {
 > val conf = new SparkConf().setAppName("Spark
 Job").setMaster("local")
 > val sc = new SparkContext(conf)
 > val data = sc.textFile("/home/amit/testData.csv").cache()
 > val result = data.mapPartitions(pLines).groupByKey
 > //val list = result.filter(x=> {(x._1).contains("24050881")})
 >
 >   }
 >
 > }
 >
 >
 > Here groupByKey is not working . But same thing is working from
 spark-shell.
 >
 > Please help me
 >
 >
 > Thanks
 >
 > Amit

>>>
>>>


Re: groupByKey is not working

2015-01-30 Thread Charles Feduke
You'll still need to:

import org.apache.spark.SparkContext._

Importing org.apache.spark._ does _not_ recurse into sub-objects or
sub-packages, it only brings in whatever is at the level of the package or
object imported.

SparkContext._ has some implicits, one of them for adding groupByKey to an
RDD[_] IIRC.
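
For completeness, a minimal sketch of the import block that makes groupByKey resolve on Spark 1.1/1.2:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // rddToPairRDDFunctions adds groupByKey to RDD[(K, V)]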

On Fri Jan 30 2015 at 3:48:22 PM Stephen Boesch  wrote:

> Amit - IJ will not find it until you add the import as Sean mentioned.  It
> includes implicits that intellij will not know about otherwise.
>
> 2015-01-30 12:44 GMT-08:00 Amit Behera :
>
> I am sorry Sean.
>>
>> I am developing code in intelliJ Idea. so with the above dependencies I
>> am not able to find *groupByKey* when I am searching by ctrl+
>>
>>
>> On Sat, Jan 31, 2015 at 2:04 AM, Sean Owen  wrote:
>>
>>> When you post a question anywhere, and say "it's not working", you
>>> *really* need to say what that means.
>>>
>>>
>>> On Fri, Jan 30, 2015 at 8:20 PM, Amit Behera 
>>> wrote:
>>> > hi all,
>>> >
>>> > my sbt file is like this:
>>> >
>>> > name := "Spark"
>>> >
>>> > version := "1.0"
>>> >
>>> > scalaVersion := "2.10.4"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
>>> >
>>> > libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
>>> >
>>> >
>>> > code:
>>> >
>>> > object SparkJob
>>> > {
>>> >
>>> >   def pLines(lines:Iterator[String])={
>>> > val parser=new CSVParser()
>>> > lines.map(l=>{val vs=parser.parseLine(l)
>>> >   (vs(0),vs(1).toInt)})
>>> >   }
>>> >
>>> >   def main(args: Array[String]) {
>>> > val conf = new SparkConf().setAppName("Spark
>>> Job").setMaster("local")
>>> > val sc = new SparkContext(conf)
>>> > val data = sc.textFile("/home/amit/testData.csv").cache()
>>> > val result = data.mapPartitions(pLines).groupByKey
>>> > //val list = result.filter(x=> {(x._1).contains("24050881")})
>>> >
>>> >   }
>>> >
>>> > }
>>> >
>>> >
>>> > Here groupByKey is not working . But same thing is working from
>>> spark-shell.
>>> >
>>> > Please help me
>>> >
>>> >
>>> > Thanks
>>> >
>>> > Amit
>>>
>>
>>


Re: spark challenge: zip with next???

2015-01-30 Thread Derrick Burns
Koert, thanks for the referral to your current pull request!  I found it
very thoughtful and thought-provoking.



On Fri, Jan 30, 2015 at 9:19 AM, Koert Kuipers  wrote:

> and if its a single giant timeseries that is already sorted then Mohit's
> solution sounds good to me.
>
> On Fri, Jan 30, 2015 at 11:05 AM, Michael Malak 
> wrote:
>
>> But isn't foldLeft() overkill for the originally stated use case of max
>> diff of adjacent pairs? Isn't foldLeft() for recursive non-commutative
>> non-associative accumulation as opposed to an embarrassingly parallel
>> operation such as this one?
>>
>> This use case reminds me of FIR filtering in DSP. It seems that RDDs
>> could use something that serves the same purpose as
>> scala.collection.Iterator.sliding.
>>
>>   --
>>  *From:* Koert Kuipers 
>> *To:* Mohit Jaggi 
>> *Cc:* Tobias Pfeiffer ; "Ganelin, Ilya" <
>> ilya.gane...@capitalone.com>; derrickburns ; "
>> user@spark.apache.org" 
>> *Sent:* Friday, January 30, 2015 7:11 AM
>> *Subject:* Re: spark challenge: zip with next???
>>
>> assuming the data can be partitioned then you have many timeseries for
>> which you want to detect potential gaps. also assuming the resulting gaps
>> info per timeseries is much smaller data than the timeseries data itself,
>> then this is a classical example to me of a sorted (streaming) foldLeft,
>> requiring an efficient secondary sort in the spark shuffle. i am trying to
>> get that into spark here:
>> https://issues.apache.org/jira/browse/SPARK-3655
>>
>>
>>
>> On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi 
>> wrote:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
>>
>> you can use the MLLib function or do the following (which is what I had
>> done):
>>
>> - in the first pass over the data, using mapPartitionsWithIndex, gather the
>> first item in each partition. You can use collect (or an aggregator) for this.
>> “Key” them by the partition index. At the end, you will have a map
>> (partition index) --> first item
>> - in the second pass over the data, using mapPartitionsWithIndex again,
>> look at two items at a time (or in the general case N items at a time; you can
>> use Scala’s sliding iterator) and check the time difference (or any other
>> sliding-window computation). To this mapPartitions call, pass the map created in
>> the previous step. You will need it to check the last item in each
>> partition.
>>
>> If you can tolerate a few inaccuracies then you can just do the second
>> step. You will miss the “boundaries” of the partitions but it might be
>> acceptable for your use case.
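
[Editor's note: a minimal Scala sketch of the two-pass approach described above. Names are hypothetical; `ts` is assumed to be a single, already time-sorted RDD[Long] of timestamps, and empty partitions are ignored for simplicity.]

import org.apache.spark.rdd.RDD

// Pass 1: collect the first timestamp of every partition, keyed by partition index.
def firstPerPartition(ts: RDD[Long]): Map[Int, Long] =
  ts.mapPartitionsWithIndex { (idx, it) =>
    if (it.hasNext) Iterator((idx, it.next())) else Iterator.empty
  }.collect().toMap

// Pass 2: slide over each partition, appending the next partition's first element
// so that gaps across partition boundaries are not missed.
def gaps(ts: RDD[Long], maxGap: Long): RDD[(Long, Long)] = {
  val heads = ts.sparkContext.broadcast(firstPerPartition(ts))
  ts.mapPartitionsWithIndex { (idx, it) =>
    val extended = it ++ heads.value.get(idx + 1).iterator
    extended.sliding(2).collect { case Seq(a, b) if b - a > maxGap => (a, b) }
  }
}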
>>
>>
>>
>> On Jan 29, 2015, at 4:36 PM, Tobias Pfeiffer  wrote:
>>
>> Hi,
>>
>> On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya <
>> ilya.gane...@capitalone.com> wrote:
>>
>>  Make a copy of your RDD with an extra entry in the beginning to offset.
>> The you can zip the two RDDs and run a map to generate an RDD of
>> differences.
>>
>>
>> Does that work? I recently tried something to compute differences between
>> each entry and the next, so I did
>>   val rdd1 = ... // null element + rdd
>>   val rdd2 = ... // rdd + null element
>> but got an error message about zip requiring data sizes in each partition
>> to match.
>>
>> Tobias
>>
>>
>>
>>
>>
>>
>


Re: groupByKey is not working

2015-01-30 Thread Amit Behera
Hi Charles,

I forgot to mention. But I imported the following

import au.com.bytecode.opencsv.CSVParser

import org.apache.spark._

On Sat, Jan 31, 2015 at 2:09 AM, Charles Feduke 
wrote:

> Define "not working". Not compiling? If so you need:
>
> import org.apache.spark.SparkContext._
>
>
> On Fri Jan 30 2015 at 3:21:45 PM Amit Behera  wrote:
>
>> hi all,
>>
>> my sbt file is like this:
>>
>> name := "Spark"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
>>
>> libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
>>
>>
>> *code:*
>>
>> object SparkJob
>> {
>>
>>   def pLines(lines:Iterator[String])={
>> val parser=new CSVParser()
>> lines.map(l=>{val vs=parser.parseLine(l)
>>   (vs(0),vs(1).toInt)})
>>   }
>>
>>   def main(args: Array[String]) {
>> val conf = new SparkConf().setAppName("Spark Job").setMaster("local")
>> val sc = new SparkContext(conf)
>> val data = sc.textFile("/home/amit/testData.csv").cache()
>> val result = data.mapPartitions(pLines).groupByKey
>> //val list = result.filter(x=> {(x._1).contains("24050881")})
>>
>>   }
>>
>> }
>>
>>
>> Here groupByKey is not working . But same thing is working from 
>> *spark-shell.*
>>
>> Please help me
>>
>>
>> Thanks
>>
>> Amit
>>
>>


Re: groupByKey is not working

2015-01-30 Thread Stephen Boesch
Amit - IJ will not find it until you add the import as Sean mentioned.  It
includes implicits that intellij will not know about otherwise.

2015-01-30 12:44 GMT-08:00 Amit Behera :

> I am sorry Sean.
>
> I am developing code in intelliJ Idea. so with the above dependencies I am
> not able to find *groupByKey* when I am searching by ctrl+
>
>
> On Sat, Jan 31, 2015 at 2:04 AM, Sean Owen  wrote:
>
>> When you post a question anywhere, and say "it's not working", you
>> *really* need to say what that means.
>>
>>
>> On Fri, Jan 30, 2015 at 8:20 PM, Amit Behera 
>> wrote:
>> > hi all,
>> >
>> > my sbt file is like this:
>> >
>> > name := "Spark"
>> >
>> > version := "1.0"
>> >
>> > scalaVersion := "2.10.4"
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
>> >
>> > libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
>> >
>> >
>> > code:
>> >
>> > object SparkJob
>> > {
>> >
>> >   def pLines(lines:Iterator[String])={
>> > val parser=new CSVParser()
>> > lines.map(l=>{val vs=parser.parseLine(l)
>> >   (vs(0),vs(1).toInt)})
>> >   }
>> >
>> >   def main(args: Array[String]) {
>> > val conf = new SparkConf().setAppName("Spark
>> Job").setMaster("local")
>> > val sc = new SparkContext(conf)
>> > val data = sc.textFile("/home/amit/testData.csv").cache()
>> > val result = data.mapPartitions(pLines).groupByKey
>> > //val list = result.filter(x=> {(x._1).contains("24050881")})
>> >
>> >   }
>> >
>> > }
>> >
>> >
>> > Here groupByKey is not working . But same thing is working from
>> spark-shell.
>> >
>> > Please help me
>> >
>> >
>> > Thanks
>> >
>> > Amit
>>
>
>


Serialized task result size exceeded

2015-01-30 Thread ankits
This is on spark 1.2

I am loading ~6k parquet files, roughly 500 MB each into a schemaRDD, and
calling count() on it.

After loading about 2705 tasks (there is one per file), the job crashes with
this error:
Total size of serialized results of 2705 tasks (1024.0 MB) is bigger than
spark.driver.maxResultSize (1024.0 MB)

This indicates that the results of each task are about 1024 MB / 2705 tasks ≈ 0.38 MB
each. Is that normal? I don't know exactly what the result of each task
would be, but roughly 0.38 MB for each seems too high. Can anyone offer an
explanation as to what the normal size should be if this is too high, or
ways to reduce this?
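
[Editor's note: one possible knob, not mentioned in this message and only a sketch; it raises the driver-side limit named in the error rather than shrinking the per-task results.]

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: raise the cap on accumulated task results (the 1.2 default is 1g).
val conf = new SparkConf()
  .setAppName("parquet-count")              // hypothetical app name
  .set("spark.driver.maxResultSize", "2g")
val sc = new SparkContext(conf)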

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Serialized-task-result-size-exceeded-tp21449.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: groupByKey is not working

2015-01-30 Thread Amit Behera
I am sorry Sean.

I am developing the code in IntelliJ IDEA, so with the above dependencies I am
not able to find *groupByKey* when I am searching by ctrl+


On Sat, Jan 31, 2015 at 2:04 AM, Sean Owen  wrote:

> When you post a question anywhere, and say "it's not working", you
> *really* need to say what that means.
>
> On Fri, Jan 30, 2015 at 8:20 PM, Amit Behera  wrote:
> > hi all,
> >
> > my sbt file is like this:
> >
> > name := "Spark"
> >
> > version := "1.0"
> >
> > scalaVersion := "2.10.4"
> >
> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
> >
> > libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
> >
> >
> > code:
> >
> > object SparkJob
> > {
> >
> >   def pLines(lines:Iterator[String])={
> > val parser=new CSVParser()
> > lines.map(l=>{val vs=parser.parseLine(l)
> >   (vs(0),vs(1).toInt)})
> >   }
> >
> >   def main(args: Array[String]) {
> > val conf = new SparkConf().setAppName("Spark Job").setMaster("local")
> > val sc = new SparkContext(conf)
> > val data = sc.textFile("/home/amit/testData.csv").cache()
> > val result = data.mapPartitions(pLines).groupByKey
> > //val list = result.filter(x=> {(x._1).contains("24050881")})
> >
> >   }
> >
> > }
> >
> >
> > Here groupByKey is not working . But same thing is working from
> spark-shell.
> >
> > Please help me
> >
> >
> > Thanks
> >
> > Amit
>


Re: Duplicate key when sorting BytesWritable with Kryo?

2015-01-30 Thread Sandy Ryza
Filed https://issues.apache.org/jira/browse/SPARK-5500 for this.

-Sandy

On Fri, Jan 30, 2015 at 11:59 AM, Aaron Davidson  wrote:

> Ah, this is in particular an issue due to sort-based shuffle (it was not
> the case for hash-based shuffle, which would immediately serialize each
> record rather than holding many in memory at once). The documentation
> should be updated.
>
> On Fri, Jan 30, 2015 at 11:27 AM, Sandy Ryza 
> wrote:
>
>> Hi Andrew,
>>
>> Here's a note from the doc for sequenceFile:
>>
>> * '''Note:''' Because Hadoop's RecordReader class re-uses the same
>> Writable object for each
>> * record, directly caching the returned RDD will create many
>> references to the same object.
>> * If you plan to directly cache Hadoop writable objects, you should
>> first copy them using
>> * a `map` function.
>>
>> This should probably say "direct cachingly *or directly shuffling*".  To
>> sort directly from a sequence file, the records need to be cloned first.
>>
>> -Sandy
>>
>>
>> On Fri, Jan 30, 2015 at 11:20 AM, andrew.rowson <
>> andrew.row...@thomsonreuters.com> wrote:
>>
>>> I've found a strange issue when trying to sort a lot of data in HDFS
>>> using
>>> spark 1.2.0 (CDH5.3.0). My data is in sequencefiles and the key is a
>>> class
>>> that derives from BytesWritable (the value is also a BytesWritable). I'm
>>> using a custom KryoSerializer to serialize the underlying byte array
>>> (basically write the length and the byte array).
>>>
>>> My spark job looks like this:
>>>
>>> spark.sequenceFile(inputPath, classOf[CustomKey],
>>> classOf[BytesWritable]).sortByKey().map(t =>
>>> t._1).saveAsTextFile(outputPath)
>>>
>>> CustomKey extends BytesWritable, adds a toString method and some other
>>> helper methods that extract and convert parts of the underlying byte[].
>>>
>>> This should simply output a series of textfiles which contain the sorted
>>> list of keys. The problem is that under certain circumstances I get many
>>> duplicate keys. The number of records output is correct, but it appears
>>> that
>>> large chunks of the output are simply copies of the last record in that
>>> chunk. E.g instead of [1,2,3,4,5,6,7,8,9] I'll see [9,9,9,9,9,9,9,9,9].
>>>
>>> This appears to happen only above certain input data volumes, and it
>>> appears
>>> to be when shuffle spills. For a job where shuffle spill for memory and
>>> disk
>>> = 0B, the data is correct. If there is any spill, I see the duplicate
>>> behaviour. Oddly, the shuffle write is much smaller when there's a spill.
>>> E.g. the non spill job has 18.8 GB of input and 14.9GB of shuffle write,
>>> whereas the spill job has 24.2 GB of input, and only 4.9GB of shuffle
>>> write.
>>> I'm guessing some sort of compression is happening on duplicate identical
>>> values?
>>>
>>> Oddly, I can fix this issue if I adjust my scala code to insert a map
>>> step
>>> before the call to sortByKey():
>>>
>>> .map(t => (new CustomKey(t._1),t._2))
>>>
>>> This constructor is just:
>>>
>>> public CustomKey(CustomKey left) { this.set(left); }
>>>
>>> Why does this work? I've no idea.
>>>
>>> The spark job is running in yarn-client mode with all the default
>>> configuration values set. Using the external shuffle service and
>>> disabling
>>> spill compression makes no difference.
>>>
>>> Is this a bug?
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-key-when-sorting-BytesWritable-with-Kryo-tp21447.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>


Re: groupByKey is not working

2015-01-30 Thread Charles Feduke
Define "not working". Not compiling? If so you need:

import org.apache.spark.SparkContext._
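
[Editor's note: a minimal sketch of why that import matters. In Spark 1.1 the implicit conversion to PairRDDFunctions (which provides groupByKey) lives in the SparkContext companion object; the spark-shell imports it for you.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // brings rddToPairRDDFunctions into scope

object GroupByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Example").setMaster("local"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val grouped = pairs.groupByKey()       // compiles only because of the import above
    grouped.collect().foreach(println)
    sc.stop()
  }
}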


On Fri Jan 30 2015 at 3:21:45 PM Amit Behera  wrote:

> hi all,
>
> my sbt file is like this:
>
> name := "Spark"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
>
> libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
>
>
> *code:*
>
> object SparkJob
> {
>
>   def pLines(lines:Iterator[String])={
> val parser=new CSVParser()
> lines.map(l=>{val vs=parser.parseLine(l)
>   (vs(0),vs(1).toInt)})
>   }
>
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("Spark Job").setMaster("local")
> val sc = new SparkContext(conf)
> val data = sc.textFile("/home/amit/testData.csv").cache()
> val result = data.mapPartitions(pLines).groupByKey
> //val list = result.filter(x=> {(x._1).contains("24050881")})
>
>   }
>
> }
>
>
> Here groupByKey is not working . But same thing is working from *spark-shell.*
>
> Please help me
>
>
> Thanks
>
> Amit
>
>


Re: groupByKey is not working

2015-01-30 Thread Arush Kharbanda
Hi Amit,

What error does it throw?

Thanks
Arush

On Sat, Jan 31, 2015 at 1:50 AM, Amit Behera  wrote:

> hi all,
>
> my sbt file is like this:
>
> name := "Spark"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
>
> libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
>
>
> *code:*
>
> object SparkJob
> {
>
>   def pLines(lines:Iterator[String])={
> val parser=new CSVParser()
> lines.map(l=>{val vs=parser.parseLine(l)
>   (vs(0),vs(1).toInt)})
>   }
>
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("Spark Job").setMaster("local")
> val sc = new SparkContext(conf)
> val data = sc.textFile("/home/amit/testData.csv").cache()
> val result = data.mapPartitions(pLines).groupByKey
> //val list = result.filter(x=> {(x._1).contains("24050881")})
>
>   }
>
> }
>
>
> Here groupByKey is not working . But same thing is working from *spark-shell.*
>
> Please help me
>
>
> Thanks
>
> Amit
>
>


-- 

[image: Sigmoid Analytics] 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


groupByKey is not working

2015-01-30 Thread Amit Behera
hi all,

my sbt file is like this:

name := "Spark"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"


*code:*

object SparkJob
{

  def pLines(lines: Iterator[String]) = {
    val parser = new CSVParser()
    lines.map(l => { val vs = parser.parseLine(l)
      (vs(0), vs(1).toInt) })
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Job").setMaster("local")
    val sc = new SparkContext(conf)
    val data = sc.textFile("/home/amit/testData.csv").cache()
    val result = data.mapPartitions(pLines).groupByKey
    //val list = result.filter(x => {(x._1).contains("24050881")})
  }

}


Here groupByKey is not working, but the same thing is working from *spark-shell.*

Please help me


Thanks

Amit


Re: Duplicate key when sorting BytesWritable with Kryo?

2015-01-30 Thread Aaron Davidson
Ah, this is in particular an issue due to sort-based shuffle (it was not
the case for hash-based shuffle, which would immediately serialize each
record rather than holding many in memory at once). The documentation
should be updated.

On Fri, Jan 30, 2015 at 11:27 AM, Sandy Ryza 
wrote:

> Hi Andrew,
>
> Here's a note from the doc for sequenceFile:
>
> * '''Note:''' Because Hadoop's RecordReader class re-uses the same
> Writable object for each
> * record, directly caching the returned RDD will create many
> references to the same object.
> * If you plan to directly cache Hadoop writable objects, you should
> first copy them using
> * a `map` function.
>
> This should probably say "direct cachingly *or directly shuffling*".  To
> sort directly from a sequence file, the records need to be cloned first.
>
> -Sandy
>
>
> On Fri, Jan 30, 2015 at 11:20 AM, andrew.rowson <
> andrew.row...@thomsonreuters.com> wrote:
>
>> I've found a strange issue when trying to sort a lot of data in HDFS using
>> spark 1.2.0 (CDH5.3.0). My data is in sequencefiles and the key is a class
>> that derives from BytesWritable (the value is also a BytesWritable). I'm
>> using a custom KryoSerializer to serialize the underlying byte array
>> (basically write the length and the byte array).
>>
>> My spark job looks like this:
>>
>> spark.sequenceFile(inputPath, classOf[CustomKey],
>> classOf[BytesWritable]).sortByKey().map(t =>
>> t._1).saveAsTextFile(outputPath)
>>
>> CustomKey extends BytesWritable, adds a toString method and some other
>> helper methods that extract and convert parts of the underlying byte[].
>>
>> This should simply output a series of textfiles which contain the sorted
>> list of keys. The problem is that under certain circumstances I get many
>> duplicate keys. The number of records output is correct, but it appears
>> that
>> large chunks of the output are simply copies of the last record in that
>> chunk. E.g instead of [1,2,3,4,5,6,7,8,9] I'll see [9,9,9,9,9,9,9,9,9].
>>
>> This appears to happen only above certain input data volumes, and it
>> appears
>> to be when shuffle spills. For a job where shuffle spill for memory and
>> disk
>> = 0B, the data is correct. If there is any spill, I see the duplicate
>> behaviour. Oddly, the shuffle write is much smaller when there's a spill.
>> E.g. the non spill job has 18.8 GB of input and 14.9GB of shuffle write,
>> whereas the spill job has 24.2 GB of input, and only 4.9GB of shuffle
>> write.
>> I'm guessing some sort of compression is happening on duplicate identical
>> values?
>>
>> Oddly, I can fix this issue if I adjust my scala code to insert a map step
>> before the call to sortByKey():
>>
>> .map(t => (new CustomKey(t._1),t._2))
>>
>> This constructor is just:
>>
>> public CustomKey(CustomKey left) { this.set(left); }
>>
>> Why does this work? I've no idea.
>>
>> The spark job is running in yarn-client mode with all the default
>> configuration values set. Using the external shuffle service and disabling
>> spill compression makes no difference.
>>
>> Is this a bug?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-key-when-sorting-BytesWritable-with-Kryo-tp21447.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Does the kFold in Spark always give you the same split?

2015-01-30 Thread Sean Owen
Are you using SGD for logistic regression? There's a random element
there too, by nature. I looked into the code and see that you can't
set a seed, but actually, the sampling is done with a fixed seed per
partition anyway. Hm.

In general you would not expect these algorithms to produce the same
result, given the stochastic nature. In this particular case, I'm not
sure if you can or should be able to get the implementation to act
deterministically. Even if the overt use of randomness is seed-able,
there may be some non-determinism in the distributed nature of the
processing that is having an effect.

On Fri, Jan 30, 2015 at 7:27 PM, Jianguo Li  wrote:
> Thanks. I did specify a seed parameter.
>
> Seems that the problem is not caused by kFold. I actually ran another
> experiment without cross validation. I just built a model with the training
> data and then tested the model on the test data. However, the accuracy still
> varies from one run to another. Interestingly, this only happens when I ran
> the experiment on our cluster. If I ran the experiment on my local machine,
> I can reproduce the result each time. Has anybody encountered similar issue
> before?
>
> Thanks,
>
> Jianguo
>
> On Fri, Jan 30, 2015 at 11:22 AM, Sean Owen  wrote:
>>
>> Have a look at the source code for MLUtils.kFold. Yes, there is a
>> random element. That's good; you want the folds to be randomly chosen.
>> Note there is a seed parameter, as in a lot of the APIs, that lets you
>> fix the RNG seed and so get the same result every time, if you need
>> to.
>>
>> On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li 
>> wrote:
>> > Hi,
>> >
>> > I am using the utility function kFold provided in Spark for doing k-fold
>> > cross validation using logistic regression. However, each time I run the
>> > experiment, I got different different result. Since everything else
>> > stays
>> > constant, I was wondering if this is due to the kFold function I used.
>> > Does
>> > anyone know if the kFold gives you a different split on a data set each
>> > time
>> > you call it?
>> >
>> > Thanks,
>> >
>> > Jianguo
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



pyspark and and page allocation failures due to memory fragmentation

2015-01-30 Thread Antony Mayi
Hi,
When running a big map-reduce operation with pyspark (in this particular case using
a lot of sets and set operations in the map tasks, so likely allocating and freeing
loads of pages) I eventually get the kernel error 'python: page allocation failure:
order:10, mode:0x2000d0' plus a very verbose dump, which I can reduce to the
following snippet:

Node 1 Normal: 3601*4kB (UEM) 3159*8kB (UEM) 1669*16kB (UEM) 763*32kB (UEM)
1451*64kB (UEM) 15*128kB (UM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 185836kB
...SLAB: Unable to allocate memory on node 1 (gfp=0xd0)
cache: size-4194304, object size: 4194304, order: 10

So the memory simply got fragmented and there are no higher-order pages left.
The interesting thing is that there is no error thrown by Spark itself - the
processing just gets stuck without any error (only the kernel dmesg explains
what happened in the background).

Any kernel experts out there with advice on how to avoid this? I have tried a
few vm options but still no joy.

Running Spark 1.2.0 (CDH 5.3.0) on kernel 3.8.13.

Thanks, Antony.

Re: [hive context] Unable to query array once saved as parquet

2015-01-30 Thread Ayoub
No it is not the case, here is the gist to reproduce the issue
https://gist.github.com/ayoub-benali/54d6f3b8635530e4e936
On Jan 30, 2015 8:29 PM, "Michael Armbrust"  wrote:

> Is it possible that your schema contains duplicate columns or column with
> spaces in the name?  The parquet library will often give confusing error
> messages in this case.
>
> On Fri, Jan 30, 2015 at 10:33 AM, Ayoub 
> wrote:
>
>> Hello,
>>
>> I have a problem when querying, with a hive context on spark
>> 1.2.1-snapshot, a column in my table which is nested data structure like an
>> array of struct.
>> The problems happens only on the table stored as parquet, while querying
>> the Schema RDD saved, as a temporary table, don't lead to any exception.
>>
>> my steps are:
>> 1) reading JSON file
>> 2) creating a schema RDD and saving it as a tmp table
>> 3) creating an external table in hive meta store saved as parquet file
>> 4) inserting the data from the tmp table to the persisted table
>> 5) queering the persisted table lead to this exception:
>>
>> "select data.field1 from persisted_table LATERAL VIEW explode(data_array)
>> nestedStuff AS data"
>>
>> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1
>> in file hdfs://***/test_table/part-1
>> at
>> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>> at
>> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>> at
>> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
>> at
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>> at
>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>> at
>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>> at scala.collection.TraversableOnce$class.to
>> (TraversableOnce.scala:273)
>> at 
>> scala.collection.AbstractIterator.to(Iterator.scala:1157)
>> at
>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>> at
>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>> at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>> at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>> at java.util.ArrayList.get(ArrayList.java:411)
>> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
>> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
>> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
>> at parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:99)
>> at parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:94)
>> at
>> parquet.io.RecordReaderImplementation.(RecordReaderImplementation.java:274)
>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
>> at
>> parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
>> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
>> at
>> parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
>> at
>> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
>> ... 28 more
>>
>> Driver stacktrace:
>> at 
>> org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAG

Re: [hive context] Unable to query array once saved as parquet

2015-01-30 Thread Michael Armbrust
Is it possible that your schema contains duplicate columns or a column with
spaces in the name?  The Parquet library will often give confusing error
messages in this case.
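
[Editor's note: one quick way to check, as a sketch; `hiveContext` is a hypothetical HiveContext instance and the table name is taken from the report quoted below.]

// Print the columns Hive/Parquet actually sees for the table.
hiveContext.sql("DESCRIBE persisted_table").collect().foreach(println)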

On Fri, Jan 30, 2015 at 10:33 AM, Ayoub  wrote:

> Hello,
>
> I have a problem when querying, with a hive context on spark
> 1.2.1-snapshot, a column in my table which is nested data structure like an
> array of struct.
> The problems happens only on the table stored as parquet, while querying
> the Schema RDD saved, as a temporary table, don't lead to any exception.
>
> my steps are:
> 1) reading JSON file
> 2) creating a schema RDD and saving it as a tmp table
> 3) creating an external table in hive meta store saved as parquet file
> 4) inserting the data from the tmp table to the persisted table
> 5) queering the persisted table lead to this exception:
>
> "select data.field1 from persisted_table LATERAL VIEW explode(data_array)
> nestedStuff AS data"
>
> parquet.io.ParquetDecodingException: Can not read value at 0 in block -1
> in file hdfs://***/test_table/part-1
> at
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
> at
> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
> at
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
> at
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to
> (TraversableOnce.scala:273)
> at 
> scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
> at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
> at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
> at parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:99)
> at parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:94)
> at
> parquet.io.RecordReaderImplementation.(RecordReaderImplementation.java:274)
> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
> at
> parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
> at
> parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
> at
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
> ... 28 more
>
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
> at
> scala.collection.mutable.ResizableA

Re: Does the kFold in Spark always give you the same split?

2015-01-30 Thread Jianguo Li
Thanks. I did specify a seed parameter.

Seems that the problem is not caused by kFold. I actually ran another
experiment without cross validation. I just built a model with the training
data and then tested the model on the test data. However, the accuracy
still varies from one run to another. Interestingly, this only happens when
I ran the experiment on our cluster. If I ran the experiment on my local
machine, I can reproduce the result each time. Has anybody encountered
similar issue before?

Thanks,

Jianguo

On Fri, Jan 30, 2015 at 11:22 AM, Sean Owen  wrote:

> Have a look at the source code for MLUtils.kFold. Yes, there is a
> random element. That's good; you want the folds to be randomly chosen.
> Note there is a seed parameter, as in a lot of the APIs, that lets you
> fix the RNG seed and so get the same result every time, if you need
> to.
>
> On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li 
> wrote:
> > Hi,
> >
> > I am using the utility function kFold provided in Spark for doing k-fold
> > cross validation using logistic regression. However, each time I run the
> > experiment, I got different different result. Since everything else stays
> > constant, I was wondering if this is due to the kFold function I used.
> Does
> > anyone know if the kFold gives you a different split on a data set each
> time
> > you call it?
> >
> > Thanks,
> >
> > Jianguo
>


Re: Duplicate key when sorting BytesWritable with Kryo?

2015-01-30 Thread Sandy Ryza
Hi Andrew,

Here's a note from the doc for sequenceFile:

* '''Note:''' Because Hadoop's RecordReader class re-uses the same
Writable object for each
* record, directly caching the returned RDD will create many references
to the same object.
* If you plan to directly cache Hadoop writable objects, you should
first copy them using
* a `map` function.

This should probably say "directly caching *or directly shuffling*".  To
sort directly from a sequence file, the records need to be cloned first.
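
[Editor's note: a hedged sketch of that cloning, reusing the variable names from the original job. The CustomKey copy constructor is the one shown in the quoted post; BytesWritable.copyBytes() is assumed to be available (Hadoop 2.x), otherwise copy the backing array manually.]

import org.apache.hadoop.io.BytesWritable
import org.apache.spark.SparkContext._

// Clone each record before it enters the shuffle so the Writable instances
// re-used by the RecordReader are not aliased inside the sort buffers.
spark.sequenceFile(inputPath, classOf[CustomKey], classOf[BytesWritable])
  .map { case (k, v) => (new CustomKey(k), new BytesWritable(v.copyBytes())) } // deep copies
  .sortByKey()
  .map(_._1)
  .saveAsTextFile(outputPath)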

-Sandy


On Fri, Jan 30, 2015 at 11:20 AM, andrew.rowson <
andrew.row...@thomsonreuters.com> wrote:

> I've found a strange issue when trying to sort a lot of data in HDFS using
> spark 1.2.0 (CDH5.3.0). My data is in sequencefiles and the key is a class
> that derives from BytesWritable (the value is also a BytesWritable). I'm
> using a custom KryoSerializer to serialize the underlying byte array
> (basically write the length and the byte array).
>
> My spark job looks like this:
>
> spark.sequenceFile(inputPath, classOf[CustomKey],
> classOf[BytesWritable]).sortByKey().map(t =>
> t._1).saveAsTextFile(outputPath)
>
> CustomKey extends BytesWritable, adds a toString method and some other
> helper methods that extract and convert parts of the underlying byte[].
>
> This should simply output a series of textfiles which contain the sorted
> list of keys. The problem is that under certain circumstances I get many
> duplicate keys. The number of records output is correct, but it appears
> that
> large chunks of the output are simply copies of the last record in that
> chunk. E.g instead of [1,2,3,4,5,6,7,8,9] I'll see [9,9,9,9,9,9,9,9,9].
>
> This appears to happen only above certain input data volumes, and it
> appears
> to be when shuffle spills. For a job where shuffle spill for memory and
> disk
> = 0B, the data is correct. If there is any spill, I see the duplicate
> behaviour. Oddly, the shuffle write is much smaller when there's a spill.
> E.g. the non spill job has 18.8 GB of input and 14.9GB of shuffle write,
> whereas the spill job has 24.2 GB of input, and only 4.9GB of shuffle
> write.
> I'm guessing some sort of compression is happening on duplicate identical
> values?
>
> Oddly, I can fix this issue if I adjust my scala code to insert a map step
> before the call to sortByKey():
>
> .map(t => (new CustomKey(t._1),t._2))
>
> This constructor is just:
>
> public CustomKey(CustomKey left) { this.set(left); }
>
> Why does this work? I've no idea.
>
> The spark job is running in yarn-client mode with all the default
> configuration values set. Using the external shuffle service and disabling
> spill compression makes no difference.
>
> Is this a bug?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-key-when-sorting-BytesWritable-with-Kryo-tp21447.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Define size partitions

2015-01-30 Thread Davies Liu
I think the new API sc.binaryRecords [1] (added in 1.2) can help in this case.

[1] 
http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryRecords

Davies
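
[Editor's note: the same API exists on the Scala SparkContext. A minimal sketch, assuming the fixed layout described in the quoted question (8-byte number, 16-byte char field, 4-byte number, 1-byte char); the path is hypothetical.]

import java.nio.ByteBuffer
import org.apache.spark.{SparkConf, SparkContext}

val recordLength = 8 + 16 + 4 + 1   // bytes per block, from the header description

val sc = new SparkContext(new SparkConf().setAppName("fixed-records"))
val records = sc.binaryRecords("hdfs:///data/blocks.bin", recordLength)  // RDD[Array[Byte]]

val parsed = records.map { bytes =>
  val buf  = ByteBuffer.wrap(bytes)
  val id   = buf.getLong()                        // Number(8 bytes)
  val name = new Array[Byte](16); buf.get(name)   // Char(16 bytes)
  val num  = buf.getInt()                         // Number(4 bytes)
  val flag = buf.get()                            // Char(1 byte)
  (id, new String(name, "US-ASCII").trim, num, flag)
}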

On Fri, Jan 30, 2015 at 6:50 AM, Guillermo Ortiz  wrote:
> Hi,
>
> I want to process some files, there're a king of big, dozens of
> gigabytes each one. I get them like a array of bytes and there's an
> structure inside of them.
>
> I have a header which describes the structure. It could be like:
> Number(8bytes) Char(16bytes) Number(4 bytes) Char(1bytes), ..
> This structure appears N times on the file.
>
> So, I could know the size of each block since it's fix. There's not
> separator among block and block.
>
> If I would do this with MapReduce, I could implement a new
> RecordReader and InputFormat  to read each block because I know the
> size of them and I'd fix the split size in the driver. (blockX1000 for
> example). On this way, I could know that each split for each mapper
> has complete blocks and there isn't a piece of the last block in the
> next split.
>
> Spark works with RDD and partitions, How could I resize  each
> partition to do that?? is it possible? I guess that Spark doesn't use
> the RecordReader and these classes for these tasks.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Duplicate key when sorting BytesWritable with Kryo?

2015-01-30 Thread andrew.rowson
I've found a strange issue when trying to sort a lot of data in HDFS using
spark 1.2.0 (CDH5.3.0). My data is in sequencefiles and the key is a class
that derives from BytesWritable (the value is also a BytesWritable). I'm
using a custom KryoSerializer to serialize the underlying byte array
(basically write the length and the byte array).

My spark job looks like this:

spark.sequenceFile(inputPath, classOf[CustomKey],
classOf[BytesWritable]).sortByKey().map(t =>
t._1).saveAsTextFile(outputPath)

CustomKey extends BytesWritable, adds a toString method and some other
helper methods that extract and convert parts of the underlying byte[].

This should simply output a series of textfiles which contain the sorted
list of keys. The problem is that under certain circumstances I get many
duplicate keys. The number of records output is correct, but it appears that
large chunks of the output are simply copies of the last record in that
chunk. E.g instead of [1,2,3,4,5,6,7,8,9] I'll see [9,9,9,9,9,9,9,9,9]. 

This appears to happen only above certain input data volumes, and it appears
to be when shuffle spills. For a job where shuffle spill for memory and disk
= 0B, the data is correct. If there is any spill, I see the duplicate
behaviour. Oddly, the shuffle write is much smaller when there's a spill.
E.g. the non spill job has 18.8 GB of input and 14.9GB of shuffle write,
whereas the spill job has 24.2 GB of input, and only 4.9GB of shuffle write.
I'm guessing some sort of compression is happening on duplicate identical
values?

Oddly, I can fix this issue if I adjust my scala code to insert a map step
before the call to sortByKey():

.map(t => (new CustomKey(t._1),t._2))

This constructor is just:

public CustomKey(CustomKey left) { this.set(left); }

Why does this work? I've no idea.

The spark job is running in yarn-client mode with all the default
configuration values set. Using the external shuffle service and disabling
spill compression makes no difference.

Is this a bug?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Duplicate-key-when-sorting-BytesWritable-with-Kryo-tp21447.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



KafkaWordCount

2015-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

I would like to pass the Kafka parameter val kafkaParams =
Map(“fetch.message.max.bytes” -> “400”) into the KafkaWordCount Scala code.
I’ve used the variable like this:

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream(ssc, kafkaParams, zkQuorum, group, topicpMap).map(_._2)
}


However, I’ve gotten these errors:

  (jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,zkQuorum: String,groupId: String,topics: java.util.Map[String,Integer],storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream[String,String]
[error]   (ssc: org.apache.spark.streaming.StreamingContext,zkQuorum: String,groupId: String,topics: scala.collection.immutable.Map[String,Int],storageLevel: org.apache.spark.storage.StorageLevel)org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]
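
[Editor's note: the compiler is listing the overloads that take zkQuorum/groupId directly, which do not accept a kafkaParams map. A hedged sketch of the overload that does; it needs explicit type parameters, decoders and a StorageLevel. StringDecoder for key and value and the "4000000" value are assumptions.]

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect"       -> zkQuorum,
  "group.id"                -> group,
  "fetch.message.max.bytes" -> "4000000")   // hypothetical value

val KafkaDStreams = (1 to numStreams) map { _ =>
  KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topicpMap, StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
}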

Thanks
-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155


Re: Define size partitions

2015-01-30 Thread Rishi Yadav
If you are only concerned about large partition sizes, you can specify the number
of partitions as an additional parameter while loading files from HDFS.
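
[Editor's note: a one-line sketch; the second argument to textFile is the minimum number of partitions.]

// Ask for at least 500 partitions instead of the HDFS-block-derived default.
val data = sc.textFile("hdfs:///big/input", minPartitions = 500)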

On Fri, Jan 30, 2015 at 9:47 AM, Sven Krasser  wrote:

> You can also use your InputFormat/RecordReader in Spark, e.g. using
> newAPIHadoopFile. See here:
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
> .
> -Sven
>
> On Fri, Jan 30, 2015 at 6:50 AM, Guillermo Ortiz 
> wrote:
>
>> Hi,
>>
>> I want to process some files, there're a king of big, dozens of
>> gigabytes each one. I get them like a array of bytes and there's an
>> structure inside of them.
>>
>> I have a header which describes the structure. It could be like:
>> Number(8bytes) Char(16bytes) Number(4 bytes) Char(1bytes), ..
>> This structure appears N times on the file.
>>
>> So, I could know the size of each block since it's fix. There's not
>> separator among block and block.
>>
>> If I would do this with MapReduce, I could implement a new
>> RecordReader and InputFormat  to read each block because I know the
>> size of them and I'd fix the split size in the driver. (blockX1000 for
>> example). On this way, I could know that each split for each mapper
>> has complete blocks and there isn't a piece of the last block in the
>> next split.
>>
>> Spark works with RDD and partitions, How could I resize  each
>> partition to do that?? is it possible? I guess that Spark doesn't use
>> the RecordReader and these classes for these tasks.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig
>


[hive context] Unable to query array once saved as parquet

2015-01-30 Thread Ayoub
Hello,

I have a problem when querying, with a Hive context on Spark
1.2.1-snapshot, a column in my table which is a nested data structure like an
array of structs.
The problem happens only on the table stored as Parquet; querying the
schema RDD saved as a temporary table doesn't lead to any exception.

My steps are:
1) read the JSON file
2) create a schema RDD and save it as a tmp table
3) create an external table in the Hive metastore stored as a Parquet file
4) insert the data from the tmp table into the persisted table
5) query the persisted table, which leads to this exception:

"select data.field1 from persisted_table LATERAL VIEW explode(data_array)
nestedStuff AS data"

parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in
file hdfs://***/test_table/part-1
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
at
parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)
at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at 
scala.collection.AbstractIterator.to(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:797)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
at parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:99)
at parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:94)
at
parquet.io.RecordReaderImplementation.(RecordReaderImplementation.java:274)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
at
parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
... 28 more

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.sc

Spark streaming - tracking/deleting processed files

2015-01-30 Thread ganterm
We are running a Spark streaming job that retrieves files from a directory
(using textFileStream). 
One concern we are having is the case where the job is down but files are
still being added to the directory.
Once the job starts up again, those files are not being picked up (since
they are not new or changed while the job is running) but we would like them
to be processed. 
Is there a solution for that? Is there a way to keep track of which files have
been processed, and can we "force" older files to be picked up? Is there a
way to delete the processed files?

Thanks!
Markus 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tracking-deleting-processed-files-tp21444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Programmatic Spark 1.2.0 on EMR | S3 filesystem is not working when using

2015-01-30 Thread Aniket Bhatnagar
Right. Which makes me believe that the directory is perhaps configured
somewhere and I have missed configuring it. The process that is
submitting jobs (which basically becomes the driver) is running in sudo mode and the
executors are executed by YARN. The Hadoop username is configured as
'hadoop' (the default user in EMR).

On Fri, Jan 30, 2015, 11:25 PM Sven Krasser  wrote:

> From your stacktrace it appears that the S3 writer tries to write the data
> to a temp file on the local file system first. Taking a guess, that local
> directory doesn't exist or you don't have permissions for it.
> -Sven
>
> On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>> I am programmatically submit spark jobs in yarn-client mode on EMR.
>> Whenever a job tries to save file to s3, it gives the below mentioned
>> exception. I think the issue might be what EMR is not setup properly as I
>> have to set all hadoop configurations manually in SparkContext. However, I
>> am not sure which configuration am I missing (if any).
>>
>> Configurations that I am using in SparkContext to setup EMRFS:
>> "spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>> "spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>> "spark.hadoop.fs.emr.configuration.version": "1.0",
>> "spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
>> "spark.hadoop.fs.s3.enableServerSideEncryption": "false",
>> "spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
>> "spark.hadoop.fs.s3.consistent": "true",
>> "spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
>> "spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
>> "spark.hadoop.fs.s3.consistent.retryCount": "5",
>> "spark.hadoop.fs.s3.maxRetries": "4",
>> "spark.hadoop.fs.s3.sleepTimeSeconds": "10",
>> "spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
>> "spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
>> "spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
>> "spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
>> "spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
>> "spark.hadoop.fs.s3.consistent.fastList": "true",
>> "spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
>> "spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
>> "spark.hadoop.fs.s3.consistent.notification.SQS": "false",
>>
>> Exception:
>> java.io.IOException: No such file or directory
>> at java.io.UnixFileSystem.createFileExclusively(Native Method)
>> at java.io.File.createNewFile(File.java:1006)
>> at java.io.File.createTempFile(File.java:1989)
>> at
>> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
>> at
>> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
>> at
>> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
>> at
>> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
>> at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>> at
>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
>> at
>> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
>> at
>> org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
>> at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>> at
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
>> at
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Hints? Suggestions?
>>
>
>
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig
>


Re: Programmatic Spark 1.2.0 on EMR | S3 filesystem is not working when using

2015-01-30 Thread Sven Krasser
From your stacktrace it appears that the S3 writer tries to write the data
to a temp file on the local file system first. Taking a guess, that local
directory doesn't exist or you don't have permissions for it.
-Sven

On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:

> I am programmatically submit spark jobs in yarn-client mode on EMR.
> Whenever a job tries to save file to s3, it gives the below mentioned
> exception. I think the issue might be what EMR is not setup properly as I
> have to set all hadoop configurations manually in SparkContext. However, I
> am not sure which configuration am I missing (if any).
>
> Configurations that I am using in SparkContext to setup EMRFS:
> "spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
> "spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
> "spark.hadoop.fs.emr.configuration.version": "1.0",
> "spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
> "spark.hadoop.fs.s3.enableServerSideEncryption": "false",
> "spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
> "spark.hadoop.fs.s3.consistent": "true",
> "spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
> "spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
> "spark.hadoop.fs.s3.consistent.retryCount": "5",
> "spark.hadoop.fs.s3.maxRetries": "4",
> "spark.hadoop.fs.s3.sleepTimeSeconds": "10",
> "spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
> "spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
> "spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
> "spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
> "spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
> "spark.hadoop.fs.s3.consistent.fastList": "true",
> "spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
> "spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
> "spark.hadoop.fs.s3.consistent.notification.SQS": "false",
>
> Exception:
> java.io.IOException: No such file or directory
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1006)
> at java.io.File.createTempFile(File.java:1989)
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at
> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
> at
> org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
> at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
> at
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> Hints? Suggestions?
>



-- 
http://sites.google.com/site/krasser/?utm_source=sig


Re: Define size partitions

2015-01-30 Thread Sven Krasser
You can also use your InputFormat/RecordReader in Spark, e.g. using
newAPIHadoopFile. See here:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
.
-Sven
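
[Editor's note: a sketch of what that could look like. FixedBlockInputFormat and its key/value types are hypothetical placeholders for the custom InputFormat/RecordReader described in the quoted question, not an existing class.]

import org.apache.hadoop.io.{BytesWritable, LongWritable}

// FixedBlockInputFormat is assumed to be a custom
// org.apache.hadoop.mapreduce.InputFormat[LongWritable, BytesWritable]
// whose RecordReader emits exactly one fixed-size block per record.
val blocks = sc.newAPIHadoopFile[LongWritable, BytesWritable, FixedBlockInputFormat](
  "hdfs:///data/blocks.bin")  // hypothetical path

val payloads = blocks.map { case (_, v) => v.copyBytes() }  // one Array[Byte] per block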

On Fri, Jan 30, 2015 at 6:50 AM, Guillermo Ortiz 
wrote:

> Hi,
>
> I want to process some files, there're a king of big, dozens of
> gigabytes each one. I get them like a array of bytes and there's an
> structure inside of them.
>
> I have a header which describes the structure. It could be like:
> Number(8bytes) Char(16bytes) Number(4 bytes) Char(1bytes), ..
> This structure appears N times on the file.
>
> So, I could know the size of each block since it's fix. There's not
> separator among block and block.
>
> If I would do this with MapReduce, I could implement a new
> RecordReader and InputFormat  to read each block because I know the
> size of them and I'd fix the split size in the driver. (blockX1000 for
> example). On this way, I could know that each split for each mapper
> has complete blocks and there isn't a piece of the last block in the
> next split.
>
> Spark works with RDD and partitions, How could I resize  each
> partition to do that?? is it possible? I guess that Spark doesn't use
> the RecordReader and these classes for these tasks.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
http://sites.google.com/site/krasser/?utm_source=sig


Re: Does the kFold in Spark always give you the same split?

2015-01-30 Thread Sean Owen
Have a look at the source code for MLUtils.kFold. Yes, there is a
random element. That's good; you want the folds to be randomly chosen.
Note there is a seed parameter, as in a lot of the APIs, that lets you
fix the RNG seed and so get the same result every time, if you need
to.
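
For example, a minimal sketch with a fixed seed (data is assumed to be an existing
RDD, e.g. of LabeledPoint):

import org.apache.spark.mllib.util.MLUtils

val folds = MLUtils.kFold(data, 10, 42)   // 10 folds, seed fixed at 42
folds.foreach { case (training, validation) =>
  // the split is identical on every run with the same seed
  println(s"train=${training.count()} validation=${validation.count()}")
}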

On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li  wrote:
> Hi,
>
> I am using the utility function kFold provided in Spark for doing k-fold
> cross validation using logistic regression. However, each time I run the
> experiment, I got different different result. Since everything else stays
> constant, I was wondering if this is due to the kFold function I used. Does
> anyone know if the kFold gives you a different split on a data set each time
> you call it?
>
> Thanks,
>
> Jianguo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
and if its a single giant timeseries that is already sorted then Mohit's
solution sounds good to me.

On Fri, Jan 30, 2015 at 11:05 AM, Michael Malak 
wrote:

> But isn't foldLeft() overkill for the originally stated use case of max
> diff of adjacent pairs? Isn't foldLeft() for recursive non-commutative
> non-associative accumulation as opposed to an embarrassingly parallel
> operation such as this one?
>
> This use case reminds me of FIR filtering in DSP. It seems that RDDs could
> use something that serves the same purpose as
> scala.collection.Iterator.sliding.
>
>   --
>  *From:* Koert Kuipers 
> *To:* Mohit Jaggi 
> *Cc:* Tobias Pfeiffer ; "Ganelin, Ilya" <
> ilya.gane...@capitalone.com>; derrickburns ; "
> user@spark.apache.org" 
> *Sent:* Friday, January 30, 2015 7:11 AM
> *Subject:* Re: spark challenge: zip with next???
>
> assuming the data can be partitioned then you have many timeseries for
> which you want to detect potential gaps. also assuming the resulting gaps
> info per timeseries is much smaller data then the timeseries data itself,
> then this is a classical example to me of a sorted (streaming) foldLeft,
> requiring an efficient secondary sort in the spark shuffle. i am trying to
> get that into spark here:
> https://issues.apache.org/jira/browse/SPARK-3655
>
>
>
> On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi 
> wrote:
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
>
> you can use the MLLib function or do the following (which is what I had
> done):
>
> - in first pass over the data, using mapPartitionWithIndex, gather the
> first item in each partition. you can use collect (or aggregator) for this.
> “key” them by the partition index. at the end, you will have a map
>(partition index) --> first item
> - in the second pass over the data, using mapPartitionWithIndex again,
> look at two (or in the general case N items at a time, you can use scala’s
> sliding iterator) items at a time and check the time difference(or any
> sliding window computation). To this mapParitition, pass the map created in
> previous step. You will need to use them to check the last item in this
> partition.
>
> If you can tolerate a few inaccuracies then you can just do the second
> step. You will miss the “boundaries” of the partitions but it might be
> acceptable for your use case.
>
>
>
> On Jan 29, 2015, at 4:36 PM, Tobias Pfeiffer  wrote:
>
> Hi,
>
> On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya <
> ilya.gane...@capitalone.com> wrote:
>
>  Make a copy of your RDD with an extra entry in the beginning to offset.
> The you can zip the two RDDs and run a map to generate an RDD of
> differences.
>
>
> Does that work? I recently tried something to compute differences between
> each entry and the next, so I did
>   val rdd1 = ... // null element + rdd
>   val rdd2 = ... // rdd + null element
> but got an error message about zip requiring data sizes in each partition
> to match.
>
> Tobias
>
>
>
>
>
>


Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Capitão
I've been trying to run HiveQL queries with UDFs in Spark SQL, but with no
success. The problem occurs only when using functions like
from_unixtime (represented by the Hive class UDFFromUnixTime).

I'm using Spark 1.2 with CDH5.3.0. Running the queries in local mode works,
but in Yarn mode it doesn't. I'm creating an uber-jar with all the needed
dependencies, excluding the ones provided by the cluster (Spark, Hadoop) and
including the Hive ones. When I run the queries in Yarn I get the following
exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
Lost task 1.3 in stage 0.0 (TID 20, ):
java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/UDF;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
at java.lang.Class.getDeclaredField(Class.java:1951)
at
java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
at
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.read

Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
yeah i meant foldLeft by key, sorted by date.
it is non-commutative because i care about the order of processing the
values (chronological). i dont see how i can do it with a reduce
efficiently, but i would be curious to hear otherwise. i might be biased
since this is such a typical operation in map-reduce.

so basically assuming its logs of servers being RDD[(String, Long)] where
String is the server name and Long is the timestamp, you keep a state that
contains the last observed timestamp (if any) and the list of found gaps.
so state type would be (Option[Long], List[Long]). as you process items in
the timeseries chronologically you always update the last observed
timestamp and possible add to the list of found gaps.

foldLeftByKey on RDD[(K, V)] looks something like this:
def foldLeftByKey(state: X)(update: (X, V) => X)(implicit ord:
Ordering[V]): RDD[(K, X)]

and the logic would be (just made this up, didnt test or compile):

rdd.foldLeftByKey((None: Option[Long], List.empty[Long])) {
  case ((Some(prev), gaps), curr) if (curr - prev > thres) => (Some(curr),
curr :: gaps) // gap found
  case ((_, gaps), curr) => (Some(curr), gaps) // no gap found
}

the sort required within timeseries would be done efficiently by spark in
the shuffle (assuming sort-based shuffle is enabled). the foldLeftByKey
would never require the entire timeseries per key to be in memory. however
every timeseries would be processed by a single task, so it might take a
while if the timeseries is very large.
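
Until something like foldLeftByKey exists, one rough, untested way to approximate it
today (assuming the sort-based shuffle and, I believe, Spark 1.2's
repartitionAndSortWithinPartitions; the names ServerPartitioner and gapsPerServer are
made up for illustration) is to sort on a composite (server, timestamp) key while
partitioning on the server only, then stream over each partition:

import org.apache.spark.SparkContext._   // pair/ordered RDD functions on 1.2
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, Partitioner}

// partition only on the server name so one server's events stay in one partition
class ServerPartitioner(n: Int) extends Partitioner {
  private val hash = new HashPartitioner(n)
  def numPartitions = n
  def getPartition(key: Any) = key match { case (server: String, _) => hash.getPartition(server) }
}

def gapsPerServer(logs: RDD[(String, Long)], thres: Long, n: Int): RDD[(String, List[Long])] =
  logs.map { case (server, ts) => ((server, ts), ()) }
    .repartitionAndSortWithinPartitions(new ServerPartitioner(n)) // sorted by (server, ts)
    .mapPartitions { iter =>
      // stream through the partition keeping only (current server, last ts, gaps so far)
      var current: Option[String] = None
      var prev: Option[Long] = None
      var gaps = List.empty[Long]
      val out = scala.collection.mutable.ArrayBuffer.empty[(String, List[Long])]
      for (((server, ts), _) <- iter) {
        if (current == Some(server)) {
          if (prev.exists(p => ts - p > thres)) gaps = ts :: gaps
        } else {
          current.foreach(s => out += ((s, gaps)))
          gaps = List.empty
        }
        current = Some(server)
        prev = Some(ts)
      }
      current.foreach(s => out += ((s, gaps)))
      out.iterator
    }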

On Fri, Jan 30, 2015 at 11:05 AM, Michael Malak 
wrote:

> But isn't foldLeft() overkill for the originally stated use case of max
> diff of adjacent pairs? Isn't foldLeft() for recursive non-commutative
> non-associative accumulation as opposed to an embarrassingly parallel
> operation such as this one?
>
> This use case reminds me of FIR filtering in DSP. It seems that RDDs could
> use something that serves the same purpose as
> scala.collection.Iterator.sliding.
>
>   --
>  *From:* Koert Kuipers 
> *To:* Mohit Jaggi 
> *Cc:* Tobias Pfeiffer ; "Ganelin, Ilya" <
> ilya.gane...@capitalone.com>; derrickburns ; "
> user@spark.apache.org" 
> *Sent:* Friday, January 30, 2015 7:11 AM
> *Subject:* Re: spark challenge: zip with next???
>
> assuming the data can be partitioned then you have many timeseries for
> which you want to detect potential gaps. also assuming the resulting gaps
> info per timeseries is much smaller data then the timeseries data itself,
> then this is a classical example to me of a sorted (streaming) foldLeft,
> requiring an efficient secondary sort in the spark shuffle. i am trying to
> get that into spark here:
> https://issues.apache.org/jira/browse/SPARK-3655
>
>
>
> On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi 
> wrote:
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
>
> you can use the MLLib function or do the following (which is what I had
> done):
>
> - in first pass over the data, using mapPartitionWithIndex, gather the
> first item in each partition. you can use collect (or aggregator) for this.
> “key” them by the partition index. at the end, you will have a map
>(partition index) --> first item
> - in the second pass over the data, using mapPartitionWithIndex again,
> look at two (or in the general case N items at a time, you can use scala’s
> sliding iterator) items at a time and check the time difference(or any
> sliding window computation). To this mapParitition, pass the map created in
> previous step. You will need to use them to check the last item in this
> partition.
>
> If you can tolerate a few inaccuracies then you can just do the second
> step. You will miss the “boundaries” of the partitions but it might be
> acceptable for your use case.
>
>
>
> On Jan 29, 2015, at 4:36 PM, Tobias Pfeiffer  wrote:
>
> Hi,
>
> On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya <
> ilya.gane...@capitalone.com> wrote:
>
>  Make a copy of your RDD with an extra entry in the beginning to offset.
> The you can zip the two RDDs and run a map to generate an RDD of
> differences.
>
>
> Does that work? I recently tried something to compute differences between
> each entry and the next, so I did
>   val rdd1 = ... // null element + rdd
>   val rdd2 = ... // rdd + null element
> but got an error message about zip requiring data sizes in each partition
> to match.
>
> Tobias
>
>
>
>
>
>


Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-30 Thread Stephen Haberman
Hi Krishna/all,

I think I found it, and it wasn't related to Scala-2.11...

I had "spark.eventLog.dir=/mnt/spark/work/history", which worked
in Spark 1.2, but now am running Spark master, and it wants a
Hadoop URI, e.g. file:///mnt/spark/work/history (I believe due to
commit 45645191).

This looks like a breaking change to the spark.eventLog.dir
config property.
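
For reference, the change involved, shown programmatically with the values from above
(a sketch; the same applies to the corresponding entry in spark-defaults.conf):

import org.apache.spark.SparkConf

val oldStyle = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/mnt/spark/work/history")         // worked on Spark 1.2

val newStyle = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///mnt/spark/work/history")  // Hadoop URI, required on master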

Perhaps it should be patched to convert the previously supported
"just a file path" values to HDFS-compatible "file://..." URIs
for backwards compatibility?

- Stephen


On Wed, 28 Jan 2015 12:27:17 -0800
Krishna Sankar  wrote:

> Stephen,
>Scala 2.11 worked fine for me. Did the dev change and then
> compile. Not using in production, but I go back and forth
> between 2.10 & 2.11. Cheers
> 
> 
> On Wed, Jan 28, 2015 at 12:18 PM, Stephen Haberman <
> stephen.haber...@gmail.com> wrote:
> 
> > Hey,
> >
> > I recently compiled Spark master against scala-2.11 (by
> > running the dev/change-versions script), but when I run
> > spark-shell, it looks like the "sc" variable is missing.
> >
> > Is this a known/unknown issue? Are others successfully using
> > Spark with scala-2.11, and specifically spark-shell?
> >
> > It is possible I did something dumb while compiling master,
> > but I'm not sure what it would be.
> >
> > Thanks,
> > Stephen
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
> >


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error when running spark in debug mode

2015-01-30 Thread Ankur Srivastava
Hi Arush

I have configured log4j by updating the file log4j.properties in
SPARK_HOME/conf folder.

If it were a log4j defect, we would get errors in debug mode in all apps.

Thanks
Ankur
 Hi Ankur,

How are you enabling the debug level of logs. It should be a log4j
configuration. Even if there would be some issue it would be in log4j and
not in spark.

Thanks
Arush

On Fri, Jan 30, 2015 at 4:24 AM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:

> Hi,
>
> When ever I enable DEBUG level logs for my spark cluster, on running a job
> all the executors die with the below exception. On disabling the DEBUG logs
> my jobs move to the next step.
>
>
> I am on spark-1.1.0
>
> Is this a known issue with spark?
>
> Thanks
> Ankur
>
> 2015-01-29 22:27:42,467 [main] INFO  org.apache.spark.SecurityManager -
> SecurityManager: authentication disabled; ui acls disabled; users with view
> permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
>
> 2015-01-29 22:27:42,478 [main] DEBUG org.apache.spark.util.AkkaUtils - In
> createActorSystem, requireCookie is: off
>
> 2015-01-29 22:27:42,871
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO
> akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>
> 2015-01-29 22:27:42,912
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Starting remoting
>
> 2015-01-29 22:27:43,057
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Remoting started; listening on addresses :[akka.tcp://
> driverPropsFetcher@10.77.9.155:36035]
>
> 2015-01-29 22:27:43,060
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Remoting now listens on addresses: [akka.tcp://
> driverPropsFetcher@10.77.9.155:36035]
>
> 2015-01-29 22:27:43,067 [main] INFO  org.apache.spark.util.Utils -
> Successfully started service 'driverPropsFetcher' on port 36035.
>
> 2015-01-29 22:28:13,077 [main] ERROR
> org.apache.hadoop.security.UserGroupInformation -
> PriviledgedActionException as:ubuntu
> cause:java.util.concurrent.TimeoutException: Futures timed out after [30
> seconds]
>
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException:
> Unknown exception in doAs
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>
> Caused by: java.security.PrivilegedActionException:
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>
> ... 4 more
>
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
> [30 seconds]
>
> at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>
> at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>
> at
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>
> at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
> at scala.concurrent.Await$.result(package.scala:107)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
>
> ... 7 more
>



-- 

[image: Sigmoid Analytics] 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-01-30 Thread Ayoub
I am not personally aware of a repo for snapshot builds.
In my use case, I had to build spark 1.2.1-snapshot

see https://spark.apache.org/docs/latest/building-spark.html

2015-01-30 17:11 GMT+01:00 Debajyoti Roy :

> Thanks Ayoub and Zhan,
> I am new to Spark and wanted to make sure I am not trying something stupid
> or using a wrong API.
>
> Is there a repo where I can pull the snapshot or nightly builds for Spark?
>
>
>
> On Fri, Jan 30, 2015 at 2:45 AM, Ayoub Benali  > wrote:
>
>> Hello,
>>
>> I had the same issue, then I found this JIRA ticket
>> https://issues.apache.org/jira/browse/SPARK-4825
>> So I switched to Spark 1.2.1-snapshot, which solved the problem.
>>
>>
>>
>> 2015-01-30 8:40 GMT+01:00 Zhan Zhang :
>>
>>>  I think it is expected. Refer to the comments in saveAsTable "Note that
>>> this currently only works with SchemaRDDs that are created from a
>>> HiveContext". If I understand correctly, here the SchemaRDD means those
>>> generated by HiveContext.sql, instead of applySchema.
>>>
>>>  Thanks.
>>>
>>>  Zhan Zhang
>>>
>>>
>>>
>>>  On Jan 29, 2015, at 9:38 PM, matroyd 
>>> wrote:
>>>
>>> Hi, I am trying saveAsTable on SchemaRDD created from HiveContext and it
>>> fails. This is on Spark 1.2.0. Following are details of the code, command
>>> and exceptions:
>>> http://stackoverflow.com/questions/28222496/how-to-enable-sql-on-schemardd-via-the-jdbc-interface-is-it-even-possible
>>> 
>>> Thanks in advance for any guidance
>>> --
>>> View this message in context: HiveContext created SchemaRDD's
>>> saveAsTable is not working on 1.2.0
>>> 
>>> Sent from the Apache Spark User List mailing list archive
>>> 
>>> at Nabble.com.
>>>
>>>
>>>
>>
>
>
> --
> Thanks,
>
> *Debajyoti Roy*
> debajyoti@healthagen.com
> (646)561-0844 <646-561-0844>
> 350 Madison Ave., FL 16,
> New York, NY 10017.
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-HiveContext-created-SchemaRDD-s-saveAsTable-is-not-working-on-1-2-0-tp21442.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: HW imbalance

2015-01-30 Thread Sandy Ryza
Yup, if you turn off YARN's CPU scheduling then you can run executors to
take advantage of the extra memory on the larger boxes. But then some of
the nodes will end up severely oversubscribed from a CPU perspective, so I
would definitely recommend against that.



On Fri, Jan 30, 2015 at 3:31 AM, Michael Segel 
wrote:

> Sorry, but I think there’s a disconnect.
>
> When you launch a job under YARN on any of the hadoop clusters, the number
> of mappers/reducers is not set and is dependent on the amount of available
> resources.
> So under Ambari, CM, or MapR’s Admin, you should be able to specify the
> amount of resources available on any node which is to be allocated to
> YARN’s RM.
> So if your node has 32GB allocated, you can run N jobs concurrently based
> on the amount of resources you request when you submit your application.
>
> If you have 64GB allocated, you can run up to 2N jobs concurrently based
> on the same memory constraints.
>
> In terms of job scheduling, where and when a job can run is going to be
> based on available resources.  So if you want to run a job that needs 16GB
> of resources, and all of your nodes are busy and only have 4GB per node
> available to YARN, your 16GB job will wait until there is at least that
> much resources available.
>
> To your point, if you say you need 4GB per task, then it must be the same
> per task for that job. The larger the cluster node, in this case memory,
> the more jobs you can run.
>
> This is of course assuming you could over subscribe a node in terms of cpu
> cores if you have memory available.
>
> YMMV
>
> HTH
> -Mike
>
> On Jan 30, 2015, at 7:10 AM, Sandy Ryza  wrote:
>
> My answer was based off the specs that Antony mentioned: different amounts
> of memory, but 10 cores on all the boxes.  In that case, a single Spark
> application's homogeneously sized executors won't be able to take advantage
> of the extra memory on the bigger boxes.
>
> Cloudera Manager can certainly configure YARN with different resource
> profiles for different nodes if that's what you're wondering.
>
> -Sandy
>
> On Thu, Jan 29, 2015 at 11:03 PM, Michael Segel  > wrote:
>
>> @Sandy,
>>
>> There are two issues.
>> The spark context (executor) and then the cluster under YARN.
>>
>> If you have a box where each yarn job needs 3GB,  and your machine has
>> 36GB dedicated as a YARN resource, you can run 12 executors on the single
>> node.
>> If you have a box that has 72GB dedicated to YARN, you can run up to 24
>> contexts (executors) in parallel.
>>
>> Assuming that you’re not running any other jobs.
>>
>> The larger issue is if your version of Hadoop will easily let you run
>> with multiple profiles or not. Ambari (1.6 and early does not.) Its
>> supposed to be fixed in 1.7 but I haven’t evaluated it yet.
>> Cloudera? YMMV
>>
>> If I understood the question raised by the OP, its more about a
>> heterogeneous cluster than spark.
>>
>> -Mike
>>
>> On Jan 26, 2015, at 5:02 PM, Sandy Ryza  wrote:
>>
>> Hi Antony,
>>
>> Unfortunately, all executors for any single Spark application must have
>> the same amount of memory.  It's possible to configure YARN with different
>> amounts of memory for each host (using
>> yarn.nodemanager.resource.memory-mb), so other apps might be able to take
>> advantage of the extra memory.
>>
>> -Sandy
>>
>> On Mon, Jan 26, 2015 at 8:34 AM, Michael Segel > > wrote:
>>
>>> If you’re running YARN, then you should be able to mix and max where
>>> YARN is managing the resources available on the node.
>>>
>>> Having said that… it depends on which version of Hadoop/YARN.
>>>
>>> If you’re running Hortonworks and Ambari, then setting up multiple
>>> profiles may not be straight forward. (I haven’t seen the latest version of
>>> Ambari)
>>>
>>> So in theory, one profile would be for your smaller 36GB of ram, then
>>> one profile for your 128GB sized machines.
>>> Then as your request resources for your spark job, it should schedule
>>> the jobs based on the cluster’s available resources.
>>> (At least in theory.  I haven’t tried this so YMMV)
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On Jan 26, 2015, at 4:25 PM, Antony Mayi 
>>> wrote:
>>>
>>> should have said I am running as yarn-client. all I can see is
>>> specifying the generic executor memory that is then to be used in all
>>> containers.
>>>
>>>
>>>   On Monday, 26 January 2015, 16:48, Charles Feduke <
>>> charles.fed...@gmail.com> wrote:
>>>
>>>
>>>
>>> You should look at using Mesos. This should abstract away the individual
>>> hosts into a pool of resources and make the different physical
>>> specifications manageable.
>>>
>>> I haven't tried configuring Spark Standalone mode to have different
>>> specs on different machines but based on spark-env.sh.template:
>>>
>>> # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
>>> # - SPARK_WORKER_MEMORY, to set how much total memory workers have to
>>> give executors (e.g. 1000m, 2g)
>>> # - SPARK_WORKER_OPTS, to set config properties only f

Re: Negative Accumulators

2015-01-30 Thread francois . garillot
Sanity-check: would it be possible that `threshold_var` be negative ?



—
FG

On Fri, Jan 30, 2015 at 5:06 PM, Peter Thai  wrote:

> Hello,
> I am seeing negative values for accumulators. Here's my implementation in a
> standalone app in Spark 1.1.1rc:
>   implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
> def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2
> def addInPlace(t1: BigInt, t2: BigInt) = t1 + t2
> def zero(initialValue: BigInt) = BigInt(0)
>   }
> val capped_numpings_accu = sc.accumulator(BigInt(0))(BigIntAccumulatorParam)
> myRDD.foreach(x=>{ capped_numpings_accu+=BigInt(x._1).min(threshold_var)})
> When I remove the min() condition, I no longer see negative values.
> Thanks!
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Negative-Accumulators-tp21441.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

Does the kFold in Spark always give you the same split?

2015-01-30 Thread Jianguo Li
Hi,

I am using the utility function kFold provided in Spark for doing k-fold
cross validation using logistic regression. However, each time I run the
experiment, I got different different result. Since everything else stays
constant, I was wondering if this is due to the kFold function I used. Does
anyone know if the kFold gives you a different split on a data set each
time you call it?

Thanks,

Jianguo


Re: spark challenge: zip with next???

2015-01-30 Thread Michael Malak
But isn't foldLeft() overkill for the originally stated use case of max diff of 
adjacent pairs? Isn't foldLeft() for recursive non-commutative non-associative 
accumulation as opposed to an embarrassingly parallel operation such as this 
one?
This use case reminds me of FIR filtering in DSP. It seems that RDDs could use 
something that serves the same purpose as scala.collection.Iterator.sliding.
  From: Koert Kuipers 
 To: Mohit Jaggi  
Cc: Tobias Pfeiffer ; "Ganelin, Ilya" 
; derrickburns ; 
"user@spark.apache.org"  
 Sent: Friday, January 30, 2015 7:11 AM
 Subject: Re: spark challenge: zip with next???
   
assuming the data can be partitioned then you have many timeseries for which 
you want to detect potential gaps. also assuming the resulting gaps info per 
timeseries is much smaller data then the timeseries data itself, then this is a 
classical example to me of a sorted (streaming) foldLeft, requiring an 
efficient secondary sort in the spark shuffle. i am trying to get that into 
spark here:
https://issues.apache.org/jira/browse/SPARK-3655



On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi  wrote:

http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
you can use the MLLib function or do the following (which is what I had done):
- in first pass over the data, using mapPartitionWithIndex, gather the first
item in each partition. you can use collect (or aggregator) for this. “key”
them by the partition index. at the end, you will have a map
   (partition index) --> first item
- in the second pass over the data, using mapPartitionWithIndex again, look at
two (or in the general case N items at a time, you can use scala’s sliding
iterator) items at a time and check the time difference (or any sliding window
computation). To this mapPartition, pass the map created in the previous step.
You will need to use them to check the last item in this partition.
If you can tolerate a few inaccuracies then you can just do the second step. 
You will miss the “boundaries” of the partitions but it might be acceptable for 
your use case.



On Jan 29, 2015, at 4:36 PM, Tobias Pfeiffer  wrote:
Hi,

On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya  
wrote:

Make a copy of your RDD with an extra entry in the beginning to offset. The you 
can zip the two RDDs and run a map to generate an RDD of differences.


Does that work? I recently tried something to compute differences between each 
entry and the next, so I did
  val rdd1 = ... // null element + rdd
  val rdd2 = ... // rdd + null element
but got an error message about zip requiring data 
sizes in each partition to match.
Tobias






  

Negative Accumulators

2015-01-30 Thread Peter Thai
Hello,

I am seeing negative values for accumulators. Here's my implementation in a
standalone app in Spark 1.1.1rc:

  implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2
def addInPlace(t1: BigInt, t2: BigInt) = t1 + t2
def zero(initialValue: BigInt) = BigInt(0)
  }

val capped_numpings_accu = sc.accumulator(BigInt(0))(BigIntAccumulatorParam)
myRDD.foreach(x=>{ capped_numpings_accu+=BigInt(x._1).min(threshold_var)})

When I remove the min() condition, I no longer see negative values.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Negative-Accumulators-tp21441.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Define size partitions

2015-01-30 Thread Guillermo Ortiz
Hi,

I want to process some files; they're kind of big, dozens of
gigabytes each. I get them as an array of bytes and there's a
structure inside them.

I have a header which describes the structure. It could be like:
Number(8bytes) Char(16bytes) Number(4 bytes) Char(1bytes), ..
This structure appears N times on the file.

So I know the size of each block since it's fixed. There's no
separator between blocks.

If I were doing this with MapReduce, I could implement a new
RecordReader and InputFormat to read each block, because I know the
size of the blocks and I'd fix the split size in the driver (blockX1000 for
example). That way, I'd know that each split for each mapper
holds complete blocks and there isn't a piece of the last block in the
next split.

Spark works with RDDs and partitions. How could I resize each
partition to do that? Is it possible? I guess that Spark doesn't use
the RecordReader and these classes for these tasks.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Programmatic Spark 1.2.0 on EMR | S3 filesystem is not working when using

2015-01-30 Thread Aniket Bhatnagar
I am programmatically submitting Spark jobs in yarn-client mode on EMR.
Whenever a job tries to save a file to S3, it gives the exception below.
I think the issue might be that EMR is not set up properly, as I
have to set all Hadoop configurations manually in SparkContext. However, I
am not sure which configuration I am missing (if any).

Configurations that I am using in SparkContext to setup EMRFS:
"spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
"spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
"spark.hadoop.fs.emr.configuration.version": "1.0",
"spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
"spark.hadoop.fs.s3.enableServerSideEncryption": "false",
"spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
"spark.hadoop.fs.s3.consistent": "true",
"spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
"spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
"spark.hadoop.fs.s3.consistent.retryCount": "5",
"spark.hadoop.fs.s3.maxRetries": "4",
"spark.hadoop.fs.s3.sleepTimeSeconds": "10",
"spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
"spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
"spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
"spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
"spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
"spark.hadoop.fs.s3.consistent.fastList": "true",
"spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
"spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
"spark.hadoop.fs.s3.consistent.notification.SQS": "false",

Exception:
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at
com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
at
com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
at
com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
at
com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
at
org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Hints? Suggestions?
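
(For what it's worth, the way these settings are expected to reach the executors: any
"spark.hadoop."-prefixed entry on the SparkConf should be copied, with the prefix
stripped, into the Hadoop Configuration that Spark builds, which allows a quick sanity
check from the driver. A minimal sketch with only the first setting shown:)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("emrfs-config-check")
  .set("spark.hadoop.fs.s3.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem")
val sc = new SparkContext(conf)
// expected: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
println(sc.hadoopConfiguration.get("fs.s3.impl"))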


[Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Yifan LI
Hi,

I am running my GraphX application on Spark 1.2.0 (an 11-node cluster), having
requested 30GB of memory per node and 100 cores for an input dataset of around
1GB (a 5 million vertex graph).

But the error below always happens…

Is there anyone who could give me some pointers?

(BTW, the overall edge/vertex RDDs will reach more than 100GB during graph
computation, and another version of my application works well on the same
dataset while needing much less memory during computation.)

Thanks in advance!!!


15/01/29 18:05:08 ERROR ContextCleaner: Error cleaning broadcast 60
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at 
org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
at 
org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
at 
org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
at 
org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
at 
org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
[Stage 91:===>  (2 + 4) / 
6]15/01/29 18:08:15 ERROR SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 0
[Stage 93:>  (29 + 20) / 
49]15/01/29 23:47:03 ERROR TaskSchedulerImpl: Lost executor 9 on 
small11-tap1.common.lip6.fr: remote Akka client disassociated
[Stage 83:>   (1 + 0) / 6][Stage 86:>   (0 + 1) / 2][Stage 88:>   (0 + 2) / 
8]15/01/29 23:47:06 ERROR SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 9
[Stage 83:===>  (5 + 1) / 6][Stage 88:=>   (9 + 2) / 
11]15/01/29 23:57:30 ERROR TaskSchedulerImpl: Lost executor 8 on 
small10-tap1.common.lip6.fr: remote Akka client disassociated
15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 8
15/01/29 23:57:30 ERROR SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 8

Best,
Yifan LI







Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
assuming the data can be partitioned then you have many timeseries for
which you want to detect potential gaps. also assuming the resulting gaps
info per timeseries is much smaller data then the timeseries data itself,
then this is a classical example to me of a sorted (streaming) foldLeft,
requiring an efficient secondary sort in the spark shuffle. i am trying to
get that into spark here:
https://issues.apache.org/jira/browse/SPARK-3655

On Fri, Jan 30, 2015 at 12:27 AM, Mohit Jaggi  wrote:

>
> http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
>
> you can use the MLLib function or do the following (which is what I had
> done):
>
> - in first pass over the data, using mapPartitionWithIndex, gather the
> first item in each partition. you can use collect (or aggregator) for this.
> “key” them by the partition index. at the end, you will have a map
>(partition index) --> first item
> - in the second pass over the data, using mapPartitionWithIndex again,
> look at two (or in the general case N items at a time, you can use scala’s
> sliding iterator) items at a time and check the time difference(or any
> sliding window computation). To this mapParitition, pass the map created in
> previous step. You will need to use them to check the last item in this
> partition.
>
> If you can tolerate a few inaccuracies then you can just do the second
> step. You will miss the “boundaries” of the partitions but it might be
> acceptable for your use case.
>
>
>
> On Jan 29, 2015, at 4:36 PM, Tobias Pfeiffer  wrote:
>
> Hi,
>
> On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya <
> ilya.gane...@capitalone.com> wrote:
>
>>  Make a copy of your RDD with an extra entry in the beginning to offset.
>> The you can zip the two RDDs and run a map to generate an RDD of
>> differences.
>>
>
> Does that work? I recently tried something to compute differences between
> each entry and the next, so I did
>   val rdd1 = ... // null element + rdd
>   val rdd2 = ... // rdd + null element
> but got an error message about zip requiring data sizes in each partition
> to match.
>
> Tobias
>
>
>
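
A rough, untested sketch of the two-pass approach Mohit describes above, for the
simple case of a single, already globally sorted timeseries of timestamps (the names
adjacentDiffs and heads are made up for illustration):

import org.apache.spark.rdd.RDD

def adjacentDiffs(ts: RDD[Long]): RDD[Long] = {
  // pass 1: the first element of every partition, keyed by partition index
  val heads: Map[Int, Long] = ts
    .mapPartitionsWithIndex { (idx, it) =>
      if (it.hasNext) Iterator((idx, it.next())) else Iterator.empty
    }
    .collect().toMap

  // pass 2: slide a window of 2 over each partition, closing the boundary with the
  // head of the next partition (if any)
  ts.mapPartitionsWithIndex { (idx, it) =>
    val boundary = heads.get(idx + 1).iterator
    (it ++ boundary).sliding(2).collect { case Seq(a, b) => b - a }
  }
}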


Re: Building Spark behind a proxy

2015-01-30 Thread Arush Kharbanda
Hi Somya,

I meant when you configure the JAVA_OPTS and when you don't configure the
JAVA_OPTS is there any difference in the error message?

Are you facing the same issue when you built using maven?

Thanks
Arush

On Thu, Jan 29, 2015 at 10:22 PM, Soumya Simanta 
wrote:

> I can do a
> wget
> http://repo.maven.apache.org/maven2/org/apache/apache/14/apache-14.pom
> and get the file successfully on a shell.
>
>
>
> On Thu, Jan 29, 2015 at 11:51 AM, Boromir Widas 
> wrote:
>
>> At least a part of it is due to connection refused, can you check if curl
>> can reach the URL with proxies -
>> [FATAL] Non-resolvable parent POM: Could not transfer artifact
>> org.apache:apache:pom:14 from/to central (
>> http://repo.maven.apache.org/maven2): Error transferring file:
>> Connection refused from
>> http://repo.maven.apache.org/maven2/org/apache/apache/14/apache-14.pom
>>
>> On Thu, Jan 29, 2015 at 11:35 AM, Soumya Simanta <
>> soumya.sima...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Jan 29, 2015 at 11:05 AM, Arush Kharbanda <
>>> ar...@sigmoidanalytics.com> wrote:
>>>
 Does  the error change on build with and without the built options?

>>> What do you mean by build options? I'm just doing ./sbt/sbt assembly
>>> from $SPARK_HOME
>>>
>>>
 Did you try using maven? and doing the proxy settings there.

>>>
>>>  No I've not tried maven yet. However, I did set proxy settings inside
>>> my .m2/setting.xml, but it didn't make any difference.
>>>
>>>
>>>
>>
>


-- 

[image: Sigmoid Analytics] 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: Error when running spark in debug mode

2015-01-30 Thread Arush Kharbanda
Hi Ankur,

How are you enabling the debug level of logs. It should be a log4j
configuration. Even if there would be some issue it would be in log4j and
not in spark.

Thanks
Arush

On Fri, Jan 30, 2015 at 4:24 AM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:

> Hi,
>
> When ever I enable DEBUG level logs for my spark cluster, on running a job
> all the executors die with the below exception. On disabling the DEBUG logs
> my jobs move to the next step.
>
>
> I am on spark-1.1.0
>
> Is this a known issue with spark?
>
> Thanks
> Ankur
>
> 2015-01-29 22:27:42,467 [main] INFO  org.apache.spark.SecurityManager -
> SecurityManager: authentication disabled; ui acls disabled; users with view
> permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
>
> 2015-01-29 22:27:42,478 [main] DEBUG org.apache.spark.util.AkkaUtils - In
> createActorSystem, requireCookie is: off
>
> 2015-01-29 22:27:42,871
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO
> akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>
> 2015-01-29 22:27:42,912
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Starting remoting
>
> 2015-01-29 22:27:43,057
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Remoting started; listening on addresses :[akka.tcp://
> driverPropsFetcher@10.77.9.155:36035]
>
> 2015-01-29 22:27:43,060
> [driverPropsFetcher-akka.actor.default-dispatcher-4] INFO  Remoting -
> Remoting now listens on addresses: [akka.tcp://
> driverPropsFetcher@10.77.9.155:36035]
>
> 2015-01-29 22:27:43,067 [main] INFO  org.apache.spark.util.Utils -
> Successfully started service 'driverPropsFetcher' on port 36035.
>
> 2015-01-29 22:28:13,077 [main] ERROR
> org.apache.hadoop.security.UserGroupInformation -
> PriviledgedActionException as:ubuntu
> cause:java.util.concurrent.TimeoutException: Futures timed out after [30
> seconds]
>
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException:
> Unknown exception in doAs
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>
> Caused by: java.security.PrivilegedActionException:
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>
> ... 4 more
>
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
> [30 seconds]
>
> at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>
> at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>
> at
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>
> at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
> at scala.concurrent.Await$.result(package.scala:107)
>
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
>
> ... 7 more
>



-- 

[image: Sigmoid Analytics] 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: WARN NativeCodeLoader warning in spark shell

2015-01-30 Thread Sean Owen
This is ignorable, and a message from Hadoop, which basically means
what it says. It's almost infamous; search Google. You don't have to
do anything.

On Fri, Jan 30, 2015 at 1:04 PM, kundan kumar  wrote:
> Hi,
>
> Whenever I start spark shell I get this warning.
>
> WARN NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable
>
> Whats the meaning of this and does/how can it impact the execution of my
> spark jobs ?
>
> Please suggest how can I fix this ?
>
>
> Thanks !!
> Kundan

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



WARN NativeCodeLoader warning in spark shell

2015-01-30 Thread kundan kumar
Hi,

Whenever I start spark shell I get this warning.

WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

Whats the meaning of this and does/how can it impact the execution of my
spark jobs ?

Please suggest how can I fix this ?


Thanks !!
Kundan


HBase Thrift API Error on map/reduce functions

2015-01-30 Thread mtheofilos
I get a serialization problem trying to run

Python:
sc.parallelize(['1','2']).map(lambda id: client.getRow('table', id, None))

cloudpickle.py can't pickle method_descriptor type
I add a function to pickle a method descriptor and now it exceeds the
recursion limit
I print the method name before i pickle it and it is "reset" from
cStringIO.StringO (output)
The problem was at line ~830 of cloudpickle, trying to pickle a file
And the initial object to pickle was that:
(, None, PairDeserializer(UTF8Deserializer(),
UTF8Deserializer()), BatchedSerializer(PickleSerializer(), 0))

And the error is this:
  File "/home/user/inverted-index.py", line 80, in 
print
sc.wholeTextFiles(data_dir).flatMap(update).take(2)#.groupByKey().map(store).take(2)
  File "/home/user/spark2/python/pyspark/rdd.py", line 1081, in take
totalParts = self._jrdd.partitions().size()
  File "/home/user/spark2/python/pyspark/rdd.py", line 2107, in _jrdd
pickled_command = ser.dumps(command)
  File "/home/user/spark2/python/pyspark/serializers.py", line 402, in dumps
return cloudpickle.dumps(obj, 2)
  File "/home/user/spark2/python/pyspark/cloudpickle.py", line 832, in dumps
cp.dump(obj)
  File "/home/user/spark2/python/pyspark/cloudpickle.py", line 147, in dump
raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion
required.
Try _fast_serialization=2 or contact PiCloud support

Can any developer who works on this tell me if that problem can be
fixed?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HBase-Thrift-API-Error-on-map-reduce-functions-tp21439.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: HW imbalance

2015-01-30 Thread Michael Segel
Sorry, but I think there’s a disconnect. 

When you launch a job under YARN on any of the hadoop clusters, the number of 
mappers/reducers is not set and is dependent on the amount of available 
resources. 
So under Ambari, CM, or MapR’s Admin, you should be able to specify the amount 
of resources available on any node which is to be allocated to YARN’s RM. 
So if your node has 32GB allocated, you can run N jobs concurrently based on 
the amount of resources you request when you submit your application. 

If you have 64GB allocated, you can run up to 2N jobs concurrently based on the 
same memory constraints. 

In terms of job scheduling, where and when a job can run is going to be based 
on available resources.  So if you want to run a job that needs 16GB of 
resources, and all of your nodes are busy and only have 4GB per node available 
to YARN, your 16GB job will wait until there is at least that much resources 
available.   

To your point, if you say you need 4GB per task, then it must be the same per 
task for that job. The larger the cluster node, in this case memory, the more 
jobs you can run. 

This is of course assuming you could over subscribe a node in terms of cpu 
cores if you have memory available. 

YMMV

HTH
-Mike

On Jan 30, 2015, at 7:10 AM, Sandy Ryza  wrote:

> My answer was based off the specs that Antony mentioned: different amounts of 
> memory, but 10 cores on all the boxes.  In that case, a single Spark 
> application's homogeneously sized executors won't be able to take advantage 
> of the extra memory on the bigger boxes.
> 
> Cloudera Manager can certainly configure YARN with different resource 
> profiles for different nodes if that's what you're wondering.
> 
> -Sandy
> 
> On Thu, Jan 29, 2015 at 11:03 PM, Michael Segel  
> wrote:
> @Sandy, 
> 
> There are two issues. 
> The spark context (executor) and then the cluster under YARN. 
> 
> If you have a box where each yarn job needs 3GB,  and your machine has 36GB 
> dedicated as a YARN resource, you can run 12 executors on the single node. 
> If you have a box that has 72GB dedicated to YARN, you can run up to 24 
> contexts (executors) in parallel. 
> 
> Assuming that you’re not running any other jobs. 
> 
> The larger issue is if your version of Hadoop will easily let you run with 
> multiple profiles or not. Ambari (1.6 and early does not.) Its supposed to be 
> fixed in 1.7 but I haven’t evaluated it yet. 
> Cloudera? YMMV
> 
> If I understood the question raised by the OP, its more about a heterogeneous 
> cluster than spark.
> 
> -Mike
> 
> On Jan 26, 2015, at 5:02 PM, Sandy Ryza  wrote:
> 
>> Hi Antony,
>> 
>> Unfortunately, all executors for any single Spark application must have the 
>> same amount of memory.  It's possible to configure YARN with different 
>> amounts of memory for each host (using yarn.nodemanager.resource.memory-mb), 
>> so other apps might be able to take advantage of the extra memory.
>> 
>> -Sandy
>> 
>> On Mon, Jan 26, 2015 at 8:34 AM, Michael Segel  
>> wrote:
>> If you’re running YARN, then you should be able to mix and match where YARN is 
>> managing the resources available on the node. 
>> 
>> Having said that… it depends on which version of Hadoop/YARN. 
>> 
>> If you’re running Hortonworks and Ambari, then setting up multiple profiles 
>> may not be straightforward. (I haven’t seen the latest version of Ambari.) 
>> 
>> So in theory, one profile would be for your smaller 36GB RAM machines, and one 
>> profile for your 128GB machines. 
>> Then as you request resources for your Spark job, it should schedule the 
>> jobs based on the cluster’s available resources. 
>> (At least in theory.  I haven’t tried this so YMMV) 
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Jan 26, 2015, at 4:25 PM, Antony Mayi  
>> wrote:
>> 
>>> I should have said I am running as yarn-client. All I can see is specifying 
>>> the generic executor memory that is then to be used in all containers.
>>> 
>>> 
>>> On Monday, 26 January 2015, 16:48, Charles Feduke 
>>>  wrote:
>>> 
>>> 
>>> You should look at using Mesos. This should abstract away the individual 
>>> hosts into a pool of resources and make the different physical 
>>> specifications manageable.
>>> 
>>> I haven't tried configuring Spark Standalone mode to have different specs 
>>> on different machines but based on spark-env.sh.template:
>>> 
>>> # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
>>> # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give 
>>> executors (e.g. 1000m, 2g)
>>> # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. 
>>> "-Dx=y")
>>> it looks like you should be able to mix. (It's not clear to me whether 
>>> SPARK_WORKER_MEMORY is uniform across the cluster or for the machine where 
>>> the config file resides.)
>>> 
>>> On Mon Jan 26 2015 at 8:07:51 AM Antony Mayi  
>>> wrote:
>>> Hi,
>>> 
>>> is it possible to mix hosts with (significantly) different specs within a 
>>> cluster 

Re: KMeans with large clusters Java Heap Space

2015-01-30 Thread derrickburns
By default, HashingTF turns each document into a sparse vector in R^(2^20),
i.e. a million-dimensional space. The current Spark clusterer turns each
sparse vector into a dense vector with a million entries when it is added to a
cluster.  Hence, the memory needed grows as the number of clusters times 8MB
(2^20 doubles at 8 bytes each); with k = 1,000 clusters that is already on the
order of 8 GB just for the centers.

You should try to use my new generalized kmeans clustering package, which
works on high-dimensional sparse data.

You will want to use the RandomIndexing embedding:

def sparseTrain(raw: RDD[Vector], k: Int): KMeansModel = {
  // embed the sparse input in a low-dimensional space via random indexing
  KMeans.train(raw, k, embeddingNames = List(LOW_DIMENSIONAL_RI))
}
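
A hedged usage sketch: HashingTF is the standard MLlib class; KMeans,
KMeansModel and LOW_DIMENSIONAL_RI come from the linked package, whose import
paths are omitted here, and docs is an assumed RDD[Seq[String]] of tokenized
documents:

import org.apache.spark.mllib.feature.HashingTF

val tf = new HashingTF()            // defaults to 2^20 features
val vectors = tf.transform(docs)    // RDD[Vector], kept sparse
val model = sparseTrain(vectors, k = 100)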



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-with-large-clusters-Java-Heap-Space-tp21432p21437.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Read from file and broadcast before every Spark Streaming bucket?

2015-01-30 Thread Sean Owen
You should say what errors you see. But I assume it is because you are trying
to create broadcast variables on the executors. Why? It sounds like you already
have the data you want everywhere, ready to read locally.
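
One common pattern avoids broadcast entirely: re-read the small csv on the
driver inside transform() for every batch, and let the tiny Set of ad names
ride along in the task closure. This is a sketch only; the stream name, the
(ad, count) pair type and the file path are assumptions:

import scala.io.Source

val filtered = adCounts.transform { rdd =>
  // this closure runs on the driver at each batch interval
  val src = Source.fromFile("/home/etl/ads.csv")  // hypothetical path on the edge node
  val ads = try src.getLines().map(_.trim).toSet finally src.close()
  // the small Set is serialized with the task closure, no broadcast needed
  rdd.filter { case (ad, _) => ads.contains(ad) }
}
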
On Jan 30, 2015 4:06 AM, "YaoPau"  wrote:

> I'm creating a real-time visualization of counts of ads shown on my
> website,
> using that data pushed through by Spark Streaming.
>
> To avoid clutter, it only looks good to show 4 or 5 lines on my
> visualization at once (corresponding to 4 or 5 different ads), but there
> are
> 50+ different ads that show on my site.
>
> What I'd like to do is quickly change which ads to pump through Spark
> Streaming, without having to rebuild the .jar and push it to my edge node.
> Ideally I'd have a .csv file on my edge node with a list of 4 ad names, and
> every time a StreamRDD is created it reads from that tiny file, creates a
> broadcast variable, and uses that variable as a filter.  That way I could
> just open up the .csv file, save it, and the stream filters correctly
> automatically.
>
> I keep getting errors when I try this.  Has anyone had success with a
> broadcast variable that updates with each new streamRDD?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Read-from-file-and-broadcast-before-every-Spark-Streaming-bucket-tp21433.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


RE: unknown issue in submitting a spark job

2015-01-30 Thread Sean Owen
You should not disable the GC overhead limit. How does increasing executor
total memory cause you to not have enough memory? Do you mean something
else?
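
If the issue is simply heap, the usual knobs are the submit-time memory
settings rather than any GC flag; the OOM in the trace below happens on the
driver (DriverWrapper), so --driver-memory is the relevant one. A sketch with
illustrative values only:

spark-submit \
  --driver-memory 2g \
  --executor-memory 4g \
  --class com.crowdstar.etl.ParseAndClean \
  ...
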
On Jan 30, 2015 1:16 AM, "ey-chih chow"  wrote:

> I use the default value, which I think is 512MB.  If I change to 1024MB,
> spark-submit will fail due to not having enough memory for the RDD.
>
> Ey-Chih Chow
>
> --
> From: moham...@glassbeam.com
> To: eyc...@hotmail.com; user@spark.apache.org
> Subject: RE: unknown issue in submitting a spark job
> Date: Fri, 30 Jan 2015 00:32:57 +
>
>  How much memory are you assigning to the Spark executor on the worker
> node?
>
>
>
> Mohammed
>
>
>
> *From:* ey-chih chow [mailto:eyc...@hotmail.com]
> *Sent:* Thursday, January 29, 2015 3:35 PM
> *To:* Mohammed Guller; user@spark.apache.org
> *Subject:* RE: unknown issue in submitting a spark job
>
>
>
> The worker node has 15GB of memory, a 1x32 GB SSD, and 2 cores.  The data file is
> from S3. If I don't set mapred.max.split.size, it is fine with only one
> partition.  Otherwise, it will generate an OOME.
>
>
>
> Ey-Chih Chow
>
>
>
> > From: moham...@glassbeam.com
>
> > To: eyc...@hotmail.com; user@spark.apache.org
> > Subject: RE: unknown issue in submitting a spark job
> > Date: Thu, 29 Jan 2015 21:16:13 +
> >
> > Looks like the application is using a lot more memory than available.
> Could be a bug somewhere in the code or just an underpowered machine. Hard to
> say without looking at the code.
> >
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> > Mohammed
> >
> >
> > -Original Message-
> > From: ey-chih chow [mailto:eyc...@hotmail.com ]
> > Sent: Thursday, January 29, 2015 1:06 AM
> > To: user@spark.apache.org
> > Subject: unknown issue in submitting a spark job
> >
> > Hi,
> >
> > I submitted a job using spark-submit and got the following exception.
> > Anybody knows how to fix this? Thanks.
> >
> > Ey-Chih Chow
> >
> > 
> >
> > 15/01/29 08:53:10 INFO storage.BlockManagerMasterActor: Registering
> block manager ip-10-10-8-191.us-west-2.compute.internal:47722 with 6.6 GB
> RAM Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> >
> org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at
> >
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:265)
> > at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
> > at scala.Option.getOrElse(Option.scala:120)
> > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
> > at
> >
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
> > at scala.Option.getOrElse(Option.scala:120)
> > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
> > at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
> > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
> > at scala.Option.getOrElse(Option.scala:120)
> > at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
> > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1128)
> > at
> >
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:935)
> > at
> >
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> > at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:109)
> > at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> > ... 6 more
> > 15/01/29 08:54:33 INFO storage.BlockManager: Removing RDD 1
> > 15/01/29 08:54:33 ERROR actor.ActorSystemImpl: exception on LARS’ timer
> thread
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at
> >
> akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
> > at
> akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
> > at java.lang.Thread.run(Thread.java:745)
> > 15/01/29 08:54:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at
> >
> akka.actor.LightArrayRevolverScheduler$$a

Re: spark with cdh 5.2.1

2015-01-30 Thread Sean Owen
There is no need for a 2.5 profile. The hadoop-2.4 profile is for
Hadoop 2.4 and beyond. You can set the particular version you want
with -Dhadoop.version=

You do not need to make any new profile to compile vs 2.5.0-cdh5.2.1.
Again, the hadoop-2.4 profile is what you need.
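
For example, a sketch of the build command (add -Pyarn or other profiles as
your deployment requires):

mvn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.1 -DskipTests clean package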

On Thu, Jan 29, 2015 at 11:33 PM, Mohit Jaggi  wrote:
> Hi All,
> I noticed in pom.xml that there is no entry for Hadoop 2.5. Has anyone tried 
> Spark with 2.5.0-cdh5.2.1? Will replicating the 2.4 entry be sufficient to 
> make this work?
>
> Mohit.
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread fightf...@163.com
Hi, Siddharth
You can rebuild Spark with Maven by specifying -Dhadoop.version=2.5.0.
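
For example (a sketch; in Spark 1.2 the hadoop-2.4 profile covers Hadoop 2.4
and later, and you may also want -Pyarn):

mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package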

Thanks,
Sun.



fightf...@163.com
 
From: Siddharth Ubale
Date: 2015-01-30 15:50
To: user@spark.apache.org
Subject: Hi: hadoop 2.5 for spark
Hi ,
 
I am a beginner with Apache Spark.
 
Can anyone let me know whether it is mandatory to build Spark against the Hadoop 
version I am using, or whether I can use a pre-built package with my existing 
HDFS root folder?
I am using Hadoop 2.5.0 and want to use Apache Spark 1.2.0 with it.
I can see a pre-built version for 2.4 and above in the downloads section of 
the Spark homepage -> downloads.
 
Siddharth Ubale,
Synchronized Communications 
#43, Velankani Tech Park, Block No. II, 
3rd Floor, Electronic City Phase I,
Bangalore – 560 100
Tel : +91 80 3202 4060
Web: www.syncoms.com
London|Bangalore|Orlando
 
we innovate, plan, execute, and transform the business
 


Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread bit1...@163.com
You can use the pre-built version that is built against Hadoop 2.4.




From: Siddharth Ubale
Date: 2015-01-30 15:50
To: user@spark.apache.org
Subject: Hi: hadoop 2.5 for spark
Hi ,
 
I am a beginner with Apache Spark.
 
Can anyone let me know whether it is mandatory to build Spark against the Hadoop 
version I am using, or whether I can use a pre-built package with my existing 
HDFS root folder?
I am using Hadoop 2.5.0 and want to use Apache Spark 1.2.0 with it.
I can see a pre-built version for 2.4 and above in the downloads section of 
the Spark homepage -> downloads.
 
Siddharth Ubale,
Synchronized Communications 
#43, Velankani Tech Park, Block No. II, 
3rd Floor, Electronic City Phase I,
Bangalore – 560 100
Tel : +91 80 3202 4060
Web: www.syncoms.com
London|Bangalore|Orlando
 
we innovate, plan, execute, and transform the business
 