First sample with Spark Streaming and three Time's?

2014-05-18 Thread Jacek Laskowski
that would process these 4 `store`s? Jacek -- Jacek Laskowski | http://blog.japila.pl Never discourage anyone who continually makes progress, no matter how slow. Plato

Re: Unable to run a Standalone job

2014-05-23 Thread Jacek Laskowski
) at java.lang.reflect.Method.invoke(Method.java:597) Thanks, Shrikar -- Jacek Laskowski | http://blog.japila.pl Never discourage anyone who continually makes progress, no matter how slow. Plato

Re: building spark1.2 meet error

2014-12-31 Thread Jacek Laskowski
Hi, Where does the following path that appears in the logs below come from? /opt/xdsp/spark-1.2.0/H:\Soft\Maven\repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar Did you somehow point at the local Maven repository that's H:\Soft\Maven? Jacek 31 Dec 2014 01:48 j_soft

Re: Using spark in cluster mode

2015-10-21 Thread Jacek Laskowski
cek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski On Tue, Oct 20, 2015 at 5:48 PM, masoom alam <masoom.a...@wanclouds.net> wrote: > Dear all > > I want to setu

Re: SF Spark Office Hours Experiment - Friday Afternoon

2015-10-21 Thread Jacek Laskowski
Hi Holden, What a great idea! I'd love to join, but since I'm in Europe it's not gonna happen by this Fri. Any plans to visit Europe or perhaps Warsaw, Poland and host office hours here? ;-) p.s. What about a virtual event using Google Hangouts on Air? Pozdrawiam, Jacek -- Jacek Laskowski

Why does sortByKey() transformation trigger a job in spark-shell?

2015-11-02 Thread Jacek Laskowski
ion and hence should be lazy? Is this a special transformation? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskow

Re: Why does sortByKey() transformation trigger a job in spark-shell?

2015-11-02 Thread Jacek Laskowski
er" that should not be that hard to fix. Does this still hold? I'd like to work on it if it's "simple" and doesn't get me swamped. Thanks! Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvo

ResultStage's parent stages only ShuffleMapStages?

2015-11-06 Thread Jacek Laskowski
`). Are a ResultStage's parent stages only ShuffleMapStages? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

How StorageLevel, CacheManager and checkpointing influence computing RDD partitions?

2015-10-10 Thread Jacek Laskowski
/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L260-L266 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L292-L298 Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me

Re: sbt error -- before Terasort compilation

2015-08-27 Thread Jacek Laskowski
project - gitter is the best option for such cases). Could you remove ~/.ivy2 and ~/.sbt and start over? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344

Re: sbt error -- before Terasort compilation

2015-08-27 Thread Jacek Laskowski
On Thu, Aug 27, 2015 at 5:40 PM, Jacek Laskowski ja...@japila.pl wrote: Server access Error: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty url=https://jcenter.bintray.com/org/scala-sbt/sbt

Re: suggest configuration for debugging spark streaming, kafka

2015-08-27 Thread Jacek Laskowski
hadoop? Hi, It should be enough and you don't need Hadoop. I described the process of setting up both in http://blog.jaceklaskowski.pl/2015/07/20/real-time-data-processing-using-apache-kafka-and-spark-streaming.html. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
:spark-mllib_2.11 Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

Re: [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
Hi, I'm trying to nail it down myself, too. Is there anything relevant to help on my side? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek

Re: [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
Hi, Sean helped me offline and I sent https://github.com/apache/spark/pull/8479 for review. That's the only breaking place for the build I could find. Tested with 2.10 and 2.11. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https

`sbt core/test` hangs on LogUrlsStandaloneSuite?

2015-09-02 Thread Jacek Laskowski
NGs and TIMED_WAITING "at sun.misc.Unsafe.park(Native Method)" Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/ja

WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,...

2015-10-03 Thread Jacek Laskowski
nt.Await$$anonfun$result$1.apply(package.scala:190) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:190) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242) ... 15 more ``` Pozdrawiam, Jacek -- Jacek L

preferredNodeLocationData, SPARK-8949, and SparkContext - a leftover?

2015-10-03 Thread Jacek Laskowski
it up via a pull req? BTW, What do you think about removing SparkContext.preferredNodeLocationData as part of the cleanup? [1] https://issues.apache.org/jira/browse/SPARK-8949 Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://t

Re: WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,...

2015-10-03 Thread Jacek Laskowski
regularly. Give it a shot yourself as it's easy to reproduce - build Spark from the sources and have a project with libraryDependencies set with Spark core 1.6.0-SNAPSHOT. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com

Re: preferredNodeLocationData, SPARK-8949, and SparkContext - a leftover?

2015-10-04 Thread Jacek Laskowski
important for Spark on YARN. Would "Removing the internal field and one usage of it seems OK, though I don't think it would help much of anything." still hold? I don't think so and hence the issue reported. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jacekla

How does FAIR job scheduler work in Standalone cluster mode?

2015-10-02 Thread Jacek Laskowski
r-docs/latest/job-scheduling.html Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-las

Re: How does FAIR job scheduler work in Standalone cluster mode?

2015-10-02 Thread Jacek Laskowski
orrect or not? :( Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski On Fri, Oct 2, 2015 at 8:20 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >

SPARK_WORKER_INSTANCES was detected (set to '2')…This is deprecated in Spark 1.0+

2015-09-22 Thread Jacek Laskowski
config. Why the deprecation? Is it not supported (or not recommended, given the message) to have a Spark Standalone cluster and execute spark-submit on the same machine? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https

Re: Spark as standalone or with Hadoop stack.

2015-09-22 Thread Jacek Laskowski
On Tue, Sep 22, 2015 at 10:03 PM, Ted Yu wrote: > To my knowledge, no one runs HBase on top of Mesos. Hi, That sentence caught my attention. Could you explain the reasons for not running HBase on Mesos, i.e. what makes Mesos inappropriate for HBase? Jacek

Re: In yarn-client mode, is it the driver or application master that issue commands to executors?

2015-12-07 Thread Jacek Laskowski
Hi, That's my understanding, too. Just spent an entire morning today to check it out and would be surprised to hear otherwise. Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskowski.gitbooks.io/mastering

Re: Scala 2.11 and Akka 2.4.0

2015-12-01 Thread Jacek Laskowski
On Tue, Dec 1, 2015 at 2:32 PM, RodrigoB wrote: > I'm currently trying to build spark with Scala 2.11 and Akka 2.4.0. Why? AFAIK Spark's leaving Akka's boat and joining Netty's. Jacek

Re: SparkContext.cancelJob - what part of Spark uses it? Nothing in webUI to kill jobs?

2015-12-16 Thread Jacek Laskowski
for stages that are in a sense similar to jobs so...I'm still unsure why the method is not used by Spark itself. If it's not used by Spark why could it be useful for others outside Spark? Doh, why did I come across the method? It will take some time before I forget about it :-) Pozdrawiam, Jacek -- Jacek

SparkContext.cancelJob - what part of Spark uses it? Nothing in webUI to kill jobs?

2015-12-16 Thread Jacek Laskowski
there is a way to kill/cancel stages, but no corresponding feature to kill/cancel jobs. Why? Is there a JIRA ticket to have it some day perhaps? Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache Spark ==> https://jaceklaskowski.gitbooks.io/mastering-apache-sp

Re: Window Functions importing issue in Spark 1.4.0

2016-01-07 Thread Jacek Laskowski
Ok, enuf! :) Leaving the room for now as I'm like a copycat :) https://en.wiktionary.org/wiki/enuf Pozdrawiam, Jacek Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache Spark ==> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ Follow me at https://twitter.

Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Jacek Laskowski
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen wrote: > (For similar reasons I personally don't favor supporting Java 7 or > Scala 2.10 in Spark 2.x.) That reflects my sentiments as well. Thanks Sean for bringing that up! Jacek

Re: Newbie Help for spark's not finding native hadoop warning

2015-12-24 Thread Jacek Laskowski
Hi, To add to it, you can read about the native libs in https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html. Pozdrawiam, Jacek Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache Spark ==> https://jaceklaskowski.gitbooks.io/master

Re: SparkContext.cancelJob - what part of Spark uses it? Nothing in webUI to kill jobs?

2015-12-17 Thread Jacek Laskowski
Thanks Mark! That helped a lot, and my takeaway from it is to...back away now! :) I'm following the advice as there's simply too much at the moment to learn in Spark. Pozdrawiam, Jacek Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache Spark ==> ht

Re: Cant start master on windows 7

2015-11-30 Thread Jacek Laskowski
On Fri, Nov 27, 2015 at 4:27 PM, Shuo Wang wrote: > I am trying to use the start-master.sh script on windows 7. From http://spark.apache.org/docs/latest/spark-standalone.html: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows,

Re: Debug Spark

2015-11-30 Thread Jacek Laskowski
5005): -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 In IntelliJ IDEA, define a new debug configuration for Remote and press Debug. You're done. https://www.jetbrains.com/idea/help/debugging-2.html might help. Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com
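
To confirm that a JVM was really started with the debug agent mentioned above, the JVM's own startup arguments can be inspected. This is a minimal sketch in plain Scala using only the standard `java.lang.management` API; `JdwpCheck` and `jdwpArgument` are hypothetical names chosen for illustration and are not part of Spark.

```scala
import java.lang.management.ManagementFactory

object JdwpCheck {
  /** Returns the first JDWP agent argument in the given JVM arguments, if any. */
  def jdwpArgument(jvmArgs: Seq[String]): Option[String] =
    jvmArgs.find(_.startsWith("-agentlib:jdwp"))

  def main(args: Array[String]): Unit = {
    // Arguments the current JVM was launched with (excluding the main class and its args).
    val jvmArgs = ManagementFactory.getRuntimeMXBean.getInputArguments
      .toArray(new Array[String](0)).toSeq

    jdwpArgument(jvmArgs) match {
      case Some(arg) => println(s"Debug agent enabled: $arg")
      case None      => println("No JDWP agent found in the JVM arguments")
    }
  }
}
```

Running it inside the process under investigation shows whether the `-agentlib:jdwp=...` option actually reached that JVM before a remote debug session is attempted.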

Re: how to using local repository in spark[dev]

2015-11-30 Thread Jacek Laskowski
the other options a try. Can you show the exact location of the jar you want your Spark app to depend on (using `ls`) and how you defined the dependency in build.sbt? Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https

Re: Spark on yarn vs spark standalone

2015-11-30 Thread Jacek Laskowski
and submitting jobs using YARN. Standalone's an entry option where throwing in YARN could kill introducing Spark to organizations without Hadoop YARN. Just my two cents. Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https

Re: Spark, Windows 7 python shell non-reachable ip address

2015-11-30 Thread Jacek Laskowski
java.net.InetAddress.getLocalHost() that Spark executes under the covers before running into the network-related issue. Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark
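
A quick way to see what `java.net.InetAddress.getLocalHost()` resolves to on a given machine is to call it directly outside Spark. The sketch below is plain Scala on the standard JDK API; `LocalHostCheck` and `describe` are hypothetical names used only for illustration.

```scala
import java.net.InetAddress

object LocalHostCheck {
  /** Formats the host name, IP address and loopback flag of an address. */
  def describe(address: InetAddress): String =
    s"${address.getHostName} -> ${address.getHostAddress} (loopback: ${address.isLoopbackAddress})"

  def main(args: Array[String]): Unit = {
    // May throw java.net.UnknownHostException if the local host name cannot be resolved.
    println(describe(InetAddress.getLocalHost))
  }
}
```

If the printed address is not reachable from the other processes involved, that points at host name resolution on the machine rather than at Spark itself.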

Re: Spark on yarn vs spark standalone

2015-11-30 Thread Jacek Laskowski
find it now :( Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek

Re: In yarn-client mode, is it the driver or application master that issue commands to executors?

2015-11-30 Thread Jacek Laskowski
On Fri, Nov 27, 2015 at 12:12 PM, Nisrina Luthfiyati < nisrina.luthfiy...@gmail.com> wrote: > Hi all, > I'm trying to understand how yarn-client mode works and found these two > diagrams: > > > > > In the first diagram, it looks like the driver running in client directly > communicates with

Re: Cant start master on windows 7

2015-12-01 Thread Jacek Laskowski
rk-related? Thanks for any help! Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow

Re: merge 3 different types of RDDs in one

2015-12-01 Thread Jacek Laskowski
On Tue, Dec 1, 2015 at 10:57 AM, Shams ul Haque wrote: > Thanks for the suggestion, i am going to try union. ...and please report your findings back. > And what is your opinion on 2nd question. Dunno. If you find a solution, let us know. Jacek

Re: spark rdd grouping

2015-12-01 Thread Jacek Laskowski
ments: Stream((0,CompactBuffer((0,1), (0,1), (0,1), (0,1 1 with 1 elements: Stream((1,CompactBuffer((1,1), (1,1), (1,1), (1,1 Do I miss anything? Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskow

Re: Blocked REPL commands

2015-11-19 Thread Jacek Laskowski
2) ... Guess I should file an issue? Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Apache Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ Follow me at https://twitter.com/jaceklaskowski Upvote at http:

Re: Spark Streaming stateful operation to HBase

2016-06-09 Thread Jacek Laskowski
Hi, Check the number of records inside the DStream at a batch before you do the save. Gist the code with mapWithState and save? Jacek On 9 Jun 2016 7:58 a.m., "soumick dasgupta" wrote: Hi, I am using mapwithstate to keep the state and then ouput the result to

Re: data frame or RDD for machine learning

2016-06-09 Thread Jacek Laskowski
Hi, Use DataFrame-based API (aka spark.ml) first and if your ml algorithm doesn't support it switch to a RDD-based API (spark.mllib). What algorithm are you going to use? Jacek On 9 Jun 2016 9:12 a.m., "pseudo oduesp" wrote: > Hi, > after spark 1.3 we have dataframe (

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Jacek Laskowski
Hi, What's the version of Spark? You're using Kafka 0.9.0.1, ain't you? What's the topic name? Jacek On 7 Jun 2016 11:06 a.m., "Dominik Safaric" wrote: > As I am trying to integrate Kafka into Spark, the following exception > occurs: > >

Re: Environment tab meaning

2016-06-07 Thread Jacek Laskowski
that the console knows what happens under the covers (and can calculate the stats). BTW, spark.ui.port (default: 4040) controls the port Web UI binds to. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https

Re: Spark 2.0 Release Date

2016-06-08 Thread Jacek Laskowski
Whoohoo! What great news! Looks like a RC is coming...Thanks a lot, Reynold! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Jun 8, 2016 at 7:55 AM, Reynold

Re: setting column names on dataset

2016-06-07 Thread Jacek Laskowski
et[Person] = [name: string, age: int] scala> ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name").show(false) +++ |_1 |_2 | +++ |[foo,42]|[foo,42]| |[bar,24]|[bar,24]| +++ Pozdrawiam, Jacek Lask

Re: Environment tab meaning

2016-06-07 Thread Jacek Laskowski
Hi, I'm not surprised to see Hadoop jars on the driver (yet I couldn't explain exactly why they need to be there). I can't find a way now to display the classpath for executors. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering

Re: Dataset - reduceByKey

2016-06-07 Thread Jacek Laskowski
#org.apache.spark.sql.expressions.UserDefinedAggregateFunction Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Jun 7, 2016 at 8:32 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Hello. > >

Re: Dealing with failures

2016-06-08 Thread Jacek Laskowski
On Wed, Jun 8, 2016 at 2:38 AM, Mohit Anchlia wrote: > I am looking to write an ETL job using spark that reads data from the > source, perform transformation and insert it into the destination. Is this going to be a one-time job, or do you want it to run at a regular interval? >

Re: Trainning a spark ml linear regresion model fail after migrating from 1.5.2 to 1.6.1

2016-06-08 Thread Jacek Laskowski
Hi, Is it me only to *not* see the snippets? Could you please gist 'em => https://gist.github.com ? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Ju

Optional columns in Aggregated Metrics by Executor in web UI?

2016-06-07 Thread Jacek Laskowski
ark job to execute to have Input Size / Records and Output Size / Records + Shuffle Spill (Memory) and Shuffle Spill (Disk) columns. Any ideas? Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at ht

Re: Spark 2.0 Release Date

2016-06-07 Thread Jacek Laskowski
Finally, the PMC voice on the subject. Thanks a lot, Sean! p.s. Given how much time it takes to ship 2.0 (with so many cool features already baked in!) I'd vote for releasing a few more RCs before 2.0 hits the shelves. I hope 2.0 is not Java 9 or Jigsaw ;-) Pozdrawiam, Jacek Laskowski

Re: Specify node where driver should run

2016-06-07 Thread Jacek Laskowski
Hi, It's not possible. YARN uses CPU and memory for resource constraints and places AM on any node available. Same about executors (unless data locality constrains the placement). Jacek On 6 Jun 2016 1:54 a.m., "Saiph Kappa" wrote: > Hi, > > In yarn-cluster mode, is

Re: Spark 2.0 Release Date

2016-06-07 Thread Jacek Laskowski
On Tue, Jun 7, 2016 at 1:25 PM, Arun Patel wrote: > Do we have any further updates on release date? Nope :( And it's even more quiet than I could have thought. I was so certain that today's the date. Looks like Spark Summit has "consumed" all the people behind

Re: Specify node where driver should run

2016-06-07 Thread Jacek Laskowski
Hi, --master yarn-client is deprecated and you should use --master yarn --deploy-mode client instead. There are two deploy-modes: client (default) and cluster. See http://spark.apache.org/docs/latest/cluster-overview.html. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski
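
The replacement described above can be sketched as a small mapping from the deprecated master URLs to the `--master` / `--deploy-mode` pair. This is plain Scala for illustration only: `MasterUrlMigration` and `migrate` are hypothetical names, not part of Spark, and the sketch assumes that `yarn-cluster` maps to `--deploy-mode cluster` in the same way `yarn-client` maps to `--deploy-mode client`.

```scala
object MasterUrlMigration {
  /** Hypothetical helper: the (master, deploy mode) pair to pass to spark-submit. */
  def migrate(master: String): (String, Option[String]) = master match {
    case "yarn-client"  => ("yarn", Some("client"))
    case "yarn-cluster" => ("yarn", Some("cluster"))
    case other          => (other, None) // leave the deploy mode to spark-submit's default
  }

  def main(args: Array[String]): Unit = {
    val (master, deployMode) = migrate("yarn-client")
    val flags = Seq("--master", master) ++ deployMode.toSeq.flatMap(m => Seq("--deploy-mode", m))
    println(flags.mkString(" "))
  }
}
```

For `yarn-client` the printed flags are `--master yarn --deploy-mode client`, which is the form recommended in the message.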

Re: Spark 2.0 Release Date

2016-06-07 Thread Jacek Laskowski
On Tue, Jun 7, 2016 at 3:25 PM, Sean Owen wrote: > That's not any kind of authoritative statement, just my opinion and guess. Oh, come on. You're not **a** Sean but **the** Sean (= a PMC member and the JIRA/PRs keeper) so what you say **is** kinda official. Sorry. But don't

Re: Spark ML - Is it safe to schedule two trainings job at the same time or will worker state be corrupted?

2016-06-09 Thread Jacek Laskowski
Hi, It's supposed to work like this - share SparkContext to share datasets between threads. Ad 1. No Ad 2. Yes See CrossValidation and similar validations in spark.ml. Jacek On 9 Jun 2016 7:29 p.m., "Brandon White" wrote: > For example, say I want to train two Linear

Re: Seq.toDF vs sc.parallelize.toDF = no Spark job vs one - why?

2016-06-09 Thread Jacek Laskowski
for simple operations. In contrast, an RDD is opaque to > catalyst so we can't perform that optimization. > > On Wed, Jun 8, 2016 at 7:49 AM, Jacek Laskowski <ja...@japila.pl> wrote: > >> Hi, >> >> I just noticed it today while toying with Spark 2.0.0 (today's build) &

Re: how to increase threads per executor

2016-06-03 Thread Jacek Laskowski
--executor-cores 1 to be exact. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Jun 3, 2016 at 12:28 AM, Mich Talebzadeh <mich.talebza...@gmail.com>

Re: Not able to write output to local filsystem from Standalone mode.

2016-05-25 Thread Jacek Laskowski
Hi Mathieu, Thanks a lot for the answer! I did *not* know that it's the driver that creates the directory. You said "standalone mode", is this the case for the other modes - yarn and mesos? p.s. Did you find it in the code or...just experienced before? #curious Pozdrawiam, Jacek Laskowski

Re: choice of RDD function

2016-06-15 Thread Jacek Laskowski
Hi, Good to hear so! Mind sharing a few snippets of your solution? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Jun 15, 2016 at 5:03 PM, Sivakumaran S

Re: spark standalone High availibilty issues

2016-06-15 Thread Jacek Laskowski
Can you post the error? Jacek On 14 Jun 2016 10:56 p.m., "Darshan Singh" wrote: > Hi, > > I am using standalone spark cluster and using zookeeper cluster for the > high availbilty. I am getting sometimes error when I start the master. The > error is related to Leader

Re: choice of RDD function

2016-06-15 Thread Jacek Laskowski
Hi, Ad Q1, yes. See stateful operators like mapWithState and windows. Ad Q2, RDDs should be fine (and available out of the box), but I'd give Datasets a try too since they're .toDF away. Jacek On 14 Jun 2016 10:29 p.m., "Sivakumaran S" wrote: Dear friends, I have set up

Re: Basic question on using one's own classes in the Scala app

2016-06-05 Thread Jacek Laskowski
On Sun, Jun 5, 2016 at 9:01 PM, Ashok Kumar wrote: > Now I have added this > > libraryDependencies += "com.databricks" % "apps.twitter_classifier" > > However, I am getting an error > > > error: No implicit for Append.Value[Seq[sbt.ModuleID], >

Re: Akka with Hadoop/Spark

2016-06-05 Thread Jacek Laskowski
Hi, "I am supposed to work with akka and Hadoop in building apps on top of the data available in hadoop" <-- that's outside the topics covered in this mailing list (unless you're going to use Spark, too). Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering

Re: Environment tab meaning

2016-06-07 Thread Jacek Laskowski
object. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Jun 7, 2016 at 8:18 PM, Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > It is t

Re: comparaing row in pyspark data frame

2016-06-08 Thread Jacek Laskowski
On Wed, Jun 8, 2016 at 2:05 PM, pseudo oduesp <pseudo20...@gmail.com> wrote: > how we can compare columns to get max of row not columns and get name of > columns where max it present ? First thought - a UDF. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mast

Seq.toDF vs sc.parallelize.toDF = no Spark job vs one - why?

2016-06-08 Thread Jacek Laskowski
a "view" layer atop data and when this data is local/in memory already there's no need to submit a job to...well...compute the data. I'd appreciate more in-depth answer, perhaps with links to the code. Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Ap

Re: Not able to write output to local filsystem from Standalone mode.

2016-05-27 Thread Jacek Laskowski
e". Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, May 27, 2016 at 3:42 AM, Yong Zhang <java8...@hotmail.com> wrote: > That just makes sense, d

Re: Container preempted by scheduler - Spark job error

2016-06-02 Thread Jacek Laskowski
Hi, Few things for closer examination: * is yarn master URL accepted in 1.3? I thought it was only in later releases. Since you're seeing the issue it seems it does work. * I've never seen specifying confs using a single string. Can you check in the Web ui they're applied? * what about this

--driver-cores for Standalone and YARN only?! What about Mesos?

2016-06-01 Thread Jacek Laskowski
or fix) my understanding before I file a JIRA issue. Thanks! [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L475-L476 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-ap

Re: Spark Thrift Server run job as hive user

2016-05-31 Thread Jacek Laskowski
Hi, How do you start thrift server? What's your user name? I think it takes the user and always runs as it. Seen proxyUser today in spark-submit that may or may not be useful here. Jacek On 31 May 2016 10:01 a.m., "Radhika Kothari" wrote: Hi Anyone knows about

Re: Spark Thrift Server run job as hive user

2016-05-31 Thread Jacek Laskowski
What's "With the help of UI"? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, May 31, 2016 at 1:02 PM, Radhika Kothari <radhikakothari100...@gma

Re: choice of RDD function

2016-06-16 Thread Jacek Laskowski
Rather val df = sqlContext.read.json(rdd) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Jun 15, 2016 at 11:55 PM, Sivakumaran S <siva.kuma...@me.com>

Re: choice of RDD function

2016-06-16 Thread Jacek Laskowski
Hi, That's one of my concerns with the code. What concerned me the most is that the RDD(s) were converted to DataFrames only to registerTempTable and execute SQLs. I think it'd have better performance if DataFrame operators were used instead. Wish I had numbers. Pozdrawiam, Jacek Laskowski

Re: Can I control the execution of Spark jobs?

2016-06-16 Thread Jacek Laskowski
Hi, When you say "several ETL types of things", what is this exactly? What would an example of "dependency between these jobs" be? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at

Re: In yarn-cluster mode, provide system prop to the client jvm

2016-06-16 Thread Jacek Laskowski
Hi, You could use --properties-file to point to the properties file with properties or use spark.driver.extraJavaOptions. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com
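
Whichever option delivers the setting, a JVM system property passed as `-Dkey=value` is read the same way from application code. The sketch below is plain Scala; `SystemPropertyLookup`, `lookup` and the property name `app.env` are hypothetical and used only for illustration. It assumes the property reached the JVM, for example as `-Dapp.env=staging` inside `spark.driver.extraJavaOptions`.

```scala
object SystemPropertyLookup {
  /** Returns the JVM system property for `key`, or `default` when it is not set. */
  def lookup(key: String, default: String): String =
    sys.props.getOrElse(key, default)

  def main(args: Array[String]): Unit = {
    // "app.env" is a made-up property name; replace it with the one you pass via -D.
    println(s"app.env = ${lookup("app.env", "dev")}")
  }
}
```

Printing the value at startup is a cheap check that the option was applied to the JVM you expected, since the driver and the client can be different processes in cluster deploy mode.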

Re: Error Running SparkPi.scala Example

2016-06-16 Thread Jacek Laskowski
IDEA :)) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, Jun 16, 2016 at 1:37 AM, Krishna Kalyan <krishnakaly...@gmail.com> wrote: > Hello, >

Re: How to deal with tasks running too long?

2016-06-16 Thread Jacek Laskowski
Hi, I'd check Details for Stage page in web UI. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, Jun 16, 2016 at 6:45 AM, Utkarsh Sengar <utkar

Re: How to enable core dump in spark

2016-06-16 Thread Jacek Laskowski
Hi, Can you make sure that the ulimit settings are applied to the Spark process? Is this Spark on YARN or Standalone? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

[YARN] Questions about YARN's queues and Spark's FAIR scheduler

2016-06-16 Thread Jacek Laskowski
(and without your support I won't be able to recover from this painful mental state :)) Thanks for reading so far! Appreciate any help. [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski

Re: Update Batch DF with Streaming

2016-06-20 Thread Jacek Laskowski
Hi, How would you do that without/outside streaming? Jacek On 17 Jun 2016 12:12 a.m., "Amit Assudani" wrote: > Hi All, > > > Can I update batch data frames loaded in memory with Streaming data, > > > For eg, > > > I have employee DF is registered as temporary table, it

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
that the task > passes. > > FYI > > On Sun, Jun 19, 2016 at 3:22 AM, Jacek Laskowski <ja...@japila.pl> wrote: > >> Hi, >> >> Thanks Burak for the idea, but it *only* fails the tasks that >> eventually fail the entire job not a particular stage (just

How to cause a stage to fail (using spark-shell)?

2016-06-18 Thread Jacek Laskowski
s. Please guide. Thanks. /me on to reviewing the Spark code... Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jace

Re: Running Spark in local mode

2016-06-19 Thread Jacek Laskowski
, be it on your local machine where you executed spark-submit or on one node in a YARN cluster. The same applies to Spark Standalone and Mesos and is controlled by --deploy-mode, i.e. client (default) or cluster. Please update your notes accordingly ;-) Pozdrawiam, Jacek Laskowski https://m

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Jacek Laskowski
finishing up properly. Any ideas? I've got one but it requires quite an extensive cluster setup which I'd like to avoid if possible. Just something I could use during workshops or demos and others could reproduce easily to learn Spark's internals. Pozdrawiam, Jacek Laskowski https

Many executors with the same ID in web UI (under Executors)?

2016-06-18 Thread Jacek Laskowski
for future reference. Why are there multiple executor entries under the same executor IDs? What are the executor entries exactly? When are the new ones created (after a Spark application is launched and assigned the --num-executors executors)? Pozdrawiam, Jacek Laskowski https://medium.com

Re: How to cause a stage to fail (using spark-shell)?

2016-06-18 Thread Jacek Laskowski
Hi, Following up on this question, is a stage considered failed only when there is a FetchFailed exception? Can I have a failed stage with only a single-stage job? Appreciate any help on this...(as my family doesn't like me spending the weekend with Spark :)) Pozdrawiam, Jacek Laskowski

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Jacek Laskowski
Hi, What do you see under Executors and Details for Stage (for the affected stages)? Anything weird memory-related? What does your "I am reading data from Kafka into Spark and writing it into Cassandra after processing it." pipeline look like? Pozdrawiam, Jacek Laskowski https://

Re: ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-16 Thread Jacek Laskowski
Hi, Why is spark-core provided while the others are non-provided? How do you assemble the app? How do you submit it for execution? What's the deployment environment? More info...more info... Jacek On 15 Jun 2016 10:26 p.m., "S Sarkar" wrote: Hello, I built

Re: cache datframe

2016-06-16 Thread Jacek Laskowski
Yes. Yes. What's the use case? Jacek On 16 Jun 2016 2:17 p.m., "pseudo oduesp" wrote: > hi, > if I cache the same data frame, then transform it and add columns, should I cache > a second time > > df.cache() > > transformation > add new columns > > df.cache() > ? > >

Re: Cost of converting RDD's to dataframe and back

2016-06-24 Thread Jacek Laskowski
Hi Jorn, You can measure the time for ser/deser yourself using web UI or SparkListeners. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Jun 24, 2016 at 10

Re: Spark SQL NoSuchMethodException...DriverWrapper.()

2016-06-24 Thread Jacek Laskowski
Hi Mirko, What exactly was the setting? I'd like to reproduce it. Can you file an issue in JIRA to fix that? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri

Re: Can I control the execution of Spark jobs?

2016-06-18 Thread Jacek Laskowski
it. pipeline == load a dataset, transform it and save it to persistent storage Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Jun 17, 2016 at 4:15 AM, Haopu Wang <

Re: How to enable core dump in spark

2016-06-18 Thread Jacek Laskowski
What about the user of NodeManagers? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, Jun 16, 2016 at 10:51 PM, prateek arora <prateek.arora...@gmail.

Re: Many executors with the same ID in web UI (under Executors)?

2016-06-18 Thread Jacek Laskowski
-executors.png Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sat, Jun 18, 2016 at 6:05 PM, Akhil Das <ak...@hacked.work> wrote: > A screenshot of the exe

Re: Many executors with the same ID in web UI (under Executors)?

2016-06-18 Thread Jacek Laskowski
org.apache.spark.deploy.yarn.ExecutorLauncher 28463 org.apache.spark.deploy.SparkSubmit Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sat, Jun 18, 2016 at 6:16 PM, Mich Talebzadeh
