Re: Spark driver getting out of memory

2016-07-24 Thread Raghava Mutharaju
? Regards, Raghava. On Wed, Jul 20, 2016 at 2:08 AM, Saurav Sinha <sauravsinh...@gmail.com> wrote: > Hi, > > I have set driver memory to 10 GB and the job ran with an intermediate failure which > is recovered by Spark. > > But I still want to know if no of parts incre
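
A hedged sketch of how the driver heap is typically sized (the values are illustrative, not from this thread); note that spark.driver.memory set in code only takes effect in cluster mode, since in client mode the driver JVM has already started:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("driver-memory-example")
      // Takes effect in cluster mode; in client mode the driver JVM is already
      // running, so pass it on the command line instead: spark-submit --driver-memory 10g
      .set("spark.driver.memory", "10g")
    val sc = new SparkContext(conf)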

Re: OOM on the driver after increasing partitions

2016-06-22 Thread Raghava Mutharaju
Thank you. Sure, if I find something I will post it. Regards, Raghava. On Wed, Jun 22, 2016 at 7:43 PM, Nirav Patel <npa...@xactlycorp.com> wrote: > I believe it would be task, partitions, task status etc information. I do > not know exact of those things but I had OOM on drive

Re: OOM on the driver after increasing partitions

2016-06-22 Thread Raghava Mutharaju
available limit. So the other options are 1) Separate the driver from master, i.e., run them on two separate nodes 2) Increase the RAM capacity on the driver/master node. Regards, Raghava. On Wed, Jun 22, 2016 at 7:05 PM, Nirav Patel <npa...@xactlycorp.com> wrote: > Yes driver keeps fa

Re: OOM on the driver after increasing partitions

2016-06-22 Thread Raghava Mutharaju
them to T, i.e., T = T + deltaT 3) Stop when the current T size (count) is the same as the previous T size, i.e., deltaT is 0. Do you think something happens on the driver, due to the application logic, when the partitions are increased? Regards, Raghava. On Wed, Jun 22, 2016 at 12:33 PM, Sonal Goyal
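
A minimal sketch of the fixpoint loop described in these steps, assuming a pair RDD; `initial` and `computeDelta` are placeholders for the application's own logic, not code from the thread:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    def fixpoint(initial: RDD[(Int, Int)],
                 computeDelta: RDD[(Int, Int)] => RDD[(Int, Int)]): RDD[(Int, Int)] = {
      var t = initial.persist(StorageLevel.MEMORY_ONLY)
      var previous = -1L
      var current = t.count()
      while (current != previous) {          // stop when deltaT adds nothing new
        val next = t.union(computeDelta(t)).distinct().persist(StorageLevel.MEMORY_ONLY)
        previous = current
        current = next.count()               // materialize next before dropping t
        t.unpersist()
        t = next
      }
      t
    }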

OOM on the driver after increasing partitions

2016-06-22 Thread Raghava Mutharaju
uld be the possible reasons behind the driver-side OOM when the number of partitions is increased? Regards, Raghava.

Spark 2.0.0-snapshot: IllegalArgumentException: requirement failed: chunks must be non-empty

2016-05-13 Thread Raghava Mutharaju
(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) On Fri, May 13, 2016 at 6:33 AM, Raghava Mutharaju < m.vijayaragh...@gmail.com> wrote: > Thank you for the response. > > I use

Re: sbt for Spark build with Scala 2.11

2016-05-13 Thread Raghava Mutharaju
= "org.apache.spark" % "spark-sql_2.11" % "2.0.0-SNAPSHOT" lazy val root = (project in file(".")). settings( name := "sparkel", version := "0.1.0", scalaVersion := "2.11.8", libraryDependencies += spark, library

sbt for Spark build with Scala 2.11

2016-05-12 Thread Raghava Mutharaju
of spark version gives the sbt error unresolved dependency: org.apache.spark#spark-core_2.11;2.0.0-SNAPSHOT. I guess this is because the repository doesn't contain 2.0.0-SNAPSHOT. Does this mean the only option is to put all the required jars in the lib folder (unmanaged dependencies)? Regards, Raghava.

Re: partitioner aware subtract

2016-05-10 Thread Raghava Mutharaju
t that both RDDs are already hash partitioned. Regards, Raghava. On Tue, May 10, 2016 at 11:44 AM, Rishi Mishra <rmis...@snappydata.io> wrote: > As you have the same partitioner and number of partitions, probably you can use > zipPartitions and provide a user-defined function to subtract. >
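
A minimal sketch of the zipPartitions idea suggested here, assuming both pair RDDs already share the same HashPartitioner and number of partitions (function and variable names are illustrative):

    import org.apache.spark.rdd.RDD

    // Keep the pairs of `left` that do not occur in `right`. Identical partitioning
    // guarantees that matching pairs sit in the same partition index on both sides.
    def partitionerAwareSubtract(left: RDD[(Int, Int)],
                                 right: RDD[(Int, Int)]): RDD[(Int, Int)] =
      left.zipPartitions(right, preservesPartitioning = true) { (l, r) =>
        val seen = r.toSet                   // right-side pairs of this partition only
        l.filterNot(pair => seen(pair))
      }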

Re: partitioner aware subtract

2016-05-09 Thread Raghava Mutharaju
))) (3,(16,Some(30))) (3,(16,Some(16))) case (x, (y, z)) => Apart from allowing z == None and filtering on y == z, we also should filter out (3, (16, Some(30))). How can we do that efficiently without resorting to broadcast of any elements of rdd2? Regards, Raghava. On Mon, May 9, 2016 at 6
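
One way to avoid both the duplicate rows from leftOuterJoin and a broadcast is cogroup, which co-locates all values of a key before comparing them; a hedged sketch, not code from the thread:

    import org.apache.spark.rdd.RDD

    // Subtract the (key, value) pairs of rdd2 from rdd1. Since cogroup groups every
    // value of a key on each side, (3,16) is dropped even though rdd2 also holds (3,30).
    def subtractByPair(rdd1: RDD[(Int, Int)], rdd2: RDD[(Int, Int)]): RDD[(Int, Int)] =
      rdd1.cogroup(rdd2).flatMap { case (k, (leftVals, rightVals)) =>
        val seen = rightVals.toSet
        leftVals.filter(v => !seen(v)).map(v => (k, v))
      }

When the two RDDs already share a partitioner, cogroup reuses it, so no extra shuffle of the larger side is needed.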

partitioner aware subtract

2016-05-08 Thread Raghava Mutharaju
. Regards, Raghava.

Re: executor delay in Spark

2016-04-28 Thread Raghava Mutharaju
use Spark 1.6.0. We noticed the following: 1) persisting an RDD seems to lead to an unbalanced distribution of partitions across the executors. 2) If one RDD has an all-nothing skew then the rest of the RDDs that depend on it also get an all-nothing skew. Regards, Raghava. On Wed, Apr 27, 2016 at 10:20 AM

Re: executor delay in Spark

2016-04-24 Thread Raghava Mutharaju
s even (happens when count is moved). Any pointers in figuring out this issue are much appreciated. Regards, Raghava. On Fri, Apr 22, 2016 at 7:40 PM, Mike Hynes <91m...@gmail.com> wrote: > Glad to hear that the problem was solvable! I have not seen delays of this > type for la

Re: executor delay in Spark

2016-04-22 Thread Raghava Mutharaju
Thank you. For now we plan to use spark-shell to submit jobs. Regards, Raghava. On Fri, Apr 22, 2016 at 7:40 PM, Mike Hynes <91m...@gmail.com> wrote: > Glad to hear that the problem was solvable! I have not seen delays of this > type for later stages in jobs run by spark-subm

executor delay in Spark

2016-04-22 Thread Raghava Mutharaju
stage also. Apart from introducing a dummy stage or running it from spark-shell, is there any other option to fix this? Regards, Raghava. On Mon, Apr 18, 2016 at 12:17 AM, Mike Hynes <91m...@gmail.com> wrote: > When submitting a job with spark-submit, I've observed delays (up to >
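
For the record, the symptom usually comes from the first stage being scheduled before all executors have registered, so early partitions pile onto whichever executors are already up. Besides the dummy stage, the scheduler's registration-wait settings target the same problem; a hedged sketch with illustrative values:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("executor-delay-workaround")
      // Hold off task scheduling until all requested resources have registered,
      // or until the waiting time below has elapsed.
      .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
      .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s")
    val sc = new SparkContext(conf)

    // Dummy-stage alternative: run a tiny job first so executors are registered
    // before the real RDDs are partitioned and persisted.
    sc.parallelize(1 to 1000, 100).count()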

Re: strange HashPartitioner behavior in Spark

2016-04-18 Thread Raghava Mutharaju
No. We specify it as a configuration option to spark-submit. Does that make a difference? Regards, Raghava. On Mon, Apr 18, 2016 at 9:56 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote: > Are you specifying your spark master in the scala program? > > Best Regards, > Son

Re: strange HashPartitioner behavior in Spark

2016-04-18 Thread Raghava Mutharaju
be that all the data is on one node and nothing on the other and no, the keys are not the same. They vary from 1 to around 55000 (integers). What makes this strange is that it seems to work fine on the spark shell (REPL). Regards, Raghava. On Mon, Apr 18, 2016 at 1:14 AM, Mike Hynes <91m...@gmail.
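
HashPartitioner places a key by key.hashCode modulo the number of partitions, and an Int's hashCode is the value itself, so keys from 1 to around 55000 should spread fairly evenly. A small diagnostic sketch (names are illustrative) for checking where records actually land:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // Returns (partition index, record count) for every partition.
    def partitionSizes(pairs: RDD[(Int, Int)], numPartitions: Int): Array[(Int, Int)] =
      pairs.partitionBy(new HashPartitioner(numPartitions))
           .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)), preservesPartitioning = true)
           .collect()

    // e.g. partitionSizes(pairs, 8).foreach(println)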

Re: strange HashPartitioner behavior in Spark

2016-04-17 Thread Raghava Mutharaju
size (which is more than adequate now). This behavior is different between spark-shell and the Spark Scala program. We are not using YARN; it's the standalone version of Spark. Regards, Raghava. On Mon, Apr 18, 2016 at 12:09 AM, Anuj Kumar <anujs...@gmail.com> wrote: > Few params like- spark.

Re: strange HashPartitioner behavior in Spark

2016-04-17 Thread Raghava Mutharaju
tainedJobs and retainedStages have been increased to check them in the UI. What information regarding the Spark Context would be of interest here? Regards, Raghava. On Sun, Apr 17, 2016 at 10:54 PM, Anuj Kumar <anujs...@gmail.com> wrote: > If the data file is same then it should have s

strange HashPartitioner behavior in Spark

2016-04-17 Thread Raghava Mutharaju
s, but this behavior does not change. This seems strange. Is there some problem with the way we use HashPartitioner? Thanks in advance. Regards, Raghava.

DataFrames - Kryo registration issue

2016-03-10 Thread Raghava Mutharaju
la? Does this point to some other issue? In some other posts, I noticed use of kryo.register(). In this case, how do we pass the kryo object to SparkContext? Thanks in advance. Regards, Raghava.
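
On the kryo.register() question: the Kryo instance is never handed to SparkContext; registration goes either through SparkConf.registerKryoClasses or through a custom KryoRegistrator named in spark.kryo.registrator. A hedged sketch (MyRecord, MyRegistrator and the app name are illustrative):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.serializer.KryoRegistrator

    case class MyRecord(id: Int, name: String)      // stand-in for the application's classes

    // kryo.register() calls live inside a KryoRegistrator; Spark instantiates it
    // from the class name given in spark.kryo.registrator.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[MyRecord])
      }
    }

    object KryoSetup {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kryo-registration")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          // Either register classes directly on the conf ...
          .registerKryoClasses(Array(classOf[MyRecord]))
          // ... or point Spark at the registrator above by its fully qualified name.
          .set("spark.kryo.registrator", "MyRegistrator")
        val sc = new SparkContext(conf)
        sc.stop()
      }
    }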

Dataset takes more memory compared to RDD

2016-02-12 Thread Raghava Mutharaju
(org.apache.spark.sql.types.StructField[].class); I tried registering using conf.registerKryoClasses(Array(classOf[StructField[]])), but StructField[] does not exist. Is there any other way to register it? I already registered StructField. Regards, Raghava.
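
StructField[] is Java array syntax; in Scala the array class literal is written classOf[Array[StructField]]. A hedged sketch of the registration call being described:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.types.StructField

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(
        classOf[StructField],          // the element class, already registered
        classOf[Array[StructField]]    // Scala spelling of StructField[].class
      ))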

Re: Dataset joinWith condition

2016-02-10 Thread Raghava Mutharaju
Thanks a lot Ted. If the two columns are of different types, say Int and Long, then will it be ds.select(expr("_2 / _1").as[(Int, Long)])? Regards, Raghava. On Wed, Feb 10, 2016 at 5:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > bq. I followed something similar $"

Dataset joinWith condition

2016-02-09 Thread Raghava Mutharaju
uot;x") == B.toDF().col("y")) Is there a way to avoid using toDF()? I am having similar issues with the usage of filter(A.x == B.y) -- Regards, Raghava

Re: Dataset joinWith condition

2016-02-09 Thread Raghava Mutharaju
Ted, thank you for the pointer. That works, but what does a string prepended with a $ sign mean? Is it an expression? Could you also help me with the select() parameter syntax? I followed something similar $"a.x" and it gives an error message that a TypedColumn is expected. Regard
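
On the two questions above: $"..." is not plain string interpolation; it is the StringToColumn implicit from sqlContext.implicits._, which turns $"name" into a ColumnName (a Column). Dataset.select expects a TypedColumn, which a Column becomes via .as[T] with an encoder. A hedged sketch, assuming an existing SparkContext `sc`:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.expr

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._           // brings in $"..." and the encoders

    val ds = Seq((1, 10L), (2, 40L)).toDS()            // Dataset[(Int, Long)]

    // $"_1" is a Column; .as[Int] turns it into a TypedColumn, which select() accepts.
    val firsts = ds.select($"_1".as[Int])               // Dataset[Int]
    val ratios = ds.select(expr("_2 / _1").as[Double])  // SQL division yields a double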

DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava
/stages? Thanks in advance. Raghava. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-visualization-no-visualization-information-available-with-history-server-tp26117.html Sent from the Apache Spark User List mailing list archive at Nabble.com
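
For anyone hitting the same gap: the history server rebuilds its UI from event logs, so the first thing to verify is that the application ran with event logging enabled and pointed at the directory the history server reads. A hedged sketch (the directory is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("history-server-logging")
      .set("spark.eventLog.enabled", "true")
      // Should match spark.history.fs.logDirectory on the history server.
      .set("spark.eventLog.dir", "hdfs:///spark-event-logs")
    val sc = new SparkContext(conf)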

understanding iterative algorithms in Spark

2016-01-25 Thread Raghava
Hello All, I am new to Spark and I am trying to understand how iterative application of operations is handled in Spark. Consider the following program in Scala: var u = sc.textFile(args(0)+"s1.txt").map(line => { line.split("\\|") match { case Array(x,y) => (y.toInt,x.toInt)}})
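
The central point with a loop that keeps reassigning an RDD variable like `u` is that each iteration extends the lineage, so iterative jobs usually persist the new RDD every round and periodically checkpoint it to cut the lineage. A hedged sketch; `step` stands in for whatever one iteration computes:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    def step(rdd: RDD[(Int, Int)]): RDD[(Int, Int)] = ???   // one iteration's work (placeholder)

    // Requires sc.setCheckpointDir(...) to have been called for checkpoint() to work.
    def iterate(initial: RDD[(Int, Int)], iterations: Int): RDD[(Int, Int)] = {
      var current = initial
      for (i <- 1 to iterations) {
        val next = step(current).persist(StorageLevel.MEMORY_AND_DISK)
        if (i % 10 == 0) next.checkpoint()   // cut the lineage every few rounds
        next.count()                         // materialize before dropping the old RDD
        current.unpersist()
        current = next
      }
      current
    }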