How is order ensured in the JDBC relation provider when inserting data from multiple executors?

2016-11-21 Thread Niranda Perera
core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L277 -- Niranda Perera @n1r44 <https://twitter.com/N1R44> +94 71 554 8430 https://www.linkedin.com/in/niranda https://pythagoreanscript.wordpress.com/

SQL Syntax for pivots

2016-11-16 Thread Niranda Perera
columns=['C'], aggfunc=np.sum); reshape2 (R) - dcast(df, A + B ~ C, sum); Oracle 11g - SELECT * FROM df PIVOT (sum(D) FOR C IN ('small', 'large')) p. Best [1] http://www.slideshare.net/SparkSummit/pivoting-data-with-sparksql-by-andrew-ray
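For comparison, the aggregation a PIVOT performs can be sketched in plain Python (a hypothetical `pivot_sum` helper, not a Spark or pandas API):

```python
from collections import defaultdict

def pivot_sum(rows, index_keys, pivot_key, value_key):
    """Group rows by index_keys, turn distinct pivot_key values into
    columns, and sum value_key per cell -- the shape PIVOT produces."""
    table = defaultdict(lambda: defaultdict(int))
    columns = set()
    for row in rows:
        group = tuple(row[k] for k in index_keys)
        col = row[pivot_key]
        columns.add(col)
        table[group][col] += row[value_key]
    return {g: dict(cells) for g, cells in table.items()}, sorted(columns)

rows = [
    {"A": "foo", "C": "small", "D": 1},
    {"A": "foo", "C": "large", "D": 2},
    {"A": "foo", "C": "small", "D": 3},
    {"A": "bar", "C": "large", "D": 4},
]
result, cols = pivot_sum(rows, ["A"], "C", "D")
# result[("foo",)] == {"small": 4, "large": 2}
```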

Executors go OOM when using the JDBC relation provider

2016-08-16 Thread Niranda Perera
here. This OOM exception is actually a blocker! Is there any other tuning I should do? And it certainly worries me to see that MySQL gave a significantly faster result than Spark here! Look forward to hearing from you! Best

Why is the spark-yarn module excluded from the spark parent pom?

2016-07-12 Thread Niranda Perera
Hi guys, I could not find the spark-yarn module in the spark parent pom. Is there any particular reason why it has been excluded? Best

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
; > On Wed, Jul 6, 2016 at 11:12 PM, Niranda Perera > wrote: > >> Thanks Reynold >> >> On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin wrote: >> >>> Yes definitely. >>> >>> >>> On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera < >

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
Thanks Reynold On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin wrote: > Yes definitely. > > > On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera > wrote: > >> Thanks Reynold for the prompt response. Do you think we could use a >> 1.4-branch latest build in a producti

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
ch-1.4. You can build from the > branch yourself, but it might be better to upgrade to the later versions. > > On Wed, Jul 6, 2016 at 11:02 PM, Niranda Perera > wrote: > >> Hi guys, >> >> May I know if you have halted development in the Spark 1.4 branch? I see >&

Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
Hi guys, May I know if you have halted development in the Spark 1.4 branch? I see that there is a release tag for 1.4.2 but it was never released. Can we expect a 1.4.x bug-fix release anytime soon? Best

Re: Possible deadlock in registering applications in the recovery mode

2016-04-21 Thread Niranda Perera
Hi guys, any update on this? Best On Wed, Apr 20, 2016 at 3:00 AM, Niranda Perera wrote: > Hi Reynold, > > I have created a JIRA for this [1]. I have also created a PR for the same > issue [2]. > > Would be very grateful if you could look into this, because this is a >

Re: Possible deadlock in registering applications in the recovery mode

2016-04-19 Thread Niranda Perera
/browse/SPARK-14736 [2] https://github.com/apache/spark/pull/12506 On Mon, Apr 18, 2016 at 9:02 AM, Reynold Xin wrote: > I haven't looked closely at this, but I think your proposal makes sense. > > > On Sun, Apr 17, 2016 at 6:40 PM, Niranda Perera > wrote: > >> Hi g

Re: Possible deadlock in registering applications in the recovery mode

2016-04-17 Thread Niranda Perera
Hi guys, Any update on this? Best On Tue, Apr 12, 2016 at 12:46 PM, Niranda Perera wrote: > Hi all, > > I have encountered a small issue in the standalone recovery mode. > > Let's say there was an application A running in the cluster. Due to some > issue, the entire clu

Possible deadlock in registering applications in the recovery mode

2016-04-12 Thread Niranda Perera
Hi all, I have encountered a small issue in the standalone recovery mode. Let's say there was an application A running in the cluster. Due to some issue, the entire cluster, together with the application A, goes down. Then later on, the cluster comes back online, and the master then goes into the 're

Control the stdout and stderr streams in an executor JVM

2016-02-28 Thread Niranda Perera
Hi all, Is there any possibility to control the stdout and stderr streams in an executor JVM? I understand that there are some configurations provided in the spark conf, as follows: spark.executor.logs.rolling.maxRetainedFiles spark.executor.logs.rolling.maxSize spark.executor.logs.rolling.strate
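For reference, those rolling-log properties would be set like this in spark-defaults.conf (values illustrative; note they rotate the executor's stdout/stderr files rather than redirecting the streams themselves):

```
spark.executor.logs.rolling.strategy          size
spark.executor.logs.rolling.maxSize           134217728
spark.executor.logs.rolling.maxRetainedFiles  5
```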

Re: spark job scheduling

2016-01-27 Thread Niranda Perera
k cluster architecture design decision. > > Best, > > Chayapan (A) > > On Thu, Jan 28, 2016 at 10:07 AM, Niranda Perera > wrote: > >> hi all, >> >> I have a few questions on spark job scheduling. >> >> 1. As I understand, the smallest unit of w

spark job scheduling

2016-01-27 Thread Niranda Perera
hi all, I have a few questions on spark job scheduling. 1. As I understand, a task is the smallest unit of work an executor can perform. In the 'fair' scheduler mode, let's say a job is submitted to the spark ctx which has a considerable amount of work to do in a task. While such a 'big' task is running,

releasing Spark 1.4.2

2015-11-15 Thread Niranda Perera
Hi, I am wondering when Spark 1.4.2 will be released? Is it in the voting stage at the moment? rgds

taking the heap dump when an executor goes OOM

2015-10-11 Thread Niranda Perera
Hi all, is there a way for me to get the heap-dump hprof of an executor JVM when it goes out of memory? Is this currently supported, or do I have to change some configurations? cheers
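One common approach here is not Spark-specific at all: the standard HotSpot heap-dump flags can be passed through the executor's extra Java options (the dump path below is illustrative):

```
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps
```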

passing an AbstractFunction1 to sparkContext().runJob instead of a Closure

2015-10-09 Thread Niranda Perera
hi all, I want to run a job in the spark context and, since I am running the system in the Java environment, I cannot use a closure in sparkContext().runJob. Instead, I am passing an AbstractFunction1 extension. While the jobs run without an issue, I constantly get the following WARN me

adding jars to the classpath with the relative path to spark home

2015-09-08 Thread Niranda Perera
Hi, is it possible to add jars to the spark executor/driver classpath with the relative path of the jar (relative to the spark home)? I need to set the following settings in the spark conf - spark.driver.extraClassPath - spark.executor.extraClassPath the reason why I need to use the relative pa
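The two properties in question look like this in spark-defaults.conf (paths illustrative). One hedged caveat: these entries appear to be handed to the JVM classpath as-is, so a relative path would most likely resolve against the process working directory rather than SPARK_HOME:

```
spark.driver.extraClassPath    /opt/spark/extras/mylib.jar
spark.executor.extraClassPath  /opt/spark/extras/mylib.jar
```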

Spark SQL sort by and order by in multiple partitions

2015-09-02 Thread Niranda Perera
Hi all, I have been using sort by and order by in spark sql and I observed the following: when using SORT BY and collecting results, the results get sorted partition by partition. Example: if we have 1, 2, ..., 12 in 4 partitions and I want to sort in descending order, partition 0 (p0) w
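The partition-wise behaviour described above can be simulated with plain Python lists standing in for partitions (an illustrative sketch, not Spark API):

```python
def sort_within_partitions(partitions):
    # SORT BY: each partition is sorted independently
    return [sorted(p, reverse=True) for p in partitions]

def total_order(partitions):
    # ORDER BY: a single global sort across all partitions
    return sorted((x for p in partitions for x in p), reverse=True)

partitions = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
collected = [x for p in sort_within_partitions(partitions) for x in p]
# sorted within each partition only: [9, 5, 1, 10, 6, 2, 11, 7, 3, 12, 8, 4]
```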

Re: taking n rows from an RDD starting from an index

2015-09-02 Thread Niranda Perera
Iterator is what you want. But it will keep one >> partition's data in-memory. >> >> On Wed, Sep 2, 2015 at 10:05 AM, Niranda Perera > > wrote: >> >>> Hi all, >>> >>> I have a large set of data which would not fit into the memory. So, I &

taking n rows from an RDD starting from an index

2015-09-01 Thread Niranda Perera
Hi all, I have a large set of data which would not fit into the memory. So, I want to take n rows from the RDD given a particular index. For example, take 1000 rows starting from the index 1001. I see that there is a take(num: Int): Array[T] method in the RDD, but it only returns the
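The usual suggestion for this is to pair each row with its index and filter the wanted window (zipWithIndex plus a filter, on an RDD). The idea, sketched over a plain Python iterator so nothing is materialised up front:

```python
from itertools import islice

def take_range(rows, start, count):
    """Return `count` rows starting at index `start`, consuming the
    iterator lazily instead of loading everything into memory."""
    return list(islice(rows, start, start + count))

data = iter(range(5000))          # stand-in for a large row stream
chunk = take_range(data, 1001, 1000)
# chunk starts at 1001 and has 1000 elements
```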

dynamically update the master list of a worker or a spark context

2015-07-27 Thread Niranda Perera
Hi all, I have been developing a custom recovery implementation for spark masters and workers using Hazelcast clustering. In the Spark worker code [1], we see that a list of masters needs to be provided at worker start-up, in order to achieve high availability. This effectively means that one

databases currently supported by Spark SQL JDBC

2015-07-09 Thread Niranda Perera
Hi, I'm planning to use the Spark SQL JDBC datasource provider with various RDBMS databases. What are the databases currently supported by the Spark JDBC relation provider? rgds

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-07-05 Thread Niranda Perera
Hi, Sorry, this was a class-loading issue on my side. Sorted it out. Sorry if I caused any inconvenience. Rgds Niranda Perera On Jul 5, 2015 17:08, "Niranda Perera" wrote: > Hi Josh, > > I tried using the spark 1.4.0 upgrade. > > here is the class I

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-07-05 Thread Niranda Perera
e the reason for this? rgds On Thu, Jun 25, 2015 at 11:42 AM, Niranda Perera wrote: > thanks Josh. > > this looks very similar to my problem. > > On Thu, Jun 25, 2015 at 11:32 AM, Josh Rosen wrote: > >> This sounds like https://issues.apache.org/jira/browse/SPARK-7436, whi

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-06-24 Thread Niranda Perera
thanks Josh. this looks very similar to my problem. On Thu, Jun 25, 2015 at 11:32 AM, Josh Rosen wrote: > This sounds like https://issues.apache.org/jira/browse/SPARK-7436, which > has been fixed in Spark 1.4+ and in branch-1.3 (for Spark 1.3.2). > > On Wed, Jun 24, 2015 at 10:57

Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-06-24 Thread Niranda Perera
Hi all, I'm trying to implement a custom StandaloneRecoveryModeFactory in the Java environment. Pls find the implementation here [1]. I'm new to Scala, hence I'm trying to use the Java environment as much as possible. When I start a master with the spark.deploy.recoveryMode.factory property set to "CUST

custom REST port from spark-defaults.conf

2015-06-22 Thread Niranda Perera
Hi, is there a configuration setting to set a custom port number for the master REST URL? Can that be included in spark-defaults.conf? cheers
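If memory serves (worth verifying against the configuration docs for your Spark version), the standalone master's REST submission port is controlled by a single property, which can live in spark-defaults.conf (6066 being its usual default):

```
spark.master.rest.port  6066
```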

Re: Tentative due dates for Spark 1.3.2 release

2015-05-17 Thread Niranda Perera
Hi Reynold, sorry, my mistake. Can do that. thanks On Mon, May 18, 2015 at 9:51 AM, Reynold Xin wrote: > You can just look at this branch, can't you? > https://github.com/apache/spark/tree/branch-1.3 > > > On Sun, May 17, 2015 at 9:20 PM, Niranda Perera > wrote:

Re: Tentative due dates for Spark 1.3.2 release

2015-05-17 Thread Niranda Perera
his case branch-1.3). They never > introduce new API's. If you have a particular bug fix you are waiting > for, you can always build Spark off of that branch. > > - Patrick > > On Fri, May 15, 2015 at 12:46 AM, Niranda Perera > wrote: > > Hi, > > &g

Tentative due dates for Spark 1.3.2 release

2015-05-15 Thread Niranda Perera
Hi, May I know the tentative release dates for spark 1.3.2? rgds

Re: Custom PersistenceEngine and LeaderAgent implementation in Java

2015-05-01 Thread Niranda Perera
Wed, Apr 29, 2015 at 11:02 PM, Niranda Perera > wrote: > >> Hi, >> >> this follows the following feature in this feature [1] >> >> I'm trying to implement a custom persistence engine and a leader agent in >> the Java environment. >> >> v

Custom PersistenceEngine and LeaderAgent implementation in Java

2015-04-29 Thread Niranda Perera
Hi, this follows the feature in [1]. I'm trying to implement a custom persistence engine and a leader agent in the Java environment. Vis-a-vis Scala, when I implement the PersistenceEngine trait in Java, I would have to implement methods such as readPersistedData, removeDri

org.spark-project.jetty and guava repo locations

2015-04-02 Thread Niranda Perera
Hi, I am looking for the org.spark-project.jetty and org.spark-project.guava repo locations, but I'm unable to find them in the Maven repository. Are these publicly available? rgds

Migrating from 1.2.1 to 1.3.0 - org.apache.spark.sql.api.java.Row

2015-03-31 Thread Niranda Perera
Hi, previously in 1.2.1, the result row from a Spark SQL query was an org.apache.spark.sql.api.java.Row. In 1.3.0 I do not see a sql.api.java package. So does it mean that even the SQL query result row is an implementation of org.apache.spark.sql.Row, such as GenericRow etc.?

What is the meaning of 'STATE' in a worker/an executor?

2015-03-29 Thread Niranda Perera
Hi, I have noticed in the Spark UI that workers and executors run in several states: ALIVE, LOADING, RUNNING, DEAD, etc. What exactly do these states mean, and what effect do they have on working with those executors? E.g., whether an executor cannot be used in the LOADING state, etc. cheers

Connecting a worker to the master after a spark context is made

2015-03-20 Thread Niranda Perera
Hi, Please consider the following scenario. I've started the spark master by invoking the org.apache.spark.deploy.master.Master.startSystemAndActor method in Java code and connected a worker to it using the org.apache.spark.deploy.worker.Worker.startSystemAndActor method. And then I have succes

Re: Fixed worker ports in the spark worker

2015-03-18 Thread Niranda Perera
015 at 11:10 AM, Niranda Perera > wrote: > >> Hi all, >> >> I see that spark server opens up random ports, especially in the workers. >> >> is there any way to fix these ports or give an set of ports for the worker >> to choose from? >> >> cheers &

Fixed worker ports in the spark worker

2015-03-17 Thread Niranda Perera
Hi all, I see that the spark server opens up random ports, especially in the workers. Is there any way to fix these ports or give a set of ports for the worker to choose from? cheers
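A hedged sketch of the usual port-pinning knobs (exact property names are worth double-checking for your Spark version): the worker's own listening port is set via spark-env.sh, while data-plane ports such as the block manager's can be fixed in spark-defaults.conf, with spark.port.maxRetries allowing a small range to be probed when a port is taken:

```
# spark-env.sh
SPARK_WORKER_PORT=7078

# spark-defaults.conf
spark.blockManager.port  31000
spark.port.maxRetries    16
```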

Deploying master and worker programmatically in Java

2015-03-03 Thread Niranda Perera
Hi, I want to start a Spark standalone cluster programmatically in Java. I have been checking these classes, - org.apache.spark.deploy.master.Master - org.apache.spark.deploy.worker.Worker I successfully started a master with this simple main class. public static void main(String[] args) {

Re: OSGi bundles for the spark project

2015-02-20 Thread Niranda Perera
> not generally embeddable. Packaging is generally 'out of scope' for > the core project beyond the standard Maven and assembly releases. > > On Fri, Feb 20, 2015 at 8:33 AM, Niranda Perera > wrote: > > Hi, > > > > I am interested in a Spark OSGI bundle. &

OSGi bundles for the spark project

2015-02-20 Thread Niranda Perera
Hi, I am interested in a Spark OSGi bundle. While checking the Maven repository, I found that one has still not been published. Can we see an OSGi bundle being released soon? Is it on the Spark project roadmap? Rgds

Re: Replacing Jetty with Tomcat

2015-02-19 Thread Niranda Perera
ver configurable. > Mostly because there's no real problem in running an HTTP service > internally based on Netty while you run your own HTTP service based on > something else like Tomcat. What's the problem? > > On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera > wrote: > >

Re: Replacing Jetty with Tomcat

2015-02-17 Thread Niranda Perera
ou won't be able to switch it out > without rewriting a fair bit of code, no, but you don't need to. > > On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera > wrote: > > Hi, > > > > We are thinking of integrating Spark server inside a product. Our current > >

Re: Replacing Jetty with Tomcat

2015-02-15 Thread Niranda Perera
ded mode of Jetty, rather than using > servlets. > > Even if it is possible, you probably wouldn't want to embed Spark in your > application server ... > > > On Sun, Feb 15, 2015 at 9:08 PM, Niranda Perera > wrote: > >> Hi, >> >> We are thinking of i

Replacing Jetty with Tomcat

2015-02-15 Thread Niranda Perera
Hi, We are thinking of integrating the Spark server inside a product. Our current product uses Tomcat as its webserver. Is it possible to switch the Jetty webserver in Spark to Tomcat off-the-shelf? Cheers

create a SchemaRDD from a custom datasource

2015-01-13 Thread Niranda Perera
Hi, We have a custom datasources API, which connects to various data sources and exposes them out as a common API. We are now trying to implement the Spark datasources API released in 1.2.0 to connect Spark for analytics. Looking at the sources API, we figured out that we should extend a scan cla

Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
SuchMethodError: com.google.common.hash.HashFunction.hashInt" error occurs, which is understandable because hashInt is not available before Guava 12. So, I'm wondering why this occurs? Cheers

Re: Can the Scala classes in the spark source code be inherited in Java classes?

2014-12-02 Thread Niranda Perera
are compiled down to classes in bytecode. Take > a look at this: https://twitter.github.io/scala_school/java.html > > Note that questions like this are not exactly what this dev list is meant > for ... > > On Mon, Dec 1, 2014 at 9:22 PM, Niranda Perera wrote: > >> Hi, >&

Can the Scala classes in the spark source code be inherited in Java classes?

2014-12-01 Thread Niranda Perera
Hi, Can the Scala classes in the Spark source code be inherited (and other OOP concepts applied) in Java classes? I want to customize some part of the code, but I would like to do it in a Java environment. Rgds

Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
nd there is an example library for reading Avro data > <https://github.com/databricks/spark-avro>. > > On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera wrote: > >> Hi, >> >> I am evaluating Spark for an analytic component where we do batch >> processing o

Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
[1] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Getting the execution times of spark job

2014-09-02 Thread Niranda Perera

Storage Handlers in Spark SQL

2014-08-21 Thread Niranda Perera
ndler+for+Hive https://docs.wso2.com/display/BAM241/Creating+Hive+Queries+to+Analyze+Data#CreatingHiveQueriestoAnalyzeData-cas I would like to know whether Spark SQL can work with these storage handlers (while using HiveContext maybe)? Best regards