Spark mllib k-means taking too much time

2016-03-03 Thread Priya Ch
Hi Team, I am running the k-means algorithm on the KDD 1999 data set ( http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). I am running the algorithm for different values of k: 5, 10, 15, 40. The data set is 709 MB. I have placed the file in HDFS with a block size of 128 MB (6 blocks).
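A minimal sketch of that run, assuming the KDD records have already been reduced to their numeric columns (the HDFS path and the parsing that drops the symbolic fields are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KddKMeans {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("kdd-kmeans"))
        // Placeholder parsing: keep only the fields that parse as numbers.
        val data = sc.textFile("hdfs:///data/kddcup.data")
          .map(_.split(',').flatMap(f => scala.util.Try(f.toDouble).toOption))
          .map(arr => Vectors.dense(arr))
          .cache()                                 // reused for every value of k
        for (k <- Seq(5, 10, 15, 40)) {
          val model = KMeans.train(data, k, 20)    // 20 iterations
          println(s"k=$k cost=${model.computeCost(data)}")
        }
      }
    }

Caching the parsed vectors matters here, since otherwise every value of k re-reads and re-parses the same input.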

BlockFetchFailed Exception

2016-03-11 Thread Priya Ch
Hi All, I am trying to run Spark k-means on a data set which is close to 1 GB. Most often I see a BlockFetchFailed exception, which I suspect is due to running out of memory. Here are the configuration details: total cores: 12, total workers: 3, memory per node: 6 GB. When running the job, I am giving th

Re: Why KMeans with mllib is so slow ?

2016-03-14 Thread Priya Ch
Hi Xi Shen, Changing the initialization step from "k-means||" to "random" decreased the execution time from 2 hrs to 6 min. However, by default the number of runs is 1. If I try to set the number of runs to 10, then I again see an increase in job execution time. How should I proceed on this? By the way how
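For reference, a sketch of that configuration in the 1.x MLlib builder API (the data RDD is assumed to be a cached RDD[Vector]; setRuns was still available in that release line):

    import org.apache.spark.mllib.clustering.KMeans

    val model = new KMeans()
      .setK(10)
      .setMaxIterations(20)
      .setInitializationMode("random")   // the default is "k-means||"
      .setRuns(10)                       // each extra run repeats the clustering work
      .run(data)

Multiple runs are essentially independent restarts, so setting runs to 10 costing roughly 10x the time is expected; the trade-off is a better chance of finding a low-cost clustering.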

Streaming k-means visualization

2016-04-08 Thread Priya Ch
Hi All, I am using streaming k-means to train my model on streaming data. Now I want to visualize the clusters. What reporting tool could be used for this? Could Zeppelin be used to visualize the clusters? Regards, Padma Ch

Need Streaming output to single HDFS File

2016-04-12 Thread Priya Ch
Hi All, I am working with Kafka and Spark Streaming, and I want to write the streaming output to a single file. dstream.saveAsTextFiles() is creating files in different folders. Is there a way to write to a single folder? Or, if written to different folders, how do I merge them? Thanks, Padma C
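One hedged way to get a single part file per batch (a sketch, assuming each batch fits comfortably in one partition; the base directory is a placeholder) is to coalesce inside foreachRDD:

    import org.apache.spark.streaming.dstream.DStream

    def saveBatchesUnderOneDir(dstream: DStream[String], baseDir: String): Unit = {
      dstream.foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.coalesce(1)                 // one part file per batch
             .saveAsTextFile(s"$baseDir/batch-${time.milliseconds}")
        }
      }
    }

Each batch still gets its own subdirectory under baseDir; if one physical file across all batches is needed, the usual follow-up is to merge the part files afterwards, e.g. with hadoop fs -getmerge.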

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, where A is of size 30 MB and B is of size 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark version 1.6.0. Regards, Padma Ch
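If the intent is to compare every element of A with every element of B, one common workaround (a sketch, assuming an existing SparkContext sc and that B's 7 MB fits in executor memory; matches() is a hypothetical pairwise predicate) is to broadcast the smaller side instead of calling cartesian:

    // a, b: RDD[String]
    val smallSide = sc.broadcast(b.collect())   // ship B once per executor

    val pairs = a.flatMap { x =>
      smallSide.value.filter(y => matches(x, y)).map(y => (x, y))
    }

This avoids building every partition-pair combination that a full cartesian requires.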

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
from the FinancialData table. > Not very useful. > Dr Mich Talebzadeh, LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
iting as > parquet, orc, ...? > > // maropu > > On Wed, May 25, 2016 at 7:10 PM, Priya Ch > wrote: > >> Hi, yes, I have joined using a DataFrame join. Now to save this into HDFS, >> I am converting the joined dataframe to an rdd (dataframe.rdd) and using >> saveAsTextFile, tr
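A sketch of what the reply is pointing at: write the joined DataFrame directly through the DataFrame writer instead of going through dataframe.rdd and saveAsTextFile (the output path is a placeholder):

    // joinedDF is the DataFrame produced by the join
    joinedDF.write
      .mode("overwrite")                  // or "append", depending on the job
      .parquet("hdfs:///output/joined")   // columnar and splittable, unlike plain text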

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
g cartesian. Till now the application has taken 30 mins, and on top of that I see executor-lost issues. Thanks, Padma Ch On Wed, May 25, 2016 at 4:22 PM, Jörn Franke wrote: > What is the use case of this? A Cartesian product is by definition slow > in any system. Why do you need this? How long does y

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
On Wed, May 25, 2016 at 4:49 PM, Jörn Franke wrote: > > Alternatively depending on the exact use case you may employ solr on > Hadoop for text analytics > > > On 25 May 2016, at 12:57, Priya Ch wrote: > > > > Lets say i have rdd A of strings as {"hi","

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
e matching with the > bigger dataset there. > This highly depends on the data in your data set, how they compare in size, > etc. > > On 25 May 2016, at 13:27, Priya Ch wrote: > > Why do I need to deploy Solr for text analytics... I have files placed in > HDFS. j

Spark Job Execution halts during shuffle...

2016-05-26 Thread Priya Ch
Hello Team, I am trying to join 2 RDDs, where one is of size 800 MB and the other is 190 MB. During the join step, my job halts and I don't see progress in the execution. This is the message I see on the console - INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations

Re: Spark Job Execution halts during shuffle...

2016-06-01 Thread Priya Ch
> On Thu, May 26, 2016 at 8:00 PM, Takeshi Yamamuro > wrote: > >> Hi, >> >> If you get stuck with job failures, one of the best practices is to increase the >> number of partitions. >> Also, you'd be better off using DataFrame instead of RDD in terms of join >> optimization.
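A sketch of those two suggestions, with placeholder names, partition count, and join key:

    // 1) Raise join parallelism for the RDD join.
    val joined = bigRdd.join(smallRdd, 200)

    // 2) Or join through DataFrames so the optimizer can plan it,
    //    forcing a broadcast of the smaller side if it fits in memory.
    import org.apache.spark.sql.functions.broadcast
    val joinedDF = bigDF.join(broadcast(smallDF), "key")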

Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All, I am running a Spark application with 1.8 TB of data (which is stored as Hive tables). I am reading the data using HiveContext and processing it. The cluster has 5 nodes total, 25 cores per machine and 250 GB per node. I am launching the application with 25 executors with 5 cores each

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch wrote: > Hi All, > > I am running a Spark application with 1.8 TB of data (which is stored as Hive > tables). I am reading the data using HiveContext and processing it. > The cluster ha

Send real-time alert using Spark

2016-07-12 Thread Priya Ch
Hi All, I am building a real-time anomaly detection system where I am using k-means to detect anomalies. Now, in order to send an alert to a mobile device or an email, how do I send it using Spark itself? Thanks, Padma CH
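Spark has no alerting API of its own; the usual pattern (a sketch, with placeholder SMTP host and addresses, and a hypothetical anomalyStream DStream whose elements describe the anomalies) is to call an ordinary mail or SMS client from the output operation on the driver:

    import java.util.Properties
    import javax.mail.{Message, Session, Transport}
    import javax.mail.internet.{InternetAddress, MimeMessage}

    def sendAlert(body: String): Unit = {
      val props = new Properties()
      props.put("mail.smtp.host", "smtp.example.com")   // placeholder relay
      val msg = new MimeMessage(Session.getInstance(props))
      msg.setFrom(new InternetAddress("alerts@example.com"))
      msg.setRecipient(Message.RecipientType.TO, new InternetAddress("oncall@example.com"))
      msg.setSubject("Anomaly detected")
      msg.setText(body)
      Transport.send(msg)
    }

    anomalyStream.foreachRDD { rdd =>
      val anomalies = rdd.take(20)          // collect a small sample to the driver
      if (anomalies.nonEmpty) sendAlert(anomalies.mkString("\n"))
    }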

Re: Send real-time alert using Spark

2016-07-12 Thread Priya Ch
wouldn't necessarily "use spark" to send the alert. Spark is in an > important sense one library among many. You can have your application use > any other library available for your language to send the alert. > > Marcin > > On Tue, Jul 12, 2016 at 9:25 AM, Priya Ch >

java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-17 Thread Priya Ch
Hi All, When running streaming application, I am seeing the below error: java.io.FileNotFoundException: /data1/yarn/nm/usercache/root/appcache/application_1450172646510_0004/blockmgr-a81f42cd-6b52-4704-83f3-2cfc12a11b86/02/temp_shuffle_589ddccf-d436-4d2c-9935-e5f8c137b54b (Too many open files

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-22 Thread Priya Ch
Jakob, I increased settings like fs.file-max in /etc/sysctl.conf and also increased the user limit in /etc/security/limits.conf, but I still see the same issue. On Fri, Dec 18, 2015 at 12:54 AM, Jakob Odersky wrote: > It might be a good idea to see how many files are open and try increasing > th

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-23 Thread Priya Ch
ulimit -n 65000, fs.file-max = 65000 (in the /etc/sysctl.conf file). Thanks, Padma Ch On Tue, Dec 22, 2015 at 6:47 PM, Yash Sharma wrote: > Could you share the ulimit for your setup please? > > - Thanks, via mobile, excuse brevity. > On Dec 22, 2015 6:39 PM, Priya Ch

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2015-12-28 Thread Priya Ch
s using the lsof >> command. Needs root permissions. If it is a cluster, not sure much! >> 2) Which exact line in the code is triggering this error? Can you paste >> that snippet? >> >> On Wednesday 23 December 2015, Priya Ch >> wrote: >>

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
Can someone throw light on this? Regards, Padma Ch On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch wrote: > Chris, we are using Spark version 1.3.0. We have not set the > spark.streaming.concurrentJobs > parameter. It takes the default value. > > Vijay, > > From the tac

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
iles" > exception. > > > On Tuesday, January 5, 2016 8:03 AM, Priya Ch < > learnings.chitt...@gmail.com> wrote: > > > Can some one throw light on this ? > > Regards, > Padma Ch > > On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch > wrote: > > Chris

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
f" on > one of the spark executors (perhaps run it in a for loop, writing the > output to separate files) until it fails and see which files are being > opened, if there's anything that seems to be taking up a clear majority > that might key you in on the culprit. > > O

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
which would convey the same. On Wed, Jan 6, 2016 at 8:19 PM, Annabel Melongo wrote: > Priya, > > It would be helpful if you put the entire trace log along with your code > to help determine the root cause of the error. > > Thanks > > > On Wednesday, Januar

Real time anomaly system

2016-03-01 Thread Priya Ch
Hi, I am trying to build a real-time anomaly detection system using Spark, Kafka, Cassandra and Akka. I have a network intrusion dataset (KDD 1999 cup). How can I build the system using this? I understand that I am treating a certain part of the data as historical data for my model training, and o

Spark Mllib kmeans execution

2016-03-02 Thread Priya Ch
Hi All, I am running the k-means clustering algorithm. Now, when I am running the algorithm as - val conf = new SparkConf val sc = new SparkContext(conf) . . val kmeans = new KMeans() val model = kmeans.run(RDD[Vector]) . . . The 'kmeans' object gets created on the driver. Now does *kmeans.run()* get e

CassandraSQLContext throwing NullPointer Exception

2015-09-28 Thread Priya Ch
Hi All, I am trying to use dataframes (which contain data from Cassandra) in rdd.foreach. This is throwing the following exception. Is CassandraSQLContext accessible within an executor? 15/09/28 17:22:40 ERROR JobScheduler: Error running job streaming job 144344116 ms.0 org.apache.spark.Sp
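The SQL context object lives only on the driver, so using it inside rdd.foreach (which runs on executors) tends to fail; a hedged alternative is to express the per-record lookup as a join planned on the driver. csc, batchDF, and the key/column names below are placeholders:

    // Driver side: load the Cassandra table once as a DataFrame, then join,
    // instead of issuing csc.sql(...) per record inside rdd.foreach.
    val cassandraDF = csc.sql("SELECT key, status FROM my_keyspace.my_table")
    val enriched = batchDF.join(cassandraDF, batchDF("key") === cassandraDF("key"), "left_outer")
    // continue with 'enriched' (save, aggregate, ...) from the driver-built plan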

Re: CassandraSQLContext throwing NullPointer Exception

2015-09-28 Thread Priya Ch
Sep 28, 2015 at 6:54 PM, Ted Yu wrote: > Which Spark release are you using ? > > Can you show the snippet of your code around CassandraSQLContext#sql() ? > > Thanks > > On Mon, Sep 28, 2015 at 6:21 AM, Priya Ch > wrote: > >> Hi All, >> >> I am trying to use da

Concurrency issue in Streams of data

2015-10-20 Thread Priya Ch
Hi All, When processing streams of data (with batch interval 1 sec), there is a possible concurrency issue: two messages M1 and M2 (M2 being an updated version of M1) with the same key are processed by 2 threads in parallel. To resolve this concurrency issue, I am applying a HashPartitioner on the RDD. (
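For reference, a sketch of that partitioning step, so that all messages for one key land in the same partition and can be reduced to the latest version (Msg and its version field are placeholders for the real message type):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    case class Msg(key: String, version: Long, payload: String)

    def latestPerKey(pairRdd: RDD[(String, Msg)]): RDD[(String, Msg)] =
      pairRdd
        .partitionBy(new HashPartitioner(24))   // co-locate all updates for a key
        .reduceByKey((m1, m2) => if (m2.version >= m1.version) m2 else m1)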

Inner Joins on Cassandra RDDs

2015-10-21 Thread Priya Ch
Hello All, I have two Cassandra RDDs. I am using joinWithCassandraTable, which is doing a cartesian join, because of which we are getting unwanted rows. How can I perform an inner join on Cassandra RDDs? If I use a normal join, I have to read the entire table, which is costly. Is there any speci

Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
Hi All, For my Spark Streaming code, which writes the results to Cassandra DB, I need to write unit test cases. What test frameworks are available to mock the connection to the Cassandra DB?

Re: Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
On Thu, Oct 29, 2015 at 12:27 PM, Priya Ch > wrote: > >> Hi All, >> >> For my Spark Streaming code, which writes the results to Cassandra DB, >> I need to write Unit test cases. what are the available test frameworks to >> mock the connection to Cassandra DB ? >> > >

Re: Mock Cassandra DB Connection in Unit Testing

2015-10-29 Thread Priya Ch
One more question: if I have a function which takes an RDD as a parameter, how do we mock an RDD? On Thu, Oct 29, 2015 at 5:20 PM, Priya Ch wrote: > How do we do it for Cassandra... can we use the same mocking? > EmbeddedCassandra Server is available with CassandraUnit. Can this be use
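For the RDD part specifically, RDDs are usually not mocked at all; the common pattern (a sketch, where functionUnderTest stands in for the method taking an RDD parameter) is to build a tiny real RDD from in-memory data with a local-mode context:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-test"))

    val testRdd = sc.parallelize(Seq(("ticket-1", 1), ("ticket-2", 2)))
    val result = functionUnderTest(testRdd)

    sc.stop()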

Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Priya Ch
Hello All, How can we use multiple SparkContexts when executing multiple test suites of Spark code? Can someone throw light on this?
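By default only one SparkContext per JVM is allowed, so the usual approach (a sketch using ScalaTest; names are placeholders) is to share a single local context across suites rather than creating one per suite:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite, Suite}

    trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
      @transient var sc: SparkContext = _

      override def beforeAll(): Unit = {
        super.beforeAll()
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tests"))
      }

      override def afterAll(): Unit = {
        if (sc != null) sc.stop()
        super.afterAll()
      }
    }

    class MyJobSuite extends FunSuite with SharedSparkContext {
      test("word count") {
        assert(sc.parallelize(Seq("a", "b", "a")).countByValue()("a") == 2)
      }
    }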

Re: Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Priya Ch
conf, Seconds(seconds)) > } > > Regards, > > Bryan Jeffrey > > > On Wed, Nov 4, 2015 at 9:49 AM, Ted Yu wrote: > >> Are you trying to speed up tests where each test suite uses single >> SparkContext >> ? >> >> You may want to read

Subtract on rdd2 is throwing below exception

2015-11-05 Thread Priya Ch
Hi All, I am seeing an exception when trying to subtract 2 RDDs. Let's say rdd1 has messages like - pnr, bookingId, BookingObject: 101, 1, BookingObject1 (event number is 0); 102, 1, BookingObject2 (event number is 0); 103, 2, Bo

Spark Streaming Use Cases

2015-12-02 Thread Priya Ch
Hi All, I have the following use case for Spark Streaming - there are 2 streams of data, say FlightBookings and Ticket. For each ticket, I need to associate it with the relevant Booking info. There are distinct applications for Booking and Ticket. The Booking streaming application processes the in

Inconsistent data in Cassandra

2015-12-06 Thread Priya Ch
Hi All, I have the following scenario when writing rows to Cassandra from Spark Streaming - in a 1 sec batch, I have 3 tickets with the same ticket number (primary key) but with different envelope numbers (i.e. envelope 1, envelope 2, envelope 3). I am writing these messages to Cassandra using saveToc
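If only one row per ticket number should survive a batch, a common pre-write step (a sketch; the Ticket case class, its fields, and the keyspace/table names are placeholders, and the save assumes the DataStax spark-cassandra-connector) is to reduce by the primary key first:

    import com.datastax.spark.connector._

    // tickets: RDD[Ticket], where Ticket(ticketNumber, envelopeNumber, ...) mirrors the table
    val latestPerTicket = tickets
      .keyBy(_.ticketNumber)
      .reduceByKey((a, b) => if (a.envelopeNumber >= b.envelopeNumber) a else b)
      .values

    latestPerTicket.saveToCassandra("my_keyspace", "tickets")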

Writing streaming data to cassandra creates duplicates

2015-07-26 Thread Priya Ch
Hi All, I have a problem when writing streaming data to Cassandra. Our existing product is on Oracle DB, in which, while writing data, locks are maintained such that duplicates in the DB are avoided. But as Spark has a parallel processing architecture, if more than 1 thread is trying to write the same d

Fwd: Writing streaming data to cassandra creates duplicates

2015-07-28 Thread Priya Ch
s will guard against multiple attempts to > run the task that inserts into Cassandra. > > See > http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations > > TD > > On Sun, Jul 26, 2015 at 11:19 AM, Priya Ch > wrote: > >>

Re: Writing streaming data to cassandra creates duplicates

2015-07-30 Thread Priya Ch
Hi All, Can someone share insights on this? On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch wrote: > > > Hi TD, > > Thanks for the info. I have a scenario like this. > > I am reading the data from a kafka topic. Let's say kafka has 3 partitions > for the topic. I

Fwd: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread Priya Ch
combine the messages with the same primary key. > > Hope that helps. > > Greetings, > > Juan > > > 2015-07-30 10:50 GMT+02:00 Priya Ch : > >> Hi All, >> >> Can someone throw insights on this ? >> >> On Wed, Jul 29, 2015 at 8:29 AM, Priya

Write to cassandra...each individual statement

2015-08-13 Thread Priya Ch
Hi All, I have a question about writing an rdd to Cassandra. Instead of writing the entire rdd to Cassandra, I want to write individual statements into Cassandra because there is a need to perform ETL on each message (which requires checking with the DB). How can I insert statements individually? Usin
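A sketch of doing per-message work while still reusing connections, via the connector's session handle inside foreachPartition (assumes the DataStax spark-cassandra-connector and an existing SparkContext sc; the keyspace, table, message fields, and passesEtlChecks are placeholders):

    import com.datastax.spark.connector.cql.CassandraConnector

    val connector = CassandraConnector(sc.getConf)

    messagesRdd.foreachPartition { messages =>
      connector.withSessionDo { session =>
        val stmt = session.prepare("INSERT INTO my_keyspace.events (id, payload) VALUES (?, ?)")
        messages.foreach { m =>
          // hypothetical per-message ETL / DB check before deciding to insert
          if (passesEtlChecks(session, m)) {
            session.execute(stmt.bind(m.id, m.payload))
          }
        }
      }
    }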

Re: Write to cassandra...each individual statement

2015-08-13 Thread Priya Ch
o do is *transform* the rdd before writing it, e.g. using > the .map function. > > > On Thu, Aug 13, 2015 at 11:30 AM, Priya Ch > wrote: > >> Hi All, >> >> I have a question in writing rdd to cassandra. Instead of writing entire >> rdd to cassandra, i want t

rdd count is throwing null pointer exception

2015-08-17 Thread Priya Ch
Hi All, Thank you very much for the detailed explanation. I have a scenario like this - I have an rdd of ticket records and another rdd of booking records. For each ticket record, I need to check whether any link exists in the booking table. val ticketCachedRdd = ticketRdd.cache ticketRdd.foreach{ ticke
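One RDD cannot be used inside another RDD's closure (that is what the SPARK-5063 message in the replies refers to), so the usual rewrite (a sketch; the pnr field is a placeholder for whatever links tickets to bookings) turns the per-ticket lookup into a join:

    // Instead of touching bookingRdd inside ticketRdd.foreach, key both sides and join.
    val ticketsByPnr  = ticketRdd.keyBy(_.pnr)
    val bookingsByPnr = bookingRdd.keyBy(_.pnr)

    // Tickets that have at least one matching booking record:
    val linked = ticketsByPnr.join(bookingsByPnr).values

    // Tickets with no booking link at all:
    val unlinked = ticketsByPnr.subtractByKey(bookingsByPnr).values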

Re: rdd count is throwing null pointer exception

2015-08-17 Thread Priya Ch
map transformation. For more information, see SPARK-5063. On Mon, Aug 17, 2015 at 8:13 PM, Preetam wrote: > The error could be because of the missing brackets after the word cache - > .ticketRdd.cache() > > > On Aug 17, 2015, at 7:26 AM, Priya Ch > wrote: > > > > Hi All,

Spark RDD join with CassandraRDD

2015-08-25 Thread Priya Ch
Hi All, I have the following scenario: there exists a booking table in Cassandra, which holds fields like bookingid, passengerName, contact, etc. Now in my Spark Streaming application, there is one class Booking which acts as a container and holds all the field details - class Booking

Executor lost failure

2015-09-01 Thread Priya Ch
Hi All, I have a Spark Streaming application which writes the processed results to Cassandra. In local mode, the code seems to work fine. The moment I start running in distributed mode using YARN, I see executor lost failures. I increased executor memory to occupy the entire node's memory, which is aro

Spark - launchng job for each action

2015-09-05 Thread Priya Ch
Hi All, In Spark, each action results in launching a job. Let's say my Spark app looks like this - val baseRDD =sc.parallelize(Array(1,2,3,4,5),2) val rdd1 = baseRDD.map(x => x+2) val rdd2 = rdd1.filter(x => x%2 ==0) val count = rdd2.count val firstElement = rdd2.first println("Count is"+count) println(
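count and first are separate actions and therefore separate jobs; a sketch of avoiding the recomputation between them (as the replies suggest) is to cache the RDD that both actions read:

    val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    val rdd2 = baseRDD.map(_ + 2).filter(_ % 2 == 0).cache()   // keep rdd2 after the first action

    val count = rdd2.count()          // job 1: computes and caches rdd2's partitions
    val firstElement = rdd2.first()   // job 2: reads the cached partitions instead of recomputing
    println(s"Count is $count, first element is $firstElement")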

Re: Spark - launchng job for each action

2015-09-06 Thread Priya Ch
ady computed by > rdd2.count, because it is already available. If some partitions are not > available due to GC, then only those partitions are recomputed. > > On Sun, Sep 6, 2015 at 5:11 PM, Jeff Zhang wrote: > >> If you want to reuse the data, you need to call rdd2.cache >

foreachRDD causing executor lost failure

2015-09-08 Thread Priya Ch
Hello All, I am using foreachRDD in my code as - dstream.foreachRDD { rdd => rdd.foreach { record => // look up in the cassandra table // save updated rows to the cassandra table } } This foreachRDD is causing executor lost failures. What is the behavior of this foreachRDD? Thanks, Padma Ch
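foreachRDD runs its body on the driver once per batch, while the inner rdd.foreach runs on the executors; one Cassandra round trip per record there is a common source of executor pressure. A hedged, batched alternative (assumes the DataStax spark-cassandra-connector, that the record's fields match the table's primary key, and a hypothetical mergeWithExisting function):

    import com.datastax.spark.connector._

    dstream.foreachRDD { rdd =>
      rdd.joinWithCassandraTable("my_keyspace", "my_table")        // one batched lookup per partition
         .map { case (record, existingRow) => mergeWithExisting(record, existingRow) }
         .saveToCassandra("my_keyspace", "my_table")
    }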

Spark Streaming..Exception

2015-09-12 Thread Priya Ch
Hello All, When I push messages into Kafka and read them into the streaming application, I see the following exception. I am running the application on YARN and am nowhere broadcasting the message within the application. I am simply reading the message, parsing it, populating fields in a class and then prin

Re: Spark Streaming..Exception

2015-09-14 Thread Priya Ch
; true. What is the possible solution for this? Is this a bug in Spark 1.3.0? Would changing the scheduling mode to standalone or Mesos mode work fine? Please can someone share your views on this. On Sat, Sep 12, 2015 at 11:04 PM, Priya Ch wrote: > Hello All, > > When I push messages into

SparkContext declared as object variable

2015-09-18 Thread Priya Ch
Hello All, Instead of declaring the SparkContext in main, I declared it as an object variable - object sparkDemo { val conf = new SparkConf val sc = new SparkContext(conf) def main(args:Array[String]) { val baseRdd = sc.parallelize() . . . } } But this piece of code is giving

passing SparkContext as parameter

2015-09-21 Thread Priya Ch
Hello All, How can I pass the SparkContext as a parameter to a method in an object? Passing the SparkContext is giving me a TaskNotSerializable exception. How can I achieve this? Thanks, Padma Ch
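The SparkContext exists only on the driver and is not serializable, so it must never be captured by a closure that ships to executors; passing it between driver-side methods is fine. A minimal sketch of the distinction:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object Jobs {
      // Fine: runs on the driver, sc is used only to build the DAG.
      def loadTickets(sc: SparkContext, path: String): RDD[String] =
        sc.textFile(path)

      // Not fine: referencing sc inside map/foreach drags it into the task
      // closure and triggers the "Task not serializable" error.
      def broken(sc: SparkContext, rdd: RDD[String]): RDD[Long] =
        rdd.map(line => sc.defaultParallelism.toLong * line.length)
    }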

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
ep 21, 2015 at 3:06 PM, Petr Novak wrote: > add @transient? > > On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch > wrote: > >> Hello All, >> >> How can i pass sparkContext as a parameter to a method in an object. >> Because passing sparkContext is giving me Ta

Re: passing SparkContext as parameter

2015-09-21 Thread Priya Ch
orkers ??? Regards, Padma Ch On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu wrote: > You can use broadcast variable for passing connection information. > > Cheers > > On Sep 21, 2015, at 4:27 AM, Priya Ch > wrote: > > can i use this sparkContext on executors ?? > In my applic

Re: passing SparkContext as parameter

2015-09-22 Thread Priya Ch
.map(func), which will run distributed on the spark >>> workers >>> >>> *Romi Kuntsman*, *Big Data Engineer* >>> http://www.totango.com >>> >>> On Mon, Sep 21, 2015 at 3:29 PM, Priya Ch >>> wrote: >>&

Re: passing SparkContext as parameter

2015-09-22 Thread Priya Ch
..)).joinWithCassandra().map(...).saveToCassandra() > > I'm not sure about exactly 10 messages, spark streaming focus on time not > count.. > > > On Tue, Sep 22, 2015 at 2:14 AM, Priya Ch > wrote: > >> I have scenario like this - >> >> I read dstream

Re: SparkContext declared as object variable

2015-09-24 Thread Priya Ch
ode and the spark version is 1.3.0 This has thrown java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5 Regards, Padma Ch On Thu, Sep 24, 2015 at 11:25 AM, Akhil Das wrote: > It should, i don't see any reason for it to not run in cluster mode. >

Video analytics on Spark

2016-09-09 Thread Priya Ch
Hi All, I have video surveillance data and it needs to be processed in Spark. I am going through Spark + OpenCV. How do I load .mp4 files into an RDD? Can we do this directly, or does the video need to be converted to a SequenceFile? Thanks, Padma CH
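There is no built-in .mp4 input format; a hedged starting point is sc.binaryFiles, which hands each file's bytes to the executors so an OpenCV/JavaCV decoder can take over there (the path is a placeholder and decodeAndCountFrames is hypothetical):

    // Each element is (path, PortableDataStream); one file becomes one record,
    // so this suits many small-to-medium clips rather than one huge file.
    val videos = sc.binaryFiles("hdfs:///surveillance/*.mp4")

    val frameCounts = videos.map { case (path, stream) =>
      val bytes = stream.toArray()   // raw mp4 bytes, decoded on the executor
      (path, decodeAndCountFrames(bytes))
    }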

DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Priya Ch
Hello Team, I am trying to write a Dataset as a parquet file in Append mode, partitioned by a few columns. However, since the job is time consuming, I would like to enable DirectFileOutputCommitter (i.e. bypassing the writes to a temporary folder). The Spark version I am using is 2.3.1. Can someone p
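For reference, a sketch of the write plus the setting most often used instead of a direct committer on Spark 2.x (direct output committers were removed upstream; the v2 commit algorithm only reduces, rather than eliminates, the rename work from the temporary folder). Paths and partition columns are placeholders:

    val spark = org.apache.spark.sql.SparkSession.builder()
      .appName("parquet-append")
      // Commit algorithm v2 moves task output into place at task-commit time.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    ds.write
      .mode("append")
      .partitionBy("event_date", "region")
      .parquet("hdfs:///warehouse/events")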

Custom SparkListener

2018-09-20 Thread Priya Ch
Hello All, I am trying to extend SparkListener and, after a job ends, retrieve the job name to check its success/failure status and write it to a log file. I couldn't find a way to fetch the job name in the onJobEnd method. Thanks, Padma CH
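SparkListenerJobEnd only carries the job id and result; one hedged way to get a readable name is to remember the description seen in onJobStart (populated when the application calls sc.setJobDescription) and look it up in onJobEnd:

    import scala.collection.mutable
    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

    class JobOutcomeListener extends SparkListener {
      private val names = mutable.Map[Int, String]()

      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        val desc = for {
          props <- Option(jobStart.properties)
          d     <- Option(props.getProperty("spark.job.description"))
        } yield d
        names(jobStart.jobId) = desc.getOrElse(s"job-${jobStart.jobId}")
      }

      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
        val name = names.remove(jobEnd.jobId).getOrElse(s"job-${jobEnd.jobId}")
        println(s"$name finished with result ${jobEnd.jobResult}")   // write to your log here
      }
    }

    // Registered on the driver: sc.addSparkListener(new JobOutcomeListener())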

Spark streaming with Kafka - couldn't find KafkaUtils

2015-04-04 Thread Priya Ch
Hi All, I configured a Kafka cluster on a single node, and I have a streaming application which reads data from a kafka topic using KafkaUtils. When I execute the code in local mode from the IDE, the application runs fine. But when I submit the same to the Spark cluster in standalone mode, I end up with
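A missing KafkaUtils class on submit usually means the spark-streaming-kafka artifact is on the IDE classpath but not shipped with the job; a sketch of the two usual fixes (the version numbers are illustrative for the Spark 1.x line and must match the cluster's Spark and Scala versions):

    // build.sbt: bundle the Kafka integration into the application/assembly jar
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.3.1"

    // or pull it in at submit time instead of bundling:
    //   spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.1 --class ... app.jar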

Spark build error

2014-08-06 Thread Priya Ch
Hi, I am trying to build jars using the command : mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package Execution of the above command is throwing the following error: [INFO] Spark Project Core . FAILURE [ 0.295 s] [INFO] Spark Project Bagel .

spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Priya Ch
Hi, I am using Spark 1.0.0. In my Spark code I am trying to persist an rdd to disk with rdd.persist(DISK_ONLY). But unfortunately I couldn't find the location where the rdd has been written to disk. I set SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to some other location rather than using the default /

Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
Hi Team, When I try to use the DenseMatrix of the breeze library in Spark, it throws the following error: java.lang.NoClassDefFoundError: breeze/storage/Zero. Can someone help me on this? Thanks, Padma Ch

Fwd: Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
the classpath? In Spark >> 1.0, we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze >> version you used is different from the one comes with Spark, you might >> see class not found. -Xiangrui >> >> On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch >> wrote: >&

Default spark.deploy.recoveryMode

2014-10-14 Thread Priya Ch
Hi Spark users/experts, In the Spark source code (Master.scala & Worker.scala), when registering the worker with the master, I see the usage of *persistenceEngine*. When we don't specify spark.deploy.recoveryMode explicitly, what is the default value used? This recovery mode is used to persist and re

1gb file processing...task doesn't launch on all the node...Unseen exception

2014-11-14 Thread Priya Ch
Hi All, We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02), each having 32 GB RAM, 1 TB hard disk capacity, and 8 CPU cores. We have set up HDFS, which has 2 TB capacity, with a block size of 256 MB. When we try to process a 1 GB file on Spark, we see the following exception 14/11/

Fwd: 1gb file processing...task doesn't launch on all the node...Unseen exception

2014-11-24 Thread Priya Ch
be > doing the processing. If yes, then try setting the level of parallelism and > the number of partitions while creating/transforming the RDD. > > Thanks > Best Regards > > On Fri, Nov 14, 2014 at 5:17 PM, Priya Ch <[hidden email]>

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted the Spark application from node1. In the Spark code, in one of the rdds, I am sending a message to the actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val curre

TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-05 Thread Priya Ch
Hi All, The Word2Vec and TF-IDF algorithms in Spark mllib-1.1.0 work only in local mode and not in distributed mode. A null pointer exception is thrown. Is this a bug in spark-1.1.0? *Following is the code:* def main(args:Array[String]) { val conf=new SparkConf val sc=new Sp
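For reference, the basic MLlib TF-IDF pipeline of that era looks roughly like the sketch below (the input path and whitespace tokenisation are placeholders); when it works locally but fails on the cluster, the difference is usually in how the documents RDD is built rather than in these calls:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.feature.{HashingTF, IDF}

    val sc = new SparkContext(new SparkConf().setAppName("tfidf"))
    val documents = sc.textFile("hdfs:///data/docs.txt").map(_.split(" ").toSeq)

    val tf = new HashingTF().transform(documents)
    tf.cache()                                   // IDF makes two passes over the term frequencies
    val tfidf = new IDF().fit(tf).transform(tf)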

Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Priya Ch
Please find the attached worker log. I can see a stream closed exception. On Wed, Jan 7, 2015 at 10:51 AM, Xiangrui Meng wrote: > Could you attach the executor log? That may help identify the root > cause. -Xiangrui > > On Mon, Jan 5, 2015 at 11:12 PM, Priya Ch > wr