Re: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed

2016-07-28 Thread Ascot Moss
Hi, thanks for your reply. Permissions (access) are not an issue in my case; this issue only happened when the bigger input file was used to generate the model, i.e. with smaller input(s) all worked well. It seems to me that ".save" cannot save a big file. Q1: Any idea about the siz

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
Hi Mohammed, Thanks for your response. Data is available on the worker nodes, but I am looking for something to write directly to the local fs. Seems like it is not an option. Thanks, Sivakumar Bhavanari. On Mon, Feb 1, 2016 at 5

Re: saveAsTextFile is not writing to local fs

2016-02-01 Thread Siva

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
Subject: Re: saveAsTextFile is not writing to local fs. Hi Mohammed, Thanks fo

Re: saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
Hi Mohammed, Thanks for your quick response. I'm submitting the Spark job to YARN in "yarn-client" mode on a 6-node cluster. I ran the job with DEBUG mode turned on. I see the exception below, but it occurred after the saveAsTextFile function finished. 16/01/29 20:26:57 DEBUG HttpParser: ja

RE: saveAsTextFile is not writing to local fs

2016-01-29 Thread Mohammed Guller
Is it a multi-node cluster, or are you running Spark on a single machine? You can change Spark's logging level to INFO or DEBUG to see what is going on. Mohammed Author: Big Data Analytics with Spark

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
Hi Jacin, If I were you, the first thing I would do is write a sample Java application that writes data into HDFS, and see if it works. Metadata is being created in HDFS, which means communication with the namenode is working fine, but not with the datanodes, since you don't see any data inside the file.

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Jacinto Arias
Yes, printing the result with collect or take is working. This is actually a minimal example, but also when working with real data the actions are performed, and the resulting RDDs can be printed out without problem. The data is there and the operations are correct; they just cannot be written t

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ted Yu
bq. val dist = sc.parallelize(l) Following the above, can you call, e.g., count() on dist before saving? Cheers On Fri, Oct 2, 2015 at 1:21 AM, jarias wrote: > Dear list, > > I'm experiencing a problem when trying to write any RDD to HDFS. I've > tried > with minimal examples, scala programs

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes
Hi, it looks like you are writing to a local filesystem. Could you try writing to a location visible to all nodes (master and workers), e.g. an NFS share? HTH, Tomasz On 21.05.2015 at 17:16, rroxanaioana wrote: Hello! I just started with Spark. I have an application which counts words in a fi

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-16 Thread Ilya Ganelin
All - this issue showed up when I was tearing down a spark context and creating a new one. Often, I was then unable to write to HDFS due to this error. I subsequently switched to a different implementation where, instead of tearing down and re-initializing the spark context, I'd submit a sepa

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
I am seeing this on Hadoop version 2.4.0. Thanks for your suggestions, I will try those and let you know if they help! On Sat, May 16, 2015 at 1:57 AM, Steve Loughran wrote: > What version of Hadoop are you seeing this on? > > > On 15 May 2015, at 20:03, Puneet Kapoor > wrote: > > Hey, > >

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Steve Loughran
What version of Hadoop are you seeing this on? On 15 May 2015, at 20:03, Puneet Kapoor wrote: Hey, Did you find any solution for this issue, we are seeing similar logs in our Data node logs. Appreciate any help. 2015-05-15 10:51:43,615 ERROR org.apache.

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
Hey, Did you find any solution for this issue, we are seeing similar logs in our Data node logs. Appreciate any help. 2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.112.190:46253

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
Thanks much for your help. Here's what was happening... The HDP VM was running in VirtualBox, and the host was connected to the guest VM in NAT mode. When I connected in "Bridged Adapter" mode it worked! On Tue, May 5, 2015 at 8:54 PM, ayan guha wrote: > Try to add one more data node or make

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread ayan guha
Try to add one more data node, or set the minimum replication to 0. HDFS is trying to replicate at least one more copy and is not able to find another DN to do that. On 6 May 2015 09:37, "Sudarshan Murty" wrote: > Another thing - could it be a permission problem ? > It creates all the directory structure (in
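The "minreplication" suggestion maps onto HDFS replication settings: with only one live datanode, a replication factor above 1 leaves the write pipeline forever waiting for a second DN. A hedged hdfs-site.xml fragment for that case (the property name is stock Hadoop; the value of 1 is an assumption for a one-datanode setup, not taken from this thread):

```xml
<!-- hdfs-site.xml: keep a single replica so a one-datanode
     cluster can complete the write pipeline -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```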

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
Another thing - could it be a permission problem? It creates all the directory structure (in red) /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_01_3/part-1, so I am guessing not. On Tue, May 5, 2015 at 7:27 PM, Sudarshan Murty wrote: > You are most probably right

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
You are most probably right. I assumed others may have run into this. When I try to put the files in there, it creates a directory structure with the part-0 and part-1 files, but these files are of size 0 - no content. The client error and the server logs have the error message shown - which

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread ayan guha
What happens when you try to put files into your HDFS from the local filesystem? Looks like it's an HDFS issue rather than a Spark thing. On 6 May 2015 05:04, "Sudarshan" wrote: > > I have searched all replies to this question & not found an answer. > > I am running standalone Spark 1.3.1 and Hortonworks'

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Copy should be doable, but I'm not sure how to specify a prefix for the directory while keeping the filename (i.e. part-0) fixed in the copy command. > On Apr 16, 2015, at 1:51 PM, Sean Owen wrote: > > Just copy the files? it shouldn't matter that much where they are as > you can find them easil
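The rename-with-prefix idea being discussed can be sketched with the Hadoop FileSystem API; the output directory and prefix below are invented for illustration, and this is a hedged sketch of the approach, not code from the thread (it needs a Hadoop classpath to run):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PrefixParts {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val outDir = new Path("/tmp/batch-output") // hypothetical saveAsTextFile output dir
    val prefix = "batch-20150416-"             // hypothetical per-batch prefix
    // Rename every part-* file in place, prepending the prefix
    // while leaving the part-NNNNN suffix untouched.
    fs.listStatus(outDir)
      .filter(_.getPath.getName.startsWith("part-"))
      .foreach { st =>
        fs.rename(st.getPath, new Path(outDir, prefix + st.getPath.getName))
      }
  }
}
```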

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
files and directories. Vadim Bichutskiy wrote: Thanks Evo for your detailed explanation. On Apr 16, 2015, at 1:38 PM, Evo Eftimov wrote: The reason for this is

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
Just copy the files? It shouldn't matter that much where they are, as you can find them easily. Or consider somehow sending the batches of data straight into Redshift? No idea how that is done, but I imagine it's doable. On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy wrote: > Thanks Sean. I want

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Thanks Evo for your detailed explanation. > On Apr 16, 2015, at 1:38 PM, Evo Eftimov wrote: > > The reason for this is as follows: > > 1. You are saving data on HDFS > 2. HDFS as a cluster/server side Service has a Single Writer / Multiple > Reader multithreading model > 3.

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that? Vadim > On Apr 16, 2015, at 1:35 PM, Sean Owen wrote: > > You can't, since that's how it's designed to work. Batches are saved >

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
Nope Sir, it is possible - check my reply earlier. -Original Message- From: Sean Owen Sent: Thursday, April 16, 2015 6:35 PM Subject: Re: saveAsTextFile You can't, since that's how it's designed t

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
The reason for this is as follows:
1. You are saving data on HDFS
2. HDFS as a cluster/server-side service has a Single Writer / Multiple Reader multithreading model
3. Hence each thread of execution in Spark has to write to a separate file in HDFS
4. Moreover the
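The one-writer-per-task model described above is why an RDD with N partitions yields N part files. When a single output file is genuinely required, the usual (single-threaded, therefore slow) workaround is to collapse to one partition before saving. A hedged sketch, assuming a local master and an invented output path (needs Spark on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SingleFileSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("single-file").setMaster("local[*]"))
    val data = sc.parallelize(1 to 1000, 8) // 8 partitions -> 8 part files
    // Each task writes its own part-NNNNN file, so collapse to one
    // partition first if one file is required (loses write parallelism).
    data.coalesce(1).saveAsTextFile("/tmp/single-file-out") // hypothetical path
    sc.stop()
  }
}
```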

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that? Vadim > On Apr 16, 2015, at 1:35 PM, Sean Owen wrote: > > You can't, since that's how it's designed to work. Batches are saved > in different "files", which are really directories containing >

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
You can't, since that's how it's designed to work. Batches are saved in different "files", which are really directories containing partitions, as is common in Hadoop. You can move them later, or just read them where they are. On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy wrote: > I am using S

Re: saveAsTextFile extremely slow near finish

2015-03-11 Thread Imran Rashid
Is your data skewed? Could it be that there are a few keys with a huge number of records? You might consider outputting (recordA, count) (recordB, count) instead of recordA recordA recordA ... You could do this with: val input = sc.textFile(...); val pairCounts = input.map{x => (x, 1)}.reduceByKey(_ + _)
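Imran's suggestion, fleshed out as a compile-ready sketch (the input and output paths are invented; this assumes a plain text file with one record per line and needs Spark to run):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RecordCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("record-counts").setMaster("local[*]"))
    val input = sc.textFile("/tmp/records.txt") // hypothetical input
    // Emit (record, count) instead of repeating identical records, so a
    // single hot key becomes one small output line instead of millions.
    val pairCounts = input.map(x => (x, 1)).reduceByKey(_ + _)
    pairCounts.saveAsTextFile("/tmp/record-counts-out") // hypothetical output
    sc.stop()
  }
}
```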

Re: saveAsTextFile extremely slow near finish

2015-03-10 Thread Sean Owen
This is more of an aside, but why repartition this data instead of letting it define partitions naturally? You will end up with a similar number. On Mar 9, 2015 5:32 PM, "mingweili0x" wrote: > I'm basically running a sorting using spark. The spark program will read > from > HDFS, sort on composit

Re: saveAsTextFile extremely slow near finish

2015-03-09 Thread Akhil Das
Don't you think 1000 partitions is too few for 160GB of data? Also, you could try using KryoSerializer and enabling RDD compression. Thanks Best Regards On Mon, Mar 9, 2015 at 11:01 PM, mingweili0x wrote: > I'm basically running a sorting using spark. The spark program will read > from > HDFS, sort on compos

Re: saveAsTextFile of RDD[Array[Any]]

2015-02-09 Thread Jong Wook Kim
If you have `RDD[Array[Any]]` you can do rdd.map(_.mkString("\t")) or with some other delimiter to make it `RDD[String]`, and then call `saveAsTextFile`. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-of-RDD-Array-Any-tp21548p21554.html Se
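The same transformation can be seen on a plain Scala collection, since `map` and `mkString` behave identically there (the sample rows are invented; no Spark needed to see the output shape):

```scala
object TabJoin {
  // Mirrors rdd.map(_.mkString("\t")): join each Array[Any]
  // into one tab-separated line, ready for saveAsTextFile.
  def toLines(rows: Seq[Array[Any]]): Seq[String] =
    rows.map(_.mkString("\t"))

  def main(args: Array[String]): Unit = {
    val rows: Seq[Array[Any]] = Seq(Array(1, "a", true), Array(2, "b", false))
    toLines(rows).foreach(println) // prints "1\ta\ttrue" then "2\tb\tfalse"
  }
}
```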

Re: SaveAsTextFile to S3 bucket

2015-01-27 Thread Thomas Demoor
is possible to have the output directory created under the dev directory I created upfront. From: Nick Pentreath, Date: Monday, January 26, 2015 9:15 PM, Subject: Re: SaveAsTextFile to S3 bucket: Your output folder specifies

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
to have the output directory created under the dev directory I created upfront. From: Nick Pentreath, Date: Monday, January 26, 2015 9:15 PM, To: user@spark.apache.org, Subject: Re:

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Ashish Rangole
By default, the files will be created under the path provided as the argument to saveAsTextFile. This argument is treated as a folder in the bucket, and the actual files are created in it with the naming convention part-n, where n is the index of the output partition. On Mon, Jan 26, 2015 at 9

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Nick Pentreath
Your output folder specifies rdd.saveAsTextFile("s3n://nexgen-software/dev/output"); So it will try to write to /dev/output which is as expected. If you create the directory /dev/output upfront in your bucket, and try to save it to that (empty) directory, what is the behaviour? On Tue, Jan 27, 2

Re: saveAsTextFile

2015-01-15 Thread ankits
I have seen this happen when the RDD contains null values. Essentially, saveAsTextFile calls toString() on the elements of the RDD, so a call to null.toString will result in an NPE. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-tp20951p21178
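The NPE mechanism described here, and a null-safe guard one could map over the RDD before saving, can be sketched in plain Scala (the "NULL" placeholder string is an assumption, not from the thread):

```scala
object NullSafeLines {
  // saveAsTextFile effectively writes x.toString per element, so a null
  // element raises a NullPointerException. Map nulls to a placeholder first.
  def toLine(x: Any): String = Option(x).map(_.toString).getOrElse("NULL")

  def main(args: Array[String]): Unit = {
    val data: Seq[Any] = Seq("a", null, 42)
    data.map(toLine).foreach(println) // prints a, NULL, 42
  }
}
```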

Re: saveAsTextFile

2015-01-15 Thread Prannoy
Hi, Before saving the rdd, do a collect on it and print its content. Probably it's a null value. Thanks. On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List] < ml-node+s1001560n20953...@n3.nabble.com> wrote: > If you can paste the code here I can certainly he

Re: saveAsTextFile just uses toString and Row@37f108

2015-01-13 Thread Reynold Xin
It is just calling RDD's saveAsTextFile. I guess we should really override the saveAsTextFile in SchemaRDD (or make Row.toString comma separated). Do you mind filing a JIRA ticket and copy me? On Tue, Jan 13, 2015 at 12:03 AM, Kevin Burton wrote: > This is almost funny. > > I want to dump a co

Re: saveAsTextFile

2015-01-03 Thread Sanjay Subramanian
@laila Based on the error you mentioned in the nabble link below, it seems like there are no permissions to write to HDFS, so this is possibly why saveAsTextFile is failing. From: Pankaj Narang To: user@spark.apache.org Sent: Saturday, January 3, 2015 4:07 AM Subject: Re

Re: saveAsTextFile

2015-01-03 Thread Pankaj Narang
If you can paste the code here I can certainly help. Also confirm the version of spark you are using Regards Pankaj Infoshore Software India -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-tp20951p20953.html Sent from the Apache Spark User

Re: saveAsTextFile error

2014-11-15 Thread Prannoy
Hi Niko, Have you tried running it while keeping the wordCounts.print()? Possibly the import of the package *org.apache.spark.streaming._* is missing, so during sbt package it is unable to locate the saveAsTextFile API. Go to https://github.com/apache/spark/blob/master/examples/src/main/scala/org/

Re: saveAsTextFile error

2014-11-14 Thread Harold Nguyen
Hi Niko, It looks like you are calling a method on DStream, which does not exist. Check out: https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html#output-operations-on-dstreams for the method "saveAsTextFiles" Harold On Fri, Nov 14, 2014 at 10:39 AM, Niko Gamulin wrote: > Hi,
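The distinction Harold points at is that DStream has saveAsTextFiles (plural), taking a prefix and an optional suffix, not the RDD method saveAsTextFile. A hedged sketch against the Spark 1.x streaming API (the socket source, port, and paths are invented; needs Spark Streaming to run):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stream-save").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    // Note the plural: each batch lands in a <prefix>-<time>.<suffix> directory
    // containing part-NNNNN files.
    wordCounts.saveAsTextFiles("/tmp/wordcounts", "txt")
    ssc.start()
    ssc.awaitTermination()
  }
}
```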

Re: saveAsTextFile makes no progress without caching RDD

2014-09-02 Thread jerryye
As an update. I'm still getting the same issue. I ended up doing a coalesce instead of a cache to get around the memory issue but saveAsTextFile still won't proceed without the coalesce or cache first. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTe

Re: saveAsTextFile hangs with hdfs

2014-08-26 Thread Burak Yavuz
Hi David, Your job is probably hanging on the groupByKey process. Probably GC is kicking in and the process starts to hang or the data is unbalanced and you end up with stragglers (Once GC kicks in you'll start to get the connection errors you shared). If you don't care about the list of value

Re: saveAsTextFile hangs with hdfs

2014-08-19 Thread evadnoob
Not sure if this is helpful or not, but in one executor "stderr" log, I found this: 14/08/19 20:17:04 INFO CacheManager: Partition rdd_5_14 not found, computing it 14/08/19 20:17:04 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/08/1

Re: saveAsTextFile hangs with hdfs

2014-08-19 Thread evadnoob
update: it hangs even when not writing to HDFS. I changed the code to avoid saveAsTextFile() and instead do a foreachPartition and log the results. This time it hangs at 96/100 tasks. I changed the saveAsTextFile to: stringIntegerJavaPairRDD.foreachPartition(p -> {
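The workaround evadnoob describes (a per-partition side effect in place of the HDFS write, to rule out the write path) looks like this in Scala; the original thread used the Java 8 lambda API, so this is a hedged translation with invented data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogInsteadOfSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("log-partitions").setMaster("local[*]"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)), 4)
    // Replace saveAsTextFile with a per-partition side effect: if this
    // also hangs, the problem is upstream of the HDFS write.
    pairs.foreachPartition { iter =>
      iter.foreach { case (k, v) => println(s"$k=$v") }
    }
    sc.stop()
  }
}
```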

Re: saveAsTextFile

2014-08-10 Thread durin
This should work: jobs.saveAsTextFile("file:////home/hysom/testing") Note the 4 slashes: it's really 3 slashes + the absolute path. This should be mentioned in the docs though; I only remember that from having seen it somewhere else. The output folder, here "testing", will be created and must theref