Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-06 Thread jane thorpe
Did you know that the simple demo program for reading characters from a file didn't work? Who wrote that simple hello-world-type little program? jane thorpe janethor...@aol.com -Original Message- From: jane thorpe To: somplasticllc ; user Sent: Fri, 3 Apr 2020 2:44 Subject: Re: HDF

Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-06 Thread Som Lima
> > jane thorpe > janethor...@aol.com > > > -Original Message- > From: jane thorpe > To: somplasticllc ; user > Sent: Fri, 3 Apr 2020 2:44 > Subject: Re: HDFS file hdfs:// > 127.0.0.1:9000/hdfs/spark/examples/README.txt > > > Thanks darling > >

Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-02 Thread jane thorpe
127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <console>:27 counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <console>:30 scala> :quit jane thorpe janethor...@aol.com -Original Message- From: Som Lima CC: user Sent: Tue, 31 Mar 2020

Re: HDFS file

2020-03-31 Thread Som Lima
Hi Jane Try this example https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala Som On Tue, 31 Mar 2020, 21:34 jane thorpe, wrote: > hi, > > Are there setup instructions on the website for >
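For reference, a minimal sketch of what that linked example does, assuming the standard Scala streaming API; the hard-coded URI is a placeholder for your own directory:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCountSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Monitor the directory and process any file newly created in it
    // (placeholder URI -- substitute your own namenode and path).
    val lines = ssc.textFileStream("hdfs://127.0.0.1:9000/hdfs/spark/examples/")
    val wordCounts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```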

Re: HDFS or NFS as a cache?

2017-10-02 Thread Miguel Morales
From: Steve Loughran [mailto:ste...@hortonworks.com] > > Sent: Saturday, September 30, 2017 6:10 AM > > To: JG Perrin <jper...@lumeris.com> > > Cc: Alexander Czech <alexander.cz...@googlemail.com>; > user@spark.apache.org > > Subject: Re: HDFS or NFS as a

Re: HDFS or NFS as a cache?

2017-10-02 Thread Marcelo Vanzin
> From: Steve Loughran [mailto:ste...@hortonworks.com] > Sent: Saturday, September 30, 2017 6:10 AM > To: JG Perrin <jper...@lumeris.com> > Cc: Alexander Czech <alexander.cz...@googlemail.com>; user@spark.apache.org > Subject: Re: HDFS or NFS as a cache? > > >

RE: HDFS or NFS as a cache?

2017-10-02 Thread JG Perrin
[mailto:ste...@hortonworks.com] Sent: Saturday, September 30, 2017 6:10 AM To: JG Perrin <jper...@lumeris.com> Cc: Alexander Czech <alexander.cz...@googlemail.com>; user@spark.apache.org Subject: Re: HDFS or NFS as a cache? On 29 Sep 2017, at 20:03, JG Perrin <jper...@lumeris.

Re: HDFS or NFS as a cache?

2017-09-30 Thread Steve Loughran
On 29 Sep 2017, at 20:03, JG Perrin wrote: You will collect in the driver (often the master) and it will save the data, so for saving, you will not have to set up HDFS. No, it doesn't work quite like that. 1. workers generate their data and
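A short sketch of the distinction being drawn here, with hypothetical paths; a distributed save writes from the executors, whereas collect() funnels everything through the driver first:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("save-sketch"))
val rdd = sc.parallelize(1 to 100000)

// Each executor writes its own partitions straight to the target filesystem;
// nothing is funneled through the driver.
rdd.saveAsTextFile("hdfs://namenode:8020/output/results")

// By contrast, collect() pulls the whole dataset into the driver's memory
// before anything can be saved from there -- fine for small results only.
val everything = rdd.collect()
```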

Re: HDFS or NFS as a cache?

2017-09-30 Thread Steve Loughran
On 29 Sep 2017, at 15:59, Alexander Czech wrote: Yes I have identified the rename as the problem, that is why I think the extra bandwidth of the larger instances might not help. Also there is a consistency issue with S3

RE: HDFS or NFS as a cache?

2017-09-29 Thread JG Perrin
You will collect in the driver (often the master) and it will save the data, so for saving, you will not have to set up HDFS. From: Alexander Czech [mailto:alexander.cz...@googlemail.com] Sent: Friday, September 29, 2017 8:15 AM To: user@spark.apache.org Subject: HDFS or NFS as a cache? I have

Re: HDFS or NFS as a cache?

2017-09-29 Thread Alexander Czech
Yes, I have identified the rename as the problem; that is why I think the extra bandwidth of the larger instances might not help. Also there is a consistency issue with S3 because of how the rename works, so I probably lose data. On Fri, Sep 29, 2017 at 4:42 PM, Vadim Semenov

Re: HDFS or NFS as a cache?

2017-09-29 Thread Vadim Semenov
How many files do you produce? I believe it spends a lot of time on renaming the files because of the output committer. Also, instead of 5x c3.2xlarge, try using 2x c3.8xlarge because they have 10GbE and you can get good throughput for S3. On Fri, Sep 29, 2017 at 9:15 AM, Alexander Czech <
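One related knob, offered as an assumption rather than something from this thread: on Hadoop 2.7+ the v2 commit algorithm cuts down the job-level rename step of the default FileOutputCommitter (it is still not atomic on S3):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("committer-v2"))

// Algorithm version 2 moves task output into place at task commit instead of
// renaming the whole job output at the end -- fewer renames, still not atomic on S3.
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
```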

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Gourav Sengupta
There is a mv command in GCS, but I am not quite sure (because of limitations on the data I work with and lack of budget) whether the mv command actually copies and deletes or just re-points the files to a new directory by changing their metadata. Yes, the Data Quality checks are done after the

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Chanh Le
Thank you Gourav, > Moving files from _temp folders to main folders is an additional overhead > when you are working on S3 as there is no move operation. Good catch. Is GCS the same? > I generally have a set of Data Quality checks after each job to ascertain > whether everything went

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Gourav Sengupta
But you have to be careful, that is the default setting. There is a way you can overwrite it so that the writing to _temp folder does not take place and you write directly to the main folder. Moving files from _temp folders to main folders is an additional overhead when you are working on S3 as

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Chanh Le
It’s out of the box in Spark. When you write data into HDFS or any storage, it only creates the new parquet folder properly if your Spark job succeeded; otherwise there is only a _temp folder inside to mark that it did not succeed (Spark was killed), or nothing inside (the Spark job failed). > On Aug 8, 2016,
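A hedged sketch of how a downstream job could test for this; besides the _temp folder described above, the default Hadoop committer also writes an empty _SUCCESS marker once a job commits. The path below is hypothetical:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("success-check"))
val out = new Path("hdfs://namenode:8020/warehouse/table1") // hypothetical path

// The committer writes the empty _SUCCESS marker only after the job commits,
// so its presence distinguishes a finished write from a killed or failed one.
val fs = FileSystem.get(out.toUri, sc.hadoopConfiguration)
val committed = fs.exists(new Path(out, "_SUCCESS"))
```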

Re: HDFS

2015-12-14 Thread Akhil Das
Try to set the spark.locality.wait to a higher number and see if things change. You can read more about the configuration properties from here http://spark.apache.org/docs/latest/configuration.html#scheduling Thanks Best Regards On Sat, Dec 12, 2015 at 12:16 AM, shahid ashraf
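A sketch of that suggestion; the 10s value is an arbitrary example, not a recommendation from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.locality.wait defaults to 3s; raising it makes the scheduler wait
// longer for a data-local slot before falling back to a less-local one.
val conf = new SparkConf()
  .setAppName("locality-demo")
  .set("spark.locality.wait", "10s")
val sc = new SparkContext(conf)
```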

RE: hdfs-ha on mesos - odd bug

2015-11-11 Thread Buttler, David
Vanzin [mailto:van...@cloudera.com] Sent: Tuesday, September 15, 2015 7:47 PM To: Adrian Bridgett Cc: user Subject: Re: hdfs-ha on mesos - odd bug On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett <adr...@opensignal.com> wrote: > 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
Friday, 2 October 2015 18:37:22 Subject: Re: HDFS small file generation problem Ok thanks, but can I also update data instead of inserting it? - Original Message - From: "Brett Antonides" <banto...@gmail.com> To: user@spark.apache.org Sent: Friday, 2 October 2015 18:18:18 Subject

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
;Jörn Franke" <jornfra...@gmail.com> À: nib...@free.fr, "Brett Antonides" <banto...@gmail.com> Cc: user@spark.apache.org Envoyé: Samedi 3 Octobre 2015 11:17:51 Objet: Re: HDFS small file generation problem You can update data in hive if you use the orc format Le

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
Thanks a lot! > Nicolas > > > - Original Message - > From: "Jörn Franke" <jornfra...@gmail.com> > To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com> > Cc: user@spark.apache.org > Sent: Saturday, 3 October 2015 11:17:51 > Subject: Re: H

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
nides" <banto...@gmail.com> > Cc: user@spark.apache.org > Envoyé: Samedi 3 Octobre 2015 11:17:51 > Objet: Re: HDFS small file generation problem > > > > You can update data in hive if you use the orc format > > > > Le sam. 3 oct. 2015 à 10:42, < nib...@

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
> - Original Message - >> From: "Jörn Franke" <jornfra...@gmail.com> >> To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com> >> Cc: user@spark.apache.org >> Sent: Saturday, 3 October 2015 11:17:51 >> Subject: Re: HDFS small fil

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
; Nicolas > > - Original Message - > From: nib...@free.fr > To: "Brett Antonides" <banto...@gmail.com> > Cc: user@spark.apache.org > Sent: Friday, 2 October 2015 18:37:22 > Subject: Re: HDFS small file generation problem > > Ok thanks, but can I also upda

RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
user@spark.apache.org Sent: Saturday, 3 October 2015 11:17:51 Subject: Re: HDFS small file generation problem You can update data in Hive if you use the ORC format On Sat, 3 Oct 2015 at 10:42, < nib...@free.fr > wrote: Hello, in the end Hive is not a solution as I cannot update the data.

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
Thanks a lot; why did you say "the most recent version"? - Original Message - From: "Jörn Franke" <jornfra...@gmail.com> To: "nibiau" <nib...@free.fr> Cc: banto...@gmail.com, user@spark.apache.org Sent: Saturday, 3 October 2015 13:56:43 Subject: Re: RE : Re:

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
@spark.apache.org > Sent: Saturday, 3 October 2015 13:56:43 > Subject: Re: RE : Re: HDFS small file generation problem > > > > Yes, the most recent version, or you can use Phoenix on top of HBase. I > recommend trying out both and seeing which one is the most suitable. > >

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
: "Jörn Franke" <jornfra...@gmail.com> À: nib...@free.fr, "user" <user@spark.apache.org> Envoyé: Lundi 28 Septembre 2015 23:53:56 Objet: Re: HDFS small file generation problem Use hadoop archive Le dim. 27 sept. 2015 à 15:36, < nib...@free.fr > a écrit :

Re: HDFS small file generation problem

2015-10-02 Thread Brett Antonides
Original Message - > From: "Jörn Franke" <jornfra...@gmail.com> > To: nib...@free.fr, "user" <user@spark.apache.org> > Sent: Monday, 28 September 2015 23:53:56 > Subject: Re: HDFS small file generation problem > > > > Use hadoop archive > > >

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
Ok thanks, but can I also update data instead of inserting it? - Original Message - From: "Brett Antonides" <banto...@gmail.com> To: user@spark.apache.org Sent: Friday, 2 October 2015 18:18:18 Subject: Re: HDFS small file generation problem I had a very similar pr

Re: HDFS small file generation problem

2015-09-28 Thread Jörn Franke
Use hadoop archive On Sun, 27 Sep 2015 at 15:36, wrote: > Hello, > I'm still investigating the small file generation problem caused by my > Spark Streaming jobs. > Indeed, my Spark Streaming jobs receive a lot of small events (avg > 10kb), and I have to store them

Re: HDFS is undefined

2015-09-28 Thread Akhil Das
For some reason Spark isn't picking up your Hadoop confs. Did you download Spark compiled with the Hadoop version that you have in the cluster? Thanks Best Regards On Fri, Sep 25, 2015 at 7:43 PM, Angel Angel wrote: > hello, > I am running the spark application. >

Re: HDFS is undefined

2015-09-28 Thread Ted Yu
Please post the question on the vendor's forum. > On Sep 25, 2015, at 7:13 AM, Angel Angel wrote: > > hello, > I am running the spark application. > > I have installed the cloudera manager. > it includes the spark version 1.2.0 > > > But now i want to use spark version

Re: HDFS small file generation problem

2015-09-27 Thread ayan guha
I would suggest not writing small files to HDFS; rather, you can hold them in memory, maybe off-heap, and then flush them to HDFS using another job, similar to https://github.com/ptgoetz/storm-hdfs (not sure if spark already has something like it) On Sun, Sep 27, 2015 at 11:36 PM,

Re: HDFS small file generation problem

2015-09-27 Thread Deenar Toraskar
You could try a couple of things: a) use Kafka for stream processing; store current incoming events and Spark Streaming job output in Kafka rather than on HDFS, and dual-write to HDFS too (in a micro-batched mode), say every x minutes. Kafka is better suited to processing lots of small events. b)
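A further mitigation, not verbatim from the thread: coalesce each micro-batch before writing so every interval produces a few larger files rather than many tiny ones. Names and paths below are placeholders:

```scala
import org.apache.spark.streaming.dstream.DStream

// Assumes `events` is an existing DStream[String] produced by the streaming job.
def writeBatches(events: DStream[String]): Unit = {
  events.foreachRDD { (rdd, time) =>
    if (!rdd.isEmpty()) {
      // One partition -> one output file per interval (placeholder path).
      rdd.coalesce(1)
        .saveAsTextFile(s"hdfs://namenode:8020/events/batch-${time.milliseconds}")
    }
  }
}
```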

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Marcelo Vanzin
On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett wrote: > 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, > 10.1.200.245): java.lang.IllegalArgumentException: > java.net.UnknownHostException: nameservice1 > at >

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
Hi Sam, in short, no, it's a traditional install as we plan to use spot instances and didn't want price spikes to kill off HDFS. We're actually doing a bit of a hybrid, using spot instances for the mesos slaves, on-demand for the mesos masters. So for the time being, putting hdfs on the

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Steve Loughran
> On 15 Sep 2015, at 08:55, Adrian Bridgett wrote: > > Hi Sam, in short, no, it's a traditional install as we plan to use spot > instances and didn't want price spikes to kill off HDFS. > > We're actually doing a bit of a hybrid, using spot instances for the mesos >

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Iulian Dragoș
I've seen similar traces, but couldn't track down the failure completely. You are using Kerberos for your HDFS cluster, right? AFAIK Kerberos isn't supported in Mesos deployments. Can you resolve that host name (nameservice1) from the driver machine (ping nameservice1)? Can it be resolved from

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
Thanks Steve - we are already taking the safe route - putting NN and datanodes on the central mesos-masters which are on demand. Later (much later!) we _may_ put some datanodes on spot instances (and using several spot instance types as the spikes seem to only affect one type - worst case we

Re: hdfs-ha on mesos - odd bug

2015-09-14 Thread Sam Bessalah
I don't know about the broken URL. But are you running HDFS as a Mesos framework? If so, is it using mesos-dns? Then you should resolve the namenode via hdfs:/// On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett wrote: > I'm hitting an odd issue with running spark on

Re: HDFS performances + unexpected death of executors.

2015-07-14 Thread Max Demoulin
I will try a fresh setup very soon. Actually, I tried to compile spark by myself, against hadoop 2.5.2, but I had the issue that I mentioned in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-td23651.html I was wondering if maybe

Re: HDFS not supported by databricks cloud :-(

2015-06-16 Thread Simon Elliston Ball
You could consider using Zeppelin and spark on yarn as an alternative. http://zeppelin.incubator.apache.org/ Simon On 16 Jun 2015, at 17:58, Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID wrote: hey guys After day one at the spark-summit SFO, I realized sadly that (indeed) HDFS

Re: HDFS Rest Service not available

2015-06-02 Thread Akhil Das
It says your namenode is down (connection refused on 8020); you can restart HDFS by going into the hadoop directory and running sbin/stop-dfs.sh and then sbin/start-dfs.sh. Thanks Best Regards On Tue, Jun 2, 2015 at 5:03 AM, Su She suhsheka...@gmail.com wrote: Hello All, A bit scared I did

Re: HDFS Rest Service not available

2015-06-02 Thread Su She
Ahh, this did the trick. I had to get the namenode out of safe mode, however, before it fully worked. Thanks! On Tue, Jun 2, 2015 at 12:09 AM, Akhil Das ak...@sigmoidanalytics.com wrote: It says your namenode is down (connection refused on 8020), you can restart your HDFS by going into hadoop

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-27 Thread Su She
Thanks Akhil! 1) I had to do sudo -u hdfs hdfs dfsadmin -safemode leave a) I had created a user called hdfs with superuser privileges in Hue, hence the double hdfs. 2) Lastly, I know this is getting a bit off topic, but this is my etc/hosts file: 127.0.0.1 localhost.localdomain

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Su She
Hello Sean and Akhil, I shut down the services on Cloudera Manager. I shut them down in the appropriate order and then stopped all services of CM. I then shut down my instances. I then turned my instances back on, but I am getting the same error. 1) I tried hadoop fs -safemode leave and it said

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Akhil Das
Command would be: hadoop dfsadmin -safemode leave If you are not able to ping your instances, it can be because you are blocking all ICMP requests. I'm not quite sure why you are not able to ping google.com from your instances. Make sure the internal IP (ifconfig) is proper in the

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-22 Thread Sean Owen
If you are using CDH, you would be shutting down services with Cloudera Manager. I believe you can do it manually using Linux 'services' if you do the steps correctly across your whole cluster. I'm not sure if the stock stop-all.sh script is supposed to work. Certainly, if you are using CM, by far

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Su She
Thanks Akhil and Sean for the responses. I will try shutting down spark, then storage and then the instances. Initially, when hdfs was in safe mode, I waited for 1 hour and the problem still persisted. I will try this new method. Thanks! On Sat, Jan 17, 2015 at 2:03 AM, Sean Owen

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Akhil Das
The safest way would be to first shut down HDFS and then shut down Spark (calling stop-all.sh would do) and then shut down the machines. You can execute the following command to disable safe mode: *hadoop fs -safemode leave* Thanks Best Regards On Sat, Jan 17, 2015 at 8:31 AM, Su She

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Sean Owen
You would not want to turn off storage underneath Spark. Shut down Spark first, then storage, then shut down the instances. Reverse the order when restarting. HDFS will be in safe mode for a short time after being started before it becomes writeable. I would first check that it's not just that.

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Try (hdfs:///localhost:8020/user/data/*) with 3 /. Thx tri -Original Message- From: Benjamin Cuthbert [mailto:cuthbert@gmail.com] Sent: Monday, December 01, 2014 4:41 PM To: user@spark.apache.org Subject: hdfs streaming context All, Is it possible to stream on HDFS directory

Re: hdfs streaming context

2014-12-01 Thread Andy Twigg
Have you tried just passing a path to ssc.textFileStream()? It monitors the path for new files by looking at mtime/atime; all new/touched files in the time window appear as an RDD in the DStream. On 1 December 2014 at 14:41, Benjamin Cuthbert cuthbert@gmail.com wrote: All, Is it possible

Re: hdfs streaming context

2014-12-01 Thread Sean Owen
Yes, in fact, that's the only way it works. You need hdfs://localhost:8020/user/data, I believe. (No it's not correct to write hdfs:///...) On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert cuthbert@gmail.com wrote: All, Is it possible to stream on HDFS directory and listen for multiple
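In code, the working form looks like this minimal sketch (the StreamingContext setup is assumed):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("hdfs-stream"), Seconds(10))
// Host and port after exactly two slashes, and the directory itself -- no /* glob.
val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")
```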

Re: hdfs streaming context

2014-12-01 Thread Benjamin Cuthbert
Thanks Sean, that worked: just removing the /* and leaving it as /user/data. It seems to be streaming in. On 1 Dec 2014, at 22:50, Sean Owen so...@cloudera.com wrote: Yes, in fact, that's the only way it works. You need hdfs://localhost:8020/user/data, I believe. (No it's not correct to

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
@spark.apache.org Subject: Re: hdfs streaming context Yes, in fact, that's the only way it works. You need hdfs://localhost:8020/user/data, I believe. (No it's not correct to write hdfs:///...) On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert cuthbert@gmail.com wrote: All, Is it possible

Re: hdfs streaming context

2014-12-01 Thread Sean Owen
Yes, but you can't follow three slashes with host:port. With no host, it probably defaults to whatever is found in your HDFS config. On Mon, Dec 1, 2014 at 11:02 PM, Bui, Tri tri@verizonwireless.com wrote: For the streaming example I am working on, it's accepted (hdfs:///user/data) without the

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
@spark.apache.org Subject: Re: hdfs streaming context Yes but you can't follow three slashes with host:port. No host probably defaults to whatever is found in your HDFS config. On Mon, Dec 1, 2014 at 11:02 PM, Bui, Tri tri@verizonwireless.com wrote: For the streaming example I am working on, Its

Re: HDFS read text file

2014-11-17 Thread Akhil Das
You can use sc.objectFile https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext to read it. It will be of type RDD[Student]. Thanks Best Regards On Mon, Nov 17, 2014 at 4:03 PM, Naveen Kumar Pokala npok...@spcapitaliq.com wrote: Hi, JavaRDDInstrument
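A sketch of the round trip being described; the Student case class here is a stand-in for the one in the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

case class Student(name: String, id: Int) // stand-in for sample.spark.test.Student

val sc = new SparkContext(new SparkConf().setAppName("objectfile-demo"))
val students = sc.parallelize(Seq(Student("a", 1), Student("b", 2)))

// saveAsObjectFile serializes the elements; objectFile reads them back typed.
students.saveAsObjectFile("hdfs://namenode:8020/tmp/students")
val restored = sc.objectFile[Student]("hdfs://namenode:8020/tmp/students")
```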

Re: HDFS read text file

2014-11-17 Thread Hlib Mykhailenko
Hello Naveen, I think you should first override the toString method of your sample.spark.test.Student class. -- Cordially, Hlib Mykhailenko PhD student at INRIA Sophia-Antipolis Méditerranée 2004 Route des Lucioles BP93 06902 SOPHIA ANTIPOLIS cedex - Original Message - From:
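A minimal sketch of that suggestion, with assumed field names; saveAsTextFile writes each element's toString, so without an override you get the default Object-hash form:

```scala
// Without the override, saveAsTextFile would emit the default
// "sample.spark.test.Student@1a2b3c4d" form for every record.
class Student(val name: String, val id: Int) extends Serializable {
  override def toString: String = s"$name,$id"
}
```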

Re: hdfs read performance issue

2014-08-20 Thread Gurvinder Singh
I got some time to look into it. It appears that Spark (latest git) is doing this operation much more often compared to the Aug 1 version. Here is the log from the operation I am referring to 14/08/19 12:37:26 INFO spark.CacheManager: Partition rdd_8_414 not found, computing it 14/08/19 12:37:26 INFO

Re: hdfs replication on saving RDD

2014-07-15 Thread Andrew Ash
In general it would be nice to be able to configure replication on a per-job basis. Is there a way to do that without changing the config values in the Hadoop conf/ directory between jobs? Maybe by modifying OutputFormats or the JobConf? On Mon, Jul 14, 2014 at 11:12 PM, Matei Zaharia

Re: hdfs replication on saving RDD

2014-07-15 Thread Kan Zhang
Andrew, there are overloaded versions of saveAsHadoopFile or saveAsNewAPIHadoopFile that allow you to pass in a per-job Hadoop conf. saveAsTextFile is just a convenience wrapper on top of saveAsHadoopFile. On Mon, Jul 14, 2014 at 11:22 PM, Andrew Ash and...@andrewash.com wrote: In general it
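A hedged sketch of that per-job approach, using the old mapred API; the output path and record types are placeholders:

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD functions in 1.x

val sc = new SparkContext(new SparkConf().setAppName("per-job-replication"))
val rdd = sc.parallelize(Seq("a", "b", "c"))

// A per-job copy of the Hadoop conf: only this save uses replication factor 2.
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dfs.replication", "2")

rdd.map(line => (NullWritable.get(), new Text(line)))
  .saveAsHadoopFile("hdfs://namenode:8020/out",
    classOf[NullWritable], classOf[Text],
    classOf[TextOutputFormat[NullWritable, Text]], jobConf)
```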

Re: hdfs replication on saving RDD

2014-07-14 Thread valgrind_girl
Eager to know about this issue too; does anyone know how? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/hdfs-replication-on-saving-RDD-tp289p9700.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: hdfs replication on saving RDD

2014-07-14 Thread Matei Zaharia
You can change this setting through SparkContext.hadoopConfiguration, or put the conf/ directory of your Hadoop installation on the CLASSPATH when you launch your app so that it reads the config values from there. Matei On Jul 14, 2014, at 8:06 PM, valgrind_girl 124411...@qq.com wrote: eager

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-23 Thread Andrew Lee
Commit: 5f48721, github.com/apache/spark/pull/586 From: alee...@hotmail.com To: user@spark.apache.org Subject: RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode Date: Wed, 18 Jun 2014 11:24:36 -0700 Forgot to mention that I am using spark-submit to submit jobs

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Forgot to mention that I am using spark-submit to submit jobs, and a verbose-mode printout looks like this with the SparkPi examples. The .sparkStaging won't be deleted. My thought is that this should be part of the staging and should be cleaned up as well when sc gets terminated.

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-12 Thread bijoy deb
Hi, The problem was due to a pre-built/binary Tachyon-0.4.1 jar in the SPARK_CLASSPATH, and that Tachyon jar had been built against Hadoop-1.0.4. Building Tachyon against Hadoop-2.0.0 resolved the issue. Thanks On Wed, Jun 11, 2014 at 11:34 PM, Marcelo Vanzin van...@cloudera.com wrote:

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-11 Thread bijoy deb
Any suggestions from anyone? Thanks Bijoy On Tue, Jun 10, 2014 at 11:46 PM, bijoy deb bijoy.comput...@gmail.com wrote: Hi all, I have built Shark-0.9.1 with sbt using the command below: *SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.6.0 sbt/sbt assembly* My Hadoop cluster also has version

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-11 Thread Marcelo Vanzin
The error is saying that your client libraries are older than what your server is using (2.0.0-mr1-cdh4.6.0 is IPC version 7). Try double-checking that your build is actually using that version (e.g., by looking at the hadoop jar files in lib_managed/jars). On Wed, Jun 11, 2014 at 2:07 AM, bijoy