RE: newbie question for reduce

2022-01-27 Thread Christopher Robson
or that you are seeing. There are several ways you could fix it. One way is to use a map before the reduce, e.g. rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y). Hope that's helpful, Chris
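
For reference, a minimal PySpark sketch of that fix (assuming sc is an existing SparkContext, as in the original question):

    rdd = sc.parallelize([("a", 1), ("b", 2), ("c", 3)])
    # project each tuple to its value first, so reduce combines ints with ints
    total = rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)  # 6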

Re: newbie question for reduce

2022-01-18 Thread Sean Owen
The problem is that you are reducing a list of tuples, but your function produces an int. The resulting int can't then be combined with the remaining tuples by your function; reduce() has to produce the same type as its arguments. rdd.map(lambda x: x[1]).reduce(lambda x,y: x+y) ... would work

newbie question for reduce

2022-01-18 Thread capitnfrakass
Hello, please help take a look at why this simple reduce doesn't work: rdd = sc.parallelize([("a",1),("b",2),("c",3)]) rdd.reduce(lambda x,y: x[1]+y[1]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce return reduce(f

Re: Spark Newbie question

2019-07-11 Thread infa elance
Thanks Jerry for the clarification. Ajay.

Re: Spark Newbie question

2019-07-11 Thread Jerry Vinokurov
Hi Ajay, When a Spark SQL statement references a table, that table has to be "registered" first. Usually the way this is done is by reading in a DataFrame, then calling the createOrReplaceTempView (or one of a few other functions) on that data frame, with the argument being the name under which yo
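
A minimal sketch of that registration flow in PySpark (the file name and view name are illustrative, and spark is assumed to be an existing SparkSession):

    df = spark.read.json("people.json")      # hypothetical input
    df.createOrReplaceTempView("people")     # register the name SQL will reference
    spark.sql("SELECT count(*) FROM people").show()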

Re: Spark Newbie question

2019-07-11 Thread infa elance
Sorry, I guess I hit the send button too soon. This question is regarding a spark stand-alone cluster. My understanding is spark is an execution engine and not a storage layer. Spark processes data in memory, but when someone refers to a spark table created through sparksql(df/rdd), what exactly

Spark Newbie question

2019-07-11 Thread infa elance
This is a stand-alone spark cluster. My understanding is spark is an execution engine and not a storage layer. Spark processes data in memory, but when someone refers to a spark table created through sparksql(df/rdd), what exactly are they referring to? Could it be a Hive table? If yes, is it the same

Re: Newbie question on how to extract column value

2018-08-07 Thread James Starks
Because of some legacy issues I can't immediately upgrade the spark version. But I tried filtering the data before loading it into spark, based on the suggestion: val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name").load() df
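
A sketch of that pushdown pattern in PySpark, with placeholder connection options (the aliased subquery runs in the database, so rows with empty urls never reach Spark):

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://host:5432/db")    # placeholder
          .option("user", "user").option("password", "pass")  # placeholders
          .option("dbtable", "(select id, url from table_a where url <> '') t")
          .load())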

Re: Newbie question on how to extract column value

2018-08-07 Thread Gourav Sengupta
Hi James, It is always advisable to use the latest SPARK version. That said, can you please give dataframes and udfs a try, if possible? I think that would be a much more scalable way to address the issue. Also, where possible, it is always advisable to use the filter option before fetching the d

Newbie question on how to extract column value

2018-08-07 Thread James Starks
I am very new to Spark. I just successfully set up Spark SQL connecting to a postgresql database, and am able to display the table with sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show() Now I want to perform filter and map functions on the col_b value. In plain scala it would

Re: newbie question about RDD

2016-11-22 Thread Mohit Durgapal
Hi Raghav, Please refer to the following code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp"); //creating java spark context JavaSparkContext sc = new JavaSparkContext(sparkConf); //reading file from hdfs into spark rdd, the name node is localhost JavaRDD p
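
For comparison, a PySpark sketch of the same idea (the HDFS path and comma-delimited layout are assumptions):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[2]").setAppName("PersonApp")
    sc = SparkContext(conf=conf)
    # parse each line into a (uuid, first_name, last_name, zip) tuple
    persons = (sc.textFile("hdfs://localhost:9000/data/persons.txt")
                 .map(lambda line: line.split(","))
                 .map(lambda f: (int(f[0]), f[1], f[2], f[3])))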

Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the spark context here? I have the hdfs directory path of the files, as well as the name node of the hdfs cluster. Thanks for your help.

newbie question about RDD

2016-11-21 Thread Raghav
Hi I am extremely new to Spark. I have to read a file from HDFS and get it in memory in RDD format. I have a Java class as follows: class Person { private long UUID; private String FirstName; private String LastName; private String zip; // public methods } The file in HDFS

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Jon Gregg

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Rishikesh Teke
Integrate spark with apache zeppelin https://zeppelin.apache.org/ ; it's again a very handy way to bootstrap with spark.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-10 Thread jggg777
an Elastic MapReduce cluster with Spark pre-installed, but you'll need to sign up for an AWS account.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav

Re: Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread warmb...@qq.com
I would start with the Spark documentation, really. Then you would probably start with some older videos from youtube, especially spark summit 2014, 2015 and 2016 videos. Rega

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread ayan guha

Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread raghav
some guidance for starter material, or videos. Thanks. Raghav

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
new SparkConf() is called from main. Top few lines of the exception are pasted below. These are the following versions: Spark jar: spark-assembly-1.6.0-hadoop2.6.0.jar pom: spark-core_2.11 1.6.0 I h

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Ted Yu

Newbie question - Help with runtime error on augmentString

2016-03-11 Thread vasu20
Ljava/lang/String;)Ljava/lang/String; at org.apache.spark.util.Utils$.<init>(Utils.scala:1682) at org.apache.spark.util.Utils$.<clinit>(Utils.scala) at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)

Re: Newbie question

2016-01-07 Thread dEEPU
If the method is not final or static then you can.

Re: Newbie question

2016-01-07 Thread yuliya Feldman
Thank you

Re: Newbie question

2016-01-07 Thread censj
You can try it.

Re: Newbie question

2016-01-07 Thread yuliya Feldman
e.org" Sent: Thursday, January 7, 2016 10:38 PM Subject: Re: Newbie question why to override a method from SparkContext? 在 2016年1月8日,14:36,yuliya Feldman 写道: Hello, I am new to Spark and have a most likely basic question - can I override a method from SparkContext? Thanks

Re: Newbie question

2016-01-07 Thread Deepak Sharma
Yes, you can do it unless the method is marked static/final. Most of the methods in SparkContext are marked static, so you can't override them; otherwise, overriding would usually work. Thanks Deepak
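
The static/final caveat is a JVM-side concern; on the Python side, a subclass sketch would look like this (the method signature follows the PySpark API, the logging behavior is purely illustrative):

    from pyspark import SparkContext

    class LoggingSparkContext(SparkContext):
        # override a non-final method and delegate to the parent
        def textFile(self, name, minPartitions=None, use_unicode=True):
            print("loading", name)
            return super().textFile(name, minPartitions, use_unicode)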

Re: Newbie question

2016-01-07 Thread censj
Why do you want to override a method from SparkContext?

Newbie question

2016-01-07 Thread yuliya Feldman
Hello, I am new to Spark and have a most likely basic question - can I override a method from SparkContext? Thanks

Spark ML/MLib newbie question

2015-10-19 Thread George Paulson

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Corey Nolet
1) Spark only needs to shuffle when data needs to be partitioned around the workers in an all-to-all fashion. 2) Multi-stage jobs that would normally require several MapReduce jobs, with data dumped to disk between them, can instead keep intermediate results cached in memory.
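
A small PySpark illustration of point 2, where one cached RDD feeds two jobs without being recomputed or re-read from disk (the input path is hypothetical):

    words = sc.textFile("hdfs:///data/corpus.txt").flatMap(lambda line: line.split())
    words.cache()                          # keep the parsed RDD in memory
    total = words.count()                  # first job materializes and caches it
    vocabulary = words.distinct().count()  # second job reuses the cached data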

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Hien Luu
This blog outlines a few things that make Spark faster than MapReduce - https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html

Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Muler
Consider the classic word count application over a 4 node cluster with a sizable working data set. What makes Spark run faster than MapReduce, considering that Spark also has to write to disk during shuffle?

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Thanks!

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Yes, finally shuffle data will be written to disk for the reduce stage to pull, no matter how large you set the shuffle memory fraction. Thanks Saisai

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
thanks, so if I have large enough memory (with enough spark.shuffle.memory) then shuffle spill doesn't happen (per node; the shuffle stays in memory), but shuffle data still has to be ultimately written to disk so that the reduce stage can pull it across the network?

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Hi Muler, Shuffle data will be written to disk no matter how large a memory you have; large memory can alleviate shuffle spill, where temporary files are generated when memory is not enough. Yes, each node writes shuffle data to file, and it is pulled from disk in the reduce stage through the network framework (d

Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Hi, Consider I'm running WordCount with 100m of data on a 4 node cluster. Assuming my RAM size on each node is 200g and I'm giving my executors 100g (just enough memory for 100m of data): 1. If I have enough memory, can Spark 100% avoid writing to disk? 2. During shuffle, where results have to b

Re: MLlib/kmeans newbie question(s)

2015-03-09 Thread Xiangrui Meng
You need to change `== 1` to `== i`. `println(t)` happens on the workers, which may not be what you want. Try the following: noSets.filter(t => model.predict(Utils.featurize(t)) == i).collect().foreach(println) -Xiangrui On Sat, Mar 7, 2015 at 3:20 PM, Pierce Lamb wrote: > Hi all, > > I'm very

MLlib/kmeans newbie question(s)

2015-03-07 Thread Pierce Lamb
Hi all, I'm very new to machine learning algorithms and Spark. I'm following the Twitter Streaming Language Classifier found here: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/README.html Specifically this code: http://databricks.gitbooks.io/data

Re: Newbie Question on How Tasks are Executed

2015-01-19 Thread davidkl
Hello Mixtou, if you want to look at the partition ID, I believe you want to use mapPartitionsWithIndex.
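
A PySpark sketch of tagging each element with its partition ID this way (assuming an existing SparkContext sc):

    rdd = sc.parallelize(range(10), 3)

    def tag(index, iterator):
        # index is the partition ID; iterator yields that partition's elements
        return ((index, x) for x in iterator)

    rdd.mapPartitionsWithIndex(tag).collect()
    # e.g. [(0, 0), (0, 1), (0, 2), (1, 3), ...]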

Newbie Question on How Tasks are Executed

2015-01-09 Thread mixtou
; } def estimateGuaranteedFrequentWords(): Unit = { frequent_words_counters.foreach{tuple => if (tuple._2(0) - tuple._2(1) < words_no*fi) { guaranteed_words -= tuple._1; } else { System.out.println("Guaranteed Word : "+tuple._1+" with co

Re: A spark newbie question

2015-01-04 Thread Sanjay Subramanian
learning process :-) Plus IMHO, if you are planning on learning Spark, I would say YES to Scala and NO to Java. Yes, it's a different paradigm, but having been a Java and Hadoop programmer for many years, I am excited to learn Scala as the language and use Spark. It's exciting. regards sanjay

Re: A spark newbie question

2015-01-04 Thread Aniket Bhatnagar
Go through the spark API documentation. Basically you have to do a group by (date, message_type) and then do a count.
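
In DataFrame terms, the suggestion looks roughly like this (column names are taken from the question; df is assumed to already hold the Cassandra table, and truncating the timestamp to a date is an assumption):

    from pyspark.sql import functions as F

    counts = (df.withColumn("date", F.to_date("message_timestamp"))
                .groupBy("date", "message_type")
                .count())
    counts.show()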

A spark newbie question on summary statistics

2015-01-04 Thread anondin
A spark cassandra newbie question. Appreciate the help. I have a cassandra table with 2 columns message_timestamp(timestamp) and message_type(text). The data is of the form 2014-06-25 12:01:39 "START" 2014-06-25 12:02:39 "START" 2014-06-25 12:02:39 "PAUSE"

A spark newbie question

2015-01-04 Thread Dinesh Vallabhdas
A spark cassandra newbie question. Thanks in advance for the help. I have a cassandra table with 2 columns message_timestamp(timestamp) and message_type(text). The data is of the form 2014-06-25 12:01:39 "START" 2014-06-25 12:02:39 "START" 2014-06-25 12:02:39 "PAUSE"

Newbie Question

2014-12-11 Thread Fernando O.
Hi guys, I'm planning to use spark on a project and I'm facing a problem: I couldn't find a log that explains what's wrong with what I'm doing. I have 2 vms that run a small hadoop (2.6.0) cluster. I added a file that has 50 lines of json data. Compiled spark, all tests passed, I ran some si

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Akhil Das

Re: newbie question quickstart example sbt issue

2014-10-28 Thread nl19856
[warn] Note: Unresolved dependencies path: [warn] org.apache.spark:spark-core_2.10:1.1.0 (/root/simple.sbt#L7-8) [warn] +- simple-project:simple-project_2.10:1.0 sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.10;1.1.0: not found

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Yanbo Liang

newbie question quickstart example sbt issue

2014-10-28 Thread nl19856

JDBC Connections / newbie question

2014-07-20 Thread Ahmed Ibrahim
Hi All, In a JAVA based scenario where we have a large Oracle DB and want to use spark to do some distributed analysis on the data, how exactly do we go about defining a JDBC connection and querying the data? thanks, -- Ahmed Osama Ibrahim ITSC International Technology
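
A hedged sketch of how this is typically done with the DataFrame JDBC source, shown in PySpark for brevity (the Oracle URL, table, credentials, and column name are placeholders, and the Oracle JDBC driver jar must be on the classpath):

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # placeholder
          .option("dbtable", "SCHEMA.SOME_TABLE")                 # placeholder
          .option("user", "scott").option("password", "tiger")    # placeholders
          .option("driver", "oracle.jdbc.OracleDriver")
          .load())
    df.groupBy("SOME_COLUMN").count().show()  # hypothetical analysis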