Re: using SparkContext in map function: Task not serializable error
method1 looks like this (reRDD has userIds):

reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir)

def method1(sc: SparkContext, userId: String): String = {
  sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId)
  // ... do something
  "Test"
}

On Wed, Jan 20, 2016 at 11:00 AM, Shixiong (Ryan) Zhu <shixi...@databricks.com> wrote:
> You should not use SparkContext or RDD directly in your closures. [...]
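Following the suggestion below, the per-userId `.where()` lookup in method1 could be collapsed into a single join. This is only a sketch under assumptions about your schema: it assumes reRDD holds plain userId strings and that "userid" (the column name from method1) is readable via getString on the connector's CassandraRow.

```scala
// Sketch: scan the table once and join against reRDD, instead of issuing
// one .where() query per row inside a closure that captures sc.
val cassandraByUser = sc.cassandraTable("Keyspace", "Table2")
  .keyBy(row => row.getString("userid"))

reRDD.map(userId => (userId, ()))   // key reRDD by userId
  .join(cassandraByUser)            // => (userId, ((), cassandraRow))
  .map { case (userId, (_, row)) =>
    // ... do something with row ...
    "Test"
  }
  .saveAsTextFile(outputDir)
```

The join runs entirely through Spark's driver-side planning, so no SparkContext reference ever needs to cross into a task closure.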
Re: using SparkContext in map function: Task not serializable error
You should not use SparkContext or RDD directly in your closures.

Could you show the code of "method1"? Maybe you only need a join or something else. E.g.:

val cassandraRDD = sc.cassandraTable("keySpace", "tableName")
reRDD.join(cassandraRDD).map(...).saveAsTextFile(outputDir)

On Tue, Jan 19, 2016 at 4:12 AM, Ricardo Paiva <ricardo.pa...@corp.globo.com> wrote:
> Did you try SparkContext.getOrCreate()? [...]
Re: using SparkContext in map function: Task not serializable error
Did you try SparkContext.getOrCreate()?

You don't need to pass the SparkContext to the map function; you can retrieve it from the SparkContext singleton.

Regards,

Ricardo

On Mon, Jan 18, 2016 at 6:29 PM, gpatcham [via Apache Spark User List] <ml-node+s1001560n25998...@n3.nabble.com> wrote:
> Hi, I have a use case where I need to pass sparkcontext in map function [...]
using SparkContext in map function: Task not serializable error
Hi,

I have a use case where I need to pass the SparkContext in a map function:

reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir)

method1 needs the SparkContext to query Cassandra, but I see the error below:

java.io.NotSerializableException: org.apache.spark.SparkContext

Is there a way we can fix this?

Thanks
Re: using SparkContext in map function: Task not serializable error
Can you pass the properties which are needed for accessing Cassandra without going through SparkContext?

SparkContext isn't designed to be used in the way illustrated below.

Cheers

On Mon, Jan 18, 2016 at 12:29 PM, gpatcham <gpatc...@gmail.com> wrote:
> Hi, I have a use case where I need to pass sparkcontext in map function [...]
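One way to act on this suggestion is to query Cassandra from inside the closure with the plain DataStax Java driver instead of SparkContext. A sketch under assumptions: the Java driver is on the executor classpath, and "cassandra-host", "Keyspace", and "Table2" are placeholder names from this thread, not a verified setup.

```scala
import com.datastax.driver.core.Cluster

// Sketch: open one driver session per partition (never per row), run the
// per-userId query there, and close the session once the partition is done.
val results = reRDD.mapPartitions { userIds =>
  val cluster = Cluster.builder().addContactPoint("cassandra-host").build()
  val session = cluster.connect("Keyspace")
  val out = userIds.map { userId =>
    val rs = session.execute("SELECT * FROM Table2 WHERE userid = ?", userId)
    // ... do something with rs ...
    "Test"
  }.toList // materialize before closing the session
  session.close()
  cluster.close()
  out.iterator
}
results.saveAsTextFile(outputDir)
```

Only the connection properties (strings) are captured by the closure, so nothing non-serializable crosses the driver/executor boundary.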
Re: using SparkContext in map function: Task not serializable error
I'm using the spark-cassandra-connector to do this, and the way we access the Cassandra table is:

sc.cassandraTable("keySpace", "tableName")

Thanks
Giri

On Mon, Jan 18, 2016 at 12:37 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Can you pass the properties which are needed for accessing Cassandra without going through SparkContext? [...]
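Since the spark-cassandra-connector is already in use, its joinWithCassandraTable RDD method is built for exactly this lookup-per-key pattern. A sketch, assuming reRDD holds userIds and that "userid" is the table's partition key (an assumption about your schema):

```scala
import com.datastax.spark.connector._

// Sketch: the connector joins each userId against the table server-side,
// so no SparkContext is captured by any closure.
reRDD.map(userId => Tuple1(userId))              // one-column join key
  .joinWithCassandraTable("Keyspace", "Table2")  // => (Tuple1(userId), CassandraRow)
  .map { case (_, row) =>
    // ... do something with row ...
    "Test"
  }
  .saveAsTextFile(outputDir)
```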
Re: using SparkContext in map function: Task not serializable error
Can we use @transient ?

On Mon, Jan 18, 2016 at 12:44 PM, Giri P <gpatc...@gmail.com> wrote:
> I'm using spark cassandra connector to do this and the way we access cassandra table is sc.cassandraTable("keySpace", "tableName") [...]
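For reference on what @transient can and cannot do here: it only excludes a field from Java serialization, and the field comes back as null on the deserializing side, so it would not make a SparkContext usable inside a task. A self-contained sketch of the mechanism, where FakeContext is a stand-in for the non-serializable SparkContext:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for SparkContext: a class that is not Serializable.
class FakeContext

// Capturing the context directly makes serialization fail...
class BadHolder(val ctx: FakeContext) extends Serializable

// ...while @transient skips the field entirely (it is null after
// deserialization, which is why this doesn't help inside executors).
class OkHolder(@transient val ctx: FakeContext) extends Serializable

def serializes(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: java.io.NotSerializableException => false
  }
```

So marking the captured sc @transient would silence the NotSerializableException but leave the task with a null context at runtime.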
Re: using SparkContext in map function: Task not serializable error
Did you mean constructing SparkContext on the worker nodes?

Not sure whether that would work. It doesn't seem to be good practice.

On Mon, Jan 18, 2016 at 1:27 PM, Giri P <gpatc...@gmail.com> wrote:
> Can we use @transient ? [...]
Re: using SparkContext in map function: Task not serializable error
Yes, I tried doing that but it doesn't work.

I'm looking at using SQLContext and DataFrames. Is SQLContext serializable?

On Mon, Jan 18, 2016 at 1:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Did you mean constructing SparkContext on the worker nodes? [...]
Re: using SparkContext in map function: Task not serializable error
class SQLContext private[sql](
    @transient val sparkContext: SparkContext,
    @transient protected[sql] val cacheManager: CacheManager,
    @transient private[sql] val listener: SQLListener,
    val isRootContext: Boolean)
  extends org.apache.spark.Logging with Serializable {

FYI

On Mon, Jan 18, 2016 at 1:44 PM, Giri P <gpatc...@gmail.com> wrote:
> yes I tried doing that but that doesn't work. I'm looking at using SQLContext and dataframes. Is SQLContext serializable? [...]
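As the declaration above shows, SQLContext marks its SparkContext @transient and extends Serializable, so capturing it won't throw, though its methods still only work on the driver. A driver-side DataFrame sketch of the same lookup, under assumptions: the connector's Data Source format string and option names are as in its documentation, and toDF requires `import sqlContext.implicits._`.

```scala
import sqlContext.implicits._

// Sketch: load the Cassandra table as a DataFrame on the driver and join,
// rather than querying from inside a closure. "Keyspace"/"Table2"/"userid"
// are the names used in this thread.
val usersDF = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "Keyspace", "table" -> "Table2"))
  .load()

val reDF = reRDD.toDF("userid")
reDF.join(usersDF, Seq("userid"))
  .rdd
  .map { row =>
    // ... do something with row ...
    "Test"
  }
  .saveAsTextFile(outputDir)
```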