My last email wasn't clear, please ignore it.
Thanks, it looks better now (no errors or exceptions).
My problem now is how to set the Spark conf on my pipeline. This is what I
have:
conf = SparkConf()
conf.setAppName("InsightEdge Python Example")
conf.set("my.field1", "XXX")
conf.set("my.field2", "YYY")
How can I pass it to my pipeline? I guess via the command line?
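Or maybe something like this in the Java code? (Just a sketch based on the
Spark runner docs, I haven't tried it; SparkContextOptions,
setUsesProvidedSparkContext and setProvidedSparkContext are my guess at the
right API, not something confirmed in this thread.)

// assumes org.apache.beam.runners.spark.SparkContextOptions on the classpath
SparkConf conf = new SparkConf();
conf.setAppName("InsightEdge Python Example");
conf.set("my.field1", "XXX");
conf.set("my.field2", "YYY");

SparkContextOptions options =
    PipelineOptionsFactory.fromArgs(args).as(SparkContextOptions.class);
options.setRunner(SparkRunner.class);
options.setUsesProvidedSparkContext(true);                   // reuse my own context
options.setProvidedSparkContext(new JavaSparkContext(conf)); // built from the conf above
Pipeline pipeline = Pipeline.create(options);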
Thanks Tal
On Wed, Sep 27, 2017 at 1:45 PM, tal m <[email protected]> wrote:
> Thanks, it looks better now (no errors or exceptions).
> My problem now is how to set the Spark conf on my pipeline. This is what I
> have:
>
> conf = SparkConf()
> conf.setAppName("InsightEdge Python Example")
> conf.set("spark.insightedge.space.name", "insightedge-space")
> conf.set("spark.insightedge.space.lookup.group", "insightedge")
> conf.set("spark.insightedge.space.lookup.locator", "127.0.0.1:4174")
>
>
>
> On Tue, Sep 26, 2017 at 3:46 PM, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> Yes, you need a coder for your Product class if it is passed to the output
>> for the next "apply". You can register it on the pipeline or through the
>> Beam SPI. Here is a sample using Java serialization:
>>
>> pipeline.getCoderRegistry().registerCoderForClass(Product.class,
>>         SerializableCoder.of(Product.class));
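>>
>> As an alternative (a quick sketch, assuming your Product class implements
>> Serializable), you can annotate the class itself so the coder is picked up
>> everywhere it is used:
>>
>> // org.apache.beam.sdk.coders.DefaultCoder / SerializableCoder
>> @DefaultCoder(SerializableCoder.class)
>> public class Product implements Serializable {
>>     // fields and constructors unchanged
>> }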
>>
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>> <https://blog-rmannibucau.rhcloud.com> | Old Blog
>> <http://rmannibucau.wordpress.com> | Github
>> <https://github.com/rmannibucau> | LinkedIn
>> <https://www.linkedin.com/in/rmannibucau> | JavaEE Factory
>> <https://javaeefactory-rmannibucau.rhcloud.com>
>>
>> 2017-09-26 14:37 GMT+02:00 tal m <[email protected]>:
>>
>>> Hi,
>>> I tried what you wrote. The arguments that I use are:
>>> --sparkMaster=local --runner=SparkRunner
>>> I already have Spark running.
>>> Now I'm getting the following error:
>>>
>>>
>>> .IllegalStateException: Unable to return a default Coder for
>>> ParDo(Anonymous)/ParMultiDo(Anonymous).out0 [PCollectio
>>>
>>>
>>> Thanks Tal
>>>
>>>
>>>
>>> On Tue, Sep 26, 2017 at 12:32 PM, Romain Manni-Bucau
>>> <[email protected]> wrote:
>>>
>>>> Hi Tal,
>>>>
>>>> Did you try something like this:
>>>>
>>>> public static void main(final String[] args) {
>>>>     final Pipeline pipeline = Pipeline.create(
>>>>             PipelineOptionsFactory.fromArgs(args).create());
>>>>
>>>>     // GenerateSequence emits Long values, so the DoFn input type is Long
>>>>     pipeline.apply(GenerateSequence.from(0).to(100000L))
>>>>             .apply(ParDo.of(new DoFn<Long, Product>() {
>>>>                 @ProcessElement
>>>>                 public void onElement(final ProcessContext context) {
>>>>                     final long i = context.element();
>>>>                     context.output(new Product((int) i, "Product #" + i));
>>>>                 }
>>>>             }));
>>>>
>>>>     pipeline.run();
>>>> }
>>>>
>>>>
>>>> Then it is just a matter of having the Beam dependency matching your
>>>> runner (target environment). For testing, the direct runner is enough, but
>>>> to run on Spark you will need to import the Spark runner as a dependency.
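>>>>
>>>> For example (assuming Beam 2.1.0 here, adjust to your release), the Maven
>>>> coordinates would be something like
>>>> org.apache.beam:beam-runners-spark:2.1.0 for Spark, and
>>>> org.apache.beam:beam-runners-direct-java for local testing.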
>>>>
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>>>> <https://blog-rmannibucau.rhcloud.com> | Old Blog
>>>> <http://rmannibucau.wordpress.com> | Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | JavaEE Factory
>>>> <https://javaeefactory-rmannibucau.rhcloud.com>
>>>>
>>>> 2017-09-26 11:02 GMT+02:00 tal m <[email protected]>:
>>>>
>>>>> Hi,
>>>>> I looked at the links you sent me, and I haven't found any clue about how
>>>>> to adapt them to my current code.
>>>>> My code is very simple:
>>>>>
>>>>> val sc = spark.sparkContext
>>>>>
>>>>> val productsNum = 100000
>>>>> println(s"Saving $productsNum products RDD to the space")
>>>>> val rdd = sc.parallelize(1 to productsNum).map { i =>
>>>>>   Product(i, "Description of product " + i, Random.nextInt(10),
>>>>>     Random.nextBoolean())
>>>>> }
>>>>>
>>>>> Is it that simple to use Beam instead of SparkContext? I'm not familiar
>>>>> with Spark at all, so I have no idea what the Spark runner is or how I
>>>>> can use it in my case. I just need to make it work :).
>>>>>
>>>>> Thanks Tal
>>>>>
>>>>>
>>>>> On Tue, Sep 26, 2017 at 11:57 AM, Aviem Zur <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Tal,
>>>>>>
>>>>>> Thanks for reaching out!
>>>>>>
>>>>>> Please take a look at our documentation:
>>>>>>
>>>>>> Quickstart guide (Java): https://beam.apache.org/get-started/quickstart-java/
>>>>>> This guide will show you how to run our wordcount example using any of
>>>>>> the runners (for example, the direct runner or the Spark runner in your
>>>>>> case).
>>>>>>
>>>>>> More reading:
>>>>>> Programming guide: https://beam.apache.org/documentation/programming-guide/
>>>>>> Spark runner: https://beam.apache.org/documentation/runners/spark/
>>>>>>
>>>>>> Please let us know if you have further questions, and good luck with
>>>>>> your first try of Beam!
>>>>>>
>>>>>> Aviem.
>>>>>>
>>>>>> On Tue, Sep 26, 2017 at 11:47 AM tal m <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I'm new to Spark and also to Beam.
>>>>>>> Currently I have Java code that uses Spark for reading some data
>>>>>>> from a DB.
>>>>>>> My Spark code uses SparkSession.builder (.....) and also
>>>>>>> sparkContext.
>>>>>>> How can I make Beam work similarly to my current code? I just want to
>>>>>>> make it work for now.
>>>>>>> Thanks Tal
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>