Re: Using Spark as a fail-over platform for Java app

2021-03-12 Thread Jungtaek Lim
That's what resource managers provide to you. So you can code against and deal with resource managers, but I assume you're finding ways to not deal with resource managers directly and let Spark do it instead. I admit I have no experience (I did something similar with Apache Storm on a standalone setup 5+ years ago…

Re: Using Spark as a fail-over platform for Java app

2021-03-12 Thread Lalwani, Jayesh
Can I cut a steak with a hammer? Sure you can, but the steak would taste awful. Do you have organizational/bureaucratic issues with using a Load Balancer? Because that's what you really need. Run your application on multiple nodes with a load balancer in front. When a node crashes, the load balancer…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
…I do it this way: Dataset productUpdates = watermarkedDS.groupByKey( …

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
;>>> > > > > > > appConfig, accumulators), >>>>> > > > > > > >>>>> Encoders.bean(ModelStateInfo.class), >>>>> > > > > > > Encoders.bean(ModelUpdate.class), >>>>>

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
…Yes, that's exactly how I am creating them. Question... Are you using 'Stateful Structured Streaming' …

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
…ProcessingTimeTimeout())( updateAcrossEvents ) And updating the Accumulator inside 'updateAcrossEvents'? We…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…specific Accumulators. Actually I am getting the values printed in my driver log as well as sent to Grafana. Not…

Re: Using Spark Accumulators with Structured Streaming

2020-06-07 Thread Something Something
…same for cluster mode as well. Create accumulators like this: AccumulatorV2 accumulator = sparkContext.longAccumulator(name); …
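
A minimal Scala sketch of the pattern this thread keeps quoting, assuming hypothetical Event/StateInfo types and a rate source; the accumulator is created once on the driver, added to inside the state-update function on the executors, and its value is only meaningful when read back on the driver:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

    case class Event(id: String, value: Long)
    case class StateInfo(total: Long)

    object AccumulatorSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("acc-sketch").getOrCreate()
        import spark.implicits._

        // Named accumulator, registered on the driver.
        val eventCount = spark.sparkContext.longAccumulator("eventCount")

        // Runs on the executors; printing eventCount.value here shows 0 on a cluster.
        def updateAcrossEvents(id: String, events: Iterator[Event],
                               state: GroupState[StateInfo]): StateInfo = {
          val batch = events.toSeq
          eventCount.add(batch.size)
          val next = StateInfo(state.getOption.map(_.total).getOrElse(0L) +
            batch.map(_.value).sum)
          state.update(next)
          next
        }

        val events = spark.readStream.format("rate").load()
          .selectExpr("CAST(value AS STRING) AS id", "value").as[Event]

        events.groupByKey(_.id)
          .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(updateAcrossEvents)
          .writeStream.format("console").outputMode("update").start()
          .awaitTermination()
      }
    }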

Re: Using Spark Accumulators with Structured Streaming

2020-06-03 Thread ZHANG Wei
…your code? I am talking about the Application Specific Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly! …

Re: Using Spark Accumulators with Structured Streaming

2020-06-01 Thread ZHANG Wei
…Even for me it comes as 0 when I print in OnQueryProgress. I use LongAccumulator as well. Yes, it prints on my local but not on cluster. But one consolation is…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…computed in your code? I am talking about the Application Specific Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly! …

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
…On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote: Hello, Even for me it comes as 0 when I print in OnQuery…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…LongAccumulator as well. Yes, it prints on my local but not on cluster. But one consolation is that when I send metrics to Graphana, the values are coming there. On Tue, May 26, 2020 at 3:10 AM Somet…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
…May 26, 2020 at 3:10 AM Something Something <mailinglist...@gmail.com> wrote: No this is not working even if I use LongAccumulator. On Fri, May 15, 2020 at 9:54 PM ZHANG Wei wrote: …

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
…<mailinglist...@gmail.com> wrote: No this is not working even if I use LongAccumulator. On Fri, May 15, 2020 at 9:…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
…On Fri, May 15, 2020 at 9:54 PM ZHANG Wei wrote: There is a restriction in AccumulatorV2 API [1], the OUT type should be atomic or thread safe. I'm wondering if the implementation…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
…safe. I'm wondering if the implementation for `java.util.Map[T, Long]` can meet it or not. Is there any chance to replace CollectionLongAccumulator by CollectionAccumulator[2] or LongAccumulator[3]…
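
A tiny sketch of the swap ZHANG Wei proposes ([2] and [3] refer to the links quoted below); both built-ins handle concurrent updates, unlike a plain java.util.Map-backed OUT type:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("builtin-acc").getOrCreate()
    val counts = spark.sparkContext.longAccumulator("counts")             // [3]
    val seen = spark.sparkContext.collectionAccumulator[String]("seen")   // [2]

    spark.sparkContext.parallelize(Seq("a", "b", "a")).foreach { s =>
      counts.add(1L)  // thread-safe adds on the executors
      seen.add(s)
    }
    println(counts.value)  // 3, read back on the driver
    println(seen.value)    // ["a", "b", "a"] in some order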

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Something Something
…`java.util.Map[T, Long]` can meet it or not. Is there any chance to replace CollectionLongAccumulator by CollectionAccumulator[2] or LongAccumulator[3] and test if the StreamingLi…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2 …

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
…http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2 …

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…to work? --- Cheers, -z [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2 [2]…

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Something Something
…[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2 [2] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.CollectionAccumulator …

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
…index.html#org.apache.spark.util.AccumulatorV2 [2] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.CollectionAccumulator [3] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator …

Re: Using Spark Accumulators with Structured Streaming

2020-05-26 Thread Something Something
…http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.CollectionAccumulator [3] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Somethi…

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
…[3] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streami…

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Something Something
…From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming Can someone from the Spark development team tell me if this functionality is supported and tested? I've spent a l…

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread ZHANG Wei
…http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming Can someone from Spark Develo…

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread Something Something
Can someone from the Spark development team tell me if this functionality is supported and tested? I've spent a lot of time on this but can't get it to work. Just to add more context: we have our own Accumulator class that extends AccumulatorV2. In this class we keep track of one or more accumulator…
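
A hypothetical sketch of such a class, written with ZHANG Wei's thread-safety caveat in mind: the OUT type is a ConcurrentHashMap rather than a plain java.util.Map, and every mutation goes through an atomic merge:

    import java.util.concurrent.ConcurrentHashMap
    import java.util.function.BiFunction
    import org.apache.spark.util.AccumulatorV2

    class NamedCountAccumulator
        extends AccumulatorV2[(String, Long), ConcurrentHashMap[String, java.lang.Long]] {

      private val map = new ConcurrentHashMap[String, java.lang.Long]()
      private val plus = new BiFunction[java.lang.Long, java.lang.Long, java.lang.Long] {
        override def apply(a: java.lang.Long, b: java.lang.Long): java.lang.Long = a + b
      }

      override def isZero: Boolean = map.isEmpty
      override def reset(): Unit = map.clear()
      override def add(v: (String, Long)): Unit = map.merge(v._1, v._2, plus)

      override def copy(): NamedCountAccumulator = {
        val acc = new NamedCountAccumulator
        acc.map.putAll(map)
        acc
      }

      override def merge(
          other: AccumulatorV2[(String, Long), ConcurrentHashMap[String, java.lang.Long]]): Unit = {
        val it = other.value.entrySet().iterator()
        while (it.hasNext) { val e = it.next(); map.merge(e.getKey, e.getValue, plus) }
      }

      override def value: ConcurrentHashMap[String, java.lang.Long] = map
    }

    // Register once on the driver: sc.register(new NamedCountAccumulator, "myCounts")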

Re: Using Spark as a simulator

2017-07-07 Thread Steve Loughran
On 7 Jul 2017, at 08:37, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote: I only want to simulate a very huge "network" with even millions of parallel, time-synchronized actors (state machines). There is also communication between actors via some (key-value pair) database. I also want th…

Re: Using Spark with Local File System/NFS

2017-06-22 Thread Michael Mior
If you put a * in the path, Spark will look for a file or directory named *. To read all the files in a directory, just remove the star. -- Michael Mior michael.m...@gmail.com On Jun 22, 2017 17:21, "saatvikshah1994" wrote: Hi, I've downloaded and kept the same set of data files on all my…
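
A one-line sketch of the fix, assuming the files sit under /mnt/nfs/dataset at the same path on every node and the spark session from spark-shell:

    val df = spark.read.text("file:///mnt/nfs/dataset")  // the directory itself, no trailing *
    println(df.count())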

RE: Using Spark as a simulator

2017-06-21 Thread Mahesh Sawaiker
…<jornfra...@gmail.com> Sent: 20 June 2017 17:12 To: Esa Heikkinen Cc: user@spark.apache.org Subject: Re: Using Spark as a simulator It is fine, but you have to design it so that generated rows are written in large blocks for optimal performan…

RE: Using Spark as a simulator

2017-06-20 Thread Mahesh Sawaiker
I have already seen an example where data is generated using Spark; no reason to think it's a bad idea as far as I know. You can check the code here; I'm not very sure, but I think there is something there which generates data for the TPC-DS benchmark, and you can provide how much data you want in…

Re: Using Spark as a simulator

2017-06-20 Thread Jörn Franke
It is fine, but you have to design it so that generated rows are written in large blocks for optimal performance. The trickiest part of data generation is the conceptual part, such as probabilistic distributions etc. You have to check as well that you use a good random generator; for some cases…
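
A short sketch of that advice, with assumed column names: seeded generators for reproducibility, and a partition count chosen so the rows land in a modest number of large output blocks:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{rand, randn}

    val spark = SparkSession.builder.appName("datagen").getOrCreate()
    spark.range(0L, 100000000L, 1L, 200)   // 200 partitions -> 200 large files
      .withColumn("uniform", rand(42))     // seeded uniform draw
      .withColumn("gaussian", randn(42))   // seeded normal draw
      .write.mode("overwrite").parquet("/tmp/simulated")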

RE: using spark to load a data warehouse in real time

2017-03-07 Thread Adaryl Wakefield
…www.massstreet.net www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Henry Tremblay <paulhtremb...@gmail.com> Sent: Tuesday, February 28, 2017 3:56 PM To: user@spark.apache.org Subject: Re: using spark to load a data…

RE: using spark to load a data warehouse in real time

2017-03-04 Thread Adaryl Wakefield
…www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Sam Elamin <hussam.ela...@gmail.com> Sent: Wednesday, March 1, 2017 2:29 AM To: Adaryl Wakefield; Jörn Franke Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Hi Adaryl Having come from a W…

RE: using spark to load a data warehouse in real time

2017-03-04 Thread Adaryl Wakefield
…Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time I am not sure that Spark Streaming is what you want. It is for streaming analytics, not for loading into a DWH. You also need to define what realtime means and what is needed there - it will differ…

Re: using spark to load a data warehouse in real time

2017-03-01 Thread Sam Elamin
…<donta...@gmail.com> Sent: Tuesday, February 28, 2017 12:57 PM To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Hi Adaryl, You could definitely load data into a warehouse through Spark's JDBC support through DataFram…

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Jörn Franke
Mohammad Tariq <donta...@gmail.com> Sent: Tuesday, February 28, 2017 12:57 PM To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Hi Adaryl, You could definitely load data into a warehous…

RE: using spark to load a data warehouse in real time

2017-02-28 Thread Adaryl Wakefield
…Mohammad Tariq <donta...@gmail.com> Sent: Tuesday, February 28, 2017 12:57 PM To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Hi Adaryl, You could definitely load data into a warehouse through Spark's JDBC support through DataFrame…
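
A hedged sketch of that JDBC route; the URL, table, and credentials are placeholders, and (per Femi's note elsewhere in the thread) Greenplum may accept the stock Postgres driver since GPDB is Postgres-based:

    import java.util.Properties
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder.appName("dwh-load").getOrCreate()
    val df = spark.read.json("/landing/events")  // whatever the incoming feed is

    val props = new Properties()
    props.setProperty("user", "loader")
    props.setProperty("password", "secret")
    props.setProperty("driver", "org.postgresql.Driver")

    df.write.mode(SaveMode.Append)
      .jdbc("jdbc:postgresql://dwh-host:5432/warehouse", "staging.events", props)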

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Henry Tremblay
…Femi Anthony <femib...@gmail.com> Sent: Tuesday, February 28, 2017 4:13 AM To: Adaryl Wakefield <adaryl.wakefi...@hotmail.com> Cc: user@spark.apache.org Subject: Re: usi…

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
…SELECT statements, no INSERT or MERGE statements. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net www.linkedin.com/in/bobwakefieldmba…

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
…www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Femi Anthony <femib...@gmail.com> Sent: Tuesday, February 28, 2017 4:13 AM To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real…

RE: using spark to load a data warehouse in real time

2017-02-28 Thread Adaryl Wakefield
…<femib...@gmail.com> Sent: Tuesday, February 28, 2017 4:13 AM To: Adaryl Wakefield Cc: user@spark.apache.org Subject: Re: using spark to load a data warehouse in real time Have you checked to see if there are any drivers to enable you to write to Greenplum directly from Spark? You can also take a loo…

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Femi Anthony
Have you checked to see if there are any drivers that enable you to write to Greenplum directly from Spark? You can also take a look at this link: https://groups.google.com/a/greenplum.org/forum/m/#!topic/gpdb-users/lnm0Z7WBW6Q Apparently GPDB is based on Postgres, so maybe that approach may work…

Re: using spark-xml_2.10 to extract data from XML file

2017-02-15 Thread Carlo . Allocca
Hi Hyukjin, Thank you very much for this. Sure, I am going to do it today based on data + Java code. Many thanks for the support. Best Regards, Carlo On 15 Feb 2017, at 00:22, Hyukjin Kwon <gurwls...@gmail.com> wrote: Hi Carlo, There was a bug in lower versions when accessing nest…

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Hyukjin Kwon
Hi Carlo, There was a bug in lower versions when accessing nested values in the library. Otherwise, I suspect another issue with parsing malformed XML. Could you maybe open an issue at https://github.com/databricks/spark-xml/issues with your sample data? I will stick with it until it is so…

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
more specifically: Given the following XML data structure (this is the structure of the XML file): xocs:doc |-- xocs:item: struct (nullable = true) | |-- bibrecord: struct (nullable = true) | | |-- head: struct (nullable = true) | | | |-- abstracts: struct (nullable = true) …
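
A short sketch against that printed schema, assuming a Spark 2.x session and a placeholder file name (recent spark-xml versions fixed the nested-access bug Hyukjin mentions):

    val doc = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "xocs:doc")
      .load("/data/sample.xml")

    // Dot paths walk the nested structs; backticks guard the colon in the tag name.
    doc.select("`xocs:item`.bibrecord.head.abstracts").show(false)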

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
Dear All, I would like to ask for your help with the following issue when using spark-xml_2.10. Given an XML file with the following structure: xocs:doc |-- xocs:item: struct (nullable = true) | |-- bibrecord: struct (nullable = true) | | |-- head: struct (nullable = true) | | |…

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-30 Thread Steve Loughran
On 29 Sep 2016, at 10:37, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote: I know that the code itself would not be the same, but it would be useful to at least have the pom/build.sbt transitive dependencies different when fetching the artifact with a specific classifier, don't…

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-29 Thread Sean Owen
No, I think that's what dependencyManagement (or equivalent) is definitely for. On Thu, Sep 29, 2016 at 5:37 AM, Olivier Girardot wrote: I know that the code itself would not be the same, but it would be useful to at least have the pom/build.sbt transitive dependencies different when fetching…

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-29 Thread Olivier Girardot
I know that the code itself would not be the same, but it would be useful to at least have the pom/build.sbt transitive dependencies different when fetching the artifact with a specific classifier, don't you think? For now I've overridden them myself using the dependency versions defined in the pom.
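
In sbt that override looks roughly like this (a sketch; the artifact list should mirror whatever Spark's hadoop-2.6 pom profile actually pins):

    // build.sbt
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

    // Force the transitive Hadoop client onto the 2.6 line.
    dependencyOverrides ++= Set(
      "org.apache.hadoop" % "hadoop-client" % "2.6.5"
    )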

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-28 Thread Sean Owen
I guess I'm claiming the artifacts wouldn't even be different in the first place, because the Hadoop APIs that are used are all the same across these versions. That would be the thing that makes you need multiple versions of the artifact under multiple classifiers. On Wed, Sep 28, 2016 at 1:16 PM,

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-28 Thread Olivier Girardot
ok, don't you think it could be published with just different classifiers: hadoop-2.6, hadoop-2.4, hadoop-2.2 being the current default? So for now, I should just override Spark 2.0.0's dependencies with the ones defined in the pom profile. On Thu, Sep 22, 2016 11:17 AM, Sean Owen so...@cloudera.…

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-22 Thread Sean Owen
There can be just one published version of the Spark artifacts and they have to depend on something, though in truth they'd be binary-compatible with anything 2.2+. So you merely manage the dependency versions up to the desired version in your <dependencyManagement>. On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot <…

Re: Using Spark SQL to Create JDBC Tables

2016-09-13 Thread ayan guha
I did not install it myself, as it is part of Oracle's product. However, you can bring in any SerDe yourself and add it to the library. See this blog for more information. On Wed, Sep 14, 2016 at 2:15 PM, Benjamin Kim wrote: …

Re: Using Spark SQL to Create JDBC Tables

2016-09-13 Thread Benjamin Kim
Thank you for the idea. I will look for a PostgreSQL SerDe for Hive. But, if you don't mind me asking, how did you install the Oracle SerDe? Cheers, Ben On Sep 13, 2016, at 7:12 PM, ayan guha wrote: One option is to have Hive as the central point of exposing data, ie create hive tables w…

Re: Using Spark SQL to Create JDBC Tables

2016-09-13 Thread ayan guha
One option is to have Hive as the central point of exposing data, ie create Hive tables which "point to" any other DB. I know Oracle provides their own SerDe for Hive; not sure about PG though. Once tables are created in Hive, STS will automatically see them. On Wed, Sep 14, 2016 at 11:08 AM, Benjamin…

Re: Using spark package XGBoost

2016-09-08 Thread janardhan shetty
Tried to implement the spark package in 2.0: https://spark-packages.org/package/rotationsymmetry/sparkxgboost but it is throwing the error: error: not found: type SparkXGBoostClassifier On Tue, Sep 6, 2016 at 11:26 AM, janardhan shetty wrote: Is this merged to Spark ML? If so, which version? …

Re: Using spark package XGBoost

2016-09-06 Thread janardhan shetty
Is this merged to Spark ML? If so, which version? On Tue, Sep 6, 2016 at 12:58 AM, Takeshi Yamamuro wrote: Hi, Sorry to bother you, but I'd like to inform you of our activities. We'll start incubating our product, Hivemall, in Apache; this is a scalable ML library for Hive/Spark/Pi…

Re: Using spark to distribute jobs to standalone servers

2016-08-25 Thread Igor Berman
imho, you'll need to implement a custom RDD with your locality settings (i.e. a custom implementation of discovering where each partition is located) + a setting for spark.locality.wait On 24 August 2016 at 03:48, Mohit Jaggi wrote: It is a bit hacky but possible. A lot depends on what kind of querie…
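
A skeleton of that idea; HostPartition and the host list are hypothetical, and the real data access belongs in compute():

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    class PinnedRDD(sc: SparkContext, hosts: Seq[String])
        extends RDD[String](sc, Nil) {

      private case class HostPartition(index: Int, host: String) extends Partition

      override protected def getPartitions: Array[Partition] =
        hosts.zipWithIndex.map { case (h, i) => HostPartition(i, h) }.toArray

      // Tell the scheduler where each partition "lives"; pair this with a large
      // spark.locality.wait so tasks actually wait for their preferred host.
      override protected def getPreferredLocations(split: Partition): Seq[String] =
        Seq(split.asInstanceOf[HostPartition].host)

      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        Iterator(s"ran against ${split.asInstanceOf[HostPartition].host}")
    }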

Re: Using spark to distribute jobs to standalone servers

2016-08-23 Thread Mohit Jaggi
It is a bit hacky but possible. A lot depends on what kind of queries etc. you want to run. You could write a data source that reads your data and keeps it partitioned the way you want, then use mapPartitions() to execute your code… Mohit Jaggi Founder, Data Orchard LLC www.dataorchardllc.com

Re: Using spark package XGBoost

2016-08-14 Thread Brandon White
The XGBoost integration with Spark is currently only supported for RDDs; there is a ticket for DataFrame support and folks claim to be working on it. On Aug 14, 2016 8:15 PM, "Jacek Laskowski" wrote: Hi, I've never worked with the library and am speaking about the sbt setup only. It appears that the p…

Re: Using spark package XGBoost

2016-08-14 Thread Jacek Laskowski
Hi, I've never worked with the library and am speaking about the sbt setup only. It appears that the project didn't release 2.11-compatible jars (only 2.10) [1], so you need to build the project yourself and uber-jar it (using the sbt-assembly plugin). [1] https://spark-packages.org/package/rotationsymmetry…
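
Roughly, assuming a local clone of the sparkxgboost repo (the plugin version is illustrative for the 2016 toolchain):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

    // build.sbt: build for the Scala your Spark uses, then run `sbt assembly`
    // and hand the resulting uber-jar to spark-submit via --jars.
    scalaVersion := "2.11.8"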

Re: Using spark package XGBoost

2016-08-14 Thread janardhan shetty
Any leads on how to achieve this? On Aug 12, 2016 6:33 PM, "janardhan shetty" wrote: I tried using the sparkxgboost package in my build.sbt file but it failed. Spark 2.0 Scala 2.11.8 Error: [warn] http://dl.bintray.com/spark-packages/maven/rotationsymmetry/sparkxgboost/0.2.1-s_2.10/…

Re: Using spark package XGBoost

2016-08-12 Thread janardhan shetty
I tried using the sparkxgboost package in my build.sbt file but it failed. Spark 2.0 Scala 2.11.8 Error: [warn] http://dl.bintray.com/spark-packages/maven/rotationsymmetry/sparkxgboost/0.2.1-s_2.10/sparkxgboost-0.2.1-s_2.10-javadoc.jar [warn] …

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
…So in short this prevents writing data back and forth after every reduce step, which for me is a significant improvement compared to the classical MapReduce algorithm. Now Tez is basically MR with DAG.…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
Hi Ayan, This is a very valid question and I have not seen any available instrumentation in Spark that allows one to measure this in a practical way in a cluster. Classic example: 1. if you have a memory issue, do you upgrade your RAM or scale out horizontally by adding a couple more nodes…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
…arising from such loss, damage or destruction. On 12 July 2016 at 09:33, Markovitz, Dudu wrote: I don't see how this explains the time differences. Dudu From: Mich Talebzadeh [mailto:mic…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Jörn Franke
…loss, damage or destruction. On 12 July 2016 at 09:33, Markovitz, Dudu wrote: I don't see how this explains the time differences. Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
…or property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 12 July 2016 at 08:16, Markovitz, Du…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
…From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Monday, July 11, 2016 11:55 PM To: user; user @spark Subject: Re: Using Spark on Hive with Hive also using Spark as its execution engine In my test I did like for like, keepi…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread ayan guha
Hi Mich, Thanks for showing examples; makes perfect sense. One question: "...I agree that on VLT (very large tables), the limitation in available memory may be the overriding factor in using Spark"... have you observed any specific threshold for VLT which tilts the favor against Spark? For exampl…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
Another point with Hive on Spark and Hive on Tez + LLAP; I am thinking out loud :) 1. I am using Hive on Spark and I have a table of 10GB, say, with 100 users concurrently accessing the same partition of an ORC table (the last one hour or so) 2. Spark takes the data and puts it in memory. I gather on…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
In my test I did like for like, keeping the setup the same, namely: 1. The table was a Parquet table of 100 million rows 2. The same setup was used for both Hive on Spark and Hive on MR 3. Spark was very impressive compared to MR on this particular test. Just to see any issues I create…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
Appreciate all the comments. Hive on Spark: Spark runs as an execution engine and is only used when you query Hive; otherwise it is not running. I run it in YARN client mode. Let me show you an example. In hive-site.xml, set the execution engine to spark. It requires some configuration…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
Just a clarification: Tez is "vendor" independent. ;-) Yeah… I know… Anyone can support it. Only Hortonworks has stacked the deck in their favor. Drill could be in the same boat, although there are now more committers who are not working for MapR. I'm not sure who outside of HW is supporting…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Jörn Franke
I think LLAP should be, in the future, a general component, so LLAP + Spark can make sense. I see Tez and Spark not as competitors; they have different purposes. Hive+Tez+LLAP is not the same as Hive+Spark; I think it goes beyond that for interactive queries. Tez - you should use a distribution…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
I don't think that it would be a good comparison. If memory serves, Tez with LLAP is going to be running a separate engine that is constantly running, no? Spark? That runs under Hive… Unless you're suggesting that the Spark context is constantly running as part of HiveServer2? On May…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
The presentation will go deeper into the topic. Otherwise, some thoughts of mine. Feel free to comment, criticise :) 1. I am a member of the Spark, Hive and Tez user groups plus one or two others 2. Spark is by far the biggest in terms of community interaction 3. Tez, typically one thread in…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Ashok Kumar
Hi Mich, Regarding your recent presentation in London on this topic, "Running Spark on Hive or Hive on Spark": have you made any more interesting findings that you would like to bring up? If Hive is offering both Spark and Tez in addition to MR, what is stopping one from using Spark? I still don't get why TEZ + LLAP…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
I think we are going to move to a model where the computation stack will be separate from the storage stack, and moreover something like Hive that provides the means for persistent storage (well, HDFS is the one that stores all the data) will have an in-memory capability much like what Oracle TimesTe…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
And you have MapR supporting Apache Drill. So these are all alternatives to Spark, and it's not necessarily an either-or scenario. You can have both. On May 30, 2016, at 12:49 PM, Mich Talebzadeh wrote: yep Hortonworks supports Tez for one reason or other which I am going hopefull…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Jörn Franke
I do not think that in-memory itself will make things faster in all cases, especially if you use Tez with ORC or Parquet. Especially for ad hoc queries on large datasets (independently of whether they fit in memory or not) this will have a significant impact. This is an experience I have also with the in-m…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Ovidiu-Cristian MARCU
Spark in relation to Tez can be like a Flink runner for Apache Beam? The use case of Tez however may be interesting (but the current implementation is only YARN-based?). Spark is efficient (or faster) for a number of reasons, including its "in-memory" execution (from my understanding and experiments).

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Mich Talebzadeh
yep, Hortonworks supports Tez for one reason or other, which I am hopefully going to test as the query engine for Hive. Though I think Spark will be faster because of its in-memory support. Also if you are independent then you are better off dealing with Spark and Hive without the need to support an…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
Mich, Most people use vendor releases because they need to have the support. Hortonworks is the vendor who has the most skin in the game when it comes to Tez. If memory serves, Tez isn’t going to be M/R but a local execution engine? Then LLAP is the in-memory piece to speed up Tez? HTH -M

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
thanks. I think the problem is that the TEZ user group is exceptionally quiet. Just sent an email to the Hive user group to see if anyone has managed to build a vendor-independent version. Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw …

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
Well, I think it is different from MR. It has some optimizations which you do not find in MR; especially the LLAP option in Hive 2 makes it interesting. I think Hive 1.2 works with 0.7 and 2.0 with 0.8. At least for 1.2 it is integrated in the Hortonworks distribution. On 29 May 2016, at 21…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Mich Talebzadeh
Hi Jorn, I started building apache-tez-0.8.2 but got a few errors. A couple of guys from the TEZ user group kindly gave a hand but I could not go very far (or maybe I did not make enough effort) making it work. That TEZ user group is very quiet as well. My understanding is TEZ is MR with DAG but of co…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-29 Thread Jörn Franke
Very interesting. Do you also plan a test with TEZ? On 29 May 2016, at 13:40, Mich Talebzadeh wrote: Hi, I did another study of Hive using the Spark engine compared to Hive with MR. Basically took the original table imported using Sqoop and created and populated a new ORC table par…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-24 Thread Mich Talebzadeh
Hi, We use Hive as the database and use Spark as an all-purpose query tool. Whether Hive is the right database for the purpose, or one is better off with something like Phoenix on HBase - well, the answer is it depends and your mileage varies. So fit for purpose. Ideally what one wants is to use the faste…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread ayan guha
Hi, Thanks for the very useful stats. Did you have any benchmark for using Spark as the backend engine for Hive vs using the Spark thrift server (and running Spark code for Hive queries)? We are using the latter, but it would be very useful to remove the thriftserver if we can. On Tue, May 24, 2016 at 9:51 AM, Jörn Franke…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Jörn Franke
Hi Mich, I think these comparisons are useful. One interesting aspect could be hardware scalability in this context. Additionally, different types of computations. Furthermore, one could compare Spark and Tez+LLAP as execution engines. I have the gut feeling that each one can be justified by di…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Ashok Kumar
Hi Dr Mich, This is very good news. I will be interested to know how Hive engages with Spark as an engine. What Spark processes are used to make this work? Thanking you. On Monday, 23 May 2016, 19:01, Mich Talebzadeh wrote: Have a look at this thread. Dr Mich Talebzadeh LinkedIn https…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Mich Talebzadeh
Have a look at this thread. Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 23 May 2016 at 09:10, Mich…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Mich Talebzadeh
Hi Timur and everyone. I will answer your first question as it is very relevant: 1) How to make 2 versions of Spark live together on the same cluster (libraries clash, paths, etc.)? Most Spark users perform ETL and ML operations on Spark as well. So we may have 3 Spark installations simultan…

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-22 Thread Timur Shenkao
Hi, Thanks a lot for such an interesting comparison. But important questions remain to be addressed: 1) How to make 2 versions of Spark live together on the same cluster (libraries clash, paths, etc.)? Most Spark users perform ETL and ML operations on Spark as well. So we may have 3 Spark ins…

Re: Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Suresh Thalamati
The error ('Invalid method name: alter_table_with_cascade') you are seeing may be related to a mismatch of Hive versions. The error looks similar to one reported in https://issues.apache.org/jira/browse/SPARK-12496 On Mar 3, 2016, at 7:43 AM, Gourav Sengupta wrote: Hi, Why are you…

Re: Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Gourav Sengupta
Hi, Why are you trying to load data into Hive and then access it via hiveContext? (By the way, hiveContext tables are not visible in the sqlContext.) Please read the data directly into a Spark dataframe and then register it as a temp table to run queries on it. Regards, Gourav On Thu, Mar 3, 20…
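
On a 2016-era EMR Spark (1.6), that suggestion looks roughly like this, with a placeholder S3 path and the sqlContext provided by spark-shell:

    val df = sqlContext.read.json("s3://my-bucket/events/")
    df.registerTempTable("events")  // temp table lives in this sqlContext only
    sqlContext.sql("SELECT COUNT(*) FROM events").show()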

Re: Using Spark functional programming rather than SQL, Spark on Hive tables

2016-02-24 Thread Mich Talebzadeh
Well spotted Sab. You are correct - an oversight by me. They should both use "sales". The results are now comparable. The following statement: "On the other hand, using SQL, query 1 takes 19 seconds compared to just under 4 minutes for functional programming. The second query using SQL ta…
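
For reference, the like-for-like pair looks something like this (column names are illustrative); both routes go through the same Catalyst optimizer, so once they read the same "sales" table the timings should converge:

    import org.apache.spark.sql.functions.sum

    // Query 1, SQL:
    val bySql = sqlContext.sql(
      "SELECT channel, SUM(amount_sold) AS total FROM sales GROUP BY channel")

    // Query 1, functional (DataFrame API):
    val byApi = sqlContext.table("sales")
      .groupBy("channel")
      .agg(sum("amount_sold").as("total"))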
