Re: Silly question on Dropping Temp Table

2018-05-26 Thread Aakash Basu
Well, it did, meaning that internally a TempTable and a TempView are the same. Thanks buddy! On Sat, May 26, 2018 at 9:23 PM, Aakash Basu wrote: > Question is, while registering, using registerTempTable() and while > dropping, using a dropTempView(), would it go and hit the same TempTable > interna

Re: Silly question on Dropping Temp Table

2018-05-26 Thread Aakash Basu
Question is, while registering, using registerTempTable() and while dropping, using a dropTempView(), would it go and hit the same TempTable internally or would it search for a registered view? Not sure. Any idea? On Sat, May 26, 2018 at 9:04 PM, SNEHASISH DUTTA wrote: > I think it's dropTempView >

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Michael Segel
The only documentation on this… in terms of direction … (that I could find) If your client is not close to the cluster (e.g. your PC) then you definitely want to go cluster to improve performance. If your client is close to the cluster (e.g. an edge node) then you could go either client or clust
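As a rough illustration of the rule of thumb above, the choice can be written as a small helper that assembles a `spark-submit` command line. `--master yarn` and `--deploy-mode` are real `spark-submit` options (releases of that era also accepted `--master yarn-client` and `--master yarn-cluster`). The helper name, the `client_near_cluster` flag, and the application path are hypothetical, and picking client mode for a nearby client is only one of the two options the message allows.

```python
from typing import List


def spark_submit_args(app: str, client_near_cluster: bool) -> List[str]:
    """Build a spark-submit argv following the rule of thumb in this thread.

    Assumption: a client far from the cluster (e.g. a PC) uses cluster mode,
    so the driver runs inside YARN. A nearby client (e.g. an edge node) could
    use either mode; this sketch arbitrarily picks client mode for it.
    """
    deploy_mode = "client" if client_near_cluster else "cluster"
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


if __name__ == "__main__":
    # "my_job.py" is a placeholder application path, not from the thread.
    print(" ".join(spark_submit_args("my_job.py", client_near_cluster=False)))
```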

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Marcelo Vanzin
On Wed, Jun 22, 2016 at 1:32 PM, Mich Talebzadeh wrote: > Does it also depend on the number of Spark nodes involved in choosing which > way to go? Not really. -- Marcelo

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Mich Talebzadeh
Thanks Marcelo, Sounds like cluster mode is more resilient than client mode. Does it also depend on the number of Spark nodes involved in choosing which way to go? Cheers Dr Mich Talebzadeh

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Marcelo Vanzin
Trying to keep the answer short and simple... On Wed, Jun 22, 2016 at 1:19 PM, Michael Segel wrote: > But this gets to the question… what are the real differences between client > and cluster modes? > What are the pros/cons and use cases where one has advantages over the > other? - client mode r

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Michael Segel
LOL… I hate YARN, but unfortunately I don’t get to make the call on which tools we’re going to use, I just get paid to make stuff work on the tools provided. ;-) Testing is somewhat problematic. You have to really test at some large enough fraction of scale. Fortunately for this issue (YARN

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Mich Talebzadeh
This is exactly the sort of topic that distinguishes lab work from enterprise practice :) On the question of YARN client versus YARN cluster mode, I am not sure how much of an impact choosing one over the other makes in real life. These days I tell developers that it is perfectly valid

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Michael Segel
JDBC reliability problem? Ok… a bit more explanation… Usually when you have to go back to a legacy system, it’s because the data set is usually metadata and is relatively small. It’s not the sort of data that gets ingested into a data lake unless you’re also ingesting the metadata and are us

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Mich Talebzadeh
Thanks, Mike, for the clarification. I think there is another option to get data out of RDBMS through some form of SELECT ALL COLUMNS TAB SEPARATED OR OTHER and put them in a flat file or files. scp that file from the RDBMS directory to a private directory on the HDFS system and push it into HDFS. That wil

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Michael Segel
Hi, Just to clear a few things up… First I know it’s hard to describe some problems because they deal with client confidential information. (Also some basic ‘dead hooker’ thought problems to work through before facing them at a client.) The questions I pose here are very general and deal wit

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-21 Thread Mich Talebzadeh
If you are going to get data out of an RDBMS like Oracle then the correct procedure is: 1. Use Hive on Spark execution engine. That improves Hive performance 2. You can use JDBC through Spark itself. No issue there. It will use JDBC provided by HiveContext 3. JDBC is fine. Every vendo

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-21 Thread Jörn Franke
I would import data via sqoop and put it on HDFS. It has some mechanisms to handle the lack of reliability of JDBC. Then you can process the data via Spark. You could also use a JDBC RDD, but I do not recommend it, because you do not want to pull data all the time out of the database when
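The Sqoop-to-HDFS step described above could be sketched as a helper that assembles a `sqoop import` command line. `--connect`, `--table`, `--target-dir`, and `--num-mappers` are standard Sqoop import options; the helper name, the JDBC URL, the table name, the target directory, and the default of four mappers are all placeholders rather than anything stated in the thread.

```python
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      mappers: int = 4) -> List[str]:
    """Build a `sqoop import` argv that lands one RDBMS table on HDFS.

    Spark would then read the files under `target_dir` instead of querying
    the database over JDBC each time.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,            # JDBC URL of the source database
        "--table", table,                 # table to copy
        "--target-dir", target_dir,       # HDFS directory for the output
        "--num-mappers", str(mappers),    # number of parallel import tasks
    ]


if __name__ == "__main__":
    # All three values below are hypothetical.
    print(" ".join(sqoop_import_args(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "CUSTOMERS", "/staging/customers")))
```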

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-21 Thread ayan guha
I may be wrong here, but beeline is basically a client library. So you "connect" to STS and/or HS2 using beeline. Spark connecting to JDBC is a different discussion and in no way related to beeline. When you read data from DB (Oracle, DB2 etc) then you do not use beeline, but use jdbc connection to the

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-21 Thread Michael Segel
Sorry, I think you misunderstood. Spark can read from JDBC sources, so saying that using beeline as a way to access data is not a Spark application isn’t really true. Would you say the same if you were pulling data into Spark from Oracle or DB2? There are a couple of different design patterns and

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-21 Thread ayan guha
1. Yes, in the sense that you control the number of executors from the Spark application config. 2. Any IO will be done from executors (never ever on the driver, unless you explicitly call collect()). For example, a connection to a DB is made once for each worker (and used by local executors). Also, if you run a reduc

Re: Silly Question on my part...

2016-05-17 Thread Gene Pang
Hi Michael, Yes, you can use Alluxio to share Spark RDDs. Here is a blog post about getting started with Spark and Alluxio (http://www.alluxio.com/2016/04/getting-started-with-alluxio-and-spark/), and some documentation (http://alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html).

Re: Silly Question on my part...

2016-05-17 Thread Dood
On 5/16/2016 12:12 PM, Michael Segel wrote: For one use case.. we were considering using the thrift server as a way to allow multiple clients to access shared RDDs. Within the Thrift Context, we create an RDD and expose it as a hive table. The question is… where does the RDD exist? On the Thrift

Re: Silly Question on my part...

2016-05-17 Thread Michael Segel
Thanks for the response. That’s what I thought, but I didn’t want to assume anything. (You know what happens when you ass u me … :-) Not sure about Tachyon though. It’s a thought, but I’m very conservative when it comes to design choices. > On May 16, 2016, at 5:21 PM, John Trengrove >

Re: Silly Question on my part...

2016-05-16 Thread John Trengrove
If you want to share RDDs it might be a good idea to check out Tachyon / Alluxio. For the Thrift server, I believe the datasets are located in your Spark cluster as RDDs and you just communicate with it via the Thrift JDBC Distributed Query Engine connector. 2016-05-17 5:12 GMT+10:00 Micha

Re: Silly question...

2016-04-13 Thread Mich Talebzadeh
These are the components: java -version: java version "1.8.0_77", Java(TM) SE Runtime Environment (build 1.8.0_77-b03), Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode). hadoop version: Hadoop 2.6.0, Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba9

Re: Silly question...

2016-04-13 Thread Michael Segel
Mich, are you building your own releases from the source? Which version of Scala? Again, the builds seem to be ok and working, but I don’t want to hit some ‘gotcha’ if I could avoid it. > On Apr 13, 2016, at 7:15 AM, Mich Talebzadeh > wrote: > > Hi, > > I am not sure this helps. > > we

Re: Silly question...

2016-04-13 Thread Mich Talebzadeh
Hi, I am not sure this helps. We use Spark 1.6 and Hive 2. I also use JDBC (beeline for Hive) plus Oracle and Sybase. They all work fine. HTH Dr Mich Talebzadeh

Re: Silly question about building Spark 1.4.1

2015-07-20 Thread Michael Segel
Thanks Dean… I was building based on the information found in the Spark 1.4.1 documentation. So I have to ask the following: Shouldn’t the examples be updated to reflect Hadoop 2.6 or are the vendors’ distros not up to 2.6 and that’s why it’s still showing 2.4? Also I’m trying to build with s

Re: Silly question about building Spark 1.4.1

2015-07-20 Thread Ted Yu
In master (as well as 1.4.1) I don't see a hive profile in pom.xml. I do find a hive-provided profile, though. FYI On Mon, Jul 20, 2015 at 1:05 PM, Dean Wampler wrote: > hadoop-2.6 is supported (look for "profile" XML in the pom.xml file). > > For Hive, add "-Phive -Phive-thriftserver" (See > http

Re: Silly question about building Spark 1.4.1

2015-07-20 Thread Dean Wampler
hadoop-2.6 is supported (look for "profile" XML in the pom.xml file). For Hive, add "-Phive -Phive-thriftserver" (see http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables for more details). Dean
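The profile flags quoted in this message can be combined into a Maven invocation along the following lines. This is a minimal sketch: `-Phadoop-2.6`, `-Phive`, and `-Phive-thriftserver` come from the message itself, while `-Pyarn`, `-DskipTests`, and the `clean package` goals are assumptions based on the usual Spark build instructions of that era. The helper name and its parameters are hypothetical, and the Spark source tree also ships a `build/mvn` wrapper that could be used in place of plain `mvn`.

```python
from typing import List


def spark_build_args(hadoop_profile: str = "hadoop-2.6",
                     with_hive: bool = True) -> List[str]:
    """Assemble a Maven command line for building Spark with chosen profiles."""
    args = ["mvn", "-Pyarn", f"-P{hadoop_profile}"]
    if with_hive:
        # Hive support plus the Thrift JDBC server, as quoted in the message.
        args += ["-Phive", "-Phive-thriftserver"]
    # Skipping tests and the goal list are assumptions, not from the thread.
    args += ["-DskipTests", "clean", "package"]
    return args


if __name__ == "__main__":
    print(" ".join(spark_build_args()))
```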