Re:Re: How can I read this avro file using spark scala?

2015-02-11 Thread Todd
Databricks provides sample code on its website... but I can't find it for now. At 2015-02-12 00:43:07, captainfranz captainfr...@gmail.com wrote: I am confused as to whether avro support was merged into Spark 1.2 or whether it is still an independent library. I see some people writing
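
For reference, a minimal Scala sketch of the spark-avro usage being discussed, assuming the com.databricks:spark-avro package of that era is on the classpath (the file path is hypothetical):

    import org.apache.spark.sql.SQLContext
    import com.databricks.spark.avro._          // adds avroFile to SQLContext

    val sqlContext = new SQLContext(sc)
    // load the avro file as a SchemaRDD and query it with Spark SQL
    val episodes = sqlContext.avroFile("hdfs:///tmp/episodes.avro")
    episodes.registerTempTable("episodes")
    sqlContext.sql("SELECT * FROM episodes").collect().foreach(println)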

Re:Re: A signature in Logging.class refers to type Logger in package org.slf4j which is not available.

2015-02-11 Thread Todd
yuzhih...@gmail.com wrote: Spark depends on slf4j 1.7.5. Please check your classpath and make sure slf4j is included. Cheers On Wed, Feb 11, 2015 at 6:20 AM, Todd bit1...@163.com wrote: After compiling the Spark 1.2.0 codebase in IntelliJ IDEA and running the LocalPi example, I got the following

A signature in Logging.class refers to type Logger in package org.slf4j which is not available.

2015-02-11 Thread Todd
After compiling the Spark 1.2.0 codebase in IntelliJ IDEA and running the LocalPi example, I got the following slf4j-related issue. Does anyone know how to fix this? Thanks Error:scalac: bad symbolic reference. A signature in Logging.class refers to type Logger in package org.slf4j which is not

Re:Is Databricks log analysis reference app only based on Java API

2015-02-18 Thread Todd
Sorry for the noise, I have found it. At 2015-02-18 23:34:40, Todd bit1...@163.com wrote: Looks like the log analysis reference app provided by Databricks at https://github.com/databricks/reference-apps only has a Java API? I'd like to see a Scala version.

Is Databricks log analysis reference app only based on Java API

2015-02-18 Thread Todd
Looks like the log analysis reference app provided by Databricks at https://github.com/databricks/reference-apps only has a Java API? I'd like to see a Scala version.

I think I am almost lost in the internals of Spark

2015-01-06 Thread Todd
I am a bit new to Spark: I have tried simple things like word count, and the examples given in the Spark SQL programming guide. Now I am investigating the internals of Spark, but I think I am almost lost, because I could not grasp the whole picture of what Spark does when it executes the

Re:Re: EventBatch and SparkFlumeProtocol not found in spark codebase?

2015-01-09 Thread Todd
Thanks Sean. I followed the guide and imported the codebase into IntelliJ IDEA as a Maven project, with the profiles hadoop-2.4 and yarn. In the Maven project view, I ran Maven Install against the module Spark Project Parent POM (root). After a pretty long time, all the modules were built successfully.

Build spark source code with Maven in Intellij Idea

2015-01-08 Thread Todd
Hi, I have imported the Spark source code into IntelliJ IDEA as an SBT project. I tried to do a Maven install in IntelliJ IDEA by clicking Install on the Spark Project Parent POM (root), but it failed. I would ask which profiles should be checked. What I want to achieve is starting Spark in the IDE and Hadoop

Why there are overlapping for tasks on the EventTimeline UI

2015-08-18 Thread Todd
Hi, The following is copied from the Spark EventTimeline UI. I don't understand why there is overlap between tasks. I think they should run sequentially, one by one, on an executor (there is one core per executor). The blue part of each task is the scheduler delay time. Does it mean it is the

Paper on Spark SQL

2015-08-17 Thread Todd
Hi, I can't access http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf. Could someone check whether it is available and reply with it? Thanks!

Can't understand the size of raw RDD and its DataFrame

2015-08-15 Thread Todd
Hi, With the following code snippet, I cached the raw RDD (which is already in memory, but just for illustration) and its DataFrame. I thought that the df cache would take less space than the rdd cache, which is wrong, because from the UI I see the rdd cache takes 168B, while the df cache takes
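
A minimal sketch of the comparison being described (the data is illustrative): cache the raw RDD and its DataFrame, then compare the two entries on the UI's Storage tab.

    val rdd = sc.parallelize(1 to 100).map(i => (i, s"name$i"))
    rdd.cache().count()                                  // materialize the RDD cache
    val df = sqlContext.createDataFrame(rdd).toDF("id", "name")
    df.cache().count()                                   // materialize the DataFrame cache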

Re:Re: Can't understand the size of raw RDD and its DataFrame

2015-08-15 Thread Todd
expecting footprint of dataframe to be lower when it contains more information (RDD + Schema) On Sat, Aug 15, 2015 at 6:35 PM, Todd bit1...@163.com wrote: Hi, With the following code snippet, I cached the raw RDD (which is already in memory, but just for illustration) and its DataFrame. I thought

Re:Re: Regarding rdd.collect()

2015-08-18 Thread Todd
One Spark application can have many jobs, e.g., first call rdd.count, then call rdd.collect. At 2015-08-18 15:37:14, Hemant Bhanawat hemant9...@gmail.com wrote: It is still in memory for future rdd transformations and actions. This is interesting. You mean Spark holds the data in memory
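
A minimal sketch of that point (the path is hypothetical): each action submits a job of its own within the one application.

    val rdd = sc.textFile("hdfs:///tmp/input")   // hypothetical path
    rdd.cache()
    rdd.count()     // job 0: scans the input and populates the cache
    rdd.collect()   // job 1: served from the in-memory cache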

Re:Changed Column order in DataFrame.Columns call and insertIntoJDBC

2015-08-18 Thread Todd
Take a look at the doc for the method: /** * Applies a schema to an RDD of Java Beans. * * WARNING: Since there is no guaranteed ordering for fields in a Java Bean, * SELECT * queries will return the columns in an undefined order. * @group dataframes * @since
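
A sketch of the workaround implied by that warning: pin the column order with an explicit select before the JDBC insert (the bean, URL and table name below are hypothetical):

    import scala.beans.BeanProperty

    class Person {                              // a hypothetical Java-Bean-style class
      @BeanProperty var name: String = _
      @BeanProperty var age: Int = _
    }

    val beanRdd = sc.parallelize(1 to 3).map { i =>
      val p = new Person; p.setName(s"name$i"); p.setAge(20 + i); p
    }
    val people = sqlContext.createDataFrame(beanRdd, classOf[Person])
    // select() fixes the column order, side-stepping the undefined bean ordering
    people.select("name", "age")
      .insertIntoJDBC("jdbc:mysql://host:3306/db", "people", false)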

blogs/articles/videos on how to analyse spark performance

2015-08-19 Thread Todd
Hi, I would ask if there are blogs/articles/videos on how to analyse Spark performance at runtime, e.g., tools that can be used or anything related.

Re:Re: How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Todd
? Is there a way to auto-relaunch the driver if it runs as a Hadoop YARN application? On Wednesday, 19 August 2015 12:49 PM, Todd bit1...@163.com wrote: There is an option for spark-submit (Spark standalone or Mesos with cluster deploy mode only): --supervise If given, restarts

Re:Why there are overlapping for tasks on the EventTimeline UI

2015-08-18 Thread Todd
I think I found the answer. On the UI, the recorded time of each task is when it is put into the thread pool; then the UI makes sense. At 2015-08-18 17:40:07, Todd bit1...@163.com wrote: Hi, The following is copied from the Spark EventTimeline UI. I don't understand why there is overlapping

Re:SPARK sql: Need JSON back instead of row

2015-08-21 Thread Todd
Please try DataFrame.toJSON; it will give you an RDD of JSON strings. At 2015-08-21 15:59:43, smagadi sudhindramag...@fico.com wrote: val teenagers = sqlContext.sql(SELECT name FROM people WHERE age >= 13 AND age <= 19) I need teenagers to be a JSON object rather than a simple row. How can we get
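
A minimal sketch of the suggestion, assuming a registered "people" table as in the post (the printed record is illustrative):

    val teenagers = sqlContext.sql(
      "SELECT name FROM people WHERE age >= 13 AND age <= 19")
    val asJson = teenagers.toJSON            // RDD[String], one JSON document per row
    asJson.collect().foreach(println)        // e.g. {"name":"Justin"}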

Re:How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Todd
There is an option for spark-submit (Spark standalone or Mesos with cluster deploy mode only): --supervise If given, restarts the driver on failure. At 2015-08-19 14:55:39, Spark Enthusiast sparkenthusi...@yahoo.in wrote: Folks, As I see, the Driver program is a

Does spark sql support column indexing

2015-08-19 Thread Todd
I can't find any discussion of whether Spark SQL supports column indexing. If it does, is there a guide on how to do it? Thanks.

Understanding the two jobs run with spark sql join

2015-08-16 Thread Todd
Hi, I have a basic Spark SQL join run in local mode. I checked the UI and see that two jobs are run. Their DAG graphs are pasted at the end. I have several questions here: 1. It looks like Job0 and Job1 have the same DAG stages, but stage 3 and stage 4 are skipped. I would ask
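
A small local-mode join one could use to reproduce the observation (the data is made up):

    import sqlContext.implicits._

    val left  = sc.parallelize(1 to 1000).map(i => (i, s"l$i")).toDF("id", "l")
    val right = sc.parallelize(1 to 1000).map(i => (i, s"r$i")).toDF("id", "r")
    left.join(right, "id").collect()   // then inspect the Jobs page and the DAG views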

Re:Re: About Databricks's spark-sql-perf

2015-08-13 Thread Todd
requires that you have already created the data/tables. I'll work on updating the README as the QA period moves forward. On Thu, Aug 13, 2015 at 6:49 AM, Todd bit1...@163.com wrote: Hi, I got a question about the spark-sql-perf project by Databricks at https://github.com/databricks/spark-sql

Materials for deep insight into Spark SQL

2015-08-13 Thread Todd
Hi, I would ask whether there are slides, blogs or videos on how Spark SQL is implemented, i.e., the process or the whole picture of what happens when Spark SQL executes a query. Thanks!

About Databricks's spark-sql-perf

2015-08-13 Thread Todd
Hi, I got a question about the spark-sql-perf project by Databricks at https://github.com/databricks/spark-sql-perf/ The Tables.scala (https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/bigdata/Tables.scala) and BigData

Re:Re: Materials for deep insight into Spark SQL

2015-08-14 Thread Todd
/a/databricks.com/document/d/1Hc_Ehtr0G8SQUg69cmViZsMi55_Kf3tISD9GPGU5M1Y/edit FYI On Thu, Aug 13, 2015 at 8:54 PM, Todd bit1...@163.com wrote: Hi, I would ask whether there are slides, blogs or videos on the topic about how spark sql is implemented, the process or the whole picture when spark sql

What does Attribute and AttributeReference mean in Spark SQL

2015-08-24 Thread Todd
There are many case classes and concepts such as Attribute/AttributeReference/Expression in Spark SQL. I would ask what Attribute/AttributeReference/Expression mean. Given a SQL query like select a,b from c, are a and b two Attributes? Is a + b an expression? It looks like I misunderstand it
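
One way to poke at this interactively, a sketch (the table c and its columns are hypothetical): inspect the analyzed plan, whose output columns are AttributeReferences (a kind of Attribute), while something like a + b would appear as an Add expression node.

    import sqlContext.implicits._

    val c = sc.parallelize(Seq((1, 2))).toDF("a", "b")
    c.registerTempTable("c")

    val analyzed = sqlContext.sql("select a, b from c").queryExecution.analyzed
    analyzed.output.foreach(a => println(s"${a.name} -> ${a.getClass.getSimpleName}"))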

How to use --principal and --keytab in SparkSubmit

2015-11-08 Thread Todd
Hi, I am starting the Spark thrift server with the following script, ./start-thriftserver.sh --master yarn-client --driver-memory 1G --executor-memory 2G --driver-cores 2 --executor-cores 2 --num-executors 4 --hiveconf hive.server2.thrift.port=10001 --hiveconf

How 'select name,age from TBL_STUDENT where age = 37' is optimized when caching it

2015-11-16 Thread Todd
Hi, When I cache the dataframe and run the query, val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37") df.cache() df.show println(df.queryExecution) I got the following execution plan; from the optimized logical plan, I can see the whole analyzed logical

Required file not found: sbt-interface.jar

2015-11-02 Thread Todd
Hi, I am trying to build Spark 1.5.1 in my environment, but encounter the following error complaining Required file not found: sbt-interface.jar. The error message is below, and I am building with: ./make-distribution.sh --name spark-1.5.1-bin-2.6.0 --tgz --with-tachyon -Phadoop-2.6

[Spark R]could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

2015-11-06 Thread Todd
I am launching Spark R with the following script: ./sparkR --driver-memory 12G and I try to load a local 3G csv file with the following code, > a=read.transactions("/home/admin/datamining/data.csv",sep="\t",format="single",cols=c(1,2)) but I encounter an error: could not allocate memory (2048 Mb) in

BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Todd
I am using Tachyon in the Spark program below, but I encounter a BlockNotFoundException. Does someone know what's wrong, and is there a guide on how to configure Spark to work with Tachyon? Thanks! conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
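
A fuller sketch of that wiring (the base-dir setting, paths and app name are assumptions, not from the post):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf().setAppName("wordcount-on-tachyon")
      .set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
      .set("spark.externalBlockStore.baseDir", "/spark_blocks")  // hypothetical dir
    val sc = new SparkContext(conf)

    val counts = sc.textFile("tachyon://10.18.19.33:19998/input.txt")
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.persist(StorageLevel.OFF_HEAP)   // OFF_HEAP goes through the external block store
    counts.count()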

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
. Are you able to get a more detailed error message? Thanks On Aug 25, 2015, at 6:57 PM, Todd bit1...@163.com wrote: Thanks Ted Yu. The following is the error message: 1. The exception that is shown on the UI is: Exception in thread Thread-113 Exception in thread Thread-126 Exception

Re:Re:Re: How to increase data scale in Spark SQL Perf

2015-08-26 Thread Todd
Sorry for the noise, it's my bad... I have worked it out now. At 2015-08-26 13:20:57, Todd bit1...@163.com wrote: I think the answer is no. I only see such messages on the console, and #2 is the thread stack trace. I am thinking that Spark SQL Perf forks many dsdgen processes to generate

Re:Re: Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Todd
to understand more about scope of modules: https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html On Tue, Aug 25, 2015 at 12:18 PM, Todd bit1...@163.com wrote: I cloned the code from https://github.com/apache/spark to my machine. It can compile successfully

Re:RE: Test case for the spark sql catalyst

2015-08-25 Thread Todd
Thanks Chenghao! At 2015-08-25 13:06:40, Cheng, Hao hao.ch...@intel.com wrote: Yes, check the source code under:https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst From: Todd [mailto:bit1...@163.com] Sent: Tuesday, August 25, 2015 1:01

Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Todd
I cloned the code from https://github.com/apache/spark to my machine. It can compile successfully, but when I run SparkPi, it throws the exception below complaining that scala.collection.Seq is not found. I have installed Scala 2.10.4 on my machine, and use the default profiles:

Test case for the spark sql catalyst

2015-08-24 Thread Todd
Hi, Are there test cases for the Spark SQL Catalyst, such as tests for the rules that transform unresolved query plans? Thanks!

Re:Re: What does Attribute and AttributeReference mean in Spark SQL

2015-08-25 Thread Todd
:13 PM, Todd bit1...@163.com wrote: There are many such kind of case class or concept such as Attribute/AttributeReference/Expression in Spark SQL I would ask what Attribute/AttributeReference/Expression mean, given a sql query like select a,b from c, it a, b are two Attributes? a + b

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-26 Thread Todd
Increase the number of executors, :-) At 2015-08-26 16:57:48, Ted Yu yuzhih...@gmail.com wrote: Mind sharing how you fixed the issue? Cheers On Aug 26, 2015, at 1:50 AM, Todd bit1...@163.com wrote: Sorry for the noise, it's my bad... I have worked it out now. At 2015-08-26 13:20:57

How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
Hi, Spark SQL Perf itself contains benchmark data generation. I am using the Spark shell to run Spark SQL Perf to generate the data, with 10G memory for both the driver and the executor. When I increase the scale factor to 30 and run the job, I get the following error. When I jstack it to

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
- or paste error in text. Cheers On Tue, Aug 25, 2015 at 4:22 AM, Todd bit1...@163.com wrote: Hi, The spark sql perf itself contains benchmark data generation. I am using spark shell to run the spark sql perf to generate the data with 10G memory for both driver and executor. When I increase

spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
Hi, I am using data generated with spark-sql-perf (https://github.com/databricks/spark-sql-perf) to test Spark SQL performance (Spark on YARN, with 10 nodes) with the following code (the table store_sales is about 90 million records, 6G in size) val

Re:Re: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-13 Thread Todd
code generation could introduce slowness. On 2015-09-11 15:58, Cheng, Hao wrote: Can you confirm whether the query really runs in cluster mode, not local mode? Can you print the call stack of the executor when the query is running?

Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
e;” in Spark 1.5, and run the query again? In our previous testing, it’s about 20% slower for sort merge join. I am not sure if there anything else slow down the performance. Hao From: Jesse F Chen [mailto:jfc...@us.ibm.com] Sent: Friday, September 11, 2015 1:18 PM To: Michael Armbrus

Re:Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
oth runs would be helpful whenever reporting performance changes. On Thu, Sep 10, 2015 at 1:24 AM, Todd <bit1...@163.com> wrote: Hi, I am using data generated with sparksqlperf(https://github.com/databricks/spark-sql-perf) to test the spark sql performance (spark on yarn, with 10 nodes

Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
5, and it's true by default, but we found it probably causes the performance to drop dramatically. From: Todd [mailto:bit1...@163.com] Sent: Friday, September 11, 2015 2:17 PM To: Cheng, Hao Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:RE: spark 1.5 SQL slows down dr

Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
', there is no table to show queries and execution plan information. At 2015-09-11 14:39:06, "Todd" <bit1...@163.com> wrote: Thanks Hao. Yes, it is still slow with SMJ. Let me try the option you suggested, At 2015-09-11 14:34:46, "Cheng, Hao" <hao.ch...@intel.com> w

Compile error when compiling spark 2.0.0 snapshot code base in IDEA

2016-01-27 Thread Todd
Hi, I am able to run a Maven install of the whole Spark project (from GitHub) in my IDEA. But when I run the SparkPi example, IDEA compiles the code again and the following exception is thrown. Has anyone met this problem? Thanks a lot. Error:scalac: while compiling:

Re:Hive on Spark knobs

2016-01-28 Thread Todd
Did you run Hive on Spark with Spark 1.5 and Hive 1.1? I think Hive on Spark doesn't support Spark 1.5; there are compatibility issues. At 2016-01-28 01:51:43, "Ruslan Dautkhanov" wrote: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

How data locality is honored when spark is running on yarn

2016-01-27 Thread Todd
Hi, I am kind of confused about how data locality is honored when Spark is running on YARN (client or cluster mode); can someone please elaborate on this? Thanks!

What's the benefit of RDD checkpoint against RDD save

2016-03-23 Thread Todd
Hi, I have a long computing chain and eventually get the last RDD after a series of transformations. I have two choices for this last RDD: 1. Call checkpoint on the RDD to materialize it to disk 2. Call RDD.saveXXX to save it to HDFS, and read it back for further processing I would ask which choice
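
A minimal sketch of the two choices side by side (paths and the stand-in RDD are hypothetical):

    // stand-in for the real last RDD at the end of the computing chain
    val lastRdd = sc.parallelize(Seq(("a", 1), ("b", 2)))

    // Choice 1: checkpoint, which writes to the checkpoint dir and truncates lineage
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")
    lastRdd.cache()        // avoids recomputing the chain when the checkpoint runs
    lastRdd.checkpoint()
    lastRdd.count()        // the first action triggers the actual checkpoint write

    // Choice 2: explicit save, then read back for further processing
    lastRdd.saveAsObjectFile("hdfs:///tmp/last-rdd")
    val reloaded = sc.objectFile[(String, Int)]("hdfs:///tmp/last-rdd")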

Does spark support Apache Arrow

2016-05-19 Thread Todd
From the official site http://arrow.apache.org/, Apache Arrow is used for columnar in-memory storage. I have two quick questions: 1. Does Spark support Apache Arrow? 2. When a dataframe is cached in memory, the data is saved in a columnar in-memory format. What is the relationship between this

Does Structured Streaming support Kafka as data source?

2016-05-18 Thread Todd
Hi, I skimmed the Spark code, and it looks like structured streaming doesn't support Kafka as a data source yet?

How spark depends on Guava

2016-05-22 Thread Todd
Hi, In the Spark code, the Guava Maven dependency scope is provided. My question is: how does Spark depend on Guava at runtime? I looked into spark-assembly-1.6.1-hadoop2.6.1.jar and didn't find class entries like com.google.common.base.Preconditions etc...

Re:Re: How spark depends on Guava

2016-05-23 Thread Todd
r" <m...@schaffer.me> wrote: I got curious so I tried sbt dependencyTree. Looks like Guava comes into spark core from a couple places. -Mat matschaffer.com On Mon, May 23, 2016 at 2:32 PM, Todd <bit1...@163.com> wrote: Can someone please take alook at my question?I am

Re:How spark depends on Guava

2016-05-22 Thread Todd
Can someone please take a look at my question? I am using spark-shell in local mode and yarn-client mode. Spark code uses the Guava library, so Spark should have Guava in place at run time. Thanks. At 2016-05-23 11:48:58, "Todd" <bit1...@163.com> wrote: Hi, In the spark code, guava

Re:why spark 1.6 use Netty instead of Akka?

2016-05-23 Thread Todd
As far as I know, there would be Akka version conflicts when using Akka as a Spark Streaming source. At 2016-05-23 21:19:08, "Chaoqiang" wrote: >I want to know why spark 1.6 use Netty instead of Akka? Is there some >difficult problems which Akka can not

Re:how to config spark thrift jdbc server high available

2016-05-23 Thread Todd
There is a JIRA for Spark thrift server HA; the patch works, but it still hasn't been merged into the master branch. At 2016-05-23 20:10:26, "qmzhang" <578967...@qq.com> wrote: >Dear guys, please help... > >In hive,we can enable hiveserver2 high available by using dynamic service

Re:Re: Code Example of Structured Streaming of 2.0

2016-05-17 Thread Todd
Thanks Ted! At 2016-05-17 16:16:09, "Ted Yu" <yuzhih...@gmail.com> wrote: Please take a look at: [SPARK-13146][SQL] Management API for continuous queries [SPARK-14555] Second cut of Python API for Structured Streaming On Mon, May 16, 2016 at 11:46 PM, Todd <bit1...@

Does Structured Streaming support count(distinct) over all the streaming data?

2016-05-17 Thread Todd
Hi, We have a requirement to do count(distinct) in a processing batch against all the streaming data (e.g., the last 24 hours' data); that is, when we do count(distinct), we actually want to compute distinct against the last 24 hours' data. Does structured streaming support this scenario? Thanks!

Code Example of Structured Streaming of 2.0

2016-05-17 Thread Todd
Hi, Are there code examples about how to use the structured streaming feature? Thanks.

Re:Re: Does Structured Streaming support count(distinct) over all the streaming data?

2016-05-17 Thread Todd
ByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval)) HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 17 May 2016 at 20:02, Michael Armbrust <mich...@databricks.
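
A minimal DStream sketch of the windowed-count idea quoted above (the source, batch interval, and checkpoint path are illustrative):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/stream-ck")                 // window ops need checkpointing
    val values = ssc.socketTextStream("localhost", 9999)    // hypothetical source

    // counts per distinct value over the last 24 hours, sliding every minute;
    // counting the resulting pairs gives the number of distinct values
    val distinctCounts = values.countByValueAndWindow(Seconds(24 * 3600), Seconds(60))
    distinctCounts.count().print()

    ssc.start()
    ssc.awaitTermination()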

How to use Kafka as data source for Structured Streaming

2016-05-17 Thread Todd
Hi, I am wondering whether structured streaming supports Kafka as a data source. I skimmed the source code (mainly around the DataSourceRegister trait) and didn't find any Kafka data source there. Thanks.

How to change output mode to Update

2016-05-17 Thread Todd
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30 seconds")).option("checkpointLocation", "file:///home/hadoop/jsoncheckpoint").startStream("file:///home/hadoop/jsonresult") org.apache.spark.sql.AnalysisException: Aggregations are not supported on streaming

Re:Re: Re: How to change output mode to Update

2016-05-17 Thread Todd
s queries // outputMode() is used for continuous queries assertNotStreaming("mode() can only be called on non-continuous queries") this.mode = saveMode this } On Wed, May 18, 2016 at 12:25 PM, Todd <bit1...@163.com> wrote: Thanks Ted. I didn't try, but I think SaveMode and OuputM

RE: Unit test failure: Address already in use

2014-06-18 Thread Lisonbee, Todd
, Todd From: Anselme Vignon [mailto:anselme.vig...@flaminem.com] Sent: Wednesday, June 18, 2014 12:33 AM To: user@spark.apache.org Subject: Re: Unit test failure: Address already in use Hi, Could your problem come from the fact that you run your tests in parallel? If you are running spark in local mode
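
If parallel test execution is indeed the cause, a common sbt-side mitigation (a sketch, in build.sbt) is to force the suites to run serially so they don't race for the same ports:

    // build.sbt: run test suites one at a time
    parallelExecution in Test := false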

subscribe

2014-07-28 Thread James Todd

RE: Shuffle files

2014-10-07 Thread Lisonbee, Todd
-on-reduceByKey-td2462.html Thanks, Todd -Original Message- From: SK [mailto:skrishna...@gmail.com] Sent: Tuesday, October 7, 2014 2:12 PM To: u...@spark.incubator.apache.org Subject: Re: Shuffle files - We set ulimit to 50. But I still get the same too many open files warning. - I tried

Re: Link existing Hive to Spark

2015-02-06 Thread Todd Nist
points to the Hive Metastore URI in your cluster --> <value>thrift://HostNameHere:9083</value> <description>URI for client to contact metastore server</description> </property> </configuration> HTH. -Todd On Fri, Feb 6, 2015 at 4:12 AM, ashu ashutosh.triv...@iiitb.org wrote: Hi, I have Hive

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
on this one. Anything I may be missing here? Thanks for the help, it is much appreciated. I will give Arush's suggestion a try tomorrow. -Todd On Tue, Feb 10, 2015 at 7:24 PM, Silvio Fiorito silvio.fior...@granturing.com wrote: Todd, I just tried it in the bin/spark-sql shell. I created a folder

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
for the assistance. -Todd On Wed, Feb 11, 2015 at 3:20 PM, Andrew Lee alee...@hotmail.com wrote: Sorry folks, it is executing Spark jobs instead of Hive jobs. I mis-read the logs since there were other activities going on on the cluster. -- From: alee

Re: Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
.html On Thu, Feb 12, 2015 at 7:24 AM, Todd Nist tsind...@gmail.com wrote: I have a question with regards to accessing SchemaRDD’s and Spark SQL temp tables via the thrift server. It appears that a SchemaRDD when created is only available in the local namespace / context and are unavailable

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-19 Thread Todd Nist
Hi Dhimant, I believe it will work if you change your spark-shell invocation to pass --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar instead of putting it in --jars. -Todd On Wed, Feb 18, 2015 at 10:41 PM, Dhimant dhimant84.jays...@gmail.com wrote: Found the solution from one of the posts found

Re: Tableau beta connector

2015-02-19 Thread Todd Nist
(path '/user/data/json/test/*'); cache table test; 1. Refresh the connection. Then select "New Custom SQL" and issue something like: select * from test; You will see your table appear. HTH. -Todd On Thu, Feb 19, 2015 at 5:41 AM, ashu ashutosh.triv...@iiitb.org wrote: Hi, I would like

Re: Where to look for potential causes for Akka timeout errors in a Spark Streaming Application?

2015-02-20 Thread Todd Nist
Hi Emre, Have you tried adjusting these: .set("spark.akka.frameSize", "500").set("spark.akka.askTimeout", "30").set("spark.core.connection.ack.wait.timeout", "600") -Todd On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, We are building a Spark Streaming application

Re: SparkSQL + Tableau Connector

2015-02-19 Thread Todd Nist
, but does work for now. I also have it working by doing a saveAsTable on the ingested data, which stores the reference into the metastore for access via the thrift server. Thanks for the help. -Todd On Wed, Feb 11, 2015 at 8:41 PM, Silvio Fiorito silvio.fior...@granturing.com wrote: Hey Todd, I

Re: Set EXTRA_JAR environment variable for spark-jobserver

2015-01-06 Thread Todd Nist
your executors: spark.executor.extraClassPath=. HTH. -Todd On Tue, Jan 6, 2015 at 10:00 AM, bchazalet bchaza...@companywatch.net wrote: It does not look like you're supposed to fiddle with the SparkConf and even SparkContext in a 'job' (again, I don't know much about jobserver), as you're

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
for the assistance. -Todd On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: BTW what tableau connector are you using? On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: I am a little confused here, why do you want to create the tables

SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
something to expose these via hive / metastore other than creating a table in hive? 3. Does the thriftserver need to be configured to expose these in some fashion, sort of related to question 2. TIA for the assistance. -Todd

Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
like this: create temporary table test using org.apache.spark.sql.json options (path '/data/json/*'); cache table test; I am using Spark 1.2.1. If it is not available now, will it be in 1.3.x? Or is the only way to achieve this to store it into the metastore, and does that imply Hive? -Todd
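
The same SQL issued programmatically, a sketch (the path is the one from the post):

    sqlContext.sql(
      "CREATE TEMPORARY TABLE test USING org.apache.spark.sql.json " +
      "OPTIONS (path '/data/json/*')")
    sqlContext.sql("CACHE TABLE test")
    // a temporary table lives only in this SQLContext; persisting it
    // (e.g. via saveAsTable, as discussed above) is what makes it
    // reachable from external JDBC clients of the thrift server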

Re: Unable to query hive tables from spark

2015-02-15 Thread Todd Nist
HTH. -Todd On Thu, Feb 12, 2015 at 1:16 AM, kundan kumar iitr.kun...@gmail.com wrote: I want to create/access the hive tables from spark. I have placed the hive-site.xml inside the spark/conf directory. Even though it creates a local metastore in the directory where I run the spark shell

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
/resources/kv1.txt' INTO TABLE src) // Queries are expressed in HiveQL sqlContext.sql("FROM src SELECT key, value").collect().foreach(println) Or did you have something else in mind? -Todd On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote: Arush, Thank you will take a look

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Arush, Thank you, I will take a look at that approach in the morning. I sort of figured the answer to #1 was NO and that I would need to do 2 and 3; thanks for clarifying it for me. -Todd On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: 1. Can the connector

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
to the requested database. Thanks again for the suggestion and I will give work with it a bit more tomorrow. -Todd On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito silvio.fior...@granturing.com wrote: Hi Todd, What you could do is run some SparkSQL commands immediately after the Thrift

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Todd Nist
to be working fine with HDP as well and steps 2a and 2b are not required. HTH -Todd On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi, Trying to run spark ( 1.2.1 built for hdp 2.2) against a yarn cluster results in the AM failing to start with following

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-17 Thread Todd Nist
no luck running purpose-built 1.3 against HDP 2.2 after following all the instructions. Anyone else faced this issue? On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi Todd, Thanks for the help. I'll try again after building a distribution with the 1.3 sources

Re: [SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-18 Thread Todd Nist
version set to: sparkVersion = 1.2.1 Other than possibly missing an exclude that is bringing in an older version of Spark from somewhere, I do see that I am referencing "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided", but I don't think that is the issue. Any other thoughts? -Todd On Wed, Mar

[SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-18 Thread Todd Nist
ElasticSearch 1.4.4, spark-1.2.1-bin-hadoop2.4, and the elasticsearch-hadoop: "org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.BUILD-SNAPSHOT" Any insight on what I am doing wrong? TIA for the assistance. -Todd
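
For context, a minimal sketch of the temporary-table registration being attempted (the index/type are hypothetical; elasticsearch-hadoop must be on the classpath):

    sqlContext.sql(
      "CREATE TEMPORARY TABLE docs " +
      "USING org.elasticsearch.spark.sql " +
      "OPTIONS (resource 'myindex/mytype')")
    sqlContext.sql("SELECT * FROM docs LIMIT 10").collect().foreach(println)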

[Spark SQL] Elasticsearch-hadoop - exception when creating Temporary table

2015-03-18 Thread Todd Nist
ElasticSearch 1.4.4, spark-1.2.1-bin-hadoop2.4, and the elasticsearch-hadoop: "org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.BUILD-SNAPSHOT" Any insight on what I am doing wrong? TIA for the assistance. -Todd

Re: hbase sql query

2015-03-12 Thread Todd Nist
/com/cloudera/spark/hbase/example/JavaHBaseMapGetPutExample.java https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/JavaHBaseContext.scala On Thu, Mar 12, 2015 at 8:34 AM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote: Thanks Todd, But this link

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
with flushing remote transports.15/03/06 12:35:40 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down. Thanks again for the help. -Todd On Thu, Mar 5, 2015 at 7:06 PM, Zhan Zhang zzh...@hortonworks.com wrote: In addition, you may need following patch if it is not in 1.2.1

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
=2.2.0.0-2041 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 without the patch ${hdp.version} was not being substituted. Thanks for pointing me to that patch, appreciate it. -Todd On Fri, Mar 6, 2015 at 1:12 PM, Zhan Zhang zzh...@hortonworks.com wrote: Hi Todd, Looks like

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
, at 11:40 AM, Zhan Zhang zzh...@hortonworks.com wrote: You are using 1.2.1 right? If so, please add java-opts in conf directory and give it a try. [root@c6401 conf]# more java-opts -Dhdp.version=2.2.2.0-2041 Thanks. Zhan Zhang On Mar 6, 2015, at 11:35 AM, Todd Nist tsind

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist
There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wbr...@gmail.com wrote: Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the

Re: hbase sql query

2015-03-12 Thread Todd Nist
Have you considered using the spark-hbase-connector for this: https://github.com/nerdammer/spark-hbase-connector On Thu, Mar 12, 2015 at 5:19 AM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote: Thanks Akhil. Additionaly if we want to do sql query we need to create JavaPairRdd, then

Re: Spark as a service

2015-03-24 Thread Todd Nist
Perhaps this project, https://github.com/calrissian/spark-jetty-server, could help with your requirements. On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele jeffrey.jed...@gmail.com wrote: I don't think there's are general approach to that - the usecases are just to different. If you really need

OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Todd Leo
be collected to the master so further transformations can be applied. As DataFrame has "richer optimizations under the hood" and is the natural convention for an R/Julia user, I really hope this error can be tackled, and that DataFrame is robust enough to depend on. Thanks in advance! REGARDS, Todd -- View

SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array

2015-03-31 Thread Todd Nist
into where I am off? I'm sure it is probably something small that I'm just not seeing yet. TIA for the assistance. -Todd

Re: Query REST web service with Spark?

2015-03-31 Thread Todd Nist
looking for. -Todd On Tue, Mar 31, 2015 at 5:06 PM, Burak Yavuz brk...@gmail.com wrote: Hi, If I recall correctly, I've read people integrating REST calls to Spark Streaming jobs in the user list. I don't imagine any cases for why it shouldn't be possible. Best, Burak On Tue, Mar 31, 2015

Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array

2015-03-31 Thread Todd Nist
) Is this the right approach? Is this syntax available in 1.2.1: SELECT v1.name, v2.city, v2.state FROM people LATERAL VIEW json_tuple(people.jsonObject, 'name', 'address') v1 as name, address LATERAL VIEW json_tuple(v1.address, 'city', 'state') v2 as city, state; -Todd On Tue, Mar 31, 2015
