[ANNOUNCE] .NET for Apache Spark™ 2.1 released

2022-02-02 Thread Terry Kim
The release notes include the full list of features/improvements of this release. Here are some of the highlights: - Support for Apache Spark 3.2 - Exposing new SQL function APIs introduced in Spark 3.2 We would like to thank the community for the great feedback and all those who contributed to this release. Thanks, Terry Kim on behalf of the .NET for Apache Spark™ team

Announcing Hyperspace v0.4.0 - an indexing subsystem for Apache Spark™

2021-02-08 Thread Terry Kim
PR to support Iceberg tables. We would like to thank the community for the great feedback and all those who contributed to this release. Thanks, Terry Kim on behalf of the Hyperspace team

Re: [Spark SQL]HiveQL and Spark SQL producing different results

2021-01-12 Thread Terry Kim
Ying, Can you share a query that produces different results? Thanks, Terry On Sun, Jan 10, 2021 at 1:48 PM Ying Zhou wrote: > Hi, > > I run some SQL using both Hive and Spark. Usually we get the same results. > However when a window function is in the script Hive and Spark

Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

2020-11-17 Thread Terry Kim
We would like to thank the community for the great feedback and all those who contributed to this release. Thanks, Terry Kim on behalf of the Hyperspace team

Announcing .NET for Apache Spark™ 1.0

2020-11-06 Thread Terry Kim
- Support for all the complex types in Spark SQL - Support for Delta Lake <https://github.com/delta-io/delta> v0.7 and Hyperspace <https://github.com/microsoft/hyperspace> v0.2 We would like to thank the community for the great feedback and all those who contributed to this release

Re: Renaming a DataFrame column makes Spark lose partitioning information

2020-08-04 Thread Terry Kim
on($"c") .explain() // Exiting paste mode, now interpreting. == Physical Plan == *(1) Project [a#7, b#8 AS c#11] +- Exchange hashpartitioning(b#8, 200), false, [id=#12] +- LocalTableScan [a#7, b#8] Thanks, Terry On Tue, Aug 4, 2020 at 6:26 AM Antoine Wendlinger wrote: >

Re: Future timeout

2020-07-20 Thread Terry Kim
"spark.sql.broadcastTimeout" is the config you can use: https://github.com/apache/spark/blob/fe07521c9efd9ce0913eee0d42b0ffd98b1225ec/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L863 Thanks, Terry On Mon, Jul 20, 2020 at 11:20 AM Amit Sharma wrote: > P

Announcing .NET for Apache Spark™ 0.12

2020-07-02 Thread Terry Kim
4.6 (3.0 support is on the way!) - SparkSession.CreateDataFrame, Broadcast variable - Preliminary support for MLlib (TF-IDF, Word2Vec, Bucketizer, etc.) - Support for .NET Core 3.1 We would like to thank all those who contributed to this release. Thanks, Terry Kim on behalf of the .NET for Apache Spark™ team

Hyperspace v0.1 is now open-sourced!

2020-07-02 Thread Terry Kim
-indexing-subsystem-for-apache-spark - Docs: https://aka.ms/hyperspace This project would not have been possible without the outstanding work from the Apache Spark™ community. Thank you everyone, and we look forward to collaborating with the community towards evolving Hyperspace. Thanks, Terry Kim on behalf of the Hyperspace team

Re: Using existing distribution for join when subset of keys

2020-05-31 Thread Terry Kim
Location: InMemoryFileIndex[file:/], PartitionFilters: [], PushedFilters: [IsNotNull(x), IsNotNull(y)], ReadSchema: struct, SelectedBucketsCount: 8 out of 8 On Sun, May 31, 2020 at 2:38 PM Patrick Woody wrote: > Hey Terry, > > Thanks for the response! I'm not sure that it ends up wor

Re: Using existing distribution for join when subset of keys

2020-05-31 Thread Terry Kim
You can use bucketBy to avoid shuffling in your scenario. This test suite has some examples: https://github.com/apache/spark/blob/45cf5e99503b00a6bd83ea94d6d92761db1a00ab/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala#L343 Thanks, Terry On Sun, May 31, 2020 at 7:43
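
A minimal sketch of the bucketing idea, with illustrative DataFrame and table names (the join columns x and y match the bucketed columns visible in the plan excerpt above):

```scala
// Persist both sides bucketed and sorted by the join keys; joins on those
// keys can then reuse the layout instead of shuffling.
dfA.write.bucketBy(8, "x", "y").sortBy("x", "y").saveAsTable("bucketed_a")
dfB.write.bucketBy(8, "x", "y").sortBy("x", "y").saveAsTable("bucketed_b")

val joined = spark.table("bucketed_a")
  .join(spark.table("bucketed_b"), Seq("x", "y"))
joined.explain() // no Exchange on the bucketed join keys
```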

Re: [Spark SQL]: Does namespace name is always needed in a query for tables from a user defined catalog plugin

2019-12-01 Thread Terry Kim
> and will follow up. Thanks, Terry On Sun, Dec 1, 2019 at 7:12 PM xufei wrote: > Hi, > > I'm trying to write a catalog plugin based on spark-3.0-preview, and I > found even when I use 'use catalog.namespace' to set the current catalog > and namespace, I still ne

Announcing .NET for Apache Spark 0.5.0

2019-09-30 Thread Terry Kim
ents - Support for Spark 2.3.4/2.4.4 The release notes <https://github.com/dotnet/spark/blob/master/docs/release-notes/0.5/release-0.5.md> include the full list of features/improvements of this release. We would like to thank all those who contributed to this release. Thanks, Terry

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Terry Kim
Can the following be included? [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in EpochTracker (to support Python UDFs) <https://github.com/apache/spark/pull/24946> Thanks, Terry On Tue, Aug 13, 2019 at 10:24 PM Wenchen Fan wrote: > +1 > > On Wed, Aug 14

Announcing .NET for Apache Spark 0.4.0

2019-07-31 Thread Terry Kim
oading - Local UDF debugging The release notes <https://github.com/dotnet/spark/blob/master/docs/release-notes/0.4/release-0.4.md> include the full list of features/improvements of this release. We would like to thank all those who contributed to this release. Thanks, Terry

The last successful batch before stop re-execute after restart the DStreams with checkpoint

2018-03-11 Thread Terry Hoo
. Regards - Terry

Re: Getting memory error when starting spark shell but not often

2016-09-06 Thread Terry Hoo
Maybe not enough contiguous memory (10G?) in your host. Regards, - Terry On Wed, Sep 7, 2016 at 10:51 AM, Divya Gehlot wrote: > Hi, > I am using EMR 4.7 with Spark 1.6 > Sometimes when I start the spark shell I get below error > > OpenJDK 64-Bit Server VM warning: INFO: os

Re: spark2.0 how to use sparksession and StreamingContext same time

2016-07-25 Thread Terry Hoo
Kevin, Try to create the StreamingContext as following: val ssc = new StreamingContext(spark.sparkContext, Seconds(2)) On Tue, Jul 26, 2016 at 11:25 AM, kevin wrote: > hi,all: > I want to read data from kafka and regist as a table then join a jdbc > table. > My sample like this : > > val spa
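
Filled out, that pattern looks roughly like this (a sketch for Spark 2.0; the Kafka/JDBC parts of the original question are omitted):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession.builder().appName("stream-and-sql").getOrCreate()
// Reuse the session's underlying SparkContext rather than creating a
// second context in the same JVM.
val ssc = new StreamingContext(spark.sparkContext, Seconds(2))
```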

Re: Another problem about parallel computing

2016-06-13 Thread Terry Hoo
hero, Did you check whether there is any exception after retry? If the port is 0, the spark worker should bind to a random port. BTW, what's the spark version? Regards, - Terry On Mon, Jun 13, 2016 at 4:24 PM, hero wrote: > Hi, guys > > I have another problem about spark yarn.

Re: StackOverflow in Spark

2016-06-13 Thread Terry Hoo
Maybe the same issue as SPARK-6847 <https://issues.apache.org/jira/browse/SPARK-6847>, which has been fixed in Spark 2.0. Regards - Terry On Mon, Jun 13, 2016 at 3:15 PM, Michel Hubert wrote: > > > I’ve found my problem. > > > > I’ve got a DAG with two consecutive “

ArrayIndexOutOfBoundsException in model selection via cross-validation sample with spark 1.6.1

2016-05-04 Thread Terry Hoo
submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Regards - Terry

Re: Number of batches in the Streaming Statics visualization screen

2016-01-29 Thread Terry Hoo
Yes, the data is stored in driver memory. Mehdi Ben Haj Abbes wrote on Friday, January 29, 2016 at 18:13: > Thanks Terry for the quick answer. > > I did not try it. Let's say I will increase the value to 2, what > side effect should I expect. In fact in the explanation of the property "Ho

Re: Number of batches in the Streaming Statics visualization screen

2016-01-29 Thread Terry Hoo
Hi Mehdi, Do you try a larger value of "spark.streaming.ui.retainedBatches"(default is 1000)? Regards, - Terry On Fri, Jan 29, 2016 at 5:45 PM, Mehdi Ben Haj Abbes wrote: > Hi folks, > > I have a streaming job running for more than 24 hours. It seems that there > is a
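
For example (a sketch; 2000 is an illustrative value, and the retained batch metadata lives in driver memory, as noted in the reply above):

```scala
import org.apache.spark.SparkConf

// Retain more completed batches in the streaming UI (default 1000).
// Trade-off: batch metadata is kept in driver memory.
val conf = new SparkConf().set("spark.streaming.ui.retainedBatches", "2000")
```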

Re: [Spark 1.6][Streaming] About the behavior of mapWithState

2016-01-17 Thread Terry Hoo
state. Regards, -Terry On Sat, Jan 16, 2016 at 6:20 AM, Shixiong(Ryan) Zhu wrote: > Hey Terry, > > That's expected. If you want to only output (1, 3), you can use > "reduceByKey" before "mapWithState" like this: > > dstream.reduceByKey(_ + _).mapWithStat
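
A sketch of the suggested fix against the Spark 1.6 API, assuming dstream is a DStream[(Int, Int)] and an illustrative running-sum state function:

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Collapse duplicate keys within each batch first, so mapWithState emits
// one record per key per batch rather than one per input element.
def trackSum(key: Int, value: Option[Int], state: State[Int]): (Int, Int) = {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum)
  (key, sum)
}

val updated = dstream.reduceByKey(_ + _)
  .mapWithState(StateSpec.function(trackSum _))
```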

[Spark 1.6][Streaming] About the behavior of mapWithState

2016-01-15 Thread Terry Hoo
with the same key "1": (1,1) and (1,3), is this expected behavior? I would expect (1,3) only. Regards - Terry

[Streaming] Long time to catch up when streaming application restarts from checkpoint

2015-11-06 Thread Terry Hoo
to skip these batches or to speed up the catch-up processing? Thanks! Terry

[SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-14 Thread Terry Hoo
SQLListener has about 1K entries), is this a leak in SQLListener? Thanks! Terry

Re: Streaming Application Unable to get Stream from Kafka

2015-10-09 Thread Terry Hoo
Hi Prateek, How many cores (threads) do you assign to spark in local mode? It is very likely the local spark does not have enough resources to proceed. You can check http://yourip:4040 for the details. Thanks! Terry On Fri, Oct 9, 2015 at 10:34 PM, Prateek . wrote: > Hi All, > >

Re: Cant perform full outer join

2015-09-29 Thread Terry Hoo
Saif, Maybe you can rename one of the dataframes to a different name first, then do an outer join and a select like this: val cur_d = cur_data.toDF("Date_1", "Value_1") val r = data.join(cur_d, data("DATE") === cur_d("Date_1"), "outer").select($"
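
Spelled out (a sketch; the columns in the final select are illustrative, since the original select is truncated):

```scala
// Rename the columns on one side so the join condition and the final
// select can refer to each column unambiguously.
val cur_d = cur_data.toDF("Date_1", "Value_1")
val r = data.join(cur_d, data("DATE") === cur_d("Date_1"), "outer")
  .select(data("DATE"), cur_d("Value_1"))
```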

Re: Why Checkpoint is throwing "actor.OneForOneStrategy: NullPointerException"

2015-09-24 Thread Terry Hoo
I met this before: in my program, some DStreams are not initialized since they are not in the path of the output. You can check if you are in the same case. Thanks! - Terry On Fri, Sep 25, 2015 at 10:22 AM, Tathagata Das wrote: > Are you by any chance setting DStream.remember() with n

Re: How to convert dataframe to a nested StructType schema

2015-09-15 Thread Terry Hole
Hao, For spark 1.4.1, you can try this: val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2)))) val newDF = sqlContext.createDataFrame(rowrdd, yourNewSchema) Thanks! - Terry On Wed, Sep 16, 2015 at 2:10 AM, Hao Wang wrote: > Hi, > > I created a dataframe with 4 st
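
A self-contained sketch of that approach (the nested schema below is hypothetical; the thread's actual four-column schema is truncated):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Target shape: column 3 wrapped in one struct, columns 0-2 in another.
val yourNewSchema = StructType(Seq(
  StructField("key", StructType(Seq(
    StructField("id", StringType, true))), true),
  StructField("info", StructType(Seq(
    StructField("a", StringType, true),
    StructField("b", StringType, true),
    StructField("c", StringType, true))), true)))

val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2))))
val newDF = sqlContext.createDataFrame(rowrdd, yourNewSchema)
```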

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-08 Thread Terry Hole
= { val na = NominalAttribute.defaultAttr.withValues("0", "1") na.toMetadata(m) } val newSchema = StructType(schema.map(f => if (f.name == "label") f.copy(metadata=enrich(f.metadata)) else f)) val model = pipeline.fit(sqlContext.createDataFrame(rowRDD, newSchem
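
Reassembled, the metadata trick looks roughly like this (a sketch against the Spark 1.4 ml API; schema, rowRDD, and pipeline are assumed from the truncated thread):

```scala
import org.apache.spark.ml.attribute.NominalAttribute
import org.apache.spark.sql.types.{Metadata, StructType}

// Mark the "label" column as nominal with two values so
// DecisionTreeClassifier can infer the number of classes.
def enrich(m: Metadata): Metadata = {
  val na = NominalAttribute.defaultAttr.withValues("0", "1")
  na.toMetadata(m)
}
val newSchema = StructType(schema.map(f =>
  if (f.name == "label") f.copy(metadata = enrich(f.metadata)) else f))
val model = pipeline.fit(sqlContext.createDataFrame(rowRDD, newSchema))
```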

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-07 Thread Terry Hole
Xiangrui, Do you have any idea how to make this work? Thanks - Terry Terry Hole wrote on Sunday, September 6, 2015 at 17:41: > Sean > > Do you know how to tell the decision tree that the "label" is binary, or set > some attributes on the dataframe to carry the number of classes? > > Thanks! >

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
Sean, Do you know how to tell the decision tree that the "label" is binary, or set some attributes on the dataframe to carry the number of classes? Thanks! - Terry On Sun, Sep 6, 2015 at 5:23 PM, Sean Owen wrote: > (Sean) > The error suggests that the type is not a binary or nominal attri

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
at $iwC$$iwC$$iwC.<init>(<console>:72) at $iwC$$iwC.<init>(<console>:74) at $iwC.<init>(<console>:76) at <init>(<console>:78) at .<init>(<console>:82) at .<clinit>() at .<init>(<console>:7) at .<clinit>() at $print() Thanks! - Terry On Sun, Sep 6, 2015 at 4:53 PM, Sean Owen wrote: > I think somewhere along the line you've n

Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-05 Thread Terry Hole
Hi, Experts, I followed the spark ml pipeline guide to test DecisionTreeClassifier in the spark shell with spark 1.4.1, but always meet an error like the following; do you have any idea how to fix this? The error stack: java.lang.IllegalArgumentException:

SparkSQL without access to arrays?

2015-09-03 Thread Terry
Hi, i'm using Spark 1.4.1. Here is the printSchema after loading my json file:
root
 |-- result: struct (nullable = true)
 |    |-- negative_votes: long (nullable = true)
 |    |-- players: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- account_id: long (nullable = true)
 |    |    |    |-- assists: lon

Re: Job aborted due to stage failure: java.lang.StringIndexOutOfBoundsException: String index out of range: 18

2015-08-28 Thread Terry Hole
Ricky, You may need to use map instead of flatMap in your case: val rowRDD = sc.textFile("/user/spark/short_model").map(_.split("\\t")).map(p => Row(...)) Thanks! -Terry On Fri, Aug 28, 2015 at 5:08 PM, our...@cnsuning.com wrote: > hi all, > > when using s
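
The distinction: map keeps one Array[String] per input line, while flatMap would flatten every field into its own record. A sketch (the Row fields are placeholders for the thread's actual columns):

```scala
import org.apache.spark.sql.Row

// One Row per line: split produces an Array[String], and map preserves it
// as a single element instead of flattening it field by field.
val rowRDD = sc.textFile("/user/spark/short_model")
  .map(_.split("\\t"))
  .map(p => Row(p(0), p(1), p(2)))
```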

Re: standalone to connect mysql

2015-07-21 Thread Terry Hole
Jack, You can refer to the Hive SQL syntax if you use HiveContext: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML Thanks! -Terry That works! Thanks. > Can I ask you one further question? > > How did spark sql support insertion? > > > >

Re: standalone to connect mysql

2015-07-20 Thread Terry Hole
Maybe you can try: spark-submit --class "sparkwithscala.SqlApp" --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry > Hi there, > > > > I would like to use spark to access the data in mysql. So firstly I t
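
Once the connector jar is distributed, a minimal read against the Spark 1.4-era API might look like this (URL, table, and credentials are placeholders):

```scala
// Load a MySQL table as a DataFrame through the JDBC data source.
val jdbcDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://hadoop1:3306/mydb")
  .option("dbtable", "mytable")
  .option("user", "user")
  .option("password", "password")
  .load()
```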

Re: [Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
. Thanks! - Terry Ted Yu wrote on Friday, July 17, 2015 at 12:02 PM: > See this recent thread: > > > http://search-hadoop.com/m/q3RTtFW7iMDkrj61/Spark+shell+oom+&subj=java+lang+OutOfMemoryError+PermGen+space > > > > On Jul 16, 2015, at 8:51 PM, Terry Hole wrote: > > Hi, > > Bac

[Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
$line16' in 'C:\Users\jhu\AppData\Local\Temp\spark-2ad09490-c0c6-41e2-addb-63087ce0ae63' but it is not a directory. That entry seems to have slain the compiler. Shall I replay your session? I can re-run each line except the last one. [y/n] Abandoning crashed session. Thanks! -Terry

Re: fileStream with old files

2015-07-15 Thread Terry Hole
Hi, Hunter, What behavior do you see with HDFS? The local file system and HDFS should have the same behavior. Thanks! - Terry Hunter Morgan wrote on Thursday, July 16, 2015 at 2:04 AM: > After moving the setting of the parameter to SparkConf initialization > instead of after the context is a

Re: fileStream with old files

2015-07-13 Thread Terry Hole
https://issues.apache.org/jira/browse/SPARK-3276 -Terry On Tue, Jul 14, 2015 at 4:44 AM, automaticgiant wrote: > It's not as odd as it sounds. I want to ensure that long streaming job > outages can recover all the files that went into a directory while the job > was down. > I've looke

Re: [Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-10 Thread Terry Hole
Michael, Thanks - Terry Michael Armbrust wrote on Saturday, July 11, 2015 at 04:02: > Metastore configuration should be set in hive-site.xml. > > On Thu, Jul 9, 2015 at 8:59 PM, Terry Hole wrote: > >> Hi, >> >> I am trying to set the hive metadata destination to a mysql databas

[Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-09 Thread Terry Hole
hive.metastore.warehouse.dir", "/user/hive/warehouse")* *hiveContext.sql("select * from mysqltable").show()* *Thanks!* *-Terry*

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-09 Thread Terry Hole
> On Wed, Jul 8, 2015 at 8:12 PM, Terry Hole wrote: > >> I am using spark 1.4.1rc1 with default hive settings >> >> Thanks >> - Terry >> >> Hi All, >> >> I'd like to use the hive context in spark shell, i need to recreate the >> hi

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
I am using spark 1.4.1rc1 with default hive settings Thanks - Terry Hi All, I'd like to use the hive context in spark shell, i need to recreate the hive meta database in the same location, so i want to close the derby connection previous created in the spark shell, is there any way to do

Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
ction("jdbc:derby:;shutdown=true"); Thanks! - Terry

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread Terry Hole
This turned out to be a bug in Spark 1.4.0: SPARK-8368 <https://issues.apache.org/jira/browse/SPARK-8368> Thanks! Terry On Thu, Jul 2, 2015 at 1:20 PM, Terry Hole wrote: > All, > > I am using spark console 1.4.0 to do some tests; when I create a new > HiveContext (Line 18 in th

Meets class not found error in spark console with newly hive context

2015-07-01 Thread Terry Hole
ion += 1; if (accum.value > 0 || duration >= 120) { println("### STOP SSC ###"); ssc.stop(false, true); duration = 0; isRun = false } }}
33 ssc.awaitTermination()
34 println(">>> Streaming context terminated.")
35 }
36
37 streamingTest(null)
38
Thanks Terry

Re: Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-11 Thread Terry Hole
kConf.set("spark.akka.extensions","Whatever"), underneath i think > spark won't ship properties which don't start with spark.* to the executors. > > Thanks > Best Regards > > On Mon, May 11, 2015 at 8:33 AM, Terry Hole wrote: > >> Hi all, >> &

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-10 Thread Terry Hole
Hi all, I'd like to monitor Akka using Kamon, which needs akka.extensions to be set to a list like this in Typesafe config format: akka { extensions = ["kamon.system.SystemMetrics", "kamon.statsd.StatsD"] } But I cannot find a way to do this; I have tried these: 1. SparkConf.set("akka

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-07 Thread Terry Hole
Hi all, I'd like to monitor Akka using Kamon, which needs akka.extensions to be set to a list like this in Typesafe config format: akka { extensions = ["kamon.system.SystemMetrics", "kamon.statsd.StatsD"] } But I cannot find a way to do this; I have tried these: 1. SparkConf.set("akka

Re: spark 1.3.0 strange log message

2015-04-23 Thread Terry Hole
Use this in spark conf: spark.ui.showConsoleProgress=false Best Regards, On Fri, Apr 24, 2015 at 11:23 AM, Henry Hung wrote: > Dear All, > > > > When using spark 1.3.0 spark-submit with directing out and err to a log > file, I saw some strange lines inside that looks like this: > > [Stage 0:>
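
For example (a sketch; the same flag can be set programmatically or at submit time):

```scala
import org.apache.spark.SparkConf

// Suppress the in-place [Stage 0:> ...] progress bar that clutters log
// files; equivalent: spark-submit --conf spark.ui.showConsoleProgress=false
val conf = new SparkConf().set("spark.ui.showConsoleProgress", "false")
```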

Re: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since

2015-01-21 Thread Terry Hole
See also SPARK-3276 and SPARK-3553. Can you say more about the > problem? what are the file timestamps, what happens when you run, what > log messages if any are relevant. I do not expect there was any > intended behavior change. > > On Wed, Jan 21, 2015 at 5:17 AM, Terry Hole wrote:

Fwd: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since

2015-01-20 Thread Terry Hole
Regards - Terry

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
? From: Tridib Samanta <tridib.sama...@live.com> Date: Thursday, November 6, 2014 at 9:49 AM To: Terry Siu <terry@smartfocus.com>, "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org> Subj

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
What version of Spark are you using? Did you compile your Spark version and if so, what compile options did you use? On 11/6/14, 9:22 AM, "tridib" wrote: >Help please! > > > >-- >View this message in context: >http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spa

Re: SparkSQL - No support for subqueries in 1.2-snapshot?

2014-11-04 Thread Terry Siu
Done. https://issues.apache.org/jira/browse/SPARK-4226 Hoping this will make it into 1.3? :) -Terry From: Michael Armbrust <mich...@databricks.com> Date: Tuesday, November 4, 2014 at 11:31 AM To: Terry Siu <terry@smartfocus.com> Cc: "user@spark.apach

SparkSQL - No support for subqueries in 1.2-snapshot?

2014-11-04 Thread Terry Siu
e.com/Subquery-in-having-clause-Spark-1-1-0-td17401.html Thanks, -Terry

Re: ParquetFilters and StringType support for GT, GTE, LT, LTE

2014-11-03 Thread Terry Siu
Done. https://issues.apache.org/jira/browse/SPARK-4213 Thanks, -Terry From: Michael Armbrust <mich...@databricks.com> Date: Monday, November 3, 2014 at 1:37 PM To: Terry Siu <terry@smartfocus.com> Cc: "user@spark.apach

Re: NoClassDefFoundError encountered in Spark 1.2-snapshot build with hive-0.13.1 profile

2014-11-03 Thread Terry Siu
Thanks, Kousuke. I’ll wait till this pull request makes it into the master branch. -Terry From: Kousuke Saruta <saru...@oss.nttdata.co.jp> Date: Monday, November 3, 2014 at 11:11 AM To: Terry Siu <terry@smartfocus.com>, "user@spark.apache.org"

ParquetFilters and StringType support for GT, GTE, LT, LTE

2014-11-03 Thread Terry Siu
morning and now the same query will give me a MatchError for this column of string type. Thanks, -Terry

NoClassDefFoundError encountered in Spark 1.2-snapshot build with hive-0.13.1 profile

2014-11-03 Thread Terry Siu
ion going on. Thanks, -Terry

Re: Spark Build

2014-10-31 Thread Terry Siu
Thanks for the update, Shivaram. -Terry On 10/31/14, 12:37 PM, "Shivaram Venkataraman" wrote: >Yeah looks like https://github.com/apache/spark/pull/2744 broke the >build. We will fix it soon > >On Fri, Oct 31, 2014 at 12:21 PM, Terry Siu >wrote: >> I am synced u

Spark Build

2014-10-31 Thread Terry Siu
not find MemLimitLogger anywhere in the Spark code. Anybody else seen/encounter this? Thanks, -Terry

Re: Ambiguous references to id : what does it mean ?

2014-10-30 Thread Terry Siu
I would still get an "Unresolved attributes" error back. Is there any way around this short of renaming the columns in the join sources? Thanks -Terry Michael Armbrust wrote: Yes, but if both tagCollection and selectedVideos have a column named "id" then Spark SQL does not know wh
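
For what it's worth, in the later DataFrame API (Spark 1.3+, after this thread) the renaming workaround reads as follows; the thread itself predates withColumnRenamed:

```scala
// Rename the clashing column on one side so every attribute is unique,
// then join on the now-distinct names.
val tags = tagCollection.withColumnRenamed("id", "tag_id")
val joined = tags.join(selectedVideos, tags("tag_id") === selectedVideos("id"))
```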

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-21 Thread Terry Siu
Just to follow up, the queries worked against master and I got my whole flow rolling. Thanks for the suggestion! Now if only Spark 1.2 will come out with the next release of CDH5 :P -Terry From: Terry Siu <terry@smartfocus.com> Date: Monday, October 20, 2014 at 12:22 PM To: M

Re: SparkSQL - TreeNodeException for unresolved attributes

2014-10-20 Thread Terry Siu
Hi Michael, Thanks again for the reply. Was hoping it was something I was doing wrong in 1.1.0, but I’ll try master. Thanks, -Terry From: Michael Armbrust <mich...@databricks.com> Date: Monday, October 20, 2014 at 12:11 PM To: Terry Siu <terry@smartfocus.com>

SparkSQL - TreeNodeException for unresolved attributes

2014-10-20 Thread Terry Siu
h the columns from the two tables from which the join table is constructed, as I see in the plan a breakdown of various pieces from the queries on my two source tables. Help? Thanks, -Terry

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-20 Thread Terry Siu
Hi Yin, Sorry for the delay, but I’ll try the code change when I get a chance; Michael’s initial response did solve my problem. In the meantime, I’m hitting another issue with SparkSQL, which I will probably post in another message if I can’t figure out a workaround. Thanks, -Terry From: Yin

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-15 Thread Terry Siu
. Let me know if you need more information. Thanks -Terry From: Yin Huai <huaiyin@gmail.com> Date: Tuesday, October 14, 2014 at 6:29 PM To: Terry Siu <terry@smartfocus.com> Cc: Michael Armbrust <mich...@databricks.com>, "user@spark

Re: SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-14 Thread Terry Siu
Hi Michael, That worked for me. At least I’m now further than I was. Thanks for the tip! -Terry From: Michael Armbrust <mich...@databricks.com> Date: Monday, October 13, 2014 at 5:05 PM To: Terry Siu <terry@smartfocus.com> Cc: "user@spark.apach

SparkSQL IndexOutOfBoundsException when reading from Parquet

2014-10-13 Thread Terry Siu
columns and two partitions defined. Does this error look familiar to anyone? Could my usage of SparkSQL with Hive be incorrect, or is support for Hive/Parquet/partitioning still buggy at this point in Spark 1.1.0? Thanks, -Terry