Re: Spark REPL question

2014-04-17 Thread Zhan Zhang
Thanks a lot. By "spins up", do you mean using the same directory, specified by the following? /** Local directory to save .class files too */ val outputDir = { val tmp = System.getProperty("java.io.tmpdir") val rootDir = new SparkConf().get("spark.repl.classdir", tmp)
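Laid out as a block, the quoted snippet reads as below. The quote cuts off after rootDir, so the final line is a hedged guess at how the Spark REPL source continues, not a verbatim quote.

```scala
/** Local directory to save .class files too */
val outputDir = {
  val tmp = System.getProperty("java.io.tmpdir")
  val rootDir = new SparkConf().get("spark.repl.classdir", tmp)
  Utils.createTempDir(rootDir) // hedged guess at the truncated remainder
}
```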

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
The API change seems minor. I have changed it locally and compiled, but not tested yet. The major problem is still how to solve the hive-exec jar dependency. I am willing to help on this issue. Is it better to stick to the same approach as hive-0.12 until hive-exec is cleaned up enough to switch back?

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
I can compile with no errors, but my patch also includes other changes.

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Here is the patch. Please ignore the pom.xml-related change, which is just for compilation purposes. I need to do further work on this one based on Wandou's previous work.

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Sorry, I forgot to upload the files. I have never posted before :) hive.diff http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Attached is the diff for PR SPARK-2706. I am currently working on this problem. If somebody else is also working on it, we can share the load.

Spark testsuite error for hive 0.13.

2014-08-11 Thread Zhan Zhang
I am trying to change Spark to support hive-0.13, but I always hit the following problem when running the tests. My feeling is that the test setup may need to change, but I don't know exactly how. Has anyone seen a similar issue, or can anyone shed light on it? 13:50:53.331 ERROR org.apache.hadoop.hive.ql.Driver: FAILED:

Re: Spark testsuite error for hive 0.13.

2014-08-11 Thread Zhan Zhang
Thanks Sean. I changed both the API and the version because there are some incompatibilities with hive-0.13, and I can actually do some basic operations against a real Hive environment. But the test suite always complains with a "no default database" message. No clue yet.

Re: Spark testsuite error for hive 0.13.

2014-08-12 Thread Zhan Zhang
Problem solved by a workaround using CREATE DATABASE and USE database.

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
( )).map(word => (word, 1)).reduceByKey((a, b) => a + b) counts.saveAsTextFile("file") // anyway, you don't want to collect results to the master; put them in a file instead. Thanks. Zhan Zhang On Aug 16, 2014, at 9:18 AM, Jerry Ye jerr...@gmail.com wrote: The job ended up running overnight
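Spelled out, the suggestion above amounts to the classic word count; a minimal sketch, assuming an existing SparkContext `sc` (the "input" and "output" paths are placeholders):

```scala
// Count words and write results to storage instead of collecting to the driver.
val counts = sc.textFile("input")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)
counts.saveAsTextFile("output") // avoids pulling the full result back to the master
```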

Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-18 Thread Zhan Zhang
I'm not sure exactly how you use it. My understanding is that in Spark it is better to keep the overhead on the driver as low as possible. Is it possible to broadcast the trie to the executors, do the computation there, and then aggregate the counters in a reduce phase? Thanks. Zhan Zhang On Aug 18
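A rough sketch of the broadcast pattern being suggested; `trie`, `records`, and `countMatches` are hypothetical stand-ins for the poster's data structure and job:

```scala
val trieBc = sc.broadcast(trie)          // ship the trie to each executor once
val total = records
  .mapPartitions { iter =>
    val localTrie = trieBc.value         // read the executor-local copy
    iter.map(rec => countMatches(localTrie, rec))
  }
  .reduce(_ + _)                         // aggregate the counters in the reduce phase
```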

RE: Working Formula for Hive 0.13?

2014-08-29 Thread Zhan Zhang
issue to SPARK-2706 soon. Thanks. Zhan Zhang

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Zhan Zhang
"-Phive" enables hive-0.13.1, and "-Phive -Phive-0.12.0" enables hive-0.12.0. Note that the thrift-server is not yet supported with hive-0.13, but it is expected to go upstream soon (SPARK-3720). Thanks. Zhan Zhang On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote

How spark and hive integrate in long term?

2014-11-21 Thread Zhan Zhang
on Hive, e.g., metastore, thriftserver, hcatalog, may not be able to help much. Does anyone have any insight or ideas in mind? Thanks. Zhan Zhang

Re: How spark and hive integrate in long term?

2014-11-21 Thread Zhan Zhang
and more features added, it would be great if users could take advantage of both. Currently, Spark SQL gives us such benefits partially, but I am wondering how to keep such integration in the long term. Thanks. Zhan Zhang On Nov 21, 2014, at 3:12 PM, Dean Wampler deanwamp...@gmail.com wrote: I can't comment

Re: How spark and hive integrate in long term?

2014-11-22 Thread Zhan Zhang
some basic functions using hive-0.13 connecting to a hive-0.14 metastore, and it looks like they are compatible. Thanks. Zhan Zhang On Nov 22, 2014, at 7:14 AM, Cheng Lian lian.cs@gmail.com wrote: Should emphasize that this is still a quick and rough conclusion, will investigate

Re: Welcoming three new committers

2015-02-03 Thread Zhan Zhang
Congratulations! On Feb 3, 2015, at 2:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in the past year: Cheng on Spark SQL, Joseph on

Re: Setting JVM options to Spark executors in Standalone mode

2015-01-16 Thread Zhan Zhang
You can try adding it in conf/spark-defaults.conf: # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" Thanks. Zhan Zhang On Jan 16, 2015, at 9:56 AM, Michel Dufresne sparkhealthanalyt...@gmail.com wrote: Hi All, I'm trying to set some JVM
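A minimal sketch of the programmatic equivalent of that spark-defaults.conf line; the option values are just examples, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("executor-jvm-options")
  .set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails -Dkey=value")
val sc = new SparkContext(conf) // executors launched with these JVM flags
```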

Re: Spark-thriftserver Issue

2015-03-24 Thread Zhan Zhang
You can try to set it in spark-env.sh. # - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs) # - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp) Thanks. Zhan Zhang On Mar 24, 2015, at 12:10 PM, Anubhav Agarwal anubha...@gmail.com

Re: Review request for SPARK-6112:Provide OffHeap support through HDFS RAM_DISK

2015-03-23 Thread Zhan Zhang
Thanks Reynold. I agree with opening another JIRA to unify the block storage API. I have uploaded the design doc to SPARK-6479 as well. Thanks. Zhan Zhang On Mar 23, 2015, at 4:03 PM, Reynold Xin r...@databricks.com wrote: I created a ticket to separate the API

Re: Make off-heap store pluggable

2015-07-21 Thread Zhan Zhang
Hi Alexey, SPARK-6479 (https://issues.apache.org/jira/browse/SPARK-6479) is for the plugin API, and SPARK-6112 (https://issues.apache.org/jira/browse/SPARK-6112) is for the HDFS plugin. Thanks. Zhan Zhang On Jul 21, 2015, at 10:56 AM, Alexey Goncharuk alexey.goncha...@gmail.com

Re: Support for views/ virtual tables in SparkSQL

2015-11-09 Thread Zhan Zhang
I think you can rewrite those TPC-H queries without using views, for example with registerTempTable (see the sketch below). Thanks. Zhan Zhang On Nov 9, 2015, at 9:34 PM, Sudhir Menon <sme...@pivotal.io> wrote: > Team: > > Do we plan to add support for views/ virtual tables in SparkSQL anytime soon? > Tryin
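A minimal sketch of the registerTempTable workaround, assuming an existing SQLContext `sqlContext` and a DataFrame `lineitem`; names and the query are placeholders:

```scala
lineitem.registerTempTable("lineitem") // acts like a session-scoped "view"

// The view body becomes an ordinary query over the temp table:
val q = sqlContext.sql(
  "SELECT l_returnflag, SUM(l_quantity) AS qty FROM lineitem GROUP BY l_returnflag")
```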

Re: Proposal for SQL join optimization

2015-11-12 Thread Zhan Zhang
, and we can move the discussion there. Thanks. Zhan Zhang On Nov 11, 2015, at 6:16 PM, Xiao Li <gatorsm...@gmail.com> wrote: Hi, Zhan, That sounds really interesting! Please @ me when you submit the PR. If possible, please also post the performanc

Proposal for SQL join optimization

2015-11-11 Thread Zhan Zhang
are eliminated. Without such manual tuning, the query will never finish if a and c are big. But we should not rely on such manual optimization. Please provide your input. If both are valid, I will open JIRAs for each. Thanks.

Re: spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Zhan Zhang
It does not matter whether you start Spark in local or another mode. If you have an hdfs-site.xml somewhere and the Spark configuration points to that config, you will read/write to HDFS. Thanks. Zhan Zhang From: Madhu <ma...@madhu.com> Sent: Sa

Re: Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
I noticed that it is configurable at the job level via spark.task.cpus (see the sketch below). Any way to support it at the task level? Thanks. Zhan Zhang On Dec 11, 2015, at 10:46 AM, Zhan Zhang <zzh...@hortonworks.com> wrote: > Hi Folks, > > Is it possible to assign multiple cores per task, and how? Suppo
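A minimal sketch of the job-level knob discussed here: spark.task.cpus applies to every task in the application (the value 2 is only an example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("multi-core-tasks")
  .set("spark.task.cpus", "2") // each task reserves 2 cores on its executor
val sc = new SparkContext(conf)
```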

Multi-core support per task in Spark

2015-12-11 Thread Zhan Zhang
it makes sense to add this feature. It may seem to make users worry about more configuration, but by default we can still do 1 core per task, and only advanced users need to be aware of this feature. Thanks. Zhan Zhang

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Zhan Zhang
. Thanks. Zhan Zhang Note that when sc is stopped, all resources are released (for example in YARN On Dec 20, 2015, at 2:59 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hi Spark developers, > > I found that SQLContext.getOrCreate(sc: SparkContext) does not behave > correctly when
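A hedged reconstruction of the scenario being reported, pieced together from the thread; treat it as a sketch of the complaint, not a verified reproduction:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc1 = new SparkContext(new SparkConf().setAppName("first").setMaster("local"))
val sql1 = SQLContext.getOrCreate(sc1)
sc1.stop() // releases the context's resources

val sc2 = new SparkContext(new SparkConf().setAppName("second").setMaster("local"))
val sql2 = SQLContext.getOrCreate(sc2)
// The reported bug: sql2 can still be the cached instance bound to the stopped sc1.
```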

Re: ORC file writing hangs in pyspark

2016-02-23 Thread Zhan Zhang
Hi James, You can try to write with another format, e.g., Parquet, to see whether it is an ORC-specific issue or a more generic one. Thanks. Zhan Zhang On Feb 23, 2016, at 6:05 AM, James Barney <jamesbarne...@gmail.com> wrote: I'm trying to write
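A minimal sketch of the suggested experiment, assuming an existing DataFrame `df`; the output paths are placeholders:

```scala
df.write.format("orc").save("/tmp/out_orc")         // the case that hangs
df.write.format("parquet").save("/tmp/out_parquet") // does this hang as well?
```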

Re: RFC: Remove "HBaseTest" from examples?

2016-04-21 Thread Zhan Zhang
FYI: There are several pending patches for DataFrame support on top of HBase. Thanks. Zhan Zhang On Apr 20, 2016, at 2:43 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote: +1, HBaseTest in Spark Example is quite old and obsolete, the HBase connector i

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Zhan Zhang
You can take a look at this blog post from Databricks about GraphFrames: https://databricks.com/blog/2016/03/03/introducing-graphframes.html Thanks. Zhan Zhang On Apr 21, 2016, at 12:53 PM, Robin East <robin.e...@xense.co.uk> wrote: Hi Aside fro
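A short sketch of the GraphFrames API described in the linked post, assuming the graphframes package is on the classpath and an existing SQLContext `sqlContext`; the data is illustrative:

```scala
import org.graphframes.GraphFrame

val vertices = sqlContext.createDataFrame(Seq(
  ("a", "Alice"), ("b", "Bob"))).toDF("id", "name")
val edges = sqlContext.createDataFrame(Seq(
  ("a", "b", "follows"))).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)                        // graph over DataFrames
val ranks = g.pageRank.resetProbability(0.15).maxIter(10).run()
```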

Re: right outer joins on Datasets

2016-05-24 Thread Zhan Zhang
The reason for "-1" is that the default value for Integer is -1 if the value is null: def defaultValue(jt: String): String = jt match { ... case JAVA_INT => "-1" ... }
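A hedged sketch of how that default can surface: in a right outer join on typed Datasets, an unmatched left row's primitive Int fields can come back as -1 (the codegen default above) rather than null. Assumes roughly Spark 2.0 and `import spark.implicits._`; all names are illustrative.

```scala
case class L(id: Int, v: Int)
case class R(id: Int, w: Int)

val left  = Seq(L(1, 10)).toDS()
val right = Seq(R(1, 100), R(2, 200)).toDS()

val joined = left.joinWith(right, left("id") === right("id"), "right_outer")
// For R(2, 200) there is no matching left row; inspecting the left side of
// that tuple is where the -1 values show up.
joined.show()
```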

Re: more uniform exception handling?

2016-04-18 Thread Zhan Zhang
+1. Both of these would be very helpful in debugging. Thanks. Zhan Zhang On Apr 18, 2016, at 1:18 PM, Evan Chan <velvia.git...@gmail.com> wrote: > +1000. > > Especially if the UI can help correlate exceptions, and we can reduce > some exceptions. > > There a

Re: SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-18 Thread Zhan Zhang
From the physical plan, the limit is one level above the WholeStageCodegen, thus I don't think shouldStop would work here. To make it work, the limit has to be part of the WholeStageCodegen. Correct me if I am wrong. Thanks. Zhan Zhang On Apr 18, 2016, at 11:09 AM, Reynol

Re: SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-18 Thread Zhan Zhang
, SinglePartition, serializer)) shuffled.mapPartitionsInternal(_.take(limit)) } Thus, there is no way to avoid processing all the data before the shuffle; I think that is the reason (see the reconstruction below). Do I understand correctly? Thanks. Zhan Zhang On Apr 18, 2016, at 10:08 PM, Reynold Xin <r...@databricks.com>
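The fragment above looks like the tail of CollectLimit's doExecute; an approximate reconstruction pieced together from the pieces quoted in this thread, so details may differ from the actual source:

```scala
protected override def doExecute(): RDD[InternalRow] = {
  val shuffled = new ShuffledRowRDD(
    ShuffleExchange.prepareShuffleDependency(
      child.execute(), child.output, SinglePartition, serializer))
  shuffled.mapPartitionsInternal(_.take(limit)) // take(limit) runs only after the shuffle
}
```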

Re: SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-18 Thread Zhan Zhang
Thanks Reynold. I'm not sure why doExecute is not invoked, since CollectLimit does not support whole-stage codegen: case class CollectLimit(limit: Int, child: SparkPlan) extends UnaryNode { I will dig further into this. Zhan Zhang On Apr 18, 2016, at 10:36 PM, Reynold Xin <r...@databricks.com>

Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Zhan Zhang
I saw that the pom file has the Hive version as 1.2.1.spark2, but I cannot find the branch in https://github.com/pwendell/ Does anyone know where the repo is? Thanks. Zhan Zhang

SparkPlan/Shuffle stage reuse with Dataset/DataFrame

2016-10-18 Thread Zhan Zhang
Hi Folks, We have some Dataset/DataFrame use cases that would benefit from reusing the SparkPlan and shuffle stage, for example the following cases. Because the query optimization and SparkPlan are generated by Catalyst when the query is executed, the underlying RDD lineage is regenerated
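A hedged sketch of the issue being described: each action below re-plans the query, so the shuffle behind groupBy runs twice unless the result is materialized first. `df` is a placeholder DataFrame.

```scala
val aggregated = df.groupBy("key").count() // involves a shuffle

aggregated.persist() // one common workaround: cache to reuse the shuffle output
val big   = aggregated.filter(aggregated("count") > 10).collect()
val small = aggregated.filter(aggregated("count") <= 10).collect() // served from cache
```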