Thanks a lot.
By "spins up", do you mean using the same directory, specified by the following?
/** Local directory to save .class files to */
val outputDir = {
  val tmp = System.getProperty("java.io.tmpdir")
  val rootDir = new SparkConf().get("spark.repl.classdir", tmp)
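If "spins up" does mean sharing that directory, pointing every instance at it should just be a matter of setting the property from the snippet, e.g. (untested sketch; the path is a placeholder):

import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.repl.classdir", "/shared/repl-classes") // share one class dir instead of a per-REPL temp dir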
The API change seems not major. I have locally changed it and compiled, but
not tested yet. The major problem is still how to solve the hive-exec jar
dependency. I am willing to help on this issue. Is it better to stick to the
same way as hive-0.12 until hive-exec is cleaned up enough to switch back?
--
I can compile with no error, but my patch also includes other stuff.
--
Here is the patch. Please ignore the pom.xml-related changes, which are just
for compiling purposes. I need to do further work on this one based on Wandou's
previous work.
--
Sorry, I forgot to upload the file. I have never posted before :) hive.diff
http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff
--
Attached is the diff for the PR for SPARK-2706. I am currently working on this
problem. If somebody else is also working on it, we can share the load.
--
I am trying to change Spark to support hive-0.13, but I always hit the
following problem when running the tests. My feeling is that the test setup may
need to change, but I don't know exactly how. Has anyone seen a similar issue,
or can anyone shed light on it?
13:50:53.331 ERROR org.apache.hadoop.hive.ql.Driver: FAILED:
Thanks Sean,
I changed both the API and the version because there are some incompatibilities
with hive-0.13, and I can actually do some basic operations against a real Hive
environment. But the test suite always complains with a "no default database"
message. No clue yet.
--
Problem solved by a workaround: explicitly issuing CREATE DATABASE and USE database.
--
val counts = textFile.flatMap(line => line.split(" ")).map(word => (word,
1)).reduceByKey((a, b) => a + b)
counts.saveAsTextFile("file") // anyway, you don't want to collect results to
the master; put them in a file instead.
Thanks.
Zhan Zhang
On Aug 16, 2014, at 9:18 AM, Jerry Ye jerr...@gmail.com wrote:
The job ended up running overnight
Not sure exactly how you use it. My understanding is that in Spark it is better
to keep the overhead on the driver as low as possible. Is it possible to
broadcast the trie to the executors, do the computation there, and then
aggregate the counters in the reduce phase?
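A rough sketch of what I mean (assuming an existing SparkContext sc; buildTrie,
dictionary, records, and lookup are placeholders for your own code):

val trieBc = sc.broadcast(buildTrie(dictionary)) // built once on the driver, shipped to each executor
val counters = records
  .map(r => (trieBc.value.lookup(r), 1L)) // lookups run executor-side against the local copy
  .reduceByKey(_ + _) // counters aggregated in the reduce phase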
Thanks.
Zhan Zhang
On Aug 18
issue to SPARK-2706 soon.
Thanks.
Zhan Zhang
--
-Phive is to enable hive-0.13.1, and "-Phive -Phive-0.12.0" is to enable
hive-0.12.0. Note that the thrift-server is not supported yet with hive-0.13,
but it is expected to go upstream soon (SPARK-3720).
Thanks.
Zhan Zhang
On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote:
on Hive, e.g., the metastore, thriftserver, and
HCatalog, may not be able to help much.
Does anyone have any insight or idea in mind?
Thanks.
Zhan Zhang
--
and more features added, it would be great if users
could take advantage of both. Currently, Spark SQL gives us such benefits
partially, but I am wondering how to keep such integration in the long term.
Thanks.
Zhan Zhang
On Nov 21, 2014, at 3:12 PM, Dean Wampler deanwamp...@gmail.com wrote:
I can't comment
some basic functions using hive-0.13 connecting to a
hive-0.14 metastore, and it looks like they are compatible.
Thanks.
Zhan Zhang
On Nov 22, 2014, at 7:14 AM, Cheng Lian lian.cs@gmail.com wrote:
I should emphasize that this is still a quick and rough conclusion; I will
investigate
Congratulations!
On Feb 3, 2015, at 2:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
The PMC recently voted to add three new committers: Cheng Lian, Joseph
Bradley and Sean Owen. All three have been major contributors to Spark in the
past year: Cheng on Spark SQL, Joseph on
You can try to add it in conf/spark-defaults.conf:
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"
Thanks.
Zhan Zhang
On Jan 16, 2015, at 9:56 AM, Michel Dufresne sparkhealthanalyt...@gmail.com
wrote:
Hi All,
I'm trying to set some JVM
You can try to set it in spark-env.sh.
# - SPARK_LOG_DIR Where log files are stored. (Default:
${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
Thanks.
Zhan Zhang
On Mar 24, 2015, at 12:10 PM, Anubhav Agarwal
anubha...@gmail.com wrote:
Thanks Reynold,
I agree with you that we should open another JIRA to unify the block storage
API. I have uploaded the design doc to SPARK-6479 as well.
Thanks.
Zhan Zhang
On Mar 23, 2015, at 4:03 PM, Reynold Xin
r...@databricks.com wrote:
I created a ticket to separate the API
Hi Alexey,
SPARK-6479 (https://issues.apache.org/jira/browse/SPARK-6479) is for the plugin
API, and SPARK-6112 (https://issues.apache.org/jira/browse/SPARK-6112) is for
the HDFS plugin.
Thanks.
Zhan Zhang
On Jul 21, 2015, at 10:56 AM, Alexey Goncharuk
alexey.goncha...@gmail.com wrote:
I think you can rewrite those TPC-H queries without using views, for example
with registerTempTable.
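Something like this (untested sketch against the Spark 1.x API, assuming an
existing sqlContext; the TPC-H lineitem table stands in for whatever the view
selects from):

val df = sqlContext.sql("SELECT l_orderkey, l_quantity FROM lineitem WHERE l_quantity > 10")
df.registerTempTable("q_view") // reference q_view wherever the query referenced the view
sqlContext.sql("SELECT l_orderkey, SUM(l_quantity) FROM q_view GROUP BY l_orderkey").show()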
Thanks.
Zhan Zhang
On Nov 9, 2015, at 9:34 PM, Sudhir Menon <sme...@pivotal.io> wrote:
> Team:
>
> Do we plan to add support for views/ virtual tables in SparkSQL anytime soon?
> Tryin
, and we
can move the discussion there.
Thanks.
Zhan Zhang
On Nov 11, 2015, at 6:16 PM, Xiao Li
<gatorsm...@gmail.com> wrote:
Hi, Zhan,
That sounds really interesting! Please ping me when you submit the PR. If
possible, please also post the performanc
are
eliminated.
Without such manual tuning, the query will never finish if a and c are big. But
we should not rely on such manual optimization.
Please provide your input. If they are both valid, I will open JIRAs for each.
Thanks.
It does not matter whether you start Spark in local or another mode. If you
have hdfs-site.xml somewhere and your Spark configuration points to that
config, you will read/write to HDFS.
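For example (untested sketch; the paths are placeholders, and HADOOP_CONF_DIR
is assumed to point at the directory containing your hdfs-site.xml):

import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf().setAppName("hdfs-check").setMaster("local[*]"))
sc.textFile("hdfs:///tmp/input.txt").saveAsTextFile("hdfs:///tmp/output") // resolved against the namenode from hdfs-site.xml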
Thanks.
Zhan Zhang
From: Madhu <ma...@madhu.com>
Sent: Sa
I noticed that it is configurable at the job level via spark.task.cpus. Is
there any way to support it at the task level?
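For reference, the job-wide knob as it exists today (it applies to every task
in the job):

import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("multi-core-tasks")
  .set("spark.task.cpus", "2") // every task in the job reserves 2 cores; no per-task override
val sc = new SparkContext(conf)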
Thanks.
Zhan Zhang
On Dec 11, 2015, at 10:46 AM, Zhan Zhang <zzh...@hortonworks.com> wrote:
> Hi Folks,
>
> Is it possible to assign multiple cores per task, and how? Suppo
it makes sense to add this feature. It may seem to
make users worry about more configuration, but by default we can still do 1
core per task, and only advanced users need to be aware of this feature.
Thanks.
Zhan Zhang
Thanks.
Zhan Zhang
Note that when sc is stopped, all resources are released (for example in YARN
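Roughly the pattern in question (Spark 1.x API, assuming an existing sc and
conf; this sketches the reported behavior, not a recommendation):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
val sqlContext1 = SQLContext.getOrCreate(sc) // cached as the singleton
sc.stop() // resources released, but the cached instance still references the stopped sc
val sc2 = new SparkContext(conf)
val sqlContext2 = SQLContext.getOrCreate(sc2) // may still return the instance built on the old sc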
On Dec 20, 2015, at 2:59 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hi Spark developers,
>
> I found that SQLContext.getOrCreate(sc: SparkContext) does not behave
> correctly when
Hi James,
You can try to write with another format, e.g., Parquet, to see whether it is
an ORC-specific issue or a more generic one.
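For example (df and the paths are placeholders for your actual job):

df.write.format("parquet").save("/tmp/test_parquet") // if this succeeds where the ORC write fails, it's ORC-specific
df.write.format("orc").save("/tmp/test_orc")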
Thanks.
Zhan Zhang
On Feb 23, 2016, at 6:05 AM, James Barney
<jamesbarne...@gmail.com> wrote:
I'm trying to write
FYI: There are several pending patches for DataFrame support on top of HBase.
Thanks.
Zhan Zhang
On Apr 20, 2016, at 2:43 AM, Saisai Shao
<sai.sai.s...@gmail.com> wrote:
+1, HBaseTest in the Spark examples is quite old and obsolete; the HBase
connector i
You can take a look at this blog post from Databricks about GraphFrames:
https://databricks.com/blog/2016/03/03/introducing-graphframes.html
Thanks.
Zhan Zhang
On Apr 21, 2016, at 12:53 PM, Robin East
<robin.e...@xense.co.uk> wrote:
Hi
Aside fro
The reason for "-1" is that the default value for Integer is -1 if the value
is null:

def defaultValue(jt: String): String = jt match {
  ...
  case JAVA_INT => "-1"
  ...
}
--
+1
Both of these would be very helpful in debugging.
Thanks.
Zhan Zhang
On Apr 18, 2016, at 1:18 PM, Evan Chan <velvia.git...@gmail.com> wrote:
> +1000.
>
> Especially if the UI can help correlate exceptions, and we can reduce
> some exceptions.
>
> There a
From the physical plan, the limit is one level above the WholeStageCodegen.
Thus, I don't think shouldStop would work here. To make it work, the limit has
to be part of the WholeStageCodegen.
Correct me if I am wrong.
Thanks.
Zhan Zhang
On Apr 18, 2016, at 11:09 AM, Reynol
protected override def doExecute(): RDD[InternalRow] = {
  val shuffled = new ShuffledRowRDD(
    ShuffleExchange.prepareShuffleDependency(
      child.execute(), child.output, SinglePartition, serializer))
  shuffled.mapPartitionsInternal(_.take(limit))
}
Thus, there is no way to avoid processing all data before the shuffle. I think
that is the reason. Do I understand correctly?
Thanks.
Zhan Zhang
On Apr 18, 2016, at 10:08 PM, Reynold Xin
<r...@databricks.com> wrote:
Thanks Reynold.
Not sure why doExecute is not invoked, since CollectLimit does not support
whole-stage codegen:
case class CollectLimit(limit: Int, child: SparkPlan) extends UnaryNode {
I will dig further into this.
Zhan Zhang
On Apr 18, 2016, at 10:36 PM, Reynold Xin
<r...@databricks.com> wrote:
I saw the pom file has the hive version as
1.2.1.spark2, but I cannot find the branch in
https://github.com/pwendell/
Does anyone know where the repo is?
Thanks.
Zhan Zhang
--
Hi Folks,
We have some Dataset/DataFrame use cases that would benefit from reusing the
SparkPlan and shuffle stage.
For example, consider the following cases. Because the query optimization and
SparkPlan are generated by Catalyst when the query is executed, the
underlying RDD lineage is regenerated
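To illustrate (hypothetical sketch, assuming a DataFrame df; each action
re-plans the query and rebuilds the lineage unless the result is explicitly
persisted):

val aggregated = df.groupBy("key").count() // builds a logical plan only; nothing runs yet
aggregated.collect() // Catalyst generates the SparkPlan; the shuffle executes
aggregated.collect() // the plan and RDD lineage are generated again for this action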