Hi All,
Given a DataFrame created by Hive SQL, I need to add one more column derived from an existing column, while also keeping all of the previous columns in the result DataFrame.
final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0;
// DAYS_30 seems difficult to call in
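A minimal sketch of one way to do this with withColumn, which keeps every existing column (the DataFrame name df, the source column interval_ms, and the new column name are assumptions for illustration only):

import org.apache.spark.sql.functions.col

val DAYS_30 = 1000.0 * 60 * 60 * 24 * 30   // 30 days in milliseconds

// df is the DataFrame returned by the Hive SQL query; withColumn keeps all
// existing columns and appends the derived one.
val result = df.withColumn("interval_30d", col("interval_ms") / DAYS_30)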
Just a test message, since the user email system seemed to have something wrong a while ago; it is okay now.
On Friday, June 17, 2016 12:18 PM, Zhiliang Zhu
wrote:
On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu
wrote:
Hi All,
For the given DataFrame created by hive sql, however
Hi All,
I have a big job which takes more than one hour to run in full; however, it unreasonably exits and finishes midway (almost 80% of the job actually completed, but not all), without any apparent error or exception log.
I submitted the same job many times, and it
Has anyone ever met a similar problem? It is quite strange ...
On Friday, June 17, 2016 2:13 PM, Zhiliang Zhu
wrote:
Hi All,
I have a big job which mainly takes more than one hour to run the whole,
however, it is very much unreasonable to exit & finish to run midway (almost
80
In this situation, please check yarn userlogs for more information… --WBR, Alexander
From: Zhiliang Zhu
Sent: 17 June 2016, 9:36
To: Zhiliang Zhu; User
Subject: Re: spark job automatically killed without rhyme or reason
Has anyone ever met a similar problem? It is quite strange ...
On Frid
Thank you in advance~
On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu wrote:
Hi Alexander,
Thanks a lot for your reply.
Yes, it is submitted by yarn. Do you just mean the executor log file obtained by way of yarn logs -applicationId id?
In this file, in some containers' stdout and stderr :
16/06/
--WBR, Alexander
From: Zhiliang Zhu
Sent: 17 June 2016, 14:10
To: User; kp...@hotmail.com
Subject: Re: spark job automatically killed without rhyme or reason
Hi Alexander,
Is your yarn userlog just the executor log?
Those logs seem a little difficult to exactly
spark.yarn.executor.memoryOverhead
Everything else you mention is a symptom of YARN shutting down your
jobs because your memory settings don't match what your app does.
They're not problems per se, based on what you have provided.
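If YARN really is killing executors for exceeding their memory allocation, the usual knob is the one named above; a sketch with placeholder sizes that would need tuning for the actual job:

import org.apache.spark.SparkConf

// Illustrative values only; the right sizes depend on the job and cluster.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")
  .set("spark.yarn.executor.memoryOverhead", "2048")  // extra off-heap headroom per executor, in MB
// The same setting can be passed on the command line:
//   spark-submit --conf spark.yarn.executor.memoryOverhead=2048 ...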
On Mon, Jun 20, 2016 at 9:17 AM, Zhiliang Zhu
wrote:
>
Hi All,
Here we have one application which needs to extract different columns from 6 hive tables and then do some easy calculation; there are around 100,000 rows in each table, and finally it needs to output another table or file (with a consistent column format).
However, after lots of d
The SQL logic in the program is very complex, so I do not describe the detailed code here.
On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu
wrote:
Hi All,
Here we have one application, it needs to extract different columns from 6 hive
tables, and then does some easy
You can use something like the EXPLAIN command to show what is going on.
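A sketch of inspecting the plan, either through SQL or through the DataFrame API (this assumes a HiveContext named sqlContext; the table and column names are placeholders):

// Through SQL:
sqlContext.sql("EXPLAIN EXTENDED SELECT col_a, col_b FROM some_hive_table").show(false)

// Or directly on a DataFrame:
val df = sqlContext.sql("SELECT col_a, col_b FROM some_hive_table")
df.explain(true)   // prints the logical and physical plans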
On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote:
the sql logic in the program is very much complex , so do not describe the
detailed codes here .
On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu
wrote:
Hi All,
He
Try to set --driver-memory xg; x should be as large as can be set.
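Alongside a larger --driver-memory, reducing the partition count itself may help, since 20K tasks per stage put pressure on the driver; a sketch (the input path and target count are placeholders):

val bigRdd = sc.textFile("hdfs:///path/to/input")   // placeholder input

// If 20K partitions is more parallelism than the cluster can use, coalesce
// shrinks the count without a full shuffle (repartition would shuffle).
val fewerPartitions = bigRdd.coalesce(2000)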
On Monday, July 18, 2016 6:31 PM, Saurav Sinha
wrote:
Hi,
I am running a spark job.
Master memory - 5G, executor memory - 10G (running on 4 nodes).
My job is getting killed as the number of partitions increases to 20K.
16/07/18 14:53:13 I
Any clue is also good.
Thanks in advance~
On Tuesday, July 19, 2016 11:05 AM, Zhiliang Zhu wrote:
Hi Mungeol,
Thanks a lot for your help. I will try that.
On Tuesday, July 19, 2016 9:21 AM, Mungeol Heo
wrote:
Try to run an action at an intermediate stage of
work, and editing the query, then retesting, repeatedly until you cut the execution time by a significant fraction.
- Using the Spark UI or spark shell to check the skew and make sure partitions are evenly distributed
On Jul 18, 2016, at 3:33 AM, Zhiliang Zhu wrote:
Thanks a lot for your reply .
In
Hi All,
Here I have a lot of data, around 1,000,000 rows; 97% of them are negative class and 3% of them are positive class. I applied the Random Forest algorithm to build the model and predict the testing data.
For the data preparation: i. I firstly randomly split all the data into training
data and
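With a 97/3 split, one common preparation step is to rebalance the training set before fitting the forest; a minimal sketch, assuming an RDD[LabeledPoint] named training with label 1.0 for the positive class (the sampling fraction is only illustrative):

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def rebalance(training: RDD[LabeledPoint]): RDD[LabeledPoint] = {
  val positives = training.filter(_.label == 1.0)
  // Down-sample the negative class to roughly the positive class size (~3%).
  val negatives = training.filter(_.label == 0.0)
    .sample(withReplacement = false, fraction = 0.03)
  positives.union(negatives)
}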
You can use matrix SVD decomposition; Spark has the library for it:
http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html#singular-value-decomposition-svd
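A minimal sketch of the SVD route with a RowMatrix (the input rows are made up; since X = U S V^T, one has (X^T X)^-1 = V S^-2 V^T when X has full column rank):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Placeholder data: each Vector is one row of X.
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 10.0)))

val mat = new RowMatrix(rows)
val svd = mat.computeSVD(k = 3, computeU = false)
val s = svd.s   // singular values
val V = svd.V   // right singular vectors (local matrix)
// (X^T X)^-1 can then be assembled locally as V * diag(1/s^2) * V^T.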
On Thursday, December 10, 2015 7:33 PM, Arunkumar Pillai
wrote:
Hi,
I need to find the inverse of the (X^T * X) matrix. I hav
Dear All,
For some rdd, when there is just one partition, the operations and arithmetic would only run serially, and the rdd has lost all the parallelism benefit from the spark system ...
Is it exactly like that?
Thanks very much in advance!
Zhiliang
Dear All,
I need to iterate some job / rdd quite a lot of times, but I am lost in the problem that spark only accepts around 350 map calls before it meets one action Function; besides, dozens of actions will obviously increase the run time. Is there any proper way ...
As tested, there i
will be big (for example more than 1000) ...
Thanks in advance!
Zhiliang
On Monday, December 21, 2015 7:44 PM, Zhiliang Zhu
wrote:
Dear All,
I need to iterator some job / rdd quite a lot of times, but just lost in the
problem of spark only accept to call around 350 number of map before
ering[T] = null)
Cheers
On Mon, Dec 21, 2015 at 2:47 AM, Zhiliang Zhu
wrote:
Dear All,
For some rdd, while there is just one partition, then the operation &
arithmetic would only be single, the rdd has lose all the parallelism benefit
from spark system ...
Is it exactly like that?
Thanks
, you can collapse all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver.
Thanks.
Zhan Zhang
On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu wrote:
Dear All,
I need to iterator some job / rdd quite a lot of times, but just lost in the
problem of spark
in rdd0? That way you can increase the parallelism.
Cheers
On Mon, Dec 21, 2015 at 9:40 AM, Zhiliang Zhu wrote:
Hi Ted,
Thanks a lot for your kind reply.
I need to convert this rdd0 into another rdd1; the rows of rdd1 are generated from rdd0's rows by a random combination operation. From
n? If so, how do they depend on each other? I think either you can optimize your implementation, or Spark is not the right one for your specific application.
Thanks.
Zhan Zhang
On Dec 21, 2015, at 10:43 AM, Zhiliang Zhu wrote:
What is the difference between repartition / collect and collapse ... Is collapse more costly than collect ...
Hopefully you are using the Kryo serializer already.
This would be all right. From your experience, does Kryo improve efficiency noticeably ...
Regards,
Sab
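For reference, switching to Kryo is a small configuration change; whether it helps noticeably depends on how much data is shuffled or cached. A sketch (the registered class is a placeholder):

import org.apache.spark.SparkConf

case class MyRecord(id: Long, value: Double)   // placeholder for the job's own data classes

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord]))  // avoids writing full class names per object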
On Mon, Dec 21, 2015 at 5:51 PM, Zhiliang Zhu
wrote:
Dear All.
I have some kind of iteration job, that is
In order to make the job run faster, some parameters would be specified on the command line, such as --executor-cores, --executor-memory and --num-executors ...
However, as tested, it seemed that those numbers could not be set arbitrarily, or some trouble would be caused for the cluster. What is mor
For some file on hdfs, it is necessary to copy/move it to another specific hdfs directory, and the directory name should keep unchanged. I need to finish this in a spark program, not with hdfs commands. Is there any code for it? It does not seem findable by searching the spark doc ...
Thanks in advance!
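Spark itself has no file-move API, but the Hadoop FileSystem API can be called from the driver program; a minimal sketch (both paths are placeholders, and sc is the active SparkContext):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)

val src = new Path("/user/someuser/input/data.txt")    // placeholder source
val dst = new Path("/user/someuser/archive/data.txt")  // placeholder destination

// A move on HDFS is a rename; it returns false if it fails (e.g. the target exists).
if (!fs.rename(src, dst)) {
  sys.error(s"failed to move $src to $dst")
}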
for some other reasons...
This issue is urgent for me; would some expert provide some help with this problem...
I will show sincere appreciation for your help.
Thank you!
Best Regards,
Zhiliang
On Friday, September 25, 2015 7:53 PM, Zhiliang Zhu
wrote:
Hi all,
The spark job will
Hi All,
I would like to submit a spark job from another remote machine outside the cluster. I also copied the hadoop/spark conf files onto the remote machine; a hadoop job can then be submitted, but a spark job cannot.
In spark-env.sh, it may be because SPARK_LOCAL_IP is not properly set, or
for
on the linux command side?
Best Regards,
Zhiliang
On Saturday, September 26, 2015 10:07 AM, Gavin Yue
wrote:
Print out your env variables and check first
Sent from my iPhone
On Sep 25, 2015, at 18:43, Zhiliang Zhu wrote:
Hi All,
I would like to submit spark job on some another
where the
cluster's resource manager is.
I think this tutorial is pretty clear:
http://spark.apache.org/docs/latest/running-on-yarn.html
On Fri, Sep 25, 2015 at 7:11 PM, Zhiliang Zhu wrote:
Hi Yue,
Thanks very much for your kind reply.
I would like to submit the spark job remotely on another machine outside the cluster, and
Hi All,
Would some expert help me with this issue...
I shall appreciate your kind help very much!
Thank you!
Zhiliang
On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu
wrote:
Hi Alexis, Gavin,
Thanks very much for your kind comments. My spark command is:
spark-submit
Dear All,
I would like to use spark ml to develop a project related to optimization algorithms; however, in spark 1.4.1 it seems that under ml's optimizer there are only about 2 optimization algorithms.
My project may need more kinds of optimization algorithms, so how would I use spark ml
Dear All,
I am new to spark ml.
There is a project for me: given some math model, I would like to get its optimized solution. It is very similar to a spark mllib application.
However, the key problem for me is that the given math model does not obviously belong to the models (as clas
Hi Sujit, and All,
Currently I am stuck in a large difficulty, and I am eager to get some help from you.
There is a big linear system of equations Ax = b, where A has N rows and N columns, N is very large, and b = [0, 0, ..., 0, 1]^T. Then I will solve it to get x = [x1, x2, ..., xn]^T.
The si
-dimensionality-reduction.html[2]
http://math.stackexchange.com/questions/458404/how-can-we-compute-pseudoinverse-for-any-matrix
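For a system small enough to hold on one machine, Breeze (already a Spark MLlib dependency) can solve Ax = b directly; a tiny sketch with a made-up 3x3 system (for a truly large N, the distributed SVD/pseudoinverse route in the links above is the one to follow):

import breeze.linalg.{DenseMatrix, DenseVector}

// Made-up system; the real A is N x N and b has length N.
val A = DenseMatrix(
  (2.0, 1.0, 0.0),
  (1.0, 3.0, 1.0),
  (0.0, 1.0, 2.0))
val b = DenseVector(0.0, 0.0, 1.0)

val x = A \ b   // solves Ax = b for a square, non-singular A
println(x)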
On Fri, Oct 23, 2015 at 2:19 AM, Zhiliang Zhu wrote:
Hi Sujit, and All,
Currently I lost in large difficulty, I am eager to get some help from you.
There is some big linear
Dear All,
I have a program, shown below, which makes me very confused; it is about a multiple-dimension linear regression model. The weight / coefficient is always perfect while the dimension is smaller than 4, but otherwise it is wrong all the time. Or, whether the LinearRegressionWi
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Oct 25, 2015 at 10:14 AM, Zhiliang Zhu
wrote:
> Dear All,
>
> I have some program as below which makes me very much confused and
> inscrutable, it is about multiple dimension linear regression mode, the
> weight /
:
Final w: [0.999477672867, 1.999748740578, 3.500112393734, 3.50011239377]
Thank you,
Zhiliang
On Monday, October 26, 2015 10:25 AM, DB Tsai wrote:
Column 4 is always constant, so it has no predictive power, resulting in a zero weight.
On Sunday, October 25, 2015, Zhiliang Zhu
On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu
wrote:
Hi DB Tsai,
Thanks very much for your kind help. I get it now.
I am sorry that there is another issue: the weight/coefficient result is perfect while A is a triangular matrix; however, while A is not a triangular matrix (but
linear equations? If so, you can probably try breeze.
On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu
wrote:
>
>
>
> On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu
> wrote:
>
>
> Hi DB Tsai,
>
> Thanks very much for your kind help. I get it now.
>
> I
e.g.
label = intercept + features dot weight
To get the result you want, you need to force the intercept to be zero.
Just curious, are you trying to solve systems of linear equations? If
so, you can probably try breeze.
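A sketch of forcing the intercept to zero with the MLlib API, as suggested above (the training RDD and the optimizer settings are placeholders):

import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import org.apache.spark.rdd.RDD

def trainNoIntercept(training: RDD[LabeledPoint]) = {
  val lr = new LinearRegressionWithSGD()
  lr.setIntercept(false)                                   // force intercept = 0
  lr.optimizer.setNumIterations(200).setStepSize(0.01)     // illustrative values
  lr.run(training)                                         // returns a LinearRegressionModel
}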
On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu
wrote:
>
>
>
> On Monda
Dear All,
I will program a small project with spark, and the run speed is a big concern.
I have a question: since an RDD is always big on the cluster, is it proper to pass an RDD variable as a parameter during a function call?
Thank you,
Zhiliang
Hi All,
There is some file with N + M lines, and I need to read the first N lines into one RDD.
1. i) read all the N + M lines as one RDD, ii) select the RDD's top N rows; that may be one solution;
2. if some broadcast variable set to N is introduced, then it is used to decide, while mapping the file
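A sketch of the first approach using zipWithIndex (the path and N are placeholders; zipWithIndex itself triggers a small job to compute partition offsets):

val N = 1000L                                    // placeholder: number of leading lines wanted
val lines = sc.textFile("hdfs:///path/to/file")  // placeholder path

// Pair each line with its global position, keep the first N, drop the index.
val firstN = lines.zipWithIndex()
  .filter { case (_, idx) => idx < N }
  .map { case (line, _) => line }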
Dear All,
As for N-dimension linear regression, when the number of labeled training points (or the rank of the labeled point space) is less than N, then from the perspective of math, the weight of the trained linear model may not be unique.
However, the output of model.weight() by spark may be with so
Dear All,
As I am facing a typical linear programming issue, and I know the simplex method is specific to solving LP questions, may I ask whether there is already some mature package in spark for the simplex method...
Thank you very much~
Best Wishes!
Zhiliang
, 2015 1:43 AM, Ted Yu wrote:
A brief search in the code base shows the following:
TODO: Add simplex constraints to allow alpha in (0,1).
./mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala
I guess the answer to your question is no.
FYI
On Sun, Nov 1, 2015 at 9:37 AM, Zhiliang
Hi All,
I would like to filter some elements in some given RDD, keeping only the needed ones, so that the row number of the result RDD is smaller.
Then I selected the filter function; however, by test, the filter function only accepts a function returning Boolean, that is to say, only a JavaRDD of the same element type will be returned by filter.
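That is expected behaviour: filter takes a Boolean predicate and can only drop elements, never change their type. The usual pattern is to filter and then map to the new type; a sketch with made-up data:

val rows = sc.parallelize(Seq("1,a", "2,b", "bad-row"))

val parsed = rows
  .filter(_.count(_ == ',') == 1)   // predicate returns Boolean; element type is unchanged
  .map { s =>                       // map (not filter) is what changes the element type
    val Array(k, v) = s.split(",")
    (k.toInt, v)
  }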
simplex... if you want to
> use interior point method you can use ecos
> https://github.com/embotech/ecos-java-scala ...spark summit 2014 talk on
> quadratic solver in matrix factorization will show you example integration
> with spark. ecos runs as jni process in every executor.
>
>
Currently, there is no open source implementation in Spark.
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu wrote:
> Dear All,
>
> As for N dimension linear regress
Where is the API or link for the breeze quadratic minimizer integrated with spark? And where is the breeze lpsolver...
Alternatively you can use breeze lpsolver as well, which uses simplex from apache math.
Thank you,
Zhiliang
On Nov 4, 2015 1:05 AM, "Zhiliang Zhu" wrote:
Hi All,
I need to debug a spark job; my general way is to print out the log. However, some bug is inside spark functions such as mapPartitions etc., and no log printed from those functions could be found... Would you help point out the way to the log in spark's own functions such as mapPartitions? Or, what is g
You should be able to see them in the executor logs, which you can view via the Spark UI, in the Executors page (stderr log).
HTH,
Deng
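Anything printed inside mapPartitions runs on the executors, so it appears only in the executor stdout/stderr (Executors page, or yarn logs), not in the driver console; a minimal sketch:

val data = sc.parallelize(1 to 10, numSlices = 2)

val doubled = data.mapPartitionsWithIndex { (pid, it) =>
  // Goes to the executor's stdout, not the driver's console.
  println(s"processing partition $pid on ${java.net.InetAddress.getLocalHost.getHostName}")
  it.map(_ * 2)
}
doubled.count()   // an action must run before anything is printed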
On Tue, Nov 10, 2015 at 11:33 AM, Zhiliang Zhu
wrote:
Hi All,
I need debug spark job, my general way is to print out the log, however, some
bug is in spark functions as mapPartitions etc, and not any
Also, about the Spark UI: logs from other places could be found, but the log from functions such as mapPartitions could not.
On Tuesday, November 10, 2015 11:52 AM, Zhiliang Zhu
wrote:
Dear Ching-Mallete,
There are machines master01, master02 and master03 in the cluster, and I
Hi Ching-Mallete,
I have found the log and the reason for that.
Thanks a lot!
Zhiliang
On Tuesday, November 10, 2015 12:23 PM, Zhiliang Zhu
wrote:
Also for Spark UI , that is, log from other places could be found, but the
log from the functions as mapPartitions could not
After more tests: the Function called by map/sortBy etc. must be defined as static, or it can be defined as non-static but must then be called by another static normal function. I am really confused by it.
On Tuesday, November 10, 2015 4:12 PM, Zhiliang Zhu
wrote:
Hi All,
I have met some bug
while new the Function obj, and in the Function inner class the inner
normal function can be called.
On Tuesday, November 10, 2015 5:12 PM, Zhiliang Zhu
wrote:
As more test, the Function call by map/sortBy etc must be defined as static,
or it can be defined as non-static and must
esRDD = n_lines.map(n => { // Read and return 5 lines (n._1) from the file (n._2)
})
Thanks
Best Regards
On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu
wrote:
Hi All,
There is some file with line number N + M,, as I need to read the first N lines
into one RDD .
1. i) read all the N + M
Dear Jack,
As is known, Breeze is a numerical calculation package written in scala; spark mllib also uses it as the underlying package for algebra. Here I am also preparing to use Breeze for nonlinear equation optimization; however, it seemed that I could not find the exact doc or API for Breeze ex
Thursday, November 19, 2015 1:46 PM, Ted Yu wrote:
Have you looked at https://github.com/scalanlp/breeze/wiki
Cheers
On Nov 18, 2015, at 9:34 PM, Zhiliang Zhu wrote:
Dear Jack,
As is known, Breeze is numerical calculation package wrote by scala , spark
mllib also use it as underlying package
Hi all,
I have some optimization problem; I have googled a lot but still did not find the exact algorithm or third-party open package to apply to it.
Its type is like this:
Objective function: f(x1, x2, ..., xn) (n >= 100, and f may be linear or non-linear)
Constraint functions:
x1 + x2 + ... + x
looks like, based on your line 3. However, if the problem is non-convex then it'll be hard to solve in most cases.
On Thu, Nov 19, 2015, 9:42 AM 'Zhiliang Zhu' via All ADATAO Team Members
wrote:
Hi all,
I have some optimization problem, I have googled a lot but still did not get
Hi All,
I would like to compare any two adjacent elements in one given rdd, just as in this single-machine code:
int a[N] = {...};
for (int i = 0; i < N - 1; ++i) { compareFun(a[i], a[i+1]); }
...
mapPartitions may work for some situations; however, it could not compare elements in different parti
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu wrote:
> Hi All,
>
> I would like to compare any two adjacent elements in one given rdd, just as
> the single machine code part:
015 3:52 PM, Zhiliang Zhu
wrote:
Hi DB Tsai,
Thanks very much for your kind reply!
Sorry, one more issue: as tested, it seems that filter can only return a JavaRDD of the same element type and not any other JavaRDD, is that right? Then it is not very convenient to do a general filter for an RDD; mapPartitions could work somewhat,
: 0xAF08DF8D
On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu wrote:
> Hi All,
>
> I would like to compare any two adjacent elements in one given rdd, just as
> the single machine code part:
>
> int a[N] = {...};
> for (int i=0; i < N - 1; ++i) {
> compareFun(a[i], a[i+1]);
in advance!
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Sun, Dec 6, 2015 at 6:27 PM, Zhiliang Zhu wrote:
>
>
>
>
> On Saturday, December 5, 2015 3:00 PM, DB Tsai wrote:
>
>
> Thi
ious order among the elements,
and will it also not work ?
Thanks very much in advance!
On Monday, December 7, 2015 11:32 AM, Zhiliang Zhu
wrote:
On Monday, December 7, 2015 10:37 AM, DB Tsai wrote:
Only the beginning and ending part of the data. The rest in the partition can b
Hi All,
I need to optimize an objective function with some linear constraints by a genetic algorithm. I would like to get as much parallelism for it as possible with spark.
repartition / shuffle may be used sometimes in it; however, is the repartition API very costly?
Thanks in advance!
Zhiliang
you need to do performance testing to see if a repartition is worth the shuffle time. A common model is to repartition the data once after ingest to achieve parallelism and avoid shuffles whenever possible later.
From: Zhiliang Zhu [mailto:zchl.j...@yahoo.com.INVALID]
Sent: Tuesday, Decembe
Dear ,
I have taken lots of days to think about this issue, however, without any success... I shall appreciate all your kind help.
There is an RDD rdd1, and I would like to get a new RDD rdd2, where each row is rdd2[i] = rdd1[i] - rdd1[i - 1]. What kind of API or function would I use...
Thanks very much!
J
r of the items.
What exactly are you trying to accomplish?
Romi Kuntsman, Big Data Engineer
http://www.totango.com
On Mon, Sep 21, 2015 at 2:29 PM, Zhiliang Zhu
wrote:
Dear ,
I have took lots of days to think into this issue, however, without any
success...I shall appreciate your all kind help.
The
On Monday, September 21, 2015 11:48 PM, Sujit Pal wrote:
Hi Zhiliang,
Would something like this work?
val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0))
-sujit
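For reference, sliding is not on the core RDD API; it comes from the mllib RDDFunctions implicit (or the SlidingRDD mentioned in a later reply), so a complete version of the snippet looks roughly like this (the input values are made up):

import org.apache.spark.mllib.rdd.RDDFunctions._   // brings sliding() into scope

val rdd1 = sc.parallelize(Seq(1.0, 4.0, 9.0, 16.0))
// Each window is an Array of 2 consecutive elements; subtracting gives the differences.
val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0))
rdd2.collect()   // Array(3.0, 5.0, 7.0)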
On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu
wrote:
Hi Romi,
Thanks very much for your kind help comment~~
In fact there is some valid background to the application; it is about R data analy
Dear Romi, Priya, Sujit and Shivaram and all,
I have taken lots of days to think about this issue, however, without any good enough solution... I shall appreciate all your kind help.
There is an RDD rdd1, and another RDD rdd2,
(rdd2 can be a PairRDD, or a DataFrame with two columns as ). StringDate colum
gRDD.html
So maybe something like this:
new SlidingRDD(rdd1, 2, ClassTag$.apply(Class))
-sujit
On Mon, Sep 21, 2015 at 9:16 AM, Zhiliang Zhu wrote:
Hi Sujit,
I must appreciate your kind help very much~
It seems to be OK; however, do you know how to achieve the corresponding thing with the spark Java API... Is there
join.
Does that make sense?
On Mon, Sep 21, 2015 at 8:37 PM Zhiliang Zhu wrote:
Dear Romi, Priya, Sujt and Shivaram and all,
I have took lots of days to think into this issue, however, without any enough
good solution...I shall appreciate your all kind help.
There is an RDD rdd1, and another RDD
Dear Sujit,
Since you are senior with Spark, I wonder whether it is convenient for you to help comment on my dilemma while using spark to deal with an R-background application ...
Thank you very much!
Zhiliang
On Tuesday, September 22, 2015 1:45 AM, Zhiliang Zhu
wrote
Dear Experts,
A spark job is running on the cluster via yarn. The job can be submitted from a machine inside the cluster; however, I would like to submit the job from another machine which does not belong to the cluster. I know that, for this, a hadoop job could be done by way of another ma
HADOOP_CONF_DIR in spark to the
configuration.
Thanks
Zhan Zhang
On Sep 22, 2015, at 6:37 PM, Zhiliang Zhu wrote:
Dear Experts,
Spark job is running on the cluster by yarn. Since the job can be submited at
the place on the machine from the cluster,however, I would like to submit the
job from another
the latter is used to launch applications on top of yarn.
Then in spark-env.sh, you add: export HADOOP_CONF_DIR=/etc/hadoop/conf.
Thanks.
Zhan Zhang
On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu wrote:
Hi Zhan,
Yes, I get it now.
I have not ever deployed hadoop configuration locally, and do not
Hi All,
There are two RDDs, rdd1 and rdd2; that is to say, rdd1 and rdd2 are similar to DataFrames, or matrices with the same row number and column number.
I would like to get a third RDD rdd3, where each element of rdd3 is the subtraction between rdd1 and rdd2 at the same position, which is similar to matr
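A sketch of the element-wise subtraction with zip, which pairs rows by position (zip requires the two RDDs to have the same number of partitions and the same number of elements per partition; the row type Seq[Double] and the data are made up):

val rdd1 = sc.parallelize(Seq(Seq(1.0, 2.0), Seq(3.0, 4.0)), 2)
val rdd2 = sc.parallelize(Seq(Seq(0.5, 0.5), Seq(1.0, 1.0)), 2)

// Pair up rows by position, then subtract column by column.
val rdd3 = rdd1.zip(rdd2).map { case (row1, row2) =>
  row1.zip(row2).map { case (a, b) => a - b }
}
rdd3.collect()   // e.g. List(0.5, 1.5), List(2.0, 3.0)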
There is a matrix add API; might I map each row element of rdd2 to be negative, and then take rdd1 and rdd2 and call add?
Or is there some better way ...
On Wednesday, September 23, 2015 3:11 PM, Zhiliang Zhu
wrote:
Hi All,
There are two RDDs : RDD> rdd1, and RDD> rdd2,that
is to say, rd
On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu wrote:
Hi Zhan,
Yes, I get it now.
I have not ever deployed hadoop configuration locally, and do not find the
specific doc, would you help provide the doc to do that...
Thank you,
Zhiliang
On Wednesday, September 23, 2015 11:08 AM, Zhan Zhang
wrote:
And the remote machine is not in the same local area network as the cluster.
On Friday, September 25, 2015 12:28 PM, Zhiliang Zhu
wrote:
Hi Zhan,
I have done that with your kind help.
However, I could just use "hadoop fs -ls/-mkdir/-rm XXX" commands to operate at
ote:
On 25 Sep 2015, at 05:25, Zhiliang Zhu wrote:
However, I just could use "hadoop fs -ls/-mkdir/-rm XXX" commands to operate at
the remote machine with gateway,
which means the namenode is reachable; all those commands only need to interact
with it.
but commands "
It seems that is due to the spark SPARK_LOCAL_IP setting.
export SPARK_LOCAL_IP=localhost
will not work.
Then, how should it be set?
Thank you all~~
On Friday, September 25, 2015 5:57 PM, Zhiliang Zhu
wrote:
Hi Steve,
Thanks a lot for your reply.
That is, some commands could work on
Hi all,
The spark job will run on yarn. Whether I do not set SPARK_LOCAL_IP at all, or just set it as
export SPARK_LOCAL_IP=localhost   # or set it to the specific node ip
in the specific spark install directory,
it will work well to submit the spark job on the master node of the cluster; however, it
will fail by way
Dear All,
I am building a model with a spark pipeline, and in the pipeline I used the Random Forest algorithm as one of its stages.
If I just use Random Forest directly, not by way of a pipeline, I can see the information about the forest through APIs such as
rfModel.toDebugString() and rfModel.toString() .
However, while it
, November 24, 2016 2:15 AM, Xiaomeng Wan
wrote:
You can use pipelinemodel.stages(0).asInstanceOf[RandomForestModel]. The
number (0 in example) for stages depends on the order you call setStages.
Shawn
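A sketch of that extraction for a spark.ml classification pipeline (the stage index and the RandomForestClassificationModel class are assumptions that depend on how setStages was called and which forest class was actually used):

import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

def describeForest(model: PipelineModel): String = {
  // The index must match the position of the random forest stage in setStages.
  val rf = model.stages(2).asInstanceOf[RandomForestClassificationModel]
  rf.toDebugString   // text summary of every tree in the forest
}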
On 23 November 2016 at 10:21, Zhiliang Zhu wrote:
Dear All,
I am building model by spark