how to add one more column to a DataFrame

2016-05-16 Thread Zhiliang Zhu
Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the result DataFrame. final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0; // DAYS_30 seems difficult to call in
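
A minimal Scala sketch of the pattern being asked about: adding a derived column while keeping all existing ones, via DataFrame.withColumn. The table and column names ("some_hive_table", "gmt_create_ms", "age_months") are hypothetical placeholders, not taken from the thread.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("add-column-sketch").enableHiveSupport().getOrCreate()

    // Milliseconds in 30 days, mirroring the DAYS_30 constant from the thread
    val DAYS_30: Double = 1000.0 * 60 * 60 * 24 * 30

    // "some_hive_table" and "gmt_create_ms" are illustrative names only
    val df = spark.sql("SELECT * FROM some_hive_table")

    // withColumn keeps every existing column and appends the derived one
    val result = df.withColumn("age_months", col("gmt_create_ms") / DAYS_30)
    result.show()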

what is wrong when adding one column to the dataframe

2016-05-16 Thread Zhiliang Zhu
Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the result DataFrame. final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0; // DAYS_30 seems difficult to call i

test - what is wrong when adding one column to the dataframe

2016-06-16 Thread Zhiliang Zhu
On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu wrote: Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the result DataFrame. final double

Re: test - what is wrong when adding one column to the dataframe

2016-06-16 Thread Zhiliang Zhu
just a test, since the user mailing list seemed to have a problem a while ago; it is okay now. On Friday, June 17, 2016 12:18 PM, Zhiliang Zhu wrote: On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu wrote: Hi All, for a given DataFrame created by Hive SQL, however

spark job killed without rhyme or reason

2016-06-16 Thread Zhiliang Zhu
Hi All, I have a big job that takes more than one hour to run in full; however, it unreasonably exits midway (almost 80% of the job actually finished, but not all) without any apparent error or exception in the log. I submitted the same job many times, it i

Re: spark job automatically killed without rhyme or reason

2016-06-16 Thread Zhiliang Zhu
Has anyone ever met a similar problem? It is quite strange ... On Friday, June 17, 2016 2:13 PM, Zhiliang Zhu wrote: Hi All, I have a big job that takes more than one hour to run in full; however, it unreasonably exits midway (almost 80

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
In this situation, please check the yarn userlogs for more information…   --WBR, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 9:36 To: Zhiliang Zhu; User Subject: Re: spark job automatically killed without rhyme or reason   Has anyone ever met a similar problem? It is quite strange ... On Frid

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
in advance~ On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu wrote: Hi Alexander, thanks a lot for your reply. Yes, it is submitted by yarn. Do you just mean the executor log file obtained by way of yarn logs -applicationId id? In this file, both in some containers' stdout and stderr: 16/06/

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
more information…   --WBR, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 9:36 To: Zhiliang Zhu; User Subject: Re: spark job automatically killed without rhyme or reason   Has anyone ever met a similar problem? It is quite strange ... On Friday, June 17, 2016 2:13 PM, Zhiliang Zhu w

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
currently ... Thank you in advance~ On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu wrote: Hi Alexander, thanks a lot for your reply. Yes, it is submitted by yarn. Do you just mean the executor log file obtained by way of yarn logs -applicationId id? In this file, both in some containers

Re: spark job automatically killed without rhyme or reason

2016-06-20 Thread Zhiliang Zhu
, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 14:10 To: User; kp...@hotmail.com Subject: Re: spark job automatically killed without rhyme or reason   Hi Alexander, is your yarn userlog just the executor log? Those logs seem a little difficult to exactly

Re: spark job automatically killed without rhyme or reason

2016-06-22 Thread Zhiliang Zhu
lar spark.yarn.executor.memoryOverhead Everything else you mention is a symptom of YARN shutting down your jobs because your memory settings don't match what your app does. They're not problems per se, based on what you have provided. On Mon, Jun 20, 2016 at 9:17 AM, Zhiliang Zhu wrote: >
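
The reply above points at spark.yarn.executor.memoryOverhead. A minimal sketch of setting it programmatically follows; the property name matches Spark 1.x/2.x, and the values are illustrative rather than tuned for the job in the thread (in practice it is usually passed via spark-submit --conf).

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Illustrative values only: give YARN containers headroom beyond the JVM heap
    val conf = new SparkConf()
      .setAppName("memory-overhead-sketch")
      .set("spark.executor.memory", "8g")
      .set("spark.yarn.executor.memoryOverhead", "2048") // MB of off-heap headroom per executor

    val spark = SparkSession.builder().config(conf).getOrCreate()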

the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
Hi All, here we have one application: it needs to extract different columns from 6 Hive tables and then do some easy calculation; there are around 100,000 rows in each table, and finally it needs to output another table or file (with a consistent column format). However, after lots of d

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
the SQL logic in the program is very complex, so I do not describe the detailed code here. On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu wrote: Hi All, here we have one application: it needs to extract different columns from 6 Hive tables and then do some easy

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
You can use something like the EXPLAIN command to show what is going on. On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote: the SQL logic in the program is very complex, so I do not describe the detailed code here. On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu wrote: Hi All, He
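
A small sketch of inspecting the query plan before running a heavy query, as suggested above; the table names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("explain-sketch").enableHiveSupport().getOrCreate()

    // "table_a" / "table_b" are placeholder Hive tables
    val joined = spark.sql(
      "SELECT a.id, b.value FROM table_a a JOIN table_b b ON a.id = b.id")

    // Prints the parsed, analyzed, optimized and physical plans;
    // the SQL-only equivalent is: EXPLAIN EXTENDED SELECT ...
    joined.explain(true)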

Re: Spark driver getting out of memory

2016-07-18 Thread Zhiliang Zhu
Try to set --driver-memory xg, with x as large as can be set. On Monday, July 18, 2016 6:31 PM, Saurav Sinha wrote: Hi, I am running a spark job. Master memory - 5G, executor memory 10G (running on 4 nodes). My job is getting killed as the number of partitions increases to 20K. 16/07/18 14:53:13 I

the spark job is so slow during shuffle - almost frozen

2016-07-18 Thread Zhiliang Zhu
clue is also good. Thanks in advance~ On Tuesday, July 19, 2016 11:05 AM, Zhiliang Zhu wrote: Hi Mungeol, thanks a lot for your help. I will try that. On Tuesday, July 19, 2016 9:21 AM, Mungeol Heo wrote: Try to run an action at an intermediate stage of
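
A hedged sketch of the "run an action at an intermediate stage" advice: persist an intermediate result and force it with an action, so the long lineage is cut into pieces whose runtimes show up separately in the Spark UI. Table names are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("intermediate-action-sketch").enableHiveSupport().getOrCreate()

    val step1 = spark.sql("SELECT * FROM table_a")                      // hypothetical heavy query
    val step2 = step1.join(spark.sql("SELECT * FROM table_b"), "id")    // hypothetical join key

    // Persist and force evaluation here: count() is the intermediate action, and its
    // stage time in the Spark UI shows whether this part of the pipeline is the slow one.
    step2.persist(StorageLevel.MEMORY_AND_DISK)
    println(step2.count())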

Re: the spark job is so slow - almost frozen

2016-07-20 Thread Zhiliang Zhu
work, and editing the query, then retesting, repeatedly until you cut the execution time by a significant fraction; using the Spark UI or spark shell to check the skew and make sure partitions are evenly distributed. On Jul 18, 2016, at 3:33 AM, Zhiliang Zhu wrote: Thanks a lot for your reply. In

spark ml : auc on extreme distributed data

2016-08-14 Thread Zhiliang Zhu
Hi All, here I have a lot of data with around 1,000,000 rows; 97% of them are the negative class and 3% are the positive class. I applied the Random Forest algorithm to build the model and predict the testing data. For the data preparation: i. first randomly split all the data into training data and

Re: Inverse of the matrix

2015-12-11 Thread Zhiliang Zhu
Use matrix SVD decomposition; spark has the lib: http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html#singular-value-decomposition-svd On Thursday, December 10, 2015 7:33 PM, Arunkumar Pillai wrote: Hi, I need to find the inverse of the (X(Transpose) * X) matrix. I hav
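
A minimal sketch of the SVD route suggested above, using mllib's distributed RowMatrix; assembling the pseudo-inverse from the factors is only indicated in a comment since it depends on the matrix size. The input data is a placeholder.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val spark = SparkSession.builder().appName("svd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Placeholder data: each row of the matrix as a dense vector
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 10.0)))

    val mat = new RowMatrix(rows)
    // Keep the top k singular values/vectors; computeU = true is needed for a pseudo-inverse
    val svd = mat.computeSVD(3, computeU = true)

    // Pseudo-inverse: pinv(A) = V * diag(1 / s_i) * U^T, dropping singular values near zero;
    // svd.V and svd.s are local, svd.U stays distributed.
    println(svd.s)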

rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
Dear All, for some rdd, when there is just one partition, the operations and arithmetic would run as a single task, and the rdd has lost all the parallelism benefit of the spark system ... Is it exactly like that? Thanks very much in advance! Zhiliang
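
A small sketch of checking and fixing the single-partition case; the follow-up replies suggest repartitioning to restore parallelism. The data and partition counts are illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()
    val sc = spark.sparkContext

    // An RDD with a single partition runs its tasks one at a time
    val rdd0 = sc.parallelize(1 to 1000000, 1)
    println(rdd0.getNumPartitions)          // 1

    // Spread the data over more partitions so later stages run in parallel
    val rdd1 = rdd0.repartition(64)         // 64 is illustrative; match it to the cluster cores
    println(rdd1.getNumPartitions)          // 64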

number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
Dear All, I need to iterate some job / rdd quite a lot of times, but I am stuck on the problem that spark only accepts around 350 calls to map before it meets one action Function; besides, dozens of actions will obviously increase the run time. Is there any proper way ... As tested, there i

[Beg for help] spark job with very low efficiency

2015-12-21 Thread Zhiliang Zhu
will be big (for example more than 1000) ... Thanks in advance! Zhiliang On Monday, December 21, 2015 7:44 PM, Zhiliang Zhu wrote: Dear All, I need to iterate some job / rdd quite a lot of times, but I am stuck on the problem that spark only accepts around 350 calls to map before

Re: rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
ering[T] = null) Cheers On Mon, Dec 21, 2015 at 2:47 AM, Zhiliang Zhu wrote: Dear All, for some rdd, when there is just one partition, the operations and arithmetic would run as a single task, and the rdd has lost all the parallelism benefit of the spark system ... Is it exactly like that? Thanks

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
, you can collapse all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver. Thanks. Zhan Zhang On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu wrote: Dear All, I need to iterate some job / rdd quite a lot of times, but I am stuck on the problem that spark
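
A sketch of the "collapse all these functions into one" advice: instead of chaining hundreds of map calls, each of which lengthens the lineage, keep the per-element steps as plain functions and apply them all inside a single map. The steps here are illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("collapse-maps-sketch").getOrCreate()
    val sc = spark.sparkContext

    val data = sc.parallelize(1 to 100)

    // Illustrative per-element steps that would otherwise each be a separate .map(...)
    val steps: Seq[Int => Int] = Seq(_ + 1, _ * 2, _ - 3)

    // Apply all steps inside one map, so the lineage gains only a single stage
    val result = data.map(x => steps.foldLeft(x)((acc, f) => f(acc)))
    println(result.take(5).mkString(", "))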

Re: rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
in rdd0? That way you can increase the parallelism. Cheers On Mon, Dec 21, 2015 at 9:40 AM, Zhiliang Zhu wrote: Hi Ted, thanks a lot for your kind reply. I need to convert this rdd0 into another rdd1; the rows of rdd1 are generated by a random combination operation over rdd0's rows. From

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver. Thanks. Zhan Zhang On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu wrote: Dear All, I need to iterate some job / rdd quite a lot of times, but I am stuck on the problem that spark only accepts calls to

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
n? If so, how do they depend on each other? I think either you can optimize your implementation, or Spark is not the right fit for your specific application. Thanks. Zhan Zhang On Dec 21, 2015, at 10:43 AM, Zhiliang Zhu wrote: What is the difference between repartition / collect and collapse ... Is collapse

Re: [Beg for help] spark job with very low efficiency

2015-12-21 Thread Zhiliang Zhu
more costly than collect ... Hopefully you are using the Kryo serializer already. This would be all right. From your experience, does Kryo improve efficiency noticeably ... Regards, Sab On Mon, Dec 21, 2015 at 5:51 PM, Zhiliang Zhu wrote: Dear All, I have some kind of iteration job, that is

what is the proper number set about --num-executors etc

2015-12-31 Thread Zhiliang Zhu
In order to make the job run faster, some parameters would be specified on the command line, such as --executor-cores, --executor-memory and --num-executors ... However, as tested, it seemed that those numbers should not be set arbitrarily, or some trouble would be caused for the cluster. What is mor

copy/mv hdfs file to another directory by spark program

2016-01-04 Thread Zhiliang Zhu
For some file on HDFS, it is necessary to copy/move it to another specific HDFS directory, with the name kept unchanged. It just needs to be done inside the spark program, not via hdfs commands. Is there any code for this? It does not seem to be covered by searching the spark doc ... Thanks in advance!
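
A minimal sketch of doing the move or copy from inside a Spark program with the Hadoop FileSystem API instead of shelling out to hdfs commands; the paths are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    val spark = SparkSession.builder().appName("hdfs-move-sketch").getOrCreate()

    // Reuse the Hadoop configuration Spark is already carrying
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val fs = FileSystem.get(hadoopConf)

    val src = new Path("/data/in/part-00000")      // placeholder source file
    val dstDir = new Path("/data/archive")         // placeholder target directory

    // Move: rename keeps the file name when the destination path reuses it
    fs.rename(src, new Path(dstDir, src.getName))

    // Copy instead of move: keep the source by passing deleteSource = false
    // FileUtil.copy(fs, src, fs, new Path(dstDir, src.getName), false, hadoopConf)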

How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
for some other reasons... This issue is urgent for me; would some expert provide some help with this problem... I will sincerely appreciate your help. Thank you! Best Regards, Zhiliang On Friday, September 25, 2015 7:53 PM, Zhiliang Zhu wrote: Hi all, The spark job will

How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
Hi All, I would like to submit a spark job from another remote machine outside the cluster. I also copied the hadoop/spark conf files onto the remote machine; a hadoop job could then be submitted, but a spark job could not. In spark-env.sh, it may be that SPARK_LOCAL_IP is not properly set, or for

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
on linux command side? Best Regards,Zhiliang On Saturday, September 26, 2015 10:07 AM, Gavin Yue wrote: Print out your env variables and check first  Sent from my iPhone On Sep 25, 2015, at 18:43, Zhiliang Zhu wrote: Hi All, I would like to submit spark job on some another

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-27 Thread Zhiliang Zhu
where the cluster's resource manager is. I think this tutorial is pretty clear: http://spark.apache.org/docs/latest/running-on-yarn.html On Fri, Sep 25, 2015 at 7:11 PM, Zhiliang Zhu wrote: Hi Yue, Thanks very much for your kind reply. I would like to submit spark job remotely on anoth

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-27 Thread Zhiliang Zhu
Hi All, would some expert help me with this issue... I shall appreciate your kind help very much! Thank you! Zhiliang On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu wrote: Hi Alexis, Gavin, thanks very much for your kind comments. My spark command is: spark-submit

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-28 Thread Zhiliang Zhu
anager is. I think this tutorial is pretty clear: http://spark.apache.org/docs/latest/running-on-yarn.html On Fri, Sep 25, 2015 at 7:11 PM, Zhiliang Zhu wrote: Hi Yue, Thanks very much for your kind reply. I would like to submit spark job remotely on another machine outside the cluster,and

[Spark ML] How to extends MLlib's optimization algorithm

2015-10-15 Thread Zhiliang Zhu
Dear All, I would like to use spark ml to develop a project related to optimization algorithms; however, in spark 1.4.1 it seems that under ml's optimizer there are only about 2 optimization algorithms. My project may need more kinds of optimization algorithms, so how would I use spark ml

[Spark MLlib] How to apply spark ml given models for questions with general background

2015-10-19 Thread Zhiliang Zhu
Dear All, I am new to spark ml. I have a project with a given math model, and I would like to get its optimized solution. It is very similar to a spark mllib application. However, the key problem for me is that the given math model does not obviously belong to the models (such as clas

How to get inverse Matrix / RDD or how to solve linear system of equations

2015-10-23 Thread Zhiliang Zhu
Hi Sujit and All, currently I am stuck in a big difficulty and I am eager to get some help from you. There is a big linear system of equations: Ax = b, where A has N rows and N columns, N is very large, and b = [0, 0, ..., 0, 1]^T. Then I will solve it to get x = [x1, x2, ..., xn]^T. The si

Re: How to get inverse Matrix / RDD or how to solve linear system of equations

2015-10-23 Thread Zhiliang Zhu
-dimensionality-reduction.html [2] http://math.stackexchange.com/questions/458404/how-can-we-compute-pseudoinverse-for-any-matrix On Fri, Oct 23, 2015 at 2:19 AM, Zhiliang Zhu wrote: Hi Sujit and All, currently I am stuck in a big difficulty and I am eager to get some help from you. There is a big linear

[SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
Dear All, I have a program as below which leaves me very confused; it is about a multi-dimensional linear regression model. The weight / coefficient is always perfect while the dimension is smaller than 4, otherwise it is wrong all the time. Or, whether the LinearRegressionWi

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
-- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Oct 25, 2015 at 10:14 AM, Zhiliang Zhu wrote: > Dear All, > > I have some program as below which makes me very much confused and > inscrutable, it is about multiple dimension linear regression mode, the > weight /

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
: Final w: [0.999477672867, 1.999748740578, 3.500112393734, 3.50011239377] Thank you, Zhiliang On Monday, October 26, 2015 10:25 AM, DB Tsai wrote: Column 4 is always constant, so it has no predictive power, resulting in a zero weight. On Sunday, October 25, 2015, Zhiliang Zhu

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu wrote: Hi DB Tsai, thanks very much for your kind help. I get it now. I am sorry that there is another issue: the weight/coefficient result is perfect while A is a triangular matrix; however, when A is not a triangular matrix (but

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-26 Thread Zhiliang Zhu
ear equations? If so, you can probably try breeze. On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu wrote: > > > > On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu > wrote: > > > Hi DB Tsai, > > Thanks very much for your kind help. I  get it now. > > I

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-26 Thread Zhiliang Zhu
e.g. label = intercept + features dot weight To get the result you want, you need to force the intercept to be zero. Just curious, are you trying to solve systems of linear equations? If so, you can probably try breeze. On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu wrote: > > > > On Monda

is it proper to make RDD as function parameter in the codes

2015-10-27 Thread Zhiliang Zhu
Dear All, I will program a small project with spark, and run speed is a big concern. I have a question: since an RDD is always big on the cluster, is it proper to pass an RDD variable as a parameter in a function call? Thank you, Zhiliang

How to properly read the first number lines of file into a RDD

2015-10-29 Thread Zhiliang Zhu
Hi All, there is a file with N + M lines, and I need to read the first N lines into one RDD. 1. i) read all the N + M lines as one RDD, ii) select the RDD's top N rows, which may be one solution; 2. if a broadcast variable set to N is introduced, then it is used to decide, while mapping the file
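
A sketch of the first option discussed in the thread, using zipWithIndex so only rows with index < N survive; the path and N are placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("first-n-lines-sketch").getOrCreate()
    val sc = spark.sparkContext

    val N = 1000L                                   // placeholder line count
    val lines = sc.textFile("/data/input.txt")      // placeholder path

    // zipWithIndex assigns a stable 0-based index per line; keep only the first N
    val firstN = lines.zipWithIndex()
      .filter { case (_, idx) => idx < N }
      .keys

    println(firstN.count())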

[Spark MLlib] about linear regression issue

2015-11-01 Thread Zhiliang Zhu
Dear All, for N-dimensional linear regression, when the number of labeled training points (or the rank of the labeled point space) is less than N, then from the perspective of math, the weights of the trained linear model may not be unique. However, the output of model.weight() by spark may be with so

apply simplex method to fix linear programming in spark

2015-11-01 Thread Zhiliang Zhu
Dear All, I am facing a typical linear programming issue, and I know the simplex method is specifically for solving LP questions; may I ask whether there is already some mature package in spark for the simplex method... Thank you very much~ Best Wishes! Zhiliang

Re: apply simplex method to fix linear programming in spark

2015-11-01 Thread Zhiliang Zhu
, 2015 1:43 AM, Ted Yu wrote: A brief search in the code base shows the following: TODO: Add simplex constraints to allow alpha in (0,1). ./mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala I guess the answer to your question is no. FYI On Sun, Nov 1, 2015 at 9:37 AM, Zhiliang

spark filter function

2015-11-04 Thread Zhiliang Zhu
Hi All, I would like to filter some elements in a given RDD so that only the needed ones are left, and the row number of the result RDD is smaller. So I selected the filter function; however, as tested, the filter function only accepts a Boolean predicate, that is to say, only the same kind of JavaRDD will be returned by filter.
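
The filter API indeed takes a Boolean predicate and returns an RDD of the same element type; a minimal sketch follows (in Scala rather than the Java of the thread), with placeholder data.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("filter-sketch").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(Seq(-2, 5, 0, 9, -7))

    // The predicate returns Boolean; the result is an RDD[Int] with fewer rows, same element type
    val kept = rdd.filter(x => x > 0)

    // To change the element type as well, follow with map, or use flatMap to filter and transform at once
    val asStrings = kept.map(_.toString)
    println(asStrings.collect().mkString(", "))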

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
implex...if you want to > use interior point method you can use ecos > https://github.com/embotech/ecos-java-scala ...spark summit 2014 talk on > quadratic solver in matrix factorization will show you example integration > with spark. ecos runs as jni process in every executor. > >

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
to > use interior point method you can use ecos > https://github.com/embotech/ecos-java-scala ...spark summit 2014 talk on > quadratic solver in matrix factorization will show you example integration > with spark. ecos runs as jni process in every executor. > > On Nov 1, 2015 9:52 A

Re: [Spark MLlib] about linear regression issue

2015-11-04 Thread Zhiliang Zhu
ntly, there is no open source implementation in Spark. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu wrote: > Dear All, > > As for N dimension linear regress

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
ememt. Where is the API or link for the breeze quadratic minimizer integrated with spark? And where is the breeze lpsolver... Alternatively you can use the breeze lpsolver as well, which uses simplex from apache math. Thank you, Zhiliang On Nov 4, 2015 1:05 AM, "Zhiliang Zhu" wrote:

could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Hi All, I need to debug a spark job; my general way is to print out logs. However, some bug is in spark functions such as mapPartitions etc, and no log printed from those functions could be found... Would you help point out the way to see the logs from spark's own functions such as mapPartitions? Or, what is g

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
executor logs, which you can view via the Spark UI, in the Executors page (stderr log). HTH, Deng On Tue, Nov 10, 2015 at 11:33 AM, Zhiliang Zhu wrote: Hi All, I need to debug a spark job; my general way is to print out logs. However, some bug is in spark functions such as mapPartitions etc, and not any

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Also for the Spark UI: that is, logs from other places could be found, but the logs from functions such as mapPartitions could not. On Tuesday, November 10, 2015 11:52 AM, Zhiliang Zhu wrote: Dear Ching-Mallete, there are machines master01, master02 and master03 in the cluster; I

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Hi Ching-Mallete, I have found the log and the reason for that. Thanks a lot! Zhiliang On Tuesday, November 10, 2015 12:23 PM, Zhiliang Zhu wrote: Also for the Spark UI: that is, logs from other places could be found, but the logs from functions such as mapPartitions could not

static spark Function as map

2015-11-10 Thread Zhiliang Zhu
Deng Ching-Mallete wrote: Hi Zhiliang, you should be able to see them in the executor logs, which you can view via the Spark UI, in the Executors page (stderr log). HTH, Deng On Tue, Nov 10, 2015 at 11:33 AM, Zhiliang Zhu wrote: Hi All, I need to debug a spark job; my general way is to print o

could not understand issue about static spark Function (map / sortBy ...)

2015-11-10 Thread Zhiliang Zhu
After more testing, the Function called by map/sortBy etc must be defined as static, or it can be defined as non-static but must then be called from another static normal function. I am really confused by this. On Tuesday, November 10, 2015 4:12 PM, Zhiliang Zhu wrote: Hi All, I have met some bug

Re: could not understand issue about static spark Function (map / sortBy ...)

2015-11-10 Thread Zhiliang Zhu
while newing the Function object, and inside the Function inner class the inner normal function can be called. On Tuesday, November 10, 2015 5:12 PM, Zhiliang Zhu wrote: After more testing, the Function called by map/sortBy etc must be defined as static, or it can be defined as non-static but must

Re: How to properly read the first number lines of file into a RDD

2015-11-17 Thread Zhiliang Zhu
esRDD = n_lines.map(n => { // Read and return 5 lines (n._1) from the file (n._2) }) Thanks, Best Regards On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu wrote: Hi All, there is a file with N + M lines, and I need to read the first N lines into one RDD. 1. i) read all the N + M

Re: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Zhiliang Zhu
Dear Jack, as is known, Breeze is a numerical calculation package written in scala, and spark mllib also uses it as the underlying package for algebra. Here I am also preparing to use Breeze for nonlinear equation optimization; however, it seems that I could not find the exact doc or API for Breeze ex

Re: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Zhiliang Zhu
Thursday, November 19, 2015 1:46 PM, Ted Yu wrote: Have you looked at https://github.com/scalanlp/breeze/wiki Cheers On Nov 18, 2015, at 9:34 PM, Zhiliang Zhu wrote: Dear Jack, as is known, Breeze is a numerical calculation package written in scala, and spark mllib also uses it as the underlying package

what is algorithm to optimize function with nonlinear constraints

2015-11-19 Thread Zhiliang Zhu
Hi all, I have an optimization problem; I have googled a lot but still did not find the exact algorithm or third-party open package to apply to it. Its type is like this: Objective function: f(x1, x2, ..., xn) (n >= 100, and f may be linear or non-linear). Constraint functions: x1 + x2 + ... + x

Re: what is algorithm to optimize function with nonlinear constraints

2015-12-01 Thread Zhiliang Zhu
looks like, based on your line 3. However if the problem is non convex then it'll be hard to solve in most cases. On Thu, Nov 19, 2015, 9:42 AM 'Zhiliang Zhu' via All ADATAO Team Members wrote: Hi all, I have some optimization problem, I have googled a lot but still did not get

the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi All, I would like to compare any two adjacent elements in a given rdd, just as in the single-machine code: int a[N] = {...}; for (int i = 0; i < N - 1; ++i) { compareFun(a[i], a[i+1]); } ... mapPartitions may work for some situations; however, it cannot compare elements in different parti
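
The replies in this and the later "subtract its adjacent rows" threads point at mllib's sliding, which builds windows of adjacent elements across partition boundaries; a minimal sketch with placeholder data and a placeholder comparison.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.mllib.rdd.RDDFunctions._   // adds sliding() to ordinary RDDs

    val spark = SparkSession.builder().appName("adjacent-compare-sketch").getOrCreate()
    val sc = spark.sparkContext

    val a = sc.parallelize(Seq(3.0, 7.0, 4.0, 9.0))

    // Each window is an Array of 2 adjacent elements, even when they sit in different partitions;
    // the comparison here (is the sequence increasing?) stands in for compareFun
    val compared = a.sliding(2).map(w => w(1) > w(0))
    println(compared.collect().mkString(", "))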

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
-- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu wrote: > Hi All, > > I would like to compare any two adjacent elements in one given rdd, just as > the single machine code part:

Re: the way to compare any two adjacent elements in one rdd

2015-12-05 Thread Zhiliang Zhu
015 3:52 PM, Zhiliang Zhu wrote: Hi DB Tsai, thanks very much for your kind reply! Sorry, one more issue: as tested it seems that filter can only return a JavaRDD of the same element type but not a JavaRDD of another type, is that right? Then it is not very convenient to do a general filter for an RDD; mapPartitions could work somewhat,

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
: 0xAF08DF8D On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu wrote: > Hi All, > > I would like to compare any two adjacent elements in one given rdd, just as > the single machine code part: > > int a[N] = {...}; > for (int i=0; i < N - 1; ++i) { >    compareFun(a[i], a[i+1]);

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
in advance! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Dec 6, 2015 at 6:27 PM, Zhiliang Zhu wrote: > > > > > On Saturday, December 5, 2015 3:00 PM, DB Tsai wrote: > > > Thi

what's the way to access the last element from another partition

2015-12-08 Thread Zhiliang Zhu
ious order among the elements, and will it also not work? Thanks very much in advance! On Monday, December 7, 2015 11:32 AM, Zhiliang Zhu wrote: On Monday, December 7, 2015 10:37 AM, DB Tsai wrote: Only the beginning and ending part of the data. The rest in the partition can b

is repartition very cost

2015-12-08 Thread Zhiliang Zhu
Hi All, I need to optimize an objective function with some linear constraints by a genetic algorithm. I would like to get as much parallelism for it as possible with spark. repartition / shuffle may be used sometimes in it; however, is the repartition API very costly? Thanks in advance! Zhiliang

Re: is repartition very cost

2015-12-08 Thread Zhiliang Zhu
you need to do performance testing to see if a repartition is worth the shuffle time.   A common model is to repartition the data once after ingest to achieve parallelism and avoid shuffles whenever possible later.   From: Zhiliang Zhu [mailto:zchl.j...@yahoo.com.INVALID] Sent: Tuesday, Decembe

How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Dear all, I have taken lots of days to think about this issue, however, without any success... I shall appreciate all your kind help. There is an RDD rdd1; I would like to get a new RDD rdd2, where each row rdd2[i] = rdd1[i] - rdd1[i - 1]. What kind of API or function would I use... Thanks very much! J

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
r of the items. What exactly are you trying to accomplish? Romi Kuntsman, Big Data Engineer http://www.totango.com On Mon, Sep 21, 2015 at 2:29 PM, Zhiliang Zhu wrote: Dear all, I have taken lots of days to think about this issue, however, without any success... I shall appreciate all your kind help. The

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
onday, September 21, 2015 11:48 PM, Sujit Pal wrote: Hi Zhiliang,  Would something like this work? val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0)) -sujit On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu wrote: Hi Romi, Thanks very much for your kind help comment~~ In fact there is so
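
The quoted one-liner needs the mllib implicit that adds sliding to a plain RDD; a sketch of the complete version with placeholder data.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.mllib.rdd.RDDFunctions._   // provides sliding() on RDDs

    val spark = SparkSession.builder().appName("adjacent-diff-sketch").getOrCreate()
    val sc = spark.sparkContext

    val rdd1 = sc.parallelize(Seq(1.0, 4.0, 9.0, 16.0))

    // rdd2(i) = rdd1(i + 1) - rdd1(i); windows of 2 span partition boundaries
    val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0))
    println(rdd2.collect().mkString(", "))   // 3.0, 5.0, 7.0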

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Zhiliang,  Would something like this work? val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0)) -sujit On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu wrote: Hi Romi, Thanks very much for your kind help comment~~ In fact there is some valid backgroud of the application, it is about R data analy

how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
Dear Romi, Priya, Sujit, Shivaram and all, I have taken lots of days to think about this issue, however, without any good enough solution... I shall appreciate all your kind help. There is an RDD rdd1, and another RDD rdd2 (rdd2 can be a PairRDD, or a DataFrame with two columns). StringDate colum

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
gRDD.html So maybe something like this: new SlidingRDD(rdd1, 2, ClassTag$.apply(Class)) -sujit On Mon, Sep 21, 2015 at 9:16 AM, Zhiliang Zhu wrote: Hi Sujit, I must appreciate your kind help very much~ It seems to be OK, however, do you know the corresponding spark Java API achievement...Is there

Re: how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
join. Does that make sense? On Mon, Sep 21, 2015 at 8:37 PM Zhiliang Zhu wrote: Dear Romi, Priya, Sujit, Shivaram and all, I have taken lots of days to think about this issue, however, without any good enough solution... I shall appreciate all your kind help. There is an RDD rdd1, and another RDD
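
A sketch of the keyed join the reply suggests: key each RDD by the shared column and join. The shapes of rdd1 and rdd2 below are placeholders, since the original message is truncated.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("join-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Placeholder shapes: both RDDs keyed by a string date
    val rdd1 = sc.parallelize(Seq(("2015-09-01", 10.0), ("2015-09-02", 12.5)))
    val rdd2 = sc.parallelize(Seq(("2015-09-01", 0.3), ("2015-09-03", 0.7)))

    // Inner join on the shared key; use leftOuterJoin / fullOuterJoin to keep unmatched rows
    val joined = rdd1.join(rdd2)                  // RDD[(String, (Double, Double))]
    println(joined.collect().mkString(", "))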

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-22 Thread Zhiliang Zhu
Dear Sujit, since you are experienced with Spark, may I ask whether it is convenient for you to comment on my dilemma in using spark for an application with an R background ... Thank you very much! Zhiliang On Tuesday, September 22, 2015 1:45 AM, Zhiliang Zhu wrote

how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
Dear Experts, the spark job runs on the cluster via yarn. The job can be submitted from a machine in the cluster; however, I would like to submit the job from another machine which does not belong to the cluster. I know that for this, a hadoop job could be done by way of another ma

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
HADOOP_CONF_DIR in spark to the configuration. Thanks, Zhan Zhang On Sep 22, 2015, at 6:37 PM, Zhiliang Zhu wrote: Dear Experts, the spark job runs on the cluster via yarn. The job can be submitted from a machine in the cluster; however, I would like to submit the job from another

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
the latter is used to launch applications on top of yarn. Then in spark-env.sh, you add export HADOOP_CONF_DIR=/etc/hadoop/conf. Thanks. Zhan Zhang On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu wrote: Hi Zhan, yes, I get it now. I have never deployed the hadoop configuration locally, and do not

How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
Hi All, there are two RDDs: RDD> rdd1 and RDD> rdd2; that is to say, rdd1 and rdd2 are similar to DataFrames, or Matrices with the same row and column numbers. I would like to get RDD> rdd3, where each element in rdd3 is the difference between rdd1 and rdd2 at the same position, which is similar to Matr
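
A sketch of elementwise subtraction with zip, which requires both RDDs to have the same number of partitions and the same number of elements per partition (true when one is derived from the other; otherwise repartition carefully). The row type below is a placeholder for the generics stripped from the message.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("subtract-rdds-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Placeholder "matrices": rows of doubles with the same shape in both RDDs
    val rdd1 = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)), 2)
    val rdd2 = sc.parallelize(Seq(Array(0.5, 0.5), Array(1.0, 1.0)), 2)

    // zip pairs rows by position; then subtract column by column
    val rdd3 = rdd1.zip(rdd2).map { case (r1, r2) =>
      r1.zip(r2).map { case (a, b) => a - b }
    }
    println(rdd3.collect().map(_.mkString("[", ", ", "]")).mkString(", "))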

Re: How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
There is a matrix add API; might I map each row element of rdd2 to its negative, then take rdd1 and rdd2 and call add? Or are there more ways ... On Wednesday, September 23, 2015 3:11 PM, Zhiliang Zhu wrote: Hi All, there are two RDDs: RDD> rdd1 and RDD> rdd2; that is to say, rd

Re: How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
On Wed, Sep 23, 2015 at 12:23 AM, Zhiliang Zhu wrote: There is a matrix add API; might I map each row element of rdd2 to its negative, then take rdd1 and rdd2 and call add? Or are there more ways ... On Wednesday, September 23, 2015 3:11 PM, Zhiliang Zhu wrote: Hi All, there are two RDDs

Re: how to submit the spark job outside the cluster

2015-09-24 Thread Zhiliang Zhu
g On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu wrote: Hi Zhan, Yes, I get it now. I have not ever deployed hadoop configuration locally, and do not find the specific doc, would you help provide the doc to do that... Thank you,Zhiliang On Wednesday, September 23, 2015 11:08 AM, Zhan Zhang wrote:

Re: how to submit the spark job outside the cluster

2015-09-24 Thread Zhiliang Zhu
And the remote machine is not in the same local area network as the cluster. On Friday, September 25, 2015 12:28 PM, Zhiliang Zhu wrote: Hi Zhan, I have done that with your kind help. However, I could only use "hadoop fs -ls/-mkdir/-rm XXX" commands to operate at

Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
ote: On 25 Sep 2015, at 05:25, Zhiliang Zhu wrote: However, I could only use "hadoop fs -ls/-mkdir/-rm XXX" commands to operate from the remote machine with the gateway, which means the namenode is reachable; all those commands only need to interact with it. But commands "

Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
It seems that it is due to spark's SPARK_LOCAL_IP setting. export SPARK_LOCAL_IP=localhost will not work. Then, how should it be set? Thank you all~~ On Friday, September 25, 2015 5:57 PM, Zhiliang Zhu wrote: Hi Steve, thanks a lot for your reply. That is, some commands could work on

How to set spark envoirnment variable SPARK_LOCAL_IP in conf/spark-env.sh

2015-09-25 Thread Zhiliang Zhu
Hi all, the spark job will run on yarn. Whether I do not set SPARK_LOCAL_IP at all, or just set export SPARK_LOCAL_IP=localhost (or set it to the specific node ip) in the specific spark install directory, it will work well to submit the spark job on the master node of the cluster; however, it will fail by way

Re: How to set spark envoirnment variable SPARK_LOCAL_IP in conf/spark-env.sh

2015-09-25 Thread Zhiliang Zhu
On Friday, September 25, 2015 7:46 PM, Zhiliang Zhu wrote: Hi all, the spark job will run on yarn. Whether I do not set SPARK_LOCAL_IP at all, or just set export SPARK_LOCAL_IP=localhost (or set it to the specific node ip) in the specific spark install directory, it will work well

how to see Pipeline model information

2016-11-23 Thread Zhiliang Zhu
Dear All, I am building a model with a spark pipeline, and in the pipeline I used the Random Forest algorithm as a stage. If I just use Random Forest directly, not by way of a pipeline, I can see the information about the forest through APIs such as rfModel.toDebugString() and rfModel.toString(). However, while it

Re: how to see Pipeline model information

2016-11-24 Thread Zhiliang Zhu
, November 24, 2016 2:15 AM, Xiaomeng Wan wrote: You can use pipelinemodel.stages(0).asInstanceOf[RandomForestModel]. The number (0 in the example) for stages depends on the order in which you call setStages. Shawn On 23 November 2016 at 10:21, Zhiliang Zhu wrote: Dear All, I am building a model with a spark
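
A sketch of pulling the tree description back out of a fitted pipeline. Note that in a spark.ml pipeline the fitted stage is a RandomForestClassificationModel (or RandomForestRegressionModel) rather than the mllib RandomForestModel named in the reply; the load path and stage position below are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.RandomForestClassificationModel

    val spark = SparkSession.builder().appName("inspect-pipeline-sketch").getOrCreate()

    // Placeholder path to a previously saved, fitted PipelineModel
    val pipelineModel = PipelineModel.load("/models/rf_pipeline")

    // Pick the forest stage; the index depends on the order passed to setStages
    val rf = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]

    println(rf.numTrees)
    println(rf.toDebugString)   // same per-tree description as the non-pipeline rfModel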
