Congratulations Peter and Xiduo.
On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan wrote:
> Hi all,
>
> The Spark PMC recently voted to add two new committers. Please join me in
> welcoming them to their new role!
>
> - Peter Toth (Spark SQL)
> - Xiduo You (Spark SQL)
>
> They consistently make contribut
Congratulations Xinrong !
On Tue, Aug 9, 2022, 10:00 PM Rui Wang wrote:
> Congrats Xinrong!
>
>
> -Rui
>
> On Tue, Aug 9, 2022 at 8:57 PM Xingbo Jiang wrote:
>
>> Congratulations!
>>
>> Yuanjian Li wrote on Tue, Aug 9, 2022 at 20:31:
>>
>>> Congratulations, Xinrong!
>>>
>>> XiDuo You wrote on Tue, Aug 9, 2022 at 19:18:
>>
Congratulations to the whole spark community ! It's a great achievement.
On Sat, May 14, 2022, 2:49 AM Yikun Jiang wrote:
> Awesome! Congrats to the whole community!
>
> On Fri, May 13, 2022 at 3:44 AM Matei Zaharia
> wrote:
>
>> Hi all,
>>
>> We recently found out that Apache Spark received
>>
Hi,
ECOS is a solver for second order conic programs and we showed the Spark
integration at 2014 Spark Summit
https://spark-summit.org/2014/quadratic-programing-solver-for-non-negative-matrix-factorization/.
Right now the examples show how to reformulate matrix factorization as a
SOCP and solve ea
If you can point me to previous benchmarks that are done, I would like to
use smoothing and see if the LBFGS convergence improved while not impacting
linear svc loss.
Thanks.
Deb
On Dec 16, 2017 7:48 PM, "Debasish Das" wrote:
Hi Weichen,
Traditionally svm are solved using quadratic p
re that proves changing max to soft-max can behave
> well?
> I’m more than happy to see some benchmarks if you can have.
>
> + Yuhao, who did similar effort in this PR: https://github.com/apache/
> spark/pull/17862
>
> Regards
> Yanbo
>
> On Dec 13, 2017, at 12:20 A
Hi,
I looked into the LinearSVC flow and found the gradient for hinge as
follows:
Our loss function with {0, 1} labels is max(0, 1 - (2y - 1) (f_w(x)))
Therefore the gradient is -(2y - 1)*x
max is a non-smooth function.
Did we try using ReLu/Softmax function and use that to smooth the hinge
los
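A minimal sketch of the hinge loss described above and one possible smoothing of it, written in plain Python rather than Spark code. The function names and the `beta` sharpness parameter are illustrative assumptions, and the softplus surrogate is just one common choice for smoothing max(0, m); it is not taken from LinearSVC itself.

```python
import math

def hinge_loss_and_grad(w, x, y):
    """Hinge loss max(0, 1 - s*f) with s = 2y - 1 and f = w.x, for y in {0, 1}.

    Returns (loss, subgradient with respect to w).
    """
    s = 2.0 * y - 1.0
    margin = 1.0 - s * sum(wi * xi for wi, xi in zip(w, x))
    if margin > 0.0:
        # Active side of the max: gradient is -(2y - 1) * x
        return margin, [-s * xi for xi in x]
    # Inactive side: loss and subgradient are zero
    return 0.0, [0.0] * len(x)

def softplus_hinge_loss_and_grad(w, x, y, beta=10.0):
    """Smooth surrogate softplus(beta * margin) / beta (assumed smoothing).

    It approaches the hinge loss as beta grows and is differentiable everywhere.
    """
    s = 2.0 * y - 1.0
    margin = 1.0 - s * sum(wi * xi for wi, xi in zip(w, x))
    z = beta * margin
    # Numerically stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^-|z|)
    loss = (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta
    # Numerically stable sigmoid, the derivative of softplus
    if z >= 0.0:
        sig = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sig = e / (1.0 + e)
    return loss, [-s * sig * xi for xi in x]
```

The smoothed gradient is the hinge gradient scaled by a sigmoid of the margin, so a quasi-Newton method such as LBFGS sees a continuous gradient near margin zero.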
+1
Is there any design doc related to API/internal changes ? Will CP be the
default in structured streaming or is it a mode in conjunction with
existing behavior ?
Thanks.
Deb
On Nov 1, 2017 8:37 AM, "Reynold Xin" wrote:
Earlier I sent out a discussion thread for CP in Structured Streaming:
ht
Thanks Cody for bringing up a valid point...I picked up Spark in 2014 as
soon as I looked into it since compared to writing Java map-reduce and
Cascading code, Spark made writing distributed code fun...But now as we
went deeper with Spark and real-time streaming use-case gets more
prominent, I thin
Decoupling mllib and core is difficult...it is not intended to run spark
core 1.5 with spark mllib 1.6 snapshot...core is more stabilized due to new
algorithms getting added to mllib and sometimes you might be tempted to do
that but it's not recommended.
On Nov 21, 2015 8:04 PM, "Reynold Xin" wrote:
Rdd nesting can lead to recursive nesting...i would like to know the
usecase and why join can't support it...you can always expose an api over a
rdd and access that in another rdd mappartition...use an external data
source like hbase cassandra redis to support the api...
For your case group by and th
>
>
> Graphically, the access path is as follows:
>
>
>
> Spark SQL JDBC Interface -> Spark SQL Parser/Analyzer/Optimizer->Astro
> Optimizer-> HBase Scans/Gets -> … -> HBase Region server
>
>
>
>
>
> Regards,
>
>
>
> Yan
>
>
>
Hi Yan,
Is it possible to access the hbase table through spark sql jdbc layer ?
Thanks.
Deb
On Jul 22, 2015 9:03 PM, "Yan Zhou.sc" wrote:
> Yes, but not all SQL-standard insert variants .
>
>
>
> *From:* Debasish Das [mailto:debasish.da...@gmail.com]
> *Sent:* Wedn
unt instances.
>
>
> On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das
> wrote:
> > Yeah, I think the idea of confidence is a bit different than what I am
> > looking for using implicit factorization to do document clustering.
> >
> > I basically need (r_ij - w_ih_j)^
10x more
> than the latter. It's very heavily skewed to pay attention to the
> high-count instances.
>
>
> On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das
> wrote:
> > Yeah, I think the idea of confidence is a bit different than what I am
> > looking for using
I will think further but in the current implicit formulation with
confidence, looks like I am factorizing a 0/1 matrix with weights 1 +
alpha*rating for observed (1) values and 1 for unobserved (0) values. It's
a bit different from LSA model.
>> On Sun, Jul 26, 2015 at 6:45 AM, D
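The weighting described above (a 0/1 target with weight 1 + alpha*rating on observed entries and weight 1 on unobserved ones) can be sketched as a plain-Python loss. The function name and the dense list-of-lists layout are assumptions for illustration, and regularization is left out; this is not the MLlib implementation.

```python
def implicit_weighted_loss(ratings, user_factors, item_factors, alpha):
    """Confidence-weighted squared loss for implicit feedback.

    ratings[i][j] is the raw count/rating (0 means unobserved).
    Target p_ij is 1 for observed entries, 0 otherwise.
    Weight c_ij is 1 + alpha * r_ij for observed entries, 1 otherwise.
    """
    loss = 0.0
    for i, row in enumerate(ratings):
        for j, r in enumerate(row):
            observed = r > 0
            p = 1.0 if observed else 0.0
            c = 1.0 + alpha * r if observed else 1.0
            pred = sum(u * v for u, v in zip(user_factors[i], item_factors[j]))
            loss += c * (p - pred) ** 2
    return loss
```

Note that the rating only enters through the weight, which is why this differs from factorizing the rating matrix directly as in an LSA-style model.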
Instead the rating matrix
> is the thing being factorized directly.
>
> On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das
> wrote:
> > Hi,
> >
> > Implicit factorization is important for us since it drives recommendation
> > when modeling user click/no-click and
Hi,
Implicit factorization is important for us since it drives recommendation
when modeling user click/no-click and also topic modeling to handle 0
counts in document x word matrices through NMF and Sparse Coding.
I am a bit confused on this code:
val c1 = alpha * math.abs(rating)
if (rating > 0
Does it also support insert operations ?
On Jul 22, 2015 4:53 PM, "Bing Xiao (Bing)" wrote:
> We are happy to announce the availability of the Spark SQL on HBase
> 1.0.0 release.
> http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
>
> The main features in this package, dubbed “As
Hi,
Akka cluster uses gossip protocol for Master election. The approach in
Spark right now is to use Zookeeper for high availability.
Interestingly Cassandra and Redis clusters are both using Gossip protocol.
I am not sure what is the default behavior right now. If the master dies
and zookeeper
, 2015 at 12:21 AM, Debasish Das
wrote:
> Hi,
>
> I have some impala created parquet tables which hive 0.13.2 can read fine.
>
> Now the same table when I want to read using Spark SQL 1.3 I am getting
> exception class exception that parquet.hive.serde.ParquetHiveSerde not
Hi,
The demo of end-to-end ML pipeline including the model server component at
Spark Summit was really cool.
I was wondering if the Model Server component is based upon Velox or it
uses a completely different architecture.
https://github.com/amplab/velox-modelserver
We are looking for an open s
Congratulations to All.
DB great work in bringing quasi newton methods to Spark !
On Wed, Jun 17, 2015 at 3:18 PM, Chester Chen wrote:
> Congratulations to All.
>
> DB and Sandy, great works !
>
>
> On Wed, Jun 17, 2015 at 3:12 PM, Matei Zaharia
> wrote:
>
>> Hey all,
>>
>> Over the past 1.5 m
Hi,
I have some impala created parquet tables which hive 0.13.2 can read fine.
Now the same table when I want to read using Spark SQL 1.3 I am getting
exception class exception that parquet.hive.serde.ParquetHiveSerde not
found.
I am assuming that hive somewhere is putting the parquet-hive-bundl
Hi,
We want to keep the model created and loaded in memory through Spark batch
context since blocked matrix operations are required to optimize on runtime.
The data is streamed in through Kafka / raw sockets and Spark Streaming
Context. We want to run some prediction operations with the streaming
In general for implicit feedback in als you have to do a blocked gram
matrix calculation which might not fit in graphx flow and lot of blocked
operations can be used...but if your loss is likelihood or kl divergence or
just simple sgd update rules and not least square then graphx idea makes
sense..
5:53 PM, "Joseph Bradley" wrote:
> That's a good question; I could imagine it being much more efficient if
> kept in a BlockMatrix and using BLAS2 ops.
>
> On Sat, May 23, 2015 at 8:09 PM, Debasish Das
> wrote:
>
>> Hi,
>>
>> What was the m
23, 2015 at 6:37 PM, Ted Yu wrote:
>
>> Pardon me.
>>
>> Please use '8192k'
>>
>> Cheers
>>
>> On Sat, May 23, 2015 at 6:24 PM, Debasish Das
>> wrote:
>>
>>> Tried "8mb"...still I am failing on the s
, May 23, 2015, Patrick Wendell wrote:
>>
>>> Yes - spark packages can include non ASF licenses.
>>>
>>> On Sat, May 23, 2015 at 6:16 PM, Debasish Das
>>> wrote:
>>> > Hi,
>>> >
>>> > Is it possible to add GPL/LGPL c
Hi,
What was the motivation to write power iteration clustering using graphx
and not a vector matrix multiplication over similarity matrix represented
as say coordinate matrix ?
We can use gemv in that flow to block the computation.
Over graphx can we do all k eigen vector computation together b
Tried "8mb"...still I am failing on the same error...
On Sat, May 23, 2015 at 6:10 PM, Ted Yu wrote:
> bq. it shuld be "8mb"
>
> Please use the above syntax.
>
> Cheers
>
> On Sat, May 23, 2015 at 6:04 PM, Debasish Das
> wrote:
>
>> Hi,
Hi,
Is it possible to add GPL/LGPL code on spark packages or it must be
licensed under Apache as well ?
I want to expose Professor Tim Davis's LGPL library for sparse algebra and
ECOS GPL library through the package.
Thanks.
Deb
Hi,
I am on last week's master but all the examples that set up the following
.set("spark.kryoserializer.buffer", "8m")
are failing with the following error:
Exception in thread "main" java.lang.IllegalArgumentException:
spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
loo
Hi,
For indexedrowmatrix and rowmatrix, both take RDD(vector)...is it possible
that it has intermixed dense and sparse vectors...basically I am
considering a gemv flow when indexedrowmatrix has dense flag true, dot flow
otherwise...
Thanks.
Deb
I opened it up today but it should help you:
https://github.com/apache/spark/pull/6213
On Sat, May 16, 2015 at 6:18 PM, Chunnan Yao wrote:
> Hi all,
> Recently I've ran into a scenario to conduct two sample tests between all
> paired combination of columns of an RDD. But the networking load and
Hi,
We recently added ADMM based proximal algorithm in
breeze.optimize.proximal.NonlinearMinimizer which uses a combination of
BFGS and proximal algorithms (soft thresholding for L1 for example) to
solve large scale constrained optimization problem of form f(x) + g(z). Its
usage is similar to curr
as I see the result. I am not sure if it is
supported by public packages like graphlab or scikit but the plsa papers
show interesting results.
On Mar 30, 2015 2:31 PM, "Xiangrui Meng" wrote:
> On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das
> wrote:
> > Hi Xiangrui,
> >
rmance hit we take from combining
> > binary & multiclass logistic loss/gradient. If it's not a big hit, then
> it
> > might be simpler from an outside API perspective to keep them in 1 class
> > (even if it's more complicated within).
> > Joseph
> >
Hi,
Right now LogisticGradient implements both binary and multi-class in the
same class using an if-else statement which is a bit convoluted.
For Generalized matrix factorization, if the data has distinct ratings I
want to use LeastSquareGradient (regression has given best results to date)
but if
that ALM will support MAP
(and may be KL divergence loss) with sparsity constraints (probability
simplex and bounds are fine for what I am focused at right now)...
Thanks.
Deb
On Tue, Feb 17, 2015 at 4:40 PM, Debasish Das
wrote:
> There is a usability difference...I am not sure if recommenda
ind the JIRA to track this here: SPARK-6442
> <https://issues.apache.org/jira/browse/SPARK-6442>
>
> The design doc is here: http://goo.gl/sf5LCE
>
> We would very much appreciate your feedback and input.
>
> Best,
> Burak
>
> On Thu, Mar 19, 2015 at 3:06 PM,
nctions I need, that can be found in Breeze (and netlib-java). The same
> concerns are applicable to MLlib Vector.
>
> Best regards, Alexander
>
> On 19.03.2015, at 14:16, "Debasish Das" wrote:
>
> I think for Breeze we are focused on dot and dgemv right now (al
>
> Also, could someone please elaborate on the linalg.BLAS and Matrix? Are
> they going to be developed further, should in long term all developers use
> them?
>
> Best regards, Alexander
>
> On 18.03.2015, at 23:21, "Debasish Das" wrote:
>
> dgemm dg
dgemm dgemv and dot come to Breeze and Spark through netlib-java
Right now both in dot and dgemv Breeze does an extra memory allocation but we
already found the issue and we are working on adding a common trait that
will provide a sink operation (basically memory will be allocated by
user)...addi
Hi David,
We are stress testing breeze.optimize.proximal and nnls...if you are
cutting a release now, we will need another release soon once we get the
runtime optimizations in place and merged to breeze.
Thanks.
Deb
On Mar 15, 2015 9:39 PM, "David Hall" wrote:
> snapshot is pushed. If you ver
Any reason why the regularization path cannot be implemented using current
owlqn pr ?
We can change owlqn in breeze to fit your needs...
On Feb 24, 2015 3:27 PM, "Joseph Bradley" wrote:
> Hi Mike,
>
> I'm not aware of a "standard" big dataset, but there are a number
> available:
> * The YearPre
Hi,
Some of my jobs failed due to no space left on device and on those jobs I
was monitoring the shuffle space...when the job failed shuffle space did
not clean and I had to manually clean it...
Is there a JIRA already tracking this issue ? If no one has been assigned
to it, I can take a look.
T
7, 2015 at 4:10 PM, Debasish Das
> wrote:
> > It will really help us if we merge it but I guess it is already
> diverged
> > from the new ALS...I will also take a look at it again and try update
> with
> > the new ALS...
> >
> > On Tue, Feb 17, 2015 at 3:2
d fit. For a general matrix factorization package, let's
> make a JIRA and move our discussion there. -Xiangrui
>
> On Fri, Feb 13, 2015 at 7:46 AM, Debasish Das
> wrote:
> > Hi,
> >
> > I am a bit confused on the mllib design in the master. I thought that core
> >
r
> pass on your PR today. -Xiangrui
>
> On Tue, Feb 10, 2015 at 8:01 AM, Debasish Das
> wrote:
> > Hi,
> >
> > Will it be possible to merge this PR to 1.3 ?
> >
> > https://github.com/apache/spark/pull/3098
> >
> > The batch prediction API
Hi,
I am a bit confused on the mllib design in the master. I thought that core
algorithms will stay in mllib and ml will define the pipelines over the
core algorithm but looks like in master ALS is moved from mllib to ml...
I am refactoring my PR to a factorization package and I want to build it on
Hi,
Will it be possible to merge this PR to 1.3 ?
https://github.com/apache/spark/pull/3098
The batch prediction API in ALS will be useful for us who want to cross
validate on prec@k and MAP...
Thanks.
Deb
Congratulations !
Keep helping the community :-)
On Tue, Feb 3, 2015 at 5:34 PM, Denny Lee wrote:
> Awesome stuff - congratulations! :)
>
> On Tue Feb 03 2015 at 5:34:06 PM Chao Chen wrote:
>
> > Congratulations guys, well done!
> >
> > On 15-2-4 at 9:26 AM, Nan Zhu wrote:
> > > Congratulations!
> > >
protobuf comes from missing -Phadoop-2.3
On Fri, Dec 12, 2014 at 2:34 PM, Sean Owen wrote:
>
> What errors do you see? protobuf errors usually mean you didn't build
> for the right version of Hadoop, but if you are using -Phadoop-2.3 or
> better -Phadoop-2.4 that should be fine. Yes, a stack trace
For CDH this works well for me...tested till 5.1...
./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn
-Phive -DskipTests
To build with hive thriftserver support for spark-sql
On Fri, Dec 12, 2014 at 1:41 PM, Ganelin, Ilya
wrote:
>
> Hi all – we’re running CDH 5.2 and would
a matrix A (i.e. computing
> AA^T, which is expensive).
>
> There is a JIRA to track handling (1) and (2) more efficiently than
> computing all pairs: https://issues.apache.org/jira/browse/SPARK-3066
>
>
>
> On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das
> wrote:
>
>>
Hi,
It seems there are multiple places where we would like to compute row
similarity (accurate or approximate similarities)
Basically through RowMatrix columnSimilarities we can compute column
similarities of a tall skinny matrix
Similarly we should have an API in RowMatrix called rowSimilaritie
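To make the row-similarity idea concrete, here is a brute-force sketch in plain Python that computes cosine similarity between every pair of rows. The function name `row_similarities` and the `threshold` parameter are hypothetical and chosen for illustration; this is an exact all-pairs computation on local lists, not a RowMatrix API and not an approximate scheme.

```python
import math

def row_similarities(rows, threshold=0.0):
    """Cosine similarity for every pair of rows (i < j).

    Returns a dict {(i, j): similarity} keeping only pairs whose
    similarity is at least `threshold`. Zero rows are skipped.
    """
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    sims = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            sim = dot / (norms[i] * norms[j])
            if sim >= threshold:
                sims[(i, j)] = sim
    return sims
```

The cost is quadratic in the number of rows, which is the expense that a distributed or sampled approach would try to avoid on large matrices.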
with Jellyfish code http://i.stanford.edu/hazy/victor/Hogwild/), will
reproduce the failure...
https://issues.apache.org/jira/browse/SPARK-4231
The failed job I will debug more and figure out the real cause. If needed I
will open up new JIRAs.
On Sun, Nov 23, 2014 at 9:50 AM, Debasish Das
wrote
-1 from me...same FetchFailed issue as what Hector saw...
I am running Netflix dataset and dumping out recommendation for all users.
It shuffles around 100 GB data on disk to run a reduceByKey per user on
utils.BoundedPriorityQueue...The code runs fine with MovieLens1m dataset...
I gave Spark 10
missing in training and appears in test, we can simply
> ignore it. -Xiangrui
>
> On Tue, Nov 18, 2014 at 6:59 AM, Debasish Das
> wrote:
> > Sean,
> >
> > I thought sampleByKey (stratified sampling) in 1.1 was designed to solve
> > the problem that randomSpl
>
> I am not sure why your subtract does not work. I suspect it is because
> the values do not partition the same way, or they don't evaluate
> equality in the expected way, but I don't see any reason why. Tuples
> work as expected here.
>
> On Tue, Nov 18, 2014 at 4:32
Hi,
I have a rdd whose key is a userId and value is (movieId, rating)...
I want to sample 80% of the (movieId,rating) that each userId has seen for
train, rest is for test...
val indexedRating = sc.textFile(...).map{x=> Rating(x(0), x(1), x(2))
val keyedRatings = indexedRating.map{x => (x.produ
Andrew,
I put up 1.1.1 branch and I am getting shuffle failures while doing flatMap
followed by groupBy...My cluster memory is less than the memory I need and
therefore flatMap does around 400 GB of shuffle...memory is around 120 GB...
14/11/13 23:10:49 WARN TaskSetManager: Lost task 22.1 in stag
Hi,
I am noticing the first step for Spark jobs does a TimSort in 1.2
branch...and there is some time spent doing the TimSort...Is this assigning
the RDD blocks to different nodes based on a sort order ?
Could someone please point to a JIRA about this change so that I can read
more about it ?
Th
066
>
> The easiest case is when one side is small. If both sides are large,
> this is a super-expensive operation. We can do block-wise cross
> product and then find top-k for each user.
>
> Best,
> Xiangrui
>
> On Thu, Nov 6, 2014 at 4:51 PM, Debasish Das
> wrote:
> >
that we can calculate MAP statistics on
large samples of data ?
On Thu, Nov 6, 2014 at 4:41 PM, Xiangrui Meng wrote:
> ALS model contains RDDs. So you cannot put `model.recommendProducts`
> inside a RDD closure `userProductsRDD.map`. -Xiangrui
>
> On Thu, Nov 6, 2014 at 4:39 PM, Deba
e to cache the models to make userFeatures.lookup(user).head to
work ?
On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng wrote:
> Was "user" presented in training? We can put a check there and return
> NaN if the user is not included in the model. -Xiangrui
>
> On Mon, Nov 3,
+1
The app to track PRs based on component is a great idea...
On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara
wrote:
> +1
>
> Sean
>
> On Nov 5, 2014, at 6:32 PM, Matei Zaharia wrote:
>
> > Hi all,
> >
> > I wanted to share a discussion we've been having on the PMC list, as
> well as call for an
Hi,
I built the master today and I was testing IR statistics on movielens
dataset (will open up a PR in a bit)...
Right now in the master examples.MovieLensALS, case class Params extends
AbstractParam[Params]
On my localhost spark, if I run as follows it fails:
./bin/spark-submit --master spark://
t
Hi,
I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
the code fails on userFeatures.lookup(user).head
In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been
called and in all the test-cases that API has been used...
I can perhaps refactor my code to d
from Mailbox
> >
> >
> > On Thu, Oct 30, 2014 at 11:24 PM, Sean Owen wrote:
> >>
> >> MAP is effectively an average over all k from 1 to min(#
> >> recommendations, # items rated) Getting first recommendations right is
> >> more important than the
ion,
> and RankingMetrics will already do that as-is. I don't know that a
> confusion matrix for this binary classification does much.
>
>
> On Thu, Oct 30, 2014 at 9:41 PM, Debasish Das
> wrote:
> > I am working on it...I will open up a JIRA once I see some results..
>
any of the topic modeling
algorithms as well...
Is there a better place for it other than mllib examples ?
On Thu, Oct 30, 2014 at 8:13 AM, Debasish Das
wrote:
> I thought topK will save us...for each user we have 1xrank...now our movie
> factor is a RDD...we pick topK movie factors ba
here is a smarter
> way?)
>
> I wonder if it is possible to extend the DIMSUM idea to computing top K
> matrix multiply between the user and item factor matrices, as opposed to
> all-pairs similarity of one matrix?
>
> On Thu, Oct 30, 2014 at 5:28 AM, Debasish Das
> wrote:
>
ree to add prec@k and ndcg@k to examples.MovielensALS. ROC
> should be good to add as well. -Xiangrui
>
>
> On Wed, Oct 29, 2014 at 11:23 AM, Debasish Das
> wrote:
> > Hi,
> >
> > In the current factorization flow, we cross validate on the test dataset
> > using the RMSE num
distribution over
> the 5 rating levels. Treating it as a binary classification problem or
> a ranking problem does make sense. The RankingMetrics is in master.
> Feel free to add prec@k and ndcg@k to examples.MovielensALS. ROC
> should be good to add as well. -Xiangrui
>
>
> On W
Hi,
In the current factorization flow, we cross validate on the test dataset
using the RMSE number but there are some other measures which are worth
looking into.
If we consider the problem as a regression problem and the ratings 1-5 are
considered as 5 classes, it is possible to generate a confu
y adding web servers however you usually do.
>>
>> See graphflow too.
>> On Oct 18, 2014 5:06 PM, "Rajiv Abraham" wrote:
>>
>> > Oryx 2 seems to be geared for Spark
>> >
>> > https://github.com/OryxProject/oryx
>> >
>> >
Hi,
Is someone working on a project on integrating Oryx model serving layer
with Spark ? Models will be built using either Streaming data / Batch data
in HDFS and cross validated with mllib APIs but the model serving layer
will give API endpoints like Oryx
and read the models may be from hdfs/impa
Hi,
I am validating the proximal algorithm for positive and bound constrained
ALS and I came across the bug detailed in the JIRA while running ALS with
NNLS:
https://issues.apache.org/jira/browse/SPARK-3987
ADMM based proximal algorithm came up with correct result...
Thanks.
Deb
Just checked, QR is exposed by netlib: import org.netlib.lapack.Dgeqrf
For the equality and bound version, I will use QR...it will be faster than
the LU that I am using through jblas.solveSymmetric...
On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das
wrote:
> @xiangrui should we add this epsi
benchmarked that but I opted for QR in a different implementation and it
> has worked fine.
>
> Now I have to go hunt for how the QR decomposition is exposed in BLAS...
> Looks like its GEQRF which JBLAS helpfully exposes. Debasish you could try
> it for fun at least.
> On Oct 15,
ct 15, 2014 at 5:01 PM, Liquan Pei wrote:
> Hi Debasish,
>
> I think ||r - wi'hj||^{2} is positive semi-definite.
>
> Thanks,
> Liquan
>
> On Wed, Oct 15, 2014 at 4:57 PM, Debasish Das
> wrote:
>
>> Hi,
>>
>> If I take the Movielens data and run the
Hi,
If I take the Movielens data and run the default ALS with regularization as
0.0, I am hitting exception from LAPACK that the gram matrix is not
positive definite. This is on the master branch.
This is how I run it :
./bin/spark-submit --total-executor-cores 1 --master spark://
tusca09lmlvt00
Awesome news Matei !
Congratulations to the databricks team and all the community members...
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia
wrote:
> Hi folks,
>
> I interrupt your regularly scheduled user / dev list to bring you some
> pretty cool news for the project, which is that we've been
N
> log4j.logger.kafka=WARN
> log4j.logger.akka=WARN
> log4j.logger.org.apache.spark=WARN
> log4j.logger.org.apache.spark.storage.BlockManager=ERROR
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.eclipse.jetty=WARN
> log4j.logger.org.I0Itec.zkclient=WARN
>
> On Tue, Oct 7, 2014 at 7:42 PM, Deb
Hi,
I have added some changes to ALS tests and I am re-running tests as:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn
-DwildcardSuites=org.apache.spark.mllib.recommendation.ALSSuite test
I have some INFO logs in the code which I want to see on my console. They
work fine if I add print
I have done mvn clean several times...
Consistently all the mllib tests that are using
LocalClusterSparkContext.scala, they fail !
Hi,
Inside mllib I am running tests using:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn install
The local tests run fine but cluster tests are failing..
LBFGSClusterSuite:
- task size should be small *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure
You should look into Evan Sparks's talk from Spark Summit 2014
http://spark-summit.org/2014/talk/model-search-at-scale
I am not sure if some of it is already open sourced through MLBase...
On Mon, Sep 29, 2014 at 7:45 PM, Lochana Menikarachchi
wrote:
> Hi,
>
> Is there anyone who works on hyper
Thanks Christoph.
Are these numbers for mllib als implicit and explicit feedback on
movielens/netflix datasets documented on JIRA ?
On Sep 19, 2014 1:16 PM, "Christoph Sawade" <
christoph.saw...@googlemail.com> wrote:
> Hey Deb,
>
> NDCG is the "Normalized Discounted Cumulative Gain" [1]. Anothe
Hi Xiangrui,
Could you please point to some reference for calculating prec@k and ndcg@k ?
prec is precision I suppose but ndcg I have no idea about...
Thanks.
Deb
On Mon, Aug 25, 2014 at 12:28 PM, Xiangrui Meng wrote:
> The evaluation metrics are definitely useful. How do they differ from
>
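For readers with the same question, here is a minimal sketch of prec@k and ndcg@k with binary relevance, in plain Python. The function names and the use of a set of relevant items are assumptions for illustration; this follows the usual textbook definitions and is not copied from any Spark class.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance.

    A hit at 0-based rank r contributes 1 / log2(r + 2); the sum is
    divided by the best achievable sum for min(k, len(relevant)) hits.
    """
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k])
              if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0.0 else 0.0
```

Unlike precision, NDCG rewards placing relevant items earlier in the list, so two rankings with the same prec@k can have different ndcg@k.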
ALS is using a bunch of off-heap memory?). You mentioned
> earlier in this thread that the property wasn't showing up in the
> Environment tab. Are you sure it's making it in?
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das
> wrote:
>
>> Hmm...I d
.
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das
> wrote:
>
>> Hi Sandy,
>>
>> Any resolution for YARN failures ? It's a blocker for running spark on
>> top of YARN.
>>
>> Thanks.
>> Deb
>>
>> On Tue, Aug 19,
21 . We know that the
> container got killed by YARN because it used much more memory than it
> requested. But we haven't figured out the root cause yet.
>
> +Sandy
>
> Best,
> Xiangrui
>
> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das
> wrote:
> > Hi,
>
odeManager
> configuration, yarn.nodemanager.vmem-check-enabled is set to false.
>
> -Sandy
>
>
> On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das
> wrote:
>
>> I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is
>> definitely a YARN related problem...
>
sai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das
> wrote:
> > Hi Patrick,
> >
> > Last few days I came across some bugs which got exposed due to ALS runs
> on
> > large scale data...although it
rk's actor system
> directly - it is an internal communication component in Spark and could
> e.g. be re-factored later to not use akka at all. Could you elaborate a bit
> more on your use case?
>
> - Patrick
>
>
> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das
> wrot
Hi,
There have been some recent changes in the way akka is used in spark and I
feel they are major changes...
Is there a design document / JIRA / experiment on large datasets that
highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great
to understand where akka is used in the cod
issue as described in
> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
> container got killed by YARN because it used much more memory than it
> requested. But we haven't figured out the root cause yet.
>
> +Sandy
>
> Best,
> Xiangrui
>
> O